smom-dbis-138/runbooks/troubleshooting.md
defiQUG 1fb7266469 Add Oracle Aggregator and CCIP Integration
2025-12-12 14:57:48 -08:00

Troubleshooting Guide

Common Issues and Solutions

Network Issues

Blocks Not Being Produced

Symptoms: No new blocks, validators not responding

Diagnosis:

# Check validator status
kubectl get pods -n besu-network -l component=validator

# Check logs
kubectl logs -n besu-network <validator-pod> --tail=100

# Check block number
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>
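eth_blockNumber returns the height as a hex string, so two samples are easier to compare once converted to decimal. A minimal sketch, assuming the response is compact JSON with the key and value on one line; the helper names block_number_from_response and chain_progress are hypothetical:

```shell
# Hypothetical helpers for interpreting eth_blockNumber responses.

# Read a JSON-RPC response on stdin and print the block number in decimal.
block_number_from_response() {
  local hex
  hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"
}

# Compare two decimal samples taken some seconds apart.
chain_progress() {
  if [ "$2" -gt "$1" ]; then
    echo "advancing"
  else
    echo "stalled"
  fi
}

# Usage (same request as above, sampled twice):
#   a=$(curl -s -X POST ... http://<rpc-endpoint> | block_number_from_response)
#   sleep 10
#   b=$(curl -s -X POST ... http://<rpc-endpoint> | block_number_from_response)
#   chain_progress "$a" "$b"
```

The sleep interval should be longer than the configured block period, otherwise a healthy chain can look stalled.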

Solutions:

  1. Restart validators: kubectl rollout restart statefulset/besu-validator -n besu-network
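  2. Check network connectivity between validator pods
  3. Verify that each validator's key matches an address in the validator set
  4. Check the IBFT configuration (validator list, block period)
  5. Verify that every node was started from the same genesis file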
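The restart in step 1 returns as soon as the rollout is triggered, so it helps to wait for the pods to come back before re-checking block production. A minimal sketch, assuming the StatefulSet uses the RollingUpdate strategy; the function name restart_and_wait and the 300s timeout are assumptions:

```shell
# Restart a StatefulSet and block until the rollout finishes or times out.
# The namespace defaults to besu-network; the 300s timeout is an assumption.
restart_and_wait() {
  local name="$1"
  local ns="${2:-besu-network}"
  kubectl rollout restart "statefulset/${name}" -n "$ns" &&
    kubectl rollout status "statefulset/${name}" -n "$ns" --timeout=300s
}

# Usage:
#   restart_and_wait besu-validator
#   restart_and_wait besu-rpc
```

A non-zero exit status means either the restart request or the wait failed, which is a cue to inspect pod events before retrying.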

Validators Not Peering

Symptoms: Validators not connecting to each other

Diagnosis:

# List connected peers (admin_peers needs the ADMIN RPC API enabled;
# net_peerCount returns only the count)
kubectl exec -n besu-network <validator-pod> -- \
  curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545

# Check static nodes
kubectl get configmap besu-validator-config -n besu-network -o yaml

Solutions:

  1. Verify static-nodes.json configuration
  2. Check network policies
  3. Verify firewall rules
  4. Check P2P port (30303) connectivity
  5. Verify enode addresses
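Malformed entries in static-nodes.json are a common cause of peering failures, so a quick shape check on each enode URL can rule that out. A minimal sketch, assuming IPv4 addresses or hostnames (bracketed IPv6 hosts are not handled); the function name check_enode is hypothetical:

```shell
# Print "valid" or "invalid" for one enode URL.
# Expected shape: enode://<128 hex chars>@<host>:<port>[?discport=<port>]
check_enode() {
  if printf '%s\n' "$1" |
    grep -Eq '^enode://[0-9a-fA-F]{128}@[^@:/?]+:[0-9]{1,5}(\?discport=[0-9]{1,5})?$'; then
    echo "valid"
  else
    echo "invalid"
  fi
}

# Usage:
#   check_enode "enode://<node-public-key>@<ip>:30303"
```

This only checks the syntax. It does not confirm that the public key belongs to the node at that address or that the port is reachable.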

RPC Issues

RPC Endpoints Not Responding

Symptoms: RPC calls failing, timeouts

Diagnosis:

# Check RPC pod status
kubectl get pods -n besu-network -l component=rpc

# Check logs
kubectl logs -n besu-network <rpc-pod> --tail=100

# Test RPC endpoint
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>

Solutions:

  1. Restart RPC pods: kubectl rollout restart statefulset/besu-rpc -n besu-network
  2. Check Application Gateway status
  3. Verify network policies
  4. Check rate limiting
  5. Scale RPC nodes if needed

High Latency

Symptoms: Slow RPC responses

Diagnosis:

# Check pod resources
kubectl top pods -n besu-network -l component=rpc

# Check metrics
curl http://<rpc-pod>:9545/metrics

# Check sync status
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://<rpc-endpoint>
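eth_syncing returns false when the node is in sync, and otherwise an object with hex-encoded currentBlock and highestBlock fields. A minimal sketch that turns the response into a one-line summary, assuming each key and its value sit on the same line; the function name sync_state is hypothetical:

```shell
# Read an eth_syncing JSON-RPC response on stdin and summarise it.
sync_state() {
  local response current highest
  response=$(cat)
  case "$response" in
    *'"result":false'* | *'"result": false'*)
      echo "synced"
      return 0
      ;;
  esac
  current=$(printf '%s\n' "$response" |
    sed -n 's/.*"currentBlock"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  highest=$(printf '%s\n' "$response" |
    sed -n 's/.*"highestBlock"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  if [ -z "$current" ] || [ -z "$highest" ]; then
    echo "unknown"
    return 1
  fi
  # printf converts the hex strings to decimal before subtracting.
  echo "behind by $(( $(printf '%d' "$highest") - $(printf '%d' "$current") )) blocks"
}

# Usage: pipe the eth_syncing curl call above into sync_state.
```

An RPC node that stays many blocks behind will answer queries with stale data, which often shows up as apparent latency in clients waiting for receipts.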

Solutions:

  1. Scale RPC nodes
  2. Increase resource limits
  3. Check disk I/O
  4. Verify network connectivity
  5. Check for sync issues

Oracle Issues

Oracle Not Updating

Symptoms: Oracle price not updating, circuit breaker open

Diagnosis:

# Check oracle publisher status
kubectl get pods -n besu-network -l app=oracle-publisher

# Check logs
kubectl logs -n besu-network <oracle-pod> --tail=100

# Check health endpoint
curl http://<oracle-pod>:8080/health

# Check metrics
curl http://<oracle-pod>:8000/metrics

Solutions:

  1. Restart oracle publisher
  2. Check data sources
  3. Verify RPC connectivity
  4. Check private key access
  5. Verify circuit breaker configuration

Data Source Failures

Symptoms: Failed to fetch from data sources

Diagnosis:

# Check data source connectivity
curl <data-source-url>

# Check oracle publisher logs
kubectl logs -n besu-network <oracle-pod> | grep -i "data source"

Solutions:

  1. Verify data source URLs
  2. Check network connectivity
  3. Verify API keys
  4. Check rate limiting
  5. Update data source configuration
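The HTTP status code of a failing data source usually indicates which of the steps above to try first. A minimal sketch of that mapping; the function name classify_status and the wording of the hints are assumptions, and individual providers may use status codes differently:

```shell
# Map an HTTP status code to the most likely troubleshooting step.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    401 | 403) echo "check API key" ;;
    404) echo "check URL" ;;
    429) echo "rate limited" ;;
    5??) echo "upstream error" ;;
    000) echo "no response (network or DNS)" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Usage (curl prints 000 when it receives no HTTP response at all):
#   classify_status "$(curl -s -o /dev/null -w '%{http_code}' <data-source-url>)"
```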

Storage Issues

Disk Full

Symptoms: Pods failing, disk space errors

Diagnosis:

# Check disk usage
kubectl exec -n besu-network <pod> -- df -h

# Check PVC usage
kubectl get pvc -n besu-network

# Check pod logs
kubectl logs -n besu-network <pod> | grep -i "disk\|space\|full"
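When a pod mounts several volumes, filtering the df output for nearly full filesystems is quicker than reading it by eye. A minimal sketch, assuming POSIX-format output (df -P) and mount points without spaces; the function name full_filesystems and the 80 percent default are assumptions, not project settings:

```shell
# Read `df -P` output on stdin; print mount points at or above a usage threshold.
full_filesystems() {
  awk -v limit="${1:-80}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # strip the percent sign from the Capacity column
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage:
#   kubectl exec -n besu-network <pod> -- df -P | full_filesystems 85
```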

Solutions:

  1. Increase PVC size
  2. Clean up old data
  3. Archive chaindata
  4. Use snap sync for RPC nodes
  5. Implement data retention policies

Slow Disk I/O

Symptoms: Slow sync, high latency

Diagnosis:

# Check disk I/O (five one-second samples; requires iostat in the container image)
kubectl exec -n besu-network <pod> -- iostat -x 1 5

# Check metrics
curl http://<pod>:9545/metrics | grep -i "disk\|io"

Solutions:

  1. Upgrade to Premium SSD
  2. Increase disk size
  3. Optimize Besu configuration
  4. Check for disk contention
  5. Use faster storage class

Monitoring Issues

Metrics Not Collecting

Symptoms: No metrics in Prometheus

Diagnosis:

# Check Prometheus targets
curl http://<prometheus>:9090/api/v1/targets

# Check service discovery
kubectl get servicemonitors -n besu-network

# Check pod metrics endpoint
curl http://<pod>:9545/metrics
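If the endpoint responds but Prometheus shows nothing, it helps to confirm that a specific metric is actually being exported. A minimal sketch that pulls one metric out of the text exposition format, assuming samples carry no timestamps and label values contain no spaces; the function name metric_value is hypothetical:

```shell
# Read Prometheus text-format metrics on stdin; print the value(s) of one metric.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/\{.*/, "", metric)    # drop the label set, if any
      if (metric == name) print $NF
    }'
}

# Usage (the metric name is an assumption; list the endpoint to see what is exported):
#   curl -s http://<pod>:9545/metrics | metric_value ethereum_blockchain_height
```

An empty result means the metric is not exported under that name, which points at the node's metrics configuration rather than at Prometheus.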

Solutions:

  1. Verify ServiceMonitor configuration
  2. Check network policies
  3. Verify metrics endpoint
  4. Restart Prometheus
  5. Check service discovery configuration

Alerts Not Firing

Symptoms: Alerts not triggering

Diagnosis:

# Check Alertmanager status (the v1 API was removed in Alertmanager 0.27; use v2)
curl http://<alertmanager>:9093/api/v2/status

# Check alert rules
kubectl get prometheusrules -n besu-network

# Check notification channels (values under .data are base64-encoded)
kubectl get secret alertmanager-config -n besu-network -o yaml

Solutions:

  1. Verify alert rules
  2. Check Alertmanager configuration
  3. Verify notification channels
  4. Check alert thresholds
  5. Test alert rules

Debugging Commands

Network Debugging

# Check pod networking
kubectl exec -n besu-network <pod> -- ip addr

# Check DNS
kubectl exec -n besu-network <pod> -- nslookup <service>

# Check connectivity
kubectl exec -n besu-network <pod> -- ping <target>

Besu Debugging

# Check Besu version
kubectl exec -n besu-network <pod> -- /opt/besu/bin/besu --version

# Check configuration
kubectl exec -n besu-network <pod> -- cat /config/besu-config.toml

# Check logs
kubectl logs -n besu-network <pod> --tail=100 -f

Kubernetes Debugging

# Check pod status
kubectl describe pod <pod> -n besu-network

# Check events
kubectl get events -n besu-network --sort-by='.lastTimestamp'

# Check resources
kubectl top nodes
kubectl top pods -n besu-network

Useful Resources

Getting Help

  • Check logs first
  • Review monitoring dashboards
  • Consult runbooks
  • Contact on-call engineer
  • Escalate if needed