# Troubleshooting Guide

## Common Issues and Solutions

### Network Issues

#### Blocks Not Being Produced

**Symptoms**: No new blocks, validators not responding

**Diagnosis**:

```bash
# Check validator status
kubectl get pods -n besu-network -l component=validator

# Check logs
kubectl logs -n besu-network --tail=100

# Check block number
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://
```

**Solutions**:

1. Restart validators: `kubectl rollout restart statefulset/besu-validator -n besu-network`
2. Check network connectivity
3. Verify validator keys
4. Check IBFT configuration
5. Verify genesis file

#### Validators Not Peering

**Symptoms**: Validators not connecting to each other

**Diagnosis**:

```bash
# Check peer count
kubectl exec -n besu-network -- \
  curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545

# Check static nodes
kubectl get configmap besu-validator-config -n besu-network -o yaml
```

**Solutions**:

1. Verify static-nodes.json configuration
2. Check network policies
3. Verify firewall rules
4. Check P2P port (30303) connectivity
5. Verify enode addresses

### RPC Issues

#### RPC Endpoints Not Responding

**Symptoms**: RPC calls failing, timeouts

**Diagnosis**:

```bash
# Check RPC pod status
kubectl get pods -n besu-network -l component=rpc

# Check logs
kubectl logs -n besu-network --tail=100

# Test RPC endpoint
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://
```

**Solutions**:

1. Restart RPC pods: `kubectl rollout restart statefulset/besu-rpc -n besu-network`
2. Check Application Gateway status
3. Verify network policies
4. Check rate limiting
5. Scale RPC nodes if needed

#### High Latency

**Symptoms**: Slow RPC responses

**Diagnosis**:

```bash
# Check pod resources
kubectl top pods -n besu-network -l component=rpc

# Check metrics
curl http://:9545/metrics

# Check sync status
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://
```

**Solutions**:

1. Scale RPC nodes
2. Increase resource limits
3. Check disk I/O
4. Verify network connectivity
5. Check for sync issues

### Oracle Issues

#### Oracle Not Updating

**Symptoms**: Oracle price not updating, circuit breaker open

**Diagnosis**:

```bash
# Check oracle publisher status
kubectl get pods -n besu-network -l app=oracle-publisher

# Check logs
kubectl logs -n besu-network --tail=100

# Check health endpoint
curl http://:8080/health

# Check metrics
curl http://:8000/metrics
```

**Solutions**:

1. Restart oracle publisher
2. Check data sources
3. Verify RPC connectivity
4. Check private key access
5. Verify circuit breaker configuration

#### Data Source Failures

**Symptoms**: Failed to fetch from data sources

**Diagnosis**:

```bash
# Check data source connectivity
curl

# Check oracle publisher logs
kubectl logs -n besu-network | grep -i "data source"
```

**Solutions**:

1. Verify data source URLs
2. Check network connectivity
3. Verify API keys
4. Check rate limiting
5. Update data source configuration

### Storage Issues

#### Disk Full

**Symptoms**: Pods failing, disk space errors

**Diagnosis**:

```bash
# Check disk usage
kubectl exec -n besu-network -- df -h

# Check PVC usage
kubectl get pvc -n besu-network

# Check pod logs
kubectl logs -n besu-network | grep -i "disk\|space\|full"
```

**Solutions**:

1. Increase PVC size
2. Clean up old data
3. Archive chaindata
4. Use snap sync for RPC nodes
5. Implement data retention policies

#### Slow Disk I/O

**Symptoms**: Slow sync, high latency

**Diagnosis**:

```bash
# Check disk I/O
kubectl exec -n besu-network -- iostat -x 1

# Check metrics
curl http://:9545/metrics | grep -i "disk\|io"
```

**Solutions**:

1. Upgrade to Premium SSD
2. Increase disk size
3. Optimize Besu configuration
4. Check for disk contention
5. Use faster storage class

### Monitoring Issues

#### Metrics Not Collecting

**Symptoms**: No metrics in Prometheus

**Diagnosis**:

```bash
# Check Prometheus targets
curl http://:9090/api/v1/targets

# Check service discovery
kubectl get servicemonitors -n besu-network

# Check pod metrics endpoint
curl http://:9545/metrics
```

**Solutions**:

1. Verify ServiceMonitor configuration
2. Check network policies
3. Verify metrics endpoint
4. Restart Prometheus
5. Check service discovery configuration

#### Alerts Not Firing

**Symptoms**: Alerts not triggering

**Diagnosis**:

```bash
# Check Alertmanager status
curl http://:9093/api/v1/status

# Check alert rules
kubectl get prometheusrules -n besu-network

# Check notification channels
kubectl get secret alertmanager-config -n besu-network -o yaml
```

**Solutions**:

1. Verify alert rules
2. Check Alertmanager configuration
3. Verify notification channels
4. Check alert thresholds
5. Test alert rules

## Debugging Commands

### Network Debugging

```bash
# Check pod networking
kubectl exec -n besu-network -- ip addr

# Check DNS
kubectl exec -n besu-network -- nslookup

# Check connectivity
kubectl exec -n besu-network -- ping
```

### Besu Debugging

```bash
# Check Besu version
kubectl exec -n besu-network -- /opt/besu/bin/besu --version

# Check configuration
kubectl exec -n besu-network -- cat /config/besu-config.toml

# Check logs
kubectl logs -n besu-network --tail=100 -f
```

### Kubernetes Debugging

```bash
# Check pod status
kubectl describe pod -n besu-network

# Check events
kubectl get events -n besu-network --sort-by='.lastTimestamp'

# Check resources
kubectl top nodes
kubectl top pods -n besu-network
```

## Useful Resources

- [Besu Documentation](https://besu.hyperledger.org/)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)

## Getting Help

- Check logs first
- Review monitoring dashboards
- Consult runbooks
- Contact on-call engineer
- Escalate if needed
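
Before escalating a suspected stall, it can help to confirm that the chain height is really not moving. The `eth_blockNumber` check from "Blocks Not Being Produced" can be scripted by sampling the height twice and comparing the results. The following is a minimal sketch, not part of this deployment: the function names are hypothetical, and the endpoint URL is whatever RPC address you already use for the `curl` checks above.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the chain is advancing by sampling eth_blockNumber twice.
# All function names here are illustrative assumptions, not existing tooling.

# Extract the hex "result" field from a JSON-RPC response and print it as decimal.
# Returns non-zero if the response carries no hex result (for example, an error).
block_from_response() {
  local hex
  hex=$(printf '%s' "$1" |
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F][0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"
}

# Query eth_blockNumber on the endpoint given as $1 and print the height as decimal.
get_block_number() {
  local response
  response=$(curl -s -m 10 -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1") || return 1
  block_from_response "$response"
}

# Sample the height twice, wait_seconds apart; succeed only if it increased.
# Exit codes: 0 = advancing, 1 = stalled, 2 = RPC call failed.
check_block_production() {
  local url=$1 wait_seconds=${2:-10} first second
  first=$(get_block_number "$url") || { echo "RPC call failed" >&2; return 2; }
  sleep "$wait_seconds"
  second=$(get_block_number "$url") || { echo "RPC call failed" >&2; return 2; }
  if [ "$second" -gt "$first" ]; then
    echo "OK: height advanced from $first to $second"
  else
    echo "STALLED: height still $second after ${wait_seconds}s"
    return 1
  fi
}
```

After sourcing the script, call `check_block_production <rpc-url> 15`, choosing a wait comfortably longer than the network's configured block period so that a healthy chain always produces at least one block in between. The distinct exit codes separate "chain stalled" from "RPC unreachable", which point to different sections of this guide.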