# Troubleshooting Guide

## Common Issues and Solutions

### Network Issues

#### Blocks Not Being Produced

**Symptoms**: No new blocks, validators not responding

**Diagnosis**:
```bash
# Check validator status
kubectl get pods -n besu-network -l component=validator

# Check logs
kubectl logs -n besu-network <validator-pod> --tail=100

# Check block number
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Restart validators: `kubectl rollout restart statefulset/besu-validator -n besu-network`
2. Check network connectivity
3. Verify validator keys
4. Check IBFT configuration
5. Verify genesis file

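
A single `eth_blockNumber` reading cannot show whether the chain has stalled; two readings taken a few seconds apart can. The sketch below is one way to automate that comparison. The function names, the `sed`-based parsing, and the 10-second default wait are assumptions for illustration, not part of this deployment.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the chain is advancing by sampling eth_blockNumber twice.
# Function names and the default wait are hypothetical.

# Convert a hex quantity such as 0x1a to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Extract the "result" string from a JSON-RPC response read on stdin.
rpc_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Query the current block height (hex) from the node at URL $1.
block_number() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | rpc_result
}

# Succeed if the height grows between two samples taken $2 seconds apart.
blocks_advancing() {
  local url=$1 wait_s=${2:-10} first second
  first=$(hex_to_dec "$(block_number "$url")")
  sleep "$wait_s"
  second=$(hex_to_dec "$(block_number "$url")")
  [ "$second" -gt "$first" ]
}

# Usage (not run automatically):
#   blocks_advancing "http://<rpc-endpoint>" 10 && echo "chain is advancing"
```

Choose a wait longer than the configured block period; otherwise a healthy chain can look stalled.
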
#### Validators Not Peering

**Symptoms**: Validators not connecting to each other

**Diagnosis**:
```bash
# List connected peers (admin_peers requires the ADMIN API to be enabled on the node)
kubectl exec -n besu-network <validator-pod> -- \
  curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545

# Check static nodes
kubectl get configmap besu-validator-config -n besu-network -o yaml
```

**Solutions**:
1. Verify static-nodes.json configuration
2. Check network policies
3. Verify firewall rules
4. Check P2P port (30303) connectivity
5. Verify enode addresses

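
The P2P port check in the list above can be scripted without extra tools by using bash's `/dev/tcp` pseudo-device. This is a minimal sketch: `port_open` is a hypothetical helper name, it needs `bash` and `timeout` wherever it runs, and it tests TCP only, so UDP-based peer discovery on the same port is not covered.

```bash
#!/usr/bin/env bash
# Sketch: test TCP reachability of a peer's P2P port.
# port_open is a hypothetical helper; the 3-second default is an assumption.

# Succeed if a TCP connection to host $1, port $2 opens within $3 seconds.
port_open() {
  local host=$1 port=$2 wait_s=${3:-3}
  # Inner bash receives host as $0 and port as $1.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (not run automatically):
#   port_open <validator-host> 30303 && echo "P2P port reachable"
```

Run it from the same network location as the peer that fails to connect (for example from inside another validator pod, if the image ships bash), since network policies and firewall rules depend on the source.
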
### RPC Issues

#### RPC Endpoints Not Responding

**Symptoms**: RPC calls failing, timeouts

**Diagnosis**:
```bash
# Check RPC pod status
kubectl get pods -n besu-network -l component=rpc

# Check logs
kubectl logs -n besu-network <rpc-pod> --tail=100

# Test RPC endpoint
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Restart RPC pods: `kubectl rollout restart statefulset/besu-rpc -n besu-network`
2. Check Application Gateway status
3. Verify network policies
4. Check rate limiting
5. Scale RPC nodes if needed

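
When testing the endpoint, it helps to separate "nothing came back" from "the node answered with an error", because they point to different items in the list above. The following sketch classifies a response body; `classify_rpc_response` and its labels are hypothetical, and the matching is a plain text search rather than real JSON parsing.

```bash
#!/usr/bin/env bash
# Sketch: classify a JSON-RPC response body read on stdin.
# Prints one of: ok, rpc-error, no-response, unexpected (labels are hypothetical).
classify_rpc_response() {
  local body
  body=$(cat)
  if [ -z "$body" ]; then
    echo "no-response"   # timeout, refused connection, or an empty reply
  elif printf '%s' "$body" | grep -q '"error"'; then
    echo "rpc-error"     # the node answered but rejected the call
  elif printf '%s' "$body" | grep -q '"result"'; then
    echo "ok"
  else
    echo "unexpected"    # for example an HTML error page from a proxy or gateway
  fi
}

# Usage (not run automatically):
#   curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
#     --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     http://<rpc-endpoint> | classify_rpc_response
```

As a rough guide, `no-response` and `unexpected` suggest looking at the pods, the gateway, or network policies first, while `rpc-error` suggests reading the error message returned by the node.
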
#### High Latency

**Symptoms**: Slow RPC responses

**Diagnosis**:
```bash
# Check pod resources
kubectl top pods -n besu-network -l component=rpc

# Check metrics
curl http://<rpc-pod>:9545/metrics

# Check sync status
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Scale RPC nodes
2. Increase resource limits
3. Check disk I/O
4. Verify network connectivity
5. Check for sync issues

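
To put a number on "slow", curl can report the total time of a request through its `-w '%{time_total}'` write-out variable. The sketch below wraps that in two helpers; the names `rpc_latency` and `slower_than` and the 0.5-second threshold in the usage note are assumptions, not values defined by this guide.

```bash
#!/usr/bin/env bash
# Sketch: measure JSON-RPC round-trip time and compare it with a threshold.
# Helper names and thresholds are hypothetical.

# Print the total time in seconds for one eth_blockNumber call to URL $1.
rpc_latency() {
  curl -s -o /dev/null -w '%{time_total}\n' --max-time 10 \
    -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1"
}

# Succeed if the measured time $1 exceeds the limit $2 (both in seconds).
slower_than() {
  awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}

# Usage (not run automatically): take five samples and flag slow ones.
#   for i in 1 2 3 4 5; do
#     t=$(rpc_latency "http://<rpc-endpoint>")
#     slower_than "$t" 0.5 && echo "slow: ${t}s" || echo "ok: ${t}s"
#   done
```

Comparing samples taken through the public endpoint with samples taken directly against a pod can help show whether the delay is in the node or in front of it.
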
### Oracle Issues

#### Oracle Not Updating

**Symptoms**: Oracle price not updating, circuit breaker open

**Diagnosis**:
```bash
# Check oracle publisher status
kubectl get pods -n besu-network -l app=oracle-publisher

# Check logs
kubectl logs -n besu-network <oracle-pod> --tail=100

# Check health endpoint
curl http://<oracle-pod>:8080/health

# Check metrics
curl http://<oracle-pod>:8000/metrics
```

**Solutions**:
1. Restart oracle publisher
2. Check data sources
3. Verify RPC connectivity
4. Check private key access
5. Verify circuit breaker configuration

#### Data Source Failures

**Symptoms**: Failed to fetch from data sources

**Diagnosis**:
```bash
# Check data source connectivity
curl <data-source-url>

# Check oracle publisher logs
kubectl logs -n besu-network <oracle-pod> | grep -i "data source"
```

**Solutions**:
1. Verify data source URLs
2. Check network connectivity
3. Verify API keys
4. Check rate limiting
5. Update data source configuration

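
The HTTP status code of a data source request often narrows down which of the items above to check first. This is a rough sketch: `http_status` and `suggest_check` are hypothetical helpers, and the mapping from status codes to causes is a common convention, not something a given data source is guaranteed to follow.

```bash
#!/usr/bin/env bash
# Sketch: map the HTTP status of a data source request to a likely area to check.
# Helper names and category labels are hypothetical.

# Print the HTTP status code for URL $1 (curl prints 000 if no HTTP response arrived).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Map a status code to the most likely area to investigate.
suggest_check() {
  case "$1" in
    000)     echo "connectivity" ;;        # DNS, network policy, TLS, or timeout
    401|403) echo "api-key" ;;             # missing, expired, or unauthorized key
    429)     echo "rate-limit" ;;          # too many requests
    2??)     echo "ok" ;;
    5??)     echo "upstream-error" ;;      # the data source itself is failing
    *)       echo "data-source-config" ;;  # for example a wrong URL returning 404
  esac
}

# Usage (not run automatically):
#   suggest_check "$(http_status <data-source-url>)"
```
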
### Storage Issues

#### Disk Full

**Symptoms**: Pods failing, disk space errors

**Diagnosis**:
```bash
# Check disk usage
kubectl exec -n besu-network <pod> -- df -h

# Check PVC usage
kubectl get pvc -n besu-network

# Check pod logs
kubectl logs -n besu-network <pod> | grep -i "disk\|space\|full"
```

**Solutions**:
1. Increase PVC size
2. Clean up old data
3. Archive chaindata
4. Use snap sync for RPC nodes
5. Implement data retention policies

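
Reading `df` output by eye works for one pod, but a small filter makes it easier to scan several. The sketch below parses POSIX-format output (`df -P`) and prints mount points at or above a usage limit. `mounts_over` is a hypothetical helper, the 85 percent limit in the usage note is an arbitrary example, and mount points containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Sketch: list mount points whose usage is at or above a percentage limit.
# mounts_over is a hypothetical helper name.

# Read `df -P` output on stdin; print mount points with usage >= $1 percent.
mounts_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5                # Capacity column, for example "90%"
    sub(/%/, "", use)
    if (use + 0 >= limit) print $6
  }'
}

# Usage (not run automatically):
#   kubectl exec -n besu-network <pod> -- df -P | mounts_over 85
```
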
#### Slow Disk I/O

**Symptoms**: Slow sync, high latency

**Diagnosis**:
```bash
# Check disk I/O (five one-second samples; iostat must be available in the image)
kubectl exec -n besu-network <pod> -- iostat -x 1 5

# Check metrics
curl http://<pod>:9545/metrics | grep -i "disk\|io"
```

**Solutions**:
1. Upgrade to Premium SSD
2. Increase disk size
3. Optimize Besu configuration
4. Check for disk contention
5. Use faster storage class

### Monitoring Issues

#### Metrics Not Collecting

**Symptoms**: No metrics in Prometheus

**Diagnosis**:
```bash
# Check Prometheus targets
curl http://<prometheus>:9090/api/v1/targets

# Check service discovery
kubectl get servicemonitors -n besu-network

# Check pod metrics endpoint
curl http://<pod>:9545/metrics
```

**Solutions**:
1. Verify ServiceMonitor configuration
2. Check network policies
3. Verify metrics endpoint
4. Restart Prometheus
5. Check service discovery configuration

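
The targets endpoint returns a large JSON document, so a quick count of targets per health state is often enough for a first look. The sketch below counts `"health"` values with `grep` instead of a JSON parser; `count_targets` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: count Prometheus targets by health state without a JSON parser.
# count_targets is a hypothetical helper name.

# Read the /api/v1/targets response on stdin and count targets whose
# health equals $1 (up, down, or unknown).
count_targets() {
  { grep -o "\"health\"[[:space:]]*:[[:space:]]*\"$1\"" || true; } |
    wc -l | tr -d '[:space:]'
}

# Usage (not run automatically):
#   curl -s http://<prometheus>:9090/api/v1/targets | count_targets down
```

A non-zero `down` count suggests Prometheus found the target but cannot scrape it, which points toward network policies or the metrics endpoint. A target that is missing from the response altogether more likely points toward the ServiceMonitor or service discovery configuration.
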
#### Alerts Not Firing

**Symptoms**: Alerts not triggering

**Diagnosis**:
```bash
# Check Alertmanager status (v2 API)
curl http://<alertmanager>:9093/api/v2/status

# Check alert rules
kubectl get prometheusrules -n besu-network

# Check notification channels
kubectl get secret alertmanager-config -n besu-network -o yaml
```

**Solutions**:
1. Verify alert rules
2. Check Alertmanager configuration
3. Verify notification channels
4. Check alert thresholds
5. Test alert rules

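
One way to check notification channels independently of Prometheus is to post a hand-made alert straight to Alertmanager's v2 API. The sketch below assumes the `/api/v2/alerts` endpoint is reachable; the helper names, the `ManualTestAlert` alert name, and the `severity` label are hypothetical and should be adapted to the labels your routing rules match on.

```bash
#!/usr/bin/env bash
# Sketch: send a manual test alert to Alertmanager to exercise notification routing.
# Helper names, alert name, and labels are hypothetical.

# Build a JSON payload for one test alert named $1 with severity $2.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]' "$1" "$2"
}

# POST the test alert to the Alertmanager base URL $1.
send_test_alert() {
  test_alert_payload "${2:-ManualTestAlert}" "${3:-warning}" |
    curl -s -X POST -H "Content-Type: application/json" --data @- "$1/api/v2/alerts"
}

# Usage (not run automatically):
#   send_test_alert http://<alertmanager>:9093 ManualTestAlert warning
```

This exercises routing and receivers only. It does not show whether Prometheus is evaluating the alert rules, so a notification arriving here while real alerts stay silent points back to the rules or their thresholds.
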
## Debugging Commands

### Network Debugging

```bash
# Check pod networking
kubectl exec -n besu-network <pod> -- ip addr

# Check DNS
kubectl exec -n besu-network <pod> -- nslookup <service>

# Check connectivity
kubectl exec -n besu-network <pod> -- ping <target>
```

### Besu Debugging

```bash
# Check Besu version
kubectl exec -n besu-network <pod> -- /opt/besu/bin/besu --version

# Check configuration
kubectl exec -n besu-network <pod> -- cat /config/besu-config.toml

# Check logs
kubectl logs -n besu-network <pod> --tail=100 -f
```

### Kubernetes Debugging

```bash
# Check pod status
kubectl describe pod <pod> -n besu-network

# Check events
kubectl get events -n besu-network --sort-by='.lastTimestamp'

# Check resources
kubectl top nodes
kubectl top pods -n besu-network
```

## Useful Resources

- [Besu Documentation](https://besu.hyperledger.org/)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)

## Getting Help

- Check logs first
- Review monitoring dashboards
- Consult runbooks
- Contact on-call engineer
- Escalate if needed