
# Troubleshooting Guide

## Common Issues and Solutions

### Network Issues

#### Blocks Not Being Produced

**Symptoms**: No new blocks, validators not responding

**Diagnosis**:
```bash
# Check validator status
kubectl get pods -n besu-network -l component=validator

# Check logs
kubectl logs -n besu-network <validator-pod> --tail=100

# Check block number
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Restart validators: `kubectl rollout restart statefulset/besu-validator -n besu-network`
2. Check network connectivity
3. Verify validator keys
4. Check IBFT configuration
5. Verify genesis file
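Before restarting anything, confirm that the chain is actually stalled and not just slow. The sketch below assumes `curl` is available and that you pass in the RPC URL (`RPC_URL` in the usage comment is a hypothetical variable). It samples `eth_blockNumber`, parses Besu's compact JSON-RPC response with `sed` instead of `jq`, and compares two readings.

```bash
# Extract the hex "result" field from a compact JSON-RPC response on stdin.
rpc_result() {
  sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Fetch the current block number (decimal) from the RPC URL given as $1.
block_number() {
  curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | rpc_result | { read -r hex && hex_to_dec "$hex"; }
}

# Succeed (exit 0) when the second reading did not advance past the first.
chain_stalled() {
  [ "$2" -le "$1" ]
}

# Usage (RPC_URL is hypothetical; 30 s assumes a block period well below that):
#   before=$(block_number "$RPC_URL"); sleep 30; after=$(block_number "$RPC_URL")
#   chain_stalled "$before" "$after" && echo "no new blocks in 30s"
```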

#### Validators Not Peering

**Symptoms**: Validators not connecting to each other

**Diagnosis**:
```bash
# List connected peers (needs ADMIN in --rpc-http-api and curl in the container image)
kubectl exec -n besu-network <validator-pod> -- \
  curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545

# Check static nodes
kubectl get configmap besu-validator-config -n besu-network -o yaml
```

**Solutions**:
1. Verify static-nodes.json configuration
2. Check network policies
3. Verify firewall rules
4. Check P2P port 30303 connectivity (TCP, plus UDP if peer discovery is enabled)
5. Verify enode addresses
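Validators often fail to peer because of a malformed entry in `static-nodes.json`. The sketch below checks each entry against the enode URL shape Besu expects: `enode://<128-hex-char node public key>@<host>:<port>`, optionally followed by `?discport=`. It only checks format, so it cannot tell whether a key belongs to the right node. It also assumes the file is a flat JSON array of strings.

```bash
# Succeed if $1 has the enode URL shape:
# enode://<128 hex chars>@<host>:<port>[?discport=<port>]
is_valid_enode() {
  printf '%s\n' "$1" | grep -Eq \
    '^enode://[0-9a-fA-F]{128}@[^:@/?]+:[0-9]{1,5}(\?discport=[0-9]{1,5})?$'
}

# Print every string in a static-nodes.json-style file (a flat JSON array of
# strings) that fails the shape check.
invalid_enodes() {
  grep -o '"[^"]*"' "$1" | tr -d '"' | while read -r url; do
    is_valid_enode "$url" || printf '%s\n' "$url"
  done
}

# Usage: copy static-nodes.json out of the ConfigMap or pod first, then:
#   invalid_enodes static-nodes.json
```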

### RPC Issues

#### RPC Endpoints Not Responding

**Symptoms**: RPC calls failing, timeouts

**Diagnosis**:
```bash
# Check RPC pod status
kubectl get pods -n besu-network -l component=rpc

# Check logs
kubectl logs -n besu-network <rpc-pod> --tail=100

# Test RPC endpoint
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Restart RPC pods: `kubectl rollout restart statefulset/besu-rpc -n besu-network`
2. Check Application Gateway status
3. Verify network policies
4. Check rate limiting
5. Scale RPC nodes if needed
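After `kubectl rollout restart`, `kubectl rollout status statefulset/besu-rpc -n besu-network` waits for the pods, but the endpoint behind the Application Gateway may need longer before it answers. The sketch below is a minimal retry helper with exponential backoff. It assumes `curl` is available. `retry` is a hypothetical helper name, and the attempt count and initial delay in the usage comment are arbitrary.

```bash
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the delay after each failure.
# Returns 1 if every attempt fails.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
  return 0
}

# Usage: wait for the RPC endpoint after a restart (curl -f fails on HTTP >= 400).
#   retry 6 2 curl -sf --max-time 5 -X POST -H "Content-Type: application/json" \
#     --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     http://<rpc-endpoint>
```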

#### High Latency

**Symptoms**: Slow RPC responses

**Diagnosis**:
```bash
# Check pod resources
kubectl top pods -n besu-network -l component=rpc

# Check metrics
curl http://<rpc-pod>:9545/metrics

# Check sync status
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://<rpc-endpoint>
```

**Solutions**:
1. Scale RPC nodes
2. Increase resource limits
3. Check disk I/O
4. Verify network connectivity
5. Check for sync issues
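High latency often coincides with a node that is still catching up. The sketch below converts an `eth_syncing` response into a block lag. Besu returns `"result":false` when it considers itself in sync; otherwise it returns an object with hex `currentBlock` and `highestBlock` fields. The parsing assumes compact JSON, which is what Besu emits.

```bash
# Print how many blocks the node is behind, from an eth_syncing response on stdin.
# "result":false means the node reports itself as in sync (lag 0).
sync_lag() {
  local resp current highest
  resp=$(cat)
  case "$resp" in
    *'"result":false'*) echo 0; return 0 ;;
  esac
  current=$(printf '%s' "$resp" | sed -n 's/.*"currentBlock":"\(0x[0-9a-fA-F]*\)".*/\1/p')
  highest=$(printf '%s' "$resp" | sed -n 's/.*"highestBlock":"\(0x[0-9a-fA-F]*\)".*/\1/p')
  # Fail if the response had neither shape (for example, a JSON-RPC error object).
  [ -n "$current" ] && [ -n "$highest" ] || return 1
  echo $(( $highest - $current ))
}

# Usage:
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
#     http://<rpc-endpoint> | sync_lag
```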

### Oracle Issues

#### Oracle Not Updating

**Symptoms**: Oracle price not updating, circuit breaker open

**Diagnosis**:
```bash
# Check oracle publisher status
kubectl get pods -n besu-network -l app=oracle-publisher

# Check logs
kubectl logs -n besu-network <oracle-pod> --tail=100

# Check health endpoint
curl http://<oracle-pod>:8080/health

# Check metrics
curl http://<oracle-pod>:8000/metrics
```

**Solutions**:
1. Restart oracle publisher
2. Check data sources
3. Verify RPC connectivity
4. Check private key access
5. Verify circuit breaker configuration
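Whether an oracle is "not updating" is easiest to judge against an explicit maximum age. The sketch below is a minimal staleness check. It assumes you can obtain the last update as a Unix timestamp; where that value comes from (publisher metrics, logs, or the on-chain feed) depends on the deployment and is not specified here. The heartbeat value is yours to choose.

```bash
# Succeed when the last update ($1, Unix seconds) is older than $2 seconds.
# $3 overrides "now" (defaults to the current time), which makes testing easy.
oracle_is_stale() {
  local last_update=$1 max_age=$2 now=${3:-$(date +%s)}
  [ $(( now - last_update )) -gt "$max_age" ]
}

# Usage (LAST_UPDATE and the 300 s heartbeat are hypothetical):
#   oracle_is_stale "$LAST_UPDATE" 300 && echo "oracle price is stale"
```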

#### Data Source Failures

**Symptoms**: Failed to fetch from data sources

**Diagnosis**:
```bash
# Check data source connectivity
curl <data-source-url>

# Check oracle publisher logs
kubectl logs -n besu-network <oracle-pod> | grep -i "data source"
```

**Solutions**:
1. Verify data source URLs
2. Check network connectivity
3. Verify API keys
4. Check rate limiting
5. Update data source configuration
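Many data-source failures show up as a specific HTTP status. The sketch below maps the code returned by `curl -w '%{http_code}'` to the matching item in the solutions list. The mapping is a heuristic rather than an exhaustive list, and individual providers may use other codes for the same condition.

```bash
# Map an HTTP status code from a data source to a likely next step.
# "000" is what curl reports when no HTTP response was received.
diagnose_source_code() {
  case "$1" in
    000)     echo "no response: check network connectivity and the data source URL" ;;
    2??)     echo "ok" ;;
    401|403) echo "auth failure: verify API keys" ;;
    404)     echo "not found: verify data source URLs" ;;
    429)     echo "rate limited: check rate limiting" ;;
    5??)     echo "provider error: retry later or update data source configuration" ;;
    *)       echo "unexpected HTTP $1" ;;
  esac
}

# Usage:
#   diagnose_source_code "$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 <data-source-url>)"
```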

### Storage Issues

#### Disk Full

**Symptoms**: Pods failing, disk space errors

**Diagnosis**:
```bash
# Check disk usage
kubectl exec -n besu-network <pod> -- df -h

# Check PVC status and requested capacity (this does not show space used; use df for that)
kubectl get pvc -n besu-network

# Check pod logs
kubectl logs -n besu-network <pod> | grep -i "disk\|space\|full"
```

**Solutions**:
1. Increase PVC size (requires a storage class with `allowVolumeExpansion: true`)
2. Clean up old data
3. Archive chaindata
4. Use snap sync for RPC nodes
5. Implement data retention policies
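To alert before a Besu volume fills up, reduce the `df` check to a single number. The sketch below reads POSIX-format (`df -P`) output, where the fifth column of the data line is the use percentage. The `/data` mount path and the 85% threshold in the usage comment are hypothetical.

```bash
# Print the use percentage (digits only) from `df -P` output on stdin.
# With -P, each filesystem is on one line and column 5 is "Capacity" (e.g. 87%).
df_use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed when usage ($1) is at or above the threshold ($2), both in percent.
disk_above() {
  [ "$1" -ge "$2" ]
}

# Usage (/data and 85 are hypothetical; use the Besu data path and your own limit):
#   pct=$(kubectl exec -n besu-network <pod> -- df -P /data | df_use_pct)
#   disk_above "$pct" 85 && echo "volume at ${pct}%"
```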

#### Slow Disk I/O

**Symptoms**: Slow sync, high latency

**Diagnosis**:
```bash
# Check disk I/O: 5 one-second samples (requires sysstat in the container image)
kubectl exec -n besu-network <pod> -- iostat -x 1 5

# Check metrics
curl http://<pod>:9545/metrics | grep -i "disk\|io"
```

**Solutions**:
1. Upgrade to Premium SSD
2. Increase disk size
3. Optimize Besu configuration
4. Check for disk contention
5. Use faster storage class
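Grepping raw `/metrics` output also matches `# HELP` and `# TYPE` comment lines. The sketch below uses `awk` to filter Prometheus text-format samples by metric name. Which disk-related metrics exist depends on the Besu version and the metric categories enabled, so the pattern in the usage comment is only an example.

```bash
# Print Prometheus text-format samples whose metric name matches an ERE ($1),
# skipping # HELP / # TYPE comment lines and ignoring labels when matching.
prom_samples() {
  awk -v re="$1" '
    /^#/ || NF < 2 { next }
    { name = $1; sub(/\{.*/, "", name); if (name ~ re) print }
  '
}

# Usage (the pattern is an example; metric names vary by version):
#   curl -s http://<pod>:9545/metrics | prom_samples 'disk|io'
```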

### Monitoring Issues

#### Metrics Not Collecting

**Symptoms**: No metrics in Prometheus

**Diagnosis**:
```bash
# Check Prometheus targets
curl http://<prometheus>:9090/api/v1/targets

# Check service discovery
kubectl get servicemonitors -n besu-network

# Check pod metrics endpoint
curl http://<pod>:9545/metrics
```

**Solutions**:
1. Verify ServiceMonitor configuration
2. Check network policies
3. Verify metrics endpoint
4. Restart Prometheus
5. Check service discovery configuration
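The targets API lists every active scrape target with a `health` of `up`, `down`, or `unknown`, along with a `lastError` that explains failures. The sketch below counts targets per health state without extra dependencies and assumes the compact JSON that Prometheus returns. If `jq` is installed, `jq -r '.data.activeTargets[].health'` is more robust.

```bash
# Count Prometheus scrape targets by health, from /api/v1/targets JSON on stdin.
# Prints "<count> <health>" lines; Prometheus sets "health" on active targets.
target_health_summary() {
  grep -o '"health":"[a-z]*"' | sed 's/"health":"\([a-z]*\)"/\1/' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   curl -s http://<prometheus>:9090/api/v1/targets | target_health_summary
```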

#### Alerts Not Firing

**Symptoms**: Alerts not triggering

**Diagnosis**:
```bash
# Check Alertmanager status
curl http://<alertmanager>:9093/api/v2/status

# Check alert rules
kubectl get prometheusrules -n besu-network

# Check notification channels (values under data: are base64-encoded)
kubectl get secret alertmanager-config -n besu-network -o yaml
```

**Solutions**:
1. Verify alert rules
2. Check Alertmanager configuration
3. Verify notification channels
4. Check alert thresholds
5. Test alert rules
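To find out whether the fault lies in the Prometheus rules or in Alertmanager routing, post a synthetic alert directly to the Alertmanager v2 API (`POST /api/v2/alerts`). If the alert reaches the notification channel, routing works and the rules or thresholds are the likely cause. `promtool check rules` and `amtool check-config` can validate the files themselves. The alert name and the `severity` label are assumptions; use labels that your routes actually match.

```bash
# Build a minimal Alertmanager v2 alert payload (a JSON array of alerts).
# Label values must not contain double quotes; this sketch does no JSON escaping.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"}}]' "$1" "$2"
}

# Usage: post a synthetic alert to exercise routing and receivers.
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data "$(test_alert_payload TroubleshootingTest warning)" \
#     http://<alertmanager>:9093/api/v2/alerts
```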

## Debugging Commands

### Network Debugging

```bash
# Check pod networking
kubectl exec -n besu-network <pod> -- ip addr

# Check DNS
kubectl exec -n besu-network <pod> -- nslookup <service>

# Check connectivity (4 packets; ICMP may be blocked even when TCP traffic is allowed)
kubectl exec -n besu-network <pod> -- ping -c 4 <target>
```
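`nslookup` is only useful when you query the right name. The sketch below builds the standard Kubernetes DNS names for a Service and for a StatefulSet pod behind a headless Service. The default cluster domain `cluster.local`, the `CLUSTER_DOMAIN` override variable, and the headless Service name are assumptions to adjust.

```bash
# In-cluster DNS name of a Service: <service>.<namespace>.svc.<cluster-domain>.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-besu-network}" "${CLUSTER_DOMAIN:-cluster.local}"
}

# DNS name of a StatefulSet pod behind a headless Service:
# <pod>.<headless-service>.<namespace>.svc.<cluster-domain>.
pod_fqdn() {
  printf '%s.%s.%s.svc.%s\n' "$1" "$2" "${3:-besu-network}" "${CLUSTER_DOMAIN:-cluster.local}"
}

# Usage:
#   kubectl exec -n besu-network <pod> -- nslookup "$(svc_fqdn <service>)"
```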

### Besu Debugging

```bash
# Check Besu version
kubectl exec -n besu-network <pod> -- /opt/besu/bin/besu --version

# Check configuration
kubectl exec -n besu-network <pod> -- cat /config/besu-config.toml

# Check logs
kubectl logs -n besu-network <pod> --tail=100 -f
```
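Rule out mixed Besu versions across validators early. The sketch below collects `besu --version` from each validator and reports any differences. The loop assumes the `component=validator` label used in the validator diagnosis and that the command prints one line per pod.

```bash
# Succeed if all lines on stdin are identical; otherwise print the distinct
# values and fail. Intended for `besu --version` output gathered from each node.
versions_consistent() {
  local distinct
  distinct=$(sort -u)
  if [ "$(printf '%s\n' "$distinct" | wc -l | tr -d ' ')" -le 1 ]; then
    return 0
  fi
  printf '%s\n' "$distinct"
  return 1
}

# Usage (assumes one line of version output per pod):
#   for p in $(kubectl get pods -n besu-network -l component=validator -o name); do
#     kubectl exec -n besu-network "$p" -- /opt/besu/bin/besu --version
#   done | versions_consistent
```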

### Kubernetes Debugging

```bash
# Check pod status
kubectl describe pod <pod> -n besu-network

# Check events
kubectl get events -n besu-network --sort-by='.lastTimestamp'

# Check resources
kubectl top nodes
kubectl top pods -n besu-network
```
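You can filter `kubectl get pods` output down to the pods that need attention. The sketch below flags any pod that is not `Running` or whose READY count is incomplete. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE). Completed Job pods are also listed.

```bash
# Print pods that are not Running or not fully ready, from
# `kubectl get pods --no-headers` output on stdin (NAME READY STATUS ...).
unhealthy_pods() {
  awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

# Usage:
#   kubectl get pods -n besu-network --no-headers | unhealthy_pods
```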

## Useful Resources

- [Besu Documentation](https://besu.hyperledger.org/)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)

## Getting Help

- Check logs first
- Review monitoring dashboards
- Consult runbooks
- Contact on-call engineer
- Escalate if needed