Troubleshooting FAQ
Common issues and solutions for Besu validated set deployment.
Table of Contents
Estimated Reading Time: 30 minutes
Progress: Check off sections as you read
- ✅ Container Issues - Container troubleshooting
- ✅ Service Issues - Service troubleshooting
- ✅ Network Issues - Network troubleshooting
- ✅ Consensus Issues - Consensus troubleshooting
- ✅ Configuration Issues - Configuration troubleshooting
- ✅ Performance Issues - Performance troubleshooting
- ✅ Additional Common Questions - More FAQs
Container Issues
Q: Container won't start
Symptoms: pct status <vmid> shows "stopped" or errors during startup
Solutions:
# Check container status
pct status <vmid>
# View container console
pct console <vmid>
# Check logs
journalctl -u pve-container@<vmid>
# Check container configuration
pct config <vmid>
# Try starting manually
pct start <vmid>
Common Causes:
- Insufficient resources (RAM, disk)
- Network configuration errors
- Invalid container configuration
- OS template issues
Advanced Diagnostics:
# Check container resource allocation
pct config <vmid> | grep -E "^(cores|memory|swap|rootfs)"
# Check Proxmox host resources
free -h
df -h
# Check container logs in detail
journalctl -u pve-container@<vmid> -n 100 --no-pager
# Verify container template (<storage> is the storage that holds templates, e.g. local)
pveam list <storage> | grep <template-name>
Q: Container runs out of disk space
Symptoms: Services fail, "No space left on device" errors
Solutions:
# Check disk usage
pct exec <vmid> -- df -h
# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/
# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d
# Increase disk size (if using LVM)
pct resize <vmid> rootfs +10G
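To see which filesystem is filling up before resizing, the df output can be filtered by a usage threshold. The following is a minimal sketch; the helper name disk_over_threshold and the 80% default are illustrative assumptions, not part of the deployment scripts.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# Reads `df -P` style output on stdin. Hypothetical helper; the 80%
# default is an arbitrary starting point, not a project standard.
disk_over_threshold() {
  local threshold="${1:-80}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)            # strip the trailing percent sign
    if (use + 0 >= t) print $6, use "%"
  }'
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- df -P | disk_over_threshold 85
```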
Q: Container network issues
Symptoms: Cannot ping, cannot connect to services
Solutions:
# Check network configuration
pct config <vmid> | grep net0
# Check if container has IP
pct exec <vmid> -- ip addr show
# Check routing
pct exec <vmid> -- ip route
# Restart container networking
pct stop <vmid>
pct start <vmid>
Service Issues
Q: Besu service won't start
Symptoms: systemctl status besu-validator shows failed
Solutions:
# Check service status
pct exec <vmid> -- systemctl status besu-validator
# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100
# Check for configuration errors
pct exec <vmid> -- besu --config-file=/etc/besu/config-validator.toml --help
# Verify configuration file syntax
pct exec <vmid> -- cat /etc/besu/config-validator.toml
Common Causes:
- Missing configuration files
- Invalid configuration syntax
- Missing validator keys
- Port conflicts
- Insufficient resources
Q: Service starts but crashes
Symptoms: Service starts then stops, high restart count
Solutions:
# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"
# Check for out of memory
pct exec <vmid> -- dmesg | grep -i "out of memory"
# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h
# Check JVM heap settings
pct exec <vmid> -- cat /etc/systemd/system/besu-validator.service | grep BESU_OPTS
Q: Service shows as active but not responding
Symptoms: Service status shows "active" but RPC/P2P not responding
Solutions:
# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu
# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"
# Check firewall rules
pct exec <vmid> -- iptables -L -n
# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545
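The port check above shows what is listening, but not which expected port is absent. A small sketch that reports the missing ones from a netstat or ss listing; missing_ports is a hypothetical helper, and the ports are the ones already used in this FAQ (validators may legitimately have no RPC port open).

```shell
# Print each expected port that does not appear as a local listening
# address in the listing read on stdin. Hypothetical helper.
missing_ports() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    # Match ":<port>" or ".<port>" followed by whitespace (local address column)
    printf '%s\n' "$listing" | grep -Eq "[:.]${port}[[:space:]]" || echo "$port"
  done
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- netstat -tuln | missing_ports 30303 8545 9545
```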
Network Issues
Q: Nodes cannot connect to peers
Symptoms: Low or zero peer count, "No peers" in logs
Solutions:
# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json
# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml
# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode
# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303
# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>
Common Causes:
- Incorrect enode URLs in static-nodes.json
- Firewall blocking P2P port (30303)
- Nodes not in permissions-nodes.toml
- Network connectivity issues
Q: Invalid enode URL errors
Symptoms: "Invalid enode URL syntax" or "Invalid node ID" in logs
Solutions:
# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
sed 's|^enode://||' | cut -d'@' -f1 | wc -c
# Should output 129 (128 chars + newline)
# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
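Before regenerating the allowlist, a single enode URL can be checked for shape locally. This is a sketch only: is_valid_enode is a hypothetical helper that tests syntax (128 hex characters, host, port) and assumes an IPv4 address or hostname; it does not cover bracketed IPv6 hosts and does not confirm that the key belongs to a running node.

```shell
# Succeed when the argument looks like enode://<128 hex chars>@<host>:<port>
# with an optional ?discport=<port> suffix. Syntax check only.
is_valid_enode() {
  printf '%s\n' "$1" |
    grep -Eq '^enode://[0-9a-fA-F]{128}@[^:@/]+:[0-9]{1,5}(\?discport=[0-9]{1,5})?$'
}

# Usage:
#   is_valid_enode "$enode" || echo "malformed enode URL: $enode"
```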
Q: RPC endpoint not accessible
Symptoms: Cannot connect to RPC on port 8545
Solutions:
# Check if RPC is enabled (validators typically don't have RPC)
pct exec <vmid> -- grep -i "rpc-http-enabled" /etc/besu/config-*.toml
# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545
# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545
# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545
# Check host allowlist in config
pct exec <vmid> -- grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml
Consensus Issues
Q: No blocks being produced
Symptoms: Block height not increasing, "No blocks" in logs
Solutions:
# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator
# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/
# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"
# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"
# Check peer connectivity
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545
Common Causes:
- Validator keys missing or incorrect
- Not enough validators online
- Network connectivity issues
- Consensus configuration errors
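To confirm that block height is really stalled, two eth_blockNumber samples can be compared. The helpers below are a sketch under stated assumptions: the function names are hypothetical, the response is assumed to be a single-line JSON object with a string result, and the usage example queries an RPC node because validators typically do not expose RPC.

```shell
# Convert a 0x-prefixed hex quantity (as returned by eth_blockNumber) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Extract the string "result" field from a JSON-RPC response on stdin.
rpc_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed when the second sample is higher than the first.
blocks_advancing() {
  [ "$(hex_to_dec "$2")" -gt "$(hex_to_dec "$1")" ]
}

# Usage (on the Proxmox host, against an RPC node):
#   req='{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
#   first=$(pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" -d "$req" http://localhost:8545 | rpc_result)
#   sleep 10
#   second=$(pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" -d "$req" http://localhost:8545 | rpc_result)
#   blocks_advancing "$first" "$second" || echo "block height is not increasing"
```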
Q: Validator not participating in consensus
Symptoms: Validator running but not producing blocks
Solutions:
# Verify validator address
pct exec <vmid> -- cat /keys/validators/validator-*/address.txt
# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"
# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"
# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"
Configuration Issues
Q: Configuration file not found
Symptoms: "File not found" errors, service won't start
Solutions:
# List all config files
pct exec <vmid> -- ls -la /etc/besu/
# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"
# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138
Q: Invalid configuration syntax
Symptoms: "Invalid option" or syntax errors in logs
Solutions:
# Validate TOML syntax (tomllib requires Python 3.11+; the file must be opened in binary mode)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1
# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null
# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"
# Review Besu documentation for current options
Q: Path errors in configuration
Symptoms: "File not found" errors with paths like "/config/genesis.json"
Solutions:
# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml
# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"
# Fix paths if needed
pct exec <vmid> -- sed -i 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml
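Before or after rewriting paths, it can help to confirm that each configured path exists inside the container. A minimal sketch, assuming the simple key="value" lines shown above; toml_value is a hypothetical helper and is not a full TOML parser.

```shell
# Print the value of a `key="value"` line from a TOML file read on stdin.
# Handles only simple double-quoted string values on one line.
toml_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*$/\1/p"
}

# Usage (on the Proxmox host):
#   for key in genesis-file data-path; do
#     path=$(pct exec <vmid> -- cat /etc/besu/config-validator.toml | toml_value "$key")
#     pct exec <vmid> -- test -e "$path" && echo "$key OK: $path" || echo "$key MISSING: $path"
#   done
```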
Performance Issues
Q: High CPU usage
Symptoms: Container CPU usage > 80% consistently
Solutions:
# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20
# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"
# Adjust JVM settings if needed
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS and JAVA_OPTS
# Consider allocating more CPU cores
pct set <vmid> --cores 4
Q: High memory usage
Symptoms: Container running out of memory, OOM kills
Solutions:
# Check memory usage
pct exec <vmid> -- free -h
# Check JVM heap settings
pct exec <vmid> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gm]'
# Reduce heap size if too large
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS="-Xmx4g" to appropriate size
# Or increase container memory
pct set <vmid> --memory 8192
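When choosing between a smaller heap and a larger container, it helps to compare the two numbers in the same unit. The sketch below converts a JVM heap size to MiB and checks it against the container memory; both helper names are hypothetical, and the 1024 MiB default headroom is an illustrative assumption rather than a Besu recommendation.

```shell
# Convert a JVM heap size such as 4g, 4096m or 524288k to MiB.
heap_to_mib() {
  local value="$1" number unit
  number="${value%[gGmMkK]}"      # numeric part
  unit="${value#"$number"}"       # suffix, if any
  case "$unit" in
    g|G) echo $((number * 1024)) ;;
    m|M) echo "$number" ;;
    k|K) echo $((number / 1024)) ;;
    *)   echo $((number / 1024 / 1024)) ;;   # no suffix: bytes
  esac
}

# Succeed when heap plus headroom (MiB) fits in the container memory (MiB).
heap_fits() {
  local heap_mib container_mib="$2" headroom="${3:-1024}"
  heap_mib="$(heap_to_mib "$1")"
  [ $((heap_mib + headroom)) -le "$container_mib" ]
}

# Usage:
#   heap_fits 4g 8192 || echo "heap too large for an 8192 MiB container"
```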
Q: Slow sync or block processing
Symptoms: Blocks processing slowly, falling behind
Solutions:
# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/
# Check disk I/O
pct exec <vmid> -- iostat -x 1 5
# Consider using SSD storage
# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>
# Verify sufficient peers
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"
General Troubleshooting Commands
# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
echo "=== Container $vmid ==="
pct status $vmid
done
# Check all service statuses
for vmid in 1000 1001 1002 1003 1004; do
pct exec $vmid -- systemctl status besu-validator --no-pager -l | head -10
done
# View recent logs from all nodes
for vmid in 1000 1001 1002 1003 1004; do
echo "=== Logs for container $vmid ==="
pct exec $vmid -- journalctl -u besu-validator -n 20 --no-pager
done
# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14 # validator to validator
# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545 | python3 -m json.tool
Getting Help
If issues persist:
- Collect Information:
  - Service logs: journalctl -u besu-validator -n 100
  - Container status: pct status <vmid>
  - Configuration: pct exec <vmid> -- cat /etc/besu/config-validator.toml
  - Network: pct exec <vmid> -- ip addr show
- Check Documentation: see the Related Documentation section below
- Validate Configuration:
  - Run prerequisites check: ./scripts/validation/check-prerequisites.sh
  - Validate validators: ./scripts/validation/validate-validator-set.sh
- Review Logs:
  - Check deployment logs: logs/deploy-validated-set-*.log
  - Check service logs in containers
  - Check Proxmox host logs
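The items under Collect Information can be gathered into one file per container before asking for help. This is a sketch only: collect_diagnostics and the report layout are hypothetical, not an existing project script, and it assumes the besu-validator unit name and config path used throughout this FAQ.

```shell
# Write container status, service logs, configuration and network details
# for one container into a single report file, then print the file name.
collect_diagnostics() {
  local vmid="$1" out="${2:-besu-diagnostics-$1.txt}"
  {
    echo "=== Container status ==="
    pct status "$vmid"
    echo "=== Service logs ==="
    pct exec "$vmid" -- journalctl -u besu-validator -n 100 --no-pager
    echo "=== Configuration ==="
    pct exec "$vmid" -- cat /etc/besu/config-validator.toml
    echo "=== Network ==="
    pct exec "$vmid" -- ip addr show
  } > "$out" 2>&1
  echo "$out"
}

# Usage (on the Proxmox host):
#   collect_diagnostics 1000
```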
Additional Common Questions
Q: How do I add a new VMID?
Answer:
- Check available VMID ranges in VMID_ALLOCATION_FINAL.md
- Select an appropriate VMID from the designated range for your service
- Verify the VMID is not already in use: pct list | grep <vmid> or qm list | grep <vmid>
- Document the assignment in VMID_ALLOCATION_FINAL.md
- Use the VMID when creating containers/VMs
Example:
# Check if VMID 2503 is available
pct list | grep 2503
qm list | grep 2503
# If available, create container with VMID 2503
pct create 2503 ...
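A plain grep for 2503 also matches longer IDs such as 12503 and any other column that happens to contain those digits. A stricter check compares only the first column; vmid_in_use is a hypothetical helper, sketched here for the pct list and qm list table layout with one header line.

```shell
# Succeed when the given VMID appears exactly in the first column of
# `pct list` or `qm list` output read on stdin (header line skipped).
vmid_in_use() {
  awk -v id="$1" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}

# Usage (on the Proxmox host):
#   if pct list | vmid_in_use 2503 || qm list | vmid_in_use 2503; then
#     echo "VMID 2503 is already in use"
#   fi
```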
Q: What's the difference between public and private RPC?
Answer:
| Feature | Public RPC | Private RPC |
|---|---|---|
| Discovery | Enabled | Disabled |
| Permissioning | Disabled | Enabled |
| Access | Public (CORS: *) | Restricted (internal only) |
| APIs | ETH, NET, WEB3 (read-only) | ETH, NET, WEB3, ADMIN, DEBUG (full) |
| Use Case | dApps, external users | Internal services, admin |
| ChainID | 0x8a (138) or 0x1 (wallet compatibility) | 0x8a (138) |
| Domain | rpc-http-pub.d-bis.org | rpc-http-prv.d-bis.org |
Public RPC:
- Accessible from the internet
- Used by dApps and external tools
- Read-only APIs for security
- May report chainID 0x1 for MetaMask compatibility
Private RPC:
- Internal network only
- Used by internal services and administration
- Full API access including ADMIN and DEBUG
- Strict permissioning and access control
Q: How do I troubleshoot Cloudflare tunnel issues?
Answer:
Step 1: Check Tunnel Status
# Check cloudflared container status
pct status 102
# Check tunnel logs (assumes cloudflared runs as a systemd unit named cloudflared)
pct exec 102 -- journalctl -u cloudflared -n 50 --no-pager
# Verify tunnel is running
pct exec 102 -- ps aux | grep cloudflared
Step 2: Verify Configuration
# Check tunnel configuration
pct exec 102 -- cat /etc/cloudflared/config.yaml
# Verify credentials file exists
pct exec 102 -- ls -la /etc/cloudflared/*.json
Step 3: Test Connectivity
# Test from internal network
curl -I http://192.168.11.21:80
# Test from external (through Cloudflare)
curl -I https://explorer.d-bis.org
Step 4: Check Cloudflare Dashboard
- Verify tunnel is healthy in Cloudflare Zero Trust dashboard
- Check ingress rules are configured correctly
- Verify DNS records point to tunnel
Common Issues:
- Tunnel not running → Restart: pct reboot 102
- Configuration error → Check YAML syntax
- Credentials invalid → Regenerate tunnel token
- DNS not resolving → Check Cloudflare DNS settings
Related Documentation:
- Cloudflare Tunnel Routing Architecture ⭐⭐⭐
- Cloudflare Routing Master Reference ⭐⭐⭐
- Troubleshooting Quick Reference ⭐⭐⭐
Q: What's the recommended storage configuration?
Answer:
For R630 Compute Nodes:
- Boot drives (2×600GB): ZFS mirror (recommended) or hardware RAID1
- Data SSDs (6×250GB): ZFS pool with one of:
- Striped mirrors (if pairs available)
- RAIDZ1 (single parity, 5 drives usable)
- RAIDZ2 (double parity, 4 drives usable)
- High-write workloads: Dedicated dataset with quotas
For ML110 Management Node:
- Standard Proxmox storage configuration
- Sufficient space for templates and backups
Storage Best Practices:
- Use ZFS for data integrity and snapshots
- Enable compression for space efficiency
- Set quotas for containers to prevent disk exhaustion
- Regular backups to external storage
Q: How do I migrate from flat LAN to VLANs?
Answer:
Phase 1: Preparation
- Review VLAN plan in NETWORK_ARCHITECTURE.md
- Document current IP assignments
- Plan IP address migration for each service
- Create rollback plan
Phase 2: Network Configuration
- Configure ES216G switches with VLAN trunks
- Enable VLAN-aware bridge on Proxmox hosts
- Create VLAN interfaces on ER605 router
- Test VLAN connectivity
Phase 3: Service Migration
- Migrate services one VLAN at a time
- Start with non-critical services
- Update container/VM network configuration
- Verify connectivity after each migration
Phase 4: Validation
- Test all services on new VLANs
- Verify routing between VLANs
- Test egress NAT pools
- Document final configuration
Migration Order (Recommended):
- Management services (VLAN 11) - Already active
- Monitoring/observability (VLAN 120, 121)
- Besu network (VLANs 110, 111, 112)
- CCIP network (VLANs 130, 132, 133, 134)
- Service layer (VLAN 160)
- Sovereign tenants (VLANs 200-203)
Related Documentation
Operational Procedures
- OPERATIONAL_RUNBOOKS.md - Complete operational runbooks
- QBFT_TROUBLESHOOTING.md - QBFT consensus troubleshooting
- BESU_ALLOWLIST_QUICK_START.md - Allowlist troubleshooting
Deployment & Configuration
- DEPLOYMENT_STATUS_CONSOLIDATED.md - Current deployment status
- NETWORK_ARCHITECTURE.md - Network architecture reference
- VALIDATED_SET_DEPLOYMENT_GUIDE.md - Deployment guide
Monitoring
- MONITORING_SUMMARY.md - Monitoring setup
- BLOCK_PRODUCTION_MONITORING.md - Block production monitoring
Reference
- MASTER_INDEX.md - Complete documentation index
Last Updated: 2025-01-20
Version: 1.0