# Troubleshooting FAQ

**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation

---

Common issues and solutions for Besu validated set deployment.

## Table of Contents

**Estimated Reading Time:** 30 minutes
**Progress:** Check off sections as you read

1. ✅ [Container Issues](#container-issues) - *Container troubleshooting*
2. ✅ [Service Issues](#service-issues) - *Service troubleshooting*
3. ✅ [Network Issues](#network-issues) - *Network troubleshooting*
4. ✅ [Consensus Issues](#consensus-issues) - *Consensus troubleshooting*
5. ✅ [Configuration Issues](#configuration-issues) - *Configuration troubleshooting*
6. ✅ [Performance Issues](#performance-issues) - *Performance troubleshooting*
7. ✅ [Additional Common Questions](#additional-common-questions) - *More FAQs*
8. [RPC errors -32001 / -32602 / gas 32xxx](RPC_ERRORS_32001_32602.md) - *Nonce too low, invalid params, gas errors when deploying*

---

## Troubleshooting Flow (Decision Tree)

1. **Is the service/container down?** → Check logs (`journalctl -u pve-container@<VMID>`, `systemctl status`), then see [Container Issues](#container-issues) or [Service Issues](#service-issues).
2. **Network/connectivity issue?** → Check ping, curl, DNS and firewall; see [Network Issues](#network-issues).
3. **Consensus / QBFT?** → See [QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md) and [Consensus Issues](#consensus-issues).
4. **Configuration or performance?** → See [Configuration Issues](#configuration-issues), [Performance Issues](#performance-issues), or [Additional Common Questions](#additional-common-questions).
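For step 2 of the flow, a raw TCP check is often quicker than ping (which firewalls may block) for telling whether a Besu port is reachable at all. A minimal sketch, assuming bash was built with `/dev/tcp` support (the Debian/Ubuntu default) and coreutils `timeout` is available; `tcp_port_open` and `check_besu_ports` are hypothetical helper names, and the port list assumes Besu defaults (P2P 30303, RPC 8545, metrics 9545):

```bash
#!/usr/bin/env bash
# Hypothetical reachability helpers; not part of the repository's scripts.

# Succeeds (exit 0) if a TCP connection to host:port opens within the timeout (seconds).
tcp_port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
  # connection errors from the inner shell are discarded.
  timeout "$secs" bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null
}

# Reports each assumed default Besu port (P2P 30303, RPC 8545, metrics 9545) as open or closed.
check_besu_ports() {
  local host="$1" port
  for port in 30303 8545 9545; do
    if tcp_port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed"
    fi
  done
}
```

Example from the Proxmox host: `check_besu_ports 192.168.11.14`. A closed 8545 on a validator is not necessarily a fault, since validators typically do not enable RPC.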
---

## Container Issues

### Q: Container won't start

**Symptoms**: `pct status <VMID>` shows "stopped", or errors appear during startup

**Solutions**:

```bash
# Check container status
pct status <VMID>

# View container console
pct console <VMID>

# Check logs
journalctl -u pve-container@<VMID>

# Check container configuration
pct config <VMID>

# Try starting manually
pct start <VMID>
```

**Common Causes**:
- Insufficient resources (RAM, disk)
- Network configuration errors
- Invalid container configuration
- OS template issues
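After `pct start`, a container can take a few seconds to report `running`, and a scripted check should not give up on the first poll. A minimal polling sketch, assuming `pct status` prints a single `status: <state>` line as on current Proxmox VE releases; `wait_for_ct_status` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll `pct status <vmid>` until it reports the wanted state
# or the time limit (seconds) is reached.
wait_for_ct_status() {
  local vmid="$1" want="${2:-running}" limit="${3:-60}" interval="${4:-2}"
  local elapsed=0 state=""
  while (( elapsed < limit )); do
    # `pct status` output looks like "status: running"; keep only the state word.
    state="$(pct status "$vmid" 2>/dev/null | awk '{print $2}')"
    if [[ "$state" == "$want" ]]; then
      echo "CT $vmid is $state"
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "CT $vmid is still '${state:-unknown}' after ${limit}s" >&2
  return 1
}
```

Typical use: `pct start <VMID> && wait_for_ct_status <VMID> running 60 || journalctl -u pve-container@<VMID> -n 50 --no-pager`, so the logs are shown only when the container never reaches `running`.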
**Advanced Diagnostics:**

```bash
# Check the container's allocated resources (cores, memory, swap, rootfs size)
pct config <VMID> | grep -E '^(cores|memory|swap|rootfs):'

# Check Proxmox host resources
free -h
df -h

# Check container logs in detail
journalctl -u pve-container@<VMID> -n 100 --no-pager

# Verify the container template is present (replace <storage> with your template storage, e.g. local)
pveam list <storage> | grep <template>
```
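To compare what a container has been allocated with the host figures from `free -h` and `df -h`, it can help to pull single values out of `pct config`. A small sketch, assuming the usual `key: value` output of `pct config` (for example `memory: 8192` in MiB and `rootfs: <storage>:<volume>,size=32G`); `pct_config_get` and `rootfs_size` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers that read `pct config <VMID>` output on stdin.

# Print the value for one key, e.g. "memory" -> "8192". Prints nothing if the key is absent.
pct_config_get() {
  local key="$1"
  awk -v k="$key" '
    index($0, k ": ") == 1 {          # line starts with "<key>: "
      print substr($0, length(k) + 3) # everything after "<key>: "
      exit
    }'
}

# Print the rootfs size, e.g. "local-lvm:vm-1000-disk-0,size=100G" -> "100G".
rootfs_size() {
  pct_config_get rootfs | tr ',' '\n' | sed -n 's/^size=//p'
}
```

Usage on the host: `pct config <VMID> | pct_config_get memory` and `pct config <VMID> | rootfs_size`. Matching on `"<key>: "` rather than a bare prefix keeps `net` from matching `net0`.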
---

### Q: Container runs out of disk space

**Symptoms**: Services fail with "No space left on device" errors

**Solutions**:

```bash
# Check disk usage
pct exec <VMID> -- df -h

# Check Besu database size
pct exec <VMID> -- du -sh /data/besu/database/

# Clean up old journal logs (keep the last 7 days)
pct exec <VMID> -- journalctl --vacuum-time=7d

# Increase disk size (if using LVM)
pct resize <VMID> rootfs +10G
```

---

### Q: Container network issues

**Symptoms**: Cannot ping the container or connect to its services

**Solutions**:

```bash
# Check network configuration
pct config <VMID> | grep net0

# Check whether the container has an IP address
pct exec <VMID> -- ip addr show

# Check routing
pct exec <VMID> -- ip route

# Restart container networking
pct stop <VMID>
pct start <VMID>
```

---

## Service Issues

### Q: Besu service won't start

**Symptoms**: `systemctl status besu-validator` shows `failed`

**Solutions**:

```bash
# Check service status
pct exec <VMID> -- systemctl status besu-validator

# View service logs
pct exec <VMID> -- journalctl -u besu-validator -n 100

# Check for configuration errors
pct exec <VMID> -- besu --config-file=/etc/besu/config-validator.toml --help

# Review the configuration file (see "Invalid configuration syntax" below for a parser check)
pct exec <VMID> -- cat /etc/besu/config-validator.toml
```

**Common Causes**:
- Missing configuration files
- Invalid configuration syntax
- Missing validator keys
- Port conflicts
- Insufficient resources

---

### Q: Service starts but crashes

**Symptoms**: Service starts and then stops; high restart count

**Solutions**:

```bash
# Check crash logs
pct exec <VMID> -- journalctl -u besu-validator --since "10 minutes ago"

# Check for out-of-memory kills
# (if dmesg is not permitted inside an unprivileged container, run dmesg on the Proxmox host)
pct exec <VMID> -- dmesg | grep -i "out of memory"

# Check system resources
pct exec <VMID> -- free -h
pct exec <VMID> -- df -h

# Check JVM heap settings
pct exec <VMID> -- grep BESU_OPTS /etc/systemd/system/besu-validator.service
```

---

### Q: Service shows as active but not responding

**Symptoms**: Service status shows "active", but RPC/P2P ports do not respond

**Solutions**:

```bash
# Check whether the Besu process is actually running
pct exec <VMID> -- ps aux | grep besu

# Check whether the P2P (30303), RPC (8545) and metrics (9545) ports are listening
# (use `ss -tuln` instead if netstat is not installed)
pct exec <VMID> -- netstat -tuln | grep -E "30303|8545|9545"

# Check firewall rules
pct exec <VMID> -- iptables -L -n

# Test RPC connectivity (nodes with RPC enabled only)
pct exec <VMID> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545
```

---

## Network Issues

### Q: Nodes cannot connect to peers

**Symptoms**: Low or zero peer count; "No peers" in logs

**Solutions**:

```bash
# Check static-nodes.json
pct exec <VMID> -- cat /etc/besu/static-nodes.json

# Check permissions-nodes.toml
pct exec <VMID> -- cat /etc/besu/permissions-nodes.toml

# Verify enode URLs are correct
pct exec <VMID> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode

# Check that the P2P port is listening
pct exec <VMID> -- netstat -tuln | grep 30303

# Test connectivity to a peer
pct exec <VMID> -- ping -c 3 <peer-ip>
```

**Common Causes**:
- Incorrect enode URLs in static-nodes.json
- Firewall blocking the P2P port (30303)
- Nodes missing from permissions-nodes.toml
- Network connectivity issues

---

### Q: Invalid enode URL errors

**Symptoms**: "Invalid enode URL syntax" or "Invalid node ID" in logs

**Solutions**:

```bash
# Check node ID length (must be 128 hex characters)
pct exec <VMID> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
  sed 's|^enode://||' | cut -d'@' -f1 | wc -c
# Should output 129 (128 characters + newline)

# Fix node IDs using the allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
```

---

### Q: RPC endpoint not accessible

**Symptoms**: Cannot connect to RPC on port 8545

**Solutions**:

```bash
# Check whether RPC is enabled (validators typically don't have RPC enabled).
# `pct exec` does not start a shell, so the glob is expanded by `sh -c` inside the container.
pct exec <VMID> -- sh -c 'grep -i "rpc-http-enabled" /etc/besu/config-*.toml'

# Check whether the RPC port is listening
pct exec <VMID> -- netstat -tuln | grep 8545

# Check firewall
pct exec <VMID> -- iptables -L -n | grep 8545

# Test from inside the container
pct exec <VMID> -- curl -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545

# Check the host allowlist and bind address in the config
pct exec <VMID> -- sh -c 'grep -iE "host-allowlist|rpc-http-host" /etc/besu/config-*.toml'
```

---

## Consensus Issues

### Q: No blocks being produced

**Symptoms**: Block height not increasing; "No blocks" in logs

**Solutions**:

```bash
# Check that the validator service is running
pct exec <VMID> -- systemctl status besu-validator

# Check validator keys
pct exec <VMID> -- ls -la /keys/validators/

# Check consensus logs
pct exec <VMID> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"

# Verify validators are in genesis (if using static validators)
pct exec <VMID> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"

# Check peer connectivity (requires RPC with the ADMIN API enabled on this node)
pct exec <VMID> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545
```

**Common Causes**:
- Validator keys missing or incorrect
- Not enough validators online
- Network connectivity issues
- Consensus configuration errors

---

### Q: Validator not participating in consensus

**Symptoms**: Validator is running but not producing blocks

**Solutions**:

```bash
# Verify validator address (glob expanded inside the container)
pct exec <VMID> -- sh -c 'cat /keys/validators/validator-*/address.txt'

# Check whether the address is in the validator contract (dynamic validators)
# or in genesis.json (static validators)
pct exec <VMID> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"

# Verify validator keys are loaded
pct exec <VMID> -- journalctl -u besu-validator | grep -i "validator.*key"

# Check for permission errors
pct exec <VMID> -- journalctl -u besu-validator | grep -i "permission\|denied"
```

---

## Configuration Issues

### Q: Configuration file not found

**Symptoms**: "File not found" errors; service won't start

**Solutions**:

```bash
# List all config files
pct exec <VMID> -- ls -la /etc/besu/

# Verify required files exist
pct exec <VMID> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <VMID> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"

# Copy missing files using the copy-besu-config.sh script
./scripts/copy-besu-config.sh /path/to/smom-dbis-138
```

---

### Q: Invalid configuration syntax

**Symptoms**: "Invalid option" or syntax errors in logs

**Solutions**:

```bash
# Validate TOML syntax (tomllib requires Python 3.11+; the file must be opened in binary mode)
pct exec <VMID> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))"

# Validate JSON syntax
pct exec <VMID> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null

# Check for deprecated or unknown options
pct exec <VMID> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"

# Review the Besu documentation for currently supported options
```

---

### Q: Path errors in configuration

**Symptoms**: "File not found" errors with paths like "/config/genesis.json"

**Solutions**:

```bash
# Check configuration file paths
pct exec <VMID> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml

# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"

# Fix paths if needed (-i.bak keeps a backup of the original file)
pct exec <VMID> -- sed -i.bak 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml
```

---

## Performance Issues

### Q: High CPU usage

**Symptoms**: Container CPU usage consistently above 80%

**Solutions**:

```bash
# Check CPU usage
pct exec <VMID> -- top -bn1 | head -20

# Check JVM GC activity
pct exec <VMID> -- journalctl -u besu-validator | grep -i "gc\|pause"

# Adjust JVM settings if needed:
# edit /etc/systemd/system/besu-validator.service, adjust BESU_OPTS and JAVA_OPTS,
# then run `systemctl daemon-reload` and restart the service

# Consider allocating more CPU cores
pct set <VMID> --cores 4
```

---

### Q: High memory usage

**Symptoms**: Container running out of memory; OOM kills

**Solutions**:

```bash
# Check memory usage
pct exec <VMID> -- free -h

# Check JVM heap settings (-Xmx / -Xms values)
pct exec <VMID> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gm]'

# Reduce heap size if it is too large for the container:
# edit /etc/systemd/system/besu-validator.service and set BESU_OPTS="-Xmx4g" to an appropriate size

# Or increase container memory (value in MiB)
pct set <VMID> --memory 8192
```

---

### Q: Slow sync or block processing

**Symptoms**: Blocks process slowly and the node falls behind

**Solutions**:

```bash
# Check database size
pct exec <VMID> -- du -sh /data/besu/database/

# Check disk I/O (iostat is provided by the sysstat package)
pct exec <VMID> -- iostat -x 1 5

# If I/O wait is high, consider SSD storage

# Check network latency to a peer
pct exec <VMID> -- ping -c 10 <peer-ip>

# Verify sufficient peers (prints the peer count; requires the ADMIN API)
pct exec <VMID> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"
```

---

## General Troubleshooting Commands

```bash
# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
  echo "=== Container $vmid ==="
  pct status "$vmid"
done

# Check all validator service statuses
for vmid in 1000 1001 1002 1003 1004; do
  pct exec "$vmid" -- systemctl status besu-validator --no-pager -l | head -10
done

# View recent logs from all validators
for vmid in 1000 1001 1002 1003 1004; do
  echo "=== Logs for container $vmid ==="
  pct exec "$vmid" -- journalctl -u besu-validator -n 20 --no-pager
done

# Check network connectivity between nodes (validator to validator)
pct exec 1000 -- ping -c 3 192.168.11.14

# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545 | python3 -m json.tool
```

---

## Getting Help

If issues persist:

1. **Collect Information**:
   - Service logs: `journalctl -u besu-validator -n 100`
   - Container status: `pct status <VMID>`
   - Configuration: `pct exec <VMID> -- cat /etc/besu/config-validator.toml`
   - Network: `pct exec <VMID> -- ip addr show`

2. **Check Documentation**:
   - [Besu Nodes File Reference](../06-besu/BESU_NODES_FILE_REFERENCE.md)
   - [Deployment Guide](../03-deployment/VALIDATED_SET_DEPLOYMENT_GUIDE.md)
   - [Besu Documentation](https://besu.hyperledger.org/)

3. **Validate Configuration**:
   - Run the prerequisites check: `./scripts/validation/check-prerequisites.sh`
   - Validate validators: `./scripts/validation/validate-validator-set.sh`

4. **Review Logs**:
   - Check deployment logs: `logs/deploy-validated-set-*.log`
   - Check service logs in containers
   - Check Proxmox host logs

---

## Additional Common Questions

### Q: How do I add a new VMID?

**Answer:**

1. Check available VMID ranges in [VMID_ALLOCATION_FINAL.md](../02-architecture/VMID_ALLOCATION_FINAL.md)
2. Select an appropriate VMID from the range designated for your service
3. Verify the VMID is not already in use: `pct list | grep <VMID>` or `qm list | grep <VMID>`
4. Document the assignment in VMID_ALLOCATION_FINAL.md
5. Use the VMID when creating containers/VMs

**Example:**

```bash
# Check if VMID 2503 is available
pct list | grep 2503
qm list | grep 2503

# If available, create a container with VMID 2503
pct create 2503 ...
```

**Related Documentation:**
- [VMID Allocation Registry](../02-architecture/VMID_ALLOCATION_FINAL.md) ⭐⭐⭐
- [Quick Reference Cards](../12-quick-reference/QUICK_REFERENCE_CARDS.md) (VMID and network) ⭐⭐⭐

---

### Q: What's the difference between public and private RPC?
**Answer:**

| Feature | Public RPC | Private RPC |
|---------|-----------|-------------|
| **Discovery** | Enabled | Disabled |
| **Permissioning** | Disabled | Enabled |
| **Access** | Public (CORS: *) | Restricted (internal only) |
| **APIs** | ETH, NET, WEB3 (read-only) | ETH, NET, WEB3, ADMIN, DEBUG (full) |
| **Use Case** | dApps, external users | Internal services, admin |
| **ChainID** | 0x8a (138) or 0x1 (wallet compatibility) | 0x8a (138) |
| **Domain** | rpc-http-pub.d-bis.org | rpc-http-prv.d-bis.org |

**Public RPC:**
- Accessible from the internet
- Used by dApps and external tools
- Read-only APIs for security
- May report chainID 0x1 for MetaMask compatibility

**Private RPC:**
- Internal network only
- Used by internal services and administration
- Full API access, including ADMIN and DEBUG
- Strict permissioning and access control

**Related Documentation:**
- [RPC Node Types Architecture](../05-network/RPC_NODE_TYPES_ARCHITECTURE.md) ⭐⭐
- [RPC Template Types](../05-network/RPC_TEMPLATE_TYPES.md) ⭐

---

### Q: How do I troubleshoot Cloudflare tunnel issues?
**Answer:**

**Step 1: Check Tunnel Status**

```bash
# Check cloudflared container status
pct status 102

# Check tunnel logs (pct has no logs subcommand; this assumes cloudflared runs
# as a systemd service named "cloudflared" inside the container)
pct exec 102 -- journalctl -u cloudflared -n 50 --no-pager

# Verify the tunnel process is running
pct exec 102 -- ps aux | grep cloudflared
```

**Step 2: Verify Configuration**

```bash
# Check tunnel configuration
pct exec 102 -- cat /etc/cloudflared/config.yaml

# Verify the credentials file exists (glob expanded inside the container)
pct exec 102 -- sh -c 'ls -la /etc/cloudflared/*.json'
```

**Step 3: Test Connectivity**

```bash
# Test from the internal network
curl -I http://192.168.11.21:80

# Test from outside (through Cloudflare)
curl -I https://explorer.d-bis.org
```

**Step 4: Check Cloudflare Dashboard**
- Verify the tunnel is healthy in the Cloudflare Zero Trust dashboard
- Check that ingress rules are configured correctly
- Verify DNS records point to the tunnel

**Common Issues:**
- Tunnel not running → Restart the service (`pct exec 102 -- systemctl restart cloudflared`) or the whole container (`pct reboot 102`)
- Configuration error → Check YAML syntax
- Credentials invalid → Regenerate the tunnel token
- DNS not resolving → Check Cloudflare DNS settings

**Related Documentation:**
- [Cloudflare Tunnel Routing Architecture](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md) ⭐⭐⭐
- [Cloudflare Routing Master Reference](../05-network/CLOUDFLARE_ROUTING_MASTER.md) ⭐⭐⭐
- [Troubleshooting Quick Reference](../12-quick-reference/TROUBLESHOOTING_QUICK_REFERENCE.md) ⭐⭐⭐

---

### Q: What's the recommended storage configuration?
**Answer:**

**For R630 Compute Nodes:**
- **Boot drives (2×600GB):** ZFS mirror (recommended) or hardware RAID1
- **Data SSDs (6×250GB):** ZFS pool with one of:
  - Striped mirrors (if pairs are available)
  - RAIDZ1 (single parity, capacity of 5 drives)
  - RAIDZ2 (double parity, capacity of 4 drives)
- **High-write workloads:** Dedicated dataset with quotas

**For ML110 Management Node:**
- Standard Proxmox storage configuration
- Sufficient space for templates and backups

**Storage Best Practices:**
- Use ZFS for data integrity and snapshots
- Enable compression for space efficiency
- Set quotas on containers to prevent disk exhaustion
- Run regular backups to external storage

**Related Documentation:**
- [Network Architecture - Storage Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#53-storage-orchestration-r630) ⭐⭐⭐
- [Backup and Restore](../03-deployment/BACKUP_AND_RESTORE.md) ⭐⭐

---

### Q: How do I migrate from a flat LAN to VLANs?

**Answer:**

**Phase 1: Preparation**
1. Review the VLAN plan in [NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md)
2. Document current IP assignments
3. Plan the IP address migration for each service
4. Create a rollback plan

**Phase 2: Network Configuration**
1. Configure ES216G switches with VLAN trunks
2. Enable the VLAN-aware bridge on Proxmox hosts
3. Create VLAN interfaces on the ER605 router
4. Test VLAN connectivity

**Phase 3: Service Migration**
1. Migrate services one VLAN at a time
2. Start with non-critical services
3. Update container/VM network configuration
4. Verify connectivity after each migration

**Phase 4: Validation**
1. Test all services on the new VLANs
2. Verify routing between VLANs
3. Test egress NAT pools
4. Document the final configuration

**Migration Order (Recommended):**
1. Management services (VLAN 11) - already active
2. Monitoring/observability (VLANs 120, 121)
3. Besu network (VLANs 110, 111, 112)
4. CCIP network (VLANs 130, 132, 133, 134)
5. Service layer (VLAN 160)
6. Sovereign tenants (VLANs 200-203)

**Related Documentation:**
- [Network Architecture - VLAN Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#3-layer-2--vlan-orchestration-plan) ⭐⭐⭐
- [Orchestration Deployment Guide - VLAN Enablement](../02-architecture/ORCHESTRATION_DEPLOYMENT_GUIDE.md#phase-1--vlan-enablement) ⭐⭐⭐

---

## Additional Common Questions (Expanded)

### Q: How do I find which VMID uses a given IP?

**Answer:** See [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md) for IP ranges by service type and VMID. On the Proxmox host, `pct list` and `qm list` list containers and VMs; `pct config <VMID>` (or `qm config <VMID>`) shows each guest's network configuration, including any static IP.

### Q: What's the difference between public and private RPC?

**Answer:** **Public RPC** (e.g. rpc-http-pub.d-bis.org) is exposed to external clients and may be subject to rate limits and JWT authentication. **Private RPC** (e.g. rpc-http-prv.d-bis.org) is for internal or trusted clients. See [05-network/CLOUDFLARE_ROUTING_MASTER.md](../05-network/CLOUDFLARE_ROUTING_MASTER.md) for the domain → backend mapping.

### Q: Cloudflare tunnel not connecting: where do I start?

**Answer:**
1. Check the cloudflared service on the tunnel host (VMID 102 or NPMplus).
2. Verify the credentials and tunnel ID.
3. Check [04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md) and [05-network/CLOUDFLARE_ROUTING_MASTER.md](../05-network/CLOUDFLARE_ROUTING_MASTER.md).
4. Confirm NPMplus (192.168.11.167) is reachable from the UDM Pro port forward.

### Q: Recommended storage configuration for RPC nodes?

**Answer:** Use SSD storage for the Besu data directory, and avoid NFS for Besu unless that setup has been tested. See [02-architecture/NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md) and the deployment guides for node layout. Run `scripts/audit-proxmox-rpc-storage.sh` to check storage restrictions.
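To see whether a node's Besu data directory actually sits on network storage, the filesystem type of the mount that contains it can be checked from inside the container. A sketch, assuming util-linux `findmnt` is installed in the container and the data path is `/data/besu` as elsewhere in this FAQ; `is_network_fs` and `besu_data_fs_check` are hypothetical helper names, and the list of network filesystem types is illustrative, not exhaustive:

```bash
#!/usr/bin/env bash
# Hypothetical helpers; run inside the container (e.g. after `pct enter <VMID>`).

# Succeeds if the given filesystem type is a common network filesystem (illustrative list).
is_network_fs() {
  case "$1" in
    nfs|nfs4|cifs|smb3|fuse.sshfs|glusterfs|ceph) return 0 ;;
    *) return 1 ;;
  esac
}

# Reports the filesystem type backing the Besu data path and warns on network storage.
besu_data_fs_check() {
  local path="${1:-/data/besu}" fstype
  # -T resolves the mount containing $path; -n drops the header; -o FSTYPE prints only the type.
  fstype="$(findmnt -n -o FSTYPE -T "$path")" || {
    echo "ERROR: cannot resolve mount for $path" >&2
    return 2
  }
  if is_network_fs "$fstype"; then
    echo "WARN: $path is on $fstype (network storage)"
    return 1
  fi
  echo "OK: $path is on $fstype"
}
```

Running `besu_data_fs_check` with no argument checks `/data/besu`; the exit status (0 OK, 1 network storage, 2 lookup failure) makes it usable in a loop across nodes. It complements, rather than replaces, `scripts/audit-proxmox-rpc-storage.sh`.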
---

## Related Documentation

### Operational Procedures
- **[OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md)** - Complete operational runbooks
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](../06-besu/BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting

### Deployment & Configuration
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status
- **[NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md)** - Network architecture reference
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](../03-deployment/VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Deployment guide

### Monitoring
- **[MONITORING_SUMMARY.md](../08-monitoring/MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](../08-monitoring/BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring

### Reference
- **[MASTER_INDEX.md](../MASTER_INDEX.md)** - Complete documentation index

---

**Last Updated:** 2025-01-20
**Version:** 1.0