


Troubleshooting FAQ

Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation

Common issues and solutions for Besu validated set deployment.

Estimated Reading Time: 30 minutes
Progress: Check off sections as you read

✅ Container Issues - Container troubleshooting
✅ Service Issues - Service troubleshooting
✅ Network Issues - Network troubleshooting
✅ Consensus Issues - Consensus troubleshooting
✅ Configuration Issues - Configuration troubleshooting
✅ Performance Issues - Performance troubleshooting
✅ Additional Common Questions - More FAQs

Troubleshooting Flow (Decision Tree)

Is the service/container down? → Check logs (journalctl -u pve-container@<vmid>, systemctl status), then Container Issues or Service Issues.
Network/connectivity issue? → Check ping, curl, DNS, firewall; see Network Issues.
Consensus / QBFT? → See QBFT_TROUBLESHOOTING.md and Consensus Issues.
Configuration or performance? → See Configuration Issues, Performance Issues, or Additional Common Questions.

Container Issues

Q: Container won't start

Symptoms: pct status <vmid> shows "stopped" or errors during startup

Solutions:

# Check container status
pct status <vmid>

# View container console
pct console <vmid>

# Check logs
journalctl -u pve-container@<vmid>

# Check container configuration
pct config <vmid>

# Try starting manually
pct start <vmid>

Common Causes:

Insufficient resources (RAM, disk)
Network configuration errors
Invalid container configuration
OS template issues


Advanced Diagnostics:

# Check container resources
pct status <vmid> --verbose
pct config <vmid> | grep -E "^(cores|memory|swap|rootfs)"

# Check Proxmox host resources
free -h
df -h

# Check container logs in detail
journalctl -u pve-container@<vmid> -n 100 --no-pager

# Verify container template
pveam list | grep <template-name>

Q: Container runs out of disk space

Symptoms: Services fail, "No space left on device" errors

Solutions:

# Check disk usage
pct exec <vmid> -- df -h

# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/

# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d

# Grow the root disk by 10 GB (pct resize can grow a volume but not shrink it)
pct resize <vmid> rootfs +10G
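To scan every mount point at once instead of reading the full df table, a small filter can flag filesystems above a threshold. This is a minimal sketch, not one of the repository's scripts: `disk_over_threshold` is a hypothetical helper, the 90% default is an arbitrary assumption, and it expects POSIX `df -P` output (capacity in column 5, mount point in column 6).

```bash
# Hypothetical helper (not part of the repo's scripts).
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above the threshold.
disk_over_threshold() {
    threshold="${1:-90}"   # percent; 90 is an assumed default
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "95%" -> "95"
        if (use + 0 >= t + 0) print $6, use "%"
    }'
}
```

Usage: `pct exec <vmid> -- df -P | disk_over_threshold 85` runs df inside the container and the filter on the host; no output means nothing is above 85%.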

Q: Container network issues

Symptoms: Cannot ping, cannot connect to services

Solutions:

# Check network configuration
pct config <vmid> | grep net0

# Check if container has IP
pct exec <vmid> -- ip addr show

# Check routing
pct exec <vmid> -- ip route

# Restart container networking
pct stop <vmid>
pct start <vmid>

Service Issues

Q: Besu service won't start

Symptoms: systemctl status besu-validator shows failed

Solutions:

# Check service status
pct exec <vmid> -- systemctl status besu-validator

# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100

# Check for configuration errors
pct exec <vmid> -- besu --config-file=/etc/besu/config-validator.toml --help

# Review the configuration file contents
pct exec <vmid> -- cat /etc/besu/config-validator.toml

Common Causes:

Missing configuration files
Invalid configuration syntax
Missing validator keys
Port conflicts
Insufficient resources

Q: Service starts but crashes

Symptoms: Service starts then stops, high restart count

Solutions:

# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"

# Check for out of memory
pct exec <vmid> -- dmesg | grep -i "out of memory"

# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h

# Check JVM heap settings
pct exec <vmid> -- grep -E "BESU_OPTS|JAVA_OPTS" /etc/systemd/system/besu-validator.service

Q: Service shows as active but not responding

Symptoms: Service status shows "active" but RPC/P2P not responding

Solutions:

# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu

# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"

# Check firewall rules
pct exec <vmid> -- iptables -L -n

# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545
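When the socket listing is long, a small filter can report just the expected ports that have no listener. A sketch under assumptions: `missing_ports` is a hypothetical helper, it accepts either `netstat -tuln` or `ss -tuln` output, and the ports to pass depend on the node's role (validators usually have no 8545; 9545 is Besu's default metrics port and is only open if metrics are enabled).

```bash
# Hypothetical helper: reads `netstat -tuln` or `ss -tuln` output on stdin
# and prints each requested port that has no listening socket.
missing_ports() {
    listing=$(cat)
    for port in "$@"; do
        # ":<port>" followed by whitespace matches the local-address column
        printf '%s\n' "$listing" | grep -qE ":${port}[[:space:]]" || printf '%s\n' "$port"
    done
}
```

Usage: `pct exec <vmid> -- netstat -tuln | missing_ports 30303 8545 9545` prints only the ports that are not listening.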

Network Issues

Q: Nodes cannot connect to peers

Symptoms: Low or zero peer count, "No peers" in logs

Solutions:

# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json

# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml

# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode

# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303

# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>

Common Causes:

Incorrect enode URLs in static-nodes.json
Firewall blocking P2P port (30303)
Nodes not in permissions-nodes.toml
Network connectivity issues

Q: Invalid enode URL errors

Symptoms: "Invalid enode URL syntax" or "Invalid node ID" in logs

Solutions:

# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
    sed 's|^enode://||' | cut -d'@' -f1 | wc -c

# Should output 129 (128 chars + newline)

# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
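The length check above covers only the local node. To check every entry in static-nodes.json, a format validator for enode URLs (enode://<128-hex-char node ID>@<host>:<port>) can be looped over the file. `is_valid_enode` is a hypothetical helper; it checks format only, not whether the peer is reachable or present in permissions-nodes.toml.

```bash
# Hypothetical helper: returns 0 if $1 looks like a well-formed enode URL,
# i.e. enode://<128 hex chars>@<host>:<port>[...]; returns 1 otherwise.
is_valid_enode() {
    url=$1
    rest=${url#enode://}
    [ "$rest" != "$url" ] || return 1            # missing enode:// prefix
    node_id=${rest%%@*}                          # text before the first '@'
    [ "$node_id" != "$rest" ] || return 1        # missing '@'
    [ ${#node_id} -eq 128 ] || return 1          # 64-byte public key = 128 hex chars
    case $node_id in *[!0-9a-fA-F]*) return 1 ;; esac
    case ${rest#*@} in *:[0-9]*) return 0 ;; esac  # host:port present
    return 1
}
```

Usage: `pct exec <vmid> -- cat /etc/besu/static-nodes.json | grep -o 'enode://[^"]*' | while read -r e; do is_valid_enode "$e" || echo "INVALID: $e"; done`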

Q: RPC endpoint not accessible

Symptoms: Cannot connect to RPC on port 8545

Solutions:

# Check if RPC is enabled (validators typically don't have RPC)
# sh -c makes the * glob expand inside the container, not on the host
pct exec <vmid> -- sh -c 'grep -i "rpc-http-enabled" /etc/besu/config-*.toml'

# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545

# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545

# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545

# Check host allowlist in config (sh -c so the glob expands in the container)
pct exec <vmid> -- sh -c 'grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml'

Consensus Issues

Q: No blocks being produced

Symptoms: Block height not increasing, "No blocks" in logs

Solutions:

# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator

# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/

# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"

# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"

# Check peer connectivity (needs HTTP RPC with the ADMIN API enabled on this node)
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545

Common Causes:

Validator keys missing or incorrect
Not enough validators online
Network connectivity issues
Consensus configuration errors
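"Not enough validators online" can be made concrete with the usual BFT bounds. This sketch assumes the standard formulas, quorum = ceil(2n/3) and tolerated faults = floor((n-1)/3); confirm them against QBFT_TROUBLESHOOTING.md or the Besu QBFT documentation for your version. `qbft_limits` is a hypothetical helper.

```bash
# Hypothetical helper using the standard BFT bounds (an assumption):
#   quorum     = ceil(2n / 3)      validators that must be online and agreeing
#   max_faulty = floor((n - 1) / 3)
qbft_limits() {
    n=$1
    quorum=$(( (2 * n + 2) / 3 ))   # integer form of ceil(2n/3)
    max_faulty=$(( (n - 1) / 3 ))
    echo "validators=$n quorum=$quorum max_faulty=$max_faulty"
}
```

For the five validator VMIDs used in General Troubleshooting Commands (1000-1004), `qbft_limits 5` prints `validators=5 quorum=4 max_faulty=1`: under these assumptions, block production stalls once two of the five validators are down.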

Q: Validator not participating in consensus

Symptoms: Validator running but not producing blocks

Solutions:

# Verify validator address
pct exec <vmid> -- sh -c 'cat /keys/validators/validator-*/address.txt'

# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"

# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"

# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"
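Rather than reading genesis.json by eye, the node's address can be compared with the validator set the chain actually reports. A sketch under assumptions: it queries Besu's `qbft_getValidatorsByBlockNumber` on an RPC node whose rpc-http-api includes QBFT (not confirmed for this deployment), assumes one validator key per container, and `validator_in_set` is a hypothetical helper.

```bash
# Hypothetical helper: reads a JSON-RPC response on stdin (for example from
# qbft_getValidatorsByBlockNumber) and returns 0 if the address in $1 appears
# in it. Comparison is case-insensitive; a missing 0x prefix is added.
validator_in_set() {
    want=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'A-F' 'a-f')
    case $want in 0x*) ;; *) want="0x$want" ;; esac
    grep -oE '0x[0-9a-fA-F]{40}' | tr 'A-F' 'a-f' | grep -qxF "$want"
}
```

Usage:
addr=$(pct exec <vmid> -- sh -c 'cat /keys/validators/validator-*/address.txt')
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"qbft_getValidatorsByBlockNumber","params":["latest"],"id":1}' http://localhost:8545 | validator_in_set "$addr" && echo "in validator set" || echo "NOT in validator set"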

Configuration Issues

Q: Configuration file not found

Symptoms: "File not found" errors, service won't start

Solutions:

# List all config files
pct exec <vmid> -- ls -la /etc/besu/

# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"

# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138

Q: Invalid configuration syntax

Symptoms: "Invalid option" or syntax errors in logs

Solutions:

# Validate TOML syntax (actually parses the file; tomllib needs Python 3.11+)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1

# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null

# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"

# Review Besu documentation for current options

Q: Path errors in configuration

Symptoms: "File not found" errors with paths like "/config/genesis.json"

Solutions:

# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml

# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"

# Fix paths if needed (-i.bak keeps the original as config-validator.toml.bak)
pct exec <vmid> -- sed -i.bak 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml

Performance Issues

Q: High CPU usage

Symptoms: Container CPU usage > 80% consistently

Solutions:

# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20

# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"

# Adjust JVM settings if needed
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS and JAVA_OPTS

# Consider allocating more CPU cores
pct set <vmid> --cores 4

Q: High memory usage

Symptoms: Container running out of memory, OOM kills

Solutions:

# Check memory usage
pct exec <vmid> -- free -h

# Check JVM heap settings (prints the -Xms/-Xmx flags of running processes)
pct exec <vmid> -- ps -eo args | grep -oE -- '-Xm[xs][0-9]+[kKmMgG]?'

# Reduce heap size if too large
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS="-Xmx4g" to appropriate size

# Or increase container memory
pct set <vmid> --memory 8192
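To decide between shrinking the heap and growing the container, it helps to see the heap as a share of container memory. A minimal sketch: `xmx_to_mib` and `heap_share` are hypothetical helpers, and the container size is assumed to be in MiB as set by `pct set --memory`. The JVM heap is only part of Besu's footprint; off-heap and native memory come on top of it.

```bash
# Hypothetical helpers: convert a JVM -Xmx value to MiB and report what share
# of the container's memory (MiB) it takes.
xmx_to_mib() {
    v=${1#-Xmx}
    num=${v%[kKmMgG]}            # strip an optional unit suffix
    case $v in
        *[gG]) echo $((num * 1024)) ;;
        *[mM]) echo "$num" ;;
        *[kK]) echo $((num / 1024)) ;;
        *)     echo $((num / 1048576)) ;;   # no suffix means bytes
    esac
}

heap_share() {
    heap=$(xmx_to_mib "$1")
    mem=$2
    echo "heap ${heap} MiB of ${mem} MiB container memory ($((heap * 100 / mem))%)"
}
```

Usage: `heap_share "$(pct exec <vmid> -- ps -eo args | grep -oE -- '-Xmx[0-9]+[kKmMgG]?' | head -1)" "$(pct config <vmid> | awk '/^memory:/ {print $2}')"`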

Q: Slow sync or block processing

Symptoms: Blocks processing slowly, falling behind

Solutions:

# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/

# Check disk I/O (iostat comes from the sysstat package)
pct exec <vmid> -- iostat -x 1 5

# Consider using SSD storage

# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>

# Verify sufficient peers
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"

General Troubleshooting Commands

# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
    echo "=== Container $vmid ==="
    pct status $vmid
done

# Check all service statuses
for vmid in 1000 1001 1002 1003 1004; do
    pct exec $vmid -- systemctl status besu-validator --no-pager -l | head -10
done

# View recent logs from all nodes
for vmid in 1000 1001 1002 1003 1004; do
    echo "=== Logs for container $vmid ==="
    pct exec $vmid -- journalctl -u besu-validator -n 20 --no-pager
done

# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14  # validator to validator

# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | python3 -m json.tool
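JSON-RPC returns quantities such as the block number as hex strings, which makes height comparisons awkward. A small converter helps; `rpc_result_dec` is a hypothetical helper that works on both compact and json.tool-formatted responses and does not need jq.

```bash
# Hypothetical helper: extracts the hex "result" field from a JSON-RPC
# response on stdin (e.g. eth_blockNumber, net_peerCount) and prints it in decimal.
rpc_result_dec() {
    hex=$(sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p')
    [ -n "$hex" ] || { echo "no hex result in response" >&2; return 1; }
    echo $(($hex))
}
```

Usage: pipe the raw curl output from the command above into `rpc_result_dec` instead of `python3 -m json.tool`. Running it twice a few seconds apart shows whether the height is increasing.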

Getting Help

If issues persist:

Collect Information:
- Service logs: journalctl -u besu-validator -n 100
- Container status: pct status <vmid>
- Configuration: pct exec <vmid> -- cat /etc/besu/config-validator.toml
- Network: pct exec <vmid> -- ip addr show
Check Documentation:
Validate Configuration:
- Run prerequisites check: ./scripts/validation/check-prerequisites.sh
- Validate validators: ./scripts/validation/validate-validator-set.sh
Review Logs:
- Check deployment logs: logs/deploy-validated-set-*.log
- Check service logs in containers
- Check Proxmox host logs

Additional Common Questions

Q: How do I add a new VMID?

Answer:

Check available VMID ranges in VMID_ALLOCATION_FINAL.md
Select an appropriate VMID from the designated range for your service
Verify the VMID is not already in use: pct list | grep <vmid> or qm list | grep <vmid>
Document the assignment in VMID_ALLOCATION_FINAL.md
Use the VMID when creating containers/VMs

Example:

# Check if VMID 2503 is available
pct list | grep 2503
qm list | grep 2503

# If available, create container with VMID 2503
pct create 2503 ...
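`pct list | grep 2503` also matches 12503, or a name containing 2503. A stricter check compares only the VMID column. `vmid_in_use` is a hypothetical helper; `pct list` and `qm list` cover the local node only, while VMIDs must be unique across the whole cluster, so run it on every node.

```bash
# Hypothetical helper: reads `pct list` and/or `qm list` output on stdin and
# returns 0 if the VMID in $1 appears in the first (VMID) column.
vmid_in_use() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}
```

Usage: `if { pct list; qm list; } | vmid_in_use 2503; then echo "2503 in use"; else echo "2503 free on this node"; fi`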

Related Documentation:

VMID Allocation Registry ⭐⭐⭐
Quick Reference Cards (VMID and network) ⭐⭐⭐

Q: What's the difference between public and private RPC?

Answer:

Feature	Public RPC	Private RPC
Discovery	Enabled	Disabled
Permissioning	Disabled	Enabled
Access	Public (CORS: *)	Restricted (internal only)
APIs	ETH, NET, WEB3 (read-only)	ETH, NET, WEB3, ADMIN, DEBUG (full)
Use Case	dApps, external users	Internal services, admin
ChainID	0x8a (138) or 0x1 (wallet compatibility)	0x8a (138)
Domain	rpc-http-pub.d-bis.org	rpc-http-prv.d-bis.org

Public RPC:

Accessible from the internet
Used by dApps and external tools
Read-only APIs for security
May report chainID 0x1 for MetaMask compatibility

Private RPC:

Internal network only
Used by internal services and administration
Full API access including ADMIN and DEBUG
Strict permissioning and access control

Related Documentation:

RPC Node Types Architecture ⭐⭐
RPC Template Types ⭐

Q: How do I troubleshoot Cloudflare tunnel issues?

Answer:

Step 1: Check Tunnel Status

# Check cloudflared container status
pct status 102

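# Check tunnel logs (pct has no logs subcommand; assumes cloudflared runs as the systemd unit "cloudflared")
pct exec 102 -- journalctl -u cloudflared -n 50 --no-pager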

# Verify tunnel is running
pct exec 102 -- ps aux | grep cloudflared

Step 2: Verify Configuration

# Check tunnel configuration
pct exec 102 -- cat /etc/cloudflared/config.yaml

# Verify credentials file exists
pct exec 102 -- sh -c 'ls -la /etc/cloudflared/*.json'

Step 3: Test Connectivity

# Test from internal network
curl -I http://192.168.11.21:80

# Test from external (through Cloudflare)
curl -I https://explorer.d-bis.org
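The status code from the external test usually points at the failing hop. This sketch encodes common patterns as assumptions, not an exhaustive list: Cloudflare typically answers 530 (error 1033) when no tunnel connection is registered for the hostname, cloudflared answers 502 when it cannot reach the origin service, and curl reports 000 when no HTTP response arrived at all. `classify_edge_status` is a hypothetical helper.

```bash
# Hypothetical helper: maps the HTTP status seen through Cloudflare to a
# likely cause. The mappings are common patterns, not an exhaustive list.
classify_edge_status() {
    case $1 in
        2??|3??) echo "ok: tunnel and origin are answering" ;;
        502)     echo "cloudflared is up but cannot reach the origin service (check config.yaml ingress)" ;;
        530)     echo "tunnel likely not connected (Cloudflare error 1033)" ;;
        000)     echo "no HTTP response: DNS or network failure on the client side" ;;
        *)       echo "unexpected status $1: check the response body and Cloudflare dashboard" ;;
    esac
}
```

Usage: `classify_edge_status "$(curl -s -o /dev/null -w '%{http_code}' https://explorer.d-bis.org)"`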

Step 4: Check Cloudflare Dashboard

Verify tunnel is healthy in Cloudflare Zero Trust dashboard
Check ingress rules are configured correctly
Verify DNS records point to tunnel

Common Issues:

Tunnel not running → Restart the service: pct exec 102 -- systemctl restart cloudflared (or reboot the container: pct reboot 102)
Configuration error → Check YAML syntax
Credentials invalid → Regenerate tunnel token
DNS not resolving → Check Cloudflare DNS settings

Related Documentation:

Cloudflare Tunnel Routing Architecture ⭐⭐⭐
Cloudflare Routing Master Reference ⭐⭐⭐
Troubleshooting Quick Reference ⭐⭐⭐

Q: What's the recommended storage configuration?

Answer:

For R630 Compute Nodes:

Boot drives (2×600GB): ZFS mirror (recommended) or hardware RAID1
Data SSDs (6×250GB): ZFS pool with one of:
- Striped mirrors (if pairs available)
- RAIDZ1 (single parity, 5 drives usable)
- RAIDZ2 (double parity, 4 drives usable)
High-write workloads: Dedicated dataset with quotas
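The "drives usable" figures above translate into raw capacity with simple arithmetic. A sketch: `zfs_usable_gb` is a hypothetical helper, it assumes equal-size drives and 2-way mirrors, and real usable space is lower after ZFS metadata, padding, and slop-space reservation.

```bash
# Hypothetical helper: raw usable capacity (GB) for a ZFS pool of n equal
# drives, before ZFS metadata, padding and slop-space overhead.
zfs_usable_gb() {
    layout=$1 n=$2 size=$3
    case $layout in
        mirror) echo $(( n / 2 * size )) ;;   # striped 2-way mirrors
        raidz1) echo $(( (n - 1) * size )) ;; # one drive of parity
        raidz2) echo $(( (n - 2) * size )) ;; # two drives of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}
```

For the six 250 GB data SSDs this gives 750 GB as striped mirrors, 1250 GB as RAIDZ1, and 1000 GB as RAIDZ2.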

For ML110 Management Node:

Standard Proxmox storage configuration
Sufficient space for templates and backups

Storage Best Practices:

Use ZFS for data integrity and snapshots
Enable compression for space efficiency
Set quotas for containers to prevent disk exhaustion
Regular backups to external storage

Related Documentation:

Network Architecture - Storage Orchestration ⭐⭐⭐
Backup and Restore ⭐⭐

Q: How do I migrate from flat LAN to VLANs?

Answer:

Phase 1: Preparation

Review VLAN plan in NETWORK_ARCHITECTURE.md
Document current IP assignments
Plan IP address migration for each service
Create rollback plan

Phase 2: Network Configuration

Configure ES216G switches with VLAN trunks
Enable VLAN-aware bridge on Proxmox hosts
Create VLAN interfaces on ER605 router
Test VLAN connectivity

Phase 3: Service Migration

Migrate services one VLAN at a time
Start with non-critical services
Update container/VM network configuration
Verify connectivity after each migration
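For an LXC container, updating the network configuration in Phase 3 usually means rewriting net0 with a VLAN tag (the `tag=` key in pct's net0 syntax) and new ip/gw values. This sketch keeps every other key, including `hwaddr`, so the MAC address does not change. Assumptions: the bridge is already VLAN-aware (Phase 2), the new addresses come from the VLAN plan (placeholders below), and `net0_with_vlan` is a hypothetical helper.

```bash
# Hypothetical helper: reads `pct config <vmid>` on stdin and prints a new
# net0 value with tag/ip/gw replaced and all other keys (hwaddr, bridge...) kept.
# $1 = VLAN tag, $2 = new ip in CIDR form, $3 = new gateway
net0_with_vlan() {
    line=$(sed -n 's/^net0: //p')
    [ -n "$line" ] || return 1
    printf '%s\n' "$line" | tr ',' '\n' | awk -F= -v tag="$1" -v ip="$2" -v gw="$3" '
        $1 == "tag" || $1 == "ip" || $1 == "gw" { next }   # drop keys being replaced
        { out = out (out == "" ? "" : ",") $0 }
        END { print out ",tag=" tag ",ip=" ip ",gw=" gw }'
}
```

Usage: `pct set <vmid> --net0 "$(pct config <vmid> | net0_with_vlan 110 <new-ip>/<prefix> <new-gw>)"`, then confirm with `pct config <vmid>` and a ping before moving to the next service.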

Phase 4: Validation

Test all services on new VLANs
Verify routing between VLANs
Test egress NAT pools
Document final configuration

Migration Order (Recommended):

Management services (VLAN 11) - Already active
Monitoring/observability (VLAN 120, 121)
Besu network (VLANs 110, 111, 112)
CCIP network (VLANs 130, 132, 133, 134)
Service layer (VLAN 160)
Sovereign tenants (VLANs 200-203)

Related Documentation:

Network Architecture - VLAN Orchestration ⭐⭐⭐
Orchestration Deployment Guide - VLAN Enablement ⭐⭐⭐

Additional Common Questions (Expanded)

Q: How do I find which VMID uses a given IP?

Answer: See NETWORK_CONFIGURATION_MASTER.md for IP ranges by service type and VMID. On the Proxmox host, pct list and qm list enumerate containers and VMs, but they do not show IPs; for a container, the IP is on the netN line of pct config <vmid>.

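Checking every container by hand is slow, so the lookup can be looped on the host. A sketch: `config_has_ip` is a hypothetical helper that matches the exact address (so 192.168.11.14 does not match 192.168.11.140); it covers containers only, since VM IPs usually live inside the guest rather than in `qm config`.

```bash
# Hypothetical helper: reads `pct config <vmid>` output on stdin and returns 0
# if any netN interface has the given IPv4 address (ip=<addr>/<prefix>).
config_has_ip() {
    addr=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for the regex
    grep -qE "^net[0-9]+: (.*,)?ip=${addr}/"
}
```

Usage: `for id in $(pct list | awk 'NR > 1 {print $1}'); do pct config "$id" | config_has_ip 192.168.11.14 && echo "CT $id"; done`
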
Q: What's the difference between public and private RPC?

Answer: Public RPC (e.g. rpc-http-pub.d-bis.org) is exposed for external clients; may have rate limits and JWT. Private RPC (e.g. rpc-http-prv.d-bis.org) is for internal or trusted clients. See 05-network/CLOUDFLARE_ROUTING_MASTER.md for domain → backend mapping.

Q: Cloudflare tunnel not connecting – where do I start?

Answer: 1) Check cloudflared service on the tunnel host (VMID 102 or NPMplus). 2) Verify credentials and tunnel ID. 3) Check 04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md and 05-network/CLOUDFLARE_ROUTING_MASTER.md. 4) Confirm NPMplus (192.168.11.167) is reachable from UDM Pro port forward.

Q: Recommended storage configuration for RPC nodes?

Answer: Use SSD for Besu data directory; avoid NFS for Besu unless tested. See 02-architecture/NETWORK_ARCHITECTURE.md and deployment guides for node layout. Run scripts/audit-proxmox-rpc-storage.sh to check restrictions.
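Inside a container, an NFS-backed volume can look like ext4, so the storage type is easier to check from the host. A sketch under assumptions: Besu data lives on the container's rootfs (if /data/besu is on an mpN mount point, read that line instead), the type names come from `pvesm status`, and all three helpers are hypothetical.

```bash
# Hypothetical helpers (run on the Proxmox host).
# rootfs_storage: reads `pct config <vmid>` and prints the storage ID of rootfs.
rootfs_storage() {
    sed -n 's/^rootfs: \([^:,]*\):.*/\1/p'
}
# storage_type: reads `pvesm status` and prints the Type column for storage $1.
storage_type() {
    awk -v s="$1" 'NR > 1 && $1 == s { print $2 }'
}
# warn_if_network_storage: flags network-backed storage types for Besu data.
warn_if_network_storage() {
    case $1 in
        nfs|cifs|glusterfs|cephfs) echo "WARNING: $1 is network storage; test Besu on it before relying on it"; return 1 ;;
        "") echo "unknown storage type" >&2; return 1 ;;
        *) echo "OK: $1" ;;
    esac
}
```

Usage: `st=$(pct config 2500 | rootfs_storage); warn_if_network_storage "$(pvesm status | storage_type "$st")"`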

Related Documentation

Operational Procedures

OPERATIONAL_RUNBOOKS.md - Complete operational runbooks
QBFT_TROUBLESHOOTING.md - QBFT consensus troubleshooting
BESU_ALLOWLIST_QUICK_START.md - Allowlist troubleshooting

Deployment & Configuration

DEPLOYMENT_STATUS_CONSOLIDATED.md - Current deployment status
NETWORK_ARCHITECTURE.md - Network architecture reference
VALIDATED_SET_DEPLOYMENT_GUIDE.md - Deployment guide

Monitoring

MONITORING_SUMMARY.md - Monitoring setup
BLOCK_PRODUCTION_MONITORING.md - Block production monitoring

Reference

MASTER_INDEX.md - Complete documentation index

Last Updated: 2025-01-20
Version: 1.0
