Troubleshooting FAQ

Common issues and solutions for Besu validated set deployment.

Table of Contents

  1. Container Issues
  2. Service Issues
  3. Network Issues
  4. Consensus Issues
  5. Configuration Issues
  6. Performance Issues
  7. General Troubleshooting Commands
  8. Getting Help

Container Issues

Q: Container won't start

Symptoms: pct status <vmid> shows "stopped" or errors during startup

Solutions:

# Check container status
pct status <vmid>

# View container console
pct console <vmid>

# Check logs
journalctl -u pve-container@<vmid>

# Check container configuration
pct config <vmid>

# Try starting manually
pct start <vmid>

Common Causes:

  • Insufficient resources (RAM, disk)
  • Network configuration errors
  • Invalid container configuration
  • OS template issues

Q: Container runs out of disk space

Symptoms: Services fail, "No space left on device" errors

Solutions:

# Check disk usage
pct exec <vmid> -- df -h

# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/

# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d

# Increase disk size by 10 GiB (e.g. on LVM storage; volumes cannot be shrunk this way)
pct resize <vmid> rootfs +10G
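
To spot a filling disk before services fail, the `df` output can be filtered against a usage threshold. The following is a minimal sketch; the function name `disks_over_threshold` and the 85% default are illustrative assumptions, not part of the deployment scripts.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage meets or exceeds a
# threshold. Reads `df -P` output on stdin; threshold (percent) is argument 1.
disks_over_threshold() {
    threshold="${1:-85}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5                 # Capacity column, e.g. "91%"
        sub(/%/, "", use)
        if (use + 0 >= t + 0) print $6 " " use "%"
    }'
}

# Example (on the Proxmox host):
#   pct exec <vmid> -- df -P | disks_over_threshold 85
```

The `-P` flag keeps each filesystem on one line, which makes the column positions predictable. Mount points containing spaces are not handled by this sketch.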

Q: Container network issues

Symptoms: Cannot ping, cannot connect to services

Solutions:

# Check network configuration
pct config <vmid> | grep net0

# Check if container has IP
pct exec <vmid> -- ip addr show

# Check routing
pct exec <vmid> -- ip route

# Restart the container to re-apply its network configuration
pct stop <vmid>
pct start <vmid>

Service Issues

Q: Besu service won't start

Symptoms: systemctl status besu-validator shows failed

Solutions:

# Check service status
pct exec <vmid> -- systemctl status besu-validator

# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100

# Check for configuration errors
pct exec <vmid> -- besu --config-file=/etc/besu/config-validator.toml --help

# Verify configuration file syntax
pct exec <vmid> -- cat /etc/besu/config-validator.toml

Common Causes:

  • Missing configuration files
  • Invalid configuration syntax
  • Missing validator keys
  • Port conflicts
  • Insufficient resources

Q: Service starts but crashes

Symptoms: Service starts then stops, high restart count

Solutions:

# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"

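# Check for out of memory (if dmesg is restricted inside the container,
# run dmesg on the Proxmox host, where container OOM kills are also logged)
pct exec <vmid> -- dmesg | grep -i "out of memory"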

# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h

# Check JVM heap settings
pct exec <vmid> -- cat /etc/systemd/system/besu-validator.service | grep BESU_OPTS
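
The "high restart count" symptom can be read directly from systemd rather than inferred from logs. This is a sketch that assumes the container's systemd is recent enough to expose the `NRestarts` property; the helper names and the threshold of 5 are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: extract the restart counter from the output of
# `systemctl show <unit> -p NRestarts` (a line such as "NRestarts=3").
restart_count() {
    sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p'
}

# Succeeds when the counter read from stdin is at or above the given threshold.
is_crash_looping() {
    count=$(restart_count)
    [ -n "$count" ] && [ "$count" -ge "${1:-5}" ]
}

# Example (on the Proxmox host):
#   pct exec <vmid> -- systemctl show besu-validator -p NRestarts | restart_count
```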

Q: Service shows as active but not responding

Symptoms: Service status shows "active" but RPC/P2P not responding

Solutions:

# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu

# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"

# Check firewall rules
pct exec <vmid> -- iptables -L -n

# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545

Network Issues

Q: Nodes cannot connect to peers

Symptoms: Low or zero peer count, "No peers" in logs

Solutions:

# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json

# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml

# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode

# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303

# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>

Common Causes:

  • Incorrect enode URLs in static-nodes.json
  • Firewall blocking P2P port (30303)
  • Nodes not in permissions-nodes.toml
  • Network connectivity issues
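
Note that `ping` only proves ICMP reachability; a firewall can still block the P2P port. A rough TCP-level check, assuming `bash` and coreutils `timeout` are available where it runs (the function name `check_tcp` and the 3-second timeout are illustrative):

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device; gives up after 3 seconds.
check_tcp() {
    host="$1"
    port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example:
#   check_tcp <peer-ip> 30303 && echo "P2P port reachable" || echo "P2P port blocked or closed"
```

This only tests TCP. If the deployment relies on UDP discovery rather than static nodes, that path needs a separate check.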

Q: Invalid enode URL errors

Symptoms: "Invalid enode URL syntax" or "Invalid node ID" in logs

Solutions:

# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
    sed 's|^enode://||' | cut -d'@' -f1 | wc -c

# Should output 129 (128 chars + newline)

# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
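
The length check above can be extended to validate the whole URL shape. A minimal sketch, assuming enodes use an IPv4 address or hostname (the function name `is_valid_enode` is hypothetical and not part of the allowlist scripts):

```shell
#!/bin/sh
# Hypothetical helper: check that an enode URL has the form
#   enode://<128 hex chars>@<host>:<port>[?discport=<port>]
is_valid_enode() {
    printf '%s\n' "$1" |
        grep -Eq '^enode://[0-9a-fA-F]{128}@[^:@/]+:[0-9]+(\?discport=[0-9]+)?$'
}

# Example: flag malformed entries in static-nodes.json (on the Proxmox host):
#   pct exec <vmid> -- cat /etc/besu/static-nodes.json |
#       grep -o 'enode://[^"]*' |
#       while read -r e; do is_valid_enode "$e" || echo "INVALID: $e"; done
```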

Q: RPC endpoint not accessible

Symptoms: Cannot connect to RPC on port 8545

Solutions:

# Check if RPC is enabled (validators typically don't have RPC)
pct exec <vmid> -- grep -i "rpc-http-enabled" /etc/besu/config-*.toml

# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545

# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545

# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545

# Check host allowlist in config
pct exec <vmid> -- grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml
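
`eth_blockNumber` returns the height as a hex string, which is awkward to compare by eye. The sketch below converts the `result` field to decimal; the function name `rpc_result_to_dec` is an illustrative assumption, and it expects the compact single-line JSON that `curl -s` prints.

```shell
#!/bin/sh
# Hypothetical helper: extract the "result" hex quantity from a JSON-RPC
# response on stdin and print it in decimal. Fails if no hex result is present
# (for example when the response carries an "error" object instead).
rpc_result_to_dec() {
    hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
    [ -n "$hex" ] || return 1
    printf '%d\n' "$hex"
}

# Example (on the Proxmox host):
#   pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
#       -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#       http://localhost:8545 | rpc_result_to_dec
```

Running it twice a few seconds apart shows whether the height is advancing.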

Consensus Issues

Q: No blocks being produced

Symptoms: Block height not increasing, "No blocks" in logs

Solutions:

# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator

# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/

# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"

# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"

# Check peer connectivity (requires RPC with the ADMIN API enabled on this node)
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545

Common Causes:

  • Validator keys missing or incorrect
  • Not enough validators online
  • Network connectivity issues
  • Consensus configuration errors
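
"Not enough validators online" can be made concrete. Assuming the usual QBFT quorum of ceil(2N/3) validators, the sketch below computes how many validators must be online and how many can be down; the function names are illustrative.

```shell
#!/bin/sh
# Quorum for an N-validator QBFT set, assuming quorum = ceil(2N/3).
qbft_quorum() {
    n="$1"
    echo $(( (2 * n + 2) / 3 ))
}

# Number of validators that can be offline while a quorum is still reachable.
qbft_max_offline() {
    n="$1"
    echo $(( n - (2 * n + 2) / 3 ))
}

# Example: with five validators (containers 1000-1004)
#   qbft_quorum 5        # validators that must be online
#   qbft_max_offline 5   # validators that may be down
```

Under that assumption, a five-validator set needs four validators online, so block production stalls as soon as two are down.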

Q: Validator not participating in consensus

Symptoms: Validator running but not producing blocks

Solutions:

# Verify validator address
pct exec <vmid> -- cat /keys/validators/validator-*/address.txt

# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"

# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"

# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"

Configuration Issues

Q: Configuration file not found

Symptoms: "File not found" errors, service won't start

Solutions:

# List all config files
pct exec <vmid> -- ls -la /etc/besu/

# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"

# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138

Q: Invalid configuration syntax

Symptoms: "Invalid option" or syntax errors in logs

Solutions:

# Validate TOML syntax (tomllib requires Python 3.11+)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1

# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null

# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"

# Review Besu documentation for current options

Q: Path errors in configuration

Symptoms: "File not found" errors with paths like "/config/genesis.json"

Solutions:

# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml

# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"

# Fix paths if needed (keeps a backup as config-validator.toml.bak)
pct exec <vmid> -- sed -i.bak 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml

Performance Issues

Q: High CPU usage

Symptoms: Container CPU usage > 80% consistently

Solutions:

# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20

# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"

# Adjust JVM settings if needed
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS and JAVA_OPTS

# Consider allocating more CPU cores
pct set <vmid> --cores 4

Q: High memory usage

Symptoms: Container running out of memory, OOM kills

Solutions:

# Check memory usage
pct exec <vmid> -- free -h

# Check JVM heap settings
pct exec <vmid> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gGmMkK]?'

# Reduce heap size if too large
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS="-Xmx4g" to appropriate size

# Or increase container memory
pct set <vmid> --memory 8192
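
When choosing between a smaller heap and more container memory, it helps to compare the two numbers directly, since the JVM also needs memory outside the heap. A minimal sketch; the function names and the headroom value are assumptions to adjust for your nodes.

```shell
#!/bin/sh
# Hypothetical helper: convert a JVM heap value such as "4g" or "512m" to MiB.
heap_to_mib() {
    value=$(printf '%s' "$1" | tr 'GMK' 'gmk')
    num=${value%[gmk]}
    case "$value" in
        *g) echo $(( num * 1024 )) ;;
        *m) echo "$num" ;;
        *k) echo $(( num / 1024 )) ;;
        *)  echo $(( value / 1048576 )) ;;   # plain byte count
    esac
}

# heap_fits <heap> <container_mib> <headroom_mib>
# Succeeds when heap plus headroom fits within the container memory.
heap_fits() {
    [ $(( $(heap_to_mib "$1") + $3 )) -le "$2" ]
}

# Example (on the Proxmox host; 2048 MiB headroom is an arbitrary choice):
#   mem=$(pct config <vmid> | awk '/^memory:/ {print $2}')
#   heap_fits 4g "$mem" 2048 || echo "Heap too large for container memory"
```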

Q: Slow sync or block processing

Symptoms: Blocks processing slowly, falling behind

Solutions:

# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/

# Check disk I/O (iostat is part of the sysstat package)
pct exec <vmid> -- iostat -x 1 5

# Consider using SSD storage

# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>

# Verify sufficient peers (requires RPC with the ADMIN API enabled on this node)
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"

General Troubleshooting Commands

# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
    echo "=== Container $vmid ==="
    pct status $vmid
done

# Check all service statuses
for vmid in 1000 1001 1002 1003 1004; do
    pct exec $vmid -- systemctl status besu-validator --no-pager -l | head -10
done

# View recent logs from all nodes
for vmid in 1000 1001 1002 1003 1004; do
    echo "=== Logs for container $vmid ==="
    pct exec $vmid -- journalctl -u besu-validator -n 20 --no-pager
done

# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14  # validator to validator

# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | python3 -m json.tool

Getting Help

If issues persist:

  1. Collect Information:

    • Service logs: pct exec <vmid> -- journalctl -u besu-validator -n 100
    • Container status: pct status <vmid>
    • Configuration: pct exec <vmid> -- cat /etc/besu/config-validator.toml
    • Network: pct exec <vmid> -- ip addr show
  2. Check Documentation:

  3. Validate Configuration:

    • Run prerequisites check: ./scripts/validation/check-prerequisites.sh
    • Validate validators: ./scripts/validation/validate-validator-set.sh
  4. Review Logs:

    • Check deployment logs: logs/deploy-validated-set-*.log
    • Check service logs in containers
    • Check Proxmox host logs

Last Updated: 2025-01-20
Version: 1.0