Besu Configuration Deployment Monitoring Guide

Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation


Date: 2026-01-17
Purpose: Guide for monitoring Besu configuration deployments and verifying correct operation


Overview

After deploying cleaned Besu configurations to running nodes, monitor the deployment to ensure services start correctly, configuration changes are applied, and no issues arise.


Post-Deployment Monitoring Period

Recommended: 24-48 hours after deployment

  • Intensive Monitoring: First 4-6 hours
  • Standard Monitoring: 24-48 hours
  • Ongoing Monitoring: Regular health checks


Monitoring Checklist

Immediate (0-1 hour after deployment)

  • Verify all services started successfully
  • Check for configuration errors in logs
  • Verify no restart loops
  • Check logging levels are correct
  • Test RPC endpoints (if applicable)

Short-term (1-6 hours after deployment)

  • Monitor service status
  • Check for configuration-related errors
  • Verify network connectivity
  • Test consensus participation (validators)
  • Test archive queries (sentries)

Medium-term (6-48 hours after deployment)

  • Monitor resource usage (memory, CPU, disk)
  • Check peer connections
  • Verify sync status
  • Monitor for performance issues
  • Check metrics endpoints

Service Status Verification

Check Systemd Service Status

# For each node (example for validator 1000)
pct exec 1000 -- systemctl status besu-validator.service

# Check if service is active
pct exec 1000 -- systemctl is-active besu-validator.service
# Expected: "active"

# Check service logs
pct exec 1000 -- journalctl -u besu-validator.service -n 50 --no-pager

Verify No Restart Loops

# Check restart count (should be 0 or low after deployment)
pct exec 1000 -- systemctl show besu-validator.service -p NRestarts
# Expected: NRestarts=0 or low number

# Check for frequent restarts
pct exec 1000 -- journalctl -u besu-validator.service --since "1 hour ago" | grep "Started\|Stopped" | tail -10
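
The restart check above can be turned into a pass/fail verdict for scripting. The following is a minimal sketch, not part of the deployment tooling: `restart_verdict` is a hypothetical helper, and the default threshold of 3 restarts is an assumption that should be tuned for the environment.

```shell
# Hypothetical helper: classify a line such as "NRestarts=0" as printed by
#   systemctl show <unit> -p NRestarts
# The default threshold (3) is an assumption, not a value from this guide.
restart_verdict() {
    local count threshold
    count="${1#NRestarts=}"
    threshold="${2:-3}"
    case "$count" in
        # Empty or non-numeric input means the property could not be read
        ''|*[!0-9]*) echo "unknown"; return 2 ;;
    esac
    if [ "$count" -gt "$threshold" ]; then
        echo "restart-loop"
        return 1
    fi
    echo "ok"
}

# Usage on the Proxmox host:
#   restart_verdict "$(pct exec 1000 -- systemctl show besu-validator.service -p NRestarts)"
```

The non-zero return codes let the helper be used directly in an `if` statement or a cron job that alerts on failure.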

Configuration Verification

Verify Logging Levels

  • Validators and RPC: Should log at WARN level
  • Sentry nodes: Should log at INFO level

# Check Besu logs for logging level (should show WARN or INFO)
pct exec 1000 -- journalctl -u besu-validator.service -n 20 | grep -i "log\|WARN\|INFO"

# Validators/RPC: Should see WARN-level messages (minimal logs)
# Sentries: Should see INFO-level messages (detailed logs)

Check for Configuration Errors

# Look for configuration errors
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "error\|unknown option\|configuration"

# Should NOT see:
# - "Unknown options in TOML configuration file"
# - "Configuration error"
# - Deprecated option warnings
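
For repeated checks, the same search can be wrapped in a function that returns a count. This is a sketch under assumptions: `count_config_errors` is a hypothetical name, the first two patterns are the messages quoted above, and matching on the word "deprecated" is a guess at how deprecation warnings are worded in the logs.

```shell
# Hypothetical helper: read log text on stdin and print how many lines
# look like configuration problems. grep -c exits 1 when the count is 0,
# so "|| true" keeps the function's exit status clean.
count_config_errors() {
    grep -ciE 'unknown options? in toml configuration file|configuration error|deprecated' || true
}

# Usage on the Proxmox host:
#   pct exec 1000 -- journalctl -u besu-validator.service --no-pager | count_config_errors
```

A result of 0 corresponds to the "Should NOT see" expectation above.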

Functional Verification

Validator Nodes

Check Consensus Participation:

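# Verify validator is synced (works only if HTTP-RPC is enabled on the validator)
curl -X POST http://192.168.11.100:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: "result":false (fully synced)

# Note: Validators normally have RPC disabled, so this request will be refused;
# in that case rely on the metrics endpoint below or on the service logs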

Check Metrics (validators enable metrics):

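# Metric names vary by Besu version; ethereum_blockchain_height is the documented chain-height gauge
curl -s http://192.168.11.100:9545/metrics | grep -E "ethereum_blockchain_height|besu_blocks_total"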

Sentry Nodes (Archive)

Check Archive Functionality:

# Test historical query (verify archive mode)
curl -X POST http://192.168.11.150:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x0000000000000000000000000000000000000000","0x100"],"id":1}'
# Should return historical balance (archive nodes only)

Check Sync Status:

curl -X POST http://192.168.11.150:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: false (fully synced)
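
The `eth_syncing` response is either `false` or an object describing sync progress, so a script needs to tell the two apart. The sketch below uses plain string matching instead of a JSON parser to avoid extra dependencies; `sync_state` is a hypothetical helper name.

```shell
# Hypothetical helper: classify an eth_syncing JSON-RPC response body.
#   "result":false  -> node reports it is fully synced
#   "result":{...}  -> node is still syncing
sync_state() {
    local compact
    # Strip whitespace so the match does not depend on JSON formatting
    compact=$(printf '%s' "$1" | tr -d ' \t\r\n')
    case "$compact" in
        *'"result":false'*) echo "synced" ;;
        *'"result":{'*)     echo "syncing" ;;
        *)                  echo "unknown" ;;
    esac
}

# Usage:
#   resp=$(curl -s -X POST http://192.168.11.150:8545 \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}')
#   sync_state "$resp"
```

An "unknown" result covers empty responses and JSON-RPC errors, which usually means the endpoint is unreachable or RPC is disabled.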

RPC Nodes

Test RPC Endpoints:

# Test HTTP-RPC
curl -X POST http://192.168.11.250:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Test chain ID
curl -X POST http://192.168.11.250:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
# Expected: "0x8a" (138 in hex)
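
JSON-RPC returns quantities as hex strings, so comparing against the decimal chain ID takes a small conversion step. A minimal sketch, assuming responses carry a quoted string in `result`; the function names are hypothetical.

```shell
# Hypothetical helpers for decoding JSON-RPC quantity responses.

# Print the quoted string value of the "result" field, if there is one
rpc_result() {
    printf '%s\n' "$1" | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex string to decimal
hex_to_dec() {
    printf '%d\n' "$1"
}

# Succeed only if the response's chain ID equals the expected decimal value
chain_id_matches() {
    local hex
    hex=$(rpc_result "$1")
    [ -n "$hex" ] && [ "$(hex_to_dec "$hex")" -eq "$2" ]
}

# Usage:
#   resp=$(curl -s -X POST http://192.168.11.250:8545 \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}')
#   chain_id_matches "$resp" 138 && echo "chain ID OK"
```

The same `rpc_result` and `hex_to_dec` pair can decode the `eth_blockNumber` response into a readable block height.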

Verify Logging Level (should be WARN, minimal logs):

# Check logs show minimal output (WARN level)
pct exec 2500 -- journalctl -u besu-rpc.service -n 20 --no-pager
# Should see mostly warnings/errors, not info messages

Network Connectivity

Peer Connections

Check Peer Count:

# Via metrics (if available)
curl http://192.168.11.150:9545/metrics | grep besu_peers

# Via logs (look for peer connection messages)
pct exec 1500 -- journalctl -u besu-sentry.service | grep -i "peer\|connected"

Expected:

  • Validators: Connected to sentries (and other validators)
  • Sentries: Connected to validators and external peers
  • RPC: Connected to internal peers (sentries/validators)
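
To compare peer counts against these expectations, a single value can be pulled out of the metrics output. This sketch parses the Prometheus text format with awk; `metric_value` is a hypothetical helper, and the metric name `ethereum_peer_count` in the usage line is an assumption that should be checked against the node's actual `/metrics` output.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of the first sample whose name matches exactly.
# Assumes samples carry no trailing timestamp.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # drop any {label="..."} suffix
            if (metric == name) { print $NF; exit }
        }'
}

# Usage (metric name is an assumption; verify it on your Besu version):
#   curl -s http://192.168.11.150:9545/metrics | metric_value ethereum_peer_count
```

An empty result means the metric was not found, which is worth treating as a failure in automated checks.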

Performance Monitoring

Resource Usage

Memory Usage:

# Check Besu process memory
pct exec 1000 -- ps aux | grep besu | awk '{print $4,$11}'

# Check systemd memory limit
pct exec 1000 -- systemctl show besu-validator.service | grep MemoryMax

CPU Usage:

# Monitor CPU usage (Besu runs as a java process, so -c shows the full command line for matching)
pct exec 1000 -- top -b -c -n 1 -w 512 | grep -i besu

Disk Usage:

# Check disk usage
pct exec 1500 -- df -h /data/besu

# Check database size
pct exec 1500 -- du -sh /data/besu/database/
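
Disk usage is easier to track over the monitoring period when reduced to one number. The sketch below reads `df -P` output (the POSIX format keeps each filesystem on a single line); the helper names and the 80% default threshold are assumptions, not values from this guide.

```shell
# Hypothetical helpers: parse `df -P <path>` output read from stdin.

# Print the used percentage of the first listed filesystem, without the % sign
disk_use_percent() {
    # Line 1 is the header; column 5 of line 2 is "Capacity", such as "42%"
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if usage is at or above the threshold (default 80, an assumption)
disk_over_threshold() {
    local used
    used=$(disk_use_percent)
    [ -n "$used" ] && [ "$used" -ge "${1:-80}" ]
}

# Usage on the Proxmox host:
#   pct exec 1500 -- df -P /data/besu | disk_over_threshold 80 && echo "disk nearly full"
```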

Configuration Drift Detection

Compare Running Configs to Templates

# Use audit script
./scripts/audit-besu-configs.sh

# Manual comparison
# 1. Copy running config from node
pct exec 1000 -- cat /etc/besu/config-validator.toml > /tmp/running-config.toml

# 2. Compare to template
diff /tmp/running-config.toml smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml

Expected: Running configs should match templates (after deployment)
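
A raw `diff` also reports differences in comments and blank lines, which are not real drift. The following sketch normalizes both files before comparing them; `normalize_config` and `configs_match` are hypothetical helpers and are not part of `audit-besu-configs.sh`.

```shell
# Hypothetical helpers for comparing a running config to its template.

# Print the file with trailing whitespace, full-line comments, and blank lines removed
normalize_config() {
    sed -e 's/[[:space:]]*$//' -e '/^[[:space:]]*#/d' -e '/^$/d' "$1"
}

# Succeed only if both files carry the same settings in the same order
configs_match() {
    local a b
    a=$(normalize_config "$1")
    b=$(normalize_config "$2")
    [ "$a" = "$b" ]
}

# Usage:
#   configs_match /tmp/running-config.toml \
#     smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml \
#     || echo "drift detected"
```

The comparison stays order-sensitive on purpose, so a reordered file is still flagged for review.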


Troubleshooting

Issue: Service Fails to Start

Symptoms:

  • Service status: failed or inactive
  • Frequent restarts
  • Configuration errors in logs

Diagnosis:

# Check service status
pct exec 1000 -- systemctl status besu-validator.service

# Check logs for errors
pct exec 1000 -- journalctl -u besu-validator.service -n 100 --no-pager

Common Causes:

  1. Configuration syntax error
  2. Deprecated options still present
  3. Invalid option values
  4. Missing required files (genesis.json, etc.)

Resolution:

  1. Validate config with scripts/validate-besu-config.sh
  2. Check for deprecated options
  3. Review Besu logs for specific errors
  4. Restore from backup if needed

Issue: Configuration Not Applied

Symptoms:

  • Logging level unchanged
  • Service running but with old settings

Diagnosis:

# Check if config file was updated
pct exec 1000 -- stat /etc/besu/config-validator.toml

# Check actual logging level in Besu logs
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "logging\|WARN\|INFO"

Resolution:

  1. Verify config file was copied correctly
  2. Ensure service was restarted after config update
  3. Check for file permission issues
  4. Verify Besu is reading correct config file

Issue: Logging Level Incorrect

Symptoms:

  • Validators showing INFO logs (should be WARN)
  • RPC nodes showing INFO logs (should be WARN)
  • Sentries showing WARN logs (should be INFO)

Diagnosis:

# Check config file logging setting
pct exec 1000 -- grep "^logging" /etc/besu/config-validator.toml
# Expected: logging="WARN" for validators

# Check actual log output
pct exec 1000 -- journalctl -u besu-validator.service -n 20
# Should see minimal logs (WARN level)

Resolution:

  1. Verify config file has correct logging="WARN" or logging="INFO"
  2. Ensure service was restarted
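  3. Judge the level only from entries written after the restart (for example journalctl --since "10 minutes ago"); journalctl --vacuum-time=1s deletes archived journal files for every unit on the node and does not change the logging level, so use it only if old entries must be purged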
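
The expected level per node type can be checked against the config file mechanically. A minimal sketch, assuming the `logging="LEVEL"` line format shown above; both function names are hypothetical.

```shell
# Hypothetical helpers for the logging-level check.

# Expected level per node role, as stated in this guide
expected_log_level() {
    case "$1" in
        validator|rpc) echo "WARN" ;;
        sentry)        echo "INFO" ;;
        *)             echo "unknown"; return 1 ;;
    esac
}

# Print the value of the first logging="..." line in a TOML config file
configured_log_level() {
    sed -n 's/^logging[[:space:]]*=[[:space:]]*"\([A-Za-z]*\)".*/\1/p' "$1" | head -n 1
}

# Usage on the Proxmox host:
#   pct exec 1000 -- cat /etc/besu/config-validator.toml > /tmp/running-config.toml
#   [ "$(configured_log_level /tmp/running-config.toml)" = "$(expected_log_level validator)" ] \
#     || echo "logging level mismatch"
```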

Monitoring Scripts

Automated Monitoring

Create a monitoring script to check all nodes:

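#!/bin/bash
# monitor-besu-deployment.sh
# Reports service state, recent error count, and restart count for each Besu node.

NODES=(1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502)

for vmid in "${NODES[@]}"; do
    echo "Checking VMID $vmid..."

    # Check service status. The pattern is quoted so that systemctl, not the host shell, expands it.
    # is-active exits non-zero for inactive/failed units, so fall back to "unknown" only on empty output.
    status=$(pct exec "$vmid" -- systemctl is-active 'besu-*.service' 2>/dev/null | head -1)
    echo "  Service status: ${status:-unknown}"

    # Count log lines containing "error" in the last hour
    errors=$(pct exec "$vmid" -- journalctl -u 'besu-*.service' --since "1 hour ago" --no-pager 2>/dev/null | grep -ci "error")
    echo "  Errors in last hour: $errors"

    # Check restart count (the systemd property is NRestarts)
    restarts=$(pct exec "$vmid" -- systemctl show 'besu-*.service' -p NRestarts 2>/dev/null | cut -d= -f2 | head -1)
    echo "  Restart count: ${restarts:-unknown}"
done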
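
The three values the script prints per node can be folded into a single verdict so the script can exit non-zero when something is wrong. This is a sketch only: `node_health` is a hypothetical helper, and the `MAX_RESTARTS` default of 3 is an assumption.

```shell
# Hypothetical helper: combine service status, error count, and restart
# count into one verdict. MAX_RESTARTS defaults to 3 (an assumption).
node_health() {
    local status="$1" errors="$2" restarts="$3"
    if [ "$status" = "active" ] \
        && [ "$errors" -eq 0 ] \
        && [ "$restarts" -le "${MAX_RESTARTS:-3}" ]; then
        echo "healthy"
    else
        echo "unhealthy"
        return 1
    fi
}

# Usage inside the monitoring loop:
#   node_health "$status" "$errors" "$restarts" || failed=1
# and after the loop:
#   exit "${failed:-0}"
```

Returning a failure code makes the script usable from cron or a CI job that alerts on a non-zero exit.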

Success Criteria

Deployment Successful If:

All services running:

  • Systemd status: active
  • No restart loops
  • Services stable for 24+ hours

Configuration applied:

  • Logging levels correct (WARN for validators/RPC, INFO for sentries)
  • No deprecated options in use
  • All configs match templates

Functionality verified:

  • Validators participating in consensus
  • Sentries providing archive queries
  • RPC nodes serving API requests
  • Network connectivity normal

No errors:

  • No configuration errors in logs
  • No "Unknown options" errors
  • Services starting cleanly

Monitoring Timeline

Hour 0-1: Immediate Verification

  • Service status
  • Configuration errors
  • Basic functionality

Hour 1-6: Intensive Monitoring

  • Service stability
  • Performance metrics
  • Network connectivity
  • Detailed verification

Hour 6-24: Standard Monitoring

  • Ongoing health checks
  • Resource usage
  • Performance trends

Day 2+: Ongoing Monitoring

  • Regular health checks
  • Performance monitoring
  • Configuration drift detection

Post-Deployment Checklist

  • All services running (validators, sentries, RPC)
  • No configuration errors in logs
  • Logging levels correct (WARN/INFO as appropriate)
  • No restart loops
  • Validators participating in consensus
  • Sentries providing archive queries
  • RPC nodes serving API requests
  • Network connectivity normal
  • Peer connections healthy
  • Resource usage within expected ranges
  • Configuration drift: None detected

Related Documentation

  • scripts/deploy-besu-configs.sh - Deployment script
  • scripts/audit-besu-configs.sh - Configuration audit
  • scripts/validate-besu-config.sh - Configuration validation
  • docs/04-configuration/BESU_CONFIGURATION_GUIDE.md - Configuration reference

Last Updated: 2026-01-17
Status: Monitoring Guide