proxmox/docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md
defiQUG bea1903ac9
2026-02-21 15:46:06 -08:00


RPC Nodes Block Production — Fix Runbook

Purpose: Fix RPC nodes that do not respond or report block 0 so all RPCs see chain 138 and current block production.

Core Besu RPC (VMID 2101) — quick fix and full runbook

VMID 2101 is the Chain 138 Core RPC (admin/deploy; RPC_URL_138 = http://192.168.11.211:8545). It runs on r630-01 (192.168.11.11).

Health check (run first to see status):

./scripts/maintenance/health-check-rpc-2101.sh

One-command fix (from a host with SSH to r630-01):

./scripts/maintenance/fix-core-rpc-2101.sh

Options: --dry-run (print actions only); --restart-only (skip starting the container; only restart Besu inside CT). If Besu fails with JNA/NoClassDefFoundError, run ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh then re-run the fix script.

Manual steps (if script cannot be used):

  1. SSH to r630-01: ssh root@192.168.11.11
  2. Start container if stopped: pct start 2101
  3. Inside the CT, start or restart Besu: pct exec 2101 -- systemctl restart besu-rpc (if the unit is named besu instead, use pct exec 2101 -- systemctl start besu).
  4. Wait 30-60 s, then verify: curl -s -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' http://192.168.11.211:8545/
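The wait-then-verify step can be scripted as a polling loop instead of a fixed sleep. A minimal sketch, assuming POSIX sh with curl on the client; the function names, the 5 s poll interval, and the 90 s default timeout are illustrative choices, not part of the existing maintenance scripts:

```shell
#!/bin/sh
# Sketch: poll eth_chainId until the Core RPC reports chain 138 (0x8a)
# or a timeout expires. Function names and defaults are hypothetical.

# Return 0 if a JSON-RPC response body reports chain ID 0x8a (138).
is_chain138() {
  printf '%s' "$1" | grep -q '"result"[[:space:]]*:[[:space:]]*"0x8a"'
}

# wait_for_chain138 <url> [timeout-seconds]
# Polls every 5 s; returns 0 on success, 1 on timeout.
wait_for_chain138() {
  url="$1"
  timeout="${2:-90}"
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    body=$(curl -s -m 3 -X POST -H "Content-Type: application/json" \
      -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' "$url")
    if is_chain138 "$body"; then
      echo "RPC up after ${waited}s: $body"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "RPC at $url did not report chain 138 within ${timeout}s" >&2
  return 1
}
```

For example, wait_for_chain138 http://192.168.11.211:8545/ 90 returns 0 as soon as the node answers with chain 138 and 1 on timeout, so it can gate a follow-up step.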

Ping works but curl to :8545 doesn't?

If you can ping the Core RPC IP (e.g. 192.168.11.211) but curl to http://192.168.11.211:8545 returns nothing or fails:

  1. Use POST with JSON-RPC: Besu's HTTP RPC expects a POST with a JSON body. A plain curl http://...:8545 (GET) usually returns an empty or unhelpful response. Always test with:

    curl -s -X POST -H "Content-Type: application/json" \
      -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
      http://192.168.11.211:8545
    

    Expect something like {"jsonrpc":"2.0","id":1,"result":"0x8a"} for chain 138.

  2. Check Besu is running on the RPC host (VMID 2101): from the Proxmox host (e.g. r630-01), run pct exec 2101 -- systemctl status besu-rpc. If it's down, start/restart as in the manual steps above.

  3. Check firewall: ensure TCP port 8545 is allowed from the client to the RPC host. From the client: nc -zv 192.168.11.211 8545 (or telnet 192.168.11.211 8545). If it doesn't connect, a firewall (host or network) is likely blocking.

  4. Config binding — Besu config should have rpc-http-host="0.0.0.0" and rpc-http-port=8545 so it listens on all interfaces. If you changed it to 127.0.0.1, remote curl will not reach it.
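The checks in this list can be combined into a small triage helper that tells a closed port apart from an empty reply or a "Host not authorized" answer (Common fix #1). This is a sketch under assumptions: POSIX sh, an nc that supports -z and -w, and hypothetical function names classify_rpc and diagnose_rpc. Only classify_rpc is pure, so it is the part that can be checked offline:

```shell
#!/bin/sh
# Sketch: map what a remote client sees to the likely cause.

# classify_rpc <tcp_ok: 1 or 0> <response-body>
classify_rpc() {
  tcp_ok="$1"
  body="$2"
  if [ "$tcp_ok" != 1 ]; then
    echo "port-closed"          # firewall, Besu down, or bound to 127.0.0.1
  elif [ -z "$body" ]; then
    echo "empty"                # GET instead of POST, or Besu not serving yet
  elif printf '%s' "$body" | grep -q 'Host not authorized'; then
    echo "host-not-authorized"  # host-allowlist missing (Common fix #1)
  elif printf '%s' "$body" | grep -q '"result"'; then
    echo "ok"
  else
    echo "error"                # JSON-RPC error or unexpected body
  fi
}

# diagnose_rpc <ip> [port]: probe TCP, then send a proper POST.
diagnose_rpc() {
  ip="$1"
  port="${2:-8545}"
  if nc -z -w 3 "$ip" "$port" 2>/dev/null; then tcp=1; else tcp=0; fi
  body=$(curl -s -m 3 -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    "http://${ip}:${port}" 2>/dev/null)
  echo "${ip}:${port} $(classify_rpc "$tcp" "$body") ${body}"
}
```

A port-closed result points at steps 2 to 4 (service, firewall, binding); an empty one at steps 1 and 2.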

VMID 2101: All steps to fix and prevent

  1. Check container: ssh root@192.168.11.11 "pct status 2101"; if not running, pct start 2101.
  2. Check disk: pct exec 2101 -- df -h / and pct exec 2101 -- du -sh /data/besu. If near full, free space or resize the CT disk (see Common fix #8).
  3. Check Besu service: pct exec 2101 -- systemctl status besu-rpc. If failing, pct exec 2101 -- journalctl -u besu-rpc -n 80 --no-pager.
  4. Fix "No space left": free space or resize the LV, then systemctl restart besu-rpc (see #8).
  5. Fix JNA / NoClassDefFoundError: run ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh (reinstalls Besu in the CT), then ./scripts/maintenance/fix-core-rpc-2101.sh (see Common fix #9).
  6. Verify RPC: curl -s -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' http://192.168.11.211:8545/ → expect "result":"0x8a".
  7. Prevent recurrence: run ./scripts/maintenance/check-disk-all-vmids.sh and ./scripts/storage-monitor.sh regularly; alert when root or /data/besu usage exceeds 85%.
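The 85% threshold in step 7 can be checked mechanically by parsing df -P output, whose fifth column is the capacity percentage and sixth the mount point. A minimal sketch assuming POSIX awk; disk_alerts is a hypothetical helper, not one of the scripts listed under Scripts:

```shell
#!/bin/sh
# Sketch: read `df -P` output on stdin and print "ALERT <mount> <pct>%"
# for every filesystem above the threshold (default 85).
# Exit status is 1 if any filesystem is over the threshold, else 0.
disk_alerts() {
  threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {                       # skip the df header line
      pct = $5; sub(/%/, "", pct)  # "90%" -> "90"
      if (pct + 0 > t + 0) { print "ALERT " $6 " " pct "%"; bad = 1 }
    }
    END { exit bad }'
}
```

For example, pct exec 2101 -- df -P / /data/besu | disk_alerts 85 prints one ALERT line per filesystem above 85% and exits non-zero, which a cron job or monitoring wrapper can act on.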

See also: 502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md (rpc-http-prv).

Quick status check

# From project root; requires curl and network to 192.168.11.x
for entry in 2101:192.168.11.211 2102:192.168.11.212 2201:192.168.11.221 2301:192.168.11.232 2303:192.168.11.233 2304:192.168.11.234 2305:192.168.11.235 2306:192.168.11.236 2400:192.168.11.240 2401:192.168.11.241 2402:192.168.11.242 2403:192.168.11.243 2500:192.168.11.172 2501:192.168.11.173 2502:192.168.11.174 2503:192.168.11.246 2504:192.168.11.247 2505:192.168.11.248; do
  vmid="${entry%%:*}"; ip="${entry#*:}"
  r=$(curl -s -m 3 -X POST -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' -H "Content-Type: application/json" "http://${ip}:8545" 2>/dev/null)
  echo "$vmid $ip: ${r:-no response}"
done
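The loop above prints raw JSON with a hex result. A hedged sketch of a formatter that decodes the hex block number and calls out block 0, which the fixes table below attributes to a node that is still syncing. The names status_line and block_from_response are hypothetical, and the JSON is parsed with sed rather than jq, which assumes Besu's compact single-line replies:

```shell
#!/bin/sh
# Sketch: turn a raw eth_blockNumber reply into a readable status line.

# block_from_response <json>: print the decimal block number, or nothing.
block_from_response() {
  hex=$(printf '%s' "$1" |
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F][0-9a-fA-F]*\)".*/\1/p')
  if [ -n "$hex" ]; then
    printf '%d\n' "$hex"   # printf accepts 0x-prefixed hex for %d
  fi
}

# status_line <vmid> <ip> <response>
status_line() {
  vmid="$1"; ip="$2"; resp="$3"
  block=$(block_from_response "$resp")
  if [ -z "$resp" ]; then
    echo "$vmid $ip: no response"
  elif [ -z "$block" ]; then
    echo "$vmid $ip: error: $resp"
  elif [ "$block" -eq 0 ]; then
    echo "$vmid $ip: block 0 (syncing or no peers)"
  else
    echo "$vmid $ip: block $block"
  fi
}
```

Inside the loop, replacing the echo line with status_line "$vmid" "$ip" "$r" gives one readable line per node.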

Fixes applied (2026-02-09)

| VMID | Issue | Fix |
|------|-------|-----|
| 2102 | "Host not authorized" | Added host-allowlist=["*"] to /etc/besu/config-rpc.toml, restarted besu-rpc.service. |
| 2201 | Unknown option tx-pool-min-score | Removed the line from the config, restarted besu-rpc.service. |
| 2303 | Wrong permissions path + tx-pool-min-score | Set permissions-nodes-config-file="/etc/besu/permissions-nodes.toml", removed tx-pool-min-score, restarted. |
| 2301 | Block 0 (syncing) | No config change; the node is syncing. Wait or check peers. |
| 2401 | Discovery + allowlist + paths | Set static-nodes/permissions/genesis to /etc/besu/, discovery-enabled=false. Still failing: genesis mismatch with existing /data/besu; either restore the original genesis or resync (clear /data/besu and restart). |
| 2500-2505 | besu.service: /opt/besu/bin/besu missing or config errors | 2500: Installed Besu 23.10.3 to /opt, fixed config (removed qbft-enabled, log-destination, rpc-http-api-enable-unsafe-txsigning, fast-sync-min-peers, PERSONAL/MINER from API). Still failing: "Supplied file does not contain valid keyPair" (nodekey). 2501-2505: same pattern; ensure /opt/besu/bin/besu exists (run fix-besu-installation.sh or install the tarball), fix config.toml for Besu 23.10, and ensure genesis.json and a valid nodekey. |

Common fixes

1. Host not authorized (RPC returns JSON "Host not authorized")

Add to the node's Besu TOML config (e.g. /etc/besu/config-rpc.toml):

host-allowlist=["*"]

Then: systemctl restart besu-rpc.service (or besu.service).
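Appending the key blindly can leave duplicate host-allowlist lines if the fix is repeated. A minimal idempotent sketch, assuming a flat Besu TOML with top-level keys only (a key appended after a [table] header would land in that table) and a hypothetical helper name; an existing, more restrictive allowlist is deliberately left untouched:

```shell
#!/bin/sh
# Sketch: add host-allowlist=["*"] only if no host-allowlist key exists.
ensure_host_allowlist() {
  cfg="$1"
  if grep -q '^[[:space:]]*host-allowlist[[:space:]]*=' "$cfg"; then
    echo "$cfg: host-allowlist already set, leaving it unchanged"
    return 0
  fi
  # If the file does not end with a newline, add one before appending.
  if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
    echo >> "$cfg"
  fi
  printf '%s\n' 'host-allowlist=["*"]' >> "$cfg"
  echo "$cfg: added host-allowlist"
}
```

Run it inside the CT (for example after pct enter VMID) as ensure_host_allowlist /etc/besu/config-rpc.toml, then restart the service as above.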

2. Unknown option tx-pool-min-score

Remove the line from the config (not supported in some Besu versions):

pct exec VMID -- sed -i '/tx-pool-min-score/d' /etc/besu/*.toml
pct exec VMID -- systemctl restart besu-rpc.service
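The sed above deletes every line that mentions tx-pool-min-score, including comments. A slightly narrower sketch deletes the option only where it is the key of an assignment and keeps a .bak copy; it assumes a sed that accepts -i.bak (GNU sed and BusyBox sed do) and uses a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: remove_toml_key <key> <file>...
# Deletes lines of the form "key = ..." and keeps <file>.bak as a backup.
remove_toml_key() {
  key="$1"; shift
  for cfg in "$@"; do
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$cfg" 2>/dev/null; then
      sed -i.bak "/^[[:space:]]*${key}[[:space:]]*=/d" "$cfg"
      echo "$cfg: removed $key (backup: $cfg.bak)"
    fi
  done
}
```

Inside the CT: remove_toml_key tx-pool-min-score /etc/besu/*.toml, then restart besu-rpc.service.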

3. Wrong permissions or static-nodes path

Ensure config uses /etc/besu/:

  • permissions-nodes-config-file="/etc/besu/permissions-nodes.toml"
  • static-nodes-file="/etc/besu/static-nodes.json"
  • genesis-file="/etc/besu/genesis.json"

Redeploy canonical lists: bash scripts/deploy-besu-node-lists-to-all.sh.
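Before redeploying, it can help to see which of the three keys is wrong. A sketch that prints OK, BAD, or MISS per key and returns non-zero if any value is not under /etc/besu/; it assumes double-quoted string values on a single line, as in the examples above, and check_besu_paths is a hypothetical name:

```shell
#!/bin/sh
# Sketch: check_besu_paths <config.toml>
check_besu_paths() {
  cfg="$1"
  rc=0
  for key in permissions-nodes-config-file static-nodes-file genesis-file; do
    # Extract the quoted value of "key = "...""; first match only.
    val=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$cfg" | head -n 1)
    case "$val" in
      /etc/besu/*) echo "OK   $key=$val" ;;
      "")          echo "MISS $key (not set)"; rc=1 ;;
      *)           echo "BAD  $key=$val"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Run inside the CT as check_besu_paths /etc/besu/config-rpc.toml (or config.toml); any BAD or MISS line names the key to fix.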

4. Discovery vs allowlist

If you see "Specified node(s) not in nodes-allowlist", either add those enodes to permissions-nodes.toml and redeploy, or set discovery-enabled=false so the node only uses static-nodes (all of which must be in the allowlist).
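To decide between the two options, it helps to know which static nodes are actually missing from the allowlist. A sketch, assuming both files contain enode URLs in double quotes and that an exact string match is what matters (an enode listed with a different IP or port in the two files will show up as missing); the helper name is hypothetical:

```shell
#!/bin/sh
# Sketch: missing_from_allowlist <static-nodes.json> <permissions-nodes.toml>
# Prints each enode URL from the first file that is absent from the second.
missing_from_allowlist() {
  static="$1"
  perms="$2"
  allow=$(mktemp)
  grep -o 'enode://[^"]*' "$perms" | sort -u > "$allow"
  # -x: whole-line match, -F: fixed strings, -v: keep non-matching lines.
  grep -o 'enode://[^"]*' "$static" | sort -u | grep -v -x -F -f "$allow" || true
  rm -f "$allow"
}
```

missing_from_allowlist /etc/besu/static-nodes.json /etc/besu/permissions-nodes.toml prints nothing when every static node is allowlisted.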

5. Besu binary missing (/opt/besu/bin/besu)

On containers that lack Besu (1505-1508 sentries, 2501-2505 RPCs):

  • Permanent install (recommended):
    bash scripts/besu/install-besu-permanent-on-missing-nodes.sh
    Installs Besu 23.10.3 in each CT (download inside container), deploys config/genesis/node lists, enables and starts the service. Sentries get besu-sentry.service, RPCs get besu.service + config.toml. Allow ~5-10 minutes per node (first run installs Java + Besu). Use --dry-run to see which VMIDs would be updated.

  • Legacy (tarball already in CT):
    scripts/fix-besu-installation.sh (expects the Besu tarball in /opt inside each container).

6. Genesis mismatch ("Supplied genesis block does not match chain data")

Either:

  • Restore the original genesis file that matches existing /data/besu, or
  • Resync from block 0: back up then remove /data/besu (or use a new data-path), set correct genesis, restart.
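The resync option can be done without deleting anything outright by moving the data directory aside. A hedged sketch, assuming Besu is stopped first and that the nodekey lives at /data/besu/nodekey as in fix #7: the old nodekey is copied back so the node keeps its enode ID, and therefore its entries in static-nodes.json and the nodes-allowlist. reset_besu_data is a hypothetical name, and ownership of the new directory may need fixing if Besu runs as a non-root user:

```shell
#!/bin/sh
# Sketch: reset_besu_data [data-dir]  (default /data/besu)
# Moves the data dir to <dir>.bak.<timestamp>, recreates it empty,
# restores the nodekey, and prints the backup path.
reset_besu_data() {
  data="${1:-/data/besu}"
  data="${data%/}"                       # drop a trailing slash
  case "$data" in
    "") echo "refusing to reset '/'" >&2; return 1 ;;
  esac
  if [ ! -d "$data" ]; then
    echo "$data is not a directory" >&2
    return 1
  fi
  backup="${data}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$data" "$backup"
  mkdir -p "$data"
  # Keep the node identity (enode ID) stable across the resync.
  if [ -f "$backup/nodekey" ]; then
    cp -p "$backup/nodekey" "$data/nodekey"
  fi
  # If Besu runs as a dedicated user, chown "$data" to that user here.
  echo "$backup"
}
```

Inside the CT: stop the Besu service, run reset_besu_data /data/besu, put the correct genesis in place, then start the service. The printed backup path can be removed once the node is syncing.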

7. Invalid keyPair / nodekey

Ensure the node has a valid nodekey (e.g. /data/besu/nodekey). If the config references a key file, fix the path or regenerate (Besu can create nodekey on first run if data-path is empty).
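A quick format check can tell a truncated or corrupted key file apart from a path problem. This sketch assumes the nodekey holds a single hex-encoded 32-byte private key (64 hex characters, optionally prefixed with 0x); that layout is an assumption to confirm against the installed Besu version, and nodekey_looks_valid is a hypothetical name:

```shell
#!/bin/sh
# Sketch: nodekey_looks_valid <file>
# Returns 0 if the file holds 64 hex characters (optional 0x prefix).
nodekey_looks_valid() {
  f="$1"
  [ -s "$f" ] || return 1                # missing or empty file
  key=$(tr -d ' \t\r\n' < "$f")          # strip whitespace and newlines
  key="${key#0x}"
  [ "${#key}" -eq 64 ] || return 1       # 32 bytes = 64 hex chars
  case "$key" in
    *[!0-9a-fA-F]*) return 1 ;;          # non-hex character present
  esac
  return 0
}
```

If the check fails and the key has to be regenerated, a new nodekey changes the node's enode ID, so the node lists need updating and redeploying (fix #3).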

8. No space left on device (RocksDB in /data/besu)

If Besu fails with RocksDBException: ... No space left on device for a file under /data/besu/database/:

  • Immediate: free space by removing old logs, temporary files, or snapshots inside the CT, or resize the CT's disk from the Proxmox host (e.g. pct resize 2101 rootfs +20G if the storage allows).
  • Inside CT: df -h / and du -sh /data/besu; clear caches or old chain data only if you accept resync: e.g. rm -rf /data/besu/caches/* (Besu will recreate). Do not delete /data/besu/database unless you intend to resync from genesis.
  • Then: systemctl restart besu-rpc (or besu.service).

9. JNA / NoClassDefFoundError (Besu fails to start)

If journalctl -u besu-rpc shows NoClassDefFoundError: Could not initialize class com.sun.jna.Native or "Did not find Udev class from JNA":

  • Usually a Java/Besu or classpath issue (conflicting JNA, or broken install). Options: reinstall Besu in the CT (same or supported version), or ensure a single consistent JNA on the classpath; upgrade Java if it's outdated.
  • On VMID 2101: run ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh (reinstalls Besu in the CT to fix JNA), then ./scripts/maintenance/fix-core-rpc-2101.sh. Or reinstall Besu manually per BESU_PATH_REFERENCE.md and restart the service.
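The symptom check in this fix can be scripted so a wrapper only suggests the reinstall when the JNA error is actually in the log. A minimal sketch matching the two messages quoted above; has_jna_error is a hypothetical name and it reads log lines on stdin:

```shell
#!/bin/sh
# Sketch: exit 0 if the log on stdin contains either JNA failure message.
has_jna_error() {
  grep -q -e 'Could not initialize class com.sun.jna.Native' \
          -e 'Did not find Udev class from JNA'
}
```

For example: pct exec 2101 -- journalctl -u besu-rpc -n 200 --no-pager | has_jna_error && echo "JNA failure: run ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh".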

Scripts

  • Health check VMID 2101: scripts/maintenance/health-check-rpc-2101.sh — container, besu-rpc service, port 8545, eth_chainId, eth_blockNumber. Run from LAN.
  • Fix Core RPC 2101: scripts/maintenance/fix-core-rpc-2101.sh — start CT if needed, restart Besu, verify RPC.
  • Fix 2101 JNA (reinstall Besu): scripts/maintenance/fix-rpc-2101-jna-reinstall.sh [--dry-run] — when Besu fails with NoClassDefFoundError/JNA; then re-run fix-core-rpc-2101.sh.
  • Check disk in all VMIDs: scripts/maintenance/check-disk-all-vmids.sh [--csv] — root filesystem usage for every running container on ml110, r630-01, r630-02. Use for prevention and audits.
  • Host-level storage: scripts/storage-monitor.sh — Proxmox storage and volume groups; alerts at 80%/90%.
  • Deploy node lists to all: scripts/deploy-besu-node-lists-to-all.sh
  • Verify lists on all: scripts/verify/verify-static-permissions-on-all-besu-nodes.sh --checksum
  • Restart Besu on all: scripts/besu/restart-besu-reload-node-lists.sh
  • Install Besu permanently on nodes missing it (1505-1508, 2501-2505): scripts/besu/install-besu-permanent-on-missing-nodes.sh (no tarball needed; downloads inside each CT).
  • Fix Besu install when tarball already in CT: scripts/fix-besu-installation.sh.

Reference