RPC and Validator Testing — Runbook
Last Updated: 2026-02-18
Purpose: Single flow to fix and test the Chain 138 Core RPC (VMID 2101) and the 5 validators (VMIDs 1000–1004). Run from the repo root on a host with SSH access to r630-01 (192.168.11.11) and ml110 (192.168.11.10).
Quick verification (no SSH for RPC-only)
From anywhere with access to http://192.168.11.211:8545:
./scripts/verify/verify-rpc-2101-approve-and-sync.sh
./scripts/monitoring/monitor-blockchain-health.sh
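For a manual spot check without the repo scripts, the endpoint can also be queried directly over JSON-RPC. The following is a minimal sketch, not a description of what the scripts above do: the helper names (rpc_call, rpc_result, hex_to_dec) are hypothetical, and it assumes curl is installed and that the ETH and NET API groups are enabled on RPC 2101.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a manual check; not part of the repo scripts.
RPC_URL="${RPC_URL:-http://192.168.11.211:8545}"

# POST a parameterless JSON-RPC method ($1) and print the raw JSON response.
rpc_call() {
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "{\"jsonrpc\":\"2.0\",\"method\":\"$1\",\"params\":[],\"id\":1}" \
    "$RPC_URL"
}

# Read a JSON-RPC response on stdin and print its string "result" (no jq needed).
rpc_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex quantity ($1) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}
```

With these defined, hex_to_dec "$(rpc_call eth_chainId | rpc_result)" should print 138 on a healthy node, and the same pipeline with net_peerCount or eth_blockNumber gives the peer count and current block height.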
All possible peers vs connected:
./scripts/verify/check-rpc-2101-all-peers.sh
Lists connected peer IPs and allowlist IPs that are not yet connected (source: config/besu-node-lists/permissions-nodes.toml).
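The comparison described here is a set difference between allowlisted IPs and connected IPs. A rough sketch of that idea follows; it is not the implementation of check-rpc-2101-all-peers.sh. The function names are hypothetical, and it assumes both inputs contain enode URLs of the form enode://<id>@<ip>:<port>.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the allowlist-vs-connected comparison.

# Print the unique IPv4 addresses found in enode URLs on stdin.
enode_ips() {
  grep -oE '@[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:' | tr -d '@:' | sort -u
}

# Print IPs listed in file $1 (allowlist) that do not appear in file $2 (connected).
# "|| true" keeps the exit status clean when nothing is missing.
missing_peers() {
  sort -u "$1" | grep -vxFf "$2" || true
}
```

One way to use it: run enode_ips < config/besu-node-lists/permissions-nodes.toml > allow.txt, pipe an admin_peers response (assuming the ADMIN API is enabled on the node) through enode_ips into connected.txt, then run missing_peers allow.txt connected.txt.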
Peer topology and plan: See PEER_CONNECTIONS_PLAN.md for peer counts by node, the 9 IPs not connected to 2101, and a plan (allowlist cleanup, 2101↔2102, 2201 P2P, optional more peers).
Full fix and test sequence (from LAN with SSH)
Run in order. Allow 2–5 minutes after restarts for validators to become active and block production to resume.
1. Make validator VMIDs writable (if validators crash with "Read-only file system" / JNA)
Validators 1000–1004 can have their filesystems remounted read-only after ext4 errors; Besu then fails with UnsatisfiedLinkError: Read-only file system when writing JNA temp files.
./scripts/maintenance/make-validator-vmids-writable-via-ssh.sh
Then restart validators (step 3).
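To confirm the read-only condition before running step 1, the mount options of a container can be inspected. The sketch below is a hypothetical helper, not part of the repo scripts, and it assumes the affected filesystem is the container's root mount (/).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads /proc/mounts-style lines on stdin and succeeds
# (exit 0) only if the entry mounted at "/" carries the "ro" option.
root_is_readonly() {
  awk '$2 == "/" {
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
       }
       END { exit found ? 0 : 1 }'
}
```

For example, ssh root@192.168.11.11 "pct exec 1002 -- cat /proc/mounts" | root_is_readonly && echo "1002 is read-only" reports whether validator 1002 needs the writable fix.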
2. Deploy node lists (permissions + static nodes) to all nodes including RPC 2101
Ensures RPC 2101 and all validators have the same allowlist (all 5 validators, sentries, RPCs).
./scripts/deploy-besu-node-lists-to-all.sh
3. Validator permissioning and restart (validators only; uses /var/lib/besu/)
./scripts/fix-validator-permissioning-toml.sh
4. Validator tx-pool config and restart
./scripts/fix-all-validators-and-txpool.sh
5. Restart RPC 2101 (reload node lists)
./scripts/maintenance/fix-core-rpc-2101.sh --restart-only
6. Make RPC VMIDs writable (if RPC 2101 or 2500–2505 have read-only issues)
./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
Then re-run step 5 if needed.
7. Verify and monitor
Wait 1–2 minutes, then:
./scripts/verify/verify-rpc-2101-approve-and-sync.sh
./scripts/monitoring/monitor-blockchain-health.sh
Expected: RPC reports Chain 138 with ≥5 peers, all 5 validator IPs appear in the peer list (or the node has 24+ peers), block production is advancing, and 5/5 validators are active. Block production can take 3–5 minutes to resume after validator restarts; if it is still stalled with 5/5 active, see BLOCK_PRODUCTION_FIX_RUNBOOK.md and check validator logs for "Proposed block" / QBFT.
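Because block production may take several minutes to resume, a polling loop can be more convenient than re-running the checks by hand. This is a sketch under assumptions: wait_for_new_block is a hypothetical name, and the caller supplies a command that prints the current block height as a hex quantity (for example a curl call to eth_blockNumber).

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls a height-printing command until the height grows.
# Usage: wait_for_new_block <timeout_seconds> <interval_seconds> <command...>
# Returns 0 when the height advances, 1 on timeout, 2 if the first read fails.
wait_for_new_block() {
  local timeout="$1" interval="$2"
  shift 2
  local start now elapsed=0
  start="$("$@")" || return 2
  while [ "$elapsed" -lt "$timeout" ]; do
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
    now="$("$@")" || continue          # transient read failure: try again
    if [ "$(( now ))" -gt "$(( start ))" ]; then
      echo "advanced: $(( start )) -> $(( now ))"
      return 0
    fi
  done
  echo "stalled at $(( start )) after ${timeout}s" >&2
  return 1
}
```

A timeout of 300 seconds with a 10 second interval matches the 3–5 minute window mentioned above; a non-zero exit is the cue to move on to BLOCK_PRODUCTION_FIX_RUNBOOK.md.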
One-shot script (optional)
To run the full fix sequence (steps 2–5) in one go. This omits the writable steps (1 and 6); run those manually first if needed:
./scripts/deploy-besu-node-lists-to-all.sh && \
./scripts/fix-validator-permissioning-toml.sh && \
./scripts/fix-all-validators-and-txpool.sh && \
./scripts/maintenance/fix-core-rpc-2101.sh --restart-only
Then wait about 90 seconds and run ./scripts/verify/verify-rpc-2101-approve-and-sync.sh and ./scripts/monitoring/monitor-blockchain-health.sh.
Troubleshooting
| Symptom | Action |
|---|---|
| Validator "activating" or crash-loop | Check logs: ssh root@192.168.11.11 "pct exec 1002 -- journalctl -u besu-validator -n 50 --no-pager". If "Read-only file system" or JNA: run make-validator-vmids-writable-via-ssh.sh then restart validators. |
| Only 2/5 validator IPs in RPC peers | RPC may connect via sentries; 24 peers is OK. To get all 5 validator IPs in peer list, ensure config/besu-node-lists/permissions-nodes.toml includes all 5 and run deploy-besu-node-lists-to-all.sh, then restart RPC 2101. |
| Block production stalled | Ensure 4/5 or 5/5 validators active (QBFT quorum). Run fix-validator-permissioning-toml.sh and fix-all-validators-and-txpool.sh; if validators are read-only, run make-validator-vmids-writable-via-ssh.sh first. |
| RPC 2101 not responding | Run ./scripts/maintenance/health-check-rpc-2101.sh then ./scripts/maintenance/fix-core-rpc-2101.sh. |
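The "4/5 or 5/5" threshold in the stalled-production row follows from the QBFT quorum rule: with N validators, at least ceil(2N/3) must be active to agree on a block. A small sketch of that arithmetic, with hypothetical function names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the QBFT quorum arithmetic.

# Print the minimum number of active validators for N total: ceil(2N/3).
qbft_quorum() {
  echo $(( (2 * $1 + 2) / 3 ))
}

# Succeed if $1 active validators out of $2 total meet the quorum.
has_quorum() {
  [ "$1" -ge "$(qbft_quorum "$2")" ]
}
```

For the 5 validators here, qbft_quorum 5 gives 4, so the network tolerates one validator being down but stalls with two or more down.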
References
- BLOCK_PRODUCTION_FIX_RUNBOOK.md
- STUCK_TX_ROOT_CAUSE_AND_GUARDRAILS.md
- ../09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md
- OPTIONAL_DEPLOYMENTS_START_HERE.md § Fix stuck transaction, step 4 (verify RPC 2101)