# Concrete Next Steps: RPC 2101 and Storage (thin5 / data)
Last updated: 2026-02-28
## 1. VMID 2101 (Core RPC) — RPC not responding
Symptom: the container is running and the besu-rpc service is active, but JSON-RPC calls (e.g. eth_blockNumber) to 192.168.11.211:8545 get no response.
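Before reaching for the scripts, the failure can be confirmed by hand. A minimal probe sketch (endpoint from the symptom above; the payload is the standard Ethereum JSON-RPC eth_blockNumber call):

```shell
#!/usr/bin/env bash
# Probe sketch; IP/port are taken from this runbook, adjust if your LAN differs.
RPC=http://192.168.11.211:8545
PAYLOAD='{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
# On the LAN, uncomment (a 5s timeout makes a dead node fail fast):
# curl -s -m 5 -H 'Content-Type: application/json' -d "$PAYLOAD" "$RPC"
# A healthy node answers e.g. {"jsonrpc":"2.0","id":1,"result":"0x1b4"}.
hex_to_dec() { printf '%d\n' "$(( $1 ))"; }  # hex result -> decimal block number
hex_to_dec 0x1b4   # -> 436
```

No response (curl timeout) at this step is exactly the symptom the run order below is meant to fix.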
Run order (from project root, on LAN with SSH to r630-01)
| Step | Action | Command |
|---|---|---|
| 1 | Diagnose | bash scripts/maintenance/health-check-rpc-2101.sh |
| 2a | If read-only / database not writable | bash scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh (then re-run step 1) |
| 2b | If JNA / NoClassDefFoundError in logs | bash scripts/maintenance/fix-rpc-2101-jna-reinstall.sh (then step 3) |
| 3 | Fix (start CT if needed, restart Besu, verify) | bash scripts/maintenance/fix-core-rpc-2101.sh |
| 4 | Verify | bash scripts/health/check-rpc-vms-health.sh — 2101 should show block number |
Optional: fix-core-rpc-2101.sh --restart-only if the container is already running and you only want to restart the Besu service.
Docs: docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md, docs/03-deployment/RPC_2101_READONLY_FIX.md (if present).
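Step 2a exists because the volume can come back read-only; the underlying test is simply whether the filesystem accepts writes. A sketch (pct exec is the standard Proxmox CLI; using /data for 2101 is an assumption based on the du example later in this doc):

```shell
rw_check() {  # report whether a directory currently accepts writes
  if touch "$1/.rw-probe" 2>/dev/null && rm -f "$1/.rw-probe"; then
    echo "writable"
  else
    echo "read-only"
  fi
}
# Same idea inside the container, run from the Proxmox host:
#   pct exec 2101 -- sh -c 'touch /data/.rw-probe && rm /data/.rw-probe'
rw_check /tmp
```

If the probe reports read-only, run step 2a before any restart; restarting Besu on a read-only database will not help.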
## 2. r630-02 thin5 — 84.6% used (monitor / reduce)
Risk: thin5 is approaching the 85% WARN threshold; LVM thin pools can become slow or fail above ~90%.
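The WARN condition above is a plain threshold check; a sketch of it (the lvs feed in the comment assumes standard LVM tooling on the host, IP from this doc):

```shell
warn_if_full() {            # warn_if_full <pool> <used_percent>
  local name=$1 pct=${2%.*} # drop decimals for the integer comparison
  if [ "$pct" -ge 85 ]; then
    echo "WARN: $name at $2%"
  else
    echo "OK: $name at $2%"
  fi
}
# Feed it live numbers with e.g.:
#   ssh root@192.168.11.12 "lvs --noheadings -o lv_name,data_percent"
warn_if_full thin5 84.6   # -> OK: thin5 at 84.6%
```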
### Immediate
| Step | Action | Command / notes |
|---|---|---|
| 1 | See which containers use thin5 | On r630-02: `ssh root@192.168.11.12 'pct list; for v in $(pct list 2>/dev/null \| …` (command truncated) |
| 2 | Check disk usage inside those CTs | bash scripts/maintenance/check-disk-all-vmids.sh — find VMIDs on r630-02 with high % |
| 3 | Free space inside CTs (Besu/DB, logs) | Per VMID: pct exec <vmid> -- du -sh /data /var/log 2>/dev/null; prune logs, old snapshots, or Besu temp if safe |
| 4 | Optional: migrate one CT to another thin | If thin5 stays high: backup CT, restore to thin2/thin3/thin4/thin6 (e.g. pct restore <vmid> /path/to/dump --storage thin2) |
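Step 1's command is cut off above; a standalone sketch of the same idea, assuming the standard pct CLI and the usual `rootfs: <storage>:<volume>` config syntax (the VMID in the sample line is made up):

```shell
uses_storage() { grep -q "$1:"; }  # reads one pct config on stdin
# On r630-02:
#   for v in $(pct list | awk 'NR>1 {print $1}'); do
#     pct config "$v" | uses_storage thin5 && echo "CT $v uses thin5"
#   done
# Offline check of the filter against a typical config line:
echo "rootfs: thin5:vm-2200-disk-0,size=400G" | uses_storage thin5 && echo match
```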
### Ongoing
| Step | Action | Command / notes |
|---|---|---|
| 5 | Track growth | bash scripts/monitoring/collect-storage-growth-data.sh --append (or install cron: bash scripts/maintenance/schedule-storage-growth-cron.sh --install) |
| 6 | Prune old snapshots (on host) | bash scripts/monitoring/prune-storage-snapshots.sh (weekly; keeps last 30 days) |
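The 30-day retention mentioned for prune-storage-snapshots.sh reduces to an mtime filter; a self-contained demo of the mechanism only (the real script's paths and flags may differ):

```shell
demo=$(mktemp -d)                        # stand-in for the snapshot directory
touch -d '40 days ago' "$demo/old.snap"  # past the 30-day window
touch "$demo/new.snap"                   # inside the window
# Files older than 30 days are the prune candidates:
find "$demo" -type f -mtime +30 -printf '%f\n'   # prints old.snap
rm -rf "$demo"
```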
## 3. r630-01 data / local-lvm — 71.9% used (monitor)
Risk: still healthy, but keep monitoring so usage does not reach the 85%+ WARN range.
### Immediate
| Step | Action | Command / notes |
|---|---|---|
| 1 | Snapshot + growth check | bash scripts/monitoring/collect-storage-growth-data.sh — review logs/storage-growth/ |
| 2 | Identify large CTs on r630-01 | bash scripts/maintenance/check-disk-all-vmids.sh — ml110 + r630-01; VMIDs 2101, 2500–2505 are on r630-01 |
### Ongoing
| Step | Action | Command / notes |
|---|---|---|
| 3 | Same as thin5 | Use schedule-storage-growth-cron.sh --install for weekly collection + prune |
| 4 | Before new deployments | Re-run bash scripts/audit-proxmox-rpc-storage.sh and check data% / local-lvm% |
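Step 4's percentage check can be scripted; the column layout below mirrors typical `pvesm status` output (Name Type Status Total Used Available %), which is an assumption to verify on your hosts:

```shell
pvesm_pct() { awk -v p="$1" '$1 == p { print $NF }'; }  # usage% for one storage
# Live (on r630-01):  pvesm status | pvesm_pct data
# Offline example using the 71.9% figure from this section:
echo "data lvmthin active 1000000000 719000000 281000000 71.90%" | pvesm_pct data
```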
## Quick reference

| Item | Script | Purpose |
|---|---|---|
| 2101 health | scripts/maintenance/health-check-rpc-2101.sh | Diagnose Core RPC |
| 2101 fix | scripts/maintenance/fix-core-rpc-2101.sh | Restart Besu, verify RPC |
| 2101 read-only | scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh | e2fsck RPC VMIDs on r630-01 |
| 2101 JNA | scripts/maintenance/fix-rpc-2101-jna-reinstall.sh | Reinstall Besu in 2101 |
| Storage audit | scripts/audit-proxmox-rpc-storage.sh | All hosts + RPC rootfs mapping |
| Disk in CTs | scripts/maintenance/check-disk-all-vmids.sh | Root / usage per running CT |
| Storage growth | scripts/monitoring/collect-storage-growth-data.sh | Snapshot pvesm/lvs/df |
| Growth cron | scripts/maintenance/schedule-storage-growth-cron.sh --install | Weekly collect + prune |