Maintenance scripts review
Date: 2026-02-15
Scope: RPC/502 fix flow, writability step, runner, and related docs.
1. Flow overview
| Step | Script | Purpose |
|---|---|---|
| 0 | `make-rpc-vmids-writable-via-ssh.sh` | Stop 2101, 2500–2505 on r630-01; e2fsck rootfs; start; verify /tmp writable |
| 1 | `resolve-and-fix-all-via-proxmox-ssh.sh` | Dev VM IP .59, start containers, DBIS services (r630-01, ml110) |
| 2 | `fix-rpc-2101-jna-reinstall.sh` | Reinstall Besu in 2101 (JNA fix), use /tmp in CT, set java.io.tmpdir=/data/besu/tmp |
| 3 | `install-besu-permanent-on-missing-nodes.sh` | Install Besu on 1505–1508 (ml110), 2500–2505 (r630-01) where missing |
| 4 | `address-all-remaining-502s.sh` | fix-all-502s-comprehensive + NPM proxy update + RPC diagnostics |
| 5 | `verify-end-to-end-routing.sh` | E2E (optional via --e2e) |
Single entry point: ./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh [--no-npm] [--e2e] [--dry-run]
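The ordering in the table above can be illustrated with a small dry-run planner. This is a minimal sketch, not the runner's actual code: the `STEPS` array, `parse_args`, `run_all`, and the `execute_step` hook are hypothetical names, and only the `--dry-run` and `--e2e` flags are modelled.

```bash
#!/usr/bin/env bash
# Sketch only: ordered step plan with --dry-run / --e2e handling.
# Function and variable names are illustrative, not taken from the real runner.
set -u

DRY_RUN=0
RUN_E2E=0

# Step scripts in execution order (names taken from the flow table).
STEPS=(
  "make-rpc-vmids-writable-via-ssh.sh"
  "resolve-and-fix-all-via-proxmox-ssh.sh"
  "fix-rpc-2101-jna-reinstall.sh"
  "install-besu-permanent-on-missing-nodes.sh"
  "address-all-remaining-502s.sh"
)
E2E_STEP="verify-end-to-end-routing.sh"

# Record the flags this sketch understands; other flags are ignored here.
parse_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --dry-run) DRY_RUN=1 ;;
      --e2e)     RUN_E2E=1 ;;
    esac
  done
}

# Hypothetical hook: the real runner would invoke the step script here.
execute_step() {
  echo "would execute: $1"
}

# Walk the plan in order; append the E2E step only when requested.
run_all() {
  local i
  local -a plan=("${STEPS[@]}")
  if [ "$RUN_E2E" -eq 1 ]; then
    plan+=("$E2E_STEP")
  fi
  for i in "${!plan[@]}"; do
    if [ "$DRY_RUN" -eq 1 ]; then
      echo "[dry-run] step $i: ${plan[$i]}"
    else
      echo "step $i: ${plan[$i]}"
      execute_step "${plan[$i]}" || echo "Step $i had warnings."
    fi
  done
}
```

Calling `parse_args --dry-run --e2e` followed by `run_all` prints the six steps without executing anything. Holding the plan in an array rather than reading it from a pipe keeps stdin free for any `ssh` calls the steps make.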
2. What works well
- Writability first: Step 0 fixes read-only root (ext4 errors) so steps 2 and 3 can write to CTs. All seven RPC VMIDs (2101, 2500–2505) are handled on r630-01.
- Clear ordering: Make writable → resolve/start → fix 2101 → install Besu on missing → address 502s → E2E. Dependencies are respected.
- Config-driven: Hosts and IPs come from `config/ip-addresses.conf` (`PROXMOX_HOST_R630_01`, etc.).
- Idempotent / skip logic: resolve-and-fix skips if already correct; install-besu-permanent skips VMIDs that already have `/opt/besu/bin/besu`.
- Docs linked: 502_DEEP_DIVE (§ Read-only CT), CHECK_ALL_UPDATES (§9 Remaining fixes), maintenance README all reference the runner and make-writable script.
- JNA tmpdir: Standalone installer and 2101 fix set `-Djava.io.tmpdir=/data/besu/tmp` so Besu/JNA work when `/tmp` is restricted.
- Apt resilience: Standalone installer allows `apt-get update` to fail (e.g. command-not-found I/O error) and still requires `java` and `wget` before continuing.
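The config-driven and skip-logic points can be sketched as follows. This is an illustration under stated assumptions, not the installer's actual code: it assumes `config/ip-addresses.conf` is a shell-sourceable `KEY=value` file, that the Proxmox host is reached as `root` over SSH, and the helper names (`load_config`, `ct_exec`, `besu_installed`, `install_if_missing`) and the `CT_EXEC` override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: config-driven host lookup plus an idempotent "already installed" check.
set -u

CONF_FILE="${CONF_FILE:-config/ip-addresses.conf}"
BESU_BIN="/opt/besu/bin/besu"

# Assumption: the config file is shell-sourceable and defines variables
# such as PROXMOX_HOST_R630_01.
load_config() {
  if [ -f "$CONF_FILE" ]; then
    # shellcheck disable=SC1090
    . "$CONF_FILE"
  fi
}

# Hypothetical transport: run a command inside a CT via SSH to the Proxmox host.
# Usage: ct_exec <host> <vmid> <command...>
ct_exec() {
  local host="$1" vmid="$2"
  shift 2
  ssh "root@$host" pct exec "$vmid" -- "$@"
}

# Succeeds when the Besu binary already exists and is executable in the CT.
# CT_EXEC can be overridden (e.g. in tests) with another function name.
besu_installed() {
  "${CT_EXEC:-ct_exec}" "$1" "$2" test -x "$BESU_BIN"
}

# Usage: install_if_missing <host> <vmid...>
install_if_missing() {
  local host="$1" vmid
  shift
  for vmid in "$@"; do
    if besu_installed "$host" "$vmid"; then
      echo "skip $vmid: $BESU_BIN already present"
    else
      echo "install $vmid"
      # The real installer invocation would go here.
    fi
  done
}
```

A call such as `load_config; install_if_missing "$PROXMOX_HOST_R630_01" 2500 2501` would then skip any VMID where the binary is already in place, which is what makes repeated runs safe.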
3. Gaps and risks
- Step 2 (2101) can be slow: Apt install inside the CT can take 5–15+ minutes; the runner has no per-step timeout, so the whole run can appear to hang at “Installing packages…”.
- Errors hidden: The runner uses `2>/dev/null` on each step and only prints “Done” or “Step had warnings.” Failures (e.g. 2101 install fail, 2505 install fail) are not surfaced unless you read the full output.
- Disk space: 2502/2504 have historically hit “No space left on device” in `/data/besu` (RocksDB). The scripts do not check or resize CT disk; that remains manual (e.g. `pct resize <vmid> rootfs +50G` or free space inside CT).
- LV name assumption: make-rpc-vmids-writable assumes LVs are `/dev/pve/vm-<vmid>-disk-0`. Different storage or naming would need script changes.
- Single host for RPC: make-rpc-vmids-writable only targets r630-01. If any RPC VMIDs are moved to ml110/r630-02, the script would need to be extended (or a second call with a different host).
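The first two gaps concern how the runner invokes each step. One way to bound a step's runtime and keep its stderr visible is a wrapper along these lines. It is a sketch rather than the runner's actual code: `run_step` and the `VERBOSE` variable are hypothetical names, and it relies on GNU `timeout`, which exits with status 124 when the time limit is reached.

```bash
#!/usr/bin/env bash
# Sketch only: run one step with an optional timeout and optional stderr passthrough.
set -u

VERBOSE="${VERBOSE:-0}"

# Usage: run_step <label> <timeout-seconds> <command...>
# A timeout of 0 disables the limit. Returns the command's exit status
# (124 when GNU timeout stopped it). The command must be an external
# program or script, because timeout cannot run shell functions.
run_step() {
  local label="$1" limit="$2" rc=0
  shift 2
  local -a cmd=("$@")
  if [ "$limit" -gt 0 ]; then
    cmd=(timeout "$limit" "${cmd[@]}")
  fi
  if [ "$VERBOSE" -eq 1 ]; then
    "${cmd[@]}" || rc=$?
  else
    # Quiet mode hides stderr, so the exit code is the only failure signal.
    "${cmd[@]}" 2>/dev/null || rc=$?
  fi
  case "$rc" in
    0)   echo "$label: Done" ;;
    124) echo "$label: timed out after ${limit}s; re-run this step manually" ;;
    *)   echo "$label: failed with exit code $rc" ;;
  esac
  return "$rc"
}
```

For example, `run_step "step 2" 900 ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` would stop the 2101 fix after 15 minutes and report the timeout instead of appearing to hang. Reporting the exit code even in quiet mode means a failed install is no longer indistinguishable from a warning.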
4. Recommendations and completion
- Optional verbose mode: ✅ Done. Runner supports `--verbose`; when set, step output is not redirected (no `2>/dev/null`), so failures are visible.
- Optional timeout for step 2: ✅ Done. `STEP2_TIMEOUT` (default 900) applies to the 2101 fix; exit code 124 is detected and a message tells the user to re-run the fix manually. Use `STEP2_TIMEOUT=0` to disable.
- §9 checklist: ✅ CHECK_ALL_UPDATES §9 includes "RPC CTs read-only → make-rpc-vmids-writable first"; operators have a single place for order of operations.
- Disk check (future): Not implemented. Optionally run `pct exec <vmid> -- df -h / /data/besu` before install/fix and warn if usage > 90%.
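Since the disk check is not implemented, the following is only a possible shape for it. The function names `check_df_output` and `check_ct_disk` and the `THRESHOLD` variable are hypothetical, and the sketch uses `df -P` instead of `df -h` because the POSIX output format keeps each filesystem on one line with fixed columns, which is easier to parse.

```bash
#!/usr/bin/env bash
# Sketch only: warn when a filesystem inside a CT is above a usage threshold.
set -u

THRESHOLD="${THRESHOLD:-90}"

# Read `df -P` output on stdin and print one warning per filesystem whose
# usage exceeds THRESHOLD percent. Returns 1 if any warning was printed.
check_df_output() {
  awk -v limit="$THRESHOLD" '
    NR > 1 {
      use = $5                 # Capacity column, e.g. "95%"
      sub(/%/, "", use)
      if (use + 0 > limit + 0) {
        printf "WARN: %s is %s%% full\n", $6, use
        bad = 1
      }
    }
    END { exit (bad + 0) }
  '
}

# Hypothetical wrapper, to be run on the Proxmox host that owns the CT.
check_ct_disk() {
  local vmid="$1"
  pct exec "$vmid" -- df -P / /data/besu | check_df_output
}
```

A caller could run `check_ct_disk 2502 || echo "free space or resize before installing"` ahead of the install or fix step. Splitting the parser from the `pct exec` call lets the threshold logic be exercised with canned `df` output on a machine without Proxmox.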
5. File reference
| File | Role |
|---|---|
| `scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh` | Main runner (steps 0–5) |
| `scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` | e2fsck 2101, 2500–2505 on r630-01 |
| `scripts/maintenance/address-all-remaining-502s.sh` | Backends + NPM + diagnostics |
| `scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` | 2101 Besu reinstall, /tmp + JNA tmpdir |
| `scripts/install-besu-in-ct-standalone.sh` | In-CT Besu install; apt tolerant; JNA tmpdir |
| `scripts/besu/install-besu-permanent-on-missing-nodes.sh` | Besu on 1505–1508, 2500–2505; writability check |
| `docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md` | Root causes, Read-only CT, 2101/2500–2505 fixes |
| `docs/05-network/CHECK_ALL_UPDATES_AND_CLOUDFLARE_TUNNELS.md` | Config, tunnels, verification, §9 remaining fixes |
6. Quick commands
# Full run (writable → fix → install → 502s → E2E)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e
# Show all step output (no 2>/dev/null)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e --verbose
# Step 2 (2101 fix) timeout: default 900s; disable with 0
STEP2_TIMEOUT=1200 ./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e
STEP2_TIMEOUT=0 ./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e
# Only make RPC CTs writable
./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
# Dry-run (print steps only)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --dry-run
Reports and diagnostics: docs/04-configuration/verification-evidence/ (RPC diagnostics, E2E reports).