proxmox/docs/03-deployment/RPC_2101_READONLY_FIX.md
defiQUG b3a8fe4496
2026-03-02 11:37:34 -08:00

RPC 2101 (Core) — Read-only filesystem fix

VMID 2101 (192.168.11.211, Chain 138 Core RPC) can fail with Besu in a crash loop and connections to port 8545 refused. Observed root cause: a read-only file system on /data/besu/database/.

Cause

  • Kernel I/O errors on the host (Proxmox 192.168.11.11): Buffer I/O error on device dm-*, EXT4-fs: failed to convert unwritten extents, potential data loss.
  • ext4 remounts the filesystem read-only to avoid further corruption. Besu then fails with:
    RocksDBException: While appending to file: /data/besu/database/... : Read-only file system.
  • Besu may also crash at startup with a JNA error, UnsatisfiedLinkError: Failed to create temporary file for ... libjnidispatch.so: Read-only file system. JNA needs a writable temp dir (e.g. /tmp or java.io.tmpdir); if the whole root filesystem is read-only, startup fails before the RPC port binds.
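
To confirm the read-only remount described above from inside the container, the mount options can be inspected directly. The following is a minimal sketch, not part of the repository scripts: `mount_is_readonly` is a hypothetical helper, and it assumes the Besu data directory lives on the container root filesystem (`/`).

```shell
#!/usr/bin/env bash
# Sketch: report whether a mount point is currently mounted read-only.
# mount_is_readonly is a hypothetical helper, not one of the repo scripts.
# $1 = mount point, $2 = mounts file (defaults to /proc/mounts).
# Exit status: 0 = read-only, 1 = writable, 2 = mount point not found.
mount_is_readonly() {
  local target="$1" mounts_file="${2:-/proc/mounts}"
  # /proc/mounts fields: device mountpoint fstype options dump pass.
  # The last matching entry wins, because later mounts shadow earlier ones.
  awk -v t="$target" '
    $2 == t { opts = $4; found = 1 }
    END {
      if (!found) exit 2
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
      exit 1
    }' "$mounts_file"
}

# Usage inside the container (assumes /data/besu sits on the root filesystem):
# if mount_is_readonly /; then echo "root filesystem is read-only"; fi
```

A read-only result here only confirms the symptom; the kernel log on the Proxmox host is still where the underlying I/O error shows up.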

Before deploying contracts

Contract deployments use the Core RPC only (no Public fallback). Fix the read-only condition and verify health first:

  1. Fix read-only: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
  2. Health check: ./scripts/maintenance/health-check-rpc-2101.sh (must pass)
  3. Deploy: ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh

If the deploy fails with "Known transaction" (a stuck tx at the deployer nonce), clear the Core RPC tx pool with ./scripts/clear-all-transaction-pools.sh, then retry the deploy.
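
The fix, health check, and deploy sequence above only makes sense if each step gates the next. A minimal sketch of that gating follows; `run_in_order` is a hypothetical wrapper, and it assumes each repository script exits non-zero on failure, which this document does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: run commands in order and stop at the first failure.
# run_in_order is a hypothetical helper, not one of the repo scripts.
run_in_order() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "FAILED: $step (later steps were not run)" >&2
      return 1
    fi
  done
}

# Intended usage from the repository root (script paths are from this document;
# it is an assumption that each script returns a non-zero status on failure):
# run_in_order \
#   ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh \
#   ./scripts/maintenance/health-check-rpc-2101.sh \
#   ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh
```

With this shape, a failed health check stops the run before the deploy script is ever invoked.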

Fixing 2101 (operator)

  1. SSH to Proxmox host: ssh root@192.168.11.11
  2. Check kernel logs for I/O errors:
    dmesg | grep -E "Buffer I/O|EXT4-fs|dm-"
    Identify which dm-* device (LV) is affected; ls -la /dev/mapper/pve-vm--2101--disk--0 shows the device backing 2101 (e.g. dm-45).
  3. Storage health: Check LVM and disks (e.g. lvs, pvs, smartctl on underlying disks). Replace or repair failing hardware.
  4. Remount read-write (only if storage is known good):
    • Stop the container: pct stop 2101
    • From the host, the container root is mounted by Proxmox; after fixing storage you may need to run fsck on the LV or reboot the host. If the filesystem was remounted read-only because of a transient error, a container stop/start sometimes helps (the host remounts the LV).
    • Start the container: pct start 2101
    • Inside container verify: pct exec 2101 -- touch /data/besu/database/.write_test && rm /data/besu/database/.write_test
  5. Restart Besu RPC:
    pct exec 2101 -- systemctl restart besu-rpc.service
    Then: ./scripts/check-network-rpc-138.sh 192.168.11.211 8545
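
After the restart in step 5, a direct JSON-RPC query is a quick way to see whether Besu has bound port 8545 again. The sketch below is an illustration only and does not reproduce what check-network-rpc-138.sh does; the helper names are hypothetical, and it assumes "Chain 138" means an `eth_chainId` of 138 (0x8a).

```shell
#!/usr/bin/env bash
# Sketch: query eth_chainId over HTTP JSON-RPC and decode the result.
# All function names here are hypothetical helpers.

# Extract the 0x-prefixed "result" value from a JSON-RPC response on stdin.
rpc_result_hex() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Ask the node for its chain ID. Host and port default to the values in this document.
query_chain_id() {
  local host="${1:-192.168.11.211}" port="${2:-8545}"
  curl -sS --max-time 5 -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    "http://${host}:${port}" | rpc_result_hex
}

# Usage (needs network access to the node; 138 is an assumed chain ID):
# id_hex="$(query_chain_id)" && [ "$(hex_to_dec "$id_hex")" -eq 138 ] \
#   && echo "Core RPC answers for chain 138"
```

A refused connection makes `curl` fail, so an empty result points back at the service or filesystem rather than at the chain configuration.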

If still read-only after make-writable

If make-rpc-vmids-writable-via-ssh.sh completes but inside the container /tmp, /data/besu/database, or /data/besu/tmp are still read-only (touch fails with "Read-only file system"):

  • e2fsck may have reported Error writing file system info: Input/output error — the underlying storage (LV or disk on the host) may be failing.
  • Thin pool 100% full: CT 2101 (and other RPC nodes) use the LVM thin pool pve/data. If the pool is 100% full (lvs pve/data shows Data% 100.00), writes can fail and the kernel may remount the filesystem read-only. Fix: on the Proxmox host, extend the pool if the VG has free space: lvextend -L +80G pve/data (adjust the size to the free space available). Then re-run make-rpc-vmids-writable-via-ssh.sh and restart the container. Alternatively, migrate the CT to another pool (e.g. thin1) or free space by removing or moving other LVs.
  • On the Proxmox host: check dmesg | grep -E 'I/O error|dm-|ext4', and run smartctl / LVM checks on the storage backing the CT. If the LV or disk has persistent I/O errors, fix or replace storage, then re-run make-rpc-vmids-writable-via-ssh.sh, or migrate the CT to healthy storage.
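
The thin-pool check above can be scripted so a nearly full pool is noticed before the kernel remounts anything read-only. This is a minimal sketch: `pool_is_full` is a hypothetical helper and the 95% threshold is an assumption, not a value from this document.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a thin pool Data% value is at or above a threshold.
# pool_is_full is a hypothetical helper; the default threshold of 95 is an assumption.
# $1 = Data% as printed by lvs (e.g. "  100.00"), $2 = threshold percent.
pool_is_full() {
  local pct="$1" threshold="${2:-95}"
  # awk does the floating-point comparison; "+ 0" forces numeric context
  # and tolerates the leading whitespace that lvs prints.
  awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !((p + 0) >= (t + 0)) }'
}

# Usage on the Proxmox host (needs root and the LVM tools):
# pct_used="$(lvs --noheadings -o data_percent pve/data)"
# if pool_is_full "$pct_used" 95; then
#   echo "pve/data is at ${pct_used}% data usage"
#   vgs pve   # shows VFree, i.e. whether lvextend has room to grow the pool
# fi
```

Checking `vgs` before running `lvextend` avoids asking for more space than the volume group can provide.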

TransactionMirror address

Set TRANSACTION_MIRROR_ADDRESS in smom-dbis-138/.env from the deploy script output. A previous deploy used 0xE362aa10D3Af1A16880A799b78D18F923403B55a; use the script output as source of truth.
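
Copying the address by hand is easy to get wrong, so a format check before writing it to the env file can help. The following is a sketch under stated assumptions: both helpers are hypothetical, the check is only for the 0x-plus-40-hex-digit shape (not the checksum), and how the address appears in the deploy script output is not specified here.

```shell
#!/usr/bin/env bash
# Sketch: validate an address shape and set KEY=VALUE in an env file.
# is_eth_address and set_env_var are hypothetical helpers.

# Succeeds if $1 looks like 0x followed by exactly 40 hex digits.
is_eth_address() {
  printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{40}$'
}

# Set or replace an assignment. $1 = env file, $2 = key, $3 = value.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing assignment for the key.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and mode.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (ADDR must come from the deploy script output):
# ADDR="0x..."
# is_eth_address "$ADDR" && set_env_var smom-dbis-138/.env TRANSACTION_MIRROR_ADDRESS "$ADDR"
```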

Scripts

  • Make Core writable (fix read-only): ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh — run first when 2101 is read-only.
  • Health check: ./scripts/maintenance/health-check-rpc-2101.sh — container, service, port, RPC eth_chainId/eth_blockNumber, and database writability.
  • Fix/restart Besu: ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only].
  • Check/start RPC service: ./scripts/check-and-start-rpc-2101.sh (cannot fix read-only; only restarts the service).
  • Network check: ./scripts/check-network-rpc-138.sh [HOST] [PORT] (default 192.168.11.211 8545).
  • Deploy (Core only): ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh. No Public fallback; fix Core first.