RPC 2101 (Core) — Read-only filesystem fix
VMID 2101 (192.168.11.211, Chain 138 Core RPC) can fail with Besu stuck in a crash loop and connections to port 8545 refused. Root cause observed: Read-only file system on /data/besu/database/.
Cause
- Kernel I/O errors on the host (Proxmox 192.168.11.11):
  Buffer I/O error on device dm-*, EXT4-fs: failed to convert unwritten extents, potential data loss.
- ext4 remounts the filesystem read-only to avoid further corruption. Besu then fails with:
  RocksDBException: While appending to file: /data/besu/database/... : Read-only file system.
- Besu may also crash at startup with JNA:
  UnsatisfiedLinkError: Failed to create temporary file for ... libjnidispatch.so: Read-only file system
  JNA needs a writable temp dir (e.g. /tmp or java.io.tmpdir); if the whole root is read-only, startup fails before RPC binds.
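To confirm the read-only remount described above, the mount options can be read directly rather than inferred from Besu errors. The following is a minimal sketch, assuming /proc/mounts-style input; `mount_is_readonly` is a hypothetical helper and not one of the repo scripts.

```shell
#!/bin/sh
# Sketch: report whether a mount point is read-only, given /proc/mounts-style
# input. mount_is_readonly is a hypothetical helper, not a repo script.
# Exit status: 0 = read-only, 1 = read-write, 2 = mount point not listed.
mount_is_readonly() {
    # $1 = mount point (e.g. /), $2 = mounts file (default /proc/mounts)
    awk -v mp="$1" '
        $2 == mp { opts = $4; found = 1 }   # keep the last matching entry
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }
    ' "${2:-/proc/mounts}"
}
```

One possible use from the Proxmox host: save the container's mount table with `pct exec 2101 -- cat /proc/mounts > /tmp/mounts-2101`, then run `mount_is_readonly / /tmp/mounts-2101 && echo "root is read-only"`.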
Before deploying contracts
Contract deployments use Core RPC only (no Public fallback). Fix read-only and verify health first:
- Fix read-only:
  ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
- Health check:
  ./scripts/maintenance/health-check-rpc-2101.sh (must pass)
- Deploy:
  ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh
If you get "Known transaction" (stuck tx at deployer nonce), clear the Core RPC tx pool: ./scripts/clear-all-transaction-pools.sh then retry deploy.
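The fix, health check, and deploy order can be enforced with a small wrapper so that a failed health check never reaches the deploy step. This is a sketch only: it assumes each script exits non-zero on failure (not confirmed here), and the `*_CMD` variables and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: run fix -> health check -> deploy, stopping at the first failure.
# Assumes each script exits non-zero on failure. The *_CMD variables are
# hypothetical overrides; the defaults are the script paths from this document.
FIX_CMD="${FIX_CMD:-./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh}"
HEALTH_CMD="${HEALTH_CMD:-./scripts/maintenance/health-check-rpc-2101.sh}"
DEPLOY_CMD="${DEPLOY_CMD:-./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh}"

deploy_after_health_check() {
    "$FIX_CMD"    || { echo "make-writable failed" >&2; return 1; }
    "$HEALTH_CMD" || { echo "health check failed; not deploying" >&2; return 2; }
    "$DEPLOY_CMD" || { echo "deploy failed" >&2; return 3; }
}
```

The distinct return codes (1, 2, 3) are a design choice to make it obvious which stage stopped the run; they are not taken from the repo scripts.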
Fixing 2101 (operator)
- SSH to Proxmox host:
  ssh root@192.168.11.11
- Check kernel logs for I/O errors:
  dmesg | grep -E "Buffer I/O|EXT4-fs|dm-"
  Identify which dm-* (LV) is affected; ls -la /dev/mapper/pve-vm--2101--disk--0 shows 2101’s device (e.g. dm-45).
- Storage health: Check LVM and disks (e.g. lvs, pvs, smartctl on underlying disks). Replace or repair failing hardware.
- Remount read-write (only if storage is known good):
  - Stop the container:
    pct stop 2101
  - From the host, the container root is mounted by Proxmox; after fixing storage you may need to run fsck on the LV or reboot the host. If the filesystem was remounted read-only due to a transient error, sometimes a container stop/start helps (host remounts the LV).
  - Start the container:
    pct start 2101
  - Inside container verify (wrapped in sh -c so that both touch and rm run inside the container, not on the host):
    pct exec 2101 -- sh -c 'touch /data/besu/database/.write_test && rm /data/besu/database/.write_test'
- Restart Besu RPC:
  pct exec 2101 -- systemctl restart besu-rpc.service
  Then: ./scripts/check-network-rpc-138.sh 192.168.11.211 8545
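When the kernel log is long, it can help to reduce it to the set of dm-N devices that appear in error lines before comparing against 2101's device. A minimal sketch, assuming GNU grep (for `-o`) and dmesg-style text on stdin; `affected_dm_devices` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list the distinct dm-N devices named in kernel I/O error lines.
# Reads dmesg-style text on stdin; affected_dm_devices is a hypothetical helper.
affected_dm_devices() {
    grep -E 'Buffer I/O|EXT4-fs|I/O error' |   # keep only error-related lines
        grep -oE 'dm-[0-9]+' |                 # extract the device names
        sort -u                                # one line per device
}
```

On the Proxmox host this would be run as `dmesg | affected_dm_devices`; the result can then be matched against the dm-* target of /dev/mapper/pve-vm--2101--disk--0.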
If still read-only after make-writable
If make-rpc-vmids-writable-via-ssh.sh completes but inside the container /tmp, /data/besu/database, or /data/besu/tmp are still read-only (touch fails with "Read-only file system"):
- e2fsck may have reported
  Error writing file system info: Input/output error
  The underlying storage (LV or disk on the host) may be failing.
- Thin pool 100% full: CT 2101 (and other RPC nodes) use the LVM thin pool pve/data. If the pool is 100% full (lvs pve/data shows Data% 100.00), writes can fail and the kernel may remount the filesystem read-only. Fix: On the Proxmox host, extend the pool if the VG has free space: lvextend -L +80G pve/data (adjust size). Then re-run make-writable and restart the container. Alternatively migrate the CT to another pool (e.g. thin1) or free space by removing/moving other LVs.
- On the Proxmox host: check dmesg | grep -E 'I/O error|dm-|ext4', and run smartctl / LVM checks on the storage backing the CT. If the LV or disk has persistent I/O errors, fix or replace storage, then re-run make-rpc-vmids-writable-via-ssh.sh, or migrate the CT to healthy storage.
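The thin pool condition above can be checked numerically instead of by eye. The sketch below only compares a Data% figure against a threshold; `pool_usage_at_least` is a hypothetical helper, and the default threshold of 95 is an assumption chosen to warn before the pool reaches 100%.

```shell
#!/bin/sh
# Sketch: decide whether a thin pool usage figure is at or above a threshold.
# pool_usage_at_least is a hypothetical helper; the default of 95 is an
# assumption, not a value from this runbook.
pool_usage_at_least() {
    # $1 = Data% value as printed by lvs (e.g. "100.00"), $2 = threshold
    awk -v used="$1" -v limit="${2:-95}" \
        'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# On the Proxmox host (not executed here):
#   used=$(lvs --noheadings -o data_percent pve/data | tr -d ' ')
#   pool_usage_at_least "$used" && echo "pve/data is nearly full: ${used}%"
```

If the check fires, the lvextend or migration options listed above apply; the helper itself changes nothing on the host.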
TransactionMirror address
Set TRANSACTION_MIRROR_ADDRESS in smom-dbis-138/.env from the deploy script output. A previous deploy used 0xE362aa10D3Af1A16880A799b78D18F923403B55a; use the script output as source of truth.
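Updating the env file by hand is easy to get wrong (duplicate keys, truncated addresses). One possible approach, sketched under the assumption that the file uses plain KEY=value lines, is shown below; `set_mirror_address` is a hypothetical helper and the address must still come from the deploy script output.

```shell
#!/bin/sh
# Sketch: write TRANSACTION_MIRROR_ADDRESS into an env file, replacing any
# existing entry. set_mirror_address is a hypothetical helper and assumes
# plain KEY=value lines in the file.
set_mirror_address() {
    # $1 = address from the deploy script output, $2 = env file path
    addr="$1"; env_file="${2:-smom-dbis-138/.env}"
    # Reject anything that is not 0x followed by 40 hex characters.
    printf '%s\n' "$addr" | grep -Eq '^0x[0-9a-fA-F]{40}$' || {
        echo "not a 20-byte hex address: $addr" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Copy every other line, then append the new value, then swap files.
    { [ -f "$env_file" ] && grep -v '^TRANSACTION_MIRROR_ADDRESS=' "$env_file"
      printf 'TRANSACTION_MIRROR_ADDRESS=%s\n' "$addr"
    } > "$tmp" && mv "$tmp" "$env_file"
}
```

Note that the new entry is appended at the end of the file, and that the replaced file inherits the restrictive permissions of the mktemp file.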
Scripts
- Make Core writable (fix read-only):
  ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
  Run first when 2101 is read-only.
- Health check:
  ./scripts/maintenance/health-check-rpc-2101.sh
  Checks container, service, port, RPC eth_chainId/eth_blockNumber, and database writability.
- Fix/restart Besu:
  ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only]
- Check/start RPC service:
  ./scripts/check-and-start-rpc-2101.sh
  Cannot fix read-only; only restarts the service.
- Network check:
  ./scripts/check-network-rpc-138.sh [HOST] [PORT]
  Defaults: 192.168.11.211 8545.
- Deploy (Core only):
  ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh
  No Public fallback; fix Core first.