proxmox/docs/03-deployment/RPC_2101_READONLY_FIX.md
defiQUG b3a8fe4496
2026-03-02 11:37:34 -08:00

# RPC 2101 (Core) — Read-only filesystem fix
**VMID 2101** (192.168.11.211, Chain 138 Core RPC) can fail with Besu in a crash loop and **port 8545 connection refused**. Root cause observed: **Read-only file system** on `/data/besu/database/`.
## Cause
- **Kernel I/O errors** on the host (Proxmox 192.168.11.11): `Buffer I/O error on device dm-*`, `EXT4-fs: failed to convert unwritten extents`, `potential data loss`.
- ext4 remounts the filesystem **read-only** to avoid further corruption. Besu then fails with:
`RocksDBException: While appending to file: /data/besu/database/... : Read-only file system`.
- Besu may also crash at startup with a **JNA** error: `UnsatisfiedLinkError: Failed to create temporary file for ... libjnidispatch.so: Read-only file system`. JNA needs a writable temp dir (e.g. `/tmp` or `java.io.tmpdir`); if the whole root filesystem is read-only, startup fails before the RPC port binds.
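The read-only remount described above shows up as an `ro` option in the mount table. A minimal sketch of a check for it, assuming standard `/proc/mounts` formatting; `is_readonly_mount` is a hypothetical helper and not one of the repository scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the given mount point is mounted read-only.
# Reads /proc/mounts by default; a saved copy can be passed as the second argument.
is_readonly_mount() {
  local mount_point="$1"
  local mounts_file="${2:-/proc/mounts}"
  # /proc/mounts fields: device mountpoint fstype options dump pass
  awk -v mp="$mount_point" '
    $2 == mp {
      ro = 0                       # the last matching entry wins (overmounts)
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
    }
    END { exit (ro ? 0 : 1) }
  ' "$mounts_file"
}

# Example (run on the Proxmox host; assumes the affected filesystem is the container root):
#   pct exec 2101 -- cat /proc/mounts > /tmp/mounts-2101.txt
#   is_readonly_mount / /tmp/mounts-2101.txt && echo "CT 2101 root is read-only"
```

If the Besu data directory sits on its own mount point, pass that path instead of `/`.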
## Before deploying contracts
Contract deployments use **Core RPC only** (no Public fallback). Fix read-only and verify health first:
1. **Fix read-only:** `./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh`
2. **Health check:** `./scripts/maintenance/health-check-rpc-2101.sh` (must pass)
3. **Deploy:** `./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh`
If you get **"Known transaction"** (a stuck tx at the deployer nonce), clear the Core RPC tx pool with `./scripts/clear-all-transaction-pools.sh`, then retry the deploy.
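The fix, health check, deploy order above can be enforced with a small wrapper that stops at the first failing step. This is a sketch only: `run_steps` is a hypothetical helper, and it assumes each script exits non-zero on failure, which the source does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each command in order and stop at the first failure,
# so the deploy never starts unless the earlier steps succeeded.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "FAILED: $step (later steps skipped)" >&2
      return 1
    fi
  done
}

# Example (from the repository root):
#   run_steps \
#     ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh \
#     ./scripts/maintenance/health-check-rpc-2101.sh \
#     ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh
```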
## Fixing 2101 (operator)
1. **SSH to Proxmox host:** `ssh root@192.168.11.11`
2. **Check kernel logs for I/O errors:**
`dmesg | grep -E "Buffer I/O|EXT4-fs|dm-"`
Identify which dm-* (LV) is affected; `ls -la /dev/mapper/pve-vm--2101--disk--0` shows 2101's device (e.g. dm-45).
3. **Storage health:** Check LVM and disks (e.g. `lvs`, `pvs`, `smartctl` on underlying disks). Replace or repair failing hardware.
4. **Remount read-write (only if storage is known good):**
- Stop the container: `pct stop 2101`
- The container root is mounted by Proxmox from the host. After fixing storage you may need to run `fsck` on the LV or reboot the host. If the filesystem was remounted read-only because of a transient error, a container stop/start is sometimes enough, because the host remounts the LV.
- Start the container: `pct start 2101`
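- Verify from the host that the container can write: `pct exec 2101 -- sh -c 'touch /data/besu/database/.write_test && rm /data/besu/database/.write_test'` (the `sh -c` wrapper keeps both commands inside the container; without it, `rm` runs on the host).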
5. **Restart Besu RPC:**
`pct exec 2101 -- systemctl restart besu-rpc.service`
Then: `./scripts/check-network-rpc-138.sh 192.168.11.211 8545`
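After the restart, the RPC endpoint can also be probed by hand with plain JSON-RPC. The sketch below assumes `curl` is installed on the machine running the check; the helper names are hypothetical and this is not what `check-network-rpc-138.sh` does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: POST a parameterless JSON-RPC request and print the raw response.
rpc_call() {
  local url="$1" method="$2"
  curl -sS --max-time 5 -H 'Content-Type: application/json' \
    -d "{\"jsonrpc\":\"2.0\",\"method\":\"$method\",\"params\":[],\"id\":1}" "$url"
}

# Hypothetical helper: extract the string "result" field from a JSON-RPC response (no jq needed).
extract_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: convert a hex quantity such as 0x8a to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Example: chain 138 should report chain ID 0x8a, and the block number should advance.
#   hex_to_dec "$(rpc_call http://192.168.11.211:8545 eth_chainId | extract_result)"
#   hex_to_dec "$(rpc_call http://192.168.11.211:8545 eth_blockNumber | extract_result)"
```

A "connection refused" from `curl` here means Besu has not bound port 8545, which points back to the read-only checks above.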
### If still read-only after make-writable
If `make-rpc-vmids-writable-via-ssh.sh` completes but inside the container **`/tmp`, `/data/besu/database`, or `/data/besu/tmp`** are still read-only (`touch` fails with "Read-only file system"):
- **e2fsck** may have reported `Error writing file system info: Input/output error` — the **underlying storage** (LV or disk on the host) may be failing.
- **Thin pool 100% full:** CT 2101 (and other RPC nodes) use the LVM thin pool **pve/data**. If the pool is 100% full (`lvs pve/data` shows Data% 100.00), writes can fail and the kernel may remount the filesystem read-only. **Fix:** On the Proxmox host, extend the pool if the VG has free space: `lvextend -L +80G pve/data` (adjust size). Then re-run make-writable and restart the container. Alternatively migrate the CT to another pool (e.g. thin1) or free space by removing/moving other LVs.
- On the Proxmox host: check `dmesg | grep -E 'I/O error|dm-|ext4'`, and run `smartctl` / LVM checks on the storage backing the CT. If the LV or disk has persistent I/O errors, fix or replace storage, then re-run `make-rpc-vmids-writable-via-ssh.sh`, or migrate the CT to healthy storage.
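The thin-pool condition can be checked before it reaches 100%. A minimal sketch, assuming the pool name `pve/data` from above and a 90% warning threshold chosen here for illustration; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Data% for a thin pool. Run on the Proxmox host.
thin_pool_data_percent() {
  lvs --noheadings -o data_percent "${1:-pve/data}" | tr -d '[:space:]'
}

# Hypothetical helper: succeed when a Data% value is at or above the threshold (default 90).
pool_is_near_full() {
  local data_percent="$1" threshold="${2:-90}"
  awk -v p="$data_percent" -v t="$threshold" 'BEGIN { exit ((p + 0 >= t + 0) ? 0 : 1) }'
}

# Example:
#   pct_used="$(thin_pool_data_percent pve/data)"
#   if pool_is_near_full "$pct_used" 90; then
#     echo "pve/data is at ${pct_used}%: extend the pool or free space before restarting CT 2101"
#   fi
```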
## TransactionMirror address
Set `TRANSACTION_MIRROR_ADDRESS` in `smom-dbis-138/.env` from the deploy script output. A previous deploy used **0xE362aa10D3Af1A16880A799b78D18F923403B55a**; use the script output as source of truth.
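Updating the `.env` entry can be scripted so an old value is replaced rather than duplicated. This is a sketch with a hypothetical helper, `set_env_var`; it assumes simple `KEY=value` lines and values that contain no `|` or `&` characters, which holds for a hex address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=value in an env file, replacing an existing line or appending one.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    tmp="$(mktemp)"
    # Rewrite via a temp file, then copy back so the original file's permissions are kept.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (take the address from the deploy script output):
#   set_env_var smom-dbis-138/.env TRANSACTION_MIRROR_ADDRESS "<address from deploy output>"
```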
## Scripts
- **Make Core writable (fix read-only):** `./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` — run first when 2101 is read-only.
- **Health check:** `./scripts/maintenance/health-check-rpc-2101.sh` — container, service, port, RPC eth_chainId/eth_blockNumber, and database writability.
- **Fix/restart Besu:** `./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only]`.
- **Check/start RPC service:** `./scripts/check-and-start-rpc-2101.sh` (cannot fix read-only; only restarts the service).
- **Network check:** `./scripts/check-network-rpc-138.sh [HOST] [PORT]` (default 192.168.11.211 8545).
- **Deploy (Core only):** `./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh`. No Public fallback; fix Core first.