# RPC 2101 (Core): Read-only filesystem fix

**VMID 2101** (192.168.11.211, Chain 138 Core RPC) can fail with Besu in a crash loop and **port 8545 connection refused**. Root cause observed: **Read-only file system** on `/data/besu/database/`.
## Cause

- **Kernel I/O errors** on the host (Proxmox 192.168.11.11): `Buffer I/O error on device dm-*`, `EXT4-fs: failed to convert unwritten extents`, `potential data loss`.
- ext4 remounts the filesystem **read-only** to avoid further corruption. Besu then fails with:
  `RocksDBException: While appending to file: /data/besu/database/... : Read-only file system`.
- Besu may also crash at startup with a **JNA** error: `UnsatisfiedLinkError: Failed to create temporary file for ... libjnidispatch.so: Read-only file system`. JNA needs a writable temp dir (e.g. `/tmp` or `java.io.tmpdir`); if the whole root filesystem is read-only, startup fails before the RPC port binds.
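To confirm that a read-only remount is what Besu is hitting, the mount options can be inspected directly. The following is a minimal sketch, not one of the repository's scripts: `mount_is_readonly` is a hypothetical helper that reads `/proc/mounts`-format text on stdin and succeeds when the named mount point carries the `ro` option. It assumes the Besu data directory lives on the container's root mount (`/`); adjust the mount point if it is a separate volume.

```shell
# Hypothetical helper: succeed (exit 0) if mount point $1 is mounted read-only,
# based on /proc/mounts-format lines read from stdin.
mount_is_readonly() {
  awk -v t="$1" '
    $2 == t { opts = $4 }            # keep the options of the last matching entry
    END {
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
      exit 1                         # not found, or mounted rw
    }'
}

# Example (run on the Proxmox host; assumes the data dir is on the root mount):
#   pct exec 2101 -- cat /proc/mounts | mount_is_readonly / && echo "root is read-only"
```

Matching on the exact option `ro` avoids false positives from options such as `errors=remount-ro`.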
## Before deploying contracts

Contract deployments use **Core RPC only** (no Public fallback). Fix the read-only condition and verify health first:

1. **Fix read-only:** `./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh`
2. **Health check:** `./scripts/maintenance/health-check-rpc-2101.sh` (must pass)
3. **Deploy:** `./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh`

If the deploy fails with **"Known transaction"** (a stuck tx at the deployer nonce), clear the Core RPC tx pool with `./scripts/clear-all-transaction-pools.sh`, then retry the deploy.
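Because the deploy must not run unless the first two steps succeed, the sequence can be wrapped so that it stops at the first failure. This is a minimal sketch; `run_in_order` is a hypothetical helper, and it assumes each script signals failure with a non-zero exit status.

```shell
# Hypothetical helper: run each argument as a command, in order,
# and stop at the first one that exits non-zero.
run_in_order() {
  for step in "$@"; do
    echo "==> $step"
    sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
}

# Example (from the repository root):
#   run_in_order \
#     ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh \
#     ./scripts/maintenance/health-check-rpc-2101.sh \
#     ./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh
```

If the last step fails with "Known transaction", clear the tx pool as described above and run only the deploy step again.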
## Fixing 2101 (operator)

1. **SSH to Proxmox host:** `ssh root@192.168.11.11`
2. **Check kernel logs for I/O errors:**
   `dmesg | grep -E "Buffer I/O|EXT4-fs|dm-"`
   Identify which dm-* (LV) is affected; the symlink target in `ls -la /dev/mapper/pve-vm--2101--disk--0` shows 2101’s device (e.g. dm-45).
3. **Storage health:** Check LVM and disks (e.g. `lvs`, `pvs`, `smartctl` on underlying disks). Replace or repair failing hardware.
4. **Remount read-write (only if storage is known good):**
   - Stop the container: `pct stop 2101`
   - The container root is mounted by Proxmox on the host. After fixing storage you may need to run `fsck` on the LV (for example `pct fsck 2101` while the container is stopped) or reboot the host. If the filesystem was remounted read-only because of a transient error, a container stop/start is sometimes enough, because the host remounts the LV.
   - Start the container: `pct start 2101`
   - Verify from the host that the database directory is writable: `pct exec 2101 -- sh -c 'touch /data/besu/database/.write_test && rm /data/besu/database/.write_test'` (the `sh -c` wrapper keeps both commands inside the container; without it, `rm` runs on the host).
5. **Restart Besu RPC:**
   `pct exec 2101 -- systemctl restart besu-rpc.service`
   Then: `./scripts/check-network-rpc-138.sh 192.168.11.211 8545`
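For step 2, it can help to see which device-mapper node accounts for most of the errors before matching it against the `/dev/mapper` symlink. The following is a rough sketch; `count_dm_errors` is a hypothetical helper that reads kernel log lines on stdin and assumes the device name appears as `dm-N` on each error line.

```shell
# Hypothetical helper: count I/O and ext4 error lines per dm-N device,
# printing "device count" with the most affected device first.
count_dm_errors() {
  grep -E 'Buffer I/O|EXT4-fs' \
    | grep -oE 'dm-[0-9]+' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }' \
    | sort -k2,2nr
}

# Example (on the Proxmox host):
#   dmesg | count_dm_errors
#   ls -la /dev/mapper/pve-vm--2101--disk--0   # compare the symlink target
```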
### If still read-only after make-writable

If `make-rpc-vmids-writable-via-ssh.sh` completes but **`/tmp`, `/data/besu/database`, or `/data/besu/tmp`** inside the container are still read-only (`touch` fails with "Read-only file system"):

- **e2fsck** may have reported `Error writing file system info: Input/output error`, which suggests that the **underlying storage** (LV or disk on the host) is failing.
- **Thin pool 100% full:** CT 2101 (and other RPC nodes) use the LVM thin pool **pve/data**. If the pool is 100% full (`lvs pve/data` shows Data% 100.00), writes can fail and the kernel may remount the filesystem read-only. **Fix:** On the Proxmox host, extend the pool if the VG has free space: `lvextend -L +80G pve/data` (adjust size). Then re-run make-writable and restart the container. Alternatively, migrate the CT to another pool (e.g. thin1) or free space by removing or moving other LVs.
- On the Proxmox host: check `dmesg | grep -iE 'I/O error|dm-|ext4'` (case-insensitive, so that `EXT4-fs` messages match), and run `smartctl` / LVM checks on the storage backing the CT. If the LV or disk has persistent I/O errors, fix or replace storage, then re-run `make-rpc-vmids-writable-via-ssh.sh`, or migrate the CT to healthy storage.
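The thin-pool check above can be turned into a simple threshold test. This is a sketch under assumptions: `pools_over_threshold` is a hypothetical helper, the 95% default is an arbitrary choice, and the input is expected in the two-column form produced by `lvs --noheadings -o lv_name,data_percent`.

```shell
# Hypothetical helper: read "name data_percent" lines on stdin and print those
# at or above the threshold ($1, default 95). Exits 0 if any line matched.
pools_over_threshold() {
  awk -v max="${1:-95}" '
    NF >= 2 && $2 + 0 >= max + 0 { print $1 " " $2; found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Example (on the Proxmox host):
#   lvs --noheadings -o lv_name,data_percent pve | pools_over_threshold 95 \
#     && echo "thin pool nearly full: extend pve/data or free space"
```

LVs without a Data% value (non-thin volumes) have only one column and are skipped.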
## TransactionMirror address

Set `TRANSACTION_MIRROR_ADDRESS` in `smom-dbis-138/.env` from the deploy script output. A previous deploy used **0xE362aa10D3Af1A16880A799b78D18F923403B55a**; use the script output as the source of truth.
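Updating the `.env` entry by hand is easy to get wrong, so one option is to validate the address and replace the line in one step. This is only a sketch: `is_eth_address` and `upsert_env` are hypothetical helpers, and how the address is extracted from the deploy script output is left out because that output format is not documented here.

```shell
# Hypothetical helper: succeed if $1 looks like a 20-byte hex address.
is_eth_address() {
  printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{40}$'
}

# Hypothetical helper: upsert_env FILE KEY VALUE
# Removes any existing KEY= line and appends KEY=VALUE.
upsert_env() {
  env_file=$1; env_key=$2; env_value=$3
  env_tmp=$(mktemp) || return 1
  {
    if [ -f "$env_file" ]; then grep -v "^${env_key}=" "$env_file" || true; fi
    printf '%s=%s\n' "$env_key" "$env_value"
  } > "$env_tmp"
  mv "$env_tmp" "$env_file"
}

# Example (ADDR copied from the deploy script output):
#   is_eth_address "$ADDR" && upsert_env smom-dbis-138/.env TRANSACTION_MIRROR_ADDRESS "$ADDR"
```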
## Scripts

- **Make Core writable (fix read-only):** `./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh`. Run this first when 2101 is read-only.
- **Health check:** `./scripts/maintenance/health-check-rpc-2101.sh`. Checks container, service, port, RPC `eth_chainId`/`eth_blockNumber`, and database writability.
- **Fix/restart Besu:** `./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only]`.
- **Check/start RPC service:** `./scripts/check-and-start-rpc-2101.sh` (cannot fix read-only; only restarts the service).
- **Network check:** `./scripts/check-network-rpc-138.sh [HOST] [PORT]` (default 192.168.11.211 8545).
- **Deploy (Core only):** `./scripts/deployment/deploy-transaction-mirror-and-pmm-pool-after-txpool-clear.sh`. No Public fallback; fix Core first.
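When none of these scripts is at hand, the RPC part of the health check can be approximated manually. The sketch below is not taken from the scripts above, whose internals are not shown here; `rpc_hex_result_to_dec` is a hypothetical helper that converts the hex `result` field of a JSON-RPC response to decimal. It assumes the endpoint is plain HTTP on port 8545 and that the expected chain ID is 138.

```shell
# Hypothetical helper: read a JSON-RPC response on stdin and print its
# hex "result" as a decimal number. Returns 1 if there is no hex result.
rpc_hex_result_to_dec() {
  rpc_hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F][0-9a-fA-F]*\)".*/\1/p')
  [ -n "$rpc_hex" ] || return 1
  printf '%d\n' "$rpc_hex"
}

# Example (should print 138 for Chain 138 once Besu is serving RPC again):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
#     http://192.168.11.211:8545 | rpc_hex_result_to_dec
```

The same helper works for `eth_blockNumber`; a value that increases between two calls indicates the node is importing blocks.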