# RPC Node Troubleshooting Report — VMID 2505 (besu-rpc-luis-0x8a)

**Date**: 2026-01-05
**VMID**: 2505
**IP**: 192.168.11.201
**Role**: Named RPC node (Luis / Chain 0x8a)

## Symptoms

- From client: TCP connection to `192.168.11.201:8545` succeeded, but HTTP never returned any bytes (hung).
- `pct exec 2505 -- ...` timed out repeatedly (container could not spawn commands).

## Diagnosis

- **Container memory pressure** was extreme:
  - `pvesh ... status/current` showed memory essentially maxed and swap nearly fully used.
  - The container init process (`/sbin/init`) was in **D (uninterruptible sleep)** with a stack indicating it was blocked waiting on page-in (`filemap_fault` / `folio_wait_bit_common`), consistent with **swap/IO thrash**.
- After restarting the container, RPC still did not come up because:
  - The Besu systemd unit had `Environment="BESU_OPTS=-Xmx8g -Xms8g"` while the container only had **~4GB** before (and later **6GB**). This can cause severe memory pressure/OOM behavior and prevent services from becoming responsive.
  - Besu logs indicated it was performing **RocksDB compaction** at startup; the oversized heap made recovery worse.

## Remediation / Fixes Applied

### 1) Make storage available to start the container on node `ml110`

Starting VMID 2505 initially failed with:

- `storage 'local-lvm' is not available on node 'ml110'`

Root cause: `/etc/pve/storage.cfg` restricted `local-lvm` to node `r630-01`, but this VMID was running on `ml110`.

Fix: Updated `/etc/pve/storage.cfg` to include `ml110` for `lvmthin: local-lvm` (backup created first). After this, `local-lvm` became active on `ml110` and the container could start.

### 2) Increase VMID 2505 memory/swap

- Updated VMID 2505 to **memory=6144MB**, **swap=1024MB**.
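
The report does not record the exact command used for the memory/swap change. One common way to apply it on a Proxmox host is `pct set`; the sketch below only builds and prints the command so it can be reviewed first. The `build_pct_set` helper is a hypothetical name introduced for illustration; the VMID and sizes are taken from this report.

```shell
#!/bin/sh
# Sketch: build the memory/swap change for an LXC container on a Proxmox host.
# Assumption: the change is applied with `pct set`; the report does not say how.

VMID=2505
MEMORY_MB=6144
SWAP_MB=1024

# build_pct_set VMID MEMORY_MB SWAP_MB
# Prints the command instead of running it (dry run), so it can be reviewed.
build_pct_set() {
    printf 'pct set %s --memory %s --swap %s\n' "$1" "$2" "$3"
}

build_pct_set "$VMID" "$MEMORY_MB" "$SWAP_MB"
```

Running the printed command on the node that hosts the container applies the change; `pct config 2505` can then be used to confirm the new `memory` and `swap` values.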
### 3) Reduce Besu heap to fit container memory

Inside VMID 2505:

- Updated `/etc/systemd/system/besu-rpc.service`:
  - From: `BESU_OPTS=-Xmx8g -Xms8g`
  - To: `BESU_OPTS=-Xms2g -Xmx4g`
- Ran: `systemctl daemon-reload && systemctl restart besu-rpc`
- Confirmed listeners came up on `:8545` (HTTP RPC), `:8546` (WS), `:9545` (metrics)

## Verification

- External JSON-RPC works again:
  - `eth_chainId` returns `0x8a`
  - `eth_blockNumber` returns a valid block
- Full fleet retest:
  - Report: `reports/rpc_nodes_test_20260105_062846.md`
  - Result: **Reachable 12/12**, **Authorized+responding 12/12**, **Block spread Δ0**

## Follow-ups / Recommendations

- Keep Besu heap aligned to container memory (avoid `Xmx` near/above memory limit).
- Investigate why node `ml110` is hosting VMIDs whose storage is restricted to `r630-01` in `storage.cfg` (possible migration/renaming mismatch).
- The Proxmox host `ml110` showed extremely high load earlier; consider checking IO wait and overall node health if issues recur.
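
The heap recommendation above can be turned into a quick pre-deployment check. The following is a minimal sketch, not a Besu-provided tool: `heap_fits` is a hypothetical helper, and the 25% headroom default is an assumption chosen for illustration (the JVM needs memory beyond the heap, and RocksDB uses native memory as well), not a documented Besu figure.

```shell
#!/bin/sh
# heap_fits MAX_HEAP_MB CONTAINER_MB [HEADROOM_PCT]
# Succeeds (exit 0) when the max heap leaves at least HEADROOM_PCT percent
# of the container memory limit free for non-heap use.
heap_fits() {
    heap_mb=$1
    limit_mb=$2
    headroom_pct=${3:-25}   # assumed default, tune per workload
    allowed_mb=$(( limit_mb * (100 - headroom_pct) / 100 ))
    [ "$heap_mb" -le "$allowed_mb" ]
}

# Original configuration from this report: -Xmx8g on a ~4 GB container.
if heap_fits 8192 4096; then echo "Xmx8g on 4096MB: ok"; else echo "Xmx8g on 4096MB: too large"; fi

# Configuration after the fix: -Xmx4g on a 6144 MB container.
if heap_fits 4096 6144; then echo "Xmx4g on 6144MB: ok"; else echo "Xmx4g on 6144MB: too large"; fi
```

With the assumed 25% headroom, the original `-Xmx8g` setting fails the check for both the 4 GB and 6 GB limits, while `-Xmx4g` on 6144 MB passes (the allowed maximum is 4608 MB).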