proxmox/docs/archive/00-meta-pruned/VMID_2101_CHANGES_AND_FAILURES.md
defiQUG bea1903ac9
Sync all local changes: docs, config, scripts, submodule refs, verification evidence
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 15:46:06 -08:00


VMID 2101 (Core RPC) — Changes and Why Failures Continue

Purpose: List all changes (including write/lock-related) made to VMID 2101 and what is causing the Core RPC to keep failing.


1. Changes made to VMID 2101

1.1 Make-writable (e2fsck) — make-rpc-vmids-writable-via-ssh.sh

| Action | What it does | Effect on 2101 |
|---|---|---|
| pct stop 2101 | Stops the container | Unmounts rootfs; LV may stay active or be deactivated by Proxmox. |
| lvchange -ay /dev/pve/vm-2101-disk-0 | Activates the LV | So e2fsck can run on the block device. |
| e2fsck -f -y /dev/pve/vm-2101-disk-0 | Full fsck, non-interactive | FILE SYSTEM WAS MODIFIED: fixes ext4 errors, journal, inodes. Clears the condition that caused the kernel to remount root read-only. No “write lock” is added; this allows the fs to be mounted read-write again. |
| lvchange -an /dev/pve/vm-2101-disk-0 | Deactivates the LV | LV is taken offline before start. On some setups this could be problematic if pct start does not reliably re-activate the LV before mount. |
| pct start 2101 | Starts the container | Rootfs is mounted (typically rw after a successful e2fsck). |

Evidence (from run): Logs showed FILE SYSTEM WAS MODIFIED and “e2fsck done for 2101”, then “VMID 2101 writable”. So the filesystem was corrected and the CT was brought back up.
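
The sequence above can be sketched as a small shell function. This is a minimal illustration, not the contents of make-rpc-vmids-writable-via-ssh.sh: the `run` wrapper, the `DRY_RUN` variable, and the function name `make_writable` are hypothetical, and the LV path follows the pattern shown in the table. The exit-status handling relies on e2fsck's documented bitmask (1 and 2 mean errors were corrected; 4 and above mean errors remain or the check failed).

```shell
# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the CT, fsck its root LV, and start it again (sketch of section 1.1).
make_writable() {
    vmid="$1"
    lv="/dev/pve/vm-${vmid}-disk-0"

    run pct stop "$vmid"              # the CT may already be stopped
    run lvchange -ay "$lv" || return 1

    run e2fsck -f -y "$lv"
    rc=$?
    # 0 = clean, 1/2 = errors corrected; 4 or higher = not repaired or check failed.
    if [ "$rc" -ge 4 ]; then
        echo "e2fsck failed for $vmid (exit $rc)" >&2
        return "$rc"
    fi

    run lvchange -an "$lv" || return 1
    run pct start "$vmid"
}
```

Running `DRY_RUN=1 make_writable 2101` prints the five commands without touching the host, which is a cheap way to review the order before a real run.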

1.2 Fix 2101 JNA reinstall — fix-rpc-2101-jna-reinstall.sh

| Action | What it does | Effect on 2101 |
|---|---|---|
| Writability check | touch /tmp/.w and touch /opt/.w in CT | Fails and exits if /tmp (or /opt) is not writable. |
| Stop Besu | systemctl stop besu-rpc.service besu.service | Stops RPC so files can be replaced. |
| Backup /opt/besu | mv /opt/besu /opt/besu.bak.<timestamp> | Removes or renames existing Besu install. |
| Installer in /tmp | Copies install-besu-in-ct-standalone.sh to CT /tmp, runs with TMPDIR=/tmp | Uses /tmp so install works even if /root is read-only. |
| Run install script | install-besu-in-ct-standalone.sh (NODE_TYPE=rpc) | Runs apt-get update and apt-get install openjdk-17-jdk wget ..., downloads Besu tarball, extracts to /opt/besu, creates besu-rpc.service with -Djava.io.tmpdir=/data/besu/tmp. |
| Post-install | mkdir -p /data/besu/tmp, ensure -Djava.io.tmpdir=/data/besu/tmp in besu-rpc.service, daemon-reload | So JNA/native libs use a writable dir; avoids “Read-only file system” for JNA. |
| Deploy genesis/node lists | Push genesis.json, static-nodes.json, permissions-nodes.toml to /etc/besu | Config for Chain 138. |
| Start besu-rpc | systemctl start besu-rpc.service | Brings Core RPC up. |
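
The first and third rows (writability check, timestamped backup) can be illustrated as follows. This is a sketch under assumptions: the function names `check_writable` and `backup_name` are hypothetical, and the timestamp format is a guess, since the source only shows `<timestamp>`.

```shell
# Return 0 only if a probe file can be created in every directory given,
# mirroring the "touch /tmp/.w" style check in the table.
check_writable() {
    for dir in "$@"; do
        probe="$dir/.w"
        if ! touch "$probe" 2>/dev/null; then
            echo "not writable: $dir" >&2
            return 1
        fi
        rm -f "$probe"
    done
    return 0
}

# Build a backup path such as /opt/besu.bak.<timestamp>.
# The %Y%m%d-%H%M%S format is an assumption, not taken from the script.
backup_name() {
    echo "$1.bak.$(date +%Y%m%d-%H%M%S)"
}
```

Inside the CT, a guarded backup would then read `check_writable /tmp /opt && mv /opt/besu "$(backup_name /opt/besu)"`, so nothing is moved while either directory is read-only.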

What actually happened in runs: The script stalled at “Installing packages…” (apt inside the CT). So:

  • Root was made writable (e2fsck).
  • The JNA reinstall script did not complete: apt hung or was very slow.
  • Result: no valid /opt/besu/bin/besu (or incomplete install), besu-rpc inactive, so Core RPC keeps failing.

1.3 Other scripts that touch 2101

  • fix-core-rpc-2101.sh: Only starts/restarts the CT and besu-rpc / besu service; no filesystem or “write lock” changes.
  • fix-all-502s-comprehensive.sh: Ensures nodekey in /data/besu, then runs fix-core-rpc-2101.sh; no e2fsck or LV changes.
  • install-besu-in-ct-standalone.sh (when run inside 2101): Writes to /opt/besu, /etc/besu, /data/besu, /var/log/besu and creates systemd unit; adds -Djava.io.tmpdir=/data/besu/tmp (reduces risk of JNA write issues, does not add a lock).
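
Since several scripts can create or rewrite the systemd unit, it may help to confirm which tmpdir the unit actually passes to the JVM. The sketch below extracts the value of -Djava.io.tmpdir from unit text on stdin; the function name `tmpdir_from_unit` is hypothetical, and it assumes the flag appears literally somewhere in the unit file.

```shell
# Print the path given to -Djava.io.tmpdir in the unit text read from stdin.
# Prints nothing if the flag is absent.
tmpdir_from_unit() {
    sed -n 's/.*-Djava\.io\.tmpdir=\([^ "]*\).*/\1/p' | head -n 1
}
```

For example, `pct exec 2101 -- systemctl cat besu-rpc.service | tmpdir_from_unit` should print /data/besu/tmp if the post-install step took effect.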

2. “Write locks” and read-only behavior

  • No explicit “write lock” is set by these scripts. The only lock-like behavior is the read-only root that the kernel sets when ext4 hits errors; e2fsck is what removes that by repairing the fs.
  • e2fsck can set the “filesystem needs checking” flag or clear it; it does not leave a persistent write lock. After a successful e2fsck and pct start, the rootfs should mount read-write.
  • lvchange -an in the make-writable script deactivates the LV right before pct start. In normal Proxmox behavior, starting the CT should activate the LV again. If your host or storage stack behaves differently, deactivating the LV before start could in theory lead to start failures or odd state; removing lvchange -an (or running it only when the CT is not about to be started) avoids that possibility.
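
To tell whether the kernel has remounted root read-only, the mount options can be inspected directly. The following is a minimal sketch; `root_opts` and `opts_are_readonly` are hypothetical helper names, and the input is assumed to be in /proc/mounts format (device, mount point, type, options, ...).

```shell
# Print the mount options of / from /proc/mounts-style input on stdin.
# If / is listed more than once, the last entry wins.
root_opts() {
    awk '$2 == "/" { opts = $4 } END { print opts }'
}

# Succeed if the comma-separated option string contains the "ro" option.
# Wrapping in commas avoids matching options such as errors=remount-ro.
opts_are_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A check from the host could look like `opts_are_readonly "$(pct exec 2101 -- cat /proc/mounts | root_opts)" && echo "root is read-only"`.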

3. Why Core RPC (2101) continues to fail

From logs and summaries:

  1. JNA reinstall never finished
    The fix script repeatedly stalls at “Installing packages…” (apt in the CT). So:

    • /opt/besu is missing or from an old/incomplete install.
    • besu-rpc.service is inactive or fails (e.g. NoClassDefFoundError for JNA, or missing binary).
    • RPC on 192.168.11.211:8545 never comes up or stays down.
  2. Root was fixed, but the service was not
    Making the CT writable (e2fsck) succeeded; the service fix (reinstall Besu + JNA tmpdir) did not complete, so 2101 stays in a “writable but no working Besu” state.

  3. Possible contributing factors

    • lvchange -an before pct start in the make-writable script (see above).
    • Apt in the CT slow or hanging (network, mirrors, or I/O).
    • If root ever goes read-only again (e.g. new ext4 errors), later fix attempts will again hit “/tmp not writable” until make-writable is run again.
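
The failure states described in this section can be summarized as a small decision function. It is only an illustration of the reasoning above: `classify_2101` is a hypothetical name, and the three yes/no inputs are assumed to be gathered separately (for example with a writability probe, `pct exec 2101 -- test -x /opt/besu/bin/besu`, and `pct exec 2101 -- systemctl is-active --quiet besu-rpc.service`).

```shell
# Map three observations to the states described in section 3.
# Arguments are "yes" or "no": root writable, Besu binary present, besu-rpc active.
classify_2101() {
    writable="$1"; binary="$2"; active="$3"
    if [ "$writable" != "yes" ]; then
        echo "root read-only: run make-writable first"
    elif [ "$binary" != "yes" ]; then
        echo "writable but no working Besu: reinstall did not complete"
    elif [ "$active" != "yes" ]; then
        echo "Besu installed but besu-rpc not active: check service logs"
    else
        echo "besu-rpc active"
    fi
}
```

The order of the checks matters: a read-only root makes the other two observations unreliable, so it is tested first.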

4. Recommended next steps

  1. Complete the 2101 fix once (no e2fsck unless needed)

    • Ensure 2101 is running and writable (if in doubt, run make-rpc-vmids-writable-via-ssh.sh once).
    • Run only the 2101 fix with enough time for apt to finish:
      ./scripts/maintenance/fix-rpc-2101-jna-reinstall.sh
      
    • If it still stalls on apt, log into the CT and run apt by hand, then re-run the fix script (or install Besu manually and set -Djava.io.tmpdir=/data/besu/tmp and start besu-rpc).
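
When running apt by hand, a time limit makes a hang visible instead of leaving the session stuck. The sketch below wraps any command in GNU `timeout`, which exits with status 124 when the limit is reached; `run_limited` is a hypothetical helper name, and the 900-second limit in the usage note is an arbitrary example.

```shell
# Run a command with a time limit (in seconds) and report a timeout distinctly.
# Relies on GNU timeout returning 124 when the limit is hit.
run_limited() {
    limit="$1"; shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

Inside the CT this could be used as `run_limited 900 env DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-17-jdk wget`. The package list is elided in the source, so only the two packages named there are shown.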
  2. Optional: make-writable script

    • Remove lvchange -an before pct start in make-rpc-vmids-writable-via-ssh.sh, or run it only when the CT will not be started immediately, so the LV is not deactivated right before start.
  3. Verify

    • After Besu is installed and besu-rpc is started:
      • pct exec 2101 -- systemctl status besu-rpc
      • pct exec 2101 -- ss -tlnp | grep 8545
      • curl -s -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' http://192.168.11.211:8545/
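
The curl response can be checked mechanically rather than by eye. This sketch pulls the hex `result` field out of an eth_chainId reply and prints it in decimal; `chain_id_decimal` is a hypothetical name, and sed is used instead of a JSON parser only to avoid extra dependencies. If "Chain 138" refers to the chain ID, the expected reply is 0x8a.

```shell
# Read a JSON-RPC eth_chainId response on stdin and print the chain ID in decimal.
# Returns non-zero if no hex "result" field is found.
chain_id_decimal() {
    hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p' | head -n 1)
    [ -n "$hex" ] || return 1
    printf '%d\n' "$hex"
}
```

Piping the curl command above into `chain_id_decimal` gives a single number to compare, and a non-zero exit status when the endpoint returns an error object or nothing at all.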

5. Reference