# Proxmox load balancing runbook
Purpose: Reduce load on the busiest node (r630-01) by migrating selected LXC containers to r630-02; moving containers to another host also frees storage on r630-01. Note: ml110 is being repurposed to OPNsense/pfSense (WAN aggregator); migrate workloads off ml110 to r630-01/r630-02 before the repurpose — see ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md.
Before you start: If you are considering first adding a third or fourth R630 to the cluster, see PROXMOX_ADD_THIRD_FOURTH_R630_DECISION.md — including whether you already have r630-03/r630-04 (powered off) to bring online.
Current imbalance (typical):
| Node | IP | LXC count | Load (1/5/15) | Notes |
|---|---|---|---|---|
| r630-01 | 192.168.11.11 | 58 | 56 / 81 / 92 | Heavily loaded |
| r630-02 | 192.168.11.12 | 23 | ~4 / 4 / 4 | Light |
| ml110 | 192.168.11.10 | 18 | ~7 / 7 / 9 | Repurposing to OPNsense/pfSense — migrate workloads off to r630-01/r630-02 |
Ways to balance:
- Cross-host migration (r630-01 → r630-02) — Moves workload off r630-01. IP stays the same if the container uses a static IP; only the Proxmox host changes. (ml110 is no longer a migration target; migrate containers off ml110 first.)
- Same-host storage migration (r630-01 data → thin1) — Frees space on the datapool and can improve I/O; does not significantly reduce CPU load. See MIGRATION_PLAN_R630_01_DATA.md.
## 1. Check cluster (live migrate vs backup/restore)
If all nodes are in the same Proxmox cluster, you can try live migration (faster, less downtime):
    ssh root@192.168.11.11 "pvecm status"
    ssh root@192.168.11.12 "pvecm status"
- If both show the same cluster name and list each other: use `pct migrate <VMID> <target_node> --restart` from any cluster node (run on r630-01 or from a host that SSHs to r630-01).
- If the nodes are not in a cluster (or migration fails due to storage): use backup → copy → restore with the script below.
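The decision above can be sketched as a small helper. The cluster name comes from the `Name:` line of `pvecm status`; fetching it over SSH (shown in the comments) assumes key-based root access, so this is a sketch rather than a drop-in tool:

```shell
# choose_method prints "live" when both nodes report the same cluster name,
# else "backup-restore" (covers the no-cluster case, where the name is empty).
choose_method() {
  # $1 = source cluster name, $2 = target cluster name (either may be empty)
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "live"
  else
    echo "backup-restore"
  fi
}

# In practice, fetch the names first, e.g.:
#   src=$(ssh root@192.168.11.11 "pvecm status" | awk '/^Name:/ {print $2}')
#   dst=$(ssh root@192.168.11.12 "pvecm status" | awk '/^Name:/ {print $2}')
choose_method "mycluster" "mycluster"   # prints: live
```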
## 2. Cross-host migration (r630-01 → r630-02)
Script (backup/restore; works without shared storage):
    cd /path/to/proxmox
    # One container (replace VMID and target storage)
    ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage] [--destroy-source]
    # Examples
    ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --dry-run
    ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --destroy-source
Target storage on r630-02: check with `ssh root@192.168.11.12 "pvesm status"`. Common options: thin1, thin2, thin5, thin6.
If cluster works (live migrate):
    ssh root@192.168.11.11 "pct migrate <VMID> r630-02 --storage thin1 --restart"
    # Then remove the source CT if desired: pct destroy <VMID> --purge 1
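To queue several containers at once, a small wrapper can print one migration command per VMID for review (or piping to `sh`) before anything runs. This is a sketch: the VMIDs and `thin1` storage are examples, and `--destroy-source` is taken from the script usage above:

```shell
# plan_batch prints one migrate command per VMID; nothing is executed here,
# so the plan can be inspected first.
plan_batch() {
  storage=$1; shift
  for vmid in "$@"; do
    echo "./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh $vmid $storage --destroy-source"
  done
}

# Example: three of the smaller candidates from the table below.
plan_batch thin1 3500 3501 10232
```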
## 3. Good candidates to move (r630-01 → r630-02)
Containers that reduce load and are safe to move (no critical chain/consensus role; IPs can stay static). Prefer moving several smaller containers rather than one critical RPC.
| VMID | Name / role | Notes |
|---|---|---|
| 3500 | oracle-publisher-1 | Oracle publisher |
| 3501 | ccip-monitor-1 | CCIP monitor |
| 7804 | gov-portals-dev | Gov portals (already migrated in past; verify current host) |
| 8640 | vault-phoenix-1 | Vault (if not critical path) |
| 8642 | vault-phoenix-3 | Vault |
| 10232 | CT10232 | Small service |
| 10235 | npmplus-alltra-hybx | NPMplus instance (has its own NPM; update UDM port forward if needed) |
| 10236 | npmplus-fourth | NPMplus instance |
| 10030–10092 | order-* (identity, intake, finance, etc.) | Order stack; move as a group if desired |
| 10200–10210 | order-prometheus, grafana, opensearch, haproxy | Monitoring/HA; move with order-* or after |
Do not move (keep on r630-01 for now):
- 10233 — npmplus (main NPMplus; 76.53.10.36 → .167)
- 2101 — besu-rpc-core-1 (core RPC for deploy/admin)
- 2500–2505 — RPC alltra/hybx (critical RPCs)
- 1000–1002, 1500–1502 — validators and sentries (consensus)
- 10130, 10150, 10151 — dbis-frontend, dbis-api (core apps; move only with a plan)
- 100, 101, 102, 103, 104, 105 — mail, datacenter, cloudflared, omada, gitea (infra)
## 4. Migrating workloads off ml110 (before OPNsense/pfSense repurpose)
ml110 (192.168.11.10) is being repurposed to OPNsense/pfSense (WAN aggregator between 6–10 cable modems and UDM Pros). All containers/VMs on ml110 must be migrated to r630-01 or r630-02 before the repurpose.
- If in a cluster: `ssh root@192.168.11.10 "pct migrate <VMID> r630-01 --storage <storage> --restart"` (or target `r630-02` instead of `r630-01`).
- If not in a cluster: back up on ml110, copy the backup to r630-01 or r630-02, and restore there (see MIGRATE_CT_R630_01_TO_R630_02.md and adapt for source=ml110, target=r630-01 or r630-02).
After all workloads are off ml110, remove ml110 from the cluster (or reinstall the node with OPNsense/pfSense). See ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md.
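One way to sketch the evacuation is to generate the `pct migrate` commands up front, alternating targets to spread load across both R630s. The VMIDs below are placeholders (list the real ones with `pct list` on ml110), and `--storage` flags are left out since they depend on each target's pools:

```shell
# evacuate_plan prints one "pct migrate" command per VMID, alternating the
# target between r630-02 and r630-01; review the output before running it.
evacuate_plan() {
  i=0
  for vmid in "$@"; do
    if [ $((i % 2)) -eq 0 ]; then target=r630-02; else target=r630-01; fi
    echo "pct migrate $vmid $target --restart"
    i=$((i + 1))
  done
}

# Placeholder VMIDs; replace with the real list from ml110.
evacuate_plan 4100 4101 4102
```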
## 5. After migration
- IP: Containers keep the same IP if they use static IP in the CT config; no change needed for NPM/DNS if they point by IP.
- Docs: Update any runbooks or configs that assume "VMID X is on r630-01" (e.g. comments in `config/ip-addresses.conf`, backup scripts).
- Verify: Re-run `bash scripts/check-all-proxmox-hosts.sh` and confirm load and container counts.
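For a quick sanity check independent of that script, the 1-minute load can be read from `/proc/loadavg` on each node. This sketch only parses a loadavg line; the SSH loop in the comment assumes key-based root access:

```shell
# load_1m extracts the 1-minute load from a /proc/loadavg line:
#   "<1m> <5m> <15m> <running/total> <last-pid>"
load_1m() {
  echo "$1" | cut -d' ' -f1
}

# e.g.:
#   for host in 192.168.11.11 192.168.11.12; do
#     echo "$host: $(load_1m "$(ssh root@$host cat /proc/loadavg)")"
#   done
load_1m "0.42 0.50 0.61 1/234 5678"   # prints: 0.42
```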
## 6. Quick reference
| Goal | Command / doc |
|---|---|
| Check current load | bash scripts/check-all-proxmox-hosts.sh |
| Migrate one CT (r630-01 → r630-02) | ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> thin1 [--destroy-source] |
| Same-host (data → thin1) | MIGRATION_PLAN_R630_01_DATA.md, migrate-ct-r630-01-data-to-thin1.sh |
| Full migration doc | MIGRATE_CT_R630_01_TO_R630_02.md |