# Proxmox load balancing runbook
**Purpose:** Reduce load on the busiest node (r630-01) by migrating selected LXC containers to r630-02. Moving a container to another host also frees its disk space on r630-01.

**Note:** ml110 is being repurposed as an OPNsense/pfSense WAN aggregator; migrate workloads *off* ml110 to r630-01/r630-02 before the repurpose. See [ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md](../11-references/ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md).

**Before you start:** If you are considering adding a **third or fourth R630** to the cluster first, see [PROXMOX_ADD_THIRD_FOURTH_R630_DECISION.md](PROXMOX_ADD_THIRD_FOURTH_R630_DECISION.md), which also covers whether you already have r630-03/r630-04 (powered off) to bring online.

**Current imbalance (typical):**

| Node | IP | LXC count | Load average (1/5/15 min) | Notes |
|----------|---------------|-----------|---------------------------|--------------|
| r630-01 | 192.168.11.11 | 58 | 56 / 81 / 92 | Heavily loaded |
| r630-02 | 192.168.11.12 | 23 | ~4 / 4 / 4 | Light |
| ml110 | 192.168.11.10 | 18 | ~7 / 7 / 9 | **Repurposing to OPNsense/pfSense** — migrate workloads off to r630-01/r630-02 |

**Ways to balance:**

1. **Cross-host migration (r630-01 → r630-02)** — Moves workload off r630-01. IP stays the same if the container uses a static IP; only the Proxmox host changes. (ml110 is no longer a migration target; migrate containers *off* ml110 first.)
2. **Same-host storage migration (r630-01 data → thin1)** — Frees space on the `data` pool and can improve I/O; does not reduce CPU/load by much. See [MIGRATION_PLAN_R630_01_DATA.md](MIGRATION_PLAN_R630_01_DATA.md).
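
---

## 1. Check cluster (live migrate vs backup/restore)

If all nodes are in the **same Proxmox cluster**, you can use `pct migrate`, which is faster than backup/restore and needs less downtime. LXC containers cannot be migrated live: with `--restart`, the container is shut down, moved, and started again on the target node, so "live migrate" in this runbook means this restart migration. Check cluster membership on both nodes: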

```bash
ssh root@192.168.11.11 "pvecm status"
ssh root@192.168.11.12 "pvecm status"
```

- If both show the **same cluster name** and list each other: use `pct migrate <VMID> <target_node> --restart`. Run it on the node that currently hosts the container (r630-01 here), either directly or over SSH.
- If the nodes are **not** in a cluster (or the migrate fails because of storage): use **backup → copy → restore** with the script below.
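
Comparing the two `pvecm status` outputs by eye is error-prone, so the check can be scripted. The following is a minimal sketch, not part of the repository: `cluster_name` and `same_cluster` are hypothetical helper names, and the parsing assumes the `Name:` line that recent Proxmox VE releases print under "Cluster information". On a node that is not clustered the name comes back empty, which the comparison treats as "not the same cluster".

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cluster name found in `pvecm status` output on stdin.
# Assumption: the output contains a line of the form "Name:   <cluster>".
cluster_name() {
  awk '$1 == "Name:" { print $2; exit }'
}

# Hypothetical helper: succeed only if both names are non-empty and identical.
same_cluster() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage against the two hosts from this runbook (commented out; needs SSH access):
#   a=$(ssh root@192.168.11.11 "pvecm status" | cluster_name)
#   b=$(ssh root@192.168.11.12 "pvecm status" | cluster_name)
#   if same_cluster "$a" "$b"; then echo "same cluster: $a"; else echo "use backup/restore"; fi
```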

---

## 2. Cross-host migration (r630-01 → r630-02)

**Script (backup/restore; works without shared storage):**

```bash
cd /path/to/proxmox

# One container (replace VMID and target storage)
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage] [--dry-run] [--destroy-source]

# Examples: preview with --dry-run first, then run the real migration
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --dry-run
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --destroy-source
```

**Target storage on r630-02:** Check with `ssh root@192.168.11.12 "pvesm status"`. Common: `thin1`, `thin2`, `thin5`, `thin6`.
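
To pick a target storage from that output, it can help to rank the active storages by free space. This is a rough sketch under stated assumptions: `storages_by_free` is a hypothetical helper, and it assumes the usual `pvesm status` column order (name, type, status, total, used, available, percent). It does not check whether a storage accepts container root disks, so confirm the choice before migrating.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pvesm status` output on stdin and print the names of
# active storages, largest available space first.
# Assumption: column 1 = name, column 3 = status, column 6 = available space.
storages_by_free() {
  awk 'NR > 1 && $3 == "active" { print $6, $1 }' \
    | sort -k1,1nr \
    | awk '{ print $2 }'
}

# Usage (commented out; needs SSH access to r630-02):
#   ssh root@192.168.11.12 "pvesm status" | storages_by_free
```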

**If the cluster check passes (restart migration with `pct migrate`):**
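
```bash
# Restart migration: the CT is stopped, moved, and started again on r630-02.
# --target-storage is the option name in pct(1) on PVE 7+; confirm with `pct help migrate`.
ssh root@192.168.11.11 "pct migrate <VMID> r630-02 --target-storage thin1 --restart"
# pct migrate moves the CT, so no copy remains on r630-01 and no `pct destroy` is needed afterwards.
```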

---

## 3. Good candidates to move (r630-01 → r630-02)

These containers **reduce load** on r630-01 when moved and are **safe to move**: none is on the critical chain/consensus path, and each can keep its static IP. Prefer moving several smaller containers over one critical RPC node.

| VMID | Name / role | Notes |
|--------|------------------------|-------|
| 3500 | oracle-publisher-1 | Oracle publisher |
| 3501 | ccip-monitor-1 | CCIP monitor |
| 7804 | gov-portals-dev | Gov portals (already migrated in past; verify current host) |
| 8640 | vault-phoenix-1 | Vault (if not critical path) |
| 8642 | vault-phoenix-3 | Vault |
| 10232 | CT10232 | Small service |
| 10235 | npmplus-alltra-hybx | NPMplus instance (has its own NPM; update UDM port forward if needed) |
| 10236 | npmplus-fourth | NPMplus instance |
| 10030–10092 | order-* (identity, intake, finance, etc.) | Order stack; move as a group if desired |
| 10200–10210 | order-prometheus, grafana, opensearch, haproxy | Monitoring/HA; move with order-* or after |

**Do not move (keep on r630-01 for now):**

- **10233** — npmplus (main NPMplus; 76.53.10.36 → .167)
- **2101** — besu-rpc-core-1 (core RPC for deploy/admin)
- **2500–2505** — RPC alltra/hybx (critical RPCs)
- **1000–1002, 1500–1502** — validators and sentries (consensus)
- **10130, 10150, 10151** — dbis-frontend, dbis-api (core apps; move only with a plan)
- **100, 101, 102, 103, 104, 105** — mail, datacenter, cloudflared, omada, gitea (infra)
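
A wrapper around the migration commands could refuse the VMIDs listed above. The sketch below is illustrative only: `KEEP_ON_R630_01` and `is_pinned` are hypothetical names, and the VMIDs and ranges are copied from the list in this section, so they need updating whenever that list changes.

```bash
#!/usr/bin/env bash
# Do-not-move VMIDs from this section; ranges are written as "first-last".
KEEP_ON_R630_01="10233 2101 2500-2505 1000-1002 1500-1502 10130 10150 10151 100-105"

# Hypothetical guard: return 0 if the VMID is on the keep list, 1 if it may be moved,
# 2 if the argument is not a plain number.
is_pinned() {
  local vmid=$1 entry first last
  case $vmid in ''|*[!0-9]*) return 2 ;; esac
  for entry in $KEEP_ON_R630_01; do
    first=${entry%-*}   # text before the dash (or the whole entry if there is no dash)
    last=${entry#*-}    # text after the dash (or the whole entry if there is no dash)
    if [ "$vmid" -ge "$first" ] && [ "$vmid" -le "$last" ]; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   if is_pinned 2503; then echo "2503 stays on r630-01"; fi
```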

---

## 4. Migrating workloads *off* ml110 (before OPNsense/pfSense repurpose)

ml110 (192.168.11.10) is being **repurposed to OPNsense/pfSense** (WAN aggregator between 6–10 cable modems and UDM Pros). All containers/VMs on ml110 must be **migrated to r630-01 or r630-02** before the repurpose.

- **If cluster:** `ssh root@192.168.11.10 "pct migrate <VMID> r630-01 --target-storage <storage> --restart"` or `... r630-02 ...` (`--target-storage` is the option name in pct(1) on PVE 7+; confirm with `pct help migrate`).
- **If no cluster:** Use backup on ml110, copy to r630-01 or r630-02, restore there (see [MIGRATE_CT_R630_01_TO_R630_02.md](../03-deployment/MIGRATE_CT_R630_01_TO_R630_02.md) and adapt for source=ml110, target=r630-01 or r630-02).

After all workloads are off ml110, remove ml110 from the cluster (or reinstall the node with OPNsense/pfSense). See [ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md](../11-references/ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md).
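
Before removing or reinstalling ml110, it is worth confirming that nothing is left on it. A minimal sketch, assuming that `pct list` and `qm list` both print a header row followed by rows whose first column is the numeric VMID; `guest_ids` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the numeric VMIDs found in `pct list` or `qm list`
# output on stdin. Header rows and blank lines are skipped because their first
# field is not a number.
guest_ids() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage (commented out; needs SSH access to ml110):
#   remaining=$( { ssh root@192.168.11.10 "pct list"; ssh root@192.168.11.10 "qm list"; } | guest_ids )
#   if [ -z "$remaining" ]; then
#     echo "ml110 has no guests left"
#   else
#     echo "still on ml110:" $remaining
#   fi
```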

---

## 5. After migration

- **IP:** Containers keep the same IP if they use static IP in the CT config; no change needed for NPM/DNS if they point by IP.
- **Docs:** Update any runbooks or configs that assume “VMID X is on r630-01” (e.g. `config/ip-addresses.conf` comments, backup scripts).
- **Verify:** Re-run `bash scripts/check-all-proxmox-hosts.sh` and confirm load and container counts.
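
For a quick pass/fail on the load part of that verification, the 1-minute load average can be compared against a limit. This is only a sketch: `load_below` is a hypothetical helper, it reads a `/proc/loadavg`-style line rather than the output of `check-all-proxmox-hosts.sh`, and the limit of 40 in the usage comment is an arbitrary example, not a value taken from this runbook.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one /proc/loadavg-style line on stdin and succeed
# only if the 1-minute load (first field) is below the given limit.
# Empty input counts as a failure.
load_below() {
  awk -v limit="$1" 'NR == 1 { ok = ($1 < limit) } END { exit !ok }'
}

# Usage (commented out; needs SSH access to r630-01):
#   if ssh root@192.168.11.11 "cat /proc/loadavg" | load_below 40; then
#     echo "r630-01 load is under the limit"
#   fi
```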

---

## 6. Quick reference

| Goal | Command / doc |
|------|----------------|
| Check current load | `bash scripts/check-all-proxmox-hosts.sh` |
| Migrate one CT (r630-01 → r630-02) | `./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> thin1 [--destroy-source]` |
| Same-host (data → thin1) | [MIGRATION_PLAN_R630_01_DATA.md](MIGRATION_PLAN_R630_01_DATA.md), `migrate-ct-r630-01-data-to-thin1.sh` |
| Full migration doc | [MIGRATE_CT_R630_01_TO_R630_02.md](../03-deployment/MIGRATE_CT_R630_01_TO_R630_02.md) |