feat(it-ops): live inventory, drift API, Keycloak IT role, portal sync hint
- Add scripts/it-ops (Proxmox collector, IPAM drift, export orchestrator)
- Add sankofa-it-read-api stub with optional CORS and refresh
- Add systemd examples for read API, weekly inventory export, timer
- Add live-inventory-drift GitHub workflow (dispatch + weekly)
- Add IT controller spec, runbooks, Keycloak ensure-it-admin-role script
- Note IT_READ_API env on portal sync completion output

Made-with: Cursor
177
docs/02-architecture/SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md
Normal file
@@ -0,0 +1,177 @@
# Sankofa IT operations controller — architecture spec

**Status:** Draft for engineering and IT leadership alignment
**Last updated:** 2026-04-08 (Phase 0 live-first inventory section added)
**Audience:** IT team, platform ops, Sankofa admin product owners


---

## 1. Goals

You need a single operational program that covers:

| Capability | Intent |
|------------|--------|
| **IP inventory** | Authoritative list of every LAN/WAN/VIP assignment, owner, service, and lifecycle (no drift between spreadsheets and `config/ip-addresses.conf`). |
| **VLAN design** | Move from today’s **flat VLAN 11** to the **planned segmentation** (validators, RPC, explorer, Sankofa services, tenants) without breaking production. |
| **Port mapping** | Physical: switch port ↔ patch panel ↔ host NIC ↔ logical bond/VLAN. Logical: UDM port forwards ↔ NPM host ↔ upstream CT/VM. |
| **Host efficiency** | Compare **actual** Proxmox capacity (CPU/RAM/storage/network) to workload placement; drive consolidation, spare-node use, and subscription/licensing discipline. |
| **IT admin UI** | **HTML controller** under the **Sankofa admin** surface so the IT team can view/control interfaces, assign **licenses/entitlements**, run **provisioning** workflows, and support **billing** (quotes, usage, invoice handoff). |

This document defines **how** that fits your existing stack (Proxmox cluster, UDM Pro, UniFi, NPMplus, Keycloak, Phoenix/dbis_core marketplace) and a **phased** path so you do not boil the ocean.


---

## 2. Current state (facts from this repo)

- **IP truth is split** across `config/ip-addresses.conf`, `docs/04-configuration/ALL_VMIDS_ENDPOINTS.md`, and `docs/11-references/NETWORK_CONFIGURATION_MASTER.md`. Automated snapshots: `scripts/verify/poll-proxmox-cluster-hardware.sh`, `reports/status/hardware_and_connected_inventory_*.md`.
- **VLANs:** Production today is **VLAN 11 only** for `192.168.11.0/24`. **Planned** VLANs (110–112, 120, 160, 200–203) are documented in [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md) but are **not** implemented as separate broadcast domains on the wire.
- **Sankofa admin:** `admin.sankofa.nexus` is **client SSO administration** today (same upstream as the portal unless split). See [EXPECTED_WEB_CONTENT.md](EXPECTED_WEB_CONTENT.md) and [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md) (VMID **7801**). Portal source: sibling repo **`Sankofa/portal`**, deployed with `scripts/deployment/sync-sankofa-portal-7801.sh`.
- **Marketplace / commercial:** Partner IRU flows live in **`dbis_core`** (API + React); native infra is mostly **docs + Proxmox**, not one database. See [SANKOFA_MARKETPLACE_SURFACES.md](../03-deployment/SANKOFA_MARKETPLACE_SURFACES.md).

**Gap:** There is **no** single product today that unifies IPAM, switch port data, Proxmox actions, UniFi, and billing under Sankofa admin. This spec is the blueprint to add it.


---

## 3. Target architecture (recommended)

### 3.1 UI placement

| Option | Pros | Cons |
|--------|------|------|
| **A — New `/it` (or `/ops`) app route** inside **`Sankofa/portal`**, gated by Keycloak group `sankofa-it-admin` | One TLS hostname, shared session patterns, fastest path for “under admin” | Portal bundle grows; must isolate client admin vs IT super-admin |
| **B — Dedicated host** `it.sankofa.nexus` → small **Next.js/Vite** SPA + BFF | Strong separation, independent deploy cadence | Extra NPM row, cert, pipeline |
| **C — Embed Grafana + NetBox** only | Quick graphs / IPAM | Weak billing/licensing story; less “Sankofa branded” control |

**Recommendation:** **Option A** for MVP (fastest), with the **API on a dedicated backend** so you can later move the shell to **B** without rewriting integrations.


### 3.2 Backend (“control plane API”)

Introduce a **small service** (for example `sankofa-it-api`) that is **never** exposed to the public internet without authentication:

- **Network:** VLAN 11 only, or a **private listener** + NPM **internal** host; OIDC **client credentials** or a **user JWT** from Keycloak.
- **Responsibilities:**
  - **Read models:** IPAM, devices, port maps, Proxmox inventory snapshot, UniFi device list (cached).
  - **Write models:** change requests with an **audit log** (who/when/what); optional **approval** queue for destructive actions.
  - **Connectors (adapters):** Proxmox API, UniFi Network API (UDM), NPM API (already scripted in repo), optional NetBox later.
- **Do not** put Proxmox root tokens in the browser; the **BFF** holds secrets server-side.

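
The write-model bullet can be sketched as a small append-only audit trail. This is a hypothetical Python sketch. `ChangeRequest`, `submit`, the action names, and the in-memory list standing in for an audit table are all illustrative, not code from the repo.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical action names; the spec does not fix an action vocabulary yet.
DESTRUCTIVE_ACTIONS = {"stop_ct", "destroy_ct"}


def _utc_now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class ChangeRequest:
    actor: str    # who: Keycloak subject or username
    action: str   # what: e.g. "assign_ip", "stop_ct"
    target: str   # on what: VMID, IP address, switch port
    payload: dict = field(default_factory=dict)
    requested_at: str = field(default_factory=_utc_now)  # when
    status: str = "new"


def submit(cr: ChangeRequest, audit_log: list[str]) -> ChangeRequest:
    """Queue destructive actions for approval and append one audit row per request."""
    cr.status = "pending_approval" if cr.action in DESTRUCTIVE_ACTIONS else "accepted"
    # Append-only JSON Lines record; a real service would insert into Postgres.
    audit_log.append(json.dumps(asdict(cr), sort_keys=True))
    return cr
```

Keeping the audit log append-only means an approval or rejection becomes a new row rather than an edit to an old one. That is what keeps who/when/what reconstructable after the fact.
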

### 3.3 Data model (minimum viable)

| Entity | Fields (illustrative; `?` = optional) |
|--------|------------------------|
| **Subnet / VLAN** | id, vlan_id, cidr, name, environment, routing notes |
| **IP assignment** | address, hostname, vmid?, mac?, vlan, owner_team, service, source_ref (`ip-addresses.conf` key), status |
| **Physical port map** | switch_id, switch_port, panel_ref, far_end_host, far_end_nic, vlan_membership, speed, lacp_group |
| **Host / hypervisor** | serial, model, cluster node, CPU/RAM/disk summary (from poll script / Proxmox) |
| **License / entitlement** | sku_id, seat_count, valid_from/to, bound_org_or_project, external_ref (Stripe/subscription id) |
| **Provisioning job** | type (create_ct, resize_disk, assign_ip), payload, status, correlation_id |

Start with **Postgres** (you already run many PG instances; a **dedicated small CT** for IT data avoids coupling to app databases).

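
To illustrate the **IP assignment** row, here is a hypothetical Python shape that validates input early. The class names and the `IpStatus` values are assumptions, not a schema from the repo. The `migrating` state is included for cutovers such as the ML110.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from enum import Enum


class IpStatus(Enum):
    # Hypothetical lifecycle values; the spec leaves "status" open.
    ACTIVE = "active"
    RESERVED = "reserved"
    MIGRATING = "migrating"
    RETIRED = "retired"


@dataclass(frozen=True)
class IpAssignment:
    address: str
    hostname: str
    vlan: int
    owner_team: str
    service: str
    status: IpStatus = IpStatus.ACTIVE
    vmid: str | None = None        # "vmid?" in the table: optional
    mac: str | None = None         # "mac?" in the table: optional
    source_ref: str | None = None  # ip-addresses.conf key, when declared there

    def __post_init__(self) -> None:
        # Reject malformed data at the edge instead of storing free text.
        ipaddress.IPv4Address(self.address)  # raises AddressValueError (a ValueError)
        if not 1 <= self.vlan <= 4094:
            raise ValueError(f"VLAN id out of range: {self.vlan}")
```

The same fields map one-to-one onto a Postgres table. Doing the validation in the API layer as well keeps bad rows out before the database constraints are written.
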

### 3.4 Billing and licenses

Treat **billing** as **integrations**, not a from-scratch ERP:

- **Licenses / seats:** map to an **entitlements** table + Keycloak **groups** or custom claims for “can open IT console / can approve provision.”
- **Usage metering:** Proxmox **storage and CPU** per VMID, NPM bandwidth (optional), public IP count — **async jobs** pushing aggregates nightly.
- **Invoicing:** export to **Stripe Billing**, **QuickBooks**, or **NetSuite** via CSV/API; the controller shows **status** and **line items**, not necessarily a full double-entry ledger on day one.

Partner marketplace pricing already has patterns in **`dbis_core`**; **native** infra SKUs should either **reuse** `IruOffering`-style tables or **link** by `external_sku_id` to avoid two unrelated catalogs.

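
One possible aggregation for the nightly **usage metering** job is to sum *allocated* resources per owner. The input would be the same `pvesh get /cluster/resources --output-format json` rows the Phase 0 collector already reads. This is only a sketch, under two assumptions:

- Guest rows carry `maxcpu`, `maxmem` and `maxdisk` (bytes).
- The VMID-to-owner map is a hypothetical input that would come from the IPAM/entitlement tables.

```python
from __future__ import annotations

from collections import defaultdict

GIB = 1024**3


def allocation_by_owner(resources: list[dict], owner_of: dict[str, str]) -> dict[str, dict]:
    """Sum allocated vCPU, RAM (GiB) and disk (GiB) per owner for LXC/QEMU guests."""
    totals: dict[str, dict] = defaultdict(
        lambda: {"guests": 0, "vcpu": 0.0, "ram_gib": 0.0, "disk_gib": 0.0}
    )
    for r in resources:
        if r.get("type") not in ("lxc", "qemu"):
            continue  # skip node, storage and other non-guest rows
        owner = owner_of.get(str(r.get("vmid")), "unassigned")
        t = totals[owner]
        t["guests"] += 1
        t["vcpu"] += float(r.get("maxcpu") or 0)
        t["ram_gib"] += (r.get("maxmem") or 0) / GIB
        t["disk_gib"] += (r.get("maxdisk") or 0) / GIB
    return dict(totals)
```

Billing on allocation rather than sampled usage is a pricing choice that the spec leaves open. The same loop could sum the usage fields instead.
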

---

## 4. VLAN and efficiency priorities (what matters most first)

Aligned with [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md):

1. **Document + enforce IP uniqueness** before VLAN migration (ARP incidents are already noted around the Keycloak IP in the E2E docs). Automated **diff**: live `ip neigh` / Proxmox CT IPs vs IPAM.
2. **Segment in this order:** (a) **out-of-band / IPMI** if any, (b) **tenant-facing** workloads (VLAN 200+), (c) **Besu validators/RPC**, (d) **Sankofa app tier**, so that blast-radius reduction matches risk.
3. **Use spare cluster capacity:** **r630-03** / **r630-04** are cluster members with large local/Ceph-related storage; placing **new** stateless or batch workloads there reduces pressure on r630-01/02 (see the network master narrative).
4. **ML110 cutover:** repurposing the ML110 as the WAN aggregator changes **.10** from a Proxmox host to the firewall; the controller’s IPAM must flag **migration status** per host.

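
Item 1's automated diff could start from the kernel neighbour tables on each node. This minimal sketch assumes the plain `ip neigh show` text format (`<addr> dev <if> lladdr <mac> <STATE>`), collected once per node. It flags two things:

- Any IPv4 address answered by more than one MAC (the ARP-conflict pattern).
- Addresses seen on the wire that IPAM does not know.

The function names are hypothetical.

```python
from __future__ import annotations


def parse_ip_neigh(text: str) -> dict[str, str]:
    """Map IPv4 address -> MAC from `ip neigh show` output."""
    out: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or ":" in parts[0] or "lladdr" not in parts:
            continue  # skip blank lines, IPv6 entries and FAILED/INCOMPLETE rows
        i = parts.index("lladdr")
        if i + 1 < len(parts):
            out[parts[0]] = parts[i + 1].lower()
    return out


def neighbour_drift(per_node: dict[str, str], ipam_ips: set[str]) -> dict[str, object]:
    """Compare neighbour tables gathered from several nodes with the IPAM address set."""
    seen: dict[str, set[str]] = {}
    for text in per_node.values():
        for ip, mac in parse_ip_neigh(text).items():
            seen.setdefault(ip, set()).add(mac)
    return {
        "ip_with_multiple_macs": {ip: sorted(m) for ip, m in seen.items() if len(m) > 1},
        "on_wire_not_in_ipam": sorted(set(seen) - ipam_ips),
    }
```

A second MAC can also mean a guest was recreated with a new `hwaddr`. Treat hits as review items rather than proof of a conflict.
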

---

## 5. Port mapping deliverables

| Layer | Tool / owner | Output |
|-------|----------------|--------|
| **Physical (UniFi XG + patch)** | IT + DCIM template | Spreadsheet or **NetBox** cables + interfaces |
| **UDM** | UniFi export + manual | Port forward matrix (already partially in network master) |
| **NPM** | `scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh` + API | Proxy host rows = **logical** port map to upstream |
| **Proxmox** | `vmbr`, VLAN-aware flags | Map CT `net0` → bridge → VLAN |

The HTML controller should show a **joined view**: *public hostname → NPM → LAN IP:port → VMID → node → switch port* (where data exists).

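
One way to build that joined view is a plain left join keyed on the upstream IP. This sketch rests on stated assumptions:

- NPM proxy hosts are reduced to `hostname -> (forward_host, forward_port)`.
- Guests come from `live_inventory.json`, keyed by `ip`.
- The node-to-switch-port map is a hypothetical input until Phase 2 port-map data exists.

Missing links stay `None`, so the UI can show gaps instead of hiding rows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EdgeRow:
    hostname: str
    upstream: str                  # "ip:port" as configured in NPM
    vmid: str | None = None
    node: str | None = None
    switch_port: str | None = None


def join_edge_view(
    npm_hosts: dict[str, tuple[str, int]],
    live_guests: list[dict],
    node_ports: dict[str, str],
) -> list[EdgeRow]:
    """hostname -> NPM upstream -> VMID -> node -> switch port (where known)."""
    by_ip = {g["ip"]: g for g in live_guests if g.get("ip")}
    rows = []
    for hostname, (ip, port) in sorted(npm_hosts.items()):
        guest = by_ip.get(ip, {})
        node = guest.get("node")
        switch_port = node_ports.get(node) if node else None
        rows.append(EdgeRow(hostname, f"{ip}:{port}", guest.get("vmid"), node, switch_port))
    return rows
```

Mapping a whole node to one switch port is a simplification. With bonds or multiple NICs, the port-map table would be keyed by node and NIC instead.
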

---

## 5.1 Live data strategy (source of truth)

| Layer | Primary live source | Declared fallback | Drift handling |
|-------|---------------------|-------------------|----------------|
| **VMID, node, status, guest IP** | Proxmox: `pvesh get /cluster/resources` + guest config files on shared `/etc/pve` | [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md) | VMID/IP mismatch; guests only in doc or only on cluster |
| **Hypervisor capacity** | `scripts/verify/poll-proxmox-cluster-hardware.sh` | [PROXMOX_HOSTS_COMPLETE_HARDWARE_CONFIG.md](PROXMOX_HOSTS_COMPLETE_HARDWARE_CONFIG.md) | Refresh after hardware changes |
| **LAN env keys** | Parsed literals from `ip-addresses.conf` | Same file in git | `guest_ips_not_in_ip_addresses_conf` vs `ip_addresses_conf_ips_not_on_guests`; exclude hypervisor and edge keys (e.g. `PROXMOX_HOST_*`, `NETWORK_GATEWAY`, `UDM_PRO_*`, `WAN_AGGREGATOR_*`) from “missing guest” noise |
| **Public edge** | NPM API (fleet scripts) | E2E tables | Hostname → upstream drift |
| **Switch/AP** | UniFi Network API | NetBox / spreadsheet | Manual until imported |

**Freshness:** every artifact includes an ISO 8601 **`collected_at`** timestamp; failed collectors must record an **`error`** in `drift.json`, and such data must not be presented as current in the IT UI.

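
The freshness rule can be enforced with a small classifier in the BFF or UI layer. This sketch assumes the Phase 0 JSON shapes. A top-level `error`, or any entry in `notes`, marks the artifact as failed; `notes` is where `compute_ipam_drift.py` copies the collector error, such as `seed_unreachable`. `max_age` is whatever SLO you pick. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def freshness(artifact: dict, max_age: timedelta, now: datetime | None = None) -> str:
    """Classify a live_inventory/drift artifact as error, unknown, stale or current."""
    if artifact.get("error") or artifact.get("notes"):
        return "error"  # collector failed: never present as current
    raw = artifact.get("collected_at")
    if not raw:
        return "unknown"
    ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return "stale" if now - ts > max_age else "current"
```

The error check must run before the age check. A `seed_unreachable` stub carries a fresh `collected_at` but no guests.
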

---

## 6. Phased roadmap

| Phase | Scope | Exit criteria |
|-------|--------|----------------|
| **0 — Inventory hardening (live-first)** | **Runtime truth:** Proxmox `pvesh get /cluster/resources` + per-guest config (`net0` / `ipconfig0`) for IP, merged with **`config/ip-addresses.conf`** as **declared** literals. **Scripts (under `scripts/it-ops/`):** `export-live-inventory-and-drift.sh` (SSH to seed node, pipe `lib/collect_inventory_remote.py`), `compute_ipam_drift.py` (merge + drift). **CI:** `.github/workflows/live-inventory-drift.yml` (`workflow_dispatch` + weekly schedule); on GitHub-hosted runners without LAN access, the export script exits 0 after writing `drift.json` with `seed_unreachable`. | **`live_inventory.json`** + **`drift.json`** emitted with **`collected_at`**; duplicate guest IPs fail or alert; the UI/BFF (later) never shows inventory without freshness metadata. |
| **1 — Read-only IT dashboard** | Keycloak group `sankofa-it-admin`; SPA pages: IPs, VLAN plan (current vs target), cluster nodes, hardware poll link | IT can onboard without SSH |
| **2 — Port map CRUD** | DB + UI for switch/port; import from UniFi API | Export CSV/NetBox |
| **3 — Controlled provisioning** | BFF + Proxmox API: start/stop scoped CT, **dry-run default** (align with `proxmox-production-safety` rules) | Audit log + allowlists |
| **4 — Entitlements + billing hooks** | License assignment UI; Stripe (or chosen) webhook → entitlement | Invoice export for finance |

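
For Phase 3, "dry-run default" plus allowlists can be enforced in the BFF before anything reaches Proxmox. In this hypothetical sketch, the function builds the Proxmox VE API request for an LXC power action (`POST /nodes/{node}/lxc/{vmid}/status/start` or `.../status/stop`). It only sends the request, through an injected `execute` callable, when `dry_run=False`. Authentication, auditing and the actual HTTP client are out of scope.

```python
from __future__ import annotations

from typing import Callable, Optional

# Proxmox VE API paths for LXC power actions (all POST).
ACTION_PATHS = {
    "start": "/nodes/{node}/lxc/{vmid}/status/start",
    "stop": "/nodes/{node}/lxc/{vmid}/status/stop",
}


def ct_power(
    action: str,
    node: str,
    vmid: int,
    allowlist: set[int],
    execute: Optional[Callable[[str, str], object]] = None,
    dry_run: bool = True,
) -> dict:
    """Validate and describe (or send) a scoped CT power request; dry-run by default."""
    if action not in ACTION_PATHS:
        raise ValueError(f"unsupported action: {action}")
    if vmid not in allowlist:
        raise PermissionError(f"VMID {vmid} is not allowlisted for {action}")
    request = {"method": "POST", "path": ACTION_PATHS[action].format(node=node, vmid=vmid)}
    if dry_run or execute is None:
        return {**request, "dry_run": True}
    execute(request["method"], request["path"])
    return {**request, "dry_run": False}
```

`status/stop` is a hard stop. A gentler variant would target `status/shutdown`.
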

---

## 7. Security and governance

- **Separate** IT super-admin from **client** `admin.sankofa.nexus` users (different Keycloak groups).
- **MFA** is required for the IT group; **break-glass** local Proxmox access is documented, not exposed in the UI.
- **Change management:** any **write** to the network edge (UDM) or to **production** Proxmox must carry a ticket ID in the API payload (an optional field today, enforced by policy later).

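
The "optional now, enforced later" ticket rule maps onto a policy check with an enforcement switch. This sketch uses hypothetical field names (`target_scope`, `ticket_id`) and a placeholder ticket pattern; substitute your tracker's real format.

```python
from __future__ import annotations

import re

# Placeholder ticket format (e.g. "OPS-123"); replace with your tracker's pattern.
TICKET_RE = re.compile(r"^[A-Z][A-Z0-9]+-\d+$")
# Hypothetical scope labels for writes that need change control.
CONTROLLED_SCOPES = {"udm_edge", "proxmox_production"}


def check_change_policy(payload: dict, enforce: bool = False) -> list[str]:
    """Return policy findings for a write request; raise instead when enforce=True."""
    problems: list[str] = []
    ticket = payload.get("ticket_id")
    if payload.get("target_scope") in CONTROLLED_SCOPES and not ticket:
        problems.append("missing ticket_id for edge/production change")
    elif ticket and not TICKET_RE.match(ticket):
        problems.append(f"ticket_id {ticket!r} does not match the expected format")
    if problems and enforce:
        raise PermissionError("; ".join(problems))
    return problems
```

With `enforce=False`, the BFF can log findings from day one. Flipping the flag later turns the same check into a hard gate.
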

---

## 8. Related documents

| Topic | Doc |
|-------|-----|
| IPs, VLAN plan, port forwards | [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md) |
| VMID ↔ IP | [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md) |
| Cabling / 10G | [13_NODE_NETWORK_AND_CABLING_CHECKLIST.md](../11-references/13_NODE_NETWORK_AND_CABLING_CHECKLIST.md) |
| Marketplace vs portal | [SANKOFA_MARKETPLACE_SURFACES.md](../03-deployment/SANKOFA_MARKETPLACE_SURFACES.md) |
| FQDN roles | [EXPECTED_WEB_CONTENT.md](EXPECTED_WEB_CONTENT.md) |
| Hardware poll | `scripts/verify/poll-proxmox-cluster-hardware.sh`, `reports/status/hardware_and_connected_inventory_*.md` |
| Proxmox safety | `.cursor/rules/proxmox-production-safety.mdc`, `scripts/lib/proxmox-production-guard.sh` |


---

## 9. Next engineering actions (concrete)

**Done in-repo (Phase 0+):**

1. **`scripts/it-ops/`** — remote collector (`lib/collect_inventory_remote.py`), `compute_ipam_drift.py` (merges **`ip-addresses.conf`** + **`ALL_VMIDS_ENDPOINTS.md`** table rows), `export-live-inventory-and-drift.sh` → `reports/status/live_inventory.json` + `drift.json`.
2. **Read API stub** — `services/sankofa-it-read-api/server.py` (GET `/health`, `/v1/inventory/live`, `/v1/inventory/drift`; POST refresh with API key). systemd example: `config/systemd/sankofa-it-read-api.service.example`.
3. **Workflow** `.github/workflows/live-inventory-drift.yml` — `workflow_dispatch` + weekly; artifacts; no LAN on default runners.
4. **Validation** — `scripts/validation/validate-config-files.sh` runs `py_compile` on the IT scripts + read API.
5. **Docs** — [SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md](../03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md), [SANKOFA_IT_OPS_KEYCLOAK_PORTAL_NEXT_STEPS.md](../03-deployment/SANKOFA_IT_OPS_KEYCLOAK_PORTAL_NEXT_STEPS.md).
6. **Keycloak automation (proxmox repo)** — `scripts/deployment/keycloak-sankofa-ensure-it-admin-role.sh` creates realm role **`sankofa-it-admin`**; operators still assign the role to users in the Admin Console.
7. **Portal `/it` (Sankofa/portal repo, sibling clone)** — `src/app/it/page.tsx`, `src/app/api/it/*` (server proxy + `IT_READ_API_URL` / `IT_READ_API_KEY` on CT 7801); the **`ADMIN`** role from credentials login is propagated into JWT roles for bootstrap (`src/lib/auth.ts`).
8. **LAN schedule examples** — `config/systemd/sankofa-it-inventory-export.timer.example` + `.service.example` for a weekly `export-live-inventory-and-drift.sh` run.

**Remaining (other repos / product):**

1. **Full BFF** with OIDC (Keycloak) and Postgres: decide once whether it lives in **`dbis_core`** or on a **dedicated CT**.
2. **Keycloak** — assign **`sankofa-it-admin`** to real IT users (role creation is scripted; mapping is manual policy).
3. **Deploy** — run `sync-sankofa-portal-7801.sh` after pulling portal changes; set **`IT_READ_API_URL`** on the portal LXC.
4. **Schedule on LAN** — enable the timer on a host with the repo and SSH access to Proxmox; optionally run `poll-proxmox-cluster-hardware.sh` on the same cadence.
5. **UniFi / NPM** live collectors — Phase 2 of this spec.

This spec does **not** replace change control; it gives you a **single product vision** so IP, VLAN, ports, hosts, licenses, and billing support evolve together instead of in silos.

@@ -0,0 +1,48 @@
# IT operations UI — Keycloak and Sankofa portal next steps

**Purpose:** Close the gap between Phase 0 (live inventory scripts + read API) and the full **Sankofa admin** IT controller described in [SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md](../02-architecture/SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md).


---

## 1. Keycloak

1. Create realm role **`sankofa-it-admin`** (idempotent): `bash scripts/deployment/keycloak-sankofa-ensure-it-admin-role.sh` (needs `KEYCLOAK_ADMIN_PASSWORD` in repo `.env`, SSH to Proxmox, CT 7802). Then assign the role to IT staff in the Keycloak Admin Console (or use a group + token mapper if you prefer group claims).
2. Map **only** platform IT staff; require **MFA** via realm or IdP policy.
3. **Do not** reuse the client-admin groups used for `admin.sankofa.nexus` tenant administration unless policy explicitly allows it.
4. Optional: client scope **it-ops** with claim `it_admin=true` for the IT BFF audience.

**Reference:** Keycloak CT / VMID in [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md); portal login runbook `scripts/deployment/enable-sankofa-portal-login-7801.sh`.

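
Whichever mapping you choose, the BFF or portal can make the access decision from already verified token claims. This sketch assumes three possible sources of the grant:

- Keycloak's default `realm_access.roles` claim for realm roles.
- An optional `groups` claim from a group-membership mapper, with or without the full-path `/` prefix.
- The optional `it_admin` claim from step 4.

`is_it_admin` is a hypothetical helper. It does not verify the signature, issuer or audience; that must happen first.

```python
from __future__ import annotations

IT_ROLE = "sankofa-it-admin"


def is_it_admin(claims: dict) -> bool:
    """Return True if already-verified token claims grant IT-console access."""
    realm_roles = (claims.get("realm_access") or {}).get("roles") or []
    if IT_ROLE in realm_roles:
        return True
    groups = claims.get("groups") or []
    if IT_ROLE in groups or f"/{IT_ROLE}" in groups:
        return True
    # Optional it-ops client scope claim; require a real boolean, not the string "true".
    return claims.get("it_admin") is True
```

If your mapper emits `it_admin` as a string, set its claim JSON type to boolean rather than loosening the check.
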

---

## 2. Sankofa portal (`Sankofa/portal` repo)

1. **Implemented:** protected route **`/it`** (`src/app/it/page.tsx`) gated by **`sankofa-it-admin`** / **`ADMIN`** (credentials bootstrap). API proxies: `GET /api/it/drift`, `GET /api/it/inventory`, `POST /api/it/refresh`.
2. **Configure on CT 7801:** **`IT_READ_API_URL`** (e.g. `http://192.168.11.<host>:8787`) and optional **`IT_READ_API_KEY`** (server-only; never `NEXT_PUBLIC_*`). The portal's `/api/it/*` routes proxy to the read API on VLAN 11.
3. **Do not** expose `IT_READ_API_KEY` or Proxmox credentials to the browser bundle.
4. Display **`collected_at`** from the JSON; show a stale warning if it is older than your SLO (e.g. 24h).

**Deploy:** `scripts/deployment/sync-sankofa-portal-7801.sh` after portal changes.


---

## 3. NPM

Add an **internal** proxy host (optional TLS) from a hostname such as `it-api.sankofa.nexus` (LAN-only DNS) to the read API on port **8787**. Use **`127.0.0.1:8787`** as the upstream only if NPM runs on the same host as the read API; otherwise point NPM at that host's LAN IP. Alternatively, bind the service on a dedicated CT IP and point NPM at that upstream.


---

## 4. Full BFF (later)

Replace `services/sankofa-it-read-api/server.py` with a service that:

- Validates **OIDC** (Keycloak) JWTs.
- Stores **audit** rows for refresh and future writes.
- Adds **UniFi** and **NPM** collectors with `collected_at` per domain.


---

## Related

- [SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md](SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md)
- [SANKOFA_MARKETPLACE_SURFACES.md](SANKOFA_MARKETPLACE_SURFACES.md) (native vs partner; catalog alignment)

368
docs/03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md
Normal file
@@ -0,0 +1,368 @@
# IT ops Phase 0 — live inventory scripts (implementation appendix)

**Purpose:** Canonical copy of the Phase 0 scripts (also on disk under `scripts/it-ops/`). Use this page if you need to restore or review them inline.
**Spec:** [SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md](../02-architecture/SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md) section 5.1 and Phase 0.

## File layout

| Path | Role |
|------|------|
| `scripts/it-ops/lib/collect_inventory_remote.py` | Run on PVE via SSH stdin (`python3 -`) |
| `scripts/it-ops/compute_ipam_drift.py` | Local: merge live JSON + `config/ip-addresses.conf` + **`ALL_VMIDS_ENDPOINTS.md`** pipe tables (`--all-vmids-md`) |
| `scripts/it-ops/export-live-inventory-and-drift.sh` | Orchestrator: ping seed, SSH, write `reports/status/` |
| `services/sankofa-it-read-api/server.py` | Read-only HTTP: `/v1/inventory/live`, `/v1/inventory/drift` |
| `.github/workflows/live-inventory-drift.yml` | `workflow_dispatch` + weekly (graceful skip without LAN) |

**Exit codes (`compute_ipam_drift.py`):** **2** = duplicate guest IP; **0** otherwise. **`vmid_ip_mismatch_live_vs_all_vmids_doc`** in `drift.json` is informational (docs often lag live CT config).


---

## `scripts/it-ops/lib/collect_inventory_remote.py`

```python
#!/usr/bin/env python3
"""Run ON a Proxmox cluster node (as root). Stdout: JSON live guest inventory."""
from __future__ import annotations

import json
import re
import subprocess
import sys
from datetime import datetime, timezone


def _run(cmd: list[str]) -> str:
    return subprocess.check_output(cmd, text=True, stderr=subprocess.DEVNULL)


def _extract_ip_from_net_line(line: str) -> str | None:
    # Static IPv4 from "ip=A.B.C.D/NN"; "ip=dhcp" and "ip6=..." do not match.
    m = re.search(r"ip=([0-9.]+)", line)
    return m.group(1) if m else None


def _read_config(path: str) -> str:
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


def main() -> None:
    collected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    try:
        raw = _run(
            ["pvesh", "get", "/cluster/resources", "--output-format", "json"]
        )
        resources = json.loads(raw)
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as e:
        # OSError covers a missing pvesh binary (not a PVE node).
        json.dump(
            {
                "collected_at": collected_at,
                "error": f"pvesh_cluster_resources_failed: {e}",
                "guests": [],
            },
            sys.stdout,
            indent=2,
        )
        return

    guests: list[dict] = []
    for r in resources:
        t = r.get("type")
        if t not in ("lxc", "qemu"):
            continue
        vmid = r.get("vmid")
        node = r.get("node")
        if vmid is None or not node:
            continue
        vmid_s = str(vmid)
        name = r.get("name") or ""
        status = r.get("status") or ""

        if t == "lxc":
            cfg_path = f"/etc/pve/nodes/{node}/lxc/{vmid_s}.conf"
        else:
            cfg_path = f"/etc/pve/nodes/{node}/qemu-server/{vmid_s}.conf"

        body = _read_config(cfg_path)
        ip = ""
        # LXC guests carry "ip=" on net0. Stop at the first net0 line: the live
        # config precedes any [snapshot] sections in the file.
        for line in body.splitlines():
            if line.startswith("net0:"):
                got = _extract_ip_from_net_line(line)
                if got:
                    ip = got
                break
        # QEMU guests normally carry their static address in cloud-init ipconfig0.
        if not ip and t == "qemu":
            for line in body.splitlines():
                if line.startswith("ipconfig0:"):
                    got = _extract_ip_from_net_line(line)
                    if got:
                        ip = got
                    break

        guests.append(
            {
                "vmid": vmid_s,
                "type": t,
                "node": str(node),
                "name": name,
                "status": status,
                "ip": ip,
                "config_path": cfg_path,
            }
        )

    out = {
        "collected_at": collected_at,
        "guests": sorted(guests, key=lambda g: int(g["vmid"])),
    }
    json.dump(out, sys.stdout, indent=2)


if __name__ == "__main__":
    main()
```

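
Here is a quick check of which config lines the collector's `ip=` pattern picks up. The sample lines (addresses, MACs, bridge names) are illustrative, not taken from this cluster. `extract_ip` reuses the same regular expression as `_extract_ip_from_net_line`.

```python
from __future__ import annotations

import re


def extract_ip(line: str) -> str | None:
    # Same pattern as _extract_ip_from_net_line in the collector above.
    m = re.search(r"ip=([0-9.]+)", line)
    return m.group(1) if m else None


SAMPLES = {
    # LXC: static address on net0; gw= is not confused with ip=
    "net0: name=eth0,bridge=vmbr0,gw=192.168.11.1,hwaddr=BC:24:11:00:00:01,ip=192.168.11.50/24,type=veth": "192.168.11.50",
    # QEMU with cloud-init: address lives on ipconfig0
    "ipconfig0: ip=192.168.11.60/24,gw=192.168.11.1": "192.168.11.60",
    # DHCP guests have no literal address, so "ip" stays empty in the inventory
    "net0: name=eth0,bridge=vmbr0,ip=dhcp,type=veth": None,
    # QEMU net0 describes the NIC model/MAC only
    "net0: virtio=BC:24:11:00:00:02,bridge=vmbr0": None,
}

for line, expected in SAMPLES.items():
    assert extract_ip(line) == expected, line
print("all samples parsed as expected")
```

Some guests also end up with an empty `ip`: those whose only static address is on `net1` or later, and IPv6-only guests (`ip6=`). They will not appear in the duplicate or drift lists.
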

---

## `scripts/it-ops/compute_ipam_drift.py`

```python
#!/usr/bin/env python3
"""Merge live JSON with config/ip-addresses.conf; write live_inventory.json + drift.json."""
from __future__ import annotations

import argparse
import json
import re
import sys
from pathlib import Path

# Dotted-quad IPv4 literal not embedded in a longer dotted/numeric run.
IPV4_RE = re.compile(
    r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![0-9.])"
)


def parse_ip_addresses_conf(path: Path) -> tuple[dict[str, str], set[str]]:
    """Return KEY -> value for simple shell assignments, plus every IPv4 literal seen."""
    var_map: dict[str, str] = {}
    all_ips: set[str] = set()
    if not path.is_file():
        return var_map, all_ips
    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
        s = line.strip()
        if not s or s.startswith("#") or "=" not in s:
            continue
        key, _, val = s.partition("=")
        key = key.strip()
        val = val.strip()
        if val.startswith('"') and val.endswith('"'):
            val = val[1:-1]
        elif val.startswith("'") and val.endswith("'"):
            val = val[1:-1]
        var_map[key] = val
        for m in IPV4_RE.findall(val):
            all_ips.add(m)
    return var_map, all_ips


def hypervisor_related_keys(var_map: dict[str, str]) -> set[str]:
    """Keys naming hypervisors, gateways and edge devices (not guests)."""
    keys = set()
    for k in var_map:
        ku = k.upper()
        if any(
            x in ku
            for x in (
                "PROXMOX_HOST",
                "PROXMOX_ML110",
                "PROXMOX_R630",
                "PROXMOX_R750",
                "WAN_AGGREGATOR",
                "NETWORK_GATEWAY",
                "UDM_PRO",
                "PUBLIC_IP_GATEWAY",
                "PUBLIC_IP_ER605",
            )
        ):
            keys.add(k)
    return keys


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--live", type=Path, help="live JSON file (default stdin)")
    ap.add_argument("--ip-conf", type=Path, default=Path("config/ip-addresses.conf"))
    ap.add_argument("--out-dir", type=Path, required=True)
    args = ap.parse_args()

    live_raw = args.live.read_text(encoding="utf-8") if args.live else sys.stdin.read()
    live = json.loads(live_raw)
    guests = live.get("guests") or []
    var_map, conf_ips = parse_ip_addresses_conf(args.ip_conf)
    hyp_keys = hypervisor_related_keys(var_map)
    hyp_ips: set[str] = set()
    for k in hyp_keys:
        if k not in var_map:
            continue
        for m in IPV4_RE.findall(var_map[k]):
            hyp_ips.add(m)

    # Group guests by configured IP so duplicates can be detected.
    ip_to_vmids: dict[str, list[str]] = {}
    for g in guests:
        ip = (g.get("ip") or "").strip()
        if not ip:
            continue
        ip_to_vmids.setdefault(ip, []).append(g.get("vmid", "?"))

    duplicate_ips = {ip: vms for ip, vms in ip_to_vmids.items() if len(vms) > 1}
    guest_ip_set = set(ip_to_vmids.keys())
    conf_only = sorted(conf_ips - guest_ip_set - hyp_ips)
    live_only = sorted(guest_ip_set - conf_ips)

    drift = {
        "collected_at": live.get("collected_at"),
        "guest_count": len(guests),
        "duplicate_ips": duplicate_ips,
        "guest_ips_not_in_ip_addresses_conf": live_only,
        "ip_addresses_conf_ips_not_on_guests": conf_only,
        "hypervisor_and_infra_ips_excluded_from_guest_match": sorted(hyp_ips),
        "notes": [],
    }
    if live.get("error"):
        drift["notes"].append(live["error"])

    inv_out = {
        "collected_at": live.get("collected_at"),
        "source": "proxmox_cluster_pvesh_plus_config",
        "guests": guests,
    }

    args.out_dir.mkdir(parents=True, exist_ok=True)
    (args.out_dir / "live_inventory.json").write_text(
        json.dumps(inv_out, indent=2), encoding="utf-8"
    )
    (args.out_dir / "drift.json").write_text(
        json.dumps(drift, indent=2), encoding="utf-8"
    )
    print(f"Wrote {args.out_dir / 'live_inventory.json'}")
    print(f"Wrote {args.out_dir / 'drift.json'}")
    # Exit 2 signals duplicate guest IPs to the orchestrator / CI.
    sys.exit(2 if duplicate_ips else 0)


if __name__ == "__main__":
    main()
```

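
The file-layout table lists an `--all-vmids-md` input, and the exit-code notes mention a `vmid_ip_mismatch_live_vs_all_vmids_doc` key. Below is a hedged sketch of that comparison. It assumes the doc's pipe tables have one header cell containing `VMID` and one whose header contains the word `IP`. The real table layout and the on-disk implementation may differ, and the function names are hypothetical.

```python
from __future__ import annotations

import re

IPV4_RE = re.compile(r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![0-9.])")
IP_HEADER_RE = re.compile(r"\bIP\b")


def parse_vmid_ip_tables(md: str) -> dict[str, str]:
    """Collect VMID -> first IPv4 from markdown pipe tables with VMID and IP columns."""
    out: dict[str, str] = {}
    cols: tuple[int, int] | None = None
    for line in md.splitlines():
        s = line.strip()
        if not s.startswith("|"):
            cols = None  # any non-table line ends the current table
            continue
        cells = [c.strip().strip("*`") for c in s.strip("|").split("|")]
        if cols is None:
            upper = [c.upper() for c in cells]
            vm = next((i for i, c in enumerate(upper) if "VMID" in c), None)
            ip = next((i for i, c in enumerate(upper) if IP_HEADER_RE.search(c)), None)
            if vm is not None and ip is not None:
                cols = (vm, ip)
            continue
        if set(s) <= set("|-: ") or max(cols) >= len(cells):
            continue  # separator row or short row
        m = IPV4_RE.search(cells[cols[1]])
        if cells[cols[0]].isdigit() and m:
            out[cells[cols[0]]] = m.group(0)
    return out


def vmid_ip_mismatches(guests: list[dict], doc_map: dict[str, str]) -> list[dict]:
    """Informational drift: live guest IP differs from the documented IP."""
    rows = []
    for g in guests:
        vmid = str(g.get("vmid"))
        doc_ip, live_ip = doc_map.get(vmid), (g.get("ip") or "")
        if doc_ip and live_ip and doc_ip != live_ip:
            rows.append({"vmid": vmid, "live_ip": live_ip, "doc_ip": doc_ip})
    return rows
```

Docs often lag live CT config, so this list belongs in `drift.json` as information. It should not change the exit code.
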

---

## `scripts/it-ops/export-live-inventory-and-drift.sh`

```bash
#!/usr/bin/env bash
# Live Proxmox guest inventory + drift vs config/ip-addresses.conf.
# Usage: bash scripts/it-ops/export-live-inventory-and-drift.sh
# Requires: SSH key root@SEED, python3 locally and on PVE.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
SEED="${SEED_HOST:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"
OUT_DIR="${OUT_DIR:-${PROJECT_ROOT}/reports/status}"
TS="$(date +%Y%m%d_%H%M%S)"
TMP="${TMPDIR:-/tmp}/live_inv_${TS}.json"
PY="${SCRIPT_DIR}/lib/collect_inventory_remote.py"

mkdir -p "$OUT_DIR"

# Emit the collector's JSON shape with error=seed_unreachable and no guests.
# The heredoc delimiter is quoted so the shell leaves the Python untouched.
stub_unreachable() {
  python3 - <<'PY'
import json, datetime
print(json.dumps({
    "collected_at": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "error": "seed_unreachable",
    "guests": [],
}, indent=2))
PY
}

if ! ping -c1 -W2 "$SEED" >/dev/null 2>&1; then
  stub_unreachable >"$TMP"
else
  # Stream the collector over SSH stdin; fall back to the stub on any SSH failure.
  if ! ssh -o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=no \
    "root@${SEED}" "python3 -" <"$PY" >"$TMP" 2>/dev/null; then
    stub_unreachable >"$TMP"
  fi
fi

set +e
python3 "${SCRIPT_DIR}/compute_ipam_drift.py" --live "$TMP" \
  --ip-conf "${PROJECT_ROOT}/config/ip-addresses.conf" --out-dir "$OUT_DIR"
DRIFT_RC=$?
set -e

# Keep timestamped copies next to the "latest" files.
cp -f "$OUT_DIR/live_inventory.json" "${OUT_DIR}/live_inventory_${TS}.json" 2>/dev/null || true
cp -f "$OUT_DIR/drift.json" "${OUT_DIR}/drift_${TS}.json" 2>/dev/null || true
rm -f "$TMP"
echo "Latest: ${OUT_DIR}/live_inventory.json , ${OUT_DIR}/drift.json"
# Exit 2 when duplicate_ips present (for CI).
exit "${DRIFT_RC}"
```

After creating the files: `chmod +x scripts/it-ops/export-live-inventory-and-drift.sh scripts/it-ops/compute_ipam_drift.py`


---

## `.github/workflows/live-inventory-drift.yml`

```yaml
name: Live inventory and IPAM drift

on:
  workflow_dispatch:
  schedule:
    - cron: '25 6 * * 1'  # Mondays 06:25 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Export live inventory (LAN optional)
        run: |
          set +e
          bash scripts/it-ops/export-live-inventory-and-drift.sh
          rc=$?
          echo "exit=$rc"
          # Surface exit 2 (duplicate guest IPs) as a failed step;
          # continue-on-error still lets the artifact upload run.
          exit "$rc"
        continue-on-error: true
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: live-inventory-drift
          path: |
            reports/status/live_inventory.json
            reports/status/drift.json
```

**Note:** On GitHub-hosted runners the collector usually writes `seed_unreachable`; use a **self-hosted LAN runner** for real data, or run the shell script on the operator workstation.


---

## `AGENTS.md` row (Quick pointers table)

Add:

``| IT live inventory + drift (LAN) | `bash scripts/it-ops/export-live-inventory-and-drift.sh` → `reports/status/live_inventory.json`, `drift.json` — see [docs/03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md](docs/03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md) |``


---

## `docs/MASTER_INDEX.md`

Add a row pointing to this deployment appendix and the updated spec.