Add Sankofa consolidated hub operator tooling

This commit is contained in:
defiQUG
2026-04-13 21:41:14 -07:00
parent 49740f1a59
commit b7eebb87b3
42 changed files with 2635 additions and 14 deletions


@@ -0,0 +1,96 @@
# Non-chain ecosystem — hyperscaler-style design and deployment
**Status:** Architecture / target operating model
**Last updated:** 2026-04-13
**Scope:** Everything **except** blockchain-adjacent guests and services (Besu validators and RPC lanes, Blockscout-style explorers, bridge relayers, Chain 138 deploy paths, token-aggregation **runtime** tied to chain RPC). Those stay on their own **chain plane** with chain-specific runbooks. This document is the **application and edge plane** for Sankofa, Phoenix, DBIS core, portals, NPM, identity, and supporting data.
---
## 1. What “ecosystem” means here
A coherent **platform**: operators and clients interact through a small number of **managed surfaces** (DNS, TLS, APIs, portals), backed by **clear boundaries** (identity, data, observability, change management). Hyperscalers do not run “one random VM per microsite”; they run **regional edge**, **shared app runtimes**, **managed data**, and **global control planes** with strict contracts.
Your non-chain ecosystem should **feel** like that: fewer hand-crafted snowflakes, more **repeatable cells** (LXC or VM patterns), **declared** upstreams, and **observable** health—not a flat list of unrelated CTs.
---
## 2. Hyperscaler concepts mapped to this program
| Hyperscaler idea | Plain language | This ecosystem (non-chain) |
|------------------|----------------|----------------------------|
| **Region** | Geography / failure domain | **LAN site** (e.g. VLAN 11 + Proxmox cluster) — one “region” today; multi-site is a later region pair. |
| **Availability zone** | Independent power/network within a region | **Proxmox nodes** (e.g. r630-01 vs r630-04) — place **stateless edge** and **burst** workloads across nodes; keep **tightly coupled** DB + app tiers co-located unless latency and HA analysis say otherwise. |
| **Edge / front door** | TLS termination, routing, WAF | **NPMplus** (and optional Cloudflare in front) — single place for certs, forced HTTPS, and upstream policy. |
| **API gateway / mesh ingress** | One front for many backends | **Phoenix API hub** (nginx Tier 1 today; optional BFF Tier 2) — `/graphql`, `/api`, consistent headers, rate limits, `TRUST_PROXY` alignment for `dbis_core`. |
| **Managed Kubernetes / App Service** | Standard runtime for web APIs | **LXC templates**: one pattern for “Node + systemd”, one for “nginx static only”, one for “Postgres only” — same packages, same hardening checklist. |
| **Identity (IdP)** | Central auth | **Keycloak** — realms, clients, MFA policy; portals are **clients**, not bespoke login servers. |
| **Managed database** | Durable state, backups, PITR | **Postgres** for Phoenix / portal data — backups, restore drills, connection limits documented. |
| **Service directory** | What runs where | **`ALL_VMIDS_ENDPOINTS.md`** + `config/ip-addresses.conf` + (when adopted) **hub env** — treat as **service catalog**, not tribal knowledge. |
| **Observability** | Metrics, logs, traces | **Per-cell**: node_exporter or similar where you standardize; **aggregator** (Grafana/Loki stack when you add it) — same pattern as “send logs to the regional pipeline.” |
| **Landing zone / policy** | Guardrails before workloads land | **`PROXMOX_OPS_APPLY`**, `PROXMOX_OPS_ALLOWED_VMIDS`, dry-run scripts, `proxmox-production-guard.sh` — “no mutation without contract,” similar to Azure Policy / SCP ideas at small scale. |
| **IaC / GitOps** | Desired state from repo | **This repo**: scripts + `config/` + runbooks; optional future **declarative** host config (e.g. cloud-init templates per role) so new CTs are **cloned from role**, not artisanal. |
---
## 3. Target cell types (non-chain)
Design the fleet from a **small menu** of cell types; anything that does not fit forces a design review.
1. **Edge-static cell** — nginx only; multiple `server_name` or `map $host`; static `root` per product line. Lowest RAM. Good for marketing, entity microsites (exported), status pages, and **SPAs that only talk to APIs** (no server-only NextAuth on that host). **IRU / marketplace discovery** often stays **dynamic** (SSR or browser app against `dbis_core`) until a deliberate static-export pipeline exists—do not assume static-first fits all catalog UX.
2. **Edge-SSR cell** — one Node process (or small cluster later) for NextAuth / server components; **one** cell per “SSR family,” not one per brand, where host-based routing suffices.
3. **API hub cell** — nginx (or future BFF) only; upstreams to Phoenix Apollo and `dbis_core` over LAN. **Prefer placement** on a node with headroom (see [SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md](../03-deployment/SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md)).
4. **Data cell** — Postgres (and optional read replica pattern later); no arbitrary co-install of app servers.
5. **Identity cell** — Keycloak; isolated upgrades and backup story.
6. **Operator / control** — NPM, IT read API, inventory jobs — same hardening and backup discipline as “regional tooling” accounts in public cloud.
**Anti-pattern:** one-off CTs that mix “random nginx + cron + manual edits” without a role name in the catalog.
---
## 4. Practices to adopt (hyperscaler-aligned)
- **Single edge story:** NPM (and DNS) as the **only** public entry contract; internal IPs are implementation details.
- **Hub-and-spoke APIs:** clients talk to **one** Phoenix-facing origin where possible; backends stay private on LAN. **CORS** allowlists must include **every browser origin** that calls the API (portal, admin, studio, marketplace SPAs)—not only hostnames served by the static web hub.
- **Blast radius:** consolidating statics **reduces** attack surface and cert sprawl; moving hubs off overloaded nodes **reduces** correlated failure under load.
- **Versioned change:** runbooks + script `--dry-run` first; VMID allowlists for mutations.
- **Observability contract:** every cell exposes **`/health`** (or documented equivalent) and logs to a **single** retention policy.
- **Naming:** FQDN → owner → cell type in docs (already directionally in `FQDN_EXPECTED_CONTENT.md` / E2E lists).
---
## 5. Explicit exclusion (blockchain plane)
Do **not** fold these into the “hyperscaler-style non-chain cell” menu without a **dedicated** chain runbook merge:
- Besu validators, sentries, core/public RPC CTs
- Blockscout / explorer stacks
- CCIP / relay / XDC-zero **chain** workers
- Chain 138 deploy RPC paths and token-aggregation **as chain execution** (writes, signers, keeper paths)
**Boundary nuance:** a **read-only** token-aggregation or quote service that only calls **public** RPC may be operated like an **edge-adjacent** app cell; anything holding **keys**, executing **writes**, or coupling to **validator** timing stays on the **chain plane**.
They remain a **separate plane** with different SLOs, upgrade windows, and safety rules. The **non-chain** ecosystem **integrates** with them only via **documented APIs and RPC URLs**, not by sharing generic web cells.
---
## 6. Related documents
- [SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md](./SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md)
- [SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md](../03-deployment/SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md)
- [SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md](./SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md)
- [PROXMOX_LOAD_BALANCING_RUNBOOK.md](../04-configuration/PROXMOX_LOAD_BALANCING_RUNBOOK.md)
- [PUBLIC_SECTOR_TENANCY_MARKETPLACE_AND_DEPLOYMENT_BASELINE.md](./PUBLIC_SECTOR_TENANCY_MARKETPLACE_AND_DEPLOYMENT_BASELINE.md) (tenancy and catalog vs marketing)
- [NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md](./NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md) (inconsistencies, P0/P1 backlog, NPM/WebSocket/`TRUST_PROXY`)
---
## 7. Adoption (incremental)
You do not need a “big bang.” Order of operations:
1. Name current CTs against the **cell types** in section 3; mark gaps.
2. Stand up **one** edge-static or API-hub cell on a **non-r630-01** node as a template.
3. Migrate **lowest-risk** FQDNs (static marketing) first; then API hub; then SSR if needed.
4. Retire redundant CTs after rollback window; update inventory and `get_host_for_vmid`.
Fill a short decision log in [SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md](../03-deployment/SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md) as you execute.


@@ -0,0 +1,166 @@
# Non-chain ecosystem plan — detailed review, gaps, and inconsistencies
**Purpose:** Critical review of the consolidated Phoenix / web hub / r630-01 offload / hyperscaler-style documents and scripts as of **2026-04-13**. Use this as a **remediation backlog**; update linked docs when items close.
**Scope reviewed:**
[NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md](./NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md),
[SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md](./SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md),
[SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md](../03-deployment/SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md),
`scripts/deployment/install-sankofa-api-hub-nginx-on-pve.sh`,
`scripts/verify/verify-sankofa-consolidated-hub-lan.sh`,
`config/ip-addresses.conf` hub defaults,
`scripts/lib/load-project-env.sh` `get_host_for_vmid`.
---
## 1. Cross-document consistency
| Topic | Hyper-scaler model | Consolidated hub doc | r630-01 goal doc | Verdict |
|-------|---------------------|----------------------|------------------|---------|
| Chain vs non-chain boundary | Explicit exclusion list | Matches | Matches | **Aligned** |
| API hub Tier 1 | Gateway row | Tier 1 nginx | Phase 2 move hub off 7800 | **Aligned**; live state (hub on **7800**) is **interim** per r630 doc |
| Web hub | Edge-static / SSR cells | Options A/B/C | Phase 1 | **Aligned** |
| Load relief | Fewer cells + placement | “Moving hubs” note | Non-goal: nginx CPU on same node | **Aligned** |
| NPM | Single edge story | Fewer upstream IPs possible | NPM repoint | **Partial gap:** NPM often still **one row per FQDN**; “fewer rows” is **upstream IP convergence**, not necessarily fewer proxy host records (see §4.1). |
---
## 2. Technical gaps (must fix in implementation, not only docs)
### 2.1 `TRUST_PROXY` and client IP for `dbis_core` (high)
**Issue:** Tier-1 nginx forwards `X-Forwarded-For` / `X-Real-IP`, but `dbis_core` IRU rate limits and abuse logic require **`TRUST_PROXY=1`** (and the correct **trusted hop** chain: NPM → hub → app). If `dbis_core` does not trust the hub IP, it sees **only the hub's** LAN address for all users.
**Remediation:** Document in cutover checklist: set `TRUST_PROXY=1` on `dbis_core` **and** restrict trusted proxy list to **NPM** and **API hub** subnets/IPs. Add integration test: rate limit key changes when `X-Forwarded-For` varies.
**Doc fix:** Already mentioned in consolidated §3.3; add explicit **“before NPM → hub cutover”** gate in [SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md](./SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md) operator checklist.
**Repo (2026-04-13):** `dbis_core` supports **`TRUST_PROXY_HOPS`** so Express `trust proxy` matches NPM-only vs NPM→hub→app; see `dbis_core/.env.example`. IP allowlisting for proxies remains an ops/network task.
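The hop count matters because each trusted proxy appends one entry to `X-Forwarded-For`. A minimal sketch of the selection logic (illustrative only — Express does this internally via `trust proxy`; the function name and IPs are hypothetical):

```shell
# Hypothetical helper: given an X-Forwarded-For header value and the number of
# trusted proxies that appended to it, pick the real client IP. With N trusted
# appenders, the client is the Nth entry counted from the right; anything
# further left could be client-supplied spoofing and is ignored.
client_ip_from_xff() {
  local xff="$1" hops="$2"
  echo "$xff" | tr -d ' ' | tr ',' '\n' | tail -n "$hops" | head -n 1
}
```

With NPM → hub → app (two trusted appenders), `client_ip_from_xff '203.0.113.7, 192.168.11.60' 2` yields the true client `203.0.113.7`; with a hop count of 1 (NPM-only), a client-prepended spoof entry is ignored.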
### 2.2 GraphQL WebSocket through NPM + hub (high)
**Issue:** `graphql-ws` requires **Upgrade** end-to-end. NPM custom locations must **allow WebSockets**; hub nginx already sets `Upgrade` / `Connection` to Apollo. If NPM strips or times out upgrades, subscriptions break **silently** for some clients.
**Remediation:** Add explicit E2E: `wscat` or Apollo subscription smoke **through public URL** after any NPM port/path change. Document NPM “Websockets support” toggle if applicable.
**Repo:** `scripts/verify/smoke-phoenix-graphql-wss-public.sh` (curl **HTTP 101** upgrade on `wss://…/graphql-ws`; use `PHOENIX_WSS_INCLUDE_LAN=1` for hub `:8080`).
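The pass/fail condition the smoke script checks can be sketched as follows (illustrative, not the repo script itself — the authoritative check is `smoke-phoenix-graphql-wss-public.sh`):

```shell
# Illustrative check: a WebSocket cutover is healthy only if the server answers
# the Upgrade request with "101 Switching Protocols". $1 is the first line of
# the HTTP response, e.g. from:
#   curl -si -H 'Connection: Upgrade' -H 'Upgrade: websocket' ... | head -n 1
is_ws_upgrade_ok() {
  case "$1" in
    *" 101 "*) return 0 ;;  # 101 Switching Protocols — upgrade accepted
    *)         return 1 ;;  # 200/400/426 etc. mean NPM or the hub stripped the upgrade
  esac
}
```

Any non-101 status through the public URL means subscriptions are broken for browser clients even if plain `POST /graphql` succeeds.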
### 2.3 CORS and browser origins (medium)
**Issue:** Consolidated doc says CORS allowlist “web hub FQDNs only.” Browsers calling **`https://phoenix.sankofa.nexus/graphql`** from **`https://portal.sankofa.nexus`** are **cross-origin**; allowlist must include **portal**, **admin**, **studio**, and any SPA origins that call the API—not only the web hub static hostnames.
**Remediation:** Replace wording with **“all documented browser origins that invoke Phoenix or `dbis_core` from the browser.”** Cross-ref [SANKOFA_MARKETPLACE_SURFACES.md](../03-deployment/SANKOFA_MARKETPLACE_SURFACES.md) for IRU public routes.
### 2.4 Health check path in operator checklist (low — doc error)
**Issue:** Cutover checklist suggested `GET /api/v1/health`; `dbis_core` exposes **`/health`** and **`/v1/health`**, not under `/api/v1/`.
**Remediation:** Checklist corrected in consolidated doc to **`/health` via hub** (`/api/` prefix does not apply to root health).
### 2.5 Dual public paths (4000 vs 8080) during migration (medium)
**Issue:** While both ports are open, **clients can bypass** hub policies (CORS, future WAF) by targeting **:4000** directly if firewalled only at NPM. Hyperscaler model prefers **one** ingress.
**Remediation:** After NPM cutover to **8080**, **firewall** Phoenix **:4000** to **localhost + hub IP only** on CT 7800, or bind Apollo to **127.0.0.1** only (application config change—needs Phoenix runbook).
**Repo (2026-04-13):** `scripts/deployment/ensure-sankofa-phoenix-apollo-bind-loopback-7800.sh` sets **`HOST=127.0.0.1`** for Fastify on **7800** when hub upstream is **127.0.0.1:4000**.
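The core idea of that ensure script can be sketched as an idempotent env-file edit (a sketch only; the function name is hypothetical and the real script in `scripts/deployment/` is canonical):

```shell
# Hypothetical sketch: pin HOST=127.0.0.1 in an env file idempotently, so
# Fastify/Apollo binds loopback only and :4000 is unreachable from the LAN.
ensure_host_loopback() {
  local env_file="$1"
  if grep -q '^HOST=' "$env_file"; then
    # Rewrite an existing HOST line in place.
    sed -i 's/^HOST=.*/HOST=127.0.0.1/' "$env_file"
  else
    # No HOST line yet: append one.
    echo 'HOST=127.0.0.1' >> "$env_file"
  fi
}
```

Running it twice is safe, which matters for cutover checklists that may be re-executed after a partial failure.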
### 2.6 Stock `nginx` package disabled on 7800 (medium)
**Issue:** The installer runs `systemctl disable nginx`, which disables the stock **Debian `nginx.service`**. Operators who expect plain `nginx` for ad-hoc static files on that CT lose it. Today this is intentional: only the **dedicated** `sankofa-phoenix-api-hub.service` runs nginx there.
**Remediation:** Document on CT 7800: **only** `sankofa-phoenix-api-hub` serves nginx; do not re-enable stock unit without conflict check.
### 2.7 `proxy_pass` URI and trailing slashes (low)
**Issue:** `location /api/` + `proxy_pass http://dbis_core_rest;` preserves URI prefix—correct for `dbis_core` mounted at `/api/v1`. If any route is mounted at root on upstream, mismatch possible.
**Remediation:** Keep; add note: new BFF routes must use **distinct prefixes** (`/bff/`) to avoid colliding with Apollo or `dbis_core`.
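The prefix behavior can be shown side by side (illustrative only — the repo's example confs are canonical, and `hub_bff` is a hypothetical upstream name):

```nginx
# proxy_pass with no URI part preserves the request prefix:
# GET /api/v1/health  →  upstream receives /api/v1/health (matches dbis_core mount).
location /api/ {
    proxy_pass http://dbis_core_rest;   # no trailing URI → prefix preserved
}

# New BFF routes take a distinct prefix so they cannot collide with /graphql or /api/:
location /bff/ {
    proxy_pass http://hub_bff;          # hypothetical upstream
}
```

Had the first block used `proxy_pass http://dbis_core_rest/;` (trailing slash), nginx would strip `/api/` and break the `/api/v1` mount.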
---
## 3. Inventory and automation gaps
### 3.1 `get_host_for_vmid` omits explicit Sankofa VMIDs (medium)
**Issue:** Sankofa stack VMIDs **7800–7806** fell through to **default** `*)` → r630-01. Behavior matched inventory but was **implicit**—easy to break if default changes.
**Remediation:** Add explicit `7800|7801|7802|7803|7806` case arm to `get_host_for_vmid` with comment “Sankofa Phoenix stack — verify with `pct list` when migrating.”
**Repo (2026-04-13):** Explicit **`7800–7806`** arm on r630-01 in `scripts/lib/load-project-env.sh` (includes gov portals 7804 and studio 7805).
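The shape of the explicit arm looks like this (a sketch — `scripts/lib/load-project-env.sh` is authoritative, and the default arm is assumed to still resolve to r630-01):

```shell
# Sketch of the explicit VMID→host case arm. The point is that the Sankofa
# stack no longer relies on the default arm, so changing the default cannot
# silently reroute these VMIDs.
get_host_for_vmid() {
  case "$1" in
    7800|7801|7802|7803|7804|7805|7806)
      # Sankofa Phoenix stack — verify with `pct list` when migrating.
      echo "r630-01" ;;
    *)
      echo "r630-01" ;;  # current default; explicit arm survives a default change
  esac
}
```

If the stack migrates, only the explicit arm changes, which keeps the diff reviewable.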
### 3.2 Fleet scripts and hub env vars (medium)
**Issue:** `IP_SANKOFA_PHOENIX_API_HUB` / `SANKOFA_PHOENIX_API_HUB_PORT` exist in `ip-addresses.conf`, but **`update-npmplus-proxy-hosts-api.sh`** (and friends) may still **hardcode** or use only `IP_SANKOFA_PHOENIX_API` + `4000`.
**Remediation:** Grep fleet scripts; add optional branch: when `SANKOFA_PHOENIX_API_HUB_PORT=8080` and flag file or env `SANKOFA_NPM_USE_API_HUB=1`, emit upstream **:8080**. Until then, document **manual** NPM row for hub cutover.
**Repo (2026-04-13):** `update-npmplus-proxy-hosts-api.sh` uses **`SANKOFA_NPM_PHOENIX_PORT`** (default `SANKOFA_PHOENIX_API_PORT`) and **`IP_SANKOFA_NPM_PHOENIX_API`** for `phoenix.sankofa.nexus` / `www.phoenix`. See [SANKOFA_API_HUB_NPM_CUTOVER_AND_POST_CUTOVER_RUNBOOK.md](../03-deployment/SANKOFA_API_HUB_NPM_CUTOVER_AND_POST_CUTOVER_RUNBOOK.md).
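The precedence chain can be sketched as a tiny helper (illustrative; the fleet script reads the real env vars, and the final default of `4000` is an assumption mirroring `SANKOFA_PHOENIX_API_PORT`):

```shell
# Sketch of the NPM-upstream port resolution: explicit hub override wins,
# then the direct Apollo port, then an assumed default.
resolve_npm_phoenix_port() {
  # $1: SANKOFA_NPM_PHOENIX_PORT (override, may be empty)
  # $2: SANKOFA_PHOENIX_API_PORT (direct Apollo port, may be empty)
  echo "${1:-${2:-4000}}"
}
```

Setting the override to `8080` flips NPM to the hub without disturbing the direct-port default used everywhere else.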
### 3.3 `PROXMOX_HOST` for install script (low)
**Issue:** `install-sankofa-api-hub-nginx-on-pve.sh` defaults `PROXMOX_HOST` to r630-01. For hub on **r630-04**, operator must export `PROXMOX_HOST`—easy to miss.
**Remediation:** The script header already mentions this; add a **one-line echo** of the resolved host at the start of `--apply` (partially done); extend dry-run to print the `get_host_for_vmid` suggestion when `SANKOFA_API_HUB_TARGET_NODE` is set (future env).
**Repo (2026-04-13):** Header states **PROXMOX_HOST = PVE node**; dry-run prints **`get_host_for_vmid`** when `load-project-env.sh` is sourced.
---
## 4. Hyperscaler model — internal tensions
### 4.1 “Single edge” vs NPM reality
**Tension:** Model says NPM is the **only** public entry contract. Technically true for TLS, but **NPM** often implements **one proxy host per FQDN**. Hyperscalers use **one ALB** with many rules. **Semantic alignment:** treat NPM as **ALB-equivalent**; “single edge” means **single trust and cert pipeline**, not literally one row.
### 4.2 Static-first IRU / marketplace
**Tension:** [SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md](./SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md) suggests static export for IRU/marketplace **where compatible**. Today much of partner discovery is **dynamic** (`dbis_core` + Phoenix marketplace). **Over-optimistic** without a “dynamic shell + CDN” alternative.
**Remediation:** In NON_CHAIN doc §3, clarify **Edge-static** is for **marketing and post-login SPAs that only call APIs**; **IRU public catalog** may remain **Edge-SSR** or **API-driven SPA** until a static export pipeline exists.
### 4.3 Token-aggregation and “chain plane” boundary
**Tension:** [NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md](./NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md) excludes **token-aggregation runtime tied to chain RPC**. Many deployments colocate **token-aggregation** with **explorer** or **info** nginx—**hybrid**. Risk: teams mis-classify a service and consolidate wrong CT.
**Remediation:** Add one line: **“Token-aggregation API that only proxies to public RPC may be treated as edge-adjacent; workers that hold keys or execute chain writes stay chain-plane.”**
### 4.4 Postgres coupling
**Tension:** r630 doc says stack is **tightly coupled** for latency. Hyperscaler “managed DB” often implies **network separation**. Acceptable as **single-AZ** pattern; document **when** splitting Phoenix API from **7803** Postgres requires **read replicas** or **connection pooler** (PgBouncer) first.
---
## 5. Missing runbook sections (add over time)
| Missing item | Why it matters |
|--------------|----------------|
| **Backup/restore** before hub install and before `pct migrate` | Hub nginx does not replace backup discipline for Postgres / Keycloak. |
| **Keycloak redirect URIs** when origins move to web hub IP/hostnames | OIDC failures post-cutover. |
| **Certificate issuance** when many FQDNs share one upstream IP | NPM still requests certs per host; rate limits / ACME. |
| **Rollback:** restore NPM upstream + `systemctl start nginx` on 7800? | Dual-stack rollback path. |
| **SLO / error budget** | Hyperscaler practice; currently implicit. |
| **CI for `nginx -t`** on example configs | GitHub Actions: `.github/workflows/validate-sankofa-nginx-examples.yml` (Gitea: mirror or add equivalent workflow). |
---
## 6. Document maintenance items (quick fixes)
1. **Consolidated doc §5** — ensure artifact table always lists **`install-sankofa-api-hub-nginx-on-pve.sh`** and **`verify-sankofa-consolidated-hub-lan.sh`** next to other operator scripts.
2. **Consolidated §3.2 Tier 1** — prefer **LAN upstream to `dbis_core`** as the default narrative (colocated `127.0.0.1:3000` is the special case). **Clarified** in repo.
3. **Decision log** — “Web hub pattern” vs filled API tier: use **TBD / interim** until a web hub is chosen. **Updated** in repo.
4. **This file** linked from [NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md](./NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md) §6 and [MASTER_INDEX.md](../MASTER_INDEX.md).
---
## 7. Prioritized remediation backlog
| Priority | Item | Owner |
|----------|------|--------|
| P0 | Verify `TRUST_PROXY` + **`TRUST_PROXY_HOPS`** + production trust boundaries for `dbis_core` when using hub | **LAN:** `TRUST_PROXY=1` on **10150/10151** via `ensure-dbis-api-trust-proxy-on-ct.sh`; validate rate-limit keys from two public IPs |
| P0 | WebSocket E2E through NPM after hub port change | **Done:** `smoke-phoenix-graphql-wss-public.sh` → **HTTP 101**; `pnpm run verify:phoenix-graphql-ws-subscription` → **connection_ack** (remove unused `@fastify/websocket` on 7800 if RSV1; see runbook). |
| P1 | CORS / allowed origins list includes all browser callers | App + API |
| P1 | Firewall or bind Apollo to localhost after NPM → 8080 | **Done:** `ensure-sankofa-phoenix-apollo-bind-loopback-7800.sh` on **7800** (or use firewall plan if HOST cannot be set) |
| P2 | Explicit `get_host_for_vmid` entries for 7800–7806 | **Done** in `load-project-env.sh` — re-verify on migrate |
| P2 | NPM fleet **`SANKOFA_NPM_PHOENIX_PORT`** / **`IP_SANKOFA_NPM_PHOENIX_API`** | **Done** in `update-npmplus-proxy-hosts-api.sh` |
| P3 | Backup/rollback runbook sections | [SANKOFA_API_HUB_NPM_CUTOVER_AND_POST_CUTOVER_RUNBOOK.md](../03-deployment/SANKOFA_API_HUB_NPM_CUTOVER_AND_POST_CUTOVER_RUNBOOK.md) §0 / §5 |
| P3 | Clarify static-first vs dynamic IRU in NON_CHAIN §3 | Docs |
---
## 8. Conclusion
The plan is **directionally sound**: chain plane separation, cell typing, phased offload from r630-01, and Tier-1 API hub are **consistent**. The largest **gaps** are **operational truth** items (client IP trust, WebSockets, CORS wording, dual-port exposure) and **automation drift** (NPM scripts vs new env vars, implicit VMID→host). Closing **P0–P1** before wide NPM cutover matches how hyperscalers treat **ingress migrations**: prove identity and transport contracts first, then shift traffic.


@@ -0,0 +1,158 @@
# Sankofa Phoenix — consolidated non-chain frontend and API hub
**Status:** Architecture proposal (resource conservation)
**Last updated:** 2026-04-13
**LAN status (operator):**
- Tier-1 API hub: **nginx on VMID 7800**, listening on **`http://192.168.11.50:8080`** (`sankofa-phoenix-api-hub.service`).
- Apollo (Fastify) binds **`127.0.0.1:4000`** only (`HOST=127.0.0.1` in `/opt/sankofa-api/.env`; apply with `scripts/deployment/ensure-sankofa-phoenix-apollo-bind-loopback-7800.sh`).
- NPM → **:8080** with WebSocket upgrades is live for `phoenix.sankofa.nexus` (fleet 2026-04-13).
- Install hub: `scripts/deployment/install-sankofa-api-hub-nginx-on-pve.sh` with `PROXMOX_OPS_APPLY=1` + `PROXMOX_OPS_ALLOWED_VMIDS=7800`.
- Readiness: `scripts/verify/verify-sankofa-consolidated-hub-lan.sh`; hub GraphQL: `scripts/verify/smoke-phoenix-api-hub-lan.sh`; WebSocket upgrade: `scripts/verify/smoke-phoenix-graphql-wss-public.sh` (`pnpm run verify:phoenix-graphql-wss`); graphql-ws handshake: `pnpm run verify:phoenix-graphql-ws-subscription`; hub `/graphql-ws` headers: `scripts/deployment/ensure-sankofa-phoenix-api-hub-graphql-ws-proxy-headers-7800.sh`.
**r630-01 load goal:** consolidating frontends and **moving hub LXCs** to quieter nodes is what reduces guest count and hypervisor pressure — see [SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md](../03-deployment/SANKOFA_R630_01_CONSOLIDATION_AND_HUB_PLACEMENT_GOAL.md).
**Ecosystem shape (non-chain, hyperscaler-style):** [NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md](./NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md) (cell types, edge vs chain plane).
**Scope:** Non-blockchain Sankofa / Phoenix surfaces only. **Out of scope:** Chain 138 explorer, Besu/RPC, CCIP/relayers, token-aggregation compute — keep those on dedicated LXCs/VMs per existing runbooks.
---
## 1. Problem
Today, multiple LXCs/VMIDs often run **one primary workload each** (portal, corporate web, Phoenix API, DBIS API, gov dev shells, etc.). Each Node or Next process carries **base RAM** (V8 heap, file watchers in dev, separate copies of dependencies). Nginx-only static sites are cheap; **many separate Node servers are not**.
This document defines a **consolidated runtime** that:
1. Puts **all non-chain web frontends** behind **one LAN endpoint** (one LXC or one Docker host — your choice), using **static-first** or **one Node process** where SSR is required.
2. Puts **all Phoenix-facing backend traffic** behind **one logical API** (one public origin and port): GraphQL (current Phoenix), REST/BFF (`dbis_core` and future middleware), health, and webhooks.
Canonical surface taxonomy remains [SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md](./SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md). Consolidation changes **packaging**, not the names of visitor vs client vs operator paths.
---
## 2. Single “web hub” LXC (frontends)
### 2.1 Option A — Static-first (lowest RAM)
**When:** Marketing pages, IRU/marketplace **after** static export, simple entity microsites, post-login SPAs that call the API hub only.
- Build: `next build` with `output: 'export'` **where compatible** (no server-only APIs on those routes).
- Serve: **nginx** with one `server` per FQDN (`server_name`) or one server + `map $host $site_root` → different `root` directories under `/var/www/...`.
- **NPM:** All affected FQDNs point to the **same** upstream `http://<WEB_HUB_IP>:80`.
**Tradeoff:** NextAuth / OIDC callback flows and server components need either **client-only OIDC** (PKCE) against Keycloak or a **small** SSR slice (see option B).
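The `map $host` pattern can be sketched as follows (hostnames and paths are illustrative, not the deployed config — `config/nginx/sankofa-non-chain-frontends.example.conf` is canonical):

```nginx
# One nginx server, many static sites: map the Host header to a document root.
map $host $site_root {
    default                 /var/www/default;      # unknown hosts get a safe fallback
    sankofa.nexus           /var/www/sankofa;      # illustrative hostnames
    www.sankofa.nexus       /var/www/sankofa;
    status.sankofa.nexus    /var/www/status;
}

server {
    listen 80;
    server_name _;                                 # catch-all; NPM terminates TLS upstream
    root $site_root;
    location / { try_files $uri $uri/ /index.html; }  # SPA-friendly fallback
}
```

Adding a site becomes one `map` line plus a new directory, with no extra processes or CTs.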
### 2.2 Option B — One Node process for all SSR Next apps (moderate RAM)
**When:** Portal (`portal.sankofa.nexus`), admin, or any app that must keep `getServerSideProps`, NextAuth, or middleware.
- **Monorepo** (e.g. Turborepo/Nx): multiple Next “apps” merged into **one deployable** using:
- **Next multi-zone** (primary + mounted sub-apps), or
- **Single Next 15 app** with `middleware.ts` rewriting by `Host`, or
- **Single custom server** (less ideal) proxying to child apps — avoid unless necessary.
**Outcome:** One `node` process (or one `standalone` output + one PID supervisor) on **one port** (e.g. 3000). Nginx in front optional (TLS termination usually at NPM).
### 2.3 Option C — Hybrid (practical migration)
- **nginx:** static corporate apex, static entity sites, docs mirrors.
- **One Node:** portal + Phoenix “shell” that must stay dynamic.
Still **fewer** LXCs than “one LXC per microsite.”
### 2.4 What stays out of this box
- Blockscout / explorer stacks
- `info.defi-oracle.io`, MEV GUI, relay health — separate nginx LXCs as today unless you explicitly merge **static** mirrors only
- Keycloak — **keep separate** (identity is its own security domain)
---
## 3. Single consolidated API (Phoenix hub)
### 3.1 Responsibilities
| Path family | Today (typical) | Hub role |
|-------------|-----------------|----------|
| `/graphql`, `/graphql-ws` | Phoenix VMID 7800 :4000 | **Reverse proxy** to existing Apollo until merged in code |
| `/api/v1/*`, `/api-docs` | `dbis_core` (e.g. :3000) | **Reverse proxy** mount |
| `/health` | Multiple | **Aggregate** (optional): hub returns 200 only if subgraphs pass |
| Future BFF | N/A | **Implement in hub** (session, composition, rate limits) |
**Naming:** Introduce an internal service name e.g. `sankofa-phoenix-hub-api`. Public FQDN can remain `phoenix.sankofa.nexus` or split to `api.phoenix.sankofa.nexus` for clarity; NPM decides.
### 3.2 Implementation tiers (phased)
**Tier 1 — Thin hub (fastest, lowest risk)**
One process: **nginx** or **Caddy**. **Typical production pattern:** hub on its own LXC or same CT as Apollo — `proxy_pass` Phoenix to **`127.0.0.1:4000`** when colocated, and `dbis_core` to **`IP_DBIS_API:3000`** (LAN) as in `install-sankofa-api-hub-nginx-on-pve.sh`. **Single public port** (e.g. 443 behind NPM → **8080** on the hub). Before NPM sends public traffic to the hub, validate **`TRUST_PROXY`** and trusted proxy hops for `dbis_core` (see [NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md](./NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md) §2.1).
**Tier 2 — Application hub**
Single **Node** (Fastify/Express) app: validates JWT once, applies rate limits, `proxy` to subgraphs, adds **BFF** routes (`/bff/portal/...`).
**Tier 3 — Monolith (long-term)**
Merge routers and schema into one codebase — only after boundaries and ownership are clear.
### 3.3 Middleware cross-cutting
Centralize in the hub:
- **CORS** allowlist (origins = **all documented browser origins** that call Phoenix or `dbis_core` from the browser — portal, admin, studio, marketplace SPAs — not only web hub FQDNs)
- **Rate limiting** (especially IRU public POST — align with `dbis_core` **`TRUST_PROXY=1`** and a **trusted proxy list** that includes NPM and this hub's LAN IP, or rate limits see only the hub)
- **Request ID** propagation
- **mTLS** or IP allowlist for operator-only routes (optional)
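One way to centralize the CORS allowlist at the hub (origins are illustrative, and app-level CORS in Apollo or `dbis_core` may already cover this — treat as a sketch, not the deployed policy):

```nginx
# Echo the Origin header back only for known browser callers; unknown origins
# get an empty value, so no Access-Control-Allow-Origin header is emitted.
map $http_origin $cors_origin {
    default                          "";
    https://portal.sankofa.nexus     $http_origin;
    https://studio.sankofa.nexus     $http_origin;   # hypothetical SPA origin
}

# Inside each proxied location:
#   add_header Access-Control-Allow-Origin $cors_origin always;
```

Keeping the list in one `map` makes the "every browser origin" remediation from the gaps review a one-file change.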
---
## 4. NPM and inventory
After cutover:
- **Fewer distinct upstream IPs** in NPM (many FQDNs can point at the **same** `IP:port`); NPM may still use **one proxy host record per FQDN** for TLS—equivalent to one ALB with many listener rules, not literally one row total. Host-based routing then lives in **web hub** nginx (`server_name` / `map`) or in **Next** `middleware.ts`.
- Update [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md) and `get_host_for_vmid` in `scripts/lib/load-project-env.sh` when VMIDs are **retired** or **replaced** by hub VMIDs.
- **`config/ip-addresses.conf`** defines optional hub variables that **default to the current discrete CT IPs** (`IP_SANKOFA_WEB_HUB` → portal IP, `IP_SANKOFA_PHOENIX_API_HUB` → Phoenix API IP). Override in `.env` when hub LXCs exist.
---
## 5. Concrete file references in this repo
| Artifact | Purpose |
|----------|---------|
| [config/nginx/sankofa-non-chain-frontends.example.conf](../../config/nginx/sankofa-non-chain-frontends.example.conf) | Example **host → static root** nginx for web hub |
| [config/nginx/sankofa-phoenix-api-hub.example.conf](../../config/nginx/sankofa-phoenix-api-hub.example.conf) | Example **path → upstream** for API hub (Tier 1); tune `upstream` to LAN or `127.0.0.1` when colocated |
| [config/nginx/sankofa-hub-main.example.conf](../../config/nginx/sankofa-hub-main.example.conf) | Top-level `nginx.conf` for web hub CT (`-c` for systemd) |
| [config/nginx/sankofa-api-hub-main.example.conf](../../config/nginx/sankofa-api-hub-main.example.conf) | Top-level `nginx.conf` for API hub CT |
| [config/systemd/sankofa-non-chain-web-hub-nginx.service.example](../../config/systemd/sankofa-non-chain-web-hub-nginx.service.example) | systemd unit for web hub nginx |
| [config/systemd/sankofa-phoenix-api-hub-nginx.service.example](../../config/systemd/sankofa-phoenix-api-hub-nginx.service.example) | systemd unit for API hub nginx |
| [config/compose/sankofa-consolidated-runtime.example.yml](../../config/compose/sankofa-consolidated-runtime.example.yml) | Optional Docker Compose sketch (API hub container only) |
| [scripts/verify/check-sankofa-consolidated-nginx-examples.sh](../../scripts/verify/check-sankofa-consolidated-nginx-examples.sh) | **`nginx -t`** on example snippets (host `nginx` or **Docker** fallback) |
| [scripts/deployment/plan-sankofa-consolidated-hub-cutover.sh](../../scripts/deployment/plan-sankofa-consolidated-hub-cutover.sh) | Read-only cutover reminder + resolved env from `load-project-env.sh` |
| [scripts/deployment/install-sankofa-api-hub-nginx-on-pve.sh](../../scripts/deployment/install-sankofa-api-hub-nginx-on-pve.sh) | Tier-1 hub install on CT (`--dry-run` / `--apply` + `PROXMOX_OPS_*`) |
| [scripts/verify/verify-sankofa-consolidated-hub-lan.sh](../../scripts/verify/verify-sankofa-consolidated-hub-lan.sh) | Read-only LAN smoke (Phoenix, portal, dbis `/health`, Keycloak realm) |
---
## 6. Operator cutover checklist (complete in order)
1. Run `bash scripts/verify/check-sankofa-consolidated-nginx-examples.sh` (CI or laptop).
2. Provision **one** non-chain web hub LXC and/or **one** API hub LXC (or colocate nginx on an existing CT — document the choice).
3. Copy and edit nginx snippets from `config/nginx/` into `/etc/sankofa-web-hub/` and `/etc/sankofa-phoenix-api-hub/` per systemd examples; install **systemd** units from `config/systemd/*.example` (drop `.example`, adjust paths).
4. Set **`.env`** overrides: `IP_SANKOFA_WEB_HUB`, `SANKOFA_WEB_HUB_PORT`, `IP_SANKOFA_PHOENIX_API_HUB`, `SANKOFA_PHOENIX_API_HUB_PORT` (see `plan-sankofa-consolidated-hub-cutover.sh` output after `source scripts/lib/load-project-env.sh`).
5. **Dry-run** NPM upstream changes; then apply during a maintenance window. Confirm **WebSocket** (GraphQL subscriptions) through NPM if clients use `graphql-ws`.
6. Smoke: `curl -fsS http://<API_HUB>:<PORT>/health`, a GraphQL POST to `/graphql`, and a **`dbis_core`** check through the hub — simplest: `curl` **`http://<hub>:<port>/api-docs`** (proxied). Note `dbis_core` health is **`/health`** / **`/v1/health`** at the upstream root, not under `/api/v1/`; see [NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md](./NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md) §2.4.
7. Update inventory docs and VMID table; decommission retired CTs only after rollback window. Optionally **bind Apollo to 127.0.0.1:4000** or firewall **:4000** from LAN once NPM uses hub only ([NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md](./NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md) §2.5).
---
## 7. Related docs
- [SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md](./SANKOFA_PHOENIX_CANONICAL_BOUNDARIES_AND_TAXONOMY.md)
- [SANKOFA_MARKETPLACE_SURFACES.md](../03-deployment/SANKOFA_MARKETPLACE_SURFACES.md)
- [ENTITY_INSTITUTIONS_WEB_PORTAL_COMPLETION.md](../03-deployment/ENTITY_INSTITUTIONS_WEB_PORTAL_COMPLETION.md)
- [SERVICE_DESCRIPTIONS.md](./SERVICE_DESCRIPTIONS.md)
- [NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md](./NON_CHAIN_ECOSYSTEM_PLAN_REVIEW_AND_GAPS.md) (gaps, inconsistencies, P0/P1 backlog)
---
## 8. Decision log (fill when adopted)
| Decision | Choice | Date |
|----------|--------|------|
| Web hub pattern | **TBD** (interim: discrete CTs; target: A / B / C) | |
| API hub Tier | **1** (nginx on VMID 7800, LAN 2026-04-13) | 2026-04-13 |
| Public API hostname | phoenix.sankofa.nexus (NPM → **8080** hub; Apollo **127.0.0.1:4000**) | 2026-04-13 |
| Retired VMIDs | none | |


@@ -1,10 +1,18 @@
# Sankofa Services - Service Descriptions
**Last Updated:** 2026-03-25
**Last Updated:** 2026-04-13
**Status:** Active Documentation
---
## Consolidated runtime (optional)
To reduce LXC count for **non-chain** web and to expose **one** Phoenix-facing API origin (GraphQL + `dbis_core` REST behind path routes), see [SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md](./SANKOFA_PHOENIX_CONSOLIDATED_FRONTEND_AND_API.md). `config/ip-addresses.conf` adds `IP_SANKOFA_WEB_HUB` and `IP_SANKOFA_PHOENIX_API_HUB` (defaulting to today's portal and Phoenix API IPs until you set hub LXCs in `.env`). Blockchain-adjacent stacks (explorer, RPC, relayers) stay **out** of this consolidation.
For **how** the non-chain fleet should be designed (edge cells, API hub, IdP, data) in hyperscaler-style terms—**excluding** the blockchain plane—see [NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md](./NON_CHAIN_ECOSYSTEM_HYPERSCALER_STYLE_MODEL.md).
---
## Brand and Product Relationship
### Company and Product Analogy
@@ -41,8 +49,8 @@ This document describes the purpose and function of each service in the Sankofa
- **Purpose:** Cloud infrastructure management portal (API service)
- **VMID:** 7800
- **IP:** 192.168.11.50
- **Port:** 4000
- **External Access:** https://phoenix.sankofa.nexus, https://www.phoenix.sankofa.nexus
- **Port:** **4000** (Apollo direct) and **`8080`** (optional Tier-1 **API hub** nginx: `/graphql` → 4000, `/api/` → `dbis_core` on `IP_DBIS_API:3000`)
- **External Access:** https://phoenix.sankofa.nexus, https://www.phoenix.sankofa.nexus (NPM upstream may stay **4000** until you cut over to **8080**)
**Details:**
- GraphQL API service for Phoenix cloud platform