diff --git a/docs/03-deployment/PUBLIC_SECTOR_LIVE_DEPLOYMENT_CHECKLIST.md b/docs/03-deployment/PUBLIC_SECTOR_LIVE_DEPLOYMENT_CHECKLIST.md new file mode 100644 index 0000000..3e30ff7 --- /dev/null +++ b/docs/03-deployment/PUBLIC_SECTOR_LIVE_DEPLOYMENT_CHECKLIST.md @@ -0,0 +1,117 @@ +# Public sector live deployment checklist (Complete Credential, SMOA, Phoenix) + +**Last updated:** 2026-03-23 +**Related:** [PUBLIC_SECTOR_TENANCY_MARKETPLACE_AND_DEPLOYMENT_BASELINE.md](../02-architecture/PUBLIC_SECTOR_TENANCY_MARKETPLACE_AND_DEPLOYMENT_BASELINE.md), [COMPLETE_CREDENTIAL_EIDAS_PROGRAM_REPOS.md](../11-references/COMPLETE_CREDENTIAL_EIDAS_PROGRAM_REPOS.md), [DEPLOY_CONFIRM_AND_FULL_E2E_RUNBOOK.md](../00-meta/DEPLOY_CONFIRM_AND_FULL_E2E_RUNBOOK.md), [`config/public-sector-program-manifest.json`](../../config/public-sector-program-manifest.json) + +This checklist tracks **proxmox-repo automation** and **sibling repos** (`../complete-credential`, `../smoa`). Rows marked **Done (session)** were executed from an operator host with LAN access unless noted. + +--- + +## Execution log (2026-03-23) + +| Action | Result | +|--------|--------| +| Sankofa `api` + `portal` (workstation) | API: `websocket.ts` imports `logger`. Portal: `@apollo/client` + `graphql`; `src/lib/graphql/queries/dashboard.ts`; `ApolloProvider` in `providers.tsx`; `next.config.js` skips ESLint + TS build errors (legacy debt). **`pnpm build` succeeds** in `../Sankofa/portal`. 
**Deploy:** sync `portal/` (+ lockfile) to CT **7801**, `pnpm install && pnpm build`, restart `sankofa-portal`; re-sync API if websocket change not on **7800** | + +## Execution log (2026-03-26) + +| Action | Result | +|--------|--------| +| `./scripts/run-all-operator-tasks-from-lan.sh` (live, no `--dry-run`) | Exit 0 (~36 min); W0-1 NPMplus RPC/proxy host updates; W0-3 live NPMplus backup; Blockscout verification step ran | +| NPMplus update script | Some hosts logged duplicate-create then PUT recovery; `rpc.tw-core.d-bis.org` and `*.tw-core.d-bis.org` showed repeated failures — **review those rows in NPM UI** if traffic depends on them | +| `scripts/maintenance/diagnose-vm-health-via-proxmox-ssh.sh` | Completed: Phoenix CTs **7800–7803** running on r630-01; NPMplus **10233** up; port 81 check OK | +| `scripts/maintenance/npmplus-verify-port81.sh` | **Restored** in repo; loopback :81 returns HTTP 301 (redirect) — treated as reachable | +| `pct exec 7800` / `7801`: `ss -tlnp` | **As of 2026-03-26 session:** no listeners. 
**As of 2026-03-23 follow-up:** **7800** API can reach `active` + `/health` on **:4000** when `sankofa-api` is deployed; **7801** portal needs **current** portal tree + successful **`pnpm build`** on the CT (see 2026-03-23 log row above) | +| `pct exec 7802` Keycloak | `http://127.0.0.1:8080/` → **200**; `/health/ready` → 404 (version may use different health path) | +| `./scripts/run-completable-tasks-from-anywhere.sh` | Exit 0 | +| `E2E_ACCEPT_502_INTERNAL=1 ./scripts/verify/verify-end-to-end-routing.sh` | 0 failed; report `docs/04-configuration/verification-evidence/e2e-verification-20260325_182512/` | +| `./scripts/verify/run-contract-verification-with-proxy.sh` | Exit 0 | +| `complete-credential` Phase 1 compose + `run-phase1-synthetic.sh` | OK (operator console 8087 = 200) | +| `../smoa`: `./gradlew :app:assembleDebug` | BUILD SUCCESSFUL; APK: `smoa/app/build/outputs/apk/debug/app-debug.apk` | + +--- + +## Execution log (2026-03-25) + +| Action | Result | +|--------|--------| +| RPC `192.168.11.221:8545` / `192.168.11.211:8545` | HTTP 201 | +| SSH `root@192.168.11.10` / `.11` | OK (BatchMode) | +| `./scripts/run-completable-tasks-from-anywhere.sh` | Exit 0 | +| `./scripts/verify/check-contracts-on-chain-138.sh` | 59/59 present | +| `E2E_ACCEPT_502_INTERNAL=1 ./scripts/verify/verify-end-to-end-routing.sh` | 37 domains, 0 failed; report under `docs/04-configuration/verification-evidence/e2e-verification-20260325_165153/` | +| `https://phoenix.sankofa.nexus/`, `https://sankofa.nexus/` | HTTP 200 | +| `http://192.168.11.50:4000/health`, `:51:3000`, `:52:8080/health/ready` | No HTTP response from operator host (hosts ping; services may be down, firewalled, or not bound) — **re-check on Proxmox / in-container** | +| `./scripts/verify/backup-npmplus.sh --dry-run` | OK | +| `./scripts/verify/run-contract-verification-with-proxy.sh` | Exit 0 | +| `./scripts/run-all-operator-tasks-from-lan.sh --dry-run` | Printed wave0 + verify sequence | +| `cd smom-dbis-138 && forge 
test --match-path 'test/e2e/*.sol'` | Exit 0 | +| `cd ../smoa && ./gradlew smoaVerify --no-daemon` | Exit 0 | +| `complete-credential`: `git submodule status` | Submodules present on commits | +| `docker compose -f integration/docker-compose.phase1.yml config` | Valid | +| `docker compose -f integration/docker-compose.phase1.yml up -d` | All Phase 1 containers up | +| Rebuild + recreate `cc-operator-console`; `./integration/run-phase1-synthetic.sh` | OK | + +--- + +## Checklist + +| ID | Task | Status | +|----|------|--------| +| A1 | LAN / VPN; Proxmox SSH | Done (session) | +| A2 | Root `.env` + `smom-dbis-138/.env` for operator | Operator to confirm secrets present | +| A3 | `config/public-sector-program-manifest.json` valid | Done (completable) | +| B1 | NPMplus proxy + TLS for public FQDNs | **Done (2026-03-26)** — `run-wave0-from-lan.sh` / update script applied; spot-check `rpc.tw-core` / `*.tw-core` in NPM if needed | +| B2 | `scripts/verify/backup-npmplus.sh` (live) | **Done (2026-03-26)** — W0-3 as part of `run-all-operator-tasks-from-lan.sh` | +| B3 | `scripts/maintenance/npmplus-verify-port81.sh` | **Done** — script restored; SSH `pct exec 10233` loopback :81 | +| C1 | Phoenix stack VMIDs 7800–7803 per `SERVICE_DESCRIPTIONS.md` | **7802 Keycloak:** HTTP 200 on `/` inside CT. **7800 API:** deploy/restart `sankofa-api` — expect listener **:4000**. **7801 portal:** deploy latest portal artifact (see 2026-03-23 log) — expect **:3000** | +| C2 | Keycloak realms: admin / tenant / org-unit RBAC | Product + IdP work — not automated here | +| C3 | Phoenix API + portal wired; GraphQL `/graphql`, `/health` | **API:** curl `http://192.168.11.50:4000/health` (and `/graphql` as needed). 
**Portal:** after **7801** build, curl `http://192.168.11.51:3000/` | +| C4 | Service catalog SKUs + entitlements (billing optional) | Product — see tenancy baseline G2 | +| D1 | SMOA LXC per `smoa/backend/docs/LXC-PROXMOX-CONTAINERS.md` | Deploy on Proxmox | +| D2 | SMOA API behind NPM | After D1 | +| D3 | Release APK + download URL or MDM | **Debug APK built (2026-03-26):** `../smoa/app/build/outputs/apk/debug/app-debug.apk` — publish via CI signed release + NPM/static URL or MDM | +| D4 | Device E2E against prod API | After D2–D3 | +| E1 | `complete-credential` submodules initialized | Done (session) | +| E2 | Phase 1 Docker stack local/CI | Done (session) — not yet Proxmox production | +| E3 | `./integration/run-phase1-synthetic.sh` after console rebuild | Done (session) | +| E4 | Production slice / dedicated LXC for `cc-*` | Architecture choice (profile A/B/C) | +| F1 | Chain 138 on-chain contract check | Done (session) | +| F2 | Blockscout verification | Done (session) | +| F3 | Public E2E routing | Done (session, 502-tolerant flag) | +| G1 | Logs, metrics, DB backups for Phoenix + SMOA + CC DBs | Operational runbooks | +| G2 | Incident ownership per stack | Process | + +--- + +## Quick commands (repo root unless noted) + +```bash +./scripts/run-completable-tasks-from-anywhere.sh +source scripts/lib/load-project-env.sh && ./scripts/verify/check-contracts-on-chain-138.sh +E2E_ACCEPT_502_INTERNAL=1 ./scripts/verify/verify-end-to-end-routing.sh +./scripts/verify/run-contract-verification-with-proxy.sh +./scripts/verify/backup-npmplus.sh --dry-run # then run without --dry-run +``` + +**Complete Credential (sibling clone):** + +```bash +cd ../complete-credential +docker compose -f integration/docker-compose.phase1.yml up -d --build +./integration/run-phase1-synthetic.sh +``` + +**SMOA:** + +```bash +cd ../smoa && ./gradlew smoaVerify --no-daemon +``` + +--- + +## Follow-ups + +1. 
**Phoenix LAN services:** From a host on the same L2 segment as `192.168.11.50–53`, curl the API `/health` endpoint on port **4000** and the portal on port **3000**; if either is down, start the CTs/VMs on Proxmox and confirm the process listeners. +2. **Operator full wave:** Run `./scripts/run-all-operator-tasks-from-lan.sh` only when the NPM RPC fix, backup, and verify steps are all intended (the script mutates NPM). +3. **Production Complete Credential:** Move from laptop Docker to a **dedicated LXC** with NPM routes per deployment profile.
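The Phoenix LAN check in follow-up 1 could be scripted roughly as below. This is a minimal sketch, not a script from the repo: the function names (`classify_status`, `http_status`, `probe_all`) and the `--run` flag are hypothetical, and the Keycloak probe assumes `/` is the right path from the LAN because the log above records `/` returning 200 and `/health/ready` returning 404 inside the CT. Redirects count as reachable, mirroring how the port 81 check treats a 301.

```bash
#!/usr/bin/env bash
# Sketch: probe the Phoenix LAN endpoints named in this checklist.
# All function names and the --run flag are illustrative assumptions.

# Map a curl HTTP status code to a verdict.
# 2xx and 3xx count as reachable; 000 (or empty) means nothing answered.
classify_status() {
  case "$1" in
    2??|3??) echo "reachable" ;;
    000|"")  echo "no-response" ;;
    *)       echo "responding-with-error" ;;
  esac
}

# Print the HTTP status for a URL; curl writes 000 when the connection fails.
http_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1" 2>/dev/null || true
}

# Probe the API (7800), portal (7801), and Keycloak (7802) addresses.
probe_all() {
  for url in \
    "http://192.168.11.50:4000/health" \
    "http://192.168.11.51:3000/" \
    "http://192.168.11.52:8080/"; do
    code="$(http_status "$url")"
    printf '%s\t%s\t%s\n' "$url" "${code:-000}" "$(classify_status "$code")"
  done
}

# Only touch the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  probe_all
fi
```

Saved under a name of your choosing and run with `--run` from a LAN host, it prints one tab-separated line per endpoint. A `no-response` verdict is the cue to check listeners inside the container with `pct exec <vmid> -- ss -tlnp`, as in the execution log.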