# Next Steps — Operator Runbook
**Last Updated:** 2026-02-20
**Purpose:** Single runbook of copy-paste commands for all remaining operator/LAN/creds steps. Use after automated steps are done.
**References:**
- [REMAINING_WORK_DETAILED_STEPS.md](REMAINING_WORK_DETAILED_STEPS.md), [WAVE2_WAVE3_OPERATOR_CHECKLIST.md](WAVE2_WAVE3_OPERATOR_CHECKLIST.md), [INFRA_DEPLOYMENT_LOCKED_AND_LOADED.md](../03-deployment/INFRA_DEPLOYMENT_LOCKED_AND_LOADED.md)
- **Single fixes checklist (required + optional):** [FIXES_PREPARED.md](../04-configuration/FIXES_PREPARED.md)
- **Full fixes (validators, block/tx, Sentries, RPCs, network, optional):** [FULL_FIXES_PREPARED.md](../04-configuration/FULL_FIXES_PREPARED.md)
- **All next steps (consolidated):** [NEXT_STEPS_ALL.md](NEXT_STEPS_ALL.md)
- **Dev/Codespaces (76.53.10.40):** [DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md](../04-configuration/DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md)
- **Dev/Codespaces completion evidence:** [DEV_CODESPACES_COMPLETION_20260207.md](../04-configuration/verification-evidence/DEV_CODESPACES_COMPLETION_20260207.md)
---
## Completed in this session (2026-02-20)
| Item | Result |
|------|--------|
| Completable tasks | `run-completable-tasks-from-anywhere.sh` — config validation OK, on-chain 45/45, run-all-validation --skip-genesis OK, reconcile-env --print. |
| Doc consolidation | NEXT_STEPS_INDEX, DOCUMENTATION_CONSOLIDATION_PLAN; Batch 4+5 → 00-meta-pruned; root cleanup → archive/root-cleanup-20260220; ARCHIVE_CANDIDATES "Last reviewed" set. |
## Completed in previous session (2026-02-19)
| Item | Result |
|------|--------|
| Completable tasks | `run-completable-tasks-from-anywhere.sh` — config, 46 on-chain, validation passed. |
| Operator script | `run-all-operator-tasks-from-lan.sh` — W0-1 skipped (off-LAN); Blockscout verify attempted (Blockscout unreachable). |
| RPC 2101 verify | `verify-rpc-2101-approve-and-sync.sh` — ✅ Chain 138, 19 peers, 5 validators, blocks advancing. |
| 502 script | `address-all-remaining-502s.sh` — backends 10130/10150/10151 OK; Besu 2101 restarted (finish from LAN for NPMplus). |
| Optional Phase 9 | Smart accounts kit (informational) — ran; next: deploy EntryPoint/AccountFactory/Paymaster. |
| E2E verification | `verify-end-to-end-routing.sh` with E2E_ACCEPT_502_INTERNAL=1 — run (report in verification-evidence). |

**Still from LAN:** NPMplus backup, Blockscout verification, full 502/NPMplus proxy update. See [COMPLETION_STATUS_20260215](../archive/00-meta-pruned/COMPLETION_STATUS_20260215.md).

---
## Completed in previous session (2026-02-06)
| Item | Result |
|------|--------|
| Validation | `run-all-validation.sh --skip-genesis` — passed |
| W1-1 dry-run | `setup-ssh-key-auth.sh --dry-run` — steps printed |
| W1-2 dry-run | `firewall-proxmox-8006.sh --dry-run` — UFW commands printed (ADMIN_CIDR=192.168.11.0/24) |
| NPMplus backup | `backup-npmplus.sh` — ran successfully (local + on host); backup pulled to `backups/npmplus/backup-20260206_171756.tar.gz` |
| Bridge dry-run | `run-send-cross-chain.sh 0.01 --dry-run` — simulated (real run when PRIVATE_KEY/LINK ready) |
| .env NPM | NPM_URL/NPM_HOST set to 192.168.11.167:81 (use .167 if .166 refuses) |
| **Copy to host** | Scripts copied to **root@192.168.11.11:/tmp/proxmox-scripts-run** (wave0, backup, secure-validator-keys, create-missing-containers, schedule cron scripts, daily-weekly-checks) |
| **Wave 0 on host** | Ran on r630-01: W0-1 (19 NPMplus proxy hosts updated), W0-3 (backup); backup also on host at `.../backups/npmplus/backup-20260206_171756.tar.gz` |
| **Backup pulled** | Host backup copied to local `backups/npmplus/backup-20260206_171756.tar.gz` |
| **Validator keys** | `secure-validator-keys.sh --dry-run` run on host: 1000–1002 would be secured; 1003–1004 not running, skipped. To apply, re-run on the host without `--dry-run`. |
| **Cron scripts on host** | schedule-npmplus-backup-cron.sh and schedule-daily-weekly-cron.sh (and daily-weekly-checks.sh) copied; use `--show` then `--install` from `/tmp/proxmox-scripts-run` if you want cron there (note: /tmp may be cleared on reboot; for permanent cron, clone repo to a persistent path on the host). |
| **Cron installed on host** | NPMplus backup cron (03:00) and daily/weekly cron (08:00 daily, Sun 09:00 weekly) installed on root@192.168.11.11. Logs: `/tmp/proxmox-scripts-run/logs/npmplus-backup.log`, `daily-weekly-checks.log`. |
| **Validator keys applied** | `secure-validator-keys.sh` run on host (no --dry-run): VMIDs 1000, 1001, 1002 secured (chmod 600/700, chown besu); 1003, 1004 not running, skipped. |
---
## Wave 0 — Gates
### W0-2: sendCrossChain (real)
**When:** PRIVATE_KEY and LINK (or fee token) approved in `.env`; you are ready to broadcast.
```bash
cd /path/to/proxmox
# Optional: dry-run first
bash scripts/bridge/run-send-cross-chain.sh 0.01 --dry-run
# Real (no --dry-run)
bash scripts/bridge/run-send-cross-chain.sh 0.01
# Or with recipient:
bash scripts/bridge/run-send-cross-chain.sh 0.01 0xYourRecipientAddress
```
Bridge contract (reference): `0xcacfd227A040002e49e2e01626363071324f820a`. Ensure `CCIPWETH9_BRIDGE_CHAIN138` and `RPC_URL_138` (or `CHAIN138_RPC`) are set in `.env`.
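
Before the real run, a pre-flight check can confirm that the `.env` keys named above are present and non-empty without echoing their values. This is a minimal sketch that assumes `.env` holds plain `KEY=value` lines (optionally prefixed with `export`); `env_has` and `check_bridge_env` are hypothetical helpers, not part of `scripts/bridge/`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of scripts/bridge/): checks that the
# variables W0-2 relies on are set in a plain KEY=value .env file.
# Values are never printed, so PRIVATE_KEY does not end up in logs.

# env_has KEY FILE -> exit 0 if FILE has an uncommented, non-empty KEY=... line
env_has() {
  local key="$1" file="$2"
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=([^[:space:]\"']|[\"'][^\"'])" "$file"
}

# check_bridge_env [FILE] -> reports missing keys on stderr; exit 1 if any are missing
check_bridge_env() {
  local file="${1:-.env}" missing=0 key
  if [[ ! -r "$file" ]]; then
    echo "cannot read ${file}" >&2
    return 1
  fi
  for key in PRIVATE_KEY CCIPWETH9_BRIDGE_CHAIN138; do
    if ! env_has "$key" "$file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  # The runbook accepts either RPC variable name.
  if ! env_has RPC_URL_138 "$file" && ! env_has CHAIN138_RPC "$file"; then
    echo "missing: RPC_URL_138 or CHAIN138_RPC" >&2
    missing=1
  fi
  return "$missing"
}

# Usage from the repo root:
#   check_bridge_env .env && bash scripts/bridge/run-send-cross-chain.sh 0.01 --dry-run
```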
### W0-3: NPMplus backup (re-run anytime)
Backup already ran once; re-run when NPMplus is up and you want a fresh backup:
```bash
cd /path/to/proxmox
bash scripts/verify/backup-npmplus.sh
```
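
After a backup run, it is worth confirming that the newest archive is a readable gzip tarball rather than a truncated file. A minimal sketch, assuming archives land in `backups/npmplus/` with the `backup-YYYYMMDD_HHMMSS.tar.gz` naming seen in the session log; `latest_npmplus_backup` and `verify_npmplus_backup` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of scripts/verify/): find and sanity-check the
# newest NPMplus backup. Assumes names like backup-20260206_171756.tar.gz; that
# fixed-width timestamp sorts in time order, so the last glob match is newest.

# latest_npmplus_backup [DIR] -> prints the newest backup path; exit 1 if none
latest_npmplus_backup() {
  local dir="${1:-backups/npmplus}"
  local files=("$dir"/backup-*.tar.gz)
  [[ -e "${files[0]}" ]] || return 1   # an unmatched glob stays literal
  printf '%s\n' "${files[${#files[@]}-1]}"
}

# verify_npmplus_backup FILE -> exit 0 if FILE is non-empty and lists as a .tar.gz
verify_npmplus_backup() {
  local file="$1" entries
  if [[ ! -s "$file" ]]; then
    echo "empty or missing: ${file}" >&2
    return 1
  fi
  if ! tar -tzf "$file" >/dev/null 2>&1; then
    echo "not a readable .tar.gz: ${file}" >&2
    return 1
  fi
  entries=$(tar -tzf "$file" | wc -l)
  echo "OK: ${file} (${entries// /} entries)"
}

# Usage from the repo root:
#   f=$(latest_npmplus_backup) && verify_npmplus_backup "$f"
```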
From a host without NPM API access, use `bash scripts/run-via-proxmox-ssh.sh wave0 --host 192.168.11.11` (r630-01) to run W0-1 + W0-3 on the host.

---
## Crontab (install on jump host or Proxmox node)
```bash
cd /path/to/proxmox
# Show lines
bash scripts/maintenance/schedule-npmplus-backup-cron.sh --show
bash scripts/maintenance/schedule-daily-weekly-cron.sh --show
# Install
bash scripts/maintenance/schedule-npmplus-backup-cron.sh --install
bash scripts/maintenance/schedule-daily-weekly-cron.sh --install
```
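
After `--install`, the live crontab can be audited for both jobs and for entries that still point at `/tmp` (which may be cleared on reboot, as noted in the session log). A minimal sketch that reads crontab text on stdin; it assumes the installed entries invoke `backup-npmplus.sh` and `daily-weekly-checks.sh`, which matches the session log but is not confirmed here, so adjust the names to whatever `--show` prints. `cron_audit` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check (not part of scripts/maintenance/). Reads
# crontab text on stdin so it can be tried on sample text first.

# cron_audit -> exit 1 if an expected script is not scheduled; warns on /tmp paths
cron_audit() {
  local text active rc=0 script
  text="$(cat)"
  # Ignore commented-out lines.
  active="$(grep -v '^[[:space:]]*#' <<<"$text" || true)"
  for script in backup-npmplus.sh daily-weekly-checks.sh; do
    if ! grep -qF "$script" <<<"$active"; then
      echo "not scheduled: ${script}" >&2
      rc=1
    fi
  done
  # /tmp may be cleared on reboot, which silently breaks entries that point there.
  if grep -qF '/tmp/' <<<"$active"; then
    echo "warning: an entry runs from /tmp; reinstall from a persistent repo path" >&2
  fi
  return "$rc"
}

# Usage on the jump host or Proxmox node, after --install:
#   crontab -l | cron_audit
```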
---
## Wave 1 — Security (run on each Proxmox host or via SSH)
### W1-1: SSH key-based auth (disable password)
**Prerequisite:** Deploy SSH keys to all hosts (`ssh-copy-id root@<host>`), confirm that key-based login works, and keep break-glass access available.
```bash
cd /path/to/proxmox
# On each Proxmox host (or: ssh root@192.168.11.11 'cd /path/to/proxmox && bash scripts/security/setup-ssh-key-auth.sh --apply')
bash scripts/security/setup-ssh-key-auth.sh --apply
```
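
Because `--apply` disables password logins, a key-only login test against every host first reduces the risk of a lockout, and `sshd -T` (which prints sshd's effective configuration) can confirm the result afterwards. A minimal sketch using standard OpenSSH client options; `key_login_ok` and `password_auth_disabled` are hypothetical helpers and the host list in the usage comment is only an example.

```bash
#!/usr/bin/env bash
# Hypothetical pre/post checks for W1-1 (not part of scripts/security/).

# key_login_ok HOST... -> key-only, non-interactive login test per host;
# exit 1 if any host refuses, so you can stop before disabling passwords.
key_login_ok() {
  local host rc=0
  for host in "$@"; do
    # -n: don't read stdin; BatchMode: never prompt; no password fallback
    if ssh -n -o BatchMode=yes -o PasswordAuthentication=no \
        -o ConnectTimeout=5 "root@${host}" true; then
      echo "key login OK: ${host}"
    else
      echo "key login FAILED: ${host}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# password_auth_disabled -> reads `sshd -T` output on stdin; exit 0 only if
# the effective setting is "passwordauthentication no".
password_auth_disabled() {
  grep -qx 'passwordauthentication no'
}

# Usage from your admin workstation (example hosts from this runbook):
#   key_login_ok 192.168.11.10 192.168.11.11
#   # ...then run setup-ssh-key-auth.sh --apply on each host, and afterwards:
#   ssh root@192.168.11.11 'sshd -T' | password_auth_disabled && echo "password auth off"
```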
### W1-2: Firewall — restrict Proxmox API port 8006
**Prerequisite:** Run on a host that uses UFW (or apply equivalent iptables rules). Default CIDR: 192.168.11.0/24.
```bash
cd /path/to/proxmox
# Dry-run (already done)
bash scripts/security/firewall-proxmox-8006.sh --dry-run
# Apply (allow only ADMIN_CIDR)
bash scripts/security/firewall-proxmox-8006.sh --apply
# Or with custom CIDR:
bash scripts/security/firewall-proxmox-8006.sh --apply 192.168.11.0/24
```
Then verify that `https://<proxmox-ip>:8006` is reachable only from allowed IPs.
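
Before `--apply`, it helps to confirm that the workstation you administer from sits inside the chosen CIDR, so the change does not lock you out of :8006. The same block prints (but does not run) a standard accept-then-drop `iptables` pair as one possible reading of "equivalent iptables" for hosts without UFW; that rule set is an assumption, not what `firewall-proxmox-8006.sh` emits. `ip_to_int`, `ip_in_cidr`, and `print_iptables_8006` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for W1-2; nothing here changes firewall state.

# ip_to_int A.B.C.D -> prints the IPv4 address as a 32-bit integer (no validation)
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP NET/PREFIX -> exit 0 if IP falls inside the CIDR block
ip_in_cidr() {
  local ip net prefix="${2#*/}" mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  (( prefix == 0 )) && return 0   # /0 matches every address
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# print_iptables_8006 [CIDR] -> prints (does not run) an accept-then-drop pair.
# Appended (-A) rules only help if no earlier INPUT rule already accepts 8006,
# and plain iptables rules do not survive a reboot on their own.
print_iptables_8006() {
  local cidr="${1:-192.168.11.0/24}"
  echo "iptables -A INPUT -p tcp --dport 8006 -s ${cidr} -j ACCEPT"
  echo "iptables -A INPUT -p tcp --dport 8006 -j DROP"
}

# Usage (MY_IP is a placeholder for your admin workstation's LAN address):
#   ip_in_cidr "$MY_IP" 192.168.11.0/24 || echo "WARNING: --apply would block this machine from :8006"
```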
### W1-19: Secure validator keys (on Proxmox host as root)
```bash
cd /path/to/proxmox
bash scripts/secure-validator-keys.sh --dry-run # review
bash scripts/secure-validator-keys.sh # apply (chmod 600, chown besu)
```
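
After applying, each key file can be spot-checked against what the script sets. The session log records `chmod 600/700` and `chown besu`, read here as 600 for key files and 700 for their directories, owned by `besu`. A minimal sketch using GNU `stat -c`; the key path inside the validator containers is not given in this runbook, so `KEY_PATH` is a placeholder and `check_mode_owner` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical spot-check for W1-19; uses GNU coreutils stat (Debian/Proxmox).

# check_mode_owner PATH MODE [OWNER] -> exit 0 if PATH has octal MODE (e.g. 600)
# and, when OWNER is given, is owned by that user.
check_mode_owner() {
  local path="$1" want_mode="$2" want_owner="${3:-}" mode owner
  mode=$(stat -c '%a' "$path") || return 1
  owner=$(stat -c '%U' "$path") || return 1
  if [[ "$mode" != "$want_mode" ]]; then
    echo "mode ${mode} != ${want_mode}: ${path}" >&2
    return 1
  fi
  if [[ -n "$want_owner" && "$owner" != "$want_owner" ]]; then
    echo "owner ${owner} != ${want_owner}: ${path}" >&2
    return 1
  fi
  echo "OK ${mode} ${owner} ${path}"
}

# Usage (KEY_PATH is a placeholder; the runbook does not name the key location):
#   pct exec 1000 -- stat -c '%a %U %n' KEY_PATH   # raw view from the Proxmox host
#   check_mode_owner KEY_PATH 600 besu             # if run inside the container
```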
---
## VMIDs 2506, 2507, 2508 — Destroyed 2026-02-08
Containers 2506, 2507, 2508 were **removed and destroyed** on all Proxmox hosts. Script: `scripts/destroy-vmids-2506-2508.sh`. Besu RPC range is **2500–2505** only. See [MISSING_CONTAINERS_LIST.md](../03-deployment/MISSING_CONTAINERS_LIST.md).

---
## Dev/Codespaces (76.53.10.40) — Full completion
**Single ordered checklist:** [04-configuration/DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md](../04-configuration/DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md), covering Phases 1–7 (fourth NPMplus, dev VM, UDM port forward, Cloudflare tunnel, NPMplus proxy hosts, projects/dotenv, verification).
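
Once the fourth NPMplus LXC (10236 at 192.168.11.170) and the dev VM (5700 at 192.168.11.59) exist, and before adding the UDM Pro port forward, a TCP check from the LAN can confirm the forwarded ports actually answer. A minimal sketch using bash's built-in `/dev/tcp` redirection and coreutils `timeout`; `port_open` and `check_ports` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical LAN reachability check; it opens TCP connections and nothing more.

# port_open HOST PORT [SECONDS] -> exit 0 if a TCP connect succeeds in time
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # The inner bash receives host/port as $1/$2; /dev/tcp is a bash feature.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_ports HOST PORT... -> prints open/closed per port; exit 1 if any is closed
check_ports() {
  local host="$1" port rc=0
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage from a host on 192.168.11.0/24:
#   check_ports 192.168.11.170 80 81 443   # fourth NPMplus (10236)
#   check_ports 192.168.11.59 22           # dev VM (5700), only if forwarding 22
```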
**Key commands (after fourth NPMplus and dev VM exist):**
| Step | Command |
|------|---------|
| Create fourth NPMplus LXC (10236 @ 192.168.11.170) | `bash scripts/npmplus/create-npmplus-fourth-container.sh` |
| Create dev VM (5700 @ 192.168.11.59) | `bash scripts/create-dev-vm-5700.sh` |
| Setup dev VM users + Gitea | `ssh root@192.168.11.11 "pct exec 5700 -- bash -s" < scripts/setup-dev-vm-users-and-gitea.sh` |
| Tunnel + DNS (set CLOUDFLARE_TUNNEL_ID_DEV_CODESPACES in .env first) | `bash scripts/cloudflare/configure-dev-codespaces-tunnel-and-dns.sh` |
| Fourth NPMplus proxy hosts | `NPM_URL=https://192.168.11.170:81 NPM_PASSWORD='...' bash scripts/nginx-proxy-manager/update-npmplus-fourth-proxy-hosts.sh` |

UDM Pro: add port forward 76.53.10.40 → 192.168.11.170 (80/81/443), optional 22 → 192.168.11.59. See [UDM_PRO_DEV_CODESPACES_PORT_FORWARD.md](../04-configuration/UDM_PRO_DEV_CODESPACES_PORT_FORWARD.md).

---
## Wave 2 & Wave 3 — Full checklist
Use the ordered checklist:
- **[WAVE2_WAVE3_OPERATOR_CHECKLIST.md](WAVE2_WAVE3_OPERATOR_CHECKLIST.md)** — W2-1 (monitoring) through W2-8 (NPMplus HA), then W3-1 (CCIP Fleet), W3-2 (Phase 4 isolation).

Summary:
| Wave | Tasks |
|------|--------|
| W2-1 | Monitoring stack (Prometheus, Grafana, Loki, Alertmanager) |
| W2-2 | Grafana via Cloudflare Access; alerts |
| W2-3 | VLAN enablement (UDM Pro, Proxmox bridge) |
| W2-4 | Phase 3 CCIP: Ops/Admin (5400–5401); NAT; scripts |
| W2-5 | Phase 4 sovereign tenant VLANs |
| W2-6 | ~~2506–2508~~ Destroyed 2026-02-08 (RPC 2500–2505 only) |
| W2-7 | DBIS services (10100–10151) |
| W2-8 | NPMplus HA (optional) |
| W3-1 | CCIP Fleet (commit/execute/RMN nodes) |
| W3-2 | Phase 4 tenant isolation enforcement |
---
## Explorer SSL (manual)
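
To see which certificate **explorer.d-bis.org** actually serves (before and after the fix below), `openssl s_client` and `openssl x509` can print its issuer and expiry. A minimal sketch, assuming `openssl` and GNU `date` are installed; `cert_info` and `days_left` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical certificate check for explorer.d-bis.org (or any SNI hostname).

# cert_info HOST -> prints issuer and notAfter of the certificate served on HOST:443
cert_info() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -enddate
}

# days_left "notAfter=..." -> prints whole days until that date (negative if past)
days_left() {
  local end="${1#notAfter=}" end_s now_s
  end_s=$(date -d "$end" +%s) || return 1   # GNU date
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# Usage:
#   cert_info explorer.d-bis.org   # issuer should name Let's Encrypt after the fix
#   days_left "$(cert_info explorer.d-bis.org | grep '^notAfter=')"
```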
If **explorer.d-bis.org** shows "Your connection isn't private":
1. Open NPMplus: **https://192.168.11.167:81** (credentials: `NPM_EMAIL`, `NPM_PASSWORD` from `.env`).
2. SSL Certificates → Add Let's Encrypt for `explorer.d-bis.org` (DNS Challenge + Cloudflare credential if needed).
3. Proxy Hosts → explorer.d-bis.org → SSL tab → assign cert, Force SSL, Save.

See [EXPLORER_TROUBLESHOOTING.md](../04-configuration/EXPLORER_TROUBLESHOOTING.md).

---
## E2E 502s (when public domains return 502)
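
A quick status-code sweep shows which public hostnames still return 502 before and after running the scripts in the table. A minimal sketch; pass your own hostnames (this runbook only names `explorer.d-bis.org`), and note that `http_status_report` is a hypothetical helper, not a replacement for `verify-end-to-end-routing.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical 502 spot-check; read-only (one GET per host, body discarded).

# http_status_report HOST... -> prints "HOST CODE"; exit 1 if any host returns 502.
# A code of 000 means curl got no HTTP response (DNS, TLS, or connect failure);
# an untrusted certificate also yields 000 unless you add -k while testing.
http_status_report() {
  local host code rc=0
  for host in "$@"; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://${host}/") || true
    echo "${host} ${code}"
    if [[ "$code" == "502" ]]; then rc=1; fi
  done
  return "$rc"
}

# Usage:
#   http_status_report explorer.d-bis.org
```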
From **LAN** (SSH to Proxmox + reach NPMplus):
| Goal | Command |
|------|---------|
| Fix all 502 backends + NPMplus proxy + RPC diagnostics | `./scripts/maintenance/address-all-remaining-502s.sh` |
| Also Besu config fix + E2E at end | `./scripts/maintenance/address-all-remaining-502s.sh --run-besu-fix --e2e` |
| Re-run E2E only | `./scripts/verify/verify-end-to-end-routing.sh` |

**Runbook:** [502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md](502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md).

---
## Remaining (operator only)
- **W0-2** — sendCrossChain real (when PRIVATE_KEY/LINK ready).
- **W1-1 / W1-2** — SSH key auth and firewall 8006 `--apply` on each Proxmox host (after keys deployed / CIDR decided).
- **Cron** — ✅ Installed on root@192.168.11.11 (NPMplus 03:00; daily 08:00; weekly Sun 09:00). Re-install if you move repo to a permanent path.
- **Validator keys** — ✅ Applied on host for 1000–1002; 1003–1004 skipped (not running). Re-run when 1003/1004 are up if needed.
- **2506–2508** — Destroyed 2026-02-08; no action.
- **Wave 2 / 3** — Monitoring, VLAN, CCIP, NPMplus HA, Phase 4 per WAVE2_WAVE3_OPERATOR_CHECKLIST.
- **Explorer SSL** — Let's Encrypt for explorer.d-bis.org in NPMplus UI (see above). One-time (and after NPMplus restore if certs lost).
- **Explorer VM 5000 thin pool** — If thin1-r630-02 is >85% or full, migrate VMID 5000 to thin5 per [BLOCKSCOUT_FIX_RUNBOOK.md](../03-deployment/BLOCKSCOUT_FIX_RUNBOOK.md) § "Fix: Migrate VM 5000 to thin5". Weekly cron now checks thin pool (138a); act when it warns or fails.
- **NPMplus cert 134 (cross-all.defi-oracle.io)** — If verification reports "cert files missing" for cert ID 134: in NPMplus at https://192.168.11.167:81 → SSL Certificates → find cross-all.defi-oracle.io → re-save or request Let's Encrypt again to restore cert files on disk.
- **Dev/Codespaces (76.53.10.40)** — Complete all phases in [DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md](../04-configuration/DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md): fourth NPMplus (10236), dev VM (5700), UDM port forward, Cloudflare tunnel, NPMplus fourth proxy hosts, Let's Encrypt, rsync/dotenv, verification.
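
The 85% thin-pool threshold from the Explorer VM 5000 item can also be checked by hand on r630-02 with `lvs`, whose `data_percent` field reports how full each thin pool is. A minimal sketch; `thin_pools_over` is a hypothetical helper that reads `lvs` output on stdin, so it can be tried on sample text first.

```bash
#!/usr/bin/env bash
# Hypothetical thin-pool usage check; run the lvs command on the Proxmox node.

# thin_pools_over [THRESHOLD] -> reads "lv_name data_percent" lines on stdin,
# prints each pool above THRESHOLD (default 85); exit 1 if any pool is over.
thin_pools_over() {
  local threshold="${1:-85}"
  awk -v t="$threshold" '
    NF >= 2 && ($2 + 0) > (t + 0) { printf "%s %s%% > %s%%\n", $1, $2, t; over = 1 }
    END { exit over ? 1 : 0 }
  '
}

# Usage on r630-02 (-S/--select needs a reasonably recent LVM2):
#   lvs --noheadings -o lv_name,data_percent -S 'segtype=thin-pool' | thin_pools_over 85
```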
---
## After running "complete all next steps"
1. **Automated (workspace):** `bash scripts/run-all-next-steps.sh` — report in `docs/04-configuration/verification-evidence/NEXT_STEPS_RUN_*.md`.
2. **Validators + tx-pool:** `bash scripts/fix-all-validators-and-txpool.sh` (requires SSH to .10, .11).
3. **Flush stuck tx (if any):** `bash scripts/flush-stuck-tx-rpc-and-validators.sh --full` (clears RPC 2101 + validators 1000–1004).
4. **Verify from LAN:** From a host on 192.168.11.x run `bash scripts/monitoring/monitor-blockchain-health.sh` and `bash scripts/skip-stuck-transactions.sh`. See [NEXT_STEPS_COMPLETION_RUN_20260208.md](../04-configuration/verification-evidence/NEXT_STEPS_COMPLETION_RUN_20260208.md) § Verify from LAN.
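
For a quick liveness check alongside the scripts above, the standard Ethereum JSON-RPC method `eth_blockNumber` can confirm that blocks are advancing. A minimal sketch, assuming the endpoint in `RPC_URL_138` accepts plain HTTP JSON-RPC from your host; `block_number` and `blocks_advancing` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical chain-liveness check using the standard eth_blockNumber call.

# block_number RPC_URL -> prints the latest block number in decimal
block_number() {
  local url="$1" resp hex
  resp=$(curl -s --max-time 10 -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' "$url") || return 1
  # Pull "0x..." out of a reply like {"jsonrpc":"2.0","id":1,"result":"0x1a"}
  hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p' <<<"$resp")
  if [[ -z "$hex" ]]; then
    echo "no result in reply: ${resp}" >&2
    return 1
  fi
  echo $(( hex ))   # bash arithmetic understands the 0x prefix
}

# blocks_advancing RPC_URL [WAIT_SECONDS] -> exit 0 if the head moved forward
blocks_advancing() {
  local url="$1" wait="${2:-15}" first second
  first=$(block_number "$url") || return 1
  sleep "$wait"
  second=$(block_number "$url") || return 1
  echo "block ${first} -> ${second}"
  (( second > first ))
}

# Usage from a host on 192.168.11.x:
#   blocks_advancing "$RPC_URL_138" 15
```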
---
## Quick command index
| Goal | Command |
|------|---------|
| **Run all automated next steps** | `bash scripts/run-all-next-steps.sh` (validation, E2E, explorer check, dry-runs; report in verification-evidence/NEXT_STEPS_RUN_*.md) |
| W0-2 real | `bash scripts/bridge/run-send-cross-chain.sh 0.01` |
| W0-3 backup | `bash scripts/verify/backup-npmplus.sh` |
| W0 from LAN | `bash scripts/run-wave0-from-lan.sh` |
| W1-1 apply | `bash scripts/security/setup-ssh-key-auth.sh --apply` (on each host) |
| W1-2 apply | `bash scripts/security/firewall-proxmox-8006.sh --apply` |
| NPMplus cron | `bash scripts/maintenance/schedule-npmplus-backup-cron.sh --install` |
| Daily/weekly cron | `bash scripts/maintenance/schedule-daily-weekly-cron.sh --install` |
| Validator keys | On Proxmox: `bash scripts/secure-validator-keys.sh` (after --dry-run) |
| Wave 0 via SSH | `bash scripts/run-via-proxmox-ssh.sh wave0 --host 192.168.11.11` |
| Request cert (via SSH) | `bash scripts/run-via-proxmox-ssh.sh request-cert --host 192.168.11.11` |
| Fourth NPMplus container | `bash scripts/npmplus/create-npmplus-fourth-container.sh` |
| Dev VM create | `bash scripts/create-dev-vm-5700.sh` |
| Dev/Codespaces tunnel+DNS | `bash scripts/cloudflare/configure-dev-codespaces-tunnel-and-dns.sh` (set CLOUDFLARE_TUNNEL_ID_DEV_CODESPACES in .env) |
| Fourth NPMplus proxy hosts | `NPM_URL=https://192.168.11.170:81 NPM_PASSWORD='...' bash scripts/nginx-proxy-manager/update-npmplus-fourth-proxy-hosts.sh` |
| **Address all 502s (LAN)** | `./scripts/maintenance/address-all-remaining-502s.sh` (use `--run-besu-fix --e2e` for full flow) |
| E2E routing (after NPMplus/DNS change) | `bash scripts/verify/verify-end-to-end-routing.sh` |
| Explorer E2E from LAN (after frontend/Blockscout deploy) | `bash explorer-monorepo/scripts/e2e-test-explorer.sh` |
| Blockscout migrations (version/config change) | On r630-02: `bash scripts/fix-blockscout-ssl-and-migrations.sh` — see [BLOCKSCOUT_FIX_RUNBOOK.md](../03-deployment/BLOCKSCOUT_FIX_RUNBOOK.md) |
| When decommissioning RPC used by explorer | Update Blockscout RPC URL on VM 5000; restart Blockscout — see [OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md) § "When decommissioning or changing RPC nodes" |