# E2E Success Runbook: Cloudflare Domains
**Last Updated:** 2026-02-05

**Status:** Active

**Purpose:** Achieve and verify complete end-to-end success for all public endpoints reachable via Cloudflare DNS (and optionally Fastly). Every domain must pass DNS and SSL checks plus the HTTP, RPC, or WebSocket test that applies to its type.

---
## Goal
- **DNS**: Every domain resolves (to `76.53.10.36` or, if using Fastly, to any valid IP when `ACCEPT_ANY_DNS=1`).
- **SSL**: Valid certificate for each domain (HTTPS).
- **HTTP/API/Web**: Each web/API domain returns 2xx (or 3xx for redirects) over HTTPS.
- **RPC**: Each RPC domain responds to `eth_chainId` with `0x8a` (chain ID 138).
- **WebSocket**: Each RPC-WS domain accepts a WebSocket upgrade (HTTP 101, or a successful full `wscat` test).
---
## Domains Under Test
The verification script covers every public domain that Cloudflare (and the edge) must be able to route through NPMplus to a backend. Source of truth: [RPC_ENDPOINTS_MASTER.md](../04-configuration/RPC_ENDPOINTS_MASTER.md).
| Domain | Type | Backend |
|--------|------|---------|
| explorer.d-bis.org | web | 192.168.11.140:80 |
| rpc-http-pub.d-bis.org | rpc-http | 192.168.11.221:8545 |
| rpc-ws-pub.d-bis.org | rpc-ws | 192.168.11.221:8546 |
| rpc.d-bis.org | rpc-http | 192.168.11.221:8545 |
| rpc2.d-bis.org | rpc-http | 192.168.11.221:8545 |
| ws.rpc.d-bis.org | rpc-ws | 192.168.11.221:8546 |
| ws.rpc2.d-bis.org | rpc-ws | 192.168.11.221:8546 |
| rpc-http-prv.d-bis.org | rpc-http | 192.168.11.211:8545 |
| rpc-ws-prv.d-bis.org | rpc-ws | 192.168.11.211:8546 |
| dbis-admin.d-bis.org | web | 192.168.11.130:80 |
| dbis-api.d-bis.org | api | 192.168.11.155:3000 |
| dbis-api-2.d-bis.org | api | 192.168.11.156:3000 |
| secure.d-bis.org | web | 192.168.11.130:80 |
| mim4u.org, www, secure, training | web | 192.168.11.37:80 |
| sankofa.nexus, www | web | 192.168.11.51:3000 |
| phoenix.sankofa.nexus, www | web | 192.168.11.50:4000 |
| the-order.sankofa.nexus | web | TBD |
| studio.sankofa.nexus | web | 192.168.11.72:8000 |
| rpc.public-0138.defi-oracle.io | rpc-http | 192.168.11.240:443 |
| rpc.defi-oracle.io | rpc-http | 192.168.11.221:8545 |
| wss.defi-oracle.io | rpc-ws | 192.168.11.221:8546 |
---
## Prerequisites
- Run from a host with outbound HTTPS (and, optionally, WebSocket) access to the internet. When validating direct-to-origin routing to the public IP, running from **outside** your LAN (e.g. a mobile hotspot) is recommended so that requests reach 76.53.10.36 the way external clients do.
- Tools: `curl`, `jq`, `dig`, `openssl`. Optional: `wscat` for full WebSocket RPC test (`npm install -g wscat`).
- Cloudflare DNS (and NPMplus/Fastly) already configured per [CLOUDFLARE_ROUTING_MASTER.md](CLOUDFLARE_ROUTING_MASTER.md) and [RPC_ENDPOINTS_MASTER.md](../04-configuration/RPC_ENDPOINTS_MASTER.md).
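
A quick preflight check can confirm the tools above are installed before running anything else. This is a minimal sketch, not part of the repo's scripts; the function name `check_tools` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repo): checks that the tools
# listed under Prerequisites are on PATH. wscat is reported but optional.
check_tools() {
  local missing=0 tool
  for tool in curl jq dig openssl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING (required): $tool"
      missing=1
    fi
  done
  if ! command -v wscat >/dev/null 2>&1; then
    echo "missing (optional): wscat (install with: npm install -g wscat)"
  fi
  return "$missing"
}

# Returns non-zero if any required tool is missing.
check_tools && echo "All required tools found."
```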
---
## Step 0: Fix RPC 405 (If Needed)
If RPC endpoints return **405 Method Not Allowed**, apply the NPMplus RPC fix from a host on the LAN (or over SSH via a Proxmox host that can reach NPMplus):
```bash
# Option A: From repo (copies scripts to Proxmox and runs there)
bash scripts/run-via-proxmox-ssh.sh wave0 --skip-backup --host 192.168.11.11
# Option B: From a host already on LAN with the repo
bash scripts/run-wave0-from-lan.sh --skip-backup
# Or only the NPMplus update:
bash scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh
```
This sets `block_exploits: false` on all RPC proxy hosts so that NPMplus allows JSON-RPC `POST` requests to `/`.

**If RPC still returns 405 after the fix:** test from the LAN by sending the request straight to NPMplus with the public `Host` header: `curl -X POST "https://192.168.11.167/" -H "Host: rpc.d-bis.org" -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' -k`. If that returns 200 with `"result":"0x8a"`, NPMplus is configured correctly and the 405 originates at the edge (UDM Pro port forward or upstream). In that case, check the UDM Pro firewall and port-forward rules for HTTP method restrictions, or use a Cloudflare Tunnel for RPC if needed.
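
The LAN-side test above can be wrapped so its result is interpreted the same way every time. This is a minimal sketch assuming NPMplus at 192.168.11.167 as elsewhere in this runbook; the helper names are hypothetical, and `-k` is needed because the request connects by IP rather than by hostname.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) for the Step 0 LAN test.

# Interpret the HTTP status code and response body returned by NPMplus.
classify_npmplus_rpc() {
  local code="$1" body="$2"
  if [[ "$code" == "200" && "$body" == *'"result":"0x8a"'* ]]; then
    echo "npmplus-ok: NPMplus accepts POST; a public 405 comes from the edge"
  elif [[ "$code" == "405" ]]; then
    echo "npmplus-405: NPMplus still blocks POST; re-run the Step 0 fix"
  else
    echo "inconclusive: HTTP $code"
  fi
}

# POST eth_chainId straight to NPMplus with the public Host header (bypasses the edge).
npmplus_rpc_check() {
  local host="${1:-rpc.d-bis.org}" npm_ip="${2:-192.168.11.167}"
  local tmp code
  tmp=$(mktemp)
  # curl prints 000 as the status code if the connection fails.
  code=$(curl -sk --max-time 15 -o "$tmp" -w '%{http_code}' -X POST "https://${npm_ip}/" \
    -H "Host: ${host}" -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}') || true
  classify_npmplus_rpc "$code" "$(cat "$tmp")"
  rm -f "$tmp"
}

# Usage (from a LAN host): npmplus_rpc_check rpc.d-bis.org
```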

---
## Step 1: Run E2E Verification Script
From the project root:
```bash
# Adjust this path to wherever the proxmox repo is checked out
cd /home/intlc/projects/proxmox
bash scripts/verify/verify-end-to-end-routing.sh --profile=public
```
Optional environment variables:
| Variable | Default | Purpose |
|----------|---------|---------|
| `PUBLIC_IP` | `76.53.10.36` | Expected A record for DNS pass (direct-to-origin). |
| `ACCEPT_ANY_DNS` | `0` | Set to `1` to pass DNS if domain resolves to **any** IP (e.g. when using Fastly CNAME). |
| `E2E_SUCCESS_IF_ONLY_RPC_BLOCKED` | `0` | Set to `1` to treat the run as a success (exit 0) when the only failures are RPC checks (edge blocks POST). See [E2E_RPC_EDGE_LIMITATION.md](E2E_RPC_EDGE_LIMITATION.md). |
| `SKIP_BLOCKSCOUT_API` | `0` | Set to `1` to skip the optional Blockscout API check for explorer.d-bis.org (e.g. when running off-LAN and the API is unreachable). |

Example when using Fastly (DNS points to Fastly, not 76.53.10.36):
```bash
ACCEPT_ANY_DNS=1 bash scripts/verify/verify-end-to-end-routing.sh --profile=public
```
Outputs:
- **Report**: `docs/04-configuration/verification-evidence/e2e-verification-<timestamp>/verification_report.md`
- **JSON**: `.../all_e2e_results.json`
- **Headers/RPC**: `.../<domain>_https_headers.txt`, `.../<domain>_rpc_response.txt`
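
To open the newest run's artifacts without copying the timestamp by hand, a small helper can pick the latest evidence directory. This is a sketch only: the name `latest_e2e_dir` is hypothetical, and it assumes `<timestamp>` sorts lexically in chronological order (for example a `YYYYMMDD_HHMMSS` style stamp), which should be checked against the directory names the script actually writes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest e2e-verification-<timestamp> directory.
# Assumes lexical order of <timestamp> equals chronological order.
latest_e2e_dir() {
  local base="${1:-docs/04-configuration/verification-evidence}"
  local d latest=""
  # Glob results are sorted, so the last directory seen is the newest.
  for d in "$base"/e2e-verification-*/; do
    [[ -d "$d" ]] || continue
    latest="${d%/}"
  done
  [[ -n "$latest" ]] || return 1
  echo "$latest"
}

# Usage:
#   dir=$(latest_e2e_dir) && jq . "$dir/all_e2e_results.json"
#   less "$dir/verification_report.md"
```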
---
## Step 2: Interpret Results
- **DNS pass**: Domain resolves to `PUBLIC_IP` (or to any IP if `ACCEPT_ANY_DNS=1`).
- **SSL pass**: Certificate valid and matches domain.
- **HTTPS pass**: HTTP code 2xx (or 3xx for redirects) for web/api domains.
- **Blockscout API** (explorer.d-bis.org only, optional): GET `/api/v2/stats` returns 200 with `total_blocks` or `total_transactions`. Reported as `pass` or `skip`; it does not affect the E2E exit code.
- **RPC pass**: JSON-RPC `eth_chainId` returns `"result":"0x8a"`.
- **WebSocket pass**: Upgrade answered with HTTP 101, or a successful `wscat` RPC test.
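
The RPC and WebSocket criteria can be reproduced by hand for a single domain when a result looks wrong. This is a minimal sketch, not the verification script's own logic; the function names are hypothetical. The WebSocket check assumes the server answers a plain HTTP/1.1 handshake with `101`; curl then waits on the open connection until `--max-time` expires, which is expected and still reports the status code.

```bash
#!/usr/bin/env bash
# Hypothetical manual checks (not the verification script's code).

# RPC pass rule: the eth_chainId response contains "result":"0x8a" (chain ID 138).
is_chain_138() {
  [[ "$1" == *'"result":"0x8a"'* ]]
}

rpc_chain_id() {
  curl -s --max-time 15 -X POST "https://$1/" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
}

# WebSocket pass rule: the upgrade request is answered with HTTP 101.
# The key is the sample nonce from RFC 6455; any base64-encoded 16-byte value works.
ws_upgrade_status() {
  curl -s --http1.1 --max-time 5 -o /dev/null -w '%{http_code}' \
    -H "Connection: Upgrade" -H "Upgrade: websocket" \
    -H "Sec-WebSocket-Version: 13" \
    -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
    "https://$1/" || true
}

# Usage:
#   is_chain_138 "$(rpc_chain_id rpc.d-bis.org)" && echo "RPC pass"
#   [[ "$(ws_upgrade_status wss.defi-oracle.io)" == "101" ]] && echo "WebSocket pass"
```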

If any domain fails:
1. Open `verification_report.md` and `all_e2e_results.json` for that domain.
2. Check Cloudflare DNS (A/CNAME) for the hostname.
3. Check NPMplus proxy host exists and points to the correct backend (see [RPC_ENDPOINTS_MASTER.md](../04-configuration/RPC_ENDPOINTS_MASTER.md)).
4. If routing directly through the UDM Pro: ensure 76.53.10.36:80/443 is port-forwarded to NPMplus (192.168.11.167:80/443). See [EDGE_PORT_VERIFICATION_RUNBOOK.md](EDGE_PORT_VERIFICATION_RUNBOOK.md).
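
For step 2 above, the DNS pass rule (resolves to `PUBLIC_IP`, or to any IP when `ACCEPT_ANY_DNS=1`) can be re-checked for one hostname with `dig`. This is a minimal sketch mirroring that rule, not the script's implementation; the helper names are hypothetical, and results depend on the resolver the host uses.

```bash
#!/usr/bin/env bash
# Hypothetical DNS check mirroring the runbook's DNS pass rule.
PUBLIC_IP="${PUBLIC_IP:-76.53.10.36}"
ACCEPT_ANY_DNS="${ACCEPT_ANY_DNS:-0}"

# $1: newline-separated IPv4 addresses the hostname resolved to.
dns_verdict() {
  local ips="$1" flat
  flat=${ips//$'\n'/ }
  if [[ -z "$ips" ]]; then
    echo "fail: no resolution"
  elif [[ "$ACCEPT_ANY_DNS" == "1" ]]; then
    echo "pass"
  elif grep -qxF "$PUBLIC_IP" <<<"$ips"; then
    echo "pass"
  else
    echo "fail: expected $PUBLIC_IP, got $flat"
  fi
}

check_dns() {
  local domain="$1" ips
  # dig +short also prints CNAME targets; keep only IPv4 addresses.
  ips=$(dig +short A "$domain" | grep -E '^[0-9]+(\.[0-9]+){3}$' || true)
  echo "$domain: $(dns_verdict "$ips")"
}

# Usage:
#   check_dns rpc.d-bis.org
#   ACCEPT_ANY_DNS=1 check_dns explorer.d-bis.org
```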
---
## Step 3: Fix Common Failures
| Symptom | Likely cause | Action |
|---------|----------------|--------|
| DNS fail, expected 76.53.10.36 | DNS points to Fastly or other | Use `ACCEPT_ANY_DNS=1` or set DNS to 76.53.10.36 per design. |
| DNS no resolution | Record missing or wrong zone | Add/update A or CNAME in Cloudflare. |
| SSL fail | Certificate missing or wrong host | Ensure NPMplus has a valid certificate (e.g. Let's Encrypt) for that domain. |
| HTTPS 502/504 | Backend down or NPMplus wrong target | Check backend VM/container and NPMplus proxy target IP:port. |
| **RPC 405 Method Not Allowed** | NPMplus `block_exploits` or the edge (UDM Pro) limiting POST | Run Wave 0 from the LAN (see Step 0). If a POST to `https://192.168.11.167/` with `Host: rpc.d-bis.org` returns 200, the edge is the cause; see [E2E_RPC_EDGE_LIMITATION.md](E2E_RPC_EDGE_LIMITATION.md) for options to get a full RPC pass. |
| RPC no result | RPC service down or wrong port | Check Besu/ThirdWeb node and NPMplus proxy (8545/8546 or 443). |
| WebSocket fail | Proxy or backend not supporting WS | Enable WebSocket in NPMplus for that host; check backend WS port. |
| **explorer.d-bis.org** HTTPS 502 or Blockscout API skip | Backend (VMID 5000) down, database/migration problems, or thin pool full | See [BLOCKSCOUT_FIX_RUNBOOK.md](../03-deployment/BLOCKSCOUT_FIX_RUNBOOK.md). From the LAN, run `./scripts/fix-blockscout-ssl-and-migrations.sh` on the Proxmox host, then re-run E2E. For full explorer tests on the LAN: `explorer-monorepo/scripts/e2e-test-explorer.sh`. |
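
For the "SSL fail" row, it helps to see which certificate is actually served for a hostname, both via the public path and directly from NPMplus. This is a minimal sketch using standard `openssl` subcommands; the helper names are hypothetical, and `-ext` requires OpenSSL 1.1.1 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "SSL fail" row.

# Print subject, issuer, expiry and SANs of a PEM certificate read from stdin.
cert_summary() {
  openssl x509 -noout -subject -issuer -enddate -ext subjectAltName
}

# Fetch the certificate served for $1 (sent as SNI), optionally connecting to
# a different address $2, e.g. NPMplus at 192.168.11.167 to bypass the edge.
show_cert() {
  local domain="$1" target="${2:-$1}"
  openssl s_client -connect "${target}:443" -servername "$domain" </dev/null 2>/dev/null \
    | cert_summary
}

# Usage:
#   show_cert rpc.d-bis.org                  # via public DNS / edge
#   show_cert rpc.d-bis.org 192.168.11.167   # straight to NPMplus (LAN only)
```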
---
## Blockscout and explorer.d-bis.org (E2E completion)
- **Public E2E**: `verify-end-to-end-routing.sh --profile=public` tests explorer.d-bis.org as **web** (DNS, SSL, HTTPS). It also runs an **optional** Blockscout API check (GET `https://explorer.d-bis.org/api/v2/stats`). If the API is unreachable (e.g. run from off-LAN), the result is recorded as `skip` and does not fail the run. Use `SKIP_BLOCKSCOUT_API=1` to skip this check entirely.
- **Fix Blockscout** (502, DB, migrations): Run on a Proxmox host or from the LAN per [BLOCKSCOUT_FIX_RUNBOOK.md](../03-deployment/BLOCKSCOUT_FIX_RUNBOOK.md). Key script: `scripts/fix-blockscout-ssl-and-migrations.sh`.
- **Full explorer E2E on LAN**: For comprehensive explorer tests (frontend, API, services on VMID 5000), run from a host that can reach 192.168.11.140: `explorer-monorepo/scripts/e2e-test-explorer.sh`. Report: [explorer-monorepo/E2E_TEST_REPORT.md](../../../explorer-monorepo/E2E_TEST_REPORT.md).
- **Daily checks**: `scripts/maintenance/daily-weekly-checks.sh daily` checks the explorer indexer via Blockscout `/api/v2/stats`, falling back to `?module=stats&action=eth_price`.
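
The optional Blockscout API check can be approximated by hand. This is a sketch of the pass/skip rule described in Step 2 (HTTP 200 with `total_blocks` or `total_transactions`), not the script's code; the function names are hypothetical, and the body test is a plain substring match rather than full JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of the optional Blockscout API check.

# pass if HTTP 200 and the body mentions total_blocks or total_transactions.
blockscout_stats_verdict() {
  local code="$1" body="$2"
  if [[ "$code" == 200 && ( "$body" == *'"total_blocks"'* || "$body" == *'"total_transactions"'* ) ]]; then
    echo pass
  else
    echo skip
  fi
}

check_blockscout() {
  local base="${1:-https://explorer.d-bis.org}" tmp code
  tmp=$(mktemp)
  code=$(curl -s --max-time 15 -o "$tmp" -w '%{http_code}' "${base}/api/v2/stats") || true
  blockscout_stats_verdict "$code" "$(cat "$tmp")"
  rm -f "$tmp"
}

# Usage: check_blockscout
```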
---
## Step 4: Full Verification Suite (Optional)
To run the full verification (DNS export, NPMplus export, backend VMs, then E2E):
```bash
bash scripts/verify/run-full-verification.sh
```
This runs the same E2E script, so it produces the same E2E artifacts in addition to the other evidence.

---
## Success Criteria
- **Complete E2E success**: every domain in the script has:
  - DNS: pass
  - SSL: pass (where applicable)
  - HTTPS / RPC / WebSocket: pass, according to the domain type

Domains that are intentionally not yet deployed (e.g. `the-order.sankofa.nexus`) may fail until their backends are added; document them as known exceptions and, if desired, add them to the script's exclusion list later.

---
## References
- [RPC_ENDPOINTS_MASTER.md](../04-configuration/RPC_ENDPOINTS_MASTER.md): authoritative proxy and backend list
- [CLOUDFLARE_ROUTING_MASTER.md](CLOUDFLARE_ROUTING_MASTER.md): edge routing (Fastly / direct)
- [EDGE_PORT_VERIFICATION_RUNBOOK.md](EDGE_PORT_VERIFICATION_RUNBOOK.md): 76.53.10.36 port check
- [INGRESS_VERIFICATION_RUNBOOK.md](../04-configuration/INGRESS_VERIFICATION_RUNBOOK.md): full ingress verification