Cloudflare Tunnel 502 Fix — Practical Order of Operations

Last Updated: 2026-02-05
Status: Active runbook
See also: CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md (deprecated), CLOUDFLARE_ROUTING_MASTER.md


Overview

A 502 Bad Gateway from a Cloudflare Tunnel means Cloudflare's edge received an invalid response, or no response at all, from your side of the tunnel (cloudflared → origin). Fix it by: (1) confirming where cloudflared runs and what it points to, (2) verifying the origin is reachable from cloudflared, (3) checking cloudflared logs, (4) aligning tunnel ingress with the current proxy (NPMplus).


Step 1: Confirm where cloudflared runs and what each ingress URL is

Where cloudflared runs

  • Public app tunnel (explorer, rpc-*, dbis-*, mim4u, etc.): documented as VMID 102 (LXC). Infrastructure range 100-108 is typically on one Proxmox host (e.g. ml110 192.168.11.10).
  • Per-host tunnels (Proxmox UI: ml110-01.d-bis.org, r630-01.d-bis.org, etc.): run on each host; configs in scripts/cloudflare-tunnels/configs/.

To find which node has VMID 102:

# From a Proxmox host (or SSH to 192.168.11.10 / .11 / .12)
for h in 192.168.11.10 192.168.11.11 192.168.11.12; do
  ssh -o ConnectTimeout=3 root@$h "pct list 2>/dev/null | grep -E '^\s*102\s'" && echo "VMID 102 on $h" && break
done

Ingress targets documented for the public tunnel:

| Source | HTTP ingress target | Notes |
|---|---|---|
| CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md | http://192.168.11.21:80 | Central Nginx (VMID 105). Likely cause of 502 if VMID 105 is decommissioned or unreachable. |
| CENTRAL_NGINX_ROUTING_SETUP.md | http://192.168.11.26:80 | Alternate Nginx IP in some docs. |
| Recommended (current architecture) | http://192.168.11.167:80 | NPMplus (VMID 10233). Single proxy for all public hostnames; route by Host header. |

Action: In Cloudflare Zero Trust → Networks → Tunnels → your public tunnel (e.g. ID 10ab22da-8ea3-4e2e-a896-27ece2211a05) → Public Hostnames, note the URL for each hostname. If it is 192.168.11.21:80 or 192.168.11.26:80, switch it to http://192.168.11.167:80 (NPMplus) so the tunnel matches the current architecture.
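
When many Public Hostnames need reviewing, a small helper can flag which URLs still point at the old Nginx targets. The following is a minimal sketch, assuming only the three targets listed above; `classify_ingress_target` is a hypothetical name and not part of the repo.

```shell
# Classify a tunnel ingress URL against the targets listed in this runbook.
# classify_ingress_target is a hypothetical helper, not an existing repo script.
classify_ingress_target() {
  local url="$1"
  local hostport="${url#*://}"   # drop the scheme, if present
  hostport="${hostport%%/*}"     # drop any path
  case "$hostport" in
    192.168.11.167|192.168.11.167:80) echo "current" ;;  # NPMplus (VMID 10233)
    192.168.11.21|192.168.11.21:80)   echo "legacy" ;;   # central Nginx (VMID 105)
    192.168.11.26|192.168.11.26:80)   echo "legacy" ;;   # alternate Nginx IP
    *)                                echo "unknown" ;;  # anything else: review by hand
  esac
}
```

For example, `classify_ingress_target http://192.168.11.21:80` prints `legacy`, which marks that hostname as one to switch to NPMplus.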


Step 2: Verify origin from cloudflared host (curl)

From the host that runs the public cloudflared (VMID 102), the tunnel's ingress target must be reachable. Run these from inside VMID 102 (or from the Proxmox node using pct exec 102 -- ...).

Set INGRESS_TARGET to the host from the URL currently in the dashboard (e.g. 192.168.11.21 or 192.168.11.167) and PORT to its port (80).

# From Proxmox node that has VMID 102 (e.g. root@192.168.11.10)
VMID=102
TARGET="${INGRESS_TARGET:-192.168.11.167}"
PORT="${PORT:-80}"

# Quick connectivity check: prints only the HTTP status code returned by the origin
pct exec "$VMID" -- curl -s -o /dev/null -w "%{http_code}\n" --connect-timeout 5 "http://${TARGET}:${PORT}/" -H "Host: dbis-admin.d-bis.org"

# 200/301/302: the origin is reachable. 000 or a timeout: the origin is down or unreachable from VMID 102.
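
The status code printed by the curl check can be interpreted a little more finely than reachable versus unreachable. The sketch below is one possible mapping. The 000 and 2xx/3xx cases follow the comment above; reading 502/503/504 as "proxy is up but its backend is failing" is an assumption about how the proxy behaves, and `interpret_origin_code` is a hypothetical name.

```shell
# Map the %{http_code} value from the curl check to a short verdict.
# interpret_origin_code is a hypothetical helper for illustration only.
interpret_origin_code() {
  case "$1" in
    000)         echo "unreachable" ;;              # no HTTP response: origin down, wrong IP/port, or firewall
    2??|3??)     echo "reachable" ;;                # origin answered normally
    502|503|504) echo "proxy-up-backend-failing" ;; # assumption: the proxy answered, its upstream did not
    4??|5??)     echo "reachable-with-error" ;;     # a response arrived, so the network path works
    *)           echo "unexpected" ;;
  esac
}
```

Any three-digit code other than 000 shows that cloudflared's host can open a connection to the target, so a remaining 502 at the edge then points at the proxy configuration or the backend behind it rather than at the network path.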

Recommended: Use the verification script so you don't have to type this manually:

# From repo root (requires SSH to Proxmox host that has VMID 102)
bash scripts/verify/verify-cloudflare-tunnel-ingress.sh [--host 192.168.11.10]

The script tries both the old target (192.168.11.21:80) and NPMplus (192.168.11.167:80) and reports which responds. If only NPMplus responds, update the tunnel's Public Hostnames to http://192.168.11.167:80.
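
Because NPMplus routes by Host header, it can help to test several hostnames against the same target in one pass. The following is a rough sketch, not the contents of verify-cloudflare-tunnel-ingress.sh (whose internals are not shown here). `check_hosts` is a hypothetical helper, and it assumes it runs on the Proxmox node that has VMID 102.

```shell
# check_hosts TARGET HOST... prints "<host> <http_code>" for each hostname.
# Hypothetical helper; run on the Proxmox node that hosts VMID 102.
check_hosts() {
  local target="$1"; shift
  local host code
  for host in "$@"; do
    # curl itself prints 000 when no HTTP response arrives; "|| true" keeps the loop going.
    code=$(pct exec "${VMID:-102}" -- curl -s -o /dev/null -w '%{http_code}' \
      --connect-timeout 5 -H "Host: ${host}" "http://${target}/" || true)
    printf '%s %s\n' "$host" "${code:-000}"
  done
}
```

A call such as `check_hosts 192.168.11.167:80 explorer.d-bis.org dbis-admin.d-bis.org` prints one line per hostname, which makes it easy to spot a single proxy host that is misconfigured while the others respond.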


Step 3: Check cloudflared logs when you hit 502

On the machine or container running the public cloudflared (VMID 102):

# From Proxmox node
pct exec 102 -- journalctl -u cloudflared -n 100 --no-pager

# Or follow live while you reproduce 502
pct exec 102 -- journalctl -u cloudflared -f

What to look for:

  • Unable to reach the origin service → cloudflared cannot reach the URL in ingress (wrong IP, firewall, or service down). Fix: correct the URL (e.g. to NPMplus 192.168.11.167:80) or fix network/service.
  • dial tcp ... i/o timeout → timeout to origin. Fix: increase connectTimeout in tunnel config or fix slow/down origin.
  • connection refused → nothing listening on that IP:port. Fix: point to correct proxy (NPMplus) or start the service.

A healthy tunnel connection to Cloudflare does not guarantee that the origin is reachable; the logs show whether the failure is between cloudflared and Cloudflare or between cloudflared and the origin.
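
To summarize a long log, the three patterns above can be matched mechanically. This is a minimal sketch that assumes the log lines contain the substrings quoted above; `classify_cloudflared_log` is a hypothetical helper. The more specific causes are tested first because a single line may contain both the generic message and the underlying dial error.

```shell
# Read cloudflared log lines on stdin and print one verdict per matching line.
# classify_cloudflared_log is a hypothetical helper for illustration only.
classify_cloudflared_log() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"connection refused"*)                 echo "refused" ;;      # nothing listening on that IP:port
      *"i/o timeout"*)                        echo "timeout" ;;      # origin slow, down, or filtered
      *"Unable to reach the origin service"*) echo "unreachable" ;;  # generic: wrong IP, firewall, or service down
    esac
  done
}
```

One way to use it is `pct exec 102 -- journalctl -u cloudflared -n 100 --no-pager | classify_cloudflared_log | sort | uniq -c`, which counts how often each failure type appears in the last 100 lines.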


Step 4: Align tunnel with current architecture (NPMplus)

Current design: a single public proxy, NPMplus (VMID 10233 at 192.168.11.167). All public hostnames (explorer, rpc-*, dbis-*, mim4u, etc.) should be routed through NPMplus, which selects the backend by Host header.

Set all public hostnames in the tunnel's Public Hostnames to:

  • URL: http://192.168.11.167:80
  • Type: HTTP

Separate WebSocket URLs to 252/251 are not needed unless you intentionally bypass NPMplus; NPMplus can handle WebSocket for RPC (and is already configured for it in the Fastly/direct path).

If you keep WebSocket hostnames pointing directly to RPC nodes:

  • rpc-ws-pub.d-bis.org → https://192.168.11.221:8546 (or keep NPMplus: http://192.168.11.167:80 with Host header)
  • rpc-ws-prv.d-bis.org → https://192.168.11.211:8546 (or NPMplus)

Using a single backend http://192.168.11.167:80 for every hostname is the simplest and matches the rest of your routing.

Reference ingress (all hostnames → NPMplus):

ingress:
  - hostname: explorer.d-bis.org
    service: http://192.168.11.167:80
  - hostname: rpc-http-pub.d-bis.org
    service: http://192.168.11.167:80
  - hostname: rpc-http-prv.d-bis.org
    service: http://192.168.11.167:80
  - hostname: rpc-ws-pub.d-bis.org
    service: http://192.168.11.167:80
  - hostname: rpc-ws-prv.d-bis.org
    service: http://192.168.11.167:80
  - hostname: dbis-admin.d-bis.org
    service: http://192.168.11.167:80
  - hostname: dbis-api.d-bis.org
    service: http://192.168.11.167:80
  - hostname: dbis-api-2.d-bis.org
    service: http://192.168.11.167:80
  - hostname: mim4u.org
    service: http://192.168.11.167:80
  - hostname: www.mim4u.org
    service: http://192.168.11.167:80
  - service: http_status:404

Configure the same in Cloudflare Zero Trust → Tunnels → your tunnel → Public Hostnames (each hostname → URL http://192.168.11.167:80).
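
Since every rule in the reference ingress shares one backend, the block can be generated from a hostname list instead of being edited by hand. The following is a sketch under that assumption; `generate_ingress` is a hypothetical helper, and its output only matters if the tunnel uses a locally managed config file rather than the dashboard.

```shell
# generate_ingress BACKEND HOSTNAME... prints a cloudflared ingress section
# that sends every hostname to the same backend. Hypothetical helper.
generate_ingress() {
  local backend="$1"; shift
  local hostname
  echo "ingress:"
  for hostname in "$@"; do
    printf '  - hostname: %s\n    service: %s\n' "$hostname" "$backend"
  done
  # cloudflared expects the last rule to be a catch-all with no hostname.
  echo "  - service: http_status:404"
}
```

For example, `generate_ingress http://192.168.11.167:80 explorer.d-bis.org dbis-admin.d-bis.org` prints two hostname rules followed by the 404 catch-all. If a config file is in use, running `cloudflared tunnel ingress validate` against it before restarting should catch syntax mistakes.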

Optional: originRequest timeouts (if you still see timeouts after fixing URL)

In the Cloudflare dashboard, open the tunnel's Public Hostname and set the following under Additional application settings (or set the equivalent originRequest options in the config file):

  • Connect timeout: e.g. 30s
  • TCP keep-alive: 30s

If your origin (NPMplus or backend) is slow, increasing these can reduce 502s caused by timeouts.
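
For a locally managed config file, the two settings above correspond to the `connectTimeout` and `tcpKeepAlive` keys under `originRequest`. The sketch below prints one ingress rule carrying them; `ingress_rule_with_timeouts` is a hypothetical helper and the 30s defaults simply mirror the values listed above.

```shell
# ingress_rule_with_timeouts HOSTNAME SERVICE [CONNECT_TIMEOUT] [TCP_KEEPALIVE]
# Prints one YAML ingress rule with originRequest timeouts. Hypothetical helper.
ingress_rule_with_timeouts() {
  cat <<EOF
  - hostname: $1
    service: $2
    originRequest:
      connectTimeout: ${3:-30s}
      tcpKeepAlive: ${4:-30s}
EOF
}
```

A call such as `ingress_rule_with_timeouts explorer.d-bis.org http://192.168.11.167:80 60s` emits the rule with a 60s connect timeout. Cloudflare's documentation appears to list 30s as the default for both settings, so a value above 30s may be needed before the change has any effect; confirm this against the current docs.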

After updating the dashboard

  1. Save the tunnel configuration.
  2. Optionally restart cloudflared:
    pct exec 102 -- systemctl restart cloudflared
  3. Re-test the hostnames that were returning 502 (e.g. curl -I https://explorer.d-bis.org, curl -I https://dbis-admin.d-bis.org).
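
The re-test in step 3 can be repeated across all affected hostnames with a small loop. This is a minimal sketch; `retest_hostnames` is a hypothetical helper, and it issues a GET and discards the body rather than using `curl -I`, which is a design choice and not something the runbook prescribes.

```shell
# retest_hostnames URL... prints "<url> <status>" per URL and
# returns non-zero if any URL still answers 502. Hypothetical helper.
retest_hostnames() {
  local failed=0 url code
  for url in "$@"; do
    # "|| true" keeps the loop going when curl cannot connect (it then prints 000).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url" || true)
    printf '%s %s\n' "$url" "${code:-000}"
    if [ "$code" = "502" ]; then
      failed=1
    fi
  done
  return "$failed"
}
```

Running `retest_hostnames https://explorer.d-bis.org https://dbis-admin.d-bis.org` after the change gives a quick pass/fail signal through its exit status, which is convenient in a follow-up script.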

Summary checklist

  • Step 1: Identify which host runs VMID 102 and what URL each Public Hostname uses (21 vs 26 vs 167).
  • Step 2: Run verify-cloudflare-tunnel-ingress.sh (or manual curl from VMID 102) to confirm http://192.168.11.167:80 responds for test hostnames.
  • Step 3: Reproduce 502 and check journalctl -u cloudflared for “Unable to reach the origin service” or timeout errors.
  • Step 4: Change all Public Hostnames to http://192.168.11.167:80 (NPMplus); restart cloudflared if needed; re-test.

Paying for Cloudflare does not fix 502s; fixing the origin URL and reachability does.