
Operational Runbooks - Master Index


Last Updated: 2026-03-30
Document Version: 1.3
Status: Active Documentation


Overview

This document provides a master index of all operational runbooks and procedures for the Sankofa/Phoenix/PanTel Proxmox deployment. For issue-specific troubleshooting (RPC, QBFT, SSH, tunnel, etc.), see ../09-troubleshooting/README.md and TROUBLESHOOTING_FAQ.md.

Proxmox VE hosts, peering, FQDN/NPMplus summary, deployment gates (human + JSON): PROXMOX_VE_OPERATIONAL_DEPLOYMENT_TEMPLATE.md.

DBIS institutional (HYBX OMNL, DBIS Core, Chain 138 Smart Vaults, external RTGS, identifiers, Blockscout labels): OMNL_DBIS_CORE_CHAIN138_SMART_VAULT_RTGS_RUNBOOK.md, OJK_BI_AUDIT_JVMTM_REMEDIATION_AND_UETR_POLICY.md (Chain 138 as SWIFT-replacement identifiers), config/dbis-institutional/README.md, DBIS_RTGS_FX_TRANSACTION_CATALOG.md, scripts/verify/sync-blockscout-address-labels-from-registry.sh.


Quick Reference

Emergency Procedures

VM/Container Restart

To restart all stopped containers across Proxmox hosts via SSH:

# From project root; source config for host IPs
source config/ip-addresses.conf

# List stopped per host
for host in $PROXMOX_HOST_ML110 $PROXMOX_HOST_R630_01 $PROXMOX_HOST_R630_02; do
  ssh root@$host "pct list | awk '\$2==\"stopped\" {print \$1}'"
done

# Start each (replace HOST and VMID)
ssh root@HOST "pct start VMID"
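The listing and starting steps can be combined into one pass. The sketch below is hedged: the awk filter is the same one used in the loop above, but it is exercised here on canned pct list output so the pipeline can be sanity-checked without a Proxmox host.

```shell
# Hedged sketch: same awk filter as above, run on canned `pct list` output.
sample='VMID       Status     Lock         Name
1000       running                 besu-validator-1
2301       stopped                 besu-rpc-private-1'
stopped=$(printf '%s\n' "$sample" | awk '$2=="stopped" {print $1}')
echo "stopped CTs: $stopped"
# On a real host, pipe the same filter into pct start:
#   ssh root@$host "pct list | awk '\$2==\"stopped\" {print \$1}' | xargs -r -n1 pct start"
```

Review the stopped list before bulk-starting; some CTs may be stopped deliberately.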

Verification: scripts/verify/verify-backend-vms.sh | Report: VM_RESTART_AND_VERIFICATION_20260203.md

CT 2301 corrupted rootfs: If besu-rpc-private-1 (ml110) fails at the pre-start hook, run scripts/fix-ct-2301-corrupted-rootfs.sh

Common Operations


Network Operations

ER605 Router Configuration

  • ER605_ROUTER_CONFIGURATION.md - Complete router configuration guide
  • VLAN Configuration - Setting up VLANs on ER605
  • NAT Pool Configuration - Configuring role-based egress NAT
  • Failover Configuration - Setting up WAN failover

VLAN Management

  • VLAN Migration - Migrating from flat LAN to VLANs
  • VLAN Troubleshooting - Common VLAN issues and solutions
  • Inter-VLAN Routing - Configuring routing between VLANs

Edge and DNS (Fastly / Direct to NPMplus)

NPMplus API update and recovery

  • Primary admin URL: https://192.168.11.167:81 (VMID 10233 on r630-01)
  • If TCP connects but HTTP never returns: treat CT 10233 as wedged and reboot it from r630-01 with pct reboot 10233, then re-check :81 for the expected 301 redirect.
  • API updater: NPM_URL=https://192.168.11.167:81 bash scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh
  • Script behavior: curl_npm and try_connect use -L, so port-81 redirects do not break POST /api/tokens with a 400 "Payload is undefined" error; IP_NPMPLUS_ETH1 is optional-safe under set -u.
  • Large .env warning: if your normal set -a; source .env flow fails with "Argument list too long", avoid exporting the entire file for NPM-only runs. Pull only the needed credentials, for example:
NPM_EMAIL="$(grep '^NPM_EMAIL=' .env | tail -n1 | cut -d= -f2-)"
NPM_PASSWORD="$(grep '^NPM_PASSWORD=' .env | tail -n1 | cut -d= -f2-)"
NPM_URL=https://192.168.11.167:81 \
NPM_EMAIL="$NPM_EMAIL" \
NPM_PASSWORD="$NPM_PASSWORD" \
bash scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh
  • Verified on 2026-03-26: after rebooting CT 10233, bash scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh completed with 39 hosts updated, 0 failed, including the-order.sankofa.nexus, www.the-order.sankofa.nexus, and studio.sankofa.nexus.
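When several keys are needed, the grep/tail/cut pattern above can be wrapped in a small helper. This is a sketch only — env_get is an illustrative name, not an existing repo script — and it is demonstrated against a throwaway file rather than the real .env.

```shell
# Hypothetical helper: read one key from a dotenv-style file without exporting
# the whole environment. Last assignment wins, matching the tail -n1 above.
env_get() { grep "^${2}=" "$1" | tail -n1 | cut -d= -f2-; }

tmp=$(mktemp)
printf 'NPM_EMAIL=old@example.com\nNPM_EMAIL=admin@example.com\nNPM_PASSWORD=s3cret\n' > "$tmp"
email=$(env_get "$tmp" NPM_EMAIL)
echo "$email"
rm -f "$tmp"
```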

Cloudflare (DNS and optional Access)

  • CLOUDFLARE_ZERO_TRUST_GUIDE.md - Cloudflare setup (DNS retained; Option B tunnel for RPC only)
  • Application Publishing - Publishing applications via Cloudflare Access (optional)
  • Access Policy Management - Managing access policies

Smart Accounts (Chain 138 / ERC-4337)

  • Location: smom-dbis-138/script/smart-accounts/DeploySmartAccountsKit.s.sol
  • Env (required for deploy/use): PRIVATE_KEY, RPC_URL_138. Optional: ENTRY_POINT, SMART_ACCOUNT_FACTORY, PAYMASTER — set to deployed addresses to use existing contracts; otherwise deploy EntryPoint (ERC-4337), AccountFactory (e.g. MetaMask Smart Accounts Kit), and optionally Paymaster, then set in .env and re-run.
  • Run: forge script script/smart-accounts/DeploySmartAccountsKit.s.sol --rpc-url $RPC_URL_138 --broadcast (from smom-dbis-138). If addresses are in env, script logs them; else it logs next steps.
  • See: PLACEHOLDERS_AND_TBD.md — Smart Accounts Kit.
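Before running the forge script it can help to fail fast on missing environment. A minimal sketch, assuming only the variable names listed above (require_env is illustrative, not a repo function):

```shell
# Illustrative pre-flight: print each missing variable, return nonzero if any are unset.
require_env() {
  missing=0
  for v in "$@"; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then echo "missing: $v"; missing=1; fi
  done
  return $missing
}

# Demo with dummy values; on a real run these come from smom-dbis-138/.env.
PRIVATE_KEY=0x0 RPC_URL_138=http://127.0.0.1:8545
require_env PRIVATE_KEY RPC_URL_138 && status=ok || status=missing
echo "$status"
```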

Besu Operations

Node Management

Adding a Validator

Prerequisites:

  • Validator key generated
  • VMID allocated (1000-1499 range)
  • VLAN 110 configured (if migrated)

Steps:

  1. Create LXC container with VMID
  2. Install Besu
  3. Configure validator key
  4. Add to static-nodes.json on all nodes
  5. Update allowlist (if using permissioning)
  6. Start Besu service
  7. Verify validator is participating

See: VALIDATED_SET_DEPLOYMENT_GUIDE.md
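Step 7 can be checked over JSON-RPC: Besu's QBFT API exposes qbft_getValidatorsByBlockNumber. A hedged sketch — the RPC URL is the LAN default from this runbook, and the response shown is canned for illustration, not live data:

```shell
# Standard Besu QBFT validator-set query.
payload='{"jsonrpc":"2.0","method":"qbft_getValidatorsByBlockNumber","params":["latest"],"id":1}'
# Live check:
#   curl -s -X POST -H "Content-Type: application/json" -d "$payload" http://192.168.11.221:8545
# Canned response for illustration; grep for the new validator's address:
response='{"jsonrpc":"2.0","id":1,"result":["0x1234abcd","0x5678ef01"]}'
new_validator="0x1234abcd"
echo "$response" | grep -q "$new_validator" && verdict="participating" || verdict="absent"
echo "$verdict"
```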

Removing a Validator

Prerequisites:

  • Validator is not critical (check quorum requirements)
  • Backup validator key

Steps:

  1. Stop Besu service
  2. Remove from static-nodes.json on all nodes
  3. Update allowlist (if using permissioning)
  4. Remove container (optional)
  5. Document removal

Upgrading Besu

Prerequisites:

  • Backup current configuration
  • Test upgrade in dev environment
  • Create snapshot before upgrade

Steps:

  1. Create snapshot: pct snapshot <vmid> pre-upgrade-$(date +%Y%m%d)
  2. Stop Besu service
  3. Backup configuration and keys
  4. Install new Besu version
  5. Update configuration if needed
  6. Start Besu service
  7. Verify node is syncing
  8. Monitor for issues

Rollback:

  • If issues occur: pct rollback <vmid> pre-upgrade-YYYYMMDD
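The snapshot name in step 1 and the rollback command must agree; computing it once avoids a date mismatch if the upgrade spans midnight. A small sketch:

```shell
# Compute the snapshot name once and reuse it for both snapshot and rollback.
snapname="pre-upgrade-$(date +%Y%m%d)"
echo "$snapname"
# pct snapshot <vmid> "$snapname"    # step 1, before the upgrade
# pct rollback <vmid> "$snapname"    # only if the upgrade goes wrong
```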

Node list deploy and verify (static-nodes.json / permissions-nodes.toml)

Canonical source: config/besu-node-lists/ (single source of truth; 30 nodes in the allowlist after 203/204 were removed; 32 Besu nodes total).

  • Deploy to all nodes: scripts/deploy-besu-node-lists-to-all.sh (optionally --dry-run). Pushes static-nodes.json and permissions-nodes.toml to /etc/besu/ on every validator, sentry, and RPC (VMIDs 1000-1004, 1500-1508, 2101, 2102, 2201, 2301, 2303-2308, 2400-2403, 2500-2505).
  • Verify presence and match canonical: scripts/verify/verify-static-permissions-on-all-besu-nodes.sh --checksum.
  • Restart Besu to reload lists: scripts/besu/restart-besu-reload-node-lists.sh (optional; lists are read at startup).
  • Full-mesh peering (all 32 nodes): Every node needs max-peers=32. Repo configs updated; to apply on running nodes run scripts/maintenance/set-all-besu-max-peers-32.sh then restart. See 08-monitoring/PEER_CONNECTIONS_PLAN.md.

See: 06-besu/BESU_NODES_FILE_REFERENCE.md, 08-monitoring/RPC_AND_VALIDATOR_TESTING_RUNBOOK.md.
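The --checksum comparison boils down to matching sha256 digests between the canonical file and each deployed copy. A self-contained sketch on scratch files (the file contents and paths here are illustrative):

```shell
# Compare a deployed copy against the canonical node list by sha256 digest.
canonical=$(mktemp); deployed=$(mktemp)
echo '["enode://aaaa@192.168.11.100:30303"]' > "$canonical"
cp "$canonical" "$deployed"
sum_a=$(sha256sum "$canonical" | cut -d' ' -f1)
sum_b=$(sha256sum "$deployed" | cut -d' ' -f1)
[ "$sum_a" = "$sum_b" ] && result="match" || result="MISMATCH"
echo "$result"
rm -f "$canonical" "$deployed"
```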

RPC block production (chain 138 / current block)

If an RPC node returns the wrong chain ID, or reports block 0 / no blocks: use the dedicated runbook for status checks and common fixes (host-allowlist, tx-pool-min-score, permissions/static-nodes paths, discovery, Besu binary/genesis).
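The two quick checks behind that runbook are eth_chainId and eth_blockNumber, both of which return hex. A sketch using the LAN RPC URL from this document; the decode step is shown on a canned value:

```shell
# Live checks (run against the node under suspicion):
#   curl -s -X POST -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' http://192.168.11.221:8545
#   curl -s -X POST -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://192.168.11.221:8545
# Chain 138 must report 0x8a; decode the hex result with shell arithmetic:
chain_hex="0x8a"
chain_dec=$((chain_hex))
echo "chain id: $chain_dec"
```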

Allowlist Management

Common Operations:

  • Generate allowlist from nodekeys
  • Update allowlist on all nodes
  • Verify allowlist is correct
  • Troubleshoot allowlist issues

Consensus Troubleshooting


Liquidity & Multi-Chain (cUSDT/cUSDC)

  • CUSDT_CUSDC_MULTICHAIN_LIQUIDITY_RUNBOOK.md — Deploy cUSDT/cUSDC to other chains (Ethereum, BSC, Polygon, Base, etc.); create Dodo PMM and Uniswap pools; add to Balancer, Curve. Scripts: deploy-cusdt-cusdc-all-chains.sh, deploy-pmm-all-l2s.sh, create-uniswap-v3-pool-cusdt-cusdc.sh.
  • USDW_PUBLIC_WRAP_VAULT_RUNBOOK.md — Deploy native public-chain USDW -> cWUSDW wrap vault and share cWUSDW roles with the GRU bridge; BSC re-use, Polygon-first deployment gate.
  • AUSDT_CAUSDT_CWAUSDT_BRIDGE_CHECKLIST.md — ALL Mainnet AUSDT origin pins, public cWAUSDT mirrors, and the activation gate for landing on Chain 138 as cAUSDT.
  • CXAUC_CXAUT_CWAXAUC_CWAXAUT_ALLTRA_BRIDGE_CHECKLIST.md — ALL Mainnet gold corridor: cXAUC/cXAUT source assets, cWAXAUC/cWAXAUT bridge-minted ALL Mainnet wrappers, cAXAUC/cAXAUT unwrapped landing assets, and the activation gate.
  • WORMHOLE_NTT_EXECUTOR_OPERATOR_RUNBOOK.md — Wormhole NTT + Executor operator preparation, NTT CLI bootstrap, and the current Chain 138 support boundary.
  • LIQUIDITY_POOL_CONTROLS_RUNBOOK.md — Trustless LiquidityPoolETH, DODO PMM, PoolManager, LiquidityManager controls and funding.
  • MAINNET_PMM_TRUU_CWUSD_PEG_AND_BOT_RUNBOOK.md — Ethereum mainnet: TRUU PMM liquidity, cWUSD* peg maintenance, deeper deployed pools, bot guardrails; verifier scripts/verify/check-mainnet-pmm-peg-bot-readiness.sh; deploy scripts/deployment/deploy-mainnet-pmm-cw-truu-pool.sh; USD-notional seed helper scripts/deployment/compute-mainnet-truu-pmm-seed-amounts.sh; ratio-matched top-up scripts/deployment/add-mainnet-truu-pmm-topup.sh (section 11 live inventory); large 1:1 USD legs scripts/deployment/compute-mainnet-truu-liquidity-amounts.sh (section 11.1).
  • Runbooks master index: ../RUNBOOKS_MASTER_INDEX.md — All runbooks across the repo.

GRU M1 Listing Operations

GRU M1 Listing Dry-Run

See also: docs/gru-m1/


Blockscout & Contract Verification

Blockscout (VMID 5000)

Forge Contract Verification

Forge verify-contract fails against Blockscout with "Params 'module' and 'action' are required". Use the dedicated proxy.

Preferred (orchestrated; starts proxy if needed):

source smom-dbis-138/.env 2>/dev/null
./scripts/verify/run-contract-verification-with-proxy.sh

Manual (proxy + verify):

  1. Start proxy: BLOCKSCOUT_URL=http://192.168.11.140:4000 node forge-verification-proxy/server.js
  2. Run: ./scripts/verify-contracts-blockscout.sh

Alternative: Nginx fix (scripts/fix-blockscout-forge-verification.sh) or manual verification at https://explorer.d-bis.org/address/#verify-contract

See:


CCIP Operations

CCIP Relay Service (Chain 138 → Mainnet)

Status: Deployed on r630-01 (192.168.11.11) at /opt/smom-dbis-138/services/relay

Quick commands:

# View logs
ssh root@192.168.11.11 "tail -f /opt/smom-dbis-138/services/relay/relay-service.log"

# Restart
ssh root@192.168.11.11 "pkill -f 'node index.js' 2>/dev/null; sleep 2; cd /opt/smom-dbis-138/services/relay && nohup ./start-relay.sh >> relay-service.log 2>&1 &"

Configuration: Uses RPC_URL_138_PUBLIC for Chain 138 (VMID 2201). On the relay host (LAN): http://192.168.11.221:8545. For published URLs and checks from the internet, use HTTPS FQDN https://rpc-http-pub.d-bis.org. START_BLOCK=latest.

XDC Zero + Chain 138 (parallel to CCIP)

OP Stack Standard Rollup (Ethereum mainnet, Superchain registry)

CCIP Deployment

WETH9 Bridge (Chain 138) Router mismatch fix: Run scripts/deploy-and-configure-weth9-bridge-chain138.sh (requires PRIVATE_KEY); then set CCIPWETH9_BRIDGE_CHAIN138 to the printed address. Deploy scripts now default to the working CCIP router (0x8078A...). See 07-ccip/README.md, COMPREHENSIVE_STATUS_BRIDGE_READY.md, scripts/README.md.

Deployment Phases:

  1. Deploy Ops/Admin nodes (5400-5401)
  2. Deploy Monitoring nodes (5402-5403)
  3. Deploy Commit nodes (5410-5425)
  4. Deploy Execute nodes (5440-5455)
  5. Deploy RMN nodes (5470-5476)

CCIP Node Management

  • Adding CCIP Node - Add new CCIP node to fleet
  • Removing CCIP Node - Remove CCIP node from fleet
  • CCIP Node Troubleshooting - Common CCIP issues

Admin Runner (Scripts / MCP) — Phase 4.4

Purpose: Run admin scripts and MCP tooling with central audit (who ran what, when, outcome). Design and implementation are deferred until the infra admin view is built.

  • Design: Runner service or wrapper that (1) authenticates (e.g. JWT or API key), (2) executes script/MCP action, (3) appends to central audit (dbis_core POST /api/admin/central/audit) with actor, action, resource, outcome.
  • Docs: MASTER_PLAN.md §4.4; admin-console-frontend-plan.md.
  • When: Implement with org-level panel and infra admin view.

Phase 2 & 3 Deployment (Infrastructure)

Phase 2 — Monitoring stack: Deploy Prometheus, Grafana, Loki, Alertmanager; configure Cloudflare Access; enable health-check alerting. See MONITORING_SUMMARY.md, MASTER_PLAN.md §5.

Phase 2 — Security: SSH key-based auth (disable password); firewall Proxmox API (port 8006); secure validator keys; audits VLT-024, ISO-024; bridge integrations BRG-VLT, BRG-ISO. See SECRETS_KEYS_CONFIGURATION.md, IMPLEMENTATION_CHECKLIST.md.

Phase 2 — Backups: Automated backup script; encrypted validator keys; NPMplus backup (NPM_PASSWORD); config backup. See BACKUP_AND_RESTORE.md, scripts/backup-proxmox-configs.sh, scripts/verify/backup-npmplus.sh.

Phase 3 — CCIP fleet: Ops/Admin nodes (5400-5401), commit/execute/RMN nodes, NAT pools. See CCIP_DEPLOYMENT_SPEC.md, OPERATIONAL_RUNBOOKS.md § CCIP Operations.

Phase 4 — Sovereign tenants (docs/runbook): VLANs 200-203 (Phoenix Sovereign Cloud Band), Block #6 egress NAT, tenant isolation. Script: scripts/deployment/phase4-sovereign-tenants.sh [--show-steps|--dry-run]. Docs: ORCHESTRATION_DEPLOYMENT_GUIDE.md § Phase 4, NETWORK_ARCHITECTURE.md (VLAN 200-203), UDM_PRO_FIREWALL_MANUAL_CONFIGURATION.md (sovereign tenant isolation rules).


Monitoring & Observability

Monitoring Setup

Components:

  • Prometheus metrics collection
  • Grafana dashboards
  • Loki log aggregation
  • Alertmanager alerting

Health Checks

  • Node Health Checks - Check individual node health
  • Service Health Checks - Check service status
  • Network Health Checks - Check network connectivity

Scripts:

  • check-node-health.sh - Node health check script
  • check-service-status.sh - Service status check

Backup & Recovery

Backup Procedures

  • Configuration Backup - Backup all configuration files
  • Validator Key Backup - Encrypted backup of validator keys
  • Container Backup - Backup container configurations

Automated Backups:

  • Scheduled daily backups
  • Encrypted storage
  • Multiple locations
  • 30-day retention

Disaster Recovery

  • Service Recovery - Recover failed services
  • Network Recovery - Recover network connectivity
  • Full System Recovery - Complete system recovery

Recovery Procedures:

  1. Identify failure point
  2. Restore from backup
  3. Verify service status
  4. Monitor for issues

Maintenance (ALL_IMPROVEMENTS 135-139)

  • 135 Monitor explorer sync status (Daily): curl -s http://192.168.11.140:4000/api/v1/stats
  • 136 Monitor RPC node health, e.g. VMID 2201 (Daily): bash scripts/verify/verify-backend-vms.sh; curl -s -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' "$RPC_URL_138_PUBLIC" — prefer https://rpc-http-pub.d-bis.org when not on LAN; on LAN, http://192.168.11.221:8545 is OK
  • 137 Check config API uptime (Weekly): curl -sI https://dbis-api.d-bis.org/health or the target config API URL
  • 138 Review explorer logs, O-4 (Weekly): see O-4 below. ssh root@<explorer-host> "journalctl -u blockscout -n 200 --no-pager" or pct exec 5000 -- journalctl -u blockscout -n 200 --no-pager. Explorer: VMID 5000 (r630-02, 192.168.11.140).
  • 139 Update token list, O-5 (As needed): see O-5 below. Canonical list: token-lists/lists/dbis-138.tokenlist.json. Guide: TOKEN_LIST_AUTHORING_GUIDE.md. Bump version and timestamp; validate schema; deploy/public URL per runbook.

O-4 (Review explorer logs, weekly): Run weekly or after incidents. From a host with SSH to the Blockscout node: ssh root@192.168.11.XX "journalctl -u blockscout -n 200 --no-pager" (replace with actual Proxmox/container host for VMID 5000), or from Proxmox host: pct exec 5000 -- journalctl -u blockscout -n 200 --no-pager. Check for indexer errors, DB connection issues, OOM.

O-5 (Update token list, as needed): Edit token-lists/lists/dbis-138.tokenlist.json; bump version.major|minor|patch and timestamp; run validation (see TOKEN_LIST_AUTHORING_GUIDE); update any public URL (e.g. tokens.d-bis.org) and explorer/config API token list reference.

Script: scripts/maintenance/daily-weekly-checks.sh [daily|weekly|all] — daily: explorer, RPC, indexer lag, in-CT disk (138b); weekly: config API, thin pool all hosts (138a), fstrim (138c), journal vacuum (138d). Cron: install from a persistent host checkout, e.g. CRON_PROJECT_ROOT=/srv/proxmox bash scripts/maintenance/schedule-daily-weekly-cron.sh --install (daily 08:00, weekly Sun 09:00). Storage: CRON_PROJECT_ROOT=/srv/proxmox bash scripts/maintenance/schedule-storage-growth-cron.sh --install (collect every 6h, prune snapshots+history Sun 08:00); CRON_PROJECT_ROOT=/srv/proxmox bash scripts/maintenance/schedule-storage-monitor-cron.sh --install (host alerts daily 07:00). See 04-configuration/STORAGE_GROWTH_AND_HEALTH.md.

When decommissioning or changing RPC nodes

Explorer (VMID 5000) depends on RPC at ETHEREUM_JSONRPC_HTTP_URL. Point it at VMID 2201 (public Besu). Prefer https://rpc-http-pub.d-bis.org in Blockscout when the indexer reaches RPC over NPM/tunnel; on-LAN-only installs may use http://192.168.11.221:8545. When you decommission or change the IP of an RPC node that Blockscout might use:

  1. Check Blockscout env on VM 5000:
    pct exec 5000 -- bash -c 'grep -E "ETHEREUM_JSONRPC|RPC" /opt/blockscout/.env 2>/dev/null || docker inspect blockscout 2>/dev/null | grep -A5 Env' (run from root@r630-02, 192.168.11.12).
  2. If it points to the affected node, update to a live RPC (set ETHEREUM_JSONRPC_HTTP_URL to https://rpc-http-pub.d-bis.org or, on LAN, http://192.168.11.221:8545) in Blockscout env and restart Blockscout.
  3. Update any script defaults and config/ip-addresses.conf / docs that reference the old RPC.

See BLOCKSCOUT_FIX_RUNBOOK.md § "Proactive: When changing RPC or decommissioning nodes" and SOLACESCANSCOUT_DEEP_DIVE_FIXES_AND_TIMING.md.

After NPMplus or DNS changes

Run E2E routing (includes explorer.d-bis.org):
bash scripts/verify/verify-end-to-end-routing.sh --profile=public

After frontend or Blockscout deploy

From a host on LAN that can reach 192.168.11.140, run full explorer E2E:
bash explorer-monorepo/scripts/e2e-test-explorer.sh

Before/after Blockscout version or config change

Run migrations (SSL-disabled DB URL):
bash scripts/fix-blockscout-ssl-and-migrations.sh (on Proxmox host r630-02 or via SSH).
See BLOCKSCOUT_FIX_RUNBOOK.md.


Security Operations

Key Management

  • SECRETS_KEYS_CONFIGURATION.md - Secrets and keys management
  • Validator Key Rotation - Rotate validator keys
  • API Token Rotation - Rotate API tokens

Access Control (Phase 2 — Security)

  • SSH key-based auth; disable password auth: On each Proxmox host and key VMs: sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config; sudo systemctl reload sshd. Ensure SSH keys are deployed first. See IMPLEMENTATION_CHECKLIST.md. Scripts: scripts/security/setup-ssh-key-auth.sh [--dry-run|--apply].
  • Firewall: restrict Proxmox API (port 8006): Allow only admin IPs. Example (iptables): iptables -A INPUT -p tcp --dport 8006 -s <ADMIN_CIDR> -j ACCEPT; iptables -A INPUT -p tcp --dport 8006 -j DROP. Or use Proxmox firewall / UDM Pro rules. Script: scripts/security/firewall-proxmox-8006.sh [--dry-run|--apply] [CIDR]. Document in NETWORK_ARCHITECTURE.md.
  • Secure validator keys (W1-19): On Proxmox host as root: scripts/secure-validator-keys.sh [--dry-run] — chmod 600/700, chown besu:besu on VMIDs 1000-1004.
  • Cloudflare Access - Manage Cloudflare Access policies
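The PasswordAuthentication sed edit in the first bullet can be tried safely against a scratch copy before touching /etc/ssh/sshd_config. The demo below uses a temp file so it is harmless anywhere:

```shell
# Same substitution as the runbook, applied to a scratch copy of sshd_config.
cfg=$(mktemp)
printf '#PasswordAuthentication yes\nPermitRootLogin prohibit-password\n' > "$cfg"
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' "$cfg"
line=$(grep '^PasswordAuthentication' "$cfg")
echo "$line"
rm -f "$cfg"
```

On a real host, reload sshd afterward and keep an existing session open until key login is confirmed.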

Troubleshooting

Common Issues

Diagnostic Procedures

  1. Check Service Status

    systemctl status besu-validator
    
  2. Check Logs

    journalctl -u besu-validator -f
    
  3. Check Network Connectivity

    ping <node-ip>
    
  4. Check Node Health

    ./scripts/health/check-node-health.sh <vmid>
    

Emergency Procedures

Emergency Access

Break-glass Access:

  1. Use emergency SSH endpoint (if configured)
  2. Access via Cloudflare Access (if available)
  3. Physical console access (last resort)

Emergency Contacts:

  • Infrastructure Team: [contact info]
  • On-call Engineer: [contact info]

Service Recovery

Priority Order:

  1. Validators (critical for consensus)
  2. RPC nodes (critical for access)
  3. Monitoring (important for visibility)
  4. Other services

Recovery Steps:

  1. Identify failed service
  2. Check service logs
  3. Restart service
  4. If restart fails, restore from backup
  5. Verify service is operational

Network Recovery

Network Issues:

  1. Check ER605 router status
  2. Check switch status
  3. Check VLAN configuration
  4. Check firewall rules
  5. Test connectivity

VLAN Issues:

  1. Verify VLAN configuration on switches
  2. Verify VLAN configuration on ER605
  3. Verify Proxmox bridge configuration
  4. Test inter-VLAN routing

Maintenance Windows

Scheduled Maintenance

  • Weekly: Health checks, log review
  • Monthly: Security updates, configuration review
  • Quarterly: Full system review, backup testing

Maintenance Procedures

  1. Notify Stakeholders - Send maintenance notification
  2. Create Snapshots - Snapshot all containers before changes
  3. Perform Maintenance - Execute maintenance tasks
  4. Verify Services - Verify all services are operational
  5. Document Changes - Document all changes made

Maintenance procedures (Ongoing)

  • Monitor explorer sync (O-1), Daily 08:00: cron schedule-daily-weekly-cron.sh; script daily-weekly-checks.sh daily
  • Monitor RPC 2201 (O-2), Daily 08:00: same cron/script
  • Config API uptime (O-3), Weekly (Sun 09:00): daily-weekly-checks.sh weekly
  • Review explorer logs (O-4), Weekly: runbook item 138 above; pct exec 5000 -- journalctl -u blockscout -n 200 or SSH to the Blockscout host
  • Update token list (O-5), As needed: runbook item 139 above; token-lists/lists/dbis-138.tokenlist.json; TOKEN_LIST_AUTHORING_GUIDE.md
  • NPMplus backup, when NPMplus is up: scripts/verify/backup-npmplus.sh
  • Validator key/config backup, per backup policy: W1-8; BACKUP_AND_RESTORE.md
  • Ensure FireFly primary (6200), as needed: scripts/maintenance/ensure-firefly-primary-via-ssh.sh (r630-02)
  • Ensure Fabric sample network (6000), as needed: scripts/maintenance/ensure-fabric-sample-network-via-ssh.sh (r630-02)
  • Start firefly-ali-1 (6201), optional/when needed: scripts/maintenance/start-firefly-6201.sh (r630-02)

Troubleshooting

Architecture & Design

Configuration

Deployment

Monitoring

Reference


Document Status: Active
Maintained By: Infrastructure Team
Review Cycle: Monthly
Last Updated: 2026-02-05