# Fix Block Production — Runbook
**Last Updated:** 2026-03-04
**When:** Block production is stalled on Chain 138 (no new blocks; validators active).
---
## 1. Confirm the problem
```bash
# Block not advancing (run twice, 10s apart)
cast block-number --rpc-url http://192.168.11.211:8545
sleep 10
cast block-number --rpc-url http://192.168.11.211:8545
# If same → stalled
```
```bash
./scripts/monitoring/monitor-blockchain-health.sh
# Look for: "Block production stalled (no new blocks in 5s)"
```
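The two-read check above can be wrapped in a function that exits non-zero on a stall, which makes it usable from other scripts. This is a minimal sketch, not one of the repo's scripts. It assumes Foundry's `cast` is on `PATH`; the helper names (`block_advanced`, `check_stall`) and the 10s default interval are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: detect a stalled chain by sampling the RPC head twice.
# Hypothetical helpers; not part of scripts/monitoring/.

# Succeeds (exit 0) only if the head moved forward between two samples.
block_advanced() {
  local before=$1 after=$2
  (( after > before ))
}

# Samples the head twice, INTERVAL seconds apart; exit 1 = stalled, 2 = RPC error.
check_stall() {
  local rpc_url=$1 interval=${2:-10}
  local h1 h2
  h1=$(cast block-number --rpc-url "$rpc_url") || return 2
  sleep "$interval"
  h2=$(cast block-number --rpc-url "$rpc_url") || return 2
  if block_advanced "$h1" "$h2"; then
    echo "OK: head advanced $h1 -> $h2 in ${interval}s"
  else
    echo "STALLED: head stuck at $h2 for ${interval}s"
    return 1
  fi
}

# Only touch the network when an RPC URL is passed on the command line.
if (( $# > 0 )); then
  check_stall "$@"
fi
```

Usage: `bash check-stall.sh http://192.168.11.211:8545 10`.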
---
## 2. Check validator status and height
All 5 validators (1000–1004) must be **active** and ideally at **chain head**:
```bash
# Service status (from repo root)
for spec in "1000:192.168.11.11" "1001:192.168.11.11" "1002:192.168.11.11" "1003:192.168.11.10" "1004:192.168.11.10"; do
  IFS=: read -r vmid host <<< "$spec"
  # is-active prints active/inactive/failed; an empty result means SSH or pct exec failed
  s=$(ssh -o ConnectTimeout=6 root@"$host" "pct exec $vmid -- systemctl is-active besu-validator" 2>/dev/null)
  echo "Validator $vmid: ${s:-?}"
done
```
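To turn the per-validator lines into a go/no-go answer, the loop's output can be counted against the QBFT quorum of 4. A sketch, assuming the `Validator <vmid>: <state>` line format printed by the loop above; `count_active` and `summarize` are hypothetical helper names. "Active" is necessary but not sufficient: the nodes must also be at chain head, which the height check below covers.

```bash
#!/usr/bin/env bash
# Sketch: count "active" lines from the status loop and compare with quorum.
# Reads lines like "Validator 1000: active" on stdin (format assumed from the loop above).

QUORUM=4   # ceil(2*5/3) for the 5-validator set

# Print how many input lines report exactly "active".
count_active() {
  awk -F': ' '$2 == "active" { n++ } END { print n + 0 }'
}

# Exit 0 if enough services are active to meet quorum, 1 otherwise.
summarize() {
  local active
  active=$(count_active)
  if (( active >= QUORUM )); then
    echo "$active/5 active: service quorum met"
  else
    echo "$active/5 active: below quorum ($QUORUM); see SOLUTION_QUORUM_LOSS.md"
    return 1
  fi
}
```

After sourcing these functions, append `| summarize` to the `done` line of the status loop.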
Optional: check block height per validator (metrics on port 9545):
```bash
ssh root@192.168.11.11 "pct exec 1000 -- curl -s -m 4 http://127.0.0.1:9545/metrics" | grep -E '^ethereum_best_known_block_number |^besu_blockchain_difficulty_total '
# Both should be ~chain head (the RPC block number; it was ~2547803 when this runbook was written)
```
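The single-validator metrics check can be extended to all five validators and compared with the RPC head. This is a sketch under these assumptions: each container serves Prometheus metrics on `127.0.0.1:9545` as above, `ethereum_best_known_block_number` is exported without labels, and `cast` is available. `best_block`, `check_heights` and `VALIDATORS` are illustrative names. Besu may print gauge values as floating-point numbers (for example `2547803.0`), so the value is passed through awk's numeric conversion rather than compared as a string.

```bash
#!/usr/bin/env bash
# Sketch: compare each validator's best known block with the RPC head.
# VALIDATORS mirrors the vmid:host pairs used in the status loop above.
VALIDATORS=("1000:192.168.11.11" "1001:192.168.11.11" "1002:192.168.11.11"
            "1003:192.168.11.10" "1004:192.168.11.10")
RPC_URL=${RPC_URL:-http://192.168.11.211:8545}

# Extract ethereum_best_known_block_number from Prometheus text on stdin.
# awk's numeric conversion accepts "2547803", "2547803.0" and "2.547803E6".
best_block() {
  awk '$1 == "ethereum_best_known_block_number" { printf "%d\n", $2; exit }'
}

check_heights() {
  local head spec vmid host h
  head=$(cast block-number --rpc-url "$RPC_URL") || return 2
  echo "RPC head: $head"
  for spec in "${VALIDATORS[@]}"; do
    IFS=: read -r vmid host <<< "$spec"
    h=$(ssh -o ConnectTimeout=6 root@"$host" \
          "pct exec $vmid -- curl -s -m 4 http://127.0.0.1:9545/metrics" 2>/dev/null | best_block)
    if [[ -z $h ]]; then
      echo "Validator $vmid: no metrics"
    else
      echo "Validator $vmid: $h (behind head by $(( head - h )))"
    fi
  done
}

# Run the network check only when asked, e.g. `bash heights.sh run`.
if [[ ${1:-} == run ]]; then
  check_heights
fi
```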
---
## 3. Apply fix: staggered restart
Restart validators **one at a time** so the rest stay at head and the restarted node syncs quickly. With one of five down, the remaining four still meet the QBFT quorum of 4, so this preserves quorum and avoids "everyone in full sync."
```bash
cd /home/intlc/projects/proxmox
./scripts/maintenance/fix-block-production-staggered-restart.sh
```
- **Dry run:** `./scripts/maintenance/fix-block-production-staggered-restart.sh --dry-run`
- **Duration:** ~7–8 minutes (90s wait between each of the 5 restarts, plus a final 30s).
- **Order:** 1004 → 1003 → 1002 → 1001 → 1000 (ML110 first, then R630-01).
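To show the shape of a staggered restart, here is a hypothetical sketch using the order and the 90s/30s waits described above. It is not the contents of `fix-block-production-staggered-restart.sh`, which may do more (health checks, logging); use the real script in production. It defaults to dry-run and assumes the `besu-validator` unit name and the vmid-to-host mapping used elsewhere in this runbook. A live run of this sketch takes 5 × 90s + 30s = 480s, about 8 minutes, in line with the estimate above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a staggered validator restart (NOT the repo script).
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
WAIT_SECS=${WAIT_SECS:-90}    # pause after each restart so the node can rejoin
FINAL_SECS=${FINAL_SECS:-30}  # settle time before verifying block production

# Same order as the runbook: ML110 (192.168.11.10) first, then R630-01 (192.168.11.11).
ORDER=("1004:192.168.11.10" "1003:192.168.11.10" "1002:192.168.11.11"
       "1001:192.168.11.11" "1000:192.168.11.11")

# Execute a command, or just print it in dry-run mode.
run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

staggered_restart() {
  local spec vmid host
  for spec in "${ORDER[@]}"; do
    IFS=: read -r vmid host <<< "$spec"
    echo "Restarting validator $vmid on $host"
    run ssh -o ConnectTimeout=6 root@"$host" "pct exec $vmid -- systemctl restart besu-validator"
    run sleep "$WAIT_SECS"
  done
  run sleep "$FINAL_SECS"
}

if [[ ${1:-} == run ]]; then
  staggered_restart
fi
```

`bash staggered-sketch.sh run` prints the plan instantly; `DRY_RUN=0 bash staggered-sketch.sh run` would execute it.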
---
## 4. Verify block production
```bash
./scripts/monitoring/monitor-blockchain-health.sh
# Expect: "Block production" advancing (block diff > 0 in 5s window)
```
Or:
```bash
watch -n 5 'cast block-number --rpc-url http://192.168.11.211:8545'
# Block number should increase every ~2s (genesis blockperiodseconds=2)
```
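The `watch` check can be made pass/fail by comparing how many blocks arrived in a window with how many `blockperiodseconds=2` predicts. A sketch: the 20s window and the "at least half of expected" tolerance are arbitrary choices, not values taken from the repo's monitoring script, and `production_ok` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: pass/fail block-production check over a fixed window.
RPC_URL=${RPC_URL:-http://192.168.11.211:8545}
PERIOD=${PERIOD:-2}    # genesis config.qbft.blockperiodseconds
WINDOW=${WINDOW:-20}   # seconds to observe (arbitrary)

# Succeeds if at least half the expected blocks arrived (tolerance is arbitrary).
production_ok() {
  local start=$1 end=$2 window=$3 period=$4
  local expected=$(( window / period ))
  (( (end - start) * 2 >= expected ))
}

verify() {
  local h1 h2
  h1=$(cast block-number --rpc-url "$RPC_URL") || return 2
  sleep "$WINDOW"
  h2=$(cast block-number --rpc-url "$RPC_URL") || return 2
  echo "Produced $(( h2 - h1 )) blocks in ${WINDOW}s (expected ~$(( WINDOW / PERIOD )))"
  production_ok "$h1" "$h2" "$WINDOW" "$PERIOD"
}

if [[ ${1:-} == run ]]; then
  verify
fi
```

With the defaults, a healthy chain should show about 10 new blocks per run.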
---
## 5. If still stalled
**Quorum:** With 5 validators, QBFT needs **4 at chain head** to produce blocks (Besu's QBFT quorum is ⌈2N/3⌉, i.e. ⌈10/3⌉ = 4 for N = 5). If only 3 are at head (e.g. 1000, 1001, 1002), blocks will not advance until 1003 and/or 1004 sync to head. Check each validator's `ethereum_best_known_block_number` or `besu_blockchain_difficulty_total` (metrics on port 9545); all should match the RPC block number.
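The quorum rule can be checked with integer arithmetic. A sketch: ⌈2N/3⌉ is computed as `(2N + 2) / 3` with integer division, and `at_head` counts heights equal to the head. The helper names are illustrative, and the heights would come from the per-validator metrics check in section 2. Since the samples are not taken at the same instant, treat a validator one or two blocks behind as "probably at head" when reading the result.

```bash
#!/usr/bin/env bash
# Sketch: can the validator set reach QBFT quorum given per-validator heights?

# QBFT quorum = ceil(2N/3); (2N + 2) / 3 is the integer form of that ceiling.
qbft_quorum() {
  local n=$1
  echo $(( (2 * n + 2) / 3 ))
}

# Count validators whose height equals the chain head. Usage: at_head HEAD H1 H2 ...
at_head() {
  local head=$1 h count=0
  shift
  for h in "$@"; do
    (( h == head )) && count=$(( count + 1 ))
  done
  echo "$count"
}

# Succeeds if enough validators are at head to meet quorum. Usage: quorum_ok HEAD H1 H2 ...
quorum_ok() {
  local head=$1
  shift
  local n=$# ok q
  ok=$(at_head "$head" "$@")
  q=$(qbft_quorum "$n")
  echo "$ok/$n validators at head; quorum is $q"
  (( ok >= q ))
}
```

For example, `quorum_ok 2547803 2547803 2547803 2547803 2547790 2547788` reports 3 of 5 at head against a quorum of 4 and exits 1, which is the stalled case described above.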
1. **Validator peer count:** Validators must peer with each other. On the validator's Proxmox host:
`pct exec <vmid> -- curl -s http://127.0.0.1:9545/metrics | grep besu_peers_connected_total`
Should be several (e.g. 4+). If 0, check static-nodes / permissions and P2P ports (30303).
2. **Check validator logs** for QBFT/consensus errors:
```bash
ssh root@192.168.11.11 "pct exec 1000 -- journalctl -u besu-validator -n 100 --no-pager" | grep -iE 'qbft|consensus|propos|round|error'
```
3. **Check time sync:** QBFT depends on timers and block timestamps; ensure NTP is running on all Proxmox hosts and containers.
4. **Enable INFO logging** (see [CRITICAL_ISSUE_BLOCK_PRODUCTION_STOPPED.md](CRITICAL_ISSUE_BLOCK_PRODUCTION_STOPPED.md) § Enable Verbose Logging) and restart one validator; watch the logs for round/proposal messages.
5. **Genesis:** Confirm that `config.qbft.blockperiodseconds` (e.g. 2) and the validator set in genesis match the running nodes.
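For the time-sync item, one quick way to spot clock skew is to read the epoch time on each validator and compare its offset from the machine running the check. A sketch under stated assumptions: the vmid-to-host mapping matches the status loop in section 2, SSH round-trips add up to about a second of noise per sample, and the 2s threshold (one block period) is an illustrative choice, not a Besu limit. LXC containers read the host kernel's clock, so a skewed container usually points at its Proxmox host's NTP.

```bash
#!/usr/bin/env bash
# Sketch: measure the spread of clock offsets across validators.
VALIDATORS=("1000:192.168.11.11" "1001:192.168.11.11" "1002:192.168.11.11"
            "1003:192.168.11.10" "1004:192.168.11.10")
MAX_SKEW=${MAX_SKEW:-2}   # illustrative threshold in seconds (one block period)

# Print max - min of the integer arguments (offsets may be negative).
clock_spread() {
  local min=$1 max=$1 t
  for t in "$@"; do
    (( t < min )) && min=$t
    (( t > max )) && max=$t
  done
  echo $(( max - min ))
}

check_clocks() {
  local spec vmid host before t off offsets=()
  for spec in "${VALIDATORS[@]}"; do
    IFS=: read -r vmid host <<< "$spec"
    before=$(date +%s)   # local reference time, taken just before the SSH call
    t=$(ssh -o ConnectTimeout=6 root@"$host" "pct exec $vmid -- date +%s" 2>/dev/null) || continue
    off=$(( t - before ))
    echo "Validator $vmid: offset ${off}s vs this machine"
    offsets+=("$off")
  done
  (( ${#offsets[@]} > 0 )) || { echo "no validators reachable"; return 2; }
  local spread
  spread=$(clock_spread "${offsets[@]}")
  echo "Offset spread: ${spread}s (threshold ${MAX_SKEW}s)"
  (( spread <= MAX_SKEW ))
}

if [[ ${1:-} == run ]]; then
  check_clocks
fi
```

Comparing offsets against a single local reference, instead of raw timestamps, keeps the sequential SSH calls from inflating the spread.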
---
## References
- [CRITICAL_ISSUE_BLOCK_PRODUCTION_STOPPED.md](CRITICAL_ISSUE_BLOCK_PRODUCTION_STOPPED.md)
- [SOLUTION_QUORUM_LOSS.md](SOLUTION_QUORUM_LOSS.md) (if fewer than 4 of 5 validators are running)
- Script: `scripts/maintenance/fix-block-production-staggered-restart.sh`