# Troubleshooting FAQ

**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation

---

Common issues and solutions for Besu validated set deployment.

## Table of Contents

**Estimated Reading Time:** 30 minutes
**Progress:** Check off sections as you read

1. ✅ [Container Issues](#container-issues) - *Container troubleshooting*
2. ✅ [Service Issues](#service-issues) - *Service troubleshooting*
3. ✅ [Network Issues](#network-issues) - *Network troubleshooting*
4. ✅ [Consensus Issues](#consensus-issues) - *Consensus troubleshooting*
5. ✅ [Configuration Issues](#configuration-issues) - *Configuration troubleshooting*
6. ✅ [Performance Issues](#performance-issues) - *Performance troubleshooting*
7. ✅ [Additional Common Questions](#additional-common-questions) - *More FAQs*
8. [RPC errors -32001 / -32602 / gas 32xxx](RPC_ERRORS_32001_32602.md) - *Nonce too low, Invalid params, gas when deploying*

---

## Troubleshooting Flow (Decision Tree)

1. **Is the service/container down?** → Check logs (`journalctl -u pve-container@<vmid>`, `systemctl status`), then [Container Issues](#container-issues) or [Service Issues](#service-issues).
2. **Network/connectivity issue?** → Check ping, curl, DNS, firewall; see [Network Issues](#network-issues).
3. **Consensus / QBFT?** → See [QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md) and [Consensus Issues](#consensus-issues).
4. **Configuration or performance?** → See [Configuration Issues](#configuration-issues), [Performance Issues](#performance-issues), or [Additional Common Questions](#additional-common-questions).

---

## Container Issues

### Q: Container won't start

**Symptoms**: `pct status <vmid>` shows "stopped" or errors during startup

**Solutions**:
```bash
# Check container status
pct status <vmid>

# View container console
pct console <vmid>

# Check logs
journalctl -u pve-container@<vmid>

# Check container configuration
pct config <vmid>

# Try starting manually
pct start <vmid>
```

**Common Causes**:
- Insufficient resources (RAM, disk)
- Network configuration errors
- Invalid container configuration
- OS template issues

<details>
<summary>Click to expand advanced troubleshooting steps</summary>

**Advanced Diagnostics:**
```bash
# Check container resources
pct list --full | grep <vmid>

# Check Proxmox host resources
free -h
df -h

# Check container logs in detail
journalctl -u pve-container@<vmid> -n 100 --no-pager

# Verify container template
pveam list | grep <template-name>
```

</details>

---

### Q: Container runs out of disk space

**Symptoms**: Services fail, "No space left on device" errors

**Solutions**:
```bash
# Check disk usage
pct exec <vmid> -- df -h

# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/

# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d

# Increase disk size (if using LVM)
pct resize <vmid> rootfs +10G
```

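To spot filesystems that are close to full before services start failing, the `df` output can be filtered by usage. This is a minimal sketch: `over_threshold` is a hypothetical helper (not one of the deployment scripts) and the 85% default is an assumption. It reads `df -P` output on stdin, so on the Proxmox host it could be used as `pct exec <vmid> -- df -P | over_threshold 90`.

```bash
# Print mount points whose usage is at or above a threshold (percent).
# Reads POSIX-format `df -P` output on stdin; the 85% default is an assumption.
over_threshold() {
  limit="${1:-85}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5                 # "Capacity" column, e.g. "91%"
    sub(/%/, "", use)
    if (use + 0 >= limit) print $6, use "%"
  }'
}
```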
---

### Q: Container network issues

**Symptoms**: Cannot ping, cannot connect to services

**Solutions**:
```bash
# Check network configuration
pct config <vmid> | grep net0

# Check if container has IP
pct exec <vmid> -- ip addr show

# Check routing
pct exec <vmid> -- ip route

# Restart the container to reinitialize its networking
pct stop <vmid>
pct start <vmid>
```

---

## Service Issues

### Q: Besu service won't start

**Symptoms**: `systemctl status besu-validator` shows failed

**Solutions**:
```bash
# Check service status
pct exec <vmid> -- systemctl status besu-validator

# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100

# Check for configuration errors
pct exec <vmid> -- besu --config-file=/etc/besu/config-validator.toml --help

# Review the configuration file contents
pct exec <vmid> -- cat /etc/besu/config-validator.toml
```

**Common Causes**:
- Missing configuration files
- Invalid configuration syntax
- Missing validator keys
- Port conflicts
- Insufficient resources

---

### Q: Service starts but crashes

**Symptoms**: Service starts then stops, high restart count

**Solutions**:
```bash
# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"

# Check for out of memory
pct exec <vmid> -- dmesg | grep -i "out of memory"

# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h

# Check JVM heap settings
pct exec <vmid> -- cat /etc/systemd/system/besu-validator.service | grep BESU_OPTS
```

---

### Q: Service shows as active but not responding

**Symptoms**: Service status shows "active" but RPC/P2P not responding

**Solutions**:
```bash
# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu

# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"

# Check firewall rules
pct exec <vmid> -- iptables -L -n

# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545
```

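When several ports are expected, it can be easier to list only the ones with no listener. The sketch below assumes a hypothetical helper named `missing_ports` that reads `netstat -tuln` (or `ss -tuln`) output on stdin; the port numbers are the ones used above. On the host it could be run as `pct exec <vmid> -- netstat -tuln | missing_ports 30303 8545 9545`.

```bash
# Print each expected port that has no listener in the listing on stdin.
missing_ports() {
  listing="$(cat)"
  for port in "$@"; do
    # Match ":<port>" followed by whitespace, i.e. the local address column
    if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
      echo "$port"
    fi
  done
}
```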
---

## Network Issues

### Q: Nodes cannot connect to peers

**Symptoms**: Low or zero peer count, "No peers" in logs

**Solutions**:
```bash
# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json

# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml

# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode

# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303

# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>
```

**Common Causes**:
- Incorrect enode URLs in static-nodes.json
- Firewall blocking P2P port (30303)
- Nodes not in permissions-nodes.toml
- Network connectivity issues

---

### Q: Invalid enode URL errors

**Symptoms**: "Invalid enode URL syntax" or "Invalid node ID" in logs

**Solutions**:
```bash
# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
  sed 's|^enode://||' | cut -d'@' -f1 | wc -c

# Should output 129 (128 chars + newline)

# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
```

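The length check can be extended to the whole URL shape. This is a sketch only: `is_valid_enode` is a hypothetical helper, and it assumes an IPv4 address or hostname as the host part (bracketed IPv6 hosts are not handled). It checks syntax, not reachability. Each entry copied from `static-nodes.json` could be passed to it before deploying the allowlist.

```bash
# Succeed if the argument looks like enode://<128 hex chars>@<host>:<port>,
# optionally followed by ?discport=<port>.
is_valid_enode() {
  printf '%s\n' "$1" |
    grep -Eq '^enode://[0-9a-fA-F]{128}@[^:@/]+:[0-9]{1,5}(\?discport=[0-9]{1,5})?$'
}
```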
---

### Q: RPC endpoint not accessible

**Symptoms**: Cannot connect to RPC on port 8545

**Solutions**:
```bash
# Check if RPC is enabled (validators typically don't have RPC)
pct exec <vmid> -- grep -i "rpc-http-enabled" /etc/besu/config-*.toml

# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545

# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545

# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545

# Check host allowlist in config
pct exec <vmid> -- grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml
```

---

## Consensus Issues

### Q: No blocks being produced

**Symptoms**: Block height not increasing, "No blocks" in logs

**Solutions**:
```bash
# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator

# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/

# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"

# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"

# Check peer connectivity (requires HTTP RPC with the ADMIN API enabled on this node)
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545
```

**Common Causes**:
- Validator keys missing or incorrect
- Not enough validators online
- Network connectivity issues
- Consensus configuration errors

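To confirm whether the block height is actually increasing, two `eth_blockNumber` samples can be compared. JSON-RPC returns the height as a hex quantity, so it has to be converted first. The helpers below are hypothetical names for illustration; the sketch assumes the two samples are taken further apart than the configured block period, for example with a `sleep` between two `curl` calls.

```bash
# Convert a JSON-RPC hex quantity such as "0x1a" to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Succeed if the second height is greater than the first.
height_advanced() {
  before="$(hex_to_dec "$1")"
  after="$(hex_to_dec "$2")"
  [ "$after" -gt "$before" ]
}
```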
---

### Q: Validator not participating in consensus

**Symptoms**: Validator running but not producing blocks

**Solutions**:
```bash
# Verify validator address
pct exec <vmid> -- cat /keys/validators/validator-*/address.txt

# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"

# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"

# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"
```

---

## Configuration Issues

### Q: Configuration file not found

**Symptoms**: "File not found" errors, service won't start

**Solutions**:
```bash
# List all config files
pct exec <vmid> -- ls -la /etc/besu/

# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"

# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138
```

---

### Q: Invalid configuration syntax

**Symptoms**: "Invalid option" or syntax errors in logs

**Solutions**:
```bash
# Validate TOML syntax (tomllib requires Python 3.11+; the file must be opened in binary mode)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1

# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null

# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"

# Review Besu documentation for current options
```

---

### Q: Path errors in configuration

**Symptoms**: "File not found" errors with paths like "/config/genesis.json"

**Solutions**:
```bash
# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml

# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"

# Fix paths if needed
pct exec <vmid> -- sed -i 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml
```

---

## Performance Issues

### Q: High CPU usage

**Symptoms**: Container CPU usage > 80% consistently

**Solutions**:
```bash
# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20

# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"

# Adjust JVM settings if needed
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS and JAVA_OPTS

# Consider allocating more CPU cores
pct set <vmid> --cores 4
```

---

### Q: High memory usage

**Symptoms**: Container running out of memory, OOM kills

**Solutions**:
```bash
# Check memory usage
pct exec <vmid> -- free -h

# Check JVM heap settings
pct exec <vmid> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gm]'

# Reduce heap size if too large
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS="-Xmx4g" to appropriate size

# Or increase container memory
pct set <vmid> --memory 8192
```

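To judge whether a heap setting is "too large", it helps to express it as a share of the container's memory. The sketch below uses two hypothetical helpers; what percentage counts as too high is left to the operator, since the JVM also needs memory outside the heap. With the values above, `heap_percent 4g 8192` compares a `-Xmx4g` heap with an 8192 MiB container.

```bash
# Convert a JVM heap size such as "4g" or "512m" to MiB.
heap_to_mib() {
  case "$1" in
    *[gG]) echo $(( ${1%[gG]} * 1024 )) ;;
    *[mM]) echo "${1%[mM]}" ;;
    *)     return 1 ;;          # unit not recognized
  esac
}

# Print the heap as a whole-number percentage of container memory (MiB).
heap_percent() {
  heap_mib="$(heap_to_mib "$1")" || return 1
  echo $(( heap_mib * 100 / $2 ))
}
```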
---

### Q: Slow sync or block processing

**Symptoms**: Blocks processing slowly, falling behind

**Solutions**:
```bash
# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/

# Check disk I/O
pct exec <vmid> -- iostat -x 1 5

# Consider using SSD storage

# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>

# Verify sufficient peers
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"
```

---

## General Troubleshooting Commands

```bash
# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
  echo "=== Container $vmid ==="
  pct status $vmid
done

# Check all service statuses
for vmid in 1000 1001 1002 1003 1004; do
  pct exec $vmid -- systemctl status besu-validator --no-pager -l | head -10
done

# View recent logs from all nodes
for vmid in 1000 1001 1002 1003 1004; do
  echo "=== Logs for container $vmid ==="
  pct exec $vmid -- journalctl -u besu-validator -n 20 --no-pager
done

# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14  # validator to validator

# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545 | python3 -m json.tool
```

---

## Getting Help

If issues persist:

1. **Collect Information**:
   - Service logs: `journalctl -u besu-validator -n 100`
   - Container status: `pct status <vmid>`
   - Configuration: `pct exec <vmid> -- cat /etc/besu/config-validator.toml`
   - Network: `pct exec <vmid> -- ip addr show`

2. **Check Documentation**:
   - [Besu Nodes File Reference](../06-besu/BESU_NODES_FILE_REFERENCE.md)
   - [Deployment Guide](../03-deployment/VALIDATED_SET_DEPLOYMENT_GUIDE.md)
   - [Besu Documentation](https://besu.hyperledger.org/)

3. **Validate Configuration**:
   - Run prerequisites check: `./scripts/validation/check-prerequisites.sh`
   - Validate validators: `./scripts/validation/validate-validator-set.sh`

4. **Review Logs**:
   - Check deployment logs: `logs/deploy-validated-set-*.log`
   - Check service logs in containers
   - Check Proxmox host logs

---

## Additional Common Questions

### Q: How do I add a new VMID?

**Answer:**
1. Check available VMID ranges in [VMID_ALLOCATION_FINAL.md](../02-architecture/VMID_ALLOCATION_FINAL.md)
2. Select an appropriate VMID from the designated range for your service
3. Verify the VMID is not already in use: `pct list | grep <vmid>` or `qm list | grep <vmid>`
4. Document the assignment in VMID_ALLOCATION_FINAL.md
5. Use the VMID when creating containers/VMs

**Example:**
```bash
# Check if VMID 2503 is available
pct list | grep 2503
qm list | grep 2503

# If available, create container with VMID 2503
pct create 2503 ...
```

**Related Documentation:**
- [VMID Allocation Registry](../02-architecture/VMID_ALLOCATION_FINAL.md) ⭐⭐⭐
- [Quick Reference Cards](../12-quick-reference/QUICK_REFERENCE_CARDS.md) (VMID and network) ⭐⭐⭐

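A plain `grep 2503` also matches longer IDs such as 12503, or a container name that happens to contain the digits. A stricter check compares only the first column. This is a sketch with a hypothetical helper, `vmid_in_use`, which reads `pct list` or `qm list` output on stdin, for example `pct list | vmid_in_use 2503 && echo "in use"`.

```bash
# Succeed if the given VMID appears in the first column of the listing on stdin.
# The first line is skipped because it is the column header.
vmid_in_use() {
  awk -v id="$1" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}
```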
---

### Q: What's the difference between public and private RPC?

**Answer:**

| Feature | Public RPC | Private RPC |
|---------|-----------|-------------|
| **Discovery** | Enabled | Disabled |
| **Permissioning** | Disabled | Enabled |
| **Access** | Public (CORS: *) | Restricted (internal only) |
| **APIs** | ETH, NET, WEB3 (read-only) | ETH, NET, WEB3, ADMIN, DEBUG (full) |
| **Use Case** | dApps, external users | Internal services, admin |
| **ChainID** | 0x8a (138) or 0x1 (wallet compatibility) | 0x8a (138) |
| **Domain** | rpc-http-pub.d-bis.org | rpc-http-prv.d-bis.org |

**Public RPC:**
- Accessible from the internet
- Used by dApps and external tools
- Read-only APIs for security
- May report chainID 0x1 for MetaMask compatibility

**Private RPC:**
- Internal network only
- Used by internal services and administration
- Full API access including ADMIN and DEBUG
- Strict permissioning and access control

**Related Documentation:**
- [RPC Node Types Architecture](../05-network/RPC_NODE_TYPES_ARCHITECTURE.md) ⭐⭐
- [RPC Template Types](../05-network/RPC_TEMPLATE_TYPES.md) ⭐

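Because a public endpoint may report 0x1 instead of 0x8a, it can be useful to check which chain ID an endpoint actually returns. The sketch below assumes the response of an `eth_chainId` call is piped in on stdin (for instance from a `curl` request like the `eth_blockNumber` examples in this document); `rpc_result` and `chain_id_is` are hypothetical helper names, and the `sed` expression only handles a simple string `result` field.

```bash
# Extract the "result" string from a JSON-RPC response on stdin.
rpc_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed if the chain ID in the response equals the expected decimal value.
chain_id_is() {
  expected="$1"
  reported="$(rpc_result)"
  [ -n "$reported" ] && [ "$(printf '%d' "$reported")" -eq "$expected" ]
}
```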
---

### Q: How do I troubleshoot Cloudflare tunnel issues?

**Answer:**

**Step 1: Check Tunnel Status**
```bash
# Check cloudflared container status
pct status 102

# Check tunnel logs (assumes the systemd unit is named cloudflared)
pct exec 102 -- journalctl -u cloudflared -n 50 --no-pager

# Verify tunnel is running
pct exec 102 -- ps aux | grep cloudflared
```

**Step 2: Verify Configuration**
```bash
# Check tunnel configuration
pct exec 102 -- cat /etc/cloudflared/config.yaml

# Verify credentials file exists
pct exec 102 -- ls -la /etc/cloudflared/*.json
```

**Step 3: Test Connectivity**
```bash
# Test from internal network
curl -I http://192.168.11.21:80

# Test from external (through Cloudflare)
curl -I https://explorer.d-bis.org
```

**Step 4: Check Cloudflare Dashboard**
- Verify tunnel is healthy in Cloudflare Zero Trust dashboard
- Check ingress rules are configured correctly
- Verify DNS records point to tunnel

**Common Issues:**
- Tunnel not running → Restart the container: `pct reboot 102`
- Configuration error → Check YAML syntax
- Credentials invalid → Regenerate tunnel token
- DNS not resolving → Check Cloudflare DNS settings

**Related Documentation:**
- [Cloudflare Tunnel Routing Architecture](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md) ⭐⭐⭐
- [Cloudflare Routing Master Reference](../05-network/CLOUDFLARE_ROUTING_MASTER.md) ⭐⭐⭐
- [Troubleshooting Quick Reference](../12-quick-reference/TROUBLESHOOTING_QUICK_REFERENCE.md) ⭐⭐⭐

---

### Q: What's the recommended storage configuration?

**Answer:**

**For R630 Compute Nodes:**
- **Boot drives (2×600GB):** ZFS mirror (recommended) or hardware RAID1
- **Data SSDs (6×250GB):** ZFS pool with one of:
  - Striped mirrors (if pairs available)
  - RAIDZ1 (single parity, 5 drives usable)
  - RAIDZ2 (double parity, 4 drives usable)
- **High-write workloads:** Dedicated dataset with quotas

**For ML110 Management Node:**
- Standard Proxmox storage configuration
- Sufficient space for templates and backups

**Storage Best Practices:**
- Use ZFS for data integrity and snapshots
- Enable compression for space efficiency
- Set quotas for containers to prevent disk exhaustion
- Regular backups to external storage

**Related Documentation:**
- [Network Architecture - Storage Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#53-storage-orchestration-r630) ⭐⭐⭐
- [Backup and Restore](../03-deployment/BACKUP_AND_RESTORE.md) ⭐⭐

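The trade-off between the three data-pool layouts is easier to see as raw capacity. The arithmetic below is a rough sketch: `usable_gb` is a hypothetical helper, and the figures ignore ZFS metadata overhead, compression, and reserved space, so real usable space will differ. For the 6×250GB data SSDs it gives 750 GB for striped mirrors, 1250 GB for RAIDZ1, and 1000 GB for RAIDZ2.

```bash
# usable_gb <layout> <drive_count> <drive_size_gb>
# Raw data capacity only: mirrors keep half the drives, RAIDZ1 loses one
# drive to parity, RAIDZ2 loses two.
usable_gb() {
  case "$1" in
    mirror) echo $(( $2 / 2 * $3 )) ;;
    raidz1) echo $(( ($2 - 1) * $3 )) ;;
    raidz2) echo $(( ($2 - 2) * $3 )) ;;
    *)      return 1 ;;          # unknown layout
  esac
}
```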
---

### Q: How do I migrate from flat LAN to VLANs?

**Answer:**

**Phase 1: Preparation**
1. Review VLAN plan in [NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md)
2. Document current IP assignments
3. Plan IP address migration for each service
4. Create rollback plan

**Phase 2: Network Configuration**
1. Configure ES216G switches with VLAN trunks
2. Enable VLAN-aware bridge on Proxmox hosts
3. Create VLAN interfaces on ER605 router
4. Test VLAN connectivity

**Phase 3: Service Migration**
1. Migrate services one VLAN at a time
2. Start with non-critical services
3. Update container/VM network configuration
4. Verify connectivity after each migration

**Phase 4: Validation**
1. Test all services on new VLANs
2. Verify routing between VLANs
3. Test egress NAT pools
4. Document final configuration

**Migration Order (Recommended):**
1. Management services (VLAN 11) - Already active
2. Monitoring/observability (VLAN 120, 121)
3. Besu network (VLANs 110, 111, 112)
4. CCIP network (VLANs 130, 132, 133, 134)
5. Service layer (VLAN 160)
6. Sovereign tenants (VLANs 200-203)

**Related Documentation:**
- [Network Architecture - VLAN Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#3-layer-2--vlan-orchestration-plan) ⭐⭐⭐
- [Orchestration Deployment Guide - VLAN Enablement](../02-architecture/ORCHESTRATION_DEPLOYMENT_GUIDE.md#phase-1--vlan-enablement) ⭐⭐⭐

---

## Additional Common Questions (Expanded)

### Q: How do I find which VMID uses a given IP?

**Answer:** See [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md) for IP ranges by service type and VMID. On the Proxmox host, use `pct list` or `qm list` to list containers/VMs, then `pct config <vmid> | grep net0` to see a container's network settings (including its IP).

### Q: What's the difference between public and private RPC?

**Answer:** **Public RPC** (e.g. rpc-http-pub.d-bis.org) is exposed for external clients; it may have rate limits and JWT authentication. **Private RPC** (e.g. rpc-http-prv.d-bis.org) is for internal or trusted clients. See [05-network/CLOUDFLARE_ROUTING_MASTER.md](../05-network/CLOUDFLARE_ROUTING_MASTER.md) for domain → backend mapping.

### Q: Cloudflare tunnel not connecting – where do I start?

**Answer:**
1. Check the cloudflared service on the tunnel host (VMID 102 or NPMplus).
2. Verify credentials and tunnel ID.
3. Check [04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md) and [05-network/CLOUDFLARE_ROUTING_MASTER.md](../05-network/CLOUDFLARE_ROUTING_MASTER.md).
4. Confirm NPMplus (192.168.11.167) is reachable from the UDM Pro port forward.

### Q: Recommended storage configuration for RPC nodes?

**Answer:** Use SSD for the Besu data directory; avoid NFS for Besu unless tested. See [02-architecture/NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md) and deployment guides for node layout. Run `scripts/audit-proxmox-rpc-storage.sh` to check restrictions.

---

## Related Documentation

### Operational Procedures
- **[OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md)** - Complete operational runbooks
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](../06-besu/BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting

### Deployment & Configuration
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status
- **[NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md)** - Network architecture reference
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](../03-deployment/VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Deployment guide

### Monitoring
- **[MONITORING_SUMMARY.md](../08-monitoring/MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](../08-monitoring/BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring

### Reference
- **[MASTER_INDEX.md](../MASTER_INDEX.md)** - Complete documentation index

---

**Last Updated:** 2025-01-20
**Version:** 1.0