Best Practices and Recommendations
Complete guide for production deployment, multi-node setups, elastic storage, and operational excellence.
🏗️ Architecture Recommendations
Multi-Node Deployment
Benefits:
- High availability and redundancy
- Load distribution across nodes
- Disaster recovery capabilities
- Better resource utilization
Node Assignment Strategies:
- Auto (Recommended): Automatically selects nodes with available resources
- Round-Robin: Distributes containers evenly across nodes
- Manual: Specify node assignments in configuration file
Configuration:
# In config/proxmox.conf
PROXMOX_NODES="pve,pve2,pve3"
NODE_ASSIGNMENT_STRATEGY="auto"
Deployment:
# Deploy with multi-node support
./scripts/manage/deploy-multi-node.sh validators 4
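To make the Round-Robin strategy concrete, here is a minimal sketch of how containers could be spread across the nodes listed in PROXMOX_NODES. The helper name assign_round_robin and its arguments are hypothetical; this is not the logic of deploy-multi-node.sh, only an illustration of the idea.

```shell
#!/bin/sh
# Sketch: round-robin placement of containers across cluster nodes.
# assign_round_robin is a hypothetical helper, not part of the project scripts.

# Usage: assign_round_robin "<comma-separated nodes>" <first_vmid> <count>
# Prints one "<vmid> <node>" pair per line.
assign_round_robin() {
    nodes=$1
    first_vmid=$2
    count=$3

    # Split the node list on commas into the positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $nodes
    IFS=$old_ifs
    node_count=$#
    [ "$node_count" -gt 0 ] || return 1

    i=0
    while [ "$i" -lt "$count" ]; do
        # Container i goes to node (i mod node_count); parameters are 1-based.
        idx=$(( i % node_count + 1 ))
        eval "node=\${$idx}"
        echo "$(( first_vmid + i )) $node"
        i=$(( i + 1 ))
    done
}
```

For example, assign_round_robin "pve,pve2,pve3" 1000 4 places the fourth container back on the first node, which is why a node count that divides the container count gives the most even spread.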
Elastic Storage Configuration
Storage Expansion:
# Expand container storage
./scripts/manage/expand-storage.sh <VMID> <additional_GB>
# Example: Expand validator by 50GB
./scripts/manage/expand-storage.sh 1000 50
Automatic Expansion:
- Enable AUTO_EXPAND_STORAGE=true in config
- Set STORAGE_ALERT_THRESHOLD for proactive expansion
- Monitor storage usage with pvesm status
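The threshold check behind proactive expansion can be sketched as follows. This assumes STORAGE_ALERT_THRESHOLD is a percentage of used space, which the source does not state; usage_percent and needs_expansion are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: decide whether storage usage has crossed an alert threshold.
# Assumption: the threshold is a whole-number percentage of used space.

# usage_percent <used> <total>: integer percentage of storage in use.
usage_percent() {
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 * 100 / $2 ))
}

# needs_expansion <used> <total> <threshold_percent>
# Succeeds (exit 0) when usage is at or above the threshold.
needs_expansion() {
    pct_used=$(usage_percent "$1" "$2") || return 2
    [ "$pct_used" -ge "$3" ]
}
```

A caller could feed it the used and total figures reported for a storage pool and run ./scripts/manage/expand-storage.sh only when needs_expansion succeeds.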
Storage Pool Types:
- local-lvm: Fast, local storage (default)
- local-zfs: Advanced features, snapshots
- shared-storage: Network storage for HA
Container Migration
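Migration (Proxmox restart-migrates running LXC containers rather than live-migrating them, so expect brief downtime):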
# Migrate container to another node
./scripts/manage/migrate-container.sh <VMID> <target_node>
# Example: Migrate validator to pve2
./scripts/manage/migrate-container.sh 1000 pve2
# With storage migration
./scripts/manage/migrate-container.sh 1000 pve2 local-lvm true
Migration Best Practices:
- Perform during maintenance windows
- Test migration on non-critical containers first
- Ensure network connectivity between nodes
- Monitor during and after migration
📊 Resource Management
Resource Allocation
Recommended Allocations:
- Validators: 8GB RAM, 4 CPU, 100GB disk (expandable)
- RPC Nodes: 16GB RAM, 4 CPU, 200GB disk (expandable)
- Services: 2-4GB RAM, 2 CPU, 20-50GB disk
- Monitoring: 4GB RAM, 4 CPU, 50GB disk (with retention)
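The per-container figures above add up quickly, so it helps to total them before sizing a cluster. The sketch below covers only validators and RPC nodes, using the numbers from the list; plan_capacity is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: total the recommended allocations for a planned deployment.
# Figures per container are taken from the list above:
#   validator: 8 GB RAM, 4 CPU, 100 GB disk
#   RPC node: 16 GB RAM, 4 CPU, 200 GB disk

# plan_capacity <validators> <rpc_nodes>
# Prints total RAM (GB), CPU cores, and disk (GB).
plan_capacity() {
    validators=$1
    rpc_nodes=$2
    ram=$(( validators * 8 + rpc_nodes * 16 ))
    cpu=$(( validators * 4 + rpc_nodes * 4 ))
    disk=$(( validators * 100 + rpc_nodes * 200 ))
    echo "ram_gb=$ram cpu=$cpu disk_gb=$disk"
}
```

With 4 validators and 3 RPC nodes, plan_capacity 4 3 prints ram_gb=80 cpu=28 disk_gb=1000, before any headroom for services, monitoring, or disk expansion.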
Resource Monitoring
Enable Monitoring:
# In config/proxmox.conf
RESOURCE_MONITORING_ENABLED="true"
RESOURCE_ALERT_CPU="90"
RESOURCE_ALERT_MEMORY="85"
RESOURCE_ALERT_DISK="80"
Check Resources:
# Check all nodes
./scripts/manage/deploy-multi-node.sh check-resources
# Check specific container
pct exec <VMID> -- free -h
pct exec <VMID> -- df -h
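To compare a container's memory use against RESOURCE_ALERT_MEMORY, the output of free can be reduced to a single percentage. This is a sketch under the assumption that the alert values are percentages; mem_used_percent and over_threshold are hypothetical helpers, and the calculation uses the "used" and "total" columns of the Mem: row.

```shell
#!/bin/sh
# Sketch: turn `free -b` output into a used-memory percentage and
# compare it with an alert limit (assumed to be a percentage).

# mem_used_percent: reads `free -b` style output on stdin and prints
# used / total from the "Mem:" row as a whole-number percentage.
mem_used_percent() {
    awk '/^Mem:/ { if ($2 > 0) printf "%d\n", $3 * 100 / $2 }'
}

# over_threshold <value> <limit>: succeeds when value >= limit.
over_threshold() {
    [ "$1" -ge "$2" ]
}
```

A possible use is pct exec <VMID> -- free -b | mem_used_percent, with the result passed to over_threshold together with the configured limit.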
🔄 High Availability
HA Configuration
Enable HA:
# In config/proxmox.conf
HA_ENABLED="true"
HA_GROUP="smom-dbis-138"
Benefits:
- Automatic failover
- Service continuity
- Reduced downtime
Redundancy
Recommended Redundancy:
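- Validators: Minimum 4 nodes (Byzantine fault tolerant consensus with a 2/3 majority needs 3f+1 validators to tolerate f faulty ones, so 4 validators tolerate 1 failure)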
- RPC Nodes: 3+ nodes for load balancing
- Sentries: 3+ nodes for DDoS protection
- Services: 2+ instances for critical services
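The validator minimum follows from simple arithmetic, assuming a Byzantine fault tolerant protocol as the 2/3 consensus note suggests. With n validators, the number of faulty validators that can be tolerated is the largest f with 3f + 1 <= n. The helper max_faulty below is hypothetical and only illustrates that formula.

```shell
#!/bin/sh
# Sketch: fault tolerance of a BFT validator set.
# Assumption: the network uses a BFT protocol where n >= 3f + 1.

# max_faulty <validator_count>: largest f with 3f + 1 <= n.
max_faulty() {
    [ "$1" -ge 1 ] || return 1
    echo $(( ($1 - 1) / 3 ))
}
```

Under that assumption, going from 4 to 6 validators does not raise the tolerance (both tolerate 1 fault); the next step up is 7 validators, which tolerate 2.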
💾 Backup Strategy
Backup Configuration
# In config/proxmox.conf
BACKUP_ENABLED="1"
BACKUP_RETENTION_DAYS="30"
BACKUP_SCHEDULE="02:00"
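If the backup is driven by cron, a BACKUP_SCHEDULE value such as "02:00" has to become a cron expression. The sketch below assumes the value is a daily HH:MM time in 24-hour format, which the source does not spell out; schedule_to_cron is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: convert a daily HH:MM time into a five-field cron expression.
# Assumption: BACKUP_SCHEDULE holds a 24-hour HH:MM time.

# schedule_to_cron <HH:MM>: prints "<minute> <hour> * * *".
schedule_to_cron() {
    case $1 in
        [0-2][0-9]:[0-5][0-9]) ;;
        *) echo "invalid time: $1" >&2; return 1 ;;
    esac
    hh=${1%:*}
    mm=${1#*:}
    # Strip one leading zero so "02" becomes "2" and "00" becomes "0".
    hh=${hh#0}
    mm=${mm#0}
    [ "$hh" -le 23 ] || { echo "invalid hour: $1" >&2; return 1; }
    echo "$mm $hh * * *"
}
```

For the value shown above, schedule_to_cron "02:00" prints 0 2 * * *, which could be followed by the path to ./scripts/backup/backup-all.sh in a crontab entry.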
Backup Best Practices
- Regular Backups: Daily automated backups
- Snapshot Before Changes: Create snapshots before upgrades
- Off-Site Storage: Store backups on separate storage
- Test Restores: Regularly test backup restoration
Backup Scripts
# Manual backup
./scripts/backup/backup-all.sh
# Restore from backup
./scripts/backup/restore-container.sh <VMID> <backup_file>
🔐 Security Best Practices
Network Security
VLAN Isolation:
- Validators: VLAN 100 (private)
- Sentries: VLAN 101 (semi-private)
- RPC Nodes: VLAN 102 (public)
- Services: VLAN 103 (internal)
- Monitoring: VLAN 104 (management)
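The VLAN layout above can be applied per container through the tag option of the container's network interface. The sketch below maps a role to its VLAN and builds a net0 value; the role names, the bridge vmbr0, and DHCP addressing are assumptions for illustration, and both helpers are hypothetical.

```shell
#!/bin/sh
# Sketch: map a container role to its VLAN tag and build a net0 value.
# Assumptions: role names, bridge "vmbr0", and ip=dhcp are illustrative.

# vlan_for_role <role>: prints the VLAN tag from the list above.
vlan_for_role() {
    case $1 in
        validator)  echo 100 ;;
        sentry)     echo 101 ;;
        rpc)        echo 102 ;;
        service)    echo 103 ;;
        monitoring) echo 104 ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# net0_for_role <role> [bridge]: builds a value for `pct set <VMID> -net0`.
net0_for_role() {
    tag=$(vlan_for_role "$1") || return 1
    echo "name=eth0,bridge=${2:-vmbr0},tag=$tag,ip=dhcp"
}
```

The result could then be passed along the lines of pct set <VMID> -net0 "$(net0_for_role validator)". Keeping the mapping in one function avoids VLAN numbers drifting between scripts.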
Firewall Rules:
- Restrict validator RPC access
- Limit public RPC access with rate limiting
- Isolate management networks
Access Control
API Tokens:
- Use API tokens instead of passwords
- Rotate tokens regularly
- Use least privilege principle
Container Security:
- Use unprivileged containers where possible
- Enable AppArmor/SELinux
- Keep containers updated
📈 Scaling Recommendations
Horizontal Scaling
Adding More Nodes:
- Add node to cluster
- Update PROXMOX_NODES configuration
- Migrate containers using migration script
- Verify connectivity
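Updating PROXMOX_NODES by hand makes it easy to list a node twice. A small, idempotent helper can avoid that; add_node below is a hypothetical sketch that works on the comma-separated format shown in the configuration example.

```shell
#!/bin/sh
# Sketch: append a node to a comma-separated node list exactly once.

# add_node <comma-separated list> <node>: prints the list with the node
# appended, or unchanged if the node is already present.
add_node() {
    case ",$1," in
        *",$2,"*) echo "$1" ;;      # already listed
        ",,")     echo "$2" ;;      # list was empty
        *)        echo "$1,$2" ;;
    esac
}
```

Wrapping the list in commas before matching means a name such as pve2 is not mistaken for a match inside pve22.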
Scaling Services:
# Deploy additional validators
./scripts/deployment/deploy-besu-nodes.sh --validators 6
# Deploy additional RPC nodes
./scripts/deployment/deploy-besu-nodes.sh --rpc 5
Vertical Scaling
Increasing Resources:
# Expand storage
./scripts/manage/expand-storage.sh <VMID> <GB>
# Increase memory (requires container restart)
pct set <VMID> -memory <MB>
# Increase CPU
pct set <VMID> -cores <count>
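Because pct set takes memory as a megabyte value while the allocation tables in this guide use GB, a dry-run helper can do the conversion and show the command before it is run. scale_command is hypothetical and treats 1 GB as 1024 MB.

```shell
#!/bin/sh
# Sketch: print (not run) the pct command for a vertical scaling change.
# Assumption: 1 GB of RAM is passed to pct as 1024 MB.

# scale_command <vmid> <memory_gb> <cores>
scale_command() {
    echo "pct set $1 -memory $(( $2 * 1024 )) -cores $3"
}
```

Printing the command first leaves room for review; the output can be executed once it looks right.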
🔍 Monitoring and Alerting
Prometheus Integration
Metrics Collection:
- Container resource usage
- Service health metrics
- Network metrics
- Storage metrics
Alerting:
- Configure Alertmanager
- Set up notification channels
- Define alert rules
Health Checks
Enable Health Checks:
# In config/proxmox.conf
HEALTH_CHECK_ENABLED="true"
HEALTH_CHECK_INTERVAL="300"
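A periodic health check can be sketched as a loop that polls pct status at the configured interval. This assumes HEALTH_CHECK_INTERVAL is in seconds, which the source does not state, and it is not the project's own health check; is_running and watch_container are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: poll a container's status at a fixed interval.
# Assumption: the interval is given in seconds.

# is_running: reads `pct status <VMID>` output on stdin and succeeds
# only if it reports "status: running".
is_running() {
    grep -q '^status: running$'
}

# watch_container <vmid> <interval_seconds>: polls until interrupted
# and logs to stderr whenever the container is not running.
watch_container() {
    while :; do
        if ! pct status "$1" | is_running; then
            echo "$(date) container $1 is not running" >&2
        fi
        sleep "$2"
    done
}
```

Running watch_container 1000 300 on a Proxmox node would check container 1000 every five minutes; a real setup would more likely hand the result to the alerting stack than print it.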
Check Service Health:
# Check container status
pct status <VMID>
# Check service status
pct exec <VMID> -- systemctl status <service>
# Check logs
pct exec <VMID> -- journalctl -u <service> -n 50
🚀 Performance Optimization
Storage Optimization
Use Appropriate Storage:
- SSD: For validators and RPC nodes (performance)
- NVMe: For high-performance requirements
- Network Storage: For shared data (Ceph, NFS)
Disk I/O Optimization:
- Use separate storage for logs
- Enable write-back caching where appropriate
- Monitor disk I/O with iostat
Network Optimization
Network Configuration:
- Use dedicated network for cluster communication
- Enable jumbo frames for inter-node communication
- Configure network bonding for redundancy
Connection Pooling:
- Configure RPC connection limits
- Use connection pooling for services
- Monitor network usage
🔧 Maintenance Procedures
Upgrade Procedure
- Create Snapshots:
  ./scripts/backup/backup-all.sh
- Rolling Upgrades:
  ./scripts/upgrade/upgrade-all.sh
- Verify Services:
  ./scripts/verify/verify-deployment.sh
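The upgrade steps above only protect the deployment if a failed step stops the ones after it. A minimal sketch of that ordering follows; run_steps is a hypothetical wrapper, and it assumes each script signals failure with a non-zero exit status.

```shell
#!/bin/sh
# Sketch: run commands in order and stop at the first failure.
# Assumption: each step exits non-zero when it fails.

# run_steps <command>...: runs each command in order, printing a marker
# per step, and returns 1 as soon as one of them fails.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        if ! $step; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}
```

It could be invoked as run_steps ./scripts/backup/backup-all.sh ./scripts/upgrade/upgrade-all.sh ./scripts/verify/verify-deployment.sh, so that the upgrade never starts without a completed backup.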
Maintenance Window
Best Practices:
- Schedule during low-traffic periods
- Perform rolling updates
- Test on non-production first
- Have rollback plan ready
📝 Documentation
Keep Documentation Updated
- Update Configuration:
  - Document all configuration changes
  - Keep network diagrams current
  - Maintain inventory list
- Change Log:
  - Track all deployments
  - Document issues and resolutions
  - Maintain runbooks
✅ Checklist
Pre-Deployment
- Review resource requirements
- Configure storage pools
- Set up network VLANs
- Configure backup storage
- Test node connectivity
Deployment
- Deploy to staging first
- Verify container creation
- Check network connectivity
- Verify service health
- Test failover scenarios
Post-Deployment
- Configure monitoring
- Set up alerting
- Schedule backups
- Document configuration
- Train operations team
🆘 Troubleshooting
Common Issues
Storage Full:
# Check storage usage
pvesm status
# Expand storage
./scripts/manage/expand-storage.sh <VMID> <GB>
Container Won't Start:
# Check host-side logs (pct exec only works inside a running container)
journalctl -u pve-container@<VMID> -n 50
# Check resource limits
pct config <VMID>
Network Issues:
# Check network configuration
pct config <VMID> | grep net0
# Test connectivity
pct exec <VMID> -- ping -c 4 <gateway>
Support Resources
- Proxmox VE Documentation
- Hyperledger Besu Documentation | GitHub Repository
- Project README files
- Log files in /var/log/smom-dbis-138/