d-bis/proxmox

defiQUG b45c2006be Refactor code for improved readability and performance

2025-12-21 22:32:09 -08:00

Best Practices and Recommendations

Complete guide for production deployment, multi-node setups, elastic storage, and operational excellence.

🏗️ Architecture Recommendations

Multi-Node Deployment

Benefits:

High availability and redundancy
Load distribution across nodes
Disaster recovery capabilities
Better resource utilization

Node Assignment Strategies:

Auto (Recommended): Automatically selects nodes with available resources
Round-Robin: Distributes containers evenly across nodes
Manual: Specify node assignments in configuration file

Configuration:

# In config/proxmox.conf
PROXMOX_NODES="pve,pve2,pve3"
NODE_ASSIGNMENT_STRATEGY="auto"
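
To illustrate the round-robin strategy, the sketch below spreads container VMIDs across a node list in the same comma-separated format as PROXMOX_NODES. The function name `assign_round_robin` and its output format are hypothetical and are not taken from the repository's scripts.

```bash
# Hypothetical helper: print "<VMID> <node>" pairs, cycling through the nodes.
# $1 = comma-separated node list (same format as PROXMOX_NODES)
# remaining arguments = container VMIDs
assign_round_robin() (
    nodes=$(printf '%s' "$1" | tr ',' ' ')
    shift
    count=0
    for n in $nodes; do count=$((count + 1)); done
    [ "$count" -gt 0 ] || { echo "no nodes given" >&2; exit 1; }
    i=0
    for vmid in "$@"; do
        idx=$((i % count + 1))      # 1-based position of the node to use
        j=0
        for n in $nodes; do
            j=$((j + 1))
            if [ "$j" -eq "$idx" ]; then
                printf '%s %s\n' "$vmid" "$n"
                break
            fi
        done
        i=$((i + 1))
    done
)
```

Calling `assign_round_robin "pve,pve2,pve3" 1000 1001 1002 1003` places the fourth container back on the first node.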

Deployment:

# Deploy with multi-node support
./scripts/manage/deploy-multi-node.sh validators 4

Elastic Storage Configuration

Storage Expansion:

# Expand container storage
./scripts/manage/expand-storage.sh <VMID> <additional_GB>

# Example: Expand validator by 50GB
./scripts/manage/expand-storage.sh 1000 50
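
On Proxmox VE, a container's root disk is grown with `pct resize`. Whether expand-storage.sh wraps exactly this command is an assumption. The sketch below only validates its arguments and prints the command, so it can be reviewed before anything is changed. The function name `build_resize_cmd` is hypothetical.

```bash
# Hypothetical dry-run helper: validate input and print the pct command that
# would grow a container's root filesystem. Nothing is executed here.
# $1 = VMID, $2 = additional size in GB
build_resize_cmd() (
    vmid=$1
    add_gb=$2
    case "$vmid" in
        ''|*[!0-9]*) echo "VMID must be numeric" >&2; exit 1 ;;
    esac
    case "$add_gb" in
        ''|*[!0-9]*) echo "size must be a whole number of GB" >&2; exit 1 ;;
    esac
    # A leading "+" asks pct resize to grow by this amount
    # instead of setting an absolute size.
    printf 'pct resize %s rootfs +%sG\n' "$vmid" "$add_gb"
)
```

For example, `build_resize_cmd 1000 50` prints `pct resize 1000 rootfs +50G`. Container volumes can be grown this way but not shrunk, so size increases deserve the same care as any other irreversible change.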

Automatic Expansion:

Enable AUTO_EXPAND_STORAGE=true in config
Set STORAGE_ALERT_THRESHOLD for proactive expansion
Monitor storage usage with pvesm status
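
A threshold check of this kind can be sketched by filtering `pvesm status` output. The function below assumes that STORAGE_ALERT_THRESHOLD is a percentage and that the usage percentage is the last column of each row. Both are assumptions about the setup, and `storages_over_threshold` is a hypothetical name.

```bash
# Hypothetical sketch: list storages whose usage is at or above a threshold.
# Reads `pvesm status` output on stdin; $1 = threshold in percent.
# Assumes the last column holds the usage percentage (e.g. "87.50%").
storages_over_threshold() {
    awk -v limit="$1" '
        NR == 1 { next }                 # skip the header row
        {
            pct = $NF
            sub(/%$/, "", pct)           # strip the trailing percent sign
            if (pct + 0 >= limit + 0) print $1, pct "%"
        }'
}
```

Typical use would be `pvesm status | storages_over_threshold "$STORAGE_ALERT_THRESHOLD"`, which prints one line per storage that needs attention and nothing otherwise.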

Storage Pool Types:

local-lvm: Fast, local storage (default)
local-zfs: Advanced features, snapshots
shared-storage: Network storage for HA

Container Migration

Migration (LXC containers are moved in restart mode, so expect brief downtime rather than a true live migration):

# Migrate container to another node
./scripts/manage/migrate-container.sh <VMID> <target_node>

# Example: Migrate validator to pve2
./scripts/manage/migrate-container.sh 1000 pve2

# With storage migration
./scripts/manage/migrate-container.sh 1000 pve2 local-lvm true

Migration Best Practices:

Perform during maintenance windows
Test migration on non-critical containers first
Ensure network connectivity between nodes
Monitor during and after migration
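
The practices above can be turned into a small pre-flight wrapper. This is a minimal sketch that calls `pct migrate` directly rather than the repository's migrate-container.sh, whose internals are not shown here. The function name `migrate_with_checks` is hypothetical.

```bash
# Hypothetical pre-flight wrapper around `pct migrate`.
# LXC containers cannot be live-migrated: --restart stops the container,
# moves it, and starts it again on the target node.
# $1 = VMID, $2 = target node
migrate_with_checks() (
    vmid=$1
    target=$2
    # 1. The target node must be reachable from this node.
    ping -c 1 -W 2 "$target" >/dev/null 2>&1 \
        || { echo "cannot reach $target" >&2; exit 1; }
    # 2. The container must exist on this node.
    pct status "$vmid" >/dev/null 2>&1 \
        || { echo "container $vmid not found on this node" >&2; exit 1; }
    # 3. Run the migration and propagate its exit status.
    pct migrate "$vmid" "$target" --restart || exit 1
)
```

After a successful run, the container's state has to be checked on the target node, because `pct` only sees containers on the node where it is executed.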

📊 Resource Management

Resource Allocation

Recommended Allocations:

Validators: 8GB RAM, 4 CPU, 100GB disk (expandable)
RPC Nodes: 16GB RAM, 4 CPU, 200GB disk (expandable)
Services: 2-4GB RAM, 2 CPU, 20-50GB disk
Monitoring: 4GB RAM, 4 CPU, 50GB disk (with retention)
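
As a worked example of capacity planning with these figures, the sketch below totals RAM, CPU, and disk for a given number of validators and RPC nodes. The function name `deployment_totals` is hypothetical, and services and monitoring containers are left out for brevity.

```bash
# Hypothetical sizing helper based on the per-role figures listed above:
#   validators: 8 GB RAM, 4 CPU, 100 GB disk
#   RPC nodes:  16 GB RAM, 4 CPU, 200 GB disk
# $1 = number of validators, $2 = number of RPC nodes
deployment_totals() (
    validators=$1
    rpc_nodes=$2
    ram_gb=$(( validators * 8 + rpc_nodes * 16 ))
    cpus=$(( validators * 4 + rpc_nodes * 4 ))
    disk_gb=$(( validators * 100 + rpc_nodes * 200 ))
    printf 'ram_gb=%s cpus=%s disk_gb=%s\n' "$ram_gb" "$cpus" "$disk_gb"
)
```

For 4 validators and 3 RPC nodes, `deployment_totals 4 3` prints `ram_gb=80 cpus=28 disk_gb=1000`, which is the minimum the cluster must provide before any headroom is added.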

Resource Monitoring

Enable Monitoring:

# In config/proxmox.conf
RESOURCE_MONITORING_ENABLED="true"
RESOURCE_ALERT_CPU="90"
RESOURCE_ALERT_MEMORY="85"
RESOURCE_ALERT_DISK="80"

Check Resources:

# Check all nodes
./scripts/manage/deploy-multi-node.sh check-resources

# Check specific container
pct exec <VMID> -- free -h
pct exec <VMID> -- df -h
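
To compare a container's memory usage with RESOURCE_ALERT_MEMORY by hand, the output of `free -m` can be reduced to a single percentage. The sketch below assumes the usual `free` layout, where the `Mem:` row lists total and used memory in its second and third columns. Both function names are hypothetical.

```bash
# Hypothetical sketch: compute used memory as a whole-number percentage
# from `free -m` output on stdin.
mem_used_percent() {
    awk '$1 == "Mem:" { printf "%d\n", $3 * 100 / $2 }'
}

# Succeeds when usage is below the threshold ($1, in percent).
# Fails if usage is at or above it, or if no "Mem:" row was found.
memory_below_threshold() {
    used=$(mem_used_percent)
    [ -n "$used" ] && [ "$used" -lt "$1" ]
}
```

One possible use is `pct exec <VMID> -- free -m | memory_below_threshold "$RESOURCE_ALERT_MEMORY" || echo "memory alert"`.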

🔄 High Availability

HA Configuration

Enable HA:

# In config/proxmox.conf
HA_ENABLED="true"
HA_GROUP="smom-dbis-138"

Benefits:

Automatic failover
Service continuity
Reduced downtime

Redundancy

Recommended Redundancy:

Validators: Minimum 4 nodes (BFT consensus needs 3f+1 validators to tolerate f faulty ones, so tolerating a single failure requires 4)
RPC Nodes: 3+ nodes for load balancing
Sentries: 3+ nodes for DDoS protection
Services: 2+ instances for critical services
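
The validator minimum follows from standard Byzantine fault tolerance arithmetic. The sketch below assumes the network runs a BFT protocol of the IBFT 2.0 or QBFT family, where n validators tolerate floor((n - 1) / 3) faulty nodes and a decision needs ceil(2n / 3) votes. The function names are hypothetical.

```bash
# Byzantine fault tolerance for a validator set of size n (assumes an
# IBFT 2.0 / QBFT style protocol):
#   tolerated faulty validators f = floor((n - 1) / 3)
#   quorum (votes needed)         = ceil(2n / 3)
bft_tolerance() { echo $(( ($1 - 1) / 3 )); }
bft_quorum()    { echo $(( (2 * $1 + 2) / 3 )); }   # integer form of ceil(2n/3)
```

With 4 validators the network tolerates 1 failure and needs 3 votes. Going to 6 validators still tolerates only 1 failure, and 7 are needed to tolerate 2, which is worth knowing before adding validators purely for resilience.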

💾 Backup Strategy

Backup Configuration

# In config/proxmox.conf
BACKUP_ENABLED="1"
BACKUP_RETENTION_DAYS="30"
BACKUP_SCHEDULE="02:00"
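
If the schedule is handed to cron, an HH:MM value has to become a cron expression. How the repository's scripts actually consume BACKUP_SCHEDULE is not shown here, so the mapping below is an assumption and `schedule_to_cron` is a hypothetical helper.

```bash
# Hypothetical helper: turn an "HH:MM" time such as BACKUP_SCHEDULE into a
# daily cron expression ("minute hour * * *").
schedule_to_cron() (
    case "$1" in
        [0-2][0-9]:[0-5][0-9]) ;;
        *) echo "expected HH:MM" >&2; exit 1 ;;
    esac
    hh=${1%%:*}
    mm=${1##*:}
    # Strip one leading zero so cron receives plain decimal numbers.
    hh=${hh#0}
    mm=${mm#0}
    [ "$hh" -le 23 ] || { echo "hour out of range" >&2; exit 1; }
    printf '%s %s * * *\n' "$mm" "$hh"
)
```

With the configuration above, `schedule_to_cron "02:00"` prints `0 2 * * *`, which is a daily run at 02:00.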

Backup Best Practices

Regular Backups: Daily automated backups
Snapshot Before Changes: Create snapshots before upgrades
Off-Site Storage: Store backups on separate storage
Test Restores: Regularly test backup restoration

Backup Scripts

# Manual backup
./scripts/backup/backup-all.sh

# Restore from backup
./scripts/backup/restore-container.sh <VMID> <backup_file>

🔐 Security Best Practices

Network Security

VLAN Isolation:

Validators: VLAN 100 (private)
Sentries: VLAN 101 (semi-private)
RPC Nodes: VLAN 102 (public)
Services: VLAN 103 (internal)
Monitoring: VLAN 104 (management)
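
On Proxmox VE, a container interface is placed in a VLAN with the `tag=` option of its network device. The sketch below maps role names to the tags listed above and prints the matching `pct set` command without running it. The role names, the bridge `vmbr0`, and the interface name `eth0` are assumptions, as are both function names.

```bash
# Hypothetical mapping from role name to the VLAN tags listed above.
vlan_for_role() {
    case "$1" in
        validators) echo 100 ;;
        sentries)   echo 101 ;;
        rpc)        echo 102 ;;
        services)   echo 103 ;;
        monitoring) echo 104 ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) a pct command that tags the container's first interface.
# Setting -net0 rewrites the interface definition, so a real command must also
# carry the existing settings (IP address, gateway, and so on).
# $1 = VMID, $2 = role
print_vlan_cmd() {
    tag=$(vlan_for_role "$2") || return 1
    printf 'pct set %s -net0 name=eth0,bridge=vmbr0,tag=%s\n' "$1" "$tag"
}
```

For example, `print_vlan_cmd 1000 validators` prints `pct set 1000 -net0 name=eth0,bridge=vmbr0,tag=100`.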

Firewall Rules:

Restrict validator RPC access
Limit public RPC access with rate limiting
Isolate management networks

Access Control

API Tokens:

Use API tokens instead of passwords
Rotate tokens regularly
Use least privilege principle

Container Security:

Use unprivileged containers where possible
Keep AppArmor confinement enabled (Proxmox VE applies an AppArmor profile to containers by default)
Keep containers updated

📈 Scaling Recommendations

Horizontal Scaling

Adding More Nodes:

Add node to cluster
Update PROXMOX_NODES configuration
Migrate containers using migration script
Verify connectivity
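
A new node joins an existing cluster by running `pvecm add` on the new node with the address of a current cluster member. The configuration update that follows can be sketched as below, with a hypothetical helper that appends a node name to a PROXMOX_NODES-style list only if it is not already present.

```bash
# Hypothetical helper: append a node to a comma-separated list
# (PROXMOX_NODES format) unless it is already listed.
# $1 = current list, $2 = node name to add
add_node() (
    list=$1
    node=$2
    case ",$list," in
        *",$node,"*) printf '%s\n' "$list" ;;          # already present
        ,,)          printf '%s\n' "$node" ;;          # list was empty
        *)           printf '%s\n' "$list,$node" ;;    # append
    esac
)
```

For example, `add_node "pve,pve2,pve3" pve4` prints `pve,pve2,pve3,pve4`, while adding `pve2` again leaves the list unchanged. Wrapping the list in commas makes the match exact, so `pve` is not mistaken for `pve2`.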

Scaling Services:

# Deploy additional validators
./scripts/deployment/deploy-besu-nodes.sh --validators 6

# Deploy additional RPC nodes
./scripts/deployment/deploy-besu-nodes.sh --rpc 5

Vertical Scaling

Increasing Resources:

# Expand storage
./scripts/manage/expand-storage.sh <VMID> <GB>

# Increase memory (requires container restart)
pct set <VMID> -memory <MB>

# Increase CPU
pct set <VMID> -cores <count>

🔍 Monitoring and Alerting

Prometheus Integration

Metrics Collection:

Container resource usage
Service health metrics
Network metrics
Storage metrics

Alerting:

Configure Alertmanager
Set up notification channels
Define alert rules

Health Checks

Enable Health Checks:

# In config/proxmox.conf
HEALTH_CHECK_ENABLED="true"
HEALTH_CHECK_INTERVAL="300"

Check Service Health:

# Check container status
pct status <VMID>

# Check service status
pct exec <VMID> -- systemctl status <service>

# Check logs
pct exec <VMID> -- journalctl -u <service> -n 50
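
The manual checks above can be combined into a simple sweep over several containers. This is a minimal sketch and not the repository's health-check implementation. It relies on `pct status` printing a line such as `status: running`, and `check_containers` is a hypothetical name.

```bash
# Hypothetical sketch: report containers that are not running.
# Arguments are VMIDs; returns non-zero if any container is not running.
check_containers() {
    failed=0
    for vmid in "$@"; do
        # `pct status <VMID>` prints e.g. "status: running"; keep the 2nd field.
        state=$(pct status "$vmid" 2>/dev/null | awk '{ print $2 }')
        if [ "$state" != "running" ]; then
            echo "container $vmid is ${state:-unknown}"
            failed=1
        fi
    done
    return "$failed"
}
```

Run in a loop with `sleep "$HEALTH_CHECK_INTERVAL"` between sweeps, this gives a basic periodic check. The non-zero exit status makes it easy to hook into an alerting command.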

🚀 Performance Optimization

Storage Optimization

Use Appropriate Storage:

SSD: For validators and RPC nodes (performance)
NVMe: For high-performance requirements
Network Storage: For shared data (Ceph, NFS)

Disk I/O Optimization:

Use separate storage for logs
Enable write-back caching where appropriate
Monitor disk I/O with iostat

Network Optimization

Network Configuration:

Use dedicated network for cluster communication
Enable jumbo frames for inter-node communication
Configure network bonding for redundancy

Connection Pooling:

Configure RPC connection limits
Use connection pooling for services
Monitor network usage

🔧 Maintenance Procedures

Upgrade Procedure

Create Snapshots:
```
./scripts/backup/backup-all.sh
```
Rolling Upgrades:
```
./scripts/upgrade/upgrade-all.sh
```
Verify Services:
```
./scripts/verify/verify-deployment.sh
```
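
The idea behind a rolling upgrade is to touch one container at a time and stop at the first failure, so the remaining containers keep serving. The sketch below shows that control flow only. It is not the implementation of upgrade-all.sh, and `upgrade_one` and `verify_one` are placeholder functions the caller must supply.

```bash
# Hypothetical sketch of a rolling upgrade: act on one container at a time
# and stop at the first failure so the remaining containers stay untouched.
# upgrade_one and verify_one are placeholders to be defined by the caller.
rolling_upgrade() {
    for vmid in "$@"; do
        echo "upgrading $vmid"
        upgrade_one "$vmid" \
            || { echo "upgrade failed on $vmid, stopping" >&2; return 1; }
        verify_one "$vmid" \
            || { echo "verification failed on $vmid, stopping" >&2; return 1; }
    done
    echo "all containers upgraded"
}
```

For validators, ordering matters: upgrading them one by one keeps the number of unavailable validators within what the consensus protocol tolerates.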

Maintenance Window

Best Practices:

Schedule during low-traffic periods
Perform rolling updates
Test on non-production first
Have rollback plan ready

📝 Documentation

Keep Documentation Updated

Update Configuration:
- Document all configuration changes
- Keep network diagrams current
- Maintain inventory list
Change Log:
- Track all deployments
- Document issues and resolutions
- Maintain runbooks

✅ Checklist

Pre-Deployment

Review resource requirements
Configure storage pools
Set up network VLANs
Configure backup storage
Test node connectivity

Deployment

Deploy to staging first
Verify container creation
Check network connectivity
Verify service health
Test failover scenarios

Post-Deployment

Configure monitoring
Set up alerting
Schedule backups
Document configuration
Train operations team

🆘 Troubleshooting

Common Issues

Storage Full:

# Check storage usage
pvesm status

# Expand storage
./scripts/manage/expand-storage.sh <VMID> <GB>

Container Won't Start:

# Check startup logs from the host (pct exec only works on a running container)
pct start <VMID> --debug
journalctl -u pve-container@<VMID> -n 50

# Check resource limits
pct config <VMID>

Network Issues:

# Check network configuration
pct config <VMID> | grep net0

# Test connectivity
pct exec <VMID> -- ping <gateway>

Support Resources

Proxmox VE Documentation
Hyperledger Besu Documentation | GitHub Repository
Project README files
Log files in /var/log/smom-dbis-138/
