VM Deployment Plan
Date: 2025-01-XX
Status: Ready for Deployment
Version: 2.0
Executive Summary
This document provides a comprehensive deployment plan for all virtual machines in the Sankofa Phoenix infrastructure. The plan includes hardware capabilities, resource allocation, deployment priorities, and step-by-step deployment procedures.
Key Constraints
- ML110-01 (Site-1): 6 CPU cores, 256 GB RAM
- R630-01 (Site-2): 52 CPU cores (2 CPUs × 26 cores), 768 GB RAM
- Total VMs to Deploy: 30 VMs
- Deployment Method: Crossplane Proxmox Provider via Kubernetes
Hardware Capabilities
Site-1: ML110-01
Location: 192.168.11.10
Hardware Specifications:
- CPU: Intel Xeon E5-2603 v3 @ 1.60GHz
- CPU Cores: 6 cores (6 threads, no hyperthreading)
- RAM: 256 GB (251 GiB usable; ~248 GB available for VMs after the host reserve)
- Storage:
- local-lvm: 794.3 GB available
- ceph-fs: 384 GB available
- Network: vmbr0 (1GbE)
Resource Allocation Strategy:
- Reserve 1 core for Proxmox host (5 cores available for VMs)
- Reserve 8 GB RAM for Proxmox host (~248 GB available for VMs)
- Suitable for: Light-to-medium workloads, infrastructure services
Site-2: R630-01
Location: 192.168.11.11
Hardware Specifications:
- CPU: Intel Xeon E5-2660 v4 @ 2.00GHz (dual socket)
- CPU Cores: 52 cores total (2 CPUs × 26 cores each)
- CPU Threads: 104 threads (52 cores × 2 with hyperthreading)
- RAM: 768 GB (755 GiB usable; ~752 GB available for VMs after the host reserve)
- Storage:
- local-lvm: 171.3 GB available
- Ceph OSD: Configured
- Network: vmbr0 (10GbE capable)
Resource Allocation Strategy:
- Reserve 2 cores for Proxmox host (50 cores available for VMs)
- Reserve 16 GB RAM for Proxmox host (~752 GB available for VMs)
- Suitable for: High-resource workloads, compute-intensive applications, blockchain nodes
VM Inventory and Resource Requirements
Summary Statistics
| Category | Count | Total CPU | Total RAM | Total Disk |
|---|---|---|---|---|
| Phoenix Infrastructure | 8 | 30 cores | 132 GiB | 2,350 GiB |
| Core Infrastructure | 2 | 4 cores | 8 GiB | 30 GiB |
| SMOM-DBIS-138 Blockchain | 16 | 36 cores | 96 GiB | 320 GiB |
| Test/Example VMs | 4 | 16 cores | 32 GiB | 200 GiB |
| TOTAL | 30 | 86 cores | 268 GiB | 2,900 GiB |
Note: These totals exceed available resources on a single node. VMs are distributed across both nodes.
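These category subtotals can be re-derived from the per-VM listings that follow; a quick shell arithmetic check (cores, GiB RAM, GiB disk per category):

```shell
# Category subtotals taken from the per-VM listings below:
# Phoenix 30 CPU / 132 GiB / 2,350 GiB; Core 4 / 8 / 30;
# Blockchain 36 / 96 / 320; Test 16 / 32 / 200.
cpu=$((30 + 4 + 36 + 16))
ram=$((132 + 8 + 96 + 32))
disk=$((2350 + 30 + 320 + 200))
echo "${cpu} cores, ${ram} GiB RAM, ${disk} GiB disk"   # → 86 cores, 268 GiB RAM, 2900 GiB disk
```

Re-running this check after any per-VM change keeps the summary table honest.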
VM Deployment Schedule
Phase 1: Core Infrastructure (Priority: CRITICAL)
Deployment Order: Deploy these first as they support other services.
1.1 Nginx Proxy VM
- Node: ml110-01
- Site: site-1
- Resources: 2 CPU, 4 GiB RAM, 20 GiB disk
- Purpose: Reverse proxy and SSL termination
- Dependencies: None
- Deployment File:
examples/production/nginx-proxy-vm.yaml
1.2 Cloudflare Tunnel VM
- Node: r630-01
- Site: site-2
- Resources: 2 CPU, 4 GiB RAM, 10 GiB disk
- Purpose: Cloudflare Tunnel for secure outbound connectivity
- Dependencies: None
- Deployment File:
examples/production/cloudflare-tunnel-vm.yaml
Phase 1 Resource Usage:
- ML110-01: 2 CPU, 4 GiB RAM, 20 GiB disk
- R630-01: 2 CPU, 4 GiB RAM, 10 GiB disk
Phase 2: Phoenix Infrastructure Services (Priority: HIGH)
Deployment Order: Deploy in dependency order.
2.1 DNS Primary Server
- Node: ml110-01
- Site: site-1
- Resources: 2 CPU, 4 GiB RAM, 50 GiB disk
- Purpose: Primary DNS server (BIND9)
- Dependencies: None
- Deployment File:
examples/production/phoenix/dns-primary.yaml
2.2 Git Server
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
- Purpose: Git repository hosting (Gitea/GitLab)
- Dependencies: DNS (optional)
- Deployment File:
examples/production/phoenix/git-server.yaml
2.3 Email Server
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
- Purpose: Email services (Postfix/Dovecot)
- Dependencies: DNS (optional)
- Deployment File:
examples/production/phoenix/email-server.yaml
2.4 DevOps Runner
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
- Purpose: CI/CD runner (Jenkins/GitLab Runner)
- Dependencies: Git Server (optional)
- Deployment File:
examples/production/phoenix/devops-runner.yaml
2.5 Codespaces IDE
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 32 GiB RAM, 200 GiB disk (ceph-fs)
- Purpose: Cloud IDE (code-server)
- Dependencies: None
- Deployment File:
examples/production/phoenix/codespaces-ide.yaml
2.6 AS4 Gateway
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
- Purpose: AS4 messaging gateway
- Dependencies: DNS, Email
- Deployment File:
examples/production/phoenix/as4-gateway.yaml
2.7 Business Integration Gateway
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
- Purpose: Business integration services
- Dependencies: DNS
- Deployment File:
examples/production/phoenix/business-integration-gateway.yaml
2.8 Financial Messaging Gateway
- Node: r630-01
- Site: site-2
- Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
- Purpose: Financial messaging services
- Dependencies: DNS
- Deployment File:
examples/production/phoenix/financial-messaging-gateway.yaml
Phase 2 Resource Usage:
- ML110-01: 2 CPU, 4 GiB RAM, 50 GiB disk
- R630-01: 28 CPU, 128 GiB RAM, 2,300 GiB disk (using ceph-fs)
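As a cross-check, the R630-01 subtotal follows from the seven Phase 2 VMs placed on that node (4 CPU each; one 32 GiB and six 16 GiB RAM allocations; three 500 GiB and four 200 GiB disks):

```shell
# Seven Phase 2 VMs on R630-01 (figures from the entries above).
echo "CPU: $((4 * 7))  RAM: $((16 * 6 + 32)) GiB  Disk: $((500 * 3 + 200 * 4)) GiB"
# → CPU: 28  RAM: 128 GiB  Disk: 2300 GiB
```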
Phase 3: SMOM-DBIS-138 Blockchain Infrastructure (Priority: HIGH)
Deployment Order: Deploy validators first, then sentries, then RPC nodes, then services.
3.1 Validators (Site-2: r630-01)
- smom-validator-01: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
- smom-validator-02: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
- smom-validator-03: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
- smom-validator-04: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
- Total: 12 CPU, 48 GiB RAM, 80 GiB disk (using ceph-fs)
- Deployment Files:
examples/production/smom-dbis-138/validator-*.yaml
3.2 Sentries (Distributed)
- Site-1 (ml110-01):
- smom-sentry-01: 2 CPU, 4 GiB RAM, 20 GiB disk
- smom-sentry-02: 2 CPU, 4 GiB RAM, 20 GiB disk
- Site-2 (r630-01):
- smom-sentry-03: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-sentry-04: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- Total: 8 CPU, 16 GiB RAM, 80 GiB disk
- Deployment Files:
examples/production/smom-dbis-138/sentry-*.yaml
3.3 RPC Nodes (Site-2: r630-01)
- smom-rpc-node-01: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-rpc-node-02: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-rpc-node-03: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-rpc-node-04: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- Total: 8 CPU, 16 GiB RAM, 80 GiB disk (using ceph-fs)
- Deployment Files:
examples/production/smom-dbis-138/rpc-node-*.yaml
3.4 Services (Site-2: r630-01)
- smom-management: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-monitoring: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-services: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- smom-blockscout: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
- Total: 8 CPU, 16 GiB RAM, 80 GiB disk (using ceph-fs)
- Deployment Files:
examples/production/smom-dbis-138/{management,monitoring,services,blockscout}.yaml
Phase 3 Resource Usage:
- ML110-01: 4 CPU (sentries only), 8 GiB RAM, 40 GiB disk
- R630-01: 32 CPU, 88 GiB RAM, 280 GiB disk (using ceph-fs)
Phase 4: Test/Example VMs (Priority: LOW)
Deployment Order: Deploy after production VMs are stable.
- vm-100: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
- basic-vm: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
- medium-vm: ml110-01, 4 CPU, 8 GiB RAM, 50 GiB disk
- large-vm: ml110-01, 8 CPU, 16 GiB RAM, 50 GiB disk
Phase 4 Resource Usage:
- ML110-01: 16 CPU, 32 GiB RAM, 200 GiB disk
Resource Allocation Analysis
ML110-01 (Site-1) - Resource Constraints
Available Resources:
- CPU: 5 cores (6 - 1 reserved)
- RAM: ~248 GB (256 - 8 reserved)
- Disk: 794.3 GB (local-lvm) + 384 GB (ceph-fs)
Requested Resources (Phases 1-2):
- CPU: 4 cores ✅ Within capacity
- RAM: 8 GiB ✅ Within capacity
- Disk: 70 GiB ✅ Within capacity
Requested Resources (Phases 1-3):
- CPU: 8 cores ⚠️ Exceeds capacity (5 available); acceptable overcommit for these light services
- RAM: 16 GiB ✅ Within capacity
- Disk: 110 GiB ✅ Within capacity
✅ OPTIMIZED: All recommendations have been implemented:
- ✅ Moved high-CPU VMs to R630-01: Git Server, Email Server, DevOps Runner, Codespaces IDE, AS4 Gateway, Business Integration Gateway, Financial Messaging Gateway
- ✅ Reduced CPU allocations: DNS Primary reduced to 2 CPU, Sentries reduced to 2 CPU each
- ✅ Using Ceph storage: Large disk VMs now use ceph-fs storage
- ✅ Prioritized critical services: Only essential services (Nginx, DNS, Sentries) remain on ML110-01
R630-01 (Site-2) - Resource Capacity
Available Resources:
- CPU: 50 cores (52 - 2 reserved)
- RAM: ~752 GB (768 - 16 reserved)
- Disk: 171.3 GB (local-lvm) + Ceph OSD
Requested Resources (All Phases):
- CPU: 62 cores ⚠️ Exceeds physical cores (50 available); within the 104 hyperthreads at modest overcommit
- RAM: 220 GiB ✅ Within capacity
- Disk: 2,590 GiB ✅ Using Ceph storage (no local-lvm constraint)
✅ OPTIMIZED: All recommendations have been implemented:
- ✅ Using Ceph storage: All large disk VMs now use ceph-fs storage
- ✅ Optimized resource allocation: CPU allocations reduced (validators: 3 cores, others: 2-4 cores)
- ✅ Moved VMs from ML110-01: All high-resource VMs moved to R630-01
Revised Deployment Plan
Optimized Resource Allocation
ML110-01 (Site-1) - Light Workloads Only ✅ OPTIMIZED
Phase 1: Core Infrastructure
- Nginx Proxy VM: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
Phase 2: Phoenix Infrastructure (Reduced)
- DNS Primary: 2 CPU, 4 GiB RAM, 50 GiB disk ✅
Phase 3: Blockchain (Sentries Only)
- smom-sentry-01: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
- smom-sentry-02: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
ML110-01 Total: 8 CPU cores requested, 5 available ⚠️ Exceeds physical cores, but acceptable overcommit for these light, critical services
✅ OPTIMIZED: Only essential services remain on ML110-01.
R630-01 (Site-2) - Primary Compute Node ✅ OPTIMIZED
Phase 1: Core Infrastructure
- Cloudflare Tunnel VM: 2 CPU, 4 GiB RAM, 10 GiB disk ✅
Phase 2: Phoenix Infrastructure (Moved)
- Git Server: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs) ✅
- Email Server: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs) ✅
- DevOps Runner: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs) ✅
- Codespaces IDE: 4 CPU, 32 GiB RAM, 200 GiB disk (ceph-fs) ✅
- AS4 Gateway: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs) ✅
- Business Integration Gateway: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs) ✅
- Financial Messaging Gateway: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs) ✅
Phase 3: Blockchain Infrastructure
- Validators (4x): 3 CPU each = 12 CPU, 12 GiB RAM each = 48 GiB RAM, 80 GiB disk (ceph-fs) ✅
- Sentries (2x): 2 CPU each = 4 CPU, 4 GiB RAM each = 8 GiB RAM, 40 GiB disk (ceph-fs) ✅
- RPC Nodes (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (ceph-fs) ✅
- Services (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (ceph-fs) ✅
R630-01 Total: 62 CPU cores requested, 50 available ⚠️ Exceeds physical cores, but within the 104 hyperthreads at modest overcommit
✅ OPTIMIZED: All high-resource VMs moved to R630-01 with optimized CPU allocations and Ceph storage.
Deployment Execution Plan
Step 1: Pre-Deployment Verification
# 1. Verify Proxmox nodes are accessible
./scripts/check-proxmox-quota-ssh.sh
# 2. Verify images are available
./scripts/verify-image-availability.sh
# 3. Check Crossplane provider is ready
kubectl get providerconfig -n crossplane-system
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
Step 2: Deploy Phase 1 - Core Infrastructure
# Deploy Nginx Proxy (ML110-01)
kubectl apply -f examples/production/nginx-proxy-vm.yaml
# Deploy Cloudflare Tunnel (R630-01)
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml
# Monitor deployment
kubectl get proxmoxvm -w
Wait for: Both VMs to be in "Running" state before proceeding.
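A small polling loop can automate the wait. The helper below is a sketch; the `.status.phase` jsonpath is an assumption about the provider's status schema, so verify the actual field with `kubectl get proxmoxvm -o yaml` first.

```shell
# Succeeds only when every argument equals "Running".
all_running() {
  [ "$#" -gt 0 ] || return 1        # no statuses reported yet means not ready
  for s in "$@"; do
    [ "$s" = "Running" ] || return 1
  done
}
# Example polling loop (status field path is an assumption):
# until all_running $(kubectl get proxmoxvm -o jsonpath='{.items[*].status.phase}'); do
#   sleep 10
# done
```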
Step 3: Deploy Phase 2 - Phoenix Infrastructure
# Deploy DNS Primary (ML110-01)
kubectl apply -f examples/production/phoenix/dns-primary.yaml
# Wait for DNS to be ready, then deploy other services
kubectl apply -f examples/production/phoenix/git-server.yaml
kubectl apply -f examples/production/phoenix/email-server.yaml
kubectl apply -f examples/production/phoenix/devops-runner.yaml
kubectl apply -f examples/production/phoenix/codespaces-ide.yaml
kubectl apply -f examples/production/phoenix/as4-gateway.yaml
kubectl apply -f examples/production/phoenix/business-integration-gateway.yaml
kubectl apply -f examples/production/phoenix/financial-messaging-gateway.yaml
Note: Adjust node assignments and CPU allocations based on resource constraints.
Step 4: Deploy Phase 3 - Blockchain Infrastructure
# Deploy validators first
kubectl apply -f examples/production/smom-dbis-138/validator-01.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-02.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-03.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-04.yaml
# Deploy sentries
kubectl apply -f examples/production/smom-dbis-138/sentry-01.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-02.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-03.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-04.yaml
# Deploy RPC nodes
kubectl apply -f examples/production/smom-dbis-138/rpc-node-01.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-02.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-03.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-04.yaml
# Deploy services
kubectl apply -f examples/production/smom-dbis-138/management.yaml
kubectl apply -f examples/production/smom-dbis-138/monitoring.yaml
kubectl apply -f examples/production/smom-dbis-138/services.yaml
kubectl apply -f examples/production/smom-dbis-138/blockscout.yaml
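The sixteen `kubectl apply` commands above can also be generated programmatically. The helper below emits the manifest paths in the stated order (validators, then sentries, then RPC nodes, then services) so they can be reviewed or piped to `kubectl apply`:

```shell
# Print the 16 SMOM-DBIS-138 manifest paths in deployment order.
list_phase3_manifests() {
  local base=examples/production/smom-dbis-138
  local group i svc
  for group in validator sentry rpc-node; do
    for i in 01 02 03 04; do
      printf '%s/%s-%s.yaml\n' "$base" "$group" "$i"
    done
  done
  for svc in management monitoring services blockscout; do
    printf '%s/%s.yaml\n' "$base" "$svc"
  done
}
# Apply in order, e.g.:
# list_phase3_manifests | xargs -n1 kubectl apply -f
```

Note that piping everything in one go skips the "wait for validators before sentries" gate; insert a readiness check between groups if strict ordering matters.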
Step 5: Deploy Phase 4 - Test VMs (Optional)
# Deploy test VMs only if resources allow
kubectl apply -f examples/production/vm-100.yaml
kubectl apply -f examples/production/basic-vm.yaml
kubectl apply -f examples/production/medium-vm.yaml
kubectl apply -f examples/production/large-vm.yaml
Monitoring and Verification
Real-Time Monitoring
# Watch all VM deployments
kubectl get proxmoxvm -A -w
# Check specific VM status
kubectl describe proxmoxvm <vm-name>
# Check controller logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=100 -f
Resource Monitoring
# Check Proxmox node resources
./scripts/check-proxmox-quota-ssh.sh
# Check VM resource usage
kubectl get proxmoxvm -A -o wide
Post-Deployment Verification
# List any VMs that are not yet Running (no output means all are up)
kubectl get proxmoxvm -A --no-headers | grep -v Running
# Check VM IP addresses
kubectl get proxmoxvm -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.network.ipAddress}{"\n"}{end}'
# Verify guest agents
./scripts/verify-guest-agent.sh
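The jsonpath query above can be extended to flag VMs that have not yet reported an address. `missing_ip` below filters the tab-separated name/IP pairs produced by that query down to names with an empty IP column:

```shell
# Print the first column of any tab-separated line whose second column is empty.
missing_ip() {
  awk -F'\t' '$2 == "" { print $1 }'
}
# Example with canned data (vm-b has no IP yet):
printf 'vm-a\t192.168.11.50\nvm-b\t\n' | missing_ip   # → vm-b
```

Pipe the live jsonpath output into `missing_ip` to list VMs still waiting on DHCP or cloud-init.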
Risk Mitigation
Resource Overcommitment
Risk: Requested resources exceed available capacity.
Mitigation:
- Deploy VMs in batches, monitoring resource usage
- Reduce CPU allocations where possible
- Use Ceph storage for large disk requirements
- Move high-resource VMs to R630-01
- Consider adding additional Proxmox nodes
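Overcommitment is easier to reason about as a ratio. A minimal sketch (the requested-vCPU figure is illustrative and should be replaced with live totals):

```shell
# vCPU overcommit as a percentage of physical cores (integer math).
overcommit_pct() {   # usage: overcommit_pct <requested_vcpus> <physical_cores>
  echo $(( $1 * 100 / $2 ))
}
overcommit_pct 62 50   # e.g. R630-01 request vs. cores after host reserve → 124
```

As a rule of thumb, modest overcommit is fine for bursty workloads, while latency-sensitive VMs such as validators benefit from staying near 1:1.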
Deployment Failures
Risk: VM creation may fail due to resource constraints or configuration errors.
Mitigation:
- Validate all VM configurations before deployment
- Check Proxmox quotas before each deployment
- Monitor controller logs for errors
- Have rollback procedures ready
- Test deployments on non-critical VMs first
Network Issues
Risk: Network connectivity problems may prevent VM deployment or operation.
Mitigation:
- Verify network bridges exist on all nodes
- Test network connectivity before deployment
- Configure proper DNS resolution
- Verify firewall rules allow required traffic
Deployment Timeline
Estimated Timeline
- Phase 1 (Core Infrastructure): 30 minutes
- Phase 2 (Phoenix Infrastructure): 2-4 hours
- Phase 3 (Blockchain Infrastructure): 3-6 hours
- Phase 4 (Test VMs): 1 hour (optional)
Total Estimated Time: 6-11 hours (excluding verification and troubleshooting)
Critical Path
- Core Infrastructure (Nginx, Cloudflare Tunnel) → 30 min
- DNS Primary → 15 min
- Git Server, Email Server → 1 hour
- DevOps Runner, Codespaces IDE → 1 hour
- Blockchain Validators → 2 hours
- Blockchain Sentries → 1 hour
- Blockchain RPC Nodes → 1 hour
- Blockchain Services → 1 hour
Next Steps
- Review and Approve: Review this plan and approve resource allocations
- Update VM Configurations: Update VM YAML files with optimized resource allocations
- Pre-Deployment Checks: Run all pre-deployment verification scripts
- Execute Deployment: Follow deployment steps in order
- Monitor and Verify: Continuously monitor deployment progress
- Post-Deployment: Verify all services are operational
Related Documentation
- VM Deployment Checklist - Step-by-step checklist
- VM Creation Procedure - Detailed creation procedures
- VM Specifications - Complete VM specifications
- Deployment Requirements - Overall deployment requirements
Last Updated: 2025-01-XX
Status: Ready for Review
Maintainer: Infrastructure Team
Version: 2.0