
VM Deployment Plan

Date: 2025-01-XX
Status: Ready for Deployment
Version: 2.0


Executive Summary

This document provides a comprehensive deployment plan for all virtual machines in the Sankofa Phoenix infrastructure. The plan includes hardware capabilities, resource allocation, deployment priorities, and step-by-step deployment procedures.

Key Constraints

  • ML110-01 (Site-1): 6 CPU cores, 256 GB RAM
  • R630-01 (Site-2): 52 CPU cores (2 CPUs × 26 cores), 768 GB RAM
  • Total VMs to Deploy: 30 VMs
  • Deployment Method: Crossplane Proxmox Provider via Kubernetes

Hardware Capabilities

Site-1: ML110-01

Location: 192.168.11.10
Hardware Specifications:

  • CPU: Intel Xeon E5-2603 v3 @ 1.60GHz
  • CPU Cores: 6 cores (6 threads, no hyperthreading)
  • RAM: 256 GB (251 GiB usable; ~248 GB available for VMs after the 8 GB host reservation)
  • Storage:
    • local-lvm: 794.3 GB available
    • ceph-fs: 384 GB available
  • Network: vmbr0 (1GbE)

Resource Allocation Strategy:

  • Reserve 1 core for Proxmox host (5 cores available for VMs)
  • Reserve 8 GB RAM for Proxmox host (~248 GB available for VMs)
  • Suitable for: Light-to-medium workloads, infrastructure services

Site-2: R630-01

Location: 192.168.11.11
Hardware Specifications:

  • CPU: Intel Xeon E5-2660 v4 @ 2.00GHz (dual socket)
  • CPU Cores: 52 cores total (2 CPUs × 26 cores each)
  • CPU Threads: 104 threads (52 cores × 2 with hyperthreading)
  • RAM: 768 GB (755 GiB usable; ~752 GB available for VMs after the 16 GB host reservation)
  • Storage:
    • local-lvm: 171.3 GB available
    • Ceph OSD: Configured
  • Network: vmbr0 (10GbE capable)

Resource Allocation Strategy:

  • Reserve 2 cores for Proxmox host (50 cores available for VMs)
  • Reserve 16 GB RAM for Proxmox host (~752 GB available for VMs)
  • Suitable for: High-resource workloads, compute-intensive applications, blockchain nodes
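The headroom figures for both nodes follow from subtracting the host reservation from total capacity. A minimal bash sketch, using the numbers from the hardware sections above:

```bash
# Per-node VM headroom = total capacity minus Proxmox host reservation.
ml110_cores=6;    ml110_reserved_cores=1
ml110_ram_gb=256; ml110_reserved_ram_gb=8
r630_cores=52;    r630_reserved_cores=2
r630_ram_gb=768;  r630_reserved_ram_gb=16

ml110_avail_cores=$(( ml110_cores - ml110_reserved_cores ))   # 5
ml110_avail_ram=$(( ml110_ram_gb - ml110_reserved_ram_gb ))   # 248
r630_avail_cores=$(( r630_cores - r630_reserved_cores ))      # 50
r630_avail_ram=$(( r630_ram_gb - r630_reserved_ram_gb ))      # 752

echo "ML110-01: ${ml110_avail_cores} cores, ${ml110_avail_ram} GB for VMs"
echo "R630-01:  ${r630_avail_cores} cores, ${r630_avail_ram} GB for VMs"
```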

VM Inventory and Resource Requirements

Summary Statistics

| Category | Count | Total CPU | Total RAM | Total Disk |
| --- | --- | --- | --- | --- |
| Phoenix Infrastructure | 8 | 30 cores | 132 GiB | 2,350 GiB |
| Core Infrastructure | 2 | 4 cores | 8 GiB | 30 GiB |
| SMOM-DBIS-138 Blockchain | 16 | 36 cores | 96 GiB | 320 GiB |
| Test/Example VMs | 4 | 16 cores | 32 GiB | 200 GiB |
| TOTAL | 30 | 86 cores | 268 GiB | 2,900 GiB |

Note: These totals exceed available resources on a single node. VMs are distributed across both nodes.


VM Deployment Schedule

Phase 1: Core Infrastructure (Priority: CRITICAL)

Deployment Order: Deploy these first as they support other services.

1.1 Nginx Proxy VM

  • Node: ml110-01
  • Site: site-1
  • Resources: 2 CPU, 4 GiB RAM, 20 GiB disk
  • Purpose: Reverse proxy and SSL termination
  • Dependencies: None
  • Deployment File: examples/production/nginx-proxy-vm.yaml
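The referenced deployment files are Crossplane managed resources. A minimal sketch of what such a manifest might contain is below; the apiVersion, kind, and field names are illustrative assumptions, not the provider's verified schema — consult the installed CRDs (e.g. `kubectl explain proxmoxvm`) for the real spec.

```bash
# Write a hypothetical ProxmoxVM manifest for review before applying.
# All field names below are assumptions, not the provider's verified schema.
cat > /tmp/nginx-proxy-vm.yaml <<'EOF'
apiVersion: proxmox.crossplane.io/v1alpha1   # hypothetical group/version
kind: ProxmoxVM
metadata:
  name: nginx-proxy
spec:
  forProvider:
    node: ml110-01        # target Proxmox node
    cores: 2
    memoryMiB: 4096       # 4 GiB
    diskGiB: 20
  providerConfigRef:
    name: default
EOF
echo "wrote $(wc -l < /tmp/nginx-proxy-vm.yaml) lines"
```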

1.2 Cloudflare Tunnel VM

  • Node: r630-01
  • Site: site-2
  • Resources: 2 CPU, 4 GiB RAM, 10 GiB disk
  • Purpose: Cloudflare Tunnel for secure outbound connectivity
  • Dependencies: None
  • Deployment File: examples/production/cloudflare-tunnel-vm.yaml

Phase 1 Resource Usage:

  • ML110-01: 2 CPU, 4 GiB RAM, 20 GiB disk
  • R630-01: 2 CPU, 4 GiB RAM, 10 GiB disk

Phase 2: Phoenix Infrastructure Services (Priority: HIGH)

Deployment Order: Deploy in dependency order.

2.1 DNS Primary Server

  • Node: ml110-01
  • Site: site-1
  • Resources: 2 CPU, 4 GiB RAM, 50 GiB disk
  • Purpose: Primary DNS server (BIND9)
  • Dependencies: None
  • Deployment File: examples/production/phoenix/dns-primary.yaml

2.2 Git Server

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
  • Purpose: Git repository hosting (Gitea/GitLab)
  • Dependencies: DNS (optional)
  • Deployment File: examples/production/phoenix/git-server.yaml

2.3 Email Server

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • Purpose: Email services (Postfix/Dovecot)
  • Dependencies: DNS (optional)
  • Deployment File: examples/production/phoenix/email-server.yaml

2.4 DevOps Runner

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • Purpose: CI/CD runner (Jenkins/GitLab Runner)
  • Dependencies: Git Server (optional)
  • Deployment File: examples/production/phoenix/devops-runner.yaml

2.5 Codespaces IDE

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 32 GiB RAM, 200 GiB disk (ceph-fs)
  • Purpose: Cloud IDE (code-server)
  • Dependencies: None
  • Deployment File: examples/production/phoenix/codespaces-ide.yaml

2.6 AS4 Gateway

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
  • Purpose: AS4 messaging gateway
  • Dependencies: DNS, Email
  • Deployment File: examples/production/phoenix/as4-gateway.yaml

2.7 Business Integration Gateway

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • Purpose: Business integration services
  • Dependencies: DNS
  • Deployment File: examples/production/phoenix/business-integration-gateway.yaml

2.8 Financial Messaging Gateway

  • Node: r630-01
  • Site: site-2
  • Resources: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
  • Purpose: Financial messaging services
  • Dependencies: DNS
  • Deployment File: examples/production/phoenix/financial-messaging-gateway.yaml

Phase 2 Resource Usage:

  • ML110-01: 2 CPU, 4 GiB RAM, 50 GiB disk
  • R630-01: 28 CPU, 128 GiB RAM, 2,300 GiB disk (using ceph-fs)

Phase 3: SMOM-DBIS-138 Blockchain Infrastructure (Priority: HIGH)

Deployment Order: Deploy validators first, then sentries, then RPC nodes, then services.

3.1 Validators (Site-2: r630-01)

  • smom-validator-01: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-validator-02: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-validator-03: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-validator-04: 3 CPU, 12 GiB RAM, 20 GiB disk (ceph-fs)
  • Total: 12 CPU, 48 GiB RAM, 80 GiB disk (using ceph-fs)
  • Deployment Files: examples/production/smom-dbis-138/validator-*.yaml

3.2 Sentries (Distributed)

  • Site-1 (ml110-01):
    • smom-sentry-01: 2 CPU, 4 GiB RAM, 20 GiB disk
    • smom-sentry-02: 2 CPU, 4 GiB RAM, 20 GiB disk
  • Site-2 (r630-01):
    • smom-sentry-03: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
    • smom-sentry-04: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • Total: 8 CPU, 16 GiB RAM, 80 GiB disk
  • Deployment Files: examples/production/smom-dbis-138/sentry-*.yaml

3.3 RPC Nodes (Site-2: r630-01)

  • smom-rpc-node-01: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-rpc-node-02: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-rpc-node-03: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-rpc-node-04: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • Total: 8 CPU, 16 GiB RAM, 80 GiB disk (using ceph-fs)
  • Deployment Files: examples/production/smom-dbis-138/rpc-node-*.yaml

3.4 Services (Site-2: r630-01)

  • smom-management: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-monitoring: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-services: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • smom-blockscout: 2 CPU, 4 GiB RAM, 20 GiB disk (ceph-fs)
  • Total: 8 CPU, 16 GiB RAM, 80 GiB disk (using ceph-fs)
  • Deployment Files: examples/production/smom-dbis-138/{management,monitoring,services,blockscout}.yaml

Phase 3 Resource Usage:

  • ML110-01: 4 CPU (sentries only), 8 GiB RAM, 40 GiB disk
  • R630-01: 32 CPU, 88 GiB RAM, 280 GiB disk (using ceph-fs)

Phase 4: Test/Example VMs (Priority: LOW)

Deployment Order: Deploy after production VMs are stable.

  • vm-100: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
  • basic-vm: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
  • medium-vm: ml110-01, 4 CPU, 8 GiB RAM, 50 GiB disk
  • large-vm: ml110-01, 8 CPU, 16 GiB RAM, 50 GiB disk

Phase 4 Resource Usage:

  • ML110-01: 16 CPU, 32 GiB RAM, 200 GiB disk

Resource Allocation Analysis

ML110-01 (Site-1) - Resource Constraints

Available Resources:

  • CPU: 5 cores (6 - 1 reserved)
  • RAM: ~248 GB (256 - 8 reserved)
  • Disk: 794.3 GB (local-lvm) + 384 GB (ceph-fs)

Requested Resources (Phases 1-2):

  • CPU: 4 cores ✅ Within capacity (5 available)
  • RAM: 8 GiB ✅ Within capacity
  • Disk: 70 GiB ✅ Within capacity

Requested Resources (Phases 1-3):

  • CPU: 8 cores ⚠️ Exceeds physical capacity (5 available); acceptable only with CPU oversubscription
  • RAM: 16 GiB ✅ Within capacity
  • Disk: 110 GiB ✅ Within capacity

OPTIMIZED: All recommendations have been implemented:

  1. Moved high-CPU VMs to R630-01: Git Server, Email Server, DevOps Runner, Codespaces IDE, AS4 Gateway, Business Integration Gateway, Financial Messaging Gateway
  2. Reduced CPU allocations: DNS Primary reduced to 2 CPU, Sentries reduced to 2 CPU each
  3. Using Ceph storage: Large disk VMs now use ceph-fs storage
  4. Prioritized critical services: Only essential services (Nginx, DNS, Sentries) remain on ML110-01

R630-01 (Site-2) - Resource Capacity

Available Resources:

  • CPU: 50 cores (52 - 2 reserved)
  • RAM: ~752 GB (768 - 16 reserved)
  • Disk: 171.3 GB (local-lvm) + Ceph OSD

Requested Resources (All Phases):

  • CPU: 62 cores ⚠️ Exceeds the 50 physical cores available, but well within the 104 hardware threads with modest oversubscription
  • RAM: 220 GiB ✅ Within capacity
  • Disk: 2,590 GiB ✅ Using Ceph storage (no local-lvm constraint)

OPTIMIZED: All recommendations have been implemented:

  1. Using Ceph storage: All large disk VMs now use ceph-fs storage
  2. Optimized resource allocation: CPU allocations reduced (validators: 3 cores, others: 2-4 cores)
  3. Moved VMs from ML110-01: All high-resource VMs moved to R630-01

Revised Deployment Plan

Optimized Resource Allocation

ML110-01 (Site-1) - Light Workloads Only OPTIMIZED

Phase 1: Core Infrastructure

  • Nginx Proxy VM: 2 CPU, 4 GiB RAM, 20 GiB disk

Phase 2: Phoenix Infrastructure (Reduced)

  • DNS Primary: 2 CPU, 4 GiB RAM, 50 GiB disk

Phase 3: Blockchain (Sentries Only)

  • smom-sentry-01: 2 CPU, 4 GiB RAM, 20 GiB disk
  • smom-sentry-02: 2 CPU, 4 GiB RAM, 20 GiB disk

ML110-01 Total: 8 CPU cores requested, 5 available ⚠️ Exceeds physical capacity, but acceptable for these lightweight services with CPU oversubscription

OPTIMIZED: Only essential services remain on ML110-01.

R630-01 (Site-2) - Primary Compute Node OPTIMIZED

Phase 1: Core Infrastructure

  • Cloudflare Tunnel VM: 2 CPU, 4 GiB RAM, 10 GiB disk

Phase 2: Phoenix Infrastructure (Moved)

  • Git Server: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
  • Email Server: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • DevOps Runner: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • Codespaces IDE: 4 CPU, 32 GiB RAM, 200 GiB disk (ceph-fs)
  • AS4 Gateway: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)
  • Business Integration Gateway: 4 CPU, 16 GiB RAM, 200 GiB disk (ceph-fs)
  • Financial Messaging Gateway: 4 CPU, 16 GiB RAM, 500 GiB disk (ceph-fs)

Phase 3: Blockchain Infrastructure

  • Validators (4x): 3 CPU each = 12 CPU, 12 GiB RAM each = 48 GiB RAM, 80 GiB disk (ceph-fs)
  • Sentries (2x): 2 CPU each = 4 CPU, 4 GiB RAM each = 8 GiB RAM, 40 GiB disk (ceph-fs)
  • RPC Nodes (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (ceph-fs)
  • Services (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (ceph-fs)

R630-01 Total: 62 CPU cores requested, 50 physical cores available ⚠️ Exceeds physical cores, but stays well under the 104 hardware threads

OPTIMIZED: All high-resource VMs moved to R630-01 with optimized CPU allocations and Ceph storage.
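The per-node totals can be re-derived mechanically by summing the per-VM vCPU allocations listed above. A minimal bash sketch, with the figures transcribed from this plan:

```bash
# Per-VM vCPU allocations transcribed from the revised plan above.
ml110_vms=(2 2 2 2)   # nginx-proxy, dns-primary, sentry-01, sentry-02
# cloudflare(2), 7 Phoenix services(4 each), 4 validators(3 each),
# 2 sentries(2 each), 4 RPC nodes(2 each), 4 service VMs(2 each)
r630_vms=(2 4 4 4 4 4 4 4 3 3 3 3 2 2 2 2 2 2 2 2 2 2)

# Sum a list of integers.
sum() { local t=0 v; for v in "$@"; do t=$(( t + v )); done; echo "$t"; }

ml110_total=$(sum "${ml110_vms[@]}")
r630_total=$(sum "${r630_vms[@]}")
echo "ML110-01: ${ml110_total} vCPU requested (5 physical cores available)"
echo "R630-01:  ${r630_total} vCPU requested (50 physical cores available)"
```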


Deployment Execution Plan

Step 1: Pre-Deployment Verification

```bash
# 1. Verify Proxmox nodes are accessible
./scripts/check-proxmox-quota-ssh.sh

# 2. Verify images are available
./scripts/verify-image-availability.sh

# 3. Check Crossplane provider is ready
kubectl get providerconfig -n crossplane-system
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
```
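The three checks above can be wrapped in a single gate that fails fast. A minimal sketch; the check commands here are placeholders — in production they would be the verification scripts and kubectl queries above:

```bash
# Run (name, command) pairs as pre-flight checks; abort on the first failure.
# Note: $cmd is intentionally unquoted so simple multi-word commands split.
run_checks() {
  local name cmd
  while [ "$#" -gt 0 ]; do
    name=$1; cmd=$2; shift 2
    if $cmd; then
      echo "PASS: $name"
    else
      echo "FAIL: $name" >&2
      return 1
    fi
  done
}

# Production use would look like (placeholder, not executed here):
#   run_checks "proxmox quota" ./scripts/check-proxmox-quota-ssh.sh \
#              "images"        ./scripts/verify-image-availability.sh
run_checks "shell works" true "arithmetic works" true && echo "all checks passed"
```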

Step 2: Deploy Phase 1 - Core Infrastructure

```bash
# Deploy Nginx Proxy (ML110-01)
kubectl apply -f examples/production/nginx-proxy-vm.yaml

# Deploy Cloudflare Tunnel (R630-01)
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml

# Monitor deployment
kubectl get proxmoxvm -w
```

Wait for: Both VMs to be in "Running" state before proceeding.
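The "wait for Running" step can be automated with a timeout-bounded polling loop. A sketch; the kubectl condition shown in the comment assumes a `.status.state` field on the CR, which should be verified against the actual CRD:

```bash
# Poll a condition command until it succeeds or a timeout (seconds) expires.
wait_for() {
  local timeout=$1 interval=$2; shift 2
  local waited=0
  until "$@"; do
    waited=$(( waited + interval ))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical production use (status path is an assumption):
#   wait_for 600 10 sh -c \
#     'kubectl get proxmoxvm nginx-proxy -o jsonpath="{.status.state}" | grep -q Running'
wait_for 5 1 true && echo "condition met"
```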

Step 3: Deploy Phase 2 - Phoenix Infrastructure

```bash
# Deploy DNS Primary (ML110-01)
kubectl apply -f examples/production/phoenix/dns-primary.yaml

# Wait for DNS to be ready, then deploy other services
kubectl apply -f examples/production/phoenix/git-server.yaml
kubectl apply -f examples/production/phoenix/email-server.yaml
kubectl apply -f examples/production/phoenix/devops-runner.yaml
kubectl apply -f examples/production/phoenix/codespaces-ide.yaml
kubectl apply -f examples/production/phoenix/as4-gateway.yaml
kubectl apply -f examples/production/phoenix/business-integration-gateway.yaml
kubectl apply -f examples/production/phoenix/financial-messaging-gateway.yaml
```

Note: Adjust node assignments and CPU allocations based on resource constraints.
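The apply sequence above can be driven from a list, verifying each file exists before applying. A sketch; `APPLY` is overridable so the loop can be exercised without a cluster:

```bash
# Apply a list of manifests in order; report missing files and return nonzero
# if anything failed. Override APPLY (e.g. APPLY="echo would-apply") to dry-run.
APPLY="${APPLY:-kubectl apply -f}"
apply_all() {
  local f failed=0
  for f in "$@"; do
    if [ -f "$f" ]; then
      $APPLY "$f" || failed=1
    else
      echo "missing manifest: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example: APPLY="echo would-apply" apply_all examples/production/phoenix/*.yaml
```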

Step 4: Deploy Phase 3 - Blockchain Infrastructure

```bash
# Deploy validators first
kubectl apply -f examples/production/smom-dbis-138/validator-01.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-02.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-03.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-04.yaml

# Deploy sentries
kubectl apply -f examples/production/smom-dbis-138/sentry-01.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-02.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-03.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-04.yaml

# Deploy RPC nodes
kubectl apply -f examples/production/smom-dbis-138/rpc-node-01.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-02.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-03.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-04.yaml

# Deploy services
kubectl apply -f examples/production/smom-dbis-138/management.yaml
kubectl apply -f examples/production/smom-dbis-138/monitoring.yaml
kubectl apply -f examples/production/smom-dbis-138/services.yaml
kubectl apply -f examples/production/smom-dbis-138/blockscout.yaml
```
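Because the blockchain manifests follow a numbered naming scheme, the file lists can be generated rather than typed out. A small sketch:

```bash
# Generate the zero-padded manifest paths for a numbered blockchain role.
manifest_paths() {
  local role=$1 count=$2 i
  for i in $(seq 1 "$count"); do
    printf 'examples/production/smom-dbis-138/%s-%02d.yaml\n' "$role" "$i"
  done
}

manifest_paths validator 4   # validator-01.yaml .. validator-04.yaml
manifest_paths sentry 4
manifest_paths rpc-node 4
```

The generated lists can then be fed to an apply loop while preserving the validator → sentry → RPC → services ordering required above.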

Step 5: Deploy Phase 4 - Test VMs (Optional)

```bash
# Deploy test VMs only if resources allow
kubectl apply -f examples/production/vm-100.yaml
kubectl apply -f examples/production/basic-vm.yaml
kubectl apply -f examples/production/medium-vm.yaml
kubectl apply -f examples/production/large-vm.yaml
```

Monitoring and Verification

Real-Time Monitoring

```bash
# Watch all VM deployments
kubectl get proxmoxvm -A -w

# Check specific VM status
kubectl describe proxmoxvm <vm-name>

# Check controller logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=100 -f
```

Resource Monitoring

```bash
# Check Proxmox node resources
./scripts/check-proxmox-quota-ssh.sh

# Check VM resource usage
kubectl get proxmoxvm -A -o wide
```

Post-Deployment Verification

```bash
# Verify all VMs are running (--no-headers so the header row doesn't match)
kubectl get proxmoxvm -A --no-headers | grep -v Running

# Check VM IP addresses
kubectl get proxmoxvm -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.network.ipAddress}{"\n"}{end}'

# Verify guest agents
./scripts/verify-guest-agent.sh
```
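The Running-state check can also be made programmatic by counting non-Running data rows in the table output. A sketch against sample data; the column layout (STATUS in the third column) is an assumption about `kubectl get proxmoxvm -A` output:

```bash
# Count rows whose third column is not "Running", skipping the header row.
count_not_running() {
  awk 'NR > 1 && $3 != "Running" { n++ } END { print n + 0 }'
}

# Sample shaped like `kubectl get proxmoxvm -A` output (columns are assumptions):
sample='NAMESPACE   NAME          STATUS    AGE
default     nginx-proxy   Running   5m
default     git-server    Pending   1m'

printf '%s\n' "$sample" | count_not_running   # prints 1
```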

Risk Mitigation

Resource Overcommitment

Risk: Requested resources exceed available capacity.

Mitigation:

  1. Deploy VMs in batches, monitoring resource usage
  2. Reduce CPU allocations where possible
  3. Use Ceph storage for large disk requirements
  4. Move high-resource VMs to R630-01
  5. Consider adding additional Proxmox nodes

Deployment Failures

Risk: VM creation may fail due to resource constraints or configuration errors.

Mitigation:

  1. Validate all VM configurations before deployment
  2. Check Proxmox quotas before each deployment
  3. Monitor controller logs for errors
  4. Have rollback procedures ready
  5. Test deployments on non-critical VMs first
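Step 1 of the list above ("validate all VM configurations before deployment") can be partially automated with a static check for required top-level fields. A minimal sketch; the required keys are assumptions about the manifest shape, not a substitute for `kubectl apply --dry-run` validation:

```bash
# Statically check that a manifest declares the fields we expect to see.
# The key list is illustrative; adjust to the provider's actual schema.
validate_manifest() {
  local f=$1 key missing=0
  for key in 'kind:' 'metadata:' 'spec:'; do
    grep -q "^[[:space:]]*${key}" "$f" \
      || { echo "$f: missing ${key%:}" >&2; missing=1; }
  done
  return "$missing"
}

# Example: for f in examples/production/phoenix/*.yaml; do validate_manifest "$f"; done
```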

Network Issues

Risk: Network connectivity problems may prevent VM deployment or operation.

Mitigation:

  1. Verify network bridges exist on all nodes
  2. Test network connectivity before deployment
  3. Configure proper DNS resolution
  4. Verify firewall rules allow required traffic

Deployment Timeline

Estimated Timeline

  • Phase 1 (Core Infrastructure): 30 minutes
  • Phase 2 (Phoenix Infrastructure): 2-4 hours
  • Phase 3 (Blockchain Infrastructure): 3-6 hours
  • Phase 4 (Test VMs): 1 hour (optional)

Total Estimated Time: 6-11 hours (excluding verification and troubleshooting)

Critical Path

  1. Core Infrastructure (Nginx, Cloudflare Tunnel) → 30 min
  2. DNS Primary → 15 min
  3. Git Server, Email Server → 1 hour
  4. DevOps Runner, Codespaces IDE → 1 hour
  5. Blockchain Validators → 2 hours
  6. Blockchain Sentries → 1 hour
  7. Blockchain RPC Nodes → 1 hour
  8. Blockchain Services → 1 hour

Next Steps

  1. Review and Approve: Review this plan and approve resource allocations
  2. Update VM Configurations: Update VM YAML files with optimized resource allocations
  3. Pre-Deployment Checks: Run all pre-deployment verification scripts
  4. Execute Deployment: Follow deployment steps in order
  5. Monitor and Verify: Continuously monitor deployment progress
  6. Post-Deployment: Verify all services are operational


Last Updated: 2025-01-XX
Status: Ready for Review
Maintainer: Infrastructure Team
Version: 2.0