Update documentation structure and enhance .gitignore
- Added generated index files and report directories to .gitignore to prevent tracking of transient files.
- Updated README links to reflect the new documentation paths.
- Reorganized the documentation so that all links point to the correct locations.
221
docs/deployment/DEPLOYMENT_NEXT_STEPS.md
Normal file
@@ -0,0 +1,221 @@
# Deployment Next Steps

**Date**: 2025-12-09
**Status**: ⚠️ **LOCK ISSUE - MANUAL RESOLUTION REQUIRED**

---

## Current Situation

### ✅ Completed
1. **Provider Configuration**: ✅ Verified and working
2. **VM Resource Created**: ✅ basic-vm-001 (VMID 100)
3. **Deployment Initiated**: ✅ VM created in Proxmox

### ⚠️ Blocking Issue
**VM Lock Timeout**: The configuration update is blocked by a Proxmox lock file.

**Error**: `can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout`

---

## Immediate Action Required

### Step 1: Resolve Lock on Proxmox Node

**Access the Proxmox node and clear the lock:**

```bash
# Connect to the Proxmox node (replace with the actual IP/hostname)
ssh root@<proxmox-node-ip>

# Check VM status
qm status 100

# Unlock the VM (clears the lock recorded in the VM configuration)
qm unlock 100

# If unlocking doesn't help, remove the lock file, but only after confirming
# that no Proxmox task for VMID 100 is still running (Tasks panel in the web UI)
rm -f /var/lock/qemu-server/lock-100.conf

# Verify the lock file is gone (expected: "No such file or directory")
ls -la /var/lock/qemu-server/lock-100.conf
```
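
You can check whether a configuration-level lock is actually set, before and after `qm unlock`, by reading the `lock:` line of `qm config` output. Typical values are `backup`, `migrate` and `snapshot`. The helper below is a hypothetical sketch that reads `qm config` output on stdin and prints the lock value, if any. Suppose it prints nothing but the timeout still occurs. That suggests another running task holds the lock file, rather than a stale lock left in the configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `qm config <vmid>` output on stdin and prints the
# value of the "lock:" property. Prints nothing and returns non-zero when unset.
vm_config_lock() {
  awk -F': ' '$1 == "lock" { print $2; found = 1 } END { exit !found }'
}
```

Usage on the node: `qm config 100 | vm_config_lock || echo "no config lock set on VMID 100"`.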

**Note**: If you don't have direct SSH access, you can instead:
- Use the Proxmox web UI (the node's **Shell** console)
- Access the node through its local or out-of-band (e.g. IPMI) console
- Use any other method that gives you a root shell on the node

### Step 2: Verify Image Availability

**While on the Proxmox node, verify that the image exists:**

```bash
# Check for the image (/var/lib/vz/template/iso belongs to the "local" directory
# storage; local-lvm holds only VM disks)
find /var/lib/vz/template/iso -name "ubuntu-22.04-cloud.img"
pvesm list local | grep ubuntu-22.04-cloud

# If it is missing, download it
cd /var/lib/vz/template/iso
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
mv jammy-server-cloudimg-amd64.img ubuntu-22.04-cloud.img
```
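
A download can come back as an HTML error page or a truncated transfer rather than a disk image. Ubuntu's `*-cloudimg-*.img` files are QCOW2 images. Assuming that format, a minimal sanity check tests for the 4-byte QCOW2 magic (`QFI\xfb`) at the start of the file. `check_cloud_image` is an illustrative name and is not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a downloaded cloud image.
# Assumes a QCOW2 image, whose files start with the bytes 51 46 49 fb ("QFI\xfb").
check_cloud_image() {
  local path=$1 magic
  if [[ ! -s $path ]]; then
    echo "missing or empty: $path" >&2
    return 1
  fi
  # Read the first 4 bytes as hex, e.g. "514649fb"
  magic=$(head -c 4 "$path" | od -An -tx1 | tr -d ' \n')
  if [[ $magic != "514649fb" ]]; then
    echo "not a QCOW2 image (magic: ${magic:-none}): $path" >&2
    return 1
  fi
  echo "ok: $path"
}
```

Usage: `check_cloud_image /var/lib/vz/template/iso/ubuntu-22.04-cloud.img`.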

### Step 3: Monitor Automatic Retry

**After clearing the lock, the provider will automatically retry:**

```bash
# Watch VM status
kubectl get proxmoxvm basic-vm-001 -w

# Watch provider logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f
```
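
For scripted monitoring, the simplest option is `kubectl wait --for=condition=Ready proxmoxvm/basic-vm-001 --timeout=10m`. This assumes the resource reports the standard Crossplane `Ready` condition. If you want a custom check or an explicit timeout message, a small polling helper also works. `wait_until` and `proxmoxvm_ready` are hypothetical names, and the jsonpath filter assumes Crossplane-style `status.conditions`.

```bash
#!/usr/bin/env bash
# Poll a command until it succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds when the named ProxmoxVM's Ready condition is "True"
# (assumes Crossplane-style status.conditions on the resource).
proxmoxvm_ready() {
  local status
  status=$(kubectl get proxmoxvm "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  [[ $status == "True" ]]
}
```

Usage: `wait_until 300 10 proxmoxvm_ready basic-vm-001` polls every 10 seconds for up to 5 minutes.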

**Expected Timeline**: 1-5 minutes after the lock is cleared

---

## After Lock Resolution

### Expected Sequence

1. **Provider retries** the configuration update (automatic)
2. **VM configuration** completes successfully
3. **Image import** (if needed) completes
4. **Boot order** is set correctly
5. **Cloud-init** is configured
6. **VM boots** successfully
7. **VM reaches the "running" state**
8. **IP address is assigned**
9. **Ready condition becomes "True"**

### Verification Steps

Once the VM is running:

```bash
# Get the VM IP
IP=$(kubectl get proxmoxvm basic-vm-001 -o jsonpath='{.status.networkInterfaces[0].ipAddress}')

# Test SSH access
ssh admin@"$IP" "hostname && uptime"

# Check the cloud-init output log
ssh admin@"$IP" "tail -n 50 /var/log/cloud-init-output.log"

# Verify services
ssh admin@"$IP" "systemctl status qemu-guest-agent chrony unattended-upgrades"
```
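
If the provider has not reported an address yet, `$IP` is empty and the `ssh` commands fail with a confusing error. A small guard, sketched here with a hypothetical `is_ipv4` helper, stops early instead. After that, `cloud-init status --wait` (a standard cloud-init subcommand) blocks until cloud-init finishes, so you read the log only after the run completes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 is a dotted-quad IPv4 address with octets 0-255.
is_ipv4() {
  local ip=$1 octet
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so octets such as "08" are not read as octal
    (( 10#$octet <= 255 )) || return 1
  done
}
```

Typical use, before the first `ssh`: `is_ipv4 "$IP" || { echo "no IP reported yet for basic-vm-001" >&2; exit 1; }`, followed by `ssh admin@"$IP" "cloud-init status --wait"`.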

---

## If Lock Resolution Fails

### Alternative: Delete and Redeploy

If the lock cannot be cleared:

```bash
# 1. Delete the Kubernetes resource
kubectl delete proxmoxvm basic-vm-001

# 2. On the Proxmox node, force-delete the VM (ignoring the lock)
ssh root@<proxmox-node-ip> "qm destroy 100 --purge --skiplock"

# 3. Clean up the lock file
ssh root@<proxmox-node-ip> "rm -f /var/lock/qemu-server/lock-100.conf"

# 4. Wait for cleanup
sleep 10

# 5. Redeploy
kubectl apply -f examples/production/basic-vm.yaml
```
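
The fixed `sleep 10` in step 4 is a guess. A more reliable approach is to confirm that VMID 100 is actually gone before redeploying. This sketch assumes the usual `qm list` layout: a header row, then one VM per row with the VMID in the first column. `vmid_in_list` is a hypothetical helper that reads `qm list` output on stdin. On the Kubernetes side, `kubectl wait --for=delete proxmoxvm/basic-vm-001 --timeout=2m` serves the same purpose.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `qm list` output on stdin and succeeds if the given
# VMID appears in the first (VMID) column. The header row is skipped.
vmid_in_list() {
  awk -v id="$1" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}
```

Usage: `while ssh root@<proxmox-node-ip> qm list | vmid_in_list 100; do sleep 5; done` loops until VMID 100 no longer appears.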

---

## Long-term Solutions

### 1. Code Enhancement

**Add lock handling to the provider code:**

- Detect lock-timeout errors in `UpdateVM`
- Automatically clear the lock (the API equivalent of `qm unlock`) before retrying
- Increase the timeout for lock operations
- Add exponential backoff for lock retries

**File**: `crossplane-provider-proxmox/pkg/proxmox/client.go`
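
The retry behaviour described here belongs in the provider's Go code. For manual or scripted runs, the hypothetical `retry_on_lock` helper below is a shell-level approximation. It retries a command only when the error output contains the Proxmox lock-timeout message quoted in the Blocking Issue section, and it doubles the delay after each attempt. The attempt count and base delay are illustrative.

```bash
#!/usr/bin/env bash
# Retry a command with exponential backoff, but only when it fails with a Proxmox
# lock timeout ("can't lock file ... got timeout"). Other errors return at once.
# Usage: retry_on_lock MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
retry_on_lock() {
  local max=$1 delay=$2
  shift 2
  local attempt=1 rc err
  err=$(mktemp) || return 1
  while true; do
    # Capture stderr so the error text can be inspected; stdout passes through.
    "$@" 2>"$err" && rc=0 || rc=$?
    if (( rc == 0 )); then
      rm -f "$err"
      return 0
    fi
    if ! grep -q "can't lock file" "$err" || (( attempt >= max )); then
      cat "$err" >&2
      rm -f "$err"
      return "$rc"
    fi
    echo "lock timeout (attempt ${attempt}/${max}); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_on_lock 5 2 qm set 100 --boot order=scsi0` would try up to five times, pausing 2, 4, 8 and 16 seconds between attempts.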

### 2. Pre-deployment Checks

**Add validation before VM creation:**

- Check for existing locks on the target node
- Verify that no conflicting operations are running
- Ensure the Proxmox node is healthy

### 3. Deployment Strategy

**For full deployment:**

- Deploy VMs sequentially (not in parallel)
- Add delays between deployments (30-60 seconds)
- Monitor each deployment before proceeding
- Implement retry logic with lock handling

---

## Full Deployment Plan (After Test Success)

### Phase 1: Infrastructure (2 VMs)
- VM 1: nginx-proxy-vm.yaml
- VM 2: cloudflare-tunnel-vm.yaml

### Phase 2: SMOM-DBIS-138 Core (8 VMs)
- VMs 3-6: validator-01 through validator-04
- VMs 7-10: sentry-01 through sentry-04

### Phase 3: SMOM-DBIS-138 Services (8 VMs)
- VMs 11-14: rpc-node-01 through rpc-node-04
- VM 15: services.yaml
- VM 16: blockscout.yaml
- VM 17: monitoring.yaml
- VM 18: management.yaml

### Phase 4: Phoenix VMs (8 VMs)
- VMs 19-26: all Phoenix VMs

### Phase 5: Template VMs (2 VMs, optional)
- VM 27: medium-vm.yaml
- VM 28: large-vm.yaml

**Total**: 28 additional VMs after the test VM (29 including basic-vm-001)
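
The deployment strategy in section 3 calls for a sequential rollout, 30-60 second pauses, and a check after each VM. It can be scripted roughly as below. This is a sketch only:
- `deploy_sequentially`, `DRY_RUN` and `DELAY` are illustrative names.
- It assumes each manifest's ProxmoxVM reports a Crossplane `Ready` condition.
- You still need to fill in the file list from the real `examples/production/` tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply manifests one at a time, wait until the resources in
# each file report Ready, then pause DELAY seconds (default 45) before the next.
# Set DRY_RUN=1 to print the commands instead of running them.
deploy_sequentially() {
  local delay=${DELAY:-45} file
  for file in "$@"; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "kubectl apply -f $file"
      echo "kubectl wait --for=condition=Ready -f $file --timeout=15m"
    else
      kubectl apply -f "$file" || return 1
      kubectl wait --for=condition=Ready -f "$file" --timeout=15m || return 1
    fi
    sleep "$delay"
  done
}
```

Usage: `deploy_sequentially examples/production/nginx-proxy-vm.yaml examples/production/cloudflare-tunnel-vm.yaml`. Run it with `DRY_RUN=1` first to review the order.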

---

## Summary

### Current Status
- ✅ Provider: Working
- ✅ VM Created: Yes (VMID 100)
- ⚠️ Configuration: Blocked by lock
- ⚠️ State: Stopped

### Required Action
**Manual lock resolution on Proxmox node**

### After Resolution
- Provider will automatically retry
- VM should complete configuration
- VM should boot successfully
- Full deployment can proceed

---

**Last Updated**: 2025-12-09
**Status**: ⚠️ **WAITING FOR MANUAL LOCK RESOLUTION**
211
docs/deployment/DEPLOYMENT_READY.md
Normal file
@@ -0,0 +1,211 @@
# Deployment Ready - Final Status

**Date**: 2025-12-09
**Status**: ✅ **READY FOR DEPLOYMENT**

---

## Final Pre-Deployment Review Complete

All systems have been reviewed and verified. The deployment is ready to proceed once the pre-deployment actions below are complete.

---

## ✅ Verification Results

### VM Configuration (29/29) ✅
- ✅ **Total VM Files**: 29
- ✅ **YAML Syntax Valid**: 29/29 (100%)
- ✅ **Image Specified**: 29/29 (100%)
- ✅ **Node Specified**: 29/29 (100%)
- ✅ **Storage Specified**: 29/29 (100%)
- ✅ **Network Specified**: 29/29 (100%)
- ✅ **Provider Config**: 29/29 (100%)

### Cloud-Init Enhancements (29/29) ✅
- ✅ **NTP Configuration**: 29/29 (100%)
- ✅ **SSH Hardening**: 29/29 (100%)
- ✅ **Enhanced Final Message**: 29/29 (100%)
- ✅ **Security Updates**: 29/29 (100%)
- ✅ **Guest Agent**: 29/29 (100%)

### Deployment Code ✅
- ✅ **Image Import**: Pre-flight checks, VM stop, verification
- ✅ **Boot Order**: Explicitly set to `scsi0`
- ✅ **Cloud-init Retry**: Up to 3 attempts
- ✅ **Guest Agent**: Always enabled (`agent: "1"`)
- ✅ **Disk Purge**: `purge=1` on delete

### Resource Summary
- **Total CPUs**: 148 cores
- **Total Memory**: 312 GiB
- **Total Disk**: 2,968 GiB (~3 TiB)
- **Unique Nodes**: 2 (ml110-01, r630-01)
- **Image**: ubuntu-22.04-cloud (all VMs)
- **Network**: vmbr0 (all VMs)
- **Storage**: local-lvm (all VMs)

---

## ⚠️ Pre-Deployment Actions Required

### 1. Image Availability ⏳
**Verify that the `ubuntu-22.04-cloud` image exists on all Proxmox nodes:**

```bash
# On ml110-01:
find /var/lib/vz/template/iso -name "ubuntu-22.04-cloud.img"
pvesm list local | grep ubuntu-22.04-cloud

# On r630-01 (the image file lives on the "local" directory storage, not local-lvm):
find /var/lib/vz/template/iso -name "ubuntu-22.04-cloud.img"
pvesm list local | grep ubuntu-22.04-cloud
```

**If the image is missing, download it:**
```bash
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
mv jammy-server-cloudimg-amd64.img /var/lib/vz/template/iso/ubuntu-22.04-cloud.img
```
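
Before renaming the download, you can check its integrity against the `SHA256SUMS` file that Ubuntu publishes alongside the images (`https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS`). The helper below is a sketch, and `verify_sha256` is a hypothetical name. It accepts both the `<hash> *<name>` and `<hash>  <name>` line formats that `sha256sum --check` understands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check FILE (in the current directory) against its entry in a
# SHA256SUMS-style list. Usage: verify_sha256 SUMS_FILE FILE
verify_sha256() {
  local sums=$1 name=$2 line
  # Match "<hash>  name" or "<hash> *name" (binary-mode marker)
  line=$(awk -v f="$name" '$2 == f || $2 == ("*" f)' "$sums")
  if [[ -z $line ]]; then
    echo "no checksum entry for $name in $sums" >&2
    return 1
  fi
  printf '%s\n' "$line" | sha256sum --check --status -
}
```

Usage: in the directory where the image was downloaded, run `wget https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS`. Then call `verify_sha256 SHA256SUMS jammy-server-cloudimg-amd64.img` before the `mv`.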

### 2. Provider Configuration ⏳
**Verify provider configuration in Kubernetes:**

```bash
# Check provider config exists:
kubectl get providerconfig proxmox-provider-config -n crossplane-system

# Check provider secret:
kubectl get secret -n crossplane-system | grep proxmox

# Verify provider pod is running:
kubectl get pods -n crossplane-system | grep crossplane-provider-proxmox
```
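
`kubectl get pods | grep` only shows that a pod exists. To confirm it is healthy, check `kubectl get pods --no-headers` output for a `Running` status with all containers ready. `pods_all_ready` is a hypothetical helper that reads that output on stdin. A built-in alternative is `kubectl wait --for=condition=Ready pod -l app=crossplane-provider-proxmox -n crossplane-system --timeout=2m`. This assumes the pods carry the `app=crossplane-provider-proxmox` label used by the log commands in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output
# (NAME READY STATUS RESTARTS AGE) on stdin. Succeeds only if at least one pod is
# listed and every pod is Running with READY n/n.
pods_all_ready() {
  awk '
    { n++; split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad++ }
    END { exit (n == 0 || bad > 0) }
  '
}
```

Usage: `kubectl get pods -n crossplane-system --no-headers | grep crossplane-provider-proxmox | pods_all_ready && echo "provider healthy"`.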

### 3. Resource Availability ⏳
**Verify sufficient resources on Proxmox nodes:**

```bash
# Check ml110-01 resources:
pvesh get /nodes/ml110-01/status

# Check r630-01 resources:
pvesh get /nodes/r630-01/status

# Check storage capacity (total, used, available), on each node:
pvesm status --storage local-lvm
```

**Required Resources:**
- **CPU**: 148 cores total
- **Memory**: 312 GiB total
- **Disk**: 2,968 GiB (~3 TiB) total
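
You can script a rough capacity comparison from the node status API. The sketch assumes that `pvesh get /nodes/<node>/status --output-format json` returns `cpuinfo.cpus` (logical CPUs) and `memory.total` (bytes). Confirm those field names against your actual output. `capacity_report` is a hypothetical helper. vCPUs are often overcommitted on Proxmox, while memory overcommit is riskier, so treat the CPU comparison as a guideline rather than a hard limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare summed node capacity with required totals.
# Usage: capacity_report NEED_CPUS NEED_MEM_GIB CPUS1 MEM_BYTES1 [CPUS2 MEM_BYTES2 ...]
capacity_report() {
  local need_cpu=$1 need_mem_gib=$2
  shift 2
  local cpu=0 mem=0
  while (( $# >= 2 )); do
    cpu=$((cpu + $1))
    mem=$((mem + $2))
    shift 2
  done
  # Integer division rounds down, which errs on the conservative side
  local mem_gib=$((mem / 1024**3))
  echo "cpus: have ${cpu}, need ${need_cpu}; memory: have ${mem_gib} GiB, need ${need_mem_gib} GiB"
  (( cpu >= need_cpu && mem_gib >= need_mem_gib ))
}
```

Usage: `capacity_report 148 312 <ml110 cpus> <ml110 memory bytes> <r630 cpus> <r630 memory bytes>`. Each node's values can be read with, for example, `pvesh get /nodes/ml110-01/status --output-format json | jq '.cpuinfo.cpus, .memory.total'`, which assumes `jq` is installed.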

### 4. Network Configuration ⏳
**Verify `vmbr0` exists on all Proxmox nodes:**

```bash
# On each node:
ip link show vmbr0
# Should show: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP>
```
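
To run this check against both nodes with a pass/fail result, test the flag list inside the angle brackets for an exact `UP` entry, so that `LOWER_UP` alone does not count. `bridge_is_up` is a hypothetical helper. The loop in the usage note assumes the node names resolve over SSH.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ip link show <dev>` output on stdin and succeeds if the
# flag list in <...> contains an exact UP entry.
bridge_is_up() {
  awk 'match($0, /<[^>]*>/) {
         n = split(substr($0, RSTART + 1, RLENGTH - 2), flags, ",")
         for (i = 1; i <= n; i++) if (flags[i] == "UP") found = 1
       }
       END { exit !found }'
}
```

Usage: `for node in ml110-01 r630-01; do ssh root@"$node" ip link show vmbr0 | bridge_is_up && echo "$node: vmbr0 up" || echo "$node: vmbr0 missing or down"; done`.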

---

## 🚀 Deployment Process

### Step 1: Test Deployment
```bash
# Deploy test VM:
kubectl apply -f examples/production/basic-vm.yaml

# Monitor deployment:
kubectl get proxmoxvm basic-vm-001 -w

# Check logs:
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50

# Verify in Proxmox (on the node):
qm status 100  # (or the appropriate VMID)
```

### Step 2: Verify Test VM
```bash
# Get VM IP (on the Proxmox node, via the guest agent):
qm guest exec <vmid> -- ip addr show

# Check cloud-init logs:
ssh admin@<vm-ip> "tail -n 50 /var/log/cloud-init-output.log"

# Verify services:
ssh admin@<vm-ip> "systemctl status qemu-guest-agent chrony unattended-upgrades"
```

### Step 3: Deploy Infrastructure VMs
```bash
kubectl apply -f examples/production/nginx-proxy-vm.yaml
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml
```

### Step 4: Deploy SMOM-DBIS-138 VMs
```bash
# Deploy all SMOM VMs:
kubectl apply -f examples/production/smom-dbis-138/
```

### Step 5: Deploy Phoenix VMs
```bash
# Deploy all Phoenix VMs:
kubectl apply -f examples/production/phoenix/
```
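
After applying whole directories, a quick way to see which VMs are still pending is to print each ProxmoxVM's `Ready` condition and filter out those that report `True`. The jsonpath below uses standard kubectl JSONPath syntax but assumes Crossplane-style `status.conditions`. `not_ready_vms` is a hypothetical helper that reads the two-column output on stdin.

```bash
#!/usr/bin/env bash
# jsonpath printing "<name>\t<Ready status>" per ProxmoxVM
# (assumes Crossplane-style status.conditions on the resource).
READY_JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'

# Hypothetical helper: reads "<name>\t<status>" lines on stdin and prints the
# names whose Ready status is anything other than "True" (including empty).
not_ready_vms() {
  awk -F'\t' '$2 != "True" { print $1 }'
}
```

Usage: `kubectl get proxmoxvm -o jsonpath="$READY_JSONPATH" | not_ready_vms`.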

---

## ✅ Post-Deployment Verification

### Immediate Checks (First 5 minutes)
1. ✅ VM created in Proxmox
2. ✅ VM booting successfully
3. ✅ Cloud-init running
4. ✅ Guest agent responding

### Post-Boot Checks (After 10 minutes)
1. ✅ SSH access working
2. ✅ All services running
3. ✅ NTP synchronized
4. ✅ Security updates configured
5. ✅ Network connectivity

### Component-Specific Checks
1. ✅ Nginx: HTTP/HTTPS accessible
2. ✅ Cloudflare Tunnel: Service running
3. ✅ DNS: Resolution working
4. ✅ Blockchain: Services ready

---

## Summary

### ✅ Complete
- ✅ All 29 VMs configured and enhanced
- ✅ All Cloud-Init enhancements applied
- ✅ All critical code fixes verified
- ✅ All documentation complete
- ✅ YAML syntax validated

### ⏳ Pre-Deployment
- ⏳ Image availability verification
- ⏳ Provider configuration verification
- ⏳ Resource availability check
- ⏳ Network configuration check

### 🎯 Status

**READY FOR DEPLOYMENT** ✅

All configurations are complete, all enhancements are applied, and all critical fixes are verified. The deployment process is ready to proceed after completing the pre-deployment verification steps.

---

**Last Updated**: 2025-12-09
**Status**: ✅ **READY FOR DEPLOYMENT**
646
docs/deployment/DEPLOYMENT_REQUIREMENTS.md
Normal file
@@ -0,0 +1,646 @@
# Sankofa Phoenix - Deployment Requirements

## Overview

This document outlines all requirements needed to deploy **Sankofa** (the ecosystem) and **Sankofa Phoenix** (the sovereign cloud platform). This includes infrastructure, software, network, security, and operational requirements.

---

## 1. Infrastructure Requirements

### 1.1 Edge Sites (Current Implementation)

**Proxmox VE Infrastructure:**
- ✅ **Proxmox VE 8+** installed on physical hosts
- ✅ **2+ Proxmox nodes** per site (for redundancy)
- ✅ **Network bridge** configured (vmbr0)
- ✅ **Storage pools** configured (local-lvm, ceph-fs, ceph-rbd)
- ✅ **OS Images** available (ubuntu-22.04-cloud.img)

**Current Status:**
- Site 1 (ml110-01): 192.168.11.10 - Operational ✅
- Site 2 (r630-01): 192.168.11.11 - Operational ✅

**Resource Requirements (SMOM-DBIS-138):**
- **Total VMs**: 18 (16 application + 2 infrastructure)
- **Total CPU**: 72 cores
- **Total RAM**: 140 GiB
- **Total Disk**: 278 GiB

### 1.2 Kubernetes Control Plane

**Requirements:**
- **Kubernetes v1.24+** cluster
- **3 master nodes** minimum (for HA)
- **5+ worker nodes** (for production workloads)
- **Container runtime**: containerd or CRI-O
- **CNI plugin**: Calico, Flannel, or Cilium
- **Storage class**: Dynamic provisioning (local-path, NFS, or Ceph)

**Control Plane Components:**
- **Crossplane**: Infrastructure as Code (Proxmox provider)
- **ArgoCD**: GitOps deployment
- **Keycloak**: Identity and access management
- **Prometheus/Grafana**: Monitoring and observability
- **Loki**: Log aggregation
- **Vault**: Secrets management (optional)

### 1.3 Database Infrastructure

**PostgreSQL Requirements:**
- **PostgreSQL 14+** (recommended: 15+)
- **High availability**: Primary + replicas
- **Storage**: NVMe SSD recommended (2TB+ per node)
- **RAM**: 64GB+ per node
- **Backup**: Automated daily backups

**Database Schema:**
- 26 migrations including:
  - Multi-tenancy tables
  - Billing and usage tracking
  - MFA and RBAC
  - Blockchain integration
  - Audit logging

### 1.4 Blockchain Infrastructure (Future)

**Hyperledger Besu Validators:**
- **3-5 validator nodes** per core datacenter
- **CPU**: AMD EPYC 7763 (64 cores) or Intel Xeon Platinum 8380 (40 cores)
- **RAM**: 128GB DDR4 ECC
- **Storage**: 2x 4TB NVMe SSD (RAID 1) for blockchain state
- **Network**: 2x 25GbE network adapters
- **HSM**: Hardware Security Module for key storage

**Read Replica Nodes:**
- **2-3 nodes** per regional datacenter
- **CPU**: AMD EPYC 7543 (32 cores) or Intel Xeon Gold 6338 (32 cores)
- **RAM**: 64GB DDR4 ECC
- **Storage**: 2x 2TB NVMe SSD (RAID 1)

---

## 2. Software Requirements

### 2.1 Development Tools

**Required:**
- **Node.js 18+** (for frontend, API, portal)
- **pnpm** (recommended) or npm/yarn
- **Go 1.21+** (for Crossplane provider)
- **Docker** (for local development and containerization)
- **Git** (version control)

**Optional:**
- **kubectl** (v1.24+) - Kubernetes CLI (needed for the deployment steps in section 6)
- **helm** (v3.0+) - Kubernetes package manager
- **docker-compose** - Local development

### 2.2 Application Components

**Frontend (Next.js):**
- Next.js 14+
- React + TypeScript
- TailwindCSS + shadcn/ui
- TanStack Query

**Backend:**
- GraphQL API (Apollo Server + Fastify)
- PostgreSQL 14+
- WebSocket support
- Real-time subscriptions

**Portal:**
- Next.js portal application
- Keycloak OIDC integration
- Role-based dashboards

**Infrastructure:**
- Crossplane provider for Proxmox
- Kubernetes custom resources (ProxmoxVM)
- GitOps with ArgoCD

### 2.3 Monitoring and Observability

**Required:**
- **Prometheus**: Metrics collection
- **Grafana**: Dashboards and visualization
- **Loki**: Log aggregation
- **Alertmanager**: Alert routing

**Optional:**
- **Jaeger**: Distributed tracing
- **Kiali**: Service mesh visualization

---

## 3. Network Requirements

### 3.1 Edge Sites (Current)

**Network Configuration:**
- **Network bridge**: vmbr0
- **IP range**: 192.168.11.0/24
- **Gateway**: Configured
- **DNS**: Configured

**Connectivity:**
- **Cloudflare Tunnel**: Outbound-only secure connections
- **Nginx Proxy**: SSL/TLS termination and routing
- **Internet**: High-speed with redundancy

### 3.2 Cloudflare Integration

**Required:**
- **Cloudflare account** with Zero Trust
- **Cloudflare Tunnel** configured
- **DNS records** configured
- **Access policies** configured
- **SSL/TLS certificates** (managed by Cloudflare)

**Tunnel Configuration:**
- Tunnel credentials JSON file
- Ingress rules configured
- Health monitoring enabled

### 3.3 Inter-Datacenter Links (Future)

**Core to Core:**
- **Bandwidth**: 100Gbps+ per link
- **Redundancy**: Multiple redundant paths
- **Type**: Dark fiber or high-bandwidth leased lines

**Core to Regional:**
- **Bandwidth**: 10-40Gbps per link
- **Redundancy**: Redundant paths
- **Type**: Leased lines or MPLS

**Regional to Edge:**
- **Bandwidth**: 1-10Gbps per link
- **Redundancy**: Redundant internet connections
- **Type**: Internet connectivity with Cloudflare Tunnels

---

## 4. Security Requirements

### 4.1 Identity and Access Management

**Keycloak:**
- **Keycloak 20+** deployed
- **OIDC clients** configured:
  - `sankofa-api` (backend API)
  - `portal-client` (portal application)
- **Realms** configured (multi-tenant support)
- **MFA** enabled (TOTP, FIDO2, SMS, Email)
- **User federation** configured (optional)

**Access Control:**
- **RBAC**: Role-based access control
- **Tenant isolation**: Multi-tenant data isolation
- **API authentication**: JWT tokens
- **Session management**: Secure session handling

### 4.2 Network Security

**Firewalls:**
- **Next-generation firewalls** (Palo Alto, Fortinet, Check Point)
- **Access policies** configured
- **Intrusion detection/prevention** (IDS/IPS)
- **DDoS protection** (Cloudflare)

**Network Segmentation:**
- **VLANs** for different tiers
- **Network policies** in Kubernetes
- **Service mesh** (optional: Istio, Linkerd)

### 4.3 Application Security

**Security Features:**
- **Rate limiting**: 100 req/min per IP, 1000 req/hour per user
- **Security headers**: CSP, HSTS, X-Frame-Options
- **Input sanitization**: Body sanitization middleware
- **Encryption**: TLS 1.2+ for all connections
- **Secrets management**: Kubernetes secrets or Vault

**Audit Logging:**
- **Comprehensive audit trail** for all operations
- **Log retention** policy configured
- **Compliance** logging (GDPR, SOC 2, ISO 27001)

### 4.4 Blockchain Security

**Key Management:**
- **HSM**: Hardware Security Module for validator keys
- **Key rotation**: Automated key rotation
- **Multi-signature**: Multi-party governance

**Network Security:**
- **Private P2P network**: Encrypted peer-to-peer connections
- **Network overlay**: VPN or dedicated network segment
- **Consensus communication**: Secure channels for validators

---

## 5. Environment Configuration

### 5.1 Environment Variables

**API (.env):**
```env
DB_HOST=postgres
DB_PORT=5432
DB_NAME=sankofa
DB_USER=postgres
DB_PASSWORD=your-password
JWT_SECRET=your-jwt-secret

# Sovereign Identity (Keycloak)
KEYCLOAK_URL=http://keycloak:8080
KEYCLOAK_REALM=master
KEYCLOAK_CLIENT_ID=sankofa-api
KEYCLOAK_CLIENT_SECRET=your-keycloak-client-secret
KEYCLOAK_MULTI_REALM=true

# Multi-Tenancy
ENABLE_MULTI_TENANT=true
DEFAULT_TENANT_ID=
BLOCKCHAIN_IDENTITY_ENABLED=true

# Billing
BILLING_GRANULARITY=SECOND
BLOCKCHAIN_BILLING_ENABLED=true

# Blockchain
BLOCKCHAIN_RPC_URL=http://besu:8545
RESOURCE_PROVISIONING_CONTRACT_ADDRESS=0x...
```

**Frontend (.env.local):**
```env
NEXT_PUBLIC_GRAPHQL_ENDPOINT=http://api:4000/graphql
NEXT_PUBLIC_GRAPHQL_WS_ENDPOINT=ws://api:4000/graphql-ws
NEXT_PUBLIC_APP_URL=http://localhost:3000
NODE_ENV=development
```

**Portal (.env.local):**
```env
KEYCLOAK_URL=https://keycloak.sankofa.nexus
KEYCLOAK_REALM=sankofa
KEYCLOAK_CLIENT_ID=portal-client
KEYCLOAK_CLIENT_SECRET=your-secret
NEXT_PUBLIC_CROSSPLANE_API=https://crossplane.sankofa.nexus
NEXT_PUBLIC_ARGOCD_URL=https://argocd.sankofa.nexus
NEXT_PUBLIC_GRAFANA_URL=https://grafana.sankofa.nexus
NEXT_PUBLIC_LOKI_URL=https://loki.sankofa.nexus:3100
```

**Proxmox Provider:**
```env
PROXMOX_HOST=192.168.11.10
PROXMOX_USER=root@pam
PROXMOX_PASS=your-password
# OR, instead of a password, authenticate with an API token:
PROXMOX_TOKEN=your-api-token
```
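
A missing variable tends to surface later as a confusing connection or authentication error. A minimal pre-flight check, sketched with a hypothetical `require_env` helper, fails fast when any required variable is unset or empty. The variable names in the usage note come from the API `.env` in this section. `DEFAULT_TENANT_ID` is left out because it is intentionally empty. You can generate secrets such as `JWT_SECRET` with `openssl rand -base64 32`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any named environment variable is unset or empty.
# Usage: require_env NAME [NAME ...]
require_env() {
  local name missing=()
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in $name
    [[ -n ${!name:-} ]] || missing+=("$name")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required environment variables: ${missing[*]}" >&2
    return 1
  fi
}
```

Usage: `set -a; source .env; set +a; require_env DB_HOST DB_PORT DB_NAME DB_USER DB_PASSWORD JWT_SECRET KEYCLOAK_URL KEYCLOAK_CLIENT_ID KEYCLOAK_CLIENT_SECRET`.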

### 5.2 Kubernetes Secrets

**Required Secrets:**
- Database credentials
- Keycloak client secrets
- JWT secrets
- Proxmox API credentials
- Cloudflare tunnel credentials
- SSL/TLS certificates (if not using Cloudflare)

---

## 6. Deployment Steps

### 6.1 Prerequisites Checklist

- [ ] Kubernetes cluster deployed and operational
- [ ] PostgreSQL database deployed and accessible
- [ ] Keycloak deployed and configured
- [ ] Proxmox nodes accessible and configured
- [ ] Cloudflare account and tunnel configured
- [ ] Network connectivity verified
- [ ] DNS records configured
- [ ] SSL/TLS certificates configured

### 6.2 Database Setup

```bash
# 1. Create database
createdb sankofa

# 2. Run migrations (26 migrations)
cd api
npm run db:migrate

# 3. Verify migrations
psql -d sankofa -c "\dt"

# 4. Seed initial data (optional)
npm run db:seed
```

### 6.3 Kubernetes Deployment

```bash
# 1. Create namespaces (kubectl create fails for a namespace that already exists)
kubectl create namespace sankofa
kubectl create namespace crossplane-system
kubectl create namespace monitoring

# 2. Deploy Crossplane
kubectl apply -f gitops/apps/crossplane/

# 3. Deploy Proxmox Provider
kubectl apply -f crossplane-provider-proxmox/config/

# 4. Deploy ArgoCD
kubectl apply -f gitops/apps/argocd/

# 5. Deploy Keycloak
kubectl apply -f gitops/apps/keycloak/

# 6. Deploy API
kubectl apply -f gitops/apps/api/

# 7. Deploy Frontend
kubectl apply -f gitops/apps/frontend/

# 8. Deploy Portal
kubectl apply -f gitops/apps/portal/

# 9. Deploy Monitoring
kubectl apply -f gitops/apps/monitoring/
```

### 6.4 Proxmox VM Deployment

```bash
# 1. Deploy infrastructure VMs first
kubectl apply -f examples/production/nginx-proxy-vm.yaml
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml

# 2. Deploy application VMs
kubectl apply -f examples/production/smom-dbis-138/

# 3. Monitor deployment
kubectl get proxmoxvm -A -w

# 4. Check controller logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f
```

### 6.5 GitOps Setup (ArgoCD)

```bash
# 1. Apply ArgoCD application
kubectl apply -f gitops/apps/argocd/application.yaml

# 2. Sync application
argocd app sync sankofa-phoenix

# 3. Verify sync status
argocd app get sankofa-phoenix
```

### 6.6 Multi-Tenancy Setup

```bash
# 1. Create system tenant (via GraphQL)
curl -X POST http://api.sankofa.nexus/graphql \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <admin-token>" \
  -d '{
    "query": "mutation { createTenant(input: { name: \"system\", tier: SOVEREIGN }) { id name billingAccountId } }"
  }'

# 2. Assign admin user to system tenant
curl -X POST http://api.sankofa.nexus/graphql \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <admin-token>" \
  -d '{
    "query": "mutation { addUserToTenant(tenantId: \"<system-tenant-id>\", userId: \"<admin-user-id>\", role: TENANT_OWNER) }"
  }'
```

---

## 7. Verification and Testing

### 7.1 Health Checks

```bash
# API health
curl http://api.sankofa.nexus/health

# Frontend
curl http://frontend.sankofa.nexus

# Portal
curl http://portal.sankofa.nexus

# Keycloak health (health endpoints must be enabled in Keycloak)
curl http://keycloak.sankofa.nexus/health

# Proxmox VMs
kubectl get proxmoxvm -A
```
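
Plain `curl` prints the response body and exits 0 even on an HTTP 500. For a pass/fail summary, `curl -s -o /dev/null -w '%{http_code}'` returns only the status code, which you can then classify. The helpers below (`http_ok`, `check_endpoints`) are hypothetical. They treat any 2xx or 3xx response as healthy and use the URLs listed in 7.1 only as examples.

```bash
#!/usr/bin/env bash
# Succeeds for a 2xx or 3xx HTTP status code.
http_ok() {
  [[ $1 =~ ^[23][0-9]{2}$ ]]
}

# Prints one "<code> <url>" line per URL; returns non-zero if any check fails.
# curl reports "000" when it cannot connect at all.
check_endpoints() {
  local url code failed=0
  for url in "$@"; do
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || true
    echo "$code $url"
    http_ok "$code" || failed=1
  done
  return "$failed"
}
```

Usage: `check_endpoints http://api.sankofa.nexus/health http://frontend.sankofa.nexus http://portal.sankofa.nexus`.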

### 7.2 Smoke Tests

```bash
# Run smoke tests
./scripts/smoke-tests.sh
```

### 7.3 Performance Testing

```bash
# Load testing
./scripts/performance-test.sh

# k6 load test
k6 run scripts/k6-load-test.js
```

---

## 8. Operational Requirements

### 8.1 Monitoring and Alerting

**Required:**
- Prometheus metrics collection
- Grafana dashboards
- Alertmanager rules
- Notification channels (email, Slack, PagerDuty)

**Key Metrics:**
- API response times
- Database query performance
- VM resource utilization
- Blockchain network health
- Service availability

### 8.2 Backup and Disaster Recovery

**Database Backups:**
- Daily automated backups
- Retention policy: 30 days minimum
- Off-site backup storage
- Backup verification scripts
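
Standard PostgreSQL tools can approximate the daily-backup and 30-day-retention requirements.
- `backup_db` writes a custom-format dump with `pg_dump -Fc`, then checks that `pg_restore --list` can read it back.
- `prune_backups` deletes dumps older than the retention window.

The function names, the `<db>-<timestamp>.dump` naming scheme, and the directory layout are assumptions. Off-site copies still need a separate step.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dump DB into DIR and verify the archive is readable.
# Prints the path of the new dump on success.
backup_db() {
  local dir=$1 db=$2 out
  out="$dir/${db}-$(date +%Y%m%d-%H%M%S).dump"
  pg_dump -Fc -d "$db" -f "$out" || return 1
  # pg_restore --list reads the archive's table of contents without restoring it
  pg_restore --list "$out" >/dev/null || return 1
  echo "$out"
}

# Hypothetical helper: delete *.dump files in DIR last modified more than DAYS days ago.
prune_backups() {
  local dir=$1 days=$2
  find "$dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$days" -print -delete
}
```

Usage, for example from a daily cron job: `backup_db /var/backups/sankofa sankofa && prune_backups /var/backups/sankofa 30`.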

**VM Backups:**
- Proxmox backup schedules
- Snapshot management
- Disaster recovery procedures

### 8.3 Support and Operations

**Required:**
- 24/7 on-call rotation
- Incident response procedures
- Runbooks for common issues
- Escalation procedures
- Support team training

---

## 9. Compliance and Governance

### 9.1 Compliance Requirements

**Data Protection:**
- GDPR compliance (EU)
- Data retention policies
- Privacy policy published
- Terms of service published

**Security Standards:**
- SOC 2 Type II (if applicable)
- ISO 27001 (if applicable)
- Security audit procedures
- Penetration testing

### 9.2 Governance

**Multi-Tenancy:**
- Tenant isolation verified
- Resource quotas enforced
- Billing accuracy verified
- Audit logging enabled

**Blockchain Governance:**
- Multi-party governance nodes
- Smart contract upgrade procedures
- Network upgrade procedures

---
|
||||
|
||||
## 10. Cost Estimates
|
||||
|
||||
### 10.1 Infrastructure Costs
|
||||
|
||||
**Edge Sites (Current):**
|
||||
- Proxmox hardware: $10K-$50K per site
|
||||
- Network equipment: $5K-$20K per site
|
||||
- Power and cooling: $1K-$5K per year per site
|
||||
|
||||
**Kubernetes Cluster:**
|
||||
- Control plane: $500-$2K per month
|
||||
- Worker nodes: $1K-$5K per month
|
||||
- Storage: $200-$1K per month
|
||||
|
||||
**Database:**
|
||||
- PostgreSQL cluster: $500-$2K per month
|
||||
- Backup storage: $100-$500 per month
|
||||
|
||||
### 10.2 Cloudflare Costs
|
||||
|
||||
**Zero Trust:**
|
||||
- Free tier: Up to 50 users
|
||||
- Paid tier: $7 per user per month
|
||||
|
||||
**Tunnels:**
|
||||
- Free: Unlimited tunnels
|
||||
- Paid: Additional features
|
||||
|
||||
**Bandwidth:**
|
||||
- Included in Zero Trust plan
|
||||
- Additional bandwidth: $0.10-$0.50 per GB
|
||||
|
### 10.3 Operational Costs

**Personnel:**

- DevOps engineers: $100K-$200K per year
- SRE engineers: $120K-$250K per year
- Support staff: $50K-$100K per year

**Software Licenses:**

- Most components are open source
- Optional commercial support: $10K-$100K per year

---

## 11. Quick Start Summary

### Minimum Viable Deployment

**For Development/Testing:**

1. Single Kubernetes cluster (3 nodes minimum)
2. PostgreSQL database (single instance)
3. Keycloak (single instance)
4. 2 Proxmox nodes
5. Cloudflare account (free tier)
6. All application components deployed

**For Production:**

1. High-availability Kubernetes cluster (3 control-plane nodes + 5 worker nodes)
2. PostgreSQL cluster (primary + replicas)
3. Keycloak cluster (HA)
4. Multiple Proxmox sites (2+ sites)
5. Cloudflare Zero Trust (paid tier)
6. Monitoring and alerting configured
7. Backup and disaster recovery configured
8. Security hardening completed

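Before deploying either footprint, a short preflight can confirm that the cluster meets the node count listed above. This is a minimal sketch with two assumptions:

- `kubectl` already targets the intended cluster.
- Only nodes whose STATUS column is exactly `Ready` should count, so cordoned nodes (`Ready,SchedulingDisabled`) are excluded.

The function name is hypothetical.

```bash
#!/usr/bin/env bash
# check_min_ready_nodes [MIN]
# Counts nodes whose STATUS column is exactly "Ready" and compares the
# count against MIN (default 3, the development/testing minimum).
check_min_ready_nodes() {
  local min=${1:-3} ready
  ready=$(kubectl get nodes --no-headers | awk '$2 == "Ready" {n++} END {print n+0}')
  if (( ready >= min )); then
    echo "OK: $ready Ready node(s) (minimum $min)"
  else
    echo "FAIL: only $ready Ready node(s), need $min" >&2
    return 1
  fi
}

# Usage:
#   check_min_ready_nodes 3   # development/testing
#   check_min_ready_nodes 8   # production layout
```

The production value of 8 corresponds to the 3 control-plane plus 5 worker nodes in the production list.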

---

## 12. Documentation References

- **[Production Deployment Ready](./PRODUCTION_DEPLOYMENT_READY.md)** - Current deployment status
- **[Launch Checklist](./status/LAUNCH_CHECKLIST.md)** - Pre-launch verification
- **[Deployment Guide](./DEPLOYMENT.md)** - Detailed deployment instructions
- **[Deployment Plan](./deployment_plan.md)** - Phased rollout plan
- **[System Architecture](./system_architecture.md)** - Overall architecture
- **[Hardware BOM](./hardware_bom.md)** - Hardware specifications
- **[VM Specifications](vm/VM_SPECIFICATIONS.md)** - Complete VM specifications and deployment patterns
- **[VM Creation Procedure](vm/VM_CREATION_PROCEDURE.md)** - Step-by-step VM deployment guide

---

## 13. Next Steps

1. **Review Prerequisites**: Verify all infrastructure and software requirements
2. **Configure Environment**: Set up environment variables and secrets
3. **Deploy Database**: Run migrations and seed data
4. **Deploy Kubernetes**: Deploy control plane components
5. **Deploy Applications**: Deploy API, frontend, and portal
6. **Deploy VMs**: Deploy Proxmox VMs via Crossplane
7. **Configure Monitoring**: Set up Prometheus, Grafana, and Loki
8. **Verify Deployment**: Run health checks and smoke tests
9. **Configure Multi-Tenancy**: Set up initial tenants
10. **Go Live**: Enable production traffic

---

**Last Updated**: 2025-01-XX
**Status**: Comprehensive deployment requirements documented

289
docs/deployment/PRE_DEPLOYMENT_CHECKLIST.md
Normal file
@@ -0,0 +1,289 @@
# Pre-Deployment Checklist

**Date**: 2025-12-09
**Status**: ✅ **READY FOR DEPLOYMENT**

---

## Executive Summary

**All configuration and code checks have been completed successfully.** All 29 VMs are configured correctly with enhanced Cloud-Init, and all critical code fixes are in place. Deployment can proceed once the pending verifications in Sections 7 and 8 are complete: image availability, provider configuration, and resource availability.

---

## ✅ 1. VM Configuration Review

### File Count and Structure

- ✅ **Total VM Files**: 29
- ✅ **All files valid YAML**: Verified
- ✅ **All files have required fields**: Verified

### Enhancement Status

- ✅ **NTP Configuration**: 29/29 (100%)
- ✅ **SSH Hardening**: 29/29 (100%)
- ✅ **Enhanced Final Message**: 29/29 (100%)
- ✅ **Security Updates**: 29/29 (100%)
- ✅ **Additional Packages**: 29/29 (100%)

### Cloud-Init Configuration

- ✅ **userData present**: 29/29 (100%)
- ✅ **#cloud-config header**: 29/29 (100%)
- ✅ **package_update/upgrade**: 29/29 (100%)
- ✅ **qemu-guest-agent**: 29/29 (100%)
- ✅ **Guest agent verification**: 29/29 (100%)

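The Cloud-Init counts above can be re-checked mechanically before each rollout. This is a minimal sketch with two assumptions:

- The VM manifests are `.yaml` files in one directory. The path in the usage line is a placeholder.
- Cloud-Init is embedded in each manifest's `userData`.

It is a plain text match. It shows that each marker string is present, not that the YAML or the cloud-config is valid.

```bash
#!/usr/bin/env bash
# audit_cloud_init DIR
# For each required marker, print how many *.yaml files in DIR contain it.
audit_cloud_init() {
  local dir=$1 marker count
  local markers=("userData" "#cloud-config" "package_update" "package_upgrade" "qemu-guest-agent")
  local files=()

  # nullglob: a directory without matches yields an empty array, not a literal "*.yaml".
  shopt -s nullglob
  files=("$dir"/*.yaml)
  shopt -u nullglob
  if (( ${#files[@]} == 0 )); then
    echo "no .yaml files found in $dir" >&2
    return 1
  fi

  for marker in "${markers[@]}"; do
    # -l: list each matching file once; -F: treat the marker as a fixed string.
    count=$(grep -lF -- "$marker" "${files[@]}" | wc -l)
    printf '%-20s %d/%d\n' "$marker" "$count" "${#files[@]}"
  done
}

# Usage (placeholder path):
#   audit_cloud_init ./vms
```

A healthy run prints `29/29` on every line, which matches the table above.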

---

## ✅ 2. Deployment Code Review

### Critical Fixes Applied

- ✅ **Image Import**: Pre-flight checks, VM stop before import, verification
- ✅ **Boot Order**: Explicitly set to `scsi0` after image import
- ✅ **Cloud-init userData**: Retry logic (3 attempts) implemented
- ✅ **Disk Deletion**: Purge option to remove all associated disks
- ✅ **Guest Agent**: Enabled in all VM creation/update paths

### Code Verification

- ✅ **Guest agent enabled**: `agent: "1"` in all VM configs
- ✅ **Image import handling**: `findImageInStorage` with error handling
- ✅ **Boot order setting**: `boot: order=scsi0` after import
- ✅ **Cloud-init retry**: `Retry` function with 3 attempts

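The `Retry` function referenced above lives in the provider code, which this checklist does not reproduce. To illustrate the same pattern for operator scripts (a fixed number of attempts with a pause between failures), here is a Bash analogue. The `retry` name, its arguments, and the delay are assumptions, not the provider's implementation.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between failures.
# Returns 0 on the first success, or the last exit code if every attempt fails.
retry() {
  local attempts=$1 delay=$2 n=1 rc
  shift 2
  while true; do
    "$@" && return 0
    rc=$?
    if (( n >= attempts )); then
      echo "retry: giving up after $n attempt(s): $*" >&2
      return "$rc"
    fi
    echo "retry: attempt $n failed (exit $rc), retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage (values are illustrative):
#   retry 3 5 kubectl apply -f examples/production/basic-vm.yaml
```

Returning the last exit code rather than a fixed `1` lets callers distinguish, for example, a `kubectl` validation error from a connectivity failure.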

---

## ✅ 3. Image and Resource Configuration

### Image Configuration

- ✅ **All VMs specify image**: `ubuntu-22.04-cloud`
- ✅ **Image path resolution**: Handled in `findImageInStorage`
- ✅ **Image import process**: Complete with verification

### Resource Allocation

- ✅ **Node assignment**: All VMs specify a valid node
- ✅ **Storage configuration**: All VMs specify storage
- ✅ **Network configuration**: All VMs specify a network
- ✅ **Provider config reference**: All VMs reference `proxmox-provider-config`

---

## ✅ 4. Security Configuration

### SSH Configuration

- ✅ **Root login**: Disabled in all VMs
- ✅ **Password auth**: Disabled in all VMs
- ✅ **Public key auth**: Enabled in all VMs
- ✅ **SSH keys**: Configured in userData

### Security Updates

- ✅ **Automatic updates**: Enabled in all VMs
- ✅ **Security-only updates**: Configured
- ✅ **No auto-reboot**: Reboots remain under manual control

### Time Synchronization

- ✅ **NTP enabled**: All VMs configured with Chrony
- ✅ **NTP servers**: 4 servers configured
- ✅ **Status verification**: Included in boot process

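On a running VM, `sshd -T` prints the effective server configuration with lowercase keyword names, which makes the SSH items above easy to confirm. This is a minimal sketch, to be run as root on the guest. The expected values are an assumption about how "disabled" and "enabled" are expressed in these VMs' sshd configuration:

- `permitrootlogin no`
- `passwordauthentication no`
- `pubkeyauthentication yes`

```bash
#!/usr/bin/env bash
# check_sshd_hardening
# Compares effective sshd settings against the hardening listed in section 4.
# Run as root on the VM: sshd -T needs to read the host keys.
check_sshd_hardening() {
  local config pair key expected actual failed=0
  config=$(sshd -T) || return 1
  for pair in "permitrootlogin no" "passwordauthentication no" "pubkeyauthentication yes"; do
    read -r key expected <<< "$pair"
    actual=$(awk -v k="$key" '$1 == k {print $2}' <<< "$config")
    if [[ "$actual" == "$expected" ]]; then
      echo "OK   $key $actual"
    else
      echo "FAIL $key is '${actual:-unset}', expected '$expected'" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Checking `sshd -T` rather than grepping `/etc/ssh/sshd_config` also accounts for drop-in files and defaults.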

---

## ✅ 5. Component-Specific Configurations

### SMOM-DBIS-138 VMs (16 files)

- ✅ All validators configured correctly
- ✅ All sentries configured correctly
- ✅ All RPC nodes configured correctly
- ✅ Services, blockscout, monitoring, management configured

### Phoenix VMs (8 files)

- ✅ DNS primary configured with BIND9
- ✅ Git server configured
- ✅ Email server configured
- ✅ All gateways configured
- ✅ DevOps runner configured
- ✅ Codespaces IDE configured

### Infrastructure VMs (2 files)

- ✅ Nginx proxy configured with Nginx, Certbot, UFW
- ✅ Cloudflare tunnel configured with cloudflared

### Template VMs (3 files)

- ✅ Basic, medium, large templates all enhanced

---

## ✅ 6. Documentation Review

### Documentation Created

- ✅ `CLOUD_INIT_REVIEW.md` - Comprehensive review
- ✅ `CLOUD_INIT_TESTING_CHECKLIST.md` - Testing procedures
- ✅ `CLOUD_INIT_REVIEW_SUMMARY.md` - Executive summary
- ✅ `CLOUD_INIT_ENHANCED_TEMPLATE.md` - Template reference
- ✅ `CLOUD_INIT_ENHANCEMENTS_COMPLETE.md` - Enhancement status
- ✅ `CLOUD_INIT_ENHANCEMENTS_FINAL.md` - Final status
- ✅ `CLOUD_INIT_COMPLETE_SUMMARY.md` - Complete summary
- ✅ `CLOUD_INIT_ENHANCEMENTS_FINAL_STATUS.md` - Final status report
- ✅ `VM_DEPLOYMENT_REVIEW_COMPLETE.md` - Deployment review
- ✅ `VM_DEPLOYMENT_FIXES.md` - Fixes identified
- ✅ `VM_DEPLOYMENT_FIXES_IMPLEMENTED.md` - Fixes implemented
- ✅ `VM_DEPLOYMENT_PROCESS_VERIFIED.md` - Process verification
- ✅ `BUG_FIXES_2025-12-09.md` - Bug fixes documentation
- ✅ `PRE_DEPLOYMENT_CHECKLIST.md` - This document

---

## ✅ 7. Potential Issues Check

### Image Availability

- ⚠️ **Action Required**: Verify that the `ubuntu-22.04-cloud` image exists on all Proxmox nodes
- ⚠️ **Action Required**: Ensure the image is accessible from the specified storage

### Provider Configuration

- ⚠️ **Action Required**: Verify that `proxmox-provider-config` exists in Kubernetes
- ⚠️ **Action Required**: Verify that the provider credentials are correct

### Network Configuration

- ✅ **All VMs use vmbr0**: Consistent network configuration
- ⚠️ **Action Required**: Verify that vmbr0 exists on all Proxmox nodes

### Resource Availability

- ⚠️ **Action Required**: Verify sufficient CPU, memory, and disk on Proxmox nodes
- ⚠️ **Action Required**: Check resource quotas before deployment

---

## ✅ 8. Deployment Readiness

### Pre-Deployment Requirements

- ✅ All VM YAML files complete and valid
- ✅ All Cloud-Init configurations enhanced
- ✅ All critical code fixes applied
- ✅ All documentation complete
- ⏳ **Pending**: Image availability verification
- ⏳ **Pending**: Provider configuration verification
- ⏳ **Pending**: Resource availability check

### Deployment Process

1. ✅ **VM Templates**: All 29 VMs ready
2. ✅ **Cloud-Init**: All configurations complete
3. ✅ **Code Fixes**: All critical issues resolved
4. ⏳ **Provider Config**: Verify in Kubernetes
5. ⏳ **Image Availability**: Verify on Proxmox nodes
6. ⏳ **Resource Check**: Verify capacity

---

## ⚠️ Pre-Deployment Actions Required

### 1. Verify Image Availability

```bash
# On each Proxmox node, verify the image exists:
find /var/lib/vz/template/iso -name "ubuntu-22.04-cloud.img"

# Or check storage:
pvesm list <storage-name> | grep ubuntu-22.04-cloud
```

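With several nodes, the per-node `find` can be looped from a workstation. This is a sketch with two assumptions:

- Root SSH access to each node is available non-interactively. The hostnames are placeholders.
- The image sits in `/var/lib/vz/template/iso`, as in the command above.

A node that cannot be reached is reported the same way as a missing image.

```bash
#!/usr/bin/env bash
# check_image_on_nodes IMAGE NODE...
# Reports, for each node, whether /var/lib/vz/template/iso/IMAGE exists there.
# Returns non-zero if any node lacks the image or cannot be reached.
check_image_on_nodes() {
  local image=$1 node missing=0
  shift
  for node in "$@"; do
    # BatchMode=yes: fail instead of prompting for a password.
    if ssh -o BatchMode=yes "root@$node" test -f "/var/lib/vz/template/iso/$image"; then
      echo "OK      $node"
    else
      echo "MISSING $node (or unreachable)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (placeholder hostnames):
#   check_image_on_nodes ubuntu-22.04-cloud.img pve-node-1 pve-node-2
```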
### 2. Verify Provider Configuration

```bash
# In Kubernetes:
kubectl get providerconfig proxmox-provider-config -n crossplane-system
kubectl get secret -n crossplane-system | grep proxmox
```

### 3. Verify Resource Availability

```bash
# Check Proxmox node resources (CPU, memory, uptime):
pvesh get /nodes/<node>/status

# Check free capacity on the target storage.
# `pvesm list` only lists stored volumes; `pvesm status` reports total/used/available:
pvesm status --storage <storage-name>
```

### 4. Test Deployment

```bash
# Deploy test VM first:
kubectl apply -f examples/production/basic-vm.yaml

# Monitor deployment:
kubectl get proxmoxvm basic-vm-001 -w

# Check logs:
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox
```

---

## ✅ 9. Deployment Order Recommendation

### Phase 1: Infrastructure (2 VMs)

1. nginx-proxy-vm.yaml
2. cloudflare-tunnel-vm.yaml

### Phase 2: Test Deployment (1 VM)

3. basic-vm.yaml (test case)

### Phase 3: SMOM-DBIS-138 Core (8 VMs)

4. validator-01
5. validator-02
6. validator-03
7. validator-04
8. sentry-01
9. sentry-02
10. sentry-03
11. sentry-04

### Phase 4: SMOM-DBIS-138 Services (8 VMs)

12. rpc-node-01
13. rpc-node-02
14. rpc-node-03
15. rpc-node-04
16. services.yaml
17. blockscout.yaml
18. monitoring.yaml
19. management.yaml

### Phase 5: Phoenix VMs (8 VMs)

20-27. All Phoenix VMs

### Phase 6: Template VMs (2 VMs - Optional)

28. medium-vm.yaml
29. large-vm.yaml

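The phase order above can be scripted so that a phase starts only after the previous one reports ready. This is a sketch with these assumptions:

- The `ProxmoxVM` resources expose a `Ready` condition, as Crossplane managed resources normally do.
- `kubectl wait -f` is pointed at the same manifests that were applied.
- Apart from `examples/production/basic-vm.yaml`, the manifest paths are placeholders.
- The 15-minute default timeout is arbitrary.

```bash
#!/usr/bin/env bash
# deploy_phase NAME FILE...
# Applies every manifest in a phase, then waits until the resources they
# define report the Ready condition. Returns non-zero at the first failure.
DEPLOY_TIMEOUT=${DEPLOY_TIMEOUT:-15m}   # hypothetical default; override via env

deploy_phase() {
  local name=$1 f
  local args=()
  shift
  for f in "$@"; do args+=(-f "$f"); done
  echo "== Phase: $name ($# manifest(s))"
  kubectl apply "${args[@]}" || return 1
  if ! kubectl wait --for=condition=Ready --timeout="$DEPLOY_TIMEOUT" "${args[@]}"; then
    echo "Phase '$name' did not become Ready within $DEPLOY_TIMEOUT" >&2
    return 1
  fi
}

# Usage (<dir> is a placeholder):
#   deploy_phase infrastructure <dir>/nginx-proxy-vm.yaml <dir>/cloudflare-tunnel-vm.yaml &&
#   deploy_phase test examples/production/basic-vm.yaml
```

Chaining phases with `&&` stops the rollout at the first phase that fails to become ready, which keeps later phases (for example, sentries that depend on validators) from starting against a broken base.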

---

## ✅ 10. Verification Steps After Deployment

### Immediate Verification (First 5 minutes)

1. ✅ Check VM creation in Proxmox
2. ✅ Verify VM boot status
3. ✅ Check cloud-init logs
4. ✅ Verify guest agent status

### Post-Boot Verification (After 10 minutes)

1. ✅ SSH access test
2. ✅ Service status check
3. ✅ NTP synchronization check
4. ✅ Security updates status
5. ✅ Network connectivity test

### Component-Specific Verification

1. ✅ Nginx: HTTP/HTTPS access
2. ✅ Cloudflare Tunnel: Service status
3. ✅ DNS: DNS resolution test
4. ✅ Blockchain components: Service readiness

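Several of the checks above can be run on the guest in one pass. This is a minimal sketch, assuming it runs on an Ubuntu 22.04 VM built from these manifests, so `cloud-init`, `timedatectl`, and the `qemu-guest-agent` systemd unit are present. `cloud-init status --wait` blocks until first-boot configuration finishes. On the Proxmox node, `qm guest cmd <vmid> ping` is the host-side counterpart of the guest-agent check.

```bash
#!/usr/bin/env bash
# verify_guest
# Run on the VM after first boot. Prints OK/FAIL per check and returns
# non-zero if any check fails.
verify_guest() {
  local failed=0

  # Waits for cloud-init to finish; non-zero exit means it reported errors.
  if cloud-init status --wait >/dev/null; then
    echo "OK   cloud-init finished"
  else
    echo "FAIL cloud-init reported errors" >&2; failed=1
  fi

  # systemd's view of NTP synchronization (Chrony on these VMs).
  if [[ $(timedatectl show -p NTPSynchronized --value) == yes ]]; then
    echo "OK   clock synchronized"
  else
    echo "FAIL clock not synchronized" >&2; failed=1
  fi

  # The guest agent must be running for Proxmox to query the VM.
  if systemctl is-active --quiet qemu-guest-agent; then
    echo "OK   qemu-guest-agent active"
  else
    echo "FAIL qemu-guest-agent not active" >&2; failed=1
  fi

  return "$failed"
}
```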

---

## Summary

### ✅ Ready for Deployment

- ✅ All 29 VMs configured correctly
- ✅ All Cloud-Init enhancements applied
- ✅ All critical code fixes in place
- ✅ All documentation complete

### ⚠️ Pre-Deployment Actions

- ⏳ Verify image availability on Proxmox nodes
- ⏳ Verify provider configuration in Kubernetes
- ⏳ Verify resource availability
- ⏳ Test with a single VM first

### 🎯 Deployment Status

**Status**: ✅ **READY FOR DEPLOYMENT**

All configurations are complete, all enhancements are applied, and all critical fixes are in place. Deployment can proceed once image availability, provider configuration, and resource availability have been verified.

---

**Last Updated**: 2025-12-09
**Review Status**: ✅ **COMPLETE**
**Deployment Readiness**: ✅ **READY**

19
docs/deployment/README.md
Normal file
@@ -0,0 +1,19 @@
# Deployment Documentation

This directory contains deployment-related status and planning documents.

## Contents

- **[Deployment Next Steps](DEPLOYMENT_NEXT_STEPS.md)** - Immediate next steps, including resolving the Proxmox lock blocking `basic-vm-001` (VMID 100)
- **[Deployment Ready](DEPLOYMENT_READY.md)** - Overall deployment readiness status
- **[Pre-Deployment Checklist](PRE_DEPLOYMENT_CHECKLIST.md)** - Checklist before deployment

**Note**: Main deployment guides are in the root `docs/` directory:
- [Deployment Guide](../DEPLOYMENT.md) - Production deployment instructions
- [Deployment Requirements](../DEPLOYMENT_REQUIREMENTS.md) - Complete deployment requirements
- [Deployment Execution Plan](../DEPLOYMENT_EXECUTION_PLAN.md) - Step-by-step execution guide

---

**Last Updated**: 2025-01-09
