# Sankofa Phoenix - Deployment Execution Plan

**Date**: 2025-01-XX

**Status**: Ready for Execution

---

## Executive Summary

This document provides a step-by-step execution plan for deploying Sankofa and Sankofa Phoenix. All prerequisites are complete, VM YAML files are ready, and infrastructure is operational.

---

## Pre-Execution Checklist

### ✅ Completed

- [x] Proxmox infrastructure operational (2 sites)
- [x] All 21 VM YAML files updated with enhanced template
- [x] Guest agent configuration complete
- [x] OS images available (ubuntu-22.04-cloud.img)
- [x] Network configuration verified
- [x] Documentation comprehensive
- [x] Scripts ready for deployment

### ⚠️ Requires Verification

- [ ] Resource quota check (run `./scripts/check-proxmox-quota.sh`)
- [ ] Kubernetes cluster status
- [ ] Database connectivity
- [ ] Keycloak deployment status

---

## Execution Phases

### Phase 1: Resource Verification (15 minutes)

**Objective**: Verify Proxmox resources are sufficient for deployment

**Steps**:

```bash
cd /home/intlc/projects/Sankofa

# 1. Run resource quota check
./scripts/check-proxmox-quota.sh

# 2. Review output
# Expected: Available resources >= 72 CPU, 140 GiB RAM, 278 GiB disk

# 3. If insufficient, document and plan expansion
```

**Success Criteria**:

- ✅ Resources sufficient for all 18 VMs
- ✅ Storage pools have adequate space
- ✅ Network connectivity verified

**Rollback**: None required - verification only

---

### Phase 2: Kubernetes Control Plane (30-60 minutes)

**Objective**: Deploy and verify Kubernetes control plane components

**Steps**:

```bash
# 1. Verify Kubernetes cluster
kubectl cluster-info
kubectl get nodes

# 2. Create namespaces (idempotent: dry-run output piped to apply)
kubectl create namespace sankofa --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace crossplane-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -

# 3. Deploy Crossplane
kubectl apply -f gitops/apps/crossplane/
kubectl wait --for=condition=Ready pod -l app=crossplane -n crossplane-system --timeout=300s

# 4. Deploy Proxmox Provider
kubectl apply -f crossplane-provider-proxmox/config/
kubectl wait --for=condition=Installed provider -l pkg.crossplane.io/name=provider-proxmox --timeout=300s

# 5. Create ProviderConfig
kubectl apply -f crossplane-provider-proxmox/config/provider.yaml

# 6. Verify
kubectl get pods -n crossplane-system
kubectl get providerconfig -A
```

**Success Criteria**:

- ✅ Crossplane pods running
- ✅ Proxmox provider installed
- ✅ ProviderConfig ready

**Rollback**:

```bash
kubectl delete -f crossplane-provider-proxmox/config/
kubectl delete -f gitops/apps/crossplane/
```

---

### Phase 3: Database and Identity (30-45 minutes)

**Objective**: Deploy PostgreSQL and Keycloak

**Steps**:

```bash
# 1. Deploy PostgreSQL (if not external)
kubectl apply -f gitops/apps/postgresql/  # If exists

# 2. Run database migrations
# (subshell keeps the working directory at the repo root for later steps)
(cd api && npm install && npm run db:migrate)

# 3. Verify migrations (replace <db-host> with the PostgreSQL host)
psql -h <db-host> -U postgres -d sankofa -c "\dt" | grep -E "tenants|billing"

# 4. Deploy Keycloak
kubectl apply -f gitops/apps/keycloak/

# 5. Wait for Keycloak ready
kubectl wait --for=condition=Ready pod -l app=keycloak -n sankofa --timeout=600s

# 6. Configure Keycloak clients
kubectl apply -f gitops/apps/keycloak/keycloak-clients.yaml
```

**Success Criteria**:

- ✅ Database migrations complete (26 migrations)
- ✅ Keycloak pods running
- ✅ Keycloak clients configured

**Rollback**:

```bash
kubectl delete -f gitops/apps/keycloak/
# Database rollback: Restore from backup or re-run migrations
```

---

### Phase 4: Application Deployment (30-45 minutes)

**Objective**: Deploy API, Frontend, and Portal

**Steps**:

```bash
# 1. Create secrets (replace each <placeholder> with the real value)
kubectl create secret generic api-secrets -n sankofa \
  --from-literal=DB_PASSWORD=<db-password> \
  --from-literal=JWT_SECRET=<jwt-secret> \
  --from-literal=KEYCLOAK_CLIENT_SECRET=<keycloak-client-secret> \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Deploy API
kubectl apply -f gitops/apps/api/
kubectl wait --for=condition=Ready pod -l app=api -n sankofa --timeout=300s

# 3. Deploy Frontend
kubectl apply -f gitops/apps/frontend/
kubectl wait --for=condition=Ready pod -l app=frontend -n sankofa --timeout=300s

# 4. Deploy Portal
kubectl apply -f gitops/apps/portal/
kubectl wait --for=condition=Ready pod -l app=portal -n sankofa --timeout=300s

# 5. Verify health endpoints
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus
```

**Success Criteria**:

- ✅ All application pods running
- ✅ Health endpoints responding
- ✅ No critical errors in logs

**Rollback**:

```bash
kubectl rollout undo deployment/api -n sankofa
kubectl rollout undo deployment/frontend -n sankofa
kubectl rollout undo deployment/portal -n sankofa
```

---

### Phase 5: Infrastructure VMs (15-30 minutes)

**Objective**: Deploy Nginx Proxy and Cloudflare Tunnel VMs

**Steps**:

```bash
# 1. Deploy Nginx Proxy VM
kubectl apply -f examples/production/nginx-proxy-vm.yaml

# 2. Deploy Cloudflare Tunnel VM
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml

# 3. Monitor deployment
watch kubectl get proxmoxvm -A

# 4. Wait for VMs ready (check status)
kubectl wait --for=condition=Ready proxmoxvm nginx-proxy-vm -n default --timeout=600s
kubectl wait --for=condition=Ready proxmoxvm cloudflare-tunnel-vm -n default --timeout=600s

# 5. Verify VM creation in Proxmox
ssh root@192.168.11.10 "qm list | grep -E 'nginx-proxy|cloudflare-tunnel'"

# 6. Check guest agent (replace <vmid> with the VM ID shown by `qm list`)
ssh root@192.168.11.10 "qm guest exec <vmid> -- cat /etc/os-release"
```

**Success Criteria**:

- ✅ Both VMs created and running
- ✅ Guest agent running
- ✅ VMs accessible via SSH
- ✅ Cloud-init completed

**Rollback**:

```bash
kubectl delete proxmoxvm nginx-proxy-vm -n default
kubectl delete proxmoxvm cloudflare-tunnel-vm -n default
```

---

### Phase 6: Application VMs (30-60 minutes)

**Objective**: Deploy all 16 SMOM-DBIS-138 VMs

**Steps**:

```bash
# 1. Deploy all VMs
kubectl apply -f examples/production/smom-dbis-138/

# 2. Monitor deployment (in separate terminal)
watch kubectl get proxmoxvm -A

# 3. Check controller logs (in separate terminal)
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f

# 4. Wait for all VMs ready (this may take 10-30 minutes)
# Monitor progress and verify each VM reaches Ready state

# 5. Verify VM creation
kubectl get proxmoxvm -A -o wide

# 6. Check status conditions on all VMs
# (kubectl cannot fetch a resource by name with -A, so iterate over namespace/name pairs)
kubectl get proxmoxvm -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns vm; do
  echo "Checking $ns/$vm..."
  kubectl get proxmoxvm "$vm" -n "$ns" -o jsonpath='{.status.conditions[*].status}{"\n"}'
done
```

**VM Deployment Order** (if deploying sequentially):

1. validator-01, validator-02, validator-03, validator-04
2. sentry-01, sentry-02, sentry-03, sentry-04
3. rpc-node-01, rpc-node-02, rpc-node-03, rpc-node-04
4. services, blockscout, monitoring, management

**Success Criteria**:

- ✅ All 16 VMs created
- ✅ All VMs in Running state
- ✅ Guest agent running on all VMs
- ✅ Cloud-init completed successfully

**Rollback**:

```bash
# Delete all VMs
kubectl delete -f examples/production/smom-dbis-138/
```

---

### Phase 7: Monitoring Stack (20-30 minutes)

**Objective**: Deploy monitoring and observability stack

**Steps**:

```bash
# 1. Deploy Prometheus
kubectl apply -f gitops/apps/monitoring/prometheus/
kubectl wait --for=condition=Ready pod -l app=prometheus -n monitoring --timeout=300s

# 2. Deploy Grafana
kubectl apply -f gitops/apps/monitoring/grafana/
kubectl wait --for=condition=Ready pod -l app=grafana -n monitoring --timeout=300s

# 3. Deploy Loki
kubectl apply -f gitops/apps/monitoring/loki/
kubectl wait --for=condition=Ready pod -l app=loki -n monitoring --timeout=300s

# 4. Deploy Alertmanager
kubectl apply -f gitops/apps/monitoring/alertmanager/

# 5. Deploy backup CronJob
kubectl apply -f gitops/apps/monitoring/backup-cronjob.yaml

# 6. Verify
kubectl get pods -n monitoring
curl http://grafana.sankofa.nexus
```

**Success Criteria**:

- ✅ All monitoring pods running
- ✅ Prometheus scraping metrics
- ✅ Grafana accessible
- ✅ Loki ingesting logs
- ✅ Backup CronJob scheduled

**Rollback**:

```bash
# -R recurses into the prometheus/, grafana/, loki/, and alertmanager/ subdirectories
kubectl delete -f gitops/apps/monitoring/ -R
```

---

### Phase 8: Network Configuration (30-45 minutes)

**Objective**: Configure Cloudflare Tunnel, Nginx, and DNS

**Steps**:

```bash
# 1. Configure Cloudflare Tunnel
./scripts/configure-cloudflare-tunnel.sh
# Or manually:
# - Create tunnel in Cloudflare dashboard
# - Download credentials JSON
# - Upload to cloudflare-tunnel-vm: /etc/cloudflared/tunnel-credentials.json
# - Update /etc/cloudflared/config.yaml with ingress rules
# - Restart cloudflared service

# 2. Configure Nginx Proxy
./scripts/configure-nginx-proxy.sh
# Or manually:
# - SSH into nginx-proxy-vm
# - Update /etc/nginx/conf.d/*.conf
# - Run certbot for SSL certificates
# - Test: nginx -t
# - Reload: systemctl reload nginx

# 3. Configure DNS
./scripts/setup-dns-records.sh
# Or manually in Cloudflare:
# - Create A/CNAME records
# - Point to Cloudflare Tunnel
# - Enable proxy (orange cloud)
```

**Success Criteria**:

- ✅ Cloudflare Tunnel connected
- ✅ Nginx proxying correctly
- ✅ DNS records created
- ✅ SSL certificates issued
- ✅ Services accessible via public URLs

**Rollback**:

- Revert DNS changes in Cloudflare
- Restore previous Nginx configuration
- Disable Cloudflare Tunnel

---

### Phase 9: Multi-Tenancy Setup (15-20 minutes)

**Objective**: Create system tenant and configure multi-tenancy

**Steps**:

```bash
# 1. Get API endpoint and admin token
API_URL="http://api.sankofa.nexus/graphql"
ADMIN_TOKEN="<admin-token>"

# 2. Create system tenant
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "mutation { createTenant(input: { name: \"system\", tier: SOVEREIGN }) { id name billingAccountId } }"
  }'

# 3. Get system tenant ID from response
SYSTEM_TENANT_ID="<system-tenant-id>"

# 4. Add admin user to system tenant (replace <admin-user-id>)
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d "{
    \"query\": \"mutation { addUserToTenant(tenantId: \\\"$SYSTEM_TENANT_ID\\\", userId: \\\"<admin-user-id>\\\", role: TENANT_OWNER) }\"
  }"

# 5. Verify tenant
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "query { myTenant { id name status tier } }"
  }'
```

**Success Criteria**:

- ✅ System tenant created
- ✅ Admin user assigned
- ✅ Tenant accessible via API
- ✅ RBAC working correctly

**Rollback**:

- Delete tenant via API (if supported)
- Or manually remove from database

---

### Phase 10: Verification and Testing (30-45 minutes)

**Objective**: Verify deployment and run tests

**Steps**:

```bash
# 1. Health checks
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus
curl http://keycloak.sankofa.nexus/health

# 2. Check all VMs
kubectl get proxmoxvm -A

# 3. Check all pods
kubectl get pods -A

# 4. Run smoke tests
./scripts/smoke-tests.sh

# 5. Run performance tests (optional)
./scripts/performance-test.sh

# 6. Verify monitoring
curl http://grafana.sankofa.nexus
kubectl get pods -n monitoring

# 7. Check backups
./scripts/verify-backups.sh
```

**Success Criteria**:

- ✅ All health checks passing
- ✅ All VMs running
- ✅ All pods running
- ✅ Smoke tests passing
- ✅ Monitoring operational
- ✅ Backups configured

**Rollback**: N/A - verification only

---

## Execution Timeline

### Estimated Total Time: 4-6 hours

| Phase | Duration | Dependencies |
|-------|----------|--------------|
| Phase 1: Resource Verification | 15 min | None |
| Phase 2: Kubernetes Control Plane | 30-60 min | Kubernetes cluster |
| Phase 3: Database and Identity | 30-45 min | Phase 2 |
| Phase 4: Application Deployment | 30-45 min | Phase 3 |
| Phase 5: Infrastructure VMs | 15-30 min | Phase 2, Phase 4 |
| Phase 6: Application VMs | 30-60 min | Phase 5 |
| Phase 7: Monitoring Stack | 20-30 min | Phase 2 |
| Phase 8: Network Configuration | 30-45 min | Phase 5 |
| Phase 9: Multi-Tenancy Setup | 15-20 min | Phase 3, Phase 4 |
| Phase 10: Verification and Testing | 30-45 min | All phases |

---

## Risk Mitigation

### High-Risk Areas

1. **VM Deployment**: May take longer than expected
   - **Mitigation**: Monitor closely, allow extra time
2. **Network Configuration**: DNS propagation delays
   - **Mitigation**: Test with IP addresses first, then DNS
3. **Database Migrations**: Potential data loss
   - **Mitigation**: Backup before migrations, test in staging first

### Rollback Procedures

- Each phase includes rollback steps
- Document any issues encountered
- Keep backups of all configurations

---

## Post-Deployment

### Immediate (First 24 hours)

- [ ] Monitor all services
- [ ] Review logs for errors
- [ ] Verify all VMs accessible
- [ ] Check monitoring dashboards
- [ ] Verify backups running

### Short-term (First week)

- [ ] Performance optimization
- [ ] Security hardening
- [ ] Documentation updates
- [ ] Team training
- [ ] Support procedures

---

## Success Criteria

### Technical

- ✅ All 18 VMs deployed and running
- ✅ All services healthy
- ✅ Guest agent on all VMs
- ✅ Monitoring operational
- ✅ Backups configured

### Functional

- ✅ Portal accessible
- ✅ API responding
- ✅ Multi-tenancy working
- ✅ Resource provisioning functional

---

**Last Updated**: 2025-01-XX

**Status**: Ready for Execution
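
The Phase 1 review step compares the available Proxmox resources against the stated minimums (72 CPU, 140 GiB RAM, 278 GiB disk). A minimal sketch of that comparison follows. It assumes the three available values are read by hand from the output of `./scripts/check-proxmox-quota.sh`, whose format is not documented here; `has_capacity` is a hypothetical helper and not part of the repository.

```bash
# Thresholds quoted in Phase 1: 72 CPU cores, 140 GiB RAM, 278 GiB disk.
REQUIRED_CPU=72
REQUIRED_RAM_GIB=140
REQUIRED_DISK_GIB=278

# has_capacity CPU RAM_GIB DISK_GIB   (hypothetical helper, integers only)
# Returns 0 when all three available values meet the thresholds; otherwise
# prints each shortfall to stderr and returns 1.
has_capacity() {
  hc_status=0
  if [ "$1" -lt "$REQUIRED_CPU" ]; then
    echo "CPU: $1 available, $REQUIRED_CPU required" >&2
    hc_status=1
  fi
  if [ "$2" -lt "$REQUIRED_RAM_GIB" ]; then
    echo "RAM: $2 GiB available, $REQUIRED_RAM_GIB GiB required" >&2
    hc_status=1
  fi
  if [ "$3" -lt "$REQUIRED_DISK_GIB" ]; then
    echo "Disk: $3 GiB available, $REQUIRED_DISK_GIB GiB required" >&2
    hc_status=1
  fi
  return "$hc_status"
}
```

For example, `has_capacity 96 256 1024 && echo "OK to proceed to Phase 2"` succeeds, while `has_capacity 64 256 1024` reports the CPU shortfall and returns non-zero, which is the case where expansion should be documented and planned before continuing.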