# Sankofa Phoenix - Deployment Execution Plan

**Date**: 2025-01-XX
**Status**: Ready for Execution

---

## Executive Summary

This document provides a step-by-step execution plan for deploying Sankofa and Sankofa Phoenix. The prerequisites listed under Completed are done, the VM YAML files are ready, and the Proxmox infrastructure is operational. Four checklist items still require verification before execution.

---

## Pre-Execution Checklist

### ✅ Completed
- [x] Proxmox infrastructure operational (2 sites)
- [x] All 21 VM YAML files updated with enhanced template
- [x] Guest agent configuration complete
- [x] OS images available (ubuntu-22.04-cloud.img)
- [x] Network configuration verified
- [x] Documentation comprehensive
- [x] Scripts ready for deployment

### ⚠️ Requires Verification
- [ ] Resource quota check (run `./scripts/check-proxmox-quota.sh`)
- [ ] Kubernetes cluster status
- [ ] Database connectivity
- [ ] Keycloak deployment status

---

## Execution Phases

### Phase 1: Resource Verification (15 minutes)

**Objective**: Verify Proxmox resources are sufficient for deployment

**Steps**:
```bash
cd /home/intlc/projects/Sankofa

# 1. Run resource quota check
./scripts/check-proxmox-quota.sh

# 2. Review output
# Expected: Available resources >= 72 CPU, 140 GiB RAM, 278 GiB disk

# 3. If insufficient, document and plan expansion
```
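
The comparison in step 2 can be scripted once the available totals are known. The sketch below is illustrative only: `check_one` and `check_capacity` are hypothetical helpers, the output format of `check-proxmox-quota.sh` is not assumed, and the three whole-number totals are passed in by hand, for example `check_capacity 80 160 300`. It returns non-zero when any resource is below the 72 CPU / 140 GiB RAM / 278 GiB disk thresholds stated above.

```bash
# check_one LABEL AVAILABLE REQUIRED
# Prints the result for one resource; returns 1 if AVAILABLE < REQUIRED.
check_one() {
  if [ "$2" -ge "$3" ]; then
    echo "$1: ok ($2 available, $3 required)"
  else
    echo "$1: short by $(( $3 - $2 ))"
    return 1
  fi
}

# check_capacity AVAILABLE_CPU AVAILABLE_RAM_GIB AVAILABLE_DISK_GIB
# Thresholds are the "Expected" values from step 2 (integers only).
check_capacity() {
  rc=0
  check_one "CPU cores" "$1" 72 || rc=1
  check_one "RAM GiB" "$2" 140 || rc=1
  check_one "Disk GiB" "$3" 278 || rc=1
  return "$rc"
}
```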

**Success Criteria**:
- ✅ Resources sufficient for all 18 VMs
- ✅ Storage pools have adequate space
- ✅ Network connectivity verified

**Rollback**: None required - verification only

---

### Phase 2: Kubernetes Control Plane (30-60 minutes)

**Objective**: Deploy and verify Kubernetes control plane components

**Steps**:
```bash
# 1. Verify Kubernetes cluster
kubectl cluster-info
kubectl get nodes

# 2. Create namespaces
kubectl create namespace sankofa --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace crossplane-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -

# 3. Deploy Crossplane
kubectl apply -f gitops/apps/crossplane/
kubectl wait --for=condition=Ready pod -l app=crossplane -n crossplane-system --timeout=300s

# 4. Deploy Proxmox Provider
kubectl apply -f crossplane-provider-proxmox/config/
kubectl wait --for=condition=Installed provider -l pkg.crossplane.io/name=provider-proxmox --timeout=300s

# 5. Create ProviderConfig
kubectl apply -f crossplane-provider-proxmox/config/provider.yaml

# 6. Verify
kubectl get pods -n crossplane-system
kubectl get providerconfig -A
```

**Success Criteria**:
- ✅ Crossplane pods running
- ✅ Proxmox provider installed
- ✅ ProviderConfig ready

**Rollback**:
```bash
kubectl delete -f crossplane-provider-proxmox/config/
kubectl delete -f gitops/apps/crossplane/
```

---

### Phase 3: Database and Identity (30-45 minutes)

**Objective**: Deploy PostgreSQL and Keycloak

**Steps**:
```bash
# 1. Deploy PostgreSQL (if not external)
kubectl apply -f gitops/apps/postgresql/  # If exists

# 2. Run database migrations
# Run in a subshell so the later steps still execute from the repository root
(cd api && npm install && npm run db:migrate)

# 3. Verify migrations
psql -h <db-host> -U postgres -d sankofa -c "\dt" | grep -E "tenants|billing"

# 4. Deploy Keycloak
kubectl apply -f gitops/apps/keycloak/

# 5. Wait for Keycloak ready
kubectl wait --for=condition=Ready pod -l app=keycloak -n sankofa --timeout=600s

# 6. Configure Keycloak clients
kubectl apply -f gitops/apps/keycloak/keycloak-clients.yaml
```

**Success Criteria**:
- ✅ Database migrations complete (26 migrations)
- ✅ Keycloak pods running
- ✅ Keycloak clients configured

**Rollback**:
```bash
kubectl delete -f gitops/apps/keycloak/
# Database rollback: Restore from backup or re-run migrations
```
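
Restoring from backup presupposes that a dump was taken before step 2. A minimal sketch of such a pre-migration backup follows. `backup_db` is a hypothetical helper, not one of the repository's scripts. The `postgres` user and `sankofa` database are taken from the `psql` command above, and the host is supplied by the operator, for example `backup_db <db-host> sankofa /var/backups`. The custom-format dump it writes can later be restored with `pg_restore`.

```bash
# backup_db HOST DATABASE [OUTPUT_DIR]
# Writes a custom-format (-Fc) dump and prints the path of the file created.
# Assumes pg_dump can authenticate as the "postgres" user (e.g. via PGPASSWORD or ~/.pgpass).
backup_db() {
  out="${3:-.}/$2-pre-migrate-$(date +%Y%m%d-%H%M%S).dump"
  pg_dump -h "$1" -U postgres -d "$2" -Fc -f "$out" && echo "$out"
}

# To restore the dump during a rollback (drops and recreates the dumped objects):
#   pg_restore --clean -h <db-host> -U postgres -d sankofa <dump-file>
```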

---

### Phase 4: Application Deployment (30-45 minutes)

**Objective**: Deploy API, Frontend, and Portal

**Steps**:
```bash
# 1. Create secrets (replace the <...> placeholders; they are quoted so the shell
#    does not treat < and > as redirections)
kubectl create secret generic api-secrets -n sankofa \
  --from-literal=DB_PASSWORD='<db-password>' \
  --from-literal=JWT_SECRET='<jwt-secret>' \
  --from-literal=KEYCLOAK_CLIENT_SECRET='<keycloak-secret>' \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Deploy API
kubectl apply -f gitops/apps/api/
kubectl wait --for=condition=Ready pod -l app=api -n sankofa --timeout=300s

# 3. Deploy Frontend
kubectl apply -f gitops/apps/frontend/
kubectl wait --for=condition=Ready pod -l app=frontend -n sankofa --timeout=300s

# 4. Deploy Portal
kubectl apply -f gitops/apps/portal/
kubectl wait --for=condition=Ready pod -l app=portal -n sankofa --timeout=300s

# 5. Verify health endpoints
# These hostnames resolve only once DNS is configured (Phase 8);
# until then, test against the service IP addresses instead.
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus
```

**Success Criteria**:
- ✅ All application pods running
- ✅ Health endpoints responding
- ✅ No critical errors in logs

**Rollback**:
```bash
kubectl rollout undo deployment/api -n sankofa
kubectl rollout undo deployment/frontend -n sankofa
kubectl rollout undo deployment/portal -n sankofa
```
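
A pod that has just become Ready may need a few more seconds before its endpoint answers, so the health checks in step 5 can be wrapped in a retry loop. The sketch below is one way to do that. `wait_healthy` is a hypothetical helper, and the defaults of 30 attempts with 10 seconds between them are arbitrary choices rather than values from this plan. Example: `wait_healthy http://api.sankofa.nexus/health`.

```bash
# wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl succeeds; returns 1 if every attempt fails.
wait_healthy() {
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP status >= 400; -sS: no progress bar, but show errors
    if curl -fsS -o /dev/null --max-time 5 "$1"; then
      echo "healthy: $1"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "unhealthy after $attempts attempts: $1" >&2
  return 1
}
```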

---

### Phase 5: Infrastructure VMs (15-30 minutes)

**Objective**: Deploy Nginx Proxy and Cloudflare Tunnel VMs

**Steps**:
```bash
# 1. Deploy Nginx Proxy VM
kubectl apply -f examples/production/nginx-proxy-vm.yaml

# 2. Deploy Cloudflare Tunnel VM
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml

# 3. Monitor deployment
watch kubectl get proxmoxvm -A

# 4. Wait for VMs ready (check status)
kubectl wait --for=condition=Ready proxmoxvm nginx-proxy-vm -n default --timeout=600s
kubectl wait --for=condition=Ready proxmoxvm cloudflare-tunnel-vm -n default --timeout=600s

# 5. Verify VM creation in Proxmox
ssh root@192.168.11.10 "qm list | grep -E 'nginx-proxy|cloudflare-tunnel'"

# 6. Check guest agent
ssh root@192.168.11.10 "qm guest exec <vmid> -- cat /etc/os-release"
```

**Success Criteria**:
- ✅ Both VMs created and running
- ✅ Guest agent running
- ✅ VMs accessible via SSH
- ✅ Cloud-init completed

**Rollback**:
```bash
kubectl delete proxmoxvm nginx-proxy-vm -n default
kubectl delete proxmoxvm cloudflare-tunnel-vm -n default
```
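
Step 6 needs the numeric `<vmid>`, which can be read from the `qm list` output of step 5. The sketch below assumes the VM name in Proxmox matches the Kubernetes resource name exactly. That is an assumption, so fall back to the `grep` in step 5 if nothing is printed. `vmid_for_name` is a hypothetical helper. Example: `ssh root@192.168.11.10 "qm list" | vmid_for_name nginx-proxy-vm`.

```bash
# vmid_for_name NAME
# Reads `qm list` output on stdin (header row, then VMID and NAME as the
# first two columns) and prints the VMID of the row whose NAME equals NAME.
vmid_for_name() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}
```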

---

### Phase 6: Application VMs (30-60 minutes)

**Objective**: Deploy all 16 SMOM-DBIS-138 VMs

**Steps**:
```bash
# 1. Deploy all VMs
kubectl apply -f examples/production/smom-dbis-138/

# 2. Monitor deployment (in separate terminal)
watch kubectl get proxmoxvm -A

# 3. Check controller logs (in separate terminal)
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f

# 4. Wait for all VMs ready (this may take 10-30 minutes)
# Monitor progress and verify each VM reaches Ready state

# 5. Verify VM creation
kubectl get proxmoxvm -A -o wide

# 6. Check guest agent on all VMs (prints each VM's status conditions)
# A resource cannot be fetched by name with -A, so iterate over namespace/name pairs
kubectl get proxmoxvm -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns vm; do
  echo "Checking $ns/$vm..."
  kubectl get proxmoxvm "$vm" -n "$ns" -o jsonpath='{.status.conditions[*].status}{"\n"}'
done
```

**VM Deployment Order** (if deploying sequentially):
1. validator-01, validator-02, validator-03, validator-04
2. sentry-01, sentry-02, sentry-03, sentry-04
3. rpc-node-01, rpc-node-02, rpc-node-03, rpc-node-04
4. services, blockscout, monitoring, management

**Success Criteria**:
- ✅ All 16 VMs created
- ✅ All VMs in Running state
- ✅ Guest agent running on all VMs
- ✅ Cloud-init completed successfully

**Rollback**:
```bash
# Delete all VMs
kubectl delete -f examples/production/smom-dbis-138/
```
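
Step 4 leaves the readiness check to manual monitoring. As a rough sketch, the number of Ready VMs can be counted and compared with the expected total. The helper names are hypothetical, and the sketch assumes the `ProxmoxVM` resources expose the `Ready` condition that Phase 5 waits on. Because `-A` also counts the two infrastructure VMs from Phase 5, the expected total after this phase would be 18: `require_ready_vms 18`.

```bash
# count_ready_vms
# Prints how many ProxmoxVM resources (all namespaces) have Ready=True.
count_ready_vms() {
  kubectl get proxmoxvm -A \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    grep -c '^True$'
}

# require_ready_vms EXPECTED
# Prints the tally; returns non-zero while fewer than EXPECTED VMs are Ready.
require_ready_vms() {
  ready=$(count_ready_vms)
  echo "$ready of $1 VMs Ready"
  [ "$ready" -ge "$1" ]
}
```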

---

### Phase 7: Monitoring Stack (20-30 minutes)

**Objective**: Deploy monitoring and observability stack

**Steps**:
```bash
# 1. Deploy Prometheus
kubectl apply -f gitops/apps/monitoring/prometheus/
kubectl wait --for=condition=Ready pod -l app=prometheus -n monitoring --timeout=300s

# 2. Deploy Grafana
kubectl apply -f gitops/apps/monitoring/grafana/
kubectl wait --for=condition=Ready pod -l app=grafana -n monitoring --timeout=300s

# 3. Deploy Loki
kubectl apply -f gitops/apps/monitoring/loki/
kubectl wait --for=condition=Ready pod -l app=loki -n monitoring --timeout=300s

# 4. Deploy Alertmanager
kubectl apply -f gitops/apps/monitoring/alertmanager/

# 5. Deploy backup CronJob
kubectl apply -f gitops/apps/monitoring/backup-cronjob.yaml

# 6. Verify
kubectl get pods -n monitoring
curl http://grafana.sankofa.nexus
```

**Success Criteria**:
- ✅ All monitoring pods running
- ✅ Prometheus scraping metrics
- ✅ Grafana accessible
- ✅ Loki ingesting logs
- ✅ Backup CronJob scheduled

**Rollback**:
```bash
# -R recurses into the prometheus/, grafana/, loki/ and alertmanager/ subdirectories
kubectl delete -R -f gitops/apps/monitoring/
```

---

### Phase 8: Network Configuration (30-45 minutes)

**Objective**: Configure Cloudflare Tunnel, Nginx, and DNS

**Steps**:
```bash
# 1. Configure Cloudflare Tunnel
./scripts/configure-cloudflare-tunnel.sh

# Or manually:
# - Create tunnel in Cloudflare dashboard
# - Download credentials JSON
# - Upload to cloudflare-tunnel-vm: /etc/cloudflared/tunnel-credentials.json
# - Update /etc/cloudflared/config.yaml with ingress rules
# - Restart cloudflared service

# 2. Configure Nginx Proxy
./scripts/configure-nginx-proxy.sh

# Or manually:
# - SSH into nginx-proxy-vm
# - Update /etc/nginx/conf.d/*.conf
# - Run certbot for SSL certificates
# - Test: nginx -t
# - Reload: systemctl reload nginx

# 3. Configure DNS
./scripts/setup-dns-records.sh

# Or manually in Cloudflare:
# - Create A/CNAME records
# - Point to Cloudflare Tunnel
# - Enable proxy (orange cloud)
```
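
Before the DNS records from step 3 have propagated, the Nginx configuration from step 2 can be exercised by pinning the hostname to an IP address with curl's `--resolve` option. The sketch below is illustrative: `check_via_ip` is a hypothetical helper, the IP address is whatever the nginx-proxy-vm received (not stated in this plan), and plain HTTP on port 80 is assumed to match the `curl` commands used elsewhere in this document. Example: `check_via_ip api.sankofa.nexus <nginx-proxy-ip> /health`.

```bash
# check_via_ip HOSTNAME IP [PATH]
# Requests http://HOSTNAME/PATH but connects to IP, bypassing DNS lookup.
# Returns curl's exit status (non-zero on connection failure or HTTP >= 400).
# For HTTPS, use port 443 in --resolve and an https:// URL instead.
check_via_ip() {
  curl -fsS -o /dev/null --max-time 10 \
    --resolve "$1:80:$2" "http://$1${3:-/}"
}
```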

**Success Criteria**:
- ✅ Cloudflare Tunnel connected
- ✅ Nginx proxying correctly
- ✅ DNS records created
- ✅ SSL certificates issued
- ✅ Services accessible via public URLs

**Rollback**:
- Revert DNS changes in Cloudflare
- Restore previous Nginx configuration
- Disable Cloudflare Tunnel

---

### Phase 9: Multi-Tenancy Setup (15-20 minutes)

**Objective**: Create system tenant and configure multi-tenancy

**Steps**:
```bash
# 1. Get API endpoint and admin token
API_URL="http://api.sankofa.nexus/graphql"
ADMIN_TOKEN="<get-from-keycloak>"

# 2. Create system tenant
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "mutation { createTenant(input: { name: \"system\", tier: SOVEREIGN }) { id name billingAccountId } }"
  }'

# 3. Get system tenant ID from response
SYSTEM_TENANT_ID="<from-response>"

# 4. Add admin user to system tenant
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d "{
    \"query\": \"mutation { addUserToTenant(tenantId: \\\"$SYSTEM_TENANT_ID\\\", userId: \\\"<admin-user-id>\\\", role: TENANT_OWNER) }\"
  }"

# 5. Verify tenant
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "query { myTenant { id name status tier } }"
  }'
```
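
Step 3 copies the tenant ID out of the response by hand. A dependency-free sketch of doing this in the shell follows. `extract_tenant_id` is a hypothetical helper that assumes the response contains exactly one `"id"` field, as requested by the mutation in step 2. It prints nothing if the API returned an error, so check that the result is non-empty before step 4. Where `jq` is installed, `jq -r '.data.createTenant.id'` is the more robust choice.

```bash
# extract_tenant_id
# Reads a GraphQL JSON response on stdin and prints the value of its "id" field.
# Pattern-based, not a real JSON parser: intended only for the small
# createTenant response, which contains a single "id" key.
extract_tenant_id() {
  tr -d '\n' | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage, capturing the response of step 2:
#   SYSTEM_TENANT_ID=$(curl -s -X POST "$API_URL" ... | extract_tenant_id)
#   [ -n "$SYSTEM_TENANT_ID" ] || echo "tenant creation failed" >&2
```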

**Success Criteria**:
- ✅ System tenant created
- ✅ Admin user assigned
- ✅ Tenant accessible via API
- ✅ RBAC working correctly

**Rollback**:
- Delete tenant via API (if supported)
- Or manually remove from database

---

### Phase 10: Verification and Testing (30-45 minutes)

**Objective**: Verify deployment and run tests

**Steps**:
```bash
# 1. Health checks
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus
curl http://keycloak.sankofa.nexus/health

# 2. Check all VMs
kubectl get proxmoxvm -A

# 3. Check all pods
kubectl get pods -A

# 4. Run smoke tests
./scripts/smoke-tests.sh

# 5. Run performance tests (optional)
./scripts/performance-test.sh

# 6. Verify monitoring
curl http://grafana.sankofa.nexus
kubectl get pods -n monitoring

# 7. Check backups
./scripts/verify-backups.sh
```

**Success Criteria**:
- ✅ All health checks passing
- ✅ All VMs running
- ✅ All pods running
- ✅ Smoke tests passing
- ✅ Monitoring operational
- ✅ Backups configured

**Rollback**: N/A - verification only

---

## Execution Timeline

### Estimated Total Time: 4-6.5 hours

| Phase | Duration | Dependencies |
|-------|----------|--------------|
| Phase 1: Resource Verification | 15 min | None |
| Phase 2: Kubernetes Control Plane | 30-60 min | Kubernetes cluster |
| Phase 3: Database and Identity | 30-45 min | Phase 2 |
| Phase 4: Application Deployment | 30-45 min | Phase 3 |
| Phase 5: Infrastructure VMs | 15-30 min | Phase 2, Phase 4 |
| Phase 6: Application VMs | 30-60 min | Phase 5 |
| Phase 7: Monitoring Stack | 20-30 min | Phase 2 |
| Phase 8: Network Configuration | 30-45 min | Phase 5 |
| Phase 9: Multi-Tenancy Setup | 15-20 min | Phase 3, Phase 4 |
| Phase 10: Verification and Testing | 30-45 min | All phases |
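
For reference, summing the per-phase durations in the table gives the bounds for a strictly sequential run: 245 to 395 minutes, roughly 4 to 6.5 hours. The short check below reproduces that arithmetic. Running phases with independent dependencies in parallel (Phase 7, for example, depends only on Phase 2) could shorten the total.

```bash
# Lower and upper duration bounds in minutes for Phases 1-10, copied from the table.
min_total=0
for m in 15 30 30 30 15 30 20 30 15 30; do
  min_total=$((min_total + m))
done

max_total=0
for m in 15 60 45 45 30 60 30 45 20 45; do
  max_total=$((max_total + m))
done

echo "sequential total: $min_total-$max_total minutes"
```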

---

## Risk Mitigation

### High-Risk Areas
1. **VM Deployment**: May take longer than expected
   - **Mitigation**: Monitor closely, allow extra time

2. **Network Configuration**: DNS propagation delays
   - **Mitigation**: Test with IP addresses first, then DNS

3. **Database Migrations**: Potential data loss
   - **Mitigation**: Backup before migrations, test in staging first

### Rollback Procedures
- Each phase includes rollback steps
- Document any issues encountered
- Keep backups of all configurations

---

## Post-Deployment

### Immediate (First 24 hours)
- [ ] Monitor all services
- [ ] Review logs for errors
- [ ] Verify all VMs accessible
- [ ] Check monitoring dashboards
- [ ] Verify backups running

### Short-term (First week)
- [ ] Performance optimization
- [ ] Security hardening
- [ ] Documentation updates
- [ ] Team training
- [ ] Support procedures

---

## Success Criteria

### Technical
- ✅ All 18 VMs deployed and running
- ✅ All services healthy
- ✅ Guest agent on all VMs
- ✅ Monitoring operational
- ✅ Backups configured

### Functional
- ✅ Portal accessible
- ✅ API responding
- ✅ Multi-tenancy working
- ✅ Resource provisioning functional

---

**Last Updated**: 2025-01-XX
**Status**: Ready for Execution