Rollback Plan
Overview
This document outlines procedures for rolling back deployments in the Sankofa Phoenix platform.
Rollback Strategy
GitOps Rollback (Recommended)
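All applications are deployed by ArgoCD from the GitOps repository. To roll back through Git, revert the offending commit (or restore the manifests from a known-good commit) and push the change. ArgoCD then syncs the cluster to match.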
Manual Rollback
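In an emergency, you can roll back directly in Kubernetes. If the ArgoCD application uses automated sync with self-heal, ArgoCD will revert manual changes to match Git. In that case, first disable automated sync for the application (`argocd app set <app> --sync-policy none`), then follow up with a matching Git revert.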
Pre-Rollback Checklist
- Identify the commit or tag to roll back to
- Verify that the previous version was stable
- Notify the team of the rollback
- Document the reason for the rollback
- Check database migration compatibility: confirm the previous version works with the current schema (if applicable)
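Migration compatibility can be checked mechanically before rolling back. The sketch below assumes the API source and its SQL migrations live in the same Git repository as the GitOps manifests, in a directory called `api/migrations`. That path is hypothetical; set `MIGRATIONS_DIR` to the real one. The function lists migration files added between the rollback target and `HEAD`. Any output means the database schema has moved ahead of what the target version was built against.

```bash
# Hypothetical path: point MIGRATIONS_DIR at the directory holding the API's SQL migrations.
MIGRATIONS_DIR="${MIGRATIONS_DIR:-api/migrations}"

# List migration files added between the rollback target and HEAD.
# Usage: migrations_added_since <repo-path> <target-commit>
# Non-empty output means the live schema is ahead of what the target version expects.
migrations_added_since() {
  local repo="$1" target="$2"
  git -C "$repo" diff --name-only --diff-filter=A "$target" HEAD -- "$MIGRATIONS_DIR"
}
```

For example, run `migrations_added_since . <previous-commit-hash>` from the repository root. Files that were modified rather than added are not listed; drop `--diff-filter=A` to see every changed migration.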
Rollback Procedures
1. API Service Rollback
GitOps Method
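```bash
# Run from the root of the GitOps repository
# 1. Identify the commit to roll back to
git log --oneline -- gitops/apps/api/
# 2. Restore the manifests from that commit and commit the change
#    (to undo a single bad commit instead: git revert <bad-commit-hash>)
git checkout <previous-commit-hash> -- gitops/apps/api/
git commit -m "Roll back api to <previous-commit-hash>"
git push origin main
# 3. ArgoCD syncs automatically if automated sync is enabled
# Or sync manually:
argocd app sync api
```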
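Undoing a single bad commit with `git revert` keeps history intact. That suits ArgoCD, which only needs the branch tip to describe the desired state. Below is a minimal sketch of that flow as a reusable function. `gitops_revert` and the `NO_PUSH` switch are hypothetical helpers, not part of any existing tooling.

```bash
# Revert one bad commit in the GitOps repository and push it so ArgoCD syncs the old state.
# Usage: gitops_revert <repo-path> <bad-commit> [remote] [branch]
# Set NO_PUSH=1 to create the revert commit locally without pushing.
gitops_revert() {
  local repo="$1" bad_commit="$2" remote="${3:-origin}" branch="${4:-main}"
  # Refuse to run on a dirty tree so the revert commit contains only the rollback
  if [ -n "$(git -C "$repo" status --porcelain)" ]; then
    echo "working tree in $repo has uncommitted changes" >&2
    return 1
  fi
  # --no-edit keeps git's default "Revert ..." message; a conflict makes this fail
  git -C "$repo" revert --no-edit "$bad_commit" || return 1
  if [ "${NO_PUSH:-0}" != 1 ]; then
    git -C "$repo" push "$remote" "HEAD:$branch"
  fi
}
```

For example, run `gitops_revert . <bad-commit-hash>` from the repository root, then `argocd app sync api` if automated sync is off. Reverting a merge commit needs `git revert -m 1 <merge-commit>`, which this sketch does not handle.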
Manual Method
```bash
# 1. List deployment history
kubectl rollout history deployment/api -n api
# 2. View a specific revision
kubectl rollout history deployment/api -n api --revision=<revision-number>
# 3. Roll back to the previous revision
kubectl rollout undo deployment/api -n api
# 4. Or roll back to a specific revision
kubectl rollout undo deployment/api -n api --to-revision=<revision-number>
# 5. Monitor the rollback
kubectl rollout status deployment/api -n api
```
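The manual steps above can be combined into one function. It first pauses ArgoCD automated sync, so self-heal does not re-apply the Git state over the rollback. It then undoes the rollout and waits with a timeout. This is a sketch: `manual_rollback`, the `DRY_RUN` switch, and the 5-minute timeout are illustrative choices. It also assumes the ArgoCD application name matches the deployment name, as in the commands above.

```bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: manual_rollback <argocd-app> <namespace> <deployment> [revision]
manual_rollback() {
  local app="$1" ns="$2" deploy="$3" rev="${4:-}"
  # Pause automated sync so ArgoCD self-heal does not re-apply the Git state
  run argocd app set "$app" --sync-policy none || return 1
  if [ -n "$rev" ]; then
    run kubectl rollout undo "deployment/$deploy" -n "$ns" --to-revision="$rev" || return 1
  else
    run kubectl rollout undo "deployment/$deploy" -n "$ns" || return 1
  fi
  # Fail after 5 minutes instead of waiting forever on a stuck rollout
  run kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=5m
}
```

Run `DRY_RUN=1 manual_rollback api api api` to print the commands before executing them. Once the matching Git revert has landed, re-enable sync with `argocd app set api --sync-policy automated`, adding whatever `--self-heal` or `--auto-prune` flags the application used before.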
2. Portal Rollback
GitOps Method
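```bash
# From the root of the GitOps repository
git checkout <previous-commit-hash> -- gitops/apps/portal/
git commit -m "Roll back portal to <previous-commit-hash>"
git push origin main
argocd app sync portal
```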
Manual Method
```bash
kubectl rollout undo deployment/portal -n portal
kubectl rollout status deployment/portal -n portal
```
3. Database Migration Rollback
⚠️ WARNING: Database rollbacks require careful planning. Not all migrations are reversible.
Check Migration Status
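```bash
# Open a psql session from inside the API pod (assumes the image ships the psql client).
# Single quotes make $DATABASE_URL expand inside the pod, not in your local shell.
kubectl exec -it -n api deployment/api -- sh -c 'psql "$DATABASE_URL"'
```
Then, at the `psql` prompt:
```sql
-- Check migration history
SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 10;
```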
Rollback Migration (if reversible)
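```bash
# Run the down migration (DATABASE_URL must point at the target database)
cd api
npm run db:migrate:down
# Or pipe a hand-written rollback script into psql in the pod,
# stopping at the first error and applying it in a single transaction
kubectl exec -i -n api deployment/api -- \
  sh -c 'psql -v ON_ERROR_STOP=1 --single-transaction -f - "$DATABASE_URL"' < /path/to/rollback.sql
```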
For Non-Reversible Migrations
- Create a new migration that restores the previous state
- Test it in staging first
- Apply it during a maintenance window
- Document data-loss risks
4. Frontend (Public Site) Rollback
GitOps Method
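```bash
# From the root of the GitOps repository
git checkout <previous-commit-hash> -- gitops/apps/frontend/
git commit -m "Roll back frontend to <previous-commit-hash>"
git push origin main
argocd app sync frontend
```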
Manual Method
```bash
kubectl rollout undo deployment/frontend -n frontend
kubectl rollout status deployment/frontend -n frontend
```
5. Monitoring Stack Rollback
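```bash
# Roll back the Prometheus Operator
kubectl rollout undo deployment/prometheus-operator -n monitoring
# Roll back Grafana
kubectl rollout undo deployment/grafana -n monitoring
# Roll back Alertmanager (if it runs as a StatefulSet, use statefulset/<name>;
# kubectl rollout undo supports StatefulSets too)
kubectl rollout undo deployment/alertmanager -n monitoring
```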
6. Keycloak Rollback
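```bash
# Roll back Keycloak and wait for the rollout to finish
kubectl rollout undo deployment/keycloak -n keycloak
kubectl rollout status deployment/keycloak -n keycloak
# Verify Keycloak health
curl -f https://keycloak.sankofa.nexus/health
```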
Post-Rollback Verification
1. Health Checks
```bash
# API
curl -f https://api.sankofa.nexus/health
# Portal
curl -f https://portal.sankofa.nexus/api/health
# Keycloak
curl -f https://keycloak.sankofa.nexus/health
```
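Right after a rollback, pods may still be starting, so a single failed `curl` is not conclusive. The sketch below retries each health endpoint before reporting it as failed. `retry`, `check_endpoints`, and the `HEALTH_ATTEMPTS`/`HEALTH_DELAY` knobs (default: 5 attempts, 10 seconds apart) are illustrative, not existing scripts.

```bash
# Retry a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # No sleep after the last failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Check each health endpoint, print one line per URL, and return non-zero
# if any endpoint is still failing after all retries.
check_endpoints() {
  local tries="${HEALTH_ATTEMPTS:-5}" pause="${HEALTH_DELAY:-10}" url failed=0
  for url in "$@"; do
    if retry "$tries" "$pause" curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_endpoints https://api.sankofa.nexus/health https://portal.sankofa.nexus/api/health https://keycloak.sankofa.nexus/health`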
2. Functional Testing
```bash
# Run smoke tests
./scripts/smoke-tests.sh
# Test authentication
curl -X POST https://api.sankofa.nexus/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { login(email: \"test@example.com\", password: \"test\") { token } }"}'
```
3. Monitoring
- Check Grafana dashboards for errors
- Verify Prometheus metrics are normal
- Check Loki logs for errors
4. Database Verification
```bash
# Verify database connectivity ($DATABASE_URL expands inside the pod)
kubectl exec -n api deployment/api -- sh -c 'psql -c "SELECT 1" "$DATABASE_URL"'
# Sanity-check data integrity (compare against the row count before the deployment)
kubectl exec -n api deployment/api -- sh -c 'psql -c "SELECT COUNT(*) FROM users;" "$DATABASE_URL"'
```
Rollback Scenarios
Scenario 1: API Breaking Change
Symptoms: API returns errors after deployment
Rollback Steps:
- Immediately roll back the API deployment
- Verify API health
- Check error logs
- Investigate root cause
- Fix and redeploy
Scenario 2: Database Migration Failure
Symptoms: Database errors, application crashes
Rollback Steps:
- Pause application deployments
- Assess the migration state
- Roll back the migration if it is reversible
- Otherwise, restore the database from backup
- Redeploy the previous application version
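Restoring from backup only works if a backup was taken before the failed migration. The sketch below uses the standard PostgreSQL client tools. It assumes they are installed on the machine running the script and that `DATABASE_URL` there points at the affected database (for example through a port-forward). `backup_db`, `restore_db`, and `DRY_RUN` are hypothetical helpers. A restore discards every write made after the dump, so record that data-loss window in the incident notes.

```bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Take a custom-format dump before applying a risky migration.
# Usage: backup_db <dump-file>   (DATABASE_URL must be set)
backup_db() {
  [ -n "${DATABASE_URL:-}" ] || { echo "DATABASE_URL is not set" >&2; return 1; }
  run pg_dump --format=custom --file="$1" "$DATABASE_URL"
}

# Restore a dump taken by backup_db.
# --clean --if-exists drops existing objects before recreating them;
# --single-transaction makes the restore all-or-nothing.
# Usage: restore_db <dump-file>   (DATABASE_URL must be set)
restore_db() {
  [ -n "${DATABASE_URL:-}" ] || { echo "DATABASE_URL is not set" >&2; return 1; }
  run pg_restore --clean --if-exists --single-transaction --dbname="$DATABASE_URL" "$1"
}
```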
Scenario 3: Portal Build Failure
Symptoms: Portal shows blank page or errors
Rollback Steps:
- Roll back the portal deployment
- Verify portal loads
- Check build logs
- Fix build issues
- Redeploy
Scenario 4: Configuration Error
Symptoms: Services cannot connect to dependencies
Rollback Steps:
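- Revert the configuration change in Git
- ArgoCD syncs the change automatically if automated sync is enabled; otherwise run `argocd app sync <app>`
- In an emergency, update ConfigMaps/Secrets manually (ArgoCD self-heal undoes manual edits unless automated sync is disabled)
- Restart affected services, because running pods do not pick up changed environment variables until they are recreated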
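Environment variables sourced from ConfigMaps and Secrets are set only when a container starts, and many applications read mounted config files only at startup. A reverted configuration may therefore not reach running services until their pods are recreated. The sketch below restarts the named deployments one at a time and waits for each. `restart_services` and `DRY_RUN` are hypothetical helpers, and the deployment names you pass in are up to you.

```bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restart each deployment so its pods pick up the reverted ConfigMaps/Secrets,
# waiting for one rollout to finish before starting the next.
# Usage: restart_services <namespace> <deployment>...
restart_services() {
  local ns="$1" d
  shift
  for d in "$@"; do
    run kubectl rollout restart "deployment/$d" -n "$ns" || return 1
    run kubectl rollout status "deployment/$d" -n "$ns" --timeout=5m || return 1
  done
}
```

For example, `restart_services api api` restarts the API deployment in the `api` namespace.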
Rollback Testing
Staging Rollback Test
```bash
# 1. Deploy new version to staging
argocd app sync api-staging
# 2. Test new version
./scripts/smoke-tests.sh --env=staging
# 3. Simulate rollback
kubectl rollout undo deployment/api -n api-staging
# 4. Verify rollback works
./scripts/smoke-tests.sh --env=staging
```
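Step 4 can check more than the smoke tests: it can confirm that the deployment is actually back on the previous image. The sketch below assumes single-container pods, since it reads the first container's image. `current_image` and `verify_image` are hypothetical helpers, and the expected image reference comes from your own release records.

```bash
# Print the image of the first container in a deployment's pod template.
# Usage: current_image <namespace> <deployment>
current_image() {
  kubectl get "deployment/$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Confirm the rollback left the deployment on the expected image.
# Usage: verify_image <namespace> <deployment> <expected-image>
verify_image() {
  local actual
  actual=$(current_image "$1" "$2") || return 1
  if [ "$actual" = "$3" ]; then
    echo "rollback verified: $actual"
  else
    echo "unexpected image: $actual (expected $3)" >&2
    return 1
  fi
}
```

For example, run `verify_image api-staging api <previous-image>` after `kubectl rollout status deployment/api -n api-staging` reports success. The check reads the deployment's pod template, not the running pods, so pair it with the rollout status check.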
Rollback Communication
Internal Communication
- Notify team in #engineering channel
- Update incident tracking system
- Document in runbook
External Communication
- Update status page if user-facing
- Notify affected customers if needed
- Post-mortem for P0/P1 incidents
Prevention
Pre-Deployment
- All tests passing
- Code review completed
- Staging deployment successful
- Smoke tests passing
- Database migrations tested
- Rollback plan reviewed
Deployment
- Deploy to staging first
- Monitor staging for 24 hours
- Gradual production rollout (canary)
- Monitor metrics closely
- Have rollback plan ready
Rollback Decision Matrix
| Issue | Severity | Rollback? |
|---|---|---|
| Complete outage | P0 | Yes, immediately |
| Data corruption | P0 | Yes, immediately |
| Security breach | P0 | Yes, immediately |
| >50% error rate | P1 | Yes, within 15 min |
| Performance >50% degraded | P1 | Yes, within 30 min |
| Single feature broken | P2 | Maybe, assess impact |
| Minor bugs | P3 | No, fix forward |
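The two numeric P1 rows of the matrix lend themselves to an automated check, for example in an alerting hook. The sketch below assumes that use. `rollback_decision` is hypothetical, takes whole-number percentages, and leaves out how they are measured. The P0 rows (outage, data corruption, security breach) and the P2/P3 calls still need human judgment.

```bash
# Apply the P1 thresholds from the decision matrix.
# Usage: rollback_decision <error-rate-%> <performance-degradation-%>
# Inputs are whole-number percentages; P0 conditions are not covered here.
rollback_decision() {
  local error_rate="$1" degradation="$2"
  if [ "$error_rate" -gt 50 ]; then
    echo "P1: roll back within 15 min"
  elif [ "$degradation" -gt 50 ]; then
    echo "P1: roll back within 30 min"
  else
    echo "below P1 thresholds: assess impact (P2/P3)"
  fi
}
```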
Emergency Contacts
- On-call Engineer: [Contact]
- Team Lead: [Contact]
- DevOps Lead: [Contact]