Sankofa/docs/runbooks/ROLLBACK_PLAN.md
defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
2025-12-12 18:01:35 -08:00

Rollback Plan

Overview

This document outlines procedures for rolling back deployments in the Sankofa Phoenix platform.

Rollback Strategy

All applications are managed via ArgoCD GitOps. Rollbacks should be performed through Git by reverting to a previous commit.

Manual Rollback

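For emergencies, a rollback can be performed directly in Kubernetes with kubectl. Treat this as temporary: if ArgoCD auto-sync with self-heal is enabled, it re-applies the state in Git, so follow up with a revert in Git.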

Pre-Rollback Checklist

  • Identify the commit/tag to roll back to
  • Verify the previous version is stable
  • Notify the team of the rollback
  • Document the reason for the rollback
  • Check database migration compatibility (if applicable)
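
Recording what is currently running makes it easier to document the rollback and to undo it if needed. The following is a minimal sketch, assuming the deployment and namespace names used in this runbook; `snapshot_deployment` and `ROLLBACK_LOG` are hypothetical names, not existing tooling.

```shell
# Record the images a deployment is running before it is rolled back.
# snapshot_deployment and ROLLBACK_LOG are illustrative names, not part of the platform.
ROLLBACK_LOG="${ROLLBACK_LOG:-./rollback-log.txt}"

snapshot_deployment() {
  name="$1"       # deployment name, e.g. api
  namespace="$2"  # namespace, e.g. api
  reason="$3"     # free-text reason for the rollback

  # jsonpath prints the container images of the pod template, space-separated
  images=$(kubectl get deployment "$name" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.containers[*].image}') || return 1

  # Append one timestamped line per snapshot
  printf '%s deployment=%s namespace=%s images=%s reason=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$namespace" "$images" "$reason" \
    >> "$ROLLBACK_LOG"
}
```

For example, `snapshot_deployment api api "5xx spike after deploy"` appends one line to the log before the rollback starts.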

Rollback Procedures

1. API Service Rollback

GitOps Method

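# 1. Identify the commit to roll back to
git log --oneline api/

# 2. Restore the app manifests to that commit and record it as a new commit
#    (a bare `git checkout <hash>` detaches HEAD, so `git push origin main` would push nothing)
#    Files added after that commit are not removed by this; delete them manually if needed
cd gitops/apps/api
git checkout <previous-commit-hash> -- .
git commit -m "Rollback api to <previous-commit-hash>"
git push origin main

# 3. ArgoCD will automatically sync
# Or manually sync:
argocd app sync api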
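
A sync only starts the rollout; waiting for the application to report synced and healthy confirms that the rollback actually landed. A minimal sketch, assuming the ArgoCD CLI is logged in; `sync_and_wait` and its 300-second default are assumptions.

```shell
# Trigger an ArgoCD sync and block until the app is synced and healthy.
# sync_and_wait is a hypothetical helper; the timeout default is an assumption.
sync_and_wait() {
  app="$1"
  timeout_s="${2:-300}"

  argocd app sync "$app" || return 1
  # `argocd app wait` exits non-zero if the app is not synced and healthy in time
  argocd app wait "$app" --sync --health --timeout "$timeout_s"
}
```

For example, `sync_and_wait api 120` returns non-zero if the API application has not converged within two minutes.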

Manual Method

# 1. List deployment history
kubectl rollout history deployment/api -n api

# 2. View specific revision
kubectl rollout history deployment/api -n api --revision=<revision-number>

# 3. Roll back to the previous revision
kubectl rollout undo deployment/api -n api

# 4. Or roll back to a specific revision
kubectl rollout undo deployment/api -n api --to-revision=<revision-number>

# 5. Monitor rollback
kubectl rollout status deployment/api -n api
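
The undo and status steps can be wrapped so that a stuck rollback fails loudly instead of being watched by hand. This is a sketch under assumptions: `rollback_deployment` and `ROLLOUT_TIMEOUT` are hypothetical names, and the 180-second default is arbitrary.

```shell
# Roll back a deployment and fail if it does not become ready in time.
# rollback_deployment is a hypothetical helper; the 180s default is an assumption.
rollback_deployment() {
  name="$1"
  namespace="$2"
  revision="${3:-}"   # optional: specific revision number from `rollout history`

  if [ -n "$revision" ]; then
    kubectl rollout undo "deployment/$name" -n "$namespace" --to-revision="$revision" || return 1
  else
    kubectl rollout undo "deployment/$name" -n "$namespace" || return 1
  fi

  # --timeout bounds how long `rollout status` waits before exiting non-zero
  kubectl rollout status "deployment/$name" -n "$namespace" --timeout="${ROLLOUT_TIMEOUT:-180s}"
}
```

For example, `rollback_deployment api api` undoes the latest revision, while `rollback_deployment api api 7` targets revision 7.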

2. Portal Rollback

GitOps Method

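cd gitops/apps/portal
# Restore the manifests to the previous commit and commit the result;
# a bare `git checkout <hash>` would detach HEAD and push nothing
git checkout <previous-commit-hash> -- .
git commit -m "Rollback portal to <previous-commit-hash>"
git push origin main
argocd app sync portal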

Manual Method

kubectl rollout undo deployment/portal -n portal
kubectl rollout status deployment/portal -n portal

3. Database Migration Rollback

⚠️ WARNING: Database rollbacks require careful planning. Not all migrations are reversible.

Check Migration Status

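# Connect to the database
# (single quotes make $DATABASE_URL expand inside the container, not in the local shell)
kubectl exec -it -n api deployment/api -- \
  sh -c 'psql "$DATABASE_URL"'

-- Check migration history (run at the psql prompt)
SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 10;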
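
For scripted checks, the same table can be queried without an interactive session. A minimal sketch, assuming `psql` and `DATABASE_URL` are available in the api container as the commands above imply; `latest_migration` is a hypothetical helper.

```shell
# Print the most recently applied migration version without opening an interactive session.
# latest_migration is a hypothetical helper; table and column names come from the query above.
latest_migration() {
  # -A -t: unaligned, tuples-only output, so only the bare value is printed
  kubectl exec -n api deployment/api -- \
    sh -c 'psql "$DATABASE_URL" -At -c "SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1"'
}
```

Comparing the value before and after a down migration shows whether the schema version actually moved.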

Rollback Migration (if reversible)

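# Run the down migration
cd api
npm run db:migrate:down

# Or manually apply a rollback script
# (the path is resolved inside the container; $DATABASE_URL expands there as well)
kubectl exec -it -n api deployment/api -- \
  sh -c 'psql "$DATABASE_URL" -f /path/to/rollback.sql'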

For Non-Reversible Migrations

  1. Create new migration to restore previous state
  2. Test in staging first
  3. Apply during maintenance window
  4. Document data loss risks
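
Because data loss is possible, taking a logical backup before any schema rollback gives a restore point. The sketch below assumes `pg_dump` is present in the api container alongside `psql`, which this runbook does not confirm; `backup_database` is a hypothetical helper.

```shell
# Take a logical backup before touching the schema.
# Assumes pg_dump is available in the api container alongside psql (not confirmed by this runbook).
backup_database() {
  out="${1:-backup-$(date -u +%Y%m%dT%H%M%SZ).dump}"

  # Single quotes keep $DATABASE_URL unexpanded locally so it resolves inside the container
  kubectl exec -n api deployment/api -- \
    sh -c 'pg_dump --format=custom "$DATABASE_URL"' > "$out" || return 1

  # An empty file means the dump produced no data
  [ -s "$out" ] || { echo "backup $out is empty" >&2; return 1; }
  echo "$out"
}
```

A custom-format dump like this can later be restored with `pg_restore`.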

4. Frontend (Public Site) Rollback

GitOps Method

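cd gitops/apps/frontend
# Restore the manifests to the previous commit and commit the result;
# a bare `git checkout <hash>` would detach HEAD and push nothing
git checkout <previous-commit-hash> -- .
git commit -m "Rollback frontend to <previous-commit-hash>"
git push origin main
argocd app sync frontend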

Manual Method

kubectl rollout undo deployment/frontend -n frontend
kubectl rollout status deployment/frontend -n frontend

5. Monitoring Stack Rollback

# Roll back the Prometheus Operator deployment
kubectl rollout undo deployment/prometheus-operator -n monitoring

# Rollback Grafana
kubectl rollout undo deployment/grafana -n monitoring

# Rollback Alertmanager
kubectl rollout undo deployment/alertmanager -n monitoring

6. Keycloak Rollback

# Roll back Keycloak
kubectl rollout undo deployment/keycloak -n keycloak

# Verify Keycloak health (-f makes curl exit non-zero on HTTP errors)
curl -f https://keycloak.sankofa.nexus/health

Post-Rollback Verification

1. Health Checks

# API
curl -f https://api.sankofa.nexus/health

# Portal
curl -f https://portal.sankofa.nexus/api/health

# Keycloak
curl -f https://keycloak.sankofa.nexus/health
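
Services can take a little while to become ready after a rollback, so a single failed request is not conclusive. The following sketch retries each endpoint a few times; `check_health`, `HEALTH_ATTEMPTS`, and `HEALTH_DELAY` are illustrative names with arbitrary defaults.

```shell
# Poll each health endpoint until it responds successfully or attempts run out.
# check_health, HEALTH_ATTEMPTS and HEALTH_DELAY are illustrative names.
check_health() {
  failed=0
  for url in "$@"; do
    attempt=1
    ok=0
    while [ "$attempt" -le "${HEALTH_ATTEMPTS:-5}" ]; do
      # -f: fail on HTTP errors, -sS: quiet but show errors, --max-time: cap each request
      if curl -fsS --max-time 10 -o /dev/null "$url"; then
        ok=1
        break
      fi
      attempt=$((attempt + 1))
      sleep "${HEALTH_DELAY:-5}"
    done
    if [ "$ok" -eq 1 ]; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  # Non-zero if any endpoint never became healthy
  return "$failed"
}
```

For example: `check_health https://api.sankofa.nexus/health https://portal.sankofa.nexus/api/health https://keycloak.sankofa.nexus/health`.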

2. Functional Testing

# Run smoke tests
./scripts/smoke-tests.sh

# Test authentication
curl -X POST https://api.sankofa.nexus/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { login(email: \"test@example.com\", password: \"test\") { token } }"}'

3. Monitoring

  • Check Grafana dashboards for errors
  • Verify Prometheus metrics are normal
  • Check Loki logs for errors

4. Database Verification

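# Verify database connectivity
# (single quotes make $DATABASE_URL expand inside the container, not in the local shell)
kubectl exec -it -n api deployment/api -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT 1"'

# Sanity-check that a core table is readable (a row count alone does not prove integrity)
kubectl exec -it -n api deployment/api -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM users;"'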

Rollback Scenarios

Scenario 1: API Breaking Change

Symptoms: API returns errors after deployment

Rollback Steps:

  1. Immediately roll back the API deployment
  2. Verify API health
  3. Check error logs
  4. Investigate root cause
  5. Fix and redeploy
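
For the log check, a rough count of recent error lines gives a quick before/after comparison. This is a heuristic sketch: `recent_error_count` is a hypothetical helper, and matching the word "error" will miss structured log levels or produce false positives depending on the log format.

```shell
# Count recent log lines that look like errors for a deployment.
# recent_error_count is hypothetical; matching on "error" is a rough heuristic.
# Note: `kubectl logs deployment/<name>` reads from one pod of the deployment, not all of them.
recent_error_count() {
  name="$1"
  namespace="$2"
  since="${3:-15m}"

  # --since limits output to recent logs; grep -c prints the number of matching lines.
  # grep exits non-zero when the count is 0, so `|| true` keeps the function's status clean.
  kubectl logs "deployment/$name" -n "$namespace" --since="$since" --all-containers \
    | grep -ci 'error' || true
}
```

For example, `recent_error_count api api 5m` prints the number of matching lines from the last five minutes.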

Scenario 2: Database Migration Failure

Symptoms: Database errors, application crashes

Rollback Steps:

  1. Stop application deployments
  2. Assess migration state
  3. Roll back the migration if it is reversible; otherwise restore from backup
  4. Redeploy the previous application version

Scenario 3: Portal Build Failure

Symptoms: Portal shows blank page or errors

Rollback Steps:

  1. Roll back the portal deployment
  2. Verify portal loads
  3. Check build logs
  4. Fix build issues
  5. Redeploy

Scenario 4: Configuration Error

Symptoms: Services cannot connect to dependencies

Rollback Steps:

  1. Revert configuration changes in Git
  2. Let ArgoCD sync automatically, or manually update ConfigMaps/Secrets if the sync cannot wait
  3. Restart affected services
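
Pods do not necessarily pick up a changed ConfigMap or Secret on their own (for example when values are injected as environment variables), which is why the restart step matters. A minimal sketch of that step; `restart_services` is a hypothetical helper that takes `namespace/deployment` pairs, and the 180-second timeout is an assumption.

```shell
# Restart deployments so they pick up reverted ConfigMaps/Secrets.
# restart_services is a hypothetical helper; arguments are "namespace/deployment" pairs.
restart_services() {
  for target in "$@"; do
    namespace="${target%%/*}"   # text before the first slash
    name="${target#*/}"         # text after the first slash
    kubectl rollout restart "deployment/$name" -n "$namespace" || return 1
    # Wait for the restarted pods to become ready before moving to the next service
    kubectl rollout status "deployment/$name" -n "$namespace" --timeout=180s || return 1
  done
}
```

For example, `restart_services api/api portal/portal` restarts both services one after the other and stops at the first failure.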

Rollback Testing

Staging Rollback Test

# 1. Deploy new version to staging
argocd app sync api-staging

# 2. Test new version
./scripts/smoke-tests.sh --env=staging

# 3. Simulate rollback
kubectl rollout undo deployment/api -n api-staging

# 4. Verify rollback works
./scripts/smoke-tests.sh --env=staging

Rollback Communication

Internal Communication

  • Notify team in #engineering channel
  • Update incident tracking system
  • Document in runbook

External Communication

  • Update status page if user-facing
  • Notify affected customers if needed
  • Post-mortem for P0/P1 incidents

Prevention

Pre-Deployment

  • All tests passing
  • Code review completed
  • Staging deployment successful
  • Smoke tests passing
  • Database migrations tested
  • Rollback plan reviewed

Deployment

  • Deploy to staging first
  • Monitor staging for 24 hours
  • Gradual production rollout (canary)
  • Monitor metrics closely
  • Have rollback plan ready

Rollback Decision Matrix

| Issue | Severity | Rollback? |
|-------|----------|-----------|
| Complete outage | P0 | Yes, immediately |
| Data corruption | P0 | Yes, immediately |
| Security breach | P0 | Yes, immediately |
| >50% error rate | P1 | Yes, within 15 min |
| Performance >50% degraded | P1 | Yes, within 30 min |
| Single feature broken | P2 | Maybe, assess impact |
| Minor bugs | P3 | No, fix forward |
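
If the matrix is used from incident tooling, the severity-to-action mapping can be encoded directly. The sketch below is illustrative only: `rollback_decision` is not an existing script, and the P1 window is collapsed to the 15 to 30 minute range given above.

```shell
# Map a severity from the matrix above to the recommended action.
# rollback_decision is an illustrative helper, not an existing script in the repository.
rollback_decision() {
  case "$1" in
    P0) echo "rollback: immediately" ;;
    P1) echo "rollback: within 15-30 min" ;;
    P2) echo "assess impact" ;;
    P3) echo "fix forward" ;;
    *)  echo "unknown severity: $1" >&2; return 2 ;;
  esac
}
```

For example, `rollback_decision P1` prints the P1 action, and an unrecognized severity returns a non-zero status.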

Emergency Contacts

  • On-call Engineer: [Contact]
  • Team Lead: [Contact]
  • DevOps Lead: [Contact]