Sankofa/docs/runbooks/PROXMOX_DISASTER_RECOVERY.md
defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
2025-12-12 18:01:35 -08:00

Proxmox Disaster Recovery Procedures

Overview

This document outlines disaster recovery procedures for Proxmox infrastructure managed by the Crossplane provider.

Recovery Scenarios

Scenario 1: Provider Pod Failure

Symptoms

  • Provider pod not running
  • VM operations failing
  • Resources that reference the ProviderConfig fail to reconcile

Recovery Steps

  1. Check Pod Status:

    kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
    
  2. Restart Provider:

    kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
    
  3. Verify Recovery:

    kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
    kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50
    
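Rather than polling `kubectl get pods` by hand, the verification step can block until the replacement pod is ready. A minimal sketch, assuming the label and namespace used above; the function name and the 120s default timeout are illustrative choices, not part of the provider:

```shell
# Block until the provider pod reports the Ready condition.
# $1: optional timeout (assumed default: 120s)
wait_for_provider() {
  kubectl wait pod \
    -n crossplane-system \
    -l app=crossplane-provider-proxmox \
    --for=condition=Ready \
    --timeout="${1:-120s}"
}
```

Call it as `wait_for_provider 60s` once the delete has returned. A non-zero exit status means the pod did not become ready within the timeout, or that no pod matched the label.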

Scenario 2: Proxmox Node Failure

Symptoms

  • Cannot connect to Proxmox
  • VMs unreachable
  • Provider connection errors

Recovery Steps

  1. Verify Node Status:

    • Check Proxmox Web UI
    • Verify node is online
    • Check network connectivity
  2. Check ProviderConfig:

    kubectl get providerconfig proxmox-provider-config -o yaml
    
  3. Update Endpoint if Needed:

    • If node IP changed, update ProviderConfig
    • If using hostname, verify DNS
  4. Test Connectivity:

    curl -k https://your-proxmox:8006/api2/json/version
    
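The connectivity test can be made less ambiguous by printing only the HTTP status code and bounding the wait. A minimal sketch; the function name and the timeout values are assumptions for illustration:

```shell
# Probe the Proxmox API and print the HTTP status code.
# Any HTTP status (including 401 for a request without credentials) shows
# that the API answered; curl prints 000 when no HTTP response was received.
# $1: host name or IP address of the Proxmox node
check_proxmox_api() {
  curl -k -s -o /dev/null \
    --connect-timeout 5 --max-time 10 \
    -w '%{http_code}\n' \
    "https://$1:8006/api2/json/version"
}
```

A result of `000` points to a network, DNS, or TLS problem rather than a credentials problem, which helps decide whether the ProviderConfig endpoint or the secret needs attention.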

Scenario 3: Credential Compromise

Symptoms

  • Authentication failures
  • Security alerts
  • Unauthorized access

Recovery Steps

  1. Revoke Compromised Credentials:

    • Log into Proxmox Web UI
    • Revoke API tokens
    • Change passwords
  2. Create New Credentials:

    • Create new API tokens
    • Use strong passwords
    • Grant only the permissions the provider needs
  3. Update Kubernetes Secret:

    kubectl delete secret proxmox-credentials -n crossplane-system
    kubectl create secret generic proxmox-credentials \
      --from-literal=credentials.json='{"username":"root@pam","token":"new-token"}' \
      -n crossplane-system
    
  4. Restart Provider:

    kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
    
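Deleting the secret before recreating it leaves a short window with no credentials, and passing the token with `--from-literal` records it in shell history. One possible alternative is to render the secret from a local file and apply it in place. A sketch, assuming a local `credentials.json` file with the same JSON structure as above; the function name is hypothetical:

```shell
# Update the existing secret in place from a local file.
# $1: path to a credentials.json file (hypothetical local file)
rotate_proxmox_secret() {
  kubectl create secret generic proxmox-credentials \
    --from-file=credentials.json="$1" \
    -n crossplane-system \
    --dry-run=client -o yaml |
    kubectl apply -f -
}
```

`--dry-run=client -o yaml` only prints the manifest; the second `kubectl` call creates or updates the secret. The provider restart in step 4 is still needed if the provider reads credentials only at startup.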

Scenario 4: VM Data Loss

Symptoms

  • VM not found
  • Data missing
  • Storage errors

Recovery Steps

  1. Check VM Status:

    kubectl get proxmoxvm <vm-name>
    kubectl describe proxmoxvm <vm-name>
    
  2. Check Proxmox Backups:

    • Log into Proxmox Web UI
    • Check backup storage
    • Review backup schedule
  3. Restore from Backup:

    • Use Proxmox backup restore
    • Or recreate VM from template
  4. Recreate VM Resource:

    # Delete existing resource
    kubectl delete proxmoxvm <vm-name>
    
    # Recreate with same configuration
    kubectl apply -f <vm-manifest>.yaml
    
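Depending on how the provider handles deletion, removing the `proxmoxvm` resource may also remove the VM in Proxmox, so it is worth capturing the live resource first. A minimal sketch that refuses to delete unless the export succeeded; the function name and file argument are illustrative:

```shell
# Save the live resource to a file, then delete it only if the export worked.
# $1: VM resource name, $2: output file (both placeholders)
export_then_delete_vm() {
  kubectl get proxmoxvm "$1" -o yaml > "$2" || return 1
  [ -s "$2" ] || return 1          # stop if the export is empty
  kubectl delete proxmoxvm "$1"
}
```

The exported file includes server-set fields such as `status`, `metadata.uid`, and `metadata.resourceVersion`. Treat it as a reference copy and recreate the VM from the original manifest, or strip those fields before applying it.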

Scenario 5: Complete Provider Failure

Symptoms

  • Provider not responding
  • All VM operations failing
  • ProviderConfig errors

Recovery Steps

  1. Check Provider Deployment:

    kubectl get deployment -n crossplane-system crossplane-provider-proxmox
    kubectl describe deployment -n crossplane-system crossplane-provider-proxmox
    
  2. Redeploy Provider:

    kubectl delete deployment -n crossplane-system crossplane-provider-proxmox
    kubectl apply -f crossplane-provider-proxmox/config/provider.yaml
    
  3. Verify ProviderConfig:

    kubectl get providerconfig
    kubectl describe providerconfig proxmox-provider-config
    
  4. Test VM Operations:

    kubectl get proxmoxvm
    kubectl describe proxmoxvm <test-vm>
    
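After redeploying, it helps to wait for the Deployment to finish rolling out before checking the ProviderConfig. A minimal sketch, assuming the deployment name used above; the function name and the 180s default are illustrative:

```shell
# Block until the provider Deployment has finished rolling out.
# $1: optional timeout (assumed default: 180s)
wait_for_provider_rollout() {
  kubectl rollout status deployment/crossplane-provider-proxmox \
    -n crossplane-system \
    --timeout="${1:-180s}"
}
```

A non-zero exit status means the rollout did not complete in time; `kubectl describe deployment` from step 1 is then the next place to look.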

Backup Procedures

Provider Configuration Backup

# Backup ProviderConfig
kubectl get providerconfig proxmox-provider-config -o yaml > providerconfig-backup.yaml

# Backup credentials secret (the output holds the credentials base64-encoded, not encrypted; store it securely)
kubectl get secret proxmox-credentials -n crossplane-system -o yaml > credentials-backup.yaml
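
Because the exported secret is only base64-encoded, the backup file should at least be unreadable to other local users. A minimal sketch that restricts permissions at creation time; the function name is hypothetical, and encrypting the file afterwards is left to whatever tooling is already in use:

```shell
# Write the secret backup with owner-only permissions.
# $1: output file (placeholder path; must not already exist)
backup_proxmox_secret() {
  (
    umask 077   # files created in this subshell get mode 600
    kubectl get secret proxmox-credentials -n crossplane-system -o yaml > "$1"
  )
}
```

The subshell keeps the `umask` change from leaking into the rest of the session.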

VM Configuration Backup

# Backup all VM resources
kubectl get proxmoxvm -o yaml > all-vms-backup.yaml

# Backup specific VM
kubectl get proxmoxvm <vm-name> -o yaml > <vm-name>-backup.yaml
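
For routine backups, one file per VM in a dated directory is easier to restore from than a single combined file. A minimal sketch; the function name and directory layout are assumptions:

```shell
# Export every proxmoxvm resource to its own file under a dated directory.
# $1: backup root directory (placeholder)
backup_all_vms() {
  dir="$1/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir" || return 1
  kubectl get proxmoxvm -o name | while IFS= read -r res; do
    # res has the form "<resource>.<group>/<name>"; keep only the name
    name="${res##*/}"
    kubectl get "$res" -o yaml > "$dir/$name.yaml"
  done
  echo "$dir"   # print the directory that was written
}
```

For example, `backup_all_vms /var/backups/proxmoxvm` prints the path of the new directory, which can then be archived or copied off the cluster.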

Proxmox Backup

  1. Configure Backup Schedule:

    • Log into Proxmox Web UI
    • Go to Datacenter → Backup
    • Configure backup schedule
  2. Manual Backup:

    • Select VM in Proxmox Web UI
    • Click Backup
    • Choose backup storage
    • Start backup
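
The manual backup can also be started from a shell on the Proxmox node with `vzdump`. A minimal sketch; the function name is hypothetical, the VM ID and storage ID are placeholders, and snapshot mode with zstd compression is only one reasonable choice:

```shell
# Run on the Proxmox node itself.
# $1: numeric VM ID, $2: ID of the backup storage
backup_vm_on_node() {
  vzdump "$1" --storage "$2" --mode snapshot --compress zstd
}
```

For example, `backup_vm_on_node 101 local` backs up VM 101 to the storage named `local`, provided that storage allows backup content.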

Recovery Testing

Test Provider Recovery

  1. Simulate Failure:

    kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
    
  2. Verify Auto-Recovery:

    kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
    
  3. Test VM Operations:

    kubectl get proxmoxvm
    
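The three steps above can be combined into a timed drill so that recovery time is recorded on each test run. A minimal sketch, assuming the same label and namespace; the function name and the 120s default timeout are illustrative:

```shell
# Delete the provider pod, wait for its replacement, and report the duration.
# $1: optional timeout for the replacement pod (assumed default: 120s)
time_provider_recovery() {
  start="$(date +%s)"
  kubectl delete pod -n crossplane-system \
    -l app=crossplane-provider-proxmox || return 1
  kubectl wait pod -n crossplane-system \
    -l app=crossplane-provider-proxmox \
    --for=condition=Ready --timeout="${1:-120s}" || return 1
  end="$(date +%s)"
  echo "recovered in $((end - start))s"
}
```

Tracking the reported duration across runs gives an early warning if recovery starts taking longer.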

Test VM Recovery

  1. Create Test VM:

    kubectl apply -f test-vm.yaml
    
  2. Delete VM:

    kubectl delete proxmoxvm test-vm
    
  3. Recreate VM:

    kubectl apply -f test-vm.yaml
    
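The VM test can be scripted so that it stops at the first failing step and waits for the recreated VM. A sketch, assuming the `proxmoxvm` resource exposes a `Ready` condition, as Crossplane managed resources usually do; the function name and the 600s default are illustrative:

```shell
# Create, delete, and recreate a test VM, then wait for it to become ready.
# $1: manifest path, $2: resource name, $3: optional timeout (default: 600s)
vm_recovery_drill() {
  kubectl apply -f "$1" &&
    kubectl delete proxmoxvm "$2" --wait=true &&
    kubectl apply -f "$1" &&
    kubectl wait "proxmoxvm/$2" --for=condition=Ready --timeout="${3:-600s}"
}
```

For example, `vm_recovery_drill test-vm.yaml test-vm` runs the full cycle. If the resource does not report a `Ready` condition, replace the last step with `kubectl describe proxmoxvm`.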

Prevention

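  1. Regular Backups: Schedule Proxmox VM backups and export the ProviderConfig and VM resources on the same cadence
  2. Monitoring: Set up alerts for provider pod failures and Proxmox node outages
  3. Documentation: Update this runbook whenever the infrastructure or the provider changes
  4. Testing: Run the recovery tests above on a regular schedule
  5. Redundancy: Use multiple Proxmox nodes so that one node failure does not make every VM unreachable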