
Backup and Recreation Plan

Date: January 7, 2026
Status: 📋 PLAN READY FOR IMPLEMENTATION


Executive Summary

This document outlines a comprehensive plan for:

  1. Setting up automated backups to prevent future data loss
  2. Recreating lost containers from their configurations
  3. Restoring data from backups if available
  4. Best practices for ongoing backup management

Current Situation

Containers Status

Containers with Data (7 containers):

  • 100, 101, 102, 103, 104, 105, 130 (migrated from r630-02)

Containers Without Data (~28 containers):

  • 106, 107, 108 (empty volumes)
  • 3000-10151 (empty volumes)

Data Loss Summary

  • Lost During: RAID 10 expansion (4→6 disks)
  • Cause: RAID recreation wiped all data structures
  • Recovery: Not possible from thin1 (data overwritten)
  • Solution: Restore from backups or recreate from templates

Part 1: Automated Backup Setup

Objective

Set up automated daily backups for all containers/VMs to prevent future data loss.

Implementation

Step 1: Create Backup Script

Script: scripts/setup-automated-backups.sh

Features:

  • Daily backups at 2 AM
  • Snapshot mode (no downtime)
  • Gzip compression
  • Automatic cleanup (keep 7 days)
  • Logging to /var/log/proxmox-backups/

Usage:

./scripts/setup-automated-backups.sh

Step 2: Manual Backup Script

Script: /usr/local/bin/manual-backup.sh (created on r630-01)

Usage:

# Backup specific containers
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Backup all running containers
ssh root@192.168.11.11 "pct list | awk 'NR>1 && \$2==\"running\" {print \$1}' | xargs /usr/local/bin/manual-backup.sh"
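The contents of /usr/local/bin/manual-backup.sh are not shown in this plan. As a rough sketch of what such a wrapper might do, assuming it calls vzdump once per ID with the same settings as the automated job (snapshot mode, gzip, local storage), something like the following could work. The function name backup_ids and the DRY_RUN variable are hypothetical and exist only for illustration.

```bash
#!/bin/sh
# Hypothetical sketch of a manual backup wrapper; not the actual manual-backup.sh.
# Assumes backups go to the "local" storage in snapshot mode with gzip compression.
# Set DRY_RUN=1 to print the vzdump commands instead of running them.

backup_ids() {
    rc=0
    for id in "$@"; do
        # Accept only numeric container/VM IDs.
        case "$id" in
            ''|*[!0-9]*)
                echo "skipping invalid ID: $id" >&2
                rc=1
                continue
                ;;
        esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "vzdump $id --mode snapshot --compress gzip --storage local"
        else
            vzdump "$id" --mode snapshot --compress gzip --storage local || rc=1
        fi
    done
    return "$rc"
}
```

Running it with DRY_RUN=1 first shows which vzdump commands would be issued before any backup is started.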

Step 3: Backup Storage

Current Storage:

  • local storage: /var/lib/vz/dump/ (directory storage)
  • Capacity: ~536GB available

Backup Location:

  • /var/lib/vz/dump/vzdump-lxc-<vmid>-<timestamp>.tar.gz
  • /var/lib/vz/dump/vzdump-qemu-<vmid>-<timestamp>.vma.gz
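Because the guest type and ID are encoded in the file name, scripts can recover them without opening the archive. A minimal sketch, assuming the vzdump-<type>-<vmid>-<timestamp> layout shown above; the helper names backup_type and backup_vmid are made up for this example.

```bash
# Split a vzdump archive name into its type and ID using POSIX parameter expansion.
# Assumes the vzdump-<type>-<vmid>-<timestamp>.<ext> naming shown above.
backup_type() {
    name=${1##*/}          # strip the directory part
    name=${name#vzdump-}   # strip the "vzdump-" prefix
    echo "${name%%-*}"     # keep the text before the next "-"
}

backup_vmid() {
    name=${1##*/}          # strip the directory part
    name=${name#vzdump-*-} # strip "vzdump-<type>-" (shortest match)
    echo "${name%%-*}"     # keep the text before the next "-"
}
```

For example, `backup_vmid /var/lib/vz/dump/vzdump-lxc-106-<timestamp>.tar.gz` prints the container ID, which is handy when grouping archives per container.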

Step 4: Backup Schedule

Automated:

  • Frequency: Daily at 2:00 AM
  • Mode: Snapshot (no downtime)
  • Compression: Gzip
  • Retention: 7 days

Manual:

  • Run anytime using /usr/local/bin/manual-backup.sh
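What scripts/setup-automated-backups.sh actually installs is not shown here. One way to express the schedule above is a single /etc/cron.d entry that calls vzdump. The sketch below only prints such an entry. It assumes all guests are backed up to the local storage and that retention is handled by vzdump's --prune-backups option, where keep-daily=7 approximates "keep 7 days". The function name render_backup_cron is hypothetical.

```bash
# Print an /etc/cron.d entry for a nightly vzdump run at 02:00.
# Assumptions: all guests are backed up to the "local" storage, and pruning
# uses vzdump's --prune-backups option (older releases used --maxfiles instead).
render_backup_cron() {
    log_dir=/var/log/proxmox-backups
    # "%" is special in crontab lines, so it is escaped in the date format.
    printf '%s\n' "0 2 * * * root vzdump --all --mode snapshot --compress gzip --storage local --prune-backups keep-daily=7 >> $log_dir/backup_\$(date +\\%Y\\%m\\%d).log 2>&1"
}
```

A possible use is `mkdir -p /var/log/proxmox-backups && render_backup_cron > /etc/cron.d/proxmox-backups` on the node. Proxmox also provides scheduled backup jobs under Datacenter → Backup, which may be preferable to a hand-written cron entry.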

Part 2: Container Recreation Plan

Objective

Recreate containers that lost data, restoring them to a working state.

Approach

Option A: Restore from Backups (If Available)

Steps:

  1. Check for backups:

    find /var/lib/vz/dump -name "vzdump-lxc-106-*" -o -name "vzdump-lxc-107-*" -o -name "vzdump-lxc-108-*"
    
  2. Restore container:

    pct restore <vmid> <backup_file> --storage thin1
    
  3. Start container:

    pct start <vmid>
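When several backups exist for one container, the restore step needs exactly one file. The following sketch picks the newest archive by name, assuming the timestamp embedded in vzdump file names sorts lexicographically. The helper name latest_backup is hypothetical.

```bash
# Print the newest vzdump archive for one container, or fail if none exists.
# Relies on the timestamp in the file name sorting lexicographically.
latest_backup() {
    dir=$1
    vmid=$2
    newest=
    for f in "$dir"/vzdump-lxc-"$vmid"-*.tar.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        newest=$f                    # globs expand in sorted order; the last one is newest
    done
    [ -n "$newest" ] && echo "$newest"
}
```

It could then feed the restore command, for example `pct restore 106 "$(latest_backup /var/lib/vz/dump 106)" --storage thin1`.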
    

Option B: Recreate from Templates (If No Backups)

Steps:

  1. Use recreation script:

    ./scripts/recreate-containers-from-configs.sh 106 107 108
    
  2. Script will:

    • Read container configuration
    • Destroy empty container
    • Recreate from template
    • Restore configuration
    • Create volume on correct storage
  3. Manual recreation:

    # Download template if needed
    pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    
    # Recreate container (the rootfs size is given in GiB without a unit suffix)
    pct create <vmid> /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
      --storage thin1 --rootfs thin1:10 \
      --hostname <hostname> \
      --memory <memory> --swap <swap> --cores <cores> \
      --net0 name=eth0,bridge=vmbr0,ip=<ip>/24
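The recreation script itself is not reproduced in this plan. To illustrate the "read configuration, then recreate" idea, the sketch below extracts values from a saved copy of a container config (the "key: value" format printed by pct config) and prints a matching pct create command without running it. The helper names, the template path, and the 10 GiB rootfs size are assumptions for this example. A key that is missing from the config yields an empty value, so the printed command should be reviewed before use.

```bash
# Read one "key: value" field from pct-config-style text on stdin.
config_value() {
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
}

# Print (do not run) a pct create command rebuilt from a saved config file.
# The template path and the 10 GiB rootfs size are placeholders, not values
# taken from the saved config.
print_create_cmd() {
    vmid=$1
    conf=$2
    template=/var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    echo "pct create $vmid $template" \
        "--rootfs thin1:10" \
        "--hostname $(config_value hostname < "$conf")" \
        "--memory $(config_value memory < "$conf")" \
        "--swap $(config_value swap < "$conf")" \
        "--cores $(config_value cores < "$conf")" \
        "--net0 $(config_value net0 < "$conf")"
}
```

Saving `pct config <vmid>` to a file before destroying an empty container gives this kind of helper something to read from.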
    

Container Recreation Priority

High Priority (Critical Services):

  1. 106 - redis-rpc-translator
  2. 107 - web3signer-rpc-translator
  3. 108 - vault-rpc-translator
  4. 3000-3003 - ml110 containers
  5. 3500 - oracle-publisher-1
  6. 3501 - ccip-monitor-1

Medium Priority:

  7. 5200 - cacti-1
  8. 6000 - fabric-1
  9. 6400 - indy-1
  10. 10100-10151 - dbis containers

Lower Priority:

  11. 10000-10092 - order containers
  12. 10200-10230 - monitoring containers
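If recreation is scripted, the priority tiers can be encoded so that containers are processed in the intended order. The sketch below maps an ID to the tier listed in this section. The function name priority_of and the "unlisted" fallback are illustrative choices, not part of the existing scripts.

```bash
# Map a container ID to the priority tier listed in this section.
priority_of() {
    id=$1
    if [ "$id" -ge 106 ] && [ "$id" -le 108 ]; then echo high
    elif [ "$id" -ge 3000 ] && [ "$id" -le 3003 ]; then echo high
    elif [ "$id" -eq 3500 ] || [ "$id" -eq 3501 ]; then echo high
    elif [ "$id" -eq 5200 ] || [ "$id" -eq 6000 ] || [ "$id" -eq 6400 ]; then echo medium
    elif [ "$id" -ge 10100 ] && [ "$id" -le 10151 ]; then echo medium
    elif [ "$id" -ge 10000 ] && [ "$id" -le 10092 ]; then echo lower
    elif [ "$id" -ge 10200 ] && [ "$id" -le 10230 ]; then echo lower
    else echo unlisted
    fi
}
```

A loop over the IDs reported by pct list could call this once per tier, handling all "high" containers before moving on to "medium" and "lower".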


Part 3: Data Restoration Procedures

Step 1: Check for Existing Backups

# Check all nodes for backups
for node in ml110 r630-01 r630-02; do
    echo "=== $node ==="
    ssh "root@$node" "find /var/lib/vz/dump -name 'vzdump-lxc-106-*' -o -name 'vzdump-lxc-107-*' -o -name 'vzdump-lxc-108-*'"
done

# List storages that can hold backups (a Proxmox Backup Server, if configured, appears with type "pbs")
pvesm status --content backup

Step 2: Restore from Backup

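# Copy backup to r630-01 (quote the remote glob so it expands on the source host)
scp "root@source:/var/lib/vz/dump/vzdump-lxc-106-*.tar.gz" root@192.168.11.11:/var/lib/vz/dump/

# Restore container (the glob must match exactly one archive; add --force if CT 106 still exists with an empty volume)
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"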

# Start container
ssh root@192.168.11.11 "pct start 106"

Step 3: Verify Restoration

# Check container status
pct list | grep 106

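# Check container logs (pct has no "logs" subcommand; read the journal inside the container)
pct exec 106 -- journalctl -n 50 --no-pager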

# Test services
pct exec 106 -- systemctl status <service>
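Checking many containers by hand is error-prone, so the status check can be wrapped in a small helper. This is a sketch under the assumption that `pct status <vmid>` prints a line of the form "status: running". The function names are hypothetical.

```bash
# Report whether a container is running, based on `pct status <vmid>`,
# which is assumed to print a line such as "status: running".
ct_is_running() {
    [ "$(pct status "$1" 2>/dev/null)" = "status: running" ]
}

# Check several containers and list the ones that are not running.
# Returns non-zero if at least one container is not running.
report_not_running() {
    failed=0
    for id in "$@"; do
        if ! ct_is_running "$id"; then
            echo "CT $id is not running"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `report_not_running 106 107 108` after a restore gives a quick pass/fail signal before the per-service checks.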

Part 4: Ongoing Backup Management

Daily Operations

Automated Backups:

  • Run automatically at 2 AM daily
  • Logs available in /var/log/proxmox-backups/
  • Check logs weekly for errors

Manual Backups:

  • Before major changes
  • Before migrations
  • Before updates

Backup Verification

Weekly Checks:

# Check backup directory
ls -lh /var/lib/vz/dump/

# Check backup logs
tail -f /var/log/proxmox-backups/backup_$(date +%Y%m%d).log

# Spot-check the newest backup archive (lists the first entries only; gzip -t tests the whole file)
tar -tzf "$(ls -t /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz | head -1)" | head -10
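For a fuller weekly check, every compressed archive can be tested rather than just one. The sketch below uses gzip -t, which reads each file completely and reports corruption in the compressed stream. It does not prove that a restore will succeed, so periodic test restores are still worthwhile. The function name verify_archives is hypothetical.

```bash
# Test every gzip-compressed archive in a directory and report the result.
# gzip -t reads the whole file, so this can take a while on large backups.
# Returns non-zero if any archive fails the test.
verify_archives() {
    dir=$1
    bad=0
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if gzip -t "$f" 2>/dev/null; then
            echo "OK   ${f##*/}"
        else
            echo "FAIL ${f##*/}"
            bad=1
        fi
    done
    return "$bad"
}
```

On the node this would be run as `verify_archives /var/lib/vz/dump`.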

Backup Retention

Current Policy:

  • Keep last 7 days of backups
  • Cleanup runs automatically after each backup

Recommended Policy:

  • Daily backups: Keep 7 days
  • Weekly backups: Keep 4 weeks
  • Monthly backups: Keep 12 months
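The recommended policy maps fairly directly onto Proxmox's prune settings. As a sketch, assuming the keep-daily, keep-weekly, and keep-monthly keys of the --prune-backups option, a small helper can build the value. The function name prune_spec is made up for this example.

```bash
# Build a --prune-backups value from daily/weekly/monthly counts.
prune_spec() {
    echo "keep-daily=$1,keep-weekly=$2,keep-monthly=$3"
}

# Preview what would be removed from the "local" storage without deleting anything:
#   pvesm prune-backups local --dry-run --keep-daily 7 --keep-weekly 4 --keep-monthly 12
# Apply the same policy in a backup run:
#   vzdump --all --mode snapshot --compress gzip --storage local --prune-backups "$(prune_spec 7 4 12)"
```

Running the dry-run form first is a low-risk way to confirm which archives the policy would keep.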

Backup Storage Management

Monitor Storage:

# Check backup storage usage
df -h /var/lib/vz/dump/
pvesm status | grep local

# Cleanup old backups manually if needed
find /var/lib/vz/dump -name "*.tar.gz" -mtime +7 -delete
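Storage monitoring can also be turned into a simple threshold check that a cron job or monitoring agent calls. This is a minimal sketch. The helper names and the 80% default limit are arbitrary choices for illustration.

```bash
# Print the usage percentage (without the % sign) of the filesystem holding a path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold; the 80% default is an arbitrary example.
# Returns non-zero when the threshold is reached.
check_backup_space() {
    path=$1
    limit=${2:-80}
    used=$(usage_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, `check_backup_space /var/lib/vz/dump 85` exits non-zero once the backup filesystem is at least 85% full.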

Part 5: Implementation Checklist

Phase 1: Backup Setup

  • Run scripts/setup-automated-backups.sh
  • Verify cron job is set up
  • Test manual backup script
  • Verify backup storage has space
  • Run test backup

Phase 2: Check for Existing Backups

  • Check /var/lib/vz/dump/ on all nodes
  • Check external backup locations
  • Check Proxmox Backup Server (if configured)
  • Document found backups

Phase 3: Restore from Backups (If Available)

  • Copy backups to r630-01
  • Restore containers using pct restore
  • Start containers
  • Verify containers are running
  • Test services

Phase 4: Recreate Containers (If No Backups)

  • Prioritize containers by importance
  • Download required templates
  • Run recreation script for each container
  • Restore configurations manually
  • Install applications
  • Restore application data (if available)

Phase 5: Verify and Document

  • Verify all containers are running
  • Test all services
  • Document restoration process
  • Update backup procedures
  • Schedule regular backup verification

Part 6: Best Practices

Backup Best Practices

  1. Automated Backups:

    • Set up daily automated backups
    • Use snapshot mode for running containers
    • Compress backups to save space
    • Keep multiple backup copies
  2. Backup Storage:

    • Use separate storage for backups
    • Monitor backup storage usage
    • Consider off-site backups
    • Use Proxmox Backup Server for better management
  3. Backup Testing:

    • Test backup restoration regularly
    • Verify backup integrity
    • Document restoration procedures
    • Keep backup logs

Container Recreation Best Practices

  1. Before Recreation:

    • Check for backups first
    • Document container configurations
    • Note any custom settings
    • Plan recreation order
  2. During Recreation:

    • Recreate from templates
    • Restore configurations
    • Install applications
    • Restore data if available
  3. After Recreation:

    • Verify containers work
    • Test all services
    • Update documentation
    • Set up backups immediately

Part 7: Recovery Procedures

Container Recovery Workflow

1. Check for Backups
   ├─ Found? → Restore from Backup
   └─ Not Found? → Recreate from Template

2. Restore from Backup
   ├─ Copy backup to r630-01
   ├─ Restore using pct restore
   ├─ Start container
   └─ Verify services

3. Recreate from Template
   ├─ Read container config
   ├─ Download template
   ├─ Create container
   ├─ Restore configuration
   ├─ Install applications
   └─ Restore data (if available)

Emergency Recovery

If backups fail:

  1. Stop affected containers
  2. Check backup logs
  3. Manually create backup
  4. Restore from manual backup
  5. Verify restoration

If recreation fails:

  1. Check container logs
  2. Verify template exists
  3. Check storage availability
  4. Retry recreation
  5. Contact support if needed

Part 8: Monitoring and Maintenance

Backup Monitoring

Daily:

  • Check backup logs for errors
  • Verify backups completed successfully

Weekly:

  • Review backup storage usage
  • Test backup restoration
  • Clean up old backups

Monthly:

  • Review backup policies
  • Update backup procedures
  • Document any issues

Container Monitoring

After Recreation:

  • Monitor container status
  • Check service logs
  • Verify applications work
  • Test functionality

Ongoing:

  • Regular health checks
  • Monitor resource usage
  • Update applications
  • Maintain backups

Commands Reference

Backup Commands

# Setup automated backups
./scripts/setup-automated-backups.sh

# Manual backup
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Check backup status
ssh root@192.168.11.11 "ls -lh /var/lib/vz/dump/"

# View backup logs (the escaped \$(date ...) is evaluated on the remote host)
ssh root@192.168.11.11 "tail -f /var/log/proxmox-backups/backup_\$(date +%Y%m%d).log"

Restoration Commands

# Restore from backup
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"

# Recreate from template
./scripts/recreate-containers-from-configs.sh 106 107 108

# Check container status
ssh root@192.168.11.11 "pct list | grep 106"

Verification Commands

# Check container volumes
ssh root@192.168.11.11 "lvs pve | grep vm-106-disk"

# Check container config
ssh root@192.168.11.11 "pct config 106"

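# Check container logs (pct has no "logs" subcommand; read the journal inside the container)
ssh root@192.168.11.11 "pct exec 106 -- journalctl -n 50 --no-pager"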

Next Steps

  1. Immediate:

    • Run backup setup script
    • Check for existing backups
    • Document found backups
  2. Short-term:

    • Restore containers from backups (if available)
    • Recreate high-priority containers
    • Verify all services work
  3. Long-term:

    • Set up Proxmox Backup Server
    • Implement off-site backups
    • Regular backup testing
    • Documentation updates

Status: 📋 PLAN READY
Next Action: Run backup setup script and check for existing backups
Last Updated: January 7, 2026