R630-02 Container Startup Failures Analysis
Date: January 19, 2026
Node: r630-02 (192.168.11.12)
Status: ⚠️ CRITICAL - 33 CONTAINERS FAILED TO START
Executive Summary
A bulk container startup operation on r630-02 left 33 of the attempted containers failing to start. The failures fall into three distinct categories:
- Logical Volume Missing (8 containers) - Storage volumes don't exist
- Startup Failures (24 containers) - Containers fail to start for unknown reasons
- Lock Error (1 container) - Container is locked in "create" state
Total Impact: 33 containers unable to start, affecting multiple services.
Failure Breakdown
Category 1: Missing Logical Volumes (8 containers)
Error Pattern: no such logical volume pve/vm-XXXX-disk-X
Affected Containers:
- CT 3000: pve/vm-3000-disk-1
- CT 3001: pve/vm-3001-disk-1
- CT 3002: pve/vm-3002-disk-2
- CT 3003: pve/vm-3003-disk-1
- CT 3500: pve/vm-3500-disk-1
- CT 3501: pve/vm-3501-disk-2
- CT 6000: pve/vm-6000-disk-1
- CT 6400: pve/vm-6400-disk-1
Root Cause Analysis:
- Storage volumes were likely deleted, migrated, or never created
- Containers may have been migrated to another node but configs not updated
- Storage pool may have been recreated/reset, losing volume metadata
- Containers may reference the wrong storage pool (e.g., thin1 vs thin1-r630-02)
Diagnostic Steps:
1. Check whether the volumes exist on another storage pool:

   ```shell
   ssh root@192.168.11.12 "lvs | grep -E 'vm-3000|vm-3001|vm-3002|vm-3003|vm-3500|vm-3501|vm-6000|vm-6400'"
   ```

2. Check the container storage configuration:

   ```shell
   ssh root@192.168.11.12 "pct config 3000 | grep rootfs"
   ```

3. Check the available storage pools:

   ```shell
   ssh root@192.168.11.12 "pvesm status"
   ```
Resolution Options:
- Option A: Recreate missing volumes if data is not critical
- Option B: Migrate containers to existing storage pool
- Option C: Restore volumes from backup if available
- Option D: Update container configs to point to correct storage
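Option D can be scripted as a dry run that only prints the repointing commands for review. A minimal sketch, assuming the correct pool is named thin1-r630-02 (a placeholder; confirm the real pool with pvesm status) and that the disk indices match the failure list above:

```shell
# Dry-run sketch for Option D: print, for review, the `pct set` commands that
# would repoint each affected container's rootfs at a different storage pool.
# ASSUMPTION: the pool name "thin1-r630-02" is a placeholder -- confirm it
# with `pvesm status`, and verify each disk index with `pct config <vmid>`.
POOL="thin1-r630-02"
CMDS=""
for spec in 3000:1 3001:1 3002:2 3003:1 3500:1 3501:2 6000:1 6400:1; do
  vmid=${spec%:*}   # container ID
  disk=${spec#*:}   # disk index taken from the failure list above
  CMDS="${CMDS}pct set ${vmid} --rootfs ${POOL}:vm-${vmid}-disk-${disk}
"
done
printf '%s' "$CMDS"   # review these before running any of them on the node
```

Nothing here touches the node; each printed line still has to be checked and run by hand.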
Category 2: Startup Failures (24 containers)
Error Pattern: startup for container 'XXXX' failed
Affected Containers:
- CT 5200
- CT 10000, 10001, 10020, 10030, 10040, 10050, 10060
- CT 10070, 10080, 10090, 10091, 10092
- CT 10100, 10101, 10120, 10130
- CT 10150, 10151
- CT 10200, 10201, 10202, 10210, 10230
Root Cause Analysis: Startup failures can have multiple causes:
- Missing configuration files - Container config deleted or not migrated
- Storage issues - Storage accessible but corrupted or misconfigured
- Network issues - Network configuration problems
- Resource constraints - Insufficient memory/CPU
- Container corruption - Container filesystem issues
- Dependencies - Missing required services or mounts
Diagnostic Steps:
1. Check whether the config files exist:

   ```shell
   ssh root@192.168.11.12 "ls -la /etc/pve/lxc/ | grep -E '5200|10000|10001|10020|10030|10040|10050|10060|10070|10080|10090|10091|10092|10100|10101|10120|10130|10150|10151|10200|10201|10202|10210|10230'"
   ```

2. Check the detailed startup error:

   ```shell
   ssh root@192.168.11.12 "pct start 5200 2>&1"
   ```

3. Check container status and locks:

   ```shell
   ssh root@192.168.11.12 "pct list | grep -E '5200|10000|10001'"
   ```

4. Check system resources:

   ```shell
   ssh root@192.168.11.12 "free -h; df -h"
   ```

5. Check container logs:

   ```shell
   ssh root@192.168.11.12 "journalctl -u pve-container@5200 -n 50 --no-pager"
   ```
Resolution Options:
- Option A: Fix configuration issues (network, storage, etc.)
- Option B: Recreate containers if configs are missing
- Option C: Check and resolve resource constraints
- Option D: Restore from backup if corruption detected
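When pct start only reports a generic failure, a foreground debug start of one container usually surfaces the real cause. A sketch that prints (rather than runs) the lxc-start commands to execute on the node itself, using three representative IDs from the failure list:

```shell
# Print, for review, foreground debug-start commands for a few representative
# failing containers. `lxc-start -F` keeps the container in the foreground and
# `-l DEBUG -o <file>` writes a verbose trace; run them on the node, one at a
# time, and read the log after each attempt.
DEBUG_CMDS=""
for vmid in 5200 10000 10020; do
  DEBUG_CMDS="${DEBUG_CMDS}lxc-start -n ${vmid} -F -l DEBUG -o /tmp/lxc-${vmid}.log
"
done
printf '%s' "$DEBUG_CMDS"
```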
Category 3: Lock Error (1 container)
Error Pattern: CT is locked (create)
Affected Container:
- CT 10232
Root Cause Analysis:
- Container is stuck in "create" state
- Previous creation operation may have been interrupted
- Lock file exists but container creation incomplete
Diagnostic Steps:
1. Check the lock status:

   ```shell
   ssh root@192.168.11.12 "pct list | grep 10232"
   ```

2. Check the container config for the lock entry (LXC locks are recorded in the container's config file; /var/lock/qemu-server holds VM locks, not container locks):

   ```shell
   ssh root@192.168.11.12 "grep lock /etc/pve/lxc/10232.conf"
   ```

3. Check for an unfinished creation task:

   ```shell
   ssh root@192.168.11.12 "grep 10232 /var/log/pve/tasks/active"
   ```
Resolution Options:
- Option A: Clear the lock with the supported command (safer than deleting lock files by hand):

  ```shell
  ssh root@192.168.11.12 "pct unlock 10232"
  ```

- Option B: Complete or cancel the interrupted creation task
- Option C: Delete and recreate the container if the creation never completed
Successfully Started Containers
The following containers started successfully:
- CT 10030, 10040, 10050, 10060, 10070, 10080, 10090, 10091, 10092, 10100, 10101, 10120, 10130, 10150, 10151, 10200, 10201, 10202, 10210, 10230, 10232
Note: Several of these IDs also appear in the failure list above; the bulk job may have reported them as started before they exited, so confirm their current state with pct list.
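The overlap between the failure and success lists can be checked mechanically rather than by eye. A small sketch that compares the two lists exactly as printed in this report:

```shell
# Compare this report's "failed" and "successfully started" CT lists and
# print the IDs that appear in both; those need a manual `pct status` check.
# Both lists are copied verbatim from the sections above.
FAILED="5200 10000 10001 10020 10030 10040 10050 10060 10070 10080 10090 10091 10092 10100 10101 10120 10130 10150 10151 10200 10201 10202 10210 10230"
STARTED="10030 10040 10050 10060 10070 10080 10090 10091 10092 10100 10101 10120 10130 10150 10151 10200 10201 10202 10210 10230 10232"
overlap=""
for id in $STARTED; do
  case " $FAILED " in
    *" $id "*) overlap="$overlap $id" ;;   # word-bounded membership test
  esac
done
echo "Listed as both failed and started:$overlap"
```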
Recommended Actions
Immediate Actions (Priority 1)
1. Run the diagnostic script; it identifies the root cause for each failure:

   ```shell
   ./scripts/diagnose-r630-02-startup-failures.sh
   ```

2. Check storage status:

   ```shell
   ssh root@192.168.11.12 "pvesm status; lvs; vgs"
   ```

3. Check system resources:

   ```shell
   ssh root@192.168.11.12 "free -h; df -h; uptime"
   ```
Short-term Actions (Priority 2)
1. Fix logical volume issues:
   - Identify where the volumes should be, or whether they need to be recreated
   - Update container configs to use the correct storage pools
   - Recreate volumes if the data is not critical
2. Resolve startup failures:
   - Check each container's detailed error message
   - Fix configuration issues
   - Recreate containers if configs are missing
3. Clear the lock on CT 10232:
   - Unlock the container, then retry the creation or delete it
Long-term Actions (Priority 3)
1. Implement monitoring:
   - Set up alerts for container startup failures
   - Monitor storage pool health
   - Track container status changes
2. Documentation:
   - Document container dependencies
   - Create runbooks for common failure scenarios
   - Maintain a container inventory with storage mappings
3. Prevention:
   - Implement pre-startup validation
   - Add storage health checks
   - Create backup procedures for container configs
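Once the individual causes are fixed, a batch retry that records each outcome saves re-running starts by hand. A sketch intended to run on r630-02 itself; it defaults to a dry run (printing the plan only) and covers a representative subset of the failed IDs, which should be extended to the full list:

```shell
# Batch-retry sketch: attempt to start each previously failed container and
# record the outcome. Defaults to a dry run that only prints the plan; set
# DRY_RUN=0 on the node to actually execute. The vmid list below is a
# representative subset of the failures -- extend it as needed.
DRY_RUN="${DRY_RUN:-1}"
LOG="${LOG:-/tmp/r630-02-retry.log}"
RESULTS=""
for vmid in 3000 3001 3002 3003 3500 3501 5200 6000 6400 10232; do
  if [ "$DRY_RUN" = "1" ]; then
    RESULTS="${RESULTS}would run: pct start ${vmid}
"
  else
    if pct start "$vmid" >>"$LOG" 2>&1; then
      RESULTS="${RESULTS}${vmid}: started
"
    else
      RESULTS="${RESULTS}${vmid}: FAILED (see ${LOG})
"
    fi
  fi
done
printf '%s' "$RESULTS"
```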
Diagnostic Commands Reference
Check Container Status
```shell
ssh root@192.168.11.12 "pct list | grep -E '3000|3001|3002|3003|3500|3501|5200|6000|6400|10000|10001|10020|10030|10040|10050|10060|10070|10080|10090|10091|10092|10100|10101|10120|10130|10150|10151|10200|10201|10202|10210|10230|10232'"
```
Check Storage Configuration
```shell
ssh root@192.168.11.12 "pvesm status"
ssh root@192.168.11.12 "lvs | grep -E 'vm-3000|vm-3001|vm-3002|vm-3003|vm-3500|vm-3501|vm-6000|vm-6400'"
```
Check Container Configs
```shell
ssh root@192.168.11.12 "for vmid in 3000 3001 3002 3003 3500 3501 5200 6000 6400; do echo \"=== CT \$vmid ===\"; pct config \$vmid 2>&1 | head -5; done"
```
Check Detailed Errors
```shell
ssh root@192.168.11.12 "for vmid in 3000 5200 10000 10232; do echo \"=== CT \$vmid ===\"; pct start \$vmid 2>&1; echo; done"
```
Next Steps
- Run the diagnostic script to gather detailed information
- Review diagnostic output and categorize failures
- Execute fix script for automated resolution where possible
- Manually resolve remaining issues based on diagnostic findings
- Verify all containers can start successfully
- Document resolution steps for future reference
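For the verification step, the containers that are still down can be extracted directly from pct list output. A sketch against an illustrative sample (the rows shown are made up, not real output from the node; on r630-02, replace SAMPLE with the live command output):

```shell
# Parse `pct list`-style output and print only the containers that are not
# running. SAMPLE is illustrative data; on the node, use instead:
#   SAMPLE=$(pct list)
SAMPLE="VMID       Status     Lock         Name
3000       stopped                 chain-node-a
5200       stopped                 svc-5200
10030      running                 web-10030
10232      stopped    create       new-ct"
# Skip the header row, then print the VMID of every non-running container.
STOPPED=$(printf '%s\n' "$SAMPLE" | awk 'NR > 1 && $2 != "running" { print $1 }')
echo "Not running:"
echo "$STOPPED"
```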