R630-02 Container Startup Failures Analysis
Date: January 19, 2026
Node: r630-02 (192.168.11.12)
Status: ⚠️ CRITICAL - 33 CONTAINERS FAILED TO START
Executive Summary
A bulk container startup operation on r630-02 left 33 of the attempted containers failing to start. The failures fall into three distinct categories:
- Logical Volume Missing (8 containers) - Storage volumes don't exist
- Startup Failures (24 containers) - Containers fail to start for unknown reasons
- Lock Error (1 container) - Container is locked in "create" state
Total Impact: 33 containers unable to start, affecting multiple services.
Failure Breakdown
Category 1: Missing Logical Volumes (8 containers)
Error Pattern: no such logical volume pve/vm-XXXX-disk-X
Affected Containers:
- CT 3000: pve/vm-3000-disk-1
- CT 3001: pve/vm-3001-disk-1
- CT 3002: pve/vm-3002-disk-2
- CT 3003: pve/vm-3003-disk-1
- CT 3500: pve/vm-3500-disk-1
- CT 3501: pve/vm-3501-disk-2
- CT 6000: pve/vm-6000-disk-1
- CT 6400: pve/vm-6400-disk-1
Root Cause Analysis:
- Storage volumes were likely deleted, migrated, or never created
- Containers may have been migrated to another node but configs not updated
- Storage pool may have been recreated/reset, losing volume metadata
- Containers may reference the wrong storage pool (e.g., thin1 vs thin1-r630-02)
Diagnostic Steps:
1. Check whether the volumes exist on another storage pool:

   ```shell
   ssh root@192.168.11.12 "lvs | grep -E 'vm-3000|vm-3001|vm-3002|vm-3003|vm-3500|vm-3501|vm-6000|vm-6400'"
   ```

2. Check the container storage configuration:

   ```shell
   ssh root@192.168.11.12 "pct config 3000 | grep rootfs"
   ```

3. Check the available storage pools:

   ```shell
   ssh root@192.168.11.12 "pvesm status"
   ```
Resolution Options:
- Option A: Recreate missing volumes if data is not critical
- Option B: Migrate containers to existing storage pool
- Option C: Restore volumes from backup if available
- Option D: Update container configs to point to correct storage
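Option D can be scripted as a dry run that only prints the repointing commands for review. A minimal sketch, assuming the correct pool is named thin1-r630-02 (a placeholder; confirm the real pool with pvesm status) and that the disk indices match the failure list above:

```shell
# Dry-run sketch for Option D: print, for review, the `pct set` commands that
# would repoint each affected container's rootfs at a different storage pool.
# ASSUMPTION: the pool name "thin1-r630-02" is a placeholder -- confirm it
# with `pvesm status`, and verify each disk index with `pct config <vmid>`.
POOL="thin1-r630-02"
CMDS=""
for spec in 3000:1 3001:1 3002:2 3003:1 3500:1 3501:2 6000:1 6400:1; do
  vmid=${spec%:*}   # container ID
  disk=${spec#*:}   # disk index taken from the failure list above
  CMDS="${CMDS}pct set ${vmid} --rootfs ${POOL}:vm-${vmid}-disk-${disk}
"
done
printf '%s' "$CMDS"   # review these before running any of them on the node
```

Nothing here touches the node; each printed line still has to be checked and run by hand.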
Category 2: Startup Failures (24 containers)
Error Pattern: startup for container 'XXXX' failed
Affected Containers:
- CT 5200
- CT 10000, 10001, 10020, 10030, 10040, 10050, 10060
- CT 10070, 10080, 10090, 10091, 10092
- CT 10100, 10101, 10120, 10130
- CT 10150, 10151
- CT 10200, 10201, 10202, 10210, 10230
Root Cause Analysis: Startup failures can have multiple causes:
- Missing configuration files - Container config deleted or not migrated
- Storage issues - Storage accessible but corrupted or misconfigured
- Network issues - Network configuration problems
- Resource constraints - Insufficient memory/CPU
- Container corruption - Container filesystem issues
- Dependencies - Missing required services or mounts
Diagnostic Steps:
1. Check whether the config files exist:

   ```shell
   ssh root@192.168.11.12 "ls -la /etc/pve/lxc/ | grep -E '5200|10000|10001|10020|10030|10040|10050|10060|10070|10080|10090|10091|10092|10100|10101|10120|10130|10150|10151|10200|10201|10202|10210|10230'"
   ```

2. Check the detailed startup error:

   ```shell
   ssh root@192.168.11.12 "pct start 5200 2>&1"
   ```

3. Check container status and locks:

   ```shell
   ssh root@192.168.11.12 "pct list | grep -E '5200|10000|10001'"
   ```

4. Check system resources:

   ```shell
   ssh root@192.168.11.12 "free -h; df -h"
   ```

5. Check container logs:

   ```shell
   ssh root@192.168.11.12 "journalctl -u pve-container@5200 -n 50 --no-pager"
   ```
Resolution Options:
- Option A: Fix configuration issues (network, storage, etc.)
- Option B: Recreate containers if configs are missing
- Option C: Check and resolve resource constraints
- Option D: Restore from backup if corruption detected
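When pct start only reports a generic failure, a foreground debug start of one container usually surfaces the real cause. A sketch that prints (rather than runs) the lxc-start commands to execute on the node itself, using three representative IDs from the failure list:

```shell
# Print, for review, foreground debug-start commands for a few representative
# failing containers. `lxc-start -F` keeps the container in the foreground and
# `-l DEBUG -o <file>` writes a verbose trace; run them on the node, one at a
# time, and read the log after each attempt.
DEBUG_CMDS=""
for vmid in 5200 10000 10020; do
  DEBUG_CMDS="${DEBUG_CMDS}lxc-start -n ${vmid} -F -l DEBUG -o /tmp/lxc-${vmid}.log
"
done
printf '%s' "$DEBUG_CMDS"
```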
Category 3: Lock Error (1 container)
Error Pattern: CT is locked (create)
Affected Container:
- CT 10232
Root Cause Analysis:
- Container is stuck in "create" state
- Previous creation operation may have been interrupted
- Lock file exists but container creation incomplete
Diagnostic Steps:
1. Check the lock status:

   ```shell
   ssh root@192.168.11.12 "pct list | grep 10232"
   ```

2. Check the container config for the lock entry (LXC locks are recorded in the container's config file; /var/lock/qemu-server holds VM locks, not container locks):

   ```shell
   ssh root@192.168.11.12 "grep lock /etc/pve/lxc/10232.conf"
   ```

3. Check for an unfinished creation task:

   ```shell
   ssh root@192.168.11.12 "grep 10232 /var/log/pve/tasks/active"
   ```
Resolution Options:
- Option A: Clear the lock with the supported command (safer than deleting lock files by hand):

  ```shell
  ssh root@192.168.11.12 "pct unlock 10232"
  ```

- Option B: Complete or cancel the interrupted creation task
- Option C: Delete and recreate the container if the creation never completed
Successfully Started Containers
The following containers started successfully:
- CT 10030, 10040, 10050, 10060, 10070, 10080, 10090, 10091, 10092, 10100, 10101, 10120, 10130, 10150, 10151, 10200, 10201, 10202, 10210, 10230, 10232
Note: Several of these IDs also appear in the failure list above; the bulk job may have reported them as started before they exited, so confirm their current state with pct list.
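The overlap between the failure and success lists can be checked mechanically rather than by eye. A small sketch that compares the two lists exactly as printed in this report:

```shell
# Compare this report's "failed" and "successfully started" CT lists and
# print the IDs that appear in both; those need a manual `pct status` check.
# Both lists are copied verbatim from the sections above.
FAILED="5200 10000 10001 10020 10030 10040 10050 10060 10070 10080 10090 10091 10092 10100 10101 10120 10130 10150 10151 10200 10201 10202 10210 10230"
STARTED="10030 10040 10050 10060 10070 10080 10090 10091 10092 10100 10101 10120 10130 10150 10151 10200 10201 10202 10210 10230 10232"
overlap=""
for id in $STARTED; do
  case " $FAILED " in
    *" $id "*) overlap="$overlap $id" ;;   # word-bounded membership test
  esac
done
echo "Listed as both failed and started:$overlap"
```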
Recommended Actions
Immediate Actions (Priority 1)
1. Run the diagnostic script; it identifies the root cause for each failure:

   ```shell
   ./scripts/diagnose-r630-02-startup-failures.sh
   ```

2. Check storage status:

   ```shell
   ssh root@192.168.11.12 "pvesm status; lvs; vgs"
   ```

3. Check system resources:

   ```shell
   ssh root@192.168.11.12 "free -h; df -h; uptime"
   ```
Short-term Actions (Priority 2)
1. Fix logical volume issues:
   - Identify where the volumes should be, or whether they need to be recreated
   - Update container configs to use the correct storage pools
   - Recreate volumes if the data is not critical
2. Resolve startup failures:
   - Check each container's detailed error message
   - Fix configuration issues
   - Recreate containers if configs are missing
3. Clear the lock on CT 10232:
   - Unlock the container, then retry the creation or delete it
Long-term Actions (Priority 3)
1. Implement monitoring:
   - Set up alerts for container startup failures
   - Monitor storage pool health
   - Track container status changes
2. Documentation:
   - Document container dependencies
   - Create runbooks for common failure scenarios
   - Maintain a container inventory with storage mappings
3. Prevention:
   - Implement pre-startup validation
   - Add storage health checks
   - Create backup procedures for container configs
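Once the individual causes are fixed, a batch retry that records each outcome saves re-running starts by hand. A sketch intended to run on r630-02 itself; it defaults to a dry run (printing the plan only) and covers a representative subset of the failed IDs, which should be extended to the full list:

```shell
# Batch-retry sketch: attempt to start each previously failed container and
# record the outcome. Defaults to a dry run that only prints the plan; set
# DRY_RUN=0 on the node to actually execute. The vmid list below is a
# representative subset of the failures -- extend it as needed.
DRY_RUN="${DRY_RUN:-1}"
LOG="${LOG:-/tmp/r630-02-retry.log}"
RESULTS=""
for vmid in 3000 3001 3002 3003 3500 3501 5200 6000 6400 10232; do
  if [ "$DRY_RUN" = "1" ]; then
    RESULTS="${RESULTS}would run: pct start ${vmid}
"
  else
    if pct start "$vmid" >>"$LOG" 2>&1; then
      RESULTS="${RESULTS}${vmid}: started
"
    else
      RESULTS="${RESULTS}${vmid}: FAILED (see ${LOG})
"
    fi
  fi
done
printf '%s' "$RESULTS"
```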
Diagnostic Commands Reference
Check Container Status
```shell
ssh root@192.168.11.12 "pct list | grep -E '3000|3001|3002|3003|3500|3501|5200|6000|6400|10000|10001|10020|10030|10040|10050|10060|10070|10080|10090|10091|10092|10100|10101|10120|10130|10150|10151|10200|10201|10202|10210|10230|10232'"
```
Check Storage Configuration
```shell
ssh root@192.168.11.12 "pvesm status"
ssh root@192.168.11.12 "lvs | grep -E 'vm-3000|vm-3001|vm-3002|vm-3003|vm-3500|vm-3501|vm-6000|vm-6400'"
```
Check Container Configs
```shell
ssh root@192.168.11.12 "for vmid in 3000 3001 3002 3003 3500 3501 5200 6000 6400; do echo \"=== CT \$vmid ===\"; pct config \$vmid 2>&1 | head -5; done"
```
Check Detailed Errors
```shell
ssh root@192.168.11.12 "for vmid in 3000 5200 10000 10232; do echo \"=== CT \$vmid ===\"; pct start \$vmid 2>&1; echo; done"
```
Next Steps
- Run the diagnostic script to gather detailed information
- Review diagnostic output and categorize failures
- Execute fix script for automated resolution where possible
- Manually resolve remaining issues based on diagnostic findings
- Verify all containers can start successfully
- Document resolution steps for future reference
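For the verification step, the containers that are still down can be extracted directly from pct list output. A sketch against an illustrative sample (the rows shown are made up, not real output from the node; on r630-02, replace SAMPLE with the live command output):

```shell
# Parse `pct list`-style output and print only the containers that are not
# running. SAMPLE is illustrative data; on the node, use instead:
#   SAMPLE=$(pct list)
SAMPLE="VMID       Status     Lock         Name
3000       stopped                 chain-node-a
5200       stopped                 svc-5200
10030      running                 web-10030
10232      stopped    create       new-ct"
# Skip the header row, then print the VMID of every non-running container.
STOPPED=$(printf '%s\n' "$SAMPLE" | awk 'NR > 1 && $2 != "running" { print $1 }')
echo "Not running:"
echo "$STOPPED"
```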