# R630-02 Container Startup Failures - Resolution
**Date:** January 19, 2026

**Status:** ✅ **ROOT CAUSE IDENTIFIED - CONTAINERS ON WRONG NODE**

---

## Critical Finding

**All 33 containers that failed to start on r630-02 do not exist on that node.** They have been migrated to **pve2 (192.168.11.11)** and are currently stopped there.

---

## Root Cause

The startup script attempted to start the containers on r630-02, but:

1. **Container configuration files are missing** on r630-02 (`/etc/pve/lxc/XXXX.conf` does not exist)
2. **Logical volumes are missing** on r630-02 (no `vm-XXXX-disk-X` volumes)
3. **All containers exist on pve2** and are in the "stopped" state
|
**Conclusion:** The containers were migrated from r630-02 to pve2, but the startup operation was attempted on the wrong node.
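The first two checks can be repeated on any node with a small helper. This is a minimal sketch. It assumes the helper runs as root on the node being checked, that `/etc/pve/lxc` is that node's view of container configs, and that disks follow the `vm-<VMID>-disk-<N>` naming above. The `PVE_LXC_DIR` override is hypothetical and exists only so the function can be tested off-cluster.

```bash
#!/usr/bin/env bash
# Sketch: report whether a CT's config file and LVM disk volume exist on THIS node.
# PVE_LXC_DIR is a hypothetical override (defaults to /etc/pve/lxc).
check_ct_local() {
  local vmid="$1"
  local conf="${PVE_LXC_DIR:-/etc/pve/lxc}/${vmid}.conf"

  if [[ -f "$conf" ]]; then
    echo "CT ${vmid}: config present (${conf})"
  else
    echo "CT ${vmid}: config missing (${conf})"
  fi

  # With these flags, lvs prints one indented logical volume name per line.
  # The anchored pattern keeps CT 3000 from matching vm-30000-disk-N.
  if lvs --noheadings -o lv_name 2>/dev/null |
     grep -qE "^[[:space:]]*vm-${vmid}-disk-[0-9]+[[:space:]]*$"; then
    echo "CT ${vmid}: disk volume present"
  else
    echo "CT ${vmid}: disk volume missing"
  fi
}
```

For example, `for id in 3000 3001 3002 3003; do check_ct_local "$id"; done` run on r630-02 should report both items as missing, consistent with the findings above.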
|
---

## Container Locations

### On pve2 (192.168.11.11) - All 33 containers found:

#### Logical Volume Error Containers (8):

- CT 3000: `ml110` - stopped
- CT 3001: `ml110` - stopped
- CT 3002: `ml110` - stopped
- CT 3003: `ml110` - stopped
- CT 3500: `oracle-publisher-1` - stopped
- CT 3501: `ccip-monitor-1` - stopped
- CT 6000: `fabric-1` - stopped
- CT 6400: `indy-1` - stopped

#### Startup Failure Containers (24):

- CT 5200: `cacti-1` - stopped
- CT 10000-10092: Order management services (12 containers) - stopped
- CT 10100-10151: DBIS Core services (6 containers) - stopped
- CT 10200-10230: Order monitoring services (5 containers) - stopped

#### Lock Error Container (1):
- CT 10232: `CT10232` - stopped, locked in "create" state
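Before starting or clearing CT 10232, check which lock is recorded. The following is a minimal sketch for running as root on pve2. It relies only on `pct config` printing a `lock: <name>` line for a locked container.

```bash
#!/usr/bin/env bash
# Sketch: show the lock recorded in a container's config, if any.
# Run on the node that holds the CT (pve2 for CT 10232).
inspect_ct_lock() {
  local vmid="$1" lock_line
  # grep exits non-zero when no lock line exists; "|| true" keeps that from failing the function.
  lock_line=$(pct config "$vmid" | grep -E '^lock:' || true)
  if [[ -n "$lock_line" ]]; then
    echo "CT ${vmid}: ${lock_line}"
  else
    echo "CT ${vmid}: no lock set"
  fi
}
```

A leftover "create" lock suggests that the creation task did not complete. Review the container's config and disk before running `pct unlock 10232`.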
|
---

## Resolution

### Option 1: Start Containers on pve2 (Recommended)

Since all 33 containers exist on pve2, start them there (CT 10232 will not start until its "create" lock is resolved):

```bash
./scripts/start-containers-on-pve2.sh
```
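This report does not show the contents of `start-containers-on-pve2.sh`. A hypothetical equivalent is a loop over `pct start` that continues past failures and reports which containers did not start. The sketch below assumes it runs as root on pve2 and takes VMIDs as arguments.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual scripts/start-containers-on-pve2.sh.
# Run as root on pve2; pass the VMIDs to start as arguments.
start_cts() {
  local vmid
  local failed=()
  for vmid in "$@"; do
    if pct start "$vmid"; then
      echo "CT ${vmid}: started"
    else
      echo "CT ${vmid}: start failed" >&2
      failed+=("$vmid")
    fi
  done
  # Non-zero exit if any container failed, so callers can detect partial success.
  if (( ${#failed[@]} > 0 )); then
    echo "Failed: ${failed[*]}" >&2
    return 1
  fi
  return 0
}
```

Leave CT 10232 out of the list until its lock is dealt with. The actual service dependencies are not documented here, so starting the database and cache containers (for example 10000, 10001 and 10020) before the services that likely depend on them is only a reasonable default.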
|
### Option 2: Migrate Containers Back to r630-02

If the containers should run on r630-02, migrate them back:

```bash
# Run on pve2 (the node that currently holds the container), once per container
pct migrate <VMID> r630-02 --target-storage thin1-r630-02 --restart
```
|
**Note:** This requires:

- Enough free space on the target storage (`thin1-r630-02`) on r630-02
- Network connectivity between pve2 and r630-02
- Copying each container's disk data from pve2 to r630-02
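To loop the migration command over several containers, check free space first and stop at the first failure. The sketch below makes these assumptions:

- It runs as root on pve2, because `pct migrate` acts on containers local to the node it runs on.
- The target node is named `r630-02`.
- `thin1-r630-02` is the intended target storage.

It prints the target storage's usage through the `/nodes/{node}/storage/{storage}/status` API path.

```bash
#!/usr/bin/env bash
# Sketch: migrate stopped CTs from pve2 to r630-02, stopping at the first failure.
# Assumes root on pve2; node and storage names are taken from this report.
TARGET_NODE="r630-02"
TARGET_STORAGE="thin1-r630-02"

migrate_cts() {
  local vmid
  # Show total/used/available space on the target storage before copying any data.
  pvesh get "/nodes/${TARGET_NODE}/storage/${TARGET_STORAGE}/status" || return 1
  for vmid in "$@"; do
    echo "Migrating CT ${vmid} -> ${TARGET_NODE} (${TARGET_STORAGE})"
    if ! pct migrate "$vmid" "$TARGET_NODE" --target-storage "$TARGET_STORAGE"; then
      echo "CT ${vmid}: migration failed, stopping" >&2
      return 1
    fi
  done
}
```

Stopping on the first failure keeps a storage or network problem from being repeated across all 33 containers.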
|
---

## Next Steps

1. ✅ **Diagnostic Complete** - Identified that the containers are on pve2
2. ⏳ **Start Containers on pve2** - Use the start script
3. ⏳ **Verify Services** - Check that services start correctly
4. ⏳ **Update Documentation** - Document actual container locations
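A first pass at step 3 is confirming that each container reports as running. This sketch relies on `pct status` printing `status: running` and must run on the node that holds the containers. It checks container state only, not whether the applications inside are healthy.

```bash
#!/usr/bin/env bash
# Sketch: confirm that each given CT reports "status: running".
# Checks container state only; application health checks are out of scope.
verify_running() {
  local vmid state not_running=0
  for vmid in "$@"; do
    state=$(pct status "$vmid" 2>/dev/null)
    if [[ "$state" == "status: running" ]]; then
      echo "CT ${vmid}: running"
    else
      echo "CT ${vmid}: NOT running (${state:-no status})"
      not_running=$((not_running + 1))
    fi
  done
  # Succeed only if every container is running.
  [[ $not_running -eq 0 ]]
}
```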
|
---

## Container Inventory on pve2

All 33 containers are present on pve2:

| VMID | Hostname | Status |
|------|----------|--------|
| 3000 | ml110 | stopped |
| 3001 | ml110 | stopped |
| 3002 | ml110 | stopped |
| 3003 | ml110 | stopped |
| 3500 | oracle-publisher-1 | stopped |
| 3501 | ccip-monitor-1 | stopped |
| 5200 | cacti-1 | stopped |
| 6000 | fabric-1 | stopped |
| 6400 | indy-1 | stopped |
| 10000 | order-postgres-primary | stopped |
| 10001 | order-postgres-replica | stopped |
| 10020 | order-redis | stopped |
| 10030 | order-identity | stopped |
| 10040 | order-intake | stopped |
| 10050 | order-finance | stopped |
| 10060 | order-dataroom | stopped |
| 10070 | order-legal | stopped |
| 10080 | order-eresidency | stopped |
| 10090 | order-portal-public | stopped |
| 10091 | order-portal-internal | stopped |
| 10092 | order-mcp-legal | stopped |
| 10100 | dbis-postgres-primary | stopped |
| 10101 | dbis-postgres-replica-1 | stopped |
| 10120 | dbis-redis | stopped |
| 10130 | dbis-frontend | stopped |
| 10150 | dbis-api-primary | stopped |
| 10151 | dbis-api-secondary | stopped |
| 10200 | order-prometheus | stopped |
| 10201 | order-grafana | stopped |
| 10202 | order-opensearch | stopped |
| 10210 | order-haproxy | stopped |
| 10230 | order-vault | stopped |
| 10232 | CT10232 | stopped (locked) |
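Container locations can be read from the cluster file system, which helps keep this inventory current (step 4 above). `/etc/pve/nodes/<node>/lxc/<VMID>.conf` exists for every container in the cluster. By contrast, `/etc/pve/lxc` shows only the local node's containers, which is why the configs looked missing from r630-02. The following is a minimal sketch for any cluster node; `PVE_NODES_DIR` is a hypothetical override used only for testing.

```bash
#!/usr/bin/env bash
# Sketch: print "<VMID><TAB><node>" for every LXC container in the cluster.
# PVE_NODES_DIR is a hypothetical override (defaults to /etc/pve/nodes).
list_ct_locations() {
  local root="${PVE_NODES_DIR:-/etc/pve/nodes}" conf node vmid
  for conf in "$root"/*/lxc/*.conf; do
    [[ -e "$conf" ]] || continue   # the glob matched nothing
    node=$(basename "$(dirname "$(dirname "$conf")")")
    vmid=$(basename "$conf" .conf)
    printf '%s\t%s\n' "$vmid" "$node"
  done | sort -n
}
```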
|
---

## Action Required

**Immediate:** Start the containers on pve2, where they actually exist, not on r630-02.
|
**Future:** Update startup scripts to check container location before attempting to start, or migrate containers to intended nodes.
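One way to make a startup script location-aware is to look up the owning node first and then start the container through the cluster API, which works from any node. The sketch below makes these assumptions:

- It runs as root on a cluster node.
- `/etc/pve/nodes/<node>/lxc/<VMID>.conf` identifies the node that owns each container.
- `PVE_NODES_DIR` is a hypothetical override used only for testing.

`pvesh create /nodes/{node}/lxc/{vmid}/status/start` is the API call for starting a container on a given node.

```bash
#!/usr/bin/env bash
# Sketch: start a CT on whichever node owns it instead of assuming a node.
# PVE_NODES_DIR is a hypothetical override (defaults to /etc/pve/nodes).
find_ct_node() {
  local root="${PVE_NODES_DIR:-/etc/pve/nodes}" conf
  for conf in "$root"/*/lxc/"$1".conf; do
    [[ -e "$conf" ]] || return 1   # no node owns this VMID
    basename "$(dirname "$(dirname "$conf")")"
    return 0
  done
}

start_ct_anywhere() {
  local vmid="$1" node
  node=$(find_ct_node "$vmid") || { echo "CT ${vmid}: not found on any node" >&2; return 1; }
  echo "CT ${vmid}: starting on ${node}"
  pvesh create "/nodes/${node}/lxc/${vmid}/status/start"
}
```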