# Deployment Strategy Recommendation

**Date**: $(date)
**Proxmox Host**: ml110 (192.168.11.10)

## Executive Summary

Based on the current status of both deployments, the **recommended strategy** is to:

✅ **Keep LXC Containers (1000-2502) Active**
❌ **Shut down VM 9000 (temporary VM)**

---

## Current Status Summary

### LXC Containers (1000-2502)
- **Status**: ✅ 11 out of 12 containers have active services
- **Resources**: 104GB RAM, 40 CPU cores, ~1.2TB disk
- **Readiness**: Production-ready deployment
- **Issue**: VMID 1503 (the fourth sentry) has no active service; its service file needs attention

### VM 9000 (Temporary VM)
- **Status**: ⚠️ Running, but unreachable over the network
- **Resources**: 32GB RAM, 6 CPU cores, 1TB disk
- **Readiness**: Cannot verify (network issue prevents access)
- **Issue**: SSH/ping not accessible, QEMU guest agent not running

---

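## Recommendation: Keep LXC, Shut Down VM 9000

### Primary Recommendation

**Action**: Shut down VM 9000

**Command**:
```bash
qm stop 9000
```

Note that `qm stop` halts the VM immediately, without a clean guest shutdown. Step 3 under Implementation Steps tries a graceful `qm shutdown` first.
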
### Reasoning

#### ✅ Advantages of Keeping LXC Containers

1. **Production Ready**
   - Properly configured LXC containers
   - 11 out of 12 services active and running
   - Individual resource allocation per node

2. **Better Architecture**
   - Resource isolation per node
   - Independent scaling capability
   - Better security boundaries
   - Individual node management

3. **Service Status**
   - Validators: 5/5 services started
   - Sentries: 3/4 services active (1 needs minor fix)
   - RPC Nodes: 3/3 services active

4. **Resource Efficiency**
   - Dedicated resources per node
   - No resource contention
   - Better performance isolation

#### ❌ Reasons to Shut Down VM 9000

1. **Network Connectivity Issues**
   - SSH not accessible
   - Ping fails (destination unreachable)
   - QEMU guest agent not running
   - Cannot verify Docker container status

2. **Resource Savings**
   - Free 32GB RAM
   - Free 6 CPU cores
   - Reduce total allocated RAM from 136GB to 104GB

3. **Temporary Deployment**
   - VM 9000 is intended as a temporary/testing deployment
   - LXC containers are the production target
   - VM 9000 has served its purpose (if it was used for testing)

4. **Maintenance Overhead**
   - Network issue requires console access to troubleshoot
   - Additional resource consumption for uncertain benefit
   - Cannot verify whether services are actually running

---

## Alternative: Fix VM 9000 Network

If VM 9000 is needed for specific testing purposes, you would need to:

1. **Access VM Console**
   ```bash
   # Via Proxmox web UI: https://192.168.11.10:8006 -> VM 9000 -> Console
   # Or try: qm terminal 9000 (requires a serial device configured on the VM)
   ```

2. **Verify Cloud-init Completion**
   - Check: `cat /var/log/cloud-init-output.log`
   - Verify network configuration
   - Check SSH service status

3. **Fix Network Configuration**
   - Verify interface configuration
   - Restart network service
   - Verify routes and gateway

4. **Verify Docker Containers**
   ```bash
   # Once SSH is accessible:
   ssh root@192.168.11.90
   docker ps
   cd /opt/besu && docker compose ps
   ```

**However**, this requires significant troubleshooting time and may not be necessary if LXC containers are already working.

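For the interface, route, and gateway checks in step 3, a minimal sketch to run from the VM console follows. It assumes the guest has iproute2 and iputils `ping` installed, which this report does not confirm. The function names `default_gateway` and `check_network` are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: intended to be run inside VM 9000 from its console.

# Print the default gateway address, or nothing if no default route
# with a "via" hop exists.
default_gateway() {
  ip route show default |
    awk '/^default/ { for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Show interface state, then test whether the gateway answers.
check_network() {
  ip -br addr                      # brief list of interfaces and addresses
  local gw
  gw="$(default_gateway)"
  if [ -z "$gw" ]; then
    echo "No default route configured"
    return 1
  fi
  echo "Default gateway: $gw"
  ping -c 3 -W 2 "$gw"             # exit status reports reachability
}
```

Calling `check_network` prints the interface list and the gateway it found. A non-zero exit status means either no default route exists or the gateway did not answer.
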
---
## Resource Comparison

### Current State (Both Running)
| Resource | LXC Containers | VM 9000 | Total |
|----------|----------------|---------|-------|
| Memory | 104GB | 32GB | 136GB |
| CPU Cores | 40 | 6 | 46 |
| Disk | ~1.2TB | 1TB | ~2.2TB |

### Recommended State (LXC Only)
| Resource | LXC Containers | VM 9000 | Total |
|----------|----------------|---------|-------|
| Memory | 104GB | 0GB (stopped) | 104GB |
| CPU Cores | 40 | 0 (stopped) | 40 |
| Disk | ~1.2TB | 1TB (provisioned, unused) | ~1.2TB in use |

**Savings**: 32GB RAM and 6 CPU cores freed. Stopping the VM does not release its 1TB disk; that space remains provisioned until the VM or its disk is removed.

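The totals in both tables are simple sums. The sketch below recomputes them from the per-deployment figures so they can be rechecked if allocations change; the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Figures from the tables above (GB of RAM, CPU cores).
lxc_ram_gb=104; vm_ram_gb=32
lxc_cores=40;   vm_cores=6

# Totals with both deployments running.
both_ram_gb=$(( lxc_ram_gb + vm_ram_gb ))
both_cores=$(( lxc_cores + vm_cores ))

echo "RAM:   ${both_ram_gb}GB -> ${lxc_ram_gb}GB (frees ${vm_ram_gb}GB)"
echo "Cores: ${both_cores} -> ${lxc_cores} (frees ${vm_cores})"
```
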
---
## Implementation Steps

### Step 1: Verify LXC Services are Healthy

```bash
# Give services a minute to finish starting
sleep 60

# Check all services
for vmid in 1000 1001 1002 1003 1004; do
  echo "Validator $vmid:"
  pct exec "$vmid" -- systemctl status besu-validator --no-pager | head -3
done

for vmid in 1500 1501 1502; do
  echo "Sentry $vmid:"
  pct exec "$vmid" -- systemctl status besu-sentry --no-pager | head -3
done

for vmid in 2500 2501 2502; do
  echo "RPC $vmid:"
  pct exec "$vmid" -- systemctl status besu-rpc --no-pager | head -3
done
```

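Reading `systemctl status` output by eye gets tedious across eleven containers. As a sketch, `systemctl is-active --quiet` can reduce each check to a pass/fail line. This assumes `pct exec` exits non-zero when the command inside the container fails; the function names `check_service` and `check_tier` are illustrative, and the service names are the ones used in the loops above.

```bash
#!/usr/bin/env bash
# Print one OK/FAIL line for a container; return non-zero on failure.
check_service() {
  local vmid="$1" service="$2"
  if pct exec "$vmid" -- systemctl is-active --quiet "$service"; then
    echo "OK   $vmid $service"
  else
    echo "FAIL $vmid $service"
    return 1
  fi
}

# Check every container in a tier and report how many failed.
check_tier() {
  local service="$1" failed=0 vmid
  shift
  for vmid in "$@"; do
    check_service "$vmid" "$service" || failed=$(( failed + 1 ))
  done
  echo "$service: $failed failed"
  [ "$failed" -eq 0 ]
}

# Run only on the Proxmox host, where pct is available.
if command -v pct >/dev/null 2>&1; then
  check_tier besu-validator 1000 1001 1002 1003 1004
  check_tier besu-sentry 1500 1501 1502
  check_tier besu-rpc 2500 2501 2502
fi
```
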
### Step 2: Fix VMID 1503 Service (if needed)

```bash
# Check whether a besu unit file is installed
pct exec 1503 -- systemctl list-unit-files | grep besu

# If the service file is missing, the installation may need to be re-run
# (check the deployment scripts)
```

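If a unit file exists but the service is not active, the unit definition and its recent log lines are the next things to inspect. A sketch, assuming the unit on 1503 is named `besu-sentry` like the other sentries (this report does not confirm that); `diagnose_unit` is an illustrative name.

```bash
#!/usr/bin/env bash
# Report whether the unit exists, whether it is enabled, and its last log lines.
diagnose_unit() {
  local vmid="$1" service="$2"
  # systemctl cat fails when no unit file is found.
  if ! pct exec "$vmid" -- systemctl cat "$service" >/dev/null 2>&1; then
    echo "$service: unit file not found on $vmid"
    return 1
  fi
  pct exec "$vmid" -- systemctl is-enabled "$service"
  pct exec "$vmid" -- journalctl -u "$service" -n 20 --no-pager
}

# Run only on the Proxmox host, where pct is available.
if command -v pct >/dev/null 2>&1; then
  diagnose_unit 1503 besu-sentry   # assumed unit name
fi
```
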
### Step 3: Shut Down VM 9000

```bash
# Graceful shutdown
qm shutdown 9000

# Wait for shutdown
sleep 30

# Force stop only if the VM is still running
if qm status 9000 | grep -q running; then
  qm stop 9000
fi

# Verify stopped
qm status 9000
```

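Instead of a fixed `sleep 30`, the wait can poll `qm status` until the VM reports `stopped` or a timeout expires. This is a sketch; `wait_for_stopped` and the 60-second limit are illustrative choices, not values taken from this deployment.

```bash
#!/usr/bin/env bash
# Poll once per second until the VM is stopped; fail after the timeout.
wait_for_stopped() {
  local vmid="$1" timeout="${2:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if qm status "$vmid" | grep -q 'status: stopped'; then
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 1
}

# Run only on the Proxmox host, where qm is available.
if command -v qm >/dev/null 2>&1; then
  qm shutdown 9000
  # Fall back to a hard stop only if the graceful shutdown did not finish.
  wait_for_stopped 9000 60 || qm stop 9000
  qm status 9000
fi
```
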
### Step 4: Monitor LXC Deployment

```bash
# Check service logs for errors
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 2500 2501 2502; do
  if [[ $vmid -lt 1500 ]]; then
    service="besu-validator"
  elif [[ $vmid -lt 2500 ]]; then
    service="besu-sentry"
  else
    service="besu-rpc"
  fi
  echo "=== VMID $vmid ($service) ==="
  pct exec "$vmid" -- journalctl -u "$service" --since "5 minutes ago" --no-pager | tail -5
done
```

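The loop above prints the last five log lines whether or not they contain errors. As a sketch, the VMID-to-service mapping can be pulled into a function and the recent lines counted by level. This assumes Besu writes the level as the literal word `ERROR` in its journal lines, which this report does not confirm; `service_for_vmid` and `recent_error_count` are illustrative names.

```bash
#!/usr/bin/env bash
# Map a VMID to its service name, using the same ranges as the loop above.
service_for_vmid() {
  local vmid="$1"
  if [ "$vmid" -lt 1500 ]; then
    echo "besu-validator"
  elif [ "$vmid" -lt 2500 ]; then
    echo "besu-sentry"
  else
    echo "besu-rpc"
  fi
}

# Count log lines from the last five minutes that contain ERROR.
recent_error_count() {
  local vmid="$1" service
  service="$(service_for_vmid "$vmid")"
  pct exec "$vmid" -- journalctl -u "$service" --since "5 minutes ago" --no-pager |
    grep -c 'ERROR'
}

# Run only on the Proxmox host, where pct is available.
if command -v pct >/dev/null 2>&1; then
  for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 2500 2501 2502; do
    echo "VMID $vmid ($(service_for_vmid "$vmid")): $(recent_error_count "$vmid") error lines"
  done
fi
```
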
---
## When to Keep Both Running

Consider keeping both deployments if:

1. **Active Testing/Migration**
   - Testing migration from VM to LXC
   - Comparing performance between deployments
   - Validating data migration process

2. **VM 9000 Network Fixed**
   - Network connectivity restored
   - Docker containers verified running
   - Active use case identified

3. **Sufficient Resources**
   - 136GB+ RAM available
   - 46+ CPU cores available
   - Clear benefit from both deployments

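The resource criterion can be checked on the Proxmox host itself. A sketch, reading total memory from `/proc/meminfo` and the core count from `nproc`; the thresholds are the combined totals from the Resource Comparison tables, and `can_run_both` is an illustrative name. It compares against installed capacity only and ignores host overhead and any CPU overcommit.

```bash
#!/usr/bin/env bash
# True if the given RAM (GB) and core count cover both deployments.
can_run_both() {
  local ram_gb="$1" cores="$2"
  [ "$ram_gb" -ge 136 ] && [ "$cores" -ge 46 ]
}

# MemTotal is reported in kB; convert to whole GiB.
host_ram_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
host_cores=$(nproc)

if can_run_both "$host_ram_gb" "$host_cores"; then
  echo "Host can hold both deployments (${host_ram_gb}GB RAM, ${host_cores} cores)"
else
  echo "Host is too small for both (${host_ram_gb}GB RAM, ${host_cores} cores)"
fi
```
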
---
## Decision Matrix

| Scenario | Recommendation | Action |
|----------|----------------|--------|
| Production deployment needed | Keep LXC, shut down VM | `qm stop 9000` |
| Testing/migration in progress | Keep both (temporarily) | Monitor both |
| VM 9000 network fixed & needed | Keep both | Verify Docker containers |
| Resource constrained | Keep LXC only | `qm stop 9000` |
| Uncertain use case | Keep LXC, shut down VM | `qm stop 9000` |

---

## Summary

**Recommended Action**: `qm stop 9000`

**Expected Outcome**:
- ✅ Free 32GB RAM and 6 CPU cores
- ✅ Focus resources on production LXC deployment
- ✅ Reduce maintenance overhead
- ✅ Simplify deployment management
- ✅ VM 9000 can be restarted later if needed

**Next Steps**:
1. Verify LXC services are healthy
2. Execute `qm stop 9000`
3. Monitor LXC deployment
4. Document final deployment state

---

**Related Documentation**:
- [Next Steps Completed Report](NEXT_STEPS_COMPLETED.md)
- [Current Deployment Status](CURRENT_DEPLOYMENT_STATUS.md)
- [Deployment Comparison](DEPLOYMENT_COMPARISON.md)

---

**Recommendation Generated**: $(date)