Sankofa/docs/archive/status/VM_100_DEPLOYMENT_STATUS.md
defiQUG 7cd7022f6e Update .gitignore, remove package-lock.json, and enhance Cloudflare and Proxmox adapters
- Added lock file exclusions for pnpm in .gitignore.
- Removed obsolete package-lock.json from the api and portal directories.
- Enhanced Cloudflare adapter with additional interfaces for zones and tunnels.
- Improved Proxmox adapter error handling and logging for API requests.
- Updated Proxmox VM parameters with validation rules in the API schema.
- Enhanced documentation for Proxmox VM specifications and examples.
2025-12-12 19:29:01 -08:00


# VM 100 Deployment Status
**Date**: 2025-12-11

**Status**: ⚠️ **STUCK - Provider Code Issue**

---
## Current State
- **VMID**: 101 (assigned by Proxmox)
- **Status**: `stopped`
- **Lock**: `create` (stuck)
- **Age**: ~7 minutes
- **Issue**: Cannot complete configuration due to lock timeout
---
## Problem Identified
### Root Cause
The provider code has a fundamental issue with `importdisk` operations:
1. **VM Created**: Provider creates VM with blank disk
2. **Import Started**: `importdisk` API call starts (holds lock)
3. **Config Update Attempted**: Provider tries to update the config after only a fixed 2-second sleep
4. **Lock Timeout**: Update fails because import is still running
5. **Stuck State**: Lock never releases, VM remains in `lock: create`
### Provider Code Issue
**Location**: `crossplane-provider-proxmox/pkg/proxmox/client.go`

**Problem** (lines 397-402):
```go
if err := c.httpClient.Post(ctx, importPath, importConfig, &importResult); err != nil {
	return nil, errors.Wrapf(err, "failed to import image...")
}
// Wait a moment for import to complete
time.Sleep(2 * time.Second) // ❌ Only waits 2 seconds!
```
**Issue**: The code waits only 2 seconds, but importing a 660 MB image takes 2-5 minutes. The provider then tries to update the config while the import is still running and holding the lock, so the update times out.

---
## Template Format Issue
### vztmpl Templates Cannot Be Used for VMs
**Attempted**: `local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst`

**Problem**:
- `vztmpl` templates are for LXC containers, not QEMU VMs
- Provider code incorrectly tries to use them as VM disks
- Results in invalid disk configuration
### Current Format
**Using**: `local:iso/ubuntu-22.04-cloud.img`

**Behavior**:
- ✅ Correct format for VMs
- ⚠️ Triggers `importdisk` API
- ❌ Provider doesn't wait for completion
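
The provider could reject container templates before making any API call. The sketch below rests on two assumptions. First, image references always take the `storage:content/filename` form used above. Second, `iso`, the content type of the cloud image in this document, is the only one accepted. The helper name `validateVMImage` is hypothetical and is not taken from the provider code.

```go
package main

import (
	"fmt"
	"strings"
)

// validateVMImage checks a Proxmox volume reference of the form
// "storage:content/filename". Hypothetical helper, not provider code.
// It rejects LXC container templates (content type "vztmpl") and, as an
// assumption, accepts only "iso", the content type of the cloud image.
func validateVMImage(ref string) error {
	storage, rest, ok := strings.Cut(ref, ":")
	if !ok || storage == "" {
		return fmt.Errorf("image %q: missing storage prefix", ref)
	}
	content, file, ok := strings.Cut(rest, "/")
	if !ok || file == "" {
		return fmt.Errorf("image %q: expected content/filename after storage", ref)
	}
	switch content {
	case "vztmpl":
		return fmt.Errorf("image %q: vztmpl templates are for LXC containers, not QEMU VMs", ref)
	case "iso":
		return nil
	default:
		return fmt.Errorf("image %q: unsupported content type %q", ref, content)
	}
}

func main() {
	for _, ref := range []string{
		"local:iso/ubuntu-22.04-cloud.img",
		"local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst",
	} {
		if err := validateVMImage(ref); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", ref)
		}
	}
}
```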
---
## Solutions
### Immediate Workaround
1. **Manual VM Creation** (if needed urgently):
```bash
# On Proxmox node
qm create 100 --name vm-100 --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
qm disk import 100 local:iso/ubuntu-22.04-cloud.img local-lvm
# Wait for import to complete (check tasks)
qm set 100 --scsi0 local-lvm:vm-100-disk-0 --boot order=scsi0
qm set 100 --agent 1
qm set 100 --ide2 local-lvm:cloudinit
```
### Long-term Fix
**Provider Code Needs**:
1. **Task Monitoring**: Monitor `importdisk` task status
2. **Wait for Completion**: Poll task until finished
3. **Then Update Config**: Only update after import completes
4. **Better Error Handling**: Proper timeout and retry logic
**Example Fix**:
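```go
// After importdisk call: the response carries the task UPID
taskUPID := extractTaskUPID(importResult)

// Monitor task until complete: 300 polls x 2s = 10 minute timeout
const maxPolls = 300
completed := false
for i := 0; i < maxPolls; i++ {
	taskStatus, err := c.getTaskStatus(ctx, taskUPID)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to get status of task %s", taskUPID)
	}
	if taskStatus.Status == "stopped" {
		// "stopped" only means the task ended; ExitStatus (assumed field) must be "OK"
		if taskStatus.ExitStatus != "OK" {
			return nil, errors.Errorf("importdisk task %s failed: %s", taskUPID, taskStatus.ExitStatus)
		}
		completed = true
		break // Import complete
	}
	time.Sleep(2 * time.Second)
}
if !completed {
	return nil, errors.Errorf("importdisk task %s did not finish within 10 minutes", taskUPID)
}
// Now safe to update config
```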
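
The `extractTaskUPID` and `getTaskStatus` helpers in the example are not defined in this document. When a Proxmox VE API call starts a background task, the JSON response returns the task's UPID string in its `data` field. The task's state can then be read from `GET /nodes/{node}/tasks/{upid}/status`.

The sketch below is one possible shape for these helpers, under two assumptions. First, the raw response body is available. Second, the node name is the second colon-separated field of the UPID. Unlike the call in the example, this version of `extractTaskUPID` also returns an error. `taskStatusPath` is a hypothetical helper, and the UPID in `main` is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// extractTaskUPID pulls the UPID out of a task-starting API response
// such as {"data":"UPID:node:..."} (hypothetical helper).
func extractTaskUPID(body []byte) (string, error) {
	var resp struct {
		Data string `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding task response: %w", err)
	}
	if !strings.HasPrefix(resp.Data, "UPID:") {
		return "", fmt.Errorf("response data %q is not a UPID", resp.Data)
	}
	return resp.Data, nil
}

// taskStatusPath builds the task status endpoint for a UPID, taking the
// node name from the UPID's second colon-separated field.
func taskStatusPath(upid string) (string, error) {
	fields := strings.Split(upid, ":")
	if len(fields) < 2 || fields[1] == "" {
		return "", fmt.Errorf("malformed UPID %q", upid)
	}
	return fmt.Sprintf("/nodes/%s/tasks/%s/status",
		url.PathEscape(fields[1]), url.PathEscape(upid)), nil
}

func main() {
	// Made-up UPID, for illustration only.
	body := []byte(`{"data":"UPID:pve1:0000ABCD:0012EF34:65778899:qmcreate:100:root@pam:"}`)
	upid, err := extractTaskUPID(body)
	if err != nil {
		panic(err)
	}
	path, err := taskStatusPath(upid)
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```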
---
## All Templates Status
### Issue
All 29 templates were updated to use `vztmpl` format, which **will not work** for VMs.
### Required Update
All templates need to be reverted to cloud image format:
```yaml
image: "local:iso/ubuntu-22.04-cloud.img"
```
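
Reverting 29 templates by hand is error-prone. This sketch assumes each template stores its image on a single `image:` line pointing at a `vztmpl` volume, as in the failed attempt above. `revertImageLine` is a hypothetical helper. It rewrites such a line to the cloud image, keeps the indentation, and leaves every other line untouched. A real migration would be safer with a YAML parser than with pattern matching.

```go
package main

import (
	"fmt"
	"regexp"
)

const cloudImage = "local:iso/ubuntu-22.04-cloud.img"

// vztmplImage matches an `image:` line (quoted or not) that points at a
// vztmpl volume, capturing the leading indentation.
var vztmplImage = regexp.MustCompile(`^(\s*)image:\s*"?[^":\s]+:vztmpl/[^"\s]*"?\s*$`)

// revertImageLine returns the rewritten line and whether it changed.
func revertImageLine(line string) (string, bool) {
	m := vztmplImage.FindStringSubmatch(line)
	if m == nil {
		return line, false
	}
	return fmt.Sprintf("%simage: %q", m[1], cloudImage), true
}

func main() {
	for _, line := range []string{
		`    image: "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"`,
		`    image: "local:iso/ubuntu-22.04-cloud.img"`,
	} {
		out, changed := revertImageLine(line)
		fmt.Printf("changed=%v %s\n", changed, out)
	}
}
```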
**However**: Even with the cloud image format, VM creation will keep hitting the lock timeout until the provider code is fixed.

---
## Recommendations
### Short-term
1. ✅ **VM 100**: Using cloud image (will remain stuck until the provider is fixed)
2. ⏳ **All Templates**: Revert to cloud image format
3. ⏳ **Provider Code**: Add task monitoring for `importdisk`
### Long-term
1. **Create QEMU Templates**: Convert VMs to templates for fast cloning
2. **Fix Provider Code**: Proper task monitoring and wait logic
3. **Documentation**: Clear template format requirements
---
## Next Steps
1. **Fix Provider Code**: Add proper `importdisk` task monitoring
2. **Update All Templates**: Revert to cloud image format
3. **Test VM Creation**: Verify fix works
4. **Create QEMU Templates**: For faster future deployments
---
**Status**: ⚠️ **BLOCKED ON PROVIDER CODE FIX**

**Blocking Issue**: The provider does not wait for the `importdisk` task to complete