# VM 100 Deployment Status

**Date**: 2025-12-11
**Status**: ⚠️ **STUCK - Provider Code Issue**

---

## Current State

- **VMID**: 101 (assigned by Proxmox)
- **Status**: `stopped`
- **Lock**: `create` (stuck)
- **Age**: ~7 minutes
- **Issue**: Cannot complete configuration due to lock timeout

---

## Problem Identified

### Root Cause
The provider code has a fundamental issue with `importdisk` operations:

1. **VM Created**: Provider creates VM with blank disk
2. **Import Started**: `importdisk` API call starts (holds lock)
3. **Config Update Attempted**: Provider tries to update config immediately
4. **Lock Timeout**: Update fails because import is still running
5. **Stuck State**: Lock never releases, VM remains in `lock: create`

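To make the stuck state in step 5 observable from code, a caller could inspect the `lock` field that Proxmox includes in a VM's config response while a lock is held. The following is a minimal sketch, not the provider's actual code: `vmLock` and `configResponse` are hypothetical names, and the sample JSON is an illustrative payload shaped like the API's `data` envelope rather than captured output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configResponse mirrors only the part of a VM config response this sketch needs.
// Proxmox wraps API results in a top-level "data" object.
type configResponse struct {
	Data struct {
		Lock string `json:"lock"`
	} `json:"data"`
}

// vmLock (hypothetical helper) returns the lock name, or "" when the VM is unlocked.
func vmLock(body []byte) (string, error) {
	var resp configResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decode VM config: %w", err)
	}
	return resp.Data.Lock, nil
}

func main() {
	// Illustrative payload only; a real response carries many more fields.
	sample := []byte(`{"data":{"name":"vm-100","lock":"create"}}`)
	lock, err := vmLock(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if lock != "" {
		fmt.Printf("VM is locked (%s); skip the config update for now\n", lock)
		return
	}
	fmt.Println("VM is unlocked; config update can proceed")
}
```

A check like this only reports the lock. It does not replace waiting for the import task itself to finish.
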
### Provider Code Issue

**Location**: `crossplane-provider-proxmox/pkg/proxmox/client.go`

**Problem** (Lines 397-402):
```go
if err := c.httpClient.Post(ctx, importPath, importConfig, &importResult); err != nil {
	return nil, errors.Wrapf(err, "failed to import image...")
}

// Wait a moment for import to complete
time.Sleep(2 * time.Second) // ❌ Only waits 2 seconds!
```

**Issue**: The code waits only 2 seconds, but importing a 660 MB image takes 2-5 minutes. The provider then tries to update the config while the import is still running, which causes the lock timeouts.

---

## Template Format Issue

### vztmpl Templates Cannot Be Used for VMs

**Attempted**: `local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst`

**Problem**:
- `vztmpl` templates are for LXC containers, not QEMU VMs
- Provider code incorrectly tries to use them as VM disks
- Results in invalid disk configuration

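One way to catch this mistake early would be to reject `vztmpl` volumes before any VM is created. The sketch below assumes image references follow the `storage:content/file` shape seen in this document. `volumeContentType` and `validateVMImage` are hypothetical helpers, not functions from the provider.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeContentType (hypothetical helper) extracts the content-type segment from
// a volume ID of the form "storage:content/file", e.g. "iso" or "vztmpl".
// It returns "" when the ID has no such segment.
func volumeContentType(volID string) string {
	_, rest, ok := strings.Cut(volID, ":")
	if !ok {
		return ""
	}
	content, _, ok := strings.Cut(rest, "/")
	if !ok {
		return ""
	}
	return content
}

// validateVMImage (hypothetical helper) rejects LXC container templates
// so they are never passed on as QEMU VM disks.
func validateVMImage(volID string) error {
	if volumeContentType(volID) == "vztmpl" {
		return fmt.Errorf("image %q is an LXC container template (vztmpl) and cannot back a QEMU VM disk", volID)
	}
	return nil
}

func main() {
	for _, img := range []string{
		"local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst",
		"local:iso/ubuntu-22.04-cloud.img",
	} {
		if err := validateVMImage(img); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", img)
	}
}
```

Failing at validation time keeps a bad template reference from ever reaching the Proxmox API, which is cheaper to diagnose than an invalid disk configuration.
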
### Current Format

**Using**: `local:iso/ubuntu-22.04-cloud.img`

**Behavior**:
- ✅ Correct format for VMs
- ⚠️ Triggers `importdisk` API
- ❌ Provider doesn't wait for completion

---

## Solutions

### Immediate Workaround

1. **Manual VM Creation** (if needed urgently):
   ```bash
   # On Proxmox node
   qm create 100 --name vm-100 --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
   # qm disk import takes a filesystem path; this assumes the default path of the `local` storage
   qm disk import 100 /var/lib/vz/template/iso/ubuntu-22.04-cloud.img local-lvm
   # Wait for import to complete (check tasks)
   qm set 100 --scsi0 local-lvm:vm-100-disk-0 --boot order=scsi0
   qm set 100 --agent 1
   qm set 100 --ide2 local-lvm:cloudinit
   ```

### Long-term Fix

**Provider Code Needs**:
1. **Task Monitoring**: Monitor `importdisk` task status
2. **Wait for Completion**: Poll task until finished
3. **Then Update Config**: Only update after import completes
4. **Better Error Handling**: Proper timeout and retry logic

**Example Fix**:
```go
// After importdisk call
taskUPID := extractTaskUPID(importResult)

// Monitor task until complete
finished := false
for i := 0; i < 150; i++ { // 150 polls x 2s = 5 minute timeout
	taskStatus, err := c.getTaskStatus(ctx, taskUPID)
	if err != nil {
		return nil, err
	}
	if taskStatus.Status == "stopped" {
		finished = true // Task ended; this alone does not prove it succeeded
		break
	}
	time.Sleep(2 * time.Second)
}
if !finished {
	return nil, errors.Errorf("importdisk task %s did not finish within 5 minutes", taskUPID)
}

// Now safe to update config
```


---

## All Templates Status

### Issue
All 29 templates were updated to use `vztmpl` format, which **will not work** for VMs.

### Required Update
All templates need to be reverted to cloud image format:
```yaml
image: "local:iso/ubuntu-22.04-cloud.img"
```

**However**: This will still hit the lock issue until the provider code is fixed.

---

## Recommendations

### Short-term
1. ✅ **VM 100**: Using cloud image (will remain stuck until provider fix)
2. ⏳ **All Templates**: Revert to cloud image format
3. ⏳ **Provider Code**: Add task monitoring for `importdisk`

### Long-term
1. **Create QEMU Templates**: Convert VMs to templates for fast cloning
2. **Fix Provider Code**: Proper task monitoring and wait logic
3. **Documentation**: Clear template format requirements

---

## Next Steps

1. **Fix Provider Code**: Add proper `importdisk` task monitoring
2. **Update All Templates**: Revert to cloud image format
3. **Test VM Creation**: Verify fix works
4. **Create QEMU Templates**: For faster future deployments

---

**Status**: ⚠️ **BLOCKED ON PROVIDER CODE FIX**

**Blocking Issue**: Provider doesn't wait for `importdisk` task completion