
Provider Code Fix: importdisk Task Monitoring

Date: 2025-12-11
Status: IMPLEMENTED


Problem

The provider code was trying to update VM configuration immediately after starting the importdisk operation, without waiting for it to complete. This caused:

  • Lock timeouts: VM locked during import, config updates failed
  • Stuck VMs: VMs remained in lock: create state indefinitely
  • Failed deployments: VM creation never completed

Root Cause

Location: crossplane-provider-proxmox/pkg/proxmox/client.go (lines 397-402)

Original Code:

if err := c.httpClient.Post(ctx, importPath, importConfig, &importResult); err != nil {
    return nil, errors.Wrapf(err, "failed to import image...")
}

// Wait a moment for import to complete
time.Sleep(2 * time.Second)  // ❌ Only 2 seconds!

Issue:

  • importdisk for a 660 MB image takes 2-5 minutes
  • The code waited only 2 seconds
  • It then tried to update the config while the import was still running
  • Proxmox locks the VM during the import, so the config update failed

Solution

Implementation

Added proper task monitoring that:

  1. Extracts UPID from importdisk response
  2. Monitors task status via Proxmox API
  3. Waits for completion before proceeding
  4. Handles errors and timeouts gracefully

Code Changes

File: crossplane-provider-proxmox/pkg/proxmox/client.go

Lines: 401-464

Key Features:

  • Extracts task UPID from response
  • Monitors task status every 3 seconds
  • Maximum wait time: 10 minutes
  • Checks exit status for errors
  • Context cancellation support
  • Fallback for missing UPID

Implementation Details

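// Extract the task UPID from the importdisk response
taskUPID := strings.TrimSpace(importResult)

// Poll the task until it stops, the deadline passes, or ctx is cancelled
maxWaitTime := 10 * time.Minute
pollInterval := 3 * time.Second
startTime := time.Now()
completed := false

for time.Since(startTime) < maxWaitTime {
    var taskStatus struct {
        Status     string `json:"status"`
        ExitStatus string `json:"exitstatus,omitempty"`
    }
    taskStatusPath := fmt.Sprintf("/nodes/%s/tasks/%s/status", spec.Node, taskUPID)

    // A failed status request is treated as transient: fall through to the
    // wait below instead of retrying in a tight loop
    err := c.httpClient.Get(ctx, taskStatusPath, &taskStatus)
    if err == nil && taskStatus.Status == "stopped" {
        // An empty exit status is tolerated; anything other than "OK" is a failure
        if taskStatus.ExitStatus != "OK" && taskStatus.ExitStatus != "" {
            return nil, errors.Errorf("importdisk task failed: %s", taskStatus.ExitStatus)
        }
        completed = true
        break
    }

    // Wait before the next check, returning early if the context is cancelled
    select {
    case <-ctx.Done():
        return nil, ctx.Err()
    case <-time.After(pollInterval):
    }
}

// Leaving the loop without a "stopped" status means the deadline passed
if !completed {
    return nil, errors.Errorf("importdisk task %s did not finish within %s", taskUPID, maxWaitTime)
}

// Now safe to update config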
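
The fragment above depends on the provider's HTTP client, so it cannot be run in isolation. The following is a minimal, self-contained sketch of the same polling pattern, not the provider's actual code. The names taskState, statusFunc, waitForTask, and demo are hypothetical, and the status call is replaced by an injected function so the loop can be exercised without a Proxmox node.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// taskState holds the two fields the polling loop reads from a task status.
type taskState struct {
	Status     string
	ExitStatus string
}

// statusFunc is a hypothetical stand-in for the provider's status request.
type statusFunc func(ctx context.Context) (taskState, error)

// waitForTask polls get every interval until the task reports "stopped",
// the timeout elapses, or the parent context is cancelled.
func waitForTask(ctx context.Context, get statusFunc, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		st, err := get(ctx)
		if err == nil && st.Status == "stopped" {
			if st.ExitStatus != "" && st.ExitStatus != "OK" {
				return fmt.Errorf("task failed: %s", st.ExitStatus)
			}
			return nil
		}
		// Request errors and "running" both wait for the next tick.
		select {
		case <-ctx.Done():
			return ctx.Err() // deadline exceeded or cancelled
		case <-ticker.C:
		}
	}
}

// demo runs waitForTask against a fake task that stops on the third poll
// with the given exit status, and returns the number of polls made.
func demo(exit string) (int, error) {
	calls := 0
	get := func(context.Context) (taskState, error) {
		calls++
		if calls < 3 {
			return taskState{Status: "running"}, nil
		}
		return taskState{Status: "stopped", ExitStatus: exit}, nil
	}
	err := waitForTask(context.Background(), get, 5*time.Millisecond, time.Second)
	return calls, err
}

func main() {
	calls, err := demo("OK")
	fmt.Println(calls, err)
}
```

Injecting the status function keeps the timing logic testable: a unit test can script the sequence of task states and use millisecond intervals instead of waiting on a real import.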

Benefits

Immediate

  • No more lock timeouts: Waits for import to complete
  • Reliable VM creation: Config updates succeed
  • Proper error handling: Detects import failures

Long-term

  • Scalable: Handles any import that finishes within the 10-minute limit, regardless of image size
  • Robust: Handles edge cases and errors
  • Maintainable: Clear, well-documented code

Testing

Test Scenarios

  1. Small Image (< 100MB):

    • Should complete in < 1 minute
    • Task monitoring should detect completion quickly
  2. Medium Image (100-500MB):

    • Should complete in 1-3 minutes
    • Task monitoring should wait appropriately
  3. Large Image (500MB+):

    • Should complete in 3-10 minutes
    • Task monitoring should handle long waits
  4. Failed Import:

    • Should detect non-OK exit status
    • Should return appropriate error
  5. Missing UPID:

    • Should fall back to conservative wait
    • Should still attempt config update

API Reference

Proxmox Task API

Get Task Status:

GET /api2/json/nodes/{node}/tasks/{upid}/status

Response:

{
  "data": {
    "status": "running" | "stopped",
    "exitstatus": "OK" | "error code",
    ...
  }
}
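
As an illustration of reading this response, here is a small sketch that decodes the "data" envelope with Go's encoding/json. The type and function names are hypothetical, and the sample payload in main is made up. The provider's own httpClient may already unwrap the envelope before filling the target struct; the document does not say.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskStatus lists only the fields the monitoring loop needs; any other
// fields in the response are ignored by encoding/json.
type taskStatus struct {
	Status     string `json:"status"`
	ExitStatus string `json:"exitstatus,omitempty"`
}

// envelope models the top-level "data" wrapper shown in the response above.
type envelope struct {
	Data taskStatus `json:"data"`
}

// decodeTaskStatus parses a raw response body into a taskStatus.
func decodeTaskStatus(body []byte) (taskStatus, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return taskStatus{}, fmt.Errorf("decode task status: %w", err)
	}
	return env.Data, nil
}

func main() {
	// Made-up sample payload for a finished task.
	body := []byte(`{"data":{"status":"stopped","exitstatus":"OK","pid":1234}}`)
	st, err := decodeTaskStatus(body)
	fmt.Println(st.Status, st.ExitStatus, err)
}
```

While a task is still running the exitstatus field is absent, so ExitStatus decodes to the empty string. That is why the polling code checks Status before looking at ExitStatus.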

Task UPID Format:

UPID:node:pid:pstart:starttime:type:id:user@realm:
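
Because the provider falls back to a fixed wait when no UPID is returned, it helps to validate the importdisk response before polling. The sketch below is one way to do that, under the assumption that checking the literal UPID prefix and a non-empty node field is sufficient. parseUPIDNode is a hypothetical helper, and the sample value in main is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// parseUPIDNode trims the raw importdisk response and, if it looks like a
// UPID, returns the cleaned UPID and the node name from its second field.
// ok is false when the caller should use the missing-UPID fallback.
func parseUPIDNode(raw string) (upid, node string, ok bool) {
	upid = strings.TrimSpace(raw)
	parts := strings.Split(upid, ":")
	if len(parts) < 3 || parts[0] != "UPID" || parts[1] == "" {
		return "", "", false
	}
	return upid, parts[1], true
}

func main() {
	// Made-up sample value; only the first two fields are inspected.
	raw := "UPID:pve1:0000C530:001C9BEC:5D8B1E4F:qmstart:100:root@pam:\n"
	upid, node, ok := parseUPIDNode(raw)
	fmt.Println(upid, node, ok)

	// An empty response triggers the fallback path.
	_, _, ok = parseUPIDNode("")
	fmt.Println(ok)
}
```

The check is deliberately loose: it only decides between task monitoring and the fallback wait, and leaves full validation of the remaining fields to Proxmox.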

Related Issues

  • VM 100 Deployment: Blocked by this issue
  • All Templates: Will benefit from this fix
  • Lock Timeouts: Resolved by this fix

Next Steps

  1. Code Fix: Implemented
  2. Build Provider: Rebuild provider image
  3. Deploy Provider: Update provider in cluster
  4. Test VM Creation: Verify fix works
  5. Update Templates: Revert to cloud image format

Files Modified

  • crossplane-provider-proxmox/pkg/proxmox/client.go
    • Lines 401-464: Added task monitoring

Status: CODE FIX COMPLETE

Next: Rebuild and deploy provider to test