
Remaining Steps - Proxmox VE Deployment

Generated: 2025-11-27
Based on: Current status review and bring-up checklist

This document provides a comprehensive, prioritized list of all remaining steps to complete the Proxmox VE → Azure Arc → Hybrid Cloud Stack deployment.

Priority Legend

  • 🔴 Critical/Blocking - Must be completed before other work can proceed
  • 🟠 High Priority - Core infrastructure required for deployment
  • 🟡 Medium Priority - Service deployment and configuration
  • 🟢 Low Priority - Optimization, hardening, and polish

🔴 Critical/Blocking Items

1. Azure Subscription Verification

Status: PENDING
Blocking: Azure Arc onboarding, resource creation

Actions:

  • Verify Azure subscription status: az account show
  • Check that the subscription state is Enabled (it is currently documented as disabled): az account show --query state -o tsv
  • Re-enable subscription in Azure Portal if needed
  • Verify subscription ID: fc08d829-4f14-413d-ab27-ce024425db0b
  • Verify tenant ID: fb97e99d-3e94-4686-bfde-4bf4062e05f3

Commands:

az account show
az account list
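The following is a minimal sketch of the subscription check. It assumes Azure CLI is already logged in (az login) and reads the state field that az account show reports. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: verify that the expected Azure subscription is usable.

# Returns 0 only when the given subscription state string is "Enabled".
subscription_state_ok() {
  [ "$1" = "Enabled" ]
}

# Queries Azure CLI for the subscription state, prints it, and checks it.
# Requires a prior `az login`.
check_azure_subscription() {
  local sub_id="$1" state
  state=$(az account show --subscription "$sub_id" --query state -o tsv) || {
    echo "Subscription $sub_id is not visible to this login" >&2
    return 1
  }
  echo "Subscription $sub_id state: $state"
  subscription_state_ok "$state"
}

# Usage:
#   check_azure_subscription fc08d829-4f14-413d-ab27-ce024425db0b \
#     || echo "Re-enable the subscription in the Azure Portal" >&2
```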

Reference: docs/temporary/DEPLOYMENT_STATUS.md


🟠 High Priority: Core Infrastructure

2. Proxmox Cluster Configuration

2.1 Create Cluster on ML110

Status: PENDING
Server: ML110 (192.168.1.206)

Actions:

  • SSH to ML110: ssh root@192.168.1.206
  • Set environment variables:
    export CLUSTER_NAME=hc-cluster
    export NODE_ROLE=create
    
  • Run cluster setup script: ./infrastructure/proxmox/cluster-setup.sh
  • Verify cluster creation: pvecm status
  • Verify node count: pvecm nodes

Script: infrastructure/proxmox/cluster-setup.sh
Reference: docs/deployment/bring-up-checklist.md Phase 2
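Before running cluster-setup.sh, a small pre-flight check can catch a missing or mistyped variable. This sketch covers only the variables listed in 2.1 and 2.2 and only validates input. The function name is hypothetical, and it does not reflect how the script itself parses its environment.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables given to cluster-setup.sh.
# It validates input only; it does not touch the cluster.
validate_cluster_env() {
  if [ -z "${CLUSTER_NAME:-}" ]; then
    echo "CLUSTER_NAME is not set" >&2
    return 1
  fi
  case "${NODE_ROLE:-}" in
    create) ;;
    join)
      # A joining node needs the address of an existing cluster member.
      if [ -z "${CLUSTER_NODE_IP:-}" ]; then
        echo "CLUSTER_NODE_IP is required when NODE_ROLE=join" >&2
        return 1
      fi
      ;;
    *)
      echo "NODE_ROLE must be 'create' or 'join' (got '${NODE_ROLE:-}')" >&2
      return 1
      ;;
  esac
}

# Usage:
#   validate_cluster_env && ./infrastructure/proxmox/cluster-setup.sh
```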

2.2 Join R630 to Cluster

Status: PENDING
Server: R630 (192.168.1.49)

Actions:

  • SSH to R630: ssh root@192.168.1.49
  • Set environment variables:
    export CLUSTER_NAME=hc-cluster
    export NODE_ROLE=join
    export CLUSTER_NODE_IP=192.168.1.206
    export ROOT_PASSWORD=<ML110_root_password>
    
  • Run cluster setup script: ./infrastructure/proxmox/cluster-setup.sh
  • Verify cluster membership: pvecm status
  • Verify both nodes visible: pvecm nodes

Script: infrastructure/proxmox/cluster-setup.sh
Reference: docs/deployment/bring-up-checklist.md Phase 2
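Typing export ROOT_PASSWORD=... on the command line leaves the password in shell history. The sketch below prompts for it without echoing it instead. It assumes cluster-setup.sh reads ROOT_PASSWORD from the environment, as the step above implies. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Prompt for the ML110 root password without echoing it or recording it in history.
prompt_root_password() {
  local pw
  # -r keeps backslashes literal; -s suppresses echo when reading from a terminal.
  IFS= read -rs -p "ML110 root password: " pw || return 1
  echo >&2  # move to a new line after the hidden input
  export ROOT_PASSWORD="$pw"
}

# Usage on R630:
#   prompt_root_password && ./infrastructure/proxmox/cluster-setup.sh
#   unset ROOT_PASSWORD
```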

2.3 Verify Cluster Health

Status: PENDING

Actions:

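  • Check cluster quorum: pvecm status (the Quorate: line should read Yes). Note that pvecm expected <votes> changes the expected vote count rather than reporting it.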
  • Verify cluster services: systemctl status pve-cluster
  • Test cluster communication between nodes: corosync-cfgtool -s shows the status of each corosync link
  • Verify shared configuration: ls -la /etc/pve/nodes/

Commands:

pvecm status
pvecm nodes
corosync-cfgtool -s
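For scripted health checks, the output of pvecm status can be filtered on its Quorate: and Nodes: lines. The sketch below assumes those lines start at column 0, as they do in the "Quorum information" section. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `pvecm status` output on stdin.

# Succeeds when the "Quorate:" line reports Yes.
pvecm_is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes[[:space:]]*$'
}

# Prints the node count from the "Nodes:" line of the quorum section.
pvecm_node_count() {
  awk -F: '/^Nodes:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Usage on either node:
#   status=$(pvecm status)
#   printf '%s\n' "$status" | pvecm_is_quorate || echo "Cluster is NOT quorate" >&2
#   [ "$(printf '%s\n' "$status" | pvecm_node_count)" = 2 ] || echo "Expected 2 nodes" >&2
```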

3. Storage Configuration

3.1 Configure NFS Storage on ML110

Status: PENDING
Server: ML110 (192.168.1.206)

Prerequisites:

  • NFS server available (Router server at 10.10.10.1 or configured location)
  • NFS export path: /mnt/storage (or as configured)

Actions:

  • SSH to ML110: ssh root@192.168.1.206
  • Set environment variables:
    export NFS_SERVER=10.10.10.1  # Adjust if different
    export NFS_PATH=/mnt/storage  # Adjust if different
    export STORAGE_NAME=router-storage
    export CONTENT_TYPES=images,iso,vztmpl,backup
    
  • Run NFS storage script: ./infrastructure/proxmox/nfs-storage.sh
  • Verify storage: pvesm status
  • Test storage access

Script: infrastructure/proxmox/nfs-storage.sh
Alternative: infrastructure/storage/configure-proxmox-storage.sh
Reference: docs/deployment/bring-up-checklist.md Phase 5
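For review or manual recovery, it helps to know the plain Proxmox command that matches these variables. The sketch below assumes nfs-storage.sh ultimately performs an equivalent pvesm add nfs; check the script before relying on that. The function name is hypothetical, and it only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the pvesm command matching the variables
# above. It does not execute anything.
nfs_storage_cmd() {
  printf 'pvesm add nfs %s --server %s --export %s --content %s\n' \
    "${STORAGE_NAME:?}" "${NFS_SERVER:?}" "${NFS_PATH:?}" "${CONTENT_TYPES:?}"
}

# Usage on ML110 (showmount is provided by the nfs-common package):
#   showmount -e "$NFS_SERVER"   # confirm the export is offered to this host
#   nfs_storage_cmd              # review, then run the printed command
```

In a Proxmox cluster, storage definitions live in /etc/pve/storage.cfg, which is shared by all nodes. A storage added once is therefore visible cluster-wide unless it is restricted with --nodes.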

3.2 Configure NFS Storage on R630

Status: PENDING
Server: R630 (192.168.1.49)

Actions:

  • SSH to R630: ssh root@192.168.1.49
  • Set environment variables (same as ML110)
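  • Run NFS storage script: ./infrastructure/proxmox/nfs-storage.sh (once the nodes are clustered, /etc/pve/storage.cfg is shared, so the storage defined on ML110 may already be present here; if so, skip to verification)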
  • Verify storage: pvesm status
  • Verify shared storage accessible from both nodes

Script: infrastructure/proxmox/nfs-storage.sh

3.3 Verify Shared Storage

Status: PENDING

Actions:

  • Verify storage visible on both nodes: pvesm status
  • Test storage read/write from both nodes
  • Verify storage content types configured correctly
  • Document storage configuration

Commands:

pvesm status
pvesm list router-storage
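The read/write test can be scripted as a probe file that is written, read back and removed. Proxmox mounts NFS storages under /mnt/pve/<storage-id>, so the usage line assumes the storage ID router-storage from 3.1. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical read/write probe for a mounted storage directory.
storage_rw_test() {
  local dir="$1" probe expected
  probe="$dir/.rw-probe-${HOSTNAME:-node}-$$"
  expected="written by ${HOSTNAME:-node} at $(date +%s)"
  # Write a unique marker, read it back, then clean up.
  printf '%s\n' "$expected" > "$probe" || { echo "write failed: $dir" >&2; return 1; }
  if [ "$(cat "$probe")" != "$expected" ]; then
    echo "read-back mismatch: $dir" >&2
    rm -f "$probe"
    return 1
  fi
  rm -f "$probe"
  echo "read/write OK: $dir"
}

# Usage on each node:
#   storage_rw_test /mnt/pve/router-storage
```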

4. Network/VLAN Configuration

4.1 Configure VLAN Bridges on ML110

Status: PENDING
Server: ML110 (192.168.1.206)

Required VLANs:

  • VLAN 10: Management
  • VLAN 20: Infrastructure
  • VLAN 30: Services
  • VLAN 40: Monitoring
  • VLAN 50: CI/CD
  • VLAN 60: Development
  • VLAN 99: External/Cloudflare

Actions:

  • SSH to ML110: ssh root@192.168.1.206
  • Review network topology: docs/architecture/network-topology.md
  • Run VLAN configuration script: ./infrastructure/network/configure-proxmox-vlans.sh
  • Verify bridges created: ip addr show or Proxmox web UI
  • Test VLAN connectivity

Script: infrastructure/network/configure-proxmox-vlans.sh
Alternative: infrastructure/proxmox/configure-proxmox-vlans.sh
Reference: docs/deployment/bring-up-checklist.md Phase 4
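configure-proxmox-vlans.sh may create one bridge per VLAN. A common alternative on Proxmox is a single VLAN-aware bridge restricted to the VLAN IDs above, with each VM NIC tagged individually. The sketch below only prints a candidate /etc/network/interfaces stanza for comparison. The NIC name eno1 and the host gateway are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator: prints a VLAN-aware bridge stanza for review.
# Arguments: bridge name, physical NIC, host CIDR, gateway, then VLAN IDs.
vlan_aware_bridge_stanza() {
  local bridge="$1" nic="$2" cidr="$3" gw="$4"
  shift 4
  cat <<EOF
auto $bridge
iface $bridge inet static
    address $cidr
    gateway $gw
    bridge-ports $nic
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids $*
EOF
}

# Example for ML110 (eno1 and the gateway are placeholders):
#   vlan_aware_bridge_stanza vmbr0 eno1 192.168.1.206/24 192.168.1.254 10 20 30 40 50 60 99
```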

4.2 Configure VLAN Bridges on R630

Status: PENDING
Server: R630 (192.168.1.49)

Actions:

  • SSH to R630: ssh root@192.168.1.49
  • Run VLAN configuration script: ./infrastructure/network/configure-proxmox-vlans.sh
  • Verify bridges created: ip addr show or Proxmox web UI
  • Verify VLAN configuration matches ML110
  • Test VLAN connectivity

Script: infrastructure/network/configure-proxmox-vlans.sh

4.3 Verify Network Configuration

Status: PENDING

Actions:

  • Verify all VLAN bridges on both nodes
  • Test VLAN isolation
  • Test inter-VLAN routing (if applicable)
  • Document network configuration

Commands:

ip addr show
cat /etc/network/interfaces
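To confirm that every required VLAN from 4.1 exists on a node, a small parser over ip -br link show can list the missing IDs. It assumes per-VLAN interfaces use dot notation such as eno1.20 or vmbr0.30, which is an assumption about what configure-proxmox-vlans.sh produces. A VLAN-aware bridge without such sub-interfaces would need bridge vlan show instead. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads `ip -br link show` output on stdin and prints each
# required VLAN ID that has no matching "<parent>.<vid>" interface.
missing_vlans() {
  local present vid
  # First column is the name, possibly "eno1.20@eno1"; keep a numeric suffix.
  present=$(awk '{ split($1, a, "@"); n = split(a[1], p, ".");
                   if (n > 1 && p[n] ~ /^[0-9]+$/) print p[n] }')
  for vid in "$@"; do
    if ! grep -qx "$vid" <<< "$present"; then
      echo "$vid"
    fi
  done
}

# Usage on each node (prints nothing when all VLANs are present):
#   ip -br link show | missing_vlans 10 20 30 40 50 60 99
```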

5. Azure Arc Onboarding

5.1 Create Azure Resource Group

Status: PENDING
Blockers: Azure subscription must be enabled

Actions:

  • Load environment variables from .env
  • Verify Azure CLI authenticated: az account show
  • Set subscription: az account set --subscription "$AZURE_SUBSCRIPTION_ID"
  • Create resource group:
    az group create \
      --name "$AZURE_RESOURCE_GROUP" \
      --location "$AZURE_LOCATION"
    
  • Verify resource group: az group show --name "$AZURE_RESOURCE_GROUP"

Reference: docs/temporary/NEXT_STEPS.md Section 2
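The steps above can be chained so that nothing reaches Azure until every variable is present. This sketch assumes .env contains only shell-compatible KEY=value lines, because it is sourced directly. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 5.1.

# Exports every variable assigned in the given env file (default: .env).
load_env_file() {
  local file="${1:-.env}"
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  set -a          # auto-export every variable assigned while sourcing
  . "$file"
  set +a
}

# Fails (and names each culprit) if any listed variable is empty or unset.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "$name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   load_env_file .env &&
#     require_vars AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP AZURE_LOCATION &&
#     az account set --subscription "$AZURE_SUBSCRIPTION_ID" &&
#     az group create --name "$AZURE_RESOURCE_GROUP" --location "$AZURE_LOCATION"
```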

5.2 Onboard ML110 to Azure Arc

Status: PENDING
Server: ML110 (192.168.1.206)

Actions:

  • SSH to ML110: ssh root@192.168.1.206
  • Set environment variables:
    export RESOURCE_GROUP=HC-Stack  # or from .env
    export TENANT_ID=<tenant_id>
    export SUBSCRIPTION_ID=<subscription_id>
    export LOCATION=eastus  # or from .env
    export TAGS="type=proxmox,host=ml110"
    
  • Run onboarding script: ./scripts/azure-arc/onboard-proxmox-hosts.sh
  • Verify agent installed: azcmagent show
  • Verify connection: Check Azure Portal

Script: scripts/azure-arc/onboard-proxmox-hosts.sh
Reference: docs/deployment/bring-up-checklist.md Phase 6
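The "Verify agent installed" step can also be checked in a script. This sketch assumes the human-readable output of azcmagent show contains an "Agent Status : Connected" line; confirm the field name against the installed agent version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads `azcmagent show` output on stdin and succeeds only
# when the "Agent Status" field is exactly "Connected".
arc_agent_connected() {
  awk -F: '
    $1 ~ /^Agent Status[ \t]*$/ {
      v = $2
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim padding around the value
      if (v == "Connected") found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Usage on each host after onboarding:
#   azcmagent show | arc_agent_connected && echo "Arc agent connected"
```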

5.3 Onboard R630 to Azure Arc

Status: PENDING
Server: R630 (192.168.1.49)

Actions:

  • SSH to R630: ssh root@192.168.1.49
  • Set environment variables (same as ML110, change TAGS):
    export TAGS="type=proxmox,host=r630"
    
  • Run onboarding script: ./scripts/azure-arc/onboard-proxmox-hosts.sh
  • Verify agent installed: azcmagent show
  • Verify connection: Check Azure Portal

Script: scripts/azure-arc/onboard-proxmox-hosts.sh

5.4 Verify Azure Arc Integration

Status: PENDING

Actions:

  • Verify both servers in Azure Portal: Azure Arc → Servers
  • Check server status (should be "Connected")
  • Verify tags applied correctly
  • Test Azure Policy assignment (if configured)
  • Verify Azure Monitor integration (if configured)

Reference: docs/deployment/azure-arc-onboarding.md
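The Portal check can be complemented from the CLI with the connectedmachine extension for Azure CLI. The sketch below consumes its tab-separated name/status output and reports any machine that is not Connected or is absent. The machine names ml110 and r630 are assumptions, since Arc resource names usually default to the hostname. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check for step 5.4: reads "name<TAB>status" lines on stdin,
# prints every machine that is not "Connected" plus any expected host that is
# missing, and returns non-zero if anything was printed.
arc_unhealthy_machines() {
  local bad=0 name status host seen=""
  while IFS=$'\t' read -r name status; do
    [ -n "$name" ] || continue
    seen+="$name"$'\n'
    if [ "$status" != "Connected" ]; then
      echo "$name: $status"
      bad=1
    fi
  done
  for host in "$@"; do
    if ! grep -qx "$host" <<< "$seen"; then
      echo "$host: missing"
      bad=1
    fi
  done
  return "$bad"
}

# Usage (az extension add --name connectedmachine):
#   az connectedmachine list --resource-group HC-Stack \
#     --query "[].[name, status]" -o tsv | arc_unhealthy_machines ml110 r630
```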


6. Cloudflare Configuration

6.1 Configure Cloudflare Credentials

Status: PENDING

Actions:

  • Create Cloudflare API token: https://dash.cloudflare.com/profile/api-tokens
  • Add to .env file:
    CLOUDFLARE_API_TOKEN=<your_token>
    CLOUDFLARE_ACCOUNT_EMAIL=<your_email>
    
  • Verify credentials not committed to git (check .gitignore)
  • Test Cloudflare API access (if script available)

Reference: docs/temporary/DEPLOYMENT_STATUS.md Section "Cloudflare Configuration Pending"
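Cloudflare exposes a token-verify endpoint that can serve as the API access test. The sketch below uses grep instead of a JSON parser so it runs without jq, which makes it a coarse check only. The git check-ignore call covers the .gitignore step. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 6.1.

# Succeeds when a token-verify JSON response (stdin) reports an active token.
cf_token_active() {
  local body
  body=$(cat)
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true' <<< "$body" &&
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"active"' <<< "$body"
}

# Succeeds when git would ignore the given path (run inside the repository).
path_is_gitignored() {
  git check-ignore -q "$1"
}

# Usage:
#   path_is_gitignored .env || echo ".env is NOT ignored by git" >&2
#   curl -fsS -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
#     https://api.cloudflare.com/client/v4/user/tokens/verify | cf_token_active
```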


🟡 Medium Priority: Service Deployment

7. VM Template Creation

7.1 Verify/Create Ubuntu 24.04 Template

Status: PENDING
Note: VM 9000 exists on ML110 but may need configuration

Actions:

  • Check existing template VM 9000 on ML110
  • Verify template configuration:
    • Cloud-init enabled
    • QEMU agent enabled
    • Proper disk size
    • Network configuration
  • If template needs creation:
    • Upload Ubuntu 24.04 ISO to Proxmox storage
    • Create VM from ISO
    • Install Ubuntu 24.04
    • Install QEMU guest agent
    • Install Azure Arc agent (optional, for template)
    • Configure cloud-init
    • Convert to template
  • Verify template accessible from both nodes (if clustered)

Scripts:

  • scripts/vm-management/create/create-proxmox-template.sh
  • scripts/vm-management/create/create-template-via-api.sh

Reference: docs/operations/proxmox-ubuntu-images.md
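Most of the template checklist can be read from qm config 9000. The sketch below reports missing prerequisites, assuming the cloud-init drive shows up as an ide/sata/scsi entry whose volume name contains "cloudinit". Disk size is left to a manual check. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check for step 7.1: reads `qm config <vmid>` output on stdin and
# prints one line per missing template prerequisite.
template_issues() {
  local cfg
  cfg=$(cat)
  grep -Eq '^agent: (1|enabled=1)' <<< "$cfg" || echo "QEMU guest agent not enabled (agent: 1)"
  grep -Eq '^(ide|sata|scsi)[0-9]+: [^ ]*cloudinit' <<< "$cfg" || echo "no cloud-init drive attached"
  grep -Eq '^net0: ' <<< "$cfg" || echo "no net0 network device"
  grep -Eq '^template: 1$' <<< "$cfg" || echo "VM is not converted to a template"
}

# Usage on ML110 (prints nothing when all checks pass):
#   qm config 9000 | template_issues
```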


8. Service VM Deployment

8.1 Deploy Cloudflare Tunnel VM

Status: PENDING

VM Specifications:

  • VM ID: 100 (or next available)
  • Name: cloudflare-tunnel
  • IP: 192.168.1.60/24
  • Gateway: 192.168.1.254
  • VLAN: 99
  • CPU: 2 cores
  • RAM: 4GB
  • Disk: 40GB
  • Template: ubuntu-24.04-cloudinit

Actions:

  • Create VM from template (via Terraform or Proxmox API)
  • Configure network (VLAN 99)
  • Configure IP address (192.168.1.60/24)
  • Start VM
  • Verify VM accessible

Scripts:

  • Terraform: terraform/proxmox/
  • API: scripts/vm-management/create/create-vms-from-template.sh

Reference: docs/deployment/bring-up-checklist.md Phase 8

8.2 Deploy K3s Master VM

Status: PENDING

VM Specifications:

  • VM ID: 101 (or next available)
  • Name: k3s-master
  • IP: 192.168.1.188/24
  • Gateway: 192.168.1.254
  • VLAN: 30 (Services)
  • CPU: 4 cores
  • RAM: 8GB
  • Disk: 80GB
  • Template: ubuntu-24.04-cloudinit

Actions:

  • Create VM from template
  • Configure network (VLAN 30)
  • Configure IP address (192.168.1.188/24)
  • Start VM
  • Verify VM accessible

Reference: docs/deployment/bring-up-checklist.md Phase 8

8.3 Deploy Git Server VM

Status: PENDING

VM Specifications:

  • VM ID: 102 (or next available)
  • Name: git-server
  • IP: 192.168.1.121/24
  • Gateway: 192.168.1.254
  • VLAN: 50 (CI/CD)
  • CPU: 4 cores
  • RAM: 8GB
  • Disk: 100GB
  • Template: ubuntu-24.04-cloudinit

Actions:

  • Create VM from template
  • Configure network (VLAN 50)
  • Configure IP address (192.168.1.121/24)
  • Start VM
  • Verify VM accessible

Reference: docs/deployment/bring-up-checklist.md Phase 8

8.4 Deploy Observability VM

Status: PENDING

VM Specifications:

  • VM ID: 103 (or next available)
  • Name: observability
  • IP: 192.168.1.82/24
  • Gateway: 192.168.1.254
  • VLAN: 40 (Monitoring)
  • CPU: 4 cores
  • RAM: 8GB
  • Disk: 200GB
  • Template: ubuntu-24.04-cloudinit

Actions:

  • Create VM from template
  • Configure network (VLAN 40)
  • Configure IP address (192.168.1.82/24)
  • Start VM
  • Verify VM accessible

Reference: docs/deployment/bring-up-checklist.md Phase 8
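For the API route, the four specifications above map onto a handful of qm commands per VM. The sketch below only prints them for review. It assumes template 9000, a VLAN-aware bridge named vmbr0 and a boot disk named scsi0 in the template, and it assumes qm resize is only ever growing the disk (shrinking is not supported). The function names are hypothetical.

The specifications also place every VM in 192.168.1.0/24 behind gateway 192.168.1.254 even though each VM is on a different VLAN tag. Before applying the tags, confirm that the router actually serves that subnet on each tagged VLAN.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run: prints the qm commands for one service VM.
# Args: vmid name cidr vlan cores ram_gb disk_gb
clone_vm_cmds() {
  local vmid="$1" name="$2" cidr="$3" vlan="$4" cores="$5" mem_gb="$6" disk_gb="$7"
  local gw="192.168.1.254" template=9000
  echo "qm clone $template $vmid --name $name --full"
  # --memory is in MiB; --ipconfig0 is applied by cloud-init on first boot.
  echo "qm set $vmid --cores $cores --memory $((mem_gb * 1024))" \
       "--net0 virtio,bridge=vmbr0,tag=$vlan" \
       "--ipconfig0 ip=$cidr,gw=$gw"
  echo "qm resize $vmid scsi0 ${disk_gb}G"
  echo "qm start $vmid"
}

# Prints the commands for all four VMs from section 8.
all_service_vm_cmds() {
  local id name cidr vlan cores mem disk
  while read -r id name cidr vlan cores mem disk; do
    clone_vm_cmds "$id" "$name" "$cidr" "$vlan" "$cores" "$mem" "$disk"
  done <<'EOF'
100 cloudflare-tunnel 192.168.1.60/24 99 2 4 40
101 k3s-master 192.168.1.188/24 30 4 8 80
102 git-server 192.168.1.121/24 50 4 8 100
103 observability 192.168.1.82/24 40 4 8 200
EOF
}

# Usage: all_service_vm_cmds   # review the output before running it on a node
```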


9. OS Installation on VMs

9.1 Install Ubuntu 24.04 on All VMs

Status: PENDING
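Note: This requires manual console access. If the VMs were cloned from the cloud-init template in step 7, Ubuntu is already installed and cloud-init applies the network settings on first boot, so only the SSH verification below applies.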

Actions (for each VM):

  • Access Proxmox Web UI: https://192.168.1.206:8006 or https://192.168.1.49:8006
  • For each VM (100, 101, 102, 103):
    • Click on VM → Console
    • For VMs created from the ISO, the Ubuntu installer should boot
    • Complete installation with appropriate IP configuration:
      • VM 100 (cloudflare-tunnel): IP: 192.168.1.60/24, Gateway: 192.168.1.254
      • VM 101 (k3s-master): IP: 192.168.1.188/24, Gateway: 192.168.1.254
      • VM 102 (git-server): IP: 192.168.1.121/24, Gateway: 192.168.1.254
      • VM 103 (observability): IP: 192.168.1.82/24, Gateway: 192.168.1.254
    • Create a user account (note the credentials for SSH access)
    • Verify SSH access

Reference: docs/temporary/COMPLETE_STATUS.md Step 1

9.2 Verify OS Installation

Status: PENDING

Actions:

  • Run VM status check: ./scripts/check-vm-status.sh (if available)
  • Verify network connectivity from each VM
  • Verify SSH access to each VM
  • Verify Ubuntu 24.04 installed correctly
  • Verify QEMU guest agent working

Scripts:

  • scripts/check-vm-status.sh (if exists)
  • scripts/vm-management/monitor/check-vm-disk-sizes.sh
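Where check-vm-status.sh is not available, this minimal sketch tests whether TCP port 22 accepts connections on each VM, using bash's /dev/tcp redirection and coreutils timeout. It shows only that sshd is listening, not that login works. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical reachability probe: succeeds if host:port accepts a TCP connection
# within 3 seconds.
tcp_port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" </dev/null 2>/dev/null
}

# Checks SSH on every service VM from section 8; returns non-zero if any fail.
check_service_vms() {
  local name ip rc=0
  while read -r name ip; do
    if tcp_port_open "$ip" 22; then
      echo "$name ($ip): ssh reachable"
    else
      echo "$name ($ip): ssh NOT reachable"
      rc=1
    fi
  done <<'EOF'
cloudflare-tunnel 192.168.1.60
k3s-master 192.168.1.188
git-server 192.168.1.121
observability 192.168.1.82
EOF
  return "$rc"
}

# Usage: check_service_vms
# Guest agent check, run on the Proxmox node: qm guest cmd <vmid> ping
```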

10. Service Configuration

10.1 Configure Cloudflare Tunnel

Status: PENDING
VM: cloudflare-tunnel (192.168.1.60)

Actions:

  • SSH to cloudflare-tunnel VM
  • Install cloudflared:
    sudo curl -fL https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
    sudo chmod +x /usr/local/bin/cloudflared
    
  • Authenticate: cloudflared tunnel login
  • Create tunnel: cloudflared tunnel create azure-stack-hci
  • Configure tunnel routes (see docs/deployment/cloudflare-integration.md)
  • Configure tunnel for:
    • Windows Admin Center (if applicable)
    • Proxmox UI
    • Dashboards
    • Git/CI services
  • Set up the systemd service for cloudflared: sudo cloudflared service install
  • Test external access

Script: scripts/setup-cloudflare-tunnel.sh (if available)
Reference: docs/deployment/cloudflare-integration.md
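cloudflared reads its tunnel routes from a YAML config whose ingress list must end with a catch-all rule such as http_status:404. This sketch prints such a config from hostname=service pairs. The example.com hostnames are placeholders. It assumes the credentials JSON that cloudflared tunnel create wrote under ~/.cloudflared/ has been copied to /etc/cloudflared/. It also relaxes TLS verification for https origins, because the Proxmox UI on 8006 uses a self-signed certificate. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator for /etc/cloudflared/config.yml.
# Args: tunnel UUID, then "hostname=service" pairs.
cloudflared_config() {
  local tunnel_id="$1" pair host svc
  shift
  printf 'tunnel: %s\n' "$tunnel_id"
  printf 'credentials-file: /etc/cloudflared/%s.json\n' "$tunnel_id"
  printf 'ingress:\n'
  for pair in "$@"; do
    host="${pair%%=*}" svc="${pair#*=}"
    printf '  - hostname: %s\n    service: %s\n' "$host" "$svc"
    # Relaxed for every https origin; tighten if the origin has a valid certificate.
    case "$svc" in
      https://*) printf '    originRequest:\n      noTLSVerify: true\n' ;;
    esac
  done
  printf '  - service: http_status:404\n'   # required catch-all rule
}

# Usage:
#   cloudflared_config "$TUNNEL_ID" proxmox.example.com=https://192.168.1.206:8006 \
#     | sudo tee /etc/cloudflared/config.yml
#   cloudflared tunnel route dns azure-stack-hci proxmox.example.com
#   sudo cloudflared service install
```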

10.2 Deploy and Configure K3s

Status: PENDING
VM: k3s-master (192.168.1.188)

Actions:

  • SSH to k3s-master VM
  • Install K3s: curl -sfL https://get.k3s.io | sh -
  • Verify K3s running: sudo k3s kubectl get nodes (k3s.yaml is readable only by root by default)
  • Get kubeconfig: sudo cat /etc/rancher/k3s/k3s.yaml
  • Configure kubectl access
  • Install required addons (if any)
  • Onboard to Azure Arc (if applicable):
    export RESOURCE_GROUP=HC-Stack
    export CLUSTER_NAME=proxmox-k3s-cluster
    ./infrastructure/kubernetes/arc-onboard-k8s.sh
    

Script: scripts/setup-k3s.sh (if available)
Reference: docs/deployment/bring-up-checklist.md Phase 8
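K3s writes server: https://127.0.0.1:6443 into /etc/rancher/k3s/k3s.yaml. Using that kubeconfig from another machine therefore means pointing it at the master's address. This is a minimal sketch of that rewrite; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites the API server address in a K3s kubeconfig
# read on stdin.
rewrite_k3s_server() {
  local host="$1"
  sed -E "s#(server: https://)127\.0\.0\.1(:6443)#\1${host}\2#"
}

# Usage from a workstation (needs passwordless sudo on the VM; otherwise copy
# the file manually):
#   mkdir -p ~/.kube
#   ssh <user>@192.168.1.188 sudo cat /etc/rancher/k3s/k3s.yaml \
#     | rewrite_k3s_server 192.168.1.188 > ~/.kube/config
#   chmod 600 ~/.kube/config && kubectl get nodes
```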

10.3 Set Up Git Server

Status: PENDING
VM: git-server (192.168.1.121)

Actions:

  • SSH to git-server VM
  • Choose Git server (Gitea or GitLab CE)
  • Install Git server:
    • Gitea: ./infrastructure/gitops/gitea-deploy.sh
    • GitLab CE: ./infrastructure/gitops/gitlab-deploy.sh
  • Configure Git server:
    • Admin account
    • Repository creation
    • User access
  • Create initial repositories
  • Configure GitOps workflows

Scripts:

  • scripts/setup-git-server.sh (if available)
  • infrastructure/gitops/gitea-deploy.sh
  • infrastructure/gitops/gitlab-deploy.sh

Reference: docs/deployment/bring-up-checklist.md Phase 8
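If Gitea is chosen, its /api/v1/version endpoint gives a quick post-install check. This sketch extracts the version string with sed and assumes Gitea listens on its default HTTP port 3000. It does not apply to GitLab CE. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check (Gitea only): prints the version from a
# /api/v1/version JSON response read on stdin, or nothing if none is found.
gitea_version() {
  sed -nE 's/.*"version"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p'
}

# Usage:
#   curl -fsS http://192.168.1.121:3000/api/v1/version | gitea_version
```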

10.4 Deploy Observability Stack

Status: PENDING
VM: observability (192.168.1.82)

Actions:

  • SSH to observability VM
  • Deploy Prometheus:
    • Install Prometheus
    • Configure scrape targets
    • Set up retention policies
  • Deploy Grafana:
    • Install Grafana
    • Configure data sources (Prometheus)
    • Import dashboards
    • Configure authentication
  • Configure monitoring for:
    • Proxmox hosts
    • VMs
    • Kubernetes cluster
    • Network metrics
    • Storage metrics
  • Set up alerting rules

Script: scripts/setup-observability.sh (if available)
Reference: docs/deployment/bring-up-checklist.md Phase 8
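Scrape targets for hosts and VMs can be generated rather than hand-typed. This sketch assumes node_exporter runs on each target at its default port 9100. The job names are illustrative. The output is meant to be pasted under scrape_configs: in prometheus.yml and then validated with promtool check config. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator: prints one Prometheus scrape job (YAML list item).
# Args: job name, port, then target hosts.
scrape_job() {
  local job="$1" port="$2" host
  shift 2
  printf '  - job_name: %s\n    static_configs:\n      - targets:\n' "$job"
  for host in "$@"; do
    printf "          - '%s:%s'\n" "$host" "$port"
  done
}

# Usage:
#   scrape_job proxmox-hosts 9100 192.168.1.206 192.168.1.49
#   scrape_job service-vms 9100 192.168.1.60 192.168.1.188 192.168.1.121 192.168.1.82
#   promtool check config /etc/prometheus/prometheus.yml
```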

10.5 Configure GitOps Workflows

Status: PENDING

Actions:

  • Create Git repository in Git server
  • Copy gitops/ directory to repository
  • Configure Flux or ArgoCD (if applicable)
  • Set up CI/CD pipelines
  • Configure automated deployments
  • Test GitOps workflow

Reference: docs/operations/runbooks/gitops-workflow.md


🟢 Low Priority: Optimization & Hardening

11. Security Hardening

11.1 Create RBAC Accounts for Proxmox

Status: PENDING

Actions:

  • Review RBAC guide: docs/security/proxmox-rbac.md
  • Create service accounts for automation
  • Create operator accounts with appropriate roles
  • Generate API tokens for service accounts
  • Document RBAC account usage
  • Update automation scripts to use API tokens instead of root
  • Test API token authentication
  • Remove or restrict root API access (if desired)

Reference: docs/security/proxmox-rbac.md
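Proxmox VE authenticates API tokens through an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET. The sketch below builds that header and shows the pveum flow in comments. The user automation@pve, token name terraform and role PVEVMAdmin are illustrative choices, not taken from the RBAC guide. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the HTTP header for Proxmox VE API-token auth.
pve_token_header() {
  local user="$1" token_id="$2" secret="$3"
  printf 'Authorization: PVEAPIToken=%s!%s=%s\n' "$user" "$token_id" "$secret"
}

# Example flow on a Proxmox node (names and role are illustrative):
#   pveum user add automation@pve
#   pveum acl modify / --users automation@pve --roles PVEVMAdmin
#   pveum user token add automation@pve terraform --privsep 0   # secret is shown once
# Then, from an automation host (-k accepts the self-signed certificate):
#   curl -fsS -k -H "$(pve_token_header automation@pve terraform "$PVE_TOKEN_SECRET")" \
#     https://192.168.1.206:8006/api2/json/version
```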

11.2 Review Firewall Rules

Status: PENDING

Actions:

  • Review firewall configuration on both Proxmox hosts
  • Verify only necessary ports are open
  • Configure firewall rules for cluster communication
  • Document firewall configuration
  • Test firewall rules
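To check that only necessary ports are open, listening sockets can be compared against an allow-list. This sketch parses ss -Htuln output. The usage allow-list uses Proxmox's documented defaults: 22 for SSH, 111 for rpcbind, 3128 for the SPICE proxy, 5405 for corosync (UDP) and 8006 for the web UI. Loopback-only services also appear in the output and should be reviewed before they are added. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical audit: reads `ss -Htuln` output on stdin and prints each
# listening port (including loopback-only ones) not in the allow-list arguments.
unexpected_ports() {
  local allowed=" $* " port
  # Column 5 is "addr:port" (e.g. "0.0.0.0:22", "[::]:22", "*:8006").
  awk '{ n = split($5, a, ":"); print a[n] }' | sort -un | while read -r port; do
    case "$allowed" in
      *" $port "*) ;;
      *) echo "$port" ;;
    esac
  done
}

# Usage on each Proxmox node:
#   ss -Htuln | unexpected_ports 22 111 3128 5405 8006
```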

11.3 Configure Security Policies

Status: PENDING

Actions:

  • Review Azure Policy assignments
  • Configure security baselines
  • Enable Azure Defender (if applicable)
  • Configure update management
  • Review secret management
  • Perform security scan

Reference: docs/deployment/bring-up-checklist.md Phase 10


12. Monitoring Setup

12.1 Configure Monitoring Dashboards

Status: PENDING

Actions:

  • Configure Grafana dashboards for:
    • Proxmox hosts
    • VMs
    • Kubernetes cluster
    • Network performance
    • Storage performance
  • Set up Prometheus alerting rules
  • Configure alert notifications
  • Test alerting

12.2 Configure Azure Monitor

Status: PENDING

Actions:

  • Enable Log Analytics workspace
  • Configure data collection rules
  • Set up Azure Monitor alerts
  • Configure log queries
  • Test Azure Monitor integration

Reference: docs/deployment/bring-up-checklist.md Phase 10


13. Performance Tuning

Status: PENDING

Actions:

  • Review storage performance
  • Optimize VM resource allocation
  • Tune network settings
  • Optimize Proxmox cluster settings
  • Run performance benchmarks
  • Document performance metrics

14. Documentation Updates

Status: PENDING

Actions:

  • Update docs/temporary/COMPLETE_STATUS.md with actual status
  • Update docs/temporary/DEPLOYMENT_STATUS.md with current blockers
  • Update docs/temporary/NEXT_STEPS.md with completed items
  • Create runbooks for common operations
  • Document network topology
  • Document storage configuration
  • Create troubleshooting guides

Summary Checklist

Critical (Must Complete First)

  • Azure subscription verification/enablement

High Priority (Core Infrastructure)

  • Proxmox cluster configuration
  • NFS/shared storage configuration
  • Network/VLAN configuration
  • Azure Arc onboarding (both servers)
  • Cloudflare credentials configuration

Medium Priority (Service Deployment)

  • VM template creation/verification
  • Service VM deployment (4 VMs)
  • OS installation on VMs
  • Service configuration (Cloudflare, K3s, Git, Observability)

Low Priority (Optimization)

  • Security hardening (RBAC, firewalls)
  • Monitoring setup
  • Performance tuning
  • Documentation updates

Estimated Timeline

  • Week 1: Critical and High Priority items (Infrastructure foundation)
  • Week 2: Medium Priority items (Service deployment)
  • Week 3-4: Low Priority items (Optimization and hardening)

Total Estimated Time: 3-4 weeks for complete deployment


Quick Reference

Key Scripts

  • Cluster Setup: infrastructure/proxmox/cluster-setup.sh
  • NFS Storage: infrastructure/proxmox/nfs-storage.sh
  • VLAN Configuration: infrastructure/network/configure-proxmox-vlans.sh
  • Azure Arc: scripts/azure-arc/onboard-proxmox-hosts.sh
  • Health Check: scripts/health/check-proxmox-health.sh
  • Status Query: scripts/health/query-proxmox-status.sh

Key Documentation

  • Status Review: docs/PROXMOX_STATUS_REVIEW.md
  • Bring-Up Checklist: docs/deployment/bring-up-checklist.md
  • Azure Arc Onboarding: docs/deployment/azure-arc-onboarding.md
  • Cloudflare Integration: docs/deployment/cloudflare-integration.md
  • Proxmox RBAC: docs/security/proxmox-rbac.md

Server Information

  • ML110: 192.168.1.206:8006
  • R630: 192.168.1.49:8006
  • Cluster Name: hc-cluster (to be created)
  • Resource Group: HC-Stack (to be created)

Last Updated: 2025-11-27
Next Review: After completing Phase 1 (Infrastructure Foundation)