Bring-Up Checklist

Day-One Installation Guide

This checklist provides a step-by-step guide for bringing up the complete Azure Stack HCI environment on installation day.

Pre-Installation Preparation

Hardware Verification

  • Router server chassis received and inspected
  • All PCIe cards received (NICs, HBAs, QAT)
  • Memory modules received (8× 4GB DDR4 ECC RDIMM)
  • Storage SSD received (256GB)
  • All cables received (Ethernet, Mini-SAS HD)
  • Storage shelves received and inspected
  • Proxmox hosts (ML110, R630) verified operational

Documentation Review

  • Complete architecture reviewed
  • PCIe slot allocation map reviewed
  • Network topology and VLAN schema reviewed
  • Driver matrix reviewed
  • All configuration files prepared

Environment Configuration

  • Copy .env.example to .env
  • Configure Azure credentials in .env:
    • AZURE_SUBSCRIPTION_ID
    • AZURE_TENANT_ID
    • AZURE_RESOURCE_GROUP
    • AZURE_LOCATION
  • Configure Cloudflare credentials in .env:
    • CLOUDFLARE_API_TOKEN
    • CLOUDFLARE_ACCOUNT_EMAIL
  • Configure Proxmox credentials in .env:
    • PVE_ROOT_PASS (shared root password for all instances)
    • PROXMOX_ML110_URL
    • PROXMOX_R630_URL
    • Note: Username root@pam is implied and should not be stored
    • For production, create RBAC accounts and use API tokens instead of root
  • Verify .env is listed in .gitignore so it is never committed
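A quick pre-flight check of the .env file can catch missing credentials before any later phase depends on them. The sketch below is a hypothetical helper, not part of the repository. It parses simple KEY=VALUE lines, reports which of the keys listed above are absent or empty, and runs a simplified literal check that .gitignore ignores .env. For an authoritative answer, git check-ignore -q .env exits with status 0 when the file is ignored.

```python
# Keys named in the checklist above; extend if .env.example defines more.
REQUIRED_KEYS = [
    "AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID", "AZURE_RESOURCE_GROUP", "AZURE_LOCATION",
    "CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_EMAIL",
    "PVE_ROOT_PASS", "PROXMOX_ML110_URL", "PROXMOX_R630_URL",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def env_is_ignored(gitignore_text: str) -> bool:
    """Simplified check: does .gitignore contain a literal '.env' or '/.env' line?"""
    patterns = {line.strip() for line in gitignore_text.splitlines()}
    return bool(patterns & {".env", "/.env"})
```

From the repository root, missing_keys(parse_env(open(".env").read())) should return an empty list before you move on to Phase 1.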

Phase 1: Hardware Installation

Router Server Assembly

  • Install CPU and memory (8× 4GB DDR4 ECC RDIMM)
  • Install boot SSD (256GB)
  • Install Intel QAT 8970 in x16_1 slot
  • Install Intel X550-T2 in x8_1 slot
  • Install LSI 9207-8e #1 in x8_2 slot
  • Install LSI 9207-8e #2 in x8_3 slot
  • Install Intel i350-T4 in x4_1 slot
  • Install Intel i350-T8 in x4_2 slot
  • Install Intel i225 Quad-Port in x4_3 slot
  • Verify all cards seated properly
  • Connect power and verify POST

BIOS/UEFI Configuration

  • Enter BIOS/UEFI setup
  • Verify all PCIe cards detected
  • Configure boot order (SSD first)
  • Enable virtualization (Intel VT-x, VT-d)
  • Configure memory settings (ECC enabled)
  • Set date/time
  • Save and exit BIOS
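Once an operating system is running, confirm that the VT-x and VT-d settings actually took effect, because the OpenWrt VM and any PCIe passthrough depend on them. The following is a minimal sketch for a Linux host (the Proxmox option) that uses the standard /proc/cpuinfo and /sys/kernel/iommu_groups interfaces. The vmx CPU flag indicates VT-x, and a non-empty IOMMU group list indicates that VT-d is active. An empty list can also mean the kernel has not enabled the IOMMU, so check the kernel command line before going back into the BIOS.

```python
import os


def has_vmx(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo lists the Intel VT-x flag 'vmx'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "vmx" in flags.split():
                return True
    return False


def iommu_group_count(path: str = "/sys/kernel/iommu_groups") -> int:
    """Number of IOMMU groups; 0 (or a missing directory) means VT-d is not active."""
    try:
        return len(os.listdir(path))
    except FileNotFoundError:
        return 0
```

On the host, has_vmx(open("/proc/cpuinfo").read()) should return True and iommu_group_count() should return a positive number.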

Storage Shelf Cabling

  • Connect SFF-8644 cables from LSI HBA #1 to shelves 1-2
  • Connect SFF-8644 cables from LSI HBA #2 to shelves 3-4
  • Power on storage shelves
  • Verify shelf power and status LEDs
  • Label all cables

Network Cabling

  • Connect 4× Cat6 cables from i350-T4 to Spectrum modems/ONTs (WAN1-4)
  • Connect 2× Cat6a cables to X550-T2 (reserved for future use)
  • Connect 4× Cat6 cables from i225 Quad to ML110, R630, and key services
  • Connect 8× Cat6 cables from i350-T8 to remaining servers/appliances
  • Label all cables at both ends
  • Document cable mapping
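You can keep the cable mapping as data and render it to CSV, both for printing labels and for the Phase 10 documentation. In the sketch below, the RTR-<card>-P<port> label format is a hypothetical convention and the one-shelf-per-HBA-port split is an assumption. Entries marked TBD are ports the checklist does not assign. The i350-T8 and X550-T2 ports can be added the same way.

```python
import csv
import io

# Assignments from the cabling steps above. TBD marks ports the checklist leaves
# open; the HBA port-to-shelf split is an assumption.
CABLE_PLAN = {
    ("i350-T4", 1): "Spectrum modem/ONT (WAN1)",
    ("i350-T4", 2): "Spectrum modem/ONT (WAN2)",
    ("i350-T4", 3): "Spectrum modem/ONT (WAN3)",
    ("i350-T4", 4): "Spectrum modem/ONT (WAN4)",
    ("i225-Quad", 1): "ML110",
    ("i225-Quad", 2): "R630",
    ("i225-Quad", 3): "TBD",
    ("i225-Quad", 4): "TBD",
    ("LSI-9207-8e-1", 1): "Storage shelf 1",
    ("LSI-9207-8e-1", 2): "Storage shelf 2",
    ("LSI-9207-8e-2", 1): "Storage shelf 3",
    ("LSI-9207-8e-2", 2): "Storage shelf 4",
}


def cable_label(card: str, port: int) -> str:
    """Hypothetical label format, printed at both ends of the cable."""
    return f"RTR-{card}-P{port}"


def cable_map_csv(plan: dict[tuple[str, int], str]) -> str:
    """Render the plan as CSV rows sorted by card, then port."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["label", "card", "port", "far_end"])
    for (card, port), far_end in sorted(plan.items()):
        writer.writerow([cable_label(card, port), card, port, far_end])
    return buf.getvalue()
```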

Phase 2: Operating System Installation

Router Server OS

Option A: Windows Server Core

  • Boot from Windows Server installation media
  • Install Windows Server Core
  • Configure initial administrator password
  • Install Windows Updates
  • Configure static IP on management interface
  • Enable Remote Desktop (if needed)
  • Install Windows Admin Center

Option B: Proxmox VE

  • Boot from Proxmox VE installation media
  • Install Proxmox VE
  • Configure initial root password
  • Configure network (management interface)
  • Update Proxmox packages
  • Verify Proxmox web interface accessible

Proxmox Hosts (ML110, R630)

  • Verify Proxmox VE installed and updated
  • Configure network interfaces
  • Verify cluster status (if clustered)
  • Test VM creation
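You can also check whether each Proxmox host is reachable, and which version it runs, through the Proxmox VE API instead of the web UI. This is a minimal sketch that assumes PROXMOX_ML110_URL and PROXMOX_R630_URL hold base URLs such as https://host:8006. It builds the ticket (login) request for root@pam, matching the note in Environment Configuration, and a follow-up GET of /api2/json/version that presents the ticket in the PVEAuthCookie cookie. To send these requests with urllib.request.urlopen, you need an SSL context that trusts each host's certificate, which is self-signed by default.

```python
import urllib.parse
import urllib.request


def ticket_request(base_url: str, password: str, user: str = "root@pam") -> urllib.request.Request:
    """POST /api2/json/access/ticket; the JSON reply's data.ticket is the session ticket."""
    body = urllib.parse.urlencode({"username": user, "password": password}).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api2/json/access/ticket", data=body, method="POST"
    )


def version_request(base_url: str, ticket: str) -> urllib.request.Request:
    """GET /api2/json/version, authenticated with the ticket cookie (read-only call)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api2/json/version",
        headers={"Cookie": f"PVEAuthCookie={ticket}"},
    )
```

The password passed to ticket_request would come from PVE_ROOT_PASS. Once the RBAC work in Phase 10 is done, an API token should replace this root login.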

Phase 3: Driver Installation

Router Server Drivers

  • Install Intel PROSet drivers for all NICs
    • i350-T4 (WAN)
    • i350-T8 (LAN 1GbE)
    • X550-T2 (10GbE)
    • i225 Quad-Port (LAN 2.5GbE)
  • Verify all NICs detected and functional
  • Install LSI mpt3sas driver
  • Flash LSI HBAs to IT mode
  • Verify storage shelves detected
  • Install Intel QAT drivers (qatlib)
  • Install OpenSSL QAT engine
  • Verify QAT acceleration working

Driver Verification

  • Run driver verification script
  • Test all network ports
  • Test storage connectivity
  • Test QAT acceleration
  • Document any issues
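On a Linux host, you can approximate driver verification by comparing lspci -k output, which lists the kernel driver bound to each PCI device, against the expected Linux drivers:

- igb for the i350 cards
- ixgbe for the X550-T2
- igc for the i225
- mpt3sas for the SAS2308-based LSI 9207-8e

This is a hypothetical sketch, not the driver verification script referenced above. The substring matching on device descriptions is an assumption, and the QAT entry is left for you to add from the driver matrix. Any card passed through to the OpenWrt VM will show vfio-pci instead, so adjust EXPECTED to match.

```python
from typing import Optional

# Expected Linux drivers, keyed by a substring of the lspci device description.
# Add the QAT adapter from the driver matrix; passthrough devices show vfio-pci.
EXPECTED = {
    "I350": "igb",
    "X550": "ixgbe",
    "I225": "igc",
    "SAS2308": "mpt3sas",
}


def drivers_in_use(lspci_k: str) -> dict[str, Optional[str]]:
    """Map each device line of `lspci -k` to its bound driver (None if unbound)."""
    result: dict[str, Optional[str]] = {}
    current = None
    for line in lspci_k.splitlines():
        if line and not line[0].isspace():
            current = line  # unindented lines start a new device
            result[current] = None
        elif current and "Kernel driver in use:" in line:
            result[current] = line.split(":", 1)[1].strip()
    return result


def driver_problems(lspci_k: str) -> list[str]:
    """Devices matching EXPECTED that are unbound or bound to a different driver."""
    problems = []
    for device, driver in drivers_in_use(lspci_k).items():
        for marker, wanted in EXPECTED.items():
            if marker in device and driver != wanted:
                problems.append(f"{device}: bound to {driver}, expected {wanted}")
    return problems
```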

Phase 4: Network Configuration

OpenWrt VM Setup

  • Create OpenWrt VM on Router server
  • Configure OpenWrt network interfaces
  • Configure VLANs (10, 20, 30, 40, 50, 60, 99)
  • Configure mwan3 for the 4× Spectrum WAN links
  • Configure firewall zones
  • Test multi-WAN failover
  • Configure inter-VLAN routing
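For the mwan3 step, here is a minimal sketch that renders /etc/config/mwan3 with four equal-weight WAN members in one load-balancing policy. It assumes the OpenWrt logical interfaces are named wan1 to wan4 and takes the track_ip targets as input; both are placeholders. Members that share a metric are balanced by weight. Giving WAN2 to WAN4 higher metrics instead would turn the policy into ordered failover.

```python
def mwan3_config(wans: list[str], track_ips: list[str]) -> str:
    """Render mwan3 UCI sections: interfaces, members, one policy, one default rule."""
    out: list[str] = []
    for wan in wans:
        out += [f"config interface '{wan}'", "\toption enabled '1'", "\toption family 'ipv4'"]
        out += [f"\tlist track_ip '{ip}'" for ip in track_ips]
        out += ["\toption reliability '1'", ""]
    for wan in wans:
        # Same metric and weight for every member: traffic is balanced across all WANs.
        out += [
            f"config member '{wan}_m1_w1'",
            f"\toption interface '{wan}'",
            "\toption metric '1'",
            "\toption weight '1'",
            "",
        ]
    out.append("config policy 'balanced'")
    out += [f"\tlist use_member '{wan}_m1_w1'" for wan in wans]
    out += ["\toption last_resort 'unreachable'", ""]
    out += [
        "config rule 'default_rule'",
        "\toption dest_ip '0.0.0.0/0'",
        "\toption family 'ipv4'",
        "\toption use_policy 'balanced'",
    ]
    return "\n".join(out) + "\n"
```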

Proxmox VLAN Configuration

  • Configure VLAN bridges on ML110
  • Configure VLAN bridges on R630
  • Test VLAN connectivity
  • Verify VM network isolation
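On Proxmox, a VLAN-aware Linux bridge lets each VM NIC carry a VLAN tag, so you do not need one bridge per VLAN. The sketch below renders an /etc/network/interfaces stanza in the ifupdown2 format that Proxmox uses, limited to the VLANs from the schema. The bridge name, the physical port (eno1), and the management address are placeholders. Apply the changes with ifreload -a or from the node's Network panel.

```python
from typing import Optional

VLANS = [10, 20, 30, 40, 50, 60, 99]


def vlan_aware_bridge(bridge: str, port: str, vids: list[int],
                      address: Optional[str] = None, gateway: Optional[str] = None) -> str:
    """Render an ifupdown2 stanza for a VLAN-aware bridge carrying only `vids`."""
    lines = [f"auto {bridge}"]
    if address:
        lines += [f"iface {bridge} inet static", f"\taddress {address}"]
        if gateway:
            lines.append(f"\tgateway {gateway}")
    else:
        lines.append(f"iface {bridge} inet manual")
    lines += [
        f"\tbridge-ports {port}",
        "\tbridge-stp off",
        "\tbridge-fd 0",
        "\tbridge-vlan-aware yes",
        "\tbridge-vids " + " ".join(str(v) for v in vids),
    ]
    return "\n".join(lines) + "\n"
```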

IP Address Configuration

  • Configure IP addresses per VLAN schema
  • Configure DNS settings
  • Test network connectivity
  • Verify routing between VLANs
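You can generate and sanity-check a per-VLAN address plan with Python's ipaddress module. The 10.<VLAN>.0.0/24 scheme, with the OpenWrt gateway at .1, is a hypothetical convention used only for illustration; substitute the ranges from the VLAN schema document. The overlap check catches subnets that would break inter-VLAN routing.

```python
import ipaddress
from itertools import combinations

VLANS = [10, 20, 30, 40, 50, 60, 99]


def ip_plan(vlans: list[int], template: str = "10.{vid}.0.0/24") -> dict:
    """Hypothetical scheme: VLAN N gets 10.N.0.0/24 with the gateway on the first host."""
    plan = {}
    for vid in vlans:
        net = ipaddress.ip_network(template.format(vid=vid))
        hosts = list(net.hosts())
        plan[vid] = {"network": net, "gateway": hosts[0],
                     "first_client": hosts[1], "last_client": hosts[-1]}
    return plan


def overlapping(plan: dict) -> list[tuple[int, int]]:
    """VLAN pairs whose subnets overlap (should be empty)."""
    return [
        (a, b)
        for a, b in combinations(plan, 2)
        if plan[a]["network"].overlaps(plan[b]["network"])
    ]
```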

Phase 5: Storage Configuration

Storage Spaces Direct Setup

  • Verify all shelves detected
  • Create Storage Spaces Direct pools
  • Create volumes for VMs
  • Create volumes for applications
  • Configure storage exports (NFS/iSCSI)
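Sizing the VM and application volumes starts from usable capacity. With mirror resiliency, every byte is stored once per copy, so usable space is roughly raw capacity divided by the number of copies: 50% efficiency for a two-way mirror and about 33% for a three-way mirror. The sketch also sets aside reserve_drives whole drives as unallocated repair headroom. The reserve amount here is an assumption, so check it against the Storage Spaces Direct volume planning guidance. Drive count and size are inputs because the checklist does not list them.

```python
def usable_tib(drive_count: int, drive_tib: float, copies: int, reserve_drives: int = 1) -> float:
    """Usable mirror capacity in TiB: (drives - reserve) * drive size / copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0 <= reserve_drives < drive_count:
        raise ValueError("reserve_drives must be between 0 and drive_count - 1")
    return (drive_count - reserve_drives) * drive_tib / copies
```

For example, 24 drives of 1 TiB in a two-way mirror with one drive held in reserve leave 11.5 TiB for volumes.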

Proxmox Storage Mounts

  • Configure NFS mounts on ML110
  • Configure NFS mounts on R630
  • Test storage connectivity
  • Verify VM storage access
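You can register each NFS export as Proxmox storage with pvesm add nfs. This minimal sketch builds the command as an argument list, for subprocess.run or for pasting into a root shell. The storage ID, server address, export path, and content types are placeholders. If ML110 and R630 are clustered, storage definitions in /etc/pve/storage.cfg are shared across nodes, so the command needs to run only once.

```python
import shlex


def pvesm_add_nfs(storage_id: str, server: str, export: str,
                  content: tuple[str, ...] = ("images",)) -> list[str]:
    """Argument list for `pvesm add nfs`; content lists what Proxmox may store there."""
    return ["pvesm", "add", "nfs", storage_id,
            "--server", server, "--export", export,
            "--content", ",".join(content)]


def shell_line(cmd: list[str]) -> str:
    """Quote an argument list for pasting into a shell."""
    return shlex.join(cmd)
```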

Phase 6: Azure Arc Onboarding

Arc Agent Installation

  • Install Azure Arc agent on Router server (if Linux)
  • Install Azure Arc agent on ML110
  • Install Azure Arc agent on R630
  • Install Azure Arc agent on Windows management VM (if applicable)

Arc Onboarding

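  • Load environment variables from .env: set -a; . ./.env; set +a (sourcing with set -a exports every variable and, unlike export $(cat .env | grep -v '^#' | xargs), preserves quoted values that contain spaces)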
  • Configure Azure subscription and resource group (from .env)
  • Onboard Router server to Azure Arc
  • Onboard ML110 to Azure Arc
  • Onboard R630 to Azure Arc
  • Verify all resources visible in Azure Portal
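You can assemble the onboarding commands directly from the .env values loaded in the first step. The sketch builds an azcmagent connect invocation with the subscription, tenant, resource group, and location flags, and fails early if any variable is missing. It passes no service-principal credentials, so it assumes an interactive sign-in on each host. Unattended onboarding would add a service principal.

```python
import os

# azcmagent connect flag -> .env variable that supplies it.
ARC_FLAGS = {
    "--subscription-id": "AZURE_SUBSCRIPTION_ID",
    "--tenant-id": "AZURE_TENANT_ID",
    "--resource-group": "AZURE_RESOURCE_GROUP",
    "--location": "AZURE_LOCATION",
}


def azcmagent_connect_cmd(env=None) -> list[str]:
    """Build the azcmagent connect argument list from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in ARC_FLAGS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing from .env: " + ", ".join(missing))
    cmd = ["azcmagent", "connect"]
    for flag, var in ARC_FLAGS.items():
        cmd += [flag, env[var]]
    return cmd
```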

Arc Governance

  • Configure Azure Policy
  • Enable Azure Monitor
  • Enable Azure Defender
  • Configure Update Management
  • Test policy enforcement

Phase 7: Cloudflare Integration

Cloudflare Tunnel Setup

  • Create Cloudflare account (if one does not already exist)
  • Create Zero Trust organization
  • Configure Cloudflare API token in .env file
  • Install cloudflared on Ubuntu VM
  • Authenticate cloudflared (interactive or using API token from .env)
  • Configure Tunnel for WAC
  • Configure Tunnel for Proxmox UI
  • Configure Tunnel for dashboards
  • Configure Tunnel for Git/CI services
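For a locally managed tunnel, cloudflared reads its routes from the ingress list in config.yml, and that list must end with a catch-all rule that has no hostname. The sketch renders that file. The tunnel ID, credentials path, hostnames, and origin addresses are placeholders. noTLSVerify is set only for origins with self-signed certificates, such as the Proxmox UI on port 8006. Tunnels managed from the Zero Trust dashboard keep their routes there instead.

```python
def ingress_config(tunnel_id: str, creds_file: str, routes: dict[str, str],
                   no_tls_verify: tuple[str, ...] = ()) -> str:
    """Render cloudflared config.yml: one rule per hostname, then the required catch-all."""
    lines = [f"tunnel: {tunnel_id}", f"credentials-file: {creds_file}", "ingress:"]
    for hostname, service in routes.items():
        lines += [f"  - hostname: {hostname}", f"    service: {service}"]
        if hostname in no_tls_verify:
            # Origin uses a self-signed certificate (e.g. Proxmox UI).
            lines += ["    originRequest:", "      noTLSVerify: true"]
    lines.append("  - service: http_status:404")
    return "\n".join(lines) + "\n"
```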

Zero Trust Policies

  • Configure SSO (Azure AD/Okta)
  • Configure MFA requirements
  • Configure device posture checks
  • Configure access policies
  • Test external access

WAF Configuration

  • Configure WAF rules
  • Test WAF protection
  • Verify that no inbound ports need to be opened on the WAN (external access goes through the Tunnel)
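The no-inbound-ports check has to run from outside the site, for example from a cloud VM, against each of the four WAN addresses. Here is a minimal TCP probe sketch using only the standard library; every port in the list should be reported closed. The port list you pass in (such as 22, 443, and 8006) is an example. A closed result only shows that nothing answered, not that the firewall is configured as intended.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port is accepted within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def exposed_ports(host: str, ports: list[int]) -> list[int]:
    """Ports on host that accept connections; expected to be empty for WAN addresses."""
    return [port for port in ports if port_open(host, port)]
```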

Phase 8: Service VM Deployment

Ubuntu VM Templates

  • Create Ubuntu LTS template on Proxmox
  • Install Azure Arc agent in template
  • Configure base packages
  • Create VM snapshots
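One common way to build the Ubuntu LTS template is from an Ubuntu cloud image with a cloud-init drive, following the Proxmox VE cloud-init documentation. The sketch lists the qm commands as argument lists. The VM ID 9000, image path, storage name, bridge, and memory size are placeholders, and the import-from disk syntax assumes a recent Proxmox VE release. If you bake the Arc agent into the template, it is safer to install it without running azcmagent connect, so that each clone onboards with its own identity.

```python
def template_commands(vmid: int, image: str, storage: str = "local-lvm",
                      bridge: str = "vmbr0") -> list[list[str]]:
    """qm invocations that turn a cloud image into a cloud-init-ready template."""
    v = str(vmid)
    return [
        ["qm", "create", v, "--name", "ubuntu-lts-template", "--memory", "2048",
         "--net0", f"virtio,bridge={bridge}", "--scsihw", "virtio-scsi-pci"],
        # Import the cloud image as the boot disk.
        ["qm", "set", v, "--scsi0", f"{storage}:0,import-from={image}"],
        # Attach the cloud-init drive (user, SSH key, network settings per clone).
        ["qm", "set", v, "--ide2", f"{storage}:cloudinit"],
        ["qm", "set", v, "--boot", "order=scsi0"],
        ["qm", "set", v, "--serial0", "socket", "--vga", "serial0"],
        ["qm", "template", v],
    ]
```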

Service VM Deployment

  • Deploy Cloudflare Tunnel VM (VLAN 99)
  • Deploy Reverse Proxy VM (VLAN 30/99)
  • Deploy Observability VM (VLAN 40)
  • Deploy CI/CD VM (VLAN 50)
  • Install Azure Arc agents on all VMs
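With VLAN-aware bridges in place, the VLAN placement above maps to tagged virtio NICs. This minimal sketch derives qm set network arguments from that list and gives the reverse proxy one NIC per VLAN. The bridge name vmbr0 is an assumption, and the VM IDs are up to you: run qm set with the VM's ID followed by these arguments.

```python
# VLAN placement taken from the Service VM Deployment list above.
SERVICE_VMS = {
    "cloudflare-tunnel": [99],
    "reverse-proxy": [30, 99],
    "observability": [40],
    "ci-cd": [50],
}


def net_args(vlans: list[int], bridge: str = "vmbr0") -> list[str]:
    """qm set arguments giving the VM one tagged virtio NIC per VLAN."""
    args: list[str] = []
    for index, vid in enumerate(vlans):
        args += [f"--net{index}", f"virtio,bridge={bridge},tag={vid}"]
    return args
```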

Service Configuration

  • Configure Cloudflare Tunnel
  • Configure reverse proxy (NGINX/Traefik)
  • Configure observability stack (Prometheus/Grafana)
  • Configure CI/CD (GitLab Runner/Jenkins)
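For the observability stack, Prometheus needs scrape targets for the hosts and VMs. This minimal sketch renders a static scrape_configs section. It assumes node_exporter runs on its default port 9100; the target addresses and the 30-second interval are placeholders. Validate the result with promtool check config before reloading Prometheus.

```python
def scrape_config(job: str, targets: list[str], interval: str = "30s") -> str:
    """Render a prometheus.yml scrape_configs section with static targets."""
    lines = [
        "scrape_configs:",
        f"  - job_name: {job}",
        f"    scrape_interval: {interval}",
        "    static_configs:",
        "      - targets:",
    ]
    lines += [f"          - '{target}'" for target in targets]
    return "\n".join(lines) + "\n"
```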

Phase 9: Verification and Testing

Network Testing

  • Test all WAN connections
  • Test multi-WAN failover
  • Test VLAN isolation
  • Test inter-VLAN routing
  • Test firewall rules

Storage Testing

  • Test storage read/write performance
  • Test storage redundancy
  • Test VM storage access
  • Test storage exports
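You can take a rough sequential-write figure for a storage export from any VM or host that mounts it. The sketch writes a temporary file in fixed-size blocks, fsyncs it, deletes it, and reports MiB/s. It is only a smoke test and assumes the target directory is on the storage being tested. The page cache and storage-side caching can inflate results, so use a dedicated tool such as fio with direct I/O for the figures you record as performance metrics.

```python
import os
import tempfile
import time


def seq_write_mib_s(directory: str, total_mib: int = 256, block_kib: int = 1024) -> float:
    """Write ~total_mib MiB in block_kib chunks to a temp file, fsync, and return MiB/s."""
    block = os.urandom(block_kib * 1024)
    blocks = total_mib * 1024 // block_kib
    fd, path = tempfile.mkstemp(dir=directory, prefix="bringup-probe-")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the flush to stable storage in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_kib / 1024) / elapsed
```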

Service Testing

  • Test Cloudflare Tunnel access
  • Test Azure Arc connectivity
  • Test observability dashboards
  • Test CI/CD pipelines

Performance Testing

  • Test QAT acceleration
  • Test network throughput
  • Test storage I/O
  • Document performance metrics
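Network throughput is commonly measured with iperf3, whose -J option emits a JSON report. The sketch extracts the receiver-side rate from a TCP test (end.sum_received.bits_per_second) and flags results below a fraction of the nominal link speed. The 90% threshold is an arbitrary example, not a requirement from this checklist.

```python
import json


def iperf3_gbps(report_json: str) -> float:
    """Receiver-side TCP throughput from an `iperf3 -J` report, in Gbit/s."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def below_target(measured_gbps: float, link_gbps: float, min_fraction: float = 0.9) -> bool:
    """True if the measurement falls short of min_fraction of the nominal link speed."""
    return measured_gbps < link_gbps * min_fraction
```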

Phase 10: Documentation and Handoff

Documentation

  • Document all IP addresses
  • Verify .env file contains all credentials (stored securely, not in version control)
  • Document cable mappings
  • Document VLAN configurations
  • Document storage allocations
  • Create network diagrams
  • Create runbooks
  • Verify .env is listed in .gitignore and has not been committed to the repository

Monitoring Setup

  • Configure Grafana dashboards
  • Configure Prometheus alerts
  • Configure Azure Monitor alerts
  • Test alerting
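A host-down rule on the up metric is a good first Prometheus alert because it exercises the whole alerting path. The sketch renders a rules file. The group name, the 5-minute for duration, and the severity label are placeholders, and the job name must match the scrape configuration. Run promtool check rules to validate the file before it is loaded.

```python
def host_down_rule(job: str = "node", for_: str = "5m") -> str:
    """Render a Prometheus rules file with a single HostDown alert for `job`."""
    return "\n".join([
        "groups:",
        "  - name: bring-up",
        "    rules:",
        "      - alert: HostDown",
        f"        expr: up{{job=\"{job}\"}} == 0",
        f"        for: {for_}",
        "        labels:",
        "          severity: critical",
        "        annotations:",
        # Plain string: the double braces are Prometheus template syntax.
        "          summary: \"{{ $labels.instance }} is not responding to scrapes\"",
    ]) + "\n"
```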

Security Hardening

  • Review firewall rules
  • Review access policies
  • Create RBAC accounts for Proxmox (replace root usage)
    • Create service accounts for automation
    • Create operator accounts with appropriate roles
    • Generate API tokens for service accounts
    • Document RBAC account usage (see docs/security/proxmox-rbac.md)
  • Review secret management
  • Perform security scan
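The Proxmox RBAC steps map onto pveum commands. The sketch lists them for one automation account, using the built-in PVEVMAdmin role as an example, and builds the Authorization header that API clients send with a token. The user ID automation@pve and token ID ci are placeholders. With --privsep 0 the token inherits the user's permissions; leaving privilege separation on requires separate ACLs for the token. The token secret is shown only once at creation, so store it in .env in place of the root password.

```python
def rbac_commands(user: str, role: str, token_id: str, path: str = "/") -> list[list[str]]:
    """pveum invocations: create the user, grant a role on a path, create an API token."""
    return [
        ["pveum", "user", "add", user],
        ["pveum", "acl", "modify", path, "--users", user, "--roles", role],
        ["pveum", "user", "token", "add", user, token_id, "--privsep", "0"],
    ]


def token_header(user: str, token_id: str, secret: str) -> dict[str, str]:
    """HTTP header for Proxmox API token authentication."""
    return {"Authorization": f"PVEAPIToken={user}!{token_id}={secret}"}
```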

Post-Installation Tasks

Ongoing Maintenance

  • Schedule regular backups
  • Schedule firmware updates
  • Schedule driver updates
  • Schedule OS updates
  • Schedule security patches

Monitoring

  • Review monitoring dashboards daily
  • Review Azure Arc status
  • Review Cloudflare Tunnel status
  • Review storage health
  • Review network performance

Troubleshooting Reference

Common Issues

Issue: NIC not detected

  • Check PCIe slot connection
  • Check BIOS settings
  • Update driver

Issue: Storage shelves not detected

  • Check cable connections
  • Check HBA firmware
  • Check shelf power

Issue: Azure Arc not connecting

  • Check network connectivity
  • Check proxy settings
  • Check Azure credentials

Issue: Cloudflare Tunnel not working

  • Check cloudflared service
  • Check Tunnel configuration
  • Check Zero Trust policies