Bring-Up Checklist
Day-One Installation Guide
This checklist provides a step-by-step guide for bringing up the complete Azure Stack HCI environment on installation day.
Pre-Installation Preparation
Hardware Verification
- Router server chassis received and inspected
- All PCIe cards received (NICs, HBAs, QAT)
- Memory modules received (8× 4GB DDR4 ECC RDIMM)
- Boot SSD received (256GB)
- All cables received (Ethernet, Mini-SAS HD)
- Storage shelves received and inspected
- Proxmox hosts (ML110, R630) verified operational
Documentation Review
- Complete architecture reviewed
- PCIe slot allocation map reviewed
- Network topology and VLAN schema reviewed
- Driver matrix reviewed
- All configuration files prepared
Environment Configuration
- Copy .env.example to .env
- Configure Azure credentials in .env:
  - AZURE_SUBSCRIPTION_ID
  - AZURE_TENANT_ID
  - AZURE_RESOURCE_GROUP
  - AZURE_LOCATION
- Configure Cloudflare credentials in .env:
  - CLOUDFLARE_API_TOKEN
  - CLOUDFLARE_ACCOUNT_EMAIL
- Configure Proxmox credentials in .env:
  - PVE_ROOT_PASS (shared root password for all instances)
  - PROXMOX_ML110_URL
  - PROXMOX_R630_URL
  - Note: the username root@pam is implied and should not be stored
  - For production: create RBAC accounts and use API tokens instead of root
- Verify the .env file is listed in .gitignore (it should not be committed)
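Before any later phase reads these values, a quick local check can catch missing keys or a .env file that git would pick up. The Python sketch below assumes a simple KEY=VALUE format with '#' comments and optional quotes; the function names are hypothetical. Its .gitignore check only recognizes a literal .env or /.env pattern, so git check-ignore -q .env (exit status 0 when the file is ignored) remains the authoritative test.

```python
# Minimal .env sanity check (sketch). Assumes simple KEY=VALUE lines,
# '#' comments, and optional matching quotes around values.

REQUIRED_KEYS = (
    "AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID",
    "AZURE_RESOURCE_GROUP", "AZURE_LOCATION",
    "CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_EMAIL",
    "PVE_ROOT_PASS", "PROXMOX_ML110_URL", "PROXMOX_R630_URL",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def env_is_gitignored(gitignore_text: str) -> bool:
    """Simplified: only recognizes a literal '.env' or '/.env' pattern."""
    patterns = {line.strip() for line in gitignore_text.splitlines()}
    return bool(patterns & {".env", "/.env"})
```

For example, missing_keys(parse_env(Path(".env").read_text())), with Path from pathlib, lists every required variable that is still unset.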
Phase 1: Hardware Installation
Router Server Assembly
- Install CPU and memory (8× 4GB DDR4 ECC RDIMM)
- Install boot SSD (256GB)
- Install Intel QAT 8970 in x16_1 slot
- Install Intel X550-T2 in x8_1 slot
- Install LSI 9207-8e #1 in x8_2 slot
- Install LSI 9207-8e #2 in x8_3 slot
- Install Intel i350-T4 in x4_1 slot
- Install Intel i350-T8 in x4_2 slot
- Install Intel i225 Quad-Port in x4_3 slot
- Verify all cards seated properly
- Connect power and verify POST
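The slot plan above can be checked mechanically before the chassis is closed. This sketch parses each slot's width from its x16_1 / x8_1 / x4_1 name (assuming the prefix is the electrical width) and compares it with each card's native link width. The widths for the QAT 8970, X550-T2, LSI 9207-8e, and i350-T4 are assumed from vendor datasheets and should be confirmed; the i350-T8 and i225 quad-port cards are third-party designs whose width varies, so they are deliberately left unverified.

```python
from typing import Optional

# Planned layout, copied from the assembly checklist above.
SLOT_PLAN = [
    ("x16_1", "Intel QAT 8970"),
    ("x8_1", "Intel X550-T2"),
    ("x8_2", "LSI 9207-8e"),
    ("x8_3", "LSI 9207-8e"),
    ("x4_1", "Intel i350-T4"),
    ("x4_2", "Intel i350-T8"),
    ("x4_3", "Intel i225 Quad-Port"),
]

# ASSUMED native link widths (lanes); None means "not known, check the card".
CARD_LANES: dict[str, Optional[int]] = {
    "Intel QAT 8970": 16,
    "Intel X550-T2": 4,
    "LSI 9207-8e": 8,
    "Intel i350-T4": 4,
    "Intel i350-T8": None,         # third-party card; width varies by vendor
    "Intel i225 Quad-Port": None,  # third-party card; width varies by vendor
}


def slot_width(slot: str) -> int:
    """Parse the lane count from a slot name such as 'x8_2'."""
    return int(slot.split("_")[0].lstrip("x"))


def check_plan(plan, lanes) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    problems = []
    seen = set()
    for slot, card in plan:
        if slot in seen:
            problems.append(f"{slot} is assigned twice")
        seen.add(slot)
        need = lanes.get(card)
        if need is None:
            problems.append(f"{card}: lane count unverified")
        elif need > slot_width(slot):
            # A wider card in a narrower slot links at reduced width (or may not fit).
            problems.append(
                f"{card} in {slot} links at x{slot_width(slot)}, below its native x{need}"
            )
    return problems
```

Run against the planned layout, the only findings should be the two unverified third-party cards; anything else points at a slot swap.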
BIOS/UEFI Configuration
- Enter BIOS/UEFI setup
- Verify all PCIe cards detected
- Configure boot order (SSD first)
- Enable virtualization (Intel VT-x, VT-d)
- Configure memory settings (ECC enabled)
- Set date/time
- Save and exit BIOS
Storage Shelf Cabling
- Connect SFF-8644 cables from LSI HBA #1 to shelves 1-2
- Connect SFF-8644 cables from LSI HBA #2 to shelves 3-4
- Power on storage shelves
- Verify shelf power and status LEDs
- Label all cables
Network Cabling
- Connect 4× Cat6 cables from i350-T4 to Spectrum modems/ONTs (WAN1-4)
- Connect 2× Cat6a cables to X550-T2 (reserved for future use)
- Connect 4× Cat6 cables from i225 Quad to ML110, R630, and key services
- Connect 8× Cat6 cables from i350-T8 to remaining servers/appliances
- Label all cables at both ends
- Document cable mapping
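To make labeling and the cable map come from the same source, a small script can emit one numbered row per cable in the order listed above. The C001-style labels and 0-based port numbers are hypothetical conventions, and "TBD" marks destinations the checklist leaves open.

```python
import csv
import io

# (router card, port count, cable type, far-end destinations).
# Label scheme and 0-based port numbering are hypothetical conventions.
CABLE_PLAN = [
    ("i350-T4", 4, "Cat6", ["WAN1", "WAN2", "WAN3", "WAN4"]),
    ("X550-T2", 2, "Cat6a", ["reserved", "reserved"]),
    ("i225-Quad", 4, "Cat6", ["ML110", "R630", "key service (TBD)", "key service (TBD)"]),
    ("i350-T8", 8, "Cat6", ["TBD"] * 8),
]


def cable_map_csv(plan=CABLE_PLAN) -> str:
    """Render one CSV row per cable; the label goes on both cable ends."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label", "router_port", "cable", "far_end"])
    label = 1
    for card, ports, cable, dests in plan:
        if len(dests) != ports:
            raise ValueError(f"{card}: {ports} ports but {len(dests)} destinations")
        for port, dest in enumerate(dests):
            writer.writerow([f"C{label:03d}", f"{card}/p{port}", cable, dest])
            label += 1
    return out.getvalue()
```

Writing the result to a file gives both the label sheet and the cable-mapping record; fill in the TBD entries on site.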
Phase 2: Operating System Installation
Router Server OS
Option A: Windows Server Core
- Boot from Windows Server installation media
- Install Windows Server Core
- Configure initial administrator password
- Install Windows Updates
- Configure static IP on management interface
- Enable Remote Desktop (if needed)
- Install Windows Admin Center
Option B: Proxmox VE
- Boot from Proxmox VE installation media
- Install Proxmox VE
- Configure initial root password
- Configure network (management interface)
- Update Proxmox packages
- Verify Proxmox web interface accessible
Proxmox Hosts (ML110, R630)
- Verify Proxmox VE installed and updated
- Configure network interfaces
- Verify cluster status (if clustered)
- Test VM creation
Phase 3: Driver Installation
Router Server Drivers
- Install NIC drivers for all NICs (Intel PROSet on Windows Server; on Proxmox VE the in-kernel igb, ixgbe, and igc drivers cover these cards)
  - i350-T4 (WAN)
  - i350-T8 (LAN 1GbE)
  - X550-T2 (10GbE)
  - i225 Quad-Port (LAN 2.5GbE)
- Verify all NICs detected and functional
- Install LSI mpt3sas driver
- Confirm the LSI HBAs run IT-mode firmware (flash to IT mode if they do not)
- Verify storage shelves detected
- Install Intel QAT drivers (qatlib)
- Install OpenSSL QAT engine
- Verify QAT acceleration working
Driver Verification
- Run driver verification script
- Test all network ports
- Test storage connectivity
- Test QAT acceleration
- Document any issues
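On the Proxmox VE option, one way to implement the driver verification script is to count devices in lspci output. The match strings and expected counts below are assumptions (for example, that the QAT 8970 enumerates as three C62x QuickAssist functions and that the i350-T8 presents eight I350 functions); compare them with a known-good lspci capture before trusting the result. On Windows Server, the Get-PnpDevice cmdlet would play the same role.

```python
from collections import Counter

# ASSUMED substrings of lspci device names, with ASSUMED function counts.
EXPECTED = {
    "I350": 12,        # i350-T4 (4 ports) + i350-T8 (8 ports)
    "X550": 2,         # X550-T2
    "I225": 4,         # i225 quad-port
    "SAS2308": 2,      # two LSI 9207-8e HBAs
    "QuickAssist": 3,  # QAT 8970 (three C62x endpoints)
}


def count_devices(lspci_text: str) -> Counter:
    """Count lspci lines that mention each expected device string."""
    counts = Counter()
    for line in lspci_text.splitlines():
        for key in EXPECTED:
            if key in line:
                counts[key] += 1
    return counts


def missing_devices(lspci_text: str) -> dict[str, tuple[int, int]]:
    """Map device key -> (found, expected) wherever fewer were found."""
    found = count_devices(lspci_text)
    return {key: (found[key], n) for key, n in EXPECTED.items() if found[key] < n}
```

Feed it live output with subprocess.run(["lspci"], capture_output=True, text=True).stdout; an empty result means every expected device enumerated. Extra matches (for example QAT virtual functions) are tolerated because only shortfalls are reported.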
Phase 4: Network Configuration
OpenWrt VM Setup
- Create OpenWrt VM on Router server
- Configure OpenWrt network interfaces
- Configure VLANs (10, 20, 30, 40, 50, 60, 99)
- Configure mwan3 for 4× Spectrum WAN
- Configure firewall zones
- Test multi-WAN failover
- Configure inter-VLAN routing
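The mwan3 step can start from a generated /etc/config/mwan3. This sketch assumes the four WAN logical interfaces are named wan1 to wan4, balances evenly across them (one metric, equal weights), and health-checks against 1.1.1.1 and 8.8.8.8; all of those are placeholders. Giving members different metrics instead turns the policy into ordered failover.

```python
def mwan3_config(wans=("wan1", "wan2", "wan3", "wan4"),
                 track_ips=("1.1.1.1", "8.8.8.8")) -> str:
    """Render a minimal mwan3 UCI config: interfaces, members, one policy, one rule."""
    out = []
    # Health-checked WAN interfaces (names must match /etc/config/network).
    for wan in wans:
        out.append(f"config interface '{wan}'")
        out.append("\toption enabled '1'")
        out.append("\toption family 'ipv4'")
        out += [f"\tlist track_ip '{ip}'" for ip in track_ips]
        out.append("")
    # One member per WAN: same metric and weight means even load balancing.
    for wan in wans:
        out.append(f"config member '{wan}_m1_w1'")
        out.append(f"\toption interface '{wan}'")
        out.append("\toption metric '1'")
        out.append("\toption weight '1'")
        out.append("")
    out.append("config policy 'balanced'")
    out += [f"\tlist use_member '{wan}_m1_w1'" for wan in wans]
    out.append("")
    # Send all IPv4 traffic through the balanced policy.
    out.append("config rule 'default_rule'")
    out.append("\toption dest_ip '0.0.0.0/0'")
    out.append("\toption use_policy 'balanced'")
    return "\n".join(out) + "\n"
```

mwan3 also expects each WAN interface in /etc/config/network to carry its own distinct metric; after installing the file, mwan3 status shows whether every link is online before the failover test.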
Proxmox VLAN Configuration
- Configure VLAN bridges on ML110
- Configure VLAN bridges on R630
- Test VLAN connectivity
- Verify VM network isolation
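For the VLAN bridges on ML110 and R630, a VLAN-aware Linux bridge lets each VM NIC choose its VLAN tag. This sketch renders an ifupdown2 stanza for /etc/network/interfaces; the bridge and uplink names are placeholders, and restricting bridge-vids to the schema's VLAN IDs (rather than the common 2-4094 range) is a design choice, not a requirement. The bridge is left without an IP address, which assumes the host's management address lives on a separate interface.

```python
VLANS = (10, 20, 30, 40, 50, 60, 99)  # from the VLAN list in Phase 4


def vlan_bridge(bridge: str, uplink: str, vlans=VLANS) -> str:
    """Render a VLAN-aware bridge stanza trunking only the listed VLAN IDs."""
    vids = " ".join(str(v) for v in sorted(vlans))
    return "\n".join([
        f"auto {bridge}",
        f"iface {bridge} inet manual",
        f"\tbridge-ports {uplink}",   # placeholder trunk NIC name
        "\tbridge-stp off",
        "\tbridge-fd 0",
        "\tbridge-vlan-aware yes",
        f"\tbridge-vids {vids}",
        "",
    ])
```

After editing the file, ifreload -a applies it; a VM NIC then joins a VLAN through its tag setting, for example qm set <vmid> --net0 virtio,bridge=vmbr1,tag=40.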
IP Address Configuration
- Configure IP addresses per VLAN schema
- Configure DNS settings
- Test network connectivity
- Verify routing between VLANs
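The VLAN schema itself lives in the Network Topology document. To show how per-VLAN addressing can be derived and checked, this sketch uses a hypothetical convention (VLAN n maps to 10.n.0.0/24, gateway at .1, DHCP pool .100 to .200), which only works because every VLAN ID above is at most 255; substitute the real schema before using it.

```python
import ipaddress
from typing import Optional

# HYPOTHETICAL convention: VLAN n -> 10.n.0.0/24, gateway .1, DHCP .100-.200.
VLANS = (10, 20, 30, 40, 50, 60, 99)


def vlan_subnet(vid: int) -> ipaddress.IPv4Network:
    return ipaddress.IPv4Network(f"10.{vid}.0.0/24")


def vlan_plan(vlans=VLANS) -> dict:
    """Subnet, gateway, and DHCP pool for each VLAN under the convention above."""
    plan = {}
    for vid in vlans:
        net = vlan_subnet(vid)
        hosts = list(net.hosts())  # .1 through .254
        plan[vid] = {
            "subnet": str(net),
            "gateway": str(hosts[0]),
            "dhcp": (str(hosts[99]), str(hosts[199])),
        }
    return plan


def vlan_for(address: str, vlans=VLANS) -> Optional[int]:
    """Which VLAN an address belongs to, or None if it is outside the schema."""
    ip = ipaddress.ip_address(address)
    for vid in vlans:
        if ip in vlan_subnet(vid):
            return vid
    return None
```

vlan_for is handy when testing inter-VLAN routing: it confirms a test host's address is on the VLAN you think it is before you blame the firewall.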
Phase 5: Storage Configuration
Storage Spaces Direct Setup
- Verify all shelves detected
- Create Storage Spaces Direct pools
- Create volumes for VMs
- Create volumes for applications
- Configure storage exports (NFS/iSCSI)
Proxmox Storage Mounts
- Configure NFS mounts on ML110
- Configure NFS mounts on R630
- Test storage connectivity
- Verify VM storage access
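Adding the NFS exports to Proxmox can be scripted with pvesm add nfs. This sketch only builds the argument list; the storage ID, server address, and export path in the example are hypothetical, and the NFS version option is optional.

```python
import shlex
from typing import Optional, Sequence


def pvesm_add_nfs(storage_id: str, server: str, export: str,
                  content: Sequence[str] = ("images", "rootdir"),
                  nfs_version: Optional[str] = None) -> list[str]:
    """Build (not run) a 'pvesm add nfs' command for one NFS export."""
    argv = [
        "pvesm", "add", "nfs", storage_id,
        "--server", server,
        "--export", export,
        # images = VM disks, rootdir = container root filesystems
        "--content", ",".join(content),
    ]
    if nfs_version:
        argv += ["--options", f"vers={nfs_version}"]
    return argv


if __name__ == "__main__":
    # Hypothetical storage ID, server address, and export path.
    print(shlex.join(pvesm_add_nfs("shelf-vms", "10.30.0.10", "/export/vms", nfs_version="4.2")))
```

Run the list with subprocess.run(argv, check=True) once it looks right. If ML110 and R630 are in one Proxmox cluster, storage definitions are shared cluster-wide, so each export only needs to be added once.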
Phase 6: Azure Arc Onboarding
Arc Agent Installation
- Install Azure Arc agent on Router server (if Linux)
- Install Azure Arc agent on ML110
- Install Azure Arc agent on R630
- Install Azure Arc agent on Windows management VM (if applicable)
Arc Onboarding
- Load environment variables from .env: set -a; . ./.env; set +a
- Configure Azure subscription and resource group (from .env)
- Onboard Router server to Azure Arc
- Onboard ML110 to Azure Arc
- Onboard R630 to Azure Arc
- Verify all resources visible in Azure Portal
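Once the variables are loaded, each host's onboarding is one azcmagent connect call. This sketch assembles that call from the .env keys and fails fast if any is missing. It assumes interactive sign-in, so no service-principal flags are passed, and the optional resource name is a hypothetical per-host label.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID",
            "AZURE_RESOURCE_GROUP", "AZURE_LOCATION")


def azcmagent_connect_argv(env: Optional[Mapping[str, str]] = None,
                           resource_name: Optional[str] = None) -> list[str]:
    """Build an 'azcmagent connect' command from environment variables."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError(f"missing in environment: {', '.join(missing)}")
    argv = [
        "azcmagent", "connect",
        "--subscription-id", env["AZURE_SUBSCRIPTION_ID"],
        "--tenant-id", env["AZURE_TENANT_ID"],
        "--resource-group", env["AZURE_RESOURCE_GROUP"],
        "--location", env["AZURE_LOCATION"],
    ]
    if resource_name:
        # Hypothetical per-host name, e.g. "ml110" or "r630".
        argv += ["--resource-name", resource_name]
    return argv
```

Running the same builder on the Router server, ML110, and R630 keeps the subscription, tenant, and resource group identical across hosts, which makes the "visible in Azure Portal" check a single resource-group view.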
Arc Governance
- Configure Azure Policy
- Enable Azure Monitor
- Enable Azure Defender
- Configure Update Management
- Test policy enforcement
Phase 7: Cloudflare Integration
Cloudflare Tunnel Setup
- Create Cloudflare account (if not exists)
- Create Zero Trust organization
- Configure the Cloudflare API token in the .env file
- Install cloudflared on Ubuntu VM
- Authenticate cloudflared (interactively or using the API token from .env)
- Configure Tunnel for WAC
- Configure Tunnel for Proxmox UI
- Configure Tunnel for dashboards
- Configure Tunnel for Git/CI services
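A locally managed tunnel is described by a cloudflared config.yml whose ingress rules map public hostnames to internal services and must end with a catch-all rule. This sketch renders one; the tunnel ID, hostnames, and internal addresses are hypothetical. It sets noTLSVerify for HTTPS origins because WAC and the Proxmox UI usually present self-signed certificates; pinning the origin certificate with the caPool option is the stricter alternative.

```python
def cloudflared_config(tunnel_id: str, routes: dict[str, str],
                       cred_dir: str = "/etc/cloudflared") -> str:
    """Render config.yml: one ingress rule per hostname plus a 404 catch-all."""
    lines = [
        f"tunnel: {tunnel_id}",
        f"credentials-file: {cred_dir}/{tunnel_id}.json",
        "ingress:",
    ]
    for hostname, service in routes.items():
        lines.append(f"  - hostname: {hostname}")
        lines.append(f"    service: {service}")
        if service.startswith("https://"):
            # Self-signed origin certificate: skip verification (see lead-in).
            lines.append("    originRequest:")
            lines.append("      noTLSVerify: true")
    lines.append("  - service: http_status:404")  # required final catch-all
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames and addresses; 8006 is the Proxmox web UI port.
    print(cloudflared_config("TUNNEL-UUID", {
        "wac.example.com": "https://10.99.0.20:443",
        "pve-ml110.example.com": "https://10.10.0.11:8006",
        "grafana.example.com": "http://10.40.0.10:3000",
    }))
```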
Zero Trust Policies
- Configure SSO (Azure AD/Okta)
- Configure MFA requirements
- Configure device posture checks
- Configure access policies
- Test external access
WAF Configuration
- Configure WAF rules
- Test WAF protection
- Verify that no inbound ports need to be opened on the WAN firewall
Phase 8: Service VM Deployment
Ubuntu VM Templates
- Create Ubuntu LTS template on Proxmox
- Install Azure Arc agent in template
- Configure base packages
- Create VM snapshots
Service VM Deployment
- Deploy Cloudflare Tunnel VM (VLAN 99)
- Deploy Reverse Proxy VM (VLAN 30/99)
- Deploy Observability VM (VLAN 40)
- Deploy CI/CD VM (VLAN 50)
- Install Azure Arc agents on all VMs
Service Configuration
- Configure Cloudflare Tunnel
- Configure reverse proxy (NGINX/Traefik)
- Configure observability stack (Prometheus/Grafana)
- Configure CI/CD (GitLab Runner/Jenkins)
Phase 9: Verification and Testing
Network Testing
- Test all WAN connections
- Test multi-WAN failover
- Test VLAN isolation
- Test inter-VLAN routing
- Test firewall rules
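VLAN isolation and firewall rules can be spot-checked from a VM on each VLAN by attempting TCP connections and comparing the results with what policy should allow. This sketch covers TCP only (ICMP would need raw sockets and root privileges), and the probe function is injectable so the comparison logic can be exercised without a network. The expectation entries you feed it are your own policy, not something this checklist defines.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable, unreachable
        return False


def check_matrix(expectations, probe=tcp_reachable):
    """expectations: iterable of (host, port, should_connect).

    Returns the entries whose observed result differs from the expectation.
    """
    return [(host, port, want) for host, port, want in expectations
            if probe(host, port) != want]
```

For example, from a CI/CD VM a hypothetical matrix might expect ("10.40.0.10", 3000, True) for Grafana and ("10.10.0.11", 8006, False) for the Proxmox UI; an empty result means the firewall behaves as intended for those pairs.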
Storage Testing
- Test storage read/write performance
- Test storage redundancy
- Test VM storage access
- Test storage exports
Service Testing
- Test Cloudflare Tunnel access
- Test Azure Arc connectivity
- Test observability dashboards
- Test CI/CD pipelines
Performance Testing
- Test QAT acceleration
- Test network throughput
- Test storage I/O
- Document performance metrics
Phase 10: Documentation and Handoff
Documentation
- Document all IP addresses
- Verify the .env file contains all credentials (stored securely, not in version control)
- Document cable mappings
- Document VLAN configurations
- Document storage allocations
- Create network diagrams
- Create runbooks
- Verify .env is listed in .gitignore and not committed to the repository
Monitoring Setup
- Configure Grafana dashboards
- Configure Prometheus alerts
- Configure Azure Monitor alerts
- Test alerting
Security Hardening
- Review firewall rules
- Review access policies
- Create RBAC accounts for Proxmox (replace root usage)
  - Create service accounts for automation
  - Create operator accounts with appropriate roles
  - Generate API tokens for service accounts
  - Document RBAC account usage (see docs/security/proxmox-rbac.md)
- Review secret management
- Perform security scan
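Once service accounts exist, automation should authenticate with an API token instead of the root password. Proxmox VE accepts tokens in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET. This sketch builds that header and reads the /api2/json/version endpoint; the user and token names in the example are hypothetical.

```python
import json
import urllib.request


def pve_token_header(user: str, token_id: str, secret: str) -> dict[str, str]:
    """Authorization header for a Proxmox VE API token."""
    return {"Authorization": f"PVEAPIToken={user}!{token_id}={secret}"}


def pve_version(base_url: str, header: dict[str, str], context=None) -> dict:
    """GET /api2/json/version and return its 'data' object."""
    url = f"{base_url.rstrip('/')}/api2/json/version"
    req = urllib.request.Request(url, headers=header)
    with urllib.request.urlopen(req, context=context, timeout=10) as resp:
        return json.load(resp)["data"]
```

Because the default Proxmox certificate is self-signed, pass context=ssl.create_default_context(cafile="pve-root-ca.pem") using a copy of /etc/pve/pve-root-ca.pem from the host, rather than disabling verification; a successful call confirms the token works without root credentials.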
Post-Installation Tasks
Ongoing Maintenance
- Schedule regular backups
- Schedule firmware updates
- Schedule driver updates
- Schedule OS updates
- Schedule security patches
Monitoring
- Review monitoring dashboards daily
- Review Azure Arc status
- Review Cloudflare Tunnel status
- Review storage health
- Review network performance
Troubleshooting Reference
Common Issues
Issue: NIC not detected
- Check PCIe slot connection
- Check BIOS settings
- Update driver
Issue: Storage shelves not detected
- Check cable connections
- Check HBA firmware
- Check shelf power
Issue: Azure Arc not connecting
- Check network connectivity
- Check proxy settings
- Check Azure credentials
Issue: Cloudflare Tunnel not working
- Check cloudflared service
- Check Tunnel configuration
- Check Zero Trust policies
Related Documentation
- Complete Architecture - Full architecture overview
- Hardware BOM - Complete bill of materials
- PCIe Allocation - Slot allocation map
- Network Topology - VLAN/IP schema
- Driver Matrix - Driver versions