Complete Azure Stack HCI Architecture
Overview
This document describes the complete architecture for a local Azure Stack HCI environment with Cloudflare Zero Trust, Azure Arc governance, Proxmox VE virtualization, and Ubuntu service VMs. The system transforms your environment into a local Azure "cloud" using Azure Stack HCI principles.
Core Objectives
- Local Azure cloud: Govern on-prem servers with Azure Arc and adopt Azure operations practices
- Hyper-converged stack: Proxmox VE for virtualization, Ubuntu VMs for services, centralized storage via external shelves
- Secure edge: Cloudflare Zero Trust/Tunnel to expose services without inbound ports
- High-availability networking: 4× 1Gbps Spectrum WAN, multi-WAN failover/policy routing, QAT-accelerated VPN/TLS offload
- Unified ops: CI/CD, monitoring, and consistent configuration across all nodes
Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ Azure Portal │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Azure Arc │ │ Azure Policy │ │ Azure Monitor │ │
│ │ Servers │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Arc K8s │ │ GitOps │ │ Defender │ │
│ │ │ │ (Flux) │ │ for Cloud │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ Azure Arc Connection
│
┌─────────────────────────────────────────────────────────────────┐
│ On-Premises Infrastructure │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Router/Switch/Storage Controller Server │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Windows Server│ │ OpenWrt VM │ │ Storage S2D │ │ │
│ │ │ Core + Hyper-V│ │ (mwan3) │ │ Pools │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Azure Arc │ │ 4× WAN │ │ 4× Shelves │ │ │
│ │ │ Agent │ │ (Spectrum) │ │ (via LSI HBAs)│ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │ │ │
│ └─────────┼──────────────────┼──────────────────┼──────────┘ │
│ │ │ │ │
│ ┌─────────▼──────────────────▼──────────────────▼──────────┐ │
│ │ Proxmox VE Hosts (Existing) │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ HPE ML110 │ │ Dell R630 │ │ │
│ │ │ Gen9 │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ Azure Arc │ │ Azure Arc │ │ │
│ │ │ Agent │ │ Agent │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Ubuntu Service VMs │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Cloudflare │ │ Reverse │ │ Observability │ │ │
│ │ │ Tunnel VM │ │ Proxy VM │ │ VM │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Azure Arc │ │ Azure Arc │ │ Azure Arc │ │ │
│ │ │ Agent │ │ Agent │ │ Agent │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ ┌──────────────┐ │ │
│ │ │ CI/CD VM │ │ │
│ │ │ │ │ │
│ │ │ Azure Arc │ │ │
│ │ │ Agent │ │ │
│ │ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ Cloudflare Tunnel (Outbound Only)
│
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare Zero Trust │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Zero Trust │ │ WAF │ │ Tunnel │ │
│ │ Policies │ │ Rules │ │ Endpoints │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Physical Infrastructure
Router/Switch/Storage Controller Server (New)
- Chassis: Entry-level Supermicro/Dell mini-server
- CPU: Intel Xeon E-2100 or similar (6-8 cores), PCIe 3.0 support
- Memory: 8× 4GB DDR4 ECC RDIMM = 32GB (reused from R630)
- Storage: 256GB SSD (OS, configs), optional mirrored boot
- PCIe Cards:
  - Intel i350-T4: 4× 1GbE (WAN - Spectrum connections)
  - Intel X550-T2: 2× 10GbE RJ45 (future uplinks or high-perf server links)
  - Intel i225 Quad-Port: 4× 2.5GbE (LAN to key servers)
  - Intel i350-T8: 8× 1GbE (LAN to remaining servers)
  - Intel QAT 8970: Crypto acceleration (TLS/IPsec/compression)
  - 2× LSI 9207-8e: SAS HBAs for 4 external shelves
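As a quick sanity check on the card list above, the port counts and nominal line rates can be tallied per role. This is a minimal Python sketch: the `Nic` type and the role labels are illustrative names, and the figures are the nominal link speeds from the list, not measured throughput.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    model: str
    ports: int
    gbps_per_port: float
    role: str  # illustrative label: "wan", "uplink", or "lan"


# Port counts and speeds as listed for the Router server.
NICS = [
    Nic("Intel i350-T4", 4, 1.0, "wan"),
    Nic("Intel X550-T2", 2, 10.0, "uplink"),
    Nic("Intel i225 Quad-Port", 4, 2.5, "lan"),
    Nic("Intel i350-T8", 8, 1.0, "lan"),
]


def tally(nics):
    """Sum port count and nominal capacity (Gbps) per role."""
    totals = defaultdict(lambda: {"ports": 0, "gbps": 0.0})
    for nic in nics:
        totals[nic.role]["ports"] += nic.ports
        totals[nic.role]["gbps"] += nic.ports * nic.gbps_per_port
    return dict(totals)


totals = tally(NICS)
for role, t in totals.items():
    print(f"{role}: {t['ports']} ports, {t['gbps']:g} Gbps nominal")
```

With the cards as listed, this reports 4 WAN ports (4 Gbps), 2 uplink ports (20 Gbps), and 12 LAN ports (18 Gbps), for 18 Ethernet ports in total.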
Proxmox VE Hosts (Existing)
- HPE ProLiant ML110 Gen9:
  - CPU: Intel Xeon E5-series
  - Memory: Remaining DDR4 ECC RDIMM after Router allocation
  - Storage: Local SSDs/HDDs for OS and VM disks
  - Networking: 1GbE onboard NICs; optional Intel add-in NICs
- Dell PowerEdge R630:
  - CPU: Intel Xeon E5 v3/v4 dual-socket
  - Memory: Remaining DDR4 ECC RDIMM (32GB spare pool noted)
  - Storage: PERC or HBA with SSDs
  - Networking: 1/10GbE depending on NICs installed
Storage Shelves
- Quantity: 4 external SAS JBOD shelves
- Connectivity: Each shelf via SFF-8644 to LSI HBAs; dual-pathing optional
- Role: Backing storage for VMs, Kubernetes PVCs, and NAS services
WAN Connectivity
- Providers: 4× Spectrum Internet 1Gbps
- Termination: i350-T4 on Router server
- Routing: Multi-WAN policy routing and failover; per-ISP health checks
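The failover behaviour described above can be sketched as tiered link selection. This is a minimal Python illustration under stated assumptions, not the mwan3 implementation: the `Wan` type, the link names, and the metric and weight values are hypothetical. Links that pass their health check are grouped by metric; the lowest-metric tier with at least one healthy member carries traffic, split by weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wan:
    name: str      # hypothetical interface label, e.g. "wan1"
    metric: int    # lower metric = preferred tier
    weight: int    # relative share of traffic within a tier
    healthy: bool  # result of the per-ISP health check


def active_links(links: list[Wan]) -> dict[str, float]:
    """Return the traffic share per link for the best tier with a healthy member."""
    up = [link for link in links if link.healthy]
    if not up:
        return {}  # every uplink failed its health check
    best = min(link.metric for link in up)
    tier = [link for link in up if link.metric == best]
    total = sum(link.weight for link in tier)
    return {link.name: link.weight / total for link in tier}


# Example: wan2 is down, so tier 1 traffic is split between wan1 and wan3;
# wan4 stays idle as the backup tier.
links = [
    Wan("wan1", metric=1, weight=1, healthy=True),
    Wan("wan2", metric=1, weight=1, healthy=False),
    Wan("wan3", metric=1, weight=2, healthy=True),
    Wan("wan4", metric=2, weight=1, healthy=True),
]
shares = active_links(links)
print(shares)
```

If every tier 1 link fails its health check, the same function returns only `wan4`, which is the failover case.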
Software Stack
Router Server
- Base OS: Windows Server Core with Hyper-V (for HCI integration) OR Proxmox VE (uniform virtualization)
- Network Services:
  - OpenWrt VM: Multi-WAN (mwan3), firewall, VLANs, policy routing
  - Intel PROSet drivers for all NICs
  - QAT drivers/qatlib + OpenSSL QAT engine
- Storage Services:
  - LSI HBAs: IT mode, mpt3sas driver, attach shelves
  - Storage Spaces Direct: Pools/volumes for VM and app storage
  - Optional ZFS on Linux (VM or host) for NAS
- Management:
  - Windows Admin Center (WAC): Cluster lifecycle, health
  - Azure Arc agent: Connected Machine agent on Linux VMs/hosts
Proxmox VE (ML110, R630)
- Hypervisor: Latest Proxmox VE
- Guests: Ubuntu LTS for app services, Cloudflare Tunnel endpoints, monitoring, logging, Arc agents
- Storage: Connect to shelves via exported protocols (NFS/iSCSI) or pass-through HBAs/volumes
- Networking: Tag VLANs per VM bridge; allocate vNICs tied to VLAN schema
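Per-VM VLAN tagging can be illustrated by building the network option string that Proxmox accepts for a virtual NIC. This is a sketch in Python that only constructs the `qm set` command and does not run it. The VM ID `120` and bridge name `vmbr0` are assumptions for illustration; VLAN 99 is the VLAN this document uses for the Cloudflare Tunnel VM. The bridge is assumed to be VLAN-aware.

```python
def net_option(bridge: str, vlan_tag: int, model: str = "virtio") -> str:
    """Build a Proxmox netN value such as 'virtio,bridge=vmbr0,tag=99'."""
    if not 1 <= vlan_tag <= 4094:
        raise ValueError(f"VLAN tag out of range: {vlan_tag}")
    return f"{model},bridge={bridge},tag={vlan_tag}"


def qm_set_command(vmid: int, index: int, bridge: str, vlan_tag: int) -> list[str]:
    """Return the argv for 'qm set <vmid> --net<index> <option>' without executing it."""
    return ["qm", "set", str(vmid), f"--net{index}", net_option(bridge, vlan_tag)]


# Hypothetical VM 120 gets its first vNIC on vmbr0, tagged for VLAN 99.
command = qm_set_command(vmid=120, index=0, bridge="vmbr0", vlan_tag=99)
print(" ".join(command))
```

Keeping the VLAN number in one place per VM makes it easier to check the bridge mapping against the VLAN schema before anything is applied to a host.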
Ubuntu Service VMs
- Cloudflare Tunnel (Zero Trust): cloudflared publishes internal apps (WAC, dashboards, SSH, selected services) without opening inbound ports
- Azure Arc agent: Connected Machine agent enrolls Linux VMs and hosts for Azure Policy, Monitor, Defender, and Update Manager
- Observability: Prometheus, Grafana, Loki/OpenSearch for logs; syslog from Router and Proxmox nodes
- Reverse proxy: NGINX/Traefik with mTLS, integrated behind Cloudflare
- Automation/CI: GitLab Runner/Jenkins agents for local CI/CD pipelines
Key Integrations
Cloudflare
- Zero Trust/Tunnel: Use cloudflared on an Ubuntu VM in VLAN 99 to expose:
  - Management portals: WAC, Proxmox UI, dashboards (restrict via SSO/MFA)
  - Developer services: Git, CI, internal APIs
- Policies: SSO (Azure AD/Okta), device posture checks, least privilege
- WAF and routing: Protect public ingress; no inbound ports on Spectrum WAN CPE
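The mapping from public hostnames to internal services lives in the cloudflared configuration file as a list of ingress rules, ending with a catch-all rule. The Python sketch below renders such a file. The hostnames, the documentation-range IP addresses, the tunnel ID placeholder, and the credentials path are all assumptions for illustration; only the Proxmox UI port (8006) and the Grafana default port (3000) are product defaults.

```python
def render_cloudflared_config(tunnel_id: str, credentials_file: str,
                              routes: dict[str, str]) -> str:
    """Render a cloudflared config with one ingress rule per hostname.

    The final rule has no hostname and answers everything else with 404,
    because cloudflared expects the last ingress rule to match all traffic.
    """
    if not routes:
        raise ValueError("at least one hostname route is required")
    lines = [
        f"tunnel: {tunnel_id}",
        f"credentials-file: {credentials_file}",
        "ingress:",
    ]
    for hostname, service in routes.items():
        lines.append(f"  - hostname: {hostname}")
        lines.append(f"    service: {service}")
    lines.append("  - service: http_status:404")
    return "\n".join(lines) + "\n"


# Hypothetical hostnames and addresses; replace with values from the network schema.
config = render_cloudflared_config(
    tunnel_id="TUNNEL_ID_PLACEHOLDER",
    credentials_file="/etc/cloudflared/TUNNEL_ID_PLACEHOLDER.json",
    routes={
        "proxmox.example.com": "https://192.0.2.11:8006",
        "grafana.example.com": "http://192.0.2.20:3000",
    },
)
print(config)
```

Access restrictions such as SSO and device posture are not part of this file; they are applied separately as Zero Trust policies on each published hostname.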
Azure Arc
- Targets: Ubuntu service VMs, optionally Proxmox hosts (as Linux), and Windows management VM
- Process: Install Connected Machine agent; validate Arc connection; enable Azure Policy, Monitor, Defender, and Update Manager
- Proxy considerations: If outbound access is restricted, onboard the agent through a proxy; proxy-based onboarding is a documented method
High-Level Data Flows
- North-south: 4× Spectrum WAN → Router (OpenWrt VM) → Cloudflare Tunnel outbound only for published services
- East-west: VLAN-segmented traffic across Proxmox nodes, Ubuntu VMs, storage shelves; QAT accelerates crypto within Router server for site-to-site VPN if needed
- Storage: Router server's HBAs → shelves; exports (NFS/SMB/iSCSI) → Proxmox/Ubuntu VMs
Security Model
- Perimeter: No inbound ports; Cloudflare Tunnel + Zero Trust policies
- Identity: SSO + MFA for management; role-based access
- Network: Inter-VLAN default deny; explicit allow for app→storage, monitoring→inbound
- Supply chain: Signed commits/artifacts; secret vault (no secrets in repos)
- Azure governance: Policies for baseline configuration and updates via Arc
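The inter-VLAN rule in the model above (default deny, explicit allow) can be expressed as a small lookup. This is a minimal Python sketch, not the OpenWrt firewall configuration: the zone names are hypothetical, and the only allow rule included is the app→storage flow named above. Rules are directional, so allowing app→storage does not allow new connections from storage to app.

```python
# Each rule is a (source zone, destination zone) pair; zone names are hypothetical.
ALLOWED_FLOWS: frozenset[tuple[str, str]] = frozenset({
    ("app", "storage"),
})


def is_allowed(src_zone: str, dst_zone: str,
               allowed: frozenset[tuple[str, str]] = ALLOWED_FLOWS) -> bool:
    """Default deny between zones: only explicitly listed flows pass."""
    if src_zone == dst_zone:
        # Traffic inside one VLAN is switched, not routed through the firewall.
        return True
    return (src_zone, dst_zone) in allowed


for flow in [("app", "storage"), ("storage", "app"), ("mgmt", "storage")]:
    verdict = "allow" if is_allowed(*flow) else "deny"
    print(f"{flow[0]} -> {flow[1]}: {verdict}")
```

Only the first flow is allowed; the reverse direction and the unlisted management flow are both denied.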
Milestones for Success
- Foundation - Hardware ready, base software installed
- Infrastructure Automation - Azure Arc agents installed, storage configured
- Networking and Storage Services - OpenWrt VM with multi-WAN, VLAN segmentation, storage exports
- VM and Platform - Ubuntu VMs deployed, Proxmox bridges mapped to VLANs
- Secure External Access and Governance - Cloudflare Tunnel published, Azure governance via Arc
- Operations and Continuous Improvement - Observability dashboards live, runbooks documented
Related Documentation
- Hardware BOM - Complete bill of materials
- PCIe Allocation - Slot allocation map
- Network Topology - VLAN/IP schema and routing
- Cloudflare Integration - Tunnel and Zero Trust setup
- Azure Arc Onboarding - Agent installation and governance
- Bring-Up Checklist - Day-one installation guide