loc_az_hci/docs/architecture/complete-architecture.md
defiQUG c39465c2bd
Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-08 09:04:46 -08:00


Complete Azure Stack HCI Architecture

Overview

This document describes the complete architecture for a local environment modeled on Azure Stack HCI, combining Cloudflare Zero Trust, Azure Arc governance, Proxmox VE virtualization, and Ubuntu service VMs. The goal is an on-premises environment that is governed and operated like a local Azure "cloud", following Azure Stack HCI principles.

Core Objectives

  • Local Azure cloud: Govern on-prem servers with Azure Arc and adopt Azure operations practices
  • Hyper-converged stack: Proxmox VE for virtualization, Ubuntu VMs for services, centralized storage via external shelves
  • Secure edge: Cloudflare Zero Trust/Tunnel to expose services without inbound ports
  • High-availability networking: 4× 1Gbps Spectrum WAN, multi-WAN failover/policy routing, QAT-accelerated VPN/TLS offload
  • Unified ops: CI/CD, monitoring, and consistent configuration across all nodes

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                         Azure Portal                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Azure Arc    │  │ Azure Policy │  │ Azure Monitor │         │
│  │ Servers      │  │              │  │              │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Arc K8s      │  │ GitOps       │  │ Defender     │         │
│  │              │  │ (Flux)       │  │ for Cloud    │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ Azure Arc Connection
                              │
┌─────────────────────────────────────────────────────────────────┐
│                    On-Premises Infrastructure                    │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │         Router/Switch/Storage Controller Server            │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │ │
│  │  │ Windows Server│  │  OpenWrt VM  │  │ Storage S2D  │   │ │
│  │  │ Core + Hyper-V│  │  (mwan3)     │  │   Pools      │   │ │
│  │  │              │  │              │  │              │   │ │
│  │  │ Azure Arc    │  │ 4× WAN       │  │ 4× Shelves   │   │ │
│  │  │ Agent        │  │ (Spectrum)   │  │ (via LSI HBAs)│   │ │
│  │  └──────────────┘  └──────────────┘  └──────────────┘   │ │
│  │         │                  │                  │          │ │
│  └─────────┼──────────────────┼──────────────────┼──────────┘ │
│            │                  │                  │             │
│  ┌─────────▼──────────────────▼──────────────────▼──────────┐ │
│  │              Proxmox VE Hosts (Existing)                   │ │
│  │  ┌──────────────┐              ┌──────────────┐           │ │
│  │  │  HPE ML110   │              │  Dell R630   │           │ │
│  │  │  Gen9        │              │              │           │ │
│  │  │              │              │              │           │ │
│  │  │  Azure Arc   │              │  Azure Arc   │           │ │
│  │  │  Agent       │              │  Agent       │           │ │
│  │  └──────────────┘              └──────────────┘           │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    Ubuntu Service VMs                      │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │ │
│  │  │ Cloudflare   │  │  Reverse    │  │ Observability │    │ │
│  │  │ Tunnel VM    │  │  Proxy VM   │  │     VM       │    │ │
│  │  │              │  │             │  │              │    │ │
│  │  │ Azure Arc    │  │  Azure Arc  │  │  Azure Arc   │    │ │
│  │  │ Agent        │  │  Agent      │  │  Agent       │    │ │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │ │
│  │  ┌──────────────┐                                        │ │
│  │  │   CI/CD VM   │                                        │ │
│  │  │              │                                        │ │
│  │  │  Azure Arc   │                                        │ │
│  │  │  Agent       │                                        │ │
│  │  └──────────────┘                                        │ │
│  └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ Cloudflare Tunnel (Outbound Only)
                              │
┌─────────────────────────────────────────────────────────────────┐
│                      Cloudflare Zero Trust                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Zero Trust   │  │     WAF      │  │   Tunnel     │         │
│  │ Policies     │  │   Rules      │  │  Endpoints   │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘

Physical Infrastructure

Router/Switch/Storage Controller Server (New)

  • Chassis: Entry-level Supermicro/Dell mini-server
  • CPU: Intel Xeon E-2100 or similar (6-8 cores), PCIe 3.0 support
  • Memory: 8× 4GB DDR4 ECC RDIMM = 32GB (reused from R630)
  • Storage: 256GB SSD (OS, configs), optional mirrored boot
  • PCIe Cards:
    • Intel i350-T4: 4× 1GbE (WAN - Spectrum connections)
    • Intel X550-T2: 2× 10GbE RJ45 (future uplinks or high-perf server links)
    • Intel i225 Quad-Port: 4× 2.5GbE (LAN to key servers)
    • Intel i350-T8: 8× 1GbE (LAN to remaining servers)
    • Intel QAT 8970: Crypto acceleration (TLS/IPsec/compression)
    • 2× LSI 9207-8e: SAS HBAs for 4 external shelves
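The card list above implies a slot and port budget that is easy to check mechanically. The following is a minimal sketch that uses only the card names and port counts listed above; the `Card` type and the role labels are illustrative and not part of the original design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    ports: int         # network ports; 0 for non-NIC cards
    speed_gbps: float  # nominal per-port line rate; 0 for non-NIC cards
    role: str          # illustrative label, not from the original design


# One entry per physical add-in card, as listed above.
CARDS = [
    Card("Intel i350-T4", 4, 1.0, "wan"),
    Card("Intel X550-T2", 2, 10.0, "uplink"),
    Card("Intel i225 Quad-Port", 4, 2.5, "lan"),
    Card("Intel i350-T8", 8, 1.0, "lan"),
    Card("Intel QAT 8970", 0, 0.0, "crypto"),
    Card("LSI 9207-8e #1", 0, 0.0, "storage"),
    Card("LSI 9207-8e #2", 0, 0.0, "storage"),
]


def ports_by_role(cards):
    """Sum network ports per role, skipping cards without network ports."""
    totals = {}
    for card in cards:
        if card.ports:
            totals[card.role] = totals.get(card.role, 0) + card.ports
    return totals


def line_rate_by_role(cards):
    """Sum nominal line rate in Gbps per role (an upper bound, not a throughput estimate)."""
    totals = {}
    for card in cards:
        if card.ports:
            totals[card.role] = totals.get(card.role, 0.0) + card.ports * card.speed_gbps
    return totals


if __name__ == "__main__":
    print("PCIe slots needed:", len(CARDS))
    print("Ports by role:", ports_by_role(CARDS))
    print("Nominal Gbps by role:", line_rate_by_role(CARDS))
```

Counting one slot per card, the list totals seven add-in cards, which is worth checking against the slot count of whichever chassis is chosen.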

Proxmox VE Hosts (Existing)

  • HPE ProLiant ML110 Gen9:
    • CPU: Intel Xeon E5-series
    • Memory: Remaining DDR4 ECC RDIMM after Router allocation
    • Storage: Local SSDs/HDDs for OS and VM disks
    • Networking: 1GbE onboard NICs; optional Intel add-in NICs
  • Dell PowerEdge R630:
    • CPU: Intel Xeon E5 v3/v4 dual-socket
    • Memory: Remaining DDR4 ECC RDIMM (32GB spare pool noted)
    • Storage: PERC or HBA with SSDs
    • Networking: 1/10GbE depending on NICs installed

Storage Shelves

  • Quantity: 4 external SAS JBOD shelves
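  • Connectivity: Each shelf via external mini-SAS to the LSI HBAs (the 9207-8e exposes SFF-8088 ports, so shelves with SFF-8644 ports need SFF-8088-to-SFF-8644 cables); dual-pathing optional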
  • Role: Backing storage for VMs, Kubernetes PVCs, and NAS services
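Whether the shelves can be single-pathed or dual-pathed comes down to a port count. The sketch below assumes each LSI 9207-8e exposes two external x4 connectors (an assumption about the card, not stated in this document) and that every shelf is cabled directly to an HBA; the shelf and HBA labels are hypothetical.

```python
HBA_COUNT = 2       # from the hardware list: 2x LSI 9207-8e
PORTS_PER_HBA = 2   # assumption: two external x4 connectors per card
SHELF_COUNT = 4     # from the shelf list above


def host_ports_needed(shelves: int, dual_path: bool) -> int:
    """Host ports needed when every shelf is cabled directly to an HBA."""
    return shelves * (2 if dual_path else 1)


def fits_direct_attach(shelves: int, dual_path: bool) -> bool:
    """True when the available HBA ports cover a direct-attach layout."""
    return host_ports_needed(shelves, dual_path) <= HBA_COUNT * PORTS_PER_HBA


def single_path_layout(shelves: int) -> dict:
    """Spread shelves round-robin across HBAs so no single HBA carries all of them."""
    layout = {}
    for shelf in range(shelves):
        hba = shelf % HBA_COUNT
        port = shelf // HBA_COUNT
        layout[f"shelf{shelf + 1}"] = (f"hba{hba + 1}", f"port{port + 1}")
    return layout


if __name__ == "__main__":
    print("single-path fits:", fits_direct_attach(SHELF_COUNT, dual_path=False))
    print("dual-path fits:", fits_direct_attach(SHELF_COUNT, dual_path=True))
    for shelf, (hba, port) in single_path_layout(SHELF_COUNT).items():
        print(shelf, "->", hba, port)
```

Under these assumptions a single-path layout uses all four HBA ports, while direct-attached dual-pathing would need eight, so the optional dual-pathing would likely mean daisy-chaining shelves or adding HBA ports.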

WAN Connectivity

  • Providers: 4× Spectrum Internet 1Gbps
  • Termination: i350-T4 on Router server
  • Routing: Multi-WAN policy routing and failover; per-ISP health checks
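The failover and policy-routing behaviour described above can be illustrated with a small model. This is a sketch loosely modeled on the metric/weight scheme used by mwan3, not the actual OpenWrt configuration: the interface names, metrics, and weights are hypothetical, and the health results are passed in rather than probed over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    iface: str
    metric: int  # lower metric = preferred tier
    weight: int  # share of new connections within a tier


def is_online(probe_results, reliability=1):
    """An uplink counts as online when at least `reliability` probes succeeded."""
    return sum(1 for ok in probe_results if ok) >= reliability


def active_members(members, health):
    """Return the online members of the best (lowest-metric) tier."""
    online = [m for m in members if health.get(m.iface, False)]
    if not online:
        return []
    best = min(m.metric for m in online)
    return [m for m in online if m.metric == best]


def traffic_shares(members, health):
    """Fraction of new connections each active uplink receives."""
    active = active_members(members, health)
    total = sum(m.weight for m in active)
    return {m.iface: m.weight / total for m in active}


# Hypothetical policy: three balanced uplinks, one held in reserve.
POLICY = [
    Member("wan1", metric=1, weight=1),
    Member("wan2", metric=1, weight=1),
    Member("wan3", metric=1, weight=1),
    Member("wan4", metric=2, weight=1),
]

if __name__ == "__main__":
    all_up = {m.iface: True for m in POLICY}
    print(traffic_shares(POLICY, all_up))
    primary_down = {"wan1": False, "wan2": False, "wan3": False, "wan4": True}
    print(traffic_shares(POLICY, primary_down))
```

Balancing of this kind is per connection, so the aggregate of four 1 Gbps links helps many concurrent flows while any single flow stays bounded by one uplink.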

Software Stack

Router Server

  • Base OS: Windows Server Core with Hyper-V (for HCI integration) OR Proxmox VE (uniform virtualization)
  • Network Services:
    • OpenWrt VM: Multi-WAN (mwan3), firewall, VLANs, policy routing
    • Intel PROSet drivers for all NICs
    • QAT drivers/qatlib + OpenSSL QAT engine
  • Storage Services:
    • LSI HBAs: IT-mode firmware; mpt3sas driver when the host runs Linux; shelves attached as JBOD
    • Storage Spaces Direct: Pools/volumes for VM and app storage
    • Optional ZFS on Linux (VM or host) for NAS
  • Management:
    • Windows Admin Center (WAC): Cluster lifecycle, health
    • Azure Arc agent: Connected Machine agent on this server and on Linux VMs/hosts

Proxmox VE (ML110, R630)

  • Hypervisor: Latest Proxmox VE
  • Guests: Ubuntu LTS for app services, Cloudflare Tunnel endpoints, monitoring, logging, Arc agents
  • Storage: Connect to shelves via exported protocols (NFS/iSCSI) or pass-through HBAs/volumes
  • Networking: Tag VLANs per VM bridge; allocate vNICs tied to VLAN schema
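Tagging VLANs per VM comes down to one network option per guest NIC. The sketch below builds the `qm set` arguments for a VLAN-tagged virtio NIC on a VLAN-aware bridge. Only VLAN 99 for the Cloudflare Tunnel VM comes from this document; the other VLAN IDs, the VM names, the VMIDs, and the bridge name `vmbr0` are hypothetical placeholders.

```python
# VM role -> VLAN ID. Only VLAN 99 is taken from this document.
VLAN_SCHEMA = {
    "cloudflare-tunnel": 99,
    "reverse-proxy": 20,   # hypothetical
    "observability": 30,   # hypothetical
    "ci-cd": 40,           # hypothetical
}


def net_option(vlan_id: int, bridge: str = "vmbr0", model: str = "virtio") -> str:
    """Build a Proxmox NIC option string with an 802.1Q tag."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return f"{model},bridge={bridge},tag={vlan_id}"


def qm_set_argv(vmid: int, vm_role: str) -> list:
    """Argument list for `qm set`, suitable for subprocess.run on a Proxmox host."""
    return ["qm", "set", str(vmid), "--net0", net_option(VLAN_SCHEMA[vm_role])]


if __name__ == "__main__":
    # VMIDs are placeholders; nothing is executed here, the commands are only printed.
    for vmid, role in enumerate(VLAN_SCHEMA, start=100):
        print(" ".join(qm_set_argv(vmid, role)))
```

Keeping the schema in one mapping means the OpenWrt VLAN definitions and the Proxmox NIC tags can be generated from the same source and are less likely to drift apart.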

Ubuntu Service VMs

  • Cloudflare Tunnel (Zero Trust): cloudflared to publish internal apps (WAC, dashboards, SSH, selected services) without inbound ports
  • Azure Arc agent: Connected Machine agent to enroll Linux VMs and hosts for policy/monitor/defender/update
  • Observability: Prometheus, Grafana, Loki/OpenSearch for logs; syslog from Router and Proxmox nodes
  • Reverse proxy: NGINX/Traefik with mTLS, integrated behind Cloudflare
  • Automation/CI: GitLab Runner/Jenkins agents for local CI/CD pipelines

Key Integrations

Cloudflare

  • Zero Trust/Tunnel: Use cloudflared on Ubuntu VM in VLAN 99 to expose:
    • Management portals: WAC, Proxmox UI, dashboards (restrict via SSO/MFA)
    • Developer services: Git, CI, internal APIs
  • Policies: SSO (Microsoft Entra ID, formerly Azure AD, or Okta), device posture checks, least privilege
  • WAF and routing: Protect public ingress; no inbound ports on Spectrum WAN CPE
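Publishing services through the tunnel amounts to a list of ingress rules in the cloudflared configuration, ending in a catch-all. The sketch below builds and checks such a rule list as plain Python data. The hostnames and internal service URLs are hypothetical placeholders, and the structure mirrors the `hostname`/`service` keys of the cloudflared ingress format as I understand it.

```python
# Public hostname -> internal origin. All values are hypothetical placeholders.
PUBLISHED = {
    "wac.example.com": "https://wac.internal:443",
    "proxmox.example.com": "https://pve.internal:8006",
    "grafana.example.com": "http://observability.internal:3000",
}


def build_ingress(published: dict) -> list:
    """One rule per hostname, followed by the catch-all that rejects everything else."""
    rules = [{"hostname": host, "service": origin} for host, origin in published.items()]
    rules.append({"service": "http_status:404"})
    return rules


def validate_ingress(rules: list) -> None:
    """Raise ValueError unless exactly the final rule is a catch-all."""
    if not rules:
        raise ValueError("at least a catch-all rule is required")
    *specific, last = rules
    if "hostname" in last or "path" in last:
        raise ValueError("the final rule must match all traffic")
    for rule in specific:
        if "hostname" not in rule and "path" not in rule:
            raise ValueError("a catch-all rule appears before the end of the list")
    for rule in rules:
        if "service" not in rule:
            raise ValueError("every rule needs a service")


if __name__ == "__main__":
    ingress = build_ingress(PUBLISHED)
    validate_ingress(ingress)
    for rule in ingress:
        print(rule)
```

The explicit 404 catch-all keeps any hostname that is not deliberately published from reaching an internal origin, which fits the least-privilege policy above.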

Azure Arc

  • Targets: Ubuntu service VMs, optionally Proxmox hosts (as Linux), and Windows management VM
  • Process: Install Connected Machine agent; validate Arc connection; enable Azure Policy, Monitor, Defender, and Update Manager
  • Proxy considerations: If outbound access is restricted, the Connected Machine agent can be onboarded through an outbound proxy; Microsoft's Azure Arc documentation covers the proxy configuration
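The onboarding process above can be scripted so that every VM is enrolled the same way. The sketch below only assembles `azcmagent` command lines and does not run them. The flag names reflect my understanding of the azcmagent CLI and should be verified against the installed agent version; the `ARC_*` environment variable names are hypothetical and are used so that no secret lives in the repository.

```python
import os

# Hypothetical environment variable names holding the onboarding parameters.
REQUIRED_ENV = [
    "ARC_TENANT_ID",
    "ARC_SUBSCRIPTION_ID",
    "ARC_RESOURCE_GROUP",
    "ARC_LOCATION",
    "ARC_SP_ID",
    "ARC_SP_SECRET",
]


def arc_connect_argv(env=None) -> list:
    """Build the `azcmagent connect` argument list from environment values."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise KeyError(f"missing onboarding settings: {', '.join(missing)}")
    return [
        "azcmagent", "connect",
        "--tenant-id", env["ARC_TENANT_ID"],
        "--subscription-id", env["ARC_SUBSCRIPTION_ID"],
        "--resource-group", env["ARC_RESOURCE_GROUP"],
        "--location", env["ARC_LOCATION"],
        "--service-principal-id", env["ARC_SP_ID"],
        "--service-principal-secret", env["ARC_SP_SECRET"],
    ]


def arc_proxy_argv(proxy_url: str) -> list:
    """Build the command that points the agent at an outbound proxy."""
    return ["azcmagent", "config", "set", "proxy.url", proxy_url]


if __name__ == "__main__":
    demo_env = {key: f"<{key.lower()}>" for key in REQUIRED_ENV}
    print(" ".join(arc_connect_argv(demo_env)))
    print(" ".join(arc_proxy_argv("http://proxy.internal:3128")))
```

Passing the secret on the command line exposes it to other local users through the process list, so a short-lived service principal secret is a sensible precaution when onboarding this way.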

High-Level Data Flows

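  • North-south: Published services leave the site only through outbound Cloudflare Tunnel connections (cloudflared VM → Router (OpenWrt VM) → 4× Spectrum WAN → Cloudflare); nothing listens for inbound connections on the WAN side
  • East-west: VLAN-segmented traffic across Proxmox nodes, Ubuntu VMs, and the storage backed by the shelves. If a site-to-site VPN is needed, QAT on the Router server accelerates its crypto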
  • Storage: Router server's HBAs → shelves; exports (NFS/SMB/iSCSI) → Proxmox/Ubuntu VMs

Security Model

  • Perimeter: No inbound ports; Cloudflare Tunnel + Zero Trust policies
  • Identity: SSO + MFA for management; role-based access
  • Network: Inter-VLAN default deny; explicit allows for app → storage and for inbound monitoring traffic
  • Supply chain: Signed commits/artifacts; secret vault (no secrets in repos)
  • Azure governance: Policies for baseline configuration and updates via Arc
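The inter-VLAN rule in the network model above (deny by default, allow by exception) can be expressed as a small lookup. This is a sketch of the evaluation logic only, not the OpenWrt firewall configuration; the zone names, ports, and specific allow rules are hypothetical examples.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_vlan: str
    dst_vlan: str
    proto: str
    port: int


# Explicit exceptions to the default deny. All entries are hypothetical examples.
ALLOW_RULES = frozenset({
    Flow("app", "storage", "tcp", 2049),         # NFS from app VMs to storage
    Flow("observability", "app", "tcp", 9100),   # metrics scrape of app VMs
    Flow("mgmt", "observability", "udp", 514),   # syslog into the observability VM
})


def is_permitted(flow: Flow) -> bool:
    """Default deny between VLANs; traffic inside one VLAN is not routed, so not filtered here."""
    if flow.src_vlan == flow.dst_vlan:
        return True
    return flow in ALLOW_RULES


if __name__ == "__main__":
    checks = [
        Flow("app", "storage", "tcp", 2049),
        Flow("app", "storage", "tcp", 22),
        Flow("storage", "app", "tcp", 2049),
    ]
    for flow in checks:
        print(flow, "->", "allow" if is_permitted(flow) else "deny")
```

Rules are directional and port-specific, so allowing app → storage on one port grants neither the reverse direction nor any other port.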

Milestones for Success

  1. Foundation - Hardware ready, base software installed
  2. Infrastructure Automation - Azure Arc agents installed, storage configured
  3. Networking and Storage Services - OpenWrt VM with multi-WAN, VLAN segmentation, storage exports
  4. VM and Platform - Ubuntu VMs deployed, Proxmox bridges mapped to VLANs
  5. Secure External Access and Governance - Cloudflare Tunnel published, Azure governance via Arc
  6. Operations and Continuous Improvement - Observability dashboards live, runbooks documented