loc_az_hci/README.md
defiQUG c39465c2bd
Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-08 09:04:46 -08:00

# Proxmox VE → Azure Arc → Hybrid Cloud Stack
Complete end-to-end implementation package for transforming two Proxmox VE hosts into a fully Azure-integrated Hybrid Cloud stack with high availability, Kubernetes orchestration, GitOps workflows, and blockchain infrastructure services.
## 🎯 Overview
This project provides a comprehensive blueprint and automation scripts to deploy:
- **Proxmox VE Cluster**: 2-node high-availability cluster with shared storage
- **Azure Arc Integration**: Full visibility and management from Azure Portal
- **Kubernetes (K3s)**: Lightweight Kubernetes cluster for container orchestration
- **GitOps Workflow**: Declarative infrastructure and application management
- **Private Git/DevOps**: Self-hosted Git repository (Gitea/GitLab)
- **Hybrid Cloud Stack**: Complete blockchain and monitoring services
## 🏗️ Architecture
```
Azure Portal
Azure Arc (Servers, Kubernetes, GitOps)
Proxmox VE Cluster (2 Nodes)
Kubernetes (K3s) + Applications
HC Stack Services (Besu, Firefly, Chainlink, Blockscout, Cacti, NGINX)
```
See [Architecture Documentation](docs/architecture.md) for detailed architecture overview.
## 🖥️ Azure Stack HCI Architecture
This project now includes a complete **Azure Stack HCI integration** with Cloudflare Zero Trust, comprehensive network segmentation, and centralized storage management.
### Key Components
- **Router/Switch/Storage Controller Server**: New server acting as router, switch, and storage controller
  - 4× Spectrum WAN connections (multi-WAN load balancing)
  - OpenWrt VM for network routing and firewall
  - Storage Spaces Direct for 4× external storage shelves
  - Intel QAT 8970 for crypto acceleration
- **Proxmox VE Hosts**: Existing HPE ML110 Gen9 and Dell R630
  - VLAN bridges mapped to network schema
  - Storage mounts from Router server
  - Azure Arc Connected Machine agents
- **Ubuntu Service VMs**: Cloudflare Tunnel, reverse proxy, observability, CI/CD
  - All VMs with Azure Arc agents
  - VLAN-segmented network access
- **Cloudflare Zero Trust**: Secure external access without inbound ports
  - Tunnel for WAC, Proxmox UI, dashboards, Git, CI
  - SSO/MFA policies
  - WAF protection
- **Azure Arc Governance**: Complete Azure integration
  - Policy enforcement
  - Monitoring and Defender
  - Update Management
### Network Topology
- **VLAN 10**: Storage (10.10.10.0/24)
- **VLAN 20**: Compute (10.10.20.0/24)
- **VLAN 30**: App Tier (10.10.30.0/24)
- **VLAN 40**: Observability (10.10.40.0/24)
- **VLAN 50**: Dev/Test (10.10.50.0/24)
- **VLAN 60**: Management (10.10.60.0/24)
- **VLAN 99**: DMZ (10.10.99.0/24)
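Every VLAN in the list above follows the pattern `10.10.<VLAN ID>.0/24`, which makes the schema easy to look up from a script. The following is a minimal sketch only; `vlan_subnet` is a hypothetical helper and is not part of this repository.

```bash
# vlan_subnet (hypothetical helper): print the subnet for a VLAN ID
# from the topology list above, or fail for an ID that is not listed.
vlan_subnet() {
  case "$1" in
    10|20|30|40|50|60|99) printf '10.10.%s.0/24\n' "$1" ;;
    *) echo "unknown VLAN: $1" >&2; return 1 ;;
  esac
}

# Print the whole schema as "VLAN <id> (<name>): <subnet>".
while IFS=: read -r id name; do
  printf 'VLAN %s (%s): %s\n' "$id" "$name" "$(vlan_subnet "$id")"
done <<'EOF'
10:Storage
20:Compute
30:App Tier
40:Observability
50:Dev/Test
60:Management
99:DMZ
EOF
```

A lookup such as `vlan_subnet 40` can then feed bridge or firewall configuration without repeating the addresses by hand.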
### Documentation
- **[Complete Architecture](docs/complete-architecture.md)**: Full Azure Stack HCI architecture
- **[Hardware BOM](docs/hardware-bom.md)**: Complete bill of materials
- **[PCIe Allocation](docs/pcie-allocation.md)**: Slot allocation map
- **[Network Topology](docs/network-topology.md)**: VLAN/IP schema and routing
- **[Bring-Up Checklist](docs/bring-up-checklist.md)**: Day-one installation guide
- **[Cloudflare Integration](docs/cloudflare-integration.md)**: Tunnel and Zero Trust setup
- **[Azure Arc Onboarding](docs/azure-arc-onboarding.md)**: Agent installation and governance
### Quick Start (Azure Stack HCI)
1. **Hardware Setup**: Install Router server with all PCIe cards
2. **OS Installation**: Windows Server Core or Proxmox VE
3. **Driver Installation**: Run driver installation scripts
4. **Network Configuration**: Configure OpenWrt and VLANs
5. **Storage Configuration**: Flash HBAs to IT mode, configure S2D
6. **Azure Arc Onboarding**: Install agents on all hosts/VMs
7. **Cloudflare Setup**: Configure Tunnel and Zero Trust
8. **Service Deployment**: Deploy Ubuntu VMs and services
See [Bring-Up Checklist](docs/bring-up-checklist.md) for detailed steps.
## 📋 Prerequisites
### Hardware Requirements
- **2 Proxmox VE hosts** with:
  - Proxmox VE 7.0+ installed
  - Minimum 8GB RAM per node (16GB+ recommended)
  - Static IP addresses
  - Network connectivity between nodes
  - Internet access for Azure Arc connectivity
### Software Requirements
- Azure subscription with Contributor role
- Azure CLI installed and authenticated
- kubectl (for Kubernetes management)
- SSH access to all nodes
- NFS server (optional, for shared storage)
### Network Requirements
- Static IP addresses for all nodes
- DNS resolution (or hosts file configuration)
- Outbound HTTPS (443) for Azure Arc connectivity
- Cluster communication ports (5404-5412 UDP)
## 🚀 Quick Start
### 1. Clone Repository
```bash
git clone <repository-url>
cd loc_az_hci
```
### 2. Configure Environment Variables
Create a `.env` file from the template:
```bash
cp .env.example .env
```
Edit `.env` and fill in your credentials:
- **Azure**: Subscription ID, Tenant ID, and optionally Service Principal credentials
- **Cloudflare**: API Token and Account Email
- **Proxmox**: `PVE_ROOT_PASS` (shared root password) and URLs for each host
  - ML110: `PROXMOX_ML110_URL`
  - R630: `PROXMOX_R630_URL`
**Note**: Proxmox uses self-signed SSL certificates by default. Browser security warnings are normal. For production, use Cloudflare Tunnel (handles SSL termination) or configure proper certificates.
**Important**: Never commit `.env` to version control. It's already in `.gitignore`.
Load environment variables in your shell:
```bash
# Export every variable defined in .env into the current shell.
# Values that contain spaces must be quoted inside .env.
set -a
. ./.env
set +a
```
Or use a tool like `direnv` or `dotenv` to automatically load `.env` files.
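After loading `.env`, it can help to fail early when a required value is missing. The sketch below assumes the variable names listed in this README; `require_env` is a hypothetical helper, not a script shipped with the project.

```bash
# require_env (hypothetical helper): return non-zero if any named
# variable is unset or empty, reporting each missing name on stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (commented out so the snippet has no side effects):
# require_env AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID PVE_ROOT_PASS \
#   PROXMOX_ML110_URL PROXMOX_R630_URL || exit 1
```

Only pass trusted, literal variable names to the helper, since it uses `eval` for the lookup.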
### 3. Configure Proxmox Cluster
**On Node 1**:
```bash
export NODE_IP=192.168.1.10
export NODE_GATEWAY=192.168.1.1
export NODE_HOSTNAME=pve-node-1
./infrastructure/proxmox/network-config.sh
./infrastructure/proxmox/cluster-setup.sh
```
**On Node 2**:
```bash
export NODE_IP=192.168.1.11
export NODE_GATEWAY=192.168.1.1
export NODE_HOSTNAME=pve-node-2
export CLUSTER_NODE_IP=192.168.1.10
./infrastructure/proxmox/network-config.sh
export NODE_ROLE=join
./infrastructure/proxmox/cluster-setup.sh
```
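The contents of `cluster-setup.sh` are not shown in this README. As a rough illustration of how a script could branch on `NODE_ROLE`, the dry-run sketch below only prints the `pvecm` command it would run. It assumes that an unset `NODE_ROLE` means "create the cluster" (as on Node 1 above); `CLUSTER_NAME` and its default `hc-cluster` are hypothetical.

```bash
# cluster_command (hypothetical helper): print, but do not run, the
# pvecm command matching the node's role.
cluster_command() {
  if [ "${NODE_ROLE:-create}" = "join" ]; then
    if [ -z "${CLUSTER_NODE_IP:-}" ]; then
      echo "CLUSTER_NODE_IP is required when NODE_ROLE=join" >&2
      return 1
    fi
    # Join the existing cluster through the first node's address.
    echo "pvecm add ${CLUSTER_NODE_IP}"
  else
    # CLUSTER_NAME is an assumed variable; the real script may differ.
    echo "pvecm create ${CLUSTER_NAME:-hc-cluster}"
  fi
}
```

After both nodes have run the real script, `pvecm status` on either node reports cluster membership and quorum.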
### 4. Onboard to Azure Arc
**On each Proxmox node**:
```bash
export RESOURCE_GROUP=HC-Stack
export TENANT_ID=$(az account show --query tenantId -o tsv)
export SUBSCRIPTION_ID=$(az account show --query id -o tsv)
export LOCATION=eastus
./scripts/azure-arc/onboard-proxmox-hosts.sh
```
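The onboarding script itself is not reproduced here. Azure Arc-enabled servers are registered with the Connected Machine agent's `azcmagent connect` command, so the exported variables above plausibly map onto its flags as in this dry-run sketch. `arc_connect_command` is a hypothetical helper that only prints the command.

```bash
# arc_connect_command (hypothetical helper): print the azcmagent
# command built from the variables exported in the step above.
arc_connect_command() {
  for name in RESOURCE_GROUP TENANT_ID SUBSCRIPTION_ID LOCATION; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      return 1
    fi
  done
  printf 'azcmagent connect --resource-group %s --tenant-id %s --subscription-id %s --location %s\n' \
    "$RESOURCE_GROUP" "$TENANT_ID" "$SUBSCRIPTION_ID" "$LOCATION"
}
```

Printing the command first makes it easy to review the target resource group and region before a host is registered.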
### 5. Deploy Kubernetes
**On K3s VM**:
```bash
./infrastructure/kubernetes/k3s-install.sh
export RESOURCE_GROUP=HC-Stack
export CLUSTER_NAME=proxmox-k3s-cluster
./infrastructure/kubernetes/arc-onboard-k8s.sh
```
### 6. Deploy Git Server
**Option A: Gitea (Recommended)**:
```bash
./infrastructure/gitops/gitea-deploy.sh
```
**Option B: GitLab CE**:
```bash
./infrastructure/gitops/gitlab-deploy.sh
```
### 7. Configure GitOps
1. Create Git repository in your Git server
2. Copy `gitops/` directory to repository
3. Configure GitOps in Azure Portal or using Flux CLI
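For step 3, one option on an Arc-connected cluster is the Azure CLI `az k8s-configuration flux create` command. The sketch below prints such a command rather than running it. The resource group, cluster name, repository URL, and branch defaults are taken from this README; the configuration name `hc-stack-gitops`, the namespace `flux-system`, and the kustomization name `apps` are assumptions.

```bash
# flux_config_command (hypothetical helper): print an Azure CLI command
# that would create a Flux configuration on the Arc-connected cluster.
flux_config_command() {
  printf 'az k8s-configuration flux create'
  printf ' --resource-group %s' "${RESOURCE_GROUP:-HC-Stack}"
  printf ' --cluster-name %s' "${CLUSTER_NAME:-proxmox-k3s-cluster}"
  printf ' --cluster-type connectedClusters'
  # Name and namespace below are illustrative assumptions.
  printf ' --name hc-stack-gitops --namespace flux-system --scope cluster'
  printf ' --url %s' "${GIT_URL:-http://git.local:3000/user/gitops-repo.git}"
  printf ' --branch %s' "${GIT_BRANCH:-main}"
  printf ' --kustomization name=apps path=./gitops prune=true\n'
}
```

The Git URL must be reachable from inside the cluster, because the Flux controllers pull from it directly.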
### 8. Deploy HC Stack Services
Deploy via GitOps (recommended) or manually:
```bash
# Manual deployment
helm install besu ./gitops/apps/besu -n blockchain
helm install firefly ./gitops/apps/firefly -n blockchain
helm install chainlink-ccip ./gitops/apps/chainlink-ccip -n blockchain
helm install blockscout ./gitops/apps/blockscout -n blockchain
helm install cacti ./gitops/apps/cacti -n monitoring
helm install nginx-proxy ./gitops/apps/nginx-proxy -n hc-stack
```
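The manual commands above assume that the `blockchain`, `monitoring`, and `hc-stack` namespaces already exist and that no release is installed yet. A hedged alternative is to generate idempotent commands with `helm upgrade --install` and `--create-namespace`. The sketch below only prints them; `helm_commands` is a hypothetical helper.

```bash
# release:namespace pairs taken from the manual commands above.
HC_STACK_RELEASES="besu:blockchain firefly:blockchain chainlink-ccip:blockchain blockscout:blockchain cacti:monitoring nginx-proxy:hc-stack"

# helm_commands (hypothetical helper): print one idempotent Helm
# command per release without executing anything.
helm_commands() {
  for pair in $HC_STACK_RELEASES; do
    release=${pair%%:*}
    namespace=${pair#*:}
    echo "helm upgrade --install $release ./gitops/apps/$release -n $namespace --create-namespace"
  done
}
```

Review the output of `helm_commands` first; piping it to `sh` runs the deployments in the listed order.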
## 📁 Project Structure
```
loc_az_hci/
├── infrastructure/
│   ├── proxmox/              # Proxmox cluster setup scripts
│   ├── kubernetes/           # K3s installation scripts
│   └── gitops/               # Git server deployment scripts
├── scripts/
│   ├── azure-arc/            # Azure Arc onboarding scripts
│   └── utils/                # Utility scripts
├── terraform/
│   ├── proxmox/              # Proxmox Terraform modules
│   ├── azure-arc/            # Azure Arc Terraform modules
│   └── kubernetes/           # Kubernetes Terraform modules
├── gitops/
│   ├── infrastructure/       # Base infrastructure manifests
│   └── apps/                 # Application Helm charts
│       ├── besu/
│       ├── firefly/
│       ├── chainlink-ccip/
│       ├── blockscout/
│       ├── cacti/
│       └── nginx-proxy/
├── docker-compose/
│   ├── gitea.yml             # Gitea Docker Compose
│   └── gitlab.yml            # GitLab Docker Compose
├── docs/
│   ├── architecture.md       # Architecture documentation
│   ├── network-topology.md
│   ├── deployment-guide.md
│   └── runbooks/             # Operational runbooks
├── diagrams/
│   ├── architecture.mmd
│   ├── network-topology.mmd
│   └── deployment-flow.mmd
├── config/
│   ├── azure-arc-config.yaml
│   └── gitops-config.yaml
├── .env.example              # Environment variables template
└── .gitignore                # Git ignore rules (includes .env)
```
## 📚 Documentation
- **[Architecture Overview](docs/architecture.md)**: Complete system architecture
- **[Network Topology](docs/network-topology.md)**: Network design and configuration
- **[Deployment Guide](docs/deployment-guide.md)**: Step-by-step deployment instructions
- **[Runbooks](docs/runbooks/)**: Operational procedures
  - [Proxmox Operations](docs/runbooks/proxmox-operations.md)
  - [Azure Arc Troubleshooting](docs/runbooks/azure-arc-troubleshooting.md)
  - [GitOps Workflow](docs/runbooks/gitops-workflow.md)
## 🔧 Configuration
### Environment Variables (.env)
This project uses a `.env` file to manage credentials securely. **Never commit `.env` to version control.**
1. **Copy the template:**
   ```bash
   cp .env.example .env
   ```
2. **Edit `.env` with your credentials:**
   - Azure: `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
   - Cloudflare: `CLOUDFLARE_API_KEY` (or `CLOUDFLARE_API_TOKEN`), `CLOUDFLARE_ACCOUNT_ID`, `CLOUDFLARE_ZONE_ID`, `CLOUDFLARE_DOMAIN`, `CLOUDFLARE_TUNNEL_TOKEN`
     **Note**: Cloudflare API Key and Tunnel Token are configured. Zero Trust features may require additional subscription/permissions.
   - Proxmox: `PVE_ROOT_PASS` (shared root password for all instances)
   - Proxmox ML110: `PROXMOX_ML110_URL` (use internal IP: `192.168.1.206:8006` for local network)
   - Proxmox R630: `PROXMOX_R630_URL` (use internal IP: `192.168.1.49:8006` for local network)

   **Note**:
   - The username `root@pam` is implied and should not be stored. For production, use RBAC accounts and API tokens instead of root credentials.
   - Use internal IPs (192.168.x.x) for local network access. External IPs are available for VPN/public access.
3. **Load environment variables:**
   ```bash
   # In bash scripts, export every variable defined in .env.
   # Values that contain spaces must be quoted inside .env.
   if [ -f .env ]; then
     set -a
     . ./.env
     set +a
   fi
   ```
See `.env.example` for all available configuration options.
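The note above recommends API tokens over root credentials for production. Proxmox VE accepts a token in an `Authorization: PVEAPIToken=<user>@<realm>!<token name>=<secret>` header. The sketch below builds that header; `PVE_TOKEN_ID` and `PVE_TOKEN_SECRET` are hypothetical variable names that do not appear in `.env.example` as described here.

```bash
# pve_auth_header (hypothetical helper): build the HTTP header used
# for Proxmox VE API token authentication.
# PVE_TOKEN_ID is expected in the form user@realm!tokenname.
pve_auth_header() {
  if [ -z "${PVE_TOKEN_ID:-}" ] || [ -z "${PVE_TOKEN_SECRET:-}" ]; then
    echo "PVE_TOKEN_ID and PVE_TOKEN_SECRET must be set" >&2
    return 1
  fi
  printf 'Authorization: PVEAPIToken=%s=%s\n' "$PVE_TOKEN_ID" "$PVE_TOKEN_SECRET"
}

# Example call against the ML110 host (commented out; needs network access).
# -k skips certificate verification because of the default self-signed certificate.
# curl -k -H "$(pve_auth_header)" "https://192.168.1.206:8006/api2/json/version"
```

Keeping the token in `.env` instead of the root password limits what a leaked file can do, provided the token's role is scoped narrowly.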
### Azure Arc Configuration
Edit `config/azure-arc-config.yaml` with your Azure credentials (or use environment variables from `.env`):
```yaml
azure:
  subscription_id: "your-subscription-id"
  tenant_id: "your-tenant-id"
  resource_group: "HC-Stack"
  location: "eastus"
```
**Note**: Scripts use environment variables from `.env` when they are available; these take precedence over the YAML config files.
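This precedence can be expressed with shell parameter expansion. The sketch below is illustrative only: it hard-codes the placeholder values from the YAML example instead of parsing the file, and `effective_config` is a hypothetical helper.

```bash
# effective_config (hypothetical helper): show which values win when
# both the environment and the YAML file provide a setting.
effective_config() {
  # Placeholder values as they would be read from
  # config/azure-arc-config.yaml (YAML parsing is not shown).
  yaml_subscription_id="your-subscription-id"
  yaml_tenant_id="your-tenant-id"

  # ${VAR:-fallback} uses the environment value when it is set and
  # non-empty, and the YAML value otherwise.
  subscription_id="${AZURE_SUBSCRIPTION_ID:-$yaml_subscription_id}"
  tenant_id="${AZURE_TENANT_ID:-$yaml_tenant_id}"
  echo "subscription_id=$subscription_id tenant_id=$tenant_id"
}
```

With only `AZURE_SUBSCRIPTION_ID` exported, the subscription comes from the environment while the tenant still falls back to the file.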
### GitOps Configuration
Edit `config/gitops-config.yaml` with your Git repository details:
```yaml
git:
  repository: "http://git.local:3000/user/gitops-repo.git"
  branch: "main"
  path: "gitops/"
```
## 🛠️ Tools and Scripts
### Prerequisites Check
```bash
./scripts/utils/prerequisites-check.sh
```
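What `prerequisites-check.sh` verifies is not documented in this README. A minimal sketch of one such check, assuming it tests for the command-line tools named under Software Requirements, could look like this; `check_tools` is a hypothetical helper.

```bash
# check_tools (hypothetical helper): return non-zero if any of the
# named commands is not found on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (commented out so the snippet has no side effects):
# check_tools az kubectl ssh || exit 1
```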
### Proxmox Operations
- `infrastructure/proxmox/network-config.sh`: Configure network
- `infrastructure/proxmox/cluster-setup.sh`: Create/join cluster
- `infrastructure/proxmox/nfs-storage.sh`: Configure NFS storage
### Azure Arc Operations
- `scripts/azure-arc/onboard-proxmox-hosts.sh`: Onboard Proxmox hosts
- `scripts/azure-arc/onboard-vms.sh`: Onboard VMs
- `scripts/azure-arc/resource-bridge-setup.sh`: Setup Resource Bridge
### Kubernetes Operations
- `infrastructure/kubernetes/k3s-install.sh`: Install K3s
- `infrastructure/kubernetes/arc-onboard-k8s.sh`: Onboard to Azure Arc
### Git/DevOps Operations
- `infrastructure/gitops/gitea-deploy.sh`: Deploy Gitea
- `infrastructure/gitops/gitlab-deploy.sh`: Deploy GitLab
- `infrastructure/gitops/azure-devops-agent.sh`: Setup Azure DevOps agent
## 🎨 Diagrams
View architecture diagrams:
- [Architecture Diagram](diagrams/architecture.mmd)
- [Network Topology](diagrams/network-topology.mmd)
- [Deployment Flow](diagrams/deployment-flow.mmd)
## 🔒 Security
- Network isolation and firewall rules
- Azure Arc managed identities and RBAC
- Kubernetes RBAC and network policies
- TLS/SSL with Cert-Manager
- Secrets management via `.env` file (excluded from version control)
- Proxmox VE RBAC best practices (see [Proxmox RBAC Guide](docs/security/proxmox-rbac.md))
- Consider Azure Key Vault integration for production deployments
## 📊 Monitoring
- **Cacti**: Network and system monitoring
- **Azure Monitor**: Metrics and logs via Azure Arc
- **Kubernetes Metrics**: Pod and service metrics
- **Microsoft Defender for Cloud** (formerly Azure Defender): Security monitoring
## 🔄 High Availability
- Proxmox 2-node cluster with shared storage
- VM high availability with automatic failover
- Kubernetes multiple replicas for stateless services
- Load balancing via NGINX Ingress
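One caveat for the 2-node design: Proxmox VE relies on Corosync, which needs a majority of votes to keep quorum. With two votes the majority is two, so losing either node stops HA actions unless a third vote (for example a Corosync QDevice) is added. Whether this deployment includes one is not stated here. The arithmetic can be sketched as follows; both function names are hypothetical.

```bash
# Votes needed for quorum with N total votes: floor(N / 2) + 1.
quorum_votes() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if a cluster with N total votes keeps quorum after losing one vote.
survives_one_failure() {
  [ $(( $1 - 1 )) -ge "$(quorum_votes "$1")" ]
}

for votes in 2 3; do
  if survives_one_failure "$votes"; then
    echo "$votes votes: quorum survives one failure"
  else
    echo "$votes votes: quorum is lost after one failure"
  fi
done
```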
## 🚨 Troubleshooting
See runbooks for common issues:
- [Azure Arc Troubleshooting](docs/runbooks/azure-arc-troubleshooting.md)
- [Proxmox Operations](docs/runbooks/proxmox-operations.md)
- [GitOps Workflow](docs/runbooks/gitops-workflow.md)
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## 📝 License
This project is provided as-is for educational and deployment purposes.
## 🙏 Acknowledgments
- Proxmox VE team for excellent virtualization platform
- Microsoft Azure Arc team for hybrid cloud capabilities
- Kubernetes and K3s communities
- All open-source projects used in this stack
## 📞 Support
For issues and questions:
1. Check the [Documentation](docs/)
2. Review [Runbooks](docs/runbooks/)
3. Open an issue in the repository
## 🎯 Next Steps
After deployment:
1. Review and customize configurations
2. Set up monitoring and alerting
3. Configure backup and disaster recovery
4. Implement security policies
5. Plan for scaling and expansion
---
**Happy Deploying! 🚀**
---
## Archived Projects
This project contains archived content from related projects:
### PanTel (6G/GPU Archive)
- **Archive Location**: Archive files in this repository whose names begin with `6g_gpu`
- **Project**: PanTel telecommunications and connectivity infrastructure project
- **Joint Venture**: PanTel is a joint venture between Sankofa and PANDA (Pan-African Network for Digital Advancement)
- **Status**: Archived content - see [pan-tel](../pan-tel/) project directory for project information
- **Note**: This content is archived here and will be unpacked to the `pan-tel` project directory when ready for integration into the panda_monorepo
---