13× R630 Proxmox Cluster — DoD/MIL-Spec HA Master Plan
Last Updated: 2026-03-02
Document Version: 1.0
Status: Active — Master plan for 13-node HA, RAM/storage, and DoD/MIL compliance
1. Executive Summary
This document defines the target architecture for a 13-node Dell PowerEdge R630 Proxmox cluster with:
- Full HA and failover (shared storage, HA manager, fencing, automatic recovery).
- DoD/MIL-spec alignment (STIG-style hardening, audit, encryption, change control, documentation).
- RAM and drive specifications for each R630 to support Ceph, VMs/containers, and growth.
Scope: All 13 R630s as Proxmox cluster nodes; optional separate management node (e.g. ml110) or integration of management on a subset of R630s. Design assumes hyper-converged (Proxmox + Ceph on same nodes) for shared storage and true HA.
Extended inventory: The same site includes 3× Dell R750 servers, 2× Dell Precision 7920 workstations, and 2× UniFi Dream Machine Pro (gateways). See HARDWARE_INVENTORY_MASTER.md, 13_NODE_NETWORK_AND_CABLING_CHECKLIST.md, and 13_NODE_AND_ASSETS_BRING_ONLINE_CHECKLIST.md for network topology, cabling, and bring-online order.
2. Cluster Design — 13 Nodes
2.1 Node roles and quorum
| Item | Requirement |
|---|---|
| Total nodes | 13 × R630 |
| Quorum | Majority = 7. With 13 nodes, up to 6 can be down and cluster still has quorum. |
| Fencing | Required for HA: failed node must be fenced (power off/reboot) so Ceph and HA manager can safely restart resources elsewhere. |
| Qdevice | Optional: add a quorum device (e.g. small VM or appliance) so quorum survives more node failures; not required with 13 nodes but improves resilience. |
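The quorum figures in the table follow from simple majority arithmetic; a quick shell sanity check (a sketch, re-runnable when node counts change):

```shell
# Corosync quorum arithmetic for an N-node cluster.
nodes=13
majority=$(( nodes / 2 + 1 ))       # votes required for quorum
max_down=$(( nodes - majority ))    # node failures the cluster survives
echo "nodes=$nodes majority=$majority max_down=$max_down"
```

With 13 nodes this prints majority=7 and max_down=6, matching the table; re-run with a different `nodes` value before adding or removing hardware.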
2.2 Recommended node layout
| Role | Node count | Purpose |
|---|---|---|
| Proxmox + Ceph MON/MGR/OSD | 13 | Every R630 runs Proxmox and participates in Ceph (MON, MGR, OSD) for shared storage. |
| Ceph OSD | 13 | Each node contributes disk as Ceph OSD; replication (e.g. size=3, min_size=2) across nodes. |
| Proxmox HA | 13 | HA manager can restart VMs/containers on any node; VM disks on Ceph. |
| Optional dedicated | 0 | No dedicated “monitor-only” nodes required; MON/MGR run on all or a subset (e.g. 3–5 MONs). |
2.3 Network and addressing
- Management: One subnet (e.g. 192.168.11.0/24) for Proxmox API, SSH, Ceph public/cluster.
- Ceph: Separate VLAN or subnet for Ceph cluster network (recommended for DoD: isolate storage traffic).
- VLANs: Same VLAN-aware bridge (e.g. vmbr0) on all nodes so VMs/containers keep IPs when failed over.
- IP plan for 13 R630s: Reserve 13 consecutive IPs (e.g. 192.168.11.11–192.168.11.23 for r630-01 … r630-13). Document in config/ip-addresses.conf and DNS.
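The reserved range can be generated mechanically when populating config/ip-addresses.conf (a sketch; the `hostname=IP` line format is an assumption, adjust to the file's actual format):

```shell
# Emit the 13-host IP plan: r630-01=192.168.11.11 … r630-13=192.168.11.23.
plan=$(for i in $(seq 1 13); do
  printf 'r630-%02d=192.168.11.%d\n' "$i" $(( 10 + i ))
done)
echo "$plan"
```

Generating rather than hand-typing the plan keeps the config file and DNS zone consistent with the documented range.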
2.4 Switching (10G backbone)
Inventory: 2 × UniFi XG 10G 16-port switches (see HARDWARE_INVENTORY_MASTER.md).
- Use for Ceph cluster network and inter-node traffic; connect all 13 R630s via 10G for storage and replication.
- Redundancy: Two switches allow dual-attach per node (e.g. one link per switch or LACP) for HA.
- Management: Can stay on existing 1G LAN or use 10G for management if NICs support it.
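Dual-attach per node could be sketched in /etc/network/interfaces as an active-backup bond, which is safe with two independent switches; 802.3ad/LACP instead requires both links on one switch or a LAG spanning both. NIC names, VLAN ID, and addresses below are assumptions:

```
# /etc/network/interfaces fragment (sketch) — one 10G link to each XG switch
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode active-backup

# Ceph cluster network on a VLAN over the bond (VLAN 40 and 10.40.0.0/24 assumed)
auto bond0.40
iface bond0.40 inet static
    address 10.40.0.11/24
```

Whichever mode is chosen, document it per node so failover behavior is predictable during switch maintenance.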
3. RAM Specifications — R630
3.1 R630 memory capabilities (reference)
| Spec | Value |
|---|---|
| DIMM slots | 24 (12 per socket in 2-socket) |
| Max RAM | Up to 1.5 TB (with compatible LRDIMMs) |
| Typical configs | 32 GB, 64 GB, 128 GB, 256 GB, 384 GB, 512 GB (depending on DIMM size and count) |
| ECC | Required for DoD/MIL; R630 supports ECC RDIMM/LRDIMM |
3.2 Recommended RAM per node (DoD HA + Ceph)
| Tier | RAM per node | Use case |
|---|---|---|
| Minimum | 128 GB | Ceph OSD + a few VMs; acceptable for lab or light production. |
| Recommended | 256 GB | Production: Ceph (OSD + MON/MGR) + many VMs/containers; headroom for failover and recovery. |
| High | 384–512 GB | Heavy workloads, large Ceph OSD count per node, or when consolidating from existing 503 GB nodes. |
Ceph guidance: Proxmox/Ceph recommend ≥ 8 GiB of memory per OSD. With 6–8 OSDs per node (see storage), that is 48–64 GiB for Ceph, plus Proxmox and guest overhead → 128 GB minimum, 256 GB recommended.
DoD/MIL note: Prefer 256 GB per node for 13-node production so that (1) multiple node failures still leave enough capacity for HA migrations and (2) Ceph recovery and rebalancing do not cause OOM or instability.
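The 128/256 GB tiers can be cross-checked against the per-OSD guidance; a rough budget sketch (the host-overhead figure is an assumption):

```shell
# Per-node RAM budget at the 128 GB minimum tier.
osds_per_node=6
osd_target_gib=8                                  # >= 8 GiB per OSD (Ceph guidance)
ceph_gib=$(( osds_per_node * osd_target_gib ))    # memory reserved for OSDs
host_gib=16                                       # Proxmox + MON/MGR + OS (assumption)
guest_gib=$(( 128 - ceph_gib - host_gib ))        # what remains for VMs/containers
echo "ceph=${ceph_gib}GiB host=${host_gib}GiB guests=${guest_gib}GiB"
```

At 128 GB only ~64 GiB remains for guests, which is why 256 GB is the production recommendation: a failover can double a surviving node's guest load overnight.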
3.3 RAM placement (if mixing sizes)
If not all nodes have the same RAM:
- Put largest RAM in nodes that run the most VMs or Ceph MON/MGR.
- Ensure at least 128 GB on every node that runs Ceph OSDs.
- Document exact DIMM layout per node (slot, size, speed) for change control and troubleshooting.
4. Drive Specifications — R630
4.1 R630 drive options (reference)
- Internal bays: Typically 8 × 2.5" SATA/SAS (or 10-bay with optional kit); some configs support NVMe (e.g. 4 × NVMe via PCIe).
- Boot: 2 drives in mirror (ZFS mirror or hardware RAID1) for Proxmox OS — redundant, DoD-compliant.
- Data: Remaining drives for Ceph OSD and/or local LVM (if hybrid).
4.2 Recommended drive layout per R630 (full Ceph)
| Purpose | Drives | Type | Size (example) | Configuration |
|---|---|---|---|---|
| Boot (OS) | 2 | SSD | 240–480 GB each | ZFS mirror (preferred) or HW RAID1; Proxmox root only. |
| Ceph OSD | 4–6 | SSD (or NVMe) | 480 GB – 1 TB each | One OSD per drive; no RAID (Ceph provides replication). |
Example per node: 2 × 480 GB boot (ZFS mirror) + 6 × 960 GB SSD = 6 Ceph OSDs per node.
Cluster total: 13 × 6 = 78 OSDs; with 3× replication, usable capacity ≈ (78 × 0.9 TB) / 3 ≈ 23 TB (before BlueStore overhead; adjust for actual drive sizes).
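The capacity arithmetic as a repeatable check (a sketch; plug in actual drive counts and sizes):

```shell
# Usable Ceph capacity for the example layout (awk for fractional TB math).
raw=$(awk 'BEGIN { printf "%.1f", 13 * 6 * 0.9 }')        # 78 OSDs x 0.9 TB raw
usable=$(awk 'BEGIN { printf "%.1f", 13 * 6 * 0.9 / 3 }') # divided by replication size=3
echo "raw=${raw}TB usable=${usable}TB"
```

Note this is before BlueStore metadata overhead and before the fill ceiling in section 4.4 is applied.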
4.3 DoD/MIL storage requirements
- Encryption: At-rest encryption for sensitive data. Options: Ceph encryption (e.g. dm-crypt for OSD), or encrypted VMs (LUKS inside guest). Document which layers are encrypted and key management.
- Integrity: ZFS for boot (checksum, scrub). Ceph provides replication and recovery; use BlueStore with checksums.
- Sanitization: Follow DoD 5220.22-M or NIST SP 800-88 for decommissioning/destruction of drives.
- Spare: Maintain spare drives and document replacement and wipe procedures.
4.4 Sizing for your workload
- Current (from docs): ~50+ VMIDs, mix of Besu, Blockscout, DBIS, NPMplus, etc.; growth ~20–50 GB/month.
- Target: Size Ceph pool so that used + 2 years growth stays < 75% of usable. Example: 15–20 TB usable → ~5–7 TB used now + growth headroom.
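The 75% target translates into a simple headroom check (a sketch; the current-usage figure is an assumption within the 5–7 TB range above):

```shell
# Projected pool fill after 24 months of growth vs the 75% ceiling.
pct=$(awk 'BEGIN {
  usable = 23.4          # TB usable (example layout, section 4.2)
  used   = 6.0           # TB used today (assumption)
  growth = 0.05 * 24     # 50 GB/month for 2 years = 1.2 TB
  printf "%.0f", (used + growth) / usable * 100
}')
echo "projected fill after 2 years: ${pct}% (target < 75%)"
```

Re-run this whenever drive sizes, OSD counts, or observed growth change; it makes the "< 75%" target auditable rather than anecdotal.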
5. Full HA and Failover Architecture
5.1 Components
| Component | Role |
|---|---|
| Proxmox cluster | 13 nodes; same cluster name; corosync for quorum. |
| Ceph | Shared storage: MON (3–5 nodes), MGR (2+), OSD on all 13. Replication size=3, min_size=2. |
| Proxmox HA | HA manager enabled; VMs/containers on Ceph added as HA resources; start/stop order and groups as needed. |
| Fencing (STONITH) | Mandatory: when a node is declared lost, it must be fenced (powered off or rebooted) so Ceph and HA can safely reassign resources. Proxmox VE HA uses watchdog-based self-fencing by default; supplement with out-of-band fencing via IPMI/iDRAC where required. |
| Network | Redundant links where possible; same VLAN/bridge config on all nodes so failover does not change VM IPs. |
5.2 Ceph design (summary)
- Pools: At least one pool for VM/container disks (e.g. ceph-vm); optionally a separate pool for backups or bulk data.
- Replication: size=3, min_size=2; tolerates 2 node failures without data loss (with 13 nodes).
- Network: Separate cluster network (e.g. 10.x or dedicated VLAN) for Ceph backend traffic; public for client (Proxmox) access.
- MON/MGR: 3 or 5 MONs (odd); 2 MGRs minimum. Spread across nodes for availability.
5.3 HA resource and failover behavior
- HA resources: Add each critical VM/CT as HA resource; define groups (e.g. “database first, then app”) and restart order.
- Failure: Node down → fencing → Ceph marks OSDs out → HA manager restarts VMs on other nodes using Ceph disks.
- Maintenance: Put node in maintenance → migrate VMs off (or let HA relocate) → fence not triggered; perform RAM/drive work.
5.4 What “full HA” gives you (DoD-relevant)
- No single point of failure: Storage replicated; compute can run on any node.
- Automatic failover: No manual migration for HA-managed guests.
- Controlled maintenance: Node can be taken down without losing services; documented procedures for patching and hardware changes.
6. DoD/MIL-Spec Compliance Framework
6.1 Alignment with DISA STIG / DoD requirements
DoD/MIL compliance typically implies the following (summary; map these to your exact ATO/contract requirements):
| Area | Requirement | Implementation |
|---|---|---|
| Hardening | DISA STIG or equivalent for OS and applications | Apply STIG/CIS to Debian (Proxmox host) and guests; document exceptions. |
| Authentication | Strong auth, no default passwords, MFA where required | SSH key-only on Proxmox; no password SSH; RBAC in Proxmox; MFA for critical UIs if required. |
| Access control | Least privilege, RBAC, audit | Proxmox roles and permissions; separate admin vs operator; audit logs. |
| Encryption | TLS in transit; encryption at rest for sensitive data | TLS 1.2+ for API and Ceph; at-rest encryption (Ceph or LUKS) as required. |
| Audit and logging | Centralized, tamper-resistant, retention | rsyslog/syslog-ng to central log host; retention per policy; integrity (e.g. signed/hash). |
| Change control | Documented changes, rollback capability | Change tickets; config in Git; backups before changes; runbooks. |
| Backup and recovery | Regular backups, tested restore | Proxmox backups to separate storage; Ceph snapshots; DR runbook and tests. |
| Physical and environmental | Physical security, power, cooling | Out of scope for this doc; document in facility plan. |
6.2 Hardening checklist (Proxmox + Debian)
Use this as an operational checklist; align with your STIG version.
Proxmox hosts (Debian base):
- SSH: Key-only auth; PasswordAuthentication no; PermitRootLogin prohibit-password (root via keys only, if root SSH is needed at all); strong Ciphers/KexAlgorithms.
- Firewall: Restrict Proxmox API (8006) and SSH to management VLAN/CIDR; default deny.
- Services: Disable unnecessary services; only Proxmox, Ceph, corosync, and required dependencies.
- Session timeout: User session timeout (e.g. 900 s) in shell profile and/or Proxmox UI.
- TLS: TLS 1.2+ only; strong ciphers for pveproxy and Ceph.
- Updates: Security updates applied on a defined schedule; test in non-prod first.
- FIPS: If required by contract, use FIPS-validated crypto (kernel/openssl); document and test.
- File permissions: Sensitive files (keys, tokens) mode 600/400; no world-writable.
- Audit: auditd or equivalent for critical files and commands; logs to central host.
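The SSH and session-timeout items above could look like the following drop-in fragments (a sketch; cipher and KEX lists must be matched against your STIG version before deployment):

```
# /etc/ssh/sshd_config.d/50-hardening.conf (sketch)
PasswordAuthentication no
PermitRootLogin prohibit-password
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,diffie-hellman-group16-sha512
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# /etc/profile.d/tmout.sh — 900 s shell session timeout
TMOUT=900
readonly TMOUT
export TMOUT
```

Validate with `sshd -t` and keep an active session open while reloading sshd, so a typo cannot lock out all administrators.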
Ceph:
- Auth: Cephx enabled; key management per DoD key management policy.
- Network: Cluster network isolated; no Ceph ports exposed to user VLANs.
- Encryption: At-rest encryption for OSD if required; key escrow and rotation documented.
Guests (VMs/containers):
- Per-guest hardening: STIG/CIS per OS (e.g. Ubuntu, RHEL); documented baseline.
- Secrets: No secrets in configs in Git; use Vault or Proxmox secrets where applicable.
Existing automation (this repo): use scripts/security/run-security-on-proxmox-hosts.sh (SSH key-only + firewall 8006), scripts/security/setup-ssh-key-auth.sh, and scripts/security/firewall-proxmox-8006.sh. Extend the host list (in the scripts or via env, e.g. all 13 R630 IPs) and run with --apply after validating with --dry-run.
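The default-deny posture for 8006/22 could be expressed as a Proxmox cluster firewall fragment (a sketch; the management CIDR is an assumption, and intra-cluster corosync/Ceph traffic must remain allowed before enabling a DROP policy):

```
# /etc/pve/firewall/cluster.fw (sketch)
[OPTIONS]
enable: 1
policy_in: DROP

[RULES]
# Proxmox API and SSH from the management network only
IN ACCEPT -source 192.168.11.0/24 -p tcp -dport 8006
IN ACCEPT -source 192.168.11.0/24 -p tcp -dport 22
```

Test on one node from a console session (not over SSH) before rolling the policy out cluster-wide.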
6.3 Audit and documentation
- Configuration baseline: All Proxmox and Ceph configs in version control; changes via PR/ticket.
- Runbooks: Install, upgrade, add node, remove node, replace drive, fence test, backup/restore, disaster recovery.
- Evidence: Run STIG/CIS scans (e.g. OpenSCAP, Nessus) and retain reports for assessors.
- Change log: Document every change (who, when, why, ticket); link to runbook.
7. Phased Implementation
Phase 1 — Prepare (no downtime)
- IP and DNS: Assign and document 13 IPs for R630s; update
config/ip-addresses.confand DNS. - RAM: Upgrade all 13 R630s to at least 128 GB (256 GB recommended); document DIMM layout.
- Drives: Install boot mirror (2 × SSD) and data drives (4–6 SSD per node) on each R630; configure ZFS mirror for boot.
- Proxmox install: Install Proxmox VE on all 13; same version; join to one cluster; configure VLAN-aware bridge and management IPs.
- Hardening: Apply SSH key-only, firewall, and STIG/CIS checklist to all nodes; document exceptions.
Phase 2 — Ceph
- Ceph install: Install Ceph on all 13 nodes (Proxmox Ceph integration); create MON (3 or 5), MGR (2), OSD (all nodes).
- Pools: Create replication pool (size=3, min_size=2) for VM disks; add as Proxmox storage.
- Network: Configure Ceph public and cluster networks; validate connectivity and latency.
- Tests: Fill and drain; kill OSD/node and verify recovery; document procedures.
Phase 3 — HA and fencing
- Fencing: Verify watchdog-based fencing (Proxmox default) and/or configure IPMI/iDRAC fencing for each node; test by isolating a node and confirming it is fenced.
- HA manager: Enable HA in cluster; add critical VMs/containers as HA resources; set groups and order.
- Failover tests: Power off one node; verify fencing and HA restart on another node; repeat for 2-node failure if desired.
- Runbooks: Document failover test results and operational procedures.
Phase 4 — Migrate workload
- Migrate disks: Move VM/container disks from local storage to Ceph (live migration or backup/restore).
- Decommission local-only: Once all HA resources are on Ceph, remove or repurpose local LVM for non-HA or cache.
- Monitoring and alerting: Integrate with central monitoring; alerts for quorum loss, Ceph health, fence events, HA failures.
Phase 5 — DoD/MIL continuous compliance
- Scans: Schedule STIG/CIS scans; remediate and document exceptions.
- Backup and DR: Automate backups; test restore quarterly; update DR runbook.
- Change control: All changes via ticket + runbook; config in Git; periodic review of permissions and audit logs.
8. References and Related Docs
| Document | Purpose |
|---|---|
| PROXMOX_HA_CLUSTER_ROADMAP.md | Current HA roadmap (3-node); extend to 13-node. |
| PROXMOX_CLUSTER_ARCHITECTURE.md | Cluster and storage overview. |
| PHYSICAL_DRIVES_AND_CONFIG.md | Current drive layout (existing 2 R630s + ml110). |
| Proxmox Ceph documentation | Ceph in Proxmox. |
| Proxmox HA | High Availability. |
| DISA STIG | DISA STIGs; Debian/Ubuntu and application STIGs. |
| CIS Benchmarks | CIS Benchmarks; Debian, Proxmox if available. |
9. Summary Table
| Item | Specification |
|---|---|
| Nodes | 13 × Dell PowerEdge R630 |
| Quorum | Majority 7; up to 6 nodes can fail |
| RAM per node | Minimum 128 GB; recommended 256 GB (DoD production) |
| Boot | 2 × SSD (e.g. 240–480 GB) ZFS mirror per node |
| Data (Ceph) | 4–6 × SSD (e.g. 480 GB – 1 TB) per node, one OSD per drive |
| Shared storage | Ceph replicated (size=3, min_size=2) |
| HA | Proxmox HA manager; fencing (STONITH) required |
| Hardening | STIG/CIS alignment; SSH key-only; firewall; TLS; audit; change control |
| Encryption | TLS in transit; at-rest per policy (Ceph or LUKS) |
Owner: Architecture / Infrastructure
Review: Quarterly or when adding nodes / changing compliance scope
Change control: Update version and “Last Updated” when changing this plan; link change ticket.