13× R630 Proxmox Cluster — DoD/MIL-Spec HA Master Plan

Last Updated: 2026-03-02
Document Version: 1.0
Status: Active — Master plan for 13-node HA, RAM/storage, and DoD/MIL compliance


1. Executive Summary

This document defines the target architecture for a 13-node Dell PowerEdge R630 Proxmox cluster with:

  • Full HA and failover (shared storage, HA manager, fencing, automatic recovery).
  • DoD/MIL-spec alignment (STIG-style hardening, audit, encryption, change control, documentation).
  • RAM and drive specifications for each R630 to support Ceph, VMs/containers, and growth.

Scope: All 13 R630s as Proxmox cluster nodes; optional separate management node (e.g. ml110) or integration of management on a subset of R630s. Design assumes hyper-converged (Proxmox + Ceph on same nodes) for shared storage and true HA.

Extended inventory: The same site includes 3× Dell R750 servers, 2× Dell Precision 7920 workstations, and 2× UniFi Dream Machine Pro (gateways). See HARDWARE_INVENTORY_MASTER.md, 13_NODE_NETWORK_AND_CABLING_CHECKLIST.md, and 13_NODE_AND_ASSETS_BRING_ONLINE_CHECKLIST.md for network topology, cabling, and bring-online order.


2. Cluster Design — 13 Nodes

2.1 Node roles and quorum

| Item | Requirement |
| --- | --- |
| Total nodes | 13 × R630 |
| Quorum | Majority = 7. With 13 nodes, up to 6 can be down and the cluster still has quorum. |
| Fencing | Required for HA: a failed node must be fenced (powered off/rebooted) so Ceph and the HA manager can safely restart resources elsewhere. |
| Qdevice | Optional: add a quorum device (e.g. a small VM or appliance) so quorum survives more node failures; not required with 13 nodes but improves resilience. |
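
The quorum arithmetic above can be checked with a quick sketch (plain shell; the node count is the only input):

```shell
# Corosync grants quorum to a partition holding a strict majority of votes
# (one vote per node by default).
nodes=13
majority=$(( nodes / 2 + 1 ))        # integer division: 13/2 = 6, +1 = 7
max_failed=$(( nodes - majority ))   # 6 nodes can be lost before quorum is lost
echo "majority=${majority} tolerated_failures=${max_failed}"
```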

2.2 Role assignment

| Role | Node count | Purpose |
| --- | --- | --- |
| Proxmox + Ceph MON/MGR/OSD | 13 | Every R630 runs Proxmox and participates in Ceph (MON, MGR, OSD) for shared storage. |
| Ceph OSD | 13 | Each node contributes disks as Ceph OSDs; replication (e.g. size=3, min_size=2) across nodes. |
| Proxmox HA | 13 | HA manager can restart VMs/containers on any node; VM disks on Ceph. |
| Optional dedicated | 0 | No dedicated “monitor-only” nodes required; MON/MGR run on all or a subset (e.g. 3–5 MONs). |

2.3 Network and addressing

  • Management: One subnet (e.g. 192.168.11.0/24) for Proxmox API, SSH, Ceph public/cluster.
  • Ceph: Separate VLAN or subnet for Ceph cluster network (recommended for DoD: isolate storage traffic).
  • VLANs: Same VLAN-aware bridge (e.g. vmbr0) on all nodes so VMs/containers keep IPs when failed over.
  • IP plan for 13 R630s: Reserve 13 consecutive IPs (e.g. 192.168.11.11–192.168.11.23 for r630-01 … r630-13). Document in config/ip-addresses.conf and DNS.
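
As a sketch, the /etc/hosts entries implied by this plan can be generated rather than typed by hand (hostnames r630-01 … r630-13 follow the plan above; append the site's DNS domain as needed):

```shell
# Generate hosts entries for r630-01..r630-13 on 192.168.11.11-23.
hosts_entries=$(for i in $(seq 1 13); do
  printf '192.168.11.%d r630-%02d\n' $(( 10 + i )) "$i"
done)
echo "$hosts_entries"
```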

2.4 Switching (10G backbone)

Inventory: 2 × UniFi XG 10G 16-port switches (see HARDWARE_INVENTORY_MASTER.md).

  • Use for Ceph cluster network and inter-node traffic; connect all 13 R630s via 10G for storage and replication.
  • Redundancy: Two switches allow dual-attach per node (e.g. one link per switch or LACP) for HA.
  • Management: Can stay on existing 1G LAN or use 10G for management if NICs support it.
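
A minimal dual-attach bonding sketch, assuming ifupdown-style config and example NIC names; note that LACP (802.3ad) across two independent switches requires MLAG/stacking, which standalone UniFi XG switches do not provide, so active-backup is shown here:

```
# /etc/network/interfaces fragment (sketch). Interface names (eno3/eno4)
# and the 10.10.10.0/24 Ceph subnet are assumptions; adjust per node.
auto bond1
iface bond1 inet static
    address 10.10.10.11/24        # r630-01; increment per node
    bond-slaves eno3 eno4         # one 10G port to each XG switch
    bond-mode active-backup       # LACP only if both links land on one switch
    bond-miimon 100
    bond-primary eno3
```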

3. RAM Specifications — R630

3.1 R630 memory capabilities (reference)

| Spec | Value |
| --- | --- |
| DIMM slots | 24 (12 per socket in 2-socket config) |
| Max RAM | Up to 1.5 TB (with compatible LRDIMMs) |
| Typical configs | 32 GB, 64 GB, 128 GB, 256 GB, 384 GB, 512 GB (depending on DIMM size and count) |
| ECC | Required for DoD/MIL; R630 supports ECC RDIMM/LRDIMM |

3.2 RAM tiers

| Tier | RAM per node | Use case |
| --- | --- | --- |
| Minimum | 128 GB | Ceph OSD + a few VMs; acceptable for lab or light production. |
| Recommended | 256 GB | Production: Ceph (OSD + MON/MGR) + many VMs/containers; headroom for failover and recovery. |
| High | 384–512 GB | Heavy workloads, large Ceph OSD count per node, or when consolidating from existing 503 GB nodes. |

Ceph guidance: Proxmox/Ceph recommend ≥ 8 GiB of memory per OSD. With 6–8 OSDs per node (see storage), that is 48–64 GiB for Ceph, plus Proxmox and guest overhead → 128 GB minimum, 256 GB recommended.

DoD/MIL note: Prefer 256 GB per node for 13-node production so that (1) multiple node failures still leave enough capacity for HA migrations and (2) Ceph recovery and rebalancing do not cause OOM or instability.
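
The 128 GB floor can be sanity-checked with a rough budget; the MON/MGR and host-overhead figures below are illustrative assumptions, not measurements:

```shell
# Per-node memory budget sketch at the 128 GB minimum (all figures GiB).
osds_per_node=6
per_osd_gib=8                                  # recommended osd_memory_target floor
ceph_gib=$(( osds_per_node * per_osd_gib ))    # 48 GiB for OSD daemons
mon_mgr_gib=4                                  # MON + MGR headroom (assumption)
host_gib=8                                     # Proxmox host overhead (assumption)
guest_gib=$(( 128 - ceph_gib - mon_mgr_gib - host_gib ))
echo "ceph=${ceph_gib} GiB; left for guests at 128 GB: ${guest_gib} GiB"
```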

3.3 RAM placement (if mixing sizes)

If not all nodes have the same RAM:

  • Put largest RAM in nodes that run the most VMs or Ceph MON/MGR.
  • Ensure at least 128 GB on every node that runs Ceph OSDs.
  • Document exact DIMM layout per node (slot, size, speed) for change control and troubleshooting.

4. Drive Specifications — R630

4.1 R630 drive options (reference)

  • Internal bays: Typically 8 × 2.5" SATA/SAS (or 10-bay with optional kit); some configs support NVMe (e.g. 4 × NVMe via PCIe).
  • Boot: 2 drives in mirror (ZFS mirror or hardware RAID1) for Proxmox OS — redundant, DoD-compliant.
  • Data: Remaining drives for Ceph OSD and/or local LVM (if hybrid).

4.2 Recommended drive configuration

| Purpose | Drives | Type | Size (example) | Configuration |
| --- | --- | --- | --- | --- |
| Boot (OS) | 2 | SSD | 240–480 GB each | ZFS mirror (preferred) or HW RAID1; Proxmox root only. |
| Ceph OSD | 4–6 | SSD (or NVMe) | 480 GB–1 TB each | One OSD per drive; no RAID (Ceph provides replication). |

Example per node: 2 × 480 GB boot (ZFS mirror) + 6 × 960 GB SSD = 6 Ceph OSDs per node.
Cluster total: 13 × 6 = 78 OSDs; with 3× replication, usable capacity ≈ (78 × 0.9 TB) / 3 ≈ 23 TB (before BlueStore overhead; adjust for actual drive sizes).
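
The same arithmetic as a checkable sketch (values held as integer tenths of TB to stay within shell integer arithmetic):

```shell
# Raw vs usable capacity for 13 nodes x 6 OSDs of ~0.9 TB at size=3.
osds=$(( 13 * 6 ))                    # 78 OSDs cluster-wide
raw_tenths=$(( osds * 9 ))            # 78 x 0.9 TB = 70.2 TB raw (as 702)
usable_tenths=$(( raw_tenths / 3 ))   # 3x replication -> 23.4 TB (as 234)
echo "raw=${raw_tenths} usable=${usable_tenths} (tenths of TB)"
```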

4.3 DoD/MIL storage requirements

  • Encryption: At-rest encryption for sensitive data. Options: Ceph encryption (e.g. dm-crypt for OSD), or encrypted VMs (LUKS inside guest). Document which layers are encrypted and key management.
  • Integrity: ZFS for boot (checksum, scrub). Ceph provides replication and recovery; use bluestore with checksums.
  • Sanitization: Follow DoD 5220.22-M or NIST SP 800-88 for decommissioning/destruction of drives.
  • Spare: Maintain spare drives and document replacement and wipe procedures.
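
A hedged sanitization sketch for the runbook; device names are placeholders, and the method must be matched to NIST SP 800-88 guidance for the specific media type before use:

```
# SATA/SAS SSD: ATA Secure Erase via hdparm (device name is a placeholder).
hdparm --user-master u --security-set-pass tmp /dev/sdX
hdparm --user-master u --security-erase tmp /dev/sdX

# NVMe: user-data erase (--ses=1) or cryptographic erase (--ses=2) via nvme-cli.
nvme format /dev/nvme0n1 --ses=1

# Verify afterwards (e.g. read back and confirm zeros) and record serial,
# method, operator, and date in the sanitization log.
```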

4.4 Sizing for your workload

  • Current (from docs): ~50+ VMIDs, mix of Besu, Blockscout, DBIS, NPMplus, etc.; growth ~20–50 GB/month.
  • Target: Size Ceph pool so that used + 2 years growth stays < 75% of usable. Example: 15–20 TB usable → ~5–7 TB used now + growth headroom.
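
A worst-case headroom check for the 75% rule, using the upper end of the usage and growth estimates and the lower end of the usable target:

```shell
# Does current use + 24 months of growth stay under 75% of usable? (GB)
used_gb=7000            # ~7 TB used now (upper end of estimate)
growth_gb_month=50      # upper end of monthly growth estimate
usable_gb=15000         # 15 TB usable (lower end of target)
projected_gb=$(( used_gb + 24 * growth_gb_month ))
limit_gb=$(( usable_gb * 75 / 100 ))
[ "$projected_gb" -lt "$limit_gb" ] && echo "OK: ${projected_gb} GB < ${limit_gb} GB"
```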

5. Full HA and Failover Architecture

5.1 Components

| Component | Role |
| --- | --- |
| Proxmox cluster | 13 nodes; same cluster name; corosync for quorum. |
| Ceph | Shared storage: MON (3–5 nodes), MGR (2+), OSD on all 13. Replication size=3, min_size=2. |
| Proxmox HA | HA manager enabled; VMs/containers on Ceph added as HA resources; start/stop order and groups as needed. |
| Fencing (STONITH) | Mandatory: when a node is declared lost, the fence device powers it off (or reboots it) so Ceph and HA can safely reassign resources. Use Proxmox's built-in fence agents (e.g. fence_pve with the Proxmox API, or IPMI/iDRAC). |
| Network | Redundant links where possible; same VLAN/bridge config on all nodes so failover does not change VM IPs. |

5.2 Ceph design (summary)

  • Pools: At least one pool for VM/container disks (e.g. ceph-vm); optionally separate pool for backups or bulk data.
  • Replication: size=3, min_size=2; with 13 nodes, two simultaneous node failures cause no data loss (one replica survives), though I/O to affected PGs pauses below min_size until recovery completes.
  • Network: Separate cluster network (e.g. 10.x or dedicated VLAN) for Ceph backend traffic; public for client (Proxmox) access.
  • MON/MGR: 3 or 5 MONs (odd); 2 MGRs minimum. Spread across nodes for availability.
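
As an illustrative bring-up sequence using the Proxmox Ceph tooling (pool name, subnet, and device path are examples; consult the pveceph reference before running):

```
pveceph init --network 10.10.10.0/24     # Ceph cluster network (example subnet)
pveceph mon create                       # on 3 or 5 nodes
pveceph mgr create                       # on at least 2 nodes
pveceph osd create /dev/sdb              # one per data drive, on each node
pveceph pool create ceph-vm --size 3 --min_size 2
```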

5.3 HA resource and failover behavior

  • HA resources: Add each critical VM/CT as HA resource; define groups (e.g. “database first, then app”) and restart order.
  • Failure: Node down → fencing → Ceph marks OSDs out → HA manager restarts VMs on other nodes using Ceph disks.
  • Maintenance: Put node in maintenance → migrate VMs off (or let HA relocate) → fence not triggered; perform RAM/drive work.
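
A sketch of HA resource registration with ha-manager (VMIDs, group name, and node list are examples; note that HA groups primarily control node placement preference):

```
ha-manager groupadd db-first --nodes "r630-01,r630-02,r630-03"
ha-manager add vm:101 --group db-first --state started   # database tier
ha-manager add vm:102 --state started                    # app tier
ha-manager status                                        # verify resources
```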

5.4 What “full HA” gives you (DoD-relevant)

  • No single point of failure: Storage replicated; compute can run on any node.
  • Automatic failover: No manual migration for HA-managed guests.
  • Controlled maintenance: Node can be taken down without losing services; documented procedures for patching and hardware changes.

6. DoD/MIL-Spec Compliance Framework

6.1 Alignment with DISA STIG / DoD requirements

DoD/MIL typically implies (summary; you must map to your exact ATO/contract):

| Area | Requirement | Implementation |
| --- | --- | --- |
| Hardening | DISA STIG or equivalent for OS and applications | Apply STIG/CIS to Debian (Proxmox host) and guests; document exceptions. |
| Authentication | Strong auth, no default passwords, MFA where required | SSH key-only on Proxmox; no password SSH; RBAC in Proxmox; MFA for critical UIs if required. |
| Access control | Least privilege, RBAC, audit | Proxmox roles and permissions; separate admin vs operator; audit logs. |
| Encryption | TLS in transit; encryption at rest for sensitive data | TLS 1.2+ for API and Ceph; at-rest encryption (Ceph or LUKS) as required. |
| Audit and logging | Centralized, tamper-resistant, retention | rsyslog/syslog-ng to a central log host; retention per policy; integrity (e.g. signed/hashed). |
| Change control | Documented changes, rollback capability | Change tickets; config in Git; backups before changes; runbooks. |
| Backup and recovery | Regular backups, tested restore | Proxmox backups to separate storage; Ceph snapshots; DR runbook and tests. |
| Physical and environmental | Physical security, power, cooling | Out of scope for this doc; document in facility plan. |

6.2 Hardening checklist (Proxmox + Debian)

Use this as an operational checklist; align with your STIG version.

Proxmox hosts (Debian base):

  • SSH: Key-only auth; PasswordAuthentication no; PermitRootLogin prohibit-password or key-only; strong ciphers/KexAlgorithms.
  • Firewall: Restrict Proxmox API (8006) and SSH to management VLAN/CIDR; default deny.
  • Services: Disable unnecessary services; only Proxmox, Ceph, corosync, and required dependencies.
  • Session timeout: User session timeout (e.g. 900 s) in shell profile and/or Proxmox UI.
  • TLS: TLS 1.2+ only; strong ciphers for pveproxy and Ceph.
  • Updates: Security updates applied on a defined schedule; test in non-prod first.
  • FIPS: If required by contract, use FIPS-validated crypto (kernel/openssl); document and test.
  • File permissions: Sensitive files (keys, tokens) mode 600/400; no world-writable.
  • Audit: auditd or equivalent for critical files and commands; logs to central host.
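
A possible sshd_config drop-in reflecting the SSH items above; the cipher/KEX/MAC lists are one example selection and must be aligned with your STIG version:

```
# /etc/ssh/sshd_config.d/hardening.conf (sketch)
PasswordAuthentication no
PermitRootLogin prohibit-password
KexAlgorithms curve25519-sha256,diffie-hellman-group16-sha512
Ciphers aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
ClientAliveInterval 900      # drop unresponsive sessions after ~900 s
ClientAliveCountMax 1
```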

Ceph:

  • Auth: Cephx enabled; key management per DoD key management policy.
  • Network: Cluster network isolated; no Ceph ports exposed to user VLANs.
  • Encryption: At-rest encryption for OSD if required; key escrow and rotation documented.

Guests (VMs/containers):

  • Per-guest hardening: STIG/CIS per OS (e.g. Ubuntu, RHEL); documented baseline.
  • Secrets: No secrets in configs in Git; use Vault or Proxmox secrets where applicable.

Existing automation (this repo): Use scripts/security/run-security-on-proxmox-hosts.sh (SSH key-only + firewall on 8006), scripts/security/setup-ssh-key-auth.sh, and scripts/security/firewall-proxmox-8006.sh. Extend the host list (in the scripts or via env, e.g. all R630 IPs) to cover all 13 hosts; validate with --dry-run, then run with --apply.

6.3 Audit and documentation

  • Configuration baseline: All Proxmox and Ceph configs in version control; changes via PR/ticket.
  • Runbooks: Install, upgrade, add node, remove node, replace drive, fence test, backup/restore, disaster recovery.
  • Evidence: Run STIG/CIS scans (e.g. OpenSCAP, Nessus) and retain reports for assessors.
  • Change log: Document every change (who, when, why, ticket); link to runbook.

7. Phased Implementation

Phase 1 — Prepare (no downtime)

  1. IP and DNS: Assign and document 13 IPs for R630s; update config/ip-addresses.conf and DNS.
  2. RAM: Upgrade all 13 R630s to at least 128 GB (256 GB recommended); document DIMM layout.
  3. Drives: Install boot mirror (2 × SSD) and data drives (4–6 SSD per node) on each R630; configure ZFS mirror for boot.
  4. Proxmox install: Install Proxmox VE on all 13; same version; join to one cluster; configure VLAN-aware bridge and management IPs.
  5. Hardening: Apply SSH key-only, firewall, and STIG/CIS checklist to all nodes; document exceptions.
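
The cluster-formation step can be sketched as follows (cluster name is an example; run only after all nodes are installed and reachable on their management IPs):

```
# On the first node:
pvecm create r630-cluster
# On each remaining node (r630-02 .. r630-13):
pvecm add 192.168.11.11        # IP of the first node
# Verify from any node: expect 13 nodes and quorum OK
pvecm status
```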

Phase 2 — Ceph

  1. Ceph install: Install Ceph on all 13 nodes (Proxmox Ceph integration); create MON (3 or 5), MGR (2), OSD (all nodes).
  2. Pools: Create replication pool (size=3, min_size=2) for VM disks; add as Proxmox storage.
  3. Network: Configure Ceph public and cluster networks; validate connectivity and latency.
  4. Tests: Fill and drain; kill OSD/node and verify recovery; document procedures.

Phase 3 — HA and fencing

  1. Fencing: Configure fence_pve (or IPMI/iDRAC) for each node; test fencing from another node.
  2. HA manager: Enable HA in cluster; add critical VMs/containers as HA resources; set groups and order.
  3. Failover tests: Power off one node; verify fencing and HA restart on another node; repeat for 2-node failure if desired.
  4. Runbooks: Document failover test results and operational procedures.

Phase 4 — Migrate workload

  1. Migrate disks: Move VM/container disks from local storage to Ceph (live migration or backup/restore).
  2. Decommission local-only: Once all HA resources are on Ceph, remove or repurpose local LVM for non-HA or cache.
  3. Monitoring and alerting: Integrate with central monitoring; alerts for quorum loss, Ceph health, fence events, HA failures.
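
Illustrative disk-move commands for step 1 (VMID/CT ID and storage names are examples; container volume moves require the container to be stopped):

```
qm move-disk 101 scsi0 ceph-vm --delete   # live-move a VM disk to Ceph
pct move-volume 201 rootfs ceph-vm        # move a container volume (CT stopped)
```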

Phase 5 — DoD/MIL continuous compliance

  1. Scans: Schedule STIG/CIS scans; remediate and document exceptions.
  2. Backup and DR: Automate backups; test restore quarterly; update DR runbook.
  3. Change control: All changes via ticket + runbook; config in Git; periodic review of permissions and audit logs.

8. Related Documents

| Document | Purpose |
| --- | --- |
| PROXMOX_HA_CLUSTER_ROADMAP.md | Current HA roadmap (3-node); extend to 13-node. |
| PROXMOX_CLUSTER_ARCHITECTURE.md | Cluster and storage overview. |
| PHYSICAL_DRIVES_AND_CONFIG.md | Current drive layout (existing 2 R630s + ml110). |
| Proxmox Ceph documentation | Ceph in Proxmox. |
| Proxmox HA documentation | High Availability. |
| DISA STIG | DISA STIGs; Debian/Ubuntu and application STIGs. |
| CIS Benchmarks | CIS Benchmarks; Debian, and Proxmox if available. |

9. Summary Table

| Item | Specification |
| --- | --- |
| Nodes | 13 × Dell PowerEdge R630 |
| Quorum | Majority 7; up to 6 nodes can fail |
| RAM per node | Minimum 128 GB; recommended 256 GB (DoD production) |
| Boot | 2 × SSD (e.g. 240–480 GB) ZFS mirror per node |
| Data (Ceph) | 4–6 × SSD (e.g. 480 GB–1 TB) per node, one OSD per drive |
| Shared storage | Ceph replicated (size=3, min_size=2) |
| HA | Proxmox HA manager; fencing (STONITH) required |
| Hardening | STIG/CIS alignment; SSH key-only; firewall; TLS; audit; change control |
| Encryption | TLS in transit; at-rest per policy (Ceph or LUKS) |

Owner: Architecture / Infrastructure
Review: Quarterly or when adding nodes / changing compliance scope
Change control: Update version and “Last Updated” when changing this plan; link change ticket.