13× R630 Proxmox Cluster — DoD/MIL-Spec HA Master Plan
Last Updated: 2026-03-02
Document Version: 1.0
Status: Active — Master plan for 13-node HA, RAM/storage, and DoD/MIL compliance
1. Executive Summary
This document defines the target architecture for a 13-node Dell PowerEdge R630 Proxmox cluster with:
- Full HA and failover (shared storage, HA manager, fencing, automatic recovery).
- DoD/MIL-spec alignment (STIG-style hardening, audit, encryption, change control, documentation).
- RAM and drive specifications for each R630 to support Ceph, VMs/containers, and growth.
Scope: All 13 R630s as Proxmox cluster nodes; optional separate management node (e.g. ml110) or integration of management on a subset of R630s. Design assumes hyper-converged (Proxmox + Ceph on same nodes) for shared storage and true HA.
Extended inventory: The same site includes 3× Dell R750 servers, 2× Dell Precision 7920 workstations, and 2× UniFi Dream Machine Pro (gateways). See HARDWARE_INVENTORY_MASTER.md, 13_NODE_NETWORK_AND_CABLING_CHECKLIST.md, and 13_NODE_AND_ASSETS_BRING_ONLINE_CHECKLIST.md for network topology, cabling, and bring-online order.
2. Cluster Design — 13 Nodes
2.1 Node roles and quorum
| Item | Requirement |
|---|---|
| Total nodes | 13 × R630 |
| Quorum | Majority = 7. With 13 nodes, up to 6 can be down and cluster still has quorum. |
| Fencing | Required for HA: failed node must be fenced (power off/reboot) so Ceph and HA manager can safely restart resources elsewhere. |
| Qdevice | Optional: add a quorum device (e.g. small VM or appliance) so quorum survives more node failures; not required with 13 nodes but improves resilience. |
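The quorum figures in the table follow from simple majority arithmetic; a quick shell sanity check (a sketch, re-runnable when node counts change):

```shell
# Corosync quorum arithmetic for an N-node cluster.
nodes=13
majority=$(( nodes / 2 + 1 ))       # votes required for quorum
max_down=$(( nodes - majority ))    # node failures the cluster survives
echo "nodes=$nodes majority=$majority max_down=$max_down"
```

With 13 nodes this prints majority=7 and max_down=6, matching the table; re-run with a different `nodes` value before adding or removing hardware.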
2.2 Recommended node layout
| Role | Node count | Purpose |
|---|---|---|
| Proxmox + Ceph MON/MGR/OSD | 13 | Every R630 runs Proxmox and participates in Ceph (MON, MGR, OSD) for shared storage. |
| Ceph OSD | 13 | Each node contributes disk as Ceph OSD; replication (e.g. size=3, min_size=2) across nodes. |
| Proxmox HA | 13 | HA manager can restart VMs/containers on any node; VM disks on Ceph. |
| Optional dedicated | 0 | No dedicated “monitor-only” nodes required; MON/MGR run on all or a subset (e.g. 3–5 MONs). |
2.3 Network and addressing
- Management: One subnet (e.g. 192.168.11.0/24) for Proxmox API, SSH, Ceph public/cluster.
- Ceph: Separate VLAN or subnet for Ceph cluster network (recommended for DoD: isolate storage traffic).
- VLANs: Same VLAN-aware bridge (e.g. vmbr0) on all nodes so VMs/containers keep IPs when failed over.
- IP plan for 13 R630s: Reserve 13 consecutive IPs (e.g. 192.168.11.11–192.168.11.23 for r630-01 … r630-13). Document in config/ip-addresses.conf and DNS.
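The reserved range can be generated mechanically when populating config/ip-addresses.conf (a sketch; the `hostname=IP` line format is an assumption, adjust to the file's actual format):

```shell
# Emit the 13-host IP plan: r630-01=192.168.11.11 … r630-13=192.168.11.23.
plan=$(for i in $(seq 1 13); do
  printf 'r630-%02d=192.168.11.%d\n' "$i" $(( 10 + i ))
done)
echo "$plan"
```

Generating rather than hand-typing the plan keeps the config file and DNS zone consistent with the documented range.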
2.4 Switching (10G backbone)
Inventory: 2 × UniFi XG 10G 16-port switches (see HARDWARE_INVENTORY_MASTER.md).
- Use for Ceph cluster network and inter-node traffic; connect all 13 R630s via 10G for storage and replication.
- Redundancy: Two switches allow dual-attach per node (e.g. one link per switch or LACP) for HA.
- Management: Can stay on existing 1G LAN or use 10G for management if NICs support it.
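Dual-attach per node could be sketched in /etc/network/interfaces as an active-backup bond, which is safe with two independent switches; 802.3ad/LACP instead requires both links on one switch or a LAG spanning both. NIC names, VLAN ID, and addresses below are assumptions:

```
# /etc/network/interfaces fragment (sketch) — one 10G link to each XG switch
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode active-backup

# Ceph cluster network on a VLAN over the bond (VLAN 40 and 10.40.0.0/24 assumed)
auto bond0.40
iface bond0.40 inet static
    address 10.40.0.11/24
```

Whichever mode is chosen, document it per node so failover behavior is predictable during switch maintenance.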
3. RAM Specifications — R630
3.1 R630 memory capabilities (reference)
| Spec | Value |
|---|---|
| DIMM slots | 24 (12 per socket in 2-socket) |
| Max RAM | Up to 1.5 TB (with compatible LRDIMMs) |
| Typical configs | 32 GB, 64 GB, 128 GB, 256 GB, 384 GB, 512 GB (depending on DIMM size and count) |
| ECC | Required for DoD/MIL; R630 supports ECC RDIMM/LRDIMM |
3.2 Recommended RAM per node (DoD HA + Ceph)
| Tier | RAM per node | Use case |
|---|---|---|
| Minimum | 128 GB | Ceph OSD + a few VMs; acceptable for lab or light production. |
| Recommended | 256 GB | Production: Ceph (OSD + MON/MGR) + many VMs/containers; headroom for failover and recovery. |
| High | 384–512 GB | Heavy workloads, large Ceph OSD count per node, or when consolidating from existing 503 GB nodes. |
Ceph guidance: Proxmox/Ceph recommend ≥ 8 GiB of memory per OSD. With 6–8 OSDs per node (see storage), that is 48–64 GiB for Ceph, plus Proxmox and guest overhead → 128 GB minimum, 256 GB recommended.
DoD/MIL note: Prefer 256 GB per node for 13-node production so that (1) multiple node failures still leave enough capacity for HA migrations and (2) Ceph recovery and rebalancing do not cause OOM or instability.
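The 128/256 GB tiers can be cross-checked against the per-OSD guidance; a rough budget sketch (the host-overhead figure is an assumption):

```shell
# Per-node RAM budget at the 128 GB minimum tier.
osds_per_node=6
osd_target_gib=8                                  # >= 8 GiB per OSD (Ceph guidance)
ceph_gib=$(( osds_per_node * osd_target_gib ))    # memory reserved for OSDs
host_gib=16                                       # Proxmox + MON/MGR + OS (assumption)
guest_gib=$(( 128 - ceph_gib - host_gib ))        # what remains for VMs/containers
echo "ceph=${ceph_gib}GiB host=${host_gib}GiB guests=${guest_gib}GiB"
```

At 128 GB only ~64 GiB remains for guests, which is why 256 GB is the production recommendation: a failover can double a surviving node's guest load overnight.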
3.3 RAM placement (if mixing sizes)
If not all nodes have the same RAM:
- Put largest RAM in nodes that run the most VMs or Ceph MON/MGR.
- Ensure at least 128 GB on every node that runs Ceph OSDs.
- Document exact DIMM layout per node (slot, size, speed) for change control and troubleshooting.
4. Drive Specifications — R630
4.1 R630 drive options (reference)
- Internal bays: Typically 8 × 2.5" SATA/SAS (or 10-bay with optional kit); some configs support NVMe (e.g. 4 × NVMe via PCIe).
- Boot: 2 drives in mirror (ZFS mirror or hardware RAID1) for Proxmox OS — redundant, DoD-compliant.
- Data: Remaining drives for Ceph OSD and/or local LVM (if hybrid).
4.2 Recommended drive layout per R630 (full Ceph)
| Purpose | Drives | Type | Size (example) | Configuration |
|---|---|---|---|---|
| Boot (OS) | 2 | SSD | 240–480 GB each | ZFS mirror (preferred) or HW RAID1; Proxmox root only. |
| Ceph OSD | 4–6 | SSD (or NVMe) | 480 GB – 1 TB each | One OSD per drive; no RAID (Ceph provides replication). |
Example per node: 2 × 480 GB boot (ZFS mirror) + 6 × 960 GB SSD = 6 Ceph OSDs per node.
Cluster total: 13 × 6 = 78 OSDs; with 3× replication, usable capacity ≈ (78 × 0.9 TB) / 3 ≈ 23 TB (before BlueStore overhead; adjust for actual drive sizes).
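The capacity arithmetic as a repeatable check (a sketch; plug in actual drive counts and sizes):

```shell
# Usable Ceph capacity for the example layout (awk for fractional TB math).
raw=$(awk 'BEGIN { printf "%.1f", 13 * 6 * 0.9 }')        # 78 OSDs x 0.9 TB raw
usable=$(awk 'BEGIN { printf "%.1f", 13 * 6 * 0.9 / 3 }') # divided by replication size=3
echo "raw=${raw}TB usable=${usable}TB"
```

Note this is before BlueStore metadata overhead and before the fill ceiling in section 4.4 is applied.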
4.3 DoD/MIL storage requirements
- Encryption: At-rest encryption for sensitive data. Options: Ceph encryption (e.g. dm-crypt for OSD), or encrypted VMs (LUKS inside guest). Document which layers are encrypted and key management.
- Integrity: ZFS for boot (checksum, scrub). Ceph provides replication and recovery; use BlueStore with checksums.
- Sanitization: Follow DoD 5220.22-M or NIST SP 800-88 for decommissioning/destruction of drives.
- Spare: Maintain spare drives and document replacement and wipe procedures.
4.4 Sizing for your workload
- Current (from docs): ~50+ VMIDs, mix of Besu, Blockscout, DBIS, NPMplus, etc.; growth ~20–50 GB/month.
- Target: Size Ceph pool so that used + 2 years growth stays < 75% of usable. Example: 15–20 TB usable → ~5–7 TB used now + growth headroom.
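The 75% target translates into a simple headroom check (a sketch; the current-usage figure is an assumption within the 5–7 TB range above):

```shell
# Projected pool fill after 24 months of growth vs the 75% ceiling.
pct=$(awk 'BEGIN {
  usable = 23.4          # TB usable (example layout, section 4.2)
  used   = 6.0           # TB used today (assumption)
  growth = 0.05 * 24     # 50 GB/month for 2 years = 1.2 TB
  printf "%.0f", (used + growth) / usable * 100
}')
echo "projected fill after 2 years: ${pct}% (target < 75%)"
```

Re-run this whenever drive sizes, OSD counts, or observed growth change; it makes the "< 75%" target auditable rather than anecdotal.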
5. Full HA and Failover Architecture
5.1 Components
| Component | Role |
|---|---|
| Proxmox cluster | 13 nodes; same cluster name; corosync for quorum. |
| Ceph | Shared storage: MON (3–5 nodes), MGR (2+), OSD on all 13. Replication size=3, min_size=2. |
| Proxmox HA | HA manager enabled; VMs/containers on Ceph added as HA resources; start/stop order and groups as needed. |
| Fencing (STONITH) | Mandatory: when a node is declared lost, it must be fenced (powered off or rebooted) so Ceph and HA can safely reassign resources. Proxmox VE HA uses watchdog-based self-fencing by default; supplement with out-of-band fencing via IPMI/iDRAC where required. |
| Network | Redundant links where possible; same VLAN/bridge config on all nodes so failover does not change VM IPs. |
5.2 Ceph design (summary)
- Pools: At least one pool for VM/container disks (e.g. ceph-vm); optionally a separate pool for backups or bulk data.
- Replication: size=3, min_size=2; tolerates 2 node failures without data loss (with 13 nodes).
- Network: Separate cluster network (e.g. 10.x or dedicated VLAN) for Ceph backend traffic; public for client (Proxmox) access.
- MON/MGR: 3 or 5 MONs (odd); 2 MGRs minimum. Spread across nodes for availability.
5.3 HA resource and failover behavior
- HA resources: Add each critical VM/CT as HA resource; define groups (e.g. “database first, then app”) and restart order.
- Failure: Node down → fencing → Ceph marks OSDs out → HA manager restarts VMs on other nodes using Ceph disks.
- Maintenance: Put node in maintenance → migrate VMs off (or let HA relocate) → fence not triggered; perform RAM/drive work.
5.4 What “full HA” gives you (DoD-relevant)
- No single point of failure: Storage replicated; compute can run on any node.
- Automatic failover: No manual migration for HA-managed guests.
- Controlled maintenance: Node can be taken down without losing services; documented procedures for patching and hardware changes.
6. DoD/MIL-Spec Compliance Framework
6.1 Alignment with DISA STIG / DoD requirements
DoD/MIL compliance typically implies the following (summary; map these to your exact ATO/contract requirements):
| Area | Requirement | Implementation |
|---|---|---|
| Hardening | DISA STIG or equivalent for OS and applications | Apply STIG/CIS to Debian (Proxmox host) and guests; document exceptions. |
| Authentication | Strong auth, no default passwords, MFA where required | SSH key-only on Proxmox; no password SSH; RBAC in Proxmox; MFA for critical UIs if required. |
| Access control | Least privilege, RBAC, audit | Proxmox roles and permissions; separate admin vs operator; audit logs. |
| Encryption | TLS in transit; encryption at rest for sensitive data | TLS 1.2+ for API and Ceph; at-rest encryption (Ceph or LUKS) as required. |
| Audit and logging | Centralized, tamper-resistant, retention | rsyslog/syslog-ng to central log host; retention per policy; integrity (e.g. signed/hash). |
| Change control | Documented changes, rollback capability | Change tickets; config in Git; backups before changes; runbooks. |
| Backup and recovery | Regular backups, tested restore | Proxmox backups to separate storage; Ceph snapshots; DR runbook and tests. |
| Physical and environmental | Physical security, power, cooling | Out of scope for this doc; document in facility plan. |
6.2 Hardening checklist (Proxmox + Debian)
Use this as an operational checklist; align with your STIG version.
Proxmox hosts (Debian base):
- SSH: Key-only auth; PasswordAuthentication no; PermitRootLogin prohibit-password (root via keys only, if root SSH is needed at all); strong Ciphers/KexAlgorithms.
- Firewall: Restrict Proxmox API (8006) and SSH to management VLAN/CIDR; default deny.
- Services: Disable unnecessary services; only Proxmox, Ceph, corosync, and required dependencies.
- Session timeout: User session timeout (e.g. 900 s) in shell profile and/or Proxmox UI.
- TLS: TLS 1.2+ only; strong ciphers for pveproxy and Ceph.
- Updates: Security updates applied on a defined schedule; test in non-prod first.
- FIPS: If required by contract, use FIPS-validated crypto (kernel/openssl); document and test.
- File permissions: Sensitive files (keys, tokens) mode 600/400; no world-writable.
- Audit: auditd or equivalent for critical files and commands; logs to central host.
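The SSH and session-timeout items above could look like the following drop-in fragments (a sketch; cipher and KEX lists must be matched against your STIG version before deployment):

```
# /etc/ssh/sshd_config.d/50-hardening.conf (sketch)
PasswordAuthentication no
PermitRootLogin prohibit-password
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,diffie-hellman-group16-sha512
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# /etc/profile.d/tmout.sh — 900 s shell session timeout
TMOUT=900
readonly TMOUT
export TMOUT
```

Validate with `sshd -t` and keep an active session open while reloading sshd, so a typo cannot lock out all administrators.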
Ceph:
- Auth: Cephx enabled; key management per DoD key management policy.
- Network: Cluster network isolated; no Ceph ports exposed to user VLANs.
- Encryption: At-rest encryption for OSD if required; key escrow and rotation documented.
Guests (VMs/containers):
- Per-guest hardening: STIG/CIS per OS (e.g. Ubuntu, RHEL); documented baseline.
- Secrets: No secrets in configs in Git; use Vault or Proxmox secrets where applicable.
Existing automation (this repo): use scripts/security/run-security-on-proxmox-hosts.sh (SSH key-only + firewall 8006), scripts/security/setup-ssh-key-auth.sh, and scripts/security/firewall-proxmox-8006.sh. Extend the host list (in the scripts or via env, e.g. all 13 R630 IPs) and run with --apply after validating with --dry-run.
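The default-deny posture for 8006/22 could be expressed as a Proxmox cluster firewall fragment (a sketch; the management CIDR is an assumption, and intra-cluster corosync/Ceph traffic must remain allowed before enabling a DROP policy):

```
# /etc/pve/firewall/cluster.fw (sketch)
[OPTIONS]
enable: 1
policy_in: DROP

[RULES]
# Proxmox API and SSH from the management network only
IN ACCEPT -source 192.168.11.0/24 -p tcp -dport 8006
IN ACCEPT -source 192.168.11.0/24 -p tcp -dport 22
```

Test on one node from a console session (not over SSH) before rolling the policy out cluster-wide.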
6.3 Audit and documentation
- Configuration baseline: All Proxmox and Ceph configs in version control; changes via PR/ticket.
- Runbooks: Install, upgrade, add node, remove node, replace drive, fence test, backup/restore, disaster recovery.
- Evidence: Run STIG/CIS scans (e.g. OpenSCAP, Nessus) and retain reports for assessors.
- Change log: Document every change (who, when, why, ticket); link to runbook.
7. Phased Implementation
Phase 1 — Prepare (no downtime)
- IP and DNS: Assign and document 13 IPs for R630s; update
config/ip-addresses.confand DNS. - RAM: Upgrade all 13 R630s to at least 128 GB (256 GB recommended); document DIMM layout.
- Drives: Install boot mirror (2 × SSD) and data drives (4–6 SSD per node) on each R630; configure ZFS mirror for boot.
- Proxmox install: Install Proxmox VE on all 13; same version; join to one cluster; configure VLAN-aware bridge and management IPs.
- Hardening: Apply SSH key-only, firewall, and STIG/CIS checklist to all nodes; document exceptions.
Phase 2 — Ceph
- Ceph install: Install Ceph on all 13 nodes (Proxmox Ceph integration); create MON (3 or 5), MGR (2), OSD (all nodes).
- Pools: Create replication pool (size=3, min_size=2) for VM disks; add as Proxmox storage.
- Network: Configure Ceph public and cluster networks; validate connectivity and latency.
- Tests: Fill and drain; kill OSD/node and verify recovery; document procedures.
Phase 3 — HA and fencing
- Fencing: Verify watchdog-based fencing (Proxmox default) and/or configure IPMI/iDRAC fencing for each node; test by isolating a node and confirming it is fenced.
- HA manager: Enable HA in cluster; add critical VMs/containers as HA resources; set groups and order.
- Failover tests: Power off one node; verify fencing and HA restart on another node; repeat for 2-node failure if desired.
- Runbooks: Document failover test results and operational procedures.
Phase 4 — Migrate workload
- Migrate disks: Move VM/container disks from local storage to Ceph (live migration or backup/restore).
- Decommission local-only: Once all HA resources are on Ceph, remove or repurpose local LVM for non-HA or cache.
- Monitoring and alerting: Integrate with central monitoring; alerts for quorum loss, Ceph health, fence events, HA failures.
Phase 5 — DoD/MIL continuous compliance
- Scans: Schedule STIG/CIS scans; remediate and document exceptions.
- Backup and DR: Automate backups; test restore quarterly; update DR runbook.
- Change control: All changes via ticket + runbook; config in Git; periodic review of permissions and audit logs.
8. References and Related Docs
| Document | Purpose |
|---|---|
| PROXMOX_HA_CLUSTER_ROADMAP.md | Current HA roadmap (3-node); extend to 13-node. |
| PROXMOX_CLUSTER_ARCHITECTURE.md | Cluster and storage overview. |
| PHYSICAL_DRIVES_AND_CONFIG.md | Current drive layout (existing 2 R630s + ml110). |
| Proxmox Ceph documentation | Ceph in Proxmox. |
| Proxmox HA | High Availability. |
| DISA STIG | DISA STIGs; Debian/Ubuntu and application STIGs. |
| CIS Benchmarks | CIS Benchmarks; Debian, Proxmox if available. |
9. Summary Table
| Item | Specification |
|---|---|
| Nodes | 13 × Dell PowerEdge R630 |
| Quorum | Majority 7; up to 6 nodes can fail |
| RAM per node | Minimum 128 GB; recommended 256 GB (DoD production) |
| Boot | 2 × SSD (e.g. 240–480 GB) ZFS mirror per node |
| Data (Ceph) | 4–6 × SSD (e.g. 480 GB – 1 TB) per node, one OSD per drive |
| Shared storage | Ceph replicated (size=3, min_size=2) |
| HA | Proxmox HA manager; fencing (STONITH) required |
| Hardening | STIG/CIS alignment; SSH key-only; firewall; TLS; audit; change control |
| Encryption | TLS in transit; at-rest per policy (Ceph or LUKS) |
Owner: Architecture / Infrastructure
Review: Quarterly or when adding nodes / changing compliance scope
Change control: Update version and “Last Updated” when changing this plan; link change ticket.