Sankofa Phoenix Vault Cluster - Full Redundancy Deployment

Last Updated: 2026-01-31
Document Version: 1.0
Date: 2025-01-27
Status: 📋 Deployment Plan
Purpose: Deploy a fully redundant HashiCorp Vault cluster for Sankofa Phoenix
Executive Summary

This document provides a complete deployment plan for a highly available HashiCorp Vault cluster for Sankofa Phoenix, using the integrated Raft storage backend with nodes distributed across multiple Proxmox hosts for full redundancy.


Architecture Overview

Cluster Design

Cluster Type: Raft-based High Availability (HA)
Node Count: 3 nodes (minimum for Raft consensus)
Redundancy: Full redundancy with automatic failover
Storage: Integrated Raft storage (no external storage required)

Network Configuration

Network: 192.168.11.0/24 (Main network, no VLAN)
Gateway: 192.168.11.1
IP Allocation: 192.168.11.200-202 (Vault cluster nodes)


VMID and IP Allocation

Vault Cluster Nodes

| Node | VMID | Hostname | IP Address | Proxmox Host | Status |
|------|------|----------|------------|--------------|--------|
| Vault Node 1 | 8640 | vault-phoenix-1 | 192.168.11.200 | r630-01 (192.168.11.11) | Deployed |
| Vault Node 2 | 8641 | vault-phoenix-2 | 192.168.11.201 | r630-02 (192.168.11.12) | Deployed |
| Vault Node 3 | 8642 | vault-phoenix-3 | 192.168.11.202 | r630-01 (192.168.11.11) | Deployed |

Load Balancer / Service Discovery

| Service | IP Address | Purpose |
|---------|------------|---------|
| Vault API Endpoint | 192.168.11.200-202:8200 | Any node (or use DNS round-robin) |
| Vault Cluster Endpoint | 192.168.11.200-202:8201 | Cluster communication |

Note: For production, consider using a load balancer or DNS round-robin across all three nodes.
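
The load-balancer suggestion above can be sketched as an HAProxy backend that health-checks each node; the frontend/backend names are illustrative. Because `/v1/sys/health` returns HTTP 200 only on the active, unsealed node, standbys are marked down and traffic automatically follows the Raft leader:

```
# /etc/haproxy/haproxy.cfg (fragment) -- illustrative sketch
frontend vault_api
    bind *:8200
    default_backend vault_nodes

backend vault_nodes
    # 200 = active+unsealed; standbys answer 429 and are marked down
    option httpchk GET /v1/sys/health
    server vault-phoenix-1 192.168.11.200:8200 check
    server vault-phoenix-2 192.168.11.201:8200 check
    server vault-phoenix-3 192.168.11.202:8200 check
```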


Resource Requirements

Per Node Specifications

| Resource | Allocation | Notes |
|----------|------------|-------|
| CPU Cores | 2 | Minimum for Vault operations |
| Memory | 4GB | Recommended for HA cluster |
| Storage | 50GB | Raft storage + logs |
| Network | 192.168.11.0/24 | Main network (no VLAN) |

Total Cluster Resources

  • Total CPU: 6 cores (2 per node × 3 nodes)
  • Total Memory: 12GB (4GB per node × 3 nodes)
  • Total Storage: 150GB (50GB per node × 3 nodes)

Deployment Steps

Quick Start

# Dry run first (recommended)
cd /home/intlc/projects/proxmox
DRY_RUN=true ./scripts/deploy-phoenix-vault-cluster.sh

# Live deployment
DRY_RUN=false ./scripts/deploy-phoenix-vault-cluster.sh

Manual Deployment (If Script Fails)

Phase 1: Container Creation

Node 1 (VMID 8640):

ssh root@192.168.11.11
pct create 8640 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname vault-phoenix-1 \
  --cores 2 --memory 4096 --swap 2048 \
  --storage local-lvm --rootfs local-lvm:50 \
  --net0 name=eth0,bridge=vmbr0,ip=192.168.11.200/24,gw=192.168.11.1 \
  --onboot 1 --unprivileged 0 \
  --features nesting=1
pct start 8640

Node 2 (VMID 8641):

ssh root@192.168.11.12
pct create 8641 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname vault-phoenix-2 \
  --cores 2 --memory 4096 --swap 2048 \
  --storage local-lvm --rootfs local-lvm:50 \
  --net0 name=eth0,bridge=vmbr0,ip=192.168.11.201/24,gw=192.168.11.1 \
  --onboot 1 --unprivileged 0 \
  --features nesting=1
pct start 8641

Node 3 (VMID 8642):

ssh root@192.168.11.11  # r630-01 (Node 3 shares this host with Node 1, per the allocation table)
pct create 8642 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname vault-phoenix-3 \
  --cores 2 --memory 4096 --swap 2048 \
  --storage local-lvm --rootfs local-lvm:50 \
  --net0 name=eth0,bridge=vmbr0,ip=192.168.11.202/24,gw=192.168.11.1 \
  --onboot 1 --unprivileged 0 \
  --features nesting=1
pct start 8642
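
The three near-identical `pct create` invocations above can be generated from a small VMID/hostname/IP table; a minimal sketch (dry-run by default, so it only prints the commands for review):

```shell
#!/usr/bin/env bash
# Generate the pct create commands for the three Vault nodes.
# DRY_RUN=true (the default) prints the commands instead of executing them.
set -euo pipefail
DRY_RUN="${DRY_RUN:-true}"
TEMPLATE="local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"

# VMID:hostname:IP triples, matching the allocation table above
NODES="8640:vault-phoenix-1:192.168.11.200
8641:vault-phoenix-2:192.168.11.201
8642:vault-phoenix-3:192.168.11.202"

GENERATED=""
while IFS=: read -r vmid host ip; do
  cmd="pct create $vmid $TEMPLATE --hostname $host --cores 2 --memory 4096 --swap 2048 --storage local-lvm --rootfs local-lvm:50 --net0 name=eth0,bridge=vmbr0,ip=$ip/24,gw=192.168.11.1 --onboot 1 --unprivileged 0 --features nesting=1"
  GENERATED="$GENERATED$cmd
"
  if [ "$DRY_RUN" = "true" ]; then
    echo "$cmd"
  else
    eval "$cmd" && pct start "$vmid"
  fi
done <<< "$NODES"
```

Run it on each Proxmox host for the VMIDs that host owns (8640/8642 on r630-01, 8641 on r630-02).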

Phase 2: Vault Installation

On All Nodes:

# Enter container (repeat for each VMID: 8640, 8641, 8642)
pct enter 8640  # or 8641, 8642

# Update system
apt-get update
apt-get upgrade -y

# Install dependencies
apt-get install -y curl unzip wget gnupg software-properties-common jq

# Add HashiCorp GPG key (apt-key is deprecated on Ubuntu 22.04; use a keyring file)
curl -fsSL https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg

# Add HashiCorp repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" > /etc/apt/sources.list.d/hashicorp.list

# Install Vault
apt-get update
apt-get install -y vault

# Verify installation
vault version

Phase 3: Vault Configuration

Node 1 Configuration (/etc/vault.d/vault.hcl):

ui = true

listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "192.168.11.200:8201"
  tls_disable      = 1  # Enable TLS in production
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-phoenix-1"
  
  retry_join {
    leader_api_addr = "http://192.168.11.200:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.201:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.202:8200"
  }
}

api_addr = "http://192.168.11.200:8200"
cluster_addr = "http://192.168.11.200:8201"

log_level = "INFO"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = 30

Node 2 Configuration (/etc/vault.d/vault.hcl):

ui = true

listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "192.168.11.201:8201"
  tls_disable      = 1
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-phoenix-2"

  retry_join {
    leader_api_addr = "http://192.168.11.200:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.201:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.202:8200"
  }
}

api_addr = "http://192.168.11.201:8200"
cluster_addr = "http://192.168.11.201:8201"

log_level = "INFO"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = 30

Node 3 Configuration (/etc/vault.d/vault.hcl):

ui = true

listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "192.168.11.202:8201"
  tls_disable      = 1
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-phoenix-3"

  retry_join {
    leader_api_addr = "http://192.168.11.200:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.201:8200"
  }
  retry_join {
    leader_api_addr = "http://192.168.11.202:8200"
  }
}

api_addr = "http://192.168.11.202:8200"
cluster_addr = "http://192.168.11.202:8201"

log_level = "INFO"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = 30
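
Since the three configurations differ only in node ID and IP, they can be templated to avoid copy-paste drift; a minimal sketch (the `/tmp` staging path is illustrative — review the output before copying it to /etc/vault.d/vault.hcl):

```shell
#!/usr/bin/env bash
# Render a per-node vault.hcl from the node's ID and IP.
set -euo pipefail

render_vault_hcl() {
  local node_id="$1" node_ip="$2" out="$3"
  cat > "$out" <<EOF
ui = true

listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "${node_ip}:8201"
  tls_disable      = 1  # Enable TLS in production
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "${node_id}"

  retry_join { leader_api_addr = "http://192.168.11.200:8200" }
  retry_join { leader_api_addr = "http://192.168.11.201:8200" }
  retry_join { leader_api_addr = "http://192.168.11.202:8200" }
}

api_addr     = "http://${node_ip}:8200"
cluster_addr = "http://${node_ip}:8201"

log_level            = "INFO"
log_file             = "/var/log/vault/vault.log"
log_rotate_duration  = "24h"
log_rotate_max_files = 30
EOF
}

# Example: render Node 2's config into a staging file
render_vault_hcl vault-phoenix-2 192.168.11.201 /tmp/vault-phoenix-2.hcl
```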

Phase 4: Systemd Service Setup

On All Nodes:

# Create vault user (the apt package may already have created it; this is a no-op if so)
id vault >/dev/null 2>&1 || useradd --system --home /opt/vault --shell /bin/false vault

# Create directories
mkdir -p /opt/vault/data
mkdir -p /etc/vault.d
mkdir -p /var/log/vault
chown -R vault:vault /opt/vault
chown -R vault:vault /var/log/vault

# Create systemd service
cat > /etc/systemd/system/vault.service << 'EOF'
[Unit]
Description=HashiCorp Vault - A tool for managing secrets
Documentation=https://www.vaultproject.io/docs/
After=network-online.target
Wants=network-online.target
ConditionFileNotEmpty=/etc/vault.d/vault.hcl

[Service]
Type=notify
User=vault
Group=vault
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/bin/vault server -config=/etc/vault.d/vault.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
StartLimitInterval=200
StartLimitBurst=5
LimitNOFILE=65536
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
EOF

# Enable service
systemctl daemon-reload
systemctl enable vault

Phase 5: Cluster Initialization

Initialize Cluster (Node 1 Only):

# Enter Node 1
pct enter 8640

# Start Vault service
systemctl start vault

# Wait for service to start
sleep 5

# Initialize Vault (only on first node).
# Note: -recovery-shares/-recovery-threshold apply only with auto-unseal seals
# (e.g. HSM); with the default Shamir seal used here, pass key shares alone.
vault operator init \
  -key-shares=5 \
  -key-threshold=3 \
  -format=json > /tmp/vault-init.json

# Save unseal keys and root token securely (store offline, e.g. in a password manager)
jq -r '.unseal_keys_b64[]' /tmp/vault-init.json
jq -r '.root_token' /tmp/vault-init.json

# Unseal Node 1 (requires 3 of 5 keys)
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>

# Once the keys are stored offline, remove the init output
shred -u /tmp/vault-init.json

Join Nodes 2 and 3 to Cluster:

# On Node 2
pct enter 8641
systemctl start vault
# With the Shamir seal, each node must also be unsealed (3 of 5 keys)
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>

# On Node 3
pct enter 8642
systemctl start vault
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>

# Nodes join automatically via the retry_join configuration once unsealed.
# Verify cluster status (requires VAULT_ADDR and a valid token, e.g. the root token)
export VAULT_ADDR=http://127.0.0.1:8200
vault login <root-token>
vault operator raft list-peers

High Availability Features

Automatic Failover

  • Leader Election: Raft consensus automatically elects new leader if current leader fails
  • No Manual Intervention: Cluster continues operating with remaining nodes
  • Automatic Rejoin: Failed nodes automatically rejoin when restored

Data Redundancy

  • Raft Replication: All data replicated across all nodes
  • Consensus: Requires majority of nodes (2 of 3) for writes
  • Durability: Data persisted on all nodes

Network Redundancy

  • Multiple Nodes: Deployed across different Proxmox hosts
  • Network: Main 192.168.11.0/24 network (no VLAN)
  • Load Distribution: Can use DNS round-robin or load balancer

Verification

Check Cluster Health

# On any node
vault status

# List all peers
vault operator raft list-peers

# Check health endpoints
curl http://192.168.11.200:8200/v1/sys/health
curl http://192.168.11.201:8200/v1/sys/health
curl http://192.168.11.202:8200/v1/sys/health

Expected Output

All nodes should show:

  • "sealed": false
  • "ha_enabled": true
  • "cluster_name": "vault-cluster-..."

Note: standby nodes answer /v1/sys/health with HTTP 429 by default; append ?standbyok=true to have standbys report 200.

Test Failover

# Stop Node 1 (leader)
pct stop 8640

# Verify cluster continues operating
vault status  # On Node 2 or 3

# Check new leader election
vault operator raft list-peers

# Restart Node 1
pct start 8640

# Verify it rejoins cluster
vault operator raft list-peers

Security Configuration

TLS/HTTPS (Production)

# Enable TLS in production (cluster_address shown for Node 1; use each node's own IP)
listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "192.168.11.200:8201"
  tls_cert_file    = "/opt/vault/tls/vault.crt"
  tls_key_file     = "/opt/vault/tls/vault.key"
  tls_min_version  = "tls12"
}

HSM Integration (Optional)

To add HSM backend for auto-unseal:

# Add HSM seal configuration
seal "pkcs11" {
  lib            = "/usr/lib/softhsm/libsofthsm2.so"
  slot           = "0"
  pin            = "your-hsm-pin"
  key_label      = "vault-hsm-key"
  hmac_key_label = "vault-hmac-key"
  generate_key   = "true"
}

Secret Organization Structure

Phoenix-Specific Paths

secret/
├── phoenix/
│   ├── api/
│   │   ├── jwt-secrets/
│   │   └── api-keys/
│   ├── database/
│   │   ├── postgres/
│   │   └── redis/
│   ├── keycloak/
│   │   ├── admin-credentials/
│   │   └── oidc-secrets/
│   └── services/
│       ├── blockchain/
│       ├── integrations/
│       └── monitoring/
├── sankofa/
│   ├── legacy-secrets/
│   └── migration-secrets/
└── infrastructure/
    ├── cloudflare/
    ├── proxmox/
    └── network/
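
The tree above maps onto KV v2 paths; a dry-run sketch that prints the corresponding `vault` commands for a representative subset of the paths (the `placeholder=changeme` value is illustrative, and nothing is written to Vault):

```shell
#!/usr/bin/env bash
# Print the vault commands that would create part of the Phoenix secret tree.
set -euo pipefail

PATHS="phoenix/api/jwt-secrets
phoenix/api/api-keys
phoenix/database/postgres
phoenix/database/redis
phoenix/keycloak/admin-credentials
phoenix/keycloak/oidc-secrets
sankofa/legacy-secrets
infrastructure/cloudflare"

CMDS="vault secrets enable -path=secret kv-v2"
while read -r p; do
  CMDS="$CMDS
vault kv put secret/$p placeholder=changeme"
done <<< "$PATHS"

printf '%s\n' "$CMDS"
```

Review the printed commands, then run them against the cluster with a suitably privileged token.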

Access Control Setup

Authentication Methods

# Enable AppRole for Phoenix services
vault auth enable approle

# Create role for Phoenix API
vault write auth/approle/role/phoenix-api \
  token_policies="phoenix-api-policy" \
  bind_secret_id=true \
  token_ttl=1h \
  token_max_ttl=4h

# Create role for Phoenix Portal
vault write auth/approle/role/phoenix-portal \
  token_policies="phoenix-portal-policy" \
  bind_secret_id=true \
  token_ttl=1h \
  token_max_ttl=4h

Policies

# Phoenix API policy
vault policy write phoenix-api-policy - <<EOF
# Read secrets for Phoenix API
path "secret/data/phoenix/api/*" {
  capabilities = ["read"]
}

path "secret/data/phoenix/database/*" {
  capabilities = ["read"]
}
EOF

# Phoenix Portal policy
vault policy write phoenix-portal-policy - <<EOF
# Read secrets for Phoenix Portal
path "secret/data/phoenix/api/jwt-secrets" {
  capabilities = ["read"]
}
EOF

Integration with Phoenix Services

Phoenix API Integration

// Example: Phoenix API connecting to Vault
import Vault from 'node-vault';

const vault = Vault({
  endpoint: 'http://192.168.11.200:8200',
  token: process.env.VAULT_TOKEN
});

// Get database credentials
const dbCreds = await vault.read('secret/data/phoenix/database/postgres');

Environment Variables

# Phoenix services should use:
VAULT_ADDR=http://192.168.11.200:8200
VAULT_ROLE_ID=<approle-role-id>
VAULT_SECRET_ID=<approle-secret-id>

Monitoring and Maintenance

Health Checks

# Cluster health endpoint
curl http://192.168.11.200:8200/v1/sys/health

# Metrics endpoint (if enabled)
curl http://192.168.11.200:8200/v1/sys/metrics

Logging

  • Log Location: /var/log/vault/vault.log
  • Log Rotation: 24 hours, 30 files retained
  • Log Level: INFO (adjustable)

Backup Procedures

# Snapshot Raft storage
vault operator raft snapshot save /backup/vault-snapshot-$(date +%Y%m%d).snapshot

# Restore from snapshot
vault operator raft snapshot restore /backup/vault-snapshot-YYYYMMDD.snapshot
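
The snapshot command above can be wrapped in a small cron-friendly script with retention pruning; a sketch assuming a /backup mount (BACKUP_DIR and the 14-day retention are illustrative defaults):

```shell
#!/usr/bin/env bash
# Take a dated Raft snapshot and prune copies older than RETENTION_DAYS.
set -euo pipefail
BACKUP_DIR="${BACKUP_DIR:-/backup}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"

snapshot() {
  # Requires an unsealed node and an authenticated vault CLI
  vault operator raft snapshot save \
    "$BACKUP_DIR/vault-snapshot-$(date +%Y%m%d).snapshot"
}

prune() {
  # Delete snapshots whose mtime is older than the retention window
  find "$BACKUP_DIR" -name 'vault-snapshot-*.snapshot' \
    -mtime "+$RETENTION_DAYS" -print -delete
}

# snapshot   # uncomment on a host with access to the cluster
if [ -d "$BACKUP_DIR" ]; then
  prune
fi
```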

Disaster Recovery

Backup Strategy

  1. Raft Snapshots: Daily automated snapshots
  2. Off-site Storage: Backup snapshots to secure location
  3. Key Management: Secure storage of unseal keys and root token

Recovery Procedures

  1. Node Failure: Automatic failover, no action needed
  2. Cluster Failure: Restore from snapshot, reinitialize if needed
  3. Data Loss: Restore from latest snapshot

Deployment Checklist

Pre-Deployment

  • Verify the main 192.168.11.0/24 network is reachable from both Proxmox hosts
  • Verify IP addresses 192.168.11.200-202 are available
  • Verify VMIDs 8640-8642 are available
  • Verify storage capacity on target hosts

Deployment

  • Create container 8640 (Node 1)
  • Create container 8641 (Node 2)
  • Create container 8642 (Node 3)
  • Install Vault on all nodes
  • Configure Vault on all nodes
  • Start Vault services
  • Initialize cluster (Node 1)
  • Join nodes to cluster
  • Verify cluster health

Post-Deployment

  • Configure authentication methods
  • Create policies
  • Set up secret paths
  • Configure monitoring
  • Test failover scenarios
  • Document access procedures
  • Set up backup procedures

Cost Estimation

Infrastructure Costs

  • Compute: 6 CPU cores, 12GB RAM (existing Proxmox infrastructure)
  • Storage: 150GB (existing Proxmox storage)
  • Network: 192.168.11.0/24 (existing network infrastructure)
  • Total: No additional infrastructure costs (uses existing resources)

Optional Costs

  • HSM Integration: See MASTER_SECRETS_INVENTORY.md for HSM costs
  • TLS Certificates: Let's Encrypt (free) or commercial certificates
  • Monitoring: Existing monitoring infrastructure

Timeline

| Phase | Duration | Activities |
|-------|----------|------------|
| Phase 1 | 1 hour | Container creation |
| Phase 2 | 30 min | Vault installation |
| Phase 3 | 1 hour | Configuration |
| Phase 4 | 30 min | Service setup |
| Phase 5 | 30 min | Cluster initialization |
| Phase 6 | 1 hour | Verification and testing |
| Total | ~4.5 hours | Complete deployment |

Next Steps

  1. Review and Approve Plan

    • Review this deployment plan
    • Verify resource availability
    • Approve VMID and IP allocations
  2. Begin Deployment

    • Follow deployment steps in order
    • Verify each phase before proceeding
    • Document any deviations
  3. Post-Deployment

    • Migrate secrets from existing Vault (VMID 108) if needed
    • Update Phoenix services to use new Vault cluster
    • Set up monitoring and alerting


Status: 📋 Ready for Deployment
Last Updated: 2026-01-31