Vault Operations Guide

Last Updated: 2026-02-01
Document Version: 1.0
Status: Active Documentation


Date: 2026-01-19
Status: Complete
Purpose: Day-to-day operations guide for Vault cluster


Quick Reference

Cluster Information


Daily Operations

Health Checks

Run health check script:

./scripts/vault-health-check.sh

With cluster status:

VAULT_TOKEN=<root-token> ./scripts/vault-health-check.sh
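
The internals of vault-health-check.sh are not shown in this guide. As a rough sketch of the kind of probe such a script could perform, Vault's /v1/sys/health endpoint answers without a token and encodes node state in the HTTP status code. The function names and the default address are illustrative assumptions (the address is borrowed from the Metrics section).

```shell
#!/usr/bin/env bash
# Sketch only: this is NOT the contents of scripts/vault-health-check.sh.

# Translate a /v1/sys/health HTTP status code into a node state.
health_state() {
  case "$1" in
    200) echo "active" ;;          # initialized, unsealed, active
    429) echo "standby" ;;         # unsealed standby
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;         # includes 000 when curl cannot connect
  esac
}

# Probe one Vault address and print its state.
# The default address is an assumption taken from the Metrics section.
probe_health() {
  local addr="${1:-http://192.168.11.200:8200}" code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "${addr}/v1/sys/health")" || true
  health_state "$code"
}
```

Calling probe_health with no argument checks the default address; pass a different base URL to check an individual node.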

Check Cluster Status

ssh root@192.168.11.11 "pct exec 8640 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && export VAULT_TOKEN=<token> && vault operator raft list-peers'"

Check Node Status

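# Node 1
ssh root@192.168.11.11 "pct exec 8640 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault status'"

# Node 2
ssh root@192.168.11.12 "pct exec 8641 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault status'"

# Node 3
ssh root@192.168.11.11 "pct exec 8642 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault status'"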
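
The three per-node checks can be folded into one loop. This is a minimal sketch: it relies on `vault status` exiting 0 when unsealed, 2 when sealed, and 1 on error, and it assumes that ssh and pct exec pass that exit code through unchanged. The function names are illustrative; the host and container IDs are the ones used above.

```shell
#!/usr/bin/env bash
# Host:CTID pairs copied from the per-node commands in this section.
NODES="192.168.11.11:8640 192.168.11.12:8641 192.168.11.11:8642"

# Map a `vault status` exit code to a label.
describe_status() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;   # 1 from vault, 255 from ssh, or anything else
  esac
}

# Print one status line per node.
check_all_nodes() {
  local entry host ctid rc
  for entry in $NODES; do
    host="${entry%%:*}"
    ctid="${entry##*:}"
    rc=0
    ssh "root@${host}" "pct exec ${ctid} -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault status'" \
      >/dev/null 2>&1 </dev/null || rc=$?
    echo "${host} ct ${ctid}: $(describe_status "$rc")"
  done
}
```

Running check_all_nodes prints one line per container, which makes a sealed or unreachable node easy to spot.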

Backup Operations

Manual Backup

VAULT_TOKEN=<root-token> ./scripts/vault-backup.sh
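
The contents of vault-backup.sh are not reproduced here. For orientation, a Raft backup comes down to `vault operator raft snapshot save`; the sketch below wraps that command with a timestamped filename. BACKUP_DIR and both function names are hypothetical, and VAULT_ADDR and VAULT_TOKEN are expected to be set in the environment.

```shell
#!/usr/bin/env bash
# Sketch only: this is NOT the contents of scripts/vault-backup.sh.
# BACKUP_DIR is a hypothetical location; adjust to the real backup path.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/vault}"

# Build a timestamped snapshot filename.
snapshot_name() {
  echo "vault-$(date +%Y%m%d-%H%M%S).snapshot"
}

# Save a Raft snapshot and print the path of the file written.
take_snapshot() {
  local target
  target="${BACKUP_DIR}/$(snapshot_name)"
  mkdir -p "$BACKUP_DIR" || return 1
  vault operator raft snapshot save "$target" || return 1
  echo "$target"
}
```

The printed path can be passed straight to `vault operator raft snapshot restore` when a restore is needed.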

Automated Backups

Add to crontab:

# Daily backup at 2 AM
0 2 * * * cd /home/intlc/projects/proxmox && VAULT_TOKEN=<token> ./scripts/vault-backup.sh

Restore from Backup

# On Vault node
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN=<root-token>
vault operator raft snapshot restore /path/to/backup.snapshot

Unsealing Operations

Unseal a Node

# On the node
export VAULT_ADDR=http://127.0.0.1:8200
vault operator unseal <key-1>
vault operator unseal <key-2>
vault operator unseal <key-3>

Unseal All Nodes

# Node 1
ssh root@192.168.11.11 "pct exec 8640 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault operator unseal <key-1> && vault operator unseal <key-2> && vault operator unseal <key-3>'"

# Node 2
ssh root@192.168.11.12 "pct exec 8641 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault operator unseal <key-1> && vault operator unseal <key-2> && vault operator unseal <key-3>'"

# Node 3
ssh root@192.168.11.11 "pct exec 8642 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault operator unseal <key-1> && vault operator unseal <key-2> && vault operator unseal <key-3>'"
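
The three commands differ only in host and container ID, so they can be generated from a list. This is a sketch with illustrative function names. Note that keys passed as arguments end up in shell history and process listings, which may or may not be acceptable in your environment.

```shell
#!/usr/bin/env bash
# Host:CTID pairs copied from the commands above.
NODES="192.168.11.11:8640 192.168.11.12:8641 192.168.11.11:8642"

# Build the pct exec command that unseals one container with the given keys.
# usage: build_unseal_cmd <ctid> <key> [<key> ...]
build_unseal_cmd() {
  local ctid="$1" key inner
  shift
  inner="export VAULT_ADDR=http://127.0.0.1:8200"
  for key in "$@"; do
    inner="${inner} && vault operator unseal ${key}"
  done
  echo "pct exec ${ctid} -- bash -c '${inner}'"
}

# Run the unseal command on every node; report failures but keep going.
# usage: unseal_all <key-1> <key-2> <key-3>
unseal_all() {
  local entry
  for entry in $NODES; do
    ssh "root@${entry%%:*}" "$(build_unseal_cmd "${entry##*:}" "$@")" </dev/null \
      || echo "unseal failed on ${entry}" >&2
  done
}
```

Usage would be `unseal_all <key-1> <key-2> <key-3>`, with the same placeholders as above.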

Secret Management

Create/Update Secret

vault kv put secret/phoenix/database/postgres \
  username=phoenix \
  password=new_password \
  host=db.example.com \
  port=5432 \
  database=phoenix

Read Secret

vault kv get secret/phoenix/database/postgres

List Secrets

vault kv list secret/phoenix/

Delete Secret

vault kv delete secret/phoenix/old-secret
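
The policy path secret/data/... used under Policy Management suggests that secret/ is a KV version 2 mount; treat that as an assumption. On KV v2, `vault kv delete` is a soft delete: the version is marked deleted but can be brought back, and only a metadata delete removes the secret and its history. A sketch of the follow-up commands, with illustrative wrapper names:

```shell
#!/usr/bin/env bash
# Assumes a KV v2 mount; on KV v1 `vault kv delete` is already permanent.

# Restore a soft-deleted version.
# usage: kv_restore_version <path> <version>
kv_restore_version() {
  vault kv undelete -versions="$2" "$1"
}

# Permanently remove a secret, including all versions and metadata.
# usage: kv_purge <path>
kv_purge() {
  vault kv metadata delete "$1"
}
```

For example, `kv_restore_version secret/phoenix/old-secret 1` would undo a mistaken delete of version 1.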

Policy Management

List Policies

vault policy list

Read Policy

vault policy read phoenix-api-policy

Update Policy

vault policy write phoenix-api-policy - <<EOF
# Updated policy content
path "secret/data/phoenix/api/*" {
  capabilities = ["read"]
}
EOF
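
The policy path contains data/ although the `vault kv` commands above do not. On a KV v2 mount the CLI inserts that segment itself: reads and writes go to <mount>/data/<path>, and listing goes to <mount>/metadata/<path>. The helpers below are an illustrative sketch of that mapping, assuming the mount name is the first path segment.

```shell
#!/usr/bin/env bash
# Expects input of the form <mount>/<path>, e.g. secret/phoenix/api/key.

# Path to grant read/create/update on for a KV v2 secret.
kv2_data_path() {
  echo "${1%%/*}/data/${1#*/}"
}

# Path to grant list on for a KV v2 folder.
kv2_metadata_path() {
  echo "${1%%/*}/metadata/${1#*/}"
}
```

So a service that runs `vault kv list secret/phoenix/` needs the list capability on secret/metadata/phoenix/, which the example policy above does not grant.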

AppRole Management

List AppRoles

vault list auth/approle/role

Get Role ID

vault read auth/approle/role/phoenix-api/role-id

Generate Secret ID

vault write -f auth/approle/role/phoenix-api/secret-id

Rotate Secret ID

# Generate new secret ID
NEW_SECRET_ID=$(vault write -field=secret_id -f auth/approle/role/phoenix-api/secret-id)

# Update service configuration with new secret ID
# Then delete old secret IDs if needed
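
One way to handle the "delete old secret IDs" step is by accessor: record the accessors that exist before rotation, then destroy exactly those once the service is running on the new secret ID. This is a sketch with illustrative function names; it parses the plain-text output of `vault list`, which is an assumption about output format that `-format=json` would avoid.

```shell
#!/usr/bin/env bash
ROLE_PATH="auth/approle/role/phoenix-api"

# Print the secret ID accessors currently registered for the role.
# Plain `vault list` prints a two-line header ("Keys", "----") first.
list_accessors() {
  vault list "${ROLE_PATH}/secret-id" 2>/dev/null | tail -n +3
}

# Destroy the secret IDs belonging to the given accessors.
# usage: destroy_accessors <accessor> [<accessor> ...]
destroy_accessors() {
  local acc
  for acc in "$@"; do
    vault write "${ROLE_PATH}/secret-id-accessor/destroy" \
      secret_id_accessor="$acc" >/dev/null || return 1
  done
}

# Suggested order (not executed here):
#   old="$(list_accessors)"
#   NEW_SECRET_ID="$(vault write -field=secret_id -f "${ROLE_PATH}/secret-id")"
#   ...update the service and confirm it authenticates...
#   destroy_accessors $old
```

Destroying the old accessors only after the service is confirmed healthy avoids locking it out mid-rotation.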

Monitoring

Enable Audit Logging

vault audit enable file file_path=/var/log/vault/audit.log

View Logs

# Service logs
ssh root@192.168.11.11 "pct exec 8640 -- journalctl -u vault -f"

# Audit logs
ssh root@192.168.11.11 "pct exec 8640 -- tail -f /var/log/vault/audit.log"

Metrics (if enabled)

curl -H "X-Vault-Token: <token>" "http://192.168.11.200:8200/v1/sys/metrics?format=prometheus"

Troubleshooting

Node Not Joining Cluster

  1. Check network connectivity:
ping 10.160.0.40
ping 10.160.0.41
ping 10.160.0.42
  2. Check Vault logs:
ssh root@192.168.11.11 "pct exec 8640 -- journalctl -u vault -n 50"
  3. Verify configuration:
ssh root@192.168.11.11 "pct exec 8640 -- cat /etc/vault.d/vault.hcl"

Service Won't Start

  1. Check service status:
ssh root@192.168.11.11 "pct exec 8640 -- systemctl status vault"
  2. Check configuration:
ssh root@192.168.11.11 "pct exec 8640 -- vault operator diagnose -config=/etc/vault.d/vault.hcl"
  3. Check logs:
ssh root@192.168.11.11 "pct exec 8640 -- journalctl -u vault -n 100"

Cluster Split-Brain

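If nodes fail or the cluster loses quorum:

  1. Identify nodes with latest data
  2. Remove failed nodes from cluster (this command needs a working quorum; if quorum cannot be re-established, use Vault's peers.json recovery procedure instead):
vault operator raft remove-peer <node-id>
  3. Rejoin nodes:
# Nodes will auto-rejoin via retry_join configuration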
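
Whether a failure costs the cluster its quorum is simple arithmetic: Raft needs a majority of voting nodes, that is floor(n/2) + 1. The helper names below are illustrative.

```shell
#!/usr/bin/env bash
# Number of voters that must be reachable for the cluster to stay writable.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can fail without losing quorum.
failure_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}
```

For the three-node cluster in this guide, quorum_size 3 gives 2 and failure_tolerance 3 gives 1: one node may be down, but a second failure stops the cluster. A fourth node raises the quorum to 3 without raising the tolerance.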

Maintenance

Restart Node

# Stop node (pct shutdown requests a clean shutdown; pct stop terminates the container immediately)
ssh root@192.168.11.11 "pct shutdown 8640"

# Start node
ssh root@192.168.11.11 "pct start 8640"

# Unseal after restart
ssh root@192.168.11.11 "pct exec 8640 -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault operator unseal <key-1> && vault operator unseal <key-2> && vault operator unseal <key-3>'"
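
Right after `pct start`, the Vault service may not be listening yet, so an immediate unseal can fail. The sketch below polls `vault status` until the server answers: exit code 0 (unsealed) or 2 (sealed) both mean it is up. The function name, retry count, and WAIT_INTERVAL variable are illustrative assumptions, as is the expectation that ssh and pct exec pass the exit code through.

```shell
#!/usr/bin/env bash
# Wait until Vault inside a container responds to `vault status`.
# usage: wait_for_vault <host> <ctid> [<max-tries>]
wait_for_vault() {
  local host="$1" ctid="$2" tries="${3:-30}" i=0 rc
  while [ "$i" -lt "$tries" ]; do
    rc=0
    ssh "root@${host}" "pct exec ${ctid} -- bash -c 'export VAULT_ADDR=http://127.0.0.1:8200 && vault status'" \
      >/dev/null 2>&1 </dev/null || rc=$?
    # 0 = unsealed, 2 = sealed: either way the server is answering.
    if [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-2}"
  done
  return 1
}
```

For example, `wait_for_vault 192.168.11.11 8640 && <unseal command>` runs the unseal step only once the node is reachable.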

Update Vault

  1. Backup cluster
  2. Update on one node at a time
  3. Restart node
  4. Unseal node
  5. Verify cluster health
  6. Repeat for other nodes

Scale Cluster

To add a node:

  1. Create new container
  2. Install Vault
  3. Configure with same cluster settings
  4. Start Vault
  5. Node will auto-join via retry_join
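
Auto-join depends on the new node's storage "raft" block listing the existing nodes in retry_join stanzas. The cluster's actual vault.hcl is not shown in this guide, so the following only sketches how those stanzas could be generated. The function name is hypothetical, and the addresses in the usage note are assumptions built from the IPs pinged under Troubleshooting plus Vault's default API port.

```shell
#!/usr/bin/env bash
# Print one retry_join stanza per leader API address, indented for
# placement inside a storage "raft" { ... } block.
# usage: render_retry_join <api-addr> [<api-addr> ...]
render_retry_join() {
  local addr
  for addr in "$@"; do
    printf '  retry_join {\n    leader_api_addr = "%s"\n  }\n' "$addr"
  done
}
```

A call such as `render_retry_join http://10.160.0.40:8200 http://10.160.0.41:8200 http://10.160.0.42:8200` prints three stanzas; compare them with /etc/vault.d/vault.hcl on an existing node before using them.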

Emergency Procedures

Complete Cluster Failure

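  1. Locate the latest backup snapshot
  2. Initialize new cluster if needed
  3. Restore Raft snapshot (add -force when restoring into a newly initialized cluster, because its keys differ from those in the snapshot)
  4. Unseal all nodes (after a forced restore, use the unseal keys of the cluster that produced the snapshot)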

Lost Unseal Keys

If unseal keys are lost:

  • Use recovery keys (if configured; these exist only with auto-unseal and cannot unseal a Shamir-sealed cluster on their own)
  • Or reinitialize cluster (data will be lost)

Data Corruption

  1. Stop affected node
  2. Restore from backup
  3. Restart node
  4. Verify data integrity


Status: Complete
Last Updated: 2026-01-19