NPMplus High Availability (HA) Setup Guide
Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation
Date: 2026-01-20
Status: Complete HA Architecture Guide
Purpose: Comprehensive guide for deploying High Availability NPMplus architecture
Overview
This guide provides step-by-step instructions for deploying a highly available NPMplus setup to eliminate the single point of failure in the ingress architecture.
Current Architecture
- Single NPMplus Instance: VMID 10233 on r630-01 (192.168.11.166)
- Single Point of Failure: All 19+ domains depend on one container
- No Redundancy: Container failure = complete ingress outage
Target HA Architecture
- Multiple NPMplus Instances: Primary + Secondary (optionally Tertiary)
- Shared Storage: Database and certificates synchronized
- Load Balancer: Distributes traffic across instances
- Automatic Failover: Health checks and automatic routing
HA Architecture Options
Option 1: Active-Passive with Keepalived (Recommended for Start)
Architecture:
Internet
↓
Cloudflare DNS → 76.53.10.36
↓
UDM Pro Port Forward (80/443)
↓
Keepalived Virtual IP (192.168.11.166)
├─ Primary NPMplus (VMID 10233) - Active
└─ Secondary NPMplus (VMID 10234) - Standby
↓
Backend VMs
Pros:
- Simple configuration
- No changes to existing DNS/port forwarding
- Automatic failover
- Single active instance (easier certificate management)
Cons:
- Secondary instance idle (no load distribution)
- Requires shared storage for certificates
Option 2: Active-Active with HAProxy Load Balancer
Architecture:
Internet
↓
Cloudflare DNS → 76.53.10.36
↓
UDM Pro Port Forward (80/443)
↓
HAProxy (192.168.11.166)
├─ Primary NPMplus (VMID 10233) - Active
└─ Secondary NPMplus (VMID 10234) - Active
↓
Backend VMs
Pros:
- Load distribution across instances
- Better resource utilization
- Automatic failover
- Can handle more traffic
Cons:
- More complex configuration
- Requires shared storage for database and certificates
- Need to handle SSL termination at HAProxy or NPMplus
Option 3: Active-Active with Shared Database (Advanced)
Architecture:
Internet
↓
Cloudflare DNS → 76.53.10.36
↓
UDM Pro Port Forward (80/443)
↓
Keepalived Virtual IP (192.168.11.166)
├─ Primary NPMplus (VMID 10233)
└─ Secondary NPMplus (VMID 10234)
↓ (Shared Resources)
├─ PostgreSQL/MariaDB Database (Shared)
├─ NFS/GlusterFS for Certificates (Shared)
└─ Shared Configuration Storage
↓
Backend VMs
Pros:
- True active-active (both instances serving traffic)
- Shared database ensures configuration sync
- Shared certificate storage
Cons:
- Most complex to implement
- Requires external database
- Requires shared file storage (NFS/GlusterFS)
- NPMplus uses SQLite (would need migration)
Recommended Approach: Active-Passive with Keepalived
For the initial HA implementation, Option 1 (Active-Passive with Keepalived) is recommended because:
- Minimal changes to existing architecture
- Reuses existing NPMplus configuration
- Easier to implement and test
- Can be upgraded to active-active later
This guide focuses on Option 1, with notes on how to upgrade to Option 2 later.
Prerequisites
Infrastructure Requirements
- Primary Proxmox Host: r630-01 (192.168.11.11) - Existing NPMplus
- Secondary Proxmox Host: r630-02 (192.168.11.12) or ml110 (192.168.11.10) - For secondary NPMplus
- Shared Storage: NFS or rsync-based synchronization for certificates
- Network: Both hosts on same VLAN (192.168.11.0/24)
Software Requirements
- Keepalived (for virtual IP)
- rsync or NFS (for certificate synchronization)
- Monitoring tools (for health checks)
Current NPMplus Details
- VMID: 10233
- Host: r630-01 (192.168.11.11)
- Container IP: 192.168.11.166 (eth0)
- Management Port: 81
- Database: /data/database.sqlite
- Certificates: /data/tls/certbot/live/
Step-by-Step Implementation
Phase 1: Prepare Secondary NPMplus Instance
Step 1.1: Create Secondary NPMplus Container
Target: VMID 10234 on r630-02 (192.168.11.12)
# On Proxmox host (r630-02)
CTID=10234
HOSTNAME="npmplus-secondary"
IP="192.168.11.168"
BRIDGE="vmbr0"
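# Download Alpine template (list the available names first; the exact file name changes with each template build)
pveam update
pveam available --section system | grep alpine
pveam download local alpine-3.22-default_20241208_amd64.tar.xz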
# Create container
pct create $CTID \
local:vztmpl/alpine-3.22-default_20241208_amd64.tar.xz \
--hostname $HOSTNAME \
--memory 1024 \
--cores 2 \
--rootfs local-lvm:5 \
--net0 name=eth0,bridge=$BRIDGE,ip=$IP/24,gw=192.168.11.1 \
--unprivileged 1 \
--features nesting=1,keyctl=1
# Start container
pct start $CTID
# Wait for container to be ready
sleep 10
Step 1.2: Install NPMplus on Secondary Instance
# SSH to Proxmox host
ssh root@192.168.11.12
# Enter container (interactive shell)
pct enter 10234
# Install dependencies
apk update
apk add --no-cache tzdata gawk yq docker docker-compose curl bash rsync
# Start Docker
rc-service docker start
rc-update add docker default
# Wait for Docker
sleep 5
# Fetch NPMplus compose file
cd /opt
curl -fsSL "https://raw.githubusercontent.com/ZoeyVid/NPMplus/refs/heads/develop/compose.yaml" -o compose.yaml
# Update compose file with timezone and email
TZ="America/New_York"
ACME_EMAIL="nsatoshi2007@hotmail.com"
# Drop existing TZ=/ACME_EMAIL= entries by prefix (test() is a regex match; != "TZ=*" would compare literally), then append ours
yq -i "
.services.npmplus.environment |=
(map(select((test(\"^TZ=\") or test(\"^ACME_EMAIL=\")) | not)) +
[\"TZ=$TZ\", \"ACME_EMAIL=$ACME_EMAIL\"])
" compose.yaml
# Start NPMplus; its configuration is replaced with the primary's in Phase 4
docker compose up -d
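The yq filter above is meant to remove any existing TZ= and ACME_EMAIL= entries before appending the new values. The same intent can be illustrated in plain bash on KEY=VALUE lines, which makes the prefix-match behaviour easy to test without yq. This is a minimal sketch; update_env is a hypothetical helper for illustration, not part of NPMplus or yq.

```bash
#!/usr/bin/env bash
# Illustration only (hypothetical helper): mimics on plain KEY=VALUE lines what
# the yq filter does to .services.npmplus.environment.
# Usage: update_env "TZ=America/New_York" "ACME_EMAIL=ops@example.org" < env_lines
update_env() {
  local line assignment drop
  while IFS= read -r line; do
    drop=0
    for assignment in "$@"; do
      # Prefix match on "KEY=": "TZ=Europe/Berlin" is replaced, "TZX=1" is kept.
      if [[ "$line" == "${assignment%%=*}="* ]]; then
        drop=1
        break
      fi
    done
    (( drop )) || printf '%s\n' "$line"
  done
  # Append the new assignments after the surviving entries.
  printf '%s\n' "$@"
}
```

Feeding it a list containing TZ=Europe/Berlin and an old ACME_EMAIL= entry yields the untouched entries followed by exactly one TZ= and one ACME_EMAIL= line.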
Step 1.3: Configure Secondary Container Network
# Secondary container should have a static IP
# VMID 10234: 192.168.11.168 (eth0), as set in Step 1.1
# Verify IP (run on the Proxmox host r630-02, not inside the container)
pct exec 10234 -- ip addr show eth0
Phase 2: Set Up Certificate Synchronization
Step 2.1: Create Certificate Sync Script
Location: scripts/npmplus/sync-certificates.sh
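#!/bin/bash
# Synchronize NPMplus certificates from primary to secondary.
# Runs on the primary Proxmox host (r630-01), e.g. from the cron job in Step 2.2.
set -euo pipefail
PRIMARY_VMID="10233"
SECONDARY_HOST="192.168.11.12"
SECONDARY_VMID="10234"
# Paths inside the npmplus Docker container. Copy the whole certbot tree:
# the files in live/ are symlinks into ../../archive/, so live/ alone is not enough.
CERT_PARENT="/data/tls"
CERT_DIR="certbot"
# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_info "Starting certificate synchronization..."
# rsync cannot copy between two remote hosts, so stream a tar archive instead:
# primary npmplus container -> ssh -> secondary npmplus container.
# Unlike rsync --delete, this does not prune certificates removed on the primary.
# BatchMode makes ssh fail instead of prompting (key-based auth required).
if ! pct exec "$PRIMARY_VMID" -- docker exec npmplus tar -C "$CERT_PARENT" -czf - "$CERT_DIR" \
  | ssh -o BatchMode=yes "root@$SECONDARY_HOST" \
    "pct exec $SECONDARY_VMID -- docker exec -i npmplus tar -C $CERT_PARENT -xzf -"; then
  log_error "Certificate synchronization failed"
  exit 1
fi
log_info "Certificate synchronization complete"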
Make executable:
chmod +x scripts/npmplus/sync-certificates.sh
Step 2.2: Set Up Automated Certificate Sync
Cron Job (runs every 5 minutes):
# On primary Proxmox host (r630-01)
crontab -e
# Add:
*/5 * * * * /home/intlc/projects/proxmox/scripts/npmplus/sync-certificates.sh >> /var/log/npmplus-cert-sync.log 2>&1
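If a sync run ever takes longer than five minutes, the next cron run would start while the previous one is still streaming. A minimal guard, assuming flock(1) from util-linux is available on the Proxmox host, is to take a non-blocking lock around the script; run_locked is a hypothetical wrapper shown for illustration.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): run a command only if no other run holds the lock.
# Assumes flock(1) from util-linux. Returns 75 (EX_TEMPFAIL) when the lock is busy.
run_locked() {
  local lockfile=$1; shift
  (
    # Open the lock file on fd 9 for this subshell only.
    exec 9>"$lockfile"
    if ! flock -n 9; then
      echo "skip: another run holds $lockfile" >&2
      exit 75
    fi
    "$@"
  )
}
```

The same effect fits directly in the crontab entry: */5 * * * * flock -n /run/lock/npmplus-cert-sync.lock /home/intlc/projects/proxmox/scripts/npmplus/sync-certificates.sh >> /var/log/npmplus-cert-sync.log 2>&1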
Phase 3: Set Up Keepalived for Virtual IP
Step 3.1: Install Keepalived on Proxmox Hosts
# On both primary and secondary Proxmox hosts
apt update
apt install -y keepalived
Step 3.2: Configure Keepalived on Primary Host (r630-01)
File: /etc/keepalived/keepalived.conf
vrrp_script chk_npmplus {
script "/usr/local/bin/check-npmplus-health.sh"
interval 5
# Must exceed the 10-point priority gap (110 vs 100): with -10 a failing primary
# still advertises 100, and a BACKUP with equal priority does not take over
weight -20
fall 2
rise 2
}
vrrp_instance VI_NPMPLUS {
state MASTER
interface vmbr0
virtual_router_id 51
priority 110
advert_int 1
authentication {
auth_type PASS
# Only the first 8 characters of auth_pass are used (VRRPv2 PASS auth)
auth_pass npmplus_ha_2024
}
virtual_ipaddress {
192.168.11.166/24
}
track_script {
chk_npmplus
}
notify_master "/usr/local/bin/keepalived-notify.sh master"
notify_backup "/usr/local/bin/keepalived-notify.sh backup"
notify_fault "/usr/local/bin/keepalived-notify.sh fault"
}
Step 3.3: Configure Keepalived on Secondary Host (r630-02)
File: /etc/keepalived/keepalived.conf
vrrp_script chk_npmplus {
script "/usr/local/bin/check-npmplus-health.sh"
interval 5
# Same weight as the primary
weight -20
fall 2
rise 2
}
vrrp_instance VI_NPMPLUS {
state BACKUP
interface vmbr0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
# Only the first 8 characters of auth_pass are used (VRRPv2 PASS auth)
auth_pass npmplus_ha_2024
}
virtual_ipaddress {
192.168.11.166/24
}
track_script {
chk_npmplus
}
notify_master "/usr/local/bin/keepalived-notify.sh master"
notify_backup "/usr/local/bin/keepalived-notify.sh backup"
notify_fault "/usr/local/bin/keepalived-notify.sh fault"
}
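The failover decision comes down to simple arithmetic. While a tracked script with a negative weight is failing, keepalived lowers the instance priority by that amount; a BACKUP with preemption enabled (the default) only takes over when its own priority is strictly higher than the priority the master advertises (RFC 3768 keeps an equal-priority backup in BACKUP). The sketch below models just those two rules for this two-node setup; the function names are hypothetical and it does not talk to keepalived.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers) of the priority rules for the two-node setup.

# Effective priority: a negative vrrp_script weight is applied only while the
# check is failing; a passing check leaves the base priority unchanged.
effective_priority() {
  local base=$1 weight=$2 check=$3   # check: "ok" or "fail"
  if [[ "$check" == fail && "$weight" -lt 0 ]]; then
    echo $(( base + weight ))
  else
    echo "$base"
  fi
}

# A preempting BACKUP takes over only if its priority is strictly higher
# than the priority advertised by the current master.
backup_takes_over() {
  local master_prio=$1 backup_prio=$2
  (( backup_prio > master_prio ))
}
```

With priorities 110/100, a failing primary at weight -10 advertises 100 and keeps the VIP; at weight -20 it advertises 90 and the backup (100) takes over.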
Step 3.4: Create Health Check Script
File: /usr/local/bin/check-npmplus-health.sh (on both hosts)
#!/bin/bash
# Check NPMplus health and return 0 if healthy, 1 if unhealthy
PRIMARY_HOST="192.168.11.11"
PRIMARY_VMID="10233"
SECONDARY_HOST="192.168.11.12"
SECONDARY_VMID="10234"
HOSTNAME=$(hostname)
if [ "$HOSTNAME" = "r630-01" ]; then
VMID=$PRIMARY_VMID
elif [ "$HOSTNAME" = "r630-02" ]; then
VMID=$SECONDARY_VMID
else
exit 1
fi
# Check if container is running
if ! pct status $VMID 2>/dev/null | grep -q "running"; then
exit 1
fi
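# Check that the NPMplus Docker container is up and not reporting unhealthy
# (grep "healthy\|Up" would also accept "Up 3 minutes (unhealthy)")
STATUS=$(pct exec $VMID -- docker ps --filter "name=npmplus" --format "{{.Status}}" 2>/dev/null || true)
if ! echo "$STATUS" | grep -q "^Up" || echo "$STATUS" | grep -q "(unhealthy)"; then
exit 1
fi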
# Check if NPMplus web interface responds
if ! pct exec $VMID -- curl -s -k -f -o /dev/null --max-time 5 https://localhost:81 >/dev/null 2>&1; then
exit 1
fi
# All checks passed
exit 0
Make executable:
chmod +x /usr/local/bin/check-npmplus-health.sh
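The container-status part of the health check can be isolated into a small function and tested against sample docker ps status strings such as "Up 2 hours (healthy)", "Up 3 minutes (unhealthy)" or "Up 10 seconds (health: starting)". This is a sketch; npmplus_status_ok is a hypothetical name, and treating "health: starting" as acceptable is a judgment call.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): classify a `docker ps --format "{{.Status}}"` string.
# A plain grep for "healthy" also matches "unhealthy", so reject that case first.
npmplus_status_ok() {
  local status=$1
  case "$status" in
    *"(unhealthy)"*) return 1 ;;  # running but failing its Docker HEALTHCHECK
    Up*) return 0 ;;              # running: healthy, starting, or no healthcheck
    *) return 1 ;;                # empty (no container), Restarting, Exited, ...
  esac
}
```

In the health-check script it would replace the grep test, e.g. npmplus_status_ok "$STATUS" || exit 1.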
Step 3.5: Create Notification Script
File: /usr/local/bin/keepalived-notify.sh (on both hosts)
#!/bin/bash
# Handle Keepalived state changes
STATE=$1
LOGFILE="/var/log/keepalived-notify.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
case "$STATE" in
"master")
echo "[$TIMESTAMP] Transitioned to MASTER - This node now owns VIP 192.168.11.166" >> "$LOGFILE"
# Optionally: Start services, send alerts, etc.
;;
"backup")
echo "[$TIMESTAMP] Transitioned to BACKUP - Standby mode" >> "$LOGFILE"
;;
"fault")
echo "[$TIMESTAMP] Transitioned to FAULT - Health check failed" >> "$LOGFILE"
# Optionally: Send critical alerts
;;
esac
Make executable:
chmod +x /usr/local/bin/keepalived-notify.sh
Step 3.6: Start Keepalived
# On both hosts
systemctl enable keepalived
systemctl start keepalived
# Verify status
systemctl status keepalived
# Only the current MASTER (normally r630-01) shows the VIP
ip addr show vmbr0 | grep 192.168.11.166
Phase 4: Sync Configuration to Secondary
Step 4.1: Export Primary Configuration
Script: scripts/npmplus/export-primary-config.sh
#!/bin/bash
# Export primary NPMplus configuration
PRIMARY_HOST="192.168.11.11"
PRIMARY_VMID="10233"
BACKUP_DIR="/tmp/npmplus-config-backup-$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Export database
ssh root@$PRIMARY_HOST "pct exec $PRIMARY_VMID -- docker exec npmplus sqlite3 /data/database.sqlite '.dump'" > "$BACKUP_DIR/database.sql"
# Export proxy hosts via API (if available)
NPM_URL="https://192.168.11.166:81"
NPM_EMAIL="nsatoshi2007@hotmail.com"
NPM_PASSWORD="${NPM_PASSWORD:?Set NPM_PASSWORD (from .env)}"
TOKEN_RESPONSE=$(curl -s -k -X POST "$NPM_URL/api/tokens" \
-H "Content-Type: application/json" \
-d "{\"identity\":\"$NPM_EMAIL\",\"secret\":\"$NPM_PASSWORD\"}")
TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r '.token')
curl -s -k -X GET "$NPM_URL/api/nginx/proxy-hosts" \
-H "Authorization: Bearer $TOKEN" | jq '.' > "$BACKUP_DIR/proxy_hosts.json"
curl -s -k -X GET "$NPM_URL/api/nginx/certificates" \
-H "Authorization: Bearer $TOKEN" | jq '.' > "$BACKUP_DIR/certificates.json"
echo "Configuration exported to $BACKUP_DIR"
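Before importing a dump on the secondary, it is worth checking that the export actually completed. sqlite3's .dump output starts a transaction with "BEGIN TRANSACTION;" and ends with "COMMIT;" (or with a ROLLBACK line if sqlite hit errors), and an interrupted SSH session leaves a truncated or empty file. A minimal sketch, with dump_looks_complete as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): sanity-check an SQLite text dump produced by ".dump".
dump_looks_complete() {
  local file=$1
  # Empty file: the ssh/sqlite3 export produced nothing.
  [[ -s "$file" ]] || return 1
  # .dump wraps its statements in a transaction.
  grep -qx 'BEGIN TRANSACTION;' "$file" || return 1
  # The last non-blank line must be COMMIT; (not ROLLBACK, not a cut-off statement).
  [[ "$(grep -v '^[[:space:]]*$' "$file" | tail -n 1)" == "COMMIT;" ]]
}
```

Usage after the export: dump_looks_complete "$BACKUP_DIR/database.sql" || { echo "Incomplete database dump"; exit 1; }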
Step 4.2: Import Configuration to Secondary
Script: scripts/npmplus/import-secondary-config.sh
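#!/bin/bash
# Import configuration to secondary NPMplus
set -euo pipefail
SECONDARY_HOST="192.168.11.12"
SECONDARY_VMID="10234"
BACKUP_DIR="${1:-}" # Path to backup directory from Step 4.1
if [ -z "$BACKUP_DIR" ] || [ ! -d "$BACKUP_DIR" ]; then
  echo "Usage: $0 <backup-directory>"
  exit 1
fi
# Copy the dump to the secondary Proxmox host, then into its LXC container
# (scp alone only reaches the host's /tmp, not the container's)
scp "$BACKUP_DIR/database.sql" root@$SECONDARY_HOST:/tmp/database.sql
ssh root@$SECONDARY_HOST "pct push $SECONDARY_VMID /tmp/database.sql /tmp/database.sql"
# Rebuild the database from the dump while NPMplus is still running:
# docker exec does not work on a stopped container, and replaying a .dump
# into the existing database fails on its CREATE TABLE statements.
# Then swap the new file in, drop WAL/shm files of the old database,
# and restart NPMplus so it opens the new file.
ssh root@$SECONDARY_HOST "pct exec $SECONDARY_VMID -- sh -c '
  docker exec npmplus rm -f /data/database.sqlite.new &&
  docker exec -i npmplus sqlite3 /data/database.sqlite.new < /tmp/database.sql &&
  docker exec npmplus mv /data/database.sqlite.new /data/database.sqlite &&
  docker exec npmplus rm -f /data/database.sqlite-wal /data/database.sqlite-shm &&
  docker restart npmplus
'"
# Wait for NPMplus to be ready
sleep 10
echo "Configuration imported to secondary NPMplus"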
Phase 5: Set Up Configuration Sync (Ongoing)
Step 5.1: Create Configuration Sync Script
Script: scripts/npmplus/sync-config.sh
#!/bin/bash
# Sync NPMplus configuration from primary to secondary
PRIMARY_HOST="192.168.11.11"
PRIMARY_VMID="10233"
SECONDARY_HOST="192.168.11.12"
SECONDARY_VMID="10234"
NPM_URL="https://192.168.11.166:81"
NPM_EMAIL="nsatoshi2007@hotmail.com"
NPM_PASSWORD="${NPM_PASSWORD:-}" # From .env
if [ -z "$NPM_PASSWORD" ]; then
echo "ERROR: NPM_PASSWORD not set"
exit 1
fi
# Authenticate
TOKEN_RESPONSE=$(curl -s -k -X POST "$NPM_URL/api/tokens" \
-H "Content-Type: application/json" \
-d "{\"identity\":\"$NPM_EMAIL\",\"secret\":\"$NPM_PASSWORD\"}")
TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r '.token')
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
echo "ERROR: Authentication failed"
exit 1
fi
# Export from primary
curl -s -k -X GET "$NPM_URL/api/nginx/proxy-hosts" \
-H "Authorization: Bearer $TOKEN" > /tmp/proxy_hosts_primary.json
# Get secondary URL (will be different when not active)
SECONDARY_URL="https://192.168.11.168:81"
# For now, manual sync is required
# In future: implement API-based sync or shared database
echo "Manual configuration sync required"
echo "Export from: $NPM_URL"
echo "Import to: $SECONDARY_URL"
Note: Full automated configuration sync requires either:
- Shared database (PostgreSQL/MariaDB migration)
- API-based sync script (more complex)
- Manual sync process for configuration changes
For now: Configuration changes must be manually replicated to secondary.
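Until configuration sync is automated, a quick way to spot drift is to compare the list of proxy-host domains on each instance. The sketch below only compares two files with one domain per line; producing those files from the API export (for example with jq -r '.[].domain_names[]' proxy_hosts.json) is an assumption about the response shape, and config_drift is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): report domains present on one instance but not the other.
# Inputs: two files, one domain per line (e.g. primary first, secondary second).
config_drift() {
  local a=$1 b=$2
  # comm needs sorted input; -3 hides lines common to both files.
  # Column 1 = only in $a; column 2 (tab-indented) = only in $b.
  comm -3 <(sort -u "$a") <(sort -u "$b")
}
```

Empty output means both instances define the same set of domains; it does not compare per-host settings such as upstream addresses.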
Phase 6: Testing and Validation
Step 6.1: Test Virtual IP Failover
# On primary host
ip addr show vmbr0 | grep 192.168.11.166
# Should show: 192.168.11.166
# Simulate primary failure
systemctl stop keepalived
# Wait 5-10 seconds
sleep 10
# Check secondary host
ssh root@192.168.11.12 "ip addr show vmbr0 | grep 192.168.11.166"
# Should now show: 192.168.11.166 (VIP moved to secondary)
# Test connectivity
curl -k https://192.168.11.166:81
# Should connect to secondary NPMplus
# Restore primary
systemctl start keepalived
# Wait for failback
sleep 10
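The fixed sleep 10 in the failover test only shows whether the VIP moved within ten seconds. Polling instead also reports how long failover actually took. A minimal sketch, with wait_for as a hypothetical helper that retries any command once per second until it succeeds or the timeout expires:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll a command until it succeeds or times out,
# printing the elapsed time on success.
wait_for() {
  local timeout=$1; shift
  local start=$SECONDS
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS - start >= timeout )); then
      echo "timeout after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
  echo "ok after $(( SECONDS - start ))s"
}
```

Usage during the test: wait_for 30 ssh root@192.168.11.12 "ip addr show vmbr0 | grep -q 192.168.11.166"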
Step 6.2: Test Certificate Access
# Verify certificates exist on secondary
ssh root@192.168.11.12 "pct exec 10234 -- docker exec npmplus ls -la /data/tls/certbot/live/"
# Test SSL endpoint
curl -vI https://explorer.d-bis.org
# Should show valid certificate
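Listing live/ only proves that directory names exist. To check that the certificate files themselves match, a checksum manifest of each tree can be compared. The sketch below builds a manifest with relative paths so two different roots compare equal; running it against the trees inside the npmplus containers (e.g. via pct exec and docker exec, or on copies) is left as an assumption, and cert_manifest is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): sha256 manifest of every regular file under a root.
# Symlinks (such as those in certbot's live/) are skipped; their targets in
# archive/ are regular files and are included.
cert_manifest() {
  local root=$1
  ( cd "$root" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}
```

Two trees are in sync when diff <(cert_manifest "$A") <(cert_manifest "$B") prints nothing.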
Step 6.3: Test Proxy Host Functionality
# Test each domain from external
for domain in explorer.d-bis.org mim4u.org rpc-http-pub.d-bis.org; do
echo "Testing $domain..."
curl -sI "https://$domain" | grep -iE "^(HTTP|server)"
done
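The loop above prints headers for a person to read. For a pass/fail result, each domain's status code can be checked instead. This sketch treats 2xx and 3xx as success and anything else, including curl's 000 for connection or TLS failures, as a failure; status_ok and check_domains are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): pass/fail check of proxied domains by HTTP status.
status_ok() {
  local code=$1
  [[ "$code" =~ ^[23][0-9][0-9]$ ]]
}

check_domains() {
  local domain code failed=0
  for domain in "$@"; do
    # -w '%{http_code}' prints only the status code; 000 means no HTTP response.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$domain" || true)
    if status_ok "$code"; then
      echo "OK   $domain ($code)"
    else
      echo "FAIL $domain ($code)"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_domains explorer.d-bis.org mim4u.org rpc-http-pub.d-bis.org returns non-zero if any domain fails, so it can be run before and after a failover.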
Monitoring and Maintenance
Health Monitoring
Script: scripts/npmplus/monitor-ha-status.sh
#!/bin/bash
# Monitor HA status and send alerts if needed
VIP="192.168.11.166"
PRIMARY_HOST="192.168.11.11"
SECONDARY_HOST="192.168.11.12"
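# Check who owns VIP (grep -q keeps the matched line out of VIP_OWNER,
# and if/elif avoids the "a && b || c && d" precedence trap)
if ssh root@$PRIMARY_HOST "ip addr show vmbr0 | grep -qF '$VIP/'"; then
  VIP_OWNER="$PRIMARY_HOST"
elif ssh root@$SECONDARY_HOST "ip addr show vmbr0 | grep -qF '$VIP/'"; then
  VIP_OWNER="$SECONDARY_HOST"
else
  VIP_OWNER="UNKNOWN"
fi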
echo "VIP $VIP owner: $VIP_OWNER"
# Check Keepalived status on both hosts
PRIMARY_STATUS=$(ssh root@$PRIMARY_HOST "systemctl is-active keepalived" 2>/dev/null || echo "unknown")
SECONDARY_STATUS=$(ssh root@$SECONDARY_HOST "systemctl is-active keepalived" 2>/dev/null || echo "unknown")
echo "Primary Keepalived: $PRIMARY_STATUS"
echo "Secondary Keepalived: $SECONDARY_STATUS"
# Alert if both are down
if [ "$PRIMARY_STATUS" != "active" ] && [ "$SECONDARY_STATUS" != "active" ]; then
echo "ALERT: Both Keepalived instances are down!"
# Send alert (email, webhook, etc.)
fi
Cron Job:
*/5 * * * * /home/intlc/projects/proxmox/scripts/npmplus/monitor-ha-status.sh >> /var/log/npmplus-ha-monitor.log 2>&1
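The monitor only alerts when both Keepalived instances are down. Its observations can also be reduced to a single state line that catches a missing VIP or a VIP present on both hosts (split brain). In this sketch the VIP-holder count is assumed to come from running the ip addr/grep check against each host; ha_state is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): summarize HA health from the monitor's inputs.
# $1/$2: "systemctl is-active keepalived" output per host; $3: hosts showing the VIP.
ha_state() {
  local primary=$1 secondary=$2 vip_holders=$3
  if (( vip_holders > 1 )); then
    echo "ALERT: VIP present on both hosts (split brain)"
  elif (( vip_holders == 0 )); then
    echo "ALERT: no host holds the VIP"
  elif [[ "$primary" != active && "$secondary" != active ]]; then
    echo "ALERT: both Keepalived instances are down"
  elif [[ "$primary" != active || "$secondary" != active ]]; then
    echo "DEGRADED: only one Keepalived instance is active"
  else
    echo "OK"
  fi
}
```

Any line starting with ALERT is a candidate for the email or webhook notification mentioned in the monitor script.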
Upgrading to Active-Active (Future)
To upgrade from Active-Passive to Active-Active:
Option A: HAProxy Load Balancer
- Deploy HAProxy on dedicated VM/container (VMID 10235)
- Configure HAProxy to balance between both NPMplus instances
- Update UDM Pro port forwarding to point to HAProxy IP
- Configure shared storage for certificates
- Implement shared database (PostgreSQL migration)
Option B: DNS Round-Robin
- Assign multiple IPs to NPMplus instances
- Configure DNS round-robin (not recommended for SSL termination)
Troubleshooting
Issue: VIP not moving to secondary
Symptoms: Primary fails but secondary doesn't take over
Check:
# Check Keepalived logs
journalctl -u keepalived -n 50
# Check health check script
/usr/local/bin/check-npmplus-health.sh
echo $? # Should return 0 if healthy
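# Check firewall (VRRP is IP protocol 112, sent to multicast 224.0.0.18)
iptables -L -n | grep -E 'vrrp|112|224\.0\.0\.18'
# Confirm advertisements arrive from the other host (requires tcpdump)
tcpdump -ni vmbr0 ip proto 112
Solution: Ensure VRRP traffic (IP protocol 112 to 224.0.0.18) is allowed between both hosts, including in the Proxmox firewall if it is enabled.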
Issue: Certificates out of sync
Symptoms: Secondary shows certificate errors
Solution:
# Manually sync certificates
bash scripts/npmplus/sync-certificates.sh
# Verify sync
ssh root@192.168.11.12 "pct exec 10234 -- docker exec npmplus ls -la /data/tls/certbot/live/"
Issue: Configuration mismatch
Symptoms: Proxy hosts work on primary but not secondary
Solution:
# Export from primary
bash scripts/npmplus/export-primary-config.sh
# Import the most recent export to secondary
bash scripts/npmplus/import-secondary-config.sh "$(ls -dt /tmp/npmplus-config-backup-* | head -n 1)"
Rollback Plan
If HA setup causes issues:
- Disable Keepalived on Secondary:
ssh root@192.168.11.12 "systemctl stop keepalived && systemctl disable keepalived"
- Ensure Primary Owns VIP (on r630-01):
systemctl restart keepalived
ip addr show vmbr0 | grep 192.168.11.166
- Stop Secondary NPMplus (optional):
ssh root@192.168.11.12 "pct stop 10234"
- Remove Secondary Container (if not needed):
ssh root@192.168.11.12 "pct destroy 10234"
Cost and Resource Impact
Additional Resources Required
- Secondary NPMplus Container: ~1 GB RAM, 5 GB disk, 2 CPU cores
- Keepalived: Minimal overhead (< 10 MB RAM)
- Network: VRRP multicast traffic (minimal)
- Storage: Certificate sync storage (same as primary)
Maintenance Overhead
- Certificate Sync: Automated (every 5 minutes)
- Configuration Sync: Manual (when changes made)
- Monitoring: Automated (every 5 minutes)
Next Steps
- Review and Approve HA Architecture
- Schedule Maintenance Window (if required)
- Create Secondary NPMplus Instance (Phase 1)
- Set Up Certificate Sync (Phase 2)
- Configure Keepalived (Phase 3)
- Sync Configuration (Phase 4)
- Test Failover (Phase 6)
- Enable Monitoring (Monitoring section)
References
- Keepalived Documentation: https://www.keepalived.org/manpage.html
- NPMplus GitHub: https://github.com/ZoeyVid/NPMplus
- VRRP Protocol: RFC 3768
- Current Architecture: docs/04-configuration/DNS_NPMPLUS_VM_COMPREHENSIVE_ARCHITECTURE.md
Last Updated: 2026-01-20
Status: Ready for Implementation
Estimated Implementation Time: 4-6 hours