Deployment Runbook

SolaceScanScout Explorer - Production Deployment Guide

Last Updated: $(date)
Version: 1.0.0


Table of Contents

  1. Pre-Deployment Checklist
  2. Environment Setup
  3. Database Migration
  4. Service Deployment
  5. Health Checks
  6. Rollback Procedures
  7. Post-Deployment Verification
  8. Troubleshooting

Pre-Deployment Checklist

Infrastructure Requirements

  • Kubernetes cluster (AKS) or VM infrastructure ready
  • PostgreSQL 16+ with TimescaleDB extension
  • Redis cluster (for production cache/rate limiting)
  • Elasticsearch/OpenSearch cluster
  • Load balancer configured
  • SSL certificates provisioned
  • DNS records configured
  • Monitoring stack deployed (Prometheus, Grafana)

Configuration

  • Environment variables configured
  • Secrets stored in Key Vault
  • Database credentials verified
  • Redis connection string verified
  • RPC endpoint URLs verified
  • JWT secret configured (strong random value)

Code & Artifacts

  • All tests passing
  • Docker images built and tagged
  • Images pushed to container registry
  • Database migrations reviewed
  • Rollback plan documented

Environment Setup

1. Set Environment Variables

# Database
export DB_HOST=postgres.example.com
export DB_PORT=5432
export DB_USER=explorer
export DB_PASSWORD=<from-key-vault>
export DB_NAME=explorer

# Redis (for production)
export REDIS_URL=redis://redis.example.com:6379

# RPC
export RPC_URL=https://rpc.d-bis.org
export WS_URL=wss://rpc.d-bis.org

# Application
export CHAIN_ID=138
export PORT=8080
export JWT_SECRET=<strong-random-secret>

# Optional
export LOG_LEVEL=info
export ENABLE_METRICS=true
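
Before moving on, it can help to confirm that none of the variables above was left unset. The following is a minimal sketch in POSIX shell; `require_env` is a hypothetical helper name, not part of the project.

```shell
# Hypothetical helper: report every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    # Only pass trusted variable names here, since eval is used.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_env DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME RPC_URL WS_URL CHAIN_ID PORT JWT_SECRET || exit 1`. One common way to produce a strong random value for `JWT_SECRET` is `openssl rand -base64 48`; the length is an assumption, not a project requirement.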

2. Verify Secrets and Connectivity

# Test database connection
# (psql prompts for the password unless PGPASSWORD or ~/.pgpass is set)
psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_NAME -c "SELECT 1;"

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Test RPC endpoint
curl -X POST $RPC_URL \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
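
The `eth_blockNumber` call returns the block height as a 0x-prefixed hex string in the `result` field, which is hard to compare by eye. As a small sketch, the hypothetical helper `hex_to_dec` below converts it to decimal.

```shell
# Hypothetical helper: convert a 0x-prefixed hex quantity to decimal.
# printf accepts C-style integer constants, including hex.
hex_to_dec() {
  printf '%d\n' "$1"
}
```

Combined with `jq` (used elsewhere in this runbook), the RPC check could be read as `hex_to_dec "$(curl -s -X POST "$RPC_URL" -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' | jq -r .result)"`.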

Database Migration

1. Backup Existing Database

# Create backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d_%H%M%S).sql

# Verify backup
ls -lh backup_*.sql
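
Listing the file only shows that it exists. A slightly stronger check is to look for the completion marker that `pg_dump` writes at the end of a plain-format dump. This is a sketch; `verify_backup` is a hypothetical helper, and the 10-line window is an assumption chosen to tolerate trailing lines after the marker.

```shell
# Hypothetical helper: check that a plain-format pg_dump file is non-empty
# and that the "dump complete" marker appears near the end of the file.
verify_backup() {
  [ -s "$1" ] || { echo "backup missing or empty: $1" >&2; return 1; }
  tail -n 10 "$1" | grep -q 'PostgreSQL database dump complete' \
    || { echo "backup looks truncated: $1" >&2; return 1; }
}
```

For example, `verify_backup backup_YYYYMMDD_HHMMSS.sql` returns non-zero if the dump was cut short. It does not prove the dump restores cleanly; only a test restore does that.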

2. Run Migrations

cd explorer-monorepo/backend/database/migrations

# Review pending migrations
go run migrate.go --status

# Run migrations
go run migrate.go --up

# Verify migration
go run migrate.go --status

3. Verify Schema

psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\dt"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d blocks"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d transactions"

Service Deployment

Option 1: Kubernetes Deployment

1. Deploy API Server

kubectl apply -f k8s/api-server-deployment.yaml
kubectl apply -f k8s/api-server-service.yaml
kubectl apply -f k8s/api-server-ingress.yaml

# Verify deployment
kubectl get pods -l app=api-server
kubectl logs -f deployment/api-server

2. Deploy Indexer

kubectl apply -f k8s/indexer-deployment.yaml

# Verify deployment
kubectl get pods -l app=indexer
kubectl logs -f deployment/indexer

3. Rolling Update

# Update image
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.1.0

# Monitor rollout
kubectl rollout status deployment/api-server

# Rollback if needed
kubectl rollout undo deployment/api-server
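
The three rolling-update steps above can be combined so that a failed rollout reverts automatically. This is a sketch under stated assumptions: `deploy_image` is a hypothetical wrapper, the 300-second timeout is arbitrary, and the `KUBECTL` override exists only to allow a dry run.

```shell
# Hypothetical wrapper: update the image, wait for the rollout, undo on failure.
# Arguments: deployment name, container name, image reference.
deploy_image() {
  deployment=$1 container=$2 image=$3
  kubectl_bin=${KUBECTL:-kubectl}   # override for dry runs; defaults to kubectl

  "$kubectl_bin" set image "deployment/$deployment" "$container=$image" || return 1

  # rollout status exits non-zero if the rollout does not finish in time.
  if ! "$kubectl_bin" rollout status "deployment/$deployment" --timeout=300s; then
    echo "rollout failed, reverting $deployment" >&2
    "$kubectl_bin" rollout undo "deployment/$deployment"
    return 1
  fi
}
```

With the names used in this runbook, the call would be `deploy_image api-server api-server registry.example.com/explorer-api:v1.1.0`.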

Option 2: Docker Compose Deployment

cd explorer-monorepo/deployment

# Start services
docker-compose up -d

# Verify services
docker-compose ps
docker-compose logs -f api-server

Health Checks

1. API Health Endpoint

# Check health
curl https://api.d-bis.org/health

# Expected response
{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00Z",
  "database": "connected"
}
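
Right after a deployment the endpoint may fail for a few seconds while pods start, so a single `curl` can give a false alarm. A minimal retry sketch follows; `retry` is a hypothetical helper and the attempt count and delay are up to the operator.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1 delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 3 curl -fsS https://api.d-bis.org/health` polls the endpoint up to ten times, three seconds apart; `-f` makes `curl` exit non-zero on HTTP error responses.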

2. Service Health

# Kubernetes
kubectl get pods
kubectl describe pod <pod-name>

# Docker
docker ps
docker inspect <container-id>

3. Database Connectivity

# From API server
curl -s https://api.d-bis.org/health | jq .database

# Direct check
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT COUNT(*) FROM blocks;"

4. Redis Connectivity

# Test Redis
redis-cli -u $REDIS_URL ping

# Check cache stats
redis-cli -u $REDIS_URL INFO stats

Rollback Procedures

Quick Rollback (Kubernetes)

# Rollback to previous version
kubectl rollout undo deployment/api-server
kubectl rollout undo deployment/indexer

# Verify rollback
kubectl rollout status deployment/api-server

Database Rollback

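# Restore from backup
# Note: a plain-format dump does not drop existing objects unless it was taken
# with --clean, so restore into an empty database or expect "already exists" errors.
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql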

# Or rollback migrations
cd explorer-monorepo/backend/database/migrations
go run migrate.go --down 1

Full Rollback

# 1. Stop new services
kubectl scale deployment/api-server --replicas=0
kubectl scale deployment/indexer --replicas=0

# 2. Restore database
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql

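# 3. Start previous version
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.0.0
kubectl scale deployment/api-server --replicas=3

# 4. Bring the indexer back (it was scaled to 0 in step 1).
#    Run the undo only if the indexer image was changed in this release.
kubectl rollout undo deployment/indexer
kubectl scale deployment/indexer --replicas=<previous-replica-count>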

Post-Deployment Verification

1. Functional Tests

# Test Track 1 endpoints (public)
curl https://api.d-bis.org/api/v1/track1/blocks/latest

# Test search
curl "https://api.d-bis.org/api/v1/search?q=1000"

# Test health
curl https://api.d-bis.org/health

2. Performance Tests

# Load test
ab -n 1000 -c 10 https://api.d-bis.org/api/v1/track1/blocks/latest

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s https://api.d-bis.org/api/v1/track1/blocks/latest
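
The response-time check above reads its output template from `curl-format.txt`, which this runbook does not define. One possible version is sketched below; the choice of fields is an assumption, though each `%{...}` name is a standard `curl --write-out` variable.

```shell
# Write a sample curl-format.txt into the current directory.
# Each line ends with a literal \n escape, which curl turns into a newline.
cat > curl-format.txt <<'EOF'
time_namelookup:    %{time_namelookup}s\n
time_connect:       %{time_connect}s\n
time_appconnect:    %{time_appconnect}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:         %{time_total}s\n
EOF
```

Here `time_starttransfer` is the time to first byte and `time_total` is the full request duration, which are usually the two values worth comparing before and after a deployment.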

3. Monitoring

  • Check Grafana dashboards
  • Verify Prometheus metrics
  • Check error rates
  • Monitor response times
  • Check database connection pool
  • Verify Redis cache hit rate
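
For the last item, the hit rate can be derived from the `keyspace_hits` and `keyspace_misses` counters in `INFO stats`. The sketch below assumes those two fields are present; `cache_hit_rate` is a hypothetical helper that reads the `INFO` output on stdin.

```shell
# Hypothetical helper: read `INFO stats` output on stdin and print the
# cache hit rate as a percentage, or "n/a" if there were no lookups yet.
cache_hit_rate() {
  # redis-cli emits CRLF line endings; strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hits / total
    }'
}
```

Usage: `redis-cli -u $REDIS_URL INFO stats | cache_hit_rate`. The counters are cumulative since the last Redis restart, so compare readings over time rather than relying on a single value.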

Troubleshooting

Common Issues

1. Database Connection Errors

Symptoms: 500 errors, "database connection failed"

Resolution:

# Check database status
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"

# Check connection pool
# Review database/migrations for connection pool settings

# Restart service
kubectl rollout restart deployment/api-server

2. Redis Connection Errors

Symptoms: Cache misses, rate limiting not working

Resolution:

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Check Redis logs
kubectl logs -l app=redis

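# Fallback to in-memory (temporary): remove REDIS_URL from the environment.
# Assuming the variable is set directly on the deployment, a trailing "-" unsets it:
kubectl set env deployment/api-server REDIS_URL-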

3. High Memory Usage

Symptoms: OOM kills, slow responses

Resolution:

# Check memory usage
kubectl top pods

# Increase memory limits
kubectl set resources deployment/api-server --limits=memory=2Gi

# Review cache TTL settings

4. Slow Response Times

Symptoms: High latency, timeout errors

Resolution:

# Check database query performance
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "EXPLAIN ANALYZE SELECT * FROM blocks LIMIT 10;"

# Check indexer lag
curl https://api.d-bis.org/api/v1/track2/stats

# Review connection pool settings

Emergency Procedures

Service Outage

  1. Immediate Actions:

    • Check service status: kubectl get pods
    • Check logs: kubectl logs -f deployment/api-server
    • Check database: psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"
    • Check Redis: redis-cli -u $REDIS_URL ping
  2. Quick Recovery:

    • Restart services: kubectl rollout restart deployment/api-server
    • Scale up: kubectl scale deployment/api-server --replicas=5
    • Rollback if needed: kubectl rollout undo deployment/api-server
  3. Communication:

    • Update status page
    • Notify team via Slack/email
    • Document incident

Data Corruption

  1. Immediate Actions:

    • Stop writes (API server and indexer): kubectl scale deployment/api-server deployment/indexer --replicas=0
    • Backup current state: pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > emergency_backup.sql
  2. Recovery:

    • Restore from last known good backup
    • Verify data integrity
    • Resume services

Maintenance Windows

Scheduled Maintenance

  1. Pre-Maintenance:

    • Notify users 24 hours in advance
    • Create maintenance mode flag
    • Prepare rollback plan
  2. During Maintenance:

    • Enable maintenance mode
    • Perform updates
    • Run health checks
  3. Post-Maintenance:

    • Disable maintenance mode
    • Verify all services
    • Monitor for issues

Contact Information

  • On-Call Engineer: Check PagerDuty
  • Slack Channel: #explorer-deployments
  • Emergency: [Emergency Contact]

Document Version: 1.0.0
Last Reviewed: $(date)
Next Review: $(date -d "+3 months")