Files
proxmox/docs/06-besu/STABILITY_REMEDIATION_EXECUTION_PLAN.md
defiQUG fbda1b4beb
Some checks failed
Deploy to Phoenix / deploy (push) Has been cancelled
docs: Ledger Live integration, contract deploy learnings, NEXT_STEPS updates
- ADD_CHAIN138_TO_LEDGER_LIVE: Ledger form done; public code review repo bis-innovations/LedgerLive; init/push commands
- CONTRACT_DEPLOYMENT_RUNBOOK: Chain 138 gas price 1 gwei, 36-addr check, TransactionMirror workaround
- CONTRACT_*: AddressMapper, MirrorManager deployed 2026-02-12; 36-address on-chain check
- NEXT_STEPS_FOR_YOU: Ledger done; steps completable now (no LAN); run-completable-tasks-from-anywhere
- MASTER_INDEX, OPERATOR_OPTIONAL, SMART_CONTRACTS_INVENTORY_SIMPLE: updates
- LEDGER_BLOCKCHAIN_INTEGRATION_COMPLETE: bis-innovations/LedgerLive reference

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:46:57 -08:00

3.6 KiB

Stability Remediation Execution Plan

Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation


Date: 2025-01-20
Status: 📋 READY FOR EXECUTION
Priority: 🔴 CRITICAL


Immediate Execution Steps

Step 1: Deploy Enhanced Systemd Services (30 minutes)

Action: Update all validator systemd services with enhanced restart policies

# For each validator, update systemd service
# Use enhanced-besu-validator.service as template
# Deploy check-validator-prerequisites.sh and verify-validator-started.sh

Expected Outcome: Validators auto-restart on failure with health checks


Step 2: Deploy Configuration Auto-Fix (15 minutes)

Action: Run auto-fix script on all validators

cd /home/intlc/projects/proxmox
./scripts/monitoring/auto-fix-validator-config.sh

Expected Outcome: All validators have consistent, correct configuration


Step 3: Deploy Health Monitoring (30 minutes)

Action: Set up health checks on all validators

# Deploy monitoring scripts
./scripts/monitoring/setup-validator-monitoring.sh

# Test health checks
./scripts/monitoring/check-validator-health.sh

Expected Outcome: Continuous health monitoring active


Step 4: Deploy Block Production Monitor (15 minutes)

Action: Start continuous block production monitoring

# Start as background service
nohup ./scripts/monitoring/monitor-block-production.sh > /var/log/block-monitor.log 2>&1 &

Expected Outcome: Continuous block production monitoring with alerts


Step 5: Deploy Transaction Pool Monitor (15 minutes)

Action: Start transaction pool monitoring

# Start as background service
nohup ./scripts/monitoring/monitor-transaction-pool.sh > /var/log/txpool-monitor.log 2>&1 &

Expected Outcome: Continuous transaction pool monitoring


Step 6: Deploy Master Monitor (15 minutes)

Action: Start master stability monitor

# Start as systemd service or background process
nohup ./scripts/monitoring/master-stability-monitor.sh > /var/log/stability-monitor.log 2>&1 &

Expected Outcome: Comprehensive stability monitoring active


Validation Steps

After Deployment

  1. Verify Health Checks:

    ./scripts/monitoring/check-validator-health.sh
    
  2. Verify Block Production:

    ./scripts/monitoring/monitor-block-production.sh --once
    
  3. Verify Configuration:

    ./scripts/monitoring/validate-all-configs.sh
    
  4. Check Monitoring Logs:

    tail -f /var/log/block-monitor.log
    tail -f /var/log/txpool-monitor.log
    tail -f /var/log/stability-monitor.log
    

Success Criteria

Immediate (Day 1)

  • All validators have enhanced systemd services
  • Auto-fix scripts deployed
  • Health monitoring active
  • Block production monitoring active

Short-term (Week 1)

  • All monitoring scripts running
  • Alerting configured
  • No configuration issues
  • Block production stable

Long-term (Month 1)

  • 99.9% block production uptime
  • < 5 minute MTTR
  • Automated recovery working
  • Comprehensive monitoring coverage

Maintenance Schedule

Daily

  • Review monitoring logs
  • Check for alerts
  • Verify block production

Weekly

  • Run comprehensive health audit
  • Review configuration consistency
  • Update documentation

Monthly

  • Performance review
  • Process improvements
  • Capacity planning

Status: Ready for immediate execution
Estimated Time: 2-3 hours for full deployment
Priority: Execute immediately