Service State Machine

Last Updated: 2025-01-20
Document Version: 1.0
Status: Active Documentation


Overview

This document defines the state machine for services in the infrastructure, including valid states, transitions, and recovery actions.


Service State Diagram

stateDiagram-v2
    [*] --> Stopped
    Stopped --> Starting: start()
    Starting --> Running: initialized successfully
    Starting --> Error: initialization failed
    Running --> Stopping: stop()
    Running --> Error: runtime error
    Stopping --> Stopped: stopped successfully
    Stopping --> Error: stop failed
    Error --> Stopped: reset()
    Error --> Starting: restart()
    Running --> Restarting: restart()
    Restarting --> Starting: restart initiated
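
The diagram can also be read as a lookup table keyed by current state and event. The following is a minimal sketch of that reading, not an existing implementation: the `State` enum, the `TRANSITIONS` table, and the `next_state` helper are hypothetical names, and the event strings are simply the edge labels from the diagram.

```python
from enum import Enum


class State(Enum):
    STOPPED = "Stopped"
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPING = "Stopping"
    ERROR = "Error"
    RESTARTING = "Restarting"


# (current state, event) -> next state, copied edge for edge from the diagram.
TRANSITIONS = {
    (State.STOPPED, "start()"): State.STARTING,
    (State.STARTING, "initialized successfully"): State.RUNNING,
    (State.STARTING, "initialization failed"): State.ERROR,
    (State.RUNNING, "stop()"): State.STOPPING,
    (State.RUNNING, "runtime error"): State.ERROR,
    (State.STOPPING, "stopped successfully"): State.STOPPED,
    (State.STOPPING, "stop failed"): State.ERROR,
    (State.ERROR, "reset()"): State.STOPPED,
    (State.ERROR, "restart()"): State.STARTING,
    (State.RUNNING, "restart()"): State.RESTARTING,
    (State.RESTARTING, "restart initiated"): State.STARTING,
}

# The diagram's [*] arrow points at Stopped.
INITIAL_STATE = State.STOPPED


def next_state(current: State, event: str) -> State:
    """Return the state reached from `current` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"invalid transition: {current.value} on {event!r}"
        ) from None
```

Any pair missing from the table, such as `stop()` while Stopped, is treated as invalid rather than silently ignored, which makes unexpected events visible.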

State Definitions

Stopped

Description: Service is not running

Characteristics:

  • No processes active
  • No resources allocated
  • Configuration may be present

Entry Conditions:

  • Initial state
  • After successful stop
  • After reset from error

Exit Conditions:

  • Service started (start())

Starting

Description: Service is initializing

Characteristics:

  • Process starting
  • Configuration loading
  • Resources being allocated
  • Network connections being established

Entry Conditions:

  • Service start requested
  • Restart initiated

Exit Conditions:

  • Initialization successful → Running
  • Initialization failed → Error

Typical Duration:

  • 10-60 seconds (depending on service)

Running

Description: Service is operational

Characteristics:

  • Process active
  • Handling requests
  • Monitoring active
  • Health checks passing

Entry Conditions:

  • Initialization completed successfully (from Starting)

Exit Conditions:

  • Stop requested → Stopping
  • Runtime error → Error
  • Restart requested → Restarting

Verification:

  • Health check endpoint responding
  • Service logs showing normal operation
  • Metrics indicating activity

Stopping

Description: Service is shutting down

Characteristics:

  • Graceful shutdown in progress
  • Finishing current requests
  • Releasing resources
  • Closing connections

Entry Conditions:

  • Stop requested while Running

Exit Conditions:

  • Shutdown successful → Stopped
  • Shutdown failed → Error

Typical Duration:

  • 5-30 seconds (graceful shutdown)

Error

Description: Service has failed and is not operating normally

Characteristics:

  • Service not functioning correctly
  • Error logs present
  • May be partially running
  • Requires intervention

Entry Conditions:

  • Initialization failed
  • Runtime error occurred
  • Stop operation failed

Exit Conditions:

  • Reset requested → Stopped
  • Restart requested → Starting

Recovery Actions:

  • Check error logs
  • Verify configuration
  • Check dependencies
  • Restart service

Restarting

Description: Service restart in progress

Characteristics:

  • Stop operation initiated
  • Will transition to Starting after stop

Entry Conditions:

  • Restart requested while Running

Exit Conditions:

  • Stop complete → Starting

State Transitions

Transition: start()

From: Stopped
To: Starting
Action: Start service process
Verification: Process started, logs show initialization


Transition: initialized successfully

From: Starting
To: Running
Condition: All initialization steps completed
Verification: Health check passes, service responding


Transition: initialization failed

From: Starting
To: Error
Condition: Initialization error occurred
Action: Log error, stop process
Recovery: Check logs, fix configuration, restart


Transition: stop()

From: Running
To: Stopping
Action: Initiate graceful shutdown
Verification: Shutdown process started


Transition: stopped successfully

From: Stopping
To: Stopped
Condition: Shutdown completed
Verification: Process terminated, resources released


Transition: stop failed

From: Stopping
To: Error
Condition: Shutdown error occurred
Action: Force stop if needed
Recovery: Manual intervention may be required


Transition: runtime error

From: Running
To: Error
Condition: Runtime error detected
Action: Log error, attempt recovery
Recovery: Check logs, fix issue, restart


Transition: reset()

From: Error
To: Stopped
Action: Reset service to clean state
Verification: Service stopped, error state cleared


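Transition: restart() (from Error)

From: Error
To: Starting
Action: Restart service from error state
Verification: Service starting, initialization in progress


Transition: restart() (from Running)

From: Running
To: Restarting
Action: Initiate stop as the first phase of the restart
Verification: Stop operation started


Transition: restart initiated

From: Restarting
To: Starting
Condition: Stop complete
Verification: Service starting, initialization in progress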
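
The transition records in this section share a small set of fields (From, To, and optionally Condition, Action, Verification, Recovery), so they can be kept as structured data and queried. The sketch below is one possible encoding: the `Transition` dataclass and the `recovery_notes` helper are hypothetical names. The two restart edges through Restarting are taken from the state diagram and listed with source and target only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transition:
    name: str
    source: str
    target: str
    condition: Optional[str] = None
    action: Optional[str] = None
    verification: Optional[str] = None
    recovery: Optional[str] = None


TRANSITIONS = [
    Transition("start()", "Stopped", "Starting",
               action="Start service process",
               verification="Process started, logs show initialization"),
    Transition("initialized successfully", "Starting", "Running",
               condition="All initialization steps completed",
               verification="Health check passes, service responding"),
    Transition("initialization failed", "Starting", "Error",
               condition="Initialization error occurred",
               action="Log error, stop process",
               recovery="Check logs, fix configuration, restart"),
    Transition("stop()", "Running", "Stopping",
               action="Initiate graceful shutdown",
               verification="Shutdown process started"),
    Transition("stopped successfully", "Stopping", "Stopped",
               condition="Shutdown completed",
               verification="Process terminated, resources released"),
    Transition("stop failed", "Stopping", "Error",
               condition="Shutdown error occurred",
               action="Force stop if needed",
               recovery="Manual intervention may be required"),
    Transition("runtime error", "Running", "Error",
               condition="Runtime error detected",
               action="Log error, attempt recovery",
               recovery="Check logs, fix issue, restart"),
    Transition("reset()", "Error", "Stopped",
               action="Reset service to clean state",
               verification="Service stopped, error state cleared"),
    Transition("restart()", "Error", "Starting",
               action="Restart service from error state",
               verification="Service starting, initialization in progress"),
    # Edges taken from the state diagram.
    Transition("restart()", "Running", "Restarting"),
    Transition("restart initiated", "Restarting", "Starting"),
]


def recovery_notes(previous_state: str) -> List[str]:
    """Recovery guidance for a service that entered Error from `previous_state`."""
    return [
        t.recovery
        for t in TRANSITIONS
        if t.source == previous_state and t.target == "Error" and t.recovery
    ]
```

For example, `recovery_notes("Stopping")` returns the guidance attached to the "stop failed" transition.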


Service-Specific State Machines

Besu Node States

Additional States:

  • Syncing: Blockchain synchronization in progress
  • Synced: Blockchain fully synchronized
  • Consensus: Participating in consensus (validators)

State Flow:

Starting → Syncing → Synced → Running (with Consensus if validator)
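
One way to tell Syncing from Synced is the standard `eth_syncing` JSON-RPC method, which returns `false` when the node is not syncing and an object with hex-encoded block numbers while it is. The sketch below only interprets the `result` field of such a response and makes no network call. The function names are hypothetical, and a `false` result alone does not prove the node has peers, so a real check would probably also look at the peer count.

```python
from typing import Union

# Request body an operator could POST to the node's JSON-RPC endpoint.
ETH_SYNCING_REQUEST = {
    "jsonrpc": "2.0",
    "method": "eth_syncing",
    "params": [],
    "id": 1,
}


def besu_sync_state(result: Union[bool, dict]) -> str:
    """Map the `result` field of an eth_syncing response to Syncing or Synced."""
    if result is False:
        return "Synced"
    if isinstance(result, dict):
        return "Syncing"
    raise ValueError(f"unexpected eth_syncing result: {result!r}")


def blocks_behind(result: dict) -> int:
    """Blocks still to import, from the hex fields of a syncing response."""
    return int(result["highestBlock"], 16) - int(result["currentBlock"], 16)
```

With a response such as `{"startingBlock": "0x0", "currentBlock": "0x10", "highestBlock": "0x20"}`, `besu_sync_state` reports Syncing and `blocks_behind` reports 16 blocks remaining.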

Cloudflare Tunnel States

Additional States:

  • Connecting: Establishing tunnel connection
  • Connected: Tunnel connected to Cloudflare
  • Reconnecting: Reconnecting after disconnection

State Flow:

Starting → Connecting → Connected → Running
Running → Reconnecting → Connected → Running
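
The two flows above define five allowed steps, which makes it straightforward to check an observed sequence of tunnel states against them. This is an illustrative sketch only: `ALLOWED_EDGES` and `first_invalid_step` are hypothetical names, and how the observed states are collected is left open.

```python
from typing import List, Optional, Tuple

# Every consecutive pair that appears in the two state flows.
ALLOWED_EDGES = {
    ("Starting", "Connecting"),
    ("Connecting", "Connected"),
    ("Connected", "Running"),
    ("Running", "Reconnecting"),
    ("Reconnecting", "Connected"),
}


def first_invalid_step(observed: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first (from, to) pair not allowed by the flows, or None."""
    for pair in zip(observed, observed[1:]):
        if pair not in ALLOWED_EDGES:
            return pair
    return None
```

A sequence that jumps from Starting straight to Connected would be reported as `("Starting", "Connected")`, while a normal start followed by one reconnect passes.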

Monitoring and Alerts

State Monitoring

Metrics to Track:

  • Current state
  • State transition frequency
  • Time in each state
  • Error state occurrences
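
All four metrics can be derived from a time-ordered log of state changes. The sketch below assumes such a log exists as a list of `(timestamp_seconds, state)` pairs; that format and the `state_metrics` name are assumptions, not something the infrastructure is known to produce.

```python
from collections import Counter, defaultdict
from typing import List, Optional, Tuple


def state_metrics(log: List[Tuple[float, str]], now: float) -> dict:
    """Summarize a time-ordered log of (timestamp_seconds, state) entries."""
    time_in_state = defaultdict(float)
    entries = Counter()
    # Pair each entry with the next one; the last state lasts until `now`.
    ends: List[Tuple[float, Optional[str]]] = list(log[1:]) + [(now, None)]
    for (start, state), (end, _next_state) in zip(log, ends):
        time_in_state[state] += end - start
        entries[state] += 1
    return {
        "current_state": log[-1][1] if log else None,
        "transitions": max(len(log) - 1, 0),
        "time_in_state": dict(time_in_state),
        "error_occurrences": entries["Error"],
    }
```

Dividing `transitions` by the length of the observation window gives a transition frequency that can feed the thrashing alert.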

Alerts:

  • Service in Error state for more than 5 minutes
  • Frequent state transitions (thrashing)
  • Service stuck in Starting for more than 10 minutes
  • Service stuck in Stopping for more than 2 minutes
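
The time-based alerts reduce to a per-state threshold check. The sketch below uses the three thresholds from the list above. The thrashing limits (`THRASH_WINDOW_SECONDS`, `THRASH_MAX_TRANSITIONS`) are assumed placeholder values, since this document does not define how many transitions count as "frequent", and all function names are hypothetical.

```python
from typing import List, Optional

# Thresholds in seconds, taken from the alert list.
MAX_SECONDS_IN_STATE = {
    "Error": 5 * 60,
    "Starting": 10 * 60,
    "Stopping": 2 * 60,
}

# Assumed values: the document does not define "frequent".
THRASH_WINDOW_SECONDS = 10 * 60
THRASH_MAX_TRANSITIONS = 5


def stuck_alert(state: str, seconds_in_state: float) -> Optional[str]:
    """Return an alert message if the service has overstayed in `state`."""
    limit = MAX_SECONDS_IN_STATE.get(state)
    if limit is not None and seconds_in_state > limit:
        return f"{state} for more than {limit // 60} minutes"
    return None


def is_thrashing(transition_times: List[float], now: float) -> bool:
    """True if more transitions than allowed happened within the window."""
    recent = [t for t in transition_times if now - t <= THRASH_WINDOW_SECONDS]
    return len(recent) > THRASH_MAX_TRANSITIONS
```

States without a threshold, such as Running or Stopped, never raise a time-based alert in this sketch.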

Recovery Procedures

From Error State

Step 1: Diagnose

# Check service logs
journalctl -u <service> -n 100

# Check service status
systemctl status <service>

# Check error messages
journalctl -u <service> | grep -i error
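
When diagnosing, it can help to translate what systemd reports into the states defined in this document. The mapping below is an assumption, not something the document specifies: it pairs systemd's `ActiveState` values with the closest service state, treats `reloading` as Running, and has no counterpart for Restarting. The helper names are hypothetical, and `read_active_state` relies on `systemctl show --property=ActiveState --value`.

```python
import subprocess

# Assumed mapping from systemd ActiveState values to this document's states.
SYSTEMD_TO_SERVICE_STATE = {
    "inactive": "Stopped",
    "activating": "Starting",
    "active": "Running",
    "reloading": "Running",
    "deactivating": "Stopping",
    "failed": "Error",
}


def service_state(active_state: str) -> str:
    """Translate a systemd ActiveState string into a state from this document."""
    try:
        return SYSTEMD_TO_SERVICE_STATE[active_state.strip()]
    except KeyError:
        raise ValueError(f"unmapped ActiveState: {active_state!r}") from None


def read_active_state(unit: str) -> str:
    """Ask systemd for the unit's ActiveState (requires systemctl on the host)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState", "--value"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On a host with systemd, `service_state(read_active_state("<service>"))` would give the current state in this document's vocabulary.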

Step 2: Fix Issue

  • Fix configuration errors
  • Resolve dependency issues
  • Address resource constraints
  • Fix network problems

Step 3: Recover

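# Option 1: Restart in place
systemctl restart <service>

# Option 2: Reset and start
systemctl stop <service>
# Fix issues (configuration, dependencies, resources, network)
# Clear the unit's failed state before starting again
systemctl reset-failed <service>
systemctl start <service>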


Review Cycle: Quarterly