Service State Machine
Last Updated: 2025-01-20
Document Version: 1.0
Status: Active Documentation
Overview
This document defines the state machine for services in the infrastructure, including valid states, transitions, and recovery actions.
Service State Diagram
stateDiagram-v2
[*] --> Stopped
Stopped --> Starting: start()
Starting --> Running: initialized successfully
Starting --> Error: initialization failed
Running --> Stopping: stop()
Running --> Error: runtime error
Stopping --> Stopped: stopped successfully
Stopping --> Error: stop failed
Error --> Stopped: reset()
Error --> Starting: restart()
Running --> Restarting: restart()
Restarting --> Starting: restart initiated
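The diagram can also be encoded as a lookup table, which makes invalid transitions easy to reject. The following is a minimal Python sketch, not the project's implementation: the names `State`, `TRANSITIONS`, `INITIAL_STATE`, and `next_state` are hypothetical, and the event strings are copied from the diagram's edge labels.

```python
from enum import Enum


class State(Enum):
    STOPPED = "Stopped"
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPING = "Stopping"
    ERROR = "Error"
    RESTARTING = "Restarting"


# (current state, event) -> next state, one entry per edge in the diagram.
TRANSITIONS = {
    (State.STOPPED, "start()"): State.STARTING,
    (State.STARTING, "initialized successfully"): State.RUNNING,
    (State.STARTING, "initialization failed"): State.ERROR,
    (State.RUNNING, "stop()"): State.STOPPING,
    (State.RUNNING, "runtime error"): State.ERROR,
    (State.STOPPING, "stopped successfully"): State.STOPPED,
    (State.STOPPING, "stop failed"): State.ERROR,
    (State.ERROR, "reset()"): State.STOPPED,
    (State.ERROR, "restart()"): State.STARTING,
    (State.RUNNING, "restart()"): State.RESTARTING,
    (State.RESTARTING, "restart initiated"): State.STARTING,
}

INITIAL_STATE = State.STOPPED  # corresponds to "[*] --> Stopped"


def next_state(current: State, event: str) -> State:
    """Return the state reached from `current` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"invalid transition: {current.value} on {event!r}"
        ) from None
```

Note that `restart()` appears twice with different outcomes: from Error it leads directly to Starting, while from Running it passes through Restarting first.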
State Definitions
Stopped
Description: Service is not running
Characteristics:
- No processes active
- No resources allocated
- Configuration may be present
Entry Conditions:
- Initial state
- After successful stop
- After reset from error
Exit Conditions:
- Service started (start())
Starting
Description: Service is initializing
Characteristics:
- Process starting
- Configuration loading
- Resources being allocated
- Network connections being established
Entry Conditions:
- Service start requested
- Restart initiated
Exit Conditions:
- Initialization successful → Running
- Initialization failed → Error
Typical Duration:
- 10-60 seconds (depending on service)
Running
Description: Service is operational
Characteristics:
- Process active
- Handling requests
- Monitoring active
- Health checks passing
Entry Conditions:
- Initialization completed successfully (from Starting)
Exit Conditions:
- Stop requested → Stopping
- Runtime error → Error
- Restart requested → Restarting
Verification:
- Health check endpoint responding
- Service logs showing normal operation
- Metrics indicating activity
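In practice, confirming the Running state usually means polling until a health check passes. The sketch below assumes the health check is supplied as a callable, since this document does not name an endpoint. The function `wait_until_healthy`, its parameters, and the default timeout and poll interval are all hypothetical; the 60-second default simply mirrors the upper end of the typical Starting duration.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 60.0,   # assumed: upper end of the typical Starting duration
    interval_s: float = 2.0,   # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the service can be treated as Running, False if the
    caller should treat it as stuck in Starting (or move it to Error).
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            # A failing probe (e.g. connection refused) counts as "not healthy yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the function can be exercised with a fake clock; callers would normally leave them at their defaults.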
Stopping
Description: Service is shutting down
Characteristics:
- Graceful shutdown in progress
- Finishing current requests
- Releasing resources
- Closing connections
Entry Conditions:
- Stop requested while Running (stop())
Exit Conditions:
- Shutdown successful → Stopped
- Shutdown failed → Error
Typical Duration:
- 5-30 seconds (graceful shutdown)
Error
Description: Service is in an error state
Characteristics:
- Service not functioning correctly
- Error logs present
- May be partially running
- Requires intervention
Entry Conditions:
- Initialization failed
- Runtime error occurred
- Stop operation failed
Exit Conditions:
- Reset requested → Stopped
- Restart requested → Starting
Recovery Actions:
- Check error logs
- Verify configuration
- Check dependencies
- Restart service
Restarting
Description: Service restart in progress
Characteristics:
- Stop operation initiated
- Will transition to Starting after stop
Entry Conditions:
- Restart requested while Running
Exit Conditions:
- Stop complete → Starting
State Transitions
Transition: start()
From: Stopped
To: Starting
Action: Start service process
Verification: Process started, logs show initialization
Transition: initialized successfully
From: Starting
To: Running
Condition: All initialization steps completed
Verification: Health check passes, service responding
Transition: initialization failed
From: Starting
To: Error
Condition: Initialization error occurred
Action: Log error, stop process
Recovery: Check logs, fix configuration, restart
Transition: stop()
From: Running
To: Stopping
Action: Initiate graceful shutdown
Verification: Shutdown process started
Transition: stopped successfully
From: Stopping
To: Stopped
Condition: Shutdown completed
Verification: Process terminated, resources released
Transition: stop failed
From: Stopping
To: Error
Condition: Shutdown error occurred
Action: Force stop if needed
Recovery: Manual intervention may be required
Transition: runtime error
From: Running
To: Error
Condition: Runtime error detected
Action: Log error, attempt recovery
Recovery: Check logs, fix issue, restart
Transition: reset()
From: Error
To: Stopped
Action: Reset service to clean state
Verification: Service stopped, error state cleared
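Transition: restart() (from Error)
From: Error
To: Starting
Action: Restart service from error state
Verification: Service starting, initialization in progress
Transition: restart() (from Running)
From: Running
To: Restarting
Action: Initiate stop as the first phase of the restart
Verification: Stop operation in progress
Transition: restart initiated
From: Restarting
To: Starting
Condition: Stop complete
Verification: Service starting, initialization in progress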
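The transitions in the state diagram can be sanity-checked mechanically: every state should be reachable from the initial state, and every state should have some path back to Stopped. The sketch below is one way to run that check, assuming the edge list is copied by hand from the diagram; `EDGES`, `reachable`, `unreachable`, and `dead_ends` are hypothetical names.

```python
from collections import deque

# Edges copied from the state diagram: (from, to).
EDGES = [
    ("Stopped", "Starting"),
    ("Starting", "Running"),
    ("Starting", "Error"),
    ("Running", "Stopping"),
    ("Running", "Error"),
    ("Stopping", "Stopped"),
    ("Stopping", "Error"),
    ("Error", "Stopped"),
    ("Error", "Starting"),
    ("Running", "Restarting"),
    ("Restarting", "Starting"),
]


def reachable(start: str, edges: list[tuple[str, str]]) -> set[str]:
    """Breadth-first search: every state reachable from `start`, including itself."""
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


ALL_STATES = {state for edge in EDGES for state in edge}

# States that can never be entered starting from the initial state.
unreachable = ALL_STATES - reachable("Stopped", EDGES)
# States from which the service can never get back to Stopped.
dead_ends = {s for s in ALL_STATES if "Stopped" not in reachable(s, EDGES)}
```

Both sets should be empty for a well-formed service state machine; a non-empty result after editing the diagram suggests a missing or mistyped transition.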
Service-Specific State Machines
Besu Node States
Additional States:
- Syncing: Blockchain synchronization in progress
- Synced: Blockchain fully synchronized
- Consensus: Participating in consensus (validators)
State Flow:
Starting → Syncing → Synced → Running (with Consensus if validator)
Cloudflare Tunnel States
Additional States:
- Connecting: Establishing tunnel connection
- Connected: Tunnel connected to Cloudflare
- Reconnecting: Reconnecting after disconnection
State Flow:
Starting → Connecting → Connected → Running
Running → Reconnecting → Connected → Running
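The service-specific flows can be used to validate an observed sequence of states, for example one reconstructed from logs. This is a sketch under assumptions: the dictionaries cover only the arrows in the State Flow lines (generic transitions such as those into Error are deliberately left out), and `BESU_FLOW`, `CLOUDFLARE_TUNNEL_FLOW`, and `first_invalid_step` are hypothetical names.

```python
# Allowed successor states, copied from the "State Flow" lines.
BESU_FLOW = {
    "Starting": {"Syncing"},
    "Syncing": {"Synced"},
    "Synced": {"Running"},
}

CLOUDFLARE_TUNNEL_FLOW = {
    "Starting": {"Connecting"},
    "Connecting": {"Connected"},
    "Connected": {"Running"},
    "Running": {"Reconnecting"},
    "Reconnecting": {"Connected"},
}


def first_invalid_step(observed, flow):
    """Return the first (from, to) pair in `observed` not allowed by `flow`, or None."""
    for src, dst in zip(observed, observed[1:]):
        if dst not in flow.get(src, set()):
            return (src, dst)
    return None
```

For instance, a Besu node that reports Synced without ever passing through Syncing would be flagged at the pair `("Starting", "Synced")`.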
Monitoring and Alerts
State Monitoring
Metrics to Track:
- Current state
- State transition frequency
- Time in each state
- Error state occurrences
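The four metrics listed here can all be derived from a single stream of state changes. A minimal sketch of such a tracker follows; the class `StateMetrics` and its attribute names are hypothetical, and a real deployment would more likely export these values to an existing metrics system.

```python
import time
from collections import Counter, defaultdict


class StateMetrics:
    """Tracks current state, transition counts, and cumulative time per state."""

    def __init__(self, initial="Stopped", clock=time.monotonic):
        self._clock = clock
        self.current = initial
        self._entered_at = clock()
        self.transitions = Counter()             # (from, to) -> count
        self.time_in_state = defaultdict(float)  # state -> seconds
        self.error_entries = 0                   # times the Error state was entered

    def record(self, new_state):
        """Record a transition from the current state to `new_state`."""
        now = self._clock()
        self.time_in_state[self.current] += now - self._entered_at
        self.transitions[(self.current, new_state)] += 1
        if new_state == "Error":
            self.error_entries += 1
        self.current = new_state
        self._entered_at = now
```

The `clock` parameter defaults to a monotonic clock so that wall-clock adjustments do not distort the time-in-state totals.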
Alerts:
- Service in Error state > 5 minutes
- Frequent state transitions (thrashing)
- Service stuck in Starting > 10 minutes
- Service in Stopping > 2 minutes
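The alert conditions above translate directly into a small rule check. In this sketch the three time limits come from the list; the thrashing threshold (more than 10 transitions in 10 minutes) is an assumption, since this document does not define one. The names `MAX_SECONDS_IN_STATE`, `THRASH_MAX_TRANSITIONS`, `THRASH_WINDOW_SECONDS`, and `evaluate_alerts` are hypothetical.

```python
# Maximum time (seconds) a service may stay in a state before alerting,
# taken from the alert list.
MAX_SECONDS_IN_STATE = {
    "Error": 5 * 60,
    "Starting": 10 * 60,
    "Stopping": 2 * 60,
}

# Assumed thrashing threshold; the document does not define one.
THRASH_MAX_TRANSITIONS = 10
THRASH_WINDOW_SECONDS = 10 * 60


def evaluate_alerts(state, seconds_in_state, transition_times, now):
    """Return a list of alert messages for one service.

    `transition_times` holds the timestamps (in seconds) of past transitions;
    `now` is the current time on the same clock.
    """
    alerts = []
    limit = MAX_SECONDS_IN_STATE.get(state)
    if limit is not None and seconds_in_state > limit:
        alerts.append(f"{state} for more than {limit // 60} minutes")
    recent = [t for t in transition_times if now - t <= THRASH_WINDOW_SECONDS]
    if len(recent) > THRASH_MAX_TRANSITIONS:
        window_min = THRASH_WINDOW_SECONDS // 60
        alerts.append(f"thrashing: {len(recent)} transitions in {window_min} minutes")
    return alerts
```

States without an entry in `MAX_SECONDS_IN_STATE`, such as Running, never trigger a duration alert.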
Recovery Procedures
From Error State
Step 1: Diagnose
# Check service logs
journalctl -u <service> -n 100
# Check service status
systemctl status <service>
# Check error messages
journalctl -u <service> | grep -i error
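When diagnosing from a script, the unit state reported by systemd can be mapped onto the states in this document. The mapping below is an assumption rather than something this document defines: `systemctl is-active` prints values such as `active`, `inactive`, `failed`, `activating`, and `deactivating`, and there is no direct equivalent of Restarting. `SYSTEMD_TO_STATE`, `state_from_active_state`, and `current_state` are hypothetical names.

```python
import subprocess

# Assumed mapping from systemd's ActiveState values to the states in this
# document; the document itself does not define this correspondence.
SYSTEMD_TO_STATE = {
    "inactive": "Stopped",
    "activating": "Starting",
    "active": "Running",
    "reloading": "Running",
    "deactivating": "Stopping",
    "failed": "Error",
}


def state_from_active_state(active_state: str) -> str:
    """Translate an ActiveState string into a document state ("Unknown" if unmapped)."""
    return SYSTEMD_TO_STATE.get(active_state.strip(), "Unknown")


def current_state(service: str) -> str:
    """Query systemd for `service` and map the result to a document state."""
    result = subprocess.run(
        ["systemctl", "is-active", service],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for anything but "active"
    )
    return state_from_active_state(result.stdout)
```

A result of "Error" here corresponds to the situation this procedure addresses; "Unknown" suggests checking `systemctl status <service>` by hand.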
Step 2: Fix Issue
- Fix configuration errors
- Resolve dependency issues
- Address resource constraints
- Fix network problems
Step 3: Recover
# Option 1: Restart
systemctl restart <service>
# Option 2: Reset and start
systemctl stop <service>
# Fix the issues identified in Step 2
systemctl reset-failed <service>  # clear systemd's failed state, if set
systemctl start <service>
Related Documentation
- OPERATIONAL_RUNBOOKS.md ⭐⭐ - Operational procedures
- TROUBLESHOOTING_FAQ.md ⭐⭐⭐ - Troubleshooting guide
- BESU_NODE_STARTUP_SEQUENCE.md ⭐ - Besu startup sequence
Review Cycle: Quarterly