# Service State Machine

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation

---

## Overview

This document defines the state machine for services in the infrastructure, including valid states, transitions, and recovery actions.

---

## Service State Diagram

```mermaid
stateDiagram-v2
    [*] --> Stopped
    Stopped --> Starting: start()
    Starting --> Running: initialized successfully
    Starting --> Error: initialization failed
    Running --> Stopping: stop()
    Running --> Error: runtime error
    Stopping --> Stopped: stopped successfully
    Stopping --> Error: stop failed
    Error --> Stopped: reset()
    Error --> Starting: restart()
    Running --> Restarting: restart()
    Restarting --> Starting: restart initiated
```
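
The diagram above can also be read as a lookup from a (state, event) pair to the next state. The following is a minimal sketch of that lookup, not an implementation used by any service here. The helper name `next_state` and the shortened event labels (`init_ok`, `init_failed`, `stop_ok`, `stop_failed`, `runtime_error`, `restart_initiated`) are hypothetical stand-ins for the diagram's transition labels. Any pair not drawn in the diagram is rejected with a non-zero exit status.

```bash
# Hypothetical helper: print the next state for a (state, event) pair.
# usage: next_state STATE EVENT
next_state() {
    case "$1:$2" in
        Stopped:start)                 echo Starting ;;
        Starting:init_ok)              echo Running ;;
        Starting:init_failed)          echo Error ;;
        Running:stop)                  echo Stopping ;;
        Running:runtime_error)         echo Error ;;
        Running:restart)               echo Restarting ;;
        Stopping:stop_ok)              echo Stopped ;;
        Stopping:stop_failed)          echo Error ;;
        Error:reset)                   echo Stopped ;;
        Error:restart)                 echo Starting ;;
        Restarting:restart_initiated)  echo Starting ;;
        *)
            # Anything not in the diagram is an invalid transition.
            echo "invalid transition: $1 on '$2'" >&2
            return 1
            ;;
    esac
}

next_state Stopped start    # prints: Starting
next_state Error reset      # prints: Stopped
```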

---

## State Definitions

### Stopped

**Description:** Service is not running

**Characteristics:**
- No processes active
- No resources allocated
- Configuration may be present

**Entry Conditions:**
- Initial state
- After successful stop
- After reset from error

**Exit Conditions:**
- Service started (`start()`)

---

### Starting

**Description:** Service is initializing

**Characteristics:**
- Process starting
- Configuration loading
- Resources being allocated
- Network connections being established

**Entry Conditions:**
- Service start requested
- Restart initiated

**Exit Conditions:**
- Initialization successful → Running
- Initialization failed → Error

**Typical Duration:**
- 10-60 seconds (depending on service)
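
The exit conditions and typical duration above suggest polling a readiness check with a deadline: success means Starting → Running, and a timeout is treated as a failed initialization. The sketch below assumes a hypothetical helper named `wait_until`. The service name and health URL in the commented example are placeholders, not values from this document, and the 60-second limit is simply the upper end of the typical range. The function uses plain (global) variables so that it also runs under a POSIX `sh`.

```bash
# Hypothetical helper: run COMMAND once per second until it succeeds
# or TIMEOUT_SECONDS have elapsed.
# usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
    wait_limit=$1
    shift
    wait_elapsed=0
    until "$@"; do
        if [ "$wait_elapsed" -ge "$wait_limit" ]; then
            return 1    # never became ready: treat as Starting -> Error
        fi
        sleep 1
        wait_elapsed=$((wait_elapsed + 1))
    done
    return 0            # readiness check passed: Starting -> Running
}

# Example (placeholder unit name and URL, adjust to the service):
# systemctl start my-service
# wait_until 60 curl -fsS http://localhost:8080/health >/dev/null \
#     || echo "initialization did not complete in time"
```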

---

### Running

**Description:** Service is operational

**Characteristics:**
- Process active
- Handling requests
- Monitoring active
- Health checks passing

**Entry Conditions:**
- Successful initialization
- Service started successfully

**Exit Conditions:**
- Stop requested → Stopping
- Runtime error → Error
- Restart requested → Restarting

**Verification:**
- Health check endpoint responding
- Service logs showing normal operation
- Metrics indicating activity

---

### Stopping

**Description:** Service is shutting down

**Characteristics:**
- Graceful shutdown in progress
- Finishing current requests
- Releasing resources
- Closing connections

**Entry Conditions:**
- Stop requested
- Service shutdown initiated

**Exit Conditions:**
- Shutdown successful → Stopped
- Shutdown failed → Error

**Typical Duration:**
- 5-30 seconds (graceful shutdown)

---

### Error

**Description:** Service is in error state

**Characteristics:**
- Service not functioning correctly
- Error logs present
- May be partially running
- Requires intervention

**Entry Conditions:**
- Initialization failed
- Runtime error occurred
- Stop operation failed

**Exit Conditions:**
- Reset requested → Stopped
- Restart requested → Starting

**Recovery Actions:**
- Check error logs
- Verify configuration
- Check dependencies
- Restart service

---

### Restarting

**Description:** Service restart in progress

**Characteristics:**
- Stop operation initiated
- Will transition to Starting after stop

**Entry Conditions:**
- Restart requested while Running

**Exit Conditions:**
- Stop complete → Starting

---

## State Transitions

### Transition: start()

**From:** Stopped
**To:** Starting
**Action:** Start service process
**Verification:** Process started, logs show initialization

---

### Transition: initialized successfully

**From:** Starting
**To:** Running
**Condition:** All initialization steps completed
**Verification:** Health check passes, service responding

---

### Transition: initialization failed

**From:** Starting
**To:** Error
**Condition:** Initialization error occurred
**Action:** Log error, stop process
**Recovery:** Check logs, fix configuration, restart

---

### Transition: stop()

**From:** Running
**To:** Stopping
**Action:** Initiate graceful shutdown
**Verification:** Shutdown process started

---

### Transition: stopped successfully

**From:** Stopping
**To:** Stopped
**Condition:** Shutdown completed
**Verification:** Process terminated, resources released

---

### Transition: stop failed

**From:** Stopping
**To:** Error
**Condition:** Shutdown error occurred
**Action:** Force stop if needed
**Recovery:** Manual intervention may be required

---

### Transition: runtime error

**From:** Running
**To:** Error
**Condition:** Runtime error detected
**Action:** Log error, attempt recovery
**Recovery:** Check logs, fix issue, restart

---

### Transition: reset()

**From:** Error
**To:** Stopped
**Action:** Reset service to clean state
**Verification:** Service stopped, error state cleared

---

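### Transition: restart() (from Error)

**From:** Error
**To:** Starting
**Action:** Restart service from error state
**Verification:** Service starting, initialization in progress

---

### Transition: restart() (from Running)

**From:** Running
**To:** Restarting
**Action:** Initiate stop; the service moves to Starting once the stop completes

---

### Transition: restart initiated

**From:** Restarting
**To:** Starting
**Condition:** Stop complete

---
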
## Service-Specific State Machines

### Besu Node States

**Additional States:**
- **Syncing:** Blockchain synchronization in progress
- **Synced:** Blockchain fully synchronized
- **Consensus:** Participating in consensus (validators)

**State Flow:**
```
Starting → Syncing → Synced → Running (with Consensus if validator)
```
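
One way to tell Syncing from Synced is the standard Ethereum JSON-RPC method `eth_syncing`, which returns `false` when the node is not syncing and a progress object while it is. The sketch below is illustrative only: `besu_sync_state` is a hypothetical helper, the endpoint `http://localhost:8545` is an assumption about where the node's HTTP RPC listens, and the response is matched as text rather than parsed as JSON. Note that `false` only means no sync is in progress, so a node with no peers may also report it; a peer-count or block-height check is a sensible companion.

```bash
# Hypothetical helper: classify a raw eth_syncing JSON-RPC response body.
# usage: besu_sync_state JSON_RESPONSE
besu_sync_state() {
    if printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'; then
        echo Synced       # result is false: no sync in progress
    elif printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*[{]'; then
        echo Syncing      # result is a progress object
    else
        echo Unknown      # error response, empty body, or unexpected shape
        return 1
    fi
}

besu_sync_state '{"jsonrpc":"2.0","id":1,"result":false}'    # prints: Synced

# Against a live node (assumed endpoint, adjust as needed):
# response=$(curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
#     http://localhost:8545)
# besu_sync_state "$response"
```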

---

### Cloudflare Tunnel States

**Additional States:**
- **Connecting:** Establishing tunnel connection
- **Connected:** Tunnel connected to Cloudflare
- **Reconnecting:** Reconnecting after disconnection

**State Flow:**
```
Starting → Connecting → Connected → Running
Running → Reconnecting → Connected → Running
```

---

## Monitoring and Alerts

### State Monitoring

**Metrics to Track:**
- Current state
- State transition frequency
- Time in each state
- Error state occurrences

**Alerts:**
- Service in Error state > 5 minutes
- Frequent state transitions (thrashing)
- Service stuck in Starting > 10 minutes
- Service in Stopping > 2 minutes
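
The three duration-based alerts above reduce to a per-state time limit. A minimal sketch, assuming a hypothetical helper `check_state_alert` that is given the current state and the number of seconds spent in it (how those values are collected is left open). The thrashing alert is not covered, since the document does not define a threshold for it.

```bash
# Hypothetical helper: report when a service has exceeded the documented
# time limit for its current state.
# usage: check_state_alert STATE SECONDS_IN_STATE
check_state_alert() {
    case "$1" in
        Error)    alert_limit=300 ;;   # 5 minutes
        Starting) alert_limit=600 ;;   # 10 minutes
        Stopping) alert_limit=120 ;;   # 2 minutes
        *)        return 0 ;;          # no duration-based alert for other states
    esac
    if [ "$2" -gt "$alert_limit" ]; then
        echo "ALERT: service in $1 for ${2}s (limit ${alert_limit}s)"
        return 1
    fi
    return 0
}

check_state_alert Starting 45          # within the limit, prints nothing
check_state_alert Error 301 || true    # prints: ALERT: service in Error for 301s (limit 300s)
```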

---

## Recovery Procedures

### From Error State

**Step 1: Diagnose**
```bash
# Check service logs
journalctl -u <service> -n 100

# Check service status
systemctl status <service>

# Check error messages
journalctl -u <service> | grep -i error
```

**Step 2: Fix Issue**
- Fix configuration errors
- Resolve dependency issues
- Address resource constraints
- Fix network problems

**Step 3: Recover**
```bash
# Option 1: Restart
systemctl restart <service>

# Option 2: Reset and start
systemctl stop <service>
# Fix issues
systemctl start <service>
```
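
When following the steps above it can help to translate what systemd reports into the states defined in this document. The mapping below is an assumption made for illustration, not something systemd or this document defines: `map_systemd_state` is a hypothetical helper that takes the output of `systemctl is-active` and prints the closest state name. For a unit that systemd reports as `failed`, `systemctl reset-failed <service>` is arguably the nearest analogue to `reset()`.

```bash
# Hypothetical helper: map `systemctl is-active` output to this document's states.
# usage: map_systemd_state ACTIVE_STATE
map_systemd_state() {
    case "$1" in
        active)        echo Running ;;
        activating)    echo Starting ;;
        deactivating)  echo Stopping ;;
        inactive)      echo Stopped ;;
        failed)        echo Error ;;
        *)
            # e.g. reloading, or a value this sketch does not handle
            echo Unknown
            return 1
            ;;
    esac
}

map_systemd_state failed    # prints: Error

# With a real unit (placeholder name):
# map_systemd_state "$(systemctl is-active <service>)"
```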

---

## Related Documentation

- **[OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md)** ⭐⭐ - Operational procedures
- **[TROUBLESHOOTING_FAQ.md](/docs/09-troubleshooting/TROUBLESHOOTING_FAQ.md)** ⭐⭐⭐ - Troubleshooting guide
- **[BESU_ALLOWLIST_RUNBOOK.md](../06-besu/BESU_ALLOWLIST_RUNBOOK.md)** ⭐ - Besu allowlist and node operations

---

**Last Updated:** 2025-01-20
**Review Cycle:** Quarterly