# CCIP Monitor Metrics Documentation
**Date**: 2025-01-12

**Network**: ChainID 138

---

## Overview

This document describes the metrics available from the CCIP Monitor service.

---
## CCIP Monitor Service
### Service Details
- **Container**: VMID 3501
- **Service**: `ccip-monitor`
- **Metrics Port**: 8000
- **Metrics Endpoint**: `http://localhost:8000/metrics`
---
## Available Metrics
### System Metrics
#### `ccip_monitor_up`
- **Type**: Gauge
- **Description**: Service availability (1 = up, 0 = down)
- **Labels**: None
#### `ccip_monitor_rpc_connected`
- **Type**: Gauge
- **Description**: RPC connection status (1 = connected, 0 = disconnected)
- **Labels**: None
---
### CCIP Message Metrics
#### `ccip_messages_sent_total`
- **Type**: Counter
- **Description**: Total number of CCIP messages sent
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier
  - `status`: Message status (success, failed)
#### `ccip_messages_received_total`
- **Type**: Counter
- **Description**: Total number of CCIP messages received
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier
  - `status`: Message status (success, failed)
#### `ccip_messages_pending`
- **Type**: Gauge
- **Description**: Number of pending CCIP messages
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier
---
### Bridge Metrics
#### `bridge_transactions_total`
- **Type**: Counter
- **Description**: Total number of bridge transactions
- **Labels**:
  - `bridge_type`: Bridge type (WETH9, WETH10)
  - `destination_chain`: Destination chain identifier
  - `status`: Transaction status (success, failed)
#### `bridge_token_amount_total`
- **Type**: Counter
- **Description**: Total amount of tokens bridged
- **Labels**:
  - `bridge_type`: Bridge type (WETH9, WETH10)
  - `destination_chain`: Destination chain identifier
  - `token_type`: Token type
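To break a labelled counter down by a single label without Prometheus, for example `bridge_transactions_total` per `bridge_type`, the raw text from the metrics endpoint can be aggregated with awk. This is a minimal sketch, not part of the CCIP Monitor service: the function name `sum_by_label` is hypothetical, and it assumes label values contain no `}` characters and that samples carry the usual `name{labels} value` layout.

```bash
# Hypothetical helper: sum a labelled counter per value of one label,
# e.g. bridge_transactions_total per bridge_type.
# Reads Prometheus text-format metrics on stdin; samples lacking the label
# are grouped under "(none)". Output is sorted by label value.
sum_by_label() {
  local name="$1" label="$2"
  awk -v n="$name" -v l="$label" '
    index($0, n "{") == 1 {
      key = "(none)"
      # Match label="value" right after "{" or "," so a label such as
      # "type" cannot match inside "bridge_type".
      if (match($0, "[{,]" l "=\"[^\"]*\"")) {
        key = substr($0, RSTART + length(l) + 3, RLENGTH - length(l) - 4)
      }
      v = $0
      sub(/^.*[}] /, "", v)    # strip "name{labels} ", leaving "value [timestamp]"
      split(v, f, " ")
      sum[key] += f[1]
    }
    END { for (k in sum) print k, sum[k] }' | LC_ALL=C sort
}
```

Usage: `curl -s http://localhost:8000/metrics | sum_by_label bridge_transactions_total bridge_type`. Against the example output later in this document it prints `WETH9 5`. Note that this sums every status; to count only failures, use the PromQL form in Prometheus instead.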
---
### Fee Metrics
#### `ccip_fees_paid_total`
- **Type**: Counter
- **Description**: Total CCIP fees paid
- **Labels**:
  - `fee_token`: Fee token address
  - `destination_chain`: Destination chain identifier
#### `ccip_fee_calculation_errors_total`
- **Type**: Counter
- **Description**: Total fee calculation errors
- **Labels**: None
---
### Error Metrics
#### `ccip_errors_total`
- **Type**: Counter
- **Description**: Total number of errors
- **Labels**:
  - `error_type`: Error type
  - `component`: Component where error occurred
---
## Querying Metrics
### Using curl
```bash
curl http://localhost:8000/metrics
```
### Using Prometheus
If Prometheus is configured to scrape the metrics endpoint:
```promql
# Service availability
ccip_monitor_up
# Total messages sent
sum(ccip_messages_sent_total)
# Pending messages
sum(ccip_messages_pending)
# Bridge transactions
sum(bridge_transactions_total)
```
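The same queries can be run from a shell through the Prometheus HTTP API instant-query endpoint, `/api/v1/query`. This is a minimal sketch that prints the raw JSON response; it assumes Prometheus is reachable at `http://localhost:9090` (Prometheus's default port), and both `PROM_URL` and `prom_query` are hypothetical names. Because the `_total` counters only ever increase, wrapping them in `rate()` gives per-second throughput, which is usually more useful on a dashboard than the running total.

```bash
# Assumed Prometheus address; override by exporting PROM_URL beforehand.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant PromQL query via the Prometheus HTTP API
# and print the JSON response. -G sends the URL-encoded query as a GET query
# string; -f makes HTTP errors return a non-zero exit status.
prom_query() {
  curl -fsS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example: per-status send rate over the last 5 minutes (messages/second).
# prom_query 'sum by (status) (rate(ccip_messages_sent_total[5m]))'
```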
---
## Metric Examples
### Example Metrics Output
```
# HELP ccip_monitor_up Service availability
# TYPE ccip_monitor_up gauge
ccip_monitor_up 1
# HELP ccip_messages_sent_total Total CCIP messages sent
# TYPE ccip_messages_sent_total counter
ccip_messages_sent_total{source_chain="138",destination_chain="1",status="success"} 10
ccip_messages_sent_total{source_chain="138",destination_chain="1",status="failed"} 1
# HELP bridge_transactions_total Total bridge transactions
# TYPE bridge_transactions_total counter
bridge_transactions_total{bridge_type="WETH9",destination_chain="1",status="success"} 5
```
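In this sample, 1 of the 11 messages sent from chain 138 to chain 1 failed. The sketch below computes that failure ratio directly from a scrape with awk; `sent_failure_ratio` is a hypothetical helper name, and it assumes the `status` label values shown above (`success`, `failed`).

```bash
# Hypothetical helper: share of ccip_messages_sent_total samples labelled
# status="failed". Reads Prometheus text-format metrics on stdin and prints
# a 4-decimal ratio, or "n/a" if no sent messages have been counted yet.
sent_failure_ratio() {
  awk '
    index($0, "ccip_messages_sent_total{") == 1 {
      v = $0
      sub(/^.*[}] /, "", v)    # strip "name{labels} ", leaving "value [timestamp]"
      split(v, f, " ")
      total += f[1]
      if (index($0, "status=\"failed\"") > 0) failed += f[1]
    }
    END {
      if (total > 0) printf "%.4f\n", failed / total
      else print "n/a"
    }'
}
```

Piping the example output above into `sent_failure_ratio` prints `0.0909` (1 / 11). Because the counters are cumulative, this is a lifetime ratio; a windowed error rate needs `rate()` in Prometheus.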
---
## Monitoring Setup
### Prometheus Configuration
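```yaml
scrape_configs:
  - job_name: 'ccip-monitor'
    # Metrics are scraped from /metrics (the Prometheus default metrics_path).
    # 'localhost' assumes Prometheus runs in the same container (VMID 3501);
    # otherwise use the container's IP address or hostname.
    static_configs:
      - targets: ['localhost:8000']
```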
### Grafana Dashboard
Create a Grafana dashboard with panels for:
- Service availability
- Message throughput
- Bridge transaction volume
- Error rates
- Fee usage
---
## Alerting
### Recommended Alerts
1. **Service Down**
   - Alert when `ccip_monitor_up == 0`
   - Severity: Critical
2. **High Error Rate**
   - Alert when the rate of `ccip_errors_total` exceeds a chosen threshold
   - Severity: Warning
3. **Pending Messages**
   - Alert when `ccip_messages_pending` exceeds a chosen threshold
   - Severity: Warning
4. **RPC Disconnected**
   - Alert when `ccip_monitor_rpc_connected == 0`
   - Severity: Critical
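The two Critical conditions are precise enough to evaluate from a shell against a single scrape. The following is a hypothetical sketch (the name `check_critical` is illustrative) and not the contents of `check-ccip-monitor-health.sh`. It also treats a missing metric as critical, on the assumption that an empty or unreachable endpoint means the service is down.

```bash
# Hypothetical helper: evaluate the Critical alert conditions against
# Prometheus text-format metrics on stdin.
# Exit status: 0 = OK, 2 = at least one critical condition holds.
check_critical() {
  awk '
    $1 == "ccip_monitor_up"            { up = $2 }
    $1 == "ccip_monitor_rpc_connected" { rpc = $2 }
    END {
      status = 0
      if (up == "" || up + 0 == 0) {
        print "CRITICAL: ccip_monitor_up == 0 (or missing)"; status = 2
      }
      if (rpc == "" || rpc + 0 == 0) {
        print "CRITICAL: ccip_monitor_rpc_connected == 0 (or missing)"; status = 2
      }
      if (status == 0) print "OK"
      exit status
    }'
}
```

Usage: `curl -fsS --max-time 5 http://localhost:8000/metrics | check_critical`. If `curl` fails, awk receives no input and both conditions report as critical.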
---
## Health Check
### Using Health Check Script
```bash
./scripts/check-ccip-monitor-health.sh
```
### Manual Check
```bash
# Check service status
pct exec 3501 -- systemctl status ccip-monitor
# Check metrics endpoint (run inside the container, like the other checks)
pct exec 3501 -- curl -s http://localhost:8000/metrics
# Check logs
pct exec 3501 -- journalctl -u ccip-monitor -n 50
```
---
## Related Documentation
- [CCIP Operations Runbook](./CCIP_OPERATIONS_RUNBOOK.md) (Task 135)
- [CCIP Configuration Status](./CCIP_CONFIGURATION_STATUS.md)
- [Complete Task Catalog](./CCIP_COMPLETE_TASK_CATALOG.md)
---
**Last Updated**: 2025-01-12