# CCIP Monitor Metrics Documentation

**Date**: 2025-01-12

**Network**: ChainID 138

---

## Overview

This document describes the Prometheus-format metrics exposed by the CCIP Monitor service, how to query them, and how to use them for monitoring and alerting.

---

## CCIP Monitor Service

### Service Details

- **Container**: VMID 3501
- **Service**: `ccip-monitor`
- **Metrics Port**: 8000
- **Metrics Endpoint**: `http://localhost:8000/metrics`

---

## Available Metrics

### System Metrics

#### `ccip_monitor_up`

- **Type**: Gauge
- **Description**: Service availability (1 = up, 0 = down)
- **Labels**: None

#### `ccip_monitor_rpc_connected`

- **Type**: Gauge
- **Description**: RPC connection status (1 = connected, 0 = disconnected)
- **Labels**: None

---

### CCIP Message Metrics

#### `ccip_messages_sent_total`

- **Type**: Counter
- **Description**: Total number of CCIP messages sent
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier
  - `status`: Message status (success, failed)

#### `ccip_messages_received_total`

- **Type**: Counter
- **Description**: Total number of CCIP messages received
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier
  - `status`: Message status (success, failed)

#### `ccip_messages_pending`

- **Type**: Gauge
- **Description**: Number of pending CCIP messages
- **Labels**:
  - `source_chain`: Source chain identifier
  - `destination_chain`: Destination chain identifier

---

### Bridge Metrics

#### `bridge_transactions_total`

- **Type**: Counter
- **Description**: Total number of bridge transactions
- **Labels**:
  - `bridge_type`: Bridge type (WETH9, WETH10)
  - `destination_chain`: Destination chain identifier
  - `status`: Transaction status (success, failed)

#### `bridge_token_amount_total`

- **Type**: Counter
- **Description**: Total amount of tokens bridged
- **Labels**:
  - `bridge_type`: Bridge type (WETH9, WETH10)
  - `destination_chain`: Destination chain identifier
  - `token_type`: Token type

---

### Fee Metrics

#### `ccip_fees_paid_total`

- **Type**: Counter
- **Description**: Total CCIP fees paid
- **Labels**:
  - `fee_token`: Fee token address
  - `destination_chain`: Destination chain identifier

#### `ccip_fee_calculation_errors_total`

- **Type**: Counter
- **Description**: Total fee calculation errors
- **Labels**: None

---

### Error Metrics

#### `ccip_errors_total`

- **Type**: Counter
- **Description**: Total number of errors
- **Labels**:
  - `error_type`: Error type
  - `component`: Component where error occurred

---

## Querying Metrics

### Using curl

```bash
curl http://localhost:8000/metrics
```
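The full output can be long. As a minimal sketch, the helper below narrows it to one metric family. `metric_samples` is a hypothetical name, not part of the service, and the sketch assumes the endpoint returns the Prometheus text format shown elsewhere in this document.

```bash
# Hypothetical helper: print only the sample lines (not the "# HELP" or
# "# TYPE" comments) whose metric name starts with the given prefix.
# Reads Prometheus text-format output on stdin.
metric_samples() {
  # Comment lines start with "#", so anchoring the prefix at the start of
  # the line skips them. "|| true" keeps the exit status 0 on no match.
  grep -E "^$1" || true
}
```

For example, `curl -s http://localhost:8000/metrics | metric_samples ccip_messages_` would print only the sent, received, and pending message samples.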

### Using Prometheus

If Prometheus is configured to scrape the metrics endpoint, queries such as the following can be used:

```promql
# Service availability
ccip_monitor_up

# Total messages sent
sum(ccip_messages_sent_total)

# Pending messages
sum(ccip_messages_pending)

# Bridge transactions
sum(bridge_transactions_total)
```
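These queries can also be run from the command line through the Prometheus HTTP API. The sketch below assumes a Prometheus server reachable at `http://localhost:9090` (an assumption, since this document does not state where Prometheus runs); `prom_query` and `PROMETHEUS_URL` are hypothetical names. It prints the raw JSON response of an instant query.

```bash
# Assumed Prometheus base URL; override PROMETHEUS_URL for your deployment.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Run an instant query against /api/v1/query and print the JSON response.
# -G sends the URL-encoded "query" parameter as part of a GET request.
prom_query() {
  curl -sG --data-urlencode "query=$1" "$PROMETHEUS_URL/api/v1/query"
}

# Example: prom_query 'sum(ccip_messages_pending)'
```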

---

## Metric Examples

### Example Metrics Output

```
# HELP ccip_monitor_up Service availability
# TYPE ccip_monitor_up gauge
ccip_monitor_up 1

# HELP ccip_messages_sent_total Total CCIP messages sent
# TYPE ccip_messages_sent_total counter
ccip_messages_sent_total{source_chain="138",destination_chain="1",status="success"} 10
ccip_messages_sent_total{source_chain="138",destination_chain="1",status="failed"} 1

# HELP bridge_transactions_total Total bridge transactions
# TYPE bridge_transactions_total counter
bridge_transactions_total{bridge_type="WETH9",destination_chain="1",status="success"} 5
```
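Output in this format can be aggregated without Prometheus. As a rough sketch, the hypothetical helper `sum_metric` below adds up every sample of one metric across its label sets, which mirrors what `sum(...)` does in PromQL. It assumes samples carry no trailing timestamp. Fed the example output above, `sum_metric ccip_messages_sent_total` prints `11` (10 successful plus 1 failed).

```bash
# Hypothetical helper: sum the values of every sample of one metric,
# across all label sets. Reads Prometheus text-format output on stdin.
# Assumes the value is the last field on the line (no timestamps).
sum_metric() {
  awk -v name="$1" '
    # Skip comment lines such as "# HELP" and "# TYPE".
    /^#/ { next }
    {
      # The metric name is the first field up to the opening brace, if any.
      metric = $1
      sub(/[{].*/, "", metric)
      if (metric == name) total += $NF
    }
    # Adding 0 prints "0" when the metric does not appear at all.
    END { print total + 0 }
  '
}

# Example: curl -s http://localhost:8000/metrics | sum_metric ccip_messages_sent_total
```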

---

## Monitoring Setup

### Prometheus Configuration

```yaml
scrape_configs:
  - job_name: 'ccip-monitor'
    static_configs:
      - targets: ['localhost:8000']
```

### Grafana Dashboard

Create a dashboard with panels for:

- Service availability (`ccip_monitor_up`, `ccip_monitor_rpc_connected`)
- Message throughput (`ccip_messages_sent_total`, `ccip_messages_received_total`)
- Bridge transaction volume (`bridge_transactions_total`, `bridge_token_amount_total`)
- Error rates (`ccip_errors_total`, `ccip_fee_calculation_errors_total`)
- Fee usage (`ccip_fees_paid_total`)

---

## Alerting

### Recommended Alerts

1. **Service Down**
   - Alert when `ccip_monitor_up == 0`, or when Prometheus cannot scrape the target at all (`up{job="ccip-monitor"} == 0`), since a stopped process exports no metrics
   - Severity: Critical

2. **High Error Rate**
   - Alert when the rate of `ccip_errors_total` exceeds a chosen threshold
   - Severity: Warning

3. **Pending Messages**
   - Alert when `ccip_messages_pending` exceeds a chosen threshold
   - Severity: Warning

4. **RPC Disconnected**
   - Alert when `ccip_monitor_rpc_connected == 0`
   - Severity: Critical
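Where Prometheus alerting rules are not yet in place, a threshold condition such as the pending-messages alert can be approximated by polling the endpoint. The sketch below is one possible approach, not the service's own alerting: `gauge_exceeds` is a hypothetical helper, and the threshold of 100 in the example is a placeholder because this document does not specify one.

```bash
# Hypothetical check: exit 0 when the summed value of a metric exceeds a
# threshold, exit 1 otherwise. Reads Prometheus text-format output on stdin.
# Usage: gauge_exceeds METRIC THRESHOLD
gauge_exceeds() {
  awk -v name="$1" -v limit="$2" '
    # Skip "# HELP" and "# TYPE" comment lines.
    /^#/ { next }
    {
      # Strip the label set to recover the bare metric name.
      metric = $1
      sub(/[{].*/, "", metric)
      if (metric == name) total += $NF
    }
    # Force a numeric comparison; a missing metric counts as 0.
    END { exit !(total + 0 > limit + 0) }
  '
}

# Example (placeholder threshold of 100):
#   curl -s http://localhost:8000/metrics | gauge_exceeds ccip_messages_pending 100 \
#     && echo "WARNING: pending CCIP messages above threshold"
```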

---

## Health Check

### Using Health Check Script

```bash
./scripts/check-ccip-monitor-health.sh
```

### Manual Check

```bash
# Check service status
pct exec 3501 -- systemctl status ccip-monitor

# Check metrics endpoint
curl http://localhost:8000/metrics

# Check logs
pct exec 3501 -- journalctl -u ccip-monitor -n 50
```
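The metrics check can be reduced to a pass/fail result by reading the two status gauges. The following is a minimal sketch and not the contents of `check-ccip-monitor-health.sh`, which this document does not show; `monitor_healthy` is a hypothetical name.

```bash
# Hypothetical check: succeed only when both status gauges are present and
# equal to 1. Reads Prometheus text-format output on stdin and prints a
# one-line summary.
monitor_healthy() {
  awk '
    # Both gauges have no labels, so the name is exactly the first field.
    $1 == "ccip_monitor_up"            { up = $2 }
    $1 == "ccip_monitor_rpc_connected" { rpc = $2 }
    END {
      printf "up=%s rpc_connected=%s\n", (up == "" ? "missing" : up), (rpc == "" ? "missing" : rpc)
      exit !(up == 1 && rpc == 1)
    }
  '
}

# Example: curl -s http://localhost:8000/metrics | monitor_healthy || echo "ccip-monitor unhealthy"
```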

---

## Related Documentation

- [CCIP Operations Runbook](./CCIP_OPERATIONS_RUNBOOK.md) (Task 135)
- [CCIP Configuration Status](./CCIP_CONFIGURATION_STATUS.md)
- [Complete Task Catalog](./CCIP_COMPLETE_TASK_CATALOG.md)

---

**Last Updated**: 2025-01-12