# CCIP Operations Runbook

## Overview

This runbook provides operational procedures for managing CCIP (Chainlink Cross-Chain Interoperability Protocol) on the DeFi Oracle Meta Mainnet.

## Daily Operations

### Health Checks

1. **Check CCIP Router Status**
   ```bash
   kubectl get pods -n besu-network -l app=ccip-router
   kubectl logs -n besu-network -l app=ccip-router --tail=100
   ```

2. **Check Message Processing**
   ```bash
   # Check messages sent in the last 100 blocks
   LATEST=$(cast block-number --rpc-url "$RPC_URL")
   # cast may print raw topic hashes; if grep matches nothing, pass the event signature to cast logs instead
   cast logs --from-block $((LATEST - 100)) --address "$CCIP_SENDER" --rpc-url "$RPC_URL" | grep MessageSent
   ```

3. **Monitor Metrics**
   - Message send success rate
   - Message delivery latency
   - Fee consumption
   - Error rates

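The router status check in step 1 can be reduced to a pass/fail signal. The following is a minimal sketch, assuming the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, ...); `count_unready_pods` is a hypothetical helper name and not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: counts pods that are not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin and assumes the
# default column order: NAME READY STATUS RESTARTS AGE.
count_unready_pods() {
  awk '
    {
      split($2, ready, "/")   # READY column looks like "1/1"
      if (ready[1] != ready[2] || $3 != "Running") bad++
    }
    END { print bad + 0 }
  '
}

# Example usage (requires cluster access):
#   kubectl get pods -n besu-network -l app=ccip-router --no-headers | count_unready_pods
```

A result of `0` only means that no listed pod is unhealthy; an empty pod list also yields `0`, so it is worth confirming separately that at least one router pod exists.
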
### LINK Balance Monitoring

1. **Check Balance**
   ```bash
   # The (uint256) return type makes cast decode the result to a decimal value in base units
   cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL"
   ```

2. **Set Alert Threshold**
   - Warning: balance < 10 LINK
   - Critical: balance < 5 LINK

3. **Refill Balance**
   ```bash
   # AMOUNT is in base units (1 LINK = 10^18, assuming the standard 18 decimals)
   cast send "$LINK_TOKEN" "transfer(address,uint256)" "$CCIP_SENDER" "$AMOUNT" \
     --rpc-url "$RPC_URL" --private-key "$PRIVATE_KEY"
   ```

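The two balance thresholds can be checked mechanically. The sketch below assumes the balance is available as a decimal string in base units with 18 decimals; `classify_link_balance` is a hypothetical helper. It slices the string instead of using shell arithmetic because 10 LINK (10^19 base units) does not fit in a signed 64-bit integer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classifies a LINK balance given in base units
# (assumed 18 decimals) against the 10 LINK and 5 LINK thresholds.
classify_link_balance() {
  raw="${1%% *}"   # drop a trailing annotation such as " [1e19]", if present
  case "$raw" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;   # e.g. undecoded hex output
  esac
  whole=0
  if [ "${#raw}" -gt 18 ]; then
    # Strip the last 18 digits to keep only whole LINK
    whole="${raw%??????????????????}"
  fi
  if [ "$whole" -lt 5 ]; then
    echo "critical"
  elif [ "$whole" -lt 10 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Example usage (requires RPC access):
#   classify_link_balance "$(cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL")"
```

Fractional LINK is truncated, which is safe here because both thresholds are whole numbers: a balance of 4.99 LINK still classifies as critical.
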
## Weekly Operations

### Review Message Statistics

1. **Message Volume**
   - Total messages sent
   - Success rate
   - Average latency

2. **Fee Analysis**
   - Total fees spent
   - Average fee per message
   - Fee trends

3. **Error Analysis**
   - Error types
   - Error frequency
   - Root causes

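Success rate and average fee per message can be derived from counter deltas over the review window. This is a sketch only: `success_rate` and `average_fee` are hypothetical helpers, and the success-rate formula assumes that each counted error corresponds to exactly one failed message, which may not hold for every error type.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the weekly review. Inputs are counter increases
# over the review window (messages sent, errors, LINK spent on fees).

# success_rate SENT ERRORS -> percentage with two decimals
success_rate() {
  awk -v sent="$1" -v err="$2" 'BEGIN {
    if (sent <= 0) { print "n/a"; exit }
    printf "%.2f\n", (sent - err) / sent * 100
  }'
}

# average_fee FEES_LINK SENT -> LINK per message with four decimals
average_fee() {
  awk -v fees="$1" -v sent="$2" 'BEGIN {
    if (sent <= 0) { print "n/a"; exit }
    printf "%.4f\n", fees / sent
  }'
}
```

For example, `success_rate 200 5` prints `97.50` and `average_fee 12.5 200` prints `0.0625`. Both print `n/a` when no messages were sent, which avoids a division by zero in quiet weeks.
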
### Performance Review

1. **Latency Analysis**
   - Average delivery time
   - P95/P99 latency
   - Outliers

2. **Throughput**
   - Messages per hour/day
   - Peak load times
   - Capacity planning

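When raw latency samples are available (for example, exported from logs), P95 and P99 can be computed directly. The sketch below uses the nearest-rank method, one of several common percentile definitions; `percentile` is a hypothetical helper that expects one latency value in seconds per line and an integer percentile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: nearest-rank percentile over latency samples.
# Reads one value per line on stdin; $1 is an integer percentile (1-100).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      rank = int((p * NR + 99) / 100)   # ceil(p / 100 * NR) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }
  '
}

# Example usage, assuming a hypothetical file with one latency per line:
#   percentile 95 < latencies.txt
```

If the latency metric is exported as a histogram, a percentile query in the monitoring system is likely a better fit than post-processing samples by hand.
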
## Monthly Operations

### Security Review

1. **Access Control Audit**
   - Review authorized senders/receivers
   - Check for unauthorized access
   - Verify role assignments

2. **Message Validation**
   - Review message format compliance
   - Check for anomalies
   - Verify replay protection

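One way to approach the access control audit is to compare the addresses currently authorized on-chain with an approved list. The sketch below assumes both lists have already been exported to files with one address per line; how the on-chain list is obtained depends on the deployed contracts and is not covered here. `unexpected_addresses` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints addresses that appear in the on-chain list
# but not in the approved list. Comparison is case-insensitive because
# checksummed and lowercase addresses refer to the same account.
# $1 = approved list file, $2 = on-chain list file (must be different files)
unexpected_addresses() {
  awk '
    FILENAME == ARGV[1] { approved[tolower($1)] = 1; next }
    {
      addr = tolower($1)
      if (addr != "" && !(addr in approved) && !seen[addr]++) print addr
    }
  ' "$1" "$2"
}

# Example usage with hypothetical file names:
#   unexpected_addresses approved-senders.txt onchain-senders.txt
```

Empty output means every on-chain address is on the approved list. Any printed address should be investigated before the next review.
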
### Cost Optimization

1. **Fee Optimization**
   - Review fee trends
   - Identify optimization opportunities
   - Implement improvements

2. **Message Optimization**
   - Reduce message size
   - Batch updates when possible
   - Optimize encoding

## Incident Response

### Message Delivery Failure

1. **Identify Issue**
   - Check message status
   - Verify target chain status
   - Check router logs

2. **Diagnose**
   - Check LINK balance
   - Verify router configuration
   - Check target chain connectivity

3. **Resolve**
   - Fix underlying issue
   - Resend message if needed
   - Update configuration if required

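The diagnosis steps can be collected into a single triage function so they run in the same order during every incident. This is a sketch under assumptions: `TARGET_RPC_URL` is a hypothetical variable for the destination chain's RPC endpoint, and `run` and `diagnose_delivery` are hypothetical helpers. Setting `DRY_RUN=1` prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical triage sequence. The steps are intentionally not chained
# with &&, so a failing check does not hide the results of later checks.
diagnose_delivery() {
  # 1. Sender LINK balance (fees cannot be paid from an empty balance)
  run cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL"
  # 2. Target chain connectivity (TARGET_RPC_URL is an assumed variable)
  run cast block-number --rpc-url "$TARGET_RPC_URL"
  # 3. Recent router log lines
  run kubectl logs -n besu-network -l app=ccip-router --tail=200
}

# Example usage:
#   DRY_RUN=1 diagnose_delivery   # review the commands first
#   diagnose_delivery             # run them
```

Router configuration is not checked here because the runbook does not state where that configuration lives.
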
### High Error Rate

1. **Investigate**
   - Check error logs
   - Identify error patterns
   - Review recent changes

2. **Mitigate**
   - Pause sending if critical
   - Fix root cause
   - Resume operations

### Router Unavailable

1. **Check Status**
   - Verify router deployment
   - Check service health
   - Review logs

2. **Recovery**
   - Restart router if needed
   - Verify connectivity
   - Test message sending

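A restart could look like the following sketch. It assumes the router runs as a Kubernetes Deployment named `ccip-router` in the `besu-network` namespace; the runbook only confirms the namespace and the `app=ccip-router` label, so the workload kind and name are assumptions to adjust. `run` and `restart_router` are hypothetical helpers, and `DRY_RUN=1` prints the commands without executing them.

```bash
#!/usr/bin/env bash
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical restart sequence; each step runs only if the previous one
# succeeded. "deployment/ccip-router" is an assumed workload name.
restart_router() {
  run kubectl rollout restart deployment/ccip-router -n besu-network &&
    run kubectl rollout status deployment/ccip-router -n besu-network --timeout=120s &&
    run kubectl get pods -n besu-network -l app=ccip-router
}

# Example usage:
#   DRY_RUN=1 restart_router   # review the commands first
#   restart_router             # perform the restart
```

After the rollout completes, connectivity and message sending still need to be verified as listed in the recovery steps.
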
## Maintenance

### Contract Upgrades

1. **Plan Upgrade**
   - Review upgrade proposal
   - Test in staging
   - Schedule maintenance window

2. **Execute Upgrade**
   - Pause operations
   - Deploy new contracts
   - Update configurations
   - Resume operations

3. **Verify**
   - Test message sending
   - Verify message receiving
   - Monitor for issues

### Configuration Changes

1. **Review Impact**
   - Assess change impact
   - Test in staging
   - Plan rollback

2. **Apply Changes**
   - Update configuration
   - Monitor closely
   - Verify functionality

## Monitoring

### Key Metrics

- `ccip_messages_sent_total`: Total messages sent
- `ccip_messages_received_total`: Total messages received
- `ccip_message_latency_seconds`: Message delivery latency
- `ccip_fees_total`: Total LINK spent on fees
- `ccip_errors_total`: Total errors

### Alerts

- High error rate (> 5%)
- Low success rate (< 95%)
- High latency (> 5 minutes)
- Low LINK balance (< 10 LINK)
- Router unavailable

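The numeric alert conditions can be spot-checked from the command line before they are wired into an alerting system. The sketch below is a hypothetical helper, `check_alerts`, that takes already-computed values; it does not query any metrics backend, and the "router unavailable" alert is not covered because it is not a numeric threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluates the numeric alert thresholds.
# Arguments: error rate (%), success rate (%), latency (seconds), LINK balance.
# Prints one line per firing alert; prints nothing when all values are healthy.
check_alerts() {
  awk -v err="$1" -v ok="$2" -v lat="$3" -v link="$4" 'BEGIN {
    if (err  > 5)   print "High error rate"
    if (ok   < 95)  print "Low success rate"
    if (lat  > 300) print "High latency"       # 5 minutes
    if (link < 10)  print "Low LINK balance"
  }'
}

# Example usage:
#   check_alerts 6.2 93.8 120 14.5
```

All comparisons are strict, matching the thresholds as written: an error rate of exactly 5% or a balance of exactly 10 LINK does not fire.
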
## References

- [CCIP Integration Guide](../docs/CCIP_INTEGRATION.md)
- [CCIP Router Setup](../docs/CCIP_ROUTER_SETUP.md)
- [CCIP Troubleshooting](../docs/CCIP_TROUBLESHOOTING.md)
- [CCIP Incident Response](ccip-incident-response.md)