# CCIP Operations Runbook

## Overview

This runbook provides operational procedures for managing CCIP (Chainlink Cross-Chain Interoperability Protocol) on the DeFi Oracle Meta Mainnet.

## Daily Operations

### Health Checks

1. **Check CCIP Router Status**
   ```bash
   kubectl get pods -n besu-network -l app=ccip-router
   kubectl logs -n besu-network -l app=ccip-router --tail=100
   ```

2. **Check Message Processing**
   ```bash
   # Check messages emitted in the last 100 blocks
   LATEST_BLOCK=$(cast block-number --rpc-url "$RPC_URL")
   cast logs --from-block $((LATEST_BLOCK - 100)) --address "$CCIP_SENDER" --rpc-url "$RPC_URL" | grep MessageSent
   ```

3. **Monitor Metrics**
   - Message send success rate
   - Message delivery latency
   - Fee consumption
   - Error rates

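The router status check can be reduced to a quick pass/fail signal. The following is a minimal sketch, assuming the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, ...). `unhealthy_pods` is a hypothetical helper name and is not part of this repository.

```bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name of every pod that is not Running with all containers ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY column, e.g. "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (namespace and label taken from the health check above):
#   kubectl get pods -n besu-network -l app=ccip-router --no-headers | unhealthy_pods
```

Empty output means every matching pod is Running and fully ready; any printed name is a pod worth inspecting with `kubectl logs`.
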
### LINK Balance Monitoring

1. **Check Balance**
   ```bash
   # Returns the balance in LINK's smallest unit (18 decimals)
   cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL"
   ```

2. **Set Alert Threshold**
   - Warning: balance < 10 LINK
   - Critical: balance < 5 LINK

3. **Refill Balance**
   ```bash
   # AMOUNT is in LINK's smallest unit (18 decimals), e.g. 10 LINK = 10000000000000000000
   cast send "$LINK_TOKEN" "transfer(address,uint256)" "$CCIP_SENDER" "$AMOUNT" \
     --rpc-url "$RPC_URL" --private-key "$PRIVATE_KEY"
   ```

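The two balance thresholds can be checked mechanically. The sketch below assumes the balance arrives as a decimal string in LINK's smallest unit (18 decimals), optionally followed by an annotation such as ` [1e19]` that some `cast` versions append. `link_balance_status` is a hypothetical helper name.

```bash
# Hypothetical helper: classify a LINK balance given in the token's smallest
# unit (18 decimals). 10 LINK is 10^19, which overflows bash's 64-bit integers,
# so the whole-LINK part is obtained by dropping the last 18 digits instead.
link_balance_status() {
  local raw="${1%% *}"               # drop any trailing annotation after a space
  local whole=0
  if [ "${#raw}" -gt 18 ]; then
    whole="${raw:0:${#raw}-18}"
  fi
  if [ "$whole" -lt 5 ]; then
    echo critical
  elif [ "$whole" -lt 10 ]; then
    echo warning
  else
    echo ok
  fi
}

# Usage:
#   link_balance_status "$(cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" \
#     "$CCIP_SENDER" --rpc-url "$RPC_URL")"
```

Truncating to whole LINK is safe here because both thresholds are whole numbers: a balance is below 10 LINK exactly when its integer part is below 10.
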
## Weekly Operations

### Review Message Statistics

1. **Message Volume**
   - Total messages sent
   - Success rate
   - Average latency

2. **Fee Analysis**
   - Total fees spent
   - Average fee per message
   - Fee trends

3. **Error Analysis**
   - Error types
   - Error frequency
   - Root causes

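The success rate and average fee per message in the review above are simple ratios. A minimal sketch, assuming the raw counts and the fee total have already been exported from whatever metrics store is in use; both function names are hypothetical.

```bash
# Hypothetical helpers for the weekly review.

# Success rate as a percentage with two decimals.
# args: messages sent, messages delivered
success_rate_pct() {
  local sent="$1" delivered="$2"
  [ "$sent" -gt 0 ] || { echo "n/a"; return 1; }
  awk -v s="$sent" -v d="$delivered" 'BEGIN { printf "%.2f\n", 100 * d / s }'
}

# Average fee per message, in the same unit as the fee total.
# args: total fees, messages sent
avg_fee_per_message() {
  local total_fees="$1" sent="$2"
  [ "$sent" -gt 0 ] || { echo "n/a"; return 1; }
  awk -v f="$total_fees" -v s="$sent" 'BEGIN { printf "%.4f\n", f / s }'
}
```

For example, `success_rate_pct 200 199` prints `99.50`, and `avg_fee_per_message 12.5 50` prints `0.2500`. Both helpers print `n/a` when no messages were sent, to avoid dividing by zero.
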
### Performance Review

1. **Latency Analysis**
   - Average delivery time
   - P95/P99 latency
   - Outliers

2. **Throughput**
   - Messages per hour/day
   - Peak load times
   - Capacity planning

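When latencies are available only as a raw list rather than a histogram, P95/P99 can be approximated with the nearest-rank method. The sketch below assumes one latency value per line on stdin and an integer percentile; `percentile` is a hypothetical helper name.

```bash
# Hypothetical helper: nearest-rank percentile of numeric values on stdin.
# arg: integer percentile (1-100)
percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage (latencies.txt is an assumed export, one value in seconds per line):
#   percentile 95 < latencies.txt
#   percentile 99 < latencies.txt
```

Nearest-rank always returns a value that actually occurred, which makes outliers easy to trace back to a specific message. With small samples P95 and P99 will often coincide with the maximum.
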
## Monthly Operations

### Security Review

1. **Access Control Audit**
   - Review authorized senders/receivers
   - Check for unauthorized access
   - Verify role assignments

2. **Message Validation**
   - Review message format compliance
   - Check for anomalies
   - Verify replay protection

### Cost Optimization

1. **Fee Optimization**
   - Review fee trends
   - Identify optimization opportunities
   - Implement improvements

2. **Message Optimization**
   - Reduce message size
   - Batch updates when possible
   - Optimize encoding

## Incident Response

### Message Delivery Failure

1. **Identify Issue**
   - Check message status
   - Verify target chain status
   - Check router logs

2. **Diagnose**
   - Check LINK balance
   - Verify router configuration
   - Check target chain connectivity

3. **Resolve**
   - Fix underlying issue
   - Resend message if needed
   - Update configuration if required

### High Error Rate

1. **Investigate**
   - Check error logs
   - Identify error patterns
   - Review recent changes

2. **Mitigate**
   - Pause sending if critical
   - Fix root cause
   - Resume operations

### Router Unavailable

1. **Check Status**
   - Verify router deployment
   - Check service health
   - Review logs

2. **Recovery**
   - Restart router if needed
   - Verify connectivity
   - Test message sending

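The restart step can be sketched with `kubectl rollout`. This assumes the router runs as a Kubernetes Deployment named `ccip-router` in the `besu-network` namespace. The runbook only confirms the `app=ccip-router` label, so the Deployment name is an assumption and should be checked with `kubectl get deployments -n besu-network` first. `restart_router` is a hypothetical helper name.

```bash
# Hypothetical helper: restart the router and wait for the rollout to finish.
# KUBECTL, NAMESPACE and ROUTER_DEPLOYMENT can be overridden from the environment.
restart_router() {
  local kubectl="${KUBECTL:-kubectl}"
  local ns="${NAMESPACE:-besu-network}"
  local deploy="${ROUTER_DEPLOYMENT:-ccip-router}"   # assumed Deployment name

  "$kubectl" rollout restart "deployment/$deploy" -n "$ns" &&
    "$kubectl" rollout status "deployment/$deploy" -n "$ns" --timeout=120s
}

# Dry run that only prints the commands:
#   KUBECTL=echo restart_router
```

`rollout status` blocks until the new pods are ready or the timeout expires, so a zero exit code is a reasonable signal to move on to the connectivity and message-sending checks.
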
## Maintenance

### Contract Upgrades

1. **Plan Upgrade**
   - Review upgrade proposal
   - Test in staging
   - Schedule maintenance window

2. **Execute Upgrade**
   - Pause operations
   - Deploy new contracts
   - Update configurations
   - Resume operations

3. **Verify**
   - Test message sending
   - Verify message receiving
   - Monitor for issues

### Configuration Changes

1. **Review Impact**
   - Assess change impact
   - Test in staging
   - Plan rollback

2. **Apply Changes**
   - Update configuration
   - Monitor closely
   - Verify functionality

## Monitoring

### Key Metrics

- `ccip_messages_sent_total`: Total messages sent
- `ccip_messages_received_total`: Total messages received
- `ccip_message_latency_seconds`: Message delivery latency
- `ccip_fees_total`: Total LINK spent on fees
- `ccip_errors_total`: Total errors

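The `_total` suffix suggests these are Prometheus-style counters, although the runbook does not say so. Under that assumption, an error ratio can be derived from two of the metrics above. The sketch below only builds the query string. `promql_error_ratio` and `PROM_URL` are hypothetical names, and defining the ratio as errors over messages sent is also an assumption.

```bash
# Hypothetical helper: print a PromQL expression for the error ratio over a
# time window (default 5m), assuming both metrics are Prometheus counters.
promql_error_ratio() {
  local window="${1:-5m}"
  printf 'sum(rate(ccip_errors_total[%s])) / sum(rate(ccip_messages_sent_total[%s]))\n' \
    "$window" "$window"
}

# Usage against the Prometheus HTTP API (PROM_URL is an assumed variable):
#   curl -sG "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(promql_error_ratio 1h)"
```

Using `rate()` on both counters keeps the ratio meaningful across process restarts, where the raw counter values reset to zero.
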
### Alerts

- High error rate (> 5%)
- Low success rate (< 95%)
- High latency (> 5 minutes)
- Low LINK balance (< 10 LINK)
- Router unavailable

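The error-rate and latency thresholds above translate directly into integer comparisons. A minimal sketch, assuming the inputs are whole-number counts and whole seconds; both function names are hypothetical. Each function returns success (exit code 0) when the alert should fire.

```bash
# Hypothetical helpers: exit code 0 means "alert should fire".

# High error rate: errors / total > 5%, compared without floating point.
# args: error count, total message count
error_rate_alert() {
  local errors="$1" total="$2"
  [ "$total" -gt 0 ] && [ $((errors * 100)) -gt $((total * 5)) ]
}

# High latency: more than 5 minutes (300 seconds).
# arg: latency in whole seconds
latency_alert() {
  [ "$1" -gt 300 ]
}

# Usage:
#   if error_rate_alert "$ERRORS" "$TOTAL"; then echo "ALERT: error rate above 5%"; fi
#   if latency_alert "$LATENCY_SECONDS"; then echo "ALERT: latency above 5 minutes"; fi
```

Both comparisons are strict, matching the `>` in the list: exactly 5% errors or exactly 300 seconds does not fire.
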
## References

- [CCIP Integration Guide](../docs/CCIP_INTEGRATION.md)
- [CCIP Router Setup](../docs/CCIP_ROUTER_SETUP.md)
- [CCIP Troubleshooting](../docs/CCIP_TROUBLESHOOTING.md)
- [CCIP Incident Response](ccip-incident-response.md)