CCIP Operations Runbook
Overview
This runbook provides operational procedures for managing CCIP (Chainlink Cross-Chain Interoperability Protocol) on the DeFi Oracle Meta Mainnet.
Daily Operations
Health Checks
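1. Check CCIP Router Status

```bash
kubectl get pods -n besu-network -l app=ccip-router
kubectl logs -n besu-network -l app=ccip-router --tail=100
```

2. Check Message Processing

```bash
# Check recent messages. cast logs does not accept "latest-100", so compute the start block.
FROM_BLOCK=$(( $(cast block-number --rpc-url "$RPC_URL") - 100 ))
# Logs carry topic hashes, not event names, so filter by MESSAGE_SENT_SIG (placeholder: the sender ABI's MessageSent event signature) instead of grep.
cast logs --from-block "$FROM_BLOCK" --to-block latest --address "$CCIP_SENDER" "$MESSAGE_SENT_SIG" --rpc-url "$RPC_URL"
```

3. Monitor Metrics
   - Message send success rate
   - Message delivery latency
   - Fee consumption
   - Error rates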
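The router status check above relies on reading `kubectl get pods` output by eye. The following is a minimal sketch of an automated version, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...). The function name `unhealthy_pods` is hypothetical and not part of any existing tooling. It prints every pod that is not `Running` or not fully ready, and exits non-zero if any are found or if no pods matched the selector at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag CCIP router pods that are not healthy.
# Reads `kubectl get pods --no-headers` output on stdin; columns are NAME READY STATUS ...
# Prints offending pod names and exits 1 if any pod is unhealthy or no pods were listed.
unhealthy_pods() {
  awk '
    {
      split($2, ready, "/")                    # READY is "<ready>/<total>"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1
        bad++
      }
    }
    END {
      if (NR == 0) { print "no matching pods found"; exit 1 }
      exit (bad > 0)
    }'
}
```

Usage: `kubectl get pods -n besu-network -l app=ccip-router --no-headers | unhealthy_pods || echo "ccip-router unhealthy"`. Treating an empty pod list as a failure is deliberate: a selector that matches nothing usually means the router is not deployed.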
LINK Balance Monitoring
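1. Check Balance

```bash
# "(uint256)" makes cast decode the result; the value is in wei (assuming the standard 18 decimals).
cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL"
```

2. Set Alert Thresholds
   - Alert when balance < 10 LINK (warning)
   - Alert when balance < 5 LINK (critical)

3. Refill Balance

```bash
# AMOUNT is in wei, e.g. AMOUNT=$(cast to-wei 50) for 50 LINK.
cast send "$LINK_TOKEN" "transfer(address,uint256)" "$CCIP_SENDER" "$AMOUNT" \
  --rpc-url "$RPC_URL" --private-key "$PRIVATE_KEY"
```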
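The two balance thresholds can be checked from a script. The sketch below assumes the LINK token uses the standard 18 decimals and takes the decimal balance in wei, as printed by `cast call` when the return type is given (`balanceOf(address)(uint256)`). `wei_to_whole_link` and `link_balance_status` are hypothetical helper names. Wei is converted to whole LINK by string slicing because 10 LINK (10^19 wei) already overflows Bash's 64-bit integer arithmetic; rounding down is safe here because, for an integer threshold T, a balance is below T exactly when its whole-LINK part is below T.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the LINK balance alert thresholds in this runbook.
# Assumption: the LINK token uses 18 decimals.
WARN_LINK=10   # warning when balance < 10 LINK
CRIT_LINK=5    # critical when balance < 5 LINK

# Convert a wei amount (decimal string) to whole LINK, rounding down.
wei_to_whole_link() {
  local wei="${1%% *}"   # drop a trailing " [2e19]"-style annotation if present
  if (( ${#wei} > 18 )); then
    echo "${wei:0:${#wei}-18}"
  else
    echo 0
  fi
}

# Print OK, WARNING or CRITICAL for a balance given in wei.
# An empty value (e.g. a failed read) reports CRITICAL.
link_balance_status() {
  local whole
  whole=$(wei_to_whole_link "$1")
  if (( 10#$whole < CRIT_LINK )); then
    echo CRITICAL
  elif (( 10#$whole < WARN_LINK )); then
    echo WARNING
  else
    echo OK
  fi
}
```

Usage: `link_balance_status "$(cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" --rpc-url "$RPC_URL")"`.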
Weekly Operations
Review Message Statistics
1. Message Volume
   - Total messages sent
   - Success rate
   - Average latency
2. Fee Analysis
   - Total fees spent
   - Average fee per message
   - Fee trends
3. Error Analysis
   - Error types
   - Error frequency
   - Root causes
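Because the `ccip_*_total` counters listed under Monitoring are cumulative, weekly figures come from the difference between two snapshots, for example the values at the start and end of the week. A minimal sketch follows; `weekly_stats` is a hypothetical name, and it assumes `ccip_fees_total` is reported in LINK and treats success rate as `1 - errors / sent`, which is an assumption rather than a definition taken from the metrics themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive weekly message statistics from two counter snapshots.
# Args: sent_start sent_end errors_start errors_end fees_start fees_end
# Assumptions: fees are in LINK; success rate = 1 - errors / sent.
weekly_stats() {
  awk -v s0="$1" -v s1="$2" -v e0="$3" -v e1="$4" -v f0="$5" -v f1="$6" 'BEGIN {
    sent = s1 - s0; errors = e1 - e0; fees = f1 - f0
    printf "messages_sent=%d\n", sent
    printf "fees_link=%.4f\n", fees
    if (sent > 0) {
      printf "success_rate_pct=%.2f\n", (1 - errors / sent) * 100
      printf "avg_fee_link=%.4f\n", fees / sent
    } else {
      print "success_rate_pct=n/a"
      print "avg_fee_link=n/a"
    }
  }'
}
```

For example, `weekly_stats 1000 1400 10 18 250 350` reports 400 messages, 100 LINK in fees, a 98.00% success rate and 0.25 LINK per message.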
Performance Review
1. Latency Analysis
   - Average delivery time
   - P95/P99 latency
   - Outliers
2. Throughput
   - Messages per hour/day
   - Peak load times
   - Capacity planning
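For the P95/P99 figures, a nearest-rank percentile over raw delivery latencies is enough for a weekly review. The sketch assumes you can export per-message latencies in seconds, one per line (how you obtain them is outside this runbook); `latency_percentile` is a hypothetical helper name. If latency is only available through `ccip_message_latency_seconds` and that metric is exported as a Prometheus histogram, PromQL's `histogram_quantile` gives an approximate percentile instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: nearest-rank percentile of latencies read from stdin (one value per line).
# Usage: latency_percentile 95 < latencies.txt
latency_percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    NF { v[++n] = $1 }                    # skip blank lines
    END {
      if (n == 0) { print "n/a"; exit }
      rank = int(p * n / 100)
      if (rank < p * n / 100) rank++      # rank = ceil(p/100 * n)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Nearest-rank always returns an observed value, which keeps outliers visible; interpolating methods would smooth them.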
Monthly Operations
Security Review
1. Access Control Audit
   - Review authorized senders/receivers
   - Check for unauthorized access
   - Verify role assignments
2. Message Validation
   - Review message format compliance
   - Check for anomalies
   - Verify replay protection
Cost Optimization
1. Fee Optimization
   - Review fee trends
   - Identify optimization opportunities
   - Implement improvements
2. Message Optimization
   - Reduce message size
   - Batch updates when possible
   - Optimize encoding
Incident Response
Message Delivery Failure
1. Identify Issue
   - Check message status
   - Verify target chain status
   - Check router logs
2. Diagnose
   - Check LINK balance
   - Verify router configuration
   - Check target chain connectivity
3. Resolve
   - Fix underlying issue
   - Resend message if needed
   - Update configuration if required
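For the target chain connectivity check, one low-level option is to call the standard Ethereum JSON-RPC method `eth_blockNumber` against the target chain's endpoint and confirm it answers with a block height. This is a sketch: the function names and the `TARGET_RPC_URL` variable in the usage line are hypothetical, and it assumes `curl` is available. Comparing two readings taken a short time apart also shows whether the chain is still producing blocks.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: verify that a target chain's JSON-RPC endpoint responds.

# Extract "result":"0x..." from an eth_blockNumber response on stdin and print it in decimal.
parse_block_number() {
  local hex
  hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p')
  [ -n "$hex" ] || return 1         # no result field, e.g. a JSON-RPC error object
  printf '%d\n' "$hex"
}

# Query the endpoint; a non-zero exit means unreachable, HTTP error, or unparseable reply.
target_chain_height() {
  local rpc_url="$1"
  curl -sf --max-time 10 -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$rpc_url" | parse_block_number
}
```

Usage: `target_chain_height "$TARGET_RPC_URL" || echo "target chain unreachable"`.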
High Error Rate
1. Investigate
   - Check error logs
   - Identify error patterns
   - Review recent changes
2. Mitigate
   - Pause sending if critical
   - Fix root cause
   - Resume operations
Router Unavailable
1. Check Status
   - Verify router deployment
   - Check service health
   - Review logs
2. Recovery
   - Restart router if needed
   - Verify connectivity
   - Test message sending
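If the router has to be restarted, a rolling restart avoids deleting pods by hand. The sketch assumes the router runs as a Kubernetes Deployment named `ccip-router` in the `besu-network` namespace; only the namespace and the `app=ccip-router` label appear in this runbook, so the Deployment name is an assumption. `restart_ccip_router` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rolling restart of the CCIP router, then wait for it to become ready.
# Assumption: the router is a Deployment named "ccip-router"; pass arguments to override.
restart_ccip_router() {
  local ns="${1:-besu-network}" deploy="${2:-ccip-router}"
  kubectl rollout restart "deployment/${deploy}" -n "$ns" &&
    kubectl rollout status "deployment/${deploy}" -n "$ns" --timeout=5m
}
```

Waiting on `rollout status` before moving on means the connectivity and test-message checks run against the new pods rather than terminating ones.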
Maintenance
Contract Upgrades
1. Plan Upgrade
   - Review upgrade proposal
   - Test in staging
   - Schedule maintenance window
2. Execute Upgrade
   - Pause operations
   - Deploy new contracts
   - Update configurations
   - Resume operations
3. Verify
   - Test message sending
   - Verify message receiving
   - Monitor for issues
Configuration Changes
1. Review Impact
   - Assess change impact
   - Test in staging
   - Plan rollback
2. Apply Changes
   - Update configuration
   - Monitor closely
   - Verify functionality
Monitoring
Key Metrics
- ccip_messages_sent_total: Total messages sent
- ccip_messages_received_total: Total messages received
- ccip_message_latency_seconds: Message delivery latency
- ccip_fees_total: Total LINK spent on fees
- ccip_errors_total: Total errors
Alerts
- High error rate (> 5%)
- Low success rate (< 95%)
- High latency (> 5 minutes)
- Low LINK balance (< 10 LINK)
- Router unavailable
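The alert thresholds above can be expressed as a single evaluation step. In production these would normally live in the monitoring system's alert rules; the Bash sketch below only illustrates the threshold logic, assuming the caller has already computed the current error rate, success rate, latency, LINK balance, and router status (for example from the `ccip_*` metrics). `ccip_alerts` and the alert names it prints are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluate this runbook's alert thresholds.
# Args: error_rate_pct success_rate_pct latency_seconds link_balance router_up(1|0)
# Prints one alert name per line for every threshold that is breached.
ccip_alerts() {
  awk -v err="$1" -v ok="$2" -v lat="$3" -v bal="$4" -v up="$5" 'BEGIN {
    if (err > 5)   print "HighErrorRate"       # error rate > 5%
    if (ok < 95)   print "LowSuccessRate"      # success rate < 95%
    if (lat > 300) print "HighLatency"         # latency > 5 minutes
    if (bal < 10)  print "LowLinkBalance"      # balance < 10 LINK
    if (up != 1)   print "RouterUnavailable"   # router not reporting as up
  }'
}
```

All comparisons are strict, matching the thresholds as written: an error rate of exactly 5% or a balance of exactly 10 LINK does not fire.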