Monitoring Setup Guide

Last Updated: 2025-01-27
Status: Active

This guide explains how to set up and configure the monitoring stack for the DeFi Oracle Meta Mainnet.

Table of Contents

Overview
Monitoring Stack
Setup Instructions
Dashboards
Alerts
Troubleshooting

Overview

The monitoring stack consists of:

Prometheus - Metrics collection
Grafana - Visualization and dashboards
Loki - Log aggregation
Alertmanager - Alert routing and notification
Jaeger - Distributed tracing
OpenTelemetry - Observability framework

Monitoring Stack

Prometheus

Purpose: Metrics collection and storage

Features:

Scrapes metrics from all Besu nodes
Custom metrics for oracle updates
Alert rules for node health

Grafana

Purpose: Visualization and dashboards

Dashboards:

Besu node health
Block production metrics
RPC performance metrics
Oracle feed status
CCIP monitoring

Loki

Purpose: Log aggregation

Features:

Centralized log collection
Structured logging
Log retention policies

Alertmanager

Purpose: Alert routing and notification

Features:

Alert routing
Notification channels (email, Slack, PagerDuty)
Alert inhibition rules

Setup Instructions

1. Deploy Prometheus

# Deploy Prometheus
kubectl apply -f monitoring/k8s/prometheus.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=prometheus

2. Deploy Grafana

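# Add the Grafana Helm repository (skip if it is already configured)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Deploy Grafana using Helm
helm install grafana grafana/grafana -n monitoring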

# Get admin password
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode

3. Deploy Loki

# Deploy Loki
kubectl apply -f monitoring/k8s/loki.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=loki

4. Deploy Alertmanager

# Deploy Alertmanager
kubectl apply -f monitoring/k8s/alertmanager.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=alertmanager

5. Configure Service Discovery

Prometheus discovers Besu nodes through Kubernetes pod service discovery, restricted to the besu-network namespace:

# prometheus-config.yaml
scrape_configs:
  - job_name: 'besu-nodes'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - besu-network
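
With `role: pod` alone, Prometheus attempts to scrape every declared container port of every pod in the namespace. One common refinement, sketched below, is to keep only pods that opt in through annotations and to scrape the port they advertise. The `prometheus.io/scrape` and `prometheus.io/port` annotations are a widely used convention and an assumption here; nothing in this guide confirms that the Besu pods carry them.

```yaml
# prometheus-config.yaml (sketch; the annotation names are assumed, not confirmed by this guide)
scrape_configs:
  - job_name: 'besu-nodes'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - besu-network
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Rewrite the scrape address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Expose the pod name as a label for dashboards and alerts
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

If the pods are not annotated, dropping the `relabel_configs` section returns to the plain namespace-wide discovery shown above.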

Dashboards

Besu Node Dashboard

Metrics:

Block production rate
Transaction throughput
Gas usage
Peer connections
Sync status

Access: Grafana → Dashboards → Besu Node Health

RPC Performance Dashboard

Metrics:

Request rate
Response time (p50, p95, p99)
Error rate
Method distribution

Access: Grafana → Dashboards → RPC Performance

Oracle Dashboard

Metrics:

Update frequency
Round completion time
Deviation from sources
Transmitter status

Access: Grafana → Dashboards → Oracle Status

CCIP Dashboard

Metrics:

Message throughput
Cross-chain latency
Fee accumulation
Error rate

Access: Grafana → Dashboards → CCIP Monitoring

Alerts

Critical Alerts

Node Down: Besu node not responding
Block Production Stopped: No blocks produced in 30 seconds
High Error Rate: Error rate > 5%
Oracle Down: Oracle not updating

Warning Alerts

High Latency: P95 latency > 300ms
Low Throughput: Throughput < 50% of normal
High Gas Usage: Gas usage > 80% of limit
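
Some of the thresholds above could be expressed as Prometheus alerting rules along the following lines. This is a sketch under stated assumptions, not the rule set shipped with this project: the file name, the `for` durations, and the metric names `ethereum_blockchain_height` and `rpc_request_duration_seconds_bucket` are placeholders to be checked against what the Besu nodes actually export. Only `up` is a metric that Prometheus generates itself for every scrape target.

```yaml
# besu-alerts.rules.yaml (hypothetical file name)
groups:
  - name: besu-node-health
    rules:
      - alert: NodeDown
        # `up` is set to 0 by Prometheus when a scrape of the target fails
        expr: up{job="besu-nodes"} == 0
        for: 1m  # assumed grace period, not taken from this guide
        labels:
          severity: critical
        annotations:
          summary: "Besu node {{ $labels.instance }} is not responding"

      - alert: BlockProductionStopped
        # Assumed metric name. The 30s window must contain at least two
        # samples, so this presumes a scrape interval of 15s or shorter.
        expr: changes(ethereum_blockchain_height{job="besu-nodes"}[30s]) == 0
        labels:
          severity: critical
        annotations:
          summary: "No new blocks seen on {{ $labels.instance }} in 30 seconds"

      - alert: HighLatency
        # Hypothetical histogram metric, measured in seconds (0.3 = 300ms)
        expr: histogram_quantile(0.95, sum by (le) (rate(rpc_request_duration_seconds_bucket{job="besu-nodes"}[5m]))) > 0.3
        for: 5m  # assumed, to avoid alerting on short spikes
        labels:
          severity: warning
        annotations:
          summary: "P95 RPC latency is above 300ms"
```

The `severity` label is what an Alertmanager route can match on to separate critical alerts from warnings.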

Alert Configuration

# alertmanager-config.yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
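
The route above refers to two receivers, `default` and `critical-alerts`, which must also be defined for Alertmanager to load the configuration. A minimal sketch follows, assuming Slack for routine alerts and PagerDuty for critical ones. The webhook URL, channel name, and routing key are placeholders, and the alert name `NodeDown` in the inhibition rule is hypothetical.

```yaml
# alertmanager-config.yaml (continued; every target below is a placeholder)
receivers:
  - name: 'default'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK'
        channel: '#monitoring-alerts'  # assumed channel name
        send_resolved: true
  - name: 'critical-alerts'
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY'

inhibit_rules:
  # While a node is reported down, suppress warning-level alerts
  # that share the same `instance` label.
  - source_match:
      alertname: 'NodeDown'  # hypothetical alert name
    target_match:
      severity: 'warning'
    equal: ['instance']
```

The sketch uses `source_match` and `target_match` to stay consistent with the `match` syntax in the route above. Recent Alertmanager releases deprecate these in favour of `matchers`, `source_matchers`, and `target_matchers`.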

Troubleshooting

Prometheus Not Scraping

Symptoms: No metrics in Prometheus

Solution:

Check service discovery configuration
Verify that the namespace and labels of the Besu pods match the scrape configuration
Check network connectivity
Review Prometheus logs

Grafana Not Showing Data

Symptoms: Dashboards show "No data"

Solution:

Verify Prometheus data source
Check query syntax
Verify time range
Check metric names
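
For the data source check, one way to rule out manual misconfiguration is to provision the data sources through the Helm values of the Grafana chart. The sketch below assumes that Prometheus and Loki are reachable as services named `prometheus` and `loki` in the `monitoring` namespace on their default ports; those service names and the values file name are assumptions, so adjust them to the actual manifests.

```yaml
# grafana-values.yaml (hypothetical file name)
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus.monitoring.svc:9090  # assumed service name
        isDefault: true
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc:3100  # assumed service name
```

The values could then be applied with `helm upgrade grafana grafana/grafana -n monitoring -f grafana-values.yaml`.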

Alerts Not Firing

Symptoms: Conditions met but no alerts

Solution:

Check alert rule syntax
Verify Alertmanager configuration
Check notification channels
Review Alertmanager logs

