# Monitoring Setup Guide

**Last Updated**: 2025-01-27

**Status**: Active

This guide explains how to set up and configure the monitoring stack for the DeFi Oracle Meta Mainnet.

## Table of Contents

- [Overview](#overview)
- [Monitoring Stack](#monitoring-stack)
- [Setup Instructions](#setup-instructions)
- [Dashboards](#dashboards)
- [Alerts](#alerts)
- [Troubleshooting](#troubleshooting)

## Overview

The monitoring stack consists of:

- **Prometheus** - Metrics collection
- **Grafana** - Visualization and dashboards
- **Loki** - Log aggregation
- **Alertmanager** - Alert routing and notification
- **Jaeger** - Distributed tracing
- **OpenTelemetry** - Observability framework

## Monitoring Stack

### Prometheus

**Purpose**: Metrics collection and storage

**Features**:

- Scrapes metrics from all Besu nodes
- Custom metrics for oracle updates
- Alert rules for node health

### Grafana

**Purpose**: Visualization and dashboards

**Dashboards**:

- Besu node health
- Block production metrics
- RPC performance metrics
- Oracle feed status
- CCIP monitoring

### Loki

**Purpose**: Log aggregation

**Features**:

- Centralized log collection
- Structured logging
- Log retention policies

### Alertmanager

**Purpose**: Alert routing and notification

**Features**:

- Alert routing
- Notification channels (email, Slack, PagerDuty)
- Alert inhibition rules

## Setup Instructions

### 1. Deploy Prometheus

```bash
# Deploy Prometheus
kubectl apply -f monitoring/k8s/prometheus.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=prometheus
```

### 2. Deploy Grafana

```bash
# Deploy Grafana using Helm
helm install grafana grafana/grafana -n monitoring

# Get admin password
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode
```

### 3. Deploy Loki

```bash
# Deploy Loki
kubectl apply -f monitoring/k8s/loki.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=loki
```

### 4. Deploy Alertmanager

```bash
# Deploy Alertmanager
kubectl apply -f monitoring/k8s/alertmanager.yaml

# Verify deployment
kubectl get pods -n monitoring -l app=alertmanager
```

### 5. Configure Service Discovery

Prometheus needs to discover Besu nodes:

```yaml
# prometheus-config.yaml
scrape_configs:
  - job_name: 'besu-nodes'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - besu-network
```

## Dashboards

### Besu Node Dashboard

**Metrics**:

- Block production rate
- Transaction throughput
- Gas usage
- Peer connections
- Sync status

**Access**: Grafana → Dashboards → Besu Node Health

### RPC Performance Dashboard

**Metrics**:

- Request rate
- Response time (p50, p95, p99)
- Error rate
- Method distribution

**Access**: Grafana → Dashboards → RPC Performance

### Oracle Dashboard

**Metrics**:

- Update frequency
- Round completion time
- Deviation from sources
- Transmitter status

**Access**: Grafana → Dashboards → Oracle Status

### CCIP Dashboard

**Metrics**:

- Message throughput
- Cross-chain latency
- Fee accumulation
- Error rate

**Access**: Grafana → Dashboards → CCIP Monitoring

## Alerts

### Critical Alerts

- **Node Down**: Besu node not responding
- **Block Production Stopped**: No blocks produced in 30 seconds
- **High Error Rate**: Error rate > 5%
- **Oracle Down**: Oracle not updating

### Warning Alerts

- **High Latency**: P95 latency > 300ms
- **Low Throughput**: Throughput < 50% of normal
- **High Gas Usage**: Gas usage > 80% of limit

### Alert Configuration

```yaml
# alertmanager-config.yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
```

## Troubleshooting

### Prometheus Not Scraping

**Symptoms**: No metrics in Prometheus

**Solution**:

1. Check service discovery configuration
2. Verify node labels match
3. Check network connectivity
4. Review Prometheus logs

### Grafana Not Showing Data

**Symptoms**: Dashboards show "No data"

**Solution**:

1. Verify Prometheus data source
2. Check query syntax
3. Verify time range
4. Check metric names

### Alerts Not Firing

**Symptoms**: Conditions met but no alerts

**Solution**:

1. Check alert rule syntax
2. Verify Alertmanager configuration
3. Check notification channels
4. Review Alertmanager logs

## Related Documentation

- [Architecture Documentation](../architecture/ARCHITECTURE.md)
- [Deployment Guide](../deployment/DEPLOYMENT.md)
- [Troubleshooting Guide](../guides/TROUBLESHOOTING.md)

---

**Last Updated**: 2025-01-27
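
## Example Alert Rules

The Alerts section lists the critical conditions but does not show how they are written as Prometheus rules. The fragment below is a minimal sketch of two of them, not the project's actual rule file. The file name, the `for` duration, and the metric name `ethereum_blockchain_height` are assumptions; confirm the metric name against each node's `/metrics` output before using it. The `severity: critical` label is what the `critical-alerts` route in the Alertmanager configuration matches on.

```yaml
# besu-alerts.yaml (hypothetical file name)
groups:
  - name: besu-critical
    rules:
      - alert: NodeDown
        # `up` is generated by Prometheus for every scrape target;
        # the job label comes from the 'besu-nodes' scrape config.
        expr: up{job="besu-nodes"} == 0
        for: 1m  # illustrative delay, tune to taste
        labels:
          severity: critical
        annotations:
          summary: "Besu node {{ $labels.instance }} is not responding"

      - alert: BlockProductionStopped
        # Assumed metric name. The 30s window needs at least two
        # samples, so this presumes a scrape interval of 15s or less.
        expr: changes(ethereum_blockchain_height{job="besu-nodes"}[30s]) == 0
        labels:
          severity: critical
        annotations:
          summary: "No new blocks seen on {{ $labels.instance }} in 30 seconds"
```

If alerts written this way never reach a receiver, the usual first check is that the rule file is listed under `rule_files` in the Prometheus configuration and that the `severity` label value matches the route exactly.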