defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
2025-12-12 18:01:35 -08:00

Infrastructure Monitoring

Comprehensive monitoring solutions for all infrastructure components in Sankofa Phoenix.

Overview

This directory contains monitoring components including custom Prometheus exporters, Grafana dashboards, and alerting rules for infrastructure monitoring.

Components

Exporters (exporters/)

Custom Prometheus exporters for:

  • Proxmox VE metrics
  • TP-Link Omada metrics
  • Network switch/router metrics
  • Infrastructure health checks

Dashboards (dashboards/)

Grafana dashboards for:

  • Infrastructure overview
  • Proxmox cluster health
  • Network performance
  • Omada controller status
  • Site-level monitoring

Exporters

Proxmox Exporter

The Proxmox exporter (pve_exporter) provides metrics for:

  • VM status and resource usage
  • Node health and performance
  • Storage pool utilization
  • Network interface statistics
  • Cluster status
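The sketch below shows roughly how node-level metrics can be derived from the Proxmox VE API. It converts the `data` list returned by `GET /api2/json/nodes` into the `pve_node_status` gauge described under Metrics (0=offline, 1=online). It assumes each entry carries `node` and `status` fields. The helper name and the `node` label are hypothetical, and this is not the exporter's actual code.

```python
from typing import Iterable, Mapping


def nodes_to_metrics(nodes: Iterable[Mapping[str, str]]) -> str:
    """Render pve_node_status lines in Prometheus text format (hypothetical helper).

    `nodes` is assumed to be the "data" list from GET /api2/json/nodes.
    Any status other than "online" is reported as 0.
    Label values are not escaped; Proxmox node names contain no quotes.
    """
    lines = [
        "# HELP pve_node_status Node status (0=offline, 1=online)",
        "# TYPE pve_node_status gauge",
    ]
    for entry in sorted(nodes, key=lambda n: n["node"]):
        value = 1 if entry.get("status") == "online" else 0
        lines.append(f'pve_node_status{{node="{entry["node"]}"}} {value}')
    return "\n".join(lines) + "\n"
```

For example, `nodes_to_metrics([{"node": "pve1", "status": "online"}])` yields the HELP and TYPE lines followed by `pve_node_status{node="pve1"} 1`.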

Installation:

pip install prometheus-pve-exporter

Configuration:

exporter:
  listen_address: 0.0.0.0:9221
  proxmox:
    endpoint: https://pve1.sankofa.nexus:8006
    username: monitoring@pam
    password: ${PROXMOX_PASSWORD}
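The `${PROXMOX_PASSWORD}` placeholder means the password is injected from the environment rather than stored in the file. This README does not say whether the exporter expands such placeholders itself. If your config loader does not, the minimal sketch below expands them before parsing, using the standard-library `os.path.expandvars`. `load_config_text` is a hypothetical helper name.

```python
import os


def load_config_text(raw: str) -> str:
    """Expand $VAR / ${VAR} references from the environment (hypothetical helper).

    os.path.expandvars leaves unknown variables untouched. This helper therefore
    raises instead of sending a literal "${PROXMOX_PASSWORD}" to the Proxmox API.
    """
    expanded = os.path.expandvars(raw)
    if "${" in expanded:
        raise ValueError("unresolved environment variable in exporter config")
    return expanded
```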

Omada Exporter

Custom exporter for TP-Link Omada Controller metrics:

  • Access point status
  • Client device counts
  • Network throughput
  • Controller health

See exporters/omada_exporter/ for the implementation.
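The Omada controller API responses are not documented here. The sketch below therefore assumes the exporter has already fetched the access points as dictionaries with hypothetical `name`, `connected` and `clients` keys. It shows only how those values could map onto `omada_ap_status` and `omada_clients_total` from the Metrics section. The 1/0 status encoding is also an assumption.

```python
from typing import Iterable, Mapping, Union

# Hypothetical shape of one access point record after fetching from the controller
AccessPoint = Mapping[str, Union[str, bool, int]]


def omada_metrics(aps: Iterable[AccessPoint]) -> str:
    """Render per-AP status and the summed client count in Prometheus text format."""
    aps = list(aps)
    lines = ["# TYPE omada_ap_status gauge"]
    for ap in aps:
        # Assumed encoding: 1 = AP connected to the controller, 0 = disconnected
        lines.append(f'omada_ap_status{{ap="{ap["name"]}"}} {1 if ap["connected"] else 0}')
    total = sum(int(ap["clients"]) for ap in aps)
    lines += ["# TYPE omada_clients_total gauge", f"omada_clients_total {total}"]
    return "\n".join(lines) + "\n"
```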

Network Exporter

SNMP-based exporter for network devices:

  • Switch port statistics
  • Router interface metrics
  • VLAN utilization
  • Network topology changes

See exporters/network_exporter/ for the implementation.
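SNMP reports port state through the IF-MIB `ifOperStatus` object (RFC 2863), which is an integer enumeration. The sketch below shows one way to collapse it into the `network_port_status` metric. Treating only `up(1)` as 1 is an assumption about this exporter's encoding, and the helper names are hypothetical.

```python
# IF-MIB ifOperStatus values as defined in RFC 2863
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def port_status_value(oper_status: int) -> int:
    """Map ifOperStatus to network_port_status: 1 only for up(1), else 0 (assumed)."""
    return 1 if oper_status == 1 else 0


def describe(oper_status: int) -> str:
    """Human-readable name for an ifOperStatus value, for labels or logs."""
    return IF_OPER_STATUS.get(oper_status, "invalid")
```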

Dashboards

Infrastructure Overview

Comprehensive dashboard showing:

  • All sites status
  • Resource utilization
  • Health scores
  • Alert summary

Location: dashboards/infrastructure-overview.json

Proxmox Cluster

Dashboard for Proxmox clusters:

  • Cluster health
  • Node performance
  • VM resource usage
  • Storage utilization

Location: dashboards/proxmox-cluster.json

Network Performance

Network performance dashboard:

  • Bandwidth utilization
  • Latency metrics
  • Error rates
  • Top talkers

Location: dashboards/network-performance.json
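Bandwidth utilization is normally derived from two samples of a port byte counter (such as `network_port_rx_bytes`) and the link speed. The sketch below assumes the link speed is given in Mbit/s, as IF-MIB `ifHighSpeed` reports it. It treats a decreasing counter as a reset, which mirrors how Prometheus `rate()` handles counter resets. The function name is hypothetical.

```python
def utilization_percent(prev_bytes: int, curr_bytes: int, interval_s: float, link_mbps: int) -> float:
    """Percent of link capacity used between two byte-counter samples (hypothetical helper).

    A counter that went backwards is assumed to have been reset (e.g. device reboot),
    so the new value is counted from zero.
    """
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval and link speed must be positive")
    delta = curr_bytes - prev_bytes if curr_bytes >= prev_bytes else curr_bytes
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / (link_mbps * 1_000_000)
```

For example, 3,750,000,000 bytes transferred over 60 seconds on a 1000 Mbit/s link is 500 Mbit/s, or 50% utilization.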

Omada Controller

Omada-specific dashboard:

  • Controller status
  • Access point health
  • Client statistics
  • Network policies

Location: dashboards/omada-controller.json

Installation

Deploy Exporters

# Deploy all exporters
kubectl apply -f exporters/manifests/

# Or deploy individually
kubectl apply -f exporters/manifests/proxmox-exporter.yaml
kubectl apply -f exporters/manifests/omada-exporter.yaml

Import Dashboards

# Import all dashboards to Grafana
./scripts/import-dashboards.sh

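# Or import individually via the Grafana HTTP API
# (grafana-cli has no dashboard import command; GRAFANA_URL and GRAFANA_TOKEN are placeholders)
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboards/infrastructure-overview.json \
  | curl -s -X POST -H "Authorization: Bearer $GRAFANA_TOKEN" -H "Content-Type: application/json" \
      --data-binary @- "$GRAFANA_URL/api/dashboards/db"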
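The contents of `./scripts/import-dashboards.sh` are not shown here. As an illustration only, the Python sketch below shows what a bulk import can do. It wraps each JSON file in the envelope expected by Grafana's `POST /api/dashboards/db` endpoint and uploads it. `GRAFANA_URL` and `GRAFANA_TOKEN` are assumed environment variables (the token being a Grafana service-account token), and the helper names are hypothetical.

```python
import json
import os
import pathlib
import urllib.request


def build_payload(dashboard: dict) -> dict:
    """Wrap an exported dashboard for POST /api/dashboards/db.

    Clearing "id" lets Grafana match on "uid" (or create a new dashboard)
    instead of colliding with a numeric id from another instance.
    """
    return {"dashboard": {**dashboard, "id": None}, "overwrite": True}


def import_all(directory: str = "dashboards") -> None:
    """Upload every *.json file in `directory` (needs GRAFANA_URL and GRAFANA_TOKEN)."""
    base = os.environ["GRAFANA_URL"].rstrip("/")  # assumption, e.g. http://grafana:3000
    token = os.environ["GRAFANA_TOKEN"]  # assumption: service-account token
    for path in sorted(pathlib.Path(directory).glob("*.json")):
        body = json.dumps(build_payload(json.loads(path.read_text()))).encode("utf-8")
        req = urllib.request.Request(
            f"{base}/api/dashboards/db",
            data=body,
            method="POST",
            headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            print(path.name, resp.status)
```

Call `import_all()` from the repository's monitoring directory. Dashboards exported with "Export for sharing externally" contain `__inputs` data source placeholders, which `/api/dashboards/db` does not resolve. Those files need `/api/dashboards/import` instead.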

Configuration

Prometheus Scrape Configuration

scrape_configs:
  - job_name: 'proxmox'
    static_configs:
      - targets:
        - 'proxmox-exporter.monitoring.svc.cluster.local:9221'
  
  - job_name: 'omada'
    static_configs:
      - targets:
        - 'omada-exporter.monitoring.svc.cluster.local:9222'
  
  - job_name: 'network'
    static_configs:
      - targets:
        - 'network-exporter.monitoring.svc.cluster.local:9223'

Alerting Rules

Alert rules are defined in exporters/alert-rules/:

  • proxmox-alerts.yaml: Proxmox cluster alerts
  • omada-alerts.yaml: Omada controller alerts
  • network-alerts.yaml: Network infrastructure alerts

Metrics

Proxmox Metrics

  • pve_node_status: Node status (0=offline, 1=online)
  • pve_vm_status: VM status
  • pve_storage_used_bytes: Storage usage
  • pve_network_rx_bytes: Network receive bytes
  • pve_network_tx_bytes: Network transmit bytes

Omada Metrics

  • omada_ap_status: Access point status
  • omada_clients_total: Total client count
  • omada_throughput_bytes: Network throughput
  • omada_controller_status: Controller health

Network Metrics

  • network_port_status: Switch port status
  • network_port_rx_bytes: Port receive bytes
  • network_port_tx_bytes: Port transmit bytes
  • network_vlan_utilization: VLAN utilization

Alerts

Critical Alerts

  • Proxmox cluster node down
  • Omada controller unreachable
  • Network switch offline
  • High resource utilization (>90%)

Warning Alerts

  • High resource utilization (>80%)
  • Network latency spikes
  • Access point offline
  • Storage pool >80% full
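In Prometheus the two utilization alerts would normally be two rules on the same expression with different thresholds. The sketch below captures only the tiering logic from the lists above: above 90% is critical and above 80% is a warning. The function name is hypothetical.

```python
from typing import Optional


def utilization_severity(percent: float) -> Optional[str]:
    """Classify resource utilization using the thresholds listed above.

    >90% -> "critical", >80% -> "warning", otherwise no alert (None).
    """
    if percent > 90:
        return "critical"
    if percent > 80:
        return "warning"
    return None
```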

Troubleshooting

Exporter Issues

# Check exporter status
kubectl get pods -n monitoring -l app=proxmox-exporter

# View exporter logs
kubectl logs -n monitoring -l app=proxmox-exporter

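# Test exporter endpoint (cluster DNS names resolve only from inside the cluster)
curl http://proxmox-exporter.monitoring.svc.cluster.local:9221/metrics

# From a workstation, port-forward the service first
kubectl port-forward -n monitoring svc/proxmox-exporter 9221:9221
curl http://localhost:9221/metrics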
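Once the endpoint responds, you can scan its output for offline nodes instead of reading it by eye. The sketch below assumes the exporter exposes `pve_node_status` with a `node` label; the label name and helper names are hypothetical. It uses only the standard library.

```python
import re
import urllib.request
from typing import List

# Matches e.g. pve_node_status{node="pve1"} 0  (the "node" label name is an assumption)
SAMPLE = re.compile(r'^pve_node_status\{[^}]*node="([^"]+)"[^}]*\}\s+(\S+)')


def offline_nodes(metrics_text: str) -> List[str]:
    """Return node names whose pve_node_status sample is 0 (0=offline, 1=online)."""
    down = []
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line)
        if match and float(match.group(2)) == 0:
            down.append(match.group(1))
    return down


def fetch(url: str) -> str:
    """Fetch a /metrics page, e.g. http://localhost:9221/metrics after port-forwarding."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Usage: `offline_nodes(fetch("http://localhost:9221/metrics"))` returns an empty list when every node reports online.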

Dashboard Issues

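# Verify dashboard import (grafana-cli cannot list dashboards; use the search API.
# GRAFANA_URL and GRAFANA_TOKEN are placeholders)
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" "$GRAFANA_URL/api/search?type=dash-db"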

# Check dashboard data sources
# In the Grafana UI: Connections > Data sources (Configuration > Data Sources in older releases)