Sankofa/infrastructure/monitoring/README.md
defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
2025-12-12 18:01:35 -08:00


# Infrastructure Monitoring
Comprehensive monitoring solutions for all infrastructure components in Sankofa Phoenix.
## Overview
This directory contains monitoring components including custom Prometheus exporters, Grafana dashboards, and alerting rules for infrastructure monitoring.
## Components
### Exporters (`exporters/`)
Custom Prometheus exporters for:
- Proxmox VE metrics
- TP-Link Omada metrics
- Network switch/router metrics
- Infrastructure health checks
### Dashboards (`dashboards/`)
Grafana dashboards for:
- Infrastructure overview
- Proxmox cluster health
- Network performance
- Omada controller status
- Site-level monitoring
## Exporters
### Proxmox Exporter
The Proxmox exporter (`pve_exporter`) provides metrics for:
- VM status and resource usage
- Node health and performance
- Storage pool utilization
- Network interface statistics
- Cluster status
**Installation:**
```bash
# The PyPI package is prometheus-pve-exporter; it installs the pve_exporter command
pip install prometheus-pve-exporter
```
**Configuration:**
```yaml
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
```
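
The `${PROXMOX_PASSWORD}` placeholder suggests the password is supplied through the environment rather than stored in the file. Whether the exporter expands the placeholder itself is not stated here, so the following is only a minimal sketch, assuming the config is rendered before the exporter starts. The helper name `render_config` is hypothetical.

```python
import os
from string import Template

# Same structure as the configuration above; the placeholder is left unexpanded.
CONFIG_TEMPLATE = """\
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
"""


def render_config(template_text, env=None):
    """Replace ${VAR} placeholders with values from env (default: os.environ).

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails loudly instead of producing an empty password.
    """
    env = os.environ if env is None else env
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    # Hypothetical value, for illustration only.
    print(render_config(CONFIG_TEMPLATE, {"PROXMOX_PASSWORD": "example-secret"}))
```

Failing on an unset variable is a deliberate choice in this sketch: a silently empty password tends to surface later as a confusing authentication error.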
### Omada Exporter
Custom exporter for TP-Link Omada Controller metrics:
- Access point status
- Client device counts
- Network throughput
- Controller health
**See**: `exporters/omada_exporter/` for implementation
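
Whatever the actual implementation looks like, a custom exporter ultimately has to serve its values in the Prometheus text exposition format. The sketch below shows that format using only the standard library. The function names and the `ap` and `site` labels are illustrative assumptions, not taken from `exporters/omada_exporter/`.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family as exposition text.

    samples is a list of (labels_dict, value) pairs; labels are emitted
    in sorted order so the output is deterministic.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical access point data, for illustration only.
    print(render_metric(
        "omada_ap_status",
        "Access point status",
        "gauge",
        [({"ap": "lobby", "site": "hq"}, 1), ({"ap": "warehouse", "site": "hq"}, 0)],
    ))
```

In practice a client library such as `prometheus_client` would normally handle this rendering; the sketch only makes the wire format visible.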
### Network Exporter
SNMP-based exporter for network devices:
- Switch port statistics
- Router interface metrics
- VLAN utilization
- Network topology changes
**See**: `exporters/network_exporter/` for implementation
## Dashboards
### Infrastructure Overview
Comprehensive dashboard showing:
- All sites status
- Resource utilization
- Health scores
- Alert summary
**Location**: `dashboards/infrastructure-overview.json`
### Proxmox Cluster
Dashboard for Proxmox clusters:
- Cluster health
- Node performance
- VM resource usage
- Storage utilization
**Location**: `dashboards/proxmox-cluster.json`
### Network Performance
Network performance dashboard:
- Bandwidth utilization
- Latency metrics
- Error rates
- Top talkers
**Location**: `dashboards/network-performance.json`
### Omada Controller
Omada-specific dashboard:
- Controller status
- Access point health
- Client statistics
- Network policies
**Location**: `dashboards/omada-controller.json`
## Installation
### Deploy Exporters
```bash
# Deploy all exporters
kubectl apply -f exporters/manifests/
# Or deploy individually
kubectl apply -f exporters/manifests/proxmox-exporter.yaml
kubectl apply -f exporters/manifests/omada-exporter.yaml
```
### Import Dashboards
```bash
# Import all dashboards to Grafana
./scripts/import-dashboards.sh
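# Or import a single dashboard through the Grafana HTTP API
# (GRAFANA_URL and GRAFANA_API_TOKEN are placeholders for your instance)
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboards/infrastructure-overview.json |
  curl -sS -X POST "$GRAFANA_URL/api/dashboards/db" \
    -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
    -H "Content-Type: application/json" -d @-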
```
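
For scripted imports, Grafana's HTTP API accepts a dashboard model wrapped in a small JSON envelope at `POST /api/dashboards/db`. The following is a sketch of that call, not the contents of `scripts/import-dashboards.sh`. The function names, the Grafana URL, and the API token are assumptions to be replaced with values for your instance.

```python
import json
import urllib.request


def build_import_payload(dashboard, overwrite=True):
    """Wrap a dashboard model in the body expected by POST /api/dashboards/db."""
    model = dict(dashboard)  # shallow copy so the caller's dict is not modified
    model["id"] = None       # let Grafana assign the internal id; uid is kept
    return {"dashboard": model, "overwrite": overwrite}


def import_dashboard(path, grafana_url, api_token):
    """Upload one dashboard JSON file and return Grafana's JSON response."""
    with open(path, encoding="utf-8") as handle:
        payload = build_import_payload(json.load(handle))
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `import_dashboard("dashboards/proxmox-cluster.json", "https://grafana.example.internal", token)`, where the host name is a placeholder.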
## Configuration
### Prometheus Scrape Configuration
```yaml
scrape_configs:
  - job_name: 'proxmox'
    static_configs:
      - targets:
          - 'pve-exporter.monitoring.svc.cluster.local:9221'
  - job_name: 'omada'
    static_configs:
      - targets:
          - 'omada-exporter.monitoring.svc.cluster.local:9222'
  - job_name: 'network'
    static_configs:
      - targets:
          - 'network-exporter.monitoring.svc.cluster.local:9223'
```
### Alerting Rules
Alert rules are defined in `exporters/alert-rules/`:
- `proxmox-alerts.yaml`: Proxmox cluster alerts
- `omada-alerts.yaml`: Omada controller alerts
- `network-alerts.yaml`: Network infrastructure alerts
## Metrics
### Proxmox Metrics
- `pve_node_status`: Node status (0=offline, 1=online)
- `pve_vm_status`: VM status
- `pve_storage_used_bytes`: Storage usage
- `pve_network_rx_bytes`: Network receive bytes
- `pve_network_tx_bytes`: Network transmit bytes
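
As an illustration of how `pve_node_status` can be consumed outside Prometheus, the sketch below scans the text served on `/metrics` and lists nodes reporting `0` (offline). The parser is deliberately small and the `id` label name is an assumption; check the exporter's real output for the label it uses.

```python
import re

# Matches "name{labels} value" or "name value"; anything after the value is ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def offline_nodes(text):
    """Return the id label of every pve_node_status sample equal to 0."""
    return [
        labels.get("id", "<unknown>")
        for name, labels, value in parse_samples(text)
        if name == "pve_node_status" and value == 0
    ]
```

The equivalent check inside Prometheus would be the expression `pve_node_status == 0`.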
### Omada Metrics
- `omada_ap_status`: Access point status
- `omada_clients_total`: Total client count
- `omada_throughput_bytes`: Network throughput
- `omada_controller_status`: Controller health
### Network Metrics
- `network_port_status`: Switch port status
- `network_port_rx_bytes`: Port receive bytes
- `network_port_tx_bytes`: Port transmit bytes
- `network_vlan_utilization`: VLAN utilization
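
The byte metrics above are most useful as rates. Assuming `network_port_rx_bytes` and `network_port_tx_bytes` are monotonically increasing counters (not confirmed here), bandwidth utilization can be derived from two successive samples, as in this sketch. The function names and the link speed parameter are illustrative.

```python
def byte_rate(prev_bytes, curr_bytes, interval_seconds):
    """Average bytes per second between two counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter went backwards, so assume it was reset and restarted
        # from zero (the same assumption Prometheus rate() makes).
        delta = curr_bytes
    return delta / interval_seconds


def link_utilization_percent(prev_bytes, curr_bytes, interval_seconds, link_speed_bps):
    """Utilization of a link in percent, given its speed in bits per second."""
    if link_speed_bps <= 0:
        raise ValueError("link_speed_bps must be positive")
    rate = byte_rate(prev_bytes, curr_bytes, interval_seconds)
    return rate * 8 * 100 / link_speed_bps
```

In a dashboard the same calculation is usually written in PromQL with `rate()` over the counter, multiplied by 8 to convert bytes to bits.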
## Alerts
### Critical Alerts
- Proxmox cluster node down
- Omada controller unreachable
- Network switch offline
- High resource utilization (>90%)
### Warning Alerts
- High resource utilization (>80%)
- Network latency spikes
- Access point offline
- Storage pool >80% full
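
The two utilization thresholds listed above (warning above 80%, critical above 90%) can be expressed as a small classifier. This is a sketch of the threshold logic only; the actual rules live in the alert rule files, and the function name and severity strings are assumptions.

```python
def classify_utilization(percent, warning_threshold=80.0, critical_threshold=90.0):
    """Map a utilization percentage to a severity level.

    Thresholds are exclusive, matching the ">80%" and ">90%" wording:
    exactly 80 is "ok" and exactly 90 is still "warning".
    """
    if percent > critical_threshold:
        return "critical"
    if percent > warning_threshold:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for sample in (50.0, 85.0, 95.0):
        print(sample, classify_utilization(sample))
```

Checking the critical threshold first keeps a value above 90% from being reported at both severities.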
## Troubleshooting
### Exporter Issues
```bash
# Check exporter status
kubectl get pods -n monitoring -l app=proxmox-exporter
# View exporter logs
kubectl logs -n monitoring -l app=proxmox-exporter
# Test exporter endpoint (cluster DNS name; run from a pod inside the cluster)
curl http://proxmox-exporter.monitoring.svc.cluster.local:9221/metrics
```
### Dashboard Issues
```bash
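# Verify dashboard import (lists dashboards through the Grafana HTTP API;
# GRAFANA_URL and GRAFANA_API_TOKEN are placeholders for your instance)
curl -sS -H "Authorization: Bearer $GRAFANA_API_TOKEN" "$GRAFANA_URL/api/search?type=dash-db"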
# Check dashboard data sources
# In Grafana UI: Configuration > Data Sources
```
## Related Documentation
- [Proxmox Management](../proxmox/README.md)
- [Omada Management](../omada/README.md)
- [Network Management](../network/README.md)
- [Infrastructure Management](../README.md)