Grafana Dashboards
This directory contains Grafana dashboard JSON files for monitoring the DBIS Core Banking System.
Dashboard List
1. System Health Dashboard (system-health.json)
Purpose: Overall system health and status monitoring
Key Metrics:
- Service health status
- Overall system availability
- Error rates (5xx, 4xx)
- CPU and memory usage by service
- Database connection pool status
- Active sessions
- Queue lengths
Refresh Interval: 30s
Tags: system, health, overview
2. API Performance Dashboard (api-performance.json)
Purpose: API endpoint performance and latency monitoring
Key Metrics:
- Request rate by endpoint
- Response time percentiles (P50, P95, P99)
- Error rate by endpoint
- Top endpoints by request volume
- Request distribution by method and status code
- SLO compliance (availability, latency)
- Request duration distribution
Refresh Interval: 30s
Tags: api, performance, latency
3. Ledger Operations Dashboard (ledger-operations.json)
Purpose: Ledger entry and settlement operations monitoring
Key Metrics:
- Ledger entry rate by ledger ID
- Ledger entry amount by ledger and currency
- Settlement rate by status
- Settlement duration percentiles
- Outbox queue status and processing rate
- Balance updates by currency
- Failed posting operations
- Total ledger entries, active accounts, pending settlements
Refresh Interval: 30s
Tags: ledger, transactions, settlement
4. Security & Compliance Dashboard (security-compliance.json)
Purpose: Security events and compliance monitoring
Key Metrics:
- Authentication failures by reason
- Authorization failures by resource and action
- Sanctions screening results
- AML risk score distribution
- Audit log events by type
- Policy violations by type
- Failed transactions by reason
- Encryption key rotation status
- Data access events (PII, Financial)
- Security incidents and compliance violations (24h)
Refresh Interval: 30s
Tags: security, compliance, audit
Installation
Import Dashboards to Grafana
- Via Grafana UI:
  - Navigate to Grafana → Dashboards → Import
  - Upload the JSON file or paste JSON content
  - Configure data source and settings
  - Save dashboard
- Via Grafana Provisioning:
  Create a provisioning configuration file:

      # grafana/provisioning/dashboards/dashboards.yml
      apiVersion: 1
      providers:
        - name: 'DBIS Core Dashboards'
          orgId: 1
          folder: 'DBIS Core'
          type: file
          disableDeletion: false
          updateIntervalSeconds: 10
          allowUiUpdates: true
          options:
            path: /etc/grafana/dashboards

  Copy dashboard files to the provisioned path:

      cp dbis_core/monitoring/grafana/dashboards/*.json /etc/grafana/dashboards/

- Via Grafana API:

      # Import dashboard via API
      curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer <grafana-api-key>" \
        -d @system-health.json \
        http://grafana:3000/api/dashboards/db
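Grafana's `POST /api/dashboards/db` endpoint expects the dashboard model under a top-level `dashboard` key, optionally alongside fields such as `overwrite`. If the JSON files in this directory hold the bare dashboard model (an assumption; check whether a file already has a top-level `dashboard` key), they need wrapping before being posted. A minimal sketch of that wrapping, where `build_import_payload` and the sample values are hypothetical:

```python
import json


def build_import_payload(dashboard, overwrite=True):
    """Wrap a bare dashboard model in the body shape POST /api/dashboards/db expects."""
    model = dict(dashboard)  # shallow copy so the caller's dict is left untouched
    model["id"] = None       # a null id lets Grafana match on uid instead of a numeric id
    return {"dashboard": model, "overwrite": overwrite}


# Stand-in for the parsed contents of system-health.json (hypothetical values).
example = {"id": 12, "uid": "system-health", "title": "System Health", "panels": []}

payload = build_import_payload(example)
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and sent with the same `curl` command in place of the raw dashboard file.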
Configuration
Data Source Configuration
Ensure a Prometheus data source is configured in Grafana:
- Navigate to Configuration → Data Sources
- Add a Prometheus data source
- Set URL: http://prometheus:9090
- Configure scrape interval and timeouts
Variable Configuration
Some dashboards may use variables for filtering:
- $datasource: Prometheus data source
- $service: Service name filter (optional)
- $environment: Environment filter (optional)
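Grafana stores template variables in the dashboard JSON under `templating.list`, so a short script can report which variables a given dashboard actually declares. The sketch below assumes that layout; the helper name `template_variables` and the sample fragment are hypothetical and not taken from the files in this directory:

```python
import json


def template_variables(dashboard):
    """Return the names of the template variables declared in a dashboard model."""
    entries = dashboard.get("templating", {}).get("list", [])
    return [entry["name"] for entry in entries if "name" in entry]


# Hypothetical dashboard fragment; the real files may declare different variables.
sample = json.loads("""
{
  "title": "System Health",
  "templating": {
    "list": [
      {"name": "datasource", "type": "datasource", "query": "prometheus"},
      {"name": "service", "type": "query"}
    ]
  }
}
""")

print(template_variables(sample))  # ['datasource', 'service']
```

A dashboard without a `templating` section yields an empty list, which matches the "may use variables" wording above.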
Metrics Requirements
Prometheus Metrics
These dashboards expect the following Prometheus metrics to be exported:
System Metrics
- up{job="dbis-core"}
- process_cpu_seconds_total{job="dbis-core"}
- process_resident_memory_bytes{job="dbis-core"}
- db_pool_size{job="dbis-core"}
- db_pool_active{job="dbis-core"}
- db_pool_idle{job="dbis-core"}
API Metrics
- http_requests_total{job="dbis-core",endpoint,method,status}
- http_request_duration_seconds_bucket{job="dbis-core",endpoint,le}
Ledger Metrics
- ledger_entries_total{ledger_id}
- ledger_entry_amount_total{ledger_id,currency_code}
- settlement_total{status}
- settlement_duration_seconds_bucket{le}
- dbis_outbox_queue_length
- outbox_processed_total{status}
- balance_updates_total{currency_code}
- ledger_posting_errors_total{error_type}
Security Metrics
- authentication_failures_total{reason}
- authorization_failures_total{resource,action}
- sanctions_screening_total{result}
- aml_risk_score_bucket{le}
- audit_log_events_total{event_type}
- policy_violations_total{policy_type,violation_type}
- transaction_failures_total{reason}
- data_access_events_total{data_type,operation}
- security_incidents_total
- compliance_violations_total
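One way to confirm that a service exposes the metrics these dashboards query is to compare the names in its `/metrics` output against the lists above. The sketch below parses Prometheus text exposition format using only the standard library. The `EXPECTED_METRICS` subset, the helper names, and the sample payload are illustrative assumptions. `up` is left out because Prometheus generates it for each scrape target rather than reading it from the service.

```python
import re

# Illustrative subset of the metric names listed above.
EXPECTED_METRICS = {
    "process_cpu_seconds_total",
    "db_pool_active",
    "http_requests_total",
    "http_request_duration_seconds_bucket",
    "ledger_entries_total",
}

# A sample line starts with the metric name, followed by "{labels}" or whitespace.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exposed_metric_names(exposition_text):
    """Collect the metric names that appear on sample lines of a /metrics payload."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names that the payload does not contain, sorted."""
    return sorted(set(expected) - exposed_metric_names(exposition_text))


# Hypothetical /metrics excerpt used only to exercise the functions.
sample = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{endpoint="/accounts",method="GET",status="200"} 1027
db_pool_active 7
process_cpu_seconds_total 12.5
"""

print(missing_metrics(sample))
# ['http_request_duration_seconds_bucket', 'ledger_entries_total']
```

An empty result means every expected name was found. Label sets are not checked here, so a metric that exists with different labels would still pass.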
Alerting
Recommended Alerts
Based on these dashboards, configure alerts for:
- System Health:
  - Service down (up{job="dbis-core"} == 0)
  - High error rate (rate(http_requests_total{status=~"5.."}[5m]) > 0.05)
  - High memory usage (process_resident_memory_bytes > 8e9, i.e. 8 GB)
  - Database connection pool nearly exhausted (db_pool_active >= db_pool_size * 0.9)
- API Performance:
  - P95 latency > 500ms
  - Availability < 99.9%
  - Error rate > 0.1%
- Ledger Operations:
  - Outbox queue length > 1000
  - Settlement failure rate > 1%
  - Failed posting operations > 10/min
- Security & Compliance:
  - Authentication failure rate > 5%
  - Sanctions match detected
  - AML risk score > 80
  - Security incident detected
  - Compliance violation detected
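The thresholds above mix two kinds of quantity. The System Health expression `rate(http_requests_total{status=~"5.."}[5m]) > 0.05` compares a per-second count of 5xx responses, while "Error rate > 0.1%" under API Performance reads as a fraction of all requests. The sketch below works through both on hypothetical counter samples. It is simplified arithmetic that ignores counter resets and the extrapolation Prometheus applies inside `rate()`, and the function names are illustrative.

```python
def per_second_rate(first, last, seconds):
    """Average per-second increase of a counter between two samples (no reset handling)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (last - first) / seconds


def error_ratio(errors_first, errors_last, total_first, total_last):
    """Fraction of requests in the window that were errors; 0.0 when there was no traffic."""
    total_delta = total_last - total_first
    if total_delta <= 0:
        return 0.0
    return (errors_last - errors_first) / total_delta


# Hypothetical samples of http_requests_total taken 300 s (5 minutes) apart.
errors_per_second = per_second_rate(first=40, last=70, seconds=300)  # 5xx series only
ratio = error_ratio(errors_first=40, errors_last=70,
                    total_first=10_000, total_last=40_000)

print(f"5xx per second: {errors_per_second:.2f}")  # 5xx per second: 0.10
print(f"5xx ratio: {ratio:.4%}")                   # 5xx ratio: 0.1000%
```

With these numbers the per-second value (0.10) is above 0.05 even though only 0.1% of requests failed. If a ratio is what the alert should express, one common PromQL form is `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))`.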
References
- Metrics Specification: explorer-monorepo/docs/specs/observability/metrics-monitoring.md
- Tracing Dashboard: smom-dbis-138/monitoring/grafana/dashboards/tracing.json
- OpenTelemetry Configuration: smom-dbis-138/monitoring/opentelemetry/otel-collector.yaml