Grafana Dashboards

This directory contains Grafana dashboard JSON files for monitoring the DBIS Core Banking System.

Dashboard List

1. System Health Dashboard (system-health.json)

Purpose: Overall system health and status monitoring

Key Metrics:

  • Service health status
  • Overall system availability
  • Error rates (5xx, 4xx)
  • CPU and memory usage by service
  • Database connection pool status
  • Active sessions
  • Queue lengths

Refresh Interval: 30s

Tags: system, health, overview


2. API Performance Dashboard (api-performance.json)

Purpose: API endpoint performance and latency monitoring

Key Metrics:

  • Request rate by endpoint
  • Response time percentiles (P50, P95, P99)
  • Error rate by endpoint
  • Top endpoints by request volume
  • Request distribution by method and status code
  • SLO compliance (availability, latency)
  • Request duration distribution

Refresh Interval: 30s

Tags: api, performance, latency


3. Ledger Operations Dashboard (ledger-operations.json)

Purpose: Ledger entry and settlement operations monitoring

Key Metrics:

  • Ledger entry rate by ledger ID
  • Ledger entry amount by ledger and currency
  • Settlement rate by status
  • Settlement duration percentiles
  • Outbox queue status and processing rate
  • Balance updates by currency
  • Failed posting operations
  • Total ledger entries, active accounts, pending settlements

Refresh Interval: 30s

Tags: ledger, transactions, settlement


4. Security & Compliance Dashboard (security-compliance.json)

Purpose: Security events and compliance monitoring

Key Metrics:

  • Authentication failures by reason
  • Authorization failures by resource and action
  • Sanctions screening results
  • AML risk score distribution
  • Audit log events by type
  • Policy violations by type
  • Failed transactions by reason
  • Encryption key rotation status
  • Data access events (PII, Financial)
  • Security incidents and compliance violations (24h)

Refresh Interval: 30s

Tags: security, compliance, audit


Installation

Import Dashboards to Grafana

  1. Via Grafana UI:

    • Navigate to Grafana → Dashboards → Import
    • Upload the JSON file or paste JSON content
    • Configure data source and settings
    • Save dashboard
  2. Via Grafana Provisioning:

    Create a provisioning configuration file:

    # grafana/provisioning/dashboards/dashboards.yml
    apiVersion: 1
    
    providers:
      - name: 'DBIS Core Dashboards'
        orgId: 1
        folder: 'DBIS Core'
        type: file
        disableDeletion: false
        updateIntervalSeconds: 10
        allowUiUpdates: true
        options:
          path: /etc/grafana/dashboards
    

    Copy dashboard files to the provisioned path:

    cp dbis_core/monitoring/grafana/dashboards/*.json /etc/grafana/dashboards/
    
  3. Via Grafana API:

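    # Import dashboard via API.
    # POST /api/dashboards/db expects the dashboard model wrapped in a
    # "dashboard" key, so jq wraps the bare model stored in the file.
    jq '{dashboard: ., overwrite: true}' system-health.json | \
    curl -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <grafana-api-key>" \
      -d @- \
      http://grafana:3000/api/dashboards/db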
    

Configuration

Data Source Configuration

Ensure a Prometheus data source is configured in Grafana:

  1. Navigate to Configuration → Data Sources
  2. Add a Prometheus data source
  3. Set URL: http://prometheus:9090
  4. Set the scrape interval to match Prometheus's scrape_interval, and set the query timeout
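
The same data source can also be provisioned from a file rather than through the UI. The fragment below is a minimal sketch, assuming Grafana loads data source provisioning files from its default provisioning directory. The file name, the data source name, and the 15s and 60s values are illustrative assumptions, not settings taken from this repository.

```yaml
# grafana/provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus            # assumed display name
    type: prometheus
    access: proxy               # queries are proxied through the Grafana backend
    url: http://prometheus:9090 # same URL as in step 3 above
    isDefault: true
    jsonData:
      timeInterval: 15s         # assumed; should match Prometheus's scrape_interval
      queryTimeout: 60s         # assumed query timeout
```

Provisioning the data source from a file keeps it in version control next to the dashboard provider configuration.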

Variable Configuration

Some dashboards may use variables for filtering:

  • $datasource: Prometheus data source
  • $service: Service name filter (optional)
  • $environment: Environment filter (optional)

Metrics Requirements

Prometheus Metrics

These dashboards expect the following Prometheus metrics to be exported:

System Metrics

  • up{job="dbis-core"}
  • process_cpu_seconds_total{job="dbis-core"}
  • process_resident_memory_bytes{job="dbis-core"}
  • db_pool_size{job="dbis-core"}
  • db_pool_active{job="dbis-core"}
  • db_pool_idle{job="dbis-core"}
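
The job="dbis-core" label on these series is set by the Prometheus scrape job name, not by the application itself. A minimal scrape configuration might look like the sketch below. The target address dbis-core:8080 and the 15s interval are hypothetical placeholders; use the host and port where the service actually exposes its metrics.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: dbis-core          # becomes the job="dbis-core" label
    metrics_path: /metrics       # Prometheus default, shown for clarity
    scrape_interval: 15s         # assumed value
    static_configs:
      - targets:
          - dbis-core:8080       # hypothetical host:port of the service
```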

API Metrics

  • http_requests_total{job="dbis-core",endpoint,method,status}
  • http_request_duration_seconds_bucket{job="dbis-core",endpoint,le}
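
These two series are enough to derive request rate, error ratio, and latency percentiles. As an illustrative sketch, Prometheus recording rules could precompute those values per endpoint. The group and rule names below are hypothetical and are not necessarily what the dashboard panels query.

```yaml
# dbis-core-api.rules.yml (hypothetical file name)
groups:
  - name: dbis-core-api-recording
    rules:
      # Requests per second for each endpoint over the last 5 minutes.
      - record: endpoint:http_requests:rate5m
        expr: sum by (endpoint) (rate(http_requests_total{job="dbis-core"}[5m]))

      # Share of requests per endpoint that returned a 5xx status.
      - record: endpoint:http_requests_5xx:ratio_rate5m
        expr: |
          sum by (endpoint) (rate(http_requests_total{job="dbis-core",status=~"5.."}[5m]))
          /
          sum by (endpoint) (rate(http_requests_total{job="dbis-core"}[5m]))

      # P95 latency in seconds, estimated from the histogram buckets.
      - record: endpoint:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (endpoint, le) (rate(http_request_duration_seconds_bucket{job="dbis-core"}[5m]))
          )
```

The le label must be kept in the aggregation passed to histogram_quantile, because the function interpolates across bucket boundaries.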

Ledger Metrics

  • ledger_entries_total{ledger_id}
  • ledger_entry_amount_total{ledger_id,currency_code}
  • settlement_total{status}
  • settlement_duration_seconds_bucket{le}
  • dbis_outbox_queue_length
  • outbox_processed_total{status}
  • balance_updates_total{currency_code}
  • ledger_posting_errors_total{error_type}

Security Metrics

  • authentication_failures_total{reason}
  • authorization_failures_total{resource,action}
  • sanctions_screening_total{result}
  • aml_risk_score_bucket{le}
  • audit_log_events_total{event_type}
  • policy_violations_total{policy_type,violation_type}
  • transaction_failures_total{reason}
  • data_access_events_total{data_type,operation}
  • security_incidents_total
  • compliance_violations_total

Alerting

Based on these dashboards, configure alerts for:

  1. System Health:

    • Service down (up{job="dbis-core"} == 0)
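    • High error rate (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05, i.e. more than 5% of requests returning 5xx)
    • High memory usage (process_resident_memory_bytes > 8e9, i.e. 8 GB)
    • Database connection pool nearly exhausted (db_pool_active >= db_pool_size * 0.9)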
  2. API Performance:

    • P95 latency > 500ms
    • Availability < 99.9%
    • Error rate > 0.1%
  3. Ledger Operations:

    • Outbox queue length > 1000
    • Settlement failure rate > 1%
    • Failed posting operations > 10/min
  4. Security & Compliance:

    • Authentication failure rate > 5%
    • Sanctions match detected
    • AML risk score > 80
    • Security incident detected
    • Compliance violation detected
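
Several of the conditions above can be expressed as Prometheus alerting rules. The fragment below is a sketch under stated assumptions, not the project's actual rule set. The alert names, the for durations, the severity labels, and the result="match" label value are all hypothetical. The error-rate rule reads the 0.05 threshold as a 5% share of 5xx responses.

```yaml
# dbis-core-alerts.rules.yml (hypothetical file name)
groups:
  - name: dbis-core-alerts
    rules:
      - alert: DbisCoreServiceDown
        expr: up{job="dbis-core"} == 0
        for: 1m                      # assumed hold time
        labels:
          severity: critical         # assumed severity scheme
        annotations:
          summary: DBIS Core instance {{ $labels.instance }} is down

      - alert: DbisCoreHighErrorRate
        expr: |
          sum(rate(http_requests_total{job="dbis-core",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{job="dbis-core"}[5m]))
          > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: More than 5% of requests are returning 5xx

      - alert: DbisCoreHighP95Latency
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket{job="dbis-core"}[5m]))
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: P95 request latency is above 500ms

      - alert: DbisCoreDbPoolNearlyExhausted
        expr: db_pool_active{job="dbis-core"} >= db_pool_size{job="dbis-core"} * 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Database connection pool is at or above 90% of its size

      - alert: DbisCoreOutboxBacklog
        expr: dbis_outbox_queue_length > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Outbox queue length is above 1000

      - alert: DbisCoreLedgerPostingFailures
        # rate() is per second, so multiply by 60 to compare against 10/min.
        expr: sum(rate(ledger_posting_errors_total[5m])) * 60 > 10
        labels:
          severity: critical
        annotations:
          summary: More than 10 failed ledger postings per minute

      - alert: DbisCoreSanctionsMatch
        # result="match" is an assumed label value; check the exporter.
        expr: increase(sanctions_screening_total{result="match"}[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: Sanctions screening reported a match

      - alert: DbisCoreSecurityIncident
        expr: increase(security_incidents_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: A security incident was recorded
```

Rules without a for clause fire on the first evaluation where the expression is true, which suits event-style conditions such as a sanctions match.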

References

  • Metrics Specification: explorer-monorepo/docs/specs/observability/metrics-monitoring.md
  • Tracing Dashboard: smom-dbis-138/monitoring/grafana/dashboards/tracing.json
  • OpenTelemetry Configuration: smom-dbis-138/monitoring/opentelemetry/otel-collector.yaml