proxmox/scripts/cloudflare-tunnels/docs/MONITORING_GUIDE.md
defiQUG cb47cce074 Complete markdown files cleanup and organization
- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
2026-01-06 01:46:25 -08:00


Monitoring Guide

Complete guide for monitoring Cloudflare tunnels.

Overview

Monitoring ensures your tunnels are healthy and alerts you to issues before they impact users.

Monitoring Components

  1. Health Checks - Verify tunnels are running
  2. Connectivity Tests - Verify DNS and HTTPS work
  3. Log Monitoring - Watch for errors
  4. Alerting - Notify on failures

Quick Start

One-Time Health Check

./scripts/check-tunnel-health.sh

Continuous Monitoring

# Foreground (see output)
./scripts/monitor-tunnels.sh

# Background (daemon mode)
./scripts/monitor-tunnels.sh --daemon

Health Check Script

The check-tunnel-health.sh script performs comprehensive checks:

Checks Performed

  1. Service Status - Is the systemd service running?
  2. Log Errors - Are there recent errors in logs?
  3. DNS Resolution - Does DNS resolve correctly?
  4. HTTPS Connectivity - Can we connect via HTTPS?
  5. Internal Connectivity - Can VMID 102 reach Proxmox hosts?
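
The five checks above could be sketched roughly as follows. This is an illustrative outline, not the contents of check-tunnel-health.sh: the function names (report, step, check_service, and so on), the 10-minute log window, and the error pattern are all assumptions, and the internal check assumes the script runs on the host that needs to reach Proxmox.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the five checks; not the actual check-tunnel-health.sh.

# Print a result line in the style of the example output.
report() {  # usage: report <exit_status> <message>
  if [ "$1" -eq 0 ]; then
    printf '[✓] %s\n' "$2"
  else
    printf '[✗] %s\n' "$2"
  fi
}

# Run a check command, print its result line, and return its status.
step() {  # usage: step <message> <command> [args...]
  local msg="$1"; shift
  "$@"; local rc=$?
  report "$rc" "$msg"
  return "$rc"
}

check_service() { systemctl is-active --quiet "cloudflared-$1"; }

check_logs() {
  # Succeeds when the last 10 minutes of journal output contain no
  # error-looking words (the pattern is an assumption about the log format).
  ! journalctl -u "cloudflared-$1" --since '10 minutes ago' --no-pager 2>/dev/null \
      | grep -qwE 'ERR|ERROR|error'
}

check_dns() { [ -n "$(dig +short "$1" 2>/dev/null)" ]; }

# Any HTTP response counts as reachable; only connection/TLS failures fail.
check_https() { curl -sS -o /dev/null --max-time 10 "https://$1"; }

check_internal() {  # usage: check_internal <host> <port>
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

run_checks() {  # usage: run_checks <tunnel> <hostname> <internal_host> <internal_port>
  local failed=0
  step "Service is running"                 check_service  "$1"      || failed=1
  step "No recent errors in logs"           check_logs     "$1"      || failed=1
  step "DNS resolution"                     check_dns      "$2"      || failed=1
  step "HTTPS connectivity"                 check_https    "$2"      || failed=1
  step "Internal connectivity to $3:$4"     check_internal "$3" "$4" || failed=1
  return "$failed"
}

# Example: run_checks ml110 ml110-01.d-bis.org 192.168.11.10 8006
```

Wrapping each check in step keeps the output format in one place and lets run_checks return a single pass/fail status that a timer or cron job can act on.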

Usage

# Run health check
./scripts/check-tunnel-health.sh

# Output shows:
# - Service status for each tunnel
# - DNS resolution status
# - HTTPS connectivity
# - Internal connectivity
# - Recent errors

Example Output

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tunnel: ml110 (ml110-01.d-bis.org)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[✓] Service is running
[✓] No recent errors in logs
[✓] DNS resolution: OK
  → 104.16.132.229
[✓] HTTPS connectivity: OK
[✓] Internal connectivity to 192.168.11.10:8006: OK

Monitoring Script

The monitor-tunnels.sh script provides continuous monitoring:

Features

  • Continuous health checks
  • Automatic restart on failure
  • Alerting on failures
  • Logging to file
  • Daemon mode support
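
A monitor with these features might be structured like the sketch below. It is not the actual monitor-tunnels.sh: the helper names (is_healthy, restart_tunnel, send_alert, check_and_recover, monitor_loop) are hypothetical, and the only health signal used here is the systemd service state. The alert call follows the argument order shown in the Test Alerts section.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a monitor loop; not the actual monitor-tunnels.sh.

CHECK_INTERVAL=60
LOG_FILE="/var/log/cloudflared-monitor.log"
ALERT_SCRIPT="./scripts/alert-tunnel-failure.sh"

# Append a timestamped line to the log file.
log() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"; }

is_healthy()     { systemctl is-active --quiet "cloudflared-$1"; }
restart_tunnel() { systemctl restart "cloudflared-$1"; }
send_alert()     { "$ALERT_SCRIPT" "$1" "$2"; }

# Check one tunnel; on failure, log, alert, and attempt a restart.
# Returns the status of the restart attempt (0 if the tunnel was healthy).
check_and_recover() {
  local tunnel="$1"
  if is_healthy "$tunnel"; then
    return 0
  fi
  log "$tunnel: service down, restarting"
  send_alert "$tunnel" service_down
  restart_tunnel "$tunnel"
}

# Check every listed tunnel, sleep, repeat.
monitor_loop() {  # usage: monitor_loop <tunnel>...
  while true; do
    for tunnel in "$@"; do
      check_and_recover "$tunnel" || log "$tunnel: restart failed"
    done
    sleep "$CHECK_INTERVAL"
  done
}

# Example: monitor_loop ml110 r630-01 r630-02
```

Keeping the health check, restart, and alert in separate small functions makes each one easy to replace or stub out when testing the loop logic.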

Usage

# Foreground mode (see output)
./scripts/monitor-tunnels.sh

# Daemon mode (background)
./scripts/monitor-tunnels.sh --daemon

# Check if daemon is running
pgrep -af monitor-tunnels

# Stop daemon
kill "$(cat /tmp/cloudflared-monitor.pid)"
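
A PID file can go stale if the daemon exits uncleanly, so it may be worth confirming the recorded process is still alive before signalling it. The helpers below are a hypothetical sketch (daemon_running and stop_daemon are not part of the provided scripts); the PID file path is the one used in the stop command above.

```shell
PID_FILE="/tmp/cloudflared-monitor.pid"

# Succeeds only if the PID file exists, holds a number, and that process is alive.
# Note: kill -0 also fails for processes owned by another user.
daemon_running() {  # usage: daemon_running [pid_file]
  local file="${1:-$PID_FILE}" pid
  [ -r "$file" ] || return 1
  pid=$(cat "$file")
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
  esac
  kill -0 "$pid" 2>/dev/null
}

# Send SIGTERM to the daemon only if it is actually running.
stop_daemon() {  # usage: stop_daemon [pid_file]
  local file="${1:-$PID_FILE}"
  daemon_running "$file" && kill "$(cat "$file")"
}

# Example:
#   daemon_running && echo "monitor is running" || echo "monitor is not running"
```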

Configuration

Edit the script to customize:

CHECK_INTERVAL=60        # Check every 60 seconds
LOG_FILE="/var/log/cloudflared-monitor.log"
ALERT_SCRIPT="./scripts/alert-tunnel-failure.sh"
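
If editing the script for every change is inconvenient, one option is to let environment variables override these defaults. The snippet below is a sketch of that pattern, not something the script is confirmed to do; valid_interval is a hypothetical helper.

```shell
# Use the environment value when set, otherwise fall back to the default.
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
LOG_FILE="${LOG_FILE:-/var/log/cloudflared-monitor.log}"
ALERT_SCRIPT="${ALERT_SCRIPT:-./scripts/alert-tunnel-failure.sh}"

# Accept only positive whole numbers of seconds.
valid_interval() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) [ "$1" -gt 0 ] ;;
  esac
}

if ! valid_interval "$CHECK_INTERVAL"; then
  echo "Invalid CHECK_INTERVAL '$CHECK_INTERVAL', using 60" >&2
  CHECK_INTERVAL=60
fi

# Example: CHECK_INTERVAL=30 ./scripts/monitor-tunnels.sh
```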

Alerting

Email Alerts

Configure email alerts in alert-tunnel-failure.sh:

# Set email address
export ALERT_EMAIL="admin@yourdomain.com"

# Ensure mail/sendmail is installed
sudo apt-get install -y mailutils

Webhook Alerts

Configure webhook alerts (Slack, Discord, etc.):

# Set webhook URL
export ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
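
For reference, posting an alert to the webhook could look like the sketch below. It is an assumption about how alert-tunnel-failure.sh might use ALERT_WEBHOOK, not its actual code; json_escape, build_payload, and send_webhook are hypothetical names. The payload uses the {"text": ...} shape that Slack incoming webhooks accept; other services may expect a different field name.

```shell
# Escape backslashes and double quotes (sufficient for single-line messages).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a Slack-style JSON payload.
build_payload() {  # usage: build_payload <tunnel> <failure_type>
  printf '{"text":"%s"}' "$(json_escape "Tunnel $1 failed: $2")"
}

# POST the payload; do nothing if no webhook is configured.
send_webhook() {  # usage: send_webhook <tunnel> <failure_type>
  [ -n "${ALERT_WEBHOOK:-}" ] || return 0
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "$2")" "$ALERT_WEBHOOK"
}

# Example: send_webhook ml110 service_down
```

Using curl -f makes HTTP error responses return a non-zero status, so a rejected webhook call is visible to the caller.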

Test Alerts

# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down

Log Monitoring

View Logs

# All tunnels
journalctl -u 'cloudflared-*' -f

# Specific tunnel
journalctl -u cloudflared-ml110 -f

# Last 100 lines
journalctl -u cloudflared-ml110 -n 100

# Since specific time
journalctl -u cloudflared-ml110 --since "1 hour ago"

Log Rotation

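journald rotates the systemd journal automatically (limits are set in /etc/systemd/journald.conf). If cloudflared also writes log files under /var/log/cloudflared/, rotate those with logrotate: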

# Edit logrotate config
sudo nano /etc/logrotate.d/cloudflared

# Add:
/var/log/cloudflared/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}

Metrics

Cloudflare Dashboard

View tunnel metrics in the Cloudflare dashboard:

  1. Go to: Zero Trust → Networks → Tunnels
  2. Click a tunnel to view:
    • Connection status
    • Uptime
    • Traffic statistics
    • Error rates

Local Metrics

Each tunnel exposes a Prometheus-format metrics endpoint if cloudflared is started with a metrics address (for example, --metrics 127.0.0.1:9091):

# ml110 tunnel metrics
curl http://127.0.0.1:9091/metrics

# r630-01 tunnel metrics
curl http://127.0.0.1:9092/metrics

# r630-02 tunnel metrics
curl http://127.0.0.1:9093/metrics
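
To pull a single value out of that output in a script, a small filter such as the one below may help. metric_value is a hypothetical helper, and the metric name in the usage example is an assumption; check the names in your own /metrics output. The filter assumes label values contain no spaces.

```shell
# Print the value of the first sample of a metric from Prometheus text on stdin.
metric_value() {  # usage: curl -s <url>/metrics | metric_value <metric_name>
  awk -v name="$1" '
    /^#/ { next }                              # skip HELP/TYPE comment lines
    {
      key = $1
      i = index(key, "{")                      # strip any {label="..."} suffix
      if (i > 0) key = substr(key, 1, i - 1)
      if (key == name) { print $2; exit }
    }'
}

# Example (metric name is an assumption):
#   curl -s http://127.0.0.1:9091/metrics | metric_value cloudflared_tunnel_ha_connections
```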

Automated Monitoring Setup

Create a systemd timer for automated health checks:

# Create timer unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.timer

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check Timer
Requires=cloudflared-healthcheck.service

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=cloudflared-healthcheck.service

[Install]
WantedBy=timers.target

# Create service unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.service

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check
After=network.target

[Service]
Type=oneshot
ExecStart=/path/to/scripts/check-tunnel-health.sh
StandardOutput=journal
StandardError=journal

# Reload systemd so it sees the new units, then enable and start the timer
sudo systemctl daemon-reload
sudo systemctl enable --now cloudflared-healthcheck.timer

Cron Job (Alternative)

# Edit crontab
crontab -e

# Add (check every 5 minutes):
*/5 * * * * /path/to/scripts/check-tunnel-health.sh >> /var/log/tunnel-health.log 2>&1

Monitoring Best Practices

  1. Run health checks regularly - At least every 5 minutes
  2. Monitor logs - Watch for errors
  3. Set up alerts - Get notified immediately on failures
  4. Review metrics - Track trends over time
  5. Test alerts - Verify alerting works
  6. Document incidents - Keep track of issues

Integration with Monitoring Systems

Prometheus

If using Prometheus, you can scrape tunnel metrics:

# prometheus.yml
scrape_configs:
  - job_name: 'cloudflared'
    static_configs:
      - targets: ['127.0.0.1:9091', '127.0.0.1:9092', '127.0.0.1:9093']

Grafana

Create dashboards in Grafana:

  • Tunnel uptime
  • Connection status
  • Error rates
  • Response times

Nagios/Icinga

Create service checks:

# Check service status (assumes check_cloudflared_ml110 is defined in nrpe.cfg on the target host)
check_nrpe -H localhost -c check_cloudflared_ml110

# Check connectivity
check_http -H ml110-01.d-bis.org -S
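
The NRPE command needs a plugin behind it. A minimal sketch of such a plugin is shown below, using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the mapping of systemd states to severities are assumptions, not part of the provided scripts.

```shell
#!/usr/bin/env bash
# Hypothetical plugin sketch, e.g. the command behind check_cloudflared_ml110.

# Map a `systemctl is-active` state to a Nagios status line and exit code.
nagios_result() {  # usage: nagios_result <tunnel> <state>
  case "$2" in
    active)
      echo "OK - cloudflared-$1 is active";  return 0 ;;
    activating|reloading)
      echo "WARNING - cloudflared-$1 is $2"; return 1 ;;
    inactive|failed|deactivating)
      echo "CRITICAL - cloudflared-$1 is $2"; return 2 ;;
    *)
      echo "UNKNOWN - cloudflared-$1 state: ${2:-none}"; return 3 ;;
  esac
}

# Query systemd and report the result for one tunnel.
check_cloudflared() {  # usage: check_cloudflared <tunnel>
  local state
  state=$(systemctl is-active "cloudflared-$1" 2>/dev/null)
  nagios_result "$1" "$state"
}

# Example (as the last line of the plugin script):
#   check_cloudflared ml110; exit $?
```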

Troubleshooting Monitoring

Health Check Fails

# Run manually with verbose output
bash -x ./scripts/check-tunnel-health.sh

# Check individual components
systemctl status cloudflared-ml110
dig ml110-01.d-bis.org
curl -I https://ml110-01.d-bis.org

Monitor Script Not Working

# Check if daemon is running
pgrep -af monitor-tunnels

# Check log file
tail -f /var/log/cloudflared-monitor.log

# Run in foreground to see errors
./scripts/monitor-tunnels.sh

Alerts Not Sending

# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down

# Check email configuration
echo "Test" | mail -s "Test" admin@yourdomain.com

# Check webhook
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"test"}' "$ALERT_WEBHOOK"

Next Steps

After setting up monitoring:

  1. Verify health checks run successfully
  2. Test alerting (trigger a test failure)
  3. Set up log aggregation (if needed)
  4. Create dashboards (if using Grafana)
  5. Document monitoring procedures

Support

For monitoring issues:

  1. Check the Troubleshooting Guide
  2. Review script logs
  3. Test components individually
  4. Check systemd service status