# Monitoring Guide

Complete guide for monitoring Cloudflare tunnels.

## Overview

Monitoring ensures your tunnels are healthy and alerts you to issues before they impact users.

## Monitoring Components

1. **Health Checks** - Verify tunnels are running
2. **Connectivity Tests** - Verify DNS and HTTPS work
3. **Log Monitoring** - Watch for errors
4. **Alerting** - Notify on failures

## Quick Start

### One-Time Health Check

```bash
./scripts/check-tunnel-health.sh
```

### Continuous Monitoring

```bash
# Foreground (see output)
./scripts/monitor-tunnels.sh

# Background (daemon mode)
./scripts/monitor-tunnels.sh --daemon
```

## Health Check Script

The `check-tunnel-health.sh` script performs comprehensive checks:

### Checks Performed

1. **Service Status** - Is the systemd service running?
2. **Log Errors** - Are there recent errors in logs?
3. **DNS Resolution** - Does DNS resolve correctly?
4. **HTTPS Connectivity** - Can we connect via HTTPS?
5. **Internal Connectivity** - Can VMID 102 reach Proxmox hosts?
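
The five checks listed above can be sketched roughly as follows. This is an illustration only, not the contents of `check-tunnel-health.sh`: the function names `report` and `check_tunnel`, the `cloudflared-<name>` unit naming, the five-minute log window, and the `err` log pattern are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the five health checks; not the actual script.

# Print one result line in the style of the example output.
report() {
    local status="$1" message="$2"
    if [ "$status" -eq 0 ]; then
        printf '[✓] %s\n' "$message"
    else
        printf '[✗] %s\n' "$message"
    fi
}

# Run all five checks for one tunnel.
# Arguments: tunnel name, public hostname, internal host:port target.
check_tunnel() {
    local name="$1" host="$2" target="$3"
    local unit="cloudflared-${name}"   # assumed unit naming

    # 1. Service status: is the systemd unit active?
    systemctl is-active --quiet "$unit"
    report $? "Service is running"

    # 2. Log errors: fail if recent journal lines mention an error.
    #    The window and the pattern are assumptions.
    ! journalctl -u "$unit" --since "5 minutes ago" --no-pager 2>/dev/null \
        | grep -qi 'err'
    report $? "No recent errors in logs"

    # 3. DNS resolution: does the hostname resolve to anything?
    [ -n "$(dig +short "$host")" ]
    report $? "DNS resolution"

    # 4. HTTPS connectivity: can we complete a request?
    curl -s -o /dev/null --max-time 10 "https://${host}"
    report $? "HTTPS connectivity"

    # 5. Internal connectivity: open a TCP connection to host:port.
    timeout 5 bash -c "exec 3<>/dev/tcp/${target%:*}/${target#*:}" 2>/dev/null
    report $? "Internal connectivity to ${target}"
}
```

With these definitions sourced, a call such as `check_tunnel ml110 ml110-01.d-bis.org 192.168.11.10:8006` would print one result line per check. Running each check independently, rather than stopping at the first failure, keeps the full picture visible when several things break at once.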

### Usage

```bash
# Run health check
./scripts/check-tunnel-health.sh

# Output shows:
# - Service status for each tunnel
# - DNS resolution status
# - HTTPS connectivity
# - Internal connectivity
# - Recent errors
```

### Example Output

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tunnel: ml110 (ml110-01.d-bis.org)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[✓] Service is running
[✓] No recent errors in logs
[✓] DNS resolution: OK → 104.16.132.229
[✓] HTTPS connectivity: OK
[✓] Internal connectivity to 192.168.11.10:8006: OK
```

## Monitoring Script

The `monitor-tunnels.sh` script provides continuous monitoring:

### Features

- ✅ Continuous health checks
- ✅ Automatic restart on failure
- ✅ Alerting on failures
- ✅ Logging to file
- ✅ Daemon mode support

### Usage

```bash
# Foreground mode (see output)
./scripts/monitor-tunnels.sh

# Daemon mode (background)
./scripts/monitor-tunnels.sh --daemon

# Check if daemon is running (pgrep does not match itself, unlike ps | grep)
pgrep -af monitor-tunnels

# Stop daemon
kill "$(cat /tmp/cloudflared-monitor.pid)"
```

### Configuration

Edit the script to customize:

```bash
CHECK_INTERVAL=60  # Check every 60 seconds
LOG_FILE="/var/log/cloudflared-monitor.log"
ALERT_SCRIPT="./scripts/alert-tunnel-failure.sh"
```

## Alerting

### Email Alerts

Configure email alerts in `alert-tunnel-failure.sh`:

```bash
# Set email address
export ALERT_EMAIL="admin@yourdomain.com"

# Ensure mail/sendmail is installed
sudo apt-get install -y mailutils
```

### Webhook Alerts

Configure webhook alerts (Slack, Discord, etc.):

```bash
# Set webhook URL
export ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
```

### Test Alerts

```bash
# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down
```

## Log Monitoring

### View Logs

```bash
# All tunnels (quote the pattern so the shell does not expand it)
journalctl -u 'cloudflared-*' -f

# Specific tunnel
journalctl -u cloudflared-ml110 -f

# Last 100 lines
journalctl -u cloudflared-ml110 -n 100

# Since specific time
journalctl -u cloudflared-ml110 --since "1 hour ago"
```

### Log Rotation

journald rotates the systemd journal automatically. If the tunnels also write log files under `/var/log/cloudflared/`, rotate those with logrotate:

```bash
# Edit logrotate config
sudo nano /etc/logrotate.d/cloudflared

# Add:
/var/log/cloudflared/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```

## Metrics

### Cloudflare Dashboard

View tunnel metrics in the Cloudflare dashboard:

1. **Go to:** Zero Trust → Networks → Tunnels
2. **Click on tunnel** to view:
   - Connection status
   - Uptime
   - Traffic statistics
   - Error rates

### Local Metrics

Tunnels expose a local metrics endpoint when a metrics address is configured (for example with cloudflared's `--metrics` option). The ports below depend on your configuration:

```bash
# ml110 tunnel metrics
curl http://127.0.0.1:9091/metrics

# r630-01 tunnel metrics
curl http://127.0.0.1:9092/metrics

# r630-02 tunnel metrics
curl http://127.0.0.1:9093/metrics
```

## Automated Monitoring Setup

### Systemd Timer (Recommended)

Create a systemd timer for automated health checks:

```bash
# Create timer unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.timer

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check Timer
Requires=cloudflared-healthcheck.service

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=cloudflared-healthcheck.service

[Install]
WantedBy=timers.target
```

```bash
# Create service unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.service

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check
After=network.target

[Service]
Type=oneshot
ExecStart=/path/to/scripts/check-tunnel-health.sh
StandardOutput=journal
StandardError=journal
```

```bash
# Reload unit files, then enable and start the timer
sudo systemctl daemon-reload
sudo systemctl enable cloudflared-healthcheck.timer
sudo systemctl start cloudflared-healthcheck.timer
```

### Cron Job (Alternative)

```bash
# Edit root's crontab (the job below writes to /var/log)
sudo crontab -e

# Add (check every 5 minutes):
*/5 * * * * /path/to/scripts/check-tunnel-health.sh >> /var/log/tunnel-health.log 2>&1
```

## Monitoring Best Practices

1. ✅ **Run health checks regularly** - At least every 5 minutes
2. ✅ **Monitor logs** - Watch for errors
3. ✅ **Set up alerts** - Get notified immediately on failures
4. ✅ **Review metrics** - Track trends over time
5. ✅ **Test alerts** - Verify alerting works
6. ✅ **Document incidents** - Keep track of issues

## Integration with Monitoring Systems

### Prometheus

If using Prometheus, you can scrape tunnel metrics:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'cloudflared'
    static_configs:
      - targets: ['127.0.0.1:9091', '127.0.0.1:9092', '127.0.0.1:9093']
```

### Grafana

Create dashboards in Grafana:

- Tunnel uptime
- Connection status
- Error rates
- Response times

### Nagios/Icinga

Create service checks:

```bash
# Check service status
check_nrpe -H localhost -c check_cloudflared_ml110

# Check connectivity
check_http -H ml110-01.d-bis.org -S
```

## Troubleshooting Monitoring

### Health Check Fails

```bash
# Run manually with verbose output
bash -x ./scripts/check-tunnel-health.sh

# Check individual components
systemctl status cloudflared-ml110
dig ml110-01.d-bis.org
curl -I https://ml110-01.d-bis.org
```

### Monitor Script Not Working

```bash
# Check if daemon is running
pgrep -af monitor-tunnels

# Check log file
tail -f /var/log/cloudflared-monitor.log

# Run in foreground to see errors
./scripts/monitor-tunnels.sh
```

### Alerts Not Sending

```bash
# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down

# Check email configuration
echo "Test" | mail -s "Test" admin@yourdomain.com

# Check webhook
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"test"}' "$ALERT_WEBHOOK"
```

## Next Steps

After setting up monitoring:

1. ✅ Verify health checks run successfully
2. ✅ Test alerting (trigger a test failure)
3. ✅ Set up log aggregation (if needed)
4. ✅ Create dashboards (if using Grafana)
5. ✅ Document monitoring procedures

## Support

For monitoring issues:

1. Check [Troubleshooting Guide](TROUBLESHOOTING.md)
2. Review script logs
3. Test components individually
4. Check systemd service status
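
When reviewing script logs or testing components individually, it can help to have a mental model of what a monitoring loop like the one described under Monitoring Script does. The sketch below is a hypothetical illustration, not the contents of `monitor-tunnels.sh`: the function names, the `TUNNELS` variable, the `cloudflared-<name>` unit naming, and the log line format are assumptions. The three configuration variables and the alert script arguments follow the ones shown in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a monitoring loop; not the actual monitor-tunnels.sh.

CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
LOG_FILE="${LOG_FILE:-/var/log/cloudflared-monitor.log}"
ALERT_SCRIPT="${ALERT_SCRIPT:-./scripts/alert-tunnel-failure.sh}"
TUNNELS="${TUNNELS:-ml110 r630-01 r630-02}"   # assumed list of tunnel names

# Format one log line: timestamp, tunnel name, event.
log_event() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2"
}

# Check one tunnel; on failure, log it, send an alert, and try a restart.
check_and_recover() {
    local name="$1"
    local unit="cloudflared-${name}"   # assumed unit naming
    if systemctl is-active --quiet "$unit"; then
        return 0
    fi
    log_event "$name" "service_down, restarting" >> "$LOG_FILE"
    "$ALERT_SCRIPT" "$name" service_down
    systemctl restart "$unit"
}

# Check every tunnel, sleep for the configured interval, repeat.
monitor_loop() {
    local name
    while true; do
        for name in $TUNNELS; do
            check_and_recover "$name"
        done
        sleep "$CHECK_INTERVAL"
    done
}
```

Calling `monitor_loop` starts the loop in the foreground. Alerting before the restart means a notification still goes out even if the restart itself hangs or fails.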