Monitoring Guide
Complete guide for monitoring Cloudflare tunnels.
Overview
Monitoring ensures your tunnels are healthy and alerts you to issues before they impact users.
Monitoring Components
- Health Checks - Verify tunnels are running
- Connectivity Tests - Verify DNS and HTTPS work
- Log Monitoring - Watch for errors
- Alerting - Notify on failures
Quick Start
One-Time Health Check
./scripts/check-tunnel-health.sh
Continuous Monitoring
# Foreground (see output)
./scripts/monitor-tunnels.sh
# Background (daemon mode)
./scripts/monitor-tunnels.sh --daemon
Health Check Script
The check-tunnel-health.sh script runs the following checks for each tunnel:
Checks Performed
- Service Status - Is the systemd service running?
- Log Errors - Are there recent errors in logs?
- DNS Resolution - Does DNS resolve correctly?
- HTTPS Connectivity - Can we connect via HTTPS?
- Internal Connectivity - Can VMID 102 reach Proxmox hosts?
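The checks above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of check-tunnel-health.sh: the helper names (report, run_check, check_tunnel) are hypothetical, the log check is left out because the log format is not described here, and it assumes curl, getent, and nc are installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of four of the five checks; not the real check-tunnel-health.sh.

# Print a pass/fail line in the same style as the example output.
report() {
  local status="$1" label="$2"
  if [ "$status" -eq 0 ]; then
    printf '[✓] %s\n' "$label"
  else
    printf '[✗] %s\n' "$label"
  fi
}

# Run one check command quietly and report whether it succeeded.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    report 0 "$label"
  else
    report 1 "$label"
  fi
}

# Usage: check_tunnel <tunnel name> <public hostname> <internal host:port>
check_tunnel() {
  local name="$1" host="$2" target="$3"
  run_check "Service is running" systemctl is-active --quiet "cloudflared-$name"
  run_check "DNS resolution" getent hosts "$host"
  run_check "HTTPS connectivity" curl -fsS --max-time 10 -o /dev/null "https://$host"
  run_check "Internal connectivity to $target" nc -z -w 5 "${target%:*}" "${target#*:}"
}
```

For the tunnel in the example output, the call would be check_tunnel ml110 ml110-01.d-bis.org 192.168.11.10:8006. Note that curl -f treats any HTTP status of 400 or above as a failure, which may be too strict if the hostname sits behind an access policy.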
Usage
# Run health check
./scripts/check-tunnel-health.sh
# Output shows:
# - Service status for each tunnel
# - DNS resolution status
# - HTTPS connectivity
# - Internal connectivity
# - Recent errors
Example Output
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tunnel: ml110 (ml110-01.d-bis.org)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[✓] Service is running
[✓] No recent errors in logs
[✓] DNS resolution: OK
→ 104.16.132.229
[✓] HTTPS connectivity: OK
[✓] Internal connectivity to 192.168.11.10:8006: OK
Monitoring Script
The monitor-tunnels.sh script provides continuous monitoring:
Features
- ✅ Continuous health checks
- ✅ Automatic restart on failure
- ✅ Alerting on failures
- ✅ Logging to file
- ✅ Daemon mode support
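To make the restart-and-alert behaviour concrete, here is a minimal sketch of one way such a loop could work. It is an assumption about the design, not the actual monitor-tunnels.sh: the function names (tunnel_is_healthy, tunnel_restart, send_alert, check_and_recover, monitor_loop) are hypothetical, and health is reduced to the systemd service state.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a monitor loop; the real monitor-tunnels.sh may differ.

CHECK_INTERVAL=60
LOG_FILE="/var/log/cloudflared-monitor.log"

# Append a timestamped message to the log file.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Thin wrappers so each step can be replaced or extended separately.
tunnel_is_healthy() { systemctl is-active --quiet "cloudflared-$1"; }
tunnel_restart()    { systemctl restart "cloudflared-$1"; }
send_alert()        { ./scripts/alert-tunnel-failure.sh "$1" "$2"; }

# Check one tunnel; on failure, log, alert, and attempt a restart.
# Returns 0 if the tunnel was healthy, 1 if recovery was attempted.
check_and_recover() {
  local name="$1"
  if tunnel_is_healthy "$name"; then
    return 0
  fi
  log "tunnel $name is down, restarting"
  send_alert "$name" service_down || log "alert for $name could not be sent"
  tunnel_restart "$name" || log "restart of $name failed"
  return 1
}

# Check every named tunnel, sleep, and repeat until interrupted.
monitor_loop() {
  local name
  while true; do
    for name in "$@"; do
      check_and_recover "$name" || true
    done
    sleep "$CHECK_INTERVAL"
  done
}
```

With these definitions, monitor_loop ml110 r630-01 r630-02 would watch all three tunnels. Alerting before restarting means a notification still goes out if the restart itself hangs.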
Usage
# Foreground mode (see output)
./scripts/monitor-tunnels.sh
# Daemon mode (background)
./scripts/monitor-tunnels.sh --daemon
# Check if daemon is running (pgrep does not match itself, unlike ps | grep)
pgrep -af monitor-tunnels
# Stop daemon
kill "$(cat /tmp/cloudflared-monitor.pid)"
Configuration
Edit the script to customize:
CHECK_INTERVAL=60 # Check every 60 seconds
LOG_FILE="/var/log/cloudflared-monitor.log"
ALERT_SCRIPT="./scripts/alert-tunnel-failure.sh"
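If editing the script for every change is inconvenient, one option is to let environment variables override these defaults. The snippet below is a hypothetical modification, not something the script is known to support already; validate_interval is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Hypothetical change: keep the documented defaults, but allow overrides
# from the environment, e.g. CHECK_INTERVAL=30 ./scripts/monitor-tunnels.sh

CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
LOG_FILE="${LOG_FILE:-/var/log/cloudflared-monitor.log}"
ALERT_SCRIPT="${ALERT_SCRIPT:-./scripts/alert-tunnel-failure.sh}"

# Succeed only if the argument is a positive whole number of seconds.
validate_interval() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Fall back to the default rather than looping with a bad sleep value.
if ! validate_interval "$CHECK_INTERVAL"; then
  echo "CHECK_INTERVAL must be a positive integer, got: $CHECK_INTERVAL" >&2
  CHECK_INTERVAL=60
fi
```

Validating the interval up front avoids a tight loop if the value is empty or mistyped.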
Alerting
Email Alerts
Configure email alerts in alert-tunnel-failure.sh:
# Set email address
export ALERT_EMAIL="admin@yourdomain.com"
# Install the mail command if it is missing (Debian/Ubuntu)
sudo apt-get install -y mailutils
Webhook Alerts
Configure webhook alerts (Slack, Discord, etc.):
# Set webhook URL
export ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
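As an illustration of what the webhook call might look like, the sketch below builds a small JSON payload and posts it with curl. It assumes a Slack-style incoming webhook that accepts a "text" field (Discord webhooks expect "content" instead); the function names are hypothetical and this is not the code in alert-tunnel-failure.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a webhook alert; field names assume a Slack-style webhook.

# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Newlines and other control characters are not handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the JSON body for a tunnel name and a failure reason.
build_payload() {
  local tunnel="$1" reason="$2"
  printf '{"text":"Tunnel %s: %s"}' "$(json_escape "$tunnel")" "$(json_escape "$reason")"
}

# Post the alert; fails early if ALERT_WEBHOOK is not set.
send_webhook() {
  if [ -z "${ALERT_WEBHOOK:-}" ]; then
    echo "ALERT_WEBHOOK is not set" >&2
    return 1
  fi
  curl -fsS --max-time 10 -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "$2")" "$ALERT_WEBHOOK"
}
```

Calling send_webhook ml110 service_down mirrors the arguments used by the alert script. Escaping the message matters because an unescaped quote in a log excerpt would produce invalid JSON and the webhook would reject it.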
Test Alerts
# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down
Log Monitoring
View Logs
# All tunnels
journalctl -u 'cloudflared-*' -f
# Specific tunnel
journalctl -u cloudflared-ml110 -f
# Last 100 lines
journalctl -u cloudflared-ml110 -n 100
# Since specific time
journalctl -u cloudflared-ml110 --since "1 hour ago"
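For a quick error count rather than a live view, the journal output can be piped through a filter. The sketch below is an assumption-laden example: it treats any line containing the whole word "err" or "error" (case-insensitive) as an error, which may not match the exact log format of your cloudflared version, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for counting error lines in tunnel logs.

# Count lines on stdin that contain "err" or "error" as a whole word.
# grep exits non-zero when nothing matches, so "|| true" keeps the status clean.
count_error_lines() {
  grep -ciwE 'err|error' || true
}

# Count error lines for one unit since a given time (default: the last hour).
recent_error_count() {
  local unit="$1" since="${2:-1 hour ago}"
  journalctl -u "$unit" --since "$since" --no-pager -o cat 2>/dev/null | count_error_lines
}
```

For example, recent_error_count cloudflared-ml110 "15 minutes ago" prints a single number that a health check can compare against a threshold.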
Log Rotation
journald rotates the systemd journal automatically. If the tunnels also write log files under /var/log/cloudflared/, rotate those with logrotate:
# Edit logrotate config
sudo nano /etc/logrotate.d/cloudflared
# Add:
/var/log/cloudflared/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
Metrics
Cloudflare Dashboard
View tunnel metrics in the Cloudflare dashboard:
- Go to: Zero Trust → Networks → Tunnels
- Click a tunnel to view:
  - Connection status
  - Uptime
  - Traffic statistics
  - Error rates
Local Metrics
Tunnels expose metrics endpoints (if configured):
# ml110 tunnel metrics
curl http://127.0.0.1:9091/metrics
# r630-01 tunnel metrics
curl http://127.0.0.1:9092/metrics
# r630-02 tunnel metrics
curl http://127.0.0.1:9093/metrics
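The endpoints return Prometheus text format, so a single value can be pulled out with awk. The helper below is a minimal sketch: metric_value is a hypothetical name, and the metric used in the usage example (cloudflared_tunnel_ha_connections) is an assumption, so check the actual endpoint output for the names your version exposes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first sample of a metric
# read from Prometheus text format on stdin.
metric_value() {
  local metric="$1"
  # Match "name value" or "name{labels} value"; comment lines start with "#"
  # and never match. Assumes no spaces inside label values and no timestamps.
  awk -v m="$metric" '$1 == m || index($1, m "{") == 1 { print $NF; exit }'
}
```

A typical call would be curl -fsS http://127.0.0.1:9091/metrics | metric_value cloudflared_tunnel_ha_connections. An empty result means the metric was not found, which is worth treating as a failure in its own right.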
Automated Monitoring Setup
Systemd Timer (Recommended)
Create a systemd timer for automated health checks:
# Create timer unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.timer
# Add:
[Unit]
Description=Cloudflare Tunnel Health Check Timer
Requires=cloudflared-healthcheck.service
[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=cloudflared-healthcheck.service
[Install]
WantedBy=timers.target
# Create service unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.service
# Add:
[Unit]
Description=Cloudflare Tunnel Health Check
After=network.target
[Service]
Type=oneshot
ExecStart=/path/to/scripts/check-tunnel-health.sh
StandardOutput=journal
StandardError=journal
# Reload unit files, then enable and start the timer
sudo systemctl daemon-reload
sudo systemctl enable --now cloudflared-healthcheck.timer
Cron Job (Alternative)
# Edit crontab
crontab -e
# Add (check every 5 minutes):
*/5 * * * * /path/to/scripts/check-tunnel-health.sh >> /var/log/tunnel-health.log 2>&1
Monitoring Best Practices
- ✅ Run health checks regularly - At least every 5 minutes
- ✅ Monitor logs - Watch for errors
- ✅ Set up alerts - Get notified immediately on failures
- ✅ Review metrics - Track trends over time
- ✅ Test alerts - Verify alerting works
- ✅ Document incidents - Keep track of issues
Integration with Monitoring Systems
Prometheus
If using Prometheus, you can scrape tunnel metrics:
# prometheus.yml
scrape_configs:
  - job_name: 'cloudflared'
    static_configs:
      - targets: ['127.0.0.1:9091', '127.0.0.1:9092', '127.0.0.1:9093']
Grafana
Create dashboards in Grafana:
- Tunnel uptime
- Connection status
- Error rates
- Response times
Nagios/Icinga
Create service checks:
# Check service status
check_nrpe -H localhost -c check_cloudflared_ml110
# Check connectivity
check_http -H ml110-01.d-bis.org -S
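The check_cloudflared_ml110 command has to be backed by a plugin on the monitored host. A minimal sketch of such a plugin is shown below, using the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the mapping of systemd states to severities are assumptions to adapt, not an existing plugin.

```shell
#!/usr/bin/env bash
# Hypothetical Nagios-style plugin logic for a cloudflared systemd unit.

# Map a systemd unit state to a Nagios status line and exit code.
state_to_nagios() {
  local unit="$1" state="$2"
  case "$state" in
    active)
      echo "OK - $unit is active"; return 0 ;;
    activating|reloading)
      echo "WARNING - $unit is $state"; return 1 ;;
    inactive|failed|deactivating)
      echo "CRITICAL - $unit is $state"; return 2 ;;
    *)
      echo "UNKNOWN - $unit state: ${state:-none}"; return 3 ;;
  esac
}

# Usage: check_cloudflared <tunnel name>
check_cloudflared() {
  local unit="cloudflared-$1" state
  # is-active exits non-zero for anything but "active"; keep the printed state.
  state="$(systemctl is-active "$unit" 2>/dev/null)" || true
  state_to_nagios "$unit" "$state"
}
```

A plugin script would end with check_cloudflared ml110 followed by exit $?, so that NRPE receives both the status line and the exit code.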
Troubleshooting Monitoring
Health Check Fails
# Run manually with verbose output
bash -x ./scripts/check-tunnel-health.sh
# Check individual components
systemctl status cloudflared-ml110
dig ml110-01.d-bis.org
curl -I https://ml110-01.d-bis.org
Monitor Script Not Working
# Check if daemon is running (pgrep does not match itself, unlike ps | grep)
pgrep -af monitor-tunnels
# Check log file
tail -f /var/log/cloudflared-monitor.log
# Run in foreground to see errors
./scripts/monitor-tunnels.sh
Alerts Not Sending
# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down
# Check email configuration
echo "Test" | mail -s "Test" admin@yourdomain.com
# Check webhook
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"test"}' "$ALERT_WEBHOOK"
Next Steps
After setting up monitoring:
- ✅ Verify health checks run successfully
- ✅ Test alerting (trigger a test failure)
- ✅ Set up log aggregation (if needed)
- ✅ Create dashboards (if using Grafana)
- ✅ Document monitoring procedures
Support
For monitoring issues:
- Check the Troubleshooting Guide
- Review script logs
- Test components individually
- Check systemd service status