defiQUG cb47cce074 Complete markdown files cleanup and organization

- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.

2026-01-06 01:46:25 -08:00

7.1 KiB


Monitoring Guide

Complete guide for monitoring Cloudflare tunnels.

Overview

Monitoring confirms that your tunnels are healthy and alerts you to problems, ideally before they affect users.

Monitoring Components

Health Checks - Verify tunnels are running
Connectivity Tests - Verify DNS and HTTPS work
Log Monitoring - Watch for errors
Alerting - Notify on failures

Quick Start

One-Time Health Check

./scripts/check-tunnel-health.sh

Continuous Monitoring

# Foreground (see output)
./scripts/monitor-tunnels.sh

# Background (daemon mode)
./scripts/monitor-tunnels.sh --daemon

Health Check Script

The check-tunnel-health.sh script runs the following checks for each tunnel:

Checks Performed

Service Status - Is the systemd service running?
Log Errors - Are there recent errors in logs?
DNS Resolution - Does DNS resolve correctly?
HTTPS Connectivity - Can we connect via HTTPS?
Internal Connectivity - Can VMID 102 reach Proxmox hosts?
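
The five checks above can be approximated with a small set of shell functions, one per check. This is a minimal sketch and not the contents of check-tunnel-health.sh. The helper names are hypothetical, and the sketch makes four assumptions: units are named cloudflared-<tunnel>; dig, curl and timeout are installed on the tunnel host; cloudflared's console log marks errors with an ERR level tag; and the internal test can use bash's /dev/tcp pseudo-device to open a TCP connection.

```shell
#!/usr/bin/env bash
# Sketch of per-tunnel health checks (hypothetical helpers, not the real script).

ok()   { printf '[✓] %s\n' "$*"; }
fail() { printf '[✗] %s\n' "$*"; }

# 1. Service status: is the systemd unit active? (unit name cloudflared-<tunnel> assumed)
check_service() {
  if systemctl is-active --quiet "cloudflared-$1"; then ok "Service is running"
  else fail "Service is not running"; return 1; fi
}

# 2. Log errors: any ERR-level lines in the last 15 minutes? (log format assumed)
check_logs() {
  local n
  n=$(journalctl -u "cloudflared-$1" --since "15min ago" --no-pager -o cat | grep -c ' ERR ')
  if [ "${n:-0}" -eq 0 ]; then ok "No recent errors in logs"
  else fail "$n recent error(s) in logs"; return 1; fi
}

# 3. DNS resolution: does the public hostname resolve?
check_dns() {
  local addrs
  addrs=$(dig +short "$1")
  if [ -n "$addrs" ]; then
    ok "DNS resolution: OK"
    printf '  → %s\n' $addrs   # unquoted on purpose: one line per answer
  else fail "DNS resolution failed"; return 1; fi
}

# 4. HTTPS connectivity: a 1xx-4xx status within 10 s counts as reachable;
#    000 (no response), empty or 5xx counts as a failure
check_https() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$1")
  if [[ "$code" =~ ^[1-4][0-9][0-9]$ ]]; then ok "HTTPS connectivity: OK ($code)"
  else fail "HTTPS connectivity failed (${code:-none})"; return 1; fi
}

# 5. Internal connectivity: open a TCP connection via bash's /dev/tcp, 5 s timeout
check_internal() {
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
    ok "Internal connectivity to $1:$2: OK"
  else fail "Internal connectivity to $1:$2 failed"; return 1; fi
}

# Run all checks for one tunnel; returns 1 if any check failed.
# usage: check_tunnel <tunnel> <public-hostname> <internal-host> <internal-port>
check_tunnel() {
  local rc=0
  check_service "$1" || rc=1
  check_logs "$1" || rc=1
  check_dns "$2" || rc=1
  check_https "$2" || rc=1
  check_internal "$3" "$4" || rc=1
  return "$rc"
}
```

For the ml110 tunnel in the example output, the call would be check_tunnel ml110 ml110-01.d-bis.org 192.168.11.10 8006. The sketch treats a 5xx status as a failure because Cloudflare typically answers with an error page (for example HTTP 502 or 530) when a tunnel or its origin is unreachable.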

Usage

# Run health check
./scripts/check-tunnel-health.sh

# Output shows:
# - Service status for each tunnel
# - DNS resolution status
# - HTTPS connectivity
# - Internal connectivity
# - Recent errors

Example Output

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tunnel: ml110 (ml110-01.d-bis.org)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[✓] Service is running
[✓] No recent errors in logs
[✓] DNS resolution: OK
  → 104.16.132.229
[✓] HTTPS connectivity: OK
[✓] Internal connectivity to 192.168.11.10:8006: OK

Monitoring Script

The monitor-tunnels.sh script provides continuous monitoring:

Features

✅ Continuous health checks
✅ Automatic restart on failure
✅ Alerting on failures
✅ Logging to file
✅ Daemon mode support

Usage

# Foreground mode (see output)
./scripts/monitor-tunnels.sh

# Daemon mode (background)
./scripts/monitor-tunnels.sh --daemon

# Check if daemon is running
pgrep -af monitor-tunnels.sh

# Stop daemon
kill $(cat /tmp/cloudflared-monitor.pid)
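
Killing whatever PID the file contains can signal an unrelated process if the monitor has already exited and its PID has been reused. The hypothetical stop_monitor helper below is a slightly safer variant. It signals the PID only if /proc shows that the process is still running something named monitor-tunnels, and it removes a stale PID file either way. It assumes Linux (/proc) and the /tmp/cloudflared-monitor.pid path used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the monitor daemon recorded in its PID file.
PID_FILE="${PID_FILE:-/tmp/cloudflared-monitor.pid}"

stop_monitor() {
  local pid
  if [ ! -f "$PID_FILE" ]; then
    echo "No PID file at $PID_FILE; is the monitor running?" >&2
    return 1
  fi
  pid=$(cat "$PID_FILE")
  # Signal only a live process whose command line still mentions monitor-tunnels
  # (guards against PID reuse after the monitor has exited). Linux /proc assumed.
  if [[ "$pid" =~ ^[0-9]+$ ]] && [ -r "/proc/$pid/cmdline" ] &&
     tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q monitor-tunnels; then
    kill "$pid" && echo "Stopped monitor (PID $pid)"
  else
    echo "PID '$pid' is not a running monitor; removing stale PID file" >&2
  fi
  rm -f "$PID_FILE"
}
```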

Configuration

Edit these variables in the script to customize it:

CHECK_INTERVAL=60        # Check every 60 seconds
LOG_FILE="/var/log/cloudflared-monitor.log"
ALERT_SCRIPT="./scripts/alert-tunnel-failure.sh"
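
To make the three settings concrete, here is a sketch of a loop that uses them. It is an illustration and not the real monitor-tunnels.sh. The following are assumptions:

- the tunnel list, taken from the tunnel names in the Local Metrics section;
- the cloudflared-<tunnel> unit naming;
- the restart-then-alert policy.

ALERT_SCRIPT is invoked with <tunnel> <event> arguments, matching the test command shown in the Alerting section.

```shell
#!/usr/bin/env bash
# Sketch of a monitor loop (illustrative; not the actual monitor-tunnels.sh).
CHECK_INTERVAL=${CHECK_INTERVAL:-60}          # seconds between passes
LOG_FILE=${LOG_FILE:-/var/log/cloudflared-monitor.log}
ALERT_SCRIPT=${ALERT_SCRIPT:-./scripts/alert-tunnel-failure.sh}
TUNNELS=(ml110 r630-01 r630-02)               # assumed: units named cloudflared-<name>

log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG_FILE"; }

# One pass over all tunnels: restart any inactive unit, alert if it stays down.
check_once() {
  local t unit
  for t in "${TUNNELS[@]}"; do
    unit="cloudflared-$t"
    systemctl is-active --quiet "$unit" && continue
    log "$unit is not active; restarting"
    if systemctl restart "$unit" && systemctl is-active --quiet "$unit"; then
      log "$unit restarted successfully"
    else
      log "$unit failed to restart; sending alert"
      "$ALERT_SCRIPT" "$t" service_down || log "alert script failed for $t"
    fi
  done
}

# Foreground loop; stop with Ctrl+C or by killing the process.
monitor_loop() {
  while true; do
    check_once
    sleep "$CHECK_INTERVAL"
  done
}
```

Sourcing the file and calling monitor_loop runs it in the foreground. check_once performs a single pass, which is convenient when testing the restart and alert path.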

Alerting

Email Alerts

Configure email alerts used by alert-tunnel-failure.sh:

# Set email address
export ALERT_EMAIL="admin@yourdomain.com"

# Ensure a mail client is installed (mailutils provides the mail command)
sudo apt-get install -y mailutils

Webhook Alerts

Configure webhook alerts (Slack, Discord, etc.):

# Set webhook URL
export ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

Test Alerts

# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down
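
A minimal alert script that honours both variables might look like the sketch below. It is hypothetical, and the message wording is made up. It takes the same <tunnel> <event> arguments as the test command above. It restricts those arguments to letters, digits, dot, underscore and hyphen so they can be placed in JSON without escaping. The {"text": ...} payload is the format Slack incoming webhooks accept; Discord webhooks expect a content field instead.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an alert script. usage: send_alert <tunnel> <event>
send_alert() {
  local tunnel=$1 event=$2 host msg sent=0
  # Restrict arguments to a safe character set so they can go into JSON unescaped
  if [[ ! "$tunnel" =~ ^[A-Za-z0-9._-]+$ || ! "$event" =~ ^[A-Za-z0-9._-]+$ ]]; then
    echo "usage: send_alert <tunnel> <event>" >&2
    return 2
  fi
  host=${HOSTNAME:-$(uname -n)}
  msg="Cloudflare tunnel $tunnel: $event on $host"

  # Email: needs a working mail command (e.g. from mailutils)
  if [ -n "${ALERT_EMAIL:-}" ]; then
    echo "$msg" | mail -s "Tunnel alert: $tunnel $event" "$ALERT_EMAIL" && sent=1
  fi
  # Webhook: Slack-style {"text": ...} payload; -f makes HTTP errors count as failures
  if [ -n "${ALERT_WEBHOOK:-}" ]; then
    curl -fsS -X POST -H "Content-Type: application/json" \
      -d "{\"text\":\"$msg\"}" "$ALERT_WEBHOOK" >/dev/null && sent=1
  fi

  if [ "$sent" -eq 0 ]; then
    echo "No alert delivered (set ALERT_EMAIL and/or ALERT_WEBHOOK)" >&2
    return 1
  fi
}
```

Saved as a standalone script, it would end with send_alert "$@" so that the test command above exercises it directly.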

Log Monitoring

View Logs

# All tunnels
journalctl -u 'cloudflared-*' -f

# Specific tunnel
journalctl -u cloudflared-ml110 -f

# Last 100 lines
journalctl -u cloudflared-ml110 -n 100

# Since specific time
journalctl -u cloudflared-ml110 --since "1 hour ago"
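
To turn the log view into a per-tunnel error count, you can filter on cloudflared's log level tag. This sketch assumes cloudflared's default console format, in which error lines contain ERR (warnings use WRN and informational lines INF). It also assumes units named cloudflared-ml110, cloudflared-r630-01 and cloudflared-r630-02.

```shell
#!/usr/bin/env bash
# Count ERR-level lines in cloudflared log text read from stdin
# (assumes cloudflared's default console format: "<timestamp> ERR <message>").
count_errors() {
  grep -c ' ERR ' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Print one error count per tunnel unit; unit names cloudflared-<tunnel> assumed.
report_errors() {
  local since="${1:-1 hour ago}" t n
  for t in ml110 r630-01 r630-02; do
    n=$(journalctl -u "cloudflared-$t" --since "$since" --no-pager -o cat | count_errors)
    printf '%s: %s error(s) since %s\n' "$t" "$n" "$since"
  done
}
```

report_errors "1 hour ago" prints one line per tunnel. count_errors can also be fed a saved log file.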

Log Rotation

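The systemd journal rotates itself automatically (size limits are set in /etc/systemd/journald.conf). If cloudflared also writes plain log files under /var/log/cloudflared/, rotate them with logrotate: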

# Edit logrotate config
sudo nano /etc/logrotate.d/cloudflared

# Add:
/var/log/cloudflared/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}

Metrics

Cloudflare Dashboard

View tunnel metrics in the Cloudflare dashboard:

Go to: Zero Trust → Networks → Tunnels
Click a tunnel to view:
- Connection status
- Uptime
- Traffic statistics
- Error rates

Local Metrics

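Each cloudflared instance serves Prometheus metrics at /metrics on the address set by its metrics option (for example, metrics: 127.0.0.1:9091 in the tunnel's configuration file). With the ports used in this setup: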

# ml110 tunnel metrics
curl http://127.0.0.1:9091/metrics

# r630-01 tunnel metrics
curl http://127.0.0.1:9092/metrics

# r630-02 tunnel metrics
curl http://127.0.0.1:9093/metrics
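
For scripted checks, you can read a single gauge from each endpoint instead of the whole page. The sketch assumes the metric is named cloudflared_tunnel_ha_connections (the number of active connections from cloudflared to Cloudflare's edge). It reuses the port-to-tunnel mapping above. Confirm the metric name against your own /metrics output before relying on it, since metric names can change between cloudflared releases.

```shell
#!/usr/bin/env bash
# Extract cloudflared_tunnel_ha_connections from Prometheus text on stdin
# (metric name assumed; prints 0 if the metric is absent).
ha_connections() {
  awk '$1 == "cloudflared_tunnel_ha_connections" { print $2; found = 1 }
       END { if (!found) print 0 }'
}

# Check each local endpoint; port mapping taken from the examples above.
check_metrics() {
  local entry name port n rc=0
  for entry in ml110:9091 r630-01:9092 r630-02:9093; do
    name=${entry%%:*}
    port=${entry##*:}
    n=$(curl -fsS --max-time 5 "http://127.0.0.1:$port/metrics" 2>/dev/null | ha_connections)
    if [ "$n" -gt 0 ]; then
      echo "[✓] $name: $n active connection(s)"
    else
      echo "[✗] $name: no active connections (or metrics endpoint unreachable)"
      rc=1
    fi
  done
  return "$rc"
}
```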

Automated Monitoring Setup

Systemd Timer (Recommended)

Create a systemd timer for automated health checks:

# Create timer unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.timer

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check Timer
Requires=cloudflared-healthcheck.service

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=cloudflared-healthcheck.service

[Install]
WantedBy=timers.target

# Create service unit
sudo nano /etc/systemd/system/cloudflared-healthcheck.service

# Add:
[Unit]
Description=Cloudflare Tunnel Health Check
After=network.target

[Service]
Type=oneshot
ExecStart=/path/to/scripts/check-tunnel-health.sh
StandardOutput=journal
StandardError=journal

# Reload systemd so it picks up the new units, then enable and start the timer
sudo systemctl daemon-reload
sudo systemctl enable --now cloudflared-healthcheck.timer

Cron Job (Alternative)

# Edit root's crontab (the job below writes to /var/log)
sudo crontab -e

# Add (check every 5 minutes):
*/5 * * * * /path/to/scripts/check-tunnel-health.sh >> /var/log/tunnel-health.log 2>&1

Monitoring Best Practices

✅ Run health checks regularly - At least every 5 minutes
✅ Monitor logs - Watch for errors
✅ Set up alerts - Get notified immediately on failures
✅ Review metrics - Track trends over time
✅ Test alerts - Verify alerting works
✅ Document incidents - Keep track of issues

Integration with Monitoring Systems

Prometheus

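If Prometheus runs on the same host as the tunnels, it can scrape the local metrics endpoints directly. Otherwise, bind the metrics listeners to a reachable address and use that address instead of 127.0.0.1: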

# prometheus.yml
scrape_configs:
  - job_name: 'cloudflared'
    static_configs:
      - targets: ['127.0.0.1:9091', '127.0.0.1:9092', '127.0.0.1:9093']

Grafana

Create Grafana dashboards showing:

Tunnel uptime
Connection status
Error rates
Response times

Nagios/Icinga

Create service checks:

# Check service status
check_nrpe -H localhost -c check_cloudflared_ml110

# Check connectivity
check_http -H ml110-01.d-bis.org -S
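
check_cloudflared_ml110 is not a stock plugin, so it has to be defined on the monitored host. A minimal plugin sketch, following the standard Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN), could look like the following. The plugin name, its install path and the cloudflared-<tunnel> unit naming are assumptions. The matching NRPE definition in nrpe.cfg would be a line such as command[check_cloudflared_ml110]=/usr/local/lib/nagios/plugins/check_cloudflared ml110.

```shell
#!/usr/bin/env bash
# Hypothetical Nagios/NRPE plugin: check_cloudflared <tunnel>
# Exit codes follow the Nagios plugin convention: 0=OK, 2=CRITICAL, 3=UNKNOWN.
check_cloudflared() {
  local tunnel=$1 state
  if [ -z "$tunnel" ]; then
    echo "UNKNOWN - usage: check_cloudflared <tunnel>"
    return 3
  fi
  # systemctl is-active prints the unit state (active, inactive, failed, ...)
  state=$(systemctl is-active "cloudflared-$tunnel" 2>/dev/null)
  if [ "$state" = "active" ]; then
    echo "OK - cloudflared-$tunnel is active"
    return 0
  fi
  echo "CRITICAL - cloudflared-$tunnel is ${state:-unknown}"
  return 2
}
```

As a standalone plugin file, it would end with check_cloudflared "$@"; exit $? so that NRPE receives the exit code.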

Troubleshooting Monitoring

Health Check Fails

# Run manually with verbose output
bash -x ./scripts/check-tunnel-health.sh

# Check individual components
systemctl status cloudflared-ml110
dig ml110-01.d-bis.org
curl -I https://ml110-01.d-bis.org

Monitor Script Not Working

# Check if daemon is running
pgrep -af monitor-tunnels.sh

# Check log file
tail -f /var/log/cloudflared-monitor.log

# Run in foreground to see errors
./scripts/monitor-tunnels.sh

Alerts Not Sending

# Test alert script
./scripts/alert-tunnel-failure.sh ml110 service_down

# Check email configuration
echo "Test" | mail -s "Test" admin@yourdomain.com

# Check webhook
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"test"}' "$ALERT_WEBHOOK"

Next Steps

After setting up monitoring:

✅ Verify health checks run successfully
✅ Test alerting (trigger a test failure)
✅ Set up log aggregation (if needed)
✅ Create dashboards (if using Grafana)
✅ Document monitoring procedures

Support

For monitoring issues:

Check the Troubleshooting Guide
Review script logs
Test components individually
Check systemd service status
