
Storage Growth & Health — Automation Tasks, Fixes, and Migrations

Last updated: 2026-02-15
Purpose: List all tasks to automate proactive storage monitoring, plus required fixes and migrations.


1. Tasks to automate

1.1 Scheduled data collection

| # | Task | Description | How |
|---|------|-------------|-----|
| A1 | Storage snapshot + history append | Run `collect-storage-growth-data.sh --append` on a schedule so `history.csv` grows for trend analysis. | Cron every 6 hours (or daily). Use `scripts/maintenance/schedule-storage-growth-cron.sh --install`. |
| A2 | Snapshot retention | Prune old snapshot files under `logs/storage-growth/` so the directory does not grow unbounded. | Done. Script: `scripts/monitoring/prune-storage-snapshots.sh` (default keep 30 days; `--days N`, `--dry-run`). Schedule weekly or run manually. |
| A3 | History CSV retention | Cap `history.csv` size (keep last 10k rows or ~90 days). | Done. Script: `scripts/monitoring/prune-storage-history.sh` (default 90 days proxy; `--max-rows N`, `--days N`, `--dry-run`). Run weekly via `schedule-storage-growth-cron` (prune line). |
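
The row cap in A3 can be sketched as follows. This is not the code in `prune-storage-history.sh`; it is a minimal illustration that assumes `history.csv` has a single header line and that rows are appended in chronological order. The function name `prune_csv_rows` is hypothetical.

```bash
# Sketch only: keep the header line plus the newest MAX_ROWS data rows.
# Assumptions: one header line; rows are appended oldest-first.
prune_csv_rows() {
  local file="$1" max_rows="$2" tmp
  tmp="$(mktemp)"
  # Copy the header, then the last max_rows data rows.
  head -n 1 "$file" > "$tmp"
  tail -n +2 "$file" | tail -n "$max_rows" >> "$tmp"
  # Write back in place so the file keeps its inode and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `prune_csv_rows logs/storage-growth/history.csv 10000` would keep the header and the most recent 10k rows.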

1.2 Threshold checks and alerting

| # | Task | Description | How |
|---|------|-------------|-----|
| A4 | Thin pool / pvesm check (all hosts) | Fail or warn when any host's thin pool or pvesm storage is ≥ 95% (critical) or ≥ 80% (warn). | Done. In `daily-weekly-checks.sh` weekly (F3/M2). |
| A5 | In-CT disk check in cron | Run `check-disk-all-vmids.sh` on a schedule and log or alert on WARN/CRIT. | Done. Called from `daily-weekly-checks.sh` daily (cron 08:00). |
| A6 | Integrate with existing `storage-monitor.sh` | `storage-monitor.sh` already has WARN 80%, CRIT 90% and optional `ALERT_EMAIL` / `ALERT_WEBHOOK`. | Done. `scripts/maintenance/schedule-storage-monitor-cron.sh --install` (daily 07:00). |
| A7 | Metric file for alerting | Write a metric file (e.g. `logs/storage-growth/last_run.metric`) with max thin pool % and timestamp so an external monitor can alert. | Done. Weekly run writes `STORAGE_METRIC_FILE` (`storage_max_pct`, `storage_metric_timestamp`). |
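
The threshold check (A4) and metric file (A7) might look roughly like the sketch below. The function names are hypothetical, percentages are assumed to be already rounded to integers, and the `key value` line format of the metric file is an assumption; only the key names `storage_max_pct` and `storage_metric_timestamp` come from the task list.

```bash
# Classify an integer usage percentage; defaults follow A4 (warn 80, crit 95).
classify_pct() {
  local pct="$1" warn="${2:-80}" crit="${3:-95}"
  if [ "$pct" -ge "$crit" ]; then
    echo CRIT
  elif [ "$pct" -ge "$warn" ]; then
    echo WARN
  else
    echo OK
  fi
}

# Print the largest value from a list of integer percentages on stdin.
max_pct() {
  sort -n | tail -n 1
}

# Write the metric file. The "key value" layout is an assumption.
write_metric_file() {
  local file="$1" pct="$2" ts="${3:-$(date +%s)}"
  mkdir -p "$(dirname "$file")"
  printf 'storage_max_pct %s\nstorage_metric_timestamp %s\n' "$pct" "$ts" > "$file"
}
```

An external monitor could then read the file and alert when `storage_max_pct` crosses a threshold or when the timestamp is stale.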

1.3 Proactive remediation (optional)

| # | Task | Description | How |
|---|------|-------------|-----|
| A8 | Weekly fstrim in CTs | Run `fstrim` inside running CTs on hosts with thin pools to reclaim space. | Done. `scripts/maintenance/fstrim-all-running-ct.sh`; run from `daily-weekly-checks.sh` weekly. |
| A9 | Logrotate audit | Ensure high-log VMIDs (10130, 10150, 10151, 5000, 10233, 10234, 2400) have logrotate or equivalent. | Done. Runbook: `docs/04-configuration/LOGROTATE_AUDIT_RUNBOOK.md`. |
| A10 | Journal vacuum | Run `journalctl --vacuum-time=7d` in key CTs on a schedule. | Done. `scripts/maintenance/journal-vacuum-key-ct.sh`; run from `daily-weekly-checks.sh` weekly. |
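
As an illustration of A8, the loop below picks running containers from `pct list` output and trims each root filesystem. It is a sketch, not the contents of `fstrim-all-running-ct.sh`: it assumes it runs directly on the Proxmox host, and the function names and the `DRY_RUN` variable are hypothetical.

```bash
# Print the VMIDs of running containers from `pct list`-style text on stdin.
# Column 1 is the VMID and column 2 the status; line 1 is the header.
running_ctids() {
  awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Trim each running CT; with DRY_RUN=1 only print the commands.
fstrim_running_cts() {
  local vmid
  pct list | running_ctids | while read -r vmid; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "pct exec $vmid -- fstrim -v /"
    else
      # </dev/null keeps pct exec from swallowing the loop's input.
      pct exec "$vmid" -- fstrim -v / </dev/null \
        || echo "WARN: fstrim failed in CT $vmid" >&2
    fi
  done
}
```

Running `DRY_RUN=1 fstrim_running_cts` first shows which containers would be touched. The same loop shape would fit A10 by swapping the command for `journalctl --vacuum-time=7d` and restricting the VMID list to the key CTs.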

2. Fixes required

| # | Fix | Location | Detail |
|---|-----|----------|--------|
| F1 | Implement or remove `--json` | `scripts/monitoring/collect-storage-growth-data.sh` | Done. `--json` outputs a JSON object with `timestamp` and `csv_rows` (array of CSV line strings). |
| F2 | CSV quoting for detail column | `scripts/monitoring/collect-storage-growth-data.sh` | Done. Detail field is quoted when it contains commas or quotes via `csv_quote()`. |
| F3 | Thin pool check on all three hosts | `scripts/maintenance/daily-weekly-checks.sh` | Done. [138a] now runs thin pool/storage check on r630-02, r630-01, and ml110 (WARN ≥85%, FAIL ≥95%/100%). |
| F4 | `PROJECT_ROOT` in cron | `schedule-daily-weekly-cron.sh` / new storage cron | Cron lines use `$PROJECT_ROOT`; the crontab is installed by the user who runs the script, so the path is correct. For `schedule-storage-growth-cron.sh` use the same pattern (`cd $PROJECT_ROOT && ...`). |
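
The quoting rule in F2 is the usual CSV convention: wrap the field in double quotes and double any embedded quote. The repository's `csv_quote()` is not shown here, so the version below is only a sketch of that convention under the stated rule (quote when the field contains a comma or a quote).

```bash
# Sketch of a csv_quote helper: quote only when needed, doubling inner quotes.
csv_quote() {
  local field="$1" q='"'
  case "$field" in
    *,*|*\"*)
      # Double every embedded quote, then wrap the field in quotes.
      field="${field//$q/$q$q}"
      printf '%s%s%s' "$q" "$field" "$q"
      ;;
    *)
      printf '%s' "$field"
      ;;
  esac
}
```

For instance, `csv_quote 'a,b'` prints `"a,b"`, while a field with no comma or quote is printed unchanged.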

3. Migrations

| # | Migration | Description |
|---|-----------|-------------|
| M1 | Add `schedule-storage-growth-cron.sh` | Done. Script: `scripts/maintenance/schedule-storage-growth-cron.sh` (same style as `schedule-daily-weekly-cron.sh`): `--show`, `--install`, `--remove`. Cron runs `collect-storage-growth-data.sh --append` every 6 hours. |
| M2 | Extend weekly checks to all-host thin pool | Done. Implemented with F3 in `daily-weekly-checks.sh`: `check_thin_pool_one_host` for r630-02, r630-01, ml110. |
| M3 | Doc and index updates | Done. `STORAGE_GROWTH_AND_HEALTH.md` references `schedule-storage-growth-cron.sh` and the prune script; MASTER_INDEX and OPERATIONAL_RUNBOOKS list the storage growth cron. |
| M4 | Optional: CI job | Add a GitHub Actions (or Gitea) workflow that runs `collect-storage-growth-data.sh --csv` (or a dry run that only checks script syntax / host reachability) so config changes don't break the script. Optional because the script requires LAN/SSH to hosts. |
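
One way to make `--install` and `--remove` (M1) safe to run repeatedly is to filter the existing crontab by a marker string before appending the new line. The sketch below shows that pattern with hypothetical function names; the actual `schedule-storage-growth-cron.sh` may be structured differently, and the output redirection in the cron line is an assumption. `PROJECT_ROOT` must be set by the caller and is assumed to contain no spaces.

```bash
# Marker used to recognise the collector's cron line.
CRON_MARKER="collect-storage-growth-data.sh"

# Build the every-6-hours cron line, following the cd $PROJECT_ROOT && ... pattern.
storage_growth_cron_line() {
  printf '0 */6 * * * cd %s && bash scripts/monitoring/collect-storage-growth-data.sh --append >/dev/null 2>&1\n' "$PROJECT_ROOT"
}

# Read a crontab on stdin; print it without any collector line.
crontab_without_entry() {
  grep -vF "$CRON_MARKER" || true
}

# Read a crontab on stdin; print it with exactly one collector line.
crontab_with_entry() {
  crontab_without_entry
  storage_growth_cron_line
}
```

An install step could then be `crontab -l 2>/dev/null | crontab_with_entry | crontab -`, and a remove step the same pipeline with `crontab_without_entry`. Because the filter runs first, installing twice still leaves a single entry.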

4. Implementation order

  1. F2 (CSV quoting) and F1 (--json) in collect-storage-growth-data.sh.
  2. M1: add schedule-storage-growth-cron.sh; M3: update docs.
  3. F3 and M2: extend daily-weekly-checks.sh to check the thin pool on all three hosts.
  4. A1: install the storage growth cron (via M1).
  5. A2: add prune-storage-snapshots.sh and schedule it weekly (or in the same cron wrapper).
  6. A4/A7: optionally have the weekly check write a metric file; wire A5 (check-disk-all-vmids) into the daily run if desired.
  7. A8-A10: as needed (fstrim, logrotate audit, journal vacuum).

5. Quick reference

| Script | Purpose |
|--------|---------|
| `scripts/monitoring/collect-storage-growth-data.sh` | Collect host + VM storage; output snapshot + optional growth table; `--append` for `history.csv`. |
| `scripts/maintenance/schedule-storage-growth-cron.sh` | Install/show/remove cron for storage collection (every 6h). |
| `scripts/monitoring/prune-storage-snapshots.sh` | Prune `snapshot_*.txt` older than N days (default 30); `--days N`, `--dry-run`. |
| `scripts/monitoring/prune-storage-history.sh` | Prune `history.csv` to last N rows (default ~90d); `--days N`, `--max-rows N`, `--dry-run`. |
| `scripts/maintenance/daily-weekly-checks.sh` | Daily: explorer, RPC, indexer lag, in-CT disk (A5). Weekly: config API, thin pool, fstrim (A8), journal vacuum (A10), storage metric (A7). |
| `scripts/maintenance/check-disk-all-vmids.sh` | In-CT `df /` for all running CTs; WARN 85%, CRIT 95%. |
| `scripts/maintenance/schedule-storage-monitor-cron.sh` | Install/show/remove cron for `storage-monitor.sh` (daily 07:00). |
| `scripts/maintenance/fstrim-all-running-ct.sh` | `fstrim -v /` in all running CTs; `--dry-run`. |
| `scripts/maintenance/journal-vacuum-key-ct.sh` | `journalctl --vacuum-time=7d` in key CTs; `--dry-run`. |
| `scripts/storage-monitor.sh` | Host pvesm + VG; alerts at 80%/90%; optional email/webhook. |
| `docs/04-configuration/STORAGE_GROWTH_AND_HEALTH.md` | Growth table template, factors, thresholds, how to use data. |