proxmox/docs/04-configuration/STORAGE_RECOMMENDATIONS_BY_FILL_RATE.md
# Storage Recommendations by Fill Rate and Growth

Last updated: 2026-02-28

Based on current usage, history in `logs/storage-growth/history.csv`, and the physical drive layout across ml110, r630-01, and r630-02.

Completed (2026-02-28):

- Storage growth cron verified; prune run (VMID 5000 + r630-01 CTs).
- ml110 sdb added to VG `pve`; `data` thin pool extended to ~1.7 TB (ml110 `data` now ~11% used).
- Phase 1 migration (r630-01 `data` → thin1): 8 CTs migrated (10233, 10120, 10100, 10101, 10235, 10236, 7804, 8640); r630-01 `data` at 65.8% (was 72%), thin1 at 50.6%.


## 1. Thresholds and monitoring

| Level   | Use %  | Action |
| ------- | ------ | ------ |
| Healthy | < 75%  | Continue normal collection; review quarterly. |
| Watch   | 75–84% | Weekly review; plan prune or migration. |
| WARN    | 85–94% | Prune and/or migrate within 1–2 weeks; do not add new large CTs. |
| CRIT    | ≥ 95%  | Immediate action; LVM thin pools can fail or go read-only. |

Current scripts: `check-disk-all-vmids.sh` uses WARN 85% and CRIT 95% for container root usage. These recommendations apply the same thresholds to host storage (`pvesm` / LVM) as well.
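The threshold table can be expressed as a small shell helper, e.g. for use in ad-hoc checks or future scripts. This is an illustrative sketch, not part of the existing scripts; the `storage_level` name is made up here:

```shell
#!/bin/sh
# Classify a pool's use% into the levels above (Healthy / Watch / WARN / CRIT).
# Takes an integer percentage, e.g. parsed from `pvesm status` output.
storage_level() {
    use=$1
    if   [ "$use" -ge 95 ]; then echo "CRIT"
    elif [ "$use" -ge 85 ]; then echo "WARN"
    elif [ "$use" -ge 75 ]; then echo "Watch"
    else                         echo "Healthy"
    fi
}
```

For example, `storage_level 88` prints `WARN`, matching the table's 85–94% band.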


## 2. Observed fill behavior (from history)

| Host    | Storage       | Trend (recent) | Implied rate / note |
| ------- | ------------- | -------------- | ------------------- |
| ml110   | data          | ~28.7% → ~25% (Feb 15 → 27) | Slight decrease (prune/dedup). Plenty of free space. |
| r630-01 | data          | 88% → 100% → 72% → 65.8% | After Phase 1 migration (8 CTs, `data` → thin1). Main growth host (validators, RPCs, many CTs). |
| r630-02 | thin1-r630-02 | ~26.5%, stable | Low growth. |
| r630-02 | thin2         | ~4.8% → ~9% after the 5000 migration | Now holds Blockscout (5000); monitor. |
| r630-02 | thin5         | 84.6% → 0% after migration | Empty; available for future moves. |

Conclusion: The pool that fills fastest and needs the most attention is r630-01 `data` (65.8% after the Phase 1 migration, and it has already hit 100% once; many CTs, Besu/DB growth). ml110 `data` is stable and has plenty of headroom after the sdb extension. r630-02 is manageable as long as you avoid concentrating more large CTs on a single thin pool.


## 3. Recommendations by host and pool

### ml110

- **data / local-lvm** (~11% after the sdb extension)
  - Rate: Low/slow.
  - Recommendations:
    - Keep running `collect-storage-growth-data.sh --append` (e.g. cron every 6h).
    - Prune logs in CTs periodically (e.g. with `fix-storage-r630-01-and-thin5.sh`-style logic for ml110 or a dedicated prune script).
    - No urgency; review again when approaching 70%.
- **sdb (931G)**: done 2026-02-28. Option A below was executed: sdb added to VG `pve`, `data` thin pool extended to ~1.7 TB. Kept for reference:
  - Option A: Add sdb to VG `pve` and extend the `data` thin pool (or create a second thin pool). Frees pressure on sda and doubles effective `data` capacity.
  - Option B: Create a separate VG + thin pool on sdb for new or migrated CTs.
  - Document the chosen layout and any new Proxmox storage names in `storage.cfg` and in PHYSICAL_DRIVES_AND_CONFIG.md.

### r630-01

- **data / local-lvm** (~66% after Phase 1; was ~72%)
  - Rate: Highest risk; this pool has the most CTs and Besu/DB growth.
  - Recommendations:
    1. Short term:
       - Run log/journal prune on all r630-01 CTs regularly (e.g. `fix-storage-r630-01-and-thin5.sh` Phase 2, or a cron job).
       - Keep storage growth collection running (e.g. every 6h) and review weekly while > 70%.
    2. Before 85%:
       - Move one or more large CTs to thin1 on r630-01 (~51% used after Phase 1, still has space) if VMIDs allow, or plan migration to r630-02 thin pools.
       - Identify the biggest CTs with `check-disk-all-vmids.sh` and `lvs` on r630-01 (`data` pool).
    3. Before 90%:
       - Decide on expansion (e.g. add disks to RAID10 and extend md0/LVM) or permanent migration of several CTs to r630-02.
  - Do not let this pool sit above 85% for long; it has already hit 100% once.
- **thin1** (~51% after Phase 1)
  - Rate: Moderate.
  - Recommendations: Use as spillover for `data` pool migrations when possible. Monitor monthly; act if > 75%.
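Identifying the biggest consumers on the `data` pool can be done by sorting `lvs` output by thin-pool data%. A sketch, assuming the standard `lvs` reporting fields; the `top_cts` helper name is made up here:

```shell
#!/bin/sh
# Sort LV rows by thin-pool data% (field 2, descending) and show the top N.
# Intended to be fed lvs output, e.g.:
#   lvs --noheadings -o lv_name,data_percent --select 'pool_lv=data' | top_cts 5
top_cts() {
    n=${1:-10}
    sort -k2 -rn | head -n "$n"
}
```

The resulting list maps directly onto VMIDs (Proxmox names CT volumes `vm-<VMID>-disk-<N>`), so it pairs well with `check-disk-all-vmids.sh` for per-CT detail.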

### r630-02

- **thin1-r630-02** (~26%)
  - Rate: Low.
  - Recommendation: Monitor; no change needed unless you add many CTs here.
- **thin2** (~9% after the 5000 migration)
  - Rate: May grow with Blockscout (5000) and other CTs.
  - Recommendations:
    - Run the VMID 5000 prune periodically: `vmid5000-free-disk-and-logs.sh`.
    - If thin2 approaches 75%, consider moving one CT to thin5 (now empty) or thin6.
- **thin3, thin4, thin6** (roughly 11–22%)
  - Rate: Low to moderate.
  - Recommendation: Include in the weekly `pvesm`/`lvs` review; no special action unless one pool trends > 75%.
- **thin5** (0% after migration)
  - Recommendation: Keep as reserve for migrations from thin2 or other pools when they approach WARN.

## 4. Operational schedule (by fill rate)

| When | Action |
| ---- | ------ |
| Always | Cron: `collect-storage-growth-data.sh --append` every 6h; weekly: `prune-storage-snapshots.sh` (e.g. Sun 08:00). |
| Weekly | Review `pvesm status` and `lvs` (or run `audit-proxmox-rpc-storage.sh`); check any pool > 70%. |
| 75% ≤ use < 85% | Plan and run a prune; plan migration for the largest CTs on that pool; ml110 `data` (now extended with sdb) is a candidate target. |
| 85% ≤ use < 95% | Execute prune and migration within 1–2 weeks; do not add new large VMs/CTs to that pool. |
| ≥ 95% | Immediate prune + migration; consider emergency migration to ml110 (headroom available after the sdb extension) or r630-02. |
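The "Always" row could look like this in root's crontab. The absolute script paths are assumptions; `schedule-storage-growth-cron.sh --install` presumably writes something equivalent:

```cron
# Collect storage fill data every 6 hours (for growth-rate tracking)
0 */6 * * * /root/scripts/monitoring/collect-storage-growth-data.sh --append
# Weekly snapshot prune, Sunday 08:00
0 8 * * 0 /root/scripts/maintenance/prune-storage-snapshots.sh
```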

## 5. Scripts to support these recommendations

| Script | Purpose |
| ------ | ------- |
| `scripts/monitoring/collect-storage-growth-data.sh --append` | Record fill over time (for rate). |
| `scripts/maintenance/schedule-storage-growth-cron.sh --install` | Install the 6h collect + weekly prune cron jobs. |
| `scripts/audit-proxmox-rpc-storage.sh` | Current `pvesm` + RPC rootfs mapping. |
| `scripts/maintenance/check-disk-all-vmids.sh` | Per-CT disk usage (find big consumers). |
| `scripts/maintenance/fix-storage-r630-01-and-thin5.sh` | Prune 5000 + r630-01 CT logs; optional migrate of 5000. |
| `scripts/maintenance/migrate-ct-r630-01-data-to-thin1.sh <VMID>` | Migrate one CT from r630-01 `data` → thin1 (same host). |
| `scripts/maintenance/vmid5000-free-disk-and-logs.sh` | Prune Blockscout (5000) only. |
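A same-host `data` → thin1 move like the one `migrate-ct-r630-01-data-to-thin1.sh` performs can be sketched with stock `pct` commands. This is a hedged sketch, not the script's actual contents; the `thin1` storage name is from this doc, and the `DRY_RUN` guard and helper names are made up here:

```shell
#!/bin/sh
# Sketch: move one CT's rootfs from the data pool to thin1 on the same host.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

migrate_ct_to_thin1() {
    vmid=$1
    target=${2:-thin1}
    run pct shutdown "$vmid"
    # --delete removes the old volume on data once the copy succeeds
    run pct move-volume "$vmid" rootfs "$target" --delete
    run pct start "$vmid"
}
```

For example, `DRY_RUN=1 migrate_ct_to_thin1 10233` prints the three `pct` commands without touching the CT.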

## 6. Adding ml110 sdb to increase capacity (completed 2026-02-28; kept for reference)

  1. On ml110: `vgextend pve /dev/sdb` (if sdb is already a PV) or `pvcreate /dev/sdb && vgextend pve /dev/sdb`.
  2. Extend the `data` thin pool: `lvextend -L +900G /dev/pve/data` (or use `lvextend -l +100%FREE` and adjust as needed).
  3. Re-run `pvesm status` and update documentation.
  4. No CT migration required; existing LVs on `data` can use the new space.

(If sdb is a raw disk with no PV, partition it or use the full disk as a PV per your policy; then add it to `pve` and extend the `data` thin pool as above.)
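The steps above can be consolidated into one script. A sketch only (this work has already been done on ml110 per the note at the top); the `DRY_RUN` guard and the helper name are made up here, and the `+900G` figure comes straight from step 2:

```shell
#!/bin/sh
# Sketch of section 6: add a disk as a PV, grow VG pve, extend the data thin pool.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

extend_data_with_disk() {
    disk=${1:-/dev/sdb}
    # Step 1: make the disk a PV (skip pvcreate if it already is one) and add it to VG pve
    run pvcreate "$disk"
    run vgextend pve "$disk"
    # Step 2: grow the data thin pool (or use `-l +100%FREE` to take all new space)
    run lvextend -L +900G /dev/pve/data
    # Step 3: verify; then update storage.cfg docs and PHYSICAL_DRIVES_AND_CONFIG.md by hand
    run pvesm status
}
```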


## 7. Summary table by risk

| Host    | Pool                | Current (approx) | Risk   | Priority recommendation |
| ------- | ------------------- | ---------------- | ------ | ----------------------- |
| ml110   | data                | ~11% (post-extension) | Low | Done: sdb added; pool ~1.7 TB. Monitor as before. |
| ml110   | sdb                 | In use (extends `data`) | Low | Done: sdb added to VG `pve`; `data` thin pool extended (~1.7 TB total). |
| r630-01 | data                | ~66% (post-Phase 1) | High | Prune weekly; plan migrations before 85%; consider thin1 spillover. |
| r630-01 | thin1               | ~51% (post-Phase 1) | Medium | Use for migrations from `data`; monitor monthly. |
| r630-02 | thin1-r630-02       | ~26%             | Low    | Monitor. |
| r630-02 | thin2               | ~9%              | Low    | Prune 5000 periodically; watch growth. |
| r630-02 | thin5               | 0%               | Low    | Keep as reserve for migrations. |
| r630-02 | thin3, thin4, thin6 | ~11–22%          | Low    | Include in weekly review. |

These recommendations are based on the fill rates observed in `history.csv` and the current configuration; adjust the thresholds or schedule if your growth pattern changes.