ProxmoxVE/misc/error_handler.func
#!/usr/bin/env bash
# ------------------------------------------------------------------------------
# ERROR HANDLER - ERROR & SIGNAL MANAGEMENT
# ------------------------------------------------------------------------------
# Copyright (c) 2021-2026 community-scripts ORG
# Author: MickLesk (CanbiZ)
# License: MIT | https://github.com/community-scripts/ProxmoxVE/raw/main/LICENSE
# ------------------------------------------------------------------------------
#
# Provides comprehensive error handling and signal management for all scripts.
# Includes:
# - Exit code explanations (shell, curl, package managers, databases, custom codes)
# - Error handler with detailed logging
# - Signal handlers (EXIT, INT, TERM)
# - Initialization function (catch_errors) that sets up the traps
#
# Usage:
# source <(curl -fsSL .../error_handler.func)
# catch_errors
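#
# Example (illustrative sketch; the URL and output path are placeholders):
# bash does not run the ERR trap for a command that fails on the left-hand
# side of a `||` list, so its exit code can be captured and described
# manually with explain_exit_code:
#     curl -fsSL "https://example.invalid/file" -o /tmp/file || {
#       rc=$?
#       echo "download failed (exit ${rc}): $(explain_exit_code "$rc")"
#     }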
#
# ------------------------------------------------------------------------------
# ==============================================================================
# SECTION 1: EXIT CODE EXPLANATIONS
# ==============================================================================
# ------------------------------------------------------------------------------
# explain_exit_code()
#
# - Canonical version is defined in api.func (sourced before this file)
# - This section only provides a fallback if api.func was not loaded
# - See api.func SECTION 1 for the authoritative exit code mappings
# ------------------------------------------------------------------------------
if ! declare -f explain_exit_code &>/dev/null; then
  explain_exit_code() {
    local code="$1"
    case "$code" in
    1) echo "General error / Operation not permitted" ;;
    2) echo "Misuse of shell builtins (e.g. syntax error)" ;;
    3) echo "General syntax or argument error" ;;
    10) echo "Docker / privileged mode required (unsupported environment)" ;;
    4) echo "curl: Feature not supported or protocol error" ;;
    5) echo "curl: Could not resolve proxy" ;;
    6) echo "curl: DNS resolution failed (could not resolve host)" ;;
    7) echo "curl: Failed to connect (network unreachable / host down)" ;;
    8) echo "curl: Server reply error (FTP/SFTP or apk untrusted key)" ;;
    16) echo "curl: HTTP/2 framing layer error" ;;
    18) echo "curl: Partial file (transfer not completed)" ;;
    22) echo "curl: HTTP error returned (404, 429, 500+)" ;;
    23) echo "curl: Write error (disk full or permissions)" ;;
code 8 description (FTP + apk untrusted key) - Update header comment with full supported ranges build.func: - Add SIGHUP trap: reports 'failed/129' to API when terminal is closed, should significantly reduce the 2841 stuck 'installing' records - Add exit 52 (empty reply) and 57 (poll error) to network issue detection for DNS override recovery option - Add exit 125/126 hint: suggests privileged mode for permission errors * fix: sync error_handler fallback, Alpine APK repair, retry limit error_handler.func: - Sync fallback explain_exit_code() with api.func: add 25+ codes that were missing (curl 16/18/24/26/27/32-34/36/39/44-48/51/52/55/57/59/61/63/79/92/95, signals 125/129/131/132/144/146, npm 239, code 3/8) - Ensures consistent error descriptions even when api.func isn't loaded build.func: - Alpine APK repair: detect var_os=alpine and run 'apk fix && apk cache clean && apk update' instead of apt-get/dpkg commands - Show 'Repair APK state' instead of 'APT/DPKG' in menu for Alpine - Retry safety counter: OOM x2 retry limited to max 2 attempts (prevents infinite RAM doubling via RECOVERY_ATTEMPT env var) - Show attempt count in rebuild summary * fix(build): preserve exit code in ERR trap to prevent false exit_code=0 The ERR trap called ensure_log_on_host before post_update_to_api, which reset $? to 0 (success). This caused ~15-20 records/day to be reported as 'failed' with exit_code=0 instead of the actual error code. Root cause chain: 1. Command fails with exit code N → ERR trap fires ($? = N) 2. ensure_log_on_host succeeds → $? becomes 0 3. post_update_to_api 'failed' '$?' → sends 'failed/0' (wrong!) 4. POST_UPDATE_DONE=true → EXIT trap skips the correct code Fix: capture $? into _ERR_CODE before ensure_log_on_host runs. * Implement telemetry settings and repo source detection Add telemetry configuration and repository source detection function.
2026-02-17 12:14:46 +01:00
24) echo "curl: Write to local file failed" ;;
25) echo "curl: Upload failed" ;;
26) echo "curl: Read error on local file (I/O)" ;;
27) echo "curl: Out of memory (memory allocation failed)" ;;
28) echo "curl: Operation timeout (network slow or server not responding)" ;;
30) echo "curl: FTP PORT command failed" ;;
32) echo "curl: FTP SIZE command failed" ;;
33) echo "curl: HTTP range error" ;;
34) echo "curl: HTTP post error" ;;
35) echo "curl: SSL/TLS handshake failed (certificate error)" ;;
36) echo "curl: FTP bad download resume" ;;
39) echo "curl: LDAP search failed" ;;
44) echo "curl: Internal error (bad function call order)" ;;
45) echo "curl: Interface error (failed to bind to specified interface)" ;;
46) echo "curl: Bad password entered" ;;
47) echo "curl: Too many redirects" ;;
48) echo "curl: Unknown command line option specified" ;;
51) echo "curl: SSL peer certificate or SSH host key verification failed" ;;
52) echo "curl: Empty reply from server (got nothing)" ;;
55) echo "curl: Failed sending network data" ;;
56) echo "curl: Receive error (connection reset by peer)" ;;
57) echo "curl: Obsolete code 57 (unused in current curl; poll/select failure is 99)" ;;
59) echo "curl: Couldn't use specified SSL cipher" ;;
61) echo "curl: Bad/unrecognized transfer encoding" ;;
63) echo "curl: Maximum file size exceeded" ;;
75) echo "Temporary failure (retry later)" ;;
78) echo "curl: Remote file not found (404 on FTP/file)" ;;
79) echo "curl: SSH session error (key exchange/auth failed)" ;;
92) echo "curl: HTTP/2 stream error (protocol violation)" ;;
95) echo "curl: HTTP/3 layer error" ;;
# 64-78: BSD sysexits.h conventions (EX_USAGE=64 .. EX_NOPERM=77)
64) echo "Usage error (wrong arguments)" ;;
65) echo "Data format error (bad input data)" ;;
66) echo "Input file not found (cannot open input)" ;;
67) echo "User not found (addressee unknown)" ;;
68) echo "Host not found (hostname unknown)" ;;
69) echo "Service unavailable" ;;
70) echo "Internal software error" ;;
71) echo "System error (OS-level failure)" ;;
72) echo "Critical OS file missing" ;;
73) echo "Cannot create output file" ;;
74) echo "I/O error" ;;
76) echo "Remote protocol error" ;;
77) echo "Permission denied" ;;
100) echo "APT: Package manager error (broken packages / dependency problems)" ;;
101) echo "APT: Configuration error (bad sources.list, malformed config)" ;;
102) echo "APT: Lock held by another process (dpkg/apt still running)" ;;
124) echo "Command timed out (timeout command)" ;;
125) echo "Command failed to start (Docker daemon or execution error)" ;;
126) echo "Command invoked cannot execute (permission problem?)" ;;
127) echo "Command not found" ;;
128) echo "Invalid argument to exit" ;;
# 129+: shell reports 128+N when a process is terminated by signal N
129) echo "Killed by SIGHUP (terminal closed / hangup)" ;;
130) echo "Aborted by user (SIGINT)" ;;
131) echo "Killed by SIGQUIT (core dumped)" ;;
132) echo "Killed by SIGILL (illegal CPU instruction)" ;;
134) echo "Process aborted (SIGABRT - possibly Node.js heap overflow)" ;;
137) echo "Killed (SIGKILL / Out of memory?)" ;;
139) echo "Segmentation fault (core dumped)" ;;
141) echo "Broken pipe (SIGPIPE - output closed prematurely)" ;;
143) echo "Terminated (SIGTERM)" ;;
144) echo "Killed by signal 16 (SIGSTKFLT on Linux x86/ARM; SIGUSR1 on MIPS)" ;;
146) echo "Terminated by signal 18 (SIGCONT on Linux x86/ARM; SIGTSTP on BSD/macOS)" ;;
150) echo "Systemd: Service failed to start" ;;
151) echo "Systemd: Service unit not found" ;;
152) echo "Permission denied (EACCES)" ;;
153) echo "Build/compile failed (make/gcc/cmake)" ;;
154) echo "Node.js: Native addon build failed (node-gyp)" ;;
160) echo "Python: Virtualenv / uv environment missing or broken" ;;
161) echo "Python: Dependency resolution failed" ;;
162) echo "Python: Installation aborted (permissions or EXTERNALLY-MANAGED)" ;;
170) echo "PostgreSQL: Connection failed (server not running / wrong socket)" ;;
171) echo "PostgreSQL: Authentication failed (bad user/password)" ;;
172) echo "PostgreSQL: Database does not exist" ;;
173) echo "PostgreSQL: Fatal error in query / syntax" ;;
180) echo "MySQL/MariaDB: Connection failed (server not running / wrong socket)" ;;
181) echo "MySQL/MariaDB: Authentication failed (bad user/password)" ;;
182) echo "MySQL/MariaDB: Database does not exist" ;;
183) echo "MySQL/MariaDB: Fatal error in query / syntax" ;;
190) echo "MongoDB: Connection failed (server not running)" ;;
191) echo "MongoDB: Authentication failed (bad user/password)" ;;
192) echo "MongoDB: Database not found" ;;
193) echo "MongoDB: Fatal query error" ;;
200) echo "Proxmox: Failed to create lock file" ;;
203) echo "Proxmox: Missing CTID variable" ;;
204) echo "Proxmox: Missing PCT_OSTYPE variable" ;;
205) echo "Proxmox: Invalid CTID (<100)" ;;
206) echo "Proxmox: CTID already in use" ;;
207) echo "Proxmox: Password contains unescaped special characters" ;;
208) echo "Proxmox: Invalid configuration (DNS/MAC/Network format)" ;;
209) echo "Proxmox: Container creation failed" ;;
210) echo "Proxmox: Cluster not quorate" ;;
211) echo "Proxmox: Timeout waiting for template lock" ;;
212) echo "Proxmox: Storage type 'iscsidirect' does not support containers (VMs only)" ;;
213) echo "Proxmox: Storage type does not support 'rootdir' content" ;;
214) echo "Proxmox: Not enough storage space" ;;
215) echo "Proxmox: Container created but not listed (ghost state)" ;;
216) echo "Proxmox: RootFS entry missing in config" ;;
217) echo "Proxmox: Storage not accessible" ;;
218) echo "Proxmox: Template file corrupted or incomplete" ;;
219) echo "Proxmox: CephFS does not support containers - use RBD" ;;
220) echo "Proxmox: Unable to resolve template path" ;;
221) echo "Proxmox: Template file not readable" ;;
222) echo "Proxmox: Template download failed" ;;
223) echo "Proxmox: Template not available after download" ;;
224) echo "Proxmox: PBS storage is for backups only" ;;
225) echo "Proxmox: No template available for OS/Version" ;;
231) echo "Proxmox: LXC stack upgrade failed" ;;
239) echo "npm/Node.js: Unexpected runtime error or dependency failure" ;;
243) echo "Node.js: Out of memory (JavaScript heap out of memory)" ;;
245) echo "Node.js: Invalid command-line option" ;;
246) echo "Node.js: Internal JavaScript Parse Error" ;;
247) echo "Node.js: Fatal internal error" ;;
248) echo "Node.js: Invalid C++ addon / N-API failure" ;;
249) echo "npm/pnpm/yarn: Unknown fatal error" ;;
255) echo "DPKG: Fatal internal error" ;;
*) echo "Unknown error" ;;
esac
}
fi
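# Example (illustrative): looking up descriptions with the fallback above.
# If api.func was sourced first, its explain_exit_code() is used instead and
# its wording may differ from these strings.
#   explain_exit_code 243   # prints "Node.js: Out of memory (JavaScript heap out of memory)"
#   explain_exit_code 231   # prints "Proxmox: LXC stack upgrade failed"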
# ==============================================================================
# SECTION 2: ERROR HANDLERS
# ==============================================================================
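# ------------------------------------------------------------------------------
# error_handler()
#
# - Main error handler triggered by the ERR trap
# - Arguments (both optional): exit_code (defaults to $?), command (defaults
# to $BASH_COMMAND); the line number is read from BASH_LINENO[0]
# - Behavior:
# * Returns silently if exit_code is 0 (success)
# * Stops any running spinner and restores the cursor before printing
# * Calls explain_exit_code() for a human-readable error description
# * Reports the failure to the telemetry API (post_update_to_api, or
# _send_abort_telemetry when api.func is not loaded)
# * Displays an error message with:
# - Line number where the error occurred
# - Exit code with explanation
# - Command that failed
# * Appends an error record to DEBUG_LOGFILE if it is set
# * Shows the last 20 lines of the active log (get_active_logfile,
# SILENT_LOGFILE, or BUILD_LOG as a host-side fallback) if available
# * Copies log to container /root for later inspection
# * Exits with the original exit code
# ------------------------------------------------------------------------------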
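# Example wiring (a minimal sketch; in this codebase the trap is installed by
# catch_errors(), whose exact form may differ):
#   set -E                    # errtrace: functions and subshells inherit the ERR trap
#   trap 'error_handler' ERR  # no arguments: $? and $BASH_COMMAND are read in the handler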
error_handler() {
local exit_code=${1:-$?}
local command=${2:-${BASH_COMMAND:-unknown}}
local line_number=${BASH_LINENO[0]:-unknown}
# Strip the literal "$STD" verbosity wrapper from the reported command
command="${command//\$STD/}"
if [[ "$exit_code" -eq 0 ]]; then
return 0
fi
# Stop the spinner and restore the cursor first, before any output,
# so spinner text cannot overlap the error message
if declare -f stop_spinner >/dev/null 2>&1; then
stop_spinner 2>/dev/null || true
fi
printf "\e[?25h"
local explanation
explanation="$(explain_exit_code "$exit_code")"
# Report the failure to the API immediately, without waiting for container checks,
# so failures that occur before the container exists (or after it is gone) are captured
if declare -f post_update_to_api &>/dev/null; then
post_update_to_api "failed" "$exit_code" 2>/dev/null || true
else
# Container context: post_update_to_api not available (api.func not sourced)
# Send status directly via curl so container failures are never lost
_send_abort_telemetry "$exit_code" 2>/dev/null || true
fi
# Use msg_error if available, fall back to echo
if declare -f msg_error >/dev/null 2>&1; then
msg_error "in line ${line_number}: exit code ${exit_code} (${explanation}): while executing command ${command}"
else
echo -e "\n${RD}[ERROR]${CL} in line ${RD}${line_number}${CL}: exit code ${RD}${exit_code}${CL} (${explanation}): while executing command ${YWB}${command}${CL}\n"
fi
if [[ -n "${DEBUG_LOGFILE:-}" ]]; then
{
echo "------ ERROR ------"
echo "Timestamp : $(date '+%Y-%m-%d %H:%M:%S')"
echo "Exit Code : $exit_code ($explanation)"
echo "Line : $line_number"
echo "Command : $command"
echo "-------------------"
} >>"$DEBUG_LOGFILE"
fi
# Get active log file (BUILD_LOG or INSTALL_LOG)
local active_log=""
if declare -f get_active_logfile >/dev/null 2>&1; then
active_log="$(get_active_logfile)"
elif [[ -n "${SILENT_LOGFILE:-}" ]]; then
active_log="$SILENT_LOGFILE"
fi
# If active_log points to a container-internal path that doesn't exist on host,
# fall back to BUILD_LOG (host-side log)
if [[ -n "$active_log" && ! -s "$active_log" && -n "${BUILD_LOG:-}" && -s "${BUILD_LOG}" ]]; then
active_log="$BUILD_LOG"
fi
# Show last log lines if available
if [[ -n "$active_log" && -s "$active_log" ]]; then
echo -e "\n${TAB}--- Last 20 lines of log ---"
tail -n 20 "$active_log"
echo -e "${TAB}-----------------------------------\n"
fi
# Detect context: Container (INSTALL_LOG set + inside container /root) vs Host
if [[ -n "${INSTALL_LOG:-}" && -f "${INSTALL_LOG:-}" && -d /root ]]; then
# CONTAINER CONTEXT: Copy log and create flag file for host
local container_log="/root/.install-${SESSION_ID:-error}.log"
cp "${INSTALL_LOG}" "$container_log" 2>/dev/null || true
# Create error flag file with exit code for host detection
echo "$exit_code" >"/root/.install-${SESSION_ID:-error}.failed" 2>/dev/null || true
# The host shows the combined log path, so the container path is not printed here
else
# HOST CONTEXT: Show local log path and offer container cleanup
if [[ -n "$active_log" && -s "$active_log" ]]; then
if declare -f msg_custom >/dev/null 2>&1; then
msg_custom "📋" "${YW}" "Full log: ${active_log}"
else
echo -e "${YW}Full log:${CL} ${BL}${active_log}${CL}"
fi
fi
# Offer to remove the container if it exists (i.e., the build failed after the container was created)
if [[ -n "${CTID:-}" ]] && command -v pct &>/dev/null && pct status "$CTID" &>/dev/null; then
echo ""
if declare -f msg_custom >/dev/null 2>&1; then
echo -en "${TAB}❓${TAB}${YW}Remove broken container ${CTID}? (Y/n) [auto-remove in 60s]: ${CL}"
else
echo -en "${YW}Remove broken container ${CTID}? (Y/n) [auto-remove in 60s]: ${CL}"
fi
# Reset terminal state and read directly from /dev/tty (fresh open = clean state)
stty sane 2>/dev/null || true
local response=""
if read -t 60 -r response </dev/tty 2>/dev/null; then
if [[ -z "$response" || "$response" =~ ^[Yy]$ ]]; then
echo ""
if declare -f msg_info >/dev/null 2>&1; then
msg_info "Removing container ${CTID}"
else
echo -e "${YW}Removing container ${CTID}${CL}"
fi
pct stop "$CTID" &>/dev/null || true
pct destroy "$CTID" &>/dev/null || true
if declare -f msg_ok >/dev/null 2>&1; then
msg_ok "Container ${CTID} removed"
else
echo -e "${GN}✔${CL} Container ${CTID} removed"
fi
elif [[ "$response" =~ ^[Nn]$ ]]; then
echo ""
if declare -f msg_warn >/dev/null 2>&1; then
msg_warn "Container ${CTID} kept for debugging"
else
echo -e "${YW}Container ${CTID} kept for debugging${CL}"
fi
fi
else
# Timeout, EOF, or no usable /dev/tty: auto-remove the container
echo ""
if declare -f msg_info >/dev/null 2>&1; then
msg_info "No response - removing container ${CTID}"
else
echo -e "${YW}No response - removing container ${CTID}${CL}"
fi
pct stop "$CTID" &>/dev/null || true
pct destroy "$CTID" &>/dev/null || true
if declare -f msg_ok >/dev/null 2>&1; then
msg_ok "Container ${CTID} removed"
else
echo -e "${GN}✔${CL} Container ${CTID} removed"
fi
fi
# Force one final status update attempt after cleanup
# This ensures status is updated even if the first attempt failed (e.g., HTTP 400)
if declare -f post_update_to_api &>/dev/null; then
post_update_to_api "failed" "$exit_code" "force"
fi
fi
fi
exit "$exit_code"
}
# ==============================================================================
# SECTION 3: TELEMETRY & CLEANUP HELPERS FOR SIGNAL HANDLERS
# ==============================================================================
# ------------------------------------------------------------------------------
# _send_abort_telemetry()
#
# - Sends failure/abort status to telemetry API
# - Works in BOTH host context (post_update_to_api available) and
# container context (only curl available, api.func not sourced)
# - Container context is critical: without this, container-side failures
# and signal exits are never reported, leaving records stuck in
# "installing" or "configuring" forever
# - Arguments: $1 = exit_code
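# - Resulting JSON body in container context (illustrative values only;
#   "error" and "duration" are appended only when they are available):
#     {"random_id":"<uuid>","execution_id":"<uuid>","type":"lxc",
#      "nsapp":"<app>","status":"failed","exit_code":130,
#      "error":"<log line 1>|<log line 2>","duration":42}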
# ------------------------------------------------------------------------------
_send_abort_telemetry() {
  local exit_code="${1:-1}"
  # Try full API function first (host context - api.func sourced)
  if declare -f post_update_to_api &>/dev/null; then
    post_update_to_api "failed" "$exit_code" 2>/dev/null || true
    return
  fi
  # Fallback: direct curl (container context - api.func NOT sourced)
  # This is the ONLY way containers can report failures to telemetry
  command -v curl &>/dev/null || return 0
  [[ "${DIAGNOSTICS:-no}" == "no" ]] && return 0
  [[ -z "${RANDOM_UUID:-}" ]] && return 0
  # Collect last 20 log lines for error diagnosis (best-effort):
  # strip ANSI codes, JSON-escape backslashes and quotes, drop CRs, map tabs
  # to spaces (raw tabs are invalid inside JSON strings) and newlines to '|',
  # then delete any remaining control characters
  local error_text=""
  if [[ -n "${INSTALL_LOG:-}" && -s "${INSTALL_LOG}" ]]; then
    error_text=$(tail -n 20 "$INSTALL_LOG" 2>/dev/null | sed 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\\/\\\\/g; s/"/\\"/g; s/\r//g' | tr '\t\n' ' |' | sed 's/|$//' | tr -d '\000-\010\013\014\016-\037\177') || true
  fi
  # Calculate duration if start time is available
  local duration=""
  if [[ -n "${DIAGNOSTICS_START_TIME:-}" ]]; then
    duration=$(($(date +%s) - DIAGNOSTICS_START_TIME))
  fi
  # Build JSON payload with error context
  local payload
  payload="{\"random_id\":\"${RANDOM_UUID}\",\"execution_id\":\"${EXECUTION_ID:-${RANDOM_UUID}}\",\"type\":\"${TELEMETRY_TYPE:-lxc}\",\"nsapp\":\"${NSAPP:-${app:-unknown}}\",\"status\":\"failed\",\"exit_code\":${exit_code}"
  [[ -n "$error_text" ]] && payload="${payload},\"error\":\"${error_text}\""
  [[ -n "$duration" ]] && payload="${payload},\"duration\":${duration}"
  payload="${payload}}"
  local api_url="${TELEMETRY_URL:-https://telemetry.community-scripts.org/telemetry}"
  # Up to 2 attempts: retry once, after a 1-second pause, if the first POST fails
  local attempt
  for attempt in 1 2; do
    if curl -fsS -m 5 -X POST "$api_url" \
      -H "Content-Type: application/json" \
      -d "$payload" &>/dev/null; then
      return 0
    fi
    [[ $attempt -eq 1 ]] && sleep 1
  done
  return 0
}
# ------------------------------------------------------------------------------
# _stop_container_if_installing()
#
# - Stops the LXC container if we're in the install phase
# - Prevents orphaned container processes when the host exits due to a signal
#   (SSH disconnect, Ctrl+C, SIGTERM). Without this, the container keeps
#   running and may send "configuring" status AFTER the host has already sent
#   "failed", leaving the record permanently stuck in "configuring"
# - Only acts when:
#   * CONTAINER_INSTALLING flag is set (during lxc-attach in build_container)
#   * CTID is set (container was created)
#   * pct command is available (we're on the Proxmox host, not inside a container)
# - Does NOT destroy the container; it only stops it so it remains available
#   for debugging
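# - Expected caller pattern (hypothetical sketch; the real call site is in
#   build_container and may differ in detail):
#     CONTAINER_INSTALLING=true
#     lxc-attach -n "$CTID" -- bash -c "<install script>"
#     CONTAINER_INSTALLING=false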
# ------------------------------------------------------------------------------
_stop_container_if_installing() {
  [[ "${CONTAINER_INSTALLING:-}" == "true" ]] || return 0
  [[ -n "${CTID:-}" ]] || return 0
  command -v pct &>/dev/null || return 0
  pct stop "$CTID" 2>/dev/null || true
}
# ==============================================================================
# SECTION 4: SIGNAL HANDLERS
# ==============================================================================
# ------------------------------------------------------------------------------
# on_exit()
#
# - EXIT trap handler; runs on EVERY script termination
# - Catches orphaned "installing"/"configuring" records: if post_to_api sent
#   "installing" but post_update_to_api never ran, it reports a final status
#   so the record is not left stuck forever
# - Best-effort log collection for failed installs
# - Stops orphaned container processes on failure
# - Cleans up lock files
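# - Telemetry decision, as implemented in the function body:
#     POST_TO_API_DONE  POST_UPDATE_DONE  exit code  action
#     true              not "true"        non-zero   _send_abort_telemetry
#     true              not "true"        0          post_update_to_api "done" "0"
#                                                    (only if the function exists)
#     anything else                                  nothing is sent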
# ------------------------------------------------------------------------------
on_exit() {
  local exit_code=$?
  # Report orphaned "installing" records to telemetry API.
  # Catches ALL exit paths: errors, signals, AND clean exits where
  # post_to_api was called but post_update_to_api was never called
  if [[ "${POST_TO_API_DONE:-}" == "true" && "${POST_UPDATE_DONE:-}" != "true" ]]; then
    if [[ $exit_code -ne 0 ]]; then
      _send_abort_telemetry "$exit_code"
    elif declare -f post_update_to_api >/dev/null 2>&1; then
      post_update_to_api "done" "0" 2>/dev/null || true
    fi
  fi
  # Best-effort log collection on failure (non-critical, telemetry already sent)
  if [[ $exit_code -ne 0 ]] && declare -f ensure_log_on_host >/dev/null 2>&1; then
    ensure_log_on_host 2>/dev/null || true
  fi
  # Stop orphaned container if we're in the install phase and exiting with error
  if [[ $exit_code -ne 0 ]]; then
    _stop_container_if_installing
  fi
  [[ -n "${lockfile:-}" && -e "$lockfile" ]] && rm -f "$lockfile"
  exit "$exit_code"
}
# ------------------------------------------------------------------------------
# on_interrupt()
#
# - SIGINT (Ctrl+C) trap handler
# - Stops the spinner and restores the cursor, then reports abort status
#   immediately (time-critical: the container may be dying)
# - Stops orphaned container to prevent "configuring" ghost records
# - Exits with code 130 (128 + SIGINT=2)
# ------------------------------------------------------------------------------
on_interrupt() {
# Stop spinner and restore cursor before any output
if declare -f stop_spinner >/dev/null 2>&1; then
stop_spinner 2>/dev/null || true
fi
printf "\e[?25h" 2>/dev/null || true
_send_abort_telemetry "130"
_stop_container_if_installing
if declare -f msg_error >/dev/null 2>&1; then
msg_error "Interrupted by user (SIGINT)" 2>/dev/null || true
else
echo -e "\n${RD}Interrupted by user (SIGINT)${CL}" 2>/dev/null || true
fi
exit 130
}
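# ------------------------------------------------------------------------------
# _signal_exit_code()  [hypothetical helper, illustrative sketch, not upstream]
#
# - Shows where the literal codes 130 (on_interrupt) and 143 (on_terminate)
#   come from: a process killed by signal N is conventionally reported by the
#   shell as exit status 128 + N, and these handlers exit with that value
# - Accepts a signal name (INT, TERM, HUP; bash also accepts SIGINT) or number
# - Assumes the bash builtin 'kill -l NAME', which prints the signal number
# - Returns 1 for an empty or unknown signal
# - Not wired into any handler; the handlers above and below keep their codes
#
# Usage: _signal_exit_code INT   # prints 130 on Linux (SIGINT = 2)
# ------------------------------------------------------------------------------
_signal_exit_code() {
  local sig="${1:-}" num
  case "$sig" in
    '')
      # No argument: nothing to translate
      return 1
      ;;
    *[!0-9]*)
      # Signal name: ask the kill builtin for its number (fails if unknown)
      num=$(kill -l "$sig" 2>/dev/null) || return 1
      ;;
    *)
      # Already a signal number
      num="$sig"
      ;;
  esac
  # Conventional "terminated by signal" exit status
  echo $((128 + num))
}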
# ------------------------------------------------------------------------------
# on_terminate()
#
# - SIGTERM trap handler
# - Stops the spinner and restores the cursor, then reports abort status
#   immediately (time-critical: the process is being killed)
# - Stops orphaned container to prevent "configuring" ghost records
# - Exits with code 143 (128 + SIGTERM=15)
# ------------------------------------------------------------------------------
on_terminate() {
# Stop spinner and restore cursor before any output
if declare -f stop_spinner >/dev/null 2>&1; then
stop_spinner 2>/dev/null || true
fi
printf "\e[?25h" 2>/dev/null || true
_send_abort_telemetry "143"
_stop_container_if_installing
if declare -f msg_error >/dev/null 2>&1; then
msg_error "Terminated by signal (SIGTERM)" 2>/dev/null || true
else
echo -e "\n${RD}Terminated by signal (SIGTERM)${CL}" 2>/dev/null || true
fi
exit 143
}
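# ------------------------------------------------------------------------------
# _example_signal_exit_status()  [illustrative sketch, not used by this library]
#
# - Hypothetical helper, added only to illustrate the exit-code convention the
#   signal handlers follow: a process ended by signal N reports 128 + N
#   (SIGTERM=15 → 143, SIGHUP=1 → 129, SIGINT=2 → 130)
# - Assumes bash, where `kill -l NAME` prints the number of signal NAME
# - Usage: _example_signal_exit_status TERM   # prints 143
# ------------------------------------------------------------------------------
_example_signal_exit_status() {
  local sig="$1" num
  # Resolve the signal name (e.g. TERM) to its number; fail on unknown names
  num=$(kill -l "$sig" 2>/dev/null) || return 1
  echo $((128 + num))
}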
# ------------------------------------------------------------------------------
# on_hangup()
#
# - SIGHUP trap handler (SSH disconnect, terminal closed)
# - CRITICAL: this trap was previously missing from catch_errors(), so an
#   SSH disconnect left container processes orphaned. That was the most
#   common cause of records stuck in the "installing" and "configuring" states
# - Reports status via direct curl (prints nothing; the terminal is already closed)
# - Stops the orphaned container to prevent ghost records
# - Exits with code 129 (128 + SIGHUP=1)
# ------------------------------------------------------------------------------
on_hangup() {
# Stop spinner (no cursor restore needed — terminal is already gone)
if declare -f stop_spinner >/dev/null 2>&1; then
stop_spinner 2>/dev/null || true
fi
_send_abort_telemetry "129"
_stop_container_if_installing
exit 129
}
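# ------------------------------------------------------------------------------
# _example_hup_trap_status()  [illustrative sketch, not used by this library]
#
# - Hypothetical helper showing the trap pattern on_hangup() relies on: a child
#   bash installs a HUP handler and sends SIGHUP to itself; the handler's
#   explicit `exit 129` becomes the status the parent observes
# - Runs in a separate `bash -c` process, so the caller's own traps are untouched
# - Usage: _example_hup_trap_status   # prints 129
# ------------------------------------------------------------------------------
_example_hup_trap_status() {
  local status=0
  # bash runs the pending HUP trap once the kill builtin returns,
  # so the trailing `exit 0` is never reached
  bash -c 'trap "exit 129" HUP; kill -HUP $$; exit 0' || status=$?
  echo "$status"
}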
# ==============================================================================
# SECTION 5: INITIALIZATION
# ==============================================================================
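# ------------------------------------------------------------------------------
# _example_errtrace_status()  [illustrative sketch, not used by this library]
#
# - Hypothetical helper illustrating why catch_errors() enables `set -E`:
#   without errtrace, bash does not run the ERR trap for a failure inside a
#   function, and errexit ends the shell with the failing command's status;
#   with errtrace, the ERR trap fires and its own exit code is reported
# - Takes the `set` flags to try (e.g. "-Ee" or "-e") and runs in a child bash
# - Usage: _example_errtrace_status -Ee   # prints 3 (the ERR trap's exit code)
#          _example_errtrace_status -e    # prints 1 (false's status, via errexit)
# ------------------------------------------------------------------------------
_example_errtrace_status() {
  local opts="$1" status=0
  # Child script: install an ERR trap, then fail inside a shell function
  local script='trap "exit 3" ERR; fail_in_function() { false; }; fail_in_function; exit 0'
  # Apply the requested flags first, then run the script
  bash -c "set ${opts}; ${script}" || status=$?
  echo "$status"
}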
# ------------------------------------------------------------------------------
# catch_errors()
#
# - Initializes error handling and signal traps
# - Enables strict error handling:
#   * set -Ee: Exit on error; the ERR trap is inherited by functions,
#     command substitutions and subshells
# * set -o pipefail: Pipeline fails if any command fails
# * set -u: (optional) Exit on undefined variable (if STRICT_UNSET=1)
# - Sets up traps:
core: Enhance signal handling, reported "status" and logs (#12216) * Enhance telemetry, signal handling, and logs Improve failure telemetry and signal handling across the installer: add get_full_log() to collect/strip/truncate install logs and include them in API payloads with a truncated retry; add CONTAINER_INSTALLING flag around lxc-attach and stop containers on abort to avoid orphaned "installing/configuring" records; introduce _send_abort_telemetry() (curl fallback for container context) and _stop_container_if_installing() helpers; centralize and simplify EXIT/ERR/INT/TERM/HUP traps and handlers (including a new on_hangup handler) and update VM scripts to report numeric exit codes. Also ensure best-effort log collection is performed and tweak error categorization for certain signals. * Include full log in error telemetry Use get_full_log (up to 120KB) to populate the error telemetry field so the API receives the full installation trace; fall back to get_error_text (last ~20 lines) if the full log is empty. Removed collection and inclusion of a separate install_log field from the JSON payloads and simplified the retry payloads/comments accordingly. The change ensures error reports contain the complete trace while avoiding duplicate large log fields and keeps graceful failure handling (get_full_log || true). * Anonymize IP addresses in get_full_log Mask IPv4 addresses in logs when collecting full log output: added a sed step that replaces the last two octets with "x.x" to avoid exposing full IPs (GDPR). Also updated the comment to reflect anonymization; existing steps that strip carriage returns and ANSI escape sequences remain in place before truncating with head -c.
2026-02-23 14:30:48 +01:00
# * ERR → error_handler (script errors)
# * EXIT → on_exit (any termination: cleanup + orphan detection)
# * INT → on_interrupt (Ctrl+C)
# * TERM → on_terminate (kill / systemd stop)
# * HUP → on_hangup (SSH disconnect / terminal closed)
# - Call this function early in every script
# ------------------------------------------------------------------------------
catch_errors() {
set -Ee -o pipefail
if [ "${STRICT_UNSET:-0}" = "1" ]; then
set -u
fi
trap 'error_handler' ERR
trap on_exit EXIT
trap on_interrupt INT
trap on_terminate TERM
trap on_hangup HUP
}
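This hedged sketch shows how the trap table in the catch_errors() header plays out at runtime. The handler names mirror the ones catch_errors() registers (`error_handler`, `on_terminate`, `on_exit`). Their bodies here are hypothetical stubs that only print which trap ran and the value of `$?`; the real handlers defined elsewhere in this file are not reproduced. Each scenario runs in a child bash, so the calling shell's traps stay untouched. INT and HUP are omitted to keep the sketch deterministic. A shell started in the background may inherit SIGINT as ignored, and bash cannot trap a signal that was ignored on entry.

```bash
#!/usr/bin/env bash
# Sketch of the ERR/TERM/EXIT dispatch set up by catch_errors().
# Handler bodies are hypothetical stand-ins that report the trap and $?.

# Scenario 1: a failing command under set -Ee runs the ERR trap first;
# -e then terminates the shell, which runs the EXIT trap with the same status.
err_then_exit_demo() {
  "${BASH:-bash}" -c '
    set -Ee -o pipefail
    error_handler() { echo "ERR handler (status $?)"; }
    on_exit()       { echo "EXIT handler (status $?)"; }
    trap error_handler ERR
    trap on_exit EXIT
    false
    echo "not reached"
  '
}

# Scenario 2: SIGTERM (e.g. kill or systemd stop) runs the TERM handler.
# Its explicit exit still triggers the EXIT trap, so cleanup runs once more.
term_then_exit_demo() {
  "${BASH:-bash}" -c '
    on_terminate() { echo "TERM handler"; exit 143; }   # 128 + 15 (SIGTERM)
    on_exit()      { echo "EXIT handler (status $?)"; }
    trap on_terminate TERM
    trap on_exit EXIT
    kill -TERM $$    # $$ expands in the child shell, so it signals itself
    echo "not reached"
  '
}
```

Under bash, `err_then_exit_demo` prints `ERR handler (status 1)` followed by `EXIT handler (status 1)`. `term_then_exit_demo` prints `TERM handler` followed by `EXIT handler (status 143)`. In both paths EXIT fires last, which is consistent with catch_errors() placing cleanup and orphan detection in on_exit rather than in each individual handler.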