ML110 Deployment Log Analysis

Date: 2025-12-20
Deployment Attempt: Complete Validated Deployment (Option 1)

Summary

The deployment attempt hit network configuration errors during container creation. None of the containers were actually created, even though the logs contain success messages.

Key Findings

1. Network Configuration Errors

All container creation attempts failed with:

400 Parameter verification failed.
net0: invalid format - format error
net0.ip: invalid format - value does not look like a valid ipv4 network configuration

Affected Containers:

  • Validators: 1000, 1001, 1002, 1003, 1004
  • Sentries: 1500, 1501, 1502, 1503
  • RPC Nodes: 2500, 2501, 2502

2. Script Logic Issue

The deployment script reports "Container created" even when pct create fails. This is misleading because:

  • The pct create command fails with a 400 "Parameter verification failed" error
  • Containers were never actually created (no config files exist)
  • The script continues execution as if containers exist
  • All subsequent steps fail because containers don't exist
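
The observation that no config files exist can be turned into an explicit post-creation check. The following is a minimal sketch, not part of the existing script: the function name container_exists and the PVE_LXC_DIR override are hypothetical, and it assumes the standard Proxmox VE location for LXC configs, /etc/pve/lxc/<vmid>.conf.

```shell
# container_exists VMID
# Succeeds only if a config file for VMID is present.
# PVE_LXC_DIR is a hypothetical override (useful for testing);
# it defaults to the standard Proxmox VE location for LXC configs.
container_exists() {
    [ -n "$1" ] && [ -f "${PVE_LXC_DIR:-/etc/pve/lxc}/$1.conf" ]
}
```

Called right after pct create, a check like this would stop the script from continuing with containers that were never created, regardless of what the creation step logged.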

3. Network Format Validation

Test Result: The static-IP network configuration format used by the script is correct:

bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth

This format successfully created test container 99999.

4. Container History

System logs show that the containers were created on Dec 19 and deleted on Dec 20:

  • Validators 1000-1004: Created Dec 19, deleted Dec 20 06:21-06:22
  • Sentries 1500-1503: Created Dec 19, deleted Dec 20 06:22-06:23
  • RPC nodes 2500-2502: Created Dec 19, deleted Dec 20 06:21

5. Deployment Script Issues

Location: /opt/smom-dbis-138-proxmox/scripts/deployment/deploy-besu-nodes.sh

Problems:

  1. Error Handling: Script doesn't check pct create exit code properly
  2. False Success: Reports success even when container creation fails
  3. Variable Expansion: Possible issue with variable expansion in network config string

Expected Network Config (from script):

network_config="bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=${ip_address}/${netmask},gw=${gateway},type=veth"

6. Configuration Phase Issues

Since containers don't exist:

  • All configuration file copy attempts are skipped
  • Container status checks all fail (containers not running)
  • Network bootstrap fails (no containers to collect enodes from)

Root Cause Analysis

The error is reported against net0.ip, which suggests that the network configuration string is malformed at runtime, most likely in its ip= component. Possible causes:

  1. Variable Not Set: PROXMOX_BRIDGE, GATEWAY, or NETMASK may be empty or incorrect
  2. Variable Expansion: Shell variable expansion might not be working as expected
  3. String Formatting: The network config string might be getting corrupted during variable substitution
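
One way to catch all three causes before pct create runs is to validate each component while building the string. The sketch below is an illustration under assumptions: the helper names is_ipv4 and build_network_config are hypothetical and do not come from deploy-besu-nodes.sh. It also rejects a dotted netmask such as 255.255.255.0, which is one hypothetical way the ip= value could become invalid, since Proxmox expects CIDR notation there.

```shell
# is_ipv4 ADDR: succeeds if ADDR is a dotted-quad IPv4 address.
is_ipv4() {
    case $1 in
        '' | *[!0-9.]* | .* | *. | *..*) return 1 ;;
    esac
    old_ifs=$IFS; IFS=.
    set -- $1              # split the address into octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}

# build_network_config BRIDGE IP PREFIX GATEWAY
# Prints the net0 string, or fails with a message naming the bad component.
build_network_config() {
    bridge=${1:-vmbr0}; ip=$2; prefix=$3; gw=$4
    is_ipv4 "$ip" || { echo "invalid ip_address: [$ip]" >&2; return 1; }
    is_ipv4 "$gw" || { echo "invalid gateway: [$gw]" >&2; return 1; }
    case $prefix in
        '' | *[!0-9]*) echo "invalid netmask: [$prefix]" >&2; return 1 ;;
    esac
    [ "$prefix" -le 32 ] || { echo "invalid netmask: [$prefix]" >&2; return 1; }
    printf 'bridge=%s,name=eth0,ip=%s/%s,gw=%s,type=veth\n' \
        "$bridge" "$ip" "$prefix" "$gw"
}
```

With this in place, an empty or corrupted variable produces a local error that names the offending value instead of a generic 400 from Proxmox.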

Evidence

Working Containers (Reference)

Containers 100-105 exist and are running, using DHCP:

net0: bridge=vmbr0,firewall=1,hwaddr=BC:24:11:XX:XX:XX,ip=dhcp,type=veth

Test Container Creation

Created container 99999 successfully with static IP format:

bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth

Recommendations

1. Fix Script Error Handling

Update deploy-besu-nodes.sh to properly check pct create exit code:

if ! pct create "$vmid" ...; then
    log_error "Failed to create container $vmid"
    return 1
fi
log_success "Container $vmid created"
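
The same check can be factored into a reusable wrapper so that the success message is only reachable when the command exits with status 0. This is a sketch under assumptions: create_checked is a hypothetical name, and log_error and log_success are stand-ins here because the script's own logging helpers are not shown in this analysis.

```shell
# Stand-ins for the deployment script's logging helpers (assumed, not the originals).
log_error()   { printf '[ERROR] %s\n' "$*" >&2; }
log_success() { printf '[OK] %s\n' "$*"; }

# create_checked VMID CMD [ARGS...]
# Runs CMD with ARGS and reports success only if it exits with status 0.
# On failure, logs the exit code and returns it to the caller.
create_checked() {
    vmid=$1
    shift
    if "$@"; then
        log_success "Container $vmid created"
    else
        rc=$?
        log_error "Failed to create container $vmid (exit code $rc)"
        return "$rc"
    fi
}
```

In the deployment script, the call would take the form create_checked "$vmid" pct create "$vmid" followed by the existing arguments, so the caller can stop or skip the later configuration steps when it returns non-zero.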

2. Debug Variable Values

Add logging to show actual network config values before container creation:

log_info "Network config: $network_config"
log_info "PROXMOX_BRIDGE: ${PROXMOX_BRIDGE:-vmbr0}"
log_info "GATEWAY: ${GATEWAY:-192.168.11.1}"
log_info "NETMASK: ${NETMASK:-24}"
log_info "IP Address: $ip_address"
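
Plain logging can hide an empty value, a stray space, or an invisible character such as a carriage return from a config file with Windows line endings (a hypothetical cause, not confirmed by the logs). The sketch below, with the hypothetical helper name show_var, brackets each value and prints its length so such problems stand out.

```shell
# show_var NAME VALUE
# Prints NAME=[VALUE] followed by the value's length, so that empty values,
# surrounding whitespace, and invisible characters are easy to spot.
show_var() {
    printf '%s=[%s] len=%s\n' "$1" "$2" "${#2}"
}
```

For example, show_var NETMASK "$NETMASK" prints a length of 2 for a clean value of 24; a length of 3 would point to a hidden trailing character.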

3. Verify Configuration File

Check that /opt/smom-dbis-138-proxmox/config/proxmox.conf and network.conf have correct values:

  • PROXMOX_BRIDGE=vmbr0
  • GATEWAY=192.168.11.1
  • NETMASK=24
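
These values can be checked without sourcing the files. The sketch below assumes the config files use plain KEY=value lines, as the list above suggests; the helper name conf_value is hypothetical, and lines using other forms (for example, a leading export) are not handled.

```shell
# conf_value FILE KEY
# Prints the value of the last uncommented KEY=value line in FILE,
# with one pair of surrounding double quotes removed.
# Fails if KEY is missing or its value is empty.
conf_value() {
    value=$(sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | sed 's/^"\(.*\)"$/\1/')
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

Running conf_value against proxmox.conf and network.conf for each of the three keys shows quickly whether any of them is missing or empty.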

4. Alternative: Use DHCP Initially

Since containers 100-105 work with DHCP, consider:

  1. Create containers with DHCP: ip=dhcp
  2. After creation, use pct set to configure static IPs (as in fix-container-ips.sh)

This two-step approach keeps the failing static IP string out of the creation step, and ip=dhcp is already known to work on this host.
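
The two steps could be sketched as follows. This is an illustration, not a reproduction of fix-container-ips.sh: the function names are hypothetical, the PCT variable is a hypothetical dry-run hook (PCT=echo prints the arguments instead of calling pct), and a real pct create call would need further options, such as storage and hostname, that are omitted here.

```shell
# create_with_dhcp VMID TEMPLATE
# Step 1: create the container with a DHCP-configured net0.
create_with_dhcp() {
    ${PCT:-pct} create "$1" "$2" \
        --net0 "bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=dhcp,type=veth"
}

# assign_static_ip VMID IP PREFIX GATEWAY
# Step 2: replace net0 with a static configuration after creation.
assign_static_ip() {
    ${PCT:-pct} set "$1" \
        --net0 "bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=$2/$3,gw=$4,type=veth"
}
```

Running both functions with PCT=echo first makes it possible to inspect the exact net0 strings before anything is changed on the host.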

Next Steps

  1. Fix Script: Update error handling in deploy-besu-nodes.sh
  2. Add Debugging: Add verbose logging for network config values
  3. Test Creation: Create a single test container to verify fix
  4. Re-run Deployment: Execute full deployment after fix

Log Files

  • Deployment Log: /opt/smom-dbis-138-proxmox/logs/deploy-validated-set-20251220-112033.log
  • System Logs: journalctl -u 'pve-container@*' shows container lifecycle events (the pattern is quoted so the shell does not expand it)

Status: Deployment Failed - Containers not created due to network config error
Action Required: Fix script error handling and verify configuration variable values