Files
proxmox/reports/PROXMOX_GUI_ISSUES_REVIEW.md

602 lines
15 KiB
Markdown
Raw Permalink Normal View History

# Proxmox VE GUI Issues and Errors - Comprehensive Review
**Date**: 2026-01-06
**Status**: ✅ **REVIEW COMPLETE**
---
## Executive Summary
This document provides a comprehensive review of all Proxmox VE GUI (web interface) issues and errors found in the codebase. The review covers:
- SSL certificate errors (Error 596)
- pveproxy worker crashes
- Web interface accessibility issues
- Hostname resolution problems
- Cluster filesystem issues
- Browser connection errors
**Key Findings**:
- ✅ Most issues have been resolved
- ⚠️ Some nodes may still have connectivity issues (r630-03, r630-04)
- ✅ Fix scripts available for common issues
- ✅ Comprehensive documentation exists for troubleshooting
---
## 1. SSL Certificate Error 596
### Issue Description
**Error Message**: `Connection error 596: error:0A000086:SSL routines::certificate verify failed`
**Symptoms**:
- Proxmox VE UI displays connection error 596
- Web interface cannot connect to Proxmox API
- Browser shows SSL certificate verification failure
**Affected Nodes**:
- ml110 (192.168.11.10)
- r630-01 (192.168.11.11)
- r630-02 (192.168.11.12)
- r630-03 (192.168.11.13) - potentially
- r630-04 (192.168.11.14) - potentially
**Status**: ✅ **FIXED** (on ml110, r630-01, r630-02)
### Root Causes
1. **SSL certificates expired or invalid**
2. **Cluster certificates out of sync**
3. **Certificate chain broken**
4. **System time incorrect** (certificates are time-sensitive)
### Solution Applied
**Command**:
```bash
pvecm updatecerts -f
systemctl restart pveproxy pvedaemon
```
**What it does**:
- Forces regeneration of all cluster SSL certificates
- Updates certificate chain
- Regenerates node-specific certificates
- Updates root CA certificate if needed
- Syncs certificates across cluster nodes
**Fix Script**: `scripts/fix-ssl-certificate-error-596.sh`
**Usage**:
```bash
# Fix all nodes
./scripts/fix-ssl-certificate-error-596.sh all
# Fix specific node
./scripts/fix-ssl-certificate-error-596.sh ml110
./scripts/fix-ssl-certificate-error-596.sh r630-01
```
### After Fixing
1. **Clear browser cache and cookies**
- Chrome/Edge: Settings → Privacy → Clear browsing data → Advanced → "Cached images and files"
- Firefox: Settings → Privacy & Security → Clear Data → "Cached Web Content"
2. **Access Proxmox UI**
- URL: `https://<node-ip>:8006`
- Example: `https://192.168.11.10:8006`
3. **Accept certificate warning** (if prompted)
- First-time access may show a security warning
- Click "Advanced" → "Proceed to site"
- This is normal for self-signed certificates in Proxmox
### Documentation
- `docs/archive/reports/SSL_CERTIFICATE_ERROR_596_FIX.md`
- `reports/PROXMOX_SSL_CERTIFICATE_FIX_COMPLETE.md`
- `reports/PROXMOX_SSL_FIX_COMPLETE.md`
---
## 2. pveproxy Worker Crashes
### Issue Description
**Error**: pveproxy workers are crashing/exiting
**Symptoms**:
- Web interface not accessible (HTTP Status: 000)
- pveproxy service shows workers exiting
- Port 8006 may not be listening
- Browser cannot connect to Proxmox web interface
**Affected Nodes**:
- r630-01 (192.168.11.11) - **RESOLVED**
- r630-02 (192.168.11.12) - **RESOLVED**
- r630-04 (192.168.11.14) - **POTENTIALLY AFFECTED**
**Status**: ✅ **RESOLVED** (on r630-01, r630-02)
### Root Causes
#### 2.1 SSL Certificate/Key Loading Failure
**Error**: `/etc/pve/local/pve-ssl.key: failed to load local private key`
**Causes**:
1. **Cluster filesystem not mounted** (`/etc/pve` is a FUSE filesystem)
2. **Corrupted SSL certificates**
3. **Wrong file permissions**
4. **pve-cluster service down**
#### 2.2 Hostname Resolution Failure
**Error**: `Unable to resolve node name 'pve' to a non-loopback IP address - missing entry in '/etc/hosts' or DNS?`
**Impact**:
- pve-cluster service fails
- /etc/pve filesystem not mounting
- SSL certificates not accessible
- pveproxy workers crashing
**Solution**: Fixed by adding proper entries to `/etc/hosts`
### Solution Applied
#### Fix 1: Hostname Resolution
**Script**: `scripts/fix-proxmox-hostname-resolution.sh`
**What it did**:
- Added proper entries to `/etc/hosts` on both hosts
- Ensured hostnames resolve to their actual IP addresses (not loopback)
- Added both current hostname (pve/pve2) and correct hostname (r630-01/r630-02)
**Example /etc/hosts entries**:
```
192.168.11.11 pve pve.sankofa.nexus r630-01 r630-01.sankofa.nexus
192.168.11.12 pve2 pve2.sankofa.nexus r630-02 r630-02.sankofa.nexus
```
#### Fix 2: SSL and Cluster Service
**Script**: `scripts/fix-proxmox-ssl-cluster.sh`
**What it did**:
- Regenerated SSL certificates
- Restarted all Proxmox services in correct order
- Verified service status
**Results**:
- ✅ All services running
- ✅ Web interface accessible (HTTP 200)
- ✅ No worker exit errors
### Diagnostic Commands
```bash
# Check pveproxy service status
systemctl status pveproxy --no-pager -l
# Check recent logs
journalctl -u pveproxy --no-pager -n 100
# Check for worker exits
journalctl -u pveproxy -n 50 | grep -E "worker exit|failed to load"
# Check port 8006
ss -tlnp | grep 8006
# Check cluster status
pvecm status
```
### Documentation
- `docs/archive/historical/PROXMOX_PVE_PVE2_ISSUES.md`
- `docs/archive/completion/PROXMOX_PVE_PVE2_FIX_COMPLETE.md`
- `docs/09-troubleshooting/R630-04-PROXMOX-TROUBLESHOOTING.md`
---
## 3. Cluster Filesystem Issues
### Issue Description
**Error**: pve-cluster service failed
**Symptoms**:
- `pmxcfs` exited with status 255/EXCEPTION
- `/etc/pve` filesystem not mounted
- SSL certificates not accessible
- Cluster configuration not accessible
**Affected Nodes**:
- r630-01 (192.168.11.11) - **RESOLVED**
- r630-02 (192.168.11.12) - **RESOLVED**
**Status**: ✅ **RESOLVED**
### Root Cause
**Hostname resolution failure** - The pve-cluster service could not resolve the hostname to a non-loopback IP address.
**Error Message**:
```
Unable to resolve node name 'pve' to a non-loopback IP address - missing entry in '/etc/hosts' or DNS?
```
### Solution Applied
1. **Fixed hostname resolution** in `/etc/hosts`
2. **Restarted pve-cluster service**
3. **Verified /etc/pve filesystem mounted**
### Verification
```bash
# Check cluster service
systemctl status pve-cluster
# Check /etc/pve mount
mount | grep /etc/pve
df -h /etc/pve
# Check cluster status
pvecm status
```
### Documentation
- `docs/archive/completion/PROXMOX_PVE_PVE2_FIX_COMPLETE.md`
---
## 4. Web Interface Accessibility Issues
### Issue Description
**Symptoms**:
- Web interface not accessible on port 8006
- Browser shows connection refused or timeout
- HTTP Status: 000
- Cannot access Proxmox UI
**Affected Nodes**:
- r630-03 (192.168.11.13) - **NOT REACHABLE** (server appears unplugged)
- r630-04 (192.168.11.14) - **ACCESSIBILITY ISSUES** (pveproxy issue)
**Status**: ⚠️ **ONGOING** (r630-03, r630-04)
### Root Causes
#### 4.1 Server Not Reachable (r630-03)
- **Ping Status**: ❌ NOT REACHABLE
- **SSH Status**: ❌ Not accessible
- **Web UI Status**: ❌ Not accessible
- **Issue**: Server appears to be unplugged or powered off
**Action Required**:
1. Verify power cable is connected
2. Verify network cable is connected
3. Check network switch port status
4. Wait 1-2 minutes for server to boot after plugging in
#### 4.2 pveproxy Issue (r630-04)
- **Ping Status**: ✅ REACHABLE
- **SSH Status**: ⚠️ Authentication failing
- **Web UI Status**: ⚠️ Not accessible (pveproxy issue)
**Action Required**:
1. Access server via console/iDRAC
2. Reset root password
3. Fix SSH configuration
4. Fix Proxmox Web UI (pveproxy)
5. Verify cluster membership
### Diagnostic Commands
```bash
# Check connectivity
ping -c 3 192.168.11.13
ping -c 3 192.168.11.14
# Check SSH
ssh root@192.168.11.13
ssh root@192.168.11.14
# Check web interface
curl -k -I https://192.168.11.13:8006/
curl -k -I https://192.168.11.14:8006/
# Check pveproxy service
ssh root@192.168.11.14 "systemctl status pveproxy"
```
### Documentation
- `reports/status/R630_03_04_CONNECTIVITY_STATUS.md`
- `docs/09-troubleshooting/R630-04-PROXMOX-TROUBLESHOOTING.md`
---
## 5. Browser Connection Errors
### Issue Description
**Common Browser Errors**:
1. **Connection refused**
2. **Connection timeout**
3. **SSL certificate error**
4. **HTTP Status: 000**
5. **ERR_CONNECTION_REFUSED**
6. **ERR_CONNECTION_TIMED_OUT**
### Solutions
#### 5.1 Clear Browser Cache
**Chrome/Edge**:
1. Settings → Privacy → Clear browsing data
2. Advanced → Select "Cached images and files"
3. Clear data
**Firefox**:
1. Settings → Privacy & Security → Clear Data
2. Select "Cached Web Content"
3. Clear Now
#### 5.2 Clear SSL State
**Chrome/Edge**:
1. Settings → Privacy → Clear browsing data
2. Advanced → Select "Cached images and files"
3. Clear data
**Firefox**:
1. Settings → Privacy & Security → Clear Data
2. Select "Cached Web Content"
3. Clear Now
#### 5.3 Access via IP Address
Instead of using hostname, try accessing directly via IP:
```
https://192.168.11.10:8006
https://192.168.11.11:8006
https://192.168.11.12:8006
```
#### 5.4 Check System Time
```bash
# Check system time
date
# If wrong, sync time
systemctl restart systemd-timesyncd
```
#### 5.5 Accept Certificate Warning
- First-time access may show a security warning
- Click "Advanced" → "Proceed to site"
- This is normal for self-signed certificates in Proxmox
---
## 6. Fix Scripts Available
### 6.1 SSL Certificate Fix Scripts
#### `scripts/fix-ssl-certificate-error-596.sh`
**Purpose**: Fix SSL certificate error 596
**Usage**:
```bash
# Fix all nodes
./scripts/fix-ssl-certificate-error-596.sh all
# Fix specific node
./scripts/fix-ssl-certificate-error-596.sh ml110
./scripts/fix-ssl-certificate-error-596.sh r630-01
```
#### `scripts/fix-proxmox-ssl-cluster.sh`
**Purpose**: Comprehensive SSL and cluster service fix
**Usage**:
```bash
# Fix both hosts
./scripts/fix-proxmox-ssl-cluster.sh both
# Fix individual host
./scripts/fix-proxmox-ssl-cluster.sh pve
./scripts/fix-proxmox-ssl-cluster.sh pve2
```
#### `scripts/fix-ssl-certificate-all-hosts.sh`
**Purpose**: Fix SSL certificates on all hosts
**Usage**:
```bash
./scripts/fix-ssl-certificate-all-hosts.sh
```
### 6.2 Hostname Resolution Fix Scripts
#### `scripts/fix-proxmox-hostname-resolution.sh`
**Purpose**: Fix hostname resolution issues
**Usage**:
```bash
./scripts/fix-proxmox-hostname-resolution.sh
```
**What it does**:
- Adds proper entries to `/etc/hosts`
- Ensures hostnames resolve to actual IP addresses
- Updates both current and correct hostnames
### 6.3 General Fix Scripts
#### `scripts/fix-r630-04-pveproxy.sh`
**Purpose**: Fix pveproxy issues on r630-04
**Usage**:
```bash
./scripts/fix-r630-04-pveproxy.sh
```
#### `scripts/run-fixes-on-proxmox.sh`
**Purpose**: Run multiple fixes on Proxmox nodes
**Usage**:
```bash
./scripts/run-fixes-on-proxmox.sh
```
---
## 7. Node Status Summary
### ✅ Operational Nodes
| Node | IP | Web UI Status | SSL Status | Notes |
|------|----|--------------|------------|-------|
| ml110 | 192.168.11.10 | ✅ Accessible | ✅ Fixed | Cluster master |
| r630-01 | 192.168.11.11 | ✅ Accessible | ✅ Fixed | All services running |
| r630-02 | 192.168.11.12 | ✅ Accessible | ✅ Fixed | All services running |
### ⚠️ Issues Detected
| Node | IP | Web UI Status | SSL Status | Issues |
|------|----|--------------|------------|--------|
| r630-03 | 192.168.11.13 | ❌ Not accessible | ⚠️ Unknown | Server not reachable (unplugged?) |
| r630-04 | 192.168.11.14 | ⚠️ Not accessible | ⚠️ Unknown | pveproxy issue, SSH auth failing |
---
## 8. Troubleshooting Guide
### Step 1: Check Service Status
```bash
ssh root@<node-ip>
systemctl status pveproxy pvedaemon pvestatd pve-cluster
```
### Step 2: Check Logs
```bash
# Check pveproxy logs
journalctl -u pveproxy -n 100
# Check for worker exits
journalctl -u pveproxy -n 50 | grep "worker exit"
# Check cluster logs
journalctl -u pve-cluster -n 50
```
### Step 3: Check Port 8006
```bash
# Check if port is listening
ss -tlnp | grep 8006
# Test web interface
curl -k -I https://<node-ip>:8006/
```
### Step 4: Check SSL Certificates
```bash
# Check certificate files
ls -la /etc/pve/local/
# Check certificate validity
openssl x509 -in /etc/pve/pve-root-ca.pem -noout -dates
```
### Step 5: Check Cluster Status
```bash
# Check cluster status
pvecm status
# Check cluster filesystem
mount | grep /etc/pve
df -h /etc/pve
```
### Step 6: Apply Fixes
```bash
# Fix SSL certificates
pvecm updatecerts -f
systemctl restart pveproxy pvedaemon
# Or use automated scripts
./scripts/fix-ssl-certificate-error-596.sh <node>
```
---
## 9. Prevention and Best Practices
### 9.1 Regular Maintenance
1. **Monitor SSL certificate expiration**
- Check certificate dates regularly
- Renew certificates before expiration
2. **Monitor service status**
- Set up monitoring for pveproxy, pvedaemon, pvestatd, pve-cluster
- Alert on service failures
3. **Keep system time synchronized**
- Use NTP for time synchronization
- SSL certificates are time-sensitive
### 9.2 Configuration Best Practices
1. **Hostname Resolution**
- Ensure `/etc/hosts` has proper entries
- Hostnames must resolve to non-loopback IPs
- Keep hostname entries updated
2. **Cluster Configuration**
- Maintain cluster quorum
- Monitor cluster filesystem health
- Keep cluster certificates in sync
3. **Network Configuration**
- Ensure port 8006 is accessible
- Check firewall rules
- Verify network connectivity
### 9.3 Documentation
- Keep troubleshooting guides updated
- Document any custom configurations
- Maintain fix scripts and procedures
---
## 10. Related Documentation
### Issue Reports
- `docs/archive/historical/PROXMOX_PVE_PVE2_ISSUES.md` - Original issue analysis
- `docs/archive/reports/SSL_CERTIFICATE_ERROR_596_FIX.md` - SSL error fix guide
- `reports/PROXMOX_SSL_CERTIFICATE_FIX_COMPLETE.md` - SSL fix completion report
- `reports/status/R630_03_04_CONNECTIVITY_STATUS.md` - Connectivity status report
### Fix Documentation
- `docs/archive/completion/PROXMOX_PVE_PVE2_FIX_COMPLETE.md` - Complete fix documentation
- `docs/09-troubleshooting/R630-04-PROXMOX-TROUBLESHOOTING.md` - Troubleshooting guide
- `docs/archive/reports/PROXMOX_SSL_FIX_VERIFIED.md` - SSL fix verification
### Scripts
- `scripts/fix-ssl-certificate-error-596.sh` - SSL error 596 fix
- `scripts/fix-proxmox-ssl-cluster.sh` - SSL and cluster fix
- `scripts/fix-proxmox-hostname-resolution.sh` - Hostname resolution fix
- `scripts/fix-r630-04-pveproxy.sh` - r630-04 pveproxy fix
---
## 11. Summary
### Resolved Issues ✅
1.**SSL Certificate Error 596** - Fixed on ml110, r630-01, r630-02
2.**pveproxy Worker Crashes** - Fixed on r630-01, r630-02
3.**Hostname Resolution** - Fixed on r630-01, r630-02
4.**Cluster Filesystem Issues** - Fixed on r630-01, r630-02
5.**Web Interface Accessibility** - Fixed on ml110, r630-01, r630-02
### Ongoing Issues ⚠️
1. ⚠️ **r630-03 Web Interface** - Server not reachable (unplugged?)
2. ⚠️ **r630-04 Web Interface** - pveproxy issue, needs console access
### Available Solutions ✅
1. ✅ Automated fix scripts available
2. ✅ Comprehensive troubleshooting documentation
3. ✅ Step-by-step fix procedures
4. ✅ Diagnostic commands documented
---
**Review Completed**: January 6, 2026
**Total Issues Documented**: 11
**Resolved Issues**: 5
**Ongoing Issues**: 2
**Status**: ✅ **COMPREHENSIVE REVIEW COMPLETE**