r630-02 Log Review Report
Date: 2026-01-19
Host: r630-02 (192.168.11.12)
Review Script: scripts/check-r630-02-logs.sh
Executive Summary
Overall Status: ⚠️ OPERATIONAL WITH CONCERNS
The host is operational with all Proxmox services running, but there are critical memory issues affecting containers, particularly container 7811 (mim-api-1).
Critical Issues
🔴 Out of Memory (OOM) Kills - CRITICAL
Multiple containers are experiencing OOM kills:
Container 7811 (mim-api-1) - MOST AFFECTED
Recent OOM kills in container 7811:
- Jan 17 20:35:57 - systemd-journal killed (22MB)
- Jan 17 20:36:13 - node process killed (8.5MB)
- Jan 17 20:43:06 - install process killed (100MB)
- Jan 17 20:52:21 - systemd killed (100MB)
Pattern: Container 7811 is consistently hitting memory limits, causing multiple process kills.
Other Containers with OOM Kills:
- Jan 13 21:03:51 - systemd-journal (UID:100000) - 306MB
- Jan 13 21:47:43 - func process (UID:100000) - 535GB virtual memory
- Jan 14 01:16:47 - systemd-journal (UID:100000) - 100MB
- Jan 14 01:39:33 - npm exec func s (UID:100000) - 708MB
- Jan 14 07:42:15 - systemd-journal (UID:100000) - 39MB
- Jan 14 07:42:26 - npm exec func s (UID:100000) - 632MB
- Jan 14 09:37:11 - apt-get (UID:100000) - 88MB
- Jan 14 11:10:57 - node (UID:100000) - 331MB
- Jan 14 13:01:19 - python3 (UID:100000) - 38MB
- Jan 14 16:06:09 - npm exec func s (UID:100000) - 633MB
- Jan 14 16:40:16 - systemd-journal (UID:100000) - 31MB
- Jan 14 16:48:44 - networkd-dispat (UID:100000) - 29MB
- Jan 15 12:30:31 - systemd-journal (UID:100000) - 311MB
- Jan 15 12:30:33 - func (UID:100000) - 535GB virtual memory
- Jan 16 20:57:40 - systemd-journal (UID:100000) - 109MB
- Jan 17 11:35:10 - systemd-journal (UID:100000) - 43MB
- Jan 17 13:10:57 - networkd-dispat (UID:100000) - 29MB
- Jan 17 13:34:59 - node (UID:100000) - 330MB
- Jan 17 14:09:49 - python3 (UID:100000) - 20MB
- Jan 17 19:01:50 - apt-get (UID:100000) - 88MB
- Jan 17 19:38:39 - systemd-journal (UID:100000) - 31MB
- Jan 17 19:52:50 - node (UID:100000) - 330MB
- Jan 17 20:09:35 - apt-get (UID:100000) - 88MB
Analysis:
- All OOM kills come from containers (UID 100000 is the host-side start of the unprivileged container UID mapping)
- Most common victims: systemd-journal, node processes, npm exec func s, apt-get
- Container 7811 (mim-api-1) appears to be the most affected
- Some processes show very high virtual memory (535GB), which may indicate memory leaks or runaway address-space reservations
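To see which processes are being killed most often, the kernel's OOM lines can be tallied with a small helper. This is a sketch: it matches the kernel's "Killed process <pid> (<name>)" message format, and assumes the journal is the log source.

```shell
#!/usr/bin/env bash
# Tally OOM-kill victims by process name from kernel log lines.
# Feed it the output of `journalctl -k` (or any text containing the
# kernel's "Killed process <pid> (<name>)" messages).
summarize_oom_victims() {
    grep -oE 'Killed process [0-9]+ \([^)]+\)' \
        | sed -E 's/Killed process [0-9]+ \(([^)]+)\)/\1/' \
        | sort | uniq -c | sort -rn
}

# Example usage (on the host):
#   journalctl -k --no-pager | summarize_oom_victims
```

The count-per-name output makes it easy to confirm whether node/npm processes really dominate the kill list.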
Recommendation:
- URGENT: Review and increase memory limits for affected containers, especially 7811
- Investigate memory leaks in node/npm processes
- Consider adding swap space (currently 0B)
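Inspecting and raising a container's limit is a two-command operation with `pct`. A sketch for container 7811; the 4096 MiB / 1024 MiB values are illustrative, not a sizing recommendation; size them to the application's actual working set:

```shell
# Inspect the current memory/swap limits for container 7811
pct config 7811 | grep -E '^(memory|swap):'

# Raise the limits (values in MiB; 4096/1024 are illustrative)
pct set 7811 --memory 4096 --swap 1024

# Confirm the new limit is visible inside the running container
pct exec 7811 -- free -m
```

`pct set` applies to running containers; re-checking with `free -m` inside the container verifies the cgroup limit actually changed.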
Warnings
⚠️ Storage Volume Warnings
Repeated warnings about missing logical volumes:
pve/thin1 - not found
pve/data - not found
Frequency: Every 10-20 seconds (pvestatd polling)
Analysis:
- These are likely false positives - Proxmox is checking for volumes that don't exist on this host
- The host uses different storage pools (thin1-r630-02, thin2, thin3, etc.)
- Not critical, but creates log noise
Recommendation:
- Can be safely ignored, or configure pvestatd to exclude these volumes
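If the warnings come from cluster-wide storage definitions that simply don't exist on this host, restricting each storage to the nodes that actually have it stops pvestatd from probing it here. A sketch; the storage IDs (`thin1`, `data`) and node name are assumptions and should be matched against `/etc/pve/storage.cfg`:

```shell
# Restrict cluster storage definitions to the nodes that have them,
# so pvestatd on r630-02 stops checking for absent volumes.
# Storage IDs and node names below are illustrative.
pvesm set thin1 --nodes r630-01
pvesm set data --nodes r630-01
```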
⚠️ Subscription Check Failures
DNS resolution failures when checking subscription:
- Jan 14 03:51:20 - DNS lookup failure
- Jan 15 05:53:54 - DNS lookup failure
- Jan 16 05:49:20 - DNS lookup failure
- Jan 17 03:38:58 - DNS lookup failure
- Jan 18 04:34:20 - DNS lookup failure
Analysis:
- Proxmox trying to check subscription status
- DNS resolution failing (likely network/DNS configuration issue)
- Non-critical - subscription check is optional
Recommendation:
- Fix DNS configuration if subscription checks are needed
- Or disable subscription checks if not using Proxmox subscription
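A quick way to confirm the DNS side is to resolve the subscription endpoint from the host itself. A sketch; `shop.proxmox.com` is assumed to be the endpoint the check contacts, and the exact hostname may differ by Proxmox version:

```shell
# Check whether the host can resolve the subscription endpoint
# (hostname is an assumption -- verify against the failing log line)
nslookup shop.proxmox.com

# Inspect the resolver configuration the host is actually using
cat /etc/resolv.conf
```

If resolution fails here but works from other hosts, the fix is in this host's resolver configuration rather than in Proxmox.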
Non-Critical Issues
ℹ️ ACPI/IPMI Errors (Boot-time)
Errors during system boot:
ACPI Error: AE_NOT_EXIST, Returned by Handler for [IPMI]
ACPI Error: Region IPMI (ID=7) has no handler
scsi 0:0:32:0: Wrong diagnostic page
Analysis:
- Hardware/firmware related errors during boot
- System boots successfully despite errors
- Common on Dell servers with IPMI
Recommendation:
- Can be ignored unless IPMI functionality is needed
- May be resolved with BIOS/firmware updates
ℹ️ Corosync Connection Errors (Boot-time)
Errors during Proxmox cluster initialization:
quorum_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
cmap_initialize failed: CS_ERR_LIBRARY
cpg_initialize failed: CS_ERR_LIBRARY
Analysis:
- Occurred during boot on Jan 13 10:47:38
- Cluster eventually initialized successfully
- Current status: Cluster is quorate and operational
Recommendation:
- Normal during boot - cluster needs time to establish connections
- No action needed if cluster is currently operational
ℹ️ Container Configuration File Error
Jan 17 23:40:08:
Configuration file 'nodes/r630-02/lxc/7810.conf' does not exist
- Error occurred during a container start attempt
Analysis:
- Temporary issue - container 7810 (mim-web-1) is currently running
- May have been during a migration or configuration change
- Resolved - container is now operational
Recommendation:
- No action needed if container is currently running
Positive Findings
✅ Proxmox Services
All Proxmox services running:
- pve-cluster: ✅ Active (running) - 5 days uptime
- pvedaemon: ✅ Active (running) - 5 days uptime
- pveproxy: ✅ Active (running) - 5 days uptime
Status: All services healthy and operational
✅ System Health
- Uptime: 5 days, 14 hours (since Jan 13 10:47)
- Load Average: 6.97, 6.67, 6.40 (moderate for 56 CPU threads)
- No disk I/O errors: ✅
- No failed systemd services: ✅
- Cluster status: Quorate and operational
✅ Network
- No network-related errors in recent logs
- Docker overlayfs warnings are normal for Docker containers
- Bridge operations appear normal
✅ Authentication
- Recent SSH logins from 192.168.11.4 (expected management access)
- No unauthorized access attempts detected
- Public key authentication working correctly
System Information
Uptime
- Current: 5 days, 14 hours, 10 minutes
- Last Boot: Tue Jan 13 10:47:39 PST
- Previous Boot: Thu Jan 1 12:35 (11+ days uptime before reboot)
Load Average
- 1 minute: 6.97
- 5 minutes: 6.67
- 15 minutes: 6.40
Analysis: Moderate load for a system with 56 CPU threads (2 sockets, likely 14 cores each with hyper-threading)
Recommendations
🔴 Immediate Actions (Critical)
1. Fix OOM Issues:
- Review memory limits for all containers, especially 7811 (mim-api-1)
- Increase memory allocation for containers experiencing OOM kills
- Investigate memory leaks in node/npm processes
- Consider adding swap space (currently 0B)
2. Monitor Container 7811:
- Check current memory usage and limits
- Review application memory requirements
- Consider increasing memory limit or optimizing application
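Since the host currently has 0B of swap, adding a swap file is one low-effort mitigation. A sketch, to be run as root; the 8G size is illustrative, and note that swap files on ZFS-backed roots need a different approach:

```shell
# Create and enable a swap file on the host (size is illustrative).
# Not suitable as-is for ZFS root filesystems.
fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Persist across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab

# Verify
swapon --show
```

Swap won't fix a leaking process, but it gives the kernel headroom so the OOM killer fires less often while limits are being tuned.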
⚠️ Short-term Actions
1. Reduce Log Noise:
- Configure pvestatd to exclude non-existent volumes (pve/thin1, pve/data)
- Or suppress these specific warnings
2. DNS Configuration:
- Fix DNS resolution if subscription checks are needed
- Or disable subscription checks if not using Proxmox subscription
ℹ️ Long-term Actions
1. Memory Management:
- Implement memory monitoring and alerting
- Review and optimize container memory allocations
- Consider memory limits based on actual usage patterns
2. Hardware:
- Consider BIOS/firmware updates to resolve ACPI/IPMI errors (if IPMI needed)
- Monitor for any hardware-related issues
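For basic monitoring, a small check comparing a container's memory usage against its configured limit can feed existing alerting. A sketch assuming the cgroup v2 layout Proxmox uses for LXC containers (`/sys/fs/cgroup/lxc/<vmid>`); the path, VMIDs, and 90% threshold are all illustrative:

```shell
#!/usr/bin/env bash
# Sketch of a memory-pressure check for an LXC container:
# warn when usage crosses a threshold percentage of the limit.

# Pure helper: integer percentage of the limit currently in use.
mem_used_pct() {
    local used_bytes=$1 limit_bytes=$2
    echo $(( used_bytes * 100 / limit_bytes ))
}

check_container_memory() {
    local vmid=$1 threshold=${2:-90}
    local base="/sys/fs/cgroup/lxc/${vmid}"   # cgroup v2 path (assumed layout)
    local used limit pct
    used=$(cat "${base}/memory.current")
    limit=$(cat "${base}/memory.max")
    [ "$limit" = "max" ] && return 0          # no limit configured
    pct=$(mem_used_pct "$used" "$limit")
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARN: CT ${vmid} memory at ${pct}% of limit"
    fi
}

# Example (on the host, e.g. from cron): check_container_memory 7811 90
```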
Log Review Commands
For detailed log investigation, use:
# SSH to host
ssh root@192.168.11.12
# Follow all logs
journalctl -f
# View last 100 errors
journalctl -p err -n 100
# Follow Proxmox cluster logs
journalctl -u pve-cluster -f
# View OOM kills
journalctl | grep -i "oom\|out of memory\|killed process"
# View recent kernel messages
dmesg | tail -100
# Check container memory limits
pct config <VMID> | grep memory
Conclusion
The host is operational with all critical services running. However, memory management issues are causing frequent OOM kills, particularly in container 7811 (mim-api-1). This should be addressed immediately to ensure container stability and prevent service interruptions.
Priority Actions:
- 🔴 URGENT: Address OOM kills in container 7811
- ⚠️ HIGH: Review memory limits for all containers
- ⚠️ MEDIUM: Reduce log noise from storage warnings
- ℹ️ LOW: Fix DNS for subscription checks (if needed)
Review completed: 2026-01-19
Next review recommended: After addressing OOM issues