Common Issues and Solutions
This document covers frequently encountered problems and their solutions.
Proxmox Issues
Cannot Connect to Proxmox Web UI
Symptoms:
- Browser shows connection error
- SSL certificate warning
Solutions:
- Verify IP address and port (default: 8006)
- Accept self-signed certificate in browser
- Check firewall rules:
iptables -L -n
- Verify Proxmox service:
systemctl status pveproxy
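The checks above run on the Proxmox host itself. As a complement, here is a minimal sketch for probing the Web UI from a workstation. The helper names `webui_url` and `check_webui` are made up for this example, and the `-k` flag assumes the host still serves its self-signed certificate.

```shell
# Build the Web UI URL; the port falls back to the default 8006.
webui_url() {
  local host="$1" port="${2:-8006}"
  printf 'https://%s:%s/\n' "$host" "$port"
}

# Probe the UI over HTTPS. -k skips certificate verification (self-signed
# certificate assumed); --max-time keeps a firewalled port from hanging.
check_webui() {
  local url
  url="$(webui_url "$@")"
  if curl -k -s -o /dev/null --max-time 5 "$url"; then
    echo "reachable: $url"
  else
    echo "unreachable: $url" >&2
    return 1
  fi
}

# Usage (replace with your own host):
#   check_webui <proxmox-ip>
```

If the probe fails from the workstation but `systemctl status pveproxy` looks healthy on the host, the firewall or the IP address is the more likely culprit.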
VM Won't Start
Symptoms:
- VM shows as stopped
- Error messages in logs
Solutions:
- Check VM configuration:
qm config <vmid>
- Verify storage availability:
pvesm status
- Check resource limits:
pvesh get /nodes/<node>/status
- Review VM logs:
journalctl -b | grep -w <vmid>
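One common reason for a VM refusing to start is that its configured memory exceeds what the host has free. The sketch below compares the two. It assumes the `qm config` output contains a `memory: <MiB>` line and that `/proc/meminfo` exposes `MemAvailable`; the function names are illustrative only.

```shell
# Read `qm config <vmid>` output on stdin and print the configured memory in MiB.
vm_memory_mib() {
  awk -F': ' '$1 == "memory" { print $2 }'
}

# Print MemAvailable from a meminfo-style file, converted from kB to MiB.
host_available_mib() {
  awk '$1 == "MemAvailable:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed only if the requested MiB fit into the available MiB.
vm_fits_in_memory() {
  local need="$1" avail="$2"
  [ "$need" -le "$avail" ]
}

# Usage on the Proxmox host:
#   need="$(qm config <vmid> | vm_memory_mib)"
#   vm_fits_in_memory "$need" "$(host_available_mib)" || echo "not enough free memory"
```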
Cluster Issues
Symptoms:
- Nodes not showing in cluster
- Quorum errors
Solutions:
- Check cluster status:
pvecm status
- Verify network connectivity between nodes
- Check cluster configuration:
cat /etc/pve/corosync.conf
- Restart cluster services:
systemctl restart pve-cluster
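For scripted checks, the quorum state can be read from the `pvecm status` output. This is a rough sketch that assumes the output contains a `Quorate: Yes` or `Quorate: No` line; `is_quorate` is a name chosen for this example.

```shell
# Read `pvecm status` output on stdin; succeed only if the cluster is quorate.
is_quorate() {
  awk '$1 == "Quorate:" { q = ($2 == "Yes") } END { exit (q ? 0 : 1) }'
}

# Usage on a cluster node:
#   pvecm status | is_quorate || echo "cluster has lost quorum"
```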
Azure Arc Issues
Agent Not Connecting
Symptoms:
- Machine not appearing in Azure Portal
- Connection errors in logs
Solutions:
- Check agent status:
azcmagent show
- Verify network connectivity to Azure:
curl -v https://management.azure.com
- Check agent logs:
journalctl -u himdsd -f
- Re-register agent:
azcmagent connect --resource-group <rg> --tenant-id <tenant> --subscription-id <subscription> --location <region>
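To use the agent state in a script, the status can be pulled out of the `azcmagent show` output. The sketch below assumes the output has an `Agent Status : Connected` style line; that layout is an assumption and may differ between agent versions. The function names are illustrative.

```shell
# Read `azcmagent show` output on stdin and print the value of "Agent Status".
# Assumes "Key : Value" lines with optional padding around the colon.
arc_agent_status() {
  awk -F' *: *' '$1 == "Agent Status" { print $2 }'
}

# Succeed only if the reported status is exactly "Connected".
arc_is_connected() {
  [ "$(arc_agent_status)" = "Connected" ]
}

# Usage on the Arc-enabled machine:
#   azcmagent show | arc_is_connected || echo "agent is not connected"
```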
Policy Not Applying
Symptoms:
- Policies not showing as compliant
- Assignment errors
Solutions:
- Verify agent is connected:
azcmagent show
- Check policy assignment in Azure Portal
- Collect agent logs for review (written to a ZIP archive):
azcmagent logs
- Re-assign policies if needed
Kubernetes Issues
Pods Not Starting
Symptoms:
- Pods in Pending or CrashLoopBackOff state
- Resource errors
Solutions:
- Check pod status:
kubectl describe pod <pod-name>
- Check node resources:
kubectl top nodes
- Review pod logs:
kubectl logs <pod-name>
- Check events:
kubectl get events --sort-by='.lastTimestamp'
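Before describing pods one by one, it can help to list only the ones that need attention. A minimal sketch, assuming the default column layout of `kubectl get pods --no-headers` (name, ready, status, ...) for a single namespace; with `--all-namespaces` the columns shift by one and the field numbers would need adjusting.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name and
# status of every pod that is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   kubectl get pods --no-headers | unhealthy_pods
```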
Services Not Accessible
Symptoms:
- Cannot reach service endpoints
- Connection timeouts
Solutions:
- Check service configuration:
kubectl get svc <service-name> -o yaml
- Verify endpoints:
kubectl get endpoints <service-name>
- Check ingress configuration:
kubectl get ingress
- Test from within cluster:
kubectl run test --image=busybox --rm -it -- wget -O- <service-url>
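For the in-cluster test, the `<service-url>` placeholder is usually the service's cluster DNS name. The helper below builds it, assuming the default `cluster.local` cluster domain and plain HTTP; `svc_url` is a hypothetical name for this sketch.

```shell
# Build the in-cluster URL for a service. Namespace defaults to "default",
# port to 80. Assumes the cluster domain is the default "cluster.local".
svc_url() {
  local svc="$1" ns="${2:-default}" port="${3:-80}"
  printf 'http://%s.%s.svc.cluster.local:%s/\n' "$svc" "$ns" "$port"
}

# Usage:
#   kubectl run test --image=busybox --rm -it --restart=Never -- \
#     wget -O- "$(svc_url <service-name> <namespace> <port>)"
```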
Network Issues
VLAN Not Working
Symptoms:
- VMs cannot communicate on VLAN
- Network isolation not working
Solutions:
- Verify VLAN configuration:
cat /etc/network/interfaces
- Check bridge configuration:
ip link show
- Verify VLAN tagging:
qm config <vmid> | grep net
- Test VLAN connectivity:
ping <vlan-ip>
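When a VM has several interfaces, a quick summary of which ones carry a VLAN tag is easier to read than the raw config. This sketch assumes `netN:` lines of the form `model=MAC,bridge=...,tag=N` in the `qm config` output; `vlan_tags` is a name invented here.

```shell
# Read `qm config <vmid>` output on stdin and print each network interface
# with its VLAN tag, or "untagged" when no tag= option is present.
vlan_tags() {
  awk -F': ' '/^net[0-9]+:/ {
    tag = "untagged"
    n = split($2, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] ~ /^tag=/) tag = substr(opts[i], 5)
    print $1, tag
  }'
}

# Usage on the Proxmox host:
#   qm config <vmid> | vlan_tags
```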
DNS Resolution Issues
Symptoms:
- Cannot resolve hostnames
- Service discovery not working
Solutions:
- Check DNS configuration:
cat /etc/resolv.conf
- Test DNS resolution:
nslookup <hostname>
- Verify CoreDNS in Kubernetes:
kubectl get pods -n kube-system | grep coredns
- Check DNS service:
kubectl get svc kube-dns -n kube-system
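If only some resolvers are misbehaving, querying each configured nameserver separately narrows things down. A small sketch along those lines; the function names are illustrative, and it assumes `nslookup` is installed.

```shell
# Print the nameserver addresses from a resolv.conf-style file.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query every nameserver from /etc/resolv.conf for the given hostname and
# report which ones answer.
check_resolvers() {
  local host="$1" ns
  for ns in $(nameservers); do
    if nslookup "$host" "$ns" >/dev/null 2>&1; then
      echo "ok   $ns"
    else
      echo "FAIL $ns"
    fi
  done
}

# Usage:
#   check_resolvers <hostname>
```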
Storage Issues
Storage Not Available
Symptoms:
- Cannot create VMs
- Storage errors
Solutions:
- Check storage status:
pvesm status
- Verify storage mounts:
df -h
- Check storage permissions:
ls -la /var/lib/vz/
- Review storage logs:
journalctl -u pvestatd
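With many storages defined, filtering the `pvesm status` table down to the problem entries saves some reading. The sketch assumes the usual column order (name, type, status, ...) with a single header row; `inactive_storages` is a name made up for this example.

```shell
# Read `pvesm status` output on stdin, skip the header row, and print the
# name and status of every storage that is not reported as "active".
inactive_storages() {
  awk 'NR > 1 && $3 != "active" { print $1, $3 }'
}

# Usage on the Proxmox host:
#   pvesm status | inactive_storages
```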
Performance Issues
Symptoms:
- Slow VM performance
- High I/O wait
Solutions:
- Check disk I/O:
iostat -x 1
- Verify storage type (SSD vs HDD)
- Check for disk errors:
dmesg | grep -i error
- Consider storage optimization settings
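On hosts where `iostat` is not installed, the I/O wait share can be approximated from two samples of the aggregate `cpu` line in `/proc/stat`. This is a rough sketch, not a replacement for `iostat`; `iowait_pct` is a hypothetical helper name.

```shell
# Take two "cpu ..." lines from /proc/stat and print the percentage of CPU
# time spent in iowait between them. Fields 2..9 are user, nice, system,
# idle, iowait, irq, softirq, steal; iowait is field 6.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0; for (i = 2; i <= 9; i++) total += $i; t[NR] = total; w[NR] = $6 }
    END {
      d = t[2] - t[1]
      if (d <= 0) exit 1
      printf "%.1f\n", 100 * (w[2] - w[1]) / d
    }'
}

# Usage:
#   a="$(head -n 1 /proc/stat)"; sleep 5; b="$(head -n 1 /proc/stat)"
#   iowait_pct "$a" "$b"
```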
Cloudflare Tunnel Issues
Tunnel Not Connecting
Symptoms:
- Services not accessible externally
- Tunnel errors in logs
Solutions:
- Check tunnel status:
cloudflared tunnel info <tunnel-name>
- Verify tunnel token:
echo $CLOUDFLARE_TUNNEL_TOKEN
- Check tunnel logs:
journalctl -u cloudflared -f
- Test tunnel connection:
cloudflared tunnel run <tunnel-name>
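Echoing the token prints a secret to the terminal and possibly into shell history or shared logs. As an alternative, the sketch below only reports whether the variable is set and how long it is. `tunnel_token_state` is a name chosen for this example.

```shell
# Report whether CLOUDFLARE_TUNNEL_TOKEN is set, without printing its value.
tunnel_token_state() {
  if [ -n "${CLOUDFLARE_TUNNEL_TOKEN:-}" ]; then
    echo "set (${#CLOUDFLARE_TUNNEL_TOKEN} characters)"
  else
    echo "unset or empty"
  fi
}

# Usage:
#   tunnel_token_state
```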
Zero Trust Not Working
Symptoms:
- Access policies not applying
- SSO not working
Solutions:
- Verify Zero Trust configuration in Cloudflare Dashboard
- Check policy rules and conditions
- Review access logs in Cloudflare Dashboard
- Test with different user accounts
General Troubleshooting Steps
- Check Logs: Always review relevant logs first
- Verify Configuration: Ensure all configuration files are correct
- Test Connectivity: Verify network connectivity between components
- Check Resources: Ensure sufficient CPU, memory, and storage
- Review Documentation: Check relevant documentation and runbooks
- Search Issues: Look for similar issues in logs or documentation
Getting Help
If you cannot resolve an issue:
- Review the relevant runbook in docs/operations/runbooks/
- Check the troubleshooting guide for your specific component
- Review logs and error messages carefully
- Document the issue with steps to reproduce
- Check for known issues in the project repository
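When documenting an issue, it helps to attach the output of the diagnostic commands from this guide. The sketch below appends each command's output to a report file and notes commands that are not installed instead of failing. The function name `report_section` and the file name `issue-report.txt` are placeholders for this example.

```shell
# Run a command and append its output to the report file under a heading.
# Commands that are not installed are recorded as skipped.
report_section() {
  local out="$1"
  shift
  {
    printf '=== %s ===\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
      "$@" 2>&1
    else
      printf '(skipped: %s not found)\n' "$1"
    fi
    printf '\n'
  } >> "$out"
}

# Usage:
#   report_section issue-report.txt pvesm status
#   report_section issue-report.txt kubectl get events --sort-by=.lastTimestamp
```

Review the report for secrets such as tokens or internal hostnames before sharing it.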