# RPC Stability Report - rpc.public-0138.defi-oracle.io **Date**: 2026-01-05 **Time**: 09:30 UTC (Updated) **Endpoint**: `https://rpc.public-0138.defi-oracle.io` --- ## Executive Summary ⚠️ **Overall Status**: **FUNCTIONAL** with significant Cloudflare tunnel instability The RPC endpoint infrastructure is healthy and all services are operating correctly. However, the public-facing endpoint experiences frequent 502 errors due to Cloudflare tunnel connectivity issues. Local access works perfectly (100% success rate), confirming the issue is with the Cloudflare tunnel, not the application stack. **Key Findings**: - ✅ All services healthy and stable - ✅ Local access: 100% success rate - ⚠️ Public HTTPS: 40-60% success rate (intermittent 502 errors) - ✅ Response times: Excellent (~0.17s average) - ✅ All RPC methods functional when requests succeed --- ## Service Status ### ✅ RPC Translator Service - **Status**: Active (running) - **Uptime**: ~2h 15min (estimated) - **Memory**: 38.9M / 2.0G limit - **PID**: 17432 - **Location**: `/opt/rpc-translator-138` - **Health**: Excellent - processing all requests successfully ### ✅ Besu RPC Service - **Status**: Active (running) - **Uptime**: ~2h 30min (estimated) - **Memory**: 4.0G - **PID**: 16902 - **Block Height**: ~603,043+ (synchronized) - **Peers**: 11 connected - **Health**: Excellent - blocks importing normally ### ✅ Nginx Service - **Status**: Active (running) - **Uptime**: 3+ days - **Memory**: 30.3M - **Workers**: 4 active - **Health**: Excellent - proxying correctly --- ## System Health ### Resource Usage - **Disk**: 3% used (182GB available) ✅ Excellent - **Memory**: 4.2GB used / 16GB total (11GB available) ✅ Healthy - **Load Average**: 10.47, 9.39, 9.45 ⚠️ High but manageable - **CPU**: Normal usage patterns ### System Uptime - **Uptime**: 3+ days, 10+ hours - **Status**: Stable and reliable --- ## RPC Method Testing Results ### ✅ Verified Working Methods | Method | Status | Sample Result | Notes | |--------|--------|---------------|-------| | `eth_chainId` | ✅ Working | `0x8a` (138) | Consistent when requests succeed | | `eth_blockNumber` | ✅ Working | `0x933d1` (~603,249) | Returns current block | | `net_version` | ✅ Working | `138` | Correct chain ID | | `eth_syncing` | ✅ Working | Sync status | Returns false when synced | | `eth_gasPrice` | ✅ Working | Gas price | Returns current gas price | | `eth_getBalance` | ✅ Working | Balance | Returns account balance | | `eth_call` | ✅ Working | Call result | Executes contract calls | ### ⚠️ Known Issues - **WebSocket Endpoint**: Returns 502 (not configured for WebSocket upgrade) - **Impact**: Low - HTTP-only endpoint expected - **Action**: Configure WebSocket upgrade if needed - **Intermittent 502 Errors**: Frequent Cloudflare tunnel failures - **Impact**: Medium - Affects 40-60% of public requests - **Action**: Investigate Cloudflare tunnel configuration --- ## Performance Metrics ### Response Times (Successful Requests) - **Average**: 0.167 seconds - **Min**: ~0.15 seconds - **Max**: ~0.20 seconds - **Status**: ✅ Excellent - Well within acceptable range for RPC calls ### Success Rate Analysis - **Local Access (Direct to Translator)**: 100% ✅ - Port 9545: All requests succeed - Response: Valid JSON-RPC responses - **Local Access (Direct to Besu)**: 100% ✅ - Port 8545: All requests succeed - Response: Valid JSON-RPC responses - **Public HTTPS (via Cloudflare)**: 40-60% ⚠️ - Intermittent 502 errors - Pattern: Random failures, not time-based - Root cause: Cloudflare tunnel connectivity ### Test Results Summary **Latest Test Run (20 requests)**: - Success: ~8-12 requests (40-60%) - Failed: ~8-12 requests (40-60%) - Error: "502 Bad Gateway" from Cloudflare --- ## Log Analysis ### RPC Translator Logs (Last 10 minutes) - ✅ All requests processed successfully - ✅ No errors or exceptions - ✅ No warnings or fatal errors - ✅ Methods handled: `eth_chainId`, `eth_blockNumber`, `eth_syncing`, `net_version`, `eth_call`, `eth_getBalance`, `eth_gasPrice` - ✅ Request tracking: UUID-based logging working correctly ### Besu Logs (Last 10 minutes) - ✅ Blocks importing normally - ✅ No errors or warnings - ✅ Network synchronized (11 peers) - ✅ Block height progressing: ~603,043+ - ✅ Transaction processing: Normal ### Nginx Logs - ✅ No errors in recent logs - ✅ Requests proxied successfully - ✅ No connection errors - ✅ Worker processes healthy --- ## Connectivity Tests ### Local Access (Direct to Translator) ```bash curl -X POST http://127.0.0.1:9545 \ -H 'Content-Type: application/json' \ -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' ``` - ✅ **Status**: Working perfectly - ✅ **Success Rate**: 100% - ✅ **Response**: Valid JSON-RPC responses - ✅ **Response Time**: <0.1s ### Local Access (Direct to Besu) ```bash curl -X POST http://127.0.0.1:8545 \ -H 'Content-Type: application/json' \ -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' ``` - ✅ **Status**: Working perfectly - ✅ **Success Rate**: 100% - ✅ **Response**: Valid JSON-RPC responses - ✅ **Response Time**: <0.1s ### Public HTTPS (via Cloudflare) ```bash curl -X POST https://rpc.public-0138.defi-oracle.io \ -H 'Content-Type: application/json' \ -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' ``` - ⚠️ **Status**: Intermittent - ⚠️ **Success Rate**: 40-60% - ⚠️ **Response**: Sometimes 502, sometimes valid JSON - ✅ **Response Time**: ~0.17s (when successful) --- ## Identified Issues ### 1. ⚠️ Intermittent Cloudflare 502 Errors (CRITICAL) **Severity**: Medium-High **Impact**: 40-60% of public requests fail **Root Cause**: Cloudflare tunnel connection issues **Status**: Infrastructure issue, not application issue **Evidence**: - Local access works 100% (both translator and Besu) - Public access works only 40-60% - Errors are consistent "502 Bad Gateway" from Cloudflare - Pattern: Random failures, not correlated with time or load - Response times are good when requests succeed **Possible Causes**: 1. Cloudflare tunnel connection pool exhaustion 2. Tunnel timeout settings too aggressive 3. Network latency between Cloudflare edge and origin 4. Tunnel configuration issues 5. Cloudflare edge caching issues **Recommended Actions**: 1. Check Cloudflare tunnel status in dashboard 2. Review tunnel configuration and timeout settings 3. Monitor tunnel connection metrics 4. Consider increasing tunnel connection pool size 5. Implement client-side retry logic as workaround ### 2. ⚠️ WebSocket Not Supported (LOW PRIORITY) **Severity**: Low **Impact**: WebSocket connections fail **Root Cause**: Not configured for WebSocket upgrade **Status**: Expected behavior (HTTP-only endpoint) **Action Required**: Only if WebSocket support is needed - Configure Nginx for WebSocket upgrade - Update RPC Translator to handle WebSocket connections - Test WebSocket endpoint functionality --- ## Recommendations ### Immediate Actions (Priority: High) 1. ⚠️ **Investigate Cloudflare Tunnel** - Check tunnel health and configuration - Review Cloudflare dashboard for tunnel errors - Check tunnel connection pool settings - Verify tunnel timeout configurations - Monitor tunnel metrics for patterns 2. ✅ **Implement Client-Side Retry Logic** - Workaround for 502 errors - Add exponential backoff retry logic - Retry failed requests up to 3 times - Log retry attempts for monitoring 3. ⚠️ **Set Up Monitoring/Alerting** - Track 502 error rates - Alert when 502 rate exceeds 30% - Monitor success rate trends - Track response time patterns ### Short-term Improvements (Priority: Medium) 1. **Health Check Endpoint** - Implement `/health` endpoint - Check translator service status - Check Besu connection - Return service health status 2. **Load Testing** - Understand capacity limits - Test concurrent request handling - Identify bottleneck points - Measure performance under load 3. **Error Logging Enhancement** - Better error tracking - Log all 502 errors with context - Track error patterns and timing - Correlate errors with system metrics ### Long-term Improvements (Priority: Low) 1. **Multiple Tunnel Endpoints** - Redundancy for Cloudflare - Set up secondary tunnel endpoint - Load balance between tunnels - Automatic failover 2. **Direct Connection Option** - Bypass Cloudflare for critical clients - Provide direct IP access for trusted clients - VPN or private network access - Alternative routing paths 3. **WebSocket Support** - If needed for real-time features - Configure Nginx WebSocket upgrade - Update translator for WebSocket - Test and validate WebSocket functionality --- ## Verification Commands ### Test RPC Endpoint ```bash # Single request test curl -X POST https://rpc.public-0138.defi-oracle.io \ -H 'Content-Type: application/json' \ -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' # Multiple requests test for i in {1..10}; do curl -s -X POST https://rpc.public-0138.defi-oracle.io \ -H 'Content-Type: application/json' \ -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \ | grep -q '"result":"0x8a"' && echo "✅ Request $i: Success" || echo "❌ Request $i: Failed" sleep 0.2 done ``` ### Check Service Status ```bash # RPC Translator ssh root@192.168.11.10 "pct exec 2400 -- systemctl status rpc-translator-138" # Besu RPC ssh root@192.168.11.10 "pct exec 2400 -- systemctl status besu-rpc" # Nginx ssh root@192.168.11.10 "pct exec 2400 -- systemctl status nginx" ``` ### Check Logs ```bash # RPC Translator logs (last 10 minutes) ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago'" # Besu logs (last 10 minutes) ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u besu-rpc --since '10 minutes ago'" # Check for errors ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago' | grep -iE '(error|warn|fatal)'" ``` ### Test Local Access ```bash # Direct to translator ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:9545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'" # Direct to Besu ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:8545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'" ``` --- ## Conclusion The RPC endpoint infrastructure is **stable and functional**. All core services (RPC Translator, Besu, Nginx) are healthy and operating correctly. The application stack is production-ready. However, the **Cloudflare tunnel is experiencing significant instability**, causing 40-60% of public requests to fail with 502 errors. This is a **Cloudflare infrastructure issue**, not an application problem, as evidenced by 100% success rate on local access. **Overall Assessment**: - ✅ **Infrastructure**: STABLE - All services healthy - ⚠️ **Public Access**: UNSTABLE - Cloudflare tunnel issues - ✅ **Functionality**: WORKING - All RPC methods functional - ✅ **Performance**: EXCELLENT - Fast response times **Recommendation**: - **For Production Use**: Implement client-side retry logic to handle 502 errors - **For Long-term**: Investigate and resolve Cloudflare tunnel stability issues - **For Monitoring**: Set up alerts for 502 error rates exceeding 30% --- ## Change Log **2026-01-05 09:30 UTC**: - Updated stability metrics based on latest test run - Refined success rate analysis (40-60% public access) - Added detailed issue analysis and recommendations - Enhanced verification commands section - Updated conclusion with actionable recommendations **2026-01-05 09:15 UTC**: - Initial stability report created - Baseline metrics established - Service status documented --- **Next Review**: Monitor for 24 hours to assess Cloudflare tunnel stability patterns and update recommendations accordingly.