RPC Stability Report - rpc.public-0138.defi-oracle.io
Date: 2026-01-05
Time: 09:30 UTC (Updated)
Endpoint: https://rpc.public-0138.defi-oracle.io
Executive Summary
⚠️ Overall Status: FUNCTIONAL with significant Cloudflare tunnel instability
The RPC endpoint infrastructure is healthy and all services are operating correctly. However, the public-facing endpoint experiences frequent 502 errors due to Cloudflare tunnel connectivity issues. Local access works perfectly (100% success rate), confirming the issue is with the Cloudflare tunnel, not the application stack.
Key Findings:
- ✅ All services healthy and stable
- ✅ Local access: 100% success rate
- ⚠️ Public HTTPS: 40-60% success rate (intermittent 502 errors)
- ✅ Response times: Excellent (~0.17s average)
- ✅ All RPC methods functional when requests succeed
Service Status
✅ RPC Translator Service
- Status: Active (running)
- Uptime: ~2h 15min (estimated)
- Memory: 38.9M / 2.0G limit
- PID: 17432
- Location: /opt/rpc-translator-138
- Health: Excellent - processing all requests successfully
✅ Besu RPC Service
- Status: Active (running)
- Uptime: ~2h 30min (estimated)
- Memory: 4.0G
- PID: 16902
- Block Height: ~603,043+ (synchronized)
- Peers: 11 connected
- Health: Excellent - blocks importing normally
✅ Nginx Service
- Status: Active (running)
- Uptime: 3+ days
- Memory: 30.3M
- Workers: 4 active
- Health: Excellent - proxying correctly
System Health
Resource Usage
- Disk: 3% used (182GB available) ✅ Excellent
- Memory: 4.2GB used / 16GB total (11GB available) ✅ Healthy
- Load Average: 10.47, 9.39, 9.45 ⚠️ High but manageable
- CPU: Normal usage patterns
System Uptime
- Uptime: 3+ days, 10+ hours
- Status: Stable and reliable
RPC Method Testing Results
✅ Verified Working Methods
| Method | Status | Sample Result | Notes |
|---|---|---|---|
| eth_chainId | ✅ Working | 0x8a (138) | Consistent when requests succeed |
| eth_blockNumber | ✅ Working | 0x933d1 (~603,089) | Returns current block |
| net_version | ✅ Working | 138 | Correct chain ID |
| eth_syncing | ✅ Working | Sync status | Returns false when synced |
| eth_gasPrice | ✅ Working | Gas price | Returns current gas price |
| eth_getBalance | ✅ Working | Balance | Returns account balance |
| eth_call | ✅ Working | Call result | Executes contract calls |
⚠️ Known Issues
- WebSocket Endpoint: Returns 502 (not configured for WebSocket upgrade)
  - Impact: Low - HTTP-only endpoint expected
  - Action: Configure WebSocket upgrade if needed
- Intermittent 502 Errors: Frequent Cloudflare tunnel failures
  - Impact: Medium - Affects 40-60% of public requests
  - Action: Investigate Cloudflare tunnel configuration
Performance Metrics
Response Times (Successful Requests)
- Average: 0.167 seconds
- Min: ~0.15 seconds
- Max: ~0.20 seconds
- Status: ✅ Excellent - Well within acceptable range for RPC calls
Success Rate Analysis
- Local Access (Direct to Translator): 100% ✅
  - Port 9545: All requests succeed
  - Response: Valid JSON-RPC responses
- Local Access (Direct to Besu): 100% ✅
  - Port 8545: All requests succeed
  - Response: Valid JSON-RPC responses
- Public HTTPS (via Cloudflare): 40-60% ⚠️
  - Intermittent 502 errors
  - Pattern: Random failures, not time-based
  - Root cause: Cloudflare tunnel connectivity
Test Results Summary
Latest Test Run (20 requests):
- Success: ~8-12 requests (40-60%)
- Failed: ~8-12 requests (40-60%)
- Error: "502 Bad Gateway" from Cloudflare
Log Analysis
RPC Translator Logs (Last 10 minutes)
- ✅ All requests processed successfully
- ✅ No errors or exceptions
- ✅ No warnings or fatal errors
- ✅ Methods handled: eth_chainId, eth_blockNumber, eth_syncing, net_version, eth_call, eth_getBalance, eth_gasPrice
- ✅ Request tracking: UUID-based logging working correctly
Besu Logs (Last 10 minutes)
- ✅ Blocks importing normally
- ✅ No errors or warnings
- ✅ Network synchronized (11 peers)
- ✅ Block height progressing: ~603,043+
- ✅ Transaction processing: Normal
Nginx Logs
- ✅ No errors in recent logs
- ✅ Requests proxied successfully
- ✅ No connection errors
- ✅ Worker processes healthy
Connectivity Tests
Local Access (Direct to Translator)
```shell
curl -X POST http://127.0.0.1:9545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
- ✅ Status: Working perfectly
- ✅ Success Rate: 100%
- ✅ Response: Valid JSON-RPC responses
- ✅ Response Time: <0.1s
Local Access (Direct to Besu)
```shell
curl -X POST http://127.0.0.1:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
- ✅ Status: Working perfectly
- ✅ Success Rate: 100%
- ✅ Response: Valid JSON-RPC responses
- ✅ Response Time: <0.1s
Public HTTPS (via Cloudflare)
```shell
curl -X POST https://rpc.public-0138.defi-oracle.io \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
- ⚠️ Status: Intermittent
- ⚠️ Success Rate: 40-60%
- ⚠️ Response: Sometimes 502, sometimes valid JSON
- ✅ Response Time: ~0.17s (when successful)
Identified Issues
1. ⚠️ Intermittent Cloudflare 502 Errors (HIGH PRIORITY)
Severity: Medium-High
Impact: 40-60% of public requests fail
Root Cause: Cloudflare tunnel connection issues
Status: Infrastructure issue, not application issue
Evidence:
- Local access works 100% (both translator and Besu)
- Public access works only 40-60%
- Errors are consistent "502 Bad Gateway" from Cloudflare
- Pattern: Random failures, not correlated with time or load
- Response times are good when requests succeed
Possible Causes:
- Cloudflare tunnel connection pool exhaustion
- Tunnel timeout settings too aggressive
- Network latency between Cloudflare edge and origin
- Tunnel configuration issues
- Cloudflare edge caching issues
Recommended Actions:
- Check Cloudflare tunnel status in dashboard
- Review tunnel configuration and timeout settings
- Monitor tunnel connection metrics
- Consider increasing tunnel connection pool size
- Implement client-side retry logic as workaround
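The retry workaround recommended above can be sketched in a few lines of POSIX shell. The endpoint URL comes from this report; the retry count (3) and the 1s/2s backoff schedule are illustrative assumptions, not tuned values.

```shell
#!/bin/sh
# Minimal sketch of client-side retry with exponential backoff for the
# intermittent 502 errors. Retry count and delays are assumptions.
RPC_URL="${RPC_URL:-https://rpc.public-0138.defi-oracle.io}"

rpc_with_retry() {
  # $1 = JSON-RPC request body; up to 3 attempts with doubling backoff
  body="$1"
  delay=1
  attempt=1
  while [ "$attempt" -le 3 ]; do
    response=$(curl -s -X POST "$RPC_URL" \
      -H 'Content-Type: application/json' \
      -d "$body")
    # Any response carrying a "result" field counts as success
    if printf '%s' "$response" | grep -q '"result"'; then
      printf '%s\n' "$response"
      return 0
    fi
    if [ "$attempt" -lt 3 ]; then
      echo "attempt $attempt failed, retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Example call: `rpc_with_retry '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'`.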
2. ⚠️ WebSocket Not Supported (LOW PRIORITY)
Severity: Low
Impact: WebSocket connections fail
Root Cause: Not configured for WebSocket upgrade
Status: Expected behavior (HTTP-only endpoint)
Action Required: Only if WebSocket support is needed
- Configure Nginx for WebSocket upgrade
- Update RPC Translator to handle WebSocket connections
- Test WebSocket endpoint functionality
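If WebSocket support is ever required, the Nginx side is a standard upgrade pass-through. A hedged sketch, assuming the translator on 127.0.0.1:9545 is the upstream; the location path and timeout value are placeholders to be merged into the real vhost config, not the live configuration:

```nginx
# Sketch only: WebSocket upgrade pass-through to the RPC translator.
# Path, upstream, and timeout are assumptions -- adapt to the actual vhost.
location / {
    proxy_pass http://127.0.0.1:9545;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 300s;
}
```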
Recommendations
Immediate Actions (Priority: High)
- ⚠️ Investigate Cloudflare Tunnel - Check tunnel health and configuration
  - Review Cloudflare dashboard for tunnel errors
  - Check tunnel connection pool settings
  - Verify tunnel timeout configurations
  - Monitor tunnel metrics for patterns
- ✅ Implement Client-Side Retry Logic - Workaround for 502 errors
  - Add exponential backoff retry logic
  - Retry failed requests up to 3 times
  - Log retry attempts for monitoring
- ⚠️ Set Up Monitoring/Alerting - Track 502 error rates
  - Alert when 502 rate exceeds 30%
  - Monitor success rate trends
  - Track response time patterns
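The monitoring item above can start as a simple probe script. The endpoint and the 30% threshold come from this report; the probe count (20) and the plain-stderr "ALERT" line are assumptions to be wired into a real alerting channel.

```shell
#!/bin/sh
# Sketch: probe the endpoint N times and flag when the failure rate exceeds
# the 30% alert threshold. Probe count and alert output are placeholders.
RPC_URL="${RPC_URL:-https://rpc.public-0138.defi-oracle.io}"

check_502_rate() {
  n=${1:-20}          # number of probe requests
  threshold=${2:-30}  # alert threshold, percent
  failures=0
  i=1
  while [ "$i" -le "$n" ]; do
    curl -s -X POST "$RPC_URL" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
      | grep -q '"result"' || failures=$((failures + 1))
    i=$((i + 1))
  done
  rate=$((failures * 100 / n))
  echo "failure rate: ${rate}% (${failures}/${n})"
  if [ "$rate" -gt "$threshold" ]; then
    echo "ALERT: failure rate ${rate}% exceeds ${threshold}%" >&2
  fi
}
```

Run periodically (e.g. from cron) as `check_502_rate 20 30`.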
Short-term Improvements (Priority: Medium)
- Health Check Endpoint - Implement a /health endpoint
  - Check translator service status
  - Check Besu connection
  - Return service health status
- Load Testing - Understand capacity limits
  - Test concurrent request handling
  - Identify bottleneck points
  - Measure performance under load
- Error Logging Enhancement - Better error tracking
  - Log all 502 errors with context
  - Track error patterns and timing
  - Correlate errors with system metrics
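The health-check item above could begin as a shell aggregator before being folded into the translator itself. A sketch, assuming the local ports documented in this report (9545 translator, 8545 Besu); the JSON output shape is an assumption, not an existing interface.

```shell
#!/bin/sh
# Sketch of a /health-style aggregator: probe the translator and Besu
# locally and emit a one-line JSON summary. Output shape is an assumption.
probe() {
  # $1 = local RPC URL; succeeds when a JSON-RPC "result" comes back
  curl -s -m 2 -X POST "$1" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    | grep -q '"result"'
}

health_check() {
  translator=down; besu=down
  probe http://127.0.0.1:9545 && translator=up
  probe http://127.0.0.1:8545 && besu=up
  status=degraded
  [ "$translator" = up ] && [ "$besu" = up ] && status=ok
  printf '{"status":"%s","translator":"%s","besu":"%s"}\n' \
    "$status" "$translator" "$besu"
}
```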
Long-term Improvements (Priority: Low)
- Multiple Tunnel Endpoints - Redundancy for Cloudflare
  - Set up secondary tunnel endpoint
  - Load balance between tunnels
  - Automatic failover
- Direct Connection Option - Bypass Cloudflare for critical clients
  - Provide direct IP access for trusted clients
  - VPN or private network access
  - Alternative routing paths
- WebSocket Support - If needed for real-time features
  - Configure Nginx WebSocket upgrade
  - Update translator for WebSocket
  - Test and validate WebSocket functionality
Verification Commands
Test RPC Endpoint
```shell
# Single request test
curl -X POST https://rpc.public-0138.defi-oracle.io \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

# Multiple requests test
for i in {1..10}; do
  curl -s -X POST https://rpc.public-0138.defi-oracle.io \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    | grep -q '"result":"0x8a"' && echo "✅ Request $i: Success" || echo "❌ Request $i: Failed"
  sleep 0.2
done
```
Check Service Status
```shell
# RPC Translator
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status rpc-translator-138"

# Besu RPC
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status besu-rpc"

# Nginx
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status nginx"
```
Check Logs
```shell
# RPC Translator logs (last 10 minutes)
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago'"

# Besu logs (last 10 minutes)
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u besu-rpc --since '10 minutes ago'"

# Check for errors
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago' | grep -iE '(error|warn|fatal)'"
```
Test Local Access
```shell
# Direct to translator
ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:9545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'"

# Direct to Besu
ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:8545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'"
```
Conclusion
The RPC endpoint infrastructure is stable and functional. All core services (RPC Translator, Besu, Nginx) are healthy and operating correctly. The application stack is production-ready.
However, the Cloudflare tunnel is experiencing significant instability, causing 40-60% of public requests to fail with 502 errors. This is a Cloudflare infrastructure issue, not an application problem, as evidenced by 100% success rate on local access.
Overall Assessment:
- ✅ Infrastructure: STABLE - All services healthy
- ⚠️ Public Access: UNSTABLE - Cloudflare tunnel issues
- ✅ Functionality: WORKING - All RPC methods functional
- ✅ Performance: EXCELLENT - Fast response times
Recommendation:
- For Production Use: Implement client-side retry logic to handle 502 errors
- For Long-term: Investigate and resolve Cloudflare tunnel stability issues
- For Monitoring: Set up alerts for 502 error rates exceeding 30%
Change Log
2026-01-05 09:30 UTC:
- Updated stability metrics based on latest test run
- Refined success rate analysis (40-60% public access)
- Added detailed issue analysis and recommendations
- Enhanced verification commands section
- Updated conclusion with actionable recommendations
2026-01-05 09:15 UTC:
- Initial stability report created
- Baseline metrics established
- Service status documented
Next Review: Monitor for 24 hours to assess Cloudflare tunnel stability patterns and update recommendations accordingly.