RPC Stability Report - rpc.public-0138.defi-oracle.io

Date: 2026-01-05
Time: 09:30 UTC (Updated)
Endpoint: https://rpc.public-0138.defi-oracle.io


Executive Summary

⚠️ Overall Status: FUNCTIONAL with significant Cloudflare tunnel instability

The RPC endpoint's services are healthy and operating correctly. However, the public-facing endpoint returns frequent 502 errors. Direct local requests to the translator and to Besu succeed 100% of the time, and the Nginx logs show no errors, which points to the Cloudflare tunnel rather than the application stack.

Key Findings:

  • All services healthy and stable
  • Local access: 100% success rate
  • ⚠️ Public HTTPS: 40-60% success rate (intermittent 502 errors)
  • Response times: Excellent (~0.17s average)
  • All RPC methods functional when requests succeed

Service Status

RPC Translator Service

  • Status: Active (running)
  • Uptime: ~2h 15min (estimated)
  • Memory: 38.9M / 2.0G limit
  • PID: 17432
  • Location: /opt/rpc-translator-138
  • Health: Excellent - processing all requests successfully

Besu RPC Service

  • Status: Active (running)
  • Uptime: ~2h 30min (estimated)
  • Memory: 4.0G
  • PID: 16902
  • Block Height: ~603,043+ (synchronized)
  • Peers: 11 connected
  • Health: Excellent - blocks importing normally

Nginx Service

  • Status: Active (running)
  • Uptime: 3+ days
  • Memory: 30.3M
  • Workers: 4 active
  • Health: Excellent - proxying correctly

System Health

Resource Usage

  • Disk: 3% used (182GB available) Excellent
  • Memory: 4.2GB used / 16GB total (11GB available) Healthy
  • Load Average: 10.47, 9.39, 9.45 (1, 5, 15 min) ⚠️ High; whether this is manageable depends on the container's CPU core count
  • CPU: Normal usage patterns
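A load average is easier to judge once it is divided by the number of CPU cores, which this report does not record. The sketch below assumes a Linux host with `/proc/loadavg` and `nproc` available; the function name `load_per_core` is illustrative only.

```bash
# Illustrative helper (not part of the deployed tooling): normalize a load
# average by the number of CPU cores. Values near or below 1.0 per core
# generally mean the run queue is keeping up.
load_per_core() {
  # $1 = load average (e.g. 10.47), $2 = core count (e.g. output of nproc)
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on the host being checked (Linux):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A result well above 1.0 would suggest CPU contention worth investigating alongside the tunnel issue.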

System Uptime

  • Uptime: 3+ days, 10+ hours
  • Status: Stable and reliable

RPC Method Testing Results

Verified Working Methods

| Method | Status | Sample Result | Notes |
|---|---|---|---|
| eth_chainId | Working | 0x8a (138) | Consistent when requests succeed |
| eth_blockNumber | Working | 0x933d1 (603,089) | Returns current block |
| net_version | Working | 138 | Network ID; matches chain ID 138 |
| eth_syncing | Working | Sync status | Returns false when synced |
| eth_gasPrice | Working | Gas price | Returns current gas price |
| eth_getBalance | Working | Balance | Returns account balance |
| eth_call | Working | Call result | Executes contract calls |
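JSON-RPC quantities such as the chain ID and block number come back as 0x-prefixed hex strings. A minimal way to decode them for comparison with the values above, using only the shell's `printf` (the helper name `hex_to_dec` is an assumption for illustration):

```bash
# Convert a 0x-prefixed JSON-RPC quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

hex_to_dec 0x8a      # result of eth_chainId
hex_to_dec 0x933d1   # result of eth_blockNumber
```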

⚠️ Known Issues

  • WebSocket Endpoint: Returns 502 (not configured for WebSocket upgrade)

    • Impact: Low - HTTP-only endpoint expected
    • Action: Configure WebSocket upgrade if needed
  • Intermittent 502 Errors: Frequent Cloudflare tunnel failures

    • Impact: Medium-High - Affects 40-60% of public requests
    • Action: Investigate Cloudflare tunnel configuration

Performance Metrics

Response Times (Successful Requests)

  • Average: 0.167 seconds
  • Min: ~0.15 seconds
  • Max: ~0.20 seconds
  • Status: Excellent - Well within acceptable range for RPC calls

Success Rate Analysis

  • Local Access (Direct to Translator): 100%

    • Port 9545: All requests succeed
    • Response: Valid JSON-RPC responses
  • Local Access (Direct to Besu): 100%

    • Port 8545: All requests succeed
    • Response: Valid JSON-RPC responses
  • Public HTTPS (via Cloudflare): 40-60% ⚠️

    • Intermittent 502 errors
    • Pattern: Random failures, not time-based
    • Root cause: Cloudflare tunnel connectivity

Test Results Summary

Latest Test Run (20 requests):

  • Success: ~8-12 requests (40-60%)
  • Failed: ~8-12 requests (40-60%)
  • Error: "502 Bad Gateway" from Cloudflare
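A success rate like the one above can be reproduced by recording the HTTP status code of each request and tallying the results. The following is a sketch rather than the script used for this report; the function names are hypothetical, and it counts only HTTP 200 as success (a 200 response could still carry a JSON-RPC error body).

```bash
# Hypothetical helper: read one HTTP status code per line on stdin and
# print "<ok> <total> <percent>", where ok counts 200 responses.
success_rate() {
  awk '
    { total++; if ($1 == 200) ok++ }
    END {
      if (total == 0) { print "0 0 0"; exit }
      printf "%d %d %d\n", ok, total, (ok * 100) / total
    }'
}

# Send N eth_chainId requests and emit one status code per line.
probe_status_codes() {
  # $1 = endpoint URL, $2 = number of requests
  i=0
  while [ "$i" -lt "$2" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
      -X POST "$1" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
    i=$((i + 1))
  done
}

# Usage:
#   probe_status_codes https://rpc.public-0138.defi-oracle.io 20 | success_rate
```

Keeping the tally separate from the probe makes it possible to feed the same counter from saved logs.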

Log Analysis

RPC Translator Logs (Last 10 minutes)

  • All requests processed successfully
  • No errors or exceptions
  • No warnings or fatal errors
  • Methods handled: eth_chainId, eth_blockNumber, eth_syncing, net_version, eth_call, eth_getBalance, eth_gasPrice
  • Request tracking: UUID-based logging working correctly

Besu Logs (Last 10 minutes)

  • Blocks importing normally
  • No errors or warnings
  • Network synchronized (11 peers)
  • Block height progressing: ~603,043+
  • Transaction processing: Normal

Nginx Logs

  • No errors in recent logs
  • Requests proxied successfully
  • No connection errors
  • Worker processes healthy

Connectivity Tests

Local Access (Direct to Translator)

curl -X POST http://127.0.0.1:9545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

  • Status: Working perfectly
  • Success Rate: 100%
  • Response: Valid JSON-RPC responses
  • Response Time: <0.1s

Local Access (Direct to Besu)

curl -X POST http://127.0.0.1:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

  • Status: Working perfectly
  • Success Rate: 100%
  • Response: Valid JSON-RPC responses
  • Response Time: <0.1s

Public HTTPS (via Cloudflare)

curl -X POST https://rpc.public-0138.defi-oracle.io \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

  • ⚠️ Status: Intermittent
  • ⚠️ Success Rate: 40-60%
  • ⚠️ Response: Sometimes 502, sometimes valid JSON
  • Response Time: ~0.17s (when successful)

Identified Issues

1. ⚠️ Intermittent Cloudflare 502 Errors (HIGH PRIORITY)

Severity: Medium-High
Impact: 40-60% of public requests fail
Root Cause: Cloudflare tunnel connection issues
Status: Infrastructure issue, not application issue

Evidence:

  • Local access works 100% (both translator and Besu)
  • Public access works only 40-60%
  • Errors are consistently "502 Bad Gateway" responses from Cloudflare
  • Pattern: Random failures, not correlated with time or load
  • Response times are good when requests succeed
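To check whether failures really are uncorrelated with time, each probe can be logged with a timestamp, status code, response time, and the `CF-Ray` header that Cloudflare attaches to its responses. This is a sketch under the assumption that the header is present on the 502 responses as well; the function names are illustrative.

```bash
# Hypothetical helper: pull the CF-Ray value out of raw HTTP response
# headers supplied on stdin (header names are case-insensitive).
extract_cf_ray() {
  awk 'tolower($1) == "cf-ray:" { sub(/\r$/, "", $2); print $2; exit }'
}

# Send one request and print "timestamp status seconds cf-ray".
log_probe() {
  # $1 = endpoint URL
  hdrs=$(mktemp)
  out=$(curl -s -o /dev/null -D "$hdrs" --max-time 10 \
    -w '%{http_code} %{time_total}' \
    -X POST "$1" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}')
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$out" \
    "$(extract_cf_ray < "$hdrs")"
  rm -f "$hdrs"
}

# Usage:
#   log_probe https://rpc.public-0138.defi-oracle.io >> probe.log
```

The logged Ray IDs can then be matched against entries in the Cloudflare dashboard when investigating individual failures.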

Possible Causes:

  1. Cloudflare tunnel connection pool exhaustion
  2. Tunnel timeout settings too aggressive
  3. Network latency between Cloudflare edge and origin
  4. Tunnel configuration issues
  5. Cloudflare edge caching issues (less likely, since Cloudflare does not cache POST requests by default)

Recommended Actions:

  1. Check Cloudflare tunnel status in dashboard
  2. Review tunnel configuration and timeout settings
  3. Monitor tunnel connection metrics
  4. Consider increasing tunnel connection pool size
  5. Implement client-side retry logic as workaround
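The client-side retry workaround could look roughly like the following. It is a sketch, not the project's client code: the function names, the retry budget, and the integer starting delay are all assumptions chosen for illustration.

```bash
# Run a command, retrying on failure with exponential backoff.
# $1 = max retries, $2 = initial delay in whole seconds, rest = command.
retry_with_backoff() {
  max_retries=$1
  delay=$2
  shift 2
  attempt=0
  until "$@"; do
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "giving up after $attempt retries" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retry $attempt/$max_retries in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))   # double the wait before the next attempt
  done
}

# One eth_chainId request; --fail makes curl exit non-zero on HTTP errors
# such as 502, which is what triggers the retry.
rpc_chain_id() {
  curl -s --fail --max-time 10 -X POST "$1" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
}

# Usage:
#   retry_with_backoff 3 1 rpc_chain_id https://rpc.public-0138.defi-oracle.io
```

Retries hide the symptom from clients but do not reduce the underlying failure rate, so they complement rather than replace the tunnel investigation.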

2. ⚠️ WebSocket Not Supported (LOW PRIORITY)

Severity: Low
Impact: WebSocket connections fail
Root Cause: Not configured for WebSocket upgrade
Status: Expected behavior (HTTP-only endpoint)

Action Required: Only if WebSocket support is needed

  • Configure Nginx for WebSocket upgrade
  • Update RPC Translator to handle WebSocket connections
  • Test WebSocket endpoint functionality
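Whether the endpoint accepts a WebSocket upgrade can be probed from the command line by sending the opening handshake headers and inspecting the status code. The sketch below uses the sample key from RFC 6455; the function names are hypothetical, and it assumes curl still reports the status code if the transfer is cut off by `--max-time`.

```bash
# Attempt a WebSocket opening handshake and print the HTTP status code.
# 101 means the upgrade was accepted; anything else means it was not.
ws_handshake_status() {
  # $1 = endpoint URL; --http1.1 because the Upgrade mechanism used here
  # is an HTTP/1.1 feature.
  curl -s -o /dev/null --http1.1 --max-time 5 \
    -w '%{http_code}\n' \
    -H 'Connection: Upgrade' \
    -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
    "$1"
}

# Translate the status code into a short human-readable verdict.
describe_ws_status() {
  case "$1" in
    101) echo "upgrade accepted" ;;
    502) echo "bad gateway: upgrade not proxied" ;;
    *)   echo "upgrade not accepted (HTTP $1)" ;;
  esac
}

# Usage:
#   describe_ws_status "$(ws_handshake_status https://rpc.public-0138.defi-oracle.io)"
```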

Recommendations

Immediate Actions (Priority: High)

  1. ⚠️ Investigate Cloudflare Tunnel - Check tunnel health and configuration

    • Review Cloudflare dashboard for tunnel errors
    • Check tunnel connection pool settings
    • Verify tunnel timeout configurations
    • Monitor tunnel metrics for patterns
  2. Implement Client-Side Retry Logic - Workaround for 502 errors

    • Add exponential backoff retry logic
    • Retry failed requests up to 3 times
    • Log retry attempts for monitoring
  3. ⚠️ Set Up Monitoring/Alerting - Track 502 error rates

    • Alert when 502 rate exceeds 30%
    • Monitor success rate trends
    • Track response time patterns
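The 30% alert threshold can be evaluated with integer arithmetic alone. The helper below is a hypothetical sketch; how the failure counts are collected and how the alert is delivered are left open.

```bash
# Return success (exit 0) when the failure rate exceeds the threshold.
# $1 = failed requests, $2 = total requests, $3 = threshold percent.
should_alert() {
  [ "$2" -gt 0 ] || return 1
  # Integer form of: failed / total > threshold / 100
  [ $(( $1 * 100 )) -gt $(( $3 * $2 )) ]
}

# Example with 12 failures out of 20 requests and a 30% threshold.
if should_alert 12 20 30; then
  echo "ALERT: 502 rate above 30%"
fi
```

Because the comparison is strict, a failure rate of exactly 30% does not trigger the alert.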

Short-term Improvements (Priority: Medium)

  1. Health Check Endpoint - Implement /health endpoint

    • Check translator service status
    • Check Besu connection
    • Return service health status
  2. Load Testing - Understand capacity limits

    • Test concurrent request handling
    • Identify bottleneck points
    • Measure performance under load
  3. Error Logging Enhancement - Better error tracking

    • Log all 502 errors with context
    • Track error patterns and timing
    • Correlate errors with system metrics
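Until a /health endpoint exists in the translator, the same checks can be approximated from a script. This sketch assumes the ports documented in this report (9545 for the translator, 8545 for Besu); the function names and the JSON shape are illustrative, not an existing API.

```bash
# Succeeds if the response body on stdin is a JSON-RPC reply whose result
# equals the expected chain ID (0x8a = 138).
is_expected_chain() {
  grep -q '"result":"0x8a"'
}

# Query one upstream; prints "ok" or "fail".
check_upstream() {
  # $1 = base URL, e.g. http://127.0.0.1:9545
  if curl -s --fail --max-time 5 -X POST "$1" \
       -H 'Content-Type: application/json' \
       -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
     | is_expected_chain; then
    echo ok
  else
    echo fail
  fi
}

# Combine the two checks into a small JSON document.
health_json() {
  # $1 = translator status, $2 = Besu status
  overall=ok
  [ "$1" = ok ] && [ "$2" = ok ] || overall=degraded
  printf '{"status":"%s","translator":"%s","besu":"%s"}\n' "$overall" "$1" "$2"
}

# Usage (inside the container):
#   health_json "$(check_upstream http://127.0.0.1:9545)" \
#               "$(check_upstream http://127.0.0.1:8545)"
```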

Long-term Improvements (Priority: Low)

  1. Multiple Tunnel Endpoints - Redundancy for Cloudflare

    • Set up secondary tunnel endpoint
    • Load balance between tunnels
    • Automatic failover
  2. Direct Connection Option - Bypass Cloudflare for critical clients

    • Provide direct IP access for trusted clients
    • VPN or private network access
    • Alternative routing paths
  3. WebSocket Support - If needed for real-time features

    • Configure Nginx WebSocket upgrade
    • Update translator for WebSocket
    • Test and validate WebSocket functionality

Verification Commands

Test RPC Endpoint

# Single request test
curl -X POST https://rpc.public-0138.defi-oracle.io \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

# Multiple requests test
for i in {1..10}; do
  curl -s -X POST https://rpc.public-0138.defi-oracle.io \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    | grep -q '"result":"0x8a"' && echo "✅ Request $i: Success" || echo "❌ Request $i: Failed"
  sleep 0.2
done

Check Service Status

# RPC Translator
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status rpc-translator-138"

# Besu RPC
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status besu-rpc"

# Nginx
ssh root@192.168.11.10 "pct exec 2400 -- systemctl status nginx"

Check Logs

# RPC Translator logs (last 10 minutes)
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago'"

# Besu logs (last 10 minutes)
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u besu-rpc --since '10 minutes ago'"

# Check for errors
ssh root@192.168.11.10 "pct exec 2400 -- journalctl -u rpc-translator-138 --since '10 minutes ago' | grep -iE '(error|warn|fatal)'"

Test Local Access

# Direct to translator
ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:9545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'"

# Direct to Besu
ssh root@192.168.11.10 "pct exec 2400 -- curl -X POST http://127.0.0.1:8545 -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"eth_chainId\",\"params\":[],\"id\":1}'"

Conclusion

The RPC endpoint infrastructure is stable and functional. All core services (RPC Translator, Besu, Nginx) are healthy and operating correctly. The application stack is production-ready.

However, 40-60% of public requests fail with 502 errors. Because direct local access succeeds 100% of the time, the fault most likely lies in the Cloudflare tunnel path between the edge and the origin rather than in the application.

Overall Assessment:

  • Infrastructure: STABLE - All services healthy
  • ⚠️ Public Access: UNSTABLE - Cloudflare tunnel issues
  • Functionality: WORKING - All RPC methods functional
  • Performance: EXCELLENT - Fast response times

Recommendation:

  • For Production Use: Implement client-side retry logic to handle 502 errors
  • For Long-term: Investigate and resolve Cloudflare tunnel stability issues
  • For Monitoring: Set up alerts for 502 error rates exceeding 30%

Change Log

2026-01-05 09:30 UTC:

  • Updated stability metrics based on latest test run
  • Refined success rate analysis (40-60% public access)
  • Added detailed issue analysis and recommendations
  • Enhanced verification commands section
  • Updated conclusion with actionable recommendations

2026-01-05 09:15 UTC:

  • Initial stability report created
  • Baseline metrics established
  • Service status documented

Next Review: Monitor for 24 hours to assess Cloudflare tunnel stability patterns and update recommendations accordingly.