Ingress Architecture Risks and Hardening
Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation
Date: 2026-01-20
Status: Complete Risk Assessment
Purpose: Identify risks and hardening opportunities for ingress architecture
Overview
This document identifies risks and hardening opportunities for the ingress architecture:
Cloudflare DNS → UDM Pro port-forward → NPMplus (reverse proxy + SSL termination) → Backend VMs/services (nginx or direct ports)
Scope: Identifies risks and provides hardening recommendations without breaking production.
Identified Risks
Risk 1: Single Point of Failure - NPMplus
Severity: High
Component: NPMplus (VMID 10233)
Status: Current
Description:
- NPMplus is a single reverse proxy container
- All ingress traffic depends on one container
- If NPMplus fails, all public-facing services become unavailable
Impact:
- Complete ingress outage if NPMplus container fails
- No redundancy or failover
- Single container failure affects all 19 domains
Mitigation (Current):
- Container is monitored and backed up
- Configuration is documented and can be restored
- Container is running on stable Proxmox host (r630-01)
Hardening Opportunities:
- ✅ HA Setup Guide Created: Complete guide available at docs/04-configuration/NPMPLUS_HA_SETUP_GUIDE.md
- Deploy HA NPMplus instance (active-passive with Keepalived)
- Set up automatic failover (Keepalived virtual IP)
- Document manual failover procedures (done in backup/restore guide)
Recommendation:
- Review and implement HA setup guide during next maintenance window
- Set up container health monitoring
- Regular backups (done in backup/restore guide)
HA Implementation: See docs/04-configuration/NPMPLUS_HA_SETUP_GUIDE.md for complete step-by-step instructions.
Risk 2: DNS-Only Mode (No Cloudflare Proxy/WAF)
Severity: Medium
Component: Cloudflare DNS
Status: Intentional Configuration
Description:
- All DNS records use "DNS Only" mode (gray cloud)
- No Cloudflare proxy, WAF, or DDoS protection
- Origin IP (76.53.10.36) exposed directly
Impact:
- No DDoS protection from Cloudflare
- No WAF rules for application-layer attacks
- Origin IPs visible to attackers
- No CDN caching
Rationale (Intentional):
- Direct SSL termination at NPMplus required
- Cloudflare proxy would interfere with Let's Encrypt validation
- Allows direct control over SSL certificates
Hardening Opportunities (without breaking production):
- Enable Cloudflare Access for Admin Portals:
  - Add authentication layer for dbis-admin.d-bis.org
  - Add authentication layer for secure.d-bis.org
  - Intended to avoid changing DNS proxy status; verify this first, because Cloudflare Access only enforces policy on traffic that passes through Cloudflare (a proxied record or a Cloudflare Tunnel), which DNS-only records bypass
- Implement Rate Limiting at NPMplus:
  - Add rate limiting for RPC endpoints (especially public RPC)
  - Configure rate limiting per IP or per domain
  - Does not require changing DNS configuration
- Monitor and Alert on Unusual Traffic:
  - Set up log aggregation for NPMplus access logs
  - Configure alerts for unusual traffic patterns
  - Detect DDoS attempts early
Not in Scope (would require production changes):
- Enabling Cloudflare proxy (would require changing SSL termination)
- Changing to Cloudflare SSL (would require certificate changes)
Recommendation:
- Implement rate limiting for RPC endpoints
- Set up Cloudflare Access for admin portals
- Monitor traffic patterns and set up alerts
Risk 3: Certificate Expiration
Severity: Medium
Component: SSL Certificates
Status: Current
Description:
- All 19 SSL certificates expire on 2026-04-16
- Auto-renewal enabled but could fail
- Certificate failure would cause HTTPS outages
Impact:
- Services become inaccessible if certificates expire
- Browser warnings if certificates invalid
- All domains affected simultaneously (same expiration date)
Current Mitigation:
- Auto-renewal enabled in NPMplus
- Let's Encrypt handles renewal automatically
- Certificates valid until 2026-04-16
Hardening Opportunities (without breaking production):
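- Certificate Expiration Monitoring:
  - Set up alerts 90/60/30 days before expiration (Let's Encrypt certificates are valid for 90 days, so a 90-day threshold fires on every freshly issued certificate; only the shorter thresholds give a useful signal)
  - Monitor certificate status via NPMplus API
  - Alert if auto-renewal fails
- Certificate Verification Scripts:
  - Regular verification of certificate validity
  - Automated checks for certificate expiration
  - Integration with monitoring systems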
Recommendation:
- Set up certificate expiration alerts
- Regular verification of certificate status
- Document manual renewal procedures (done in backup/restore guide)
Risk 4: Sankofa Routing Issue
Severity: High
Component: Backend Routing
Status: Known, Cutover Plan in Place
Description:
- 5 Sankofa domains currently route to Blockscout (192.168.11.140) because the Sankofa services are not yet deployed
- Incorrect routing prevents Sankofa services from working
- Users may access wrong content
Impact:
- Sankofa domains don't work as intended
- Incorrect content served (Blockscout instead of Sankofa)
- SSL certificates exist but services not available
Current Status:
- Known issue documented
- Cutover plan created (see SANKOFA_CUTOVER_PLAN.md)
- Waiting for Sankofa service deployment
Mitigation:
- Cutover plan in place
- Will update routing once services deployed
- Temporary routing keeps domains accessible (though incorrect)
Recommendation:
- Complete Sankofa service deployment
- Execute cutover plan when services ready
- Update source-of-truth after cutover
Risk 5: UDM Pro Port Forwarding - Manual Configuration
Severity: Medium
Component: Edge Routing
Status: Current
Description:
- Port forwarding configured manually via UDM Pro web UI
- No automation or API access
- Risk of misconfiguration during changes
Impact:
- Manual errors during configuration changes
- No version control or audit trail
- Difficult to verify configuration matches documentation
Hardening Opportunities (without breaking production):
- Document Exact Steps:
  - Create detailed configuration guide
  - Document exact values for port forwarding rules
  - Create verification checklist
- Verification Procedures:
  - Regular verification of port forwarding rules
  - Screenshot evidence of configuration
  - Automated connectivity tests
Recommendation:
- Document exact port forwarding steps (done in verification runbook)
- Regular verification of configuration
- Screenshot evidence stored
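The "automated connectivity tests" item could be covered by a small script along these lines. This is a minimal sketch, not the project's existing runbook tooling: it assumes bash and the coreutils `timeout` command are available, and it assumes ports 80 and 443 are the forwarded ports (the document does not list them). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: verify that forwarded ports accept TCP connections.
# Assumptions: bash (for /dev/tcp) and coreutils `timeout` are installed;
# the port list is illustrative and must match the real UDM Pro rules.

# Return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds.
port_open() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one OK/FAIL line per port; succeed only if every port is reachable.
check_forwarded_ports() {
  local host="$1" failures=0 port
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Example (run from outside the LAN, ports assumed):
#   check_forwarded_ports 76.53.10.36 80 443
```

Running the check from a host outside the LAN matters here, because a test from inside the network may be answered by hairpin NAT or reach NPMplus directly and would not prove that the port-forward rule itself works.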
Risk 6: Backend VM Direct Access (No Nginx)
Severity: Low-Medium
Component: Backend VMs
Status: Intentional Configuration
Description:
- Some VMs accessible directly (no nginx layer)
- Besu RPC nodes (2101, 2201) expose ports 8545/8546 directly
- Node.js APIs (10150, 10151) expose port 3000 directly
Impact:
- Direct exposure of application ports
- No additional security layer (nginx headers, rate limiting)
- Application-level security only
Rationale (Intentional):
- RPC services require direct access for performance
- Node.js APIs designed for direct exposure
- Nginx layer adds unnecessary complexity for these services
Hardening Opportunities (without breaking production):
- Rate Limiting at NPMplus:
  - Add rate limiting to RPC proxy hosts
  - Configure rate limits per IP or globally
  - Prevent abuse without adding nginx layer
- Security Headers at NPMplus:
  - Add security headers via NPMplus advanced config
  - Configure CSP, X-Frame-Options, etc.
  - Apply to all proxy hosts
- Access Lists:
  - Configure IP allowlists for private RPC endpoints
  - Restrict access to authorized IPs only
  - Use NPMplus access lists feature
Not in Scope (would require production changes):
- Adding nginx layer to all services
- Changing backend architecture
Recommendation:
- Add rate limiting for RPC endpoints at NPMplus
- Configure access lists for private RPC endpoints
- Add security headers via NPMplus advanced config
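Once security headers are configured, a quick check can confirm that each proxy host actually returns them. The sketch below is illustrative: the header list is an assumption (the document names only CSP and X-Frame-Options; HSTS and X-Content-Type-Options are added here as common companions), and `missing_security_headers` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: report which expected security headers a response lacks.
# Assumption: the header list below is illustrative, not the project's policy.

# Read raw HTTP response headers on stdin and print each expected
# security header that is absent (header names match case-insensitively).
missing_security_headers() {
  local headers name
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  for name in strict-transport-security x-frame-options \
              x-content-type-options content-security-policy; do
    printf '%s\n' "$headers" | grep -q "^$name:" || echo "$name"
  done
}

# Example:
#   curl -sI https://dbis-admin.d-bis.org | missing_security_headers
```

An empty result means every header in the list was present; any printed name points at a proxy host whose advanced config still needs the header.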
Risk 7: Internal TLS (Double TLS)
Severity: Low
Component: VMID 2400
Status: Current Configuration
Description:
- VMID 2400 (thirdweb-rpc-1) uses HTTPS internally (port 443)
- NPMplus terminates SSL, then proxies to HTTPS backend
- Results in double TLS termination (NPMplus → VMID 2400)
Impact:
- Additional complexity in certificate management
- Two SSL certificates required (NPMplus + VMID 2400)
- Potential performance overhead
Rationale (Documentation Needed):
- Need to document why this is required
- May be intentional for additional security
- Or legacy configuration that could be simplified
Hardening Opportunities (without breaking production):
- Document Internal TLS Rationale:
  - Document why VMID 2400 uses HTTPS internally
  - Verify if internal TLS is necessary
  - Document certificate management for internal TLS
- Monitor Internal TLS Certificate Expiration:
  - Track internal SSL certificate expiration
  - Ensure internal certificates are renewed
  - Avoid internal certificate expiration causing outages
Recommendation:
- Document why internal TLS is used
- Monitor internal certificate expiration
- Verify if internal TLS could be changed to HTTP (future consideration)
Hardening Opportunities (Without Breaking Production)
1. Rate Limiting at NPMplus
Priority: High
Effort: Medium
Impact: High
Implementation:
- Configure rate limiting for RPC endpoints
- Set limits per IP (e.g., 100 requests/minute)
- Apply to all RPC proxy hosts
Steps:
- Access NPMplus UI
- Navigate to Proxy Hosts
- Edit RPC proxy hosts (rpc-http-pub, rpc-ws-pub, etc.)
- Configure rate limiting in advanced config or access lists
- Test rate limiting behavior
Benefits:
- Protects RPC endpoints from abuse
- Mitigates application-layer request floods (it does not stop volumetric DDoS traffic before it reaches NPMplus)
- Does not require backend changes
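The "test rate limiting behavior" step can be sketched as a burst probe that counts throttled responses. This is a rough example under stated assumptions: it assumes throttled requests are answered with HTTP 429 or 503 (nginx's `limit_req` returns 503 unless `limit_req_status` changes it), the URL in the example is a placeholder, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: send a burst of JSON-RPC requests and count throttled replies.
# Assumptions: curl is installed; throttled requests return 429 or 503.

# Read one HTTP status code per line on stdin; print how many are 429 or 503.
count_throttled() {
  awk '$1 == 429 || $1 == 503 { n++ } END { print n + 0 }'
}

# Send COUNT eth_blockNumber requests to URL; print one status code per line.
probe_status_codes() {
  local url="$1" count="$2" i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
      "$url"
    i=$((i + 1))
  done
}

# Example (placeholder URL; use a real RPC proxy host):
#   probe_status_codes "https://rpc.example.org" 150 | count_throttled
```

With a limit of 100 requests per minute per IP, a burst of 150 requests from one client should report a non-zero count; a count of zero suggests the limit is not being applied to that proxy host.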
2. Cloudflare Access for Admin Portals
Priority: Medium
Effort: Medium
Impact: Medium
Implementation:
- Enable Cloudflare Access for dbis-admin.d-bis.org
- Enable Cloudflare Access for secure.d-bis.org
- Configure access policies (email allowlist, MFA, etc.)
Steps:
- Access Cloudflare Zero Trust dashboard
- Navigate to Access → Applications
- Add application: dbis-admin.d-bis.org
- Configure access policy (email allowlist, MFA)
- Repeat for secure.d-bis.org
Benefits:
- Additional authentication layer
- MFA support
- Audit trail
- Intended not to require changing DNS proxy status (to be verified, since Access only applies to traffic routed through Cloudflare)
3. Certificate Expiration Monitoring
Priority: High
Effort: Low
Impact: High
Implementation:
- Set up monitoring for certificate expiration
- Configure alerts 90/60/30 days before expiration
- Monitor auto-renewal status
Steps:
- Create monitoring script or use existing verification scripts
- Run daily checks of certificate expiration
- Configure alerts (email, Slack, etc.)
- Test alert system
Script:
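# Export the NPMplus configuration (including certificate metadata); run daily
bash scripts/verify/export-npmplus-config.sh
# List domains whose certificate expires within the next 90 days
# (assumes .expires is an ISO 8601 UTC timestamp such as 2026-04-16T00:00:00Z)
jq '.[] | select((.expires | fromdateiso8601) < (now + (90 * 86400))) | .domain_names' \
  docs/04-configuration/verification-evidence/npmplus-verification-*/certificates.json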
Benefits:
- Early warning of certificate expiration
- Time to fix auto-renewal issues
- Prevents unexpected outages
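As a complement to checking the exported configuration, the certificate actually served to clients can be inspected directly. The following is a minimal sketch, assuming `openssl` and GNU `date` are installed; the 30-day and 14-day thresholds in the example are illustrative values rather than the project's alerting policy, and all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: check the certificate actually served for a hostname.
# Assumptions: openssl and GNU date are installed; thresholds are illustrative.

# Whole days between an expiry epoch and a "now" epoch (truncated toward zero).
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# Map remaining days to a status, given warning and critical thresholds.
classify_expiry() {
  local days="$1" warn="$2" crit="$3"
  if [ "$days" -le "$crit" ]; then
    echo "CRITICAL"
  elif [ "$days" -le "$warn" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Print the notAfter time of the certificate served on HOST:443 as epoch seconds.
cert_not_after_epoch() {
  local host="$1" end
  end=$(echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
        | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
  [ -n "$end" ] || return 1
  date -d "$end" +%s
}

# Example:
#   expiry=$(cert_not_after_epoch dbis-admin.d-bis.org) &&
#     classify_expiry "$(days_left "$expiry" "$(date +%s)")" 30 14
```

Checking the live endpoint catches the case where a renewal succeeded on disk but the proxy is still serving the old certificate, which the exported metadata alone would not reveal.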
4. Health Check Endpoints for All Backend Services
Priority: Medium
Effort: Low-Medium
Impact: Medium
Implementation:
- Add health check endpoints to all backend services
- Configure health checks in NPMplus (if supported)
- Monitor health endpoints
Steps:
- Add /health endpoints to all backend services
- Configure health checks in application config
- Set up monitoring for health endpoints
- Configure alerts for failed health checks
Benefits:
- Early detection of service issues
- Proactive monitoring
- Better troubleshooting
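Monitoring the health endpoints could start as a simple polling script. This sketch assumes each service exposes `/health` and answers with a 2xx status when healthy; the address in the example is a documentation placeholder (192.0.2.0/24), not a real backend, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: poll a /health endpoint and report UP or DOWN.
# Assumptions: curl is installed; a healthy service answers with a 2xx status.

# Succeed only for 2xx status codes.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Probe URL; print "UP <url>" or "DOWN <url> (<code>)" and fail when down.
check_health() {
  local url="$1" code
  # curl prints 000 when no HTTP response was received at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
  if is_healthy_status "$code"; then
    echo "UP   $url"
  else
    echo "DOWN $url ($code)"
    return 1
  fi
}

# Example (placeholder address):
#   check_health "http://192.0.2.10:3000/health" || echo "alert: backend down"
```

The non-zero return code on failure lets a cron job or monitoring agent turn a failed probe into an alert without parsing the output.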
5. Log Aggregation for NPMplus Access Logs
Priority: Medium
Effort: Medium
Impact: Medium
Implementation:
- Set up log aggregation for NPMplus access logs
- Configure log forwarding (syslog, filebeat, etc.)
- Set up log analysis and alerting
Steps:
- Configure NPMplus to log to syslog or file
- Set up log forwarder (filebeat, fluentd, etc.)
- Configure log aggregation (ELK stack, Loki, etc.)
- Set up alerts for unusual patterns
Benefits:
- Better visibility into traffic patterns
- Detect attacks early
- Audit trail for troubleshooting
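Before a full aggregation stack is in place, unusual traffic can be spotted with a one-pass summary of the access log. The sketch below assumes the client IP is the first whitespace-separated field of each line, as in nginx's default "combined" format; the NPMplus log format and log path are not given in this document and may differ, so the field position and the example path are assumptions. `top_talkers` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: list client IPs that exceed a request-count threshold.
# Assumption: the client IP is the first field of each access-log line.

# Read access-log lines on stdin; print "count ip" for every IP with more
# than THRESHOLD requests, busiest first.
top_talkers() {
  local threshold="$1"
  awk -v t="$threshold" '
    { c[$1]++ }
    END { for (ip in c) if (c[ip] > t) print c[ip], ip }
  ' | sort -rn
}

# Example (placeholder path):
#   top_talkers 1000 < /path/to/npmplus/access.log
```

The same counting logic can later move into an alert rule in the aggregation system; the script is only a stopgap for manual investigation.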
6. Document Failover Procedures
Priority: High
Effort: Low
Impact: High
Implementation:
- Document failover procedures if NPMplus fails
- Create step-by-step recovery guide
- Test failover procedures
Status: ✅ Done in NPMPLUS_BACKUP_RESTORE.md
Not in Scope (Would Require Production Changes)
The following hardening measures would require production changes and are not in scope for this plan:
- Enabling Cloudflare Proxy:
  - Would require changing SSL termination from NPMplus to Cloudflare
  - Would require reconfiguration of all SSL certificates
  - Would break current architecture
- Adding HA NPMplus Instance:
  - Would require deployment of additional NPMplus container
  - Would require load balancer configuration
  - Would require database replication or shared storage
- Changing Backend Architecture:
  - Adding nginx layer to all services
  - Changing RPC endpoints to use nginx
  - Would require application changes
Risk Summary Table
| Risk | Severity | Status | Mitigation | Hardening Priority |
|---|---|---|---|---|
| Single Point of Failure (NPMplus) | High | Current | Documented | High (monitoring) |
| DNS-Only Mode | Medium | Intentional | Rate limiting, Cloudflare Access | Medium |
| Certificate Expiration | Medium | Current | Auto-renewal | High (monitoring) |
| Sankofa Routing Issue | High | Known | Cutover plan in place | High (cutover) |
| UDM Pro Manual Config | Medium | Current | Documentation | Medium (verification) |
| Backend Direct Access | Low-Medium | Intentional | Rate limiting | Medium |
| Internal TLS | Low | Current | Documentation | Low (documentation) |
Hardening Implementation Priority
High Priority (Implement First)
- Certificate Expiration Monitoring - Critical for preventing outages
- Rate Limiting for RPC Endpoints - Prevents abuse
- Document Failover Procedures - ✅ Done
Medium Priority
- Cloudflare Access for Admin Portals - Additional security
- Health Check Endpoints - Better monitoring
- Log Aggregation - Better visibility
Low Priority
- Document Internal TLS Rationale - Documentation improvement
Related Documentation
- Verification Runbook: docs/04-configuration/INGRESS_VERIFICATION_RUNBOOK.md
- Backup/Restore Guide: docs/04-configuration/NPMPLUS_BACKUP_RESTORE.md
- Sankofa Cutover Plan: docs/04-configuration/SANKOFA_CUTOVER_PLAN.md
- Comprehensive Architecture: docs/04-configuration/DNS_NPMPLUS_VM_COMPREHENSIVE_ARCHITECTURE.md
Last Updated: 2026-01-20
Maintained By: Infrastructure Team
Status: Complete Risk Assessment