# Cloudflare PoP to Physical Infrastructure Mapping Strategy
## Overview
This document describes how Cloudflare Points of Presence (PoPs) are used as regional gateways and how traffic is tunneled from them to physical hardware across the global Phoenix network.
## Architecture Principles
1. **Cloudflare PoPs as Edge Gateways**: Use Cloudflare's 300+ global PoPs as the entry point for all user traffic
2. **Zero Trust Tunneling**: All traffic between PoPs and physical infrastructure flows through Cloudflare Tunnels (`cloudflared`)
3. **Regional Aggregation**: Map multiple PoPs to each regional datacenter
4. **Latency Optimization**: Route traffic to the nearest physical infrastructure
5. **High Availability**: Maintain multiple PoP-to-infrastructure paths
## Cloudflare PoP Mapping Strategy
### Tier 1: Core Datacenter Mapping
**Mapping Logic**:
- Each Core Datacenter (10-15 locations) serves as a regional hub
- Multiple Cloudflare PoPs in the region route to the nearest Core Datacenter
- Primary and backup tunnel paths for redundancy
**Example Mapping**:
```
Core Datacenter: US-East (Virginia)
├── Cloudflare PoPs:
│   ├── Washington, DC (primary)
│   ├── New York, NY (primary)
│   ├── Boston, MA (backup)
│   └── Philadelphia, PA (backup)
└── Tunnel Configuration:
    ├── Primary: cloudflared tunnel to VA datacenter
    └── Backup: Failover to alternate path
```
### Tier 2: Regional Datacenter Mapping
**Mapping Logic**:
- Regional Datacenters (50-75 locations) aggregate PoP traffic
- PoPs route to the nearest Regional Datacenter
- Load balancing across multiple regional paths
**Example Mapping**:
```
Regional Datacenter: US-West (California)
├── Cloudflare PoPs:
│   ├── San Francisco, CA
│   ├── Los Angeles, CA
│   ├── San Jose, CA
│   └── Seattle, WA
└── Tunnel Configuration:
    ├── Load balanced across multiple tunnels
    └── Health-check based routing
```
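Health-check based routing across several tunnels can be sketched as a round-robin picker that skips tunnels whose last health check failed. This is a minimal sketch under stated assumptions: `TunnelState` and `HealthAwareRoundRobin` are hypothetical names, and the `healthy` flags are assumed to be kept current by a separate tunnel monitor. Round robin is only one of the strategies named in `routingRules.loadBalancing`; `LEAST_CONNECTIONS` would instead pick the healthy tunnel with the fewest open connections.

```typescript
// Hypothetical view of one tunnel as seen by the routing layer;
// `healthy` is assumed to be updated by the tunnel health monitor.
interface TunnelState {
  tunnelId: string
  healthy: boolean
}

// Round robin over the tunnels, skipping any that are currently unhealthy.
class HealthAwareRoundRobin {
  private next = 0

  constructor(private readonly tunnels: TunnelState[]) {}

  // Returns the next healthy tunnel ID, or undefined when none is healthy.
  pick(): string | undefined {
    for (let i = 0; i < this.tunnels.length; i++) {
      const idx = (this.next + i) % this.tunnels.length
      if (this.tunnels[idx].healthy) {
        this.next = idx + 1 // resume after the tunnel just chosen
        return this.tunnels[idx].tunnelId
      }
    }
    return undefined
  }
}

// Usage: the unhealthy tunnel is skipped and the healthy ones alternate.
const usWestTunnels: TunnelState[] = [
  { tunnelId: 'us-west-a', healthy: true },
  { tunnelId: 'us-west-b', healthy: false },
  { tunnelId: 'us-west-c', healthy: true },
]
const balancer = new HealthAwareRoundRobin(usWestTunnels)
const firstThree = [balancer.pick(), balancer.pick(), balancer.pick()]
// firstThree: ['us-west-a', 'us-west-c', 'us-west-a']
```

Because the picker holds a reference to the `tunnels` array, a monitor that flips `healthy` on an entry takes effect on the next `pick()` call.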
### Tier 3: Edge Site Mapping
**Mapping Logic**:
- Edge Sites (250+ locations) connect to the nearest PoP
- Direct PoP-to-Edge tunneling for low latency
- Edge sites can serve as backup paths
**Example Mapping**:
```
Edge Site: Denver, CO
├── Cloudflare PoP: Denver, CO
└── Tunnel Configuration:
    ├── Direct tunnel to edge site
    └── Backup via regional datacenter
```
## Implementation Architecture
### 1. PoP-to-Region Mapping Service
```typescript
// Shared location shape (assumed; used below but not otherwise defined)
export interface Location {
  city: string
  country: string
  coordinates: { lat: number; lng: number }
}

export interface PoPMapping {
  popId: string
  popLocation: Location
  primaryDatacenter: {
    id: string
    type: 'CORE' | 'REGIONAL' | 'EDGE'
    location: Location
    tunnelEndpoint: string
  }
  backupDatacenters: Array<{
    id: string
    priority: number
    tunnelEndpoint: string
  }>
  routingRules: {
    latencyThreshold: number // ms
    failoverThreshold: number // ms
    loadBalancing: 'ROUND_ROBIN' | 'LEAST_CONNECTIONS' | 'GEOGRAPHIC'
  }
}
```
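As a usage example for `PoPMapping`, the helper below derives the order in which tunnel endpoints would be tried: the primary first, then the backups. It assumes that a lower `priority` value means a backup is tried sooner, which the interface itself does not specify; `PoPEndpoints` is a trimmed, hypothetical view of `PoPMapping`, and all IDs and endpoints are illustrative.

```typescript
// Trimmed, hypothetical view of PoPMapping with only the fields used here.
interface PoPEndpoints {
  popId: string
  primaryDatacenter: { id: string; tunnelEndpoint: string }
  backupDatacenters: Array<{ id: string; priority: number; tunnelEndpoint: string }>
}

// Endpoints in the order they would be tried: primary first, then backups by
// ascending priority (assumes a lower number means "try sooner").
function candidateEndpoints(mapping: PoPEndpoints): string[] {
  // Copy before sorting so the caller's mapping is not mutated.
  const backups = [...mapping.backupDatacenters].sort((a, b) => a.priority - b.priority)
  return [mapping.primaryDatacenter.tunnelEndpoint, ...backups.map((b) => b.tunnelEndpoint)]
}

// Illustrative mapping for the Washington, DC PoP from the Tier 1 example.
const washingtonDc: PoPEndpoints = {
  popId: 'washington-dc',
  primaryDatacenter: { id: 'us-east-va', tunnelEndpoint: 'va.tunnels.example.internal' },
  backupDatacenters: [
    { id: 'backup-2', priority: 2, tunnelEndpoint: 'backup-2.tunnels.example.internal' },
    { id: 'backup-1', priority: 1, tunnelEndpoint: 'backup-1.tunnels.example.internal' },
  ],
}
const tryOrder = candidateEndpoints(washingtonDc)
```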
### 2. Tunnel Management Service
```typescript
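// Assumed minimal shape; the strategies mirror PoPMapping.routingRules.loadBalancing.
interface LoadBalancingConfig {
  strategy: 'ROUND_ROBIN' | 'LEAST_CONNECTIONS' | 'GEOGRAPHIC'
}

interface TunnelConfiguration {
  tunnelId: string
  popId: string
  targetDatacenter: string
  tunnelType: 'PRIMARY' | 'BACKUP' | 'LOAD_BALANCED'
  healthCheck: {
    endpoint: string
    interval: number
    timeout: number
    failureThreshold: number
  }
  routing: {
    path: string
    service: string
    loadBalancing: LoadBalancingConfig
  }
}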
```
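The `failureThreshold` field suggests a consecutive-failure rule for marking a tunnel unhealthy. A minimal sketch, assuming a tunnel becomes unhealthy after `failureThreshold` consecutive failed probes (a probe that exceeds `timeout` is recorded as a failure) and that a single successful probe restores it; `HealthCheckPolicy`, `ProbeResult` and `HealthTracker` are hypothetical names.

```typescript
// Only the field this sketch needs from TunnelConfiguration.healthCheck.
interface HealthCheckPolicy {
  failureThreshold: number
}

type ProbeResult = 'ok' | 'fail' // a probe that exceeds `timeout` is recorded as 'fail'

// Hypothetical per-tunnel tracker: unhealthy after `failureThreshold`
// consecutive failures; any successful probe resets the streak.
class HealthTracker {
  private consecutiveFailures = 0

  constructor(private readonly policy: HealthCheckPolicy) {}

  // Records one probe result and returns whether the tunnel is now healthy.
  record(result: ProbeResult): boolean {
    this.consecutiveFailures = result === 'ok' ? 0 : this.consecutiveFailures + 1
    return this.isHealthy()
  }

  isHealthy(): boolean {
    return this.consecutiveFailures < this.policy.failureThreshold
  }
}

// Usage with failureThreshold = 3: only the third failure in a row trips it.
const tracker = new HealthTracker({ failureThreshold: 3 })
const probes: ProbeResult[] = ['fail', 'fail', 'ok', 'fail', 'fail', 'fail']
const healthAfterEach = probes.map((r) => tracker.record(r))
// healthAfterEach: [true, true, true, true, true, false]
```

Resetting on any success keeps the rule simple; a stricter variant could also require several consecutive successes before a tunnel is considered healthy again.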
### 3. Geographic Routing Service
**Distance Calculation**:
- Calculate the distance from the PoP to every available datacenter
- Select the nearest datacenter within the latency threshold
- Consider network path, not just geographic distance
**Latency-Based Routing**:
- Measure actual latency from PoP to datacenter
- Route to lowest latency path
- Dynamic rerouting based on real-time latency
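Distance and latency can be combined in one selection step: prefer the candidate with the lowest measured latency under the latency threshold, and fall back to geographic distance when no measurement qualifies. This sketch uses the haversine great-circle formula for distance, a common choice rather than one this design prescribes; `LatLng`, `DatacenterCandidate`, `haversineKm` and `chooseDatacenter` are hypothetical names, and the coordinates in the usage lines are approximate.

```typescript
interface LatLng {
  lat: number
  lng: number
}

// Hypothetical candidate record; measuredLatencyMs is the latest
// PoP-to-datacenter measurement, if one exists.
interface DatacenterCandidate {
  id: string
  coordinates: LatLng
  measuredLatencyMs?: number
}

const EARTH_RADIUS_KM = 6371 // mean Earth radius

// Great-circle distance between two points (haversine formula).
function haversineKm(a: LatLng, b: LatLng): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180
  const dLat = toRad(b.lat - a.lat)
  const dLng = toRad(b.lng - a.lng)
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
}

// Lowest measured latency within the threshold wins; if no candidate has a
// qualifying measurement, fall back to the geographically nearest one.
function chooseDatacenter(
  pop: LatLng,
  candidates: DatacenterCandidate[],
  latencyThresholdMs: number,
): DatacenterCandidate | undefined {
  const withinThreshold = candidates
    .filter((c) => c.measuredLatencyMs !== undefined && c.measuredLatencyMs <= latencyThresholdMs)
    .sort((x, y) => x.measuredLatencyMs! - y.measuredLatencyMs!)
  if (withinThreshold.length > 0) return withinThreshold[0]

  return [...candidates].sort(
    (x, y) => haversineKm(pop, x.coordinates) - haversineKm(pop, y.coordinates),
  )[0]
}

// Usage: a Denver PoP and two datacenters with no latency measurements yet.
const denver: LatLng = { lat: 39.74, lng: -104.99 }
const denverCandidates: DatacenterCandidate[] = [
  { id: 'us-east-va', coordinates: { lat: 39.04, lng: -77.49 } },
  { id: 'us-west-ca', coordinates: { lat: 37.34, lng: -121.89 } },
]
const byDistance = chooseDatacenter(denver, denverCandidates, 50)?.id // 'us-west-ca'
```

With measurements of 28 ms to `us-east-va` and 40 ms to `us-west-ca` and the same 50 ms threshold, the function returns `us-east-va` instead: the measured network path outweighs geographic distance.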
## Cloudflare Tunnel Configuration
### Tunnel Architecture
```
User Request
    ↓
Cloudflare PoP (Edge)
    ↓
Cloudflare Tunnel (cloudflared)
    ↓
Physical Infrastructure (Proxmox/K8s)
    ↓
Application
```
### Tunnel Setup Process
1. **Tunnel Creation**:
   - Create a Cloudflare Tunnel via the API
   - Generate the tunnel token
   - Deploy the `cloudflared` agent on the physical infrastructure; it opens outbound-only connections to Cloudflare, so no inbound firewall ports are needed
2. **Route Configuration**:
   - Create DNS records (proxied CNAMEs to `<tunnel-id>.cfargotunnel.com`) that point each hostname at the tunnel
   - Set up ingress rules for routing
   - Configure load balancing
3. **Health Monitoring**:
   - Monitor tunnel health
   - Fail over automatically when a tunnel fails
   - Alert on tunnel degradation
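`cloudflared` evaluates ingress rules from top to bottom: the first rule whose `hostname` and `path` match wins, and the last rule must be a catch-all with neither field set. The TypeScript below is a simplified model of that evaluation, useful for checking a generated rule list before deploying it. It is not cloudflared's implementation: only leading `*.` hostname wildcards are modeled, and the hostnames and local service URLs are hypothetical.

```typescript
// Simplified model of one cloudflared ingress rule.
interface IngressRule {
  hostname?: string // exact hostname, or a leading wildcard like '*.example.com'
  path?: string // regular expression tested against the request path
  service: string // e.g. 'http://localhost:8080' or 'http_status:404'
}

function hostMatches(pattern: string | undefined, host: string): boolean {
  if (pattern === undefined) return true // no hostname: matches every host
  if (pattern.startsWith('*.')) return host.endsWith(pattern.slice(1))
  return pattern === host
}

// First matching rule wins; the final rule must be a catch-all.
function resolveService(rules: IngressRule[], host: string, path: string): string {
  const last = rules[rules.length - 1]
  if (last === undefined || last.hostname !== undefined || last.path !== undefined) {
    throw new Error('the last ingress rule must be a catch-all')
  }
  for (const rule of rules) {
    const pathMatches = rule.path === undefined || new RegExp(rule.path).test(path)
    if (hostMatches(rule.hostname, host) && pathMatches) return rule.service
  }
  return last.service // not reached: the catch-all always matches
}

// Hypothetical rules for one tunnel, most specific first.
const exampleIngress: IngressRule[] = [
  { hostname: 'app.example.com', path: '^/api/', service: 'http://localhost:8080' },
  { hostname: 'app.example.com', service: 'http://localhost:3000' },
  { hostname: '*.example.com', service: 'http://localhost:4000' },
  { service: 'http_status:404' },
]
```

Ordering matters: if the `*.example.com` rule were listed first, it would capture `app.example.com` traffic before the more specific rules were reached.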
### Multi-Tunnel Strategy
**Primary Tunnel**:
- Direct path from PoP to primary datacenter
- Lowest latency path
- Active traffic routing
**Backup Tunnel**:
- Alternative path via backup datacenter
- Activated on primary failure
- Pre-established for fast failover
**Load Balanced Tunnels**:
- Multiple tunnels for high availability
- Load distribution across tunnels
- Health-based routing
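Because backup tunnels are pre-established, failover is only a decision about which connected tunnel receives traffic. A minimal sketch, assuming the paths are ordered primary first and that traffic returns to the primary only after it has passed `failbackAfter` consecutive health-check rounds, to avoid flapping; `TunnelPath` and `FailoverController` are hypothetical names, and health flags are assumed to be updated elsewhere.

```typescript
// Hypothetical runtime view of one pre-established tunnel for a PoP.
interface TunnelPath {
  tunnelId: string
  type: 'PRIMARY' | 'BACKUP'
  healthy: boolean // assumed to be updated by the health monitor
}

// `paths` is ordered by preference, primary first. All tunnels stay
// connected, so failover only changes which one receives traffic.
class FailoverController {
  private activeIndex = 0
  private primaryHealthyStreak = 0

  constructor(
    private readonly paths: TunnelPath[],
    private readonly failbackAfter: number, // healthy rounds required before failing back
  ) {}

  // Call once per health-check round; returns the tunnel that should carry traffic.
  update(): TunnelPath | undefined {
    if (this.paths.length === 0) return undefined
    this.primaryHealthyStreak = this.paths[0].healthy ? this.primaryHealthyStreak + 1 : 0

    if (this.activeIndex !== 0 && this.primaryHealthyStreak >= this.failbackAfter) {
      this.activeIndex = 0 // primary has been stable long enough: fail back
    } else if (!this.paths[this.activeIndex].healthy) {
      const next = this.paths.findIndex((p) => p.healthy)
      if (next === -1) return undefined // no healthy path at all
      this.activeIndex = next
    }
    return this.paths[this.activeIndex]
  }
}

// Usage: the primary fails for one round, then must stay healthy for 3 rounds.
const vaPaths: TunnelPath[] = [
  { tunnelId: 'va-primary', type: 'PRIMARY', healthy: true },
  { tunnelId: 'va-backup', type: 'BACKUP', healthy: true },
]
const controller = new FailoverController(vaPaths, 3)
const chosen: string[] = []
for (const primaryUp of [true, false, true, true, true]) {
  vaPaths[0].healthy = primaryUp
  chosen.push(controller.update()?.tunnelId ?? 'none')
}
// chosen: ['va-primary', 'va-backup', 'va-backup', 'va-backup', 'va-primary']
```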
## Regional Gateway Mapping
### Region Definition
```typescript
interface Region {
  id: string
  name: string
  type: 'CORE' | 'REGIONAL' | 'EDGE'
  location: {
    city: string
    country: string
    coordinates: { lat: number; lng: number }
  }
  cloudflarePoPs: string[] // PoP IDs
  physicalInfrastructure: {
    datacenterId: string
    tunnelEndpoints: string[]
    capacity: {
      compute: number
      storage: number
      network: number
    }
  }
  routing: {
    primaryPath: string
    backupPaths: string[]
    loadBalancing: LoadBalancingConfig
  }
}
```
### PoP-to-Region Assignment Algorithm
1. **Geographic Proximity**:
   - Calculate the distance from the PoP to every region
   - Assign the PoP to the nearest region within the distance threshold
2. **Capacity Consideration**:
   - Check region capacity
   - Distribute PoPs to balance load
   - Avoid overloading any single region
3. **Network Topology**:
   - Consider network paths
   - Optimize for latency
   - Minimize hops
4. **Failover Planning**:
   - Ensure backup regions are available
   - Maintain geographic diversity for resilience
   - Provide multiple paths for redundancy
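The first, second and fourth criteria can be combined in a simple greedy pass: handle the busiest PoPs first, give each the nearest region that is within range and still has headroom, and record the next-nearest region as its backup. This is one possible heuristic, not the prescribed algorithm; the load units, the injected `distanceKm` lookup (which could return great-circle distance or a latency-derived cost to cover network topology) and all type and function names are assumptions.

```typescript
// Hypothetical inputs; load and capacity share one assumed unit (e.g. requests/s).
interface PoPDemand {
  popId: string
  load: number
}

interface RegionCapacity {
  regionId: string
  maxLoad: number
}

interface Assignment {
  popId: string
  primary: string
  backup?: string
}

// Greedy heuristic: busiest PoPs first; each goes to the nearest region within
// maxDistanceKm that still has headroom, and the nearest other eligible region
// is recorded as its backup.
function assignPoPs(
  pops: PoPDemand[],
  regions: RegionCapacity[],
  distanceKm: (popId: string, regionId: string) => number,
  maxDistanceKm: number,
): Assignment[] {
  const used = new Map<string, number>(regions.map((r): [string, number] => [r.regionId, 0]))
  const result: Assignment[] = []

  for (const pop of [...pops].sort((a, b) => b.load - a.load)) {
    const nearby = regions
      .filter((r) => distanceKm(pop.popId, r.regionId) <= maxDistanceKm)
      .sort((a, b) => distanceKm(pop.popId, a.regionId) - distanceKm(pop.popId, b.regionId))

    const primary = nearby.find((r) => used.get(r.regionId)! + pop.load <= r.maxLoad)
    if (primary === undefined) throw new Error(`no region has capacity for ${pop.popId}`)
    const primaryId = primary.regionId
    used.set(primaryId, used.get(primaryId)! + pop.load)

    // Backups are not charged against capacity: they only carry traffic on failover.
    const backup = nearby.find((r) => r.regionId !== primaryId)
    result.push({ popId: pop.popId, primary: primaryId, backup: backup?.regionId })
  }
  return result
}
```

Processing the largest PoPs first means a small PoP, rather than a large one, is the one that spills over to a farther region when the nearest region fills up. Picking a backup in a different country or metro would be one way to strengthen the geographic diversity called for above.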
## Implementation Components
### 1. PoP Mapping Service
**File**: `api/src/services/pop-mapping.ts`
```typescript
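// Region is defined above; Datacenter, Tunnel and RoutingConfig are domain types
// assumed to be declared elsewhere. Signatures only: an implementing class
// supplies the async bodies.
interface PoPMappingService {
  mapPoPToRegion(popId: string): Promise<Region>
  getOptimalDatacenter(popId: string): Promise<Datacenter>
  configureTunnel(popId: string, datacenterId: string): Promise<Tunnel>
  updateRouting(popId: string, routing: RoutingConfig): Promise<void>
}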
```
### 2. Tunnel Orchestration Service
**File**: `api/src/services/tunnel-orchestration.ts`
```typescript
// TunnelConfiguration is defined above; Tunnel, TunnelHealth and LoadBalancer
// are domain types assumed to be declared elsewhere.
interface TunnelOrchestrationService {
  createTunnel(config: TunnelConfiguration): Promise<Tunnel>
  monitorTunnel(tunnelId: string): Promise<TunnelHealth>
  failoverTunnel(tunnelId: string, backupTunnelId: string): Promise<void>
  loadBalanceTunnels(tunnelIds: string[]): Promise<LoadBalancer>
}
```
### 3. Geographic Routing Engine
**File**: `api/src/services/geographic-routing.ts`
```typescript
// Location is defined with PoPMapping above; Datacenter and RoutingPath are
// domain types assumed to be declared elsewhere.
interface GeographicRoutingService {
  findNearestDatacenter(popLocation: Location): Promise<Datacenter>
  calculateLatency(popId: string, datacenterId: string): Promise<number>
  optimizeRouting(popId: string): Promise<RoutingPath>
}
```
## Database Schema
### PoP Mappings Table
```sql
CREATE TABLE pop_mappings (
    id                     UUID PRIMARY KEY,
    pop_id                 VARCHAR(255) UNIQUE NOT NULL,
    pop_location           JSONB NOT NULL,
    primary_datacenter_id  UUID REFERENCES datacenters(id),
    region_id              UUID REFERENCES regions(id),
    tunnel_configuration   JSONB,
    routing_rules          JSONB,
    created_at             TIMESTAMP,
    updated_at             TIMESTAMP
);
```
### Tunnel Configurations Table
```sql
CREATE TABLE tunnel_configurations (
    id             UUID PRIMARY KEY,
    tunnel_id      VARCHAR(255) UNIQUE NOT NULL,
    pop_id         VARCHAR(255) REFERENCES pop_mappings(pop_id),
    datacenter_id  UUID REFERENCES datacenters(id),
    tunnel_type    VARCHAR(50),
    health_status  VARCHAR(50),
    configuration  JSONB,
    created_at     TIMESTAMP,
    updated_at     TIMESTAMP
);
```
## Monitoring and Observability
### Key Metrics
1. **Tunnel Health**:
   - Tunnel uptime
   - Latency from PoP to datacenter
   - Packet loss
   - Throughput
2. **Routing Performance**:
   - Request routing time
   - Failover time
   - Load distribution
3. **Geographic Distribution**:
   - PoP-to-datacenter mapping distribution
   - Regional load balancing
   - Capacity utilization
### Alerting
- Tunnel failure alerts
- High latency alerts
- Capacity threshold alerts
- Routing anomaly alerts
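Tunnel-failure and high-latency alerts can be derived from the tunnel health metrics listed above. A sketch, assuming metrics are collected per tunnel over a fixed window and compared against configurable targets; the nearest-rank p95, the field names and the thresholds are hypothetical choices. Capacity and routing-anomaly alerts would follow the same pattern with their own inputs.

```typescript
// Hypothetical per-tunnel metrics collected over one evaluation window.
interface TunnelMetrics {
  tunnelId: string
  latencySamplesMs: number[] // PoP-to-datacenter latency samples
  successfulChecks: number
  totalChecks: number
}

// Hypothetical alert targets.
interface AlertThresholds {
  maxP95LatencyMs: number
  minUptime: number // fraction of successful health checks, e.g. 0.999
}

// Nearest-rank percentile (p in 0..100); NaN when there are no samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(0, rank - 1)]
}

function evaluateAlerts(m: TunnelMetrics, t: AlertThresholds): string[] {
  const alerts: string[] = []
  // A window with no health checks at all is treated as fully down.
  const uptime = m.totalChecks === 0 ? 0 : m.successfulChecks / m.totalChecks
  if (uptime < t.minUptime) {
    alerts.push(`tunnel failure: ${m.tunnelId} uptime ${(uptime * 100).toFixed(2)}%`)
  }
  // With no samples, p95 is NaN and the comparison is false, so no latency alert.
  const p95 = percentile(m.latencySamplesMs, 95)
  if (p95 > t.maxP95LatencyMs) {
    alerts.push(`high latency: ${m.tunnelId} p95 ${p95} ms`)
  }
  return alerts
}
```

Using a tail percentile rather than the mean surfaces intermittent slow paths: with ten samples where one is 180 ms, the nearest-rank p95 is 180 ms, so a 100 ms target raises an alert even though the median is far lower.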
## Security Considerations
1. **Zero Trust Architecture**:
   - All traffic authenticated
   - No public IPs on physical infrastructure
   - Encrypted tunnel connections
2. **Access Control**:
   - PoP-based access policies
   - Geographic restrictions
   - IP allowlisting
3. **Audit Logging**:
   - All tunnel connections logged
   - Routing decisions logged
   - Access attempts logged
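Where an IP allowlist is enforced in application code rather than in Cloudflare's own policies, the check reduces to CIDR matching against the client address (for proxied traffic, Cloudflare passes the original client IP in the `CF-Connecting-IP` request header). A minimal IPv4-only sketch; `parseIPv4`, `inCidr` and `isAllowed` are hypothetical helpers, and IPv6 is not handled.

```typescript
// Parses a dotted-quad IPv4 address into an unsigned integer; undefined if invalid.
function parseIPv4(ip: string): number | undefined {
  const parts = ip.split('.')
  if (parts.length !== 4) return undefined
  let value = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return undefined
    const octet = Number(part)
    if (octet > 255) return undefined
    value = value * 256 + octet // multiply instead of shifting to avoid signed 32-bit results
  }
  return value
}

// True when `ip` falls inside `cidr` (e.g. '10.0.0.0/8').
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split('/')
  const addr = parseIPv4(ip)
  const net = parseIPv4(base)
  const bits = Number(bitsText)
  if (addr === undefined || net === undefined || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    return false
  }
  // Compare only the network prefix by dividing away the host portion.
  const hostSize = 2 ** (32 - bits)
  return Math.floor(addr / hostSize) === Math.floor(net / hostSize)
}

function isAllowed(ip: string, allowlist: string[]): boolean {
  return allowlist.some((cidr) => inCidr(ip, cidr))
}
```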
## Deployment Strategy
### Phase 1: Core Datacenter Mapping (30 days)
- Map top 50 Cloudflare PoPs to Core Datacenters
- Deploy primary tunnels
- Implement basic routing
### Phase 2: Regional Expansion (60 days)
- Map remaining PoPs to Regional Datacenters
- Deploy backup tunnels
- Implement failover
### Phase 3: Edge Integration (90 days)
- Integrate Edge Sites
- Optimize routing algorithms
- Full monitoring and alerting