# Cloudflare PoP to Physical Infrastructure Mapping Strategy

## Overview

This document outlines the strategy for using Cloudflare Points of Presence (PoPs) as regional gateways and tunneling their traffic to physical hardware across the global Phoenix network.

## Architecture Principles

1. **Cloudflare PoPs as Edge Gateways**: Use Cloudflare's 300+ global PoPs as the entry point for all user traffic
2. **Zero Trust Tunneling**: All traffic from PoPs to physical infrastructure flows through Cloudflare Tunnels (cloudflared)
3. **Regional Aggregation**: Map multiple PoPs to regional datacenters
4. **Latency Optimization**: Route traffic to the nearest physical infrastructure
5. **High Availability**: Maintain multiple PoP paths to physical infrastructure

## Cloudflare PoP Mapping Strategy

### Tier 1: Core Datacenter Mapping

**Mapping Logic**:
- Each Core Datacenter (10-15 locations) serves as a regional hub
- Multiple Cloudflare PoPs in the region route to the nearest Core Datacenter
- Primary and backup tunnel paths for redundancy

**Example Mapping**:
```
Core Datacenter: US-East (Virginia)
├── Cloudflare PoPs:
│   ├── Washington, DC (primary)
│   ├── New York, NY (primary)
│   ├── Boston, MA (backup)
│   └── Philadelphia, PA (backup)
└── Tunnel Configuration:
    ├── Primary: cloudflared tunnel to VA datacenter
    └── Backup: Failover to alternate path
```

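The primary and backup labels in the Tier 1 example could be held as plain data and resolved at routing time. The following is a minimal sketch under stated assumptions: the type and function names (`CoreMapping`, `activePops`) are hypothetical, and the set of unhealthy PoPs is assumed to come from a separate monitor that this document does not define.

```typescript
// Role a PoP plays for a given core datacenter (labels taken from the Tier 1 example).
type PopRole = 'primary' | 'backup';

interface CorePopEntry {
  popCity: string;
  role: PopRole;
}

interface CoreMapping {
  datacenter: string;
  pops: CorePopEntry[];
}

// Example data mirroring the US-East tree.
const usEast: CoreMapping = {
  datacenter: 'US-East (Virginia)',
  pops: [
    { popCity: 'Washington, DC', role: 'primary' },
    { popCity: 'New York, NY', role: 'primary' },
    { popCity: 'Boston, MA', role: 'backup' },
    { popCity: 'Philadelphia, PA', role: 'backup' },
  ],
};

// Returns the PoPs that should carry traffic: healthy primaries when any exist,
// otherwise healthy backups. `unhealthy` is assumed to be supplied by a monitor.
function activePops(mapping: CoreMapping, unhealthy: ReadonlySet<string>): string[] {
  const healthy = mapping.pops.filter((p) => !unhealthy.has(p.popCity));
  const primaries = healthy.filter((p) => p.role === 'primary');
  const chosen = primaries.length > 0 ? primaries : healthy.filter((p) => p.role === 'backup');
  return chosen.map((p) => p.popCity);
}
```

With every PoP healthy, `activePops(usEast, new Set())` yields the two primary PoPs; if both primaries are marked unhealthy, it yields Boston and Philadelphia.
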
### Tier 2: Regional Datacenter Mapping

**Mapping Logic**:
- Regional Datacenters (50-75 locations) aggregate PoP traffic
- Each PoP routes to the nearest Regional Datacenter
- Traffic is load balanced across multiple regional paths

**Example Mapping**:
```
Regional Datacenter: US-West (California)
├── Cloudflare PoPs:
│   ├── San Francisco, CA
│   ├── Los Angeles, CA
│   ├── San Jose, CA
│   └── Seattle, WA
└── Tunnel Configuration:
    ├── Load balanced across multiple tunnels
    └── Health-check based routing
```

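One way to combine load balancing with health-check based routing is a round-robin selector that skips tunnels currently marked unhealthy. This is a sketch only: the class name, the tunnel IDs, and the `healthy` flag (assumed to be maintained by a separate health checker) are illustrative and not part of the design above.

```typescript
interface TunnelState {
  tunnelId: string;
  healthy: boolean; // assumed to be updated by a separate health checker
}

// Round-robin selector that skips tunnels currently marked unhealthy.
class HealthAwareRoundRobin {
  private cursor = 0;
  private readonly tunnels: TunnelState[];

  constructor(tunnels: TunnelState[]) {
    this.tunnels = tunnels;
  }

  // Returns the next healthy tunnel id, or undefined when none are healthy.
  next(): string | undefined {
    for (let i = 0; i < this.tunnels.length; i++) {
      const index = (this.cursor + i) % this.tunnels.length;
      const candidate = this.tunnels[index];
      if (candidate && candidate.healthy) {
        // Resume after the tunnel we just handed out.
        this.cursor = (index + 1) % this.tunnels.length;
        return candidate.tunnelId;
      }
    }
    return undefined;
  }
}

// Hypothetical tunnels for the US-West regional datacenter.
const usWestBalancer = new HealthAwareRoundRobin([
  { tunnelId: 'tunnel-sfo', healthy: true },
  { tunnelId: 'tunnel-lax', healthy: false },
  { tunnelId: 'tunnel-sea', healthy: true },
]);
```

Successive calls to `usWestBalancer.next()` alternate between `tunnel-sfo` and `tunnel-sea` for as long as `tunnel-lax` stays unhealthy.
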
### Tier 3: Edge Site Mapping

**Mapping Logic**:
- Each Edge Site (250+ locations) connects to its nearest PoP
- Direct PoP-to-Edge tunneling keeps latency low
- Edge sites can serve as backup paths

**Example Mapping**:
```
Edge Site: Denver, CO
├── Cloudflare PoP: Denver, CO
└── Tunnel Configuration:
    ├── Direct tunnel to edge site
    └── Backup via regional datacenter
```

## Implementation Architecture

### 1. PoP-to-Region Mapping Service

```typescript
interface PoPMapping {
  popId: string
  popLocation: {
    city: string
    country: string
    coordinates: { lat: number; lng: number }
  }
  // Datacenter that receives this PoP's traffic under normal conditions
  primaryDatacenter: {
    id: string
    type: 'CORE' | 'REGIONAL' | 'EDGE'
    location: Location
    tunnelEndpoint: string
  }
  // Fallback targets, ordered by priority
  backupDatacenters: Array<{
    id: string
    priority: number
    tunnelEndpoint: string
  }>
  routingRules: {
    latencyThreshold: number // ms
    failoverThreshold: number // ms
    loadBalancing: 'ROUND_ROBIN' | 'LEAST_CONNECTIONS' | 'GEOGRAPHIC'
  }
}
```

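The document does not spell out how `latencyThreshold` and `failoverThreshold` interact, so the sketch below picks one plausible reading and labels it as an assumption: latency above `latencyThreshold` marks the primary as degraded, and latency above `failoverThreshold` switches traffic to the backup with the lowest `priority` number. `RoutingView`, `RoutingDecision`, `decideRoute`, and the datacenter IDs are hypothetical names introduced for illustration.

```typescript
// Trimmed-down view of PoPMapping containing only the fields this sketch needs.
interface RoutingView {
  primaryDatacenterId: string;
  backupDatacenters: Array<{ id: string; priority: number }>;
  routingRules: { latencyThreshold: number; failoverThreshold: number }; // ms
}

type RoutingDecision =
  | { action: 'USE_PRIMARY'; datacenterId: string; degraded: boolean }
  | { action: 'FAIL_OVER'; datacenterId: string };

// Assumed semantics: latencyThreshold flags the primary as degraded,
// failoverThreshold triggers a switch to the most preferred backup.
function decideRoute(view: RoutingView, primaryLatencyMs: number): RoutingDecision {
  const { latencyThreshold, failoverThreshold } = view.routingRules;
  if (primaryLatencyMs > failoverThreshold && view.backupDatacenters.length > 0) {
    // A lower priority number is treated as more preferred (an assumption).
    const best = view.backupDatacenters.reduce((a, b) => (b.priority < a.priority ? b : a));
    return { action: 'FAIL_OVER', datacenterId: best.id };
  }
  return {
    action: 'USE_PRIMARY',
    datacenterId: view.primaryDatacenterId,
    degraded: primaryLatencyMs > latencyThreshold,
  };
}

// Illustrative values only.
const sampleView: RoutingView = {
  primaryDatacenterId: 'dc-va',
  backupDatacenters: [
    { id: 'dc-oh', priority: 2 },
    { id: 'dc-ny', priority: 1 },
  ],
  routingRules: { latencyThreshold: 50, failoverThreshold: 200 },
};
```

Under these assumptions, a measured latency of 120 ms keeps traffic on `dc-va` but flags it as degraded, while 350 ms fails over to `dc-ny`.
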
### 2. Tunnel Management Service

```typescript
interface TunnelConfiguration {
  tunnelId: string
  popId: string
  targetDatacenter: string
  tunnelType: 'PRIMARY' | 'BACKUP' | 'LOAD_BALANCED'
  healthCheck: {
    endpoint: string
    interval: number
    timeout: number
    failureThreshold: number
  }
  routing: {
    path: string
    service: string
    loadBalancing: LoadBalancingConfig
  }
}
```

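The `failureThreshold` field suggests that a tunnel is only treated as unhealthy after several failed probes. A minimal sketch of that idea follows, assuming the threshold counts consecutive failures and that a single successful probe restores the healthy state; both are assumptions, and `TunnelHealthTracker` is a hypothetical name.

```typescript
interface HealthCheckSettings {
  failureThreshold: number; // consecutive failures before a tunnel is marked unhealthy
}

type TunnelHealthState = 'HEALTHY' | 'UNHEALTHY';

// Tracks consecutive probe failures for one tunnel.
class TunnelHealthTracker {
  private consecutiveFailures = 0;
  private readonly failureThreshold: number;

  constructor(settings: HealthCheckSettings) {
    this.failureThreshold = settings.failureThreshold;
  }

  // Records one probe result and returns the resulting state.
  record(probeSucceeded: boolean): TunnelHealthState {
    // Any success resets the counter (assumed recovery rule).
    this.consecutiveFailures = probeSucceeded ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.failureThreshold ? 'UNHEALTHY' : 'HEALTHY';
  }
}
```

A caller would invoke `record` once per `interval`, treating a probe that exceeds `timeout` as a failure.
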
### 3. Geographic Routing Service

**Distance Calculation**:
- Calculate the distance from the PoP to all available datacenters
- Select the nearest datacenter within the latency threshold
- Consider the network path, not just geographic distance

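For the geographic part of the calculation, great-circle distance from `lat`/`lng` coordinates can be computed with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the site IDs and coordinates are approximate and purely illustrative, and the result says nothing about the actual network path.

```typescript
interface GeoPoint {
  lat: number; // degrees
  lng: number; // degrees
}

interface DatacenterSite {
  id: string;
  coordinates: GeoPoint;
}

const EARTH_RADIUS_KM = 6371;

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two points using the haversine formula.
function haversineKm(a: GeoPoint, b: GeoPoint): number {
  const sinLat = Math.sin(toRadians(b.lat - a.lat) / 2);
  const sinLng = Math.sin(toRadians(b.lng - a.lng) / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * sinLng * sinLng;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Returns the geographically closest site, or undefined for an empty list.
function nearestDatacenter(pop: GeoPoint, sites: DatacenterSite[]): DatacenterSite | undefined {
  let best: DatacenterSite | undefined;
  let bestKm = Infinity;
  for (const site of sites) {
    const km = haversineKm(pop, site.coordinates);
    if (km < bestKm) {
      best = site;
      bestKm = km;
    }
  }
  return best;
}

// Approximate coordinates, for illustration only.
const denverPop: GeoPoint = { lat: 39.74, lng: -104.99 };
const sampleSites: DatacenterSite[] = [
  { id: 'dc-va', coordinates: { lat: 39.04, lng: -77.49 } },
  { id: 'dc-ca', coordinates: { lat: 37.34, lng: -121.89 } },
];
```

Geographic distance is only a first filter here; the measured latency described next would normally make the final decision.
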
**Latency-Based Routing**:
- Measure the actual latency from the PoP to each datacenter
- Route to the lowest-latency path
- Reroute dynamically based on real-time latency

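Rerouting on every raw latency sample tends to cause flapping, so one option is to smooth the samples and only switch when another path is clearly better. The sketch below uses an exponentially weighted moving average and a switch margin; both techniques, the default values, and the `LatencyRouter` name are assumptions added for illustration rather than part of the design.

```typescript
class LatencyRouter {
  private readonly smoothedMs: Record<string, number> = {};
  private currentId: string | undefined = undefined;
  private readonly alpha: number;
  private readonly switchMarginMs: number;

  // alpha: weight of the newest sample; switchMarginMs: minimum improvement required to reroute.
  constructor(alpha: number = 0.3, switchMarginMs: number = 5) {
    this.alpha = alpha;
    this.switchMarginMs = switchMarginMs;
  }

  // Folds a new latency sample into the moving average for one datacenter.
  observe(datacenterId: string, latencyMs: number): void {
    const previous = this.smoothedMs[datacenterId];
    this.smoothedMs[datacenterId] =
      previous === undefined ? latencyMs : this.alpha * latencyMs + (1 - this.alpha) * previous;
  }

  // Returns the datacenter to route to, switching only on a clear improvement.
  select(): string | undefined {
    let bestId: string | undefined = undefined;
    let bestMs = Infinity;
    for (const id of Object.keys(this.smoothedMs)) {
      const ms = this.smoothedMs[id];
      if (ms !== undefined && ms < bestMs) {
        bestId = id;
        bestMs = ms;
      }
    }
    const currentMs = this.currentId === undefined ? undefined : this.smoothedMs[this.currentId];
    if (currentMs === undefined || bestMs + this.switchMarginMs < currentMs) {
      this.currentId = bestId;
    }
    return this.currentId;
  }
}
```

A smaller `alpha` reacts more slowly to spikes, and a larger `switchMarginMs` makes rerouting rarer; suitable values would need to be tuned against real measurements.
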
## Cloudflare Tunnel Configuration

### Tunnel Architecture

```
User Request
    ↓
Cloudflare PoP (Edge)
    ↓
Cloudflare Tunnel (cloudflared)
    ↓
Physical Infrastructure (Proxmox/K8s)
    ↓
Application
```

### Tunnel Setup Process

1. **Tunnel Creation**:
   - Create Cloudflare Tunnel via API
   - Generate tunnel token
   - Deploy cloudflared agent on physical infrastructure

2. **Route Configuration**:
   - Configure DNS records to point to tunnel
   - Set up ingress rules for routing
   - Configure load balancing

3. **Health Monitoring**:
   - Monitor tunnel health
   - Automatic failover on tunnel failure
   - Alert on tunnel degradation

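The route configuration step can be sketched as data. In a cloudflared configuration, ingress rules are evaluated in order and the final rule must be a catch-all without a hostname, and a DNS record for a tunneled hostname is a CNAME to `<tunnel-id>.cfargotunnel.com`. The helper names (`buildIngress`, `tunnelCnameTarget`), the hostnames, and the origin URLs below are placeholders, not values from this deployment.

```typescript
// One entry of the `ingress` list in a cloudflared configuration.
interface IngressRule {
  hostname?: string; // omitted on the catch-all rule
  service: string;
}

// Hypothetical input shape: which hostname maps to which local origin.
interface RouteSpec {
  hostname: string;
  originUrl: string;
}

// Builds the ingress list and appends the required catch-all rule.
function buildIngress(routes: RouteSpec[]): IngressRule[] {
  const rules: IngressRule[] = routes.map((r) => ({ hostname: r.hostname, service: r.originUrl }));
  // Requests that match no hostname receive a 404 from cloudflared itself.
  rules.push({ service: 'http_status:404' });
  return rules;
}

// CNAME target for a hostname served through the given tunnel.
function tunnelCnameTarget(tunnelId: string): string {
  return `${tunnelId}.cfargotunnel.com`;
}

// Placeholder routes for illustration.
const sampleIngress = buildIngress([
  { hostname: 'app.example.com', originUrl: 'http://localhost:8080' },
  { hostname: 'api.example.com', originUrl: 'http://localhost:9000' },
]);
```

The resulting list would be serialized into the tunnel's configuration alongside the tunnel ID and credentials; how that file or remote configuration is delivered to each site is outside this sketch.
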
### Multi-Tunnel Strategy

**Primary Tunnel**:
- Direct path from PoP to primary datacenter
- Lowest latency path
- Active traffic routing

**Backup Tunnel**:
- Alternative path via backup datacenter
- Activated on primary failure
- Pre-established for fast failover

**Load Balanced Tunnels**:
- Multiple tunnels for high availability
- Load distribution across tunnels
- Health-based routing

## Regional Gateway Mapping

### Region Definition

```typescript
interface Region {
  id: string
  name: string
  type: 'CORE' | 'REGIONAL' | 'EDGE'
  location: {
    city: string
    country: string
    coordinates: { lat: number; lng: number }
  }
  cloudflarePoPs: string[] // PoP IDs
  physicalInfrastructure: {
    datacenterId: string
    tunnelEndpoints: string[]
    capacity: {
      compute: number
      storage: number
      network: number
    }
  }
  routing: {
    primaryPath: string
    backupPaths: string[]
    loadBalancing: LoadBalancingConfig
  }
}
```

### PoP-to-Region Assignment Algorithm

1. **Geographic Proximity**:
   - Calculate the distance from the PoP to all regions
   - Assign the PoP to the nearest region within the distance threshold

2. **Capacity Consideration**:
   - Check region capacity
   - Distribute PoPs to balance load
   - Avoid overloading a single region

3. **Network Topology**:
   - Consider network paths
   - Optimize for latency
   - Minimize hops

4. **Failover Planning**:
   - Ensure backup regions are available
   - Use geographic diversity for resilience
   - Keep multiple paths for redundancy

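The four steps above could be combined into a single scoring pass. The sketch below is one possible interpretation and makes several assumptions that the document does not state: capacity is expressed as a maximum number of PoPs per region, distance stands in for network cost, and the score is distance inflated by current load. All names (`RegionCandidate`, `assignPop`) and the scoring formula are hypothetical.

```typescript
interface RegionCandidate {
  id: string;
  distanceKm: number; // distance from the PoP, computed elsewhere
  capacity: number; // maximum PoPs the region should serve (assumed unit)
  assignedPops: number; // PoPs already assigned to the region
}

interface PopAssignment {
  primary: string;
  backups: string[];
}

function assignPop(
  candidates: RegionCandidate[],
  maxDistanceKm: number,
  backupCount: number,
): PopAssignment | undefined {
  // Steps 1 and 2: keep regions inside the distance threshold that still have headroom.
  const eligible = candidates.filter(
    (c) => c.distanceKm <= maxDistanceKm && c.assignedPops < c.capacity,
  );
  // Step 3 stand-in: distance approximates network cost, scaled up as the region fills.
  const scored = eligible
    .map((c) => ({ id: c.id, score: c.distanceKm * (1 + c.assignedPops / c.capacity) }))
    .sort((a, b) => a.score - b.score);
  const first = scored[0];
  if (first === undefined) return undefined;
  // Step 4: the next best regions become backups.
  return { primary: first.id, backups: scored.slice(1, 1 + backupCount).map((s) => s.id) };
}
```

Because load inflates the score, a nearly full nearby region can lose to a slightly more distant region with spare capacity, which is the balancing behavior step 2 asks for. Geographic diversity among backups is not enforced in this sketch.
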
## Implementation Components

### 1. PoP Mapping Service

**File**: `api/src/services/pop-mapping.ts`

```typescript
// Service contract; method bodies are omitted here.
interface PoPMappingService {
  mapPoPToRegion(popId: string): Promise<Region>
  getOptimalDatacenter(popId: string): Promise<Datacenter>
  configureTunnel(popId: string, datacenterId: string): Promise<Tunnel>
  updateRouting(popId: string, routing: RoutingConfig): Promise<void>
}
```

### 2. Tunnel Orchestration Service

**File**: `api/src/services/tunnel-orchestration.ts`

```typescript
// Service contract; method bodies are omitted here.
interface TunnelOrchestrationService {
  createTunnel(config: TunnelConfiguration): Promise<Tunnel>
  monitorTunnel(tunnelId: string): Promise<TunnelHealth>
  failoverTunnel(tunnelId: string, backupTunnelId: string): Promise<void>
  loadBalanceTunnels(tunnelIds: string[]): Promise<LoadBalancer>
}
```

### 3. Geographic Routing Engine

**File**: `api/src/services/geographic-routing.ts`

```typescript
// Service contract; method bodies are omitted here.
interface GeographicRoutingService {
  findNearestDatacenter(popLocation: Location): Promise<Datacenter>
  calculateLatency(popId: string, datacenterId: string): Promise<number>
  optimizeRouting(popId: string): Promise<RoutingPath>
}
```

## Database Schema

### PoP Mappings Table

```sql
CREATE TABLE pop_mappings (
    id UUID PRIMARY KEY,
    pop_id VARCHAR(255) UNIQUE NOT NULL,
    pop_location JSONB NOT NULL,
    primary_datacenter_id UUID REFERENCES datacenters(id),
    region_id UUID REFERENCES regions(id),
    tunnel_configuration JSONB,
    routing_rules JSONB,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```

### Tunnel Configurations Table

```sql
CREATE TABLE tunnel_configurations (
    id UUID PRIMARY KEY,
    tunnel_id VARCHAR(255) UNIQUE NOT NULL,
    pop_id VARCHAR(255) REFERENCES pop_mappings(pop_id),
    datacenter_id UUID REFERENCES datacenters(id),
    tunnel_type VARCHAR(50),
    health_status VARCHAR(50),
    configuration JSONB,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```

## Monitoring and Observability

### Key Metrics

1. **Tunnel Health**:
   - Tunnel uptime
   - Latency from PoP to datacenter
   - Packet loss
   - Throughput

2. **Routing Performance**:
   - Request routing time
   - Failover time
   - Load distribution

3. **Geographic Distribution**:
   - PoP-to-datacenter mapping distribution
   - Regional load balancing
   - Capacity utilization

### Alerting

- Tunnel failure alerts
- High latency alerts
- Capacity threshold alerts
- Routing anomaly alerts

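Three of the four alert types reduce to simple threshold checks over collected metrics. The sketch below shows one way to evaluate them; the metric shape, the threshold names, and the alert kinds are hypothetical, and routing anomaly detection is left out because the document does not define what counts as an anomaly.

```typescript
interface TunnelMetrics {
  tunnelId: string;
  up: boolean;
  latencyMs: number;
  capacityUtilization: number; // fraction between 0 and 1
}

interface AlertThresholds {
  maxLatencyMs: number;
  maxCapacityUtilization: number;
}

type AlertKind = 'TUNNEL_FAILURE' | 'HIGH_LATENCY' | 'CAPACITY_THRESHOLD';

interface TunnelAlert {
  tunnelId: string;
  kind: AlertKind;
}

// Evaluates one batch of metrics against fixed thresholds.
function evaluateAlerts(samples: TunnelMetrics[], limits: AlertThresholds): TunnelAlert[] {
  const alerts: TunnelAlert[] = [];
  for (const m of samples) {
    if (!m.up) {
      // Latency and utilization are not meaningful for a tunnel that is down.
      alerts.push({ tunnelId: m.tunnelId, kind: 'TUNNEL_FAILURE' });
      continue;
    }
    if (m.latencyMs > limits.maxLatencyMs) {
      alerts.push({ tunnelId: m.tunnelId, kind: 'HIGH_LATENCY' });
    }
    if (m.capacityUtilization > limits.maxCapacityUtilization) {
      alerts.push({ tunnelId: m.tunnelId, kind: 'CAPACITY_THRESHOLD' });
    }
  }
  return alerts;
}
```

In practice these checks would more likely live in the monitoring system's own rule language; the function only illustrates the conditions.
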
## Security Considerations

1. **Zero Trust Architecture**:
   - All traffic authenticated
   - No public IPs on physical infrastructure
   - Encrypted tunnel connections

2. **Access Control**:
   - PoP-based access policies
   - Geographic restrictions
   - IP allowlisting

3. **Audit Logging**:
   - All tunnel connections logged
   - Routing decisions logged
   - Access attempts logged

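IP allowlisting comes down to testing whether a client address falls inside one of a set of CIDR ranges. The sketch below handles IPv4 only and is meant to illustrate the check itself, wherever it ends up being enforced; the function names and the example ranges (taken from the documentation address blocks) are assumptions.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit value, or undefined if malformed.
function ipv4ToInt(ip: string): number | undefined {
  const parts = ip.split('.');
  if (parts.length !== 4) return undefined;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return undefined;
    const octet = Number(part);
    if (octet > 255) return undefined;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` lies inside the IPv4 range written as "a.b.c.d/prefix".
function inCidr(ip: string, cidr: string): boolean {
  const pieces = cidr.split('/');
  const base = pieces[0];
  const prefixText = pieces[1];
  if (pieces.length !== 2 || base === undefined || prefixText === undefined) return false;
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  if (prefix > 32) return false;
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === undefined || baseInt === undefined) return false;
  // Two addresses share a network when they fall in the same block of 2^(32 - prefix) addresses.
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// True when the address matches at least one allowlisted range.
function isAllowed(ip: string, allowlist: string[]): boolean {
  return allowlist.some((cidr) => inCidr(ip, cidr));
}

// Example ranges from the reserved documentation blocks.
const sampleAllowlist = ['203.0.113.0/24', '198.51.100.0/25'];
```

Arithmetic on plain numbers is used instead of bitwise operators because JavaScript bitwise operations work on signed 32-bit integers, which complicates addresses above `127.255.255.255`.
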
## Deployment Strategy

### Phase 1: Core Datacenter Mapping (30 days)
- Map top 50 Cloudflare PoPs to Core Datacenters
- Deploy primary tunnels
- Implement basic routing

### Phase 2: Regional Expansion (60 days)
- Map remaining PoPs to Regional Datacenters
- Deploy backup tunnels
- Implement failover

### Phase 3: Edge Integration (90 days)
- Integrate Edge Sites
- Optimize routing algorithms
- Full monitoring and alerting