# Virtual Banker Architecture
## Overview
The Virtual Banker is a multi-layered system that delivers a digital-human banking experience: a video-realistic avatar, real-time voice interaction, and an embeddable widget.
## System Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Embeddable Widget (React/TypeScript)                │   │
│  │  - Chat UI                                           │   │
│  │  - Voice Controls                                    │   │
│  │  - Avatar View                                       │   │
│  │  - WebRTC Client                                     │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                         Edge Layer                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ CDN          │  │ API Gateway  │  │ WebRTC       │       │
│  │ (Widget)     │  │ (Auth/Rate)  │  │ Gateway      │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                        Core Services                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Session      │  │ Orchestrator │  │ LLM Gateway  │       │
│  │ Service      │  │              │  │              │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ RAG Service  │  │ Tool/Action  │  │ Safety/      │       │
│  │              │  │ Service      │  │ Compliance   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                       Media Services                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ ASR Service  │  │ TTS Service  │  │ Avatar       │       │
│  │ (Streaming)  │  │ (Streaming)  │  │ Renderer     │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                         Data Layer                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ PostgreSQL   │  │ Redis        │  │ Vector DB    │       │
│  │ (State)      │  │ (Cache)      │  │ (pgvector)   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
```
## Data Flow
### Voice Turn Flow
1. **User speaks** → Widget captures audio via microphone
2. **Audio stream** → WebRTC gateway → ASR service
3. **ASR** → Transcribes audio to text, emitting partial and final results
4. **Orchestrator** → Sends the transcript to the LLM with conversation context
5. **LLM** → Generates response + tool calls + emotion tags
6. **TTS** → Converts text to audio stream
7. **Avatar** → Generates visemes, expressions, gestures
8. **Widget** → Plays audio, displays captions, animates avatar
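
The eight steps above can be read as a chain of stages. The following is a minimal sketch under stated assumptions, not the actual implementation: `VoiceTurnDeps`, `runVoiceTurn`, and every method name are hypothetical, and each stage is awaited in full, whereas the real services are described as streaming.

```typescript
// All names here are hypothetical; the real service interfaces are not shown in this document.
interface LlmReply {
  text: string;
  toolCalls: string[];
  emotion: string;
}

interface VoiceTurnDeps {
  transcribe(audio: Uint8Array): Promise<string>; // step 3 (ASR)
  generate(transcript: string, context: string[]): Promise<LlmReply>; // steps 4-5
  synthesize(text: string): Promise<Uint8Array>; // step 6 (TTS)
  animate(text: string, emotion: string): Promise<string[]>; // step 7 (avatar cues)
}

interface VoiceTurnResult {
  transcript: string;
  reply: LlmReply;
  speech: Uint8Array;
  avatarCues: string[];
}

// Runs one voice turn end to end, in the order listed above.
async function runVoiceTurn(
  audio: Uint8Array,
  context: string[],
  deps: VoiceTurnDeps,
): Promise<VoiceTurnResult> {
  const transcript = await deps.transcribe(audio);
  const reply = await deps.generate(transcript, context);
  const speech = await deps.synthesize(reply.text);
  const avatarCues = await deps.animate(reply.text, reply.emotion);
  // Step 8: the widget would play `speech`, caption `reply.text`, and apply `avatarCues`.
  return { transcript, reply, speech, avatarCues };
}
```

Passing the stages in as `deps` keeps the sketch testable with stubs; a streaming version would replace the promises with async iterators or event callbacks.
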
### Text Turn Flow
1. **User types** → Widget sends text message
2. **Orchestrator** → Processes the message; the turn then continues from step 4 of the Voice Turn Flow
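
Because a text turn joins the voice flow at the orchestrator step, both kinds of turn can be reduced to a transcript before any shared processing. A small sketch, assuming a hypothetical `TurnInput` type and a synchronous `transcribe` callback standing in for the ASR service:

```typescript
// Hypothetical input type: a turn arrives either as captured audio or as typed text.
type TurnInput =
  | { kind: "voice"; audio: Uint8Array }
  | { kind: "text"; text: string };

// Resolves either kind of turn to the transcript the orchestrator consumes.
// `transcribe` stands in for the ASR service and is only called for voice turns.
function toTranscript(
  input: TurnInput,
  transcribe: (audio: Uint8Array) => string,
): string {
  switch (input.kind) {
    case "voice":
      return transcribe(input.audio);
    case "text":
      return input.text; // text turns skip audio capture and ASR entirely
  }
}
```
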
## Components
### Backend Services
#### Session Service
- Creates and manages sessions
- Issues ephemeral tokens
- Loads tenant configurations
- Tracks session state
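
To make "ephemeral tokens" concrete, here is a minimal in-memory sketch of issuing and validating a token with a time-to-live. `SessionStore`, `SessionRecord`, and the injected `newId` and `now` functions are assumptions for illustration; the document does not say how the real service generates or stores tokens.

```typescript
interface SessionRecord {
  id: string;
  tenantId: string;
  token: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for the session store; all names are hypothetical.
class SessionStore {
  private readonly byToken = new Map<string, SessionRecord>();
  private readonly ttlMs: number;
  private readonly newId: () => string;
  private readonly now: () => number;

  // `newId` must return unguessable values in real use (for example from a CSPRNG).
  constructor(ttlMs: number, newId: () => string, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.newId = newId;
    this.now = now;
  }

  // Creates a session and issues a token that expires after `ttlMs`.
  create(tenantId: string): SessionRecord {
    const record: SessionRecord = {
      id: this.newId(),
      tenantId,
      token: this.newId(),
      expiresAt: this.now() + this.ttlMs,
    };
    this.byToken.set(record.token, record);
    return record;
  }

  // Returns the session for a live token; expired tokens are evicted and rejected.
  validate(token: string): SessionRecord | undefined {
    const record = this.byToken.get(token);
    if (record === undefined) return undefined;
    if (this.now() >= record.expiresAt) {
      this.byToken.delete(token);
      return undefined;
    }
    return record;
  }
}
```

Injecting the clock and the ID generator keeps expiry behaviour deterministic in tests.
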
#### Conversation Orchestrator
- Maintains conversation state machine
- Routes messages to appropriate services
- Handles barge-in (interruptions)
- Synchronizes audio/video
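
The state machine and barge-in handling might look roughly like the table-driven sketch below. The state and event names are assumptions; the document does not enumerate the real ones.

```typescript
// Hypothetical states and events for one conversation.
type TurnState = "idle" | "listening" | "thinking" | "speaking";
type TurnEvent = "speech_start" | "final_transcript" | "response_ready" | "playback_done";

// Transition table: any (state, event) pair not listed leaves the state unchanged.
const transitions: Record<TurnState, Partial<Record<TurnEvent, TurnState>>> = {
  idle: { speech_start: "listening" },
  listening: { final_transcript: "thinking" },
  thinking: { response_ready: "speaking", speech_start: "listening" },
  speaking: { playback_done: "idle", speech_start: "listening" },
};

function nextState(state: TurnState, event: TurnEvent): TurnState {
  return transitions[state][event] ?? state;
}

// Barge-in: the user starts talking while a reply is being prepared or played,
// so the caller should cancel in-flight LLM, TTS, and avatar work.
function isBargeIn(state: TurnState, event: TurnEvent): boolean {
  return event === "speech_start" && (state === "thinking" || state === "speaking");
}
```

A caller would check `isBargeIn` before applying `nextState`, so that cancellation happens exactly once per interruption.
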
#### LLM Gateway
- Multi-tenant prompt templates
- Function/tool calling
- Output schema enforcement
- Model routing
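
Output schema enforcement can be as simple as rejecting any model output that does not parse into the expected shape. The sketch below assumes the LLM returns JSON with `text`, `toolCalls`, and `emotion` fields; that shape and the function names are hypothetical, chosen to mirror the response, tool calls, and emotion tags mentioned in the voice turn flow.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LlmOutput {
  text: string;
  toolCalls: ToolCall[];
  emotion: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns the validated output, or undefined if the raw string violates the schema.
function parseLlmOutput(raw: string): LlmOutput | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return undefined; // not valid JSON
  }
  if (!isRecord(data)) return undefined;
  const text = data["text"];
  const emotion = data["emotion"];
  const toolCalls = data["toolCalls"];
  if (typeof text !== "string" || typeof emotion !== "string") return undefined;
  if (!Array.isArray(toolCalls)) return undefined;

  const calls: ToolCall[] = [];
  for (const candidate of toolCalls) {
    if (!isRecord(candidate)) return undefined;
    const callName = candidate["name"];
    const callArgs = candidate["args"];
    if (typeof callName !== "string" || !isRecord(callArgs)) return undefined;
    calls.push({ name: callName, args: callArgs });
  }
  return { text, toolCalls: calls, emotion };
}
```

On a failed parse, a gateway would typically retry the model or fall back to a safe reply rather than forward unvalidated output.
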
#### RAG Service
- Document ingestion and embedding
- Vector similarity search
- Reranking
- Citation formatting
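
As an illustration of vector similarity search and citation formatting, the sketch below ranks chunks by cosine similarity in memory. This is an assumption-laden stand-in: in the architecture above the search would run inside the vector database (pgvector), and `Chunk`, `topK`, and `formatCitations` are hypothetical names. Reranking is omitted.

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: Chunk;
  score: number;
}

// Cosine similarity; returns 0 when either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query and keeps the k most similar.
function topK(query: number[], chunks: Chunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Formats results as numbered citations keyed by chunk id.
function formatCitations(results: ScoredChunk[]): string[] {
  return results.map((r, i) => `[${i + 1}] ${r.chunk.id}`);
}
```
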
#### Tool/Action Service
- Tool registry and execution
- Banking service integrations
- Human-in-the-loop confirmations
- Audit logging
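
The interaction between the registry, human-in-the-loop confirmation, and audit logging could be sketched as follows. `ToolRegistry`, `ToolDef`, and the outcome names are hypothetical, and the tools run synchronously here for brevity; real banking integrations would be asynchronous.

```typescript
type ToolArgs = Record<string, unknown>;

interface ToolDef {
  name: string;
  requiresConfirmation: boolean; // true for sensitive actions such as transfers
  run(args: ToolArgs): string;
}

type ToolOutcome = "executed" | "needs_confirmation" | "unknown_tool";

interface ToolResult {
  outcome: ToolOutcome;
  output?: string;
}

interface AuditEntry {
  tool: string;
  outcome: ToolOutcome;
  at: number; // epoch milliseconds
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDef>();
  readonly auditLog: AuditEntry[] = [];

  register(tool: ToolDef): void {
    this.tools.set(tool.name, tool);
  }

  // Executes a tool unless it needs a confirmation the user has not yet given.
  // Every attempt is appended to the audit log, whatever the outcome.
  execute(name: string, args: ToolArgs, confirmed: boolean): ToolResult {
    const tool = this.tools.get(name);
    let result: ToolResult;
    if (tool === undefined) {
      result = { outcome: "unknown_tool" };
    } else if (tool.requiresConfirmation && !confirmed) {
      result = { outcome: "needs_confirmation" };
    } else {
      result = { outcome: "executed", output: tool.run(args) };
    }
    this.auditLog.push({ tool: name, outcome: result.outcome, at: Date.now() });
    return result;
  }
}
```

Returning `needs_confirmation` instead of throwing lets the orchestrator ask the user and then call `execute` again with `confirmed` set to `true`.
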
### Frontend Widget
#### Components
- **ChatPanel**: Main chat interface
- **VoiceControls**: Push-to-talk, hands-free, volume
- **AvatarView**: Video stream display
- **Captions**: Real-time captions overlay
- **Settings**: User preferences
#### Hooks
- **useSession**: Session management
- **useConversation**: Message handling
- **useWebRTC**: WebRTC connection
### Avatar System
#### Unreal Engine
- Digital human character
- Blendshapes for visemes/expressions
- Animation blueprints
- PixelStreaming for video output
#### Render Service
- Controls Unreal instances
- Manages GPU resources
- Streams video via WebRTC
## Security
- JWT/SSO authentication
- Ephemeral session tokens
- PII redaction
- Content filtering
- Rate limiting
- Audit trails
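
Of the controls listed, rate limiting is the easiest to show in a few lines. The token-bucket sketch below is one common approach, offered as an assumption; the document does not state which algorithm the API gateway uses, and `TokenBucket` is a hypothetical name.

```typescript
// Token-bucket limiter: allows bursts of up to `capacity` requests,
// refilled continuously at `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, startMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefillMs = startMs;
  }

  // Returns true and spends one token if a request is allowed at time `nowMs`.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

In a multi-tenant deployment one bucket would typically be kept per tenant or per session token, with the counters held in a shared store so that every gateway instance sees the same state.
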
## Accessibility
- WCAG 2.1 AA compliance
- Keyboard navigation
- Screen reader support
- Captions (always available)
- Reduced motion support
- ARIA labels
## Scalability
- Stateless services (behind load balancer)
- Redis for session caching
- PostgreSQL for persistent state
- GPU cluster for avatar rendering
- CDN for widget assets
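
Keeping services stateless means session state is read from shared storage on each request. One way to combine the Redis cache with PostgreSQL is a cache-aside read, sketched below. `SessionCache` and `LoadFromDb` are hypothetical stand-ins rather than real client APIs, and the 300-second TTL is an arbitrary placeholder.

```typescript
// Hypothetical stand-ins for a Redis client and a PostgreSQL query.
interface SessionCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

type LoadFromDb = (sessionId: string) => Promise<string | undefined>;

// Cache-aside read: try the cache, fall back to the database, then populate the cache.
async function loadSessionState(
  sessionId: string,
  cache: SessionCache,
  loadFromDb: LoadFromDb,
  ttlSeconds: number = 300,
): Promise<string | undefined> {
  const key = `session:${sessionId}`;
  const cached = await cache.get(key);
  if (cached !== undefined) return cached;

  const fromDb = await loadFromDb(sessionId);
  if (fromDb !== undefined) {
    await cache.set(key, fromDb, ttlSeconds);
  }
  return fromDb;
}
```

Because any instance can rebuild state this way, instances can be added or removed behind the load balancer without session affinity.
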