# Virtual Banker Architecture

## Overview

The Virtual Banker is a multi-layered system that provides a digital-human banking experience with photorealistic video, real-time voice interaction, and an embeddable widget.
## System Architecture

```
┌──────────────────────────────────────────────────────────┐
│  Client Layer                                            │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Embeddable Widget (React/TypeScript)               │  │
│  │ - Chat UI                                          │  │
│  │ - Voice Controls                                   │  │
│  │ - Avatar View                                      │  │
│  │ - WebRTC Client                                    │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│  Edge Layer                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ CDN          │   │ API Gateway  │   │ WebRTC       │  │
│  │ (Widget)     │   │ (Auth/Rate)  │   │ Gateway      │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│  Core Services                                           │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ Session      │   │ Orchestrator │   │ LLM Gateway  │  │
│  │ Service      │   │              │   │              │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ RAG Service  │   │ Tool/Action  │   │ Safety/      │  │
│  │              │   │ Service      │   │ Compliance   │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│  Media Services                                          │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ ASR Service  │   │ TTS Service  │   │ Avatar       │  │
│  │ (Streaming)  │   │ (Streaming)  │   │ Renderer     │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│  Data Layer                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ PostgreSQL   │   │ Redis        │   │ Vector DB    │  │
│  │ (State)      │   │ (Cache)      │   │ (pgvector)   │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘
```
## Data Flow

### Voice Turn Flow

1. **User speaks** → Widget captures audio via microphone
2. **Audio stream** → WebRTC gateway → ASR service
3. **ASR** → Transcribes to text (partial + final)
4. **Orchestrator** → Sends transcript to LLM with context
5. **LLM** → Generates response + tool calls + emotion tags
6. **TTS** → Converts text to audio stream
7. **Avatar** → Generates visemes, expressions, gestures
8. **Widget** → Plays audio, displays captions, animates avatar
### Text Turn Flow

1. **User types** → Widget sends text message
2. **Orchestrator** → Processes the message, continuing from step 4 of the voice flow
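Both flows converge on the orchestrator at step 4. The TypeScript sketch below shows that shape under simplifying assumptions. The services are synchronous stubs rather than streams. The names `TurnPipeline`, `TurnServices`, `handleAsrResult`, `handleUserText`, and the emotion values are hypothetical; they are not part of the described system.

```typescript
// Hypothetical emotion tags emitted by the LLM (step 5).
type Emotion = "neutral" | "happy" | "concerned";

interface LlmReply {
  text: string;
  toolCalls: string[]; // tool execution is omitted in this sketch
  emotion: Emotion;
}

// Hypothetical synchronous stand-ins for the LLM Gateway, TTS, and avatar services.
interface TurnServices {
  llm: (userText: string, context: string[]) => LlmReply;
  tts: (text: string) => Uint8Array;
  avatar: (text: string, emotion: Emotion) => string[];
}

// What the widget needs for step 8: captions, audio, and animation cues.
interface TurnOutput {
  caption: string;
  audio: Uint8Array;
  cues: string[];
}

class TurnPipeline {
  private context: string[] = [];

  constructor(private services: TurnServices) {}

  // Steps 4-8, shared by voice and text turns.
  handleUserText(text: string): TurnOutput {
    const reply = this.services.llm(text, this.context);
    this.context.push(`user: ${text}`, `assistant: ${reply.text}`);
    return {
      caption: reply.text,
      audio: this.services.tts(reply.text),
      cues: this.services.avatar(reply.text, reply.emotion),
    };
  }

  // Step 3: partial transcripts do not start a turn; a final transcript does.
  handleAsrResult(transcript: string, isFinal: boolean): TurnOutput | null {
    return isFinal ? this.handleUserText(transcript) : null;
  }
}
```

A production orchestrator would likely stream TTS audio and avatar cues while the LLM is still generating, rather than waiting for a complete reply. The sketch only fixes the order of the stages.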
## Components

### Backend Services

#### Session Service

- Creates and manages sessions
- Issues ephemeral tokens
- Loads tenant configurations
- Tracks session state
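One plausible way to implement ephemeral session tokens is to map an opaque token to a session, with a short expiry. The sketch below keeps both maps in memory; in this architecture they could instead live in Redis with a TTL. `SessionService`, `TenantConfig`, and the five-minute default TTL are hypothetical. The ID generator is injected so that production code can supply a cryptographically secure one.

```typescript
// Hypothetical tenant configuration loaded when a session is created.
interface TenantConfig {
  tenantId: string;
  persona: string;
}

interface Session {
  id: string;
  tenant: TenantConfig;
  state: "active" | "ended";
}

interface TokenRecord {
  sessionId: string;
  expiresAt: number; // epoch milliseconds
}

class SessionService {
  private sessions = new Map<string, Session>();
  private tokens = new Map<string, TokenRecord>();

  constructor(
    private loadTenant: (tenantId: string) => TenantConfig,
    private newId: () => string, // must be CSPRNG-backed in production
    private tokenTtlMs = 5 * 60 * 1000, // hypothetical default
  ) {}

  create(tenantId: string, now: number): { session: Session; token: string } {
    const session: Session = { id: this.newId(), tenant: this.loadTenant(tenantId), state: "active" };
    this.sessions.set(session.id, session);
    return { session, token: this.issueToken(session.id, now) };
  }

  issueToken(sessionId: string, now: number): string {
    const token = this.newId();
    this.tokens.set(token, { sessionId, expiresAt: now + this.tokenTtlMs });
    return token;
  }

  // Returns the session only if the token is known, unexpired, and the session is active.
  resolve(token: string, now: number): Session | undefined {
    const record = this.tokens.get(token);
    if (!record || now >= record.expiresAt) {
      this.tokens.delete(token); // drop expired tokens lazily
      return undefined;
    }
    const session = this.sessions.get(record.sessionId);
    return session?.state === "active" ? session : undefined;
  }

  end(sessionId: string): void {
    const session = this.sessions.get(sessionId);
    if (session) session.state = "ended";
  }
}
```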
#### Conversation Orchestrator

- Maintains conversation state machine
- Routes messages to appropriate services
- Handles barge-in (interruptions)
- Synchronizes audio/video
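The state machine and barge-in handling could look roughly like the sketch below. The states, the event names, and the `cancelPlayback` callback are hypothetical. The behavior it illustrates is the one the list describes: if user speech is detected while the avatar is speaking, playback is interrupted and the conversation returns to listening.

```typescript
type ConvState = "idle" | "listening" | "thinking" | "speaking";

type TurnEvent =
  | { kind: "userSpeechStart" } // voice activity detected in the user's audio
  | { kind: "finalTranscript"; text: string }
  | { kind: "replyReady" }
  | { kind: "playbackDone" };

class ConversationStateMachine {
  state: ConvState = "idle";

  // cancelPlayback is a hypothetical hook that stops TTS audio and avatar animation.
  constructor(private cancelPlayback: () => void) {}

  dispatch(event: TurnEvent): ConvState {
    switch (event.kind) {
      case "userSpeechStart":
        // Barge-in: the user talks over the avatar, so playback stops at once.
        if (this.state === "speaking") this.cancelPlayback();
        this.state = "listening";
        break;
      case "finalTranscript":
        if (this.state === "listening") this.state = "thinking";
        break;
      case "replyReady":
        // A reply arriving after a barge-in is dropped: we are no longer "thinking".
        if (this.state === "thinking") this.state = "speaking";
        break;
      case "playbackDone":
        if (this.state === "speaking") this.state = "idle";
        break;
    }
    return this.state;
  }
}
```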
#### LLM Gateway

- Multi-tenant prompt templates
- Function/tool calling
- Output schema enforcement
- Model routing
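At its simplest, output schema enforcement means refusing to pass along model output that does not match the expected structure. The sketch below hypothetically assumes that the model is prompted to return JSON with `text`, `toolCalls`, and `emotion` fields. Anything malformed is replaced by a safe fallback reply before it can reach TTS or the avatar. The field names, emotion set, and fallback text are illustrative.

```typescript
// Hypothetical emotion tag set the model is allowed to emit.
const EMOTIONS = ["neutral", "happy", "concerned"] as const;
type EmotionTag = (typeof EMOTIONS)[number];

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelOutput {
  text: string;
  toolCalls: ToolCall[];
  emotion: EmotionTag;
}

const FALLBACK: ModelOutput = {
  text: "Sorry, I couldn't process that. Could you rephrase?",
  toolCalls: [],
  emotion: "neutral",
};

function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.args === "object" && v.args !== null;
}

// Parses raw model text and enforces the schema; any violation yields FALLBACK.
function enforceSchema(raw: string): ModelOutput {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return FALLBACK; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return FALLBACK;
  const { text, toolCalls, emotion } = parsed as Record<string, unknown>;
  const tag = EMOTIONS.find((e) => e === emotion);
  if (typeof text !== "string" || !Array.isArray(toolCalls) || !toolCalls.every(isToolCall) || tag === undefined) {
    return FALLBACK;
  }
  return { text, toolCalls, emotion: tag };
}
```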
#### RAG Service

- Document ingestion and embedding
- Vector similarity search
- Reranking
- Citation formatting
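Vector similarity search and citation formatting come down to two steps. First, rank the stored chunks by similarity to the query embedding. Second, number the top hits so the LLM can cite them. The in-memory sketch below uses cosine similarity; in the architecture above, that search would more plausibly run in PostgreSQL through pgvector, whose `<=>` operator computes cosine distance. The `Chunk` shape and the `[n]` citation style are assumptions, and reranking is omitted.

```typescript
// Hypothetical chunk shape produced by ingestion.
interface Chunk {
  docTitle: string;
  text: string;
  embedding: number[];
}

// Assumes both vectors have the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Numbers retrieved chunks so the LLM can cite them as [1], [2], ...
function formatContext(chunks: Chunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] ${c.docTitle}: ${c.text}`).join("\n");
}
```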
#### Tool/Action Service

- Tool registry and execution
- Banking service integrations
- Human-in-the-loop confirmations
- Audit logging
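A tool registry with human-in-the-loop confirmation and audit logging might be shaped as follows. The tool names, the `requiresConfirmation` flag, and the audit entry format are hypothetical. The sketch shows two ideas. Sensitive actions, such as moving money, return a "needs confirmation" result until the user explicitly confirms. Every attempt is logged, whether or not it runs.

```typescript
// Hypothetical tool definition; real integrations would be asynchronous.
interface ToolDefinition {
  requiresConfirmation: boolean;
  run: (args: Record<string, string>) => string;
}

interface AuditEntry {
  tool: string;
  args: Record<string, string>;
  outcome: "executed" | "failed" | "pending_confirmation" | "unknown_tool";
}

type ToolResult =
  | { status: "ok"; output: string }
  | { status: "needs_confirmation" }
  | { status: "error"; message: string };

class ToolService {
  private tools = new Map<string, ToolDefinition>();
  readonly auditLog: AuditEntry[] = [];

  register(name: string, def: ToolDefinition): void {
    this.tools.set(name, def);
  }

  // `confirmed` should come from an explicit user action in the widget, never from the LLM.
  execute(name: string, args: Record<string, string>, confirmed = false): ToolResult {
    const def = this.tools.get(name);
    if (!def) {
      this.auditLog.push({ tool: name, args, outcome: "unknown_tool" });
      return { status: "error", message: `unknown tool: ${name}` };
    }
    if (def.requiresConfirmation && !confirmed) {
      this.auditLog.push({ tool: name, args, outcome: "pending_confirmation" });
      return { status: "needs_confirmation" };
    }
    try {
      const output = def.run(args);
      this.auditLog.push({ tool: name, args, outcome: "executed" });
      return { status: "ok", output };
    } catch (err) {
      this.auditLog.push({ tool: name, args, outcome: "failed" });
      return { status: "error", message: String(err) };
    }
  }
}
```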
### Frontend Widget

#### Components

- **ChatPanel**: Main chat interface
- **VoiceControls**: Push-to-talk, hands-free, volume
- **AvatarView**: Video stream display
- **Captions**: Real-time captions overlay
- **Settings**: User preferences

#### Hooks

- **useSession**: Session management
- **useConversation**: Message handling
- **useWebRTC**: WebRTC connection
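The document does not describe the hook internals. As one hypothetical shape, `useConversation` could keep its state in a pure reducer passed to React's `useReducer`. That way, partial ASR text for the captions overlay stays separate from committed chat messages. The action names and state shape are assumptions. Only the reducer is shown, so it runs without React.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

interface ConversationState {
  messages: ChatMessage[];
  liveCaption: string; // partial transcript shown by the Captions overlay
}

// Hypothetical action set for the conversation hook.
type ConversationAction =
  | { type: "partialTranscript"; text: string }
  | { type: "finalTranscript"; text: string }
  | { type: "assistantReply"; text: string }
  | { type: "reset" };

const initialConversation: ConversationState = { messages: [], liveCaption: "" };

// Pure reducer: always returns a new state object and never mutates its input.
function conversationReducer(state: ConversationState, action: ConversationAction): ConversationState {
  switch (action.type) {
    case "partialTranscript":
      return { ...state, liveCaption: action.text };
    case "finalTranscript":
      // Commit the utterance and clear the live caption.
      return { messages: [...state.messages, { role: "user", text: action.text }], liveCaption: "" };
    case "assistantReply":
      return { ...state, messages: [...state.messages, { role: "assistant", text: action.text }] };
    case "reset":
      return initialConversation;
  }
}
```

Inside a component, this would be wired up as `useReducer(conversationReducer, initialConversation)`.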
### Avatar System

#### Unreal Engine

- Digital human character
- Blendshapes for visemes/expressions
- Animation blueprints
- Pixel Streaming for video output
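Driving viseme blendshapes generally means turning timed phonemes from TTS into weights that the animation blueprint can sample each frame. The weights would then be applied to the character's morph targets, Unreal's term for blendshapes. The sketch below is a hypothetical, engine-agnostic version. It maps phonemes to a small viseme set and cross-fades weights linearly around phoneme boundaries. The phoneme table, the viseme names, and the 40 ms blend window are illustrative assumptions, not Unreal Engine APIs.

```typescript
// Hypothetical reduced viseme set; unmapped phonemes fall back to the neutral "sil" pose.
type Viseme = "sil" | "PP" | "FF" | "AA" | "OU";

// Illustrative ARPAbet-style phoneme-to-viseme table (incomplete on purpose).
const PHONEME_TO_VISEME: Record<string, Viseme> = {
  P: "PP", B: "PP", M: "PP",
  F: "FF", V: "FF",
  AA: "AA", AE: "AA", AH: "AA",
  OW: "OU", UW: "OU", W: "OU",
};

interface TimedPhoneme {
  phoneme: string;
  startMs: number;
  endMs: number;
}

// Returns blendshape weights in [0, 1] per viseme at time tMs.
function visemeWeightsAt(phonemes: TimedPhoneme[], tMs: number, blendMs = 40): Partial<Record<Viseme, number>> {
  const weights: Partial<Record<Viseme, number>> = {};
  for (const p of phonemes) {
    const viseme = PHONEME_TO_VISEME[p.phoneme] ?? "sil";
    // Ramp up over blendMs before the start, hold, then ramp down over blendMs after the end.
    const rampIn = (tMs - (p.startMs - blendMs)) / blendMs;
    const rampOut = (p.endMs + blendMs - tMs) / blendMs;
    const w = Math.max(0, Math.min(1, rampIn, rampOut));
    if (w > 0) weights[viseme] = Math.max(weights[viseme] ?? 0, w);
  }
  return weights;
}
```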
#### Render Service

- Controls Unreal instances
- Manages GPU resources
- Streams video via WebRTC
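Each Unreal instance occupies GPU capacity, so the render service is in effect managing a finite pool. The sketch below shows minimal allocation under two hypothetical policies: a fixed number of instances per GPU node, and one instance per active session. `RenderPool` and its capacity map are illustrative names.

```typescript
interface RenderInstance {
  id: string;
  gpuNode: string;
  sessionId: string | null; // null while idle
}

class RenderPool {
  private instances: RenderInstance[] = [];

  // capacity maps a GPU node name to the (hypothetical) number of Unreal instances it can host.
  constructor(capacity: Record<string, number>) {
    for (const gpuNode of Object.keys(capacity)) {
      for (let i = 0; i < capacity[gpuNode]; i++) {
        this.instances.push({ id: `${gpuNode}-${i}`, gpuNode, sessionId: null });
      }
    }
  }

  // Returns the session's existing instance, else claims an idle one, else null (pool exhausted).
  acquire(sessionId: string): RenderInstance | null {
    const existing = this.instances.find((inst) => inst.sessionId === sessionId);
    if (existing) return existing;
    const idle = this.instances.find((inst) => inst.sessionId === null);
    if (!idle) return null;
    idle.sessionId = sessionId;
    return idle;
  }

  release(sessionId: string): void {
    for (const inst of this.instances) {
      if (inst.sessionId === sessionId) inst.sessionId = null;
    }
  }

  get available(): number {
    return this.instances.filter((inst) => inst.sessionId === null).length;
  }
}
```

When `acquire` returns null, the session could plausibly fall back to a voice-only or static-avatar mode rather than fail outright. That is a design choice, not something the architecture specifies.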
## Security

- JWT/SSO authentication
- Ephemeral session tokens
- PII redaction
- Content filtering
- Rate limiting
- Audit trails
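PII redaction would typically run before transcripts are logged or sent to the LLM. The minimal sketch below assumes that regex detection is acceptable as a first pass. It masks email addresses, and it masks any run of 13 to 19 digits (single spaces or dashes allowed between them) that passes the Luhn checksum used by payment card numbers. The patterns and mask tokens are illustrative. A real deployment would likely need broader coverage, such as account numbers, addresses, and names.

```typescript
// Luhn checksum: from the right, double every second digit, sum, and check divisibility by 10.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let shouldDouble = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (shouldDouble) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    shouldDouble = !shouldDouble;
  }
  return sum % 10 === 0;
}

// Illustrative patterns, not an exhaustive PII detector.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// 13-19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

function redactPii(text: string): string {
  return text
    .replace(EMAIL_PATTERN, "[EMAIL]")
    .replace(CARD_CANDIDATE, (match) => {
      const digits = match.replace(/[ -]/g, "");
      // Only mask digit runs that look like real card numbers.
      return passesLuhn(digits) ? "[CARD]" : match;
    });
}
```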
## Accessibility

- WCAG 2.1 AA compliance
- Keyboard navigation
- Screen reader support
- Captions (always available)
- Reduced motion support
- ARIA labels
## Scalability

- Stateless services (behind a load balancer)
- Redis for session caching
- PostgreSQL for persistent state
- GPU cluster for avatar rendering
- CDN for widget assets
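Because the services are stateless, any instance can serve any request, with session state loaded per call. One common way to split that state between Redis and PostgreSQL is cache-aside. Reads try the cache first, fall back to the database on a miss, and then repopulate the cache with a TTL. Writes go to the database first and then invalidate the cache entry. The sketch below is synchronous for brevity, although real clients are asynchronous. It hides both stores behind hypothetical interfaces (`KeyValueCache`, `SessionRepository`) so it runs without Redis or PostgreSQL clients. With Redis, `set` with a TTL would correspond to `SET key value EX seconds`.

```typescript
interface SessionRecord {
  id: string;
  tenantId: string;
  turnCount: number;
}

// Hypothetical stand-ins for a Redis client and a PostgreSQL repository.
interface KeyValueCache {
  get(key: string): string | null;
  set(key: string, value: string, ttlSeconds: number): void;
  del(key: string): void;
}

interface SessionRepository {
  load(id: string): SessionRecord | null;
  save(record: SessionRecord): void;
}

class SessionStore {
  constructor(
    private cache: KeyValueCache,
    private db: SessionRepository,
    private ttlSeconds = 900, // hypothetical 15-minute TTL
  ) {}

  // Cache-aside read: Redis first, then PostgreSQL, then repopulate the cache.
  get(id: string): SessionRecord | null {
    const cached = this.cache.get(`session:${id}`);
    if (cached !== null) return JSON.parse(cached) as SessionRecord;
    const record = this.db.load(id);
    if (record) this.cache.set(`session:${id}`, JSON.stringify(record), this.ttlSeconds);
    return record;
  }

  // Write to the durable store first, then invalidate so the next read refreshes the cache.
  save(record: SessionRecord): void {
    this.db.save(record);
    this.cache.del(`session:${record.id}`);
  }
}
```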