Pipeline Architecture
The transcription pipeline is event-driven. When a recording file lands in customer S3 storage, an S3 event notification places a job message on the SQS transcription queue. Workers on GPU instances consume jobs from the queue, run PanOps transcription engine inference, and write the resulting transcript to the customer's Aurora tenant. The pipeline has no synchronous dependencies: it runs asynchronously in the background and scales with queue depth.
Transcription Pipeline
[Platform recording complete (Teams / Zoom / Google Meet / RingCentral / etc.)]
↓
[Connector downloads recording to customer S3]
→ s3://customer-bucket/recordings/{platform}/{date}/{meeting-id}.mp4
↓
[S3 Event Notification → SQS Transcription Queue]
→ Message: { customer_id, s3_key, platform, language_config }
↓
[Auto-scaling group checks SQS queue depth]
→ ApproximateNumberOfMessages > 0 for 2 min → launch GPU spot instance
→ Queue empty for 5 min → terminate instances (scale to zero)
↓
[Transcription Worker (Python, EC2 GPU)]
1. Pull job from SQS
2. Download recording from S3 (presigned URL)
3. Run transcription_engine.transcribe(audio, task="translate", language=None)
→ auto-detect source language
→ translate to customer.preferred_language
4. Parse segments: [{start, end, speaker, text}]
5. Write transcript segments to Aurora (RLS-scoped to customer_id)
6. Update job status: complete / failed
7. Delete message from SQS
↓
[Transcript available in Aurora for AI retrieval]
→ Searchable via pgvector semantic search + full-text search
Transcription Engine Configuration
| Parameter | Value | Notes |
|---|---|---|
| Model | PanOps transcription engine (open-weight, state-of-the-art) | Default configuration; tuned for enterprise call/meeting audio |
| Runtime | Python (GPU-accelerated inference runtime) | CUDA-accelerated on GPU instances |
| Task | transcribe + translate | Auto-detects source language; translates to CEO preferred language in single pass |
| Language Detection | Automatic | Auto-identifies language from first 30s of audio |
| Output Format | Segments with timestamps | [{start_ms, end_ms, speaker_label, text}] stored per segment |
| Speaker Diarization | Platform-native where available | Teams and Zoom provide speaker labels via their APIs; applied to transcript segments |
| Supported Languages | 99 languages | Full range of major world languages supported |
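
Worker steps 3 to 6 from the pipeline diagram, together with the output format in the table above, could be sketched roughly as follows. This is a minimal illustration, not the actual worker: the `Segment` class, `to_segments`, `process_job`, and the `speaker_turns` job field are hypothetical names, and it assumes the engine returns segment times in seconds. The transcription and storage calls are passed in as plain callables so the logic can be read without the GPU runtime or Aurora.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    speaker_label: Optional[str]
    text: str


def to_segments(raw_segments: Iterable[dict],
                speaker_turns: Iterable[Tuple[int, int, str]] = ()) -> List[Segment]:
    """Convert engine output (assumed to be in seconds) to millisecond segments.

    speaker_turns is a hypothetical list of (start_ms, end_ms, label) tuples
    taken from the platform API. A segment receives the label of the turn
    containing its midpoint, or None when no turn matches.
    """
    turns = list(speaker_turns)
    segments = []
    for raw in raw_segments:
        start_ms = round(raw["start"] * 1000)
        end_ms = round(raw["end"] * 1000)
        midpoint = (start_ms + end_ms) // 2
        label = next((lbl for s, e, lbl in turns if s <= midpoint < e), None)
        segments.append(Segment(start_ms, end_ms, label, raw["text"].strip()))
    return segments


def process_job(job: dict,
                transcribe: Callable[[str], List[dict]],
                store: Callable[[str, List[Segment]], None]) -> str:
    """Run transcription, segment parsing, and storage for one job.

    Returns the final job status string. Any exception marks the job failed.
    """
    try:
        raw = transcribe(job["s3_key"])
        segments = to_segments(raw, job.get("speaker_turns", ()))
        store(job["customer_id"], segments)
        return "complete"
    except Exception:
        return "failed"
```

Passing the engine and the database writer as arguments is a design choice made only for this sketch; it keeps the segment conversion testable on a machine without a GPU.
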
GPU Compute Configuration
| Parameter | Value |
|---|---|
| Instance Type | g4dn.xlarge (primary) or g5.xlarge (fallback) |
| GPU | NVIDIA T4 (g4dn) or NVIDIA A10G (g5) |
| Pricing | EC2 Spot (up to ~70% discount vs on-demand) |
| Scaling Metric | SQS ApproximateNumberOfMessages |
| Scale-Up Trigger | Queue depth > 0 for 2 minutes |
| Scale-To-Zero | Queue empty for 5 consecutive minutes → terminate |
| Max Instances | Configurable per customer; default 3 concurrent |
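| Spot Interruption | A job's message is deleted only after the job finishes, so if a spot instance is interrupted the message reappears once the SQS visibility timeout expires and the job is re-queued without data loss |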
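
The scaling rules in the table above can be expressed as a small decision function. This is a sketch only: in a real deployment the same logic would more likely live in CloudWatch alarms and an Auto Scaling policy, and the function name, its parameters, and the "one worker per queued message" sizing rule are assumptions made for illustration.

```python
def desired_instances(queue_depth: int,
                      seconds_nonempty: float,
                      seconds_empty: float,
                      running: int,
                      max_instances: int = 3) -> int:
    """Return the target GPU instance count for one evaluation tick.

    queue_depth      -- SQS ApproximateNumberOfMessages
    seconds_nonempty -- how long the queue depth has been above zero
    seconds_empty    -- how long the queue has been continuously empty
    running          -- instances currently running
    max_instances    -- per-customer cap (default 3, as in the table)
    """
    if queue_depth == 0:
        # Scale to zero only after 5 consecutive empty minutes.
        return 0 if seconds_empty >= 300 else running
    if seconds_nonempty < 120:
        # The backlog has not persisted for 2 minutes yet; hold steady.
        return running
    # Assumed sizing rule: one worker per queued message, up to the cap.
    return max(running, min(queue_depth, max_instances))
```

For example, a backlog of five messages that has persisted for more than two minutes with nothing running yields a target of three instances, the default cap.
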
Recording Sources & Ingestion
| Platform | Recording Type | Trigger | Stage |
|---|---|---|---|
| Microsoft Teams | Meeting recordings (MP4) | OneDrive webhook → connector downloads | Live |
| Zoom Meetings | Cloud recordings (MP4) | recording.completed webhook → connector downloads | Live |
| RingCentral | Voice call recordings | call-recording webhook → connector downloads | Live |
| Google Meet | Meeting recordings | Google Drive webhook → connector downloads | Live |
| Dialpad | Call recordings | Webhook push with recording URL | Live |
| Zoom Phone | Call recordings | Zoom Phone webhook | Live |
| OpenPhone | Call recordings | OpenPhone webhook | Live |
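
Whatever the source platform, each connector ends by writing the recording to the S3 key layout shown in the pipeline diagram, which in turn produces the queue message. A minimal sketch of both steps follows; the helper names are hypothetical, and the ISO `YYYY-MM-DD` form for `{date}` is an assumption, since the source does not specify the date format.

```python
import json
from datetime import date


def recording_key(platform: str, recorded_on: date, meeting_id: str,
                  extension: str = "mp4") -> str:
    """Build the key recordings/{platform}/{date}/{meeting-id}.{extension}."""
    return f"recordings/{platform}/{recorded_on.isoformat()}/{meeting_id}.{extension}"


def job_message(customer_id: str, s3_key: str, language_config: dict) -> str:
    """Build the SQS message body for one recording.

    The platform is recovered from the second path component of the key,
    so the key must follow the four-part layout above.
    """
    parts = s3_key.split("/")
    if len(parts) != 4 or parts[0] != "recordings":
        raise ValueError(f"unexpected recording key layout: {s3_key!r}")
    return json.dumps({
        "customer_id": customer_id,
        "s3_key": s3_key,
        "platform": parts[1],
        "language_config": language_config,
    })
```
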
Transcript Storage Schema
Transcripts are stored in Aurora PostgreSQL with a normalized schema that preserves segment-level timing and speaker information. The transcript_segments table is indexed for full-text search and has a pgvector embedding column for semantic retrieval by the AI model.
    -- Simplified schema
    CREATE TABLE recordings (
        id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        customer_id   UUID NOT NULL,            -- RLS enforced
        platform      TEXT NOT NULL,            -- 'teams', 'zoom', 'ringcentral', etc.
        meeting_id    TEXT,
        recorded_at   TIMESTAMPTZ NOT NULL,
        s3_key        TEXT NOT NULL,
        duration_sec  INTEGER,
        status        TEXT DEFAULT 'pending',   -- pending | processing | complete | failed
        created_at    TIMESTAMPTZ DEFAULT NOW()
    );

    CREATE TABLE transcript_segments (
        id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        recording_id  UUID REFERENCES recordings(id),
        customer_id   UUID NOT NULL,            -- RLS enforced (denormalized for policy)
        start_ms      INTEGER NOT NULL,
        end_ms        INTEGER NOT NULL,
        speaker       TEXT,
        text          TEXT NOT NULL,
        language_src  TEXT,                     -- detected source language
        language_out  TEXT,                     -- output language (CEO preferred)
        embedding     vector(1536),             -- pgvector: semantic search
        created_at    TIMESTAMPTZ DEFAULT NOW()
    );

    -- RLS policies enforced on both tables
    ALTER TABLE recordings ENABLE ROW LEVEL SECURITY;
    ALTER TABLE transcript_segments ENABLE ROW LEVEL SECURITY;
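
To illustrate how a worker might prepare rows for the `transcript_segments` table, the sketch below builds a parameterized INSERT and the matching parameter tuples. The column names come from the schema above; everything else is an assumption, including the `%s` placeholder style (as used by psycopg), the helper name `segment_rows`, and leaving `embedding` unset so that it can be filled in by a later step.

```python
from typing import Iterable, List, Optional, Tuple

# Parameterized statement; embedding is omitted and therefore stored as NULL.
INSERT_SEGMENT_SQL = (
    "INSERT INTO transcript_segments "
    "(recording_id, customer_id, start_ms, end_ms, speaker, text, "
    "language_src, language_out) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def segment_rows(recording_id: str,
                 customer_id: str,
                 segments: Iterable[dict],
                 language_src: Optional[str],
                 language_out: Optional[str]) -> List[Tuple]:
    """Return one parameter tuple per segment, in column order.

    customer_id is repeated on every row because the schema denormalizes
    it onto transcript_segments for the row-level security policy.
    """
    rows = []
    for seg in segments:
        if seg["end_ms"] < seg["start_ms"]:
            raise ValueError("segment ends before it starts")
        rows.append((
            recording_id,
            customer_id,
            seg["start_ms"],
            seg["end_ms"],
            seg.get("speaker"),   # None when the platform gave no label
            seg["text"],
            language_src,
            language_out,
        ))
    return rows
```

With a DB-API driver, the rows could then be written in one call such as `cursor.executemany(INSERT_SEGMENT_SQL, rows)`.
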
Language Support
The PanOps transcription engine supports transcription in 99 languages and translation to English (or to the configured target language). The following are the most commonly configured CEO languages in enterprise deployments:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Mandarin Chinese (zh)
- Japanese (ja)
- Portuguese (pt)
- Arabic (ar)
- Hindi (hi)
- Korean (ko)
Language preference is configured per customer at onboarding and stored in the customer configuration record. The transcription worker reads this configuration per job and applies the appropriate output language.
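
As a rough sketch of that per-job lookup, the function below resolves the output language from a customer configuration record. The `preferred_language` key mirrors the `customer.preferred_language` reference in the pipeline diagram, but the dictionary shape, the fallback to English, and the trimming of region suffixes are all assumptions.

```python
def output_language(customer_config: dict, default: str = "en") -> str:
    """Return a lowercase two-letter output language code for a job.

    Falls back to `default` when no preference is stored, and reduces
    region-qualified tags such as "pt-BR" or "zh_CN" to the base code.
    """
    raw = customer_config.get("preferred_language") or default
    return raw.strip().lower().replace("_", "-").split("-")[0]
```
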