AWS Organization Structure
PanOps operates under an AWS Organization managed via AWS Control Tower. The organizational layout enforces billing separation, security controls, and account-level isolation across all tenants.
PanOps AWS Organization
├── Management Account (PanOps master)
│ ├── AWS Control Tower — landing zone, guardrails, SCPs
│ ├── Consolidated billing
│ └── Org-level policies (deny public S3, require CloudTrail, etc.)
│
├── Shared Compute Account (PanOps)
│ ├── Connector workers (Lambda + Fargate)
│ ├── LLM inference (vLLM on g5 GPU)
│ ├── NAT Gateway
│ ├── EventBridge polling scheduler
│ └── SQS queues (connector jobs, transcription jobs)
│
└── Customer Account: [Customer Name] ← one per customer
  ├── Aurora Serverless v2 (PostgreSQL + pgvector)
  ├── S3 bucket (raw archives + recordings)
  ├── AWS KMS key (customer-controlled)
  ├── CloudTrail (tamper-proof, customer-owned)
  └── VPC (peered to Shared Compute, no internet gateway)
Database: Aurora Serverless v2
PanOps uses Amazon Aurora Serverless v2 in PostgreSQL-compatible mode. The cluster scales its Aurora Capacity Units (ACUs) automatically with query load, dropping to the configured minimum (0.5 ACU) during idle periods and scaling up within seconds under demand. The pgvector extension is enabled to support semantic similarity search for retrieval-augmented generation (RAG).
| Parameter | Current Value | Notes |
|---|---|---|
| Engine | Aurora PostgreSQL 15 Serverless v2 | Aurora PostgreSQL 15 Serverless v2 (per-account) |
| ACU Range | 0.5–16 ACU (shared across tenants) | 0.5–8 ACU per customer account |
| Tenant Isolation | Row-Level Security (RLS) policies per customer_id | Dedicated instance per customer account |
| Extensions | pgvector (semantic search), pg_trgm (text search) | Same |
| Encryption | AES-256 (AWS managed key) | AES-256 via customer-managed KMS key |
| Backups | Aurora automated backups, 7-day retention | Same, stored in customer account |
| Multi-AZ | Aurora default (reader in second AZ) | Same |
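As a rough illustration, the per-account settings in the table map onto arguments that boto3's `rds.create_db_cluster` accepts. This is a minimal sketch, not the PanOps provisioning code (which the document says is Terraform): the function name, `cluster_id`, and `kms_key_arn` are hypothetical, and the dict is deliberately partial (credentials, networking, and the engine version are omitted).

```python
from typing import Any


def aurora_cluster_params(cluster_id: str, kms_key_arn: str,
                          min_acu: float = 0.5, max_acu: float = 8.0) -> dict[str, Any]:
    """Build a partial kwargs dict for boto3's rds.create_db_cluster.

    Defaults mirror the per-customer column of the table: 0.5-8 ACU,
    customer-managed KMS key, 7-day backup retention.
    """
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {
        "DBClusterIdentifier": cluster_id,      # hypothetical identifier
        "Engine": "aurora-postgresql",
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
        "StorageEncrypted": True,
        "KmsKeyId": kms_key_arn,                # customer-managed key (assumed ARN)
        "BackupRetentionPeriod": 7,             # days
    }
```

Instances added to such a cluster would use the `db.serverless` instance class so that they follow the cluster's ACU range.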
Row-Level Security
Every table that holds customer data includes a customer_id column. PostgreSQL RLS policies ensure that every query, regardless of the application-layer user, can read and write only rows whose customer_id matches the authenticated tenant. Application service roles are granted per customer, and the tenant ID is set via SET app.current_customer_id when the connection is set up for each request. Combined with the dedicated-account architecture, this provides both physical and logical isolation.
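On the application side, the tenant binding could be prepared along the following lines. This is a sketch under assumptions rather than the PanOps implementation: the function name is hypothetical, the `%s` placeholder assumes a psycopg-style driver, and it uses PostgreSQL's `set_config(..., true)` (transaction-local, like `SET LOCAL`) instead of the session-level `SET` described here, so that a pooled connection does not carry a tenant ID into a later request.

```python
import uuid


def tenant_binding_statement(customer_id: str) -> tuple[str, tuple[str]]:
    """Return a parameterized statement that scopes the current transaction to one tenant."""
    # Parsing rejects anything that is not a UUID before it reaches the database
    # and normalizes the value to the canonical lowercase, hyphenated form.
    tenant = str(uuid.UUID(customer_id))
    # set_config(name, value, is_local): with is_local = true the setting is
    # discarded when the transaction ends.
    sql = "SELECT set_config('app.current_customer_id', %s, true)"
    return sql, (tenant,)
```

A caller would run the returned statement as the first statement of each transaction, before any query that touches an RLS-protected table.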
-- Example RLS policy (communications table)
ALTER TABLE communications ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON communications
USING (customer_id = current_setting('app.current_customer_id')::uuid);
-- Connection setup per request
SET app.current_customer_id = '<uuid>'; -- set from authenticated session
Object Storage: Amazon S3
Each customer's raw communication archives and video recordings are stored in a dedicated Amazon S3 bucket within the customer's own AWS account.
| Parameter | Value |
|---|---|
| Isolation | Dedicated bucket in customer AWS account |
| Encryption | SSE-KMS with customer-managed key |
| Access | IAM role in customer account |
| Versioning | Enabled |
| Lifecycle | Customer-configurable per retention policy |
| Public Access | Blocked (all public access disabled) |
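The encryption, versioning, and public-access rows correspond to three S3 API calls. The sketch below only builds the keyword arguments for the boto3 S3 client methods of the same names; the function name, bucket name, and key ARN are placeholders, and the lifecycle rules are left out because they are customer-specific.

```python
from typing import Any


def bucket_hardening(bucket: str, kms_key_arn: str) -> dict[str, dict[str, Any]]:
    """Map boto3 S3 client method names to kwargs matching the table above."""
    return {
        "put_public_access_block": {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        "put_bucket_encryption": {
            "Bucket": bucket,
            "ServerSideEncryptionConfiguration": {
                "Rules": [{
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",       # SSE-KMS
                        "KMSMasterKeyID": kms_key_arn,   # customer-managed key (assumed ARN)
                    },
                }],
            },
        },
        "put_bucket_versioning": {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
    }
```

With a real client, each entry could be applied as `getattr(s3, method)(**kwargs)`.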
Compute Layer
Connector Workers
Platform connectors (email polling, Slack, Teams, Zoom, SMS/voice webhooks) run as Lambda functions for short-lived tasks and as Fargate containers for longer-running polling workers. EventBridge Scheduler triggers polling every 6 hours, with per-connector jitter so that connectors do not all fire at once (the thundering-herd problem).
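One way to derive per-connector jitter is to hash the connector ID into a fixed offset, so each connector keeps a stable slot while different connectors spread out. The sketch below assumes a 30-minute jitter window, which the document does not specify; the function names and the hashing approach are illustrative, not the actual scheduler logic.

```python
import hashlib
from datetime import datetime, timedelta, timezone

POLL_INTERVAL = timedelta(hours=6)   # from the document
MAX_JITTER = timedelta(minutes=30)   # assumed window; not stated in the document
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def jitter_offset(connector_id: str, max_jitter: timedelta = MAX_JITTER) -> timedelta:
    """Deterministic offset in [0, max_jitter] derived from the connector ID."""
    digest = hashlib.sha256(connector_id.encode("utf-8")).digest()
    seconds = int.from_bytes(digest[:8], "big") % (int(max_jitter.total_seconds()) + 1)
    return timedelta(seconds=seconds)


def next_poll(connector_id: str, now: datetime) -> datetime:
    """First poll time strictly after `now`: a 6-hour boundary plus the connector's offset."""
    offset = jitter_offset(connector_id)
    # Number of whole intervals elapsed on this connector's shifted timeline, plus one.
    slots = (now - EPOCH - offset) // POLL_INTERVAL + 1
    return EPOCH + slots * POLL_INTERVAL + offset
```

Because the offset is a pure function of the connector ID, it can be baked into a per-connector schedule expression instead of being recomputed at run time.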
| Service | Use Case | Configuration |
|---|---|---|
| AWS Lambda | Webhook handlers (SMS/voice push), short delta polls | 512MB–1GB memory, 15-min timeout, VPC-attached |
| AWS Fargate | Long-running connector workers, video download jobs | 0.5–2 vCPU, 1–4GB RAM, Spot pricing where possible |
| EventBridge Scheduler | Polling schedule (6-hr + jitter per connector) | Cron expression per connector, target: Lambda/Fargate |
| SQS | Job queuing, DLQ for failed connector tasks | Standard queue, visibility timeout 30 min, DLQ after 3 retries |
| DynamoDB | Delta state store (cursor positions per connector) | On-demand capacity, TTL on stale entries |
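The SQS row can be expressed as queue attributes. In this sketch the DLQ ARN is a placeholder and the function name is hypothetical; it also assumes that "DLQ after 3 retries" corresponds to a `maxReceiveCount` of 3, which is one plausible reading rather than a confirmed setting.

```python
import json

SQS_MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # SQS upper limit (12 hours)


def connector_queue_attributes(dlq_arn: str,
                               visibility_timeout_s: int = 30 * 60,
                               max_receive_count: int = 3) -> dict[str, str]:
    """Build the Attributes dict for an SQS standard queue with a dead-letter queue."""
    if not 0 <= visibility_timeout_s <= SQS_MAX_VISIBILITY_TIMEOUT_S:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    return {
        # SQS attribute values are always strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        # After max_receive_count failed receives, SQS moves the message to the DLQ.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }
```

The 30-minute visibility timeout should exceed the longest expected job run; otherwise a message becomes visible again and is processed twice.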
GPU Compute (Transcription)
Whisper transcription runs on EC2 spot GPU instances. The auto-scaling group monitors SQS queue depth: instances launch when recordings are queued and terminate when the queue empties. This keeps GPU costs near zero during periods with no new recordings.
| Parameter | Value |
|---|---|
| Instance Type | g4dn.xlarge or g5.xlarge (spot) |
| Scaling Trigger | SQS ApproximateNumberOfMessages > 0 |
| Scale-to-Zero | Terminates when queue empty for 5 minutes |
| Whisper Model | large-v3 (default); medium configurable per customer |
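The scale-out and scale-to-zero rules above can be sketched as a small decision function. The names, `jobs_per_instance`, and `max_instances` are assumptions for illustration. The sketch also treats in-flight messages (SQS `ApproximateNumberOfMessagesNotVisible`) as work in progress, which goes slightly beyond the trigger listed in the table, so that instances are not terminated in the middle of a transcription.

```python
import math
from dataclasses import dataclass
from typing import Optional

IDLE_GRACE_SECONDS = 5 * 60  # "queue empty for 5 minutes" from the table


@dataclass
class ScalerState:
    empty_since: Optional[float] = None  # time the queue was first seen fully empty


def desired_gpu_instances(visible: int, in_flight: int, current: int, now: float,
                          state: ScalerState, jobs_per_instance: int = 4,
                          max_instances: int = 3) -> int:
    """Return the instance count the auto-scaling group should target."""
    if visible > 0:
        state.empty_since = None
        wanted = min(max_instances, math.ceil(visible / jobs_per_instance))
        return max(current, wanted)   # scale out only; never shrink while work is queued
    if in_flight > 0:
        state.empty_since = None
        return current                # workers are still transcribing
    if state.empty_since is None:
        state.empty_since = now       # start the idle timer
    if now - state.empty_since >= IDLE_GRACE_SECONDS:
        return 0                      # idle for the full grace period
    return current
```

Called once per polling tick with the current queue metrics, the function keeps the group at zero until a recording is queued and returns it to zero five minutes after the queue drains.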
Network Architecture
Each customer account's VPC has no internet gateway. The only network path is a VPC peering connection to the PanOps Shared Compute account, so customer data in Aurora and S3 is never directly reachable from the internet; it is reachable only from PanOps compute services operating over the peering link. A NAT Gateway in the Shared Compute account gives those compute services controlled outbound internet access for API calls to M365, Gmail, Slack, etc.
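The "no internet gateway" property can be checked mechanically. The following sketch scans route tables in the shape returned by the EC2 `describe_route_tables` API and reports any route whose target is an internet gateway; the function name and the sample data are hypothetical, and this is an illustrative audit rather than a control the document says PanOps runs.

```python
from typing import Any, Iterable


def internet_gateway_routes(route_tables: Iterable[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (route_table_id, destination) for every route that targets an internet gateway."""
    findings = []
    for table in route_tables:
        for route in table.get("Routes", []):
            # Internet gateway IDs start with "igw-"; peering routes carry a
            # VpcPeeringConnectionId instead, and the local route uses GatewayId "local".
            if route.get("GatewayId", "").startswith("igw-"):
                destination = (route.get("DestinationCidrBlock")
                               or route.get("DestinationIpv6CidrBlock", "unknown"))
                findings.append((table.get("RouteTableId", "unknown"), destination))
    return findings


# Hypothetical customer VPC: a local route plus a peering route, no IGW.
sample = [{
    "RouteTableId": "rtb-example",
    "Routes": [
        {"DestinationCidrBlock": "10.1.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "10.0.0.0/16", "VpcPeeringConnectionId": "pcx-example"},
    ],
}]
print(internet_gateway_routes(sample))
```

An empty result for every route table in a customer VPC is consistent with the isolation described above.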
Network Flow:
[PanOps Shared Compute VPC]          [Customer VPC, no IGW]
connector workers (Fargate)   ←→     Aurora Serverless v2
LLM inference (vLLM/g5)       ←→     S3 (via VPC endpoint)
NAT Gateway (outbound APIs)          KMS (via VPC endpoint)
        ↓
M365 / Gmail / Slack APIs
(public internet, outbound only)
Provisioning & Automation
| Stage | Method | Time | Notes |
|---|---|---|---|
| Current | Terraform + AWS AFT (Account Factory for Terraform) | <30 min | Fully automated account creation, VPC, Aurora, S3, KMS, CloudTrail |