Databases & Infrastructure — Technical Details

AWS Organization Structure

PanOps operates under an AWS Organization managed via AWS Control Tower. The organizational layout enforces billing separation, security controls, and account-level isolation across all tenants.

PanOps AWS Organization
├── Management Account (PanOps master)
│   ├── AWS Control Tower — landing zone, guardrails, SCPs
│   ├── Consolidated billing
│   └── Org-level policies (deny public S3, require CloudTrail, etc.)
│
├── Shared Compute Account (PanOps)
│   ├── Connector workers (Lambda + Fargate)
│   ├── LLM inference (vLLM on g5 GPU)
│   ├── NAT Gateway
│   ├── EventBridge polling scheduler
│   └── SQS queues (connector jobs, transcription jobs)
│
└── Customer Account: [Customer Name]            ← one per customer
    ├── Aurora Serverless v2 (PostgreSQL + pgvector)
    ├── S3 bucket (raw archives + recordings)
    ├── AWS KMS key (customer-controlled)
    ├── CloudTrail (tamper-proof, customer-owned)
    └── VPC — peered to Shared Compute, no internet gateway
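
The org-level policies listed under the Management Account are the kind of rule usually expressed as a service control policy (SCP). The following is a minimal sketch of such a policy document, not the actual PanOps policy: the function name, statement IDs, and choice of actions are assumptions for illustration. An SCP can only deny actions, so "require CloudTrail" is approximated here by denying the calls that would stop or delete a trail.

```python
import json


def build_guardrail_scp() -> dict:
    """Return an illustrative SCP document (hypothetical, not the PanOps policy)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Stop member accounts from disabling or deleting CloudTrail trails.
                "Sid": "ProtectCloudTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
            {
                # Stop member accounts from changing S3 Block Public Access settings.
                "Sid": "ProtectS3PublicAccessBlock",
                "Effect": "Deny",
                "Action": [
                    "s3:PutAccountPublicAccessBlock",
                    "s3:PutBucketPublicAccessBlock",
                ],
                "Resource": "*",
            },
        ],
    }


scp = build_guardrail_scp()
scp_json = json.dumps(scp, indent=2)  # serialized form, as a policy document would be stored
```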

Database: Aurora Serverless v2

PanOps uses Amazon Aurora Serverless v2 in PostgreSQL-compatible mode. The cluster auto-scales its ACUs (Aurora Capacity Units) with query load: it drops to the configured minimum of 0.5 ACU during idle periods and scales up within seconds under demand. The pgvector extension is enabled to support semantic similarity search for retrieval-augmented generation (RAG).

Parameter	Current Value	Notes (per-customer account)
Engine	Aurora PostgreSQL 15 Serverless v2	Aurora PostgreSQL 15 Serverless v2 (per-account)
ACU Range	0.5–16 ACU (shared across tenants)	0.5–8 ACU per customer account
Tenant Isolation	Row-Level Security (RLS) policies per customer_id	Dedicated instance per customer account
Extensions	pgvector (semantic search), pg_trgm (text search)	Same
Encryption	AES-256 (AWS managed key)	AES-256 via customer-managed KMS key
Backups	Aurora automated backups, 7-day retention	Same, stored in customer account
Multi-AZ	Aurora default (reader in second AZ)	Same
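
As a rough illustration of the ACU ranges in the table, the sketch below builds the scaling block that the RDS API accepts as `ServerlessV2ScalingConfiguration` (with `MinCapacity` and `MaxCapacity`). The helper name and the validation rules are assumptions; the 0.5 ACU floor is the one used in this document, not a statement about the service's own limits.

```python
DOC_MIN_ACU = 0.5  # floor used in this document's configuration


def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build a ServerlessV2ScalingConfiguration dict after basic sanity checks."""
    for value in (min_acu, max_acu):
        # Serverless v2 capacity is specified in half-ACU increments.
        if value * 2 != int(value * 2):
            raise ValueError(f"ACU value {value} is not a multiple of 0.5")
    if not DOC_MIN_ACU <= min_acu <= max_acu:
        raise ValueError("expected 0.5 <= min_acu <= max_acu")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}


# Per-customer-account range from the table above.
per_customer = serverless_v2_scaling(0.5, 8)
```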

Row-Level Security

Every table that holds customer data includes a customer_id column. PostgreSQL RLS policies ensure that every query, regardless of the application-layer user, can read and write only rows whose customer_id matches the authenticated tenant. Application service roles are granted per customer, and the tenant is selected with SET app.current_customer_id when the connection is set up for each request. PostgreSQL exempts superusers, roles with BYPASSRLS, and (unless FORCE ROW LEVEL SECURITY is set) the table owner from RLS, so the application service roles must be none of these. Combined with the dedicated-account architecture, this provides both physical and logical isolation.

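-- Example RLS policy (communications table)
ALTER TABLE communications ENABLE ROW LEVEL SECURITY;

-- No WITH CHECK clause is given, so the USING expression also applies to
-- inserted and updated rows. If the setting is unset, current_setting()
-- raises an error, so queries fail rather than return other tenants' rows.
CREATE POLICY tenant_isolation ON communications
  USING (customer_id = current_setting('app.current_customer_id')::uuid);

-- Connection setup per request
-- (plain SET lasts for the whole session; with pooled connections, prefer
-- SET LOCAL inside the request's transaction)
SET app.current_customer_id = '<uuid>';  -- set from authenticated session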
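
On the application side, the per-request setup above might look like the following sketch. It is an assumption about how a service could wrap the pattern, not PanOps's actual code: `tenant_transaction` is a hypothetical helper, and `conn` is taken to be any DB-API 2.0 connection that uses the `%s` parameter style (psycopg, for example) with autocommit off. It uses PostgreSQL's `set_config(name, value, is_local)` function with `is_local = true`, which is equivalent to SET LOCAL but accepts a bound parameter.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, customer_id: str):
    """Run a block of queries in one transaction scoped to a single tenant."""
    # Reject malformed IDs before anything is sent to the database.
    tenant = str(uuid.UUID(customer_id))
    cur = conn.cursor()
    try:
        # is_local=true limits the setting to the current transaction, so a
        # pooled connection cannot carry one tenant's ID into the next request.
        cur.execute(
            "SELECT set_config('app.current_customer_id', %s, true)", (tenant,)
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

A caller would then write `with tenant_transaction(conn, session_customer_id) as cur:` and issue ordinary queries through `cur`; the RLS policy filters them to that tenant's rows.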

Object Storage: Amazon S3

Each customer's raw communication archives and video recordings are stored in a dedicated Amazon S3 bucket within the customer's own AWS account.

Parameter	Value
Isolation	Dedicated bucket in customer AWS account
Encryption	SSE-KMS with customer-managed key
Access	IAM role in customer account
Versioning	Enabled
Lifecycle	Customer-configurable per retention policy
Public Access	Blocked (all public access disabled)
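
The bucket settings in the table map onto three S3 configuration documents. The sketch below builds them as plain dictionaries in the shape the S3 API expects for Block Public Access, default encryption, and versioning. The function names and the key ARN are hypothetical placeholders.

```python
def public_access_block() -> dict:
    """All four S3 Block Public Access flags enabled."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def default_encryption(kms_key_arn: str) -> dict:
    """Default bucket encryption: SSE-KMS with a customer-managed key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


def versioning() -> dict:
    """Versioning configuration with versioning turned on."""
    return {"Status": "Enabled"}


# Placeholder ARN for illustration only.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"
bucket_settings = {
    "PublicAccessBlockConfiguration": public_access_block(),
    "ServerSideEncryptionConfiguration": default_encryption(EXAMPLE_KEY_ARN),
    "VersioningConfiguration": versioning(),
}
```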

Compute Layer

Connector Workers

Platform connectors (email polling, Slack, Teams, Zoom, SMS/voice webhooks) run as Lambda functions for short-lived tasks and as Fargate containers for longer-running polling workers. EventBridge Scheduler triggers polling on a 6-hour interval, with per-connector jitter so that connectors do not all fire at once (the thundering-herd problem).

Service	Use Case	Configuration
AWS Lambda	Webhook handlers (SMS/voice push), short delta polls	512 MB–1 GB memory, 15-min timeout, VPC-attached
AWS Fargate	Long-running connector workers, video download jobs	0.5–2 vCPU, 1–4 GB RAM, Spot pricing where possible
EventBridge Scheduler	Polling schedule (6-hr + jitter per connector)	Cron expression per connector, target: Lambda/Fargate
SQS	Job queuing, DLQ for failed connector tasks	Standard queue, visibility timeout 30 min, DLQ after 3 retries
DynamoDB	Delta state store (cursor positions per connector)	On-demand capacity, TTL on stale entries
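
One way to get per-connector jitter on a 6-hour schedule is to derive a stable offset from the connector's identifier, so each connector always fires at the same point after the interval boundary. This is a sketch under assumptions: the 15-minute jitter window, the hashing approach, and the function names are illustrative and are not taken from the PanOps scheduler.

```python
import hashlib

POLL_INTERVAL_S = 6 * 60 * 60  # 6-hour polling interval from the text
JITTER_WINDOW_S = 15 * 60      # assumed jitter window; not specified in the text


def jitter_offset(connector_id: str, window_s: int = JITTER_WINDOW_S) -> int:
    """Deterministic per-connector offset in the range [0, window_s)."""
    digest = hashlib.sha256(connector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_s


def next_poll_time(connector_id: str, now_s: int) -> int:
    """Next poll time in epoch seconds, strictly after now_s.

    Each connector polls at every 6-hour boundary plus its own fixed offset.
    """
    candidate = (now_s // POLL_INTERVAL_S) * POLL_INTERVAL_S + jitter_offset(connector_id)
    if candidate <= now_s:
        candidate += POLL_INTERVAL_S
    return candidate
```

Because the offset is a pure function of the connector ID, rescheduling after a restart lands on the same slot, and successive polls for one connector stay exactly six hours apart.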

GPU Compute (Transcription)

Whisper transcription runs on EC2 spot GPU instances. The auto-scaling group monitors SQS queue depth: instances launch when recordings are queued and terminate when the queue empties. This keeps GPU costs near zero during periods with no new recordings.

Parameter	Value
Instance Type	g4dn.xlarge or g5.xlarge (spot)
Scaling Trigger	SQS ApproximateNumberOfMessages > 0
Scale-to-Zero	Terminates when queue empty for 5 minutes
Whisper Model	large-v3 (default); medium configurable per customer
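
The scale-out and scale-to-zero rules in the table can be expressed as a small decision function. The sketch below is illustrative only: the instance cap and the jobs-per-instance ratio are assumed values, and the function is a stand-in for whatever scaling policy the auto-scaling group actually uses.

```python
from typing import Optional

IDLE_GRACE_S = 5 * 60  # "queue empty for 5 minutes" from the table


def desired_gpu_instances(
    queued: int,
    current: int,
    empty_since_s: Optional[float],
    now_s: float,
    max_instances: int = 4,       # assumed cap; not specified in the text
    jobs_per_instance: int = 10,  # assumed backlog handled per instance
) -> int:
    """Return the desired instance count for the transcription fleet."""
    if queued > 0:
        needed = -(-queued // jobs_per_instance)  # ceiling division
        # Never scale in while work is still queued.
        return max(current, min(max_instances, needed))
    # Queue is empty: hold current capacity until the grace period has elapsed.
    if empty_since_s is not None and now_s - empty_since_s >= IDLE_GRACE_S:
        return 0
    return current
```

ApproximateNumberOfMessages counts only messages waiting to be received, so a real policy would probably also check the in-flight count before terminating instances that may still be transcribing.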

Network Architecture

Each customer account's VPC has no internet gateway. The only network path is a VPC peering connection to the PanOps Shared Compute account, so customer data in Aurora and S3 is never directly reachable from the internet; it is reachable only from PanOps compute services operating over the peering link. A NAT Gateway in the Shared Compute account provides controlled outbound internet access for API calls to M365, Gmail, Slack, and other platforms.

Network Flow:

[PanOps Shared Compute VPC]          [Customer VPC — no IGW]
  connector workers (Fargate)  ←→    Aurora Serverless v2
  LLM inference (vLLM/g5)     ←→    S3 (via VPC endpoint)
  NAT Gateway (outbound APIs)        KMS (via VPC endpoint)
       ↓
  M365 / Gmail / Slack APIs
  (public internet, outbound only)
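
The "no internet gateway" property of a customer VPC can be checked mechanically from its route tables. The sketch below audits a list of routes shaped like the `Routes` entries returned by the EC2 DescribeRouteTables API. The function name and the sample IDs are hypothetical, and treating a NAT gateway route as a violation is an assumption based on the statement that peering is the only network path.

```python
def audit_routes(routes: list) -> list:
    """Return a list of problems found in a customer VPC route table."""
    problems = []
    for route in routes:
        dest = route.get("DestinationCidrBlock", "?")
        gateway = route.get("GatewayId", "")
        if gateway.startswith("igw-"):
            problems.append(f"{dest}: routed to internet gateway {gateway}")
        elif route.get("NatGatewayId"):
            problems.append(f"{dest}: routed to NAT gateway {route['NatGatewayId']}")
    return problems


# Sample routes with placeholder IDs: local, peering, and a gateway VPC endpoint.
expected_routes = [
    {"DestinationCidrBlock": "10.20.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "10.0.0.0/16", "VpcPeeringConnectionId": "pcx-0example"},
    {"DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0example"},
]
findings = audit_routes(expected_routes)
```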

Provisioning & Automation

Stage	Method	Time	Notes
Current	Terraform + AWS AFT (Account Factory for Terraform)	<30 min	Fully automated account creation, VPC, Aurora, S3, KMS, CloudTrail
