Query Processing Data Flow (summarized)
Every CEO query follows the same pipeline: retrieval from the customer's encrypted database, context assembly, inference, streaming response delivery, and session record persistence.
Query Processing Pipeline
[CEO enters query in Signal panel]
↓
[API receives: { query, customer_id, session_id }]
↓
[pgvector retrieval — Aurora (RLS-scoped to customer)]
1. Embed query → vector via embedding model
2. SELECT * FROM communications
WHERE customer_id = $current_customer
ORDER BY embedding <=> $query_vector
LIMIT 40
→ Returns top-k semantically relevant records (emails, messages, transcripts)
↓
[Context assembly]
- Truncate/rank retrieved records to fit context window
- Include session history from current session (last N turns)
- Apply channel scoping if CEO specified filter
↓
[Inference]
Inference: Self-hosted vLLM (Llama 3.1 70B or equivalent, on AWS g5 GPU)
↓
[Streaming response → CEO browser]
→ SSE (Server-Sent Events) or WebSocket, token-by-token
→ Markdown rendered in Signal panel
↓
[Session record written to Aurora — per turn]
→ { session_id, turn_id, query, response, retrieved_record_ids, model_used, timestamp }Self-Hosted Inference
PanOps runs open-weight AI models on dedicated GPU compute inside the customer's cloud environment. No customer data is sent to any external API. The models have passed PanOps performance benchmarks before deployment.
System Prompt Configuration (condensed)
The system prompt configures the model's persona, behavior constraints, and retrieval context. It is applied at the start of every session and is not visible to the CEO in the UI.
System prompt structure (condensed):
ROLE: You are a CEO intelligence analyst for [Customer Name].
Your job is to answer questions about what is actually happening
in the organization based on communication data provided.
BEHAVIOR:
- Be direct and precise. No hedging. No filler.
- Never fabricate. If information is not in the provided context, say so.
- Do not add citations unless the CEO explicitly asks.
- Do not ask clarifying questions unless the query is genuinely ambiguous.
- Respond in [CEO preferred language].
CHANNEL SCOPE:
- Default: query across all connected channels.
- Override: CEO may specify "only look at Slack" or "only emails from [person]" — honor this.
CONTEXT:
[Retrieved communication records — injected per query]
[Session history from current session — last N turns]Self-Hosted Inference — Configuration
PanOps runs open-weight models on dedicated GPU compute. No customer data leaves the PanOps infrastructure boundary at any point in the query pipeline. The models are deployed via blue/green cutover only after passing the PanOps evaluation gate.
| Parameter | Value |
|---|---|
| Deployed Models | Open-weight foundation models, post-trained and fine-tuned by PanOps |
| Inference Server | vLLM (OpenAI-compatible API) |
| GPU | AWS g5.12xlarge or g5.48xlarge (NVIDIA A10G) |
| Fine-Tuning | LoRA/PEFT on PanOps eval set; not trained on customer data |
| Deployment Model | Blue/green cutover — new model version activates only after passing eval gate |
Evaluation Gate
Every model must pass a PanOps evaluation benchmark before deployment. PanOps has developed its own proprietary benchmarks and evaluations across a series of relevant dimensions to enable selection of the best-performing model for CEO query effectiveness, calibrated against real anonymized enterprise data. PanOps AI models and system consistently and significantly outperform generalized frontier models for this application.
| Dimension | Description | Requirement |
|---|---|---|
| Accuracy | Factual correctness of answer vs. known ground truth | ≥ established benchmark score |
| Completeness | All relevant information included in response | ≥ established benchmark score |
| Hallucination Rate | Frequency of fabricated information not present in context | ≤ established benchmark rate |
| Refusal Quality | Appropriate handling of ambiguous or unanswerable queries | ≥ established benchmark score |
Retrieval: pgvector Semantic Search
PanOps uses pgvector on Aurora PostgreSQL for semantic retrieval. Communication records (emails, messages, transcript segments) are embedded when ingested and stored as 1536-dimensional vectors. At query time, the CEO's query is embedded and the top-k most semantically similar records are retrieved using cosine distance.
| Parameter | Value |
|---|---|
| Embedding Model | Self-hosted embedding model (compatible with 1536-dimensional pgvector index) |
| Vector Dimensions | 1536 |
| Distance Metric | Cosine (<=> operator) |
| Index Type | IVFFlat (approximate nearest neighbor for scale) |
| Top-K Retrieved | 40 records default (configurable) |
| RLS Enforcement | customer_id filter applied in all retrieval queries; cross-tenant retrieval structurally impossible |
Session Record Schema
Every query/response exchange is persisted to Aurora as a JSON session record. This record is the basis for the Archive panel and enables continuity queries ("what did I ask last Tuesday?").
CREATE TABLE session_turns ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), customer_id UUID NOT NULL, -- RLS enforced session_id UUID NOT NULL, turn_index INTEGER NOT NULL, query TEXT NOT NULL, response TEXT NOT NULL, retrieved_ids UUID[], -- IDs of records used in context model_used TEXT NOT NULL, -- 'PanOps model configuration' channel_scope TEXT, -- null = all; or specific scoping applied tokens_in INTEGER, tokens_out INTEGER, latency_ms INTEGER, created_at TIMESTAMPTZ DEFAULT NOW() ); ALTER TABLE session_turns ENABLE ROW LEVEL SECURITY;
← Back to overview