Specifications — Technical Details

AI Models

Complete specifications: query data flow, self-hosted inference architecture, system prompt behavior, model selection and evaluation gate, semantic retrieval, and session record schema.

Query Processing Data Flow (summarized)

Every CEO query follows the same pipeline: retrieval from the customer's encrypted database, context assembly, inference, streaming response delivery, and session record persistence.

Query Processing Pipeline

[CEO enters query in Signal panel]
         ↓
[API receives: { query, customer_id, session_id }]
         ↓
[pgvector retrieval — Aurora (RLS-scoped to customer)]
  1. Embed query → vector via embedding model
  2. SELECT * FROM communications
     WHERE customer_id = $1          -- $1 = current customer (also enforced by RLS)
     ORDER BY embedding <=> $2       -- $2 = query vector; <=> is cosine distance
     LIMIT 40
  → Returns top-k semantically relevant records (emails, messages, transcripts)
         ↓
[Context assembly]
  - Rank retrieved records, then truncate to fit the context window
  - Include session history from current session (last N turns)
  - Apply channel scoping if CEO specified filter
         ↓
[Inference]
  Self-hosted vLLM serving Llama 3.1 70B (or equivalent) on AWS g5 GPU instances
         ↓
[Streaming response → CEO browser]
  → SSE (Server-Sent Events) or WebSocket, token-by-token
  → Markdown rendered in Signal panel
         ↓
[Session record written to Aurora — per turn]
  → { customer_id, session_id, turn_index, query, response, retrieved_ids, model_used, created_at }
Per-turn persistence: Session records are written to Aurora after each exchange turn, not at session end. If the session is interrupted, prior turns are preserved and retrievable via the Archive panel.
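The context-assembly step in the pipeline can be sketched in a few lines. This is a minimal illustration, not PanOps code: the `Record` type, the four-characters-per-token estimate, and the default `token_budget` and `last_n_turns` values are hypothetical, because the specification leaves the context-window size and N unstated.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical shape of one retrieved communication record.
    record_id: str
    channel: str      # e.g. "email", "slack" (illustrative values)
    text: str
    distance: float   # cosine distance to the query; lower = more similar


def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def assemble_context(records, history, token_budget=6000, last_n_turns=5,
                     channel_filter=None):
    """Scope, rank, and truncate records; keep the last N session turns.

    history is a list of (query, response) pairs, oldest first.
    """
    # Apply the CEO's channel scoping, if any (None means all channels).
    if channel_filter is not None:
        records = [r for r in records if r.channel in channel_filter]
    # Rank by similarity: smallest cosine distance first.
    ranked = sorted(records, key=lambda r: r.distance)
    # Keep only the most recent N turns; their tokens count against the budget.
    turns = history[-last_n_turns:] if last_n_turns > 0 else []
    used = sum(estimate_tokens(q) + estimate_tokens(a) for q, a in turns)
    kept = []
    for r in ranked:
        cost = estimate_tokens(r.text)
        if used + cost > token_budget:
            break  # stop at the first record that would overflow the budget
        kept.append(r)
        used += cost
    return kept, turns
```

Stopping at the first record that would overflow, rather than skipping it and trying smaller ones, keeps the result a strict prefix of the similarity ranking; either choice is defensible.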

Self-Hosted Inference

PanOps runs open-weight AI models on dedicated GPU compute inside the customer's cloud environment. No customer data is sent to any external API. Each model must pass the PanOps evaluation gate before deployment.

System Prompt Configuration (condensed)

The system prompt configures the model's persona, behavior constraints, and retrieval context. It is applied to every session, with its CONTEXT block repopulated for each query, and it is not visible to the CEO in the UI.

System prompt structure (condensed):

ROLE: You are a CEO intelligence analyst for [Customer Name].
      Your job is to answer questions about what is actually happening
      in the organization based on communication data provided.

BEHAVIOR:
- Be direct and precise. No hedging. No filler.
- Never fabricate. If information is not in the provided context, say so.
- Do not add citations unless the CEO explicitly asks.
- Do not ask clarifying questions unless the query is genuinely ambiguous.
- Respond in [CEO preferred language].

CHANNEL SCOPE:
- Default: query across all connected channels.
- Override: CEO may specify "only look at Slack" or "only emails from [person]" — honor this.

CONTEXT:
[Retrieved communication records — injected per query]
[Session history from current session — last N turns]
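One way to fill the condensed template for each query is sketched below. The template text follows the structure above; the function name, the `[record_id] text` line format, and the `CEO:`/`Analyst:` history labels are illustrative assumptions, not the production prompt.

```python
PROMPT_TEMPLATE = """\
ROLE: You are a CEO intelligence analyst for {customer_name}.
      Your job is to answer questions about what is actually happening
      in the organization based on communication data provided.

BEHAVIOR:
- Be direct and precise. No hedging. No filler.
- Never fabricate. If information is not in the provided context, say so.
- Do not add citations unless the CEO explicitly asks.
- Do not ask clarifying questions unless the query is genuinely ambiguous.
- Respond in {language}.

CHANNEL SCOPE:
- Default: query across all connected channels.
- Override: CEO may specify "only look at Slack" or "only emails from [person]"; honor this.

CONTEXT:
Retrieved records:
{records}
Session history:
{history}
"""


def build_system_prompt(customer_name, language, records, turns):
    """Fill the template; the CONTEXT block is rebuilt for every query.

    records: list of (record_id, text) pairs from retrieval.
    turns:   list of (query, response) pairs, already cut to the last N.
    """
    record_lines = "\n".join(f"[{rid}] {text}" for rid, text in records) or "(none)"
    history_lines = "\n".join(f"CEO: {q}\nAnalyst: {a}" for q, a in turns) or "(none)"
    # str.format substitutes values without re-parsing them, so braces
    # inside record text cannot break the template.
    return PROMPT_TEMPLATE.format(customer_name=customer_name, language=language,
                                  records=record_lines, history=history_lines)
```

Rebuilding the prompt per query keeps the static instructions identical across turns while the CONTEXT block tracks whatever retrieval returned for the current question.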

Self-Hosted Inference — Configuration

No customer data leaves the PanOps infrastructure boundary at any point in the query pipeline. New model versions are deployed by blue/green cutover, and only after passing the PanOps evaluation gate.

| Parameter | Value |
| --- | --- |
| Deployed Models | Open-weight foundation models, post-trained and fine-tuned by PanOps |
| Inference Server | vLLM (OpenAI-compatible API) |
| GPU | AWS g5.12xlarge (4× NVIDIA A10G) or g5.48xlarge (8× NVIDIA A10G); 24 GB memory per GPU |
| Fine-Tuning | LoRA/PEFT on PanOps eval set; not trained on customer data |
| Deployment Model | Blue/green cutover; a new model version activates only after passing the eval gate |
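Whether a model fits a given g5 instance is mostly weight arithmetic: parameters times bytes per parameter, compared with usable GPU memory. The sketch below uses the published instance shapes (4 or 8 A10G GPUs, 24 GB each) and assumes the weights shard evenly across GPUs under tensor parallelism. `gpu_memory_utilization` mirrors the vLLM setting of that name (0.9 by default); the helper functions are hypothetical, and the 4-bit case ignores quantization scale overhead.

```python
# Published g5 shapes: NVIDIA A10G GPUs with 24 GB of memory each.
A10G_MEMORY_GB = 24
GPUS_PER_INSTANCE = {"g5.12xlarge": 4, "g5.48xlarge": 8}


def weight_memory_gb(params_billion, bits_per_param):
    """Memory for model weights alone: parameters x bytes per parameter."""
    return params_billion * bits_per_param / 8


def weights_fit(instance, params_billion, bits_per_param,
                gpu_memory_utilization=0.9):
    """True if the weights fit in the share of GPU memory vLLM may use.

    The KV cache and activations must also fit in this share, so True is
    necessary but not sufficient for serving the model.
    """
    usable_gb = GPUS_PER_INSTANCE[instance] * A10G_MEMORY_GB * gpu_memory_utilization
    return weight_memory_gb(params_billion, bits_per_param) < usable_gb


if __name__ == "__main__":
    for instance in GPUS_PER_INSTANCE:
        # 70B parameters at FP16 (16 bits) and at 4-bit quantization.
        print(instance, weights_fit(instance, 70, 16), weights_fit(instance, 70, 4))
        # g5.12xlarge False True
        # g5.48xlarge True True
```

For a 70B model such as Llama 3.1 70B, FP16 weights come to about 140 GB. That exceeds the roughly 86 GB vLLM may use on a g5.12xlarge, so that instance would need quantized weights, while a g5.48xlarge (about 173 GB usable) fits FP16 weights with roughly 33 GB left for the KV cache.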

Evaluation Gate

Every model must pass the PanOps evaluation benchmark before deployment. PanOps has developed proprietary benchmarks that score candidate models on the dimensions below, calibrated against real anonymized enterprise data, and uses them to select the best-performing model for CEO queries. On these benchmarks, PanOps models consistently and significantly outperform general-purpose frontier models for this application.

| Dimension | Description | Requirement |
| --- | --- | --- |
| Accuracy | Factual correctness of the answer vs. known ground truth | ≥ established benchmark score |
| Completeness | All relevant information included in the response | ≥ established benchmark score |
| Hallucination Rate | Frequency of fabricated information not present in context | ≤ established benchmark rate |
| Refusal Quality | Appropriate handling of ambiguous or unanswerable queries | ≥ established benchmark score |
Full data-sovereignty guarantee is active: the self-hosted models have passed the eval gate and are running in production. No customer data reaches any external service at any point in the query pipeline.
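The gate and the blue/green cutover it controls can be sketched together. The threshold numbers are placeholders, since the specification does not publish the established benchmark values, and `passes_gate` and `BlueGreenRouter` are hypothetical helpers; in practice the traffic switch would more likely live in a load balancer or service-discovery layer than in application code. Note the flipped comparison for hallucination rate, where lower is better.

```python
# Placeholder thresholds: the real benchmark values are not published.
THRESHOLDS = {"accuracy": 0.90, "completeness": 0.85,
              "hallucination_rate": 0.02, "refusal_quality": 0.80}
# True = higher is better (>=); False = lower is better (<=).
HIGHER_IS_BETTER = {"accuracy": True, "completeness": True,
                    "hallucination_rate": False, "refusal_quality": True}


def passes_gate(scores, thresholds=THRESHOLDS):
    """Return (passed, failing_dimensions); a missing score counts as a failure."""
    failures = []
    for dim, higher in HIGHER_IS_BETTER.items():
        score, limit = scores.get(dim), thresholds[dim]
        ok = score is not None and (score >= limit if higher else score <= limit)
        if not ok:
            failures.append(dim)
    return not failures, failures


class BlueGreenRouter:
    """Two model slots; inference traffic goes to the live one."""

    def __init__(self, initial_model):
        self.slots = {"blue": initial_model, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def stage(self, model_version):
        # Load the candidate into the idle slot; live traffic is unaffected.
        self.slots[self.idle] = model_version

    def cutover(self, eval_scores):
        # Flip traffic only if a candidate is staged and it passes the gate.
        if self.slots[self.idle] is None:
            return False
        passed, _ = passes_gate(eval_scores)
        if passed:
            self.live = self.idle
        return passed

    def serving(self):
        return self.slots[self.live]
```

After a cutover the previous version stays loaded in the now-idle slot, so reverting only requires pointing traffic back at it rather than redeploying.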

Retrieval: pgvector Semantic Search

PanOps uses pgvector on Aurora PostgreSQL for semantic retrieval. Communication records (emails, messages, transcript segments) are embedded when ingested and stored as 1536-dimensional vectors. At query time, the CEO's query is embedded and the top-k most semantically similar records are retrieved using cosine distance.
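pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity. An exact (non-indexed) version of the retrieval query therefore filters to the tenant, computes the distance to every candidate, and keeps the k smallest. The sketch below mirrors that in plain Python; the row dictionaries and tiny vectors (standing in for 1536-dimensional embeddings) are hypothetical.

```python
import heapq
import math


def cosine_distance(a, b):
    """Same quantity pgvector's <=> returns: 1 - (a . b) / (|a| |b|).

    Assumes non-zero vectors (a zero vector would divide by zero here).
    """
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def top_k(rows, customer_id, query_vec, k=40):
    """Exact (brute-force) equivalent of
    SELECT * FROM communications WHERE customer_id = $1
    ORDER BY embedding <=> $2 LIMIT k
    """
    tenant_rows = [r for r in rows if r["customer_id"] == customer_id]
    return heapq.nsmallest(k, tenant_rows,
                           key=lambda r: cosine_distance(r["embedding"], query_vec))
```

Cosine distance ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite), so "top-k most similar" means the k smallest distances.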

| Parameter | Value |
| --- | --- |
| Embedding Model | Self-hosted embedding model (compatible with the 1536-dimensional pgvector index) |
| Vector Dimensions | 1536 |
| Distance Metric | Cosine (<=> operator) |
| Index Type | IVFFlat (approximate nearest neighbor, for scale) |
| Top-K Retrieved | 40 records by default (configurable) |
| RLS Enforcement | customer_id filter applied in all retrieval queries; cross-tenant retrieval structurally impossible |
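IVFFlat trades exactness for speed: when the index is built, each vector is assigned to the nearest of `lists` centroids (pgvector computes them with k-means), and at query time only the `ivfflat.probes` nearest lists are scanned (1 by default). For `<=>` queries to use the index, it must be built with the cosine operator class, for example `CREATE INDEX ON communications USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);`, where 100 is only an illustrative value. The sketch below shows the probe mechanism in plain Python with hand-picked centroids; the function names are hypothetical.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def build_ivf(vectors, centroids):
    """Assign each (id, vector) pair to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in vectors:
        nearest = min(range(len(centroids)),
                      key=lambda i: cosine_distance(vec, centroids[i]))
        lists[nearest].append((vid, vec))
    return lists


def ivf_search(lists, centroids, query, k, probes=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: cosine_distance(query, centroids[i]))
    scanned = [item for i in order[:probes] for item in lists[i]]
    best = heapq.nsmallest(k, scanned,
                           key=lambda item: cosine_distance(query, item[1]))
    return [vid for vid, _ in best]
```

With `probes=1` a query can return fewer than k results, because only one list is scanned; with `probes` equal to the number of lists the result matches exact search. pgvector's documentation notes the same under-fill effect for filtered queries, since WHERE conditions are applied after the approximate index scan, and it suggests `sqrt(lists)` as a starting point for `ivfflat.probes`.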

Session Record Schema

Every query/response exchange is persisted to Aurora as one row in session_turns. This record is the basis for the Archive panel and enables continuity queries ("what did I ask last Tuesday?").

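CREATE TABLE session_turns (
  id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id      UUID NOT NULL,          -- tenant key; RLS policy below
  session_id       UUID NOT NULL,
  turn_index       INTEGER NOT NULL,       -- order of the turn within the session
  query            TEXT NOT NULL,
  response         TEXT NOT NULL,
  retrieved_ids    UUID[],                 -- IDs of records used in context
  model_used       TEXT NOT NULL,          -- PanOps model configuration that served the turn
  channel_scope    TEXT,                   -- NULL = all channels; otherwise the scoping applied
  tokens_in        INTEGER,
  tokens_out       INTEGER,
  latency_ms       INTEGER,
  created_at       TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE session_turns ENABLE ROW LEVEL SECURITY;
-- Without FORCE, row-level security does not apply to the table owner.
ALTER TABLE session_turns FORCE ROW LEVEL SECURITY;
-- With RLS enabled and no policy, non-owners see no rows at all.
-- Hypothetical policy: 'app.current_customer_id' is an assumed per-connection setting.
CREATE POLICY session_turns_tenant_isolation ON session_turns
  USING (customer_id = current_setting('app.current_customer_id')::uuid);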
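Per-turn persistence and the Archive lookup can be exercised locally with SQLite standing in for Aurora. This is a sketch under stated simplifications: SQLite has no UUID type, arrays, or row-level security, so IDs are stored as text, `retrieved_ids` as a JSON string, and tenant scoping is an explicit `WHERE customer_id = ?` that RLS would otherwise enforce. Only a subset of the columns is kept, and `write_turn` and `session_history` are hypothetical names.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE session_turns (
  id            TEXT PRIMARY KEY,
  customer_id   TEXT NOT NULL,
  session_id    TEXT NOT NULL,
  turn_index    INTEGER NOT NULL,
  query         TEXT NOT NULL,
  response      TEXT NOT NULL,
  retrieved_ids TEXT,                 -- JSON array in place of UUID[]
  model_used    TEXT NOT NULL,
  created_at    TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def write_turn(conn, customer_id, session_id, turn_index, query, response,
               retrieved_ids, model_used):
    # Commit each turn immediately so an interrupted session keeps prior turns.
    with conn:
        conn.execute(
            "INSERT INTO session_turns (id, customer_id, session_id, turn_index,"
            " query, response, retrieved_ids, model_used)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), customer_id, session_id, turn_index, query,
             response, json.dumps(retrieved_ids), model_used),
        )


def session_history(conn, customer_id, session_id):
    # Tenant filter written explicitly here; in Aurora the RLS policy enforces it.
    rows = conn.execute(
        "SELECT turn_index, query, response FROM session_turns"
        " WHERE customer_id = ? AND session_id = ? ORDER BY turn_index",
        (customer_id, session_id),
    )
    return rows.fetchall()
```

Because each turn commits on its own, a dropped connection mid-session loses at most the turn in flight, which is the behavior the per-turn persistence note in the pipeline describes.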
