Aria.
One autonomous consultant. Eight composable prompt layers. One model call per turn.
Model
claude-opus-4-7
Schema version
v1.0.0
Layers
8
What Aria is
Aria is not a pipeline of specialized worker agents. Aria is one claude-opus-4-7 call per conversational turn, conditioned on an eight-layer system prompt that is composed server-side from static text and per-organization context. The "agent architecture" is the prompt.
She is reachable at /chat (authenticated) and at /demo (unauthenticated, demo prompt). The two surfaces are separate code paths; live and demo contexts never mix.
The eight layers
The static layers (01–05) are concatenated into a single text block cached globally — shared across every organization, every user, every turn. The dynamic layers (06–08) are composed per request from Postgres and cached per organization. Both blocks are sent to the model with cache_control: "ephemeral".
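As a sketch of the two-block layout described above (the `buildBlocks` helper and layer strings are assumptions for illustration, not the codebase's actual API), the static and dynamic layers each collapse into one cached text block:

```typescript
// Minimal sketch, assuming a helper that concatenates layer texts into the
// two cached blocks described above. TextBlockParam mirrors the Anthropic
// SDK shape; buildBlocks and the layer arrays are hypothetical.
type TextBlockParam = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function buildBlocks(
  staticLayers: string[],  // layers 01–05: identical for every org, every turn
  dynamicLayers: string[], // layers 06–08: composed per org from Postgres
): TextBlockParam[] {
  // Static block: one concatenated text shared globally, so it stays warm
  // in the prompt cache across all traffic.
  const staticBlock: TextBlockParam = {
    type: "text",
    text: staticLayers.join("\n\n"),
    cache_control: { type: "ephemeral" },
  };
  // Dynamic block: changes only when the organization or engagement updates,
  // so repeat turns within a session hit cache here too.
  const dynamicBlock: TextBlockParam = {
    type: "text",
    text: dynamicLayers.join("\n\n"),
    cache_control: { type: "ephemeral" },
  };
  return [staticBlock, dynamicBlock];
}
```

Keeping the never-varying text in its own block is what lets a cache hit on the static layers survive even when an org's dynamic context changes.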
Composition flow
composeSystem({ org, user, engagement }) runs on every request. The static text block never varies, so it stays warm in Anthropic's prompt cache across all traffic. The dynamic block changes only when the organization or engagement updates — which means repeat turns within a session hit cache on both blocks.
import { composeSystem } from "@/lib/aria/prompt";

const system = composeSystem({
  org,        // Organization | null
  user,       // UserContext (name, title, department, decisionScope)
  engagement, // EngagementStateData | null (weekNumber, hypotheses, …)
});

// system is Anthropic.TextBlockParam[], each with cache_control: "ephemeral".
// Layer order: identity, methodology, benchmarks, guardrails, outputFormat (static block),
// companyContext, userContext, engagementState (dynamic block).

Prompt cache hit rate is the performance metric
Every assistant message persists cacheReadTokens and cacheCreateTokens alongside the model's input and output token counts. A warm session reads the vast majority of the system prompt from cache, which drops time-to-first-byte and cost proportionally. Bumping PROMPT_SCHEMA_VERSION cold-starts the cache — monitor warm-up before and after a bump.
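The persisted counters make the metric a one-line computation. A minimal sketch (the `TokenUsage` shape and `cacheHitRate` helper are hypothetical; only the counter names come from the text above):

```typescript
// Hypothetical helper: fraction of cacheable input tokens served from a warm
// prompt cache. cacheReadTokens and cacheCreateTokens are the counters
// persisted alongside each assistant message.
interface TokenUsage {
  inputTokens: number;       // regular (uncached) input tokens
  cacheReadTokens: number;   // tokens read from a warm cache
  cacheCreateTokens: number; // tokens written on a cache miss (cold start)
}

function cacheHitRate(u: TokenUsage): number {
  const cacheable = u.cacheReadTokens + u.cacheCreateTokens;
  return cacheable === 0 ? 0 : u.cacheReadTokens / cacheable;
}
```

A warm turn reads the whole system prompt from cache (rate near 1.0); the first turn after a PROMPT_SCHEMA_VERSION bump writes it instead (rate 0), which is the cold start worth monitoring.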
Dive deeper
01
The 8 prompt layers
Line-by-line reference for each layer — source, version, role, and what Aria actually sees on a warmed session.
02
Model & caching
Model configuration, effort levels, extended-thinking plan, and how prompt caching interacts with schema versions.
03
Engagement memory
How EngagementState survives between sessions — hypothesis lifecycle, findings promotion, week-counter semantics.
What Aria is not
Honest about scope, on purpose:
- Not a tool-calling agent. Aria does not invoke functions, browse the web, or execute code.
- Not a retrieval-augmented agent. There is no document store or vector index in the path today.
- Not a polling agent. Aria does not read from Slack, Jira, Salesforce, or other tools; she asks.
- Not an orchestrator. There is one agent — Aria — and one call per turn.
- Not a programmatic platform. There are no API keys, no public OAuth clients, and no webhooks today.
These are deliberate scope decisions that keep the product focused on one thing: being a consultant in chat. Each of them is a roadmap item, not a limitation of the model. See the newsroom for near-term additions.
Multi-tenancy
Every live query — Organization, UserProfile, Conversation, ChatMessage, EngagementState — is scoped by orgId at the query layer. The demo path never touches the live database. This is the system's primary safety property, enforced by route separation and by the tuple-based identity lookup on every persisted row.
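The invariant above can be sketched generically (the `Scoped` interface and `scopedFilter` helper are illustrative, not the real query layer; only the rule that every read filters by orgId reflects the design):

```typescript
// Hypothetical sketch of query-layer org scoping: a row is visible only
// when its orgId matches the caller's. Real queries would push this filter
// into SQL, but the invariant is the same.
interface Scoped {
  orgId: string;
}

function scopedFilter<T extends Scoped>(rows: T[], orgId: string): T[] {
  return rows.filter((row) => row.orgId === orgId);
}
```

Because the filter is applied at the query layer rather than in each route handler, a new feature cannot accidentally return another tenant's rows.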