Aria.
One autonomous consultant. Eight composable prompt layers. One model call per turn.
Model
claude-opus-4-7
Schema version
v1.0.0
Layers
8
What Aria is
Aria is not a pipeline of specialized worker agents. Aria is one claude-opus-4-7 call per conversational turn, conditioned on an eight-layer system prompt that is composed server-side from static text and per-organization context. The "agent architecture" is the prompt.
She is reachable at /chat (authenticated) and at /demo (unauthenticated, demo prompt). The two surfaces are separate code paths; live and demo contexts never mix.
The eight layers
The static layers (01–05) are concatenated into a single text block cached globally — shared across every organization, every user, every turn. The dynamic layers (06–08) are composed per request from Postgres and cached per organization. Both blocks are sent to the model with cache_control: "ephemeral".
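As a sketch of the two-block layout described above (the `buildBlocks` helper and layer strings are assumptions for illustration, not the codebase's actual API), the static and dynamic layers each collapse into one cached text block:

```typescript
// Minimal sketch, assuming a helper that concatenates layer texts into the
// two cached blocks described above. TextBlockParam mirrors the Anthropic
// SDK shape; buildBlocks and the layer arrays are hypothetical.
type TextBlockParam = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function buildBlocks(
  staticLayers: string[],  // layers 01–05: identical for every org, every turn
  dynamicLayers: string[], // layers 06–08: composed per org from Postgres
): TextBlockParam[] {
  // Static block: one concatenated text shared globally, so it stays warm
  // in the prompt cache across all traffic.
  const staticBlock: TextBlockParam = {
    type: "text",
    text: staticLayers.join("\n\n"),
    cache_control: { type: "ephemeral" },
  };
  // Dynamic block: changes only when the organization or engagement updates,
  // so repeat turns within a session hit cache here too.
  const dynamicBlock: TextBlockParam = {
    type: "text",
    text: dynamicLayers.join("\n\n"),
    cache_control: { type: "ephemeral" },
  };
  return [staticBlock, dynamicBlock];
}
```

Keeping the never-varying text in its own block is what lets a cache hit on the static layers survive even when an org's dynamic context changes.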
Composition flow
composeSystem({ org, user, engagement }) runs on every request. The static text block never varies, so it stays warm in Anthropic's prompt cache across all traffic. The dynamic block changes only when the organization or engagement updates — which means repeat turns within a session hit cache on both blocks.
import { composeSystem } from "@/lib/aria/prompt";

const system = composeSystem({
  org,        // Organization | null
  user,       // UserContext (name, title, department, decisionScope)
  engagement, // EngagementStateData | null (weekNumber, hypotheses, …)
});

// system is Anthropic.TextBlockParam[], each with cache_control: "ephemeral".
// Layer order: identity, methodology, benchmarks, guardrails, outputFormat (static block),
// companyContext, userContext, engagementState (dynamic block).

Prompt cache hit rate is the performance metric
Every assistant message persists cacheReadTokens and cacheCreateTokens alongside the model's input and output token counts. A warm session reads the vast majority of the system prompt from cache, which drops time-to-first-byte and cost proportionally. Bumping PROMPT_SCHEMA_VERSION cold-starts the cache — monitor warm-up before and after a bump.
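The persisted counters make the metric a one-line computation. A minimal sketch (the `TokenUsage` shape and `cacheHitRate` helper are hypothetical; only the counter names come from the text above):

```typescript
// Hypothetical helper: fraction of cacheable input tokens served from a warm
// prompt cache. cacheReadTokens and cacheCreateTokens are the counters
// persisted alongside each assistant message.
interface TokenUsage {
  inputTokens: number;       // regular (uncached) input tokens
  cacheReadTokens: number;   // tokens read from a warm cache
  cacheCreateTokens: number; // tokens written on a cache miss (cold start)
}

function cacheHitRate(u: TokenUsage): number {
  const cacheable = u.cacheReadTokens + u.cacheCreateTokens;
  return cacheable === 0 ? 0 : u.cacheReadTokens / cacheable;
}
```

A warm turn reads the whole system prompt from cache (rate near 1.0); the first turn after a PROMPT_SCHEMA_VERSION bump writes it instead (rate 0), which is the cold start worth monitoring.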
Dive deeper
01
The 8 prompt layers
Line-by-line reference for each layer — source, version, role, and what Aria actually sees on a warmed session.
02
Model & caching
Model configuration, effort levels, extended-thinking plan, and how prompt caching interacts with schema versions.
03
Engagement memory
How EngagementState survives between sessions — hypothesis lifecycle, findings promotion, week-counter semantics.
What Aria is not
Honest about scope, on purpose:
- Not a tool-calling agent. Aria does not invoke functions, browse the web, or execute code.
- Not a retrieval-augmented agent. There is no document store or vector index in the path today.
- Not a polling agent. Aria does not read from Slack, Jira, Salesforce, or other tools; she asks.
- Not an orchestrator. There is one agent — Aria — and one call per turn.
- Not a programmatic platform. There are no API keys, no public OAuth clients, and no webhooks today.
These are deliberate scope decisions that keep the product focused on one thing: being a consultant in chat. Each of them is a roadmap item, not a limitation of the model. See the newsroom for near-term additions.
Multi-tenancy
Every live query — Organization, UserProfile, Conversation, ChatMessage, EngagementState — is scoped by orgId at the query layer. The demo path never touches the live database. This is the system's primary safety property, enforced by route separation and by the tuple-based identity lookup on every persisted row.
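The invariant above can be sketched generically (the `Scoped` interface and `scopedFilter` helper are illustrative, not the real query layer; only the rule that every read filters by orgId reflects the design):

```typescript
// Hypothetical sketch of query-layer org scoping: a row is visible only
// when its orgId matches the caller's. Real queries would push this filter
// into SQL, but the invariant is the same.
interface Scoped {
  orgId: string;
}

function scopedFilter<T extends Scoped>(rows: T[], orgId: string): T[] {
  return rows.filter((row) => row.orgId === orgId);
}
```

Because the filter is applied at the query layer rather than in each route handler, a new feature cannot accidentally return another tenant's rows.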