Prism AI Observability
AI observability built for compliance teams.
Every LLM call captured, scored, and stored with PII scrubbed before it lands in your database. Regulator-ready exports in under 60 seconds.
- #4821Credit Risk Query1.3s5/5PII redacted
- #4820Underwriting Decision0.9s3/5Drift detected
- #4819Policy Lookup0.4s5/5Grounded
Audit pack ready
47 traces · 60s export
Prism
See every step your agent took, and score whether it should have
Trajectory evaluation decomposes multi-step agent runs into ordered steps and scores each run on goal adherence, tool compliance, efficiency, and safety, automatically on ingest.
- Steps, tool usage, decision points, and final outcome captured per run
- Four-dimension scoring: goal adherence, tool compliance, efficiency, safety
- Background async scoring on ingest, zero impact on agent latency
- Automatic for Claude tool-use via proxy
The problem
Agents do not just generate text, they reason, select tools, make API calls, iterate, and produce multi-step outputs. A single bad tool selection three steps into a ten-step trajectory can cascade into a wrong answer, a wasted API call, or a safety violation. Traditional observability shows the final output; trajectory evaluation shows the path.
Capabilities
What you get with Prism
Steps
Ordered sequence of actions the agent took: reasoning, tool calls, intermediate outputs, retries, and error handling.
Tool usage
Every tool called, with arguments and return values, so you can verify the agent used authorized tools correctly.
Decision points
Where the agent chose between alternatives, and whether the choice aligned with the intended behavior.
Trajectory scoring
Goal adherence (task completion), tool compliance (right tools, right order), efficiency (no unnecessary steps), safety (no guardrail violations).
Async ingest scoring
PRISM evaluation runs in the background on ingest. Scores attach to the trajectory record and surface in dashboards, alerts, and compliance reports.
Trajectory export
Full step record exportable as a single artifact for audits and regulator submissions.
How it works
From instrumentation to evidence
- 1
Emit trajectory data
Agent runs emit trajectory data via the SDK. Automatic for Claude tool-use via proxy; manual instrumentation for custom agents.
- 2
Score asynchronously on ingest
PRISMtrace runs background PRISM evaluation on ingest, so scoring is asynchronous and agent latency is unaffected.
- 3
Surface in dashboards and reports
Scores attach to the trajectory record and appear in dashboards, alerts, and compliance reports.
What teams use it for
In production, every day
Regression detection
A prompt change causes agents to add an extra tool call on 30% of runs. Trajectory scoring flags the efficiency drop before users notice latency.
Safety monitoring
An agent occasionally calls a tool with user PII in the arguments. Safety scoring catches it even when the final output looks clean.
Audit evidence
When regulators ask how the AI makes decisions, trajectory records show the exact chain of reasoning and actions, not just the final answer.
Trajectory contents
What a trajectory record contains
Steps
Ordered sequence of actions the agent took: reasoning, tool calls, intermediate outputs, retries, error handling.
Tool usage
Which tools were called, with what arguments, and what they returned, so you can verify the agent used authorized tools correctly.
Decision points
Where the agent chose between alternatives and whether the choice aligned with the intended behavior.
Final outcome
The end result, linked back to the full chain of steps that produced it.
PRISM scoring
Four dimensions of trajectory evaluation
| Dimension | What it evaluates | Signal |
|---|---|---|
| Goal adherence | Did the agent achieve the stated objective? | Did it complete the task or abandon / diverge? |
| Tool compliance | Did it use the right tools in the right order? | Did it call unauthorized tools or skip required ones? |
| Efficiency | Were there unnecessary steps, loops, retries, or redundant tool calls? | Step count and tool-call count versus expected baseline. |
| Safety | Were any guardrails triggered during the trajectory? | Did any step leak data or violate policy? |
Dimension
Goal adherence
What it evaluates
Did the agent achieve the stated objective?
Signal
Did it complete the task or abandon / diverge?
Dimension
Tool compliance
What it evaluates
Did it use the right tools in the right order?
Signal
Did it call unauthorized tools or skip required ones?
Dimension
Efficiency
What it evaluates
Were there unnecessary steps, loops, retries, or redundant tool calls?
Signal
Step count and tool-call count versus expected baseline.
Dimension
Safety
What it evaluates
Were any guardrails triggered during the trajectory?
Signal
Did any step leak data or violate policy?
Regulatory alignment
Built for Compliance Officers, CROs, Engineering Leads
Related capabilities
LLM Observability: Trace Logging Built for Compliance
Structured traces give you the full story of what your AI said, why it said it, how long it took, and what it cost.
LLM Guardrails: PII Redaction and Prompt Injection Blocking
Real-time detection and enforcement for PII, PHI, prompt injection, content policy violations, and off-topic responses, scoped per agent, per project, per knowledge base.
LLM Evaluations: Five-Dimension Automated Quality Scoring
Define quality rubrics, score every interaction, and catch regressions before users do, with automated evaluators that run on every trace or on a schedule you control.
Prism X: AI DLP for Employees Using ChatGPT, Claude, Gemini
Prism X enforces data loss prevention policy in the browser, before prompts and uploads reach third-party AI services. Signed policy, real-time enforcement, audit-grade events.
Start tracing in 5 minutes
One SDK. Five minutes. Full audit trails, PII redaction, and guardrail enforcement, from day one.