LLM Observability
Also known as: large language model observability
Definition
LLM observability is the model-layer subset of AI observability. It covers prompt-response capture, token and cost tracking, latency, quality scoring, guardrail decisions, and the lineage of inputs through embedding, retrieval, tool-call, and reasoning steps. It is what application performance monitoring (APM) becomes in an LLM-native architecture.
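To make the list above concrete, here is a minimal sketch of how the telemetry for one request might be modeled as a tree of spans, so that lineage from embedding and retrieval through the model call stays linked. The `Span` class, its field names, and the attribute keys are hypothetical illustrations, not Prism's schema or any standard's, and the numbers are made up.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """One step in a request: embedding, retrieval, tool call, or LLM call.

    Hypothetical schema for illustration only.
    """
    name: str
    kind: str  # e.g. "request", "embedding", "retrieval", "tool", "llm"
    latency_ms: float
    attributes: dict[str, Any] = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)

    def walk(self):
        """Yield this span and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def total_tokens(root: Span) -> int:
    """Sum input and output tokens over every LLM span in the trace."""
    return sum(
        s.attributes.get("input_tokens", 0) + s.attributes.get("output_tokens", 0)
        for s in root.walk()
        if s.kind == "llm"
    )


# A retrieval-augmented request (illustrative values):
# embed the query, retrieve documents, then call the model.
trace = Span(
    name="answer_question",
    kind="request",
    latency_ms=1840.0,
    children=[
        Span("embed_query", "embedding", 45.0, {"model": "embedding-model"}),
        Span("vector_search", "retrieval", 120.0, {"documents_returned": 4}),
        Span(
            "generate_answer",
            "llm",
            1610.0,
            {
                "input_tokens": 1200,
                "output_tokens": 310,
                "quality_score": 0.87,
                "guardrail": "pass",
            },
        ),
    ],
)

print([s.name for s in trace.walk()])
print(total_tokens(trace))  # → 1510
```

Keeping child steps nested under the request, rather than logging them as unrelated lines, is what lets a low quality score on the final answer be traced back to, for example, a retrieval step that returned the wrong documents.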
Why it matters
Generic APM tools were built for deterministic systems, where the same input produces the same output. LLMs are non-deterministic by design: the same prompt can produce different responses, cost varies with the number of input and output tokens, latency varies with model load, and quality can shift silently when a provider updates the model version. Without LLM-specific observability, teams discover regressions through customer complaints rather than through monitoring.
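Because providers typically bill per token, often at separate rates for input and output tokens, two calls with an identical prompt can cost different amounts. A minimal sketch of the arithmetic, assuming hypothetical per-million-token prices rather than any provider's actual rates:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD; real rates vary by provider and model."""
    input_per_million: float
    output_per_million: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call, with input and output tokens billed at separate rates."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


pricing = Pricing(input_per_million=3.0, output_per_million=15.0)

# Same prompt (1,000 input tokens), two sampled responses of different lengths.
short_answer = call_cost(pricing, input_tokens=1000, output_tokens=200)
long_answer = call_cost(pricing, input_tokens=1000, output_tokens=800)

print(f"{short_answer:.4f}")  # → 0.0060
print(f"{long_answer:.4f}")   # → 0.0150
```

Under these assumed rates, the longer response costs 2.5 times as much for the same prompt, which is why cost has to be tracked per call rather than estimated per request type.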
LLM observability also addresses risks unique to language models: hallucinations, prompt injection, sensitive-data egress, and policy violations. These are not captured by generic logs.
In practice
Prism treats every LLM call as a first-class trace. The Python and TypeScript SDKs auto-instrument the OpenAI, Anthropic, and Bedrock libraries, and OpenTelemetry exporters cover the rest. Each trace carries a quality score, guardrail status, and full input/output drill-down, with PII redacted before storage.
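Conceptually, auto-instrumentation wraps each client call, times it, reads token usage from the response, redacts sensitive text, and emits a record. The library-agnostic sketch below shows that flow. It is not Prism's SDK: `traced`, `fake_llm_call`, the in-memory `TRACES` sink, and the two regex patterns are hypothetical stand-ins, and real PII redaction needs far broader coverage than two regular expressions.

```python
import functools
import re
import time
from typing import Any, Callable

# Hypothetical in-memory sink standing in for a trace backend.
TRACES: list[dict[str, Any]] = []

# Deliberately simple patterns for illustration; production redaction needs much more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def traced(model: str) -> Callable:
    """Hypothetical decorator that records one trace per call, redacting text before storage."""
    def decorator(fn: Callable[[str], dict[str, Any]]) -> Callable[[str], dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(prompt: str) -> dict[str, Any]:
            start = time.perf_counter()
            result = fn(prompt)
            latency_ms = (time.perf_counter() - start) * 1000
            TRACES.append({
                "model": model,
                "latency_ms": latency_ms,
                "input_tokens": result["input_tokens"],
                "output_tokens": result["output_tokens"],
                # Redact before the record is stored, so raw PII never reaches the sink.
                "prompt": redact(prompt),
                "response": redact(result["text"]),
            })
            return result
        return wrapper
    return decorator


@traced(model="example-model")
def fake_llm_call(prompt: str) -> dict[str, Any]:
    """Hypothetical stand-in for a real client call; returns text and token usage."""
    return {
        "text": "I will email jane.doe@example.com today.",
        "input_tokens": 12,
        "output_tokens": 9,
    }


fake_llm_call("Contact me at 555-123-4567 or bob@example.org")
print(TRACES[0]["prompt"])    # → Contact me at [PHONE] or [EMAIL]
print(TRACES[0]["response"])  # → I will email [EMAIL] today.
```

The caller still receives the unredacted response. Only the stored trace is scrubbed, which matches the "redacted before storage" behavior described above.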