Glossary

LLM-as-Judge

Also known as: LLM as a judge, LLM-as-a-Judge, LLM judge

Definition

LLM-as-Judge is an evaluation technique in which an independent large language model scores another LLM's output against a rubric: accuracy, relevance, tone, consistency, completeness, or custom criteria. Because scoring is automated, quality can be measured continuously on every production trace instead of on a manually reviewed sample.
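To make the definition concrete, here is a minimal sketch of a rubric-based judge in Python. It is an illustration only: the 1 to 5 scale, the prompt wording, and the names `RUBRIC`, `build_judge_prompt`, `parse_scores`, `judge`, and the injected `call_judge` callable are all assumptions, not any vendor's API. `call_judge` stands in for whatever client sends a prompt to the judge model and returns its text reply.

```python
import json
from typing import Callable, Dict, List

# Hypothetical rubric: the criteria named in the definition above.
RUBRIC: List[str] = ["accuracy", "relevance", "tone", "consistency", "completeness"]


def build_judge_prompt(question: str, answer: str, rubric: List[str]) -> str:
    """Assemble the grading prompt sent to the judge model."""
    criteria = "\n".join(f"- {name}" for name in rubric)
    return (
        "You are grading another model's answer.\n\n"
        f"Question:\n{question}\n\n"
        f"Answer:\n{answer}\n\n"
        "Score each criterion from 1 (poor) to 5 (excellent):\n"
        f"{criteria}\n\n"
        "Reply with a JSON object mapping each criterion to an integer."
    )


def parse_scores(raw: str, rubric: List[str]) -> Dict[str, int]:
    """Validate the judge's reply so malformed output fails loudly."""
    data = json.loads(raw)
    scores: Dict[str, int] = {}
    for name in rubric:
        value = data[name]  # KeyError if the judge skipped a criterion
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 1 <= value <= 5:
            raise ValueError(f"invalid score for {name}: {value!r}")
        scores[name] = value
    return scores


def judge(
    question: str,
    answer: str,
    call_judge: Callable[[str], str],  # assumed: prompt in, raw reply text out
    rubric: List[str] = RUBRIC,
) -> Dict[str, int]:
    """Score one answer against the rubric using the supplied judge model."""
    prompt = build_judge_prompt(question, answer, rubric)
    return parse_scores(call_judge(prompt), rubric)
```

Passing the model call in as a function keeps the sketch independent of any particular provider and makes it easy to test with a stub. Strict parsing matters in practice: a reply that is silently coerced into a score is harder to detect than one that raises.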

Why it matters

Manual evaluation does not scale. A reviewer sampling 1% of traces leaves the other 99% unchecked, so a regression can reach most users before anyone sees it. Hard-coded heuristics catch simple errors but miss subtle quality degradation, tone problems, and factual inconsistency. LLM-as-Judge sits in between: cheap enough to score every trace, sophisticated enough to catch problems heuristics miss.

Done well, it provides the continuous quality signal that NIST AI RMF MEASURE 2.6 expects and that SR 11-7 ongoing monitoring increasingly demands. Done poorly, it inherits the judge model's biases and its scores become noise. Best practice is to use a different model family for the judge than for the system being evaluated, to periodically anchor scores against human-labeled ground truth, and to report calibration drift as part of the evaluation.
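One way to anchor a judge against human labels and track calibration drift is sketched below. The choice of metric (mean absolute error on a shared sample), the definition of drift as the change in that error from a baseline, and the function names are assumptions made for illustration; other agreement measures would work equally well.

```python
from typing import Sequence


def mean_abs_error(judge_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Average absolute gap between judge and human scores on the same traces."""
    if not judge_scores or len(judge_scores) != len(human_scores):
        raise ValueError("score lists must be non-empty and the same length")
    total = sum(abs(j - h) for j, h in zip(judge_scores, human_scores))
    return total / len(judge_scores)


def calibration_drift(baseline_mae: float, current_mae: float) -> float:
    """Positive values mean the judge now disagrees with humans more than at baseline."""
    return current_mae - baseline_mae


# Illustrative numbers only: four traces scored 1-5 by both judge and reviewers.
baseline = mean_abs_error([4, 3, 5, 2], [4, 3, 5, 1])  # 0.25
current = mean_abs_error([4, 3, 5, 2], [4, 4, 5, 1])   # 0.5
drift = calibration_drift(baseline, current)           # 0.25
```

Reporting the drift figure alongside the scores lets readers judge how far to trust a given period's numbers, rather than treating the judge as fixed ground truth.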

In practice

PRISM Evaluations use LLM-as-Judge to score every trace across five fixed dimensions within seconds of completion. Judges run independently of the production model, their decisions are themselves logged, and quality scores feed both real-time alerts and weekly regression reports.


