Method — LLM-as-a-Judge
Definition, scope boundary, and structural model.
Definition
LLM-as-a-Judge describes the structured use of large language models as evaluative mechanisms for assessing outputs, responses, behaviors, or artifacts against defined evaluation criteria.
It establishes a framework for model-based assessment without prescribing benchmark systems, implementation mechanisms, or commercial evaluation services.
Model Classification
The LLM-as-a-Judge model is a descriptive and analytical reference model.
It provides a framework for examining the relationships among evaluator models, assessment targets, and defined criteria in model-based evaluation, without defining operational procedures, certification structures, or commercial assessment services.
Scope Boundary
Included
Excluded
Structural Phase Model
Phase 1 — Criteria Definition
Evaluation criteria, rubrics, or assessment conditions are defined within the evaluation context.
Phase 2 — Target Presentation
Outputs, responses, behaviors, or artifacts are presented for model-based assessment.
Phase 3 — Model-Based Assessment
A language model evaluates the assessment target relative to defined criteria.
Phase 4 — Judgment Boundary
The system separates assessed outputs from outputs that fall outside the established evaluation scope.
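The four phases above can be sketched in code. The following is a minimal illustration, not a prescribed implementation: the names Criterion, Target, Judgment, judge, and stub_evaluator are hypothetical, scope is assumed to be decided by a target's kind, and the stub evaluator stands in for a real language model call. The sketch applies the Phase 4 boundary check before scoring so that out-of-scope targets are never assessed; that ordering is a design choice of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class Criterion:
    """Phase 1: one evaluation criterion (hypothetical structure)."""
    name: str
    description: str


@dataclass(frozen=True)
class Target:
    """Phase 2: an assessment target presented for evaluation."""
    kind: str      # e.g. "text", "agent_response", "artifact"
    content: str


@dataclass(frozen=True)
class Judgment:
    """Result of an assessment; scores is empty when the target is out of scope."""
    in_scope: bool
    scores: Dict[str, float]


# Assumed evaluator signature: (criterion, target) -> score in [0, 1].
Evaluator = Callable[[Criterion, Target], float]


def judge(target: Target,
          criteria: Iterable[Criterion],
          evaluator: Evaluator,
          scope_kinds: Set[str]) -> Judgment:
    # Phase 4: targets outside the established scope are not assessed.
    if target.kind not in scope_kinds:
        return Judgment(in_scope=False, scores={})
    # Phase 3: the evaluator assesses the target against each criterion.
    scores = {c.name: evaluator(c, target) for c in criteria}
    return Judgment(in_scope=True, scores=scores)


def stub_evaluator(criterion: Criterion, target: Target) -> float:
    # Stand-in for a language model call: scores 1.0 for non-empty content.
    return 1.0 if target.content.strip() else 0.0


criteria = [Criterion("non_empty", "The response contains some content.")]
in_scope = judge(Target("text", "hello"), criteria, stub_evaluator, {"text"})
out_of_scope = judge(Target("image", "..."), criteria, stub_evaluator, {"text"})
```

In this sketch, in_scope carries one score per criterion, while out_of_scope is marked as outside the judgment boundary and carries no scores.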
Transferability
The LLM-as-a-Judge model is not limited to a specific vendor, benchmark, or implementation architecture.
It can be applied across evaluation settings involving text outputs, agent responses, generated artifacts, classifications, or comparative assessments.
The model remains consistent across these settings because it describes only the structural relationships among evaluator models, assessment targets, and evaluation criteria.
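One way to picture this transferability is an interface that fixes only the relationship between evaluator, target, and criterion. The sketch below is illustrative: EvaluatorModel, KeywordStub, LengthStub, and compare are hypothetical names, and both stubs are simple rule-based stand-ins for a language model from any vendor. The same comparative assessment runs unchanged with either evaluator.

```python
from typing import Protocol


class EvaluatorModel(Protocol):
    """Any evaluator that can score a target against a criterion."""
    def score(self, criterion: str, target: str) -> float: ...


class KeywordStub:
    """Stand-in evaluator: scores 1.0 when a keyword appears in the target."""
    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def score(self, criterion: str, target: str) -> float:
        return 1.0 if self.keyword in target.lower() else 0.0


class LengthStub:
    """Stand-in evaluator: scores 1.0 when the target fits a length budget."""
    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def score(self, criterion: str, target: str) -> float:
        return 1.0 if len(target) <= self.max_chars else 0.0


def compare(evaluator: EvaluatorModel, criterion: str, a: str, b: str) -> str:
    """Comparative assessment: returns "a", "b", or "tie"."""
    score_a = evaluator.score(criterion, a)
    score_b = evaluator.score(criterion, b)
    if score_a > score_b:
        return "a"
    if score_b > score_a:
        return "b"
    return "tie"


reply_a = "We will refund you."
reply_b = "Sorry."
by_keyword = compare(KeywordStub("refund"), "mentions a refund", reply_a, reply_b)
by_length = compare(LengthStub(10), "is concise", reply_a, reply_b)
```

The compare function depends only on the score method, so replacing a stub with a model-backed evaluator would not change the structure of the assessment.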