Method — LLM-as-a-Judge

Definition, scope boundary, and structural model.

Definition

LLM-as-a-Judge is the structured use of a large language model as an evaluator that assesses outputs, responses, behaviors, or artifacts against defined evaluation criteria.

It establishes a framework for model-based assessment without prescribing benchmark systems, implementation mechanisms, or commercial evaluation services.

Model Classification

The LLM-as-a-Judge model is a descriptive and analytical reference model.

It provides a framework for examining the relationships among evaluator models, assessment targets, and defined criteria in model-based evaluation, without defining operational procedures, certification structures, or commercial assessment services.
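As a purely illustrative aid, the three-way relationship described above can be written down as a small data structure. The sketch below is an assumption, not part of the reference model: the class names (Criterion, AssessmentTarget, EvaluationRelationship), their fields, and the evaluator name "judge-model-a" are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One defined evaluation criterion (hypothetical structure)."""
    name: str
    description: str


@dataclass(frozen=True)
class AssessmentTarget:
    """The output, response, behavior, or artifact being assessed."""
    target_id: str
    content: str


@dataclass(frozen=True)
class EvaluationRelationship:
    """Links one evaluator model to one target under a set of criteria."""
    evaluator_name: str
    target: AssessmentTarget
    criteria: tuple[Criterion, ...]


relationship = EvaluationRelationship(
    evaluator_name="judge-model-a",  # placeholder, not a real model name
    target=AssessmentTarget("t1", "Paris is the capital of France."),
    criteria=(Criterion("accuracy", "The statement is factually correct."),),
)
```

The structure records only who evaluates what against which criteria; it says nothing about how the evaluation is carried out, which matches the descriptive intent of the model.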

Scope Boundary

Included

Definition of model-based evaluation conditions
Assessment of outputs against defined evaluation criteria
Use of language models as evaluator mechanisms
Comparative scoring or classification of assessment targets
Structural mapping of evaluator-target relationships

Excluded

Vendor rankings or product reviews
Legal advice or regulatory certification
Implementation of benchmark systems or evaluation tools
Operational guidance for system deployment
Commercial testing or assessment services

Structural Phase Model

Phase 1 — Criteria Definition

Evaluation criteria, rubrics, or assessment conditions are defined within the evaluation context.

Phase 2 — Target Presentation

Outputs, responses, behaviors, or artifacts are presented for model-based assessment.

Phase 3 — Model-Based Assessment

A language model evaluates each assessment target against the defined criteria.

Phase 4 — Judgment Boundary

The system separates assessed outputs from outputs that fall outside the established evaluation scope.
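The four phases can be traced in a minimal sketch. Everything named here is an assumption made for illustration: stub_judge stands in for a language model call, the in_scope predicate is one possible reading of the judgment boundary (here applied before assessment, so out-of-scope targets are never scored), and the 0-to-1 score is an arbitrary choice. The reference model itself prescribes none of these.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical evaluator signature: (criteria text, target text) -> score in [0, 1].
Judge = Callable[[str, str], float]


@dataclass
class Judgment:
    target: str
    in_scope: bool
    score: Optional[float]  # None when the target is outside evaluation scope


def stub_judge(criteria: str, target: str) -> float:
    """Stand-in for a language model call; a real evaluator would prompt a model."""
    return 1.0 if "refund" in target.lower() else 0.0


def run_assessment(
    criteria: str,
    targets: List[str],
    judge: Judge,
    in_scope: Callable[[str], bool],
) -> List[Judgment]:
    judgments = []
    for target in targets:  # Phase 2: targets are presented for assessment
        if not in_scope(target):  # Phase 4: keep out-of-scope targets separate
            judgments.append(Judgment(target, False, None))
            continue
        score = judge(criteria, target)  # Phase 3: model-based assessment
        judgments.append(Judgment(target, True, score))
    return judgments


# Phase 1: the criteria are defined before any target is assessed.
criteria = "The response mentions the refund policy."
targets = ["You can request a refund within 30 days.", "Hello!", ""]
results = run_assessment(
    criteria, targets, stub_judge, in_scope=lambda t: bool(t.strip())
)
```

In this sketch the empty string is treated as outside scope and receives no score, while the other two targets are scored by the stub. Swapping stub_judge for a real model call would not change the surrounding structure.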

Transferability

The LLM-as-a-Judge model is not limited to a specific vendor, benchmark, or implementation architecture.

It can be applied across evaluation settings involving text outputs, agent responses, generated artifacts, classifications, or comparative assessments.

The model remains consistent across these settings because it describes only the structural relationships among evaluator models, assessment targets, and evaluation criteria.
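One way to picture this independence from any particular evaluator is a shared interface that interchangeable evaluators satisfy. The sketch below is an assumption for illustration only: the Evaluator protocol, its assess method, and both stub classes are hypothetical, and neither stub calls a real language model.

```python
from __future__ import annotations

from typing import Protocol


class Evaluator(Protocol):
    """Hypothetical interface: any evaluator that returns a label for a target."""

    def assess(self, target: str, criteria: str) -> str: ...


class LengthStubEvaluator:
    """Stand-in evaluator; a real one would call a language model."""

    def assess(self, target: str, criteria: str) -> str:
        return "pass" if len(target.split()) >= 3 else "fail"


class KeywordStubEvaluator:
    """A second stand-in with different internal logic but the same interface."""

    def assess(self, target: str, criteria: str) -> str:
        return "pass" if "because" in target.lower() else "fail"


def evaluate_all(
    evaluator: Evaluator, targets: list[str], criteria: str
) -> dict[str, str]:
    # The calling code depends only on the evaluator-target-criteria relationship.
    return {t: evaluator.assess(t, criteria) for t in targets}


targets = ["Yes.", "It fails because the input is empty."]
criteria = "The response gives a reason."
by_length = evaluate_all(LengthStubEvaluator(), targets, criteria)
by_keyword = evaluate_all(KeywordStubEvaluator(), targets, criteria)
```

Because evaluate_all relies only on the interface, either evaluator can be substituted without changing how targets and criteria are supplied.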