Method — LLM-as-a-Judge

Definition, scope boundary, and structural model.

Definition

LLM-as-a-Judge is the structured use of large language models (LLMs) as evaluators that assess outputs, responses, behaviors, or artifacts against defined evaluation criteria.

The definition establishes a framework for model-based assessment without prescribing benchmark systems, implementation mechanisms, or commercial evaluation services.

Model Classification

LLM-as-a-Judge is presented here as a descriptive and analytical reference model.

It provides a framework for examining the relationships among evaluator models, assessment targets, and defined criteria. It does not define operational procedures, certification structures, or commercial assessment services.

Scope Boundary

Included

Definition of model-based evaluation conditions

Assessment of outputs against defined evaluation criteria

Use of language models as evaluator mechanisms

Comparative scoring or classification of assessment targets

Structural mapping of evaluator-target relationships

Excluded

Vendor rankings or product reviews

Legal advice or regulatory certification

Implementation of benchmark systems or evaluation tools

Operational guidance for system deployment

Commercial testing or assessment services

Structural Phase Model

Phase 1 — Criteria Definition

Evaluation criteria, rubrics, or assessment conditions are defined within the evaluation context.

Phase 2 — Target Presentation

Outputs, responses, behaviors, or artifacts are presented for model-based assessment.

Phase 3 — Model-Based Assessment

A language model evaluates the assessment target against the criteria defined in Phase 1.

Phase 4 — Judgment Boundary

Assessed outputs are separated from outputs that fall outside the established evaluation scope.
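The four phases can be illustrated with a minimal Python sketch of the evaluator, target, and criteria relationships. It is not an evaluation tool: every name in it (`Criterion`, `Target`, `Judgment`, `assess`, `stub_evaluator`, and the `scope` field) is a hypothetical illustration, and the evaluator is a deterministic stub standing in for a language model call. Applying the Phase 4 scope check before the evaluator runs, rather than after, is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


# Phase 1 (Criteria Definition): a hypothetical rubric item.
@dataclass
class Criterion:
    name: str
    description: str


# Phase 2 (Target Presentation): an output presented for assessment,
# tagged with the evaluation scope it belongs to (hypothetical field).
@dataclass
class Target:
    target_id: str
    content: str
    scope: str


# Result of Phase 3: one score per criterion for a single target.
@dataclass
class Judgment:
    target_id: str
    scores: dict[str, int]


# Hypothetical evaluator interface. In a real setting an LLM call would sit
# behind this signature; here it is any function (criterion, content) -> score.
Evaluator = Callable[[Criterion, str], int]


def assess(
    criteria: list[Criterion],
    targets: list[Target],
    evaluator: Evaluator,
    in_scope: set[str],
) -> tuple[list[Judgment], list[Target]]:
    judged: list[Judgment] = []
    out_of_scope: list[Target] = []
    for target in targets:
        # Phase 4 (Judgment Boundary): outputs outside the established scope
        # are kept apart and never passed to the evaluator.
        if target.scope not in in_scope:
            out_of_scope.append(target)
            continue
        # Phase 3 (Model-Based Assessment): score the target on every criterion.
        scores = {c.name: evaluator(c, target.content) for c in criteria}
        judged.append(Judgment(target.target_id, scores))
    return judged, out_of_scope


def stub_evaluator(criterion: Criterion, content: str) -> int:
    """Deterministic stand-in for a language-model judge (hypothetical heuristics)."""
    if criterion.name == "concise":
        return 1 if len(content.split()) <= 12 else 0
    if criterion.name == "on_topic":
        return 1 if "refund" in content.lower() else 0
    return 0


criteria = [
    Criterion("concise", "The answer uses at most 12 words."),
    Criterion("on_topic", "The answer addresses the refund question."),
]
targets = [
    Target("a1", "Refunds are issued within 14 days.", "support"),
    Target("a2", "Our company was founded in 1998 and has offices in many countries worldwide today.", "support"),
    Target("a3", "Here is a short poem about spring.", "creative"),
]
judged, excluded = assess(criteria, targets, stub_evaluator, in_scope={"support"})
for j in judged:
    print(j.target_id, j.scores)
print("out of scope:", [t.target_id for t in excluded])
```

Running it prints one score dictionary each for `a1` and `a2` and lists `a3` as out of scope, because only the `support` scope was established.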

Transferability

The LLM-as-a-Judge model is not limited to a specific vendor, benchmark, or implementation architecture.

It can be applied across evaluation settings involving text outputs, agent responses, generated artifacts, classifications, or comparative assessments.

The model stays consistent across these settings because it addresses only the structural relationships between evaluator models, assessment targets, and evaluation criteria.
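Because the model describes relationships rather than mechanisms, the same criterion and evaluator shape can serve both single-output and comparative assessment. The Python sketch below uses hypothetical names (`Criterion`, `judge_single`, `judge_pair`, `stub_evaluator`) and derives a pairwise preference from two pointwise scores. Many comparative setups instead present both outputs to the judge in a single prompt, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    description: str


# Same hypothetical evaluator shape as in a pointwise setting:
# (criterion, output) -> integer score.
Evaluator = Callable[[Criterion, str], int]


def judge_single(criterion: Criterion, output: str, evaluator: Evaluator) -> int:
    """Pointwise assessment: one target, one score."""
    return evaluator(criterion, output)


def judge_pair(criterion: Criterion, first: str, second: str, evaluator: Evaluator) -> str:
    """Comparative assessment derived from two pointwise scores."""
    first_score = evaluator(criterion, first)
    second_score = evaluator(criterion, second)
    if first_score > second_score:
        return "first"
    if second_score > first_score:
        return "second"
    return "tie"


def stub_evaluator(criterion: Criterion, output: str) -> int:
    """Hypothetical stand-in for an LLM judge: grades conciseness only."""
    if criterion.name == "concise":
        # Fewer words give a higher score, floored at zero.
        return max(0, 20 - len(output.split()))
    return 0


concise = Criterion("concise", "Shorter answers score higher.")
short_answer = "Refunds are issued within 14 days."
long_answer = "Refunds are usually issued within 14 days of receiving the returned item."

print(judge_single(concise, short_answer, stub_evaluator))
print(judge_pair(concise, short_answer, long_answer, stub_evaluator))
print(judge_pair(concise, long_answer, short_answer, stub_evaluator))
```

Only the target shape changes between the two functions (one output versus a pair); the criterion and the evaluator interface are shared, which mirrors the structural consistency described above.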