Tag: calibration

All the articles with the tag "calibration".

Breaking Down Agent Evals (Part 1B): Eval Calibration

A primer on eval calibration: what it means for your scoring pipeline to be trustworthy, the four levels (rubric, human-to-human, LLM-to-human, LLM-to-LLM), the common biases that turn a good-looking dashboard into a fiction, and how to read Cohen's kappa without the textbook. Built around small interactive applets.

Published: 15 Mar, 2026 · agents / evals / calibration