Tag: llm
All articles tagged "llm".
-
Setting Logits to Negative Infinity: How LLMs Actually Output JSON
Structured outputs aren't a validation layer; they're a decoding-time intervention. How logit masking actually works, why token boundaries make it hard, and why reordering one field in your Pydantic schema can move accuracy by 90 points.
-
LLMs playing Just One: Why Same-Model LLM Ensembles Mode-Collapse
Ask four Claude Haiku instances independently for a clue for 'toast' and they all reply 'bread'. Four Sonnets collide more often; four Opuses more often still. I built a tiny benchmark around the board game Just One to measure when LLM ensembles collapse and what makes them stop. The best setup, a mixed-family ensemble with an anti-correlation prompt, hits 3.25× the single-model baseline.
-
Why Streaming LLMs Need Attention Sinks
A walkthrough of attention sinks: what they are, why softmax produces them by accident, why naive sliding-window inference collapses without them, and how a four-token reservation lets streaming inference run to four million tokens with no quality loss.
-
How to Mitigate the Lost-in-the-Middle Effect in LLMs
A look at why long contexts quietly break LLMs, why models use information at the boundaries of the context more reliably than in the middle, and why agents that periodically restate their goals at the end of the context often work better.