Posts tagged "Benchmarking"

February 18, 2026 · arxiv.org

Soft Contamination Means Benchmarks Test Shallow Generalization↗

This is roughly the couple-hundredth study by now showing that generative AI models are lossy storage and that benchmarks primarily measure storage and retrieval performance. That is nothing new: 2023’s "Pretraining on the Test Set Is All You Need" already explained this succinctly enough...

Artificial Intelligence, Benchmarking, Data Contamination

Georg's Blog
