arxiv.org
Soft Contamination Means Benchmarks Test Shallow Generalization
This must be the couple-hundredth study by now showing that generative AI models are lossy storage and that benchmarks primarily measure storage and retrieval performance. None of this is new: 2023's "Pretraining on the Test Set Is All You Need" [1] already made the point succinctly enough...
