Georg's Blog

Technology, leadership, and the digital frontier

Georg Zoeller
on Arxiv

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

Humans have a habit of trying to formalize rules and best practices, package them into courses, and teach them as methodologies.

Unfortunately, during times of rapid scientific and technological development, this habit can backfire, as much of the frontier wisdom is little more than myths, legends, and vibes.

There are a few reasons why injecting rules into the context window of an LLM tends to backfire, some intuitive, others counterintuitive.

For one, when operating on a code base with strict coding standards, the LLM is likely, even without instruction, to generate code matching those existing standards.

For another, despite sporting context windows of millions of tokens, performance tends to drop rapidly with context size. On almost any project I get involved with, we end up pruning system prompts, which have grown to the size of small books in an attempt to cover each and every mistake and edge case made by the LLM, down to a few paragraphs at most, often resulting in equal or significantly increased prompt adherence while dramatically cutting operating costs.

This is part of the reason why I'm deeply sceptical of the proposition that everyone must rush to the frontier lest they get left behind and find themselves without job prospects.

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents’ task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files following agent-developer recommendations, and a novel collection of issues from repositories containing developer-committed context files. Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.
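To make the paper's closing recommendation concrete: if a context file is used at all, it should state only what the agent cannot easily discover on its own plus genuine hard constraints. The file below is an invented illustration of that principle, not an example from the paper; the commands and paths in it are hypothetical.

```markdown
# AGENTS.md

## Build & test (not discoverable from the file tree)
- Run the test suite with `make test` (hypothetical command).
- Lint before committing with `make lint` (hypothetical command).

## Hard constraints
- Do not modify files under `vendor/` (hypothetical path).
```

Everything else, such as style guidance already enforced by the code base or long lists of past mistakes to avoid, is the kind of "unnecessary requirement" the paper finds makes tasks harder and inference more expensive.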

arxiv.org