In early April, Meta’s engineering team published a piece describing how they used a swarm of 50+ specialised AI agents to map tribal knowledge across one of their large data pipelines: four repositories, three languages, 4,100+ files. The post has been making the rounds in platform engineering circles for good reason. It names a problem most polyrepo organisations live with quietly, and shows what an industrial-scale solution looks like.

I read it twice. I think there are actually two posts inside it. One is a complex AI orchestration story about agent swarms, critic passes, and self-refreshing context files. That’s the one most takes have focused on. The other is a quieter architectural argument that gets one paragraph and is the more useful piece for everyone who isn’t Meta.

This is about the second one.

What Meta actually built

Meta’s pipeline is config-as-code: Python configurations, C++ services, and Hack automation scripts working together across multiple repositories. Onboarding a single data field touches six subsystems that must stay in sync. Pointing AI coding agents at this codebase produced predictable results. The agents wrote code that compiled but was subtly wrong. They didn’t know which “deprecated” enum values must never be removed because of serialisation compatibility. They didn’t know that two configuration modes use different field names for the same operation, and that swapping them produces silently incorrect output. None of this was documented anywhere outside engineers’ heads.

Meta’s response was a multi-stage AI pipeline they call a “pre-compute engine.” Two explorer agents map the codebase. Eleven module analysts read every file and answer five questions per module. Two writers produce 25–35 line context files. Ten or more critic agents run three rounds of independent quality review. Four fixer agents apply corrections. The output is 59 concise context files covering 100% of code modules, up from ~5% coverage before. Every few weeks, the system self-refreshes: it validates file paths, detects coverage gaps, re-runs the critics, and auto-fixes stale references.

The results, on a six-task evaluation, are real: roughly 40% fewer tool calls per task. Workflow guidance that used to take two days of asking engineers now takes 30 minutes.

I’m not going to argue with any of that. It’s clearly working at Meta. But it works at Meta partly because Meta has a platform team large enough to build, run, and continuously maintain a self-refreshing critic swarm. That’s not a small footnote. It’s the whole reason the system stays trustworthy over time, and it’s the part that doesn’t transfer to a 100-engineer company.

The argument hidden in the article

Buried near the end of Meta’s post, under the heading “What We Built,” there’s one paragraph that did more for me than the rest of the article combined. I’ll quote it because it contains the most striking number in the entire piece:

Beyond individual contextual files, we generated a cross-repo dependency index and data flow maps showing how changes propagate across repositories. This turns “What depends on X?” from a multi-file exploration (~6000 tokens) into a single graph lookup (~200 tokens).

That’s a 30x reduction in token cost for one of the most common questions agents and engineers ask: if I change this thing, what else breaks?
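To make that concrete, here is a minimal sketch of what such a lookup can look like once the edges exist. The index shape, function name, and module names are mine, not Meta’s; the point is only that the answer becomes a dictionary access instead of a multi-file exploration.

```python
from collections import defaultdict

def build_reverse_index(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Index (dependent, dependency) edges so 'what depends on X?' is a single lookup."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for dependent, dependency in edges:
        reverse[dependency].add(dependent)
    return reverse

# Illustrative edges, not from any real pipeline.
index = build_reverse_index([
    ("pipeline.ingest",  "schema.user_event"),
    ("pipeline.export",  "schema.user_event"),
    ("dashboards.daily", "pipeline.export"),
])
print(index["schema.user_event"])  # {'pipeline.ingest', 'pipeline.export'} -- no exploration needed
```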

Meta describes the dependency index almost as an aside, after the long sections on the agent swarm. But it’s structurally different from everything else in their stack, and the difference matters.

The 59 context files are LLM-generated artifacts. They have to be quality-gated by critic agents, validated for hallucinations, and refreshed periodically because code drift makes them stale. That’s why the self-refresh mechanism exists, and why it requires a platform team to run. Context that decays is worse than no context, by Meta’s own admission.

The cross-repo dependency index is different. It’s a structured graph derived from parsing actual source files. It doesn’t need critic agents to validate quality because parsers are deterministic: the same input produces the same output. It doesn’t decay between scans because it’s not a snapshot of human-readable knowledge. It’s a queryable index of what the code actually says. You re-scan on every push, and it’s fresh by construction.
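For intuition, here is roughly what “derived from parsing actual source files” means for the Python slice of a codebase. This is a sketch under my own assumptions, not Meta’s tooling or Riftmap: it walks a repository, parses each file, and emits import edges. Run it twice on the same commit and you get the same edges, which is the whole point.

```python
import ast
from pathlib import Path

def import_edges(repo_root: str) -> set[tuple[str, str]]:
    """Deterministically extract (module, imported_module) edges from Python sources."""
    edges: set[tuple[str, str]] = set()
    root = Path(repo_root)
    for path in root.rglob("*.py"):
        module = path.relative_to(root).with_suffix("").as_posix().replace("/", ".")
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse; a real scanner would report them
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.update((module, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.add((module, node.module))
    return edges
```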

Two layers, two operating models. One is volatile and needs a maintenance team. The other is durable and maintains itself.

The paper Meta cited deserves a closer read

Meta acknowledges, briefly, a recent paper from ETH Zurich and LogicStar.ai (Gloaguen et al., February 2026) that found AI-generated context files reduced agent success rates on open-source Python repositories. Meta’s response is reasonable: those repositories are already well-represented in model training data, so context files are redundant noise there. Their own codebase, with proprietary tribal knowledge, is the opposite case.

That’s a fair distinction, but the paper’s actual findings are richer than Meta’s framing suggests, and worth engaging with directly.

Across four coding agents (Claude Code, Codex, Qwen Code, and a Codex variant) on 138 niche-repository tasks plus SWE-bench Lite, the paper found:

  • LLM-generated context files marginally hurt performance (-3% on average) while increasing inference cost by over 20%.
  • Developer-written context files marginally helped (+4% on average), still at +19% cost.
  • Context files do not provide effective overviews. This is a direct subsection title in the paper. Agents took the same number of steps to find relevant files whether a context file was present or not.
  • Context files lead to more exploration and testing, not less. Agents read more, search more, and test more when given a context file. They follow the instructions, but the instructions make tasks harder.
  • Stronger models don’t generate better context files. The expected scaling story doesn’t hold.

The paper’s own conclusion: context files have only a marginal effect on agent behaviour, and are likely only desirable when manually written.

The honest read of this is not “context files don’t work.” It’s that even in the best case, they give a +4% improvement at +19% cost, and only when humans write them. For LLM-generated context files, which is what Meta’s swarm produces, the average effect is negative.

A caveat in the paper’s favour: the evaluation is Python-only. The authors note that niche languages with less training data representation might benefit more from context files. Meta’s codebase includes Hack and C++ alongside Python. So Meta sits exactly at the edge of what the paper studies. Their case for context files is defensible: the paper doesn’t cleanly contradict it, but it doesn’t cleanly support it either. The honest position is that the jury is still out on whether Meta’s quality-gated, critic-validated context files clear the bar that naive /init commands don’t.

A different decomposition of the same problem

Meta’s five-question framework for module analysts is the most portable idea in their post. The questions are:

  1. What does this module configure?
  2. What are the common modification patterns?
  3. What are the non-obvious patterns that cause build failures?
  4. What are the cross-module dependencies?
  5. What tribal knowledge is buried in code comments?

Look at these and you’ll notice they decompose into two categories. Question 4 is a structural question. The answer exists in the source code itself, in import statements, configuration references, module declarations, and infrastructure-as-code definitions. Parsers can extract it deterministically. Questions 1, 2, 3, and 5 are semantic. They require interpretation: what is the intent behind this code, what patterns recur across changes, what isn’t written down. These are LLM-shaped questions, and they’re exactly the kind the ETH paper says context files struggle with.

This decomposition matters because it tells you where to invest first. The structural layer is cheap to build, deterministic, and stays fresh by construction. The semantic layer is expensive to build, requires quality gates, and decays without maintenance. If you have unlimited platform resources, you build both. If you have a small team, you build the structural layer first, because it gives you most of the leverage at a fraction of the cost, and because the semantic layer becomes more useful when it can anchor to verified structure.

There’s a deeper architectural argument here too. When an LLM generates context about a codebase from scratch, it’s discovering structure and writing prose about it. Hallucination lives in that gap. When an LLM generates context on top of a verified dependency graph, it’s annotating known structure with semantic claims. The structural ground truth constrains what the LLM can plausibly say. That’s a more reliable architecture, and it’s only available to you if you’ve built the graph first.
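One way to picture the difference: instead of asking the model to describe a module cold, you hand it the parser-verified edges and ask it to annotate them. This is a sketch of the idea, not Meta’s prompt or pipeline; the function name and wording are mine.

```python
def context_prompt(module: str, imports: set[str], imported_by: set[str]) -> str:
    """Build a context-generation prompt anchored to parser-verified dependency facts."""
    facts = "\n".join(
        [f"- {module} depends on {d}" for d in sorted(imports)]
        + [f"- {d} depends on {module}" for d in sorted(imported_by)]
    )
    return (
        "The following dependency facts were extracted by a parser and are ground truth:\n"
        f"{facts}\n\n"
        f"Write a 25-35 line context note for {module}: intent, common modification patterns, "
        "known pitfalls. Do not mention any dependency that is not listed above."
    )
```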

What this means for teams without a platform team

If you have 50 to 500 repositories and a small platform team or no platform team at all, here’s what I’d take from Meta’s post.

You can’t afford a self-refreshing critic swarm with 50+ specialised agents. You probably don’t need one. The piece of Meta’s stack with the highest leverage per dollar is the cross-repo dependency index, and that’s the piece you can have today without building any of Meta’s orchestration layer. It’s also the piece that gives you the freshness signal Meta argues is non-negotiable for trust (“context that decays is worse than no context”), because parsers run on every push and you can timestamp every edge in the graph.
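Freshness then stops being a review process and becomes a property of the data. A sketch of what I mean, with illustrative field names: every edge carries the commit it was extracted at, so staleness is a comparison, not a judgment call.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Edge:
    dependent: str    # e.g. "pipeline.export" (illustrative)
    dependency: str   # e.g. "schema.user_event" (illustrative)
    commit: str       # commit hash the edge was extracted at
    scanned_at: datetime

def stale(edges: list[Edge], head_commit: str) -> list[Edge]:
    """Edges not re-confirmed against the current head; no critic agents needed to spot them."""
    return [e for e in edges if e.commit != head_commit]
```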

The order matters. Build the structural layer first. Make “what depends on X” a 200-token graph lookup, not a 6,000-token exploration. Make blast-radius analysis a query, not an interview. Then, if you still want LLM-generated context on top, do it knowing the graph constrains what the LLM can say and the quality bar gets clearer.
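Blast-radius as a query is just a walk over the reverse edges. Another sketch, under the same assumptions as the earlier lookup (a reverse-dependency map keyed by module name):

```python
from collections import deque

def blast_radius(changed: str, reverse_deps: dict[str, set[str]]) -> set[str]:
    """Everything that transitively depends on `changed`: a breadth-first walk over reverse edges."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, set()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```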

Meta’s post is genuinely useful evidence. It validates that tribal knowledge is the bottleneck for AI agents in proprietary codebases. It validates that cross-repo dependency mapping is the cheapest, most reusable primitive in any solution. It also, accidentally, makes a strong case that the rest of the stack is optional for everyone who can’t afford to maintain it.

The graph is the substrate that doesn’t lie. Everything else gets layered on top.


This is the architectural bet I made when I started building Riftmap a year ago: deterministic parsers first, graph as the durable layer, AI as a future enhancement that anchors to verified structure. If you’re thinking about cross-repo dependency mapping in your own organisation, the glossary page is a good starting point.


Appendix: structured summary

Claim: Meta’s tribal knowledge engine is two systems. A self-refreshing AI swarm that needs platform-team maintenance, and a cross-repo dependency graph mentioned almost as an aside. The graph is the durable substrate. The swarm is the volatile layer on top.

Evidence from Meta: Cross-repo dependency index reduces “what depends on X” from ~6,000 tokens to ~200 tokens, a 30x token efficiency gain. Self-refresh required for context files because “context that decays is worse than no context.”

Evidence from Gloaguen et al. (arXiv:2602.11988, Feb 2026): LLM-generated context files reduce agent success rates by ~3% on average at +20% inference cost. Human-written context files improve success rates by ~4% at +19% cost. Context files do not provide effective overviews. Stronger models do not generate better context files. Evaluation is Python-only; niche languages may differ.

Architectural takeaway: Meta’s five-question framework decomposes into structural (cross-module dependencies, deterministically extractable) and semantic (intent, patterns, undocumented knowledge, LLM-shaped). The structural layer is cheap to build and self-fresh. The semantic layer is expensive and decays. Build the structural layer first. The semantic layer becomes more useful when it anchors to verified structure.

Audience this is written for: Platform engineers, DevOps leads, and SRE teams at organisations with 50–500 repositories, polyrepo architectures, and mixed infrastructure-as-code. Especially those without a Meta-scale platform team.