Where the vocabulary moved, what the products shipped against it, and the layer that’s still uncovered.

Six months ago, “cross-repo context” lived in dev.to posts and HN comments. Today it lives in vendor product documentation, in CPO interviews, in feature pages. It is in Warp’s Codebase Context docs. It is on Augment Code’s product pages as a feature called the Context Engine MCP. The phrase JetBrains coined in March to describe what AI agents produce without structural understanding, “shadow tech debt,” was the framing of their Junie CLI launch. And in Microsoft’s December 2025 look-ahead to AI in 2026, GitHub’s chief product officer Mario Rodriguez named the category by a different name, “repository intelligence,” and framed it as the new competitive advantage for AI-assisted software.

The vocabulary has crossed a threshold. In roughly sixty days it has moved from practitioner discourse into vendor product surfaces, which is faster than most category terms travel. The phrase used to belong to people who were losing sleep over the problem. Now it also belongs to people selling against it.

That distinction matters because vocabulary leads product. The language a vendor uses in docs encodes a promise about what the product delivers. When several vendors describe their offerings using the same phrase, the readers of those docs start to expect the phrase to mean something specific. The category forms in language before it forms in products. Once the phrase is in CPO interviews and product documentation, the window for defining what it means starts to close.

This post is about the gap between the two. The vocabulary moved. The products moved, but mostly not in the direction the vocabulary points. The thing the language most directly describes, a queryable cross-repo dependency graph derived from source manifests, exists. It lives in Meta’s tribal knowledge engine, in Mabl’s 850-line hand-maintained coordination graph, and in Riftmap. It does not yet live in the product that most readers reach for when they read “cross-repo context” in their AI tool’s docs.

What the products actually ship

The landscape past the vocabulary line is finer-grained than I thought before doing the research. Four distinct things sit under the same phrase. Each is real. Each answers a different question. None of them is the thing the language most directly promises, and that thing is the one no consumer AI coding product has shipped.

The workspace pattern. This is the largest cohort by vendor count. Warp, Cursor, Cline, Continue, Aider, Zed, Windsurf, GitHub Copilot, and Claude Code all sit here. They give the agent access to more files. Warp’s docs are explicit: “During cross-repo tasks, Warp’s Agents have access to the file paths of all indexed repos.” Cursor’s recommended pattern is the multi-root workspace; their community forum has feature requests for cross-repo agent communication from December 2025 and February 2026, both still open. GitHub Copilot’s most relevant recent release is repository selection when assigning issues to Copilot, which lets an agent operate in a different repo per assignment. That is the workspace pattern at organisation scale. None of these is a dependency graph. None resolves an artifact to its consumers. The agent sees more files. It does not see how they relate.

The semantic retrieval index. This is Augment Code’s Context Engine MCP. Augment indexes code across multiple repositories, runs the index locally, and exposes a retrieval interface (codebase-retrieval, codebase-search) that other agents can call via MCP. Augment publishes benchmark numbers showing real quality gains for downstream tools that wire it in. This is genuine substrate work. It answers questions of the form “find me code relevant to authentication” very well. The retrieval is semantic, with embeddings over code chunks, which means the index is excellent for navigation and comprehension. It is not, by design, a parser-derived index of how a Terraform module gets consumed by which Helm chart and deployed by which CI pipeline.

The code-symbol graph. This is Sourcegraph Cody. Cody sits on a code intelligence graph that understands symbols, including function definitions, references, and call paths, and supports queries across multiple repositories. “Where is this function used?” gets a precise answer that includes callers in repositories the user has not opened. This is closest to what the word “graph” implies, but it is a code-symbol graph, not an infrastructure-dependency graph. It is also enterprise-only since Sourcegraph discontinued the Free and Pro tiers in July 2025. Pricing starts at $59 per user per month, and the chat UI caps @-mention selection at ten repositories per query. Real product, with the scope its product design implies: code symbols, not infrastructure manifests.

The JS/TS project graph. Nx Synthetic Monorepos, announced in their 2026 roadmap, is the closest publicly shipped equivalent to the architectural bet under Riftmap. Nx parses cross-repo dependencies into a workspace graph, runs a coordinator agent that walks the graph and spawns per-repo workers, and ships impact analysis and conformance rules across boundaries. This is the same shape of architecture. The ecosystem coverage is JavaScript and TypeScript first, with plugin-based extension to other languages, and the orientation is build orchestration and CI rather than infrastructure-as-code blast radius. Nx independently arriving at the same answer is, if anything, the strongest single piece of evidence that the substrate is the right architecture. Their product extends the JS/TS monorepo ecosystem across repository boundaries. It does not parse Dockerfile FROM lines, Terraform module sources, or Helm chart dependencies across twelve manifest ecosystems.

Each of these does something real. None of them is the artifact the vocabulary most directly describes.

The layer that hasn’t moved

The thing the vocabulary points to most directly is a parser-derived, queryable dependency graph across the manifests that govern deployment. That means Terraform module sources and registries, Dockerfile FROM statements across registries, Helm chart dependencies, Kubernetes manifest references, Kustomize overlays, ArgoCD application sources, Go modules, npm packages, Python requirements, Ansible role dependencies, GitLab CI includes, and GitHub Actions workflow calls. Twelve ecosystems, give or take, that together encode how an organisation’s software actually runs in production.

The question this layer answers is structurally distinct from the questions the four above answer. “Where is this function called?” lives in code symbols. “Find me code relevant to authentication” lives in semantic embeddings. “Which packages need to rebuild?” lives in the JS/TS workspace graph. “If I bump python:3.11-slim to python:3.12-slim, which services pull that base image, which Helm charts deploy those services, which CI pipelines need to retrigger, and which downstream Terraform modules reference the resulting images?” lives in the infrastructure-manifest graph. None of the four above will give you that answer, because none of them parses those files in that way.
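To make the base-image question concrete, here is a minimal sketch of the first hop only: parse Dockerfile FROM lines and invert them into an image-to-consumers index. The repository names and Dockerfile contents are hypothetical, and the regex is a simplification of my own that does not handle multi-stage stage aliases, ARG substitution, or digests. It illustrates the kind of lookup, not how any particular product implements it.

```python
import re
from collections import defaultdict

# Matches "FROM [--platform=...] image[:tag] [AS stage]".
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+\S+)?\s*$",
    re.IGNORECASE,
)


def base_images(dockerfile_text):
    """Return the images named on FROM lines, in file order."""
    images = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match:
            images.append(match.group("image"))
    return images


def consumers_by_image(dockerfiles):
    """Map each base image to the sorted list of repos whose Dockerfile pulls it."""
    index = defaultdict(set)
    for repo, text in dockerfiles.items():
        for image in base_images(text):
            index[image].add(repo)
    return {image: sorted(repos) for image, repos in index.items()}


# Hypothetical repositories and Dockerfile contents, for illustration only.
DOCKERFILES = {
    "billing-api": "FROM python:3.11-slim\nRUN pip install flask\n",
    "ingest-worker": "FROM --platform=linux/amd64 python:3.11-slim AS build\nCOPY . /app\n",
    "web-frontend": "FROM node:20-alpine\nCOPY . /srv\n",
}

INDEX = consumers_by_image(DOCKERFILES)
print(INDEX["python:3.11-slim"])
```

The answer to “which services pull that base image” is then a single dictionary lookup rather than a search. The further hops in the question (Helm charts, CI pipelines, Terraform modules) would need their own parsers feeding edges into the same index.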

Meta’s tribal knowledge engine post named the cost of not having this layer at runtime. The “what depends on X” question costs roughly 6,000 tokens as a multi-file exploration and roughly 200 tokens as a single graph lookup. That is a thirty-times architecture-level efficiency gap on one of the most common planning questions an agent asks. The number is durable. It is not a model-quality problem and it does not close as context windows grow. It is the cost of reconstructing structure from grep every session instead of querying an index that already knows.
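The arithmetic behind that gap is simple enough to write down. The two token figures are the ones quoted above; the number of questions per session is a hypothetical input of mine, included only to show how the difference scales.

```python
EXPLORATION_TOKENS = 6_000  # multi-file exploration, figure quoted above
LOOKUP_TOKENS = 200         # single graph lookup, figure quoted above

# Ratio between the two approaches for one "what depends on X" question.
RATIO = EXPLORATION_TOKENS / LOOKUP_TOKENS


def tokens_saved(questions):
    """Tokens saved when each question is a lookup instead of an exploration."""
    return questions * (EXPLORATION_TOKENS - LOOKUP_TOKENS)


# Hypothetical: an agent asking the question 50 times in a session.
print(RATIO, tokens_saved(50))
```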

Meta built that index in-house. Mabl built theirs by hand, with 850 lines of registry maintained by a platform team across 79 repositories. Harness named it in their April essay on Source Context Management, framing it as “the blast radius of every change before it merges.” Harness sells CD, so the post is content marketing rather than a competing product. The architectural conclusion these three teams reached independently, that the dependency graph should be parser-derived, queryable as a primitive, and treated as runtime infrastructure rather than per-session context, is the one Riftmap productised. The substrate the vocabulary now describes in product docs is the same one Mabl, Meta, and Harness named in engineering blogs. It is the one that has not yet made its way into any consumer AI coding product.

The CPO and the feature request

The clearest signal of where this category is going might be the asymmetry between what GitHub says and what GitHub ships.

In early December 2025, GitHub’s chief product officer Mario Rodriguez named “repository intelligence” as the inflection point for 2026 in Microsoft’s annual AI trends piece. The framing is direct: AI that understands code at the level of relationships and history rather than just lines of code. In Rodriguez’s words, repository intelligence “will become a competitive advantage by providing the structure and context for smarter, more reliable AI.” The phrasing is good. The category is real.

In March 2026, a GitHub user filed Discussion #189213, titled Feature Request: Cross-Repository Context for Copilot (Web ↔ Microservice), describing the exact problem the CPO is naming. Web app consumes microservice; service API changes; both repos need coordinated updates; Copilot should be able to see across the boundary. Three upvotes. Two months later, the discussion is still marked Unanswered. It is one specific feature request; I do not want to read too much into a single thread. But the asymmetry is concrete. The CPO of the dominant code platform names the category. The same platform’s flagship coding product has not yet answered the feature request that defines it. That is the clearest signal I have found that the language is moving faster than the products.

Cursor’s situation is similar in shape. The product is excellent within a single repo. The community forum has cross-repo feature requests from December 2025 and February 2026, both still open, both with the same pattern of users describing dependency relationships across repositories that the agent cannot see.

This is not an indictment of either vendor. It is a description of where the leading edge of a category sits when the language has crossed into mainstream product surfaces but the substrate has not. Vendors who use the phrase get credit for the diagnosis. Vendors who ship the layer underneath get credit for the category. The two are different things and right now they are owned by different people.

The window where that authorship gets decided is finite. Once a major vendor ships a parser-derived cross-repo dependency graph as a first-class feature, the category gets named after their product and everyone else becomes a comparison. That has not happened yet. It is the most interesting bet to be making over the next two quarters.

Where Riftmap sits

Riftmap is the productised version of the layer Meta, Mabl, and Harness have separately described. Twelve parser ecosystems today (Terraform, Dockerfile, Helm, Kubernetes, Kustomize, ArgoCD, Go, npm, Python, Ansible, GitLab CI, GitHub Actions), auto-discovered from a single read-only token, with more on the way. The roadmap extends the parser surface into contract layers (OpenAPI, protobuf, GraphQL) and schema registries, because the manifest plane and the contract plane are both part of the same dependency graph at runtime. No registry to maintain. No catalog YAML. The graph re-runs on every push; freshness is by construction rather than by process. Three endpoints (/repositories/lookup, /repositories/{id}/context, /repositories/{id}/impact) are designed to be called by an agent during planning rather than read by a human in a dashboard.

The position against the four above is not that they are wrong. Augment’s semantic retrieval is excellent at what it does. Cody’s code-symbol graph is excellent at what it does. Nx’s synthetic monorepo is the right answer for organisations whose primary stack is JS/TS. The position is that none of them is the parser-derived infrastructure-manifest graph, because that is not what they were designed to be. The reader who needs that graph today either builds it like Mabl, runs it like Meta, or queries Riftmap.

The bear case

If a major model or platform provider (Anthropic, OpenAI, GitHub) ships a platform-level MCP for cross-repo context as a standard feature, the API layer of this category gets commoditised fast. The defence against that scenario is the part that is hardest to copy: the breadth of the parser surface (twelve ecosystems is more work than it looks, particularly around private registries and transitive resolution), and the freshness contract (the index is parser-derived and re-runs on every push, which is structurally different from any context layer that depends on LLM summarisation). A platform MCP for cross-repo context would still need someone to actually parse the manifests. That work does not become easier because the agent has a new channel to call into.

The bull case

The bull case is the architectural choice underneath all of this, and it is the one I keep coming back to. Declarations are deterministic. A Terraform module source, a Dockerfile FROM statement, a Helm chart dependency, a GitHub Actions uses: reference: these are file-based facts. A parser produces the same graph for the same input on every run. There is no temperature, no top-p, no retry-and-hope-it-converges. The output is binary. A dependency either exists in the graph or it does not.
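As an illustration of “same graph for the same input”, here is a small sketch that extracts GitHub Actions uses: references from a workflow file and hashes the canonical edge list, so two runs can be compared byte for byte. The workflow content, the repository name, and the example-org reference are hypothetical. The line-based regex is a simplification of my own: it does not handle quoted values or references inside multi-line strings, where a real implementation would parse the YAML.

```python
import hashlib
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*(?P<ref>[^\s#]+)")


def workflow_edges(repo, workflow_text):
    """Return sorted (repo, referenced action or workflow) edges."""
    edges = set()
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match:
            edges.add((repo, match.group("ref")))
    return sorted(edges)


def graph_digest(edges):
    """Hash the canonical edge list so two runs can be compared exactly."""
    canonical = "\n".join(f"{src} -> {dst}" for src, dst in edges)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical workflow file, for illustration only.
WORKFLOW = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
  deploy:
    uses: example-org/platform-workflows/.github/workflows/deploy.yml@main
"""

EDGES = workflow_edges("billing-api", WORKFLOW)
FIRST_RUN = graph_digest(EDGES)
SECOND_RUN = graph_digest(workflow_edges("billing-api", WORKFLOW))
print(FIRST_RUN == SECOND_RUN)
```

Sorting the edges before hashing is what makes the digest stable; the parser itself has no source of randomness to begin with.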

That has three consequences that compound over time, and they are why I think the deterministic-first architecture wins this category rather than loses it.

The first is testability. Every parser case is a fixture. “Given this Dockerfile with this ARG default, the graph should contain edge X→Y.” You can write that test, run it in a millisecond, and pin the behaviour for the lifetime of the codebase. You cannot write that test against an LLM-extracted dependency graph; the same input produces different outputs across runs, model versions, and prompt revisions. The deterministic foundation makes the system improvable in a way that probabilistic extraction is not. Every false positive becomes a permanent test case. Every edge case becomes a permanent regression check. Improvement compounds.
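The fixture quoted above can be sketched in a few lines. This is a toy parser written for the example, not Riftmap’s: it resolves ARG defaults into FROM lines and ignores stage scoping, --platform flags, and build-time --build-arg overrides. The repository name and the Dockerfile are hypothetical.

```python
import re

ARG_RE = re.compile(r"^\s*ARG\s+(?P<name>\w+)=(?P<default>\S+)\s*$", re.IGNORECASE)
FROM_RE = re.compile(r"^\s*FROM\s+(?P<image>\S+)", re.IGNORECASE)
VAR_RE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def dockerfile_edges(repo, text):
    """Resolve ARG defaults into FROM lines and return (repo, image) edges."""
    defaults = {}
    edges = []
    for line in text.splitlines():
        arg = ARG_RE.match(line)
        if arg:
            defaults[arg.group("name")] = arg.group("default")
            continue
        frm = FROM_RE.match(line)
        if frm:
            # Substitute ${NAME} or $NAME; leave unknown variables untouched.
            image = VAR_RE.sub(
                lambda m: defaults.get(m.group(1) or m.group(2), m.group(0)),
                frm.group("image"),
            )
            edges.append((repo, image))
    return edges


# The fixture: a Dockerfile whose base image depends on an ARG default.
FIXTURE = """\
ARG PYTHON_VERSION=3.11
FROM python:${PYTHON_VERSION}-slim
RUN pip install requests
"""


def test_arg_default_is_resolved_into_from_edge():
    assert dockerfile_edges("billing-api", FIXTURE) == [
        ("billing-api", "python:3.11-slim")
    ]


test_arg_default_is_resolved_into_from_edge()
```

Once a case like this is in the suite, any later change to the parser that drops or rewrites the edge fails immediately, which is the compounding effect described above.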

The second is cost. A parser pass over a repository’s manifests runs in milliseconds of CPU time and costs effectively nothing. An LLM pass over the same files takes seconds and bills per token. At organisation scale, with continuous re-scanning on every push, the cost gap is several orders of magnitude. The product that built its dependency layer on a deterministic foundation can offer it cheaply forever. The product that wired LLM extraction into the same loop pays a continuous tax that grows linearly with the size of the codebase and the frequency of change.

The third is what sits on top. The graph being deterministic does not mean AI has no role. It means AI sits on top of it for the things AI is genuinely good at: ranking, summarising, explaining, planning, prioritising. The agent calls /repositories/{id}/impact to get the deterministic blast radius, then uses the model to reason about which of the affected repos matter most, which changes are highest-risk, and how to sequence the work. The base layer is the part that needs to be correct. The reasoning layer is the part that needs to be flexible. The bet is that the right architecture for AI coding agents at scale is deterministic at the base and probabilistic at the top, not the other way around. I argued this case in more detail in the blast radius post.
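The split between the two layers can be sketched with an in-memory graph standing in for the impact endpoint. Everything here is hypothetical: the edges, the repository names, and the risk scores, which stand in for whatever judgement a model would supply. The point is only the shape: the set of affected repos comes from a deterministic walk, and the ordering is the part that is allowed to vary.

```python
from collections import deque

# Hypothetical edges: each key is consumed by the repos in its list.
CONSUMERS = {
    "python:3.11-slim": ["billing-api", "ingest-worker"],
    "billing-api": ["billing-chart"],
    "ingest-worker": ["ingest-chart"],
    "billing-chart": ["prod-cluster-config"],
    "ingest-chart": ["prod-cluster-config"],
}


def blast_radius(graph, changed):
    """Deterministic base layer: breadth-first walk over consumer edges."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in sorted(graph.get(node, [])):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


def prioritise(affected, score):
    """Flexible top layer: order the fixed set by any scoring function."""
    return sorted(affected, key=lambda repo: (-score(repo), repo))


# Stand-in for a model's judgement; in practice this would be an LLM call.
HYPOTHETICAL_RISK = {"prod-cluster-config": 0.9, "billing-api": 0.7}

AFFECTED = blast_radius(CONSUMERS, "python:3.11-slim")
RANKED = prioritise(AFFECTED, lambda repo: HYPOTHETICAL_RISK.get(repo, 0.1))
print(RANKED)
```

Swapping the scoring function changes the order of the work but never the membership of the affected set, which is the property the base layer exists to guarantee.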

The thing I most want to be wrong about is the timing. If the window is wider than two quarters, the bet is easier. If it is narrower, the next post should already be live.

Closing

The vocabulary moved. The CPO of the largest code platform on the planet has named the category. Several vendors use the phrase in their product surfaces. JetBrains coined a phrase for what AI ships without it. The language now describes a thing more precisely than most of the products that use the language actually deliver.

The layer the language describes is the parser-derived cross-repo dependency graph. It exists in Meta’s in-house system, in Mabl’s hand-maintained registry, and in Riftmap. It does not yet exist in the AI coding product that most readers of this post will reach for tomorrow. The window for that to change is open. The next two quarters decide who ends up owning what the language now promises.

Riftmap is the bet that whoever ships the deterministic substrate during this window, while the language is still being formed and before a major vendor consolidates the category, is the one whose product the language ends up describing.


If you’re running AI coding agents across more than a handful of repositories and the language in their docs is starting to feel ahead of what they deliver, the layer underneath is the gap. You can build it yourself, the way Mabl did. Or you can start a free scan and let the parsers find it.


Sources referenced

  • Warp, Codebase Context. docs.warp.dev
  • Susanna Ray, Microsoft, What’s next in AI: 7 trends to watch in 2026. news.microsoft.com, December 8, 2025 (Mario Rodriguez on repository intelligence, section 6)
  • Augment Code, Context Engine MCP. augmentcode.com
  • Sourcegraph, Cody documentation. sourcegraph.com
  • Nx, Synthetic Monorepos. nx.dev
  • Nx, 2026 Roadmap: Expanding Agent Autonomy. nx.dev/blog, February 4, 2026
  • prateek-odev, Feature Request: Cross-Repository Context for Copilot (Web ↔ Microservice). GitHub Discussion #189213, March 2026
  • Cursor Community Forum, Working on multiple repositories. forum.cursor.com, December 2025
  • Cursor Community Forum, Feature Request: Cross-Window / Cross-Repo Agent Communication. forum.cursor.com, February 2026
  • Engineering at Meta, How Meta used AI to map tribal knowledge in large-scale data pipelines. engineering.fb.com, April 6, 2026
  • Geoff Cooney, mabl, How We Built a System for AI Agents to Ship Real Code Across 75+ Repos. mabl.com, April 8, 2026
  • Ompragash Viswanathan, Harness, Your Repo Is a Knowledge Graph. You Just Don’t Query It Yet. harness.io/blog, April 1, 2026
  • Riftmap, AI Doesn’t Understand Blast Radius: Why Change Failure Rates Are Up 30%. riftmap.dev/blog, April 19, 2026
  • Riftmap, You don’t need a virtual monorepo. You need a graph. riftmap.dev/blog, May 12, 2026
  • Riftmap, AI coding agents need cross-repo context. riftmap.dev/blog, May 12, 2026
  • Riftmap, Meta needed 50+ AI agents to map their tribal knowledge. riftmap.dev/blog, May 8, 2026

Appendix: structured summary

Claim: Vocabulary describing cross-repo dependency intelligence has crossed into vendor product documentation, but the layer the vocabulary most directly describes, a parser-derived cross-repo dependency graph across infrastructure manifests, has not yet shipped in any consumer-facing AI coding product. Four distinct things now sit under the same phrase: the workspace pattern (Warp, Cursor, Cline, Continue, Aider, Zed, Windsurf, GitHub Copilot, Claude Code), semantic retrieval (Augment Code’s Context Engine MCP), code-symbol graph (Sourcegraph Cody), and JS/TS project graph (Nx Synthetic Monorepos). None of the four covers the infrastructure-manifest layer where deployment-time blast radius lives.

Evidence:

  • Warp Codebase Context docs, Augment Context Engine MCP product pages, Sourcegraph Cody @-mention semantics, and Nx Synthetic Monorepos in their 2026 roadmap all use the cross-repo vocabulary explicitly.
  • Mario Rodriguez, GitHub’s chief product officer, named “repository intelligence” as the inflection point for 2026 AI in Microsoft’s annual trends piece (December 2025).
  • GitHub Copilot Discussion #189213 (Cross-Repository Context for Copilot, March 2026) remained Unanswered two months after filing.
  • Cursor community forum has cross-repo feature requests open from December 2025 and February 2026.
  • Meta’s published number: graph lookup for “what depends on X” costs ~200 tokens; multi-file exploration costs ~6,000. A ~30x architectural efficiency gap.
  • Mabl’s 850-line Repo Coordination Graph spans 79+ repositories and is maintained by hand.

Architectural takeaway: The vocabulary leads the products by approximately two quarters. The kind of substrate the language most directly describes, a parser-derived infrastructure-manifest dependency graph across Terraform, Docker, Helm, Kubernetes, CI manifests, and other ecosystems, exists only in in-house systems (Meta), hand-maintained registries (Mabl), and Riftmap. The window for owning the productised definition of “cross-repo context” closes when a major vendor ships that layer.

Why deterministic-first wins: Declarations are deterministic. A parser produces the same graph for the same input on every run, which makes the system testable as fixtures, cheap to operate at organisation scale, and reliable to wire into agent planning loops. AI sits on top of the graph for ranking, summarising, and prioritising. The base layer is the part that needs to be correct; the reasoning layer is the part that needs to be flexible. The right architecture for AI coding agents at scale is deterministic at the base and probabilistic at the top.

Audience: Platform engineers, DevOps leads, and engineering managers running AI coding agents across multi-repo organisations, particularly those whose dependency graph crosses infrastructure-as-code boundaries.