How to tune Cursor’s codebase index for a large monorepo or a multi-repo workspace — and the one question no amount of tuning makes it answer.


Open a monorepo of any real size in Cursor and the first thing you meet is the indexing bar, and the first thing you learn is that it can take a while. A repository in the tens of thousands of files can sit there for a long time if you point Cursor at the root and walk away.

So you do the sensible thing. You read the docs, you write a .cursorignore, you open the package you actually work in instead of the whole tree, and the bar that took an age now takes a minute. @Codebase gets faster and sharper. This is real work, it is worth doing, and most of this post is how to do it well.

Then, one afternoon, you are about to change something shared. A base image half the org builds on. A module three Terraform stacks call. A contract package a dozen services import. And you ask the fast, well-tuned index the one question that actually matters before you press go. What breaks. And the answer comes back confident, and quick, and wrong in a way you cannot see.

Here is the claim this post runs on. You can tune Cursor’s index to cover a fifty-repo workspace and make @Codebase genuinely fast and useful, and it still will not tell you which repositories break when you change a shared module, because the index answers similarity and a declared dependency is not a similarity relationship. Everything up to that ceiling is worth doing, and I will spend most of the post doing it. The ceiling itself is real, and no amount of tuning moves it, because it was never a tuning problem.

There is a version of this post all over the internet right now, and most of it is good. The best one I have read walks multi-repo workspaces and per-service .cursorignore files carefully and calls the result a microservices graph explorer. I want to be fair to that framing, because the tuning it describes is correct and I am about to repeat a fair amount of it. But “the index spans all my services” and “the index maps how my services depend on each other” are two different claims, and the distance between them is the whole second half of this post. I have made the structural version of that argument before, and shown how to wire the missing graph into Claude Code and Cursor. This one starts somewhere more practical. It starts with your index actually being slow.

What Cursor’s index actually is

Cursor’s index is a semantic search index, and knowing that precisely is what makes the rest of this post make sense. When you open a workspace, Cursor splits your code into chunks along syntactic boundaries, runs each chunk through a custom embedding model to get a vector, and stores those vectors in a remote vector database built on Turbopuffer, keyed by an obfuscated path and line range. Your source is not kept server-side. Only the embeddings and masked metadata leave your machine, and the chunks are decrypted on the client when the agent needs them. A Merkle tree tracks which files changed so re-indexing only touches what moved, and the index syncs roughly every five minutes.
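The Merkle-tree step is the easiest part of that pipeline to picture in code. What follows is a minimal sketch of the general technique, not Cursor's implementation: the tree layout, the `build` and `changed_files` names, and the sample files are all assumptions made for illustration. Each directory's hash covers its children's hashes, so an unchanged subtree is skipped with a single comparison.

```python
import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build(node):
    """Hash a nested dict (directories) of bytes (files).

    Returns (hash, children), where children is None for a file.
    """
    if isinstance(node, bytes):
        return sha(node), None
    children = {name: build(child) for name, child in sorted(node.items())}
    # A directory's hash covers the names and hashes of everything below it.
    combined = "".join(f"{name}:{h};" for name, (h, _) in children.items())
    return sha(combined.encode()), children


def changed_files(old, new, prefix=""):
    """Yield paths of new or modified files, skipping identical subtrees.

    Deletions are not reported; this sketch only answers "what to re-embed".
    """
    old_hash, old_children = old if old else (None, None)
    new_hash, new_children = new
    if old_hash == new_hash:
        return  # whole subtree unchanged, nothing to re-index
    if new_children is None:
        yield prefix.rstrip("/")
        return
    for name, child in new_children.items():
        previous = (old_children or {}).get(name)
        yield from changed_files(previous, child, f"{prefix}{name}/")


# Hypothetical snapshots of the same tree before and after one edit.
before = build({"src": {"a.py": b"x = 1", "b.py": b"y = 2"}, "README.md": b"hi"})
after = build({"src": {"a.py": b"x = 1", "b.py": b"y = 3"}, "README.md": b"hi"})

print(list(changed_files(before, after)))  # ['src/b.py']
```

Only `src/b.py` comes back, and `README.md` is never inspected beyond its hash, which is why incremental re-indexing stays cheap on a large tree.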

When you search, your query becomes a vector too, and Cursor returns the chunks whose vectors sit nearest yours. That is what @Codebase is underneath. Nearest-neighbour search over embeddings. The official description is exact about it: it returns the most semantically similar code, even when the matching chunk does not contain the words you searched for. Cursor’s own evaluation puts semantic search at around 12.5% more accurate than grep alone on large codebases, with the gain growing as the codebase grows, and that is a real result I have no interest in talking down.
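To make "nearest-neighbour search over embeddings" concrete, here is a toy sketch. The three-dimensional vectors and the chunk ids are invented for illustration, and a real embedding model and vector database are far larger and use approximate search, so treat this as the shape of the operation rather than how Cursor runs it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, index, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings keyed by path and line range.
index = {
    "auth/token.go:10-42": [0.9, 0.1, 0.0],
    "auth/session.go:5-30": [0.8, 0.3, 0.1],
    "billing/invoice.go:1-60": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded question

print(nearest(query, index))  # ['auth/token.go:10-42', 'auth/session.go:5-30']
```

Nothing in this loop knows what a chunk imports or declares. It ranks by angle between vectors, and that is the whole mechanism.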

In 2026 this got more capable, and it is worth being current about, because the workflow changed. You mostly do not type @Codebase any more. Cursor’s Agent picks the search strategy itself, combining a fast custom grep it calls Instant Grep with semantic search, and it can spawn an Explore subagent that runs many searches in parallel without bloating the main context. Cursor’s own line is that you do not choose the tool, you describe what you need and the Agent decides. This is a genuine improvement. But notice what did not change underneath the sophistication. Instant Grep matches strings. Semantic search matches meaning. Both are ways of finding text that resembles other text, and neither resolves a reference in one repository to the artifact another repository builds. The agent got much better at choosing which kind of resemblance to look for. It did not gain a new kind of edge to look over.

Tuning the index for a large or multi-repo codebase

The single lever that matters most on a large codebase is how much you ask Cursor to index, because both indexing cost and query noise scale with file count. Cursor’s own numbers are blunt about the cost. A large repository indexed naively can take hours to reach its first query: on the largest repos the ninety-ninth-percentile time-to-first-query is over four hours before Cursor’s teammate index sharing kicks in, and semantic search is unavailable until the index is at least 80% built. One monorepo tutorial clocks an 8,800-file repo at seven to twelve hours from the root, cut to minutes with the right exclusions. Everything below is a way of pointing the index at the code that is load-bearing for your task and keeping everything else out of it. Do them in roughly this order.
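Before excluding anything, it helps to see where the file count actually lives. A minimal sketch, assuming you feed it the repository's tracked paths (for example the output of `git ls-files`); the function name and the sample paths are hypothetical.

```python
from collections import Counter


def files_per_top_level_dir(paths):
    """Count files under each top-level directory; root-level files count as '.'."""
    counts = Counter()
    for path in paths:
        path = path.strip()
        if not path:
            continue  # ignore blank lines from piped input
        parts = path.split("/")
        top = parts[0] if len(parts) > 1 else "."
        counts[top] += 1
    return counts


# Hypothetical sample; in practice pass the lines printed by `git ls-files`.
sample_paths = [
    "node_modules/a/index.js",
    "node_modules/b/index.js",
    "packages/api/main.go",
    "README.md",
]

counts = files_per_top_level_dir(sample_paths)
print(counts.most_common(1))  # [('node_modules', 2)]
```

The directories at the top of that ranking are the first candidates for an ignore file or for leaving out of the workspace root entirely.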

Scope the index: open the package, not the root

The highest-leverage move on a monorepo is to not open the monorepo. Opening a package directory as the workspace root makes Cursor treat that directory as the whole codebase and index only within it, and on a large tree that is the difference between a minute and several hours.

# Indexes everything under the root, slowly
cursor /path/to/monorepo

# Indexes one package, fast
cursor /path/to/monorepo/packages/api

A shell alias per package you live in (alias ca='cursor /path/to/monorepo/packages/api') makes this frictionless. The cost is that references outside the package are no longer in the index, which is fine right up until your task actually crosses a package boundary, and then it is precisely the problem the second half of this post is about.

The two ignore files, and which one you actually want

Cursor has two ignore files and they do different jobs, and mixing them up is the most common configuration mistake I see. .cursorignore is a complete block: a file listed there is not indexed, not read, and not available even when you @-mention it, as though it did not exist. .cursorindexingignore is narrower: it keeps a file out of the index and out of search results, but the file stays readable, so you can still pull it in with @Files when you genuinely need it.

The practical rule the field has settled on is short. Reach for .cursorindexingignore first, because it is the reversible choice, and promote a path to .cursorignore only when the AI should never see it, like a secret.

# .cursorindexingignore
# Kept out of the index, still reachable with @Files.
tests/fixtures/
e2e/recordings/
packages/legacy/

# .cursorignore
# Invisible to indexing AND to all AI features.
.env*
secrets/

There is one detail in Cursor’s default indexing exclusions worth knowing, because it is quietly on-topic. Cursor already skips lockfiles by default: package-lock.json, yarn.lock, go.sum, and the rest. Those are the files that record the exact resolved version of every transitive dependency, which is to say the single most precise dependency information in your repository is the first thing the index throws away. It throws it away for sensible reasons: lockfiles are enormous and read as noise to a similarity search. Hold onto that, though, because it is a small preview of the larger point. The index is tuned to find code that reads like your question, and dependency records do not read like anything.

Multiple repositories: the multi-root workspace

For genuinely separate repositories, rather than packages in one tree, Cursor supports multi-root workspaces. A .code-workspace file lists several folder roots, Cursor indexes all of them, and Agent can reach across the set. A workable pattern for a large estate is to group repositories into a few workspace files by domain (payments, identity, and the API gateway in one; catalogue, search, and recommendations in another) and switch between them rather than opening everything at once. One caveat to know going in: features that assume a single git root, like worktrees, are disabled in a multi-root workspace.
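The domain grouping is mechanical enough to generate. This sketch builds the JSON body of one .code-workspace file per domain, using the standard `folders` list of `path` entries; the domain names, the sibling-directory layout, and the `workspace_for` helper are assumptions for illustration.

```python
import json


def workspace_for(repo_paths):
    """Build the JSON body of a multi-root .code-workspace file."""
    return json.dumps({"folders": [{"path": p} for p in repo_paths]}, indent=2)


# Hypothetical grouping; paths are relative to where the workspace file is saved.
domains = {
    "payments": ["../payments", "../identity", "../api-gateway"],
    "discovery": ["../catalogue", "../search", "../recommendations"],
}

workspaces = {
    f"{name}.code-workspace": workspace_for(paths)
    for name, paths in domains.items()
}

print(workspaces["payments.code-workspace"])
```

Writing each value to a file of that name and opening it with `cursor payments.code-workspace` gives you one index per domain rather than one index for the whole estate.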

Done well, this is genuinely useful. Open web-app, orders-service, and the shared contract repo together, and “how does the orders service validate a token” becomes one question instead of four context switches. This is the setup the multi-repo guides call a microservices graph explorer, and I understand why they reach for it. When every service sits in one index, @Codebase stops being a single-service lookup and starts answering questions that range across the whole set.

But the word doing too much work in “microservices graph explorer” is graph. The index now spans your services. It does not map them. It can surface the code in orders-service that mentions the contract, and the code in the contract repo that defines it, because both are text and both might resemble your query. What it cannot do is tell you that orders-service declares a dependency on that contract, and that billing-service, which you did not open, declares one too. Spanning a set of repositories and mapping the edges between them are different operations, and the index only performs the first one.

Project rules and hierarchical ignore

Project rules and hierarchical ignore are two smaller levers worth setting. Rules live in .cursor/rules/*.mdc now, one or more files that describe your architecture and conventions and load into context by relevance. If you are still carrying a root .cursorrules file, note that it is legacy and ignored in Agent mode, so migrating it is overdue. Rules are where you tell the agent how the monorepo fits together and which boundaries not to cross, and they do help. Notice, though, that they help by you writing the structure down, which makes them a hand-maintained description with the same decay problem every written map has: the description is only as current as the last engineer to update it, and it drifts from the code at exactly the speed the code changes.

Hierarchical Cursor Ignore, a setting rather than a file, lets Cursor walk up parent directories collecting .cursorignore files, so you can keep a global exclusion set at the root of a large monorepo and let each package layer its own on top. It is the right tool for keeping a big tree’s index configuration from repeating itself.
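The walk-up behaviour is easier to reason about once you see which files end up in play for a given package. This is a sketch of the lookup as described above, not Cursor's code: how Cursor merges the collected patterns and resolves precedence is not something I am asserting here, and `ignore_files_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def ignore_files_for(package_dir, existing):
    """List the .cursorignore files that apply to package_dir, root-most first.

    `existing` is the set of .cursorignore paths present in the tree.
    """
    found = []
    current = PurePosixPath(package_dir)
    # Check the package itself, then each parent directory in turn.
    for directory in [current, *current.parents]:
        candidate = str(directory / ".cursorignore")
        if candidate in existing:
            found.append(candidate)
    return list(reversed(found))


# Hypothetical layout: one global file at the root, one layered in a package.
existing = {
    "monorepo/.cursorignore",
    "monorepo/packages/api/.cursorignore",
}

print(ignore_files_for("monorepo/packages/api", existing))
# ['monorepo/.cursorignore', 'monorepo/packages/api/.cursorignore']
```

A package with no file of its own still inherits the root set, which is the point of the setting.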

The freshness you’re actually working with

Cursor’s index is as current as your checkout, and no more, and it is worth being honest with yourself about what that means. The index reflects the files on your disk. It does not pull your colleagues’ commits, so a function a teammate added to the identity service this morning is not in your index until you pull and Cursor re-indexes the file. And even for your own work the index trails the actual files by a sync interval. The consequence is a clean line: the index is reliable for the stable shape of a system, and unreliable for the change that landed an hour ago or for a repository you did not open. Both sit outside it. Keep that boundary in mind, because it compounds with the one the next section is about.

The question a tuned index still can’t answer

A perfectly tuned Cursor index still cannot tell you which repositories break when you change a shared module, because it answers similarity and a cross-repo dependency is a declared edge, not a similarity relationship. Do all of it. Scope the index to the package, split the workspace by domain, get the file count into the low thousands, write the rules, keep it fresh. You now have a fast, lean, accurate semantic index across every repository you care about, and @Codebase is as good as it gets. Now ask it the question you opened this whole workflow to answer. You are about to bump a base image, retire a shared module, or change a contract. Which repositories break.

Here is what happens, concretely, and you can check me on it because the org is public. Take the Prometheus organisation: as of Riftmap’s May 2026 scan, fifty-six repositories, and when you parse the dependency edges between them, a hundred and eighty-eight cross-repository edges. A handful of repositories carry most of it, prometheus/common with twenty-five dependents, client_model with twenty-four, procfs with twenty-three, client_golang with twenty-two. (Those counts drift a little between scans, which is exactly why they are dated here; the live showcase always renders the current number.) Clone all fifty-six, open them in one immaculately tuned Cursor multi-root workspace, wait for the index to finish, and ask @Codebase: what depends on client_golang.

You will get chunks. Files that mention client_golang, code that resembles your query, the definition itself. What you will not get is the list of twenty-two repositories that declare a require on it, because that list is not a similarity relationship. It is twenty-two go.mod files, in twenty-two repositories, each with a line naming client_golang and a version. A go.mod require line and the client_golang source it points at share almost no tokens, nothing an embedding would place near the other. The edge is not latent in the text, waiting to be retrieved. It was declared once, in a manifest, and it is either parsed from that manifest or it is not found.

And it is finer than a repository count, which is the part that should give you pause before a change. The same scan finds prometheus/common required twice from prometheus/prometheus alone, two separate manifests in one repository, each naming common with its own version, and the graph tracks them as two references rather than folding them into a single repo-to-repo edge.

“We bumped the dependency” and “we bumped every reference to the dependency” are different statements, and Go monorepos are exactly where that difference hides. A parser surfaces each of those references as its own edge, because it read each manifest and knows how many there are. A similarity index has no concept of “the second go.mod that requires this”. It has chunks, ranked by resemblance, and resemblance was never going to count references in files it treats as prose.
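Here is roughly what "a parser surfaces each reference as its own edge" looks like for Go. It is a minimal sketch that handles only the `require` directive in its single-line and block forms. The module paths, versions, and repository names below are invented, and `parse_requires` and `dependents_of` are hypothetical helpers, not Riftmap's code.

```python
import re

# Optional leading "require", then a module path, then a version starting with "v".
REQUIRE_LINE = re.compile(r"^\s*(?:require\s+)?([^\s()]+)\s+(v\S+)")


def parse_requires(gomod_text):
    """Return (module_path, version) pairs from a go.mod's require directives."""
    requires, in_block = [], False
    for raw in gomod_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop comments such as "// indirect"
        if line.startswith("require ("):
            in_block = True
        elif in_block and line == ")":
            in_block = False
        elif in_block or line.startswith("require "):
            match = REQUIRE_LINE.match(line)
            if match:
                requires.append(match.groups())
    return requires


def dependents_of(target, manifests):
    """List every (repo, manifest_path, version) that declares a require on target."""
    edges = []
    for (repo, manifest_path), text in manifests.items():
        for module, version in parse_requires(text):
            if module == target:
                edges.append((repo, manifest_path, version))
    return edges


# Hypothetical manifests: one repository carries two go.mod files.
manifests = {
    ("org/server", "go.mod"): (
        "module example.com/org/server\n\n"
        "require (\n"
        "\texample.com/org/common v0.60.0\n"
        "\texample.com/org/other v1.2.0 // indirect\n"
        ")\n"
    ),
    ("org/server", "tools/go.mod"): (
        "module example.com/org/server/tools\n\n"
        "require example.com/org/common v0.55.0\n"
    ),
    ("org/exporter", "go.mod"): (
        "module example.com/org/exporter\n\n"
        "require example.com/org/other v1.2.0\n"
    ),
}

for edge in dependents_of("example.com/org/common", manifests):
    print(edge)
# ('org/server', 'go.mod', 'v0.60.0')
# ('org/server', 'tools/go.mod', 'v0.55.0')
```

Two edges come back for one repository, each with its own version. That count falls straight out of reading manifests, and it is not something ranking by resemblance can produce.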

This is not a tuning failure, and that distinction matters. There is no .cursorignore you could write, no workspace split, no rule, that turns a nearest-neighbour search into a dependency resolver. The index is answering the question it was built to answer, which is “what code resembles this”, and it answers it well. The question a breaking change asks is “what declares a dependency on this”, and that is a different question with a different data structure behind it. You cannot tune your way from one to the other, because they were never the same machine.

The graph that answers it is parsed, not tuned

The edges the index can’t retrieve are not missing. They are declared in the manifests you already have, in constructs built for exactly this. The contract in a go.mod require or a package.json dependency. The module in a Terraform source block. The image in a Dockerfile FROM. The chart in a Chart.yaml dependency. The template in a GitLab CI include:project or a reusable Actions uses:. I spent a whole series walking those one ecosystem at a time. They are deterministic. Parsed, not inferred. The dependency graph across your organisation already exists, declared and unassembled, in files a similarity index reads as text and a parser reads as edges. This is the difference between inferred context and a dependency graph, and tuning the index is orthogonal to it: a better index is a better answer to a different question.
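Two of those constructs are simple enough to show side by side. The sketch below pulls the raw reference out of a Dockerfile FROM and a Terraform source attribute with one regular expression each. It is deliberately naive: it also picks up build-stage aliases in FROM and provider `source` entries in Terraform, and the file contents and the `declared_references` helper are made up for illustration. Resolving each reference to the repository that owns it is a further step not shown here.

```python
import re

# One pattern per manifest type, keyed by filename suffix.
EXTRACTORS = {
    "Dockerfile": re.compile(
        r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE | re.MULTILINE
    ),
    ".tf": re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE),
}


def declared_references(filename, text):
    """Return the raw references a manifest declares, or [] for unknown files."""
    for suffix, pattern in EXTRACTORS.items():
        if filename.endswith(suffix):
            return pattern.findall(text)
    return []


# Hypothetical manifest contents.
dockerfile = (
    "FROM registry.example.com/base/python:3.12 AS build\n"
    "RUN pip install .\n"
    "FROM scratch\n"
)
terraform = (
    'module "vpc" {\n'
    '  source = "git::https://example.com/org/terraform-vpc.git?ref=v1.4.0"\n'
    "}\n"
)

print(declared_references("services/api/Dockerfile", dockerfile))
# ['registry.example.com/base/python:3.12', 'scratch']
print(declared_references("infra/main.tf", terraform))
# ['git::https://example.com/org/terraform-vpc.git?ref=v1.4.0']
```

Neither function ranks anything. A reference is either declared in the file or it is not, which is what "parsed, not inferred" means in practice.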

Riftmap is that graph. It parses those manifests across your entire GitHub or GitLab organisation from one read-only token, resolves each reference to the repository that owns the artifact, and returns the answer the index can’t, every repository that depends on the thing you are about to change, with the version each one declared. It is the Prometheus graph above, for your own org. If you want that graph in front of the agent rather than in a browser tab, wiring it into Cursor and Claude Code is its own post, because the graph is useful to the engineer holding the pager first and the agent second. Either way it is the same move. Stop asking a tool built for resemblance to answer a question about dependency, and hand over a graph that was parsed for exactly that.

Tune Cursor’s index until it is perfect and you have made it excellent at finding the code that resembles your question. The repositories that break when you change a shared module were never going to resemble your question. They were a set of FROM lines and require blocks and source references, declared once across repositories you may not have even opened, waiting to be read. The index reads them as text. You need something that reads them as edges.

Questions teams ask

The same questions come up whenever I help someone tune this, so here they are, answered straight.

How do I speed up Cursor indexing on a large monorepo? Index less. A repository in the tens of thousands of files can take hours to index from the root, so the highest-leverage move is to open the package you work in as the workspace root rather than the whole tree. Add a .cursorignore for build output, dependencies, and generated files, aim for a few thousand indexed files rather than tens of thousands, and use .cursorindexingignore for large directories you still want to @-mention occasionally. Most slow-index complaints come down to indexing node_modules and vendored code nobody needed in the index in the first place.

Should I use .cursorignore or .cursorindexingignore? Use .cursorindexingignore unless the AI should never see the file at all. .cursorindexingignore keeps a file out of the index and out of search results but leaves it readable, so you can still pull it in with @Files, which makes it the reversible, lower-risk choice for large or noisy directories. Reserve .cursorignore for things that must be fully invisible, like secrets or files you never want referenced, because it blocks reading and @-mentioning as well as indexing.

Does Cursor’s @Codebase understand dependencies between repositories? Not in the sense a breaking change needs. @Codebase is nearest-neighbour search over an embedding index, so it returns the code most similar to your query, which is a different set from the repositories that declare a dependency on what you are changing. A go.mod require line, or a Dockerfile FROM, and the repository it points at are not similar text, so no similarity search reliably connects them. Indexing more repositories widens what @Codebase can resemble against, but a cross-repo dependency edge has to be parsed from a manifest, not retrieved by similarity.

Can Cursor index multiple repositories at once? Yes. A multi-root workspace, defined in a .code-workspace file, can hold several repository roots, and Cursor indexes all of them and lets Agent search across the set. That makes @Codebase span your repositories, which is genuinely useful for understanding a system. It does not make the index map how those repositories depend on each other, which is a separate thing that comes from parsing manifests rather than from a wider index.