You maintain an internal Python package on a private index. You need to change its API. Which repos across the org depend on it, and at which version? The public Python ecosystem has an answer to that question. The moment you move the package onto your own index, everything that knows the answer is looking somewhere your package never appears.

npm puts a Dependents tab on the registry page for every public package. PyPI has nothing of the sort. Open the project page for requests or flask and there is no reverse-dependency view, no list of what builds on top of it, no count. What answers the question for public packages is third-party and sits beside the index rather than inside it: Google’s deps.dev and libraries.io, both of which crawl the public index and will show you who depends on a given package.

Now make the package yours. Rename it from confparse to yourco-config, set it to private, and publish it to AWS CodeArtifact or the GitLab PyPI registry instead of pypi.org. deps.dev and libraries.io go dark immediately, because they crawl the public index and your package is not on it. pip has nothing to offer either. pip show yourco-config lists a “Required-by” field, but it only reflects what is installed in the environment you happen to run it in, and pip has had an open request for a real reverse-dependency command for years. Dependabot and Renovate know implicitly who depends on what, because they are configured per repo, but they are updaters, not mappers, and only where they are switched on.

There is a second gap underneath the first, and it is worth sitting with. Even for a public package, the dependents that deps.dev and libraries.io can show you are mostly other published packages, because a published package is what an index crawler can see. The things consuming your internal library are overwhelmingly applications. Services, pipelines, DAG repos, batch jobs. None of those are published to any index, so they would not appear as dependents even if your package were public. So the answer exists for the packages that cannot hurt you, and is missing for the one that can. The shared client every service imports, the config package forty repos pull in, the feature library the whole ML platform builds on. The one whose breaking change is your problem is precisely the one with no consumer view at all. This post is about getting that view back, and about why Python makes it genuinely harder than the other ecosystems in this series.

The scenario

Your platform team, or your ML-platform team, publishes a package. Maybe it is yourco-clients, a generated client for your internal APIs that half the services import. Maybe it is yourco-observability, the structured-logging and trace-propagation library every service is supposed to use. Maybe it is yourco-config, a thin package that standardises settings loading so nobody hand-rolls it. Maybe it is yourco-features, a shared feature-store and data-access layer the whole ML org builds on.

It started as a way to stop copy-pasting. A few repos adopted it. Then more. And here is where Python diverges from every other post in this series, immediately, before we even get to the hard part. There is no single place a consumer declares the dependency. There is barely a single format.

One service declares it in requirements.txt:

yourco-clients==2.4.1
yourco-observability~=1.7

Another uses the modern standard, PEP 621 dependencies in pyproject.toml:

[project]
name = "checkout-service"
dependencies = [
    "yourco-clients>=2.4,<3.0",
    "yourco-observability~=1.7",
]

[project.optional-dependencies]
dev = ["yourco-testtools>=0.9"]

A third is on Poetry, which until recently used its own table with its own syntax. The caret is not a PEP 440 operator, the resolution semantics are Poetry’s:

[tool.poetry.dependencies]
python = "^3.11"
yourco-clients = "^2.4.1"
yourco-observability = "~1.7"

A fourth predates all of that and declares its dependencies in setup.py, in arbitrary Python:

setup(
    name="reporting-service",
    install_requires=[
        "yourco-clients>=2.4,<3.0",
        "yourco-observability~=1.7",
    ],
)

A fifth never stood up a private index at all and pulls your code straight from git:

-e git+https://gitlab.yourco.com/platform/[email protected]#egg=yourco-clients

And because the package is private, every consumer that does use the index carries routing config that points at it, the way an .npmrc does for a scoped npm package. In Python that lives in pip.conf, or a .netrc, or a Poetry source, or a uv index table:

# pip.conf
[global]
index-url = https://pypi.org/simple
extra-index-url = https://gitlab.yourco.com/api/v4/projects/42/packages/pypi/simple

Twenty repos adopted the package, across four or five of these mechanisms. Then you stopped counting, because nothing in your toolchain counts for you. Now you need to change it. Drop a parameter, rename an export, cut a major. The question is the one that runs through every post in this series: which repos across the org depend on this package, at which version, and which of them break when I publish?

The change you ship without shipping it

Before the tooling, the part that makes this sharper than it first looks, and it is more acute in Python than in the npm version of this same argument.

A loose constraint is a standing instruction to adopt your next release. A consumer on yourco-clients>=2.4 is not pinned. They will take whatever the newest version is the next time their environment is resolved fresh. The PEP 440 compatible-release operator, ~=1.7, is the same thing inside a band: it means >=1.7, <2.0, so every 1.x you publish is a candidate. Poetry’s ^2.4.1 resolves to >=2.4.1, <3.0.0, which is a subscription to every minor and patch you ship in the 2.x line.

Python makes this land more easily than npm does, for one structural reason. A very large number of Python repos have no committed lockfile. They have a loose requirements.txt that gets pip install-ed during a Docker build, on every build, against the live index. There is no poetry.lock or uv.lock holding the line. So the resolution is not a one-time event that someone reviews in a pull request. It happens every time the image is rebuilt, silently, on the consumer’s schedule rather than yours. You did not roll out your 2.5.0. They did, the next time CI ran, and the first you hear of a regression is somebody else’s red pipeline.

This is not a fringe worry, it is being actively argued about right now. uv, the fast-rising resolver, defaults to constraints with no upper bound, which means uv lock --upgrade will happily pull a breaking major across every transitive dependency, and the friction of that has pushed uv to add a --bounds option so uv add can produce a safer >=2.13.4,<3.0.0. The community has not settled on how tight constraints should be. While that argument runs, your consumers are scattered across every position on the spectrum, and you cannot see which.

The reverse case is just as bad in the other direction. When you do the honest thing and cut a genuine breaking change as a major, 2.x to 3.0.0, a <3.0 pin or a Poetry caret correctly refuses to follow. That is the right behaviour. It also leaves you with a long tail of repos stranded on the old major, indefinitely, with no list of who they are. You cannot deprecate 2.x because you cannot see who is still on it.

Either way the constraint is the mechanism, and the constraint is exactly what a quick search across your repos cannot evaluate. You need to know who consumes the package and how their constraint relates to what you are about to publish. Both halves of that live in files most audits never open, in formats most scripts do not all parse.

What existing tools give you (and where they stop)

I want to be fair to the options, because several are genuinely useful for the slice they cover, and I reach for some of them myself.

PyPI, deps.dev, libraries.io, GitHub dependents

For public packages, deps.dev and libraries.io are the right tools, and I would point you straight at them. GitHub’s dependency graph adds a “Used by” panel for repositories that publish a package, though its own documentation calls the dependent counts approximate. The structural problem is not that any of these are bad. It is that they are properties of the public index. A private package is access-controlled by design, served from your own registry behind a token, and never indexed by anything that crawls pypi.org. The same access control that keeps your code off the public internet keeps it off every public consumer graph. There is nothing to fix here. The data is unreachable, on purpose. And as above, even for public packages these views count published packages far better than they count the unpublished services that are usually your real consumers.

`pip show`, `pipdeptree`, and the reverse-dependency tools

These do answer a reverse question, and people reach for them first. pip show yourco-clients lists a “Required-by” field. pipdeptree and deptree will invert the tree and show you dependents with -r. They are the right tools for “what in this environment depends on this.”

But they operate on one installed environment at a time, outward from whatever happens to be in that virtualenv. They cannot tell you which other repos in the org depend on your package. There is no index-side reverse query to ask, either. pip has had an open request for a reverse-dependency command for years, and the standing workaround is a script that walks installed distributions. To build the org-wide view you would clone every repo, create a clean environment in each, install, run pipdeptree -r, and aggregate the output yourself. By the time you finished, the resolutions you installed from would have moved.

The private index itself

This is the one people assume covers them, because the index is the thing all the packages flow through. AWS CodeArtifact, JFrog Artifactory, Sonatype Nexus, the GitLab PyPI registry, devpi, GemFury. They host your private packages, cache the public ones, and serve both from one endpoint behind auth.

They are very good at it. What none of them model is consumption at the source level. The index records that some authenticated client downloaded yourco-clients 2.4.1. It does not record which repo’s pyproject.toml declared the dependency, which team’s CI pipeline the install ran in, or whether the thing that pulled it was a service you care about or a throwaway branch. It is a distribution and caching layer, not a consumption graph. This is the same gap I described for internal Go module proxies in the Go edition: a proxy logs fetches, not the manifest that triggered them. The download event is not the dependency edge.

There is one fair exception worth naming. Some registry products, ProGet among them, do surface a consumer view, listing applications by name and version against a package. That is closer than most, and if you run one, use it. But it sees consumption that flows through that registry, of packages it hosts. It does not read the source manifest in every repo regardless of which index they use, and it does not see the git-ref consumption that never touches a registry at all. The next section is mostly a list of the consumption that escapes a registry-centred view.

Renovate and Dependabot

Both support Python as a first-class ecosystem, including private indexes once you give them credentials, across requirements.txt, Poetry, Pipfile, and setup.py. Because they are configured per consumer, they implicitly know which repos depend on what, and they will open pull requests to bump your package when you publish. As with Terraform modules and the rest of the series, the knowledge is in there.

But they are updaters, not mappers. There is no org-level “show me every repo that depends on yourco-clients, and what constraint each one declares” view to query. They react to new versions going out. The question you have before you publish a breaking one, who is currently consuming the old version and how, is not something either tool surfaces. And both only cover repos where they have been switched on for your private index. A team that never configured private-index auth in their Renovate config is simply invisible.

Code search, and the script

You can search your GitHub org or GitLab group for the package name:

org:yourco "yourco-clients"

For a one-off audit, fine. It finds files that mention the string and gives you a starting list of repos. Then the familiar problems land all at once, and in Python they land harder. It returns the declared constraint, not the installed version. It will not normalise yourco_clients and yourco-clients to the same project. It misses a consumer that pulled the package over git+https. And the index lags your most recent commits.

So someone writes the script. Enumerate every repo, fetch every requirements*.txt, pyproject.toml, setup.py, setup.cfg, Pipfile, and environment.yml, parse all of them, handle three pyproject.toml dialects and arbitrary setup.py code, normalise names, evaluate PEP 440 specifiers, run it on a schedule. People build exactly this. The clearest evidence is all-repos-depends, a real org-scanner whose providers read the setup.py AST for the package name and install_requires and parse the requirements-file conventions. The fact that this keeps getting independently rebuilt is the strongest evidence the question matters. It is also, tellingly, honest about its own limit: it can only read a setup.py that sets its name literally. Which is the first of the corner cases below, and that tool ran straight into it too.

Why this is harder than it looks

A naive search for the package name both overcounts and undercounts, because Python dependency consumption is not one fact in one place. It is spread across constructs that each behave differently, and Python has more of them than any other ecosystem in this series.

There is no single manifest, and the formats disagree. Go has one canonical go.mod. npm has one canonical package.json. Python has at least six families, and once you count the dialects, closer to nine distinct shapes a scanner has to handle: requirements.txt and its requirements-*.txt siblings, split requirements/ trees and pip-tools .in inputs, setup.py install_requires, declarative setup.cfg, pyproject.toml in three different dialects (PEP 621 [project], Poetry’s [tool.poetry] with its groups, and the PEP 735 [dependency-groups] that uv and recent pip understand), Pipfile, and conda’s environment.yml with its nested pip: block. PEP 723 even added inline dependencies inside a single .py script, so the surface is still growing. The version grammar is not even shared: a Poetry caret is not a PEP 440 operator. And the declared constraint and the installed version are different facts living in different places, with the resolved version buried in whichever of poetry.lock, Pipfile.lock, uv.lock, or a pip-compiled requirements.txt the repo happens to use. Real orgs mix all of this. To find every consumer you have to read all of it, and reconcile it.

The distribution name and the import name are different facts. This one is pure Python. You pip install scikit-learn and import sklearn. PyYAML imports as yaml, beautifulsoup4 as bs4, opencv-python as cv2. Your yourco-data-clients might import yourco_data. To bind a declared dependency you match the PEP 503 normalised name, which lowercases and collapses any run of ., -, or _ to a single -, so Yourco.Clients, yourco_clients, and yourco-clients are one project. A grep treats them as three. The confusion is real enough that scikit-learn ships a defensive sklearn shim on PyPI purely to stop people and tools getting it wrong, and that shim’s own remediation advice is, word for word, a find-every-consumer task: track down which packages declare sklearn instead of scikit-learn. It is enough of a trap that GitLab’s own SBOM scanner normalised names by the wrong rule and produced incorrect dependency results. That normalised distribution name is what a manifest declares and what you match on. The import name, the thing that actually appears in import statements, is a different layer again, and it is the one a symbol graph lives in. More on that distinction when we get to the limits.

setup.py is code, not data. package.json is JSON and go.mod has a defined grammar, so a parser can trust them. setup.py is a Python script, and install_requires can be a literal list, or read from a file, or assembled in a loop, or gated on markers computed at runtime. The literal case is statically parseable from the AST. The dynamically constructed case is not knowable without executing untrusted code, which no scanner should do. setup.cfg and pyproject.toml are declarative and parse cleanly, so the holdout is specifically the older setup.py repos, and even the literal case is a best-effort heuristic rather than a guarantee. This is not a footnote you have to take on faith, it shows up in the consumer view as a lower confidence score on setup.py rows than on the declarative ones. all-repos-depends hit exactly this wall and drew exactly this line.

The same name is not always your package. A repo whose index routing is wrong, or missing, can resolve a public package that happens to share your internal name, which looks identical in the manifest and is not your code. This is not hypothetical: pip’s own documentation warns that --extra-index-url is unsafe precisely because a public index can serve a package with the same name as your private one, the dependency-confusion problem. So the name in a manifest is a claim, not proof. Binding the name to your repo safely means resolving it to the in-org repo that actually produces that package, not assuming every matching string is yours. A name nothing in your org produces is an external dependency, not a consumer of your code.

The value is not always a version, and git references travel without an index. A dependency’s value can carry the real target, and for internal Python this is how a great deal of code travels without anyone standing up a private index at all. PEP 508 direct references, yourco-clients @ git+https://github.com/yourco/[email protected], resolve straight from a repo. So does -e git+https://...#egg=yourco-clients, a bare git+... line, a Poetry { git = ... } source, a Pipfile { git = ..., ref = ... }, and a git+ line inside a conda pip: block. Every one of those points at a repo, with the committish standing in for the version. A purely local ./libs/shared or a file: install carries no cross-repo signal, and a plain wheel or sdist URL is not a repo edge either. So the honest split is: git references are first-class consumers and resolve to a repo, while local paths and non-git URLs are not cross-repo edges at all. A scanner that reads only registry-style names misses the entire git-sourced half of how internal Python is consumed.

constraints.txt and includes mean the line you grep is not always the effective version. A repo can declare yourco-clients>=2.4 in requirements.in and then pin it hard via a global -c constraints.txt that says yourco-clients==2.4.1. Or chain -r requirements/base.txt so the real dependency list is assembled across several files. The line you grep is the declared constraint. The effective version, after a constraints overlay, is closer to a lock. For the question this post is about, who adopts your next release, the declared constraint in the dependency line is the load-bearing fact, and the constraints-pinned version is the adjacent, lockfile-shaped question. They are different facts, and a search that reads one file in isolation cannot tell which it is looking at.

Extras and dependency groups change the blast radius, and markers make the edge conditional. A consumer that needs your package only as a PEP 621 [project.optional-dependencies] extra, a Poetry group.test, or a PEP 735 dev group is a weaker consumer than one that imports it at runtime in production. And a PEP 508 marker makes the dependency conditional outright: yourco-clients>=2.4; python_version >= "3.11", or ; sys_platform == "linux". So a consumer may depend on you only inside an extra nobody installs in production, or only on a platform they do not ship. Flattening every declaration into one undifferentiated “depends on” both overstates and understates the blast radius, depending on which way you are wrong. The scope of each declaration, runtime against dev against optional, is part of the answer, not noise to discard.

Namespace packages mean the import namespace is not the unit of dependency. PEP 420 implicit namespace packages let yourco.clients and yourco.auth live in separate distributions, in separate repos, under one shared yourco namespace. The distributions ship and version independently, so the unit of dependency is the distribution, yourco-clients or yourco-auth, not the yourco namespace they share. A tool that treats the top-level import namespace as one package conflates things that release on different schedules. This is the same crack the distribution-versus-import-name beat opened: the manifest layer is about which distribution you declared, and “who imports yourco.auth” is a question one level down, at the symbol layer.

What the full answer requires

To reliably answer “who consumes this internal Python package,” you need a system that:

Scans every repo in the org, parsing each manifest family (requirements.txt and its requirements-*.txt and split requirements/ and pip-tools .in variants, pyproject.toml across the PEP 621, Poetry, and PEP 735 dialects, setup.cfg, setup.py, Pipfile, and conda environment.yml), without requiring each team to opt in or register
Normalises distribution names per PEP 503, so -, _, ., and case variants bind to one project, and resolves each declared dependency to the repo that actually produces the package, so a public package sharing your internal name resolves as external rather than binding to the wrong repo
Reads the value, not just the name, resolving PEP 508 direct references, git+https and -e git installs, Poetry and Pipfile git sources, and conda pip: git lines to the right in-org repo, with the committish recorded as the version
Keeps real cross-repo references while dropping purely local file: and ./path installs and plain wheel or sdist URLs that carry no cross-repo signal
Records the scope of each declaration, runtime against dev against optional or extra, so a test-only or extras-only consumer is not weighed the same as a runtime one
Leaves test, example, and fixture trees out of the consumer count, so a repo that imports your package only in a test harness does not read as a production consumer
Reports the constraint each repo declares, which is the fact that governs who adopts your next release, rather than the exact version a lockfile resolved this minute
Stays current through rescans, rather than a one-time snapshot that is stale the moment a manifest changes

This is one of the specific problems Riftmap is built to solve. It connects to your GitHub or GitLab organisation with one read-only token, scans every repo, and parses requirements.txt, split requirements/ trees and pip-tools .in inputs, pyproject.toml across PEP 621, Poetry groups and PEP 735 dependency groups, setup.cfg, setup.py, Pipfile, and conda environment.yml including the nested pip: block. It normalises names per PEP 503 and resolves each declared dependency to the repo that produces the package, so a name nothing in your org produces is treated as external rather than a consumer. It reads the value rather than just the name, so a pip install git+https://gitlab.yourco.com/platform/yourco-clients.git resolves to the platform/yourco-clients repo and is recorded as a consumer with no private index required, while purely local file: installs and plain wheel URLs are skipped. Each edge carries the constraint the consumer declares and the manifest line where it lives. Parsed from what each repo declares, not inferred from what the index happened to serve.

A few honest limits, in the spirit of the rest of this series. Riftmap reads the declared dependency in the manifest, not the resolved lockfile tree, so it shows the constraint each repo declares, which governs who adopts your next release, rather than the exact version each one has installed right now. It binds at the distribution layer, who declares a dependency on the package, not the import-symbol layer, which repos import the specific function or class you are changing. For that symbol-level question a symbol graph like Sourcegraph is the right tool, and a complementary one, because symbol graphs and artifact dependency graphs are different categories. It records a runtime, dev, or optional scope for each declaration, though surfacing that distinction in the consumer view is still on the near-term roadmap, so the panel today shows the manifest, the line, and the constraint rather than a scope label. And because Python has no @scope convention the way npm does, Riftmap recognises an internal package by the fact that some repo in the scanned org produces it, not by a name prefix. A yourco- prefix is a useful convention, not a guarantee.

The result is the view the rest of this series describes. Before you drop that parameter, rename that export, or cut 3.0.0, you open the graph, click the package, and read the consumer list: every repo that declares a dependency on it, the constraint each one carries, the manifest and line where the dependency lives, whether it came from a registry name or a git reference, and which team owns it. You know who breaks. You know who is riding a loose constraint and will pull your next minor on their next Docker rebuild whether you meant them to or not. You know who is stranded on the old major and needs a migration before you can deprecate it. No clean-installing thirty repos to run pipdeptree. No script juggling nine manifest shapes. No waiting to see whose build goes red.

The dependency was never written down in one place

Here is the closing thought. With the other ecosystems in this series, the reverse question is hard because the consumer graph lives behind access control, or in a proxy that logs the wrong event. That is true for Python too. But Python adds a second, deeper reason, and it is the one worth sitting with.

Python gives you a dozen honest ways to declare a dependency, and two different names for every package, and no single place that reconciles them. The dependency on your library exists, but it was never written down in one canonical form. It is a constraint in a requirements.txt here, a PEP 621 entry there, a Poetry caret somewhere else, a git+https reference in a fourth repo, all under a distribution name that does not match the import name, half of them rerouted by a constraints file you have to read separately. The reverse question is not hard because Python is messy. It is hard because the thing you are trying to find was never recorded as one thing. It lives spread across nine manifest shapes and two namespaces, in the relationship between repos that no single checkout contains. That was never a property of pip, or of PyPI, or of your index. It was a property of asking the question from inside one repo, when the answer was always somewhere between them.

This is the eighth post in the Find Every Consumer series. Previous posts cover Docker base images, Terraform modules, GitHub Actions workflows, Helm charts, Go modules, GitLab CI templates and internal npm packages.

If this is a problem your platform team deals with, I would be interested to hear how you are solving it today. You can find more at riftmap.dev or reach me at the address on the about page.

About Riftmap

Riftmap maps cross-repo dependencies across your entire GitLab or GitHub organisation — Terraform, Docker, CI templates, Helm, Python, Go, npm, and more. One read-only token. No YAML to maintain.

How to Find Every Consumer of Your Internal Python Package