I had a mental model of how Terraform module versions get pinned. Roughly the one the documentation implies. You put a version on a module block, maybe a ~> if you want a little room above the floor, and that is the version the repo is on. I believed it well enough that I never bothered to check it.

So I checked it. I parsed the source and version of every module block across a handful of real, organisation-scale Terraform estates and looked at the actual distribution of how versions are written. Two things I was confident about going in turned out to be wrong, and both were more useful wrong than they would have been right, which is the only reason this is worth writing up.

The first wrong thing: I expected versioning style to be a per-repo, sometimes per-module, decision. A mix inside any given organisation. It is not. The second: I expected one widely used ecosystem to be a ~> world that a naive exact-match would mostly miss. It turned out to be overwhelmingly exact pins. I will take them in order. The through-line underneath both is the same. How a module version is pinned is an organisation-level convention rather than a per-module choice, and the shape of the constraint, not the number inside it, is what decides whether you can safely automate a change across the estate.

I went looking for how versions are pinned, and measured it

Method first, because the numbers are the only thing that makes this more than an opinion.

I parsed module source and version across four public corpora, each scanned as a whole organisation rather than cherry-picked repo by repo:

  • cloudposse, a large and widely used module ecosystem.
  • terraform-aws-modules, the curated AWS collection.
  • terraform-google-modules, the curated GCP collection.
  • A mirrored Azure Verified Modules example corpus. AVM lives inside the very large Azure GitHub organisation and is not cleanly scopeable on its own, so I stood up a throwaway organisation mirroring the real AVM example repositories and scanned that.

Then I ran each constraint string through real Terraform and OpenTofu matching semantics rather than synthetic test cases, so the distribution reflects how these constraints actually resolve, not how I imagine they resolve.

One methodological point matters enough to state up front, because it is part of the argument and not a caveat to bury. I report every figure per corpus and never pool them. A blended “X% of modules pin exact” number across four estates does not describe the world. It describes whichever corpus contributed the most rows. If you take one thing from the method, take that.
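To make the pooling problem concrete, here is a minimal sketch of the arithmetic. The two estates and their row counts are hypothetical, chosen only to show the effect; they are not the measured sizes of any corpus in this post.

```python
# Hypothetical row counts, invented purely to illustrate the arithmetic.
# They are NOT the measured sizes of any corpus discussed in this post.
corpora = {
    "large_exact_estate": {"modules": 9000, "exact": 8010},
    "small_tilde_estate": {"modules": 1000, "exact": 30},
}


def exact_percent(row):
    """Share of module declarations that are exact pins, as a percentage."""
    return round(row["exact"] * 100 / row["modules"], 1)


# Reported per corpus: each estate keeps its own house style visible.
per_corpus = {name: exact_percent(row) for name, row in corpora.items()}

# Pooled: one number that mostly echoes whichever estate has more rows.
pooled = exact_percent({
    "modules": sum(row["modules"] for row in corpora.values()),
    "exact": sum(row["exact"] for row in corpora.values()),
})

print(per_corpus)  # {'large_exact_estate': 89.0, 'small_tilde_estate': 3.0}
print(pooled)      # 80.4
```

The pooled 80.4% describes neither estate. It sits near the larger one only because that estate contributed nine rows in ten.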

Versioning style is an organisation-level convention, not a per-module choice

This is the structural headline, and it is the thing I got most wrong.

Within a single estate, versioning style is close to homogeneous. Not a spread. A house style. cloudposse pins exact: 89% of its module declarations are a bare version = "x.y.z" with no operator, and not a single ~> appears anywhere in the estate. The remainder is almost all git-sourced modules pinned by tag, which is a different problem and a later section. The curated terraform-aws-modules and terraform-google-modules collections are the mirror image, 94% and 95% ~> respectively, with exact pins down in the low single digits. The AVM example corpus lands on the exact-pin side too, overwhelmingly bare pins, around 82%.

That 82% comes with a caveat I want to state in the same breath, because leaving it out would make the number dishonest. The AVM examples are authored by the AVM team and pinned by CI. They over-represent exact pins compared with real enterprise consumers of those modules, who lean far more toward ~> when they write their own constraints. The example corpus tells you the house style of the people writing the modules, not the house style of the people consuming them. Hold onto that when you read the figure.

Four corpora, four single-style camps. The reason this is more useful than I expected is operational. If versioning style were a per-module decision, auditing an estate would mean inspecting every repo, because the last repo would tell you nothing about the next one. Because it is an organisation-level convention, the opposite holds. Learn one repo’s convention and you can usually predict the rest of the estate. The unit of analysis is the organisation, not the module. That changes how you scope a migration before you have written a single line of it.

What the constraint shape tells you about blast radius

Once you stop asking “which version is this repo on” and start asking “is this safe to change,” the constraint shape stops being a formatting detail and becomes the whole question. The shapes do not carry the same amount of information, and the difference between them is exactly the part that decides whether an automated rewrite lands where you meant it to.

An exact pin tells you what runs. version = "0.3.5" is unambiguous. The repo is on 0.3.5, you know precisely what a change touches, and you can reason about it.

A bounded ~> tells you the band. ~> 0.4.0 allows 0.4.x and stops before 0.5.0. Both arities behave the same way in spirit: ~> 1.2.3 pins the patch floor and holds the minor, so it stops before 1.3.0, while ~> 1.2 holds the major and allows any minor from 1.2 up, so it stops before 2.0.0. You do not know the exact version, but you know the ceiling, and a ceiling is most of what you need to reason about safety.
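The band arithmetic can be sketched in a few lines. This is an illustration of the rule as described above, not Terraform's own implementation; the function names are hypothetical, and the sketch deliberately ignores prereleases and single-component constraints.

```python
def parse_version(text):
    """Turn '1.2.3' or '1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(parts):
    """Pad a version tuple with zeros to three components."""
    return parts + (0,) * (3 - len(parts))


def pessimistic_bounds(constraint):
    """Return (floor, exclusive_ceiling) for '~> X.Y' or '~> X.Y.Z'."""
    text = constraint.strip()
    if not text.startswith("~>"):
        raise ValueError(f"not a ~> constraint: {constraint!r}")
    parts = parse_version(text[2:])
    if len(parts) not in (2, 3):
        raise ValueError("this sketch handles two or three components only")
    floor = pad(parts)
    held = parts[:-1]  # everything left of the rightmost component is held
    ceiling = pad(held[:-1] + (held[-1] + 1,))  # bump the last held component
    return floor, ceiling


def allows(constraint, version):
    """True if a plain x.y.z version falls inside the ~> band."""
    floor, ceiling = pessimistic_bounds(constraint)
    return floor <= pad(parse_version(version)) < ceiling


print(pessimistic_bounds("~> 0.4.0"))  # ((0, 4, 0), (0, 5, 0))
print(pessimistic_bounds("~> 1.2"))    # ((1, 2, 0), (2, 0, 0))
```

The ceiling is what matters for the safety argument: whatever the repo actually resolved to, it sits below that bound.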

An unbounded >= tells you a floor and nothing else. >= 0.3.0 with no upper bound permits 0.3.0, and it permits 2.4.1, and it permits whatever shipped this morning. This is the dangerous shape. Not because it is common, but because it looks like a version and is actually an open-ended permission.

The rest of the taxonomy fills in around those three. A compound comma-AND like >= 0.3.0, < 0.5.0 is a hand-built band, a floor and a ceiling assembled by someone who wanted ~> behaviour with different edges. A bare version with no operator defaults to exact equality. A != exclusion carves a hole out of an otherwise-open range. Prereleases like 1.0.0-beta1 exist and are their own small edge case: Terraform only selects a prerelease when an exact pin names it, so the range operators never match one. None of those are the problem. The problem is the unbounded >=, and here is why it is worth more attention than it looks.
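The taxonomy above can be turned into a small classifier. This is a rough sketch, not a full constraint parser: the category names ("exact", "pessimistic", "bounded", "open_above") are labels invented for this example, and the regular expression accepts only plain one-to-three-component versions with an optional prerelease suffix.

```python
import re

# Optional operator, then a version like 1, 1.2, 1.2.3 or 1.0.0-beta1.
_CLAUSE = re.compile(
    r"^(~>|>=|<=|!=|>|<|=)?\s*(\d+(?:\.\d+){0,2}(?:-[0-9A-Za-z.-]+)?)$"
)

# Operators that put an upper limit on what may be selected.
_CEILING_OPERATORS = {"<", "<=", "~>", "="}


def classify(constraint):
    """Label a version constraint by shape, ignoring the numbers inside it."""
    if constraint is None or not constraint.strip():
        return "absent"
    operators = []
    for clause in constraint.split(","):  # comma means AND
        match = _CLAUSE.match(clause.strip())
        if match is None:
            return "unparsed"
        operators.append(match.group(1) or "=")  # bare version means equality
    if operators == ["="]:
        return "exact"
    if operators == ["~>"]:
        return "pessimistic"
    if any(op in _CEILING_OPERATORS for op in operators):
        return "bounded"
    return "open_above"  # a floor and/or exclusions, but no ceiling


for text in ["0.3.5", "~> 0.4.0", ">= 0.3.0", ">= 0.3.0, < 0.5.0", ">= 0.3.0, != 0.3.7"]:
    print(f"{text!r:24} {classify(text)}")
```

Note that ">= 0.3.0, != 0.3.7" still lands in "open_above": an exclusion removes one version but adds no ceiling.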

Suppose a tool wants to decide whether a given module block matches >= 0.3.0 so it can rewrite it. The honest answer is that it cannot know which version is actually running. The repo could be on 0.3.0. It could be on 2.4.1. The constraint does not say. So what does a tool do when it has to produce a yes-or-no match against a constraint that genuinely has no single answer? One common shortcut is to floor-normalise: collapse >= 0.3.0 down to a 0.3.0 pin so the comparison succeeds. And the moment it does that, it has quietly decided the repo is sitting on the floor of a range the author deliberately left open.

In an automated rewrite, that is a change landing on a module nobody pointed it at. The author wrote >= 0.3.0 precisely to keep their options open above the floor. The tool read that openness as a pin and edited accordingly. The danger here is not frequency. You could go a long way before this bites. The danger is silence. It produces a confident match and a clean diff, and nothing in the output tells you the match was a guess.

The honest line is to treat the shapes by how much they actually constrain. A bounded ~> carries an implicit ceiling, so normalising it to a concrete version stays inside a band the author already accepted. That is defensible. An unbounded >= carries no ceiling at all, so any concrete reduction is a guess wearing the costume of a match. Until a tool can do real constraint-against-constraint intersection, the conservative behaviour is for a bounded ~> to normalise and an unbounded >= to be an explicit no-match. A miss you can see beats a match you cannot trust.
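As a sketch of that conservative policy, the function below returns a concrete version only when the constraint justifies one, and otherwise returns an explicit no-match with a reason. The function name and reason strings are hypothetical. Treating compound constraints as a no-match is an assumption of this sketch, made because handling them properly needs the intersection logic the paragraph mentions.

```python
import re

# A single clause: optional operator followed by a plain version.
_SINGLE = re.compile(r"^(~>|>=|<=|!=|>|<|=)?\s*(\d+(?:\.\d+){0,2})$")


def normalise_for_match(constraint):
    """Return (version, reason); version is None for an explicit no-match."""
    if constraint is None or not constraint.strip():
        return None, "no version attribute"
    if "," in constraint:
        # Assumption of this sketch: compound constraints are left alone.
        return None, "compound constraint, needs real intersection"
    match = _SINGLE.match(constraint.strip())
    if match is None:
        return None, "unparsed constraint"
    operator = match.group(1) or "="  # bare version means equality
    version = match.group(2)
    if operator == "=":
        return version, "exact pin"
    if operator == "~>":
        # The floor stays inside a band the author already accepted.
        return version, "floor of bounded ~> band"
    # >=, >, <, <=, != on their own do not identify a version.
    return None, f"{operator} does not identify a version"


for text in ["0.3.5", "~> 0.4.0", ">= 0.3.0"]:
    print(text, "->", normalise_for_match(text))
```

Returning the reason alongside the result is the point of the design: the no-match shows up in the output instead of disappearing into a clean diff.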

The modules your version query cannot see

There is one more shape, and it is the one a version audit is structurally blind to.

A module sourced over git carries its version in the source ref, not in a version attribute:

module "example" {
  source = "git::https://example.com/modules/network.git?ref=0.16.0"
}

There is no version line here. The version is the ?ref= on the end of source. Anything that audits an estate by reading the version attribute looks at this block, finds no version, and returns null. It does not flag it. It skips it. Silently, in the same way the floor-normalisation problem is silent, and for a related reason: the tool answered a question the data did not actually contain.

There are two ref shapes, and they are not equally bad. ?ref=0.16.0 is a tag, and a tag is semver-recoverable. You can parse the version back out of the URL if you know to look there. ?ref=main is a branch, and a branch is not recoverable at all. There is no version in main. There is only whatever main pointed at the last time this was applied, which can be something different tomorrow.
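Recovering a tag from the source string is mostly URL parsing. The sketch below assumes sources that start with git:: and carry a ?ref= query parameter. The accepted tag spellings (an optional tags/ prefix and an optional leading v) are assumptions for illustration, and the category labels are invented. It cannot tell a branch from a commit SHA; both come back as refs with no semver in them.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed tag spellings: 0.16.0, v0.16.0, tags/0.16.0, tags/v0.16.0.
_SEMVER_TAG = re.compile(r"^(?:tags/)?v?(\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)$")


def version_from_source(source):
    """Return (kind, value) describing what a module source's ref can tell us."""
    if not source.startswith("git::"):
        return "not_git", None
    query = urlsplit(source[len("git::"):]).query
    refs = parse_qs(query).get("ref")
    if not refs:
        return "no_ref", None  # resolves to the repository's default branch
    match = _SEMVER_TAG.match(refs[0])
    if match:
        return "tag", match.group(1)  # semver recovered from the ref
    return "non_semver_ref", refs[0]  # branch or SHA: no version to recover


print(version_from_source("git::https://example.com/modules/network.git?ref=0.16.0"))
print(version_from_source("git::https://example.com/modules/network.git?ref=main"))
```

An audit that runs something like this alongside its read of the version attribute can at least report the branch refs it finds, rather than returning null for them.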

This is not hypothetical, and it is not rare. In the cloudposse scan, the same estate that is 89% clean exact pins, almost one in ten module declarations is git-sourced, with the version sitting in a ?ref=tags/0.x.0 ref rather than a version attribute. Those refs are recoverable if you know to parse source. A version audit never looks there, so it returns null for every one of them. The tidiest-looking corpus in the sample keeps nearly a tenth of its module edges out of reach of the obvious query.

Sit with that for a second. A branch ref is the least-pinned dependency in the entire estate. It is more open than >= 0.3.0, because at least >= 0.3.0 names a floor. And it is exactly the dependency your audit cannot see if it only looks at version. The most dangerous edges hide in the place most tooling does not think to look.

Why this is a discovery problem

I want to be fair to the tools that do the other half of this, because they are genuinely good at it.

The actual rewrite, the part where you move a shared module from 0.3.x to 0.4.0 across dozens of repos, is a solved and deterministic problem. OpenRewrite’s HCL recipes do exactly this, and they do it well. Point a recipe at code that matches its filter and it produces a clean, reviewable diff with every matching module block flipped and nothing else touched. I have a lot of respect for that work. It is careful, it is deterministic, and it is the easy half.

It is the easy half because it assumes the hard half is already done. A rewrite recipe is only as good as the set of repos you point it at and the accuracy of the version it thinks each one is on. And that, as the last few sections argued, is where it gets quietly hard. Which repos consume the module. Under which constraint shape. And which ones a naive version query cannot even see because the version is hiding in a git:: ref. That is not an apply problem. It is a discovery problem, and it sits upstream of every rewrite tool, every codemod, and every careful sed across the org.

People do solve discovery today, mostly with grep and scripts, and at small scale that is completely fine. A grep -r 'source =' . across a dozen repos will get you most of the way. The approach starts to break on exactly the subtleties this post is about. It treats >= 0.3.0 and 0.3.0 as different strings without understanding that the first is an open range and the second is a pin. It returns nothing for a git::...?ref=main module because there is no version to grep for. It works right up until the estate is big enough, or heterogeneous enough, that the edge cases stop being edge cases. (Symbol-graph tools like Sourcegraph are excellent at what they do and entirely orthogonal to this. A module source edge is not a code symbol. Different category.)

This is the problem Riftmap exists to solve. It parses the dependency edges your repos already declare, the Terraform source and version, the Dockerfile FROM, the CI include, the Helm references, across your whole GitHub or GitLab organisation, and builds the graph underneath them. Parsed, not inferred. So when you ask “if I change this module, what consumes it, and under which constraint shape,” the answer is read off declared manifests rather than guessed from names or text. It is the same discovery question whether you are deprecating an internal module or chasing a CVE through your base images: the blast radius is the product. The rewrite is what you run once you can see it.

A version constraint feels like a fact about what is running. It is not. It is a record of what the author was willing to allow. An exact pin happens to make those two things identical, which is probably why it is so easy to forget they were ever different. The unbounded ranges and the branch refs are where they come apart, and treating the permission as a fact is precisely how an automated change ends up somewhere nobody asked it to go.