The data is in. Cortex 2026 puts change failure rates up ~30%. DORA 2025 found AI amplifies team weaknesses. Amazon’s own memo called it “high blast radius.” Here’s why, and what to do about it.


In March 2026, Amazon’s senior vice-president of eCommerce services sent a briefing note to engineers. The Financial Times got hold of it. The phrase that jumped off the page — and then onto every tech outlet over the next 48 hours — was “high blast radius.”

The note described a pattern of production incidents at Amazon retail. Some were tied to AI coding tools. One contributing factor, per the internal document, was novel generative-AI usage for which best practices and safeguards were not yet fully established. A separately reported AWS incident from December involved Amazon’s internal AI coding assistant, Kiro, deleting and recreating an entire environment when asked to apply a targeted fix. AWS spent roughly 13 hours getting the service back.

This is a big-tech company with world-class engineering culture, shipping outages that took customers offline — and the phrase their own leadership reached for was one every DevOps engineer recognises. Blast radius: the set of things downstream of a change that can fail when the change goes wrong.

The Amazon story isn’t an outlier. It’s what the data already predicted.

The numbers are in, and they’re not good

Three reports published in the last six months agree on the pattern, even though they measure different things.

Cortex’s 2026 Engineering in the Age of AI Benchmark. Pull requests per author up 20% year-over-year. Incidents per pull request up 23.5%. Change failure rates up roughly 30%. Nearly 90% of engineering leaders say their teams use AI coding tools, but only about a third have formal governance policies in place.

Google’s 2025 DORA State of AI-assisted Software Development Report. Around 90% AI adoption across respondents, in line with the Cortex number. Roughly 30% of developers say they have little or no trust in AI-generated code. And Google’s own framing of the central finding is blunt: AI is an amplifier, not a repair tool. Teams that already had strong testing and platform foundations saw AI make them faster and more stable. Teams that didn’t saw the opposite. Across the board, AI adoption showed a negative relationship with delivery stability.

CodeRabbit’s State of AI vs. Human Code Generation. Analysing production pull requests at scale, the report found that AI-generated code carried roughly 1.7x the issue rate of human code. Logic errors were worse. Security issues were worse. Performance issues, the kind that don’t trip a test suite but degrade the system, were significantly worse.

A separate joint study from Sun Yat-sen University and Alibaba put 18 coding agents against 100 real-world codebases over 233 days. Passing tests once was easy. Maintaining a codebase for eight months without breaking it turned out to be the part AI agents fell apart on.

Stack Overflow’s engineering blog noted in passing that 2025 saw more widespread outages and incident volume than any prior year on record. That’s not solely an AI story — infrastructure keeps getting more complex — but the correlation is hard to ignore.

So, the top line: teams are shipping more, faster, and breaking things in ways they don’t always know how to fix. The velocity is real. The rework is also real. And the ratio has gotten worse.

What “blast radius” actually means

In software and infrastructure, the blast radius of a change is the set of everything downstream that depends on it, directly or transitively, and could break when that change lands.

If you edit a function inside a single repo and nothing else in the org imports it, the blast radius is one repo. If you rename a variable in a shared Terraform module, the blast radius is every repo that sources that module at any version where the rename takes effect. If you bump the base tag of a Docker image that’s pulled by 40 Dockerfiles across the org, the blast radius is 40 services and whatever depends on them at runtime.

Blast radius isn’t a property of the code you’re editing. It’s a property of the system around the code you’re editing.
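
To make that concrete, here is a toy sketch in Python. The repo names and edges are invented; what matters is that computing a blast radius never looks at the diff at all, only at the graph of who depends on what.

```python
from collections import defaultdict, deque

# Consumer -> the shared artifacts it depends on. All names are made up.
DEPENDS_ON = {
    "payments-api":  ["terraform-vpc", "base-image"],
    "orders-worker": ["base-image"],
    "checkout-web":  ["payments-api"],
    "terraform-vpc": [],
    "base-image":    [],
}

def blast_radius(changed: str) -> set[str]:
    """Everything that directly or transitively depends on `changed`."""
    # Invert the graph: artifact -> its direct consumers.
    consumers = defaultdict(set)
    for repo, deps in DEPENDS_ON.items():
        for dep in deps:
            consumers[dep].add(repo)

    # Breadth-first walk over consumers of consumers.
    seen, queue = set(), deque([changed])
    while queue:
        for consumer in consumers[queue.popleft()]:
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen

print(blast_radius("base-image"))    # {'payments-api', 'orders-worker', 'checkout-web'}
print(blast_radius("checkout-web"))  # set(): nothing downstream, only the repo itself is at risk
```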

This is why the concept survives so well in post-mortems. When a change breaks production, the question is rarely “did the diff look correct?” It’s almost always “who was consuming this, and did anyone tell them?”

Why AI coding tools can’t see it

Here’s the core structural problem: AI coding tools optimise for local correctness.

They read the file you’re editing. They read files in the same repo, maybe the working set you have open. A capable agent will pull in repository search results, run tests, and iterate. A really capable agent might read the repository’s README and infer some conventions. All of that happens inside a single repository’s boundary.

The organisation-wide cross-repo dependency graph — which other repos import this module, which services pull this image, which pipelines include this template — lives outside that boundary. It isn’t in the training data. It isn’t in the agent’s context window. It isn’t in any of the tools the agent has available unless someone has explicitly built those tools and wired them up.

To the AI, the code is text to transform. To the system, the code is a contract with unknown counterparties. Those two framings disagree often enough that Cortex can measure it in the aggregate (a 23.5% lift in incidents per PR) and Amazon can experience it in the specific, as a 13-hour AWS recovery and a mandatory engineering meeting.

This is why the gap widens as you ship more AI-assisted code. Each individual change looks fine locally. Each change is merged against the same tests it’s always been reviewed against. But the blast radius of each change was never something the tests were measuring in the first place.

Why tests don’t save you

There’s a reasonable objection at this point: if the tests pass, surely that catches most of it?

No. And this is one of the more uncomfortable findings of the last year.

When the same model writes both the feature and the tests, the tests share the model’s assumptions. They exercise the code the way the model already imagined the code would be exercised. Cases the model didn’t anticipate are cases the tests don’t cover. This isn’t hypothetical — it’s been observed empirically by teams measuring AI-authored test coverage against production incidents, and it’s consistent with what the Alibaba long-horizon study found: agents that passed benchmarks on day one fell apart over eight months of real maintenance.

Tests also run inside a single repository’s CI. They can’t run consumer services that live in other repos. A Terraform module’s unit tests can’t tell you that a staging environment in another repo will fail to plan after your variable rename. A Docker base image’s integration tests can’t tell you that a worker service in another team’s repo pins to the previous tag and will fail its next rebuild.

Your test suite is a local safety net. Blast radius is an organisational problem. These are different layers, and the industry has not been investing in the organisational one.

The three failure modes I see most often

I’ve been building Riftmap for about a year, which means I’ve spent a lot of time looking at real cross-repo dependency graphs in real orgs. When AI-assisted changes break things, the failure almost always lands in one of three modes.

1. The shared library refactor with unknown consumers

An engineer asks an AI agent to clean up an internal library — rename a function, tighten a type, remove a deprecated parameter. The agent does it cleanly. The repository’s own tests pass. The PR gets merged.

Three hours later, two other services fail their next deploy because they were still calling the old signature. Nobody on the originating team knew those services existed. The agent certainly didn’t.

This is the exact shape of most internal Go module, Python package, and npm package breakages I see. The library repo’s test coverage was strong. The problem was that “strong test coverage” meant something different from “known consumer coverage.”
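
A minimal, invented illustration of the shape, with the agent’s change and the unseen consumer side by side (the repo names and the function are hypothetical):

```python
# repo: internal-billing-lib  (the repo the agent edited)
# Before the clean-up the signature was: charge(amount, currency="USD", retries=3)
# The agent removed the deprecated `retries` parameter.
def charge(amount: int, currency: str = "USD") -> str:
    """Charge `amount` in minor units; retries are now handled internally."""
    return f"charged {amount} {currency}"

# This repo's own tests never passed `retries`, so they still pass:
assert charge(500) == "charged 500 USD"

# repo: invoicing-service  (a consumer the agent never saw)
# Its call site still uses the old signature and fails on its next deploy,
# hours after the library PR was merged:
charge(500, retries=5)  # TypeError: charge() got an unexpected keyword argument 'retries'
```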

2. The base image bump with silent pins

An engineer asks the agent to upgrade a shared Docker base image — new language runtime, new security patches. The Dockerfile in the base image repo builds fine. The CI goes green.

Meanwhile, 30 downstream Dockerfiles across the org pin the old tag explicitly, and nobody notified their owners, so they quietly keep building against the unpatched base. Worse, the ones tracking :latest start pulling the new base silently, and three of them have a binary incompatibility with the new glibc. Those outages happen at random times over the next week as each service rebuilds.
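
A rough sketch of the audit that surfaces both cases, assuming every repo in the org is cloned under one directory; the image name is illustrative, and a real check would parse image references properly rather than splitting strings:

```python
from pathlib import Path

BASE_IMAGE = "registry.internal/base/python"  # illustrative shared base image

def base_image_pins(checkout_root: str) -> dict[str, list[str]]:
    """Map each tag of BASE_IMAGE to the Dockerfiles that pull it."""
    pins: dict[str, list[str]] = {}
    for dockerfile in Path(checkout_root).rglob("Dockerfile*"):
        for line in dockerfile.read_text(errors="ignore").splitlines():
            if line.strip().upper().startswith("FROM") and BASE_IMAGE in line:
                ref = line.split()[1]                               # e.g. ...base/python:3.11-slim
                tag = ref.split(":", 1)[1] if ":" in ref else "latest"
                pins.setdefault(tag, []).append(str(dockerfile))
    return pins

pins = base_image_pins("./all-repos")
print(pins.get("latest", []))     # these rebuild against the new base silently
print(pins.get("3.11-slim", []))  # these keep building against the old, unpatched base
```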

3. The Terraform module rename that breaks plans across the org

An engineer refactors a shared VPC or IAM module. The agent renames subnet_ids to private_subnet_ids because it’s clearer. The module’s examples still work; its own tests still pass.

Fourteen other repos source that Terraform module and pass the old argument name. Their next terraform plan fails. If they were on autopilot pipelines, they might not notice until the next deployment window. If any of them were in the middle of incident response, they now have an extra incident to deal with.
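
The check that would have caught this is not sophisticated. A rough sketch, again assuming the consuming repos are checked out locally; the module and argument names are illustrative, and a real implementation would resolve module sources with an HCL parser rather than a regex:

```python
import re
from pathlib import Path

MODULE_SOURCE = "terraform-vpc"   # the module being refactored (illustrative name)
REMOVED_ARG = "subnet_ids"        # the argument the agent renamed

# Matches `module "..." { ... }` blocks up to the first closing brace on its
# own line. Good enough for a sketch, not a real HCL parser.
MODULE_BLOCK = re.compile(r'module\s+"[^"]+"\s*\{(.*?)\n\}', re.DOTALL)

def affected_callsites(checkout_root: str) -> list[Path]:
    hits = []
    for tf_file in Path(checkout_root).rglob("*.tf"):
        text = tf_file.read_text(errors="ignore")
        for block in MODULE_BLOCK.findall(text):
            if MODULE_SOURCE in block and re.search(rf"\b{REMOVED_ARG}\s*=", block):
                hits.append(tf_file)
                break
    return hits

# Run against a directory where every repo in the org is cloned:
for path in affected_callsites("./all-repos"):
    print(f"still passes {REMOVED_ARG}: {path}")
```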

These three patterns repeat across every ecosystem I’ve looked at — Helm chart value renames, GitHub Actions workflow input changes, reusable CI template variables, Kustomize patch paths, Ansible role parameters. The shape is always the same: the change is correct locally, and the consumers weren’t in the room.

What to put between the AI and main

The honest answer is that you don’t stop AI-assisted changes from landing in production. That ship has sailed in almost every org that has adopted these tools, and the Cortex and DORA data suggest the productivity numbers don’t favour forcing it into reverse. The question is what you put between the agent and main so that blast-radius-class mistakes get caught before they cost you a weekend.

Four things actually help, based on what I see working and what the governance data says is missing at most orgs.

A current, queryable dependency graph across your org. Not a YAML catalog that humans maintain. Not a nightly grep script. A system that scans your repos and parses the actual manifests, resolves the consumer-producer relationships, and stays fresh. This is the thing most teams don’t have, and it’s the thing that makes everything else possible.

PR-time blast radius diffing. When an agent opens a pull request against a shared module, image, chart, or CI template, the review surface should include who consumes this, at which version, and which consumers are affected by this specific change. Not a generic “this module is used a lot” notice. A diff of the consumer impact. This is the single highest-leverage gate you can add.

Automatic downstream owner notification. The humans who own the services that depend on the changed artifact need to hear about the change at the moment it’s proposed, not the moment it breaks. Every platform engineering team will tell you this is already their life; formalising it removes the step where the change slips past because the right person was on leave.

A policy split between low-risk and high-risk surface area. Amazon’s post-incident response was to require senior sign-off on AI-assisted changes made by junior and mid-level engineers. That’s one blunt answer. A better one, if you have the graph, is: automatic merge allowed for changes with a blast radius under some threshold (single repo, no external consumers); human senior review required for changes that touch shared artifacts with N+ downstream consumers or services tagged as customer-critical. The agent gets the speed where speed is safe. The humans get the attention where attention matters.
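
As a sketch of what that split can look like once the graph exists (the thresholds, tags, and names here are illustrative choices, not a prescribed standard):

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    repo: str
    customer_critical: bool  # e.g. derived from a service tag in the dependency graph

AUTO_MERGE_MAX_CONSUMERS = 0  # single repo, no external consumers
SENIOR_REVIEW_THRESHOLD = 5   # N+ downstream consumers triggers human review

def review_policy(consumers: list[Consumer]) -> str:
    """Decide the merge gate for a change, given its downstream consumers."""
    if any(c.customer_critical for c in consumers):
        return "senior-review"              # attention where attention matters
    if len(consumers) <= AUTO_MERGE_MAX_CONSUMERS:
        return "auto-merge"                 # speed where speed is safe
    if len(consumers) >= SENIOR_REVIEW_THRESHOLD:
        return "senior-review"
    return "standard-review"                # ordinary peer review in between

print(review_policy([]))                                               # auto-merge
print(review_policy([Consumer("payments-api", True)]))                 # senior-review
print(review_policy([Consumer(f"svc-{i}", False) for i in range(8)]))  # senior-review
```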

None of these are novel ideas. The reason most teams haven’t implemented them is that the foundation — the cross-repo dependency graph — doesn’t exist in their org yet. Without it, every other gate is a judgement call made by whoever’s reviewing the PR, and judgement calls scale poorly when PR volume is up 20% year on year.

FAQ

What is “blast radius” in software engineering?

Blast radius is the set of downstream systems, services, or repos that depend on the thing you’re changing and could fail when the change lands. It’s a property of the dependency graph around your code, not of the code itself. A change to a function nobody else calls has a blast radius of one; a change to a shared Terraform module sourced by 40 repos has a blast radius of 40.

Does AI coding raise change failure rate?

Yes, in the aggregate, based on every major 2025–2026 report that has measured it. The Cortex 2026 Benchmark puts change failure rate up about 30% and incidents per PR up 23.5% since AI adoption accelerated. Google’s 2025 DORA report found AI has a negative relationship with software delivery stability overall, with the worst effects in teams that lacked strong testing and platform foundations before adoption. CodeRabbit measured roughly 1.7x the defect rate in AI-generated code versus human-written code in production pull requests.

Why does AI-generated code break in production more often than human-written code?

Three structural reasons. First, AI tools optimise for local correctness within a single repository and can’t see the organisation-wide dependency graph — which repos import this module, which services pull this image. Second, AI-authored tests share the same blind spots as AI-authored code, so passing tests doesn’t mean the change is safe. Third, review processes haven’t scaled to match PR volume, and larger PRs with less review time let more blast-radius-class defects through.

What’s a “high blast radius” incident?

It’s an incident whose effects reach far beyond the immediate change — taking out multiple services, customer-facing endpoints, or critical paths rather than a single component. The phrase entered mainstream tech press when Amazon used it in an internal March 2026 briefing note about a pattern of production incidents tied partly to AI-assisted changes. The contrast is with a low-blast-radius incident that affects one service and is contained quickly.

How do you stop AI-generated changes from breaking production?

You can’t eliminate the risk, but you can gate it. The most effective controls are: maintain a current cross-repo dependency graph so you know who consumes what; surface consumer impact as part of the PR review for changes to shared artifacts; automatically notify downstream owners when upstream artifacts change; and apply stricter review requirements (human senior sign-off, staged rollouts, canaries) to changes whose blast radius exceeds a defined threshold.

Closing

A lot of the public conversation about AI coding has settled into two camps: evangelists who think the velocity gains speak for themselves, and sceptics who think the whole thing is a productivity mirage. The data is more interesting than either position. AI coding tools genuinely make individual engineers faster. They also increase the rate at which changes break things in production. Both are true at once, and the second effect is largest in exactly the area the tools are structurally unable to see — the cross-repo dependency graph.

The Cortex report frames this as a governance gap. The DORA report frames it as an amplification effect. Amazon’s post-incident response framed it as a blast-radius problem and a review policy problem. They’re all describing the same hole in the stack.

If you maintain shared infrastructure artifacts — Terraform modules, Docker base images, Helm charts, CI templates, Go modules, anything others consume — the single most useful thing you can do in 2026 is build the visibility layer that your AI coding tools and your CI pipelines don’t have. Know who consumes what. Know what this change affects. Make it a PR-time artifact, not a tribal-knowledge artifact.

That’s what I’m building Riftmap to do: auto-discover cross-repo dependencies across your GitLab or GitHub org — Terraform, Docker, CI templates, Helm, Go, npm, Python, Ansible, Kubernetes, Kustomize — and give you a queryable graph with visual blast radius analysis. One read-only token. No per-repo YAML to maintain.

If you’re feeling this problem at your org, I’d genuinely like to hear how it shows up. Reach me at [email protected], or try a free scan at app.riftmap.dev.


Sources referenced

  • Cortex, Engineering in the Age of AI: 2026 Benchmark Report (cortex.io)
  • Google Cloud / DORA, 2025 State of AI-assisted Software Development Report (dora.dev)
  • CodeRabbit, State of AI vs. Human Code Generation Report (2025)
  • Sun Yat-sen University & Alibaba, long-horizon benchmark of 18 AI coding agents on 100 codebases over 233 days (2026)
  • Financial Times reporting (March 2026) on Amazon’s “high blast radius” internal memo, covered in CIO.com, Fortune, Tom’s Hardware, and Business Insider
  • Stack Overflow engineering blog on 2025 outage volume