The week coding-agent cost got honest, and the productivity number didn't

2026-06-05

Image: Two side-by-side dashboard gauges on a dark editorial background. The left gauge is labeled Cost with a crisp needle pointing at an exact reading. The right gauge is labeled Productivity with a blurred, doubled needle that gives no clear value.

GitHub's flat Copilot seat became a meter this week, so the cost side of the AI-coding ROI ratio is now precise to the cent per engineer. That precision exposes the other half: the productivity number most engineering orgs still report rises automatically the moment an agent writes the code. Here is which signal a CTO can actually trust.

TLDR

On Monday GitHub's flat Copilot seat became a usage meter, so for the first time the cost side of the AI-coding ROI ratio is precise, per engineer, per task. That precision exposes the other half: the productivity number most engineering orgs report rises automatically the moment an agent writes the code. The signal worth trusting is durable, verified, merged output over the now-honest cost number, not raw throughput.

On Monday GitHub’s flat Copilot seat quietly turned into a meter. By Tuesday the screenshots were everywhere. One developer on the 39-dollar Pro+ plan watched a single agentic session eat 1,180 credits and posted the arithmetic. Another asked for one change to a project and burned more than six dollars on that one request. The Register rounded up the backlash on June 2, and the comment threads read like a support group.

I want to be careful here, because the cost story is not the interesting part. The interesting part is what the meter did to the rest of the dashboard. For three years the cost of a coding agent was a fixed seat price, which made it easy to round to zero. As of this week it is a number measured to the cent. And the moment one half of the return-on-investment ratio got honest, it threw a hard light on the other half. Because the productivity number most engineering orgs report is still, politely, a work of fiction.

"1,180 credits used. 16% of my monthly Pro+ allowance. Gone."

A Copilot Pro+ developer, quoted in The Register, June 2026

The honest cost number is genuinely good news. It is the productivity number I worry about.

Why coding-agent velocity metrics stopped measuring the team

Picture the slide that goes up the chain every month. PRs merged. Commits. Throughput per engineer. Lines shipped, if someone is feeling brave. These are the numbers engineering leaders have reported for a decade, and they had a quiet virtue back then: a human could only type so fast, so the count roughly tracked the work.

Then the agent arrived, and the count stopped tracking anything. Faros AI published a report this spring built on two years of telemetry from 22,000 developers and more than 4,000 teams, comparing each org’s lowest-adoption period against its highest. Task throughput per developer up 33.7 percent. Epics completed per developer up 66 percent. PR merge rate per developer up 16.2 percent. If those were the only lines on the slide, this would be the cleanest productivity win of the decade, landed overnight, no hiring required.

+33.7%

task throughput per developer at high AI adoption (Faros AI, 2026)

Here is the problem with every one of those lines. They count volume. And a coding agent is a volume machine. It will happily produce three times the pull requests, which means a throughput dashboard will show a three-times improvement whether the work got better, stayed flat, or quietly got worse. The metric went up because the metric was always going to go up. It is measuring the agent’s enthusiasm, not the team’s output.

A CTO I was talking to last week put it well. She said her velocity chart had never looked better and she had never trusted it less. That gap, between the chart and the gut, is the whole story of measuring productivity in 2026.

Where the AI productivity number breaks: code churn and review debt

The same Faros telemetry that shows throughput climbing shows the bill arriving downstream, in the part of the pipeline nobody puts on the slide.

"Pull requests merged without any review, human or agentic, are up 31.3%."

Faros AI, The AI Acceleration Whiplash, 2026

Average time spent in code review is up 199.6 percent, which means it has roughly tripled. Code churn, the ratio of lines deleted to lines added for merged code in a quarter, has increased 861 percent. So volume climbed, the review queue buckled under it, a growing share of changes shipped with no review at all, and a striking amount of what did ship got torn back out soon after. The headline number went up while the thing it is supposed to represent went sideways or down.
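To make the churn definition concrete, here is a minimal Python sketch of the ratio and of what an 861 percent increase does to it. The function names and the line counts are hypothetical, chosen only so the arithmetic reproduces the headline percentage; they are not Faros AI's method or data.

```python
def churn_ratio(lines_deleted: int, lines_added: int) -> float:
    """Lines deleted divided by lines added for merged code in one period."""
    if lines_added <= 0:
        raise ValueError("lines_added must be positive")
    return lines_deleted / lines_added


def percent_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Hypothetical quarters, invented for illustration only.
baseline = churn_ratio(lines_deleted=500, lines_added=10_000)
current = churn_ratio(lines_deleted=4_805, lines_added=10_000)

increase = percent_increase(baseline, current)  # about 861 percent
multiple = current / baseline                   # about 9.61 times the baseline

print(f"churn ratio {baseline:.4f} -> {current:.4f}")
print(f"increase: {increase:.1f}%  multiple: {multiple:.2f}x")
```

The point of the second number is that an 861 percent increase is not "about eight and a half times". It leaves the ratio at roughly 9.6 times where it started.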

Durability is where this gets concrete. A measurement pattern that has caught on this year is code turnover rate: the share of merged code that gets reverted, deleted, or substantially rewritten within thirty or ninety days, tracked separately for AI-written and human-written code. The benchmarks compiled by Larridin, drawing on GitClear’s churn data, put AI-generated code at roughly 1.8 to 2.5 times the turnover of human-written code. Translated: a meaningful slice of the work the throughput chart already counted as “done” was rewritten before the quarter closed. It was counted once, then counted again when it came back.
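A rough sketch of the turnover comparison, in Python, looks like this. It assumes each merged change has already been labeled as AI-written or human-written and that something upstream has counted how many of its lines were reverted, deleted, or substantially rewritten inside the window. The `MergedChange` record, its field names, and the sample figures are all hypothetical, not Larridin's or GitClear's schema or data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MergedChange:
    origin: str                    # "ai" or "human" (labeling scheme is an assumption)
    lines_merged: int              # lines that landed on the main branch
    lines_replaced_in_window: int  # of those, lines reverted, deleted, or rewritten in the window


def turnover_rate(changes: Iterable[MergedChange], origin: str) -> float:
    """Share of merged lines from `origin` that were replaced within the window."""
    selected = [c for c in changes if c.origin == origin]
    merged = sum(c.lines_merged for c in selected)
    if merged == 0:
        raise ValueError(f"no merged lines recorded for origin {origin!r}")
    replaced = sum(c.lines_replaced_in_window for c in selected)
    return replaced / merged


# Invented sample covering one thirty-day window.
changes = [
    MergedChange("ai", 1_000, 180),
    MergedChange("ai", 500, 90),
    MergedChange("human", 1_200, 96),
    MergedChange("human", 800, 64),
]

ai_rate = turnover_rate(changes, "ai")
human_rate = turnover_rate(changes, "human")
ai_to_human = ai_rate / human_rate

print(f"AI turnover {ai_rate:.1%}, human turnover {human_rate:.1%}, ratio {ai_to_human:.2f}x")
```

Summing lines before dividing, rather than averaging per-change rates, keeps one tiny noisy change from dominating the figure. That is a design choice in this sketch, not something the cited benchmarks specify.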

This is the trap that catches careful people. Most teams never set a clean baseline before agents arrived, so there is nothing honest to compare against, and the volume numbers rush in to fill the vacuum because they are the easiest thing to pull from GitHub. That is not a character flaw. It is just what happens when the convenient number and the true number point in different directions.

Two halves of the same adoption curve (Faros AI, 2026)

What goes on the slide	What stays off it
Throughput per dev +33.7%	Time in code review +199.6%
Epics per dev +66%	PRs merged without review +31.3%
PR merge rate per dev +16.2%	Code churn +861%

Google’s DORA team landed in the same place in its ROI report this spring. Its framing is that AI is an amplifier: the returns come from the system around the tool, the platform quality and the workflow clarity, not the tool itself. The team also found AI adoption associated with rising delivery instability, because more code moving faster overwhelms the review and deploy gates that were built for human pace. VentureBeat made a related point this week, on June 3, arguing that the wall enterprises hit with agents is a runtime and production problem, not a model problem. The productivity you can demo is not the productivity that survives to production. The gap between those two is exactly where the real number hides.

Key Insight

Any productivity metric an agent can inflate by writing more code is not measuring your team. It is measuring the agent. The only signals worth reporting are the ones volume cannot move on its own.

The coding-agent ROI metric a CTO can actually trust

The meter handed engineering leaders something useful, even if it arrived wrapped in a billing complaint. It built a clean denominator. Cost per engineer is now a real, defensible number for the first time. The job is to build a numerator that is just as clean, and then stop reporting the ones that are not.

A trustworthy productivity signal has one property: volume cannot fake it. That rules out PR counts, commit counts, and raw throughput, all of which an agent moves for free. It rules in a small set of harder numbers. The first is verified, durable, merged output per engineer, meaning work that passed review and was still alive thirty days later. The second is code turnover rate for AI versus human changes, watched as a ratio rather than a vibe. Both are then set against the cost per engineer the meter now reports exactly.
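Here is one way that pairing could look in Python, as a sketch rather than a standard. It assumes "output" is counted in pull requests and that each engineer's metered agent cost for the month is known. The `EngineerMonth` record and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineerMonth:
    name: str
    merged_prs: int        # raw throughput: everything that merged
    durable_prs: int       # passed review and still alive thirty days later
    agent_cost_usd: float  # metered agent spend for the month


def durable_share(m: EngineerMonth) -> float:
    """Fraction of merged PRs that were reviewed and survived the window."""
    return m.durable_prs / m.merged_prs if m.merged_prs else 0.0


def cost_per_durable_pr(m: EngineerMonth) -> float:
    """Metered spend divided by durable output; infinite if nothing survived."""
    return m.agent_cost_usd / m.durable_prs if m.durable_prs else float("inf")


# Invented figures for two engineers.
team = [
    EngineerMonth("A", merged_prs=30, durable_prs=12, agent_cost_usd=120.0),
    EngineerMonth("B", merged_prs=12, durable_prs=11, agent_cost_usd=44.0),
]

top_by_throughput = max(team, key=lambda m: m.merged_prs).name
best_by_durable_cost = min(team, key=cost_per_durable_pr).name

for m in team:
    print(f"{m.name}: merged={m.merged_prs} durable share={durable_share(m):.0%} "
          f"cost per durable PR=${cost_per_durable_pr(m):.2f}")
```

In this made-up sample the throughput chart ranks A first, while cost per durable PR ranks B first. The two metrics disagree, and only one of them is something an agent can move by writing more code.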

The question stopped being "are we shipping faster." It became "how much of what we shipped was real, and what did it cost to keep."

None of this requires a new platform or a measurement project that takes a quarter. The turnover ratio comes out of the same git history already sitting in the repo. The cost number now arrives on the invoice whether anyone asks for it or not. The only real change is deciding to report the durable numbers and to retire the flattering ones, which is a management decision, not a tooling one. DORA’s most practical finding fits here: the biggest returns come from reducing rework to reclaim capacity, and you cannot reduce a rework number you refuse to look at.
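As a small illustration of how little tooling the raw inputs need, the sketch below totals lines added and deleted from the text that `git log --numstat` prints. This is only the crude added-versus-deleted input, not a full turnover measurement, which would need line-level tracking of what replaced what. The function name and the sample text are hypothetical.

```python
from typing import Tuple


def total_numstat(numstat_text: str) -> Tuple[int, int]:
    """Sum added and deleted line counts from `git log --numstat` output.

    Each numstat line is "<added>\t<deleted>\t<path>". Binary files show "-"
    for both counts and are skipped, as are blank or non-numstat lines.
    """
    added_total = 0
    deleted_total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank separator between commits, or some other line
        added, deleted, _path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary file ("-") or unexpected content
        added_total += int(added)
        deleted_total += int(deleted)
    return added_total, deleted_total


# Invented sample in the numstat layout; real input would come from git.
sample = "12\t3\tsrc/app.py\n\n-\t-\tassets/logo.png\n40\t25\tsrc/billing.py\n"

added, deleted = total_numstat(sample)
print(f"added={added} deleted={deleted} ratio={deleted / added:.2f}")
```

On a real repository the input could come from something like `git log --since="90 days ago" --numstat --format=`, captured as text and passed to the function. How changes get attributed to an agent or a human is a separate problem that this sketch does not attempt.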

What I’d tell you over coffee

The meter did you a favor, and I know it did not feel like one on Tuesday. For three years the cost of these tools hid inside a seat price and the productivity gains hid inside a throughput chart, and the two fictions roughly cancelled out, so nobody had to do the hard arithmetic. This week the cost fiction died. Now the productivity fiction is sitting there, exposed and a little embarrassed, and the honest move is to retire it too.

Pick one durability number and one cost-per-engineer number, put them on the same slide, and let the throughput chart go. It will read as a smaller, quieter result than the double-digit gains everyone is posting. It will also be true, which is worth more in a board meeting than any chart that flatters the whole room and proves nothing. The teams that come out of this year ahead will not be the ones with the highest velocity number. They will be the ones who knew which number to stop believing.

Sources

Angry devs vow to flee GitHub Copilot as metered billing takes hold - The Register, 2026-06-02
GitHub Copilot Usage-Based Billing Takes Effect, Drawing Developer Backlash Over Rapid Credit Depletion - gHacks Tech News, 2026-06-02
The AI Engineering Report 2026: The AI Acceleration Whiplash, Ten Takeaways - Faros AI, 2026-04-22
Code Turnover Rate: The AI Code Quality Metric - Larridin Developer Productivity Hub, 2026-05-01
New DORA Report Claims Strong Engineering Foundations Drive AI Return on Investment - InfoQ, 2026-05-01
The Agentic Reckoning: Enterprise AI organizations have a runtime problem, not a model problem - VentureBeat, 2026-06-03
