April 30, 2026 · 9 min read
Developer Productivity in 2026: The Metrics That Changed When AI Joined the Team
Velocity is up. Quality is down. Reviews take six times longer. Why DORA stopped working as a single signal and what engineering leads measure instead.
Published by GitIntel Research
TL;DR
- AI now writes 41% of all code across the surveyed enterprises in 2026
- Cycle time is down 30-40%, but bugs per dev are up 54% and PR review time is up 441%
- Refactor share collapsed from 25% of changes (2021) to under 10% (2024); clones rose from 8.3% to 12.3%
- DORA Speed metrics alone now read green while quality goes red — the four-metric DX Core 4 (Speed, Effectiveness, Quality, Impact) is the emerging fix
- Most useful new metric: code turnover rate on AI-touched files — healthy is <15% at 30 days, <22% at 90 days
For ten years, the four DORA metrics — deployment frequency, lead time for changes, change failure rate, and mean time to restore — were the cleanest answer to "is engineering working?" Every framework, every dashboard, every executive review keyed off them.
In 2026, the deployment frequency line is climbing while the on-call rotation is on fire. Both numbers are accurate. The gap between them is the story of AI-assisted development.
What follows is the data, the frameworks that try to fix it, and the metric most engineering teams are still missing.
The Numbers, Side by Side
Sources are public benchmarks from GitClear, Faros, CodeRabbit, GetDX, and the 2024 and 2025 DORA reports. Where ranges differ across studies, the wider range is shown.
| Metric | Δ (AI era) | Source |
|---|---|---|
| Tasks completed per dev | +21% | GitClear / DORA 2025 |
| PRs merged per dev | +98% | CodeRabbit (13M PRs) |
| Cycle time (commit → prod) | −30 to −40% | GetDX 2026 benchmark |
| PR review time (median) | +441% | Faros 2026 DORA review |
| PR size (avg lines) | +51.3% | Faros 2026 |
| Bugs per developer | +54% | Faros 2026 |
| Incidents per PR | +242.7% | Faros 2026 |
| Refactor rate (% of changes) | 25% → <10% | GitClear 2026 |
| Code clones (% of new code) | 8.3% → 12.3% | GitClear 2026 |
| Change failure rate | +15 to +25% | DORA 2024 → 2026 |
Two things are happening at once, and they pull in opposite directions. Developers ship more code (PRs merged up 98%, cycle time down 30-40%), and the code is buggier and harder to review (bugs per dev up 54%, review time up 441%, incidents per PR up 242.7%). The DORA Speed metrics catch the first phenomenon. They miss the second entirely.
Why DORA Alone Stopped Working
The four DORA metrics were designed in a world where the constraint on shipping was human typing speed and team coordination. Speed up the human and you get more output. AI coding tools removed that constraint, but they did not remove the downstream constraints — review attention, cognitive load, deployment risk — and DORA was never designed to measure those.
The Faros 2026 review of mid-market engineering orgs found three patterns that DORA renders invisible:
- Refactor collapse. The percentage of changed lines that came from refactoring fell from 25% in 2021 to under 10% in 2024. AI is good at writing new code; it is not good at simplifying existing code, and developers stopped doing the work themselves.
- Clone explosion. Lines classified as copied or near-duplicates rose from 8.3% to 12.3% of new code. This is consistent with the LLM tendency to regenerate similar patterns rather than extract a shared abstraction.
- Acceleration whiplash. Time saved in code creation is being re-allocated to auditing and verification, not to building new features. Output looks higher in DORA Speed metrics; outcome velocity stays flat.
Google's own 2024 DORA report acknowledged the issue indirectly: delivery stability fell 7.2% as AI adoption rose. The team that publishes DORA is now telling readers DORA is not enough.
Four Frameworks That Try to Fix It
SPACE (2021, still relevant)
Five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow. SPACE was the first widely cited acknowledgment that productivity is not one number. Its weakness in 2026 is breadth — it tells you what to look at, not how to score it.
DevEx (2023)
From the same authors as SPACE, narrowed to three dimensions developers actually feel: feedback loops, cognitive load, flow state. Survey-driven by design. Useful when the problem is "our team feels slow" even though metrics say otherwise.
DX Core 4 (Dec 2024)
Abi Noda and Laura Tacho's unification of DORA + SPACE + DevEx into four scoreable buckets: Speed, Effectiveness, Quality, Impact. DX Core 4 explicitly designs around the AI-era failure mode — Speed alone can read green while Quality and Impact are red, and the framework forces all four to surface together. This is the framework most enterprise engineering orgs are migrating to in 2026.
DORA + Quality Augmentation
Many teams are not switching frameworks. They are adding quality metrics on top of DORA: code turnover rate, defect density on AI-touched files, refactoring rate, and incidents per PR. This is the pragmatic path when you already have DORA dashboards in place.
The Single Metric Most Teams Are Still Missing
Code turnover rate — the percentage of newly-committed lines that get rewritten or deleted within N days — is the cleanest signal of AI code quality currently published. It is calculable from any git history without surveys.
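Concretely, here is one way to approximate it with nothing but the git CLI. This is a minimal sketch, not GitIntel's method: it measures survival by running `git blame` against today's HEAD, so for cohorts much older than the window it slightly overstates turnover, and the 30-day cohort width and tracked-file filtering are illustrative choices.

```python
#!/usr/bin/env python3
"""Approximate N-day code turnover from raw git history (sketch)."""
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout


def lines_added(since: str, until: str) -> dict[str, int]:
    """Lines added per commit in the window, from `git log --numstat`."""
    added: dict[str, int] = defaultdict(int)
    sha = None
    for line in git("log", f"--since={since}", f"--until={until}",
                    "--numstat", "--format=@%H").splitlines():
        if line.startswith("@"):
            sha = line[1:]
        elif sha and line.strip():
            count = line.split("\t", 2)[0]
            if count != "-":                 # "-" marks binary files
                added[sha] += int(count)
    return added


def lines_surviving(shas: set[str], paths: list[str]) -> int:
    """Count lines in today's tree still blamed to one of `shas`."""
    survived = 0
    for path in paths:
        for line in git("blame", "--line-porcelain", "HEAD", "--", path).splitlines():
            token = line.split(" ", 1)[0]
            # --line-porcelain repeats a full 40-hex header per line
            if len(token) == 40 and token in shas:
                survived += 1
    return survived


def turnover_rate(days_ago: int, cohort_days: int = 30) -> float:
    """Turnover for lines committed in a cohort ending `days_ago` days ago."""
    now = datetime.now(timezone.utc)
    until = (now - timedelta(days=days_ago)).date().isoformat()
    since = (now - timedelta(days=days_ago + cohort_days)).date().isoformat()
    added = lines_added(since, until)
    tracked = set(git("ls-files").splitlines())
    touched = sorted({
        p for p in git("log", f"--since={since}", f"--until={until}",
                       "--name-only", "--format=").splitlines()
        if p in tracked                      # skip files deleted since
    })
    total = sum(added.values())
    return 1 - lines_surviving(set(added), touched) / total if total else 0.0


if __name__ == "__main__":
    for n in (30, 90):
        print(f"{n}-day turnover: {turnover_rate(n):.1%}")
```

On a large repository you would want to cache or parallelize the blame calls; the point is only that the signal falls out of history you already have.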
Healthy thresholds, per Larridin's 2026 benchmarks:
- 30-day turnover <15% — the new code is sticking
- 90-day turnover <22% — the code is durably correct
- AI-touched code turnover within 1.5× of human-written turnover — AI is not regressing quality
When AI-touched turnover sits at 2-3× the human baseline, the team is shipping fast and rewriting fast. Velocity looks great in DORA. The actual feature delivery rate — code that survives — is much lower than the dashboard suggests.
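Wired into a weekly report or CI job, the thresholds above reduce to a few comparisons. A sketch — the threshold values come from the list above, while the function name and messages are illustrative:

```python
def turnover_flags(t30: float, t90: float,
                   ai_t30: float, human_t30: float) -> list[str]:
    """Flag breaches of the turnover thresholds cited above."""
    flags = []
    if t30 >= 0.15:
        flags.append(f"30-day turnover {t30:.0%}: new code is not sticking")
    if t90 >= 0.22:
        flags.append(f"90-day turnover {t90:.0%}: code is not durably correct")
    if human_t30 > 0 and ai_t30 / human_t30 >= 1.5:
        flags.append(f"AI-touched turnover {ai_t30 / human_t30:.1f}x human baseline")
    return flags
```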
What Engineering Leads Should Track in 2026
The minimum scoreboard for an AI-assisted team:
- Cycle time (DORA) — keep it; AI accelerates this and you want to see the gain
- Change failure rate (DORA) — watch for 15-25% drift upward as AI adoption grows
- Code turnover rate on AI-touched files — the quality signal AI breaks
- PR review time and PR size — both are growing fast; track them as a flow-state proxy
- DevEx survey (quarterly) — feedback loops, cognitive load, flow state
- AI authorship attribution — without knowing which commits are AI-assisted, you cannot tell whether a quality drop is from AI or from the team
The last one is where most teams have a gap. Co-Authored-By trailers help when developers remember to keep them. Tools that infer attribution from edit patterns and IDE telemetry are the missing piece — and the reason GitIntel exists.
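Until inferred attribution is in place, trailer-based bucketing is a workable floor. A sketch, assuming commit messages carry Co-Authored-By trailers for AI tools; the AI_PATTERNS list is a guess at common co-author names, not a standard:

```python
import re
import subprocess
from collections import defaultdict

# Assumed substrings identifying AI co-authors in trailers;
# adjust for whichever tools your team actually uses.
AI_PATTERNS = ("copilot", "claude", "cursor", "codex")


def commits_by_authorship() -> dict[str, list[str]]:
    """Bucket SHAs into ai_assisted / human_only via Co-Authored-By trailers."""
    log = subprocess.run(
        ["git", "log", "--format=%H%x01%B%x02"],  # \x01/\x02 as record delimiters
        capture_output=True, text=True, check=True,
    ).stdout
    buckets: dict[str, list[str]] = defaultdict(list)
    for record in log.split("\x02"):
        if "\x01" not in record:
            continue
        sha, body = record.split("\x01", 1)
        coauthors = re.findall(r"(?im)^co-authored-by:\s*(.+)$", body)
        ai = any(p in c.lower() for c in coauthors for p in AI_PATTERNS)
        buckets["ai_assisted" if ai else "human_only"].append(sha.strip())
    return buckets
```

Run each bucket through the turnover calculation above and the 1.5× comparison falls out directly. The known weakness is recall: any commit missing its trailer lands in the human bucket and flatters the AI numbers.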
The Bottom Line
DORA is not wrong. DORA is incomplete in a world where 41% of code is AI-written and the constraints have moved. Speed metrics alone now lie by omission.
The teams shipping reliably in 2026 are the ones that pair DORA with at least one quality metric (code turnover is the simplest), one developer-experience signal (DevEx survey or DX Core 4 Effectiveness score), and a way to separate AI-assisted commits from human-only commits. Without the third, you cannot diagnose the others.
GitIntel ships the attribution layer — the last piece — for free. It runs on any git repo, no login, no upload of code. The metrics above are only useful when you can put each commit in the right bucket.