AI Coded It. You Can't Debug It. Anthropic Has the Data.
Anthropic's Jan 2026 RCT: AI users scored 17 points lower on post-task skills tests than controls. 50% vs 67%. No measurable speed gain. The skill atrophy is documented and growing.
Published by GitIntel Research
TLDR
- Anthropic's Jan 2026 RCT (arxiv: 2601.20245): AI users scored 50% vs 67% on post-task skills quizzes — a 17-point gap.
- The biggest skill drops: code debugging and comprehension of why code fails — exactly what you need to supervise AI output.
- There was no statistically significant improvement in task completion time. Developers traded learning for zero speed gain.
- Three other independent studies corroborate the finding across students, senior engineers, and open-source contributors.
- The loop closes: 96% of developers don't fully trust AI code, but atrophied debugging skills make verification harder every month.
The skill gaps were not uniform. The largest drops were in code debugging and in comprehension of why code fails.
The irony: the skills that degraded most are exactly the skills required to supervise AI code. Developers are becoming less capable of catching the errors that AI tools most commonly produce.
The study also tracked task completion time. The AI group showed no statistically significant improvement over the control group. The most common explanation from participants: time saved writing was absorbed by time spent understanding, adjusting, and verifying the AI's output. Net speed delta: near zero. Net learning delta: deeply negative.
Three Studies That Say the Same Thing
The Anthropic paper doesn't stand alone. Three independent research groups reached the same conclusion through different methodologies.
Study 1
University of Maribor — 10-Week React RCT
32 undergraduates learning React over 10 weeks. The study found a significant negative correlation between LLM use for code generation and final exam grades. Students who used AI most heavily for writing code — rather than for explanation or concept exploration — performed worst on tests that measured transferable understanding.
Study 2
METR Uplift Study — 16 Experienced Open-Source Developers
Published February 24, 2026. 16 experienced contributors worked on their own open-source repositories — codebases they knew intimately — with and without AI. With AI, they took 19% longer. The explanation: AI suggestions introduced context-switching overhead. Developers spent significant time evaluating and second-guessing suggestions in code they would have written directly from memory.
Study 3
Microsoft + CMU — Critical Thinking Under AI Assistance
The more participants leaned on AI tools, the less critical thinking they engaged in. The pattern was dose-dependent: light AI users maintained near-baseline critical thinking metrics; heavy AI users showed measurable disengagement from the reasoning process. The study framed this as a "cognitive offloading" effect — normal for tools that take over mechanical tasks, but concerning when the offloaded tasks include understanding and judgment.
It's Not All AI. It's How You Use It.
The Anthropic study included a nuance that matters: not all AI users atrophied equally.
The study tracked two distinct usage patterns:
| Usage Pattern | Description | Avg Skills Score |
| --- | --- | --- |
| Code delegation | "Write this function for me" — passive acceptance of AI output | <40% |
| Conceptual use | "Explain why this pattern works" — AI as teacher, not author | 65%+ |
Passive AI use — accepting code without understanding it — produced the worst skill outcomes. Conceptual AI use — asking the AI to explain, clarify, or teach — preserved skills at near-control levels.
Practical implication: the atrophy risk isn't AI itself — it's the workflow pattern. A developer who uses AI to write code and then reads, questions, and understands it will retain far more skill than one who accepts, ships, and moves on.
The Verification Trap
Here's where the skill atrophy story gets structurally dangerous. The January 2026 Sonar State of Code Developer Survey (1,100+ developers) documented what it called the "Verification Gap":
- 96% of developers don't fully trust AI-generated code
- 48% actually verify it before committing
That 48-point gap between "I don't trust it" and "I checked it" exists for a reason. Verifying AI code takes time. It takes debugging skill. It takes the capacity to trace why code works, not just that it appears to.
As those skills erode, the gap doesn't close — it widens. The less you understand the code, the less capable you are of evaluating it, which means you either skip verification or it takes longer. Both outcomes compound the risk.
The Sonar survey found developers now spend an average of 24% of their work week — nearly a full day — checking and fixing AI output. Code generation accelerated. Code confidence didn't.
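The "nearly a full day" figure is easy to sanity-check. A minimal calculation, assuming a standard 40-hour week (an assumption; the survey does not state its baseline):

```python
# Back-of-envelope check on the Sonar figure: 24% of the work week
# spent checking and fixing AI output.
# The 40-hour week is an assumption, not stated by the survey.
HOURS_PER_WEEK = 40
VERIFICATION_SHARE = 0.24

verification_hours = HOURS_PER_WEEK * VERIFICATION_SHARE
print(f"{verification_hours:.1f} hours/week spent verifying AI output")
# ~9.6 hours: slightly more than one 8-hour working day
```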
What This Means for Engineering Teams
The atrophy data has four immediate implications for how teams structure AI adoption:
1. Junior developer onboarding is at highest risk
Junior developers building their mental models while primarily consuming AI-generated code may never develop the debugging intuition that comes from wrestling with failures. The Anthropic study's passive-use group — the one that scored below 40% — looks a lot like a new hire using Copilot to generate their first 500 PRs.
2. The senior engineer pool is the last firewall
If AI-assisted code generates 1.7x more issues per PR (CodeRabbit, 13M-PR study), and the junior and midlevel developers reviewing it have degraded debugging skills, the senior engineer becomes the only reliable correctness check in the pipeline. The review bottleneck is already the number-one pain point in enterprise engineering teams — this compounds it.
3. Onboarding velocity masks skill gaps
AI tools cut time-to-10th-PR in half. That metric looks great on engineering dashboards. But shipping PRs and understanding the codebase aren't the same thing. An engineer who merged 20 PRs using AI-generated code may be no more capable of debugging a production incident than on day one.
4. "Usage mode" is now an engineering policy decision
The study's finding that conceptual AI use preserves skills suggests teams should define what good AI use looks like — not just whether to allow it. Requiring developers to explain AI-generated code before it merges is a lightweight intervention with outsized impact on skill preservation.
Measuring It in Your Own Codebase
The atrophy risk scales with AI adoption rate. The more AI-generated code in your codebase, the more review depends on skills that may be quietly degrading.
GitIntel measures the AI attribution footprint in any git repository — which commits were AI-assisted, which files, which authors. That data gives engineering leaders a baseline: before running skills assessments or reviewing onboarding processes, know how much of your codebase was written by a model.
```bash
# Install GitIntel
curl -fsSL https://gitintel.com/install.sh | sh

# See your repo's AI attribution breakdown
cd your-repo
gitintel scan

# Example output:
# Total commits scanned: 500
# AI-attributed: 127 (25.4%)
# Top AI author: alice@company.com (64 commits)
# Top AI file: src/api/handlers.ts (38% AI)
```
Knowing where AI code concentrates is the first step to knowing where skill gaps are most likely to emerge — and where review coverage needs to be strongest.
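A rough version of that baseline can be computed from commit metadata alone. The sketch below counts commits whose messages carry common AI co-authorship trailers — a heuristic assumption based on public conventions (GitHub Copilot adds a `Co-authored-by` trailer), not GitIntel's actual detection method:

```python
# Heuristic AI-attribution share from commit messages.
# The trailer list is an assumption; real attribution needs more signals.
AI_TRAILERS = (
    "Co-authored-by: GitHub Copilot",
    "Co-authored-by: Copilot",
    "Generated-by:",
)

def ai_attributed_share(commit_messages: list[str]) -> float:
    """Fraction of commits whose message contains a known AI trailer."""
    if not commit_messages:
        return 0.0
    hits = sum(
        any(trailer in msg for trailer in AI_TRAILERS)
        for msg in commit_messages
    )
    return hits / len(commit_messages)

# In practice, feed this from `git log --format='%B%x00'`, split on NULs.
```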
How much of your codebase is AI-generated?
GitIntel scans your git history and surfaces AI attribution data — no data leaves your machine. Open source, MIT licensed.
Sources
- Anthropic Research, "How AI Assistance Impacts Skill Formation in Software Development," January 29, 2026 — arxiv.org/pdf/2601.20245
- METR Uplift Update, February 24, 2026 — metr.org/blog/2026-02-24-uplift-update
- Sonar State of Code Developer Survey 2026 — 1,100+ developers
- University of Maribor RCT, 10-week React study, 32 participants
- Microsoft + CMU critical thinking study (2025)
Data published March 29, 2026.