Research · April 22, 2026 · 9 min read

110,000 Unresolved Issues: The AI Technical Debt Study Nobody Audited

A new ArXiv study tracked 304,362 AI-authored commits across 6,275 GitHub repos. Surviving technical debt grew from ~200 issues to 110,000+ in 14 months.

Published by GitIntel Research

TLDR

  • ArXiv paper 2603.28592 scanned 304,362 verified AI-authored commits across 6,275 public GitHub repos
  • Surviving unresolved technical debt grew from ~200 issues (early 2025) to 110,000+ by February 2026
  • 24.2% of AI-introduced issues remain unresolved at HEAD — that's 37.25 surviving issues per 100 AI-authored commits
  • Static-analysis warnings and cognitive complexity rose ~18% and 39% respectively in AI-assisted repos
  • 76% of developers generate code they don't fully understand at least some of the time (Stack Overflow 2026)

A paper landed on ArXiv in March 2026 that most engineering teams missed. Titled "Debt Behind the AI Boom", it is the first large-scale longitudinal measurement of technical debt introduced by AI coding assistants in production repositories. The finding is precise and uncomfortable: in 14 months, surviving unresolved issues grew from roughly 200 to over 110,000 across the 6,275 repos in the dataset.

That number is not total issues created. It is the count that remains at HEAD — unfixed, merged, living in production code.

The Study

Researchers built a dataset of 304,362 verified AI-authored commits, identifying authorship through commit metadata, comment patterns, and tool signatures. The five assistants with more than 10,000 attributed commits each were GitHub Copilot, Claude, Cursor, Gemini, and Devin.

For each commit, they ran static analysis before and after the change. This let them isolate which code smells, bugs, and security issues the AI introduced rather than inherited. They then tracked whether those issues were subsequently resolved.
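The before/after attribution can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the paper's code: the `Issue` fingerprint (rule id, file path, normalized snippet) and the function names are hypothetical. Keying on the offending snippet rather than a line number lets an issue keep its identity when unrelated edits shift lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical issue fingerprint; real analyzers expose their own IDs."""
    rule: str     # analyzer rule identifier
    path: str     # file the issue was reported in
    snippet: str  # normalized offending code, used instead of a line number


def introduced_by(before: set[Issue], after: set[Issue]) -> set[Issue]:
    """Issues present after the commit but not before it: the commit's own delta."""
    return after - before


def still_at_head(introduced: set[Issue], head: set[Issue]) -> set[Issue]:
    """Introduced issues that still appear in the analysis of the current HEAD."""
    return introduced & head


if __name__ == "__main__":
    before = {Issue("unused-import", "app.py", "import os")}
    after = before | {
        Issue("duplicate-code", "app.py", "def parse_row(row): ..."),
        Issue("broad-except", "app.py", "except Exception: pass"),
    }
    head = {Issue("duplicate-code", "app.py", "def parse_row(row): ...")}

    new = introduced_by(before, after)
    alive = still_at_head(new, head)
    print(f"introduced={len(new)} surviving={len(alive)}")
```

The inherited `unused-import` finding never counts against the commit, which is the point of diffing the two analyses rather than counting totals.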

As of February 2026, 24.2% of AI-introduced issues survive at HEAD, which works out to 37.25 surviving issues per 100 AI-authored commits. That survival rate has not improved over the 14-month window: the absolute count grows because AI commit volume grows, while the share of issues that survive holds steady.
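The two headline figures are different ratios over the same data: the survival rate divides by issues introduced, while the per-100 figure divides by commits. A small sketch (the function names are ours, for illustration) makes that relationship explicit and backs out the introduction rate the two figures imply, assuming both were computed over the same commit set.

```python
def survival_rate(introduced: int, surviving: int) -> float:
    """Share of AI-introduced issues still present at HEAD."""
    return surviving / introduced


def surviving_per_100_commits(surviving: int, commits: int) -> float:
    """Surviving issues normalized per 100 AI-authored commits."""
    return 100 * surviving / commits


def implied_introduced_per_100(per_100: float, rate: float) -> float:
    """Back out issues introduced per 100 commits from the two headline ratios."""
    return per_100 / rate


if __name__ == "__main__":
    # Back-of-envelope check using the study's published ratios.
    print(round(implied_introduced_per_100(37.25, 0.242), 1))  # introduced per 100 commits
    print(round(0.3725 * 304_362))  # surviving issues implied across all 304,362 commits
```

Under that assumption, the reported ratios imply roughly 154 issues introduced per 100 AI-authored commits, and 37.25 surviving issues per 100 commits across 304,362 commits works out to about 113,000, consistent with the 110,000+ headline.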

A companion study at MSR 2026 identified 34 self-admitted technical debt topics in AI-generated code, grouped into 10 categories. AI agents predominantly document requirement- and design-related debt: TODO and FIXME comments that flag the output as incomplete even as the change goes ahead.

What Debt Types Accumulate Fastest

Not all debt is equal. The MSR 2026 Mining Challenge published multiple papers on this. Static-analysis warnings grew ~18% in AI-assisted repos over the study window. Cognitive complexity — a measurable proxy for how hard code is to understand — rose 39%.

A separate MSR 2026 paper on reverted AI changes analyzed 33,580 agentic pull requests across five coding agents. GitHub Copilot had the highest revert rate at 7.6%; OpenAI Codex had the lowest at 0.7%. The 2.66% average means roughly 1 in every 38 AI-generated PRs (1 / 0.0266 ≈ 37.6) is reverted after merge, meaning it passed review, landed in main, and still had to be undone.
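A team can estimate its own revert rate, assuming reverts are made with `git revert`, which by default writes `This reverts commit <sha>.` into the new commit's message. The sketch below is illustrative: the helper names are hypothetical, `candidates` is whatever set of merged commits you want to measure (for example, AI-assisted merges), and hand-written or squashed reverts that drop that line will be missed.

```python
import re
import subprocess

# `git revert` writes "This reverts commit <full sha>." into the message by default.
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


def load_messages(rev_range: str = "HEAD") -> dict[str, str]:
    """Map commit SHA -> full message for every commit reachable in rev_range."""
    out = subprocess.run(
        ["git", "log", "--format=%H%x1f%B%x1e", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    messages = {}
    for record in out.split("\x1e"):  # one record per commit
        if "\x1f" in record:
            sha, body = record.split("\x1f", 1)
            messages[sha.strip()] = body
    return messages


def reverted_shas(messages: dict[str, str]) -> set[str]:
    """SHAs (as written in revert messages) that some later commit reverted."""
    return {m.group(1) for body in messages.values() for m in REVERT_RE.finditer(body)}


def revert_rate(candidates: set[str], messages: dict[str, str]) -> float:
    """Fraction of candidate commits that were later reverted."""
    if not candidates:
        return 0.0
    reverted = reverted_shas(messages)
    # Prefix match in both directions so abbreviated SHAs still line up.
    hits = {c for c in candidates if any(r.startswith(c) or c.startswith(r) for r in reverted)}
    return len(hits) / len(candidates)
```

Typical use would be `revert_rate(ai_merge_shas, load_messages("main"))`, where `ai_merge_shas` is a set you build yourself.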

On the security side, Veracode's Spring 2026 GenAI Code Security Report (covered in detail in our security post) found that 45% of AI-generated code introduces a known security flaw. Read the ArXiv study's 24.2% survival rate alongside that number: if nearly half of AI code introduces a flaw and roughly a quarter of introduced issues stay unresolved, a naive product of the two (0.45 × 0.242 ≈ 0.11) puts surviving exposure on the order of one in nine AI-generated changes. The two studies use different samples and units, so treat that as a sense of scale rather than an estimate.

A DEV Community analysis draws on Ox Security's review of 300+ repositories, which identified ten recurring anti-patterns present in 80–100% of AI-generated code. The most common: duplicated logic that a human author would likely have extracted into a shared module the second time it was needed.
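Duplicated logic is one of the easier anti-patterns to surface mechanically. A minimal sketch of a naive approach, not any vendor's algorithm: hash every window of consecutive non-blank, non-comment lines (whitespace-stripped) and report windows that appear in more than one place. Real clone detectors work on tokens or syntax trees and are far more robust; the window size of 4 is an arbitrary assumption.

```python
import hashlib
from collections import defaultdict


def duplicate_windows(files: dict[str, str], window: int = 4) -> list[list[tuple[str, int]]]:
    """Group identical `window`-line chunks; each group lists (path, first line number)."""
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, text in files.items():
        # Keep original 1-based line numbers; drop blank lines and '#' comments.
        code = [
            (n, line.strip())
            for n, line in enumerate(text.splitlines(), start=1)
            if line.strip() and not line.strip().startswith("#")
        ]
        for i in range(len(code) - window + 1):
            chunk = "\n".join(src for _, src in code[i : i + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((path, code[i][0]))
    return [locs for locs in seen.values() if len(locs) > 1]
```

Because windows overlap, one long copied block shows up as several adjacent groups; merging those is the next refinement.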

Why Maintainers Aren't Catching It

The GitClear study of 211 million changed lines from 2020–2024 shows the structural reason: refactoring as a share of all code changes dropped from 25% in 2021 to under 10% in 2024. Code cloning rose from 8.3% to 12.3% in the same period. AI tools write new code faster than teams can review and refactor existing code.

Stack Overflow's 2026 developer survey found that 76% of developers using AI coding tools report generating code they don't fully understand at least some of the time. Trust in AI code accuracy dropped from 40% to 29% year-over-year, but adoption continues to climb. Developers are using tools they distrust.

The O'Reilly Radar piece on comprehension debt (a theme we covered in our post on AI comprehension debt) makes the structural argument: every time a developer approves AI-generated code without building a mental model of how it works, they add to a second kind of debt that static analysis cannot measure. The 110,000 surviving issues are the auditable surface. The comprehension gap is the unaudited one.

Pull request volume jumped 40% year-over-year in open-source repos, while merge rates declined. Maintainers are reviewing more while merging proportionally less — spending time explaining why AI-generated code doesn't fit the architecture instead of writing features.

The Counter-Argument

AI tools also eliminate debt in specific cases. The MSR 2026 build code study found that AI agents can meaningfully refactor build scripts and eliminate certain build smells, and most agent-generated build PRs are merged with minimal intervention. For well-scoped, bounded tasks with clear acceptance criteria, AI tools produce cleaner output than they do for open-ended feature work.

The 24.2% survival rate looks worse on complex feature commits and better on small, isolated changes. Teams running AI-generated code through structured review checklists and automated quality gates see lower persistence rates. The problem is not that AI code is always bad — it's that without deliberate guardrails, a predictable fraction of AI-introduced issues survives review.

What to Do About It

1. Track AI attribution in your repo. The ArXiv study was possible because researchers could identify AI-authored commits. Most teams cannot do this today. Add a consistent commit suffix or tag for AI-assisted code; [ai-assisted] is enough. This lets you measure your own surviving-issue rate rather than estimate it from a study average.

2. Run diff-based static analysis on AI commits specifically. The study's methodology — run analysis before and after, attribute the delta — is something any team can replicate with SonarQube, CodeClimate, or Semgrep. Set up a pipeline that shows the issue delta introduced per PR, not just the total count. AI PRs should have explicit issue-delta visibility.

3. Treat cognitive complexity as a merge gate. Static-analysis warnings catch known patterns. Cognitive complexity, which many static-analysis tools can report, catches the structural problem: code that is technically correct but hard to reason about. The 39% rise in cognitive complexity is a leading indicator that comprehension debt is compounding. Cap it at merge time.
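For step 1 (tracking attribution), once commits carry the `[ai-assisted]` marker, pulling them out of history is a small script. A sketch, assuming the marker can appear anywhere in the commit message; the function names are hypothetical, while `--grep`, `--fixed-strings`, `--regexp-ignore-case`, and `rev-list --count` are standard git options.

```python
import subprocess

MARKER = "[ai-assisted]"  # the tag suggested in step 1; any consistent string works


def is_ai_assisted(message: str) -> bool:
    """True if a commit message carries the marker (case-insensitive)."""
    return MARKER in message.lower()


def _git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout


def ai_assisted_shas(rev_range: str = "HEAD") -> list[str]:
    """SHAs whose message contains the marker, using git's own fixed-string grep."""
    return _git("log", "--fixed-strings", "--regexp-ignore-case",
                f"--grep={MARKER}", "--format=%H", rev_range).split()


def ai_share(rev_range: str = "HEAD") -> float:
    """Fraction of commits in rev_range that are tagged as AI-assisted."""
    total = int(_git("rev-list", "--count", rev_range).strip())
    return len(ai_assisted_shas(rev_range)) / total if total else 0.0
```

`--fixed-strings` matters here: without it, git would read the brackets in `[ai-assisted]` as regular-expression syntax instead of literal characters.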
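For step 2 (diff-based analysis), one way to get per-PR issue-delta visibility is to run the analyzer twice, once on the base branch and once on the PR head, and diff the two reports. A sketch, assuming Semgrep-style JSON reports (for example from a `semgrep --json` run) whose `results` entries carry `check_id` and `path` fields; SonarQube and CodeClimate export different shapes, so `issue_counts` would need adapting. The script name and the zero-new-issues default are our assumptions.

```python
import json
import sys
from collections import Counter


def issue_counts(report: dict) -> Counter:
    """Count findings per (rule, file) from a Semgrep-style {"results": [...]} report."""
    return Counter((r["check_id"], r["path"]) for r in report.get("results", []))


def issue_delta(base: dict, head: dict) -> Counter:
    """Findings the PR adds; Counter subtraction keeps only positive differences."""
    return issue_counts(head) - issue_counts(base)


def main(base_path: str, head_path: str, max_new: int = 0) -> int:
    with open(base_path) as f:
        base = json.load(f)
    with open(head_path) as f:
        head = json.load(f)
    delta = issue_delta(base, head)
    for (rule, path), n in sorted(delta.items()):
        print(f"+{n} {rule} in {path}")
    new_total = sum(delta.values())
    print(f"issue delta for this PR: +{new_total}")
    return 1 if new_total > max_new else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    if main(sys.argv[1], sys.argv[2]):
        sys.exit(1)  # non-zero exit fails the CI job
```

Run as `python issue_delta.py base.json head.json`. Counting per (rule, file) instead of per line keeps the delta stable when a PR shifts unrelated lines, at the cost of missing a fix and a new finding of the same rule in the same file cancelling out.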

Sources