Security · March 29, 2026 · 7 min read

AI Code Gets Smarter. It Doesn't Get Safer.

Veracode Spring 2026: syntax pass rates hit 95%, security pass rates flat at 45-55% since 2023. 35 CVEs from AI code in March alone. XSS fails 86% of the time. The divergence is now official.

Published by GitIntel Research

TLDR

The syntax curve has been one of the great success stories of the LLM era. Three years ago, AI coding tools frequently produced broken, non-compiling output. Today, that problem is essentially solved — 95% of generated code runs correctly on the first try.

The security curve tells a different story. It hasn't moved. The models learned to write working code. They didn't learn to write safe code. And with AI coding tool adoption accelerating — Stack Overflow's 2025 Developer Survey puts weekly usage at 65% of all developers — that flat line is compounding into a crisis.

Java Is the Worst. No Language Is Safe.

Veracode's Spring 2026 data breaks down security failure rates by language. Java is the clear outlier — but no ecosystem escapes unscathed:


Security failure rates (AI-generated code). Source: Veracode Spring 2026 GenAI Code Security Report.

Java's 70%+ failure rate likely reflects its verbose security patterns — things like input validation, prepared statements, and output encoding that require deliberate, multi-step construction. AI models tend to optimize for brevity, and Java security is the opposite of brief.
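The prepared-statement pattern is the same in every ecosystem, even if Java's version is wordier. A minimal sketch (in Python with sqlite3, purely for illustration) shows the parameterized form that string-built queries skip:

```python
import sqlite3

# In-memory database with one illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"

# The shape AI output often favors: string interpolation, injectable.
unsafe_query = f"SELECT role FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe_query).fetchall())  # returns the admin row

# Prepared-statement form: the driver binds the value, never the SQL.
safe_query = "SELECT role FROM users WHERE name = ?"
print(conn.execute(safe_query, (user_input,)).fetchall())  # returns []
```

The fix is one line, but it requires the deliberate multi-step construction (query template, then bound parameters) that brevity-optimizing models tend to collapse.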

The CVE Count Is Accelerating

Georgia Tech's SSLab launched the Vibe Security Radar in May 2025 to track CVEs directly attributable to AI coding tools in public vulnerability databases (CVE.org, NVD, GHSA, OSV). The trend line is not subtle:

CVEs from AI-generated code (2026 monthly confirmed)


35 confirmed CVEs in a single month. That's a 483% jump from January. Researcher Hanqing Zhao from Georgia Tech told The Register on March 26, 2026 that these are conservative figures: "Those 74 cases [combined Jan–Mar] are confirmed instances… the estimated real number is 400–700 cases" across the open-source ecosystem once unattributed cases are factored in.

Of the 35 March CVEs, 27 were attributed to Claude Code — a figure that correlates directly with volume. Claude Code added 30.7 billion lines of code to public repos in the past 90 days. More output means more surface area for vulnerabilities to slip through.

Where AI Fails: The Specific Patterns

The Veracode data identifies which vulnerability classes AI models consistently miss. The failure rates are striking because these aren't obscure edge cases — they're OWASP Top 10 staples that have been known and documented for decades:

| Vulnerability Class | CWE | AI Failure Rate |
| --- | --- | --- |
| Cross-Site Scripting | CWE-80 | 86% |
| Log Injection | CWE-117 | 88% |
| Privilege Escalation Paths | — | +322% |
| Design Flaws | — | +153% |
| Secrets Exposure | — | +40% |

AI failure rates for XSS/Log Injection: Veracode Spring 2026. Privilege escalation/design flaws/secrets: Apiiro Fortune 50 analysis (relative increase vs. human-written code).
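What makes the two worst-scoring classes striking is that each has a one-line mitigation that models simply omit. A sketch of the missing steps, using only the Python standard library (function names are illustrative):

```python
import html
import re

def render_comment(comment: str) -> str:
    """Output-encode user input before it reaches HTML (CWE-80)."""
    return f"<p>{html.escape(comment)}</p>"

def log_line(user_agent: str) -> str:
    """Strip CR/LF so user input can't forge log entries (CWE-117)."""
    return "request from " + re.sub(r"[\r\n]+", " ", user_agent)

print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
print(log_line("curl/8.0\n[ERROR] fake admin login"))
```

Neither step changes the code's observable behavior on benign input, which is exactly why a model optimizing for "working code" drops them.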

The privilege escalation and design flaw numbers from Apiiro's Fortune 50 analysis are particularly alarming because they aren't syntax errors — they're architectural mistakes that static analysis tools often miss. AI writes code that works correctly but is fundamentally structured in ways that expand attack surfaces.

Bigger Models Don't Fix the Problem

The intuitive assumption is that larger, more capable models will solve the security gap. Veracode's data says otherwise: 20B and 400B parameter models cluster around the same 55% security pass rate. Intelligence at scale doesn't automatically translate to security awareness.

```
Model size     Security pass rate
──────────────────────────────────
20B params     ~52%
70B params     ~54%
400B params    ~56%
GPT-5 (RL)     ~71%   ← extended reasoning exception
```

The one exception is GPT-5 with extended reasoning enabled, which reaches 70–72% — a meaningful improvement, though one that still leaves **28–30% of generated code vulnerable**. Extended reasoning forces the model to think through security implications step by step before generating output. The finding suggests the security gap isn't a knowledge problem — the models know about XSS and SQL injection. It's a generation priority problem: without explicit prompting, security gets optimized away in favor of brevity and correctness.

The Moltbook Breach: What This Looks Like in Production

In February 2026, Wiz researchers disclosed a breach of Moltbook — a social network built entirely via vibe coding. The vulnerable application exposed 1.5 million authentication tokens and 35,000 email addresses via a misconfigured database.

Attack chain

  1. AI generates database connection layer with overly permissive access patterns

  2. No input validation on user-facing endpoints (classic AI omission)

  3. Authentication tokens stored in queryable table without row-level security

  4. Single request exposes full token table to unauthenticated caller

None of these are exotic vulnerabilities. They're exactly the patterns Veracode documents as AI's blind spots. The Moltbook team wasn't incompetent — they just trusted AI output without running it through a security lens. According to Aikido Security's 2026 report, they're far from alone: 1 in 5 breaches is now caused by AI-generated code.

What This Means for Your Team

Know your AI surface area first

Veracode found that 82% of companies carry significant security debt — up from 74% a year ago. Before you can fix AI-introduced vulnerabilities, you need to know where your AI-generated code lives. You can't secure what you can't measure. GitIntel tells you exactly which commits came from AI tools, which files they touched, and how much of your codebase they wrote.

Treat AI code like third-party code

Your organization probably has stricter review processes for external libraries than for AI-generated code. That's backwards. A 45–55% security pass rate is worse than that of most third-party dependencies you'd ever ship. AI code should go through the same — or stricter — security review pipeline as external packages.

Prioritize by language

If your stack includes Java and you're using AI coding tools, you're operating at a 70%+ security failure rate. That's not a reason to stop using AI tools — it's a reason to build language-specific review checkpoints. Java security patterns (input sanitization, prepared statements, output encoding) need explicit human verification regardless of the AI tool used.

Extended reasoning changes the calculus

GPT-5's extended reasoning mode achieving 70–72% security pass rates (vs. 45–55% for standard generation) suggests a straightforward mitigation: for security-sensitive code paths, use extended/chain-of-thought prompting that explicitly asks the model to reason through vulnerabilities before generating output. It's not a silver bullet, but it shifts the baseline meaningfully.

How to Track This in Your Own Repos

Before you can audit AI-generated code for security, you need to identify it. GitIntel surfaces AI attribution from Co-Authored-By git trailers, giving you a per-file, per-commit breakdown:

Scan your repo for AI-generated commits

gitintel scan --format json

Output sample

```json
{
  "total_commits": 1247,
  "ai_commits": 183,
  "ai_percentage": 14.7,
  "agents": {
    "claude-code": 179,
    "devin": 3,
    "copilot": 1
  }
}
```

Once you know which commits are AI-generated, you can feed that list to your existing SAST tools with elevated priority — ensuring every AI-authored file gets the same scrutiny as a third-party dependency import.
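One lightweight way to wire that into CI is a gate that parses the JSON fields shown above and flags the build when the AI-authored share crosses a threshold. A sketch (the threshold is arbitrary, and this is not a built-in GitIntel feature):

```python
import json

THRESHOLD = 10.0  # arbitrary: flag repos >10% AI-authored

def gate(report_json: str, threshold: float = THRESHOLD) -> bool:
    """Return True if the repo passes, False if it needs security review."""
    report = json.loads(report_json)
    return report["ai_percentage"] <= threshold

sample = '{"total_commits": 1247, "ai_commits": 183, "ai_percentage": 14.7}'
if not gate(sample):
    print("AI-authored share exceeds threshold; route to SAST review")
```

In practice the "route to review" branch would tag the affected files for elevated-priority scanning rather than just print a message.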

Measure your AI surface area.

You can't secure what you can't see. Run gitintel scan to find every AI-generated commit in your repo.

# Install
curl -fsSL https://gitintel.com/install.sh | sh

# Scan any repo
cd your-repo
gitintel scan

View on GitHub

Open source (MIT) · Local-first · No data leaves your machine

Data sources: Veracode Spring 2026 GenAI Code Security Report · Georgia Tech SSLab Vibe Security Radar (March 26, 2026) · Aikido Security 2026 · Apiiro Fortune 50 Analysis · Stack Overflow Developer Survey 2025.

