Claude · April 14, 2026 · 6 min read

Anthropic's Mid-April Shifts: What the 5-Minute Cache TTL and Effort=High Default Actually Mean for Your Workflow

Two quiet changes to Claude Code — a shorter prompt cache TTL and a raised default effort level — are producing visible cost and behavior shifts for heavy users. Here's what changed, what Anthropic says, and how to adapt.

Published by GitIntel Research

What Actually Changed

Two changes landed in the first half of April 2026 that most users didn't notice until their token bill or quota meter started behaving differently.

Change 1: prompt cache TTL, 1 hour → 5 minutes. Anthropic shortened the default time-to-live (TTL) for Claude Code's prompt cache. Their stated reason is serving-side efficiency. The user-visible effect is simple: if your Claude Code session stays idle for more than five minutes between tool calls (a meeting, a test run, reading a long diff), the cached context expires. When you come back and ask the next question, the whole project context gets re-sent and counted as fresh, uncached input instead of a cheaper cache read.

For a small repo, this is invisible. For a large monorepo with a rich CLAUDE.md and many open files, this compounds fast. Developers running day-long sessions are reporting token bills 20–40% higher than two weeks ago for identical work patterns.
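To make the arithmetic concrete, here is a minimal sketch of what a single idle gap costs under each TTL. Everything in it is an assumption for illustration: the `CachePricing` class and `turn_input_cost` function are hypothetical names, the $5 per million tokens base price is a placeholder, and the 0.1x read and 1.25x write multipliers are modeled on Anthropic's published API prompt-caching pricing, which may not match how a subscription plan meters quota.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    # All values are assumptions for illustration, not quoted prices.
    base_input_per_mtok: float      # USD per million uncached input tokens
    read_multiplier: float = 0.1    # assumed: cache hit billed at 10% of base
    write_multiplier: float = 1.25  # assumed: cache (re)write billed at 125% of base


def turn_input_cost(context_tokens: int, idle_seconds: float,
                    ttl_seconds: float, pricing: CachePricing) -> float:
    """Input cost of the first turn after an idle gap."""
    per_token = pricing.base_input_per_mtok / 1_000_000
    if idle_seconds <= ttl_seconds:
        # Cache still warm: the context is read from cache.
        return context_tokens * per_token * pricing.read_multiplier
    # Cache expired: the whole context is processed and cached again.
    return context_tokens * per_token * pricing.write_multiplier


pricing = CachePricing(base_input_per_mtok=5.0)  # hypothetical base price
gap = 12 * 60                                    # a 12-minute break
old = turn_input_cost(150_000, gap, ttl_seconds=3600, pricing=pricing)
new = turn_input_cost(150_000, gap, ttl_seconds=300, pricing=pricing)
print(f"1-hour TTL: ${old:.4f}   5-minute TTL: ${new:.4f}")
```

Under these assumed numbers, the same 12-minute break on a 150,000-token context costs 12.5 times more once the TTL drops to five minutes. The absolute figures are placeholders; the ratio between a cache read and a cache rewrite is what drives the bill.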

Change 2: default effort level is now "high". Claude Code v2.1.94 (Apr 7) raised the default effort level from medium to high for API-key users and all Team/Enterprise/Bedrock/Vertex/Foundry tiers. Higher effort means the model thinks longer, writes more verbose plans, and is more likely to surface edge cases. It also consumes quota faster.

The two changes interact. Effort=high produces longer context and longer thinking traces. Shorter cache TTL means that larger context has to be re-sent more often. The net effect on heavy users is noticeable.
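The interaction can be sketched as a toy session model. This is a rough illustration, not a measurement: `billed_input` is a hypothetical helper, the gap pattern and per-turn growth figures are made up, and the multipliers are the same assumed 0.1x cache read and 1.25x cache write used in API pricing. Results are expressed in base-input-token equivalents so no dollar price is needed.

```python
def billed_input(gaps_seconds, start_context, growth_per_turn, ttl_seconds,
                 read_mult=0.1, write_mult=1.25):
    """Billed input for one session, in base-input-token equivalents.

    Assumes the session starts with its context already cached and that
    every turn appends `growth_per_turn` tokens (plans, thinking, tool output).
    """
    cached = start_context   # tokens currently held in the prompt cache
    context = start_context  # tokens sent on the next turn
    billed = 0.0
    for gap in gaps_seconds:
        if gap <= ttl_seconds:
            # Warm cache: read the cached prefix, write only the new tail.
            billed += cached * read_mult + (context - cached) * write_mult
        else:
            # Expired cache: the full context is written again.
            billed += context * write_mult
        cached = context
        context += growth_per_turn
    return billed


# Hypothetical session: short gaps mixed with a 10- and a 15-minute break.
gaps = [60, 600, 60, 900, 60]
MEDIUM_GROWTH, HIGH_GROWTH = 2_000, 4_000  # assumed tokens added per turn

for label, growth, ttl in [("medium effort, 1-hour TTL", MEDIUM_GROWTH, 3600),
                           ("medium effort, 5-min TTL", MEDIUM_GROWTH, 300),
                           ("high effort, 5-min TTL", HIGH_GROWTH, 300)]:
    print(label, round(billed_input(gaps, 100_000, growth, ttl)))
```

In this toy setup the shorter TTL accounts for most of the increase, and higher effort adds to it because every cache miss rewrites a larger context. Real sessions will differ; the point is only that the two settings multiply rather than add.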

The "Nerfing" Conversation

Starting around Apr 11, posts from well-known developers began claiming Opus 4.6 "feels dumber" — more verbose, less precise, worse at one-shot fixes. The word "nerfed" has shown up in a lot of threads.

Anthropic's public position is direct: the model weights have not been changed. What changed is the harness — the default effort level, cache behavior, and some UI-side adjustments in Claude Code itself.

This is worth taking at face value, and here's why. If you flip effort back to medium (via --effort medium or the in-session toggle), the verbosity drops back to what it was. That's consistent with Anthropic's explanation and inconsistent with "the weights got worse." The model is the same; the default interaction pattern is different.

What to Do If You Use Claude Code Heavily

If you're on the Max plan or running lots of Claude Code work:

  1. Use /compact before breaks. If you know you're about to step away for more than 5 minutes, compact the conversation first. You'll lose less to re-sending after cache expiry.
  2. Split long sessions. Multi-hour continuous sessions now have worse economics than two focused 45-minute sessions with a clean context in between.
  3. Consider effort=medium for mechanical work. File searches, simple edits, status checks, and template filling don't benefit from high effort. The default change helps complex reasoning but wastes tokens on routine tasks. Explicit effort selection gives back control.
  4. Watch your rate limit meter. Effort=high burns quota faster. If you're hitting limits earlier than usual, this is probably why.
  5. Don't treat cache TTL as permanent. Anthropic has adjusted TTL before. If enough heavy users push back, 5 minutes may not be the final setting.
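Tips 1 and 3 above boil down to two small decisions, which can be written as a sketch. The function names and the task categories are hypothetical, chosen to mirror the examples in the list, and the 300-second default reflects the current TTL, which may change. Nothing here calls Claude Code itself; it only returns the value you would pass to --effort or tells you whether to run /compact.

```python
# Task kinds the list above describes as mechanical (names are illustrative).
MECHANICAL_TASKS = {"file_search", "simple_edit", "status_check", "template_fill"}


def suggest_effort(task_kind: str) -> str:
    """Return 'medium' for routine work and 'high' for everything else."""
    return "medium" if task_kind in MECHANICAL_TASKS else "high"


def should_compact_before_break(expected_break_seconds: float,
                                cache_ttl_seconds: float = 300) -> bool:
    """True when the break is expected to outlast the prompt cache TTL."""
    return expected_break_seconds > cache_ttl_seconds


print(suggest_effort("simple_edit"))         # routine edit
print(suggest_effort("concurrency_bug"))     # anything not listed as mechanical
print(should_compact_before_break(15 * 60))  # a 15-minute meeting
```

Defaulting unknown task kinds to "high" keeps the new default behavior for anything you have not explicitly classified as routine.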

The Bigger Picture

The direction of travel is clear: Anthropic is tuning the serving layer more aggressively now that Claude Code is running at production scale. Mythos-tier models are expensive to serve. Max-plan subscribers are growing fast. Every minute of cache lifetime and every notch of default effort shows up in the infrastructure bill.

For developers, this means the economics of AI-assisted coding are becoming more explicit. You're no longer just picking a model — you're picking a cache behavior, an effort level, a session shape. The teams that understand these knobs will get more out of the same plan than teams that don't.

This is also, quietly, a maturity signal. A year ago, prompt caching TTL wasn't a thing most developers thought about. Now it's a live variable in day-to-day workflow decisions. That's a sign the field is starting to behave like real infrastructure rather than a novelty.
