Summary
When using an explicit `cache_control` breakpoint on the system block, with no top-level `cache_control` field (i.e., not opted into automatic caching), the server writes an additional cache entry inside the user content on warm calls. These additional writes are billed at the 1.25x cache-write rate but are never read back on subsequent calls, since they lie beyond the only explicit breakpoint and the trailing content varies per request.
This contradicts the documented behavior at https://platform.claude.com/docs/en/build-with-claude/prompt-caching:

> Cache writes happen only at your breakpoint. Marking a block with `cache_control` writes exactly one cache entry: a hash of the prefix ending at that block. The system does not write entries for any earlier position.
I have confirmed this reproduces against both the latest SDK (0.102.0) and older versions (0.79.0), ruling out the Python SDK as the cause. The behavior is server-side.
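To quantify the billing impact, here is a rough per-call cost model in base input-token equivalents. The 1.25x write and 0.1x read multipliers are assumptions taken from Anthropic's published pricing for 5-minute ephemeral caching, and the usage numbers are the warm-call figures from the Expected/Actual sections below; treat this as an illustrative sketch, not an official formula.

```python
# Rough cost model in base input-token equivalents.
# Assumed multipliers (from Anthropic's pricing docs, not verified here):
# cache writes bill at 1.25x the base rate, cache reads at 0.1x.
def billed_token_equivalents(uncached, cache_write, cache_read):
    return uncached + 1.25 * cache_write + 0.1 * cache_read

# Warm-call usage from "Actual behavior" vs. "Expected behavior" below:
actual = billed_token_equivalents(uncached=8, cache_write=1416, cache_read=2227)
expected = billed_token_equivalents(uncached=1424, cache_write=0, cache_read=2227)
print(f"actual={actual:.1f} expected={expected:.1f}")  # actual=2000.7 expected=1646.7
```

Under these assumed rates, the unrequested writes make each warm call cost more than it would if those tokens were simply left uncached.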
Reproduction
````python
import uuid

import anthropic

# Nonce keeps Call 1 cold across re-runs of this script.
SYSTEM_PROMPT = f"<nonce>{uuid.uuid4().hex}</nonce>\n" + "FACT: The capital of France is Paris.\n" * 200
USER_BLOCK_1 = (
    "Ignore this gibberish\n" * 200
    + "\nGenerate some fictional JSON data about a person from france who is "
)
USER_BLOCK_2 = " years old"
STARTING_AGE = 20


def print_cache_usage(usage):
    print(
        f"cache_creation={usage.cache_creation_input_tokens} "
        f"cache_read={usage.cache_read_input_tokens} "
        f"uncached_input={usage.input_tokens}"
    )


client = anthropic.Anthropic()

# Call 1 (cold)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 1: ", end="")
print_cache_usage(response.usage)

# Call 2 (warm)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE + 10) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 2: ", end="")
print_cache_usage(response.usage)

# Call 3 (warm)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE + 20) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 3: ", end="")
print_cache_usage(response.usage)
````
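When running the script, the anomaly can also be flagged programmatically from the usage counters alone. `warm_call_wrote_to_cache` below is a hypothetical helper written for this report, not part of the SDK:

```python
from types import SimpleNamespace

def warm_call_wrote_to_cache(usage) -> bool:
    """Flag a call that hit the explicit breakpoint (cache_read > 0)
    yet still reports new cache writes, which the quoted docs say
    should not happen with a single breakpoint."""
    return usage.cache_read_input_tokens > 0 and usage.cache_creation_input_tokens > 0

# Usage counters reported for Call 2 in "Actual behavior" below:
call2 = SimpleNamespace(input_tokens=8, cache_creation_input_tokens=1416, cache_read_input_tokens=2227)
print(warm_call_wrote_to_cache(call2))  # True
```

Against the repro above, this returns False for Call 1 (cold, no cache read) and True for Calls 2 and 3.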
Expected behavior
Per the documentation, since there is exactly one `cache_control` marker (on the system block), only one cache entry should be written, and warm calls should show `cache_creation=0`:

```
Call 1: cache_creation=2227 cache_read=0 uncached_input=1424
Call 2: cache_creation=0 cache_read=2227 uncached_input=1424
Call 3: cache_creation=0 cache_read=2227 uncached_input=1424
```
Actual behavior
Warm calls write an additional ~1416 tokens to cache, even though no second breakpoint exists. These writes are billed at 1.25x but never produce a cache read in subsequent calls:

```
Call 1: cache_creation=2227 cache_read=0 uncached_input=1424
Call 2: cache_creation=1416 cache_read=2227 uncached_input=8
Call 3: cache_creation=1416 cache_read=2227 uncached_input=8
```
There is currently no documented way to opt out.
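Back-of-the-envelope on the overhead, assuming the 1.25x write multiplier from the pricing docs: the 1416 unrequested write tokens would have billed as 1416 plain uncached tokens anyway, so the net surcharge is the 0.25x write premium on top.

```python
extra_write_tokens = 1416   # unrequested cache writes observed per warm call
write_multiplier = 1.25     # assumed cache-write billing rate
surcharge = extra_write_tokens * (write_multiplier - 1.0)
print(surcharge)  # 354.0 extra token-equivalents per warm call, never read back
```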
Request
- Confirm whether this is intended behavior.
- If intended: please update the prompt-caching docs (the quoted paragraph is misleading) and provide an opt-out mechanism (header or request field) so users on explicit caching can prevent unrequested cache writes.
- If unintended: please fix the warm-call cache writes server-side.
Environment
- anthropic Python SDK: 0.102.0 (also reproduced on 0.79.0)
- Model: claude-sonnet-4-5-20250929
- Python: 3.12.3
- OS: Ubuntu 24.04
- API: Direct Anthropic API (not Bedrock/Vertex)