[BUG] Server writes additional cache entries past explicit cache_control breakpoint, billing 1.25x for tokens that are never read #1547

@salman-a-shah

Description

Summary

When using an explicit cache_control breakpoint on the system block, with no top-level cache_control field (i.e., not opted into automatic caching), the server writes an additional cache entry inside the user content on warm calls. These additional writes are billed at the 1.25x cache-write rate but are never read back on subsequent calls, since they live past the only explicit breakpoint and the trailing content varies per request.

This contradicts the documented behavior at https://platform.claude.com/docs/en/build-with-claude/prompt-caching:

Cache writes happen only at your breakpoint. Marking a block with cache_control writes exactly one cache entry: a hash of the prefix ending at that block. The system does not write entries for any earlier position.

I have confirmed this reproduces against both the latest SDK (0.102.0) and older versions (0.79.0), ruling out the Python SDK as the cause. The behavior is server-side.
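Why those extra writes can never produce a hit can be sketched with a toy prefix-hash model (illustrative only; the server's actual cache keying scheme is an assumption here):

```python
import hashlib

# Toy model of prefix caching: an entry is keyed by a hash of the full
# prompt prefix ending at a given block. (Illustrative only; the server's
# real keying scheme is an assumption.)
def cache_key(prefix_blocks):
    return hashlib.sha256("\x00".join(prefix_blocks).encode()).hexdigest()

SYSTEM = "FACT: The capital of France is Paris.\n" * 200   # stable across calls
USER_1 = "Ignore this gibberish\n" * 200                   # stable across calls
def tail(age):                                             # varies per call
    return f"{age} years old"

# The only explicit breakpoint is on the system block, so the one entry
# the caller asked for covers just [SYSTEM]:
requested = cache_key([SYSTEM])

# An extra entry written past that breakpoint necessarily covers the
# varying tail, so no later call can ever match it:
extra_call2 = cache_key([SYSTEM, USER_1, tail(30)])
extra_call3 = cache_key([SYSTEM, USER_1, tail(40)])

assert requested == cache_key([SYSTEM])   # the requested entry is reusable
assert extra_call2 != extra_call3         # the extra writes never match again
```

This is exactly the pattern in the repro below: the stable system prefix hits on warm calls, while anything cached past it is dead weight.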

Reproduction

import uuid
import anthropic

# Nonce keeps Call 1 cold across re-runs of this script.
SYSTEM_PROMPT = f"<nonce>{uuid.uuid4().hex}</nonce>\n" + "FACT: The capital of France is Paris.\n" * 200
USER_BLOCK_1 = (
    "Ignore this gibberish\n" * 200 + "\nGenerate some fictional JSON data about a person from france who is "
)
USER_BLOCK_2 = " years old"
STARTING_AGE = 20


def print_cache_usage(usage):
    print(
        f"cache_creation={usage.cache_creation_input_tokens} "
        f"cache_read={usage.cache_read_input_tokens} "
        f"uncached_input={usage.input_tokens}"
    )


client = anthropic.Anthropic()


# Call 1 (cold)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 1: ", end="")
print_cache_usage(response.usage)


# Call 2 (warm)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE + 10) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 2: ", end="")
print_cache_usage(response.usage)


# Call 3 (warm)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=32,
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": USER_BLOCK_1},
                {"type": "text", "text": str(STARTING_AGE + 20) + USER_BLOCK_2},
            ],
        },
        {"role": "assistant", "content": "```json"},
    ],
)
print("Call 3: ", end="")
print_cache_usage(response.usage)

Expected behavior

Per the documentation, since there is exactly one cache_control marker (on the system block), only one cache entry should be written. Warm calls should show cache_creation=0:

Call 1: cache_creation=2227 cache_read=0    uncached_input=1424
Call 2: cache_creation=0    cache_read=2227 uncached_input=1424
Call 3: cache_creation=0    cache_read=2227 uncached_input=1424

Actual behavior

Warm calls write an additional ~1416 tokens to cache, even though no second breakpoint exists. These writes are billed at 1.25x but never produce a cache read in subsequent calls:

Call 1: cache_creation=2227 cache_read=0    uncached_input=1424
Call 2: cache_creation=1416 cache_read=2227 uncached_input=8
Call 3: cache_creation=1416 cache_read=2227 uncached_input=8

There is currently no documented way to opt out.
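For scale, the net billing delta of the unrequested writes can be estimated. The pricing figure below is an assumption (Sonnet-class base input rate); verify against the current price sheet before quoting it:

```python
# Estimated billing delta of the unrequested warm-call writes.
# Assumption: $3.00 per million base input tokens (verify against the
# current Anthropic price sheet before relying on this figure).
BASE_PER_MTOK = 3.00
CACHE_WRITE_MULTIPLIER = 1.25

extra_write_tokens = 1416  # cache_creation on warm calls above

# Those 1416 tokens would otherwise be billed at the plain input rate
# (they appear as uncached_input in the expected output), so the net
# surcharge is the 0.25x write premium on them:
surcharge_per_call = (
    extra_write_tokens / 1_000_000 * BASE_PER_MTOK * (CACHE_WRITE_MULTIPLIER - 1.0)
)
print(f"surcharge per warm call: ${surcharge_per_call:.6f}")
```

Fractions of a cent per call, but it compounds linearly with call volume and with larger user-content prefixes, and it is pure overhead: the written entries are never read.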

Request

  1. Confirm whether this is intended behavior.
  2. If intended: please update the prompt-caching docs (the quoted paragraph is misleading) and provide an opt-out mechanism (header or request field) so users on explicit caching can prevent unrequested cache writes.
  3. If unintended: please fix the server-side behavior so that only explicitly requested breakpoints produce cache writes.

Environment

  • anthropic Python SDK: 0.102.0 (also reproduced on 0.79.0)
  • Model: claude-sonnet-4-5-20250929
  • Python: 3.12.3
  • OS: Ubuntu 24.04
  • API: Direct Anthropic API (not Bedrock/Vertex)
