Skip to content

Cost optimization: prompt caching & context management strategies #49

@Rustam-Z

Description

@Rustam-Z

Overview

A set of strategies to reduce API costs and manage context efficiently. Combining 1 + 4 gives the best results.


Strategies

1. Prompt Caching (Anthropic API)

Repeated prefixes are charged at 10% the normal rate. Cache the stable system prompt + memory files.

  • Impact: Often 70–90% cost reduction on long-context conversations.
  • Priority: Biggest single win.

2. Sliding Window with Sticky Head

Keep the system prompt + most recent N messages, drop the middle. Simple and effective.

3. Compaction

When context exceeds a threshold, summarize older messages into a shorter representation.

  • Loses some fidelity but caps the bill.

4. Memory Files + Retrieval

Extract important facts to disk, query on demand instead of keeping in context.

  • This is what Claude Code does internally.

5. Tool Result Truncation

If tools return large outputs (web fetches, file reads), summarize before adding to context. Don't store raw HTML.

6. Two-Tier Model

Use Haiku for context-management / summarization tasks, Opus only for actual responses.

  • ~10–20× cheaper for the summarization step.

Recommendation

Start with 1 (prompt caching) + 4 (memory files + retrieval) for the best cost/complexity trade-off.

Metadata

Metadata

Assignees

No one assigned

    Projects

    Status
    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions