Cost optimization: prompt caching & context management strategies

## Overview

A set of strategies to reduce API costs and manage context efficiently. Combining **1 + 4** gives the best results.

---

## Strategies

### 1. Prompt Caching (Anthropic API)
Repeated prefixes are charged at 10% the normal rate. Cache the stable system prompt + memory files.
- **Impact:** Often 70–90% cost reduction on long-context conversations.
- **Priority:** Biggest single win.

### 2. Sliding Window with Sticky Head
Keep the system prompt + most recent N messages, drop the middle. Simple and effective.

### 3. Compaction
When context exceeds a threshold, summarize older messages into a shorter representation.
- Loses some fidelity but caps the bill.

### 4. Memory Files + Retrieval
Extract important facts to disk, query on demand instead of keeping in context.
- This is what Claude Code does internally.

### 5. Tool Result Truncation
If tools return large outputs (web fetches, file reads), summarize before adding to context. Don't store raw HTML.

### 6. Two-Tier Model
Use Haiku for context-management / summarization tasks, Opus only for actual responses.
- ~10–20× cheaper for the summarization step.

---

## Recommendation

Start with **1 (prompt caching) + 4 (memory files + retrieval)** for the best cost/complexity trade-off.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cost optimization: prompt caching & context management strategies #49

Overview

Strategies

1. Prompt Caching (Anthropic API)

2. Sliding Window with Sticky Head

3. Compaction

4. Memory Files + Retrieval

5. Tool Result Truncation

6. Two-Tier Model

Recommendation

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Cost optimization: prompt caching & context management strategies #49

Description

Overview

Strategies

1. Prompt Caching (Anthropic API)

2. Sliding Window with Sticky Head

3. Compaction

4. Memory Files + Retrieval

5. Tool Result Truncation

6. Two-Tier Model

Recommendation

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions