```bash
# Install dependencies
pip install -r requirements.txt

# See example test cases
python demo.py

# Dry run to see what would be tested
python benchmark.py --no-api --num-samples 10

# Run unit tests
python test_benchmark.py
```

- Copy the example environment file:
  ```bash
  cp .env.example .env
  ```

- Edit `.env` and add your API key(s)
- Run the benchmark:
```bash
# With OpenAI
python benchmark.py --provider openai --num-samples 20

# With Anthropic
python benchmark.py --provider anthropic --num-samples 20

# With specific model
python benchmark.py --provider openai --model gpt-4 --num-samples 10
```

```bash
# Quick test with 5 samples
python benchmark.py --num-samples 5

# Test only words with 4+ letter occurrences
python benchmark.py --min-letter-count 4 --num-samples 15

# Test longer words (10-20 characters)
python benchmark.py --min-word-length 10 --max-word-length 20 --num-samples 10

# Dry run to preview test cases
python benchmark.py --no-api --min-letter-count 4
```

Results are saved to `results/benchmark_results_TIMESTAMP.json` with:
- Each test case and its result
- The LLM's response
- Whether it was correct
- Summary statistics
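The saved records can be aggregated after a run without calling any API again. A minimal sketch, assuming the file holds a JSON list of records shaped like the example result shown in this README (`summarize` is a hypothetical helper, not part of the repo):

```python
import glob
import json

def summarize(results):
    """Count correct answers in a list of per-case result records."""
    correct = sum(1 for r in results if r["correct"])
    return correct, len(results)

# Load the most recent results file, if one exists
# (path layout assumed from the description above).
paths = sorted(glob.glob("results/benchmark_results_*.json"))
if paths:
    with open(paths[-1]) as f:
        records = json.load(f)
    correct, total = summarize(records)
    print(f"Accuracy: {correct}/{total}")
```

The same helper works on any slice of the records, e.g. to compare accuracy per model or per letter.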
Example result:

```json
{
  "word": "strawberry",
  "letter": "R",
  "expected_count": 3,
  "llm_count": 2,
  "correct": false,
  "response": "2",
  "model": "gpt-3.5-turbo"
}
```
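The `expected_count` field is the ground-truth number of occurrences, which a case-insensitive count reproduces. A minimal sketch of that check (the benchmark's own counting code may differ):

```python
def count_letter(word: str, letter: str) -> int:
    # Case-insensitive occurrence count, matching the example's
    # "strawberry" / "R" ground truth of 3.
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "R"))  # -> 3
```

A record is marked `correct` when the parsed `llm_count` equals this value; in the example above the model answered 2, so `correct` is `false`.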