refactor: replace file-based HTTP cache with SQLite backend by baszalmstra · Pull Request #5501 · prefix-dev/pixi

baszalmstra · 2026-02-13T13:31:16Z

Description

Replaces the default file-based CACacheManager with a new SqliteCacheManager that stores all cached HTTP responses in a single SQLite database file instead of many small files on disk.

Motivation:

File-based caching creates many small files, which performs poorly on HPC clusters, network filesystems, and Windows.
A single SQLite database file is more efficient in these environments.
It reduces filesystem overhead and improves concurrent access patterns.

Implementation Details:

New SqliteCacheManager implements the CacheManager trait from http_cache_reqwest
Database uses WAL journal mode for good concurrent read performance
Sets synchronous = NORMAL since this is a cache and data loss on crash is acceptable
Response body stored as raw BLOB (no serialization overhead)
Response metadata (headers, status, url, version) and cache policy stored as JSON columns
Includes 5-second busy timeout for concurrent process coordination
Parent directory is created automatically if it doesn't exist
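
For illustration, the connection settings listed above can be sketched with Python's standard sqlite3 module. This is not the PR's Rust code: the function name open_cache and the demo path are hypothetical, and only the pragma values (WAL, synchronous = NORMAL, 5-second busy timeout) come from the description.

```python
import os
import sqlite3
import tempfile


def open_cache(path):
    """Open (or create) a cache database with the settings described above."""
    # Create the parent directory if it does not exist yet.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode = WAL")
    # NORMAL skips some fsyncs; acceptable for a cache that can be refetched.
    conn.execute("PRAGMA synchronous = NORMAL")
    # Wait up to 5 seconds when another connection holds a lock.
    conn.execute("PRAGMA busy_timeout = 5000")
    return conn


# Hypothetical location, used only for this demonstration.
demo_dir = tempfile.mkdtemp()
conn = open_cache(os.path.join(demo_dir, "nested", "http_cache.sqlite"))
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Reading the pragmas back, as the last three lines do, is a cheap way to confirm the settings took effect on a given filesystem.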

Fixes #5439

How Has This Been Tested?

The change integrates with existing HTTP caching infrastructure. The CacheManager trait implementation ensures compatibility with the http_cache_reqwest library's cache layer. Existing code paths that use HTTP caching will automatically use the new SQLite backend without modification.

Further testing should be done manually and in CI.

AI Disclosure

Written by Claude Code Opus 4.6 Extended.

Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas

nichmor · 2026-02-16T10:17:37Z

Should we also update the pixi clean command to remove the SQLite file?

The PyPI mapping system was using cacache (CACacheManager) which creates many small files on disk. This works poorly on HPC and network filesystems where metadata operations on many small files are expensive.

Replace CACacheManager with a new SqliteCacheManager that stores all HTTP cache entries in a single SQLite database file. The implementation:
- Uses WAL journal mode for good concurrent read performance
- Sets synchronous=NORMAL since this is a cache (crash data loss is OK)
- Configures a 5s busy_timeout for concurrent process access
- Serializes HttpResponse + CachePolicy together as JSON blobs
- Fully respects HTTP cache semantics (same CacheManager trait)

The SQLite database is stored at: ~/.cache/pixi/conda-pypi-mapping/http_cache.sqlite

https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

bincode serializes the response body as raw bytes, avoiding the base64 overhead that serde_json would introduce for the Vec<u8> body field. This also matches what the original CACacheManager used. https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

… columns

Instead of serializing the entire HttpResponse+CachePolicy as a single blob, split the schema into three columns:
- body: raw BLOB (no serialization overhead for response bytes)
- response_meta: JSON (headers, status, url, version)
- policy: JSON (HTTP cache policy)

This avoids any encoding overhead for the response body and keeps the metadata human-readable for debugging.

https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1
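
A minimal sketch of the three-column layout, again in Python's standard sqlite3 module rather than the PR's Rust. The table name cache, the key column, the helper names put and get, and the sample metadata and policy values are all assumptions made for illustration; only the three column names and their roles come from the commit message.

```python
import json
import sqlite3

# Hypothetical table; the column names follow the commit message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache (
    key           TEXT PRIMARY KEY,
    body          BLOB NOT NULL,
    response_meta TEXT NOT NULL,
    policy        TEXT NOT NULL
)
"""


def put(conn, key, body, meta, policy):
    # INSERT OR REPLACE overwrites any existing entry for the same key.
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, body, response_meta, policy) "
        "VALUES (?, ?, ?, ?)",
        (key, body, json.dumps(meta), json.dumps(policy)),
    )
    conn.commit()


def get(conn, key):
    row = conn.execute(
        "SELECT body, response_meta, policy FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None  # cache miss
    body, meta, policy = row
    return bytes(body), json.loads(meta), json.loads(policy)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

body = bytes(range(256))  # every byte value, including NUL, stored untouched
meta = {
    "status": 200,
    "url": "https://example.invalid/a",
    "headers": {"etag": "abc"},
    "version": "HTTP/1.1",
}
policy = {"max_age": 60}  # placeholder; a real cache policy carries more fields
put(conn, "GET:https://example.invalid/a", body, meta, policy)
hit = get(conn, "GET:https://example.invalid/a")
miss = get(conn, "GET:https://example.invalid/missing")
```

Because the body is bound as a BLOB parameter, it is written byte for byte, while the two JSON columns stay readable from any SQLite shell.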

Move the SQLite-backed CacheManager out of pypi_mapping into a standalone crate at crates/http_cache_sqlite. This implementation is not pixi-specific and can be reused by any consumer of http-cache-reqwest that wants a single-file SQLite cache instead of many small files. https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

Tests cover:
- get on missing key returns None
- put then get roundtrips body, status, and headers
- put overwrites existing entries
- delete removes entries
- delete on nonexistent key is ok
- multiple keys are independent
- response headers are preserved
- binary body (all 256 byte values including null)
- empty body
- data persists across reopen of the database
- parent directories are created automatically

https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

The workspace clippy config disallows std::fs methods. Switch create_dir_all to fs_err::create_dir_all for better error messages. https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

Store the SQLite database directly as ~/.cache/pixi/conda-pypi-mapping.sqlite instead of nesting it inside a subdirectory. Simpler and avoids creating an extra directory just for one file. https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

baszalmstra · 2026-02-17T11:51:49Z

Comparing the new SqliteCacheManager against the previous CACacheManager (file-based, backed by cacache) for the PyPI mapping HTTP cache.

Results

Operation	SQLite	CACacheManager	Speedup
put	82 µs	2.85 ms	35x
get (hit)	11.8 µs	284 µs	24x
get (miss)	3.5 µs	43.2 µs	12x
put (overwrite)	114 µs	3.68 ms	32x
delete	218 µs	4.15 ms	19x
get (hit, 500 keys)	20.2 µs	296 µs	15x
cold start (init + first put)	8.2 ms	6.4 ms	0.8x

Summary

SQLite is 12–35x faster across all steady-state operations. The only case where cacache is faster is cold start (cacache is ~1.3x faster, shown as a 0.8x speedup in the table), which involves creating a new database file, setting WAL/sync pragmas, and creating the table. This cost is paid once per process and is negligible in practice.
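
The speedup column is the CACacheManager time divided by the SQLite time. A short Python check of that arithmetic, using only the numbers from the table (converted to microseconds):

```python
# (operation, SQLite time in µs, CACacheManager time in µs), copied from the table.
timings_us = [
    ("put", 82.0, 2850.0),
    ("get (hit)", 11.8, 284.0),
    ("get (miss)", 3.5, 43.2),
    ("put (overwrite)", 114.0, 3680.0),
    ("delete", 218.0, 4150.0),
    ("get (hit, 500 keys)", 20.2, 296.0),
    ("cold start", 8200.0, 6400.0),
]

# Speedup > 1 means SQLite is faster; < 1 means cacache is faster.
speedups = {name: cacache / sqlite for name, sqlite, cacache in timings_us}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

For cold start the ratio is below 1, and its inverse (8.2 ms / 6.4 ms) gives the ~1.3x figure quoted in the summary.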

The lock is never held across an await point, so a synchronous std::sync::Mutex is sufficient and avoids the overhead of tokio::sync::Mutex for lock acquisition.

Set mmap_size to 32 MB so SQLite can use memory-mapped I/O for read operations. This is a cap, not a pre-allocation — the OS maps only what the file actually uses and silently falls back to read() if mmap is unavailable.
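
As an illustration of the pragma involved, here is a sketch using Python's standard sqlite3 module (not the PR's Rust code). The constant name and the temporary database path are hypothetical. The value SQLite reports back may be lower than the request, or absent, depending on how the library was compiled.

```python
import os
import sqlite3
import tempfile

MMAP_CAP_BYTES = 32 * 1024 * 1024  # 32 MB upper bound, not a pre-allocation

# Hypothetical database file, used only for this demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "http_cache.sqlite")
conn = sqlite3.connect(db_path)

# Request the cap; SQLite answers with the limit it will actually use.
row = conn.execute(f"PRAGMA mmap_size = {MMAP_CAP_BYTES}").fetchone()
effective = row[0] if row else None
print("effective mmap_size:", effective)
```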

…und-trip Add From conversions between local HttpVersion and upstream http_cache::HttpVersion, then build/deconstruct HttpResponse by accessing its fields directly. This eliminates two serde_json::Value round-trips per cache get/put.

Reuse compiled SQL statements across calls by using prepare_cached instead of execute, matching what we already do for get.

baszalmstra · 2026-02-17T13:50:47Z

@nichmor I fixed the clean situation as well.

ruben-arts · 2026-03-03T13:59:01Z

I'm very hesitant to merge this as it adds a huge dependency to the cargo workspace: https://crates.io/crates/libsqlite3-sys.

This will compile the libsqlite3 package on cargo build, which takes 22 seconds on my Mac M4 Pro. Developing on Pixi is already cumbersome because of the compile times, especially the non-incremental ones. This would be the second-largest dependency 😢.

Here is a little overview of cargo build --timings in a clean build:

I would like to challenge you to figure out if we can avoid the use of this dependency, if we could use a different strategy to solve the given issue, or make use of sqlite in other parts of pixi's caching to make it a more impactful introduction to the overall project.

nichmor · 2026-03-03T14:04:48Z

what have you used for plotting?

ruben-arts · 2026-03-03T14:40:50Z

cargo build --timings drops this html file in: pixi/target/pixi/cargo-timings/cargo-timing-20260303T134441.256099Z.html

baszalmstra · 2026-03-17T16:26:22Z

Closing this for now.

baszalmstra requested review from nichmor and ruben-arts February 13, 2026 13:31

nichmor reviewed Feb 16, 2026

Comment thread crates/http_cache_sqlite/src/lib.rs Outdated

Comment thread crates/http_cache_sqlite/src/lib.rs Outdated

baszalmstra force-pushed the claude/optimize-pypi-caching-2uVVC branch from 48197d0 to eac6277 Compare February 17, 2026 08:58

claude and others added 9 commits February 17, 2026 09:58

style: fix clippy warning — use fs_err instead of std::fs

81367a8

fix: remove broken doc link to CACacheManager

b3ab5eb

https://claude.ai/code/session_01XykR7AMvHDmUnrhnzptwW1

use tokio mutex

5112929

baszalmstra force-pushed the claude/optimize-pypi-caching-2uVVC branch from eac6277 to b3ab5eb Compare February 17, 2026 08:59

fix: make sure database is cleaned

587ce2f

baszalmstra force-pushed the claude/optimize-pypi-caching-2uVVC branch from 81b3600 to 587ce2f Compare February 17, 2026 09:23

baszalmstra added 4 commits February 17, 2026 13:50

perf: use std::sync::Mutex instead of tokio::sync::Mutex

d71bd42

perf: enable memory-mapped I/O for SQLite cache reads

b9cbf7c

perf: use prepare_cached for put and delete statements

1e4a75b

baszalmstra force-pushed the claude/optimize-pypi-caching-2uVVC branch from 525c93d to 1e4a75b Compare February 17, 2026 13:11

baszalmstra requested a review from nichmor February 17, 2026 13:50

ruben-arts removed their request for review March 3, 2026 13:59

baszalmstra closed this Mar 17, 2026

baszalmstra wants to merge 14 commits into prefix-dev:main from baszalmstra:claude/optimize-pypi-caching-2uVVC