LLM-accessible markdown file database.
This repository now includes a working foundation for MarkdownKeeper:
- Python package structure under `src/markdownkeeper`
- Config loading from `markdownkeeper.toml`
- SQLite schema initialization for core entities (`documents`, `headings`, `links`, `tags`, `concepts`, `document_chunks`, `embeddings`, `query_cache`, `events`)
- CLI commands for indexing, retrieval, validation, indexing artifacts, watching, and API hosting:
  - `mdkeeper show-config`
  - `mdkeeper init-db`
  - `mdkeeper scan-file <file>`
  - `mdkeeper query <text>`
  - `mdkeeper get-doc <id>`
  - `mdkeeper check-links`
  - `mdkeeper build-index`
  - `mdkeeper find-concept <concept>`
  - `mdkeeper watch`
  - `mdkeeper serve-api`
  - `mdkeeper write-systemd`
  - `mdkeeper daemon-start <watch|api>`
  - `mdkeeper daemon-stop <watch|api>`
  - `mdkeeper daemon-status <watch|api>`
  - `mdkeeper daemon-restart <watch|api>`
  - `mdkeeper daemon-reload <watch|api>`
  - `mdkeeper stats`
  - `mdkeeper embeddings-generate`
  - `mdkeeper embeddings-status`
  - `mdkeeper embeddings-eval <cases.json>`
  - `mdkeeper semantic-benchmark <cases.json>`
python -m markdownkeeper.cli.main init-db --db-path .markdownkeeper/index.db
python -m markdownkeeper.cli.main scan-file README.md --db-path .markdownkeeper/index.db --format json
python -m markdownkeeper.cli.main query "markdown" --db-path .markdownkeeper/index.db --format json --search-mode semantic
python -m markdownkeeper.cli.main build-index --db-path .markdownkeeper/index.db --output-dir _index
python -m markdownkeeper.cli.main check-links --db-path .markdownkeeper/index.db --format json
python -m markdownkeeper.cli.main find-concept kubernetes --db-path .markdownkeeper/index.db --format json
python -m markdownkeeper.cli.main get-doc 1 --db-path .markdownkeeper/index.db --format json --include-content --max-tokens 200
python -m markdownkeeper.cli.main watch --mode auto --interval 0.5 --duration 5
python -m markdownkeeper.cli.main write-systemd --output-dir deploy/systemd
python -m markdownkeeper.cli.main daemon-start watch --pid-file .markdownkeeper/watch.pid
python -m markdownkeeper.cli.main daemon-status watch --pid-file .markdownkeeper/watch.pid
python -m markdownkeeper.cli.main daemon-stop watch --pid-file .markdownkeeper/watch.pid
python -m markdownkeeper.cli.main daemon-restart watch --pid-file .markdownkeeper/watch.pid
python -m markdownkeeper.cli.main daemon-reload watch --pid-file .markdownkeeper/watch.pid
python -m markdownkeeper.cli.main stats --db-path .markdownkeeper/index.db --format json
python -m markdownkeeper.cli.main embeddings-generate --db-path .markdownkeeper/index.db
python -m markdownkeeper.cli.main embeddings-status --db-path .markdownkeeper/index.db --format json
python -m markdownkeeper.cli.main embeddings-eval examples/semantic-cases.json --db-path .markdownkeeper/index.db --k 5 --format json
python -m markdownkeeper.cli.main semantic-benchmark examples/semantic-cases.json --db-path .markdownkeeper/index.db --k 5 --iterations 3 --format json
python -m markdownkeeper.cli.main serve-api --db-path .markdownkeeper/index.db --host 127.0.0.1 --port 8765

Then call:
- `POST /api/v1/query` with method `semantic_query`
- `POST /api/v1/get_doc` with method `get_document` (`include_content`, `max_tokens`, `section` supported)
- `POST /api/v1/find_concept` with method `find_by_concept`
- `GET /health`
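
As a rough illustration of calling the query endpoint from Python, the sketch below builds a `POST` request using only the standard library. The host and port come from the `serve-api` example above. The JSON body shape (a `method` field plus a `params` object with a `query` key) is an assumption made for this sketch, and `build_request` is a hypothetical helper; check `docs/USAGE.md` for the actual request format.

```python
import json
import urllib.request

# Host and port taken from the serve-api example in this README.
BASE_URL = "http://127.0.0.1:8765"


def build_request(endpoint: str, method: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API endpoint.

    ASSUMPTION: the server accepts a body of the form
    {"method": ..., "params": {...}}. This shape is a guess for
    illustration, not the documented contract.
    """
    body = json.dumps({"method": method, "params": params}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The "query" parameter name is also an assumption.
req = build_request("/api/v1/query", "semantic_query", {"query": "markdown"})

# To send it against a running server, something like:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before it goes over the wire.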
Track progress by checking items as they are completed.
- Implement durable watcher queue persistence and replay after restart
- Add event coalescing and idempotent processing for create/modify/move/delete bursts
- Validate restart-safe ingestion under rapid file changes
- Promote model-backed embeddings as primary runtime path (with fallback retained)
- Add chunk-level embedding retrieval and stronger hybrid ranking (vector + lexical + concept + freshness)
- Add evaluation harness for precision@5 and semantic regression tests
- Finalize systemd hardening, lifecycle semantics, and config reload behavior
- Publish deployment runbook (install, upgrade, rollback, troubleshooting)
- Add structured metrics/logging for queue lag, embedding throughput, and API/query latency
- Run full integration/performance suite and meet KPI targets
- Freeze CLI/API contracts and document compatibility guarantees
- Publish changelog, migration notes, and tag `v1.0.0`
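
To make the event-coalescing item above more concrete, here is a minimal sketch of one possible approach: collapse a burst of filesystem events into at most one net event per path. The `FileEvent` type, the event kind names, and the collapsing rules are all assumptions for illustration and do not describe MarkdownKeeper's watcher. A move could be modeled as a `delete` of the old path followed by a `create` of the new one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FileEvent:
    kind: str  # "create", "modify", or "delete" (hypothetical kind names)
    path: str


def coalesce(events: Iterable[FileEvent]) -> List[FileEvent]:
    """Reduce a burst of events to at most one net event per path."""
    # Remember the first and last event kind seen for each path.
    # Dicts preserve insertion order, so output follows first-seen order.
    seen: Dict[str, Tuple[str, str]] = {}
    for ev in events:
        first, _ = seen.get(ev.path, (ev.kind, ev.kind))
        seen[ev.path] = (first, ev.kind)

    result: List[FileEvent] = []
    for path, (first, last) in seen.items():
        if first == "create" and last == "delete":
            # The file appeared and vanished within the burst: nothing to do.
            continue
        if last == "delete":
            kind = "delete"
        elif first == "create":
            kind = "create"
        else:
            # Includes delete-then-create, which nets out to a changed file.
            kind = "modify"
        result.append(FileEvent(kind, path))
    return result


burst = [
    FileEvent("create", "a.md"),
    FileEvent("modify", "a.md"),
    FileEvent("modify", "b.md"),
    FileEvent("modify", "b.md"),
    FileEvent("create", "c.md"),
    FileEvent("delete", "c.md"),
]
print(coalesce(burst))
```

Here `a.md` collapses to a single `create`, `b.md` to a single `modify`, and `c.md` disappears because it was created and deleted within the same burst.
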
- See `docs/USAGE.md` for comprehensive usage documentation covering CLI commands, HTTP API reference, configuration, semantic search, embeddings, and LLM agent integration.
- See `docs/OPERATIONS_RUNBOOK.md` for install/upgrade/rollback and troubleshooting guidance.
- See `docs/COMPATIBILITY.md` for CLI/API/storage compatibility targets toward `v1.0.0`.
- Execute sustained high-throughput watcher stress benchmark and publish baseline metrics
- Run larger-corpus semantic tuning to improve precision@5 beyond baseline thresholds
- Continue expanding production ops docs (alerts, SLOs, incident playbooks)
- Improve ranking quality for lexical + concept queries
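
Several items above refer to precision@5. For reference, the sketch below uses the standard definition of precision@k: the fraction of the top-k retrieved results that are relevant. The function name and the use of integer document IDs are assumptions for illustration. How `embeddings-eval` computes its score is not specified here.

```python
from typing import Collection, Sequence


def precision_at_k(retrieved: Sequence[int], relevant: Collection[int], k: int = 5) -> float:
    """Fraction of the top-k retrieved document IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Standard definition: divide by k even if fewer than k results came back.
    return hits / k


# Hypothetical ranked results for one query, and the IDs judged relevant.
ranked_ids = [3, 7, 1, 9, 4, 2]
relevant_ids = {1, 4, 8}
score = precision_at_k(ranked_ids, relevant_ids, k=5)
print(score)
```

In this example two of the top five results (IDs 1 and 4) are relevant, so the score is 2/5.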