Agentic Documentation Infrastructure for Cloud-Native OSS Ecosystems
DocuMind is an intelligent documentation infrastructure platform for OSS ecosystems like OpenKruise, Kubernetes, and Argo. It continuously ingests repositories, maintains semantic documentation context, detects documentation drift, and exposes repository intelligence through MCP-compatible interfaces.
- 🔄 Repository Ingestion — Clone and incrementally index GitHub repositories
- 📝 Semantic Chunking — Parse Markdown, YAML, and Go with heading-aware chunking
- 🧠 Embedding Pipeline — Local embeddings via Ollama (nomic-embed-text)
- 🔍 Hybrid Search — Vector similarity + keyword matching with version filtering
- 📋 Quality Evaluation — Detect broken links, stale docs, invalid YAML, duplicates
- 🕸️ Knowledge Graph — Cross-repository feature-doc-code linking
- 🔌 MCP Server — Model Context Protocol for AI IDE integration
- 🤖 Agentic Workflows — ReAct self-healing loop for automated doc fixes
- 📊 Metrics — Prometheus + OpenTelemetry observability
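To make the Hybrid Search bullet concrete, here is a minimal sketch of one common fusion strategy: a weighted linear blend of vector and keyword scores, with an exact-match version filter applied first. It assumes both scores are already normalized to [0, 1] and that `weight` behaves like the `hybrid_weight` option in `config.yaml` (0 = keyword only, 1 = vector only). The `Result` type and `HybridRank` function are hypothetical and are not DocuMind's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a hypothetical search hit carrying both relevance signals.
type Result struct {
	DocID        string
	Version      string  // release the chunk belongs to, e.g. "v1.3"
	VectorScore  float64 // cosine similarity, assumed normalized to [0, 1]
	KeywordScore float64 // keyword relevance, assumed normalized to [0, 1]
	Score        float64 // fused score, filled in by HybridRank
}

// HybridRank keeps results matching version (empty string = no filter),
// fuses scores as weight*vector + (1-weight)*keyword, and returns the
// top k by fused score (k <= 0 means no limit). The input is not modified.
func HybridRank(results []Result, weight float64, version string, k int) []Result {
	var kept []Result
	for _, r := range results {
		if version != "" && r.Version != version {
			continue
		}
		r.Score = weight*r.VectorScore + (1-weight)*r.KeywordScore
		kept = append(kept, r)
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if k > 0 && len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

func main() {
	hits := []Result{
		{DocID: "cloneset-scaling", Version: "v1.3", VectorScore: 0.9, KeywordScore: 0.2},
		{DocID: "sidecarset-injection", Version: "v1.3", VectorScore: 0.4, KeywordScore: 0.9},
		{DocID: "old-cloneset", Version: "v1.2", VectorScore: 0.95, KeywordScore: 0.9},
	}
	for _, r := range HybridRank(hits, 0.7, "v1.3", 10) {
		fmt.Printf("%s %.2f\n", r.DocID, r.Score)
	}
}
```

Linear fusion is easy to tune with a single knob; rank-based schemes such as reciprocal rank fusion are a common alternative when the two score scales are hard to normalize.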
```text
GitHub Repositories
         │
         ▼
┌─────────────────────┐
│  Ingestion Engine   │  ← Clone/Pull, File Walking, Change Detection
│  (go-git + walker)  │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Parser Pipeline    │  ← Markdown AST (goldmark), YAML, Go parsing
│  (goldmark + regex) │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Embedding Pipeline │  ← Ollama (nomic-embed-text) / OpenAI compatible
│  (Ollama client)    │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Vector Store       │  ← chromem-go (embedded, persistent)
│  + SQLite Metadata  │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Knowledge Graph    │  ← Feature ↔ Doc ↔ Code ↔ Release linking
│  (in-memory graph)  │
└────────┬────────────┘
         │
    ┌────┴───────┐
    ▼            ▼
┌────────┐ ┌────────────┐
│  MCP   │ │  REST API  │
│ Server │ │ + Dashboard│
└────────┘ └────────────┘
    │            │
    ▼            ▼
┌─────────────────────┐
│  AI IDEs / Agents   │  ← Cursor, Windsurf, external agents
│  GitHub Actions     │
└─────────────────────┘
```
- Go 1.22+
- Ollama (for local embeddings): Install Guide
- Git
```bash
# Clone the repository
git clone https://github.com/priya-sharma/documind.git
cd documind

# Pull the embedding model
ollama pull nomic-embed-text

# Build
make build

# Run
./bin/documind version
```

```bash
# Ingest a repository
./bin/documind ingest --repo https://github.com/openkruise/kruise

# Ingest all configured repos
./bin/documind ingest --all

# Search documentation
./bin/documind search "How does CloneSet handle scaling?"

# Search with version filter
./bin/documind search --version v1.3 "sidecar injection"

# Run quality evaluation
./bin/documind evaluate --repo kruise

# Start HTTP API server
./bin/documind serve --http :8080

# Start MCP server (for AI IDEs)
./bin/documind serve --mcp
```

```text
documind/
├── cmd/
│   └── documind/
│       └── main.go        # CLI entry point (cobra)
├── internal/
│   ├── config/            # Viper-based configuration
│   ├── models/            # Core domain types
│   ├── storage/           # SQLite metadata storage
│   ├── version/           # Build version info
│   ├── ingestion/         # Repository cloning & file walking
│   ├── parser/            # Markdown/YAML/Go parsing
│   ├── embedding/         # Ollama embedding pipeline
│   ├── vectorstore/       # chromem-go vector database
│   ├── search/            # Hybrid search engine
│   ├── evaluator/         # Documentation quality checks
│   ├── graph/             # Knowledge graph
│   ├── mcp/               # MCP server implementation
│   ├── api/               # REST API server
│   └── agent/             # ReAct agentic workflows
├── web/                   # Dashboard (HTML/CSS/JS)
├── docs/                  # Architecture documentation
├── .github/workflows/     # CI/CD pipelines
├── config.yaml            # Default configuration
├── Makefile               # Build automation
└── README.md
```
DocuMind uses a layered configuration system in which each later source overrides the earlier ones: defaults → `config.yaml` → environment variables → CLI flags.
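The precedence chain can be illustrated without Viper. This stdlib-only sketch assumes flat, dot-separated keys such as `embedding.endpoint` and resolves each key from the highest-priority layer that sets it. `EnvName` derives variable names following the pattern of the `DOCUMIND_` examples in this section (uppercase, dots replaced by underscores). `Resolve` and `EnvName` are hypothetical helpers, and the defaults in `main` are illustrative values only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// EnvName maps a config key such as "embedding.endpoint" to its
// environment variable, e.g. "DOCUMIND_EMBEDDING_ENDPOINT".
func EnvName(key string) string {
	return "DOCUMIND_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// Resolve returns the value for key from the highest-priority layer that
// sets it: flags > environment > config file > defaults. An empty
// environment variable is treated as unset in this sketch.
func Resolve(key string, defaults, file, flags map[string]string, getenv func(string) string) (string, bool) {
	if v, ok := flags[key]; ok {
		return v, true
	}
	if v := getenv(EnvName(key)); v != "" {
		return v, true
	}
	if v, ok := file[key]; ok {
		return v, true
	}
	v, ok := defaults[key]
	return v, ok
}

func main() {
	// Illustrative layers; a real loader would parse config.yaml and CLI flags.
	defaults := map[string]string{"embedding.endpoint": "http://localhost:11434", "logging.level": "info"}
	file := map[string]string{"embedding.model": "nomic-embed-text"}
	flags := map[string]string{}
	for _, k := range []string{"embedding.endpoint", "embedding.model", "logging.level"} {
		v, _ := Resolve(k, defaults, file, flags, os.Getenv)
		fmt.Printf("%s = %s\n", k, v)
	}
}
```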
```yaml
# config.yaml
embedding:
  provider: "ollama"
  model: "nomic-embed-text"
  endpoint: "http://localhost:11434"
repositories:
  - name: "kruise"
    url: "https://github.com/openkruise/kruise"
    branches: ["master"]
search:
  top_k: 10
  hybrid_weight: 0.7  # 0=keyword, 1=vector
```

Environment variables use the `DOCUMIND_` prefix:
```bash
export DOCUMIND_EMBEDDING_ENDPOINT=http://localhost:11434
export DOCUMIND_LOGGING_LEVEL=debug
```

DocuMind exposes the following MCP tools for AI IDE integration:
| Tool | Description |
|---|---|
| `docs_lookup` | Semantic documentation retrieval |
| `release_lookup` | Version-aware retrieval |
| `architecture_summary` | Repository architecture overview |
| `feature_context` | Feature-to-code-to-doc mapping |
| `code_reference` | Implementation lookup |
| `evaluate_docs` | Documentation quality check |
- Project scaffolding & CLI
- Repository ingestion engine
- Markdown/YAML/Go parsing pipeline
- Embedding pipeline (Ollama + chromem-go)
- Semantic search with hybrid retrieval
- Documentation quality evaluation
- Knowledge graph
- Version-aware retrieval
- MCP server
- REST API
- Prometheus metrics
- ReAct agentic workflows
- GitHub Actions CI/CD
- Web dashboard
- Graph RAG
- Multi-language support
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Apache License 2.0 — see LICENSE for details.