feat: add FlowRAG workflow memory & retrieval module with ChromaDB ba… #3049
Open
ardi1s wants to merge 1 commit into
Contributor: All contributors have signed the CLA ✍️ ✅
Author: I have read the Contributor License Agreement (CLA) and hereby sign the CLA.
Summary
Adds FlowRAG, a workflow memory and retrieval module for Crush.
It captures successful multi-step tool-call sequences, stores them in a
ChromaDB vector database, and retrieves similar past workflows via semantic
search to speed up future tasks.
How It Works
Trigger Detection (Dual Layer)
Primary (Skill-driven): FlowRAG registers as a built-in Skill
(internal/skills/builtin/flowrag/SKILL.md). The LLM decides on its own
when to trigger it: when a multi-step task completes without errors,
when the user confirms satisfaction, or when the user explicitly asks
to save a workflow.
Secondary (keyword matching): CompletionDetector recognizes explicit
markers ("task complete", "save workflow", "remember this") as well as
colloquial confirmation phrases ("ok", "done", and the Chinese "搞定了",
roughly "all done").
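The keyword layer can be approximated with plain string matching. The sketch below is a hypothetical illustration, not the PR's detector.go: the type name keywordDetector and the rule that short confirmations must make up the whole message are assumptions. Only the marker strings come from the description above (the Chinese phrase is omitted for brevity).

```go
package main

import (
	"fmt"
	"strings"
)

// keywordDetector is a hypothetical sketch, not the PR's CompletionDetector.
// Marker strings are taken from the PR description.
type keywordDetector struct {
	explicit     []string // save/complete markers, matched as substrings
	confirmation []string // short confirmations, matched as the whole message
}

func newKeywordDetector() *keywordDetector {
	return &keywordDetector{
		explicit:     []string{"task complete", "save workflow", "remember this"},
		confirmation: []string{"ok", "done"},
	}
}

// Match reports whether msg looks like a completion signal.
func (d *keywordDetector) Match(msg string) bool {
	m := strings.ToLower(strings.TrimSpace(msg))
	for _, marker := range d.explicit {
		if strings.Contains(m, marker) {
			return true
		}
	}
	// Strip trailing punctuation so "Done!" and "ok." still count, but require
	// an exact match so "token" or "ok, but fix the tests" do not.
	bare := strings.TrimRight(m, ".! ")
	for _, c := range d.confirmation {
		if bare == c {
			return true
		}
	}
	return false
}

func main() {
	d := newKeywordDetector()
	for _, msg := range []string{"Task complete.", "Done!", "ok, but fix the tests", "please remember this"} {
		fmt.Printf("%q -> %v\n", msg, d.Match(msg))
	}
}
```

Matching confirmations as whole messages is the main design choice here: a bare substring check for "ok" would also fire on words like "token" or on replies such as "ok, but fix the tests".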
Data Flow
Task completed → prompt "Save workflow? (y/n)"
→ y: Segmenter extracts successful steps, skipping IsError retries
→ Embedding generated via OpenAI-compatible API or local trigram hash
→ Stored in ChromaDB collection "crush_workflows"
→ Next similar task: semantic search retrieves Top-K
→ Retrieved workflows injected as system prompt context
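The segmentation step ("extract successful steps, skipping IsError retries") can be sketched as a simple filter over recorded tool calls. The toolStep type and its fields are hypothetical stand-ins; segmenter.go's real types are not shown in this PR, and the tool names in main are only illustrative.

```go
package main

import "fmt"

// toolStep is a hypothetical stand-in for one recorded tool call.
type toolStep struct {
	Tool    string
	Input   string
	IsError bool
}

// segment keeps only the successful steps, in their original order. A failed
// call followed by a retry therefore contributes only the retry, so the stored
// workflow reads as the clean path that actually worked.
func segment(steps []toolStep) []toolStep {
	out := make([]toolStep, 0, len(steps))
	for _, s := range steps {
		if s.IsError {
			continue // skip failed attempts
		}
		out = append(out, s)
	}
	return out
}

func main() {
	run := []toolStep{
		{Tool: "view", Input: "auth/login.go"},
		{Tool: "edit", Input: "auth/login.go", IsError: true}, // bad patch
		{Tool: "edit", Input: "auth/login.go"},                // retry succeeded
		{Tool: "bash", Input: "go test ./auth/..."},
	}
	for i, s := range segment(run) {
		fmt.Printf("%d. %s %s\n", i+1, s.Tool, s.Input)
	}
}
```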
Embedding Strategy
Two embedding backends:
OpenAI-compatible embedding API (e.g. text-embedding-3-small, DeepSeek, Ollama)
Local trigram hash (HashEmbeddingClient): produces content-aware vectors locally; no API key required
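A minimal sketch of a trigram-hash embedder, assuming feature hashing with FNV-1a into a fixed number of buckets followed by L2 normalization. The function name hashEmbed, the 256-dimension size, and the padding scheme are illustrative assumptions; HashEmbeddingClient may be built differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// hashEmbed maps text to a dim-sized vector by hashing each character
// trigram into a bucket (feature hashing), then L2-normalizing the result.
func hashEmbed(text string, dim int) []float32 {
	vec := make([]float32, dim)
	runes := []rune(" " + strings.ToLower(text) + " ") // pad so word edges form trigrams
	for i := 0; i+3 <= len(runes); i++ {
		h := fnv.New32a()
		h.Write([]byte(string(runes[i : i+3])))
		vec[h.Sum32()%uint32(dim)]++
	}
	// Normalize to unit length so cosine similarity reduces to a dot product.
	var norm float64
	for _, v := range vec {
		norm += float64(v) * float64(v)
	}
	if norm > 0 {
		inv := float32(1 / math.Sqrt(norm))
		for i := range vec {
			vec[i] *= inv
		}
	}
	return vec
}

// dot computes the dot product of two equal-length vectors.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := hashEmbed("fix login bug", 256)
	fmt.Printf("login fix:  %.3f\n", dot(q, hashEmbed("fix a login authentication bug", 256)))
	fmt.Printf("csv parser: %.3f\n", dot(q, hashEmbed("python parse csv data", 256)))
}
```

Texts that share words share trigrams and therefore buckets, which is what makes these vectors content-aware without a model. The trade-off is that trigram overlap tracks spelling, not meaning, so synonyms with no shared characters score low.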
Vector Storage
VectorStoreBackend interface with two implementations:
ChromaDB: creates the "crush_workflows" collection on first use
Local JSON file: used as an automatic fallback when ChromaDB is unavailable
Changes
internal/flowrag/
├── detector.go           Completion marker detection (Skill + keyword)
├── segmenter.go          Workflow segmentation with error-step filtering
├── store.go              Vector store interface, ChromaDB + JSON backends,
│                         OpenAI + Hash embedding clients
├── retriever.go          Semantic search + system prompt context builder
├── workflow.go           Orchestrator with unified public API
├── workflow_test.go      17 unit tests covering all components
├── cmd/demo/main.go      Interactive CLI demo with trigram-hash embeddings
│                         (zero-dependency, runs without any API key)
├── cmd/e2e_test/main.go  End-to-end semantic search verification script
└── README.md             Full English documentation
internal/skills/builtin/flowrag/SKILL.md
Built-in Skill definition that the LLM autoloads to decide when FlowRAG
should trigger. Describes the save/retrieve protocol, the ChromaDB
configuration, and the E2E testing procedure.
.gitignore
Added crush.local.json and flowrag_workflows.json to prevent accidental
leaks of credentials and local data.
Testing
Unit Tests
$ go test ./internal/flowrag/... -count=1 -v
PASS: TestCompletionDetector_Match (15 sub-cases)
PASS: TestTaskCompleteMarker (6 business keyword cases)
PASS: TestShouldTriggerFlowRAG
PASS: TestSegmenter_SuccessfulFlow
PASS: TestSegmenter_ExcludeErrorSteps
PASS: TestWorkflow_ToText
PASS: TestJSONFileStore_InsertAndSearch
PASS: TestJSONFileStore_SearchEmpty
PASS: TestCosineSimilarity (4 precision checks)
PASS: TestRetriever_BuildContextPrompt
PASS: TestRetriever_BuildContextPromptEmpty
PASS: TestWorkflowManager_Integration
PASS: TestTruncate
PASS: TestMustMarshalSteps
17 tests, all passing. Covers: detector, segmenter, both store
backends, cosine similarity precision, retriever, end-to-end
integration.
E2E Semantic Search
$ go run ./internal/flowrag/cmd/e2e_test/
Inserts 6 cross-domain workflows (auth fix, REST API, Python CSV
parser, DB migration, OAuth refresh, Docker Compose) then runs
semantic queries:
"fix a login authentication bug" → #1 auth-login-fix ✅
"python parse csv data" → #1 python-csv-parser ✅
"user registration API endpoint" → #1 rest-api-register ✅
"database migration SQL" → #1 db-migration ✅
"OAuth token refresh expired" → #1 oauth-token-refresh ✅
Interactive Demo
$ go run ./internal/flowrag/cmd/demo/
Fully interactive CLI demo using the HashEmbeddingClient (zero
API key required). Demonstrates insert/search/list operations
with natural language queries.
All existing tests continue to pass; no existing code paths outside the
new package were modified.
gofmt / go vet: clean