GitGPT Codebase QA
v1.1.1 (latest), by alobroke

An intelligent Retrieval-Augmented Generation (RAG) system that lets developers ask natural-language questions about a GitHub repository (currently Python code) and receive context-aware answers grounded in the repository's source code.
Repository RAG Assistant combines:
- AST-based code chunking
- Semantic embeddings
- FAISS vector search
- Cross-Encoder reranking
- A large language model (Qwen2.5-Coder)
to create an AI-powered code understanding system capable of answering repository-specific questions.
Instead of manually searching through hundreds of files, developers can ask:
- How does OAuth authentication work?
- What does make_not_authenticated_error() do?
- How are security scopes handled?
- Which files are involved in user authentication?
and receive answers grounded in the repository source code.
Features:

Code parsing:
- AST-based Python parsing
- Function-level chunking
- Method-level chunking
- Class metadata extraction
- Duplicate chunk reduction

Vector search:
- BAAI BGE embeddings
- FAISS vector indexing
- Similarity search
- Fast retrieval

Reranking:
- Cross-Encoder reranking
- Improved retrieval quality
- Better repository grounding

Answer generation:
- Qwen2.5-Coder integration
- Context-aware code explanations
- Repository-focused responses
- Source-backed answers

API:
- FastAPI backend
- Swagger documentation
- JSON request/response format
Architecture:

GitHub Repository
│
▼
AST Code Chunking
│
▼
Code Chunks
│
▼
BGE Embeddings
│
▼
FAISS Vector Store
│
▼
Retriever
│
▼
Cross Encoder Reranker
│
▼
Qwen2.5-Coder
│
▼
Generated Answer
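The diagram describes a two-stage, retrieve-then-rerank flow. Below is a minimal sketch of that control flow, assuming each stage is a plain Python callable. The names answer_question, fetch_k and top_n, and the prompt wording, are hypothetical and not taken from backend/rag/pipeline.py.

```python
from typing import Callable

# Hypothetical stage signatures; the real modules under backend/ may differ.
Retrieve = Callable[[str, int], list]   # (question, k) -> candidate chunk texts
Score = Callable[[str, str], float]     # (question, chunk) -> relevance score
Generate = Callable[[str], str]         # prompt -> answer


def answer_question(
    question: str,
    retrieve: Retrieve,
    score: Score,
    generate: Generate,
    fetch_k: int = 20,
    top_n: int = 5,
) -> str:
    # 1. Cheap, recall-oriented vector search for fetch_k candidates.
    candidates = retrieve(question, fetch_k)
    # 2. Rescore each (question, chunk) pair and keep the top_n best.
    reranked = sorted(candidates, key=lambda chunk: score(question, chunk), reverse=True)
    context = reranked[:top_n]
    # 3. Ground the LLM in the selected chunks only.
    prompt = (
        "Answer using only the code below.\n\n"
        + "\n\n---\n\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

Fetching more candidates (fetch_k) than are kept (top_n) is the usual reason to add a reranker. The embedding search favors recall, and the slower cross-encoder restores precision because it reads the question and each chunk together. In this project, score would presumably wrap the ms-marco cross-encoder and generate would call Qwen2.5-Coder.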
Tech stack:

- Python
- FastAPI
- Pydantic
- FAISS
- Sentence Transformers
- BAAI/bge-small-en-v1.5 (embedding model)
- cross-encoder/ms-marco-MiniLM-L-6-v2 (reranker)
- Qwen/Qwen2.5-Coder-3B-Instruct (answer generation)
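BGE embeddings are usually compared by cosine similarity. Once every vector is scaled to unit length, cosine similarity equals the plain inner product, which is what an exact flat inner-product index (for example faiss.IndexFlatIP) ranks by. The pure-Python stand-in below shows that idea. It assumes, without confirmation from this README, that the project uses normalized embeddings with a flat index; FlatInnerProductIndex is a hypothetical name.

```python
import math


def normalize(vec: list) -> list:
    """Scale a vector to unit L2 length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatInnerProductIndex:
    """Exact brute-force inner-product search (a stand-in for a flat FAISS index)."""

    def __init__(self) -> None:
        self.vectors = []

    def add(self, vectors: list) -> None:
        # Row i of the index corresponds to chunk i in insertion order.
        self.vectors.extend(normalize(v) for v in vectors)

    def search(self, query: list, k: int) -> list:
        """Return up to k (row_id, cosine_similarity) pairs, best first."""
        q = normalize(query)
        scored = [(i, sum(a * b for a, b in zip(q, v))) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

With the real libraries the same shape is roughly: encode chunks with normalized embeddings, index.add(matrix), then index.search(query_matrix, k), which returns score and ID arrays.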
Project structure:

backend/
│
├── api/
│ ├── routes.py
│ └── schemas.py
│
├── embeddings/
│ ├── embedder.py
│ └── build_index.py
│
├── ingestion/
│ ├── chunker.py
│ └── build_chunks.py
│
├── llm/
│ ├── model_loader.py
│ ├── generator.py
│ └── prompt.py
│
├── rag/
│ └── pipeline.py
│
├── retrieval/
│ ├── retriever.py
│ └── reranker.py
│
└── main.py
tests/
data/
docs/
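The contents of llm/prompt.py are not shown here. The following is a hypothetical sketch of how reranked chunks might be turned into a grounded prompt that labels every chunk with its file and line range, which is one way to support source-backed answers. RetrievedChunk, SYSTEM_INSTRUCTIONS and build_prompt are illustrative names, not the project's.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    path: str         # repository-relative file path (hypothetical field)
    start_line: int
    end_line: int
    source: str


SYSTEM_INSTRUCTIONS = (
    "You are answering questions about a code repository. "
    "Use only the context below. If the context is insufficient, say so. "
    "Cite sources as [path:start-end]."
)


def build_prompt(question: str, chunks: list) -> str:
    """Assemble a prompt with one labeled context block per retrieved chunk."""
    blocks = [f"[{c.path}:{c.start_line}-{c.end_line}]\n{c.source}" for c in chunks]
    context = "\n\n".join(blocks)
    return f"{SYSTEM_INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Qwen2.5-Coder-Instruct is a chat model, so a real implementation would more likely pass the instructions and context as system and user messages through the tokenizer's apply_chat_template than send one flat string.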
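Installation:

git clone <repository-url>
cd repository-rag-assistant

Create and activate a virtual environment:
python -m venv venv

Windows:
venv\Scripts\activate

Linux/Mac:
source venv/bin/activate

Install the dependencies:
pip install -r requirements.txt

Build the code chunks, then the FAISS index:
python -m backend.ingestion.build_chunks
python -m backend.embeddings.build_index

Start the API server:
uvicorn backend.main:app

Server: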
http://127.0.0.1:8000
Swagger UI:
http://127.0.0.1:8000/docs
Example request:
{
  "question": "What does make_not_authenticated_error do?"
}

Response:
{
  "answer": "The make_not_authenticated_error method creates an HTTPException with status code 401 and sets the WWW-Authenticate header..."
}

Example questions:
- How does OAuth authentication work?
- What is the purpose of Security()?
- How are OAuth scopes handled?
- Explain the authentication flow.
- Which files implement API key security?
- How does dependency injection work?
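To send the JSON request from Python instead of Swagger UI, a minimal standard-library client could look like this. The README does not name the endpoint, so ASK_PATH = "/ask" is a hypothetical placeholder; check http://127.0.0.1:8000/docs for the real route.

```python
import json
import urllib.request

# Hypothetical route: replace with the path listed in Swagger UI.
ASK_PATH = "/ask"


def build_ask_request(question: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request whose JSON body matches the example request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url + ASK_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://127.0.0.1:8000") -> str:
    """Send the question to a running server and return the "answer" field."""
    # Generous timeout: local LLM generation can take a while.
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]
```

ask() needs the server from the installation steps to be running. build_ask_request() can be inspected offline.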
Current pipeline:
- AST-based chunking
- 588 indexed chunks
- FAISS vector retrieval
- Cross-Encoder reranking
- Qwen2.5-Coder generation
Future improvements:
- Multi-language repository support
- Multi-repository indexing
- Hybrid Search (BM25 + Vector Search)
- Docker deployment
- React frontend
- Conversation memory
- GitHub repository ingestion via URL
- Source citations in responses
- GitHub Action integration
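Hybrid search is listed as future work, and the README does not say how BM25 and vector results would be merged. Reciprocal Rank Fusion (RRF) is one common option, sketched below. It fuses rank positions rather than raw scores, so BM25 scores and cosine similarities, which live on different scales, never need to be normalized against each other. The function name and the default k = 60 are assumptions.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) for every ID it contains (rank starts at 1).
    IDs are returned by descending total score; ties keep first-seen order.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

For example, fusing a BM25 ranking ["a", "b", "c"] with a vector ranking ["c", "a", "d"] places "a" first, because both lists rank it highly.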
MIT License
GitGPT Codebase QA is not certified by GitHub. It is provided by a third-party and is governed by separate terms of service, privacy policy, and support documentation.