Codebase RAG

A Retrieval Augmented Generation (RAG) system for querying and understanding codebases using Streamlit, Pinecone, and LLMs.

Overview

This project implements a conversational interface for querying codebases using RAG technology. It combines vector similarity search through Pinecone with Large Language Models to provide contextually relevant answers about your code.

Features

🔍 Semantic Code Search: Uses HuggingFace embeddings to find relevant code snippets
💬 Conversational Interface: Built with Streamlit for an intuitive chat experience
📚 Multiple Repository Support: Switch between different codebases using namespaces which are added to Pinecone
🤖 Advanced LLM Integration: Powered by Groq's LLama models
🔄 Context-Aware Responses: Maintains chat history for coherent conversations

Usage

Visit the Streamlit web application: https://rag-my-codebase.streamlit.app/
Select a repository namespace from the sidebar
Start asking questions about your codebase in the chat interface

Architecture

Frontend: Streamlit web interface
Embedding: HuggingFace Sentence Transformers
Vector Store: Pinecone
LLM: Groq (LLama models)
RAG Framework: Custom implementation with context augmentation

Future Work & Challenges

Implement AST parsing of codebase embeddings rather than dumping the whole codebase to embeddings to allow for more accurate and relevant answers as code follows a different structure to natural language.
Add a way to update the Pinecone index when you push any new commits to your repo. This would be done through a webhook that's triggered on each commit, where the codebase is re-embedded and added to Pinecone.
Add a way to chat with multiple codebases at the same time.
Add support for image uploads when chatting with the codebase this is called Multimodal RAG.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.DS_Store		.DS_Store
.gitignore		.gitignore
Codebase_RAG.ipynb		Codebase_RAG.ipynb
README.md		README.md
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Codebase RAG

Overview

Features

Usage

Architecture

Future Work & Challenges

About

Uh oh!

Releases

Packages

Uh oh!

Languages

mehmoodosman/codebase-rag

Folders and files

Latest commit

History

Repository files navigation

Codebase RAG

Overview

Features

Usage

Architecture

Future Work & Challenges

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages