This project deploys a private Retrieval-Augmented Generation (RAG) API using LLaMA 3.2 and vLLM.
✅ Serverless (scale to zero) ✅ Private API ✅ Your own infrastructure ✅ Multi-GPU support
- Clone this repository:

  ```bash
  git clone <your-repo-url>
  cd <your-repo-directory>
  ```

- Install required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure these modules are in your project directory:
  - ingestion.py
  - retriever.py
  - prompt_template.py
- Download LLaMA model weights from [appropriate source].
- Place weights in [appropriate directory].
- Update `model_name` in `rag.py` if necessary (see the sketch after this list).
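If you point the server at different weights, `model_name` is typically just the string handed to vLLM's `LLM` constructor. Below is a minimal sketch of what that might look like inside `rag.py` — the model ID and surrounding code are illustrative, not the project's exact source:

```python
# Illustrative only -- rag.py's actual structure may differ.
from vllm import LLM, SamplingParams

# Either a Hugging Face model ID or a local path to the downloaded weights.
model_name = "meta-llama/Llama-3.2-3B-Instruct"

llm = LLM(model=model_name)
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Quick smoke test that the weights load and generate.
outputs = llm.generate(["What is retrieval-augmented generation?"], sampling_params)
print(outputs[0].outputs[0].text)
```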
- Add documents to chat with in the `./docs` folder.
- Start the server (a minimal LitServe sketch follows these steps):

  ```bash
  python server.py
  ```
- Use the API:

  ```bash
  python client.py --query "Your question here"
  ```
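For orientation, a LitServe server is a small `LitAPI` subclass plus a `LitServer`. The skeleton below is a hedged sketch, not this project's actual `server.py`; the `query` request key and the placeholder method bodies are assumptions:

```python
# Hedged skeleton of a LitServe app; the real server.py wires in the
# vLLM model and the Qdrant-backed retriever where the placeholders are.
import litserve as ls

class RAGAPI(ls.LitAPI):
    def setup(self, device):
        # Load the vLLM model and retriever once per worker here.
        self.model = None  # placeholder

    def decode_request(self, request):
        # Assumed request shape: {"query": "..."}
        return request["query"]

    def predict(self, query):
        # Real code would retrieve context, build the prompt, and generate.
        return f"echo: {query}"  # placeholder

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(RAGAPI(), accelerator="auto")
    server.run(port=8000)
```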
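Under the hood, `client.py` presumably just POSTs JSON to LitServe's default `/predict` route, so you can also call the API directly. The payload shape here mirrors the assumed `decode_request` above:

```python
# Direct HTTP call; the {"query": ...} body is an assumption that must
# match whatever server.py's decode_request expects.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"query": "Your question here"},
    timeout=60,
)
print(resp.json())
```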
- Expose the server to the internet (authentication optional)
- Enable "auto start" for serverless operation
- Optimize performance with LitServe features (batching, multi-GPU, etc.)
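Batching and multi-GPU serving are constructor arguments on `LitServer`. A self-contained toy example follows — the values are illustrative, not recommendations:

```python
# Toy example of LitServe tuning knobs; EchoAPI stands in for the real API.
import litserve as ls

class EchoAPI(ls.LitAPI):
    def setup(self, device):
        pass

    def predict(self, inputs):
        # With max_batch_size > 1 and no batch()/unbatch() hooks,
        # LitServe passes a list of decoded inputs to predict().
        return [f"echo: {x}" for x in inputs]

if __name__ == "__main__":
    server = ls.LitServer(
        EchoAPI(),
        accelerator="gpu",
        devices=2,           # replicate across two GPUs
        max_batch_size=8,    # serve up to 8 requests per batch
        batch_timeout=0.05,  # wait up to 50 ms to fill a batch
    )
    server.run(port=8000)
```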
This project utilizes:
- RAG (Retrieval-Augmented Generation)
- vLLM for efficient LLM serving
- Vector database (self-hosted Qdrant; see the lookup sketch after this list)
- LitServe for scalable inference
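As a taste of what `retriever.py` does against the self-hosted Qdrant instance, a lookup is roughly the following — the URL, collection name, and vector size are made up for the example:

```python
# Illustrative Qdrant query; the real retriever.py owns collection
# setup, embeddings, and result handling.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Stand-in for a real query embedding; the dimension must match
# the collection's vector size.
query_vector = [0.0] * 384

hits = client.search(
    collection_name="docs",
    query_vector=query_vector,
    limit=3,
)
for hit in hits:
    print(hit.score, hit.payload)
```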
For more details on these components, refer to the full documentation.