This guide helps you deploy the LLaMA Stack RAG UI on an OpenShift cluster using Helm.
Before deploying, make sure you have the following:
- Access to an OpenShift cluster with appropriate permissions.
- The Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator installed.
- Two GPU nodes (for example, A10 nodes): one for vLLM and one for the safety model.
- A node label for scheduling: any label on the GPU nodes works, passed as a parameter to the deploy script (see deploy.sh).
- Helm installed.
- A valid Hugging Face token.
- Access to the meta-llama/Llama-3.2-3B-Instruct model.
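You can sanity-check most of these prerequisites from the command line. The namespaces below are the usual defaults for the two operators; adjust them if your cluster installed the operators elsewhere:

```bash
# Confirm the operators are running (default namespaces assumed).
oc get pods -n openshift-nfd
oc get pods -n nvidia-gpu-operator

# Confirm Helm and cluster access.
helm version
oc whoami
```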
If you are starting from a fresh cluster:
- Install the NFD Operator from OperatorHub.
- Create the default NodeFeatureDiscovery instance (no changes needed).
- Validate that the GPU nodes have the required `10de` labels in place (NFD labels nodes carrying NVIDIA PCI devices; 10de is NVIDIA's PCI vendor ID).
- Install the NVIDIA GPU Operator and create the ClusterPolicy (default settings).
This sets up the cluster to use the available GPUs, and you can move forward to deploying AI workloads.
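For example, you can check that NFD has labeled the GPU nodes using the standard NFD PCI vendor-ID label:

```bash
# List nodes that NFD has labeled as carrying an NVIDIA (vendor ID 10de) PCI device.
oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
```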
- Prior to deploying, ensure that you have access to the meta-llama/Llama-3.2-3B-Instruct model. If not, you can request access from Meta at https://www.llama.com/llama-downloads/ (a quick token check is sketched after this list).
- Once everything is set, navigate to the Helm deployment directory:

  ```bash
  cd deploy/helm
  ```
- Run the install command:

  ```bash
  make install
  ```
- When prompted, enter your Hugging Face token.
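As referenced above, one way to confirm your token actually has access to the gated model before installing is with the huggingface_hub CLI (assuming it is installed; `config.json` is just a small file from the repo used as a probe):

```bash
# Log in with your token, then try fetching a small file from the gated repo.
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct config.json
```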
The script will:
- Create a new project: `llama-stack-rag`
- Create and annotate the `huggingface-secret`
- Deploy the Helm chart with toleration settings
- Output the status of the deployment
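For orientation, the steps above correspond roughly to the following commands. This is a sketch, not the actual script; the secret key name and Helm release name are assumptions, and the annotation step is omitted, so check deploy.sh for the real commands and flags:

```bash
# Approximate equivalent of the install steps (names are assumptions).
oc new-project llama-stack-rag
oc create secret generic huggingface-secret \
  --from-literal=HF_TOKEN="$HF_TOKEN" \
  -n llama-stack-rag
helm install llama-stack-rag . -n llama-stack-rag
```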
Once deployed, verify the following:

```bash
kubectl get pods -n llama-stack-rag
kubectl get svc -n llama-stack-rag
kubectl get routes -n llama-stack-rag
```

You should see the running components, services, and exposed routes.
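To grab the UI's URL directly (assuming the chart exposes a single route; list the routes first if there are several):

```bash
# Print the host of the first route in the namespace.
oc get routes -n llama-stack-rag -o jsonpath='{.items[0].spec.host}'
```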
To uninstall the deployment:

```bash
make uninstall
```
