This guide helps you deploy the LLaMA Stack RAG UI on an OpenShift cluster using Helm.
Before deploying, make sure you have the following:
- Access to an OpenShift cluster with appropriate permissions.
- The Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator installed.
- Two GPU nodes (for example, A10 nodes): one for vLLM and one for the safety model.
- A node label for scheduling: any label on the GPU nodes works, passed as a parameter to the deploy script (see deploy.sh).
- Helm installed.
- A valid Hugging Face token.
- Access to the meta-llama/Llama-3.2-3B-Instruct model.
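You can sanity-check most of these prerequisites from the command line. The namespaces below are the usual defaults for the two operators; adjust them if your cluster installed the operators elsewhere:

```bash
# Confirm the operators are running (default namespaces assumed).
oc get pods -n openshift-nfd
oc get pods -n nvidia-gpu-operator

# Confirm Helm and cluster access.
helm version
oc whoami
```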
If you are starting from a fresh cluster:
- Install the NFD Operator from OperatorHub.
- Create the default NodeFeatureDiscovery instance (no changes needed).
- Validate that the GPU nodes have the required `10de` labels in place (NFD labels nodes carrying NVIDIA PCI devices; 10de is NVIDIA's PCI vendor ID).
- Install the NVIDIA GPU Operator and create the ClusterPolicy (default settings).
This sets up the cluster to use the available GPUs, and you can move forward to deploying AI workloads.
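For example, you can check that NFD has labeled the GPU nodes using the standard NFD PCI vendor-ID label:

```bash
# List nodes that NFD has labeled as carrying an NVIDIA (vendor ID 10de) PCI device.
oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
```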
- Prior to deploying, ensure that you have access to the meta-llama/Llama-3.2-3B-Instruct model. If not, you can request access from Meta at https://www.llama.com/llama-downloads/ (a quick token check is sketched after this list).
- Once everything is set, navigate to the Helm deployment directory:

  ```bash
  cd deploy/helm
  ```
- Run the install command:

  ```bash
  make install
  ```
- When prompted, enter your Hugging Face token.
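As referenced above, one way to confirm your token actually has access to the gated model before installing is with the huggingface_hub CLI (assuming it is installed; `config.json` is just a small file from the repo used as a probe):

```bash
# Log in with your token, then try fetching a small file from the gated repo.
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct config.json
```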
The script will:
- Create a new project: `llama-stack-rag`
- Create and annotate the `huggingface-secret`
- Deploy the Helm chart with toleration settings
- Output the status of the deployment
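For orientation, the steps above correspond roughly to the following commands. This is a sketch, not the actual script; the secret key name and Helm release name are assumptions, and the annotation step is omitted, so check deploy.sh for the real commands and flags:

```bash
# Approximate equivalent of the install steps (names are assumptions).
oc new-project llama-stack-rag
oc create secret generic huggingface-secret \
  --from-literal=HF_TOKEN="$HF_TOKEN" \
  -n llama-stack-rag
helm install llama-stack-rag . -n llama-stack-rag
```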
Once deployed, verify the following:

```bash
kubectl get pods -n llama-stack-rag
kubectl get svc -n llama-stack-rag
kubectl get routes -n llama-stack-rag
```

You should see the running components, services, and exposed routes.
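To grab the UI's URL directly (assuming the chart exposes a single route; list the routes first if there are several):

```bash
# Print the host of the first route in the namespace.
oc get routes -n llama-stack-rag -o jsonpath='{.items[0].spec.host}'
```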
To uninstall the deployment:

```bash
make uninstall
```
