Welcome! In this workshop, you'll build an AI-powered application using Couchbase Shell, starting with a simple chat request and evolving it into a RAG (Retrieval-Augmented Generation) system.
By the end of this workshop, you'll have:
- A working connection to OpenAI's API
- A Couchbase Capella database with vector search
- A simple chat application
- A RAG system that enhances responses with your own data
Time Required: ~60 minutes
In this section, you'll install all the necessary tools and set up your accounts. By the end, you'll have Couchbase Shell running on your machine, an OpenAI API key ready to use, and a free Couchbase Capella cluster in the cloud.
We'll start by installing Couchbase Shell, which will be your command-line interface for interacting with both Couchbase and OpenAI throughout this workshop.
macOS:

```shell
brew install couchbase-shell
```

Linux:

```shell
# Download the latest release
wget https://github.com/couchbaselabs/couchbase-shell/releases/download/v1.1.0/cbsh-x86_64-unknown-linux-gnu.tar.gz
tar -xzf cbsh-x86_64-unknown-linux-gnu.tar.gz
sudo mv cbsh /usr/local/bin/
```

Windows:

```shell
# Download from GitHub releases
# https://github.com/couchbaselabs/couchbase-shell/releases/download/v1.1.0/cbsh-x86_64-pc-windows-msvc.zip
# Extract and add to PATH
```

Verify the installation:
```shell
cbsh --version
```

Next, you'll create an AI API key. This key lets your applications call LLM APIs to generate embeddings (vector representations of your content) and to run chat conversations. Couchbase Shell is currently compatible with OpenAI, Gemini, and Bedrock.
To Create an OpenAI Key:
- Go to https://platform.openai.com/api-keys
- Sign up or log in
- Click "Create new secret key"
- Copy your key (it starts with `sk-`)
Set your OpenAI API key:
```shell
# macOS/Linux
export OPENAI_API_KEY="sk-your-key-here"
```

```shell
# Windows PowerShell
$env:OPENAI_API_KEY="sk-your-key-here"
```

Important
Keep your key handy: we will also paste it into Couchbase Shell's configuration later on.
Now you'll create a free tier cloud database cluster. Couchbase Capella is a fully-managed database service that will store your documents and handle vector search for RAG. The free tier is perfect for learning and small projects.
- Go to https://cloud.couchbase.com/sign-up
- Sign up for a free account
- Click "Create Cluster"
- Select Free Tier (Capella Trial)
- Choose a region close to you
- Name your cluster (e.g., "rag-workshop")
- Wait 2-3 minutes for provisioning
With your cluster created, you'll now set up an API Key, and configure security settings so Couchbase Shell can connect.
Once your cluster is ready:
- Create an API Key at the Organization level:
- Go to Organization Settings
- Click on "API Keys", "Generate Key"
- Choose a Key Name, check all Organization Roles
- Click on "Generate Key"
- Make sure you copy the API Key and API Secret
Caution
The API Secret is shown only once, immediately after you create the API Key. Be sure to copy it, or use "Download" in the creation window to save the full API Key information.
- Assign Allowed IP Addresses for your Cluster:
- Go to Cluster "Settings"
- Click on "Allowed IP Addresses" under Networking menu
- Click "Add Allowed IP" button
- Select either "Add Current IP Address" or "Allow Access from Anywhere" (which uses 0.0.0.0/0)
- If you select "Allow Access from Anywhere", you have to confirm this choice
Note
"Allow Access from Anywhere" permits connections to your cluster from any IP address. This is convenient for testing and development, but in production you should restrict the allowed IP addresses.
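To see why "Allow Access from Anywhere" is so permissive, here is a small Python illustration of what the 0.0.0.0/0 CIDR block actually matches (the sample addresses are made up for the example):

```python
import ipaddress

# "Allow Access from Anywhere" corresponds to the CIDR block 0.0.0.0/0,
# which matches every possible IPv4 address.
anywhere = ipaddress.ip_network("0.0.0.0/0")

print(anywhere.num_addresses)                            # 4294967296, i.e. 2**32
print(ipaddress.ip_address("198.51.100.1") in anywhere)  # True
print(ipaddress.ip_address("10.0.0.1") in anywhere)      # True
```

A per-address allowlist, by contrast, would use /32 blocks that each match exactly one address.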
In this section, you'll connect your local Couchbase Shell to your cloud database. You'll register your cluster, verify the connection works, and create the organizational structure (scope and collection) where your data will live.
Let's add the Couchbase Capella API Key and the LLM configuration. `yourOrgIdentifier` can be any name you like; it will be used later to associate an API key with a cluster configuration.
- Create a folder named `.cbsh` in the folder from which the Couchbase Shell executable will be run, or in your home directory, like `~/.cbsh/`
- Open or create `~/.cbsh/config` and edit it with the following content:
```
version = 1

[[capella-organization]]
identifier = "yourOrgIdentifier"
access-key = "yourAccessKey"
secret-key = "yourSecretKey"
default-project = "Trial - Project"

[[llm]]
identifier = "OpenAI-small"
provider = "OpenAI"
embed_model = "text-embedding-3-small"
chat_model = "gpt-3.5-turbo"
api_key = "sk-your-key"
```
Note
- The value of the `identifier` key in this config file will also be used in step 2.3
- Replace `yourAccessKey` and `yourSecretKey` with the API Key values you created in step 1.4
Let's launch Couchbase Shell and explore its interactive command-line interface.
```shell
cbsh
```

You should see the Couchbase Shell prompt:

```
# macOS/Linux
👤 🏠
>

# Windows PowerShell
>
```
You'll now tell Couchbase Shell how to connect to your cloud cluster by providing the connection string, username, and password you created earlier.
Select the Capella project you will be working on:
```shell
# Select a Project
projects | cb-env project $in.0.name
```

⚠ Understanding the following is important for the rest of the workshop, as we will manipulate JSON documents, which are all treated as dataframes in a Couchbase Shell context.
Couchbase Shell is based on Nushell, where everything structured is managed as a dataframe, and every command can be piped. Here `projects` returns the list of projects your API key gives you access to; you can type `projects` to display the list. It is piped into the next command, `cb-env project`, which requires a string argument (type `cb-env project -h` to see the details of the command). `$in` refers to whatever was piped into the command. Since the input is a list of records, `$in.0.name` takes the first element of the list, then the value of its `name` field.
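If you are more familiar with Python than Nushell, here is a rough conceptual analogy of that pipeline (the project names are invented; this is not Couchbase Shell syntax):

```python
# Conceptual Python analogy of `projects | $in.0.name` in Couchbase Shell.
# `projects` returns a list of records, much like this list of dicts.
projects = [
    {"id": "a1b2", "name": "Trial - Project"},
    {"id": "c3d4", "name": "Another Project"},
]

# `$in` is the piped-in value; `$in.0.name` indexes element 0, then field "name".
first_name = projects[0]["name"]
print(first_name)  # Trial - Project
```

The difference is that in Nushell the indexing happens inline on the piped value, so no intermediate variable is needed.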
Now that the Project has been selected, we can list the available clusters by running the `clusters` command.

We can capture the name of our Free Tier cluster by running:

```shell
let cluster_name = clusters | $in.0.name
```

This variable will be accessible as `$cluster_name` until you exit Couchbase Shell.
The following command allows you to register the cluster:
Note
Make sure the `--capella-organization` parameter has the same value as the `identifier` key you defined in your config file in step 2.1.
```shell
# Register your cluster
( clusters get $cluster_name | cb-env register $cluster_name $in."connection string"
    --capella-organization "yourOrgIdentifier"
    --project (projects | $in.0.name)
    --default-bucket chat_data
    --default-scope workshop
    --default-collection knowledge_base
    --username workshop_user
    --password yourPassword123!
    --save )

cb-env cluster $cluster_name
```

Note
Replace `yourPassword123!` with the password you will create (yes, we are setting up the connection before creating the user). The password must be at least 8 characters long and contain an uppercase letter, a lowercase letter, a number, and a special character.
With an active Project and Cluster, we can create the cluster user.
```shell
credentials create --read --write --username workshop_user --password yourPassword123!
```

You'll create a logical organization for your data. Think of a scope as a database and a collection as a table; this is where your documents will be stored.
```shell
# Create a bucket for our project
buckets create chat_data 1024

# Create a scope for our project
scopes create --bucket chat_data workshop

# Create a collection for documents
collections create --bucket chat_data --scope workshop knowledge_base
```

At this point you can run `cb-env` to get an overview of your current context. The commands you run will refer to this context unless specified otherwise.
This is where the fun begins! You'll run your first commands to interact with OpenAI's API.
Ask OpenAI a question about Couchbase and see what it knows from its training data.
```shell
# Ask a simple question
ask "What is Couchbase?"
```

You should see a response from OpenAI!
You'll experiment with different questions to understand the limitations of a basic chat system - it only knows what was in its training data and can't access your specific information.
```shell
# Ask about databases
ask "Explain what a document database is in simple terms"

# Ask about vector search
ask "What is vector search used for?"

# Ask something OpenAI doesn't know
ask "What are the latest features in Couchbase 8.0"
```

Notice: OpenAI might not have specific, up-to-date information about Couchbase features. This is where RAG helps!
Now you'll start building your knowledge base. You'll add documents with information about Couchbase, generate vector embeddings for each document (numerical representations that capture meaning), and prepare everything for semantic search.
You'll insert five documents containing information about Couchbase features. This is your custom knowledge that the AI doesn't have in its training data.
Let's add some documents about Couchbase features:
```shell
# Document 1: Platform changes
doc upsert doc1 {title: "Platform Support Changes", category: "platform", text: "Couchbase Server 8.0 adds support for Alma Linux 10, Debian Linux 13, macOS 15 (Sequoia, development only), Oracle Linux 10, RHEL 10, Rocky Linux 10, and Windows Server 2025. It also drops support for Amazon Linux 2, macOS 12 (Monterey), SLES 12, Ubuntu 20.04 LTS, and Windows 10/Server 2019."}

# Document 2: GSI Vectors
doc upsert doc2 {title: "GSI Vector Indexes", category: "features", text: "Couchbase Server 8.0 introduces Hyperscale Vector indexes and Composite Vector indexes in the Index Service, enabling vector-search workloads (e.g., for AI applications). Hyperscale supports single vector column for very large datasets; Composite supports one vector plus scalar filters for hybrid vector+scalar queries, alongside existing Search Vector indexes."}
```

If the operation fails with a timeout error: the default timeouts are pretty aggressive, especially for a Free Tier instance. You can change them by running:
```shell
cb-env timeouts --data-timeout 50000
```

You should see a response similar to the following:
```
╭──────────────────────────┬────────╮
│ data_timeout (ms)        │ 50000  │
│ management_timeout (ms)  │ 75000  │
│ analytics_timeout (ms)   │ 75000  │
│ query_timeout (ms)       │ 75000  │
│ search_timeout (ms)      │ 75000  │
│ transaction_timeout (ms) │ 120000 │
╰──────────────────────────┴────────╯
```
Creating documents manually can be a painful process. A bulk import is also available and can be run like this:
```shell
doc import features.json
```

The response will be shown as follows:
```
╭───┬───────────┬─────────┬────────┬────────────────┬────────────────────╮
│ # │ processed │ success │ failed │ failures       │ cluster            │
├───┼───────────┼─────────┼────────┼────────────────┼────────────────────┤
│ 0 │ 30        │ 0       │ 30     │ Missing doc id │ fixedjohncreynolds │
╰───┴───────────┴─────────┴────────┴────────────────┴────────────────────╯
```
As you can see, the import failed because no document ids were provided. We can easily generate them like so:
```shell
open features.json | each { |x| $x | insert id (random uuid) } | save features_with_id.json
doc import --id-column id features_with_id.json
```

The response will be shown as follows:
```
╭───┬───────────┬─────────┬────────┬──────────┬────────────────────╮
│ # │ processed │ success │ failed │ failures │ cluster            │
├───┼───────────┼─────────┼────────┼──────────┼────────────────────┤
│ 0 │ 30        │ 30      │ 0      │          │ fixedjohncreynolds │
╰───┴───────────┴─────────┴────────┴──────────┴────────────────────╯
```
A quick check to make sure all your documents were successfully stored in Couchbase.
```shell
# Query all documents
query "SELECT text, title FROM chat_data.workshop.knowledge_base"
```

You should see all the documents!
You'll run a query that converts text into vector embeddings using OpenAI's embedding model. These vectors will allow you to search by meaning, not just keywords.
We can use `vector enrich-doc`, a Couchbase Shell built-in command that takes a field of a document, sends it to OpenAI, gets a vector back, and adds it to the document. The enriched document is then upserted.
```shell
query "SELECT meta().id, * FROM `chat_data`.`workshop`.`knowledge_base`" | vector enrich-doc text | doc upsert
query "SELECT text, title FROM chat_data.workshop.knowledge_base" | explore
```

If you explore the results, you will see that each document now has a `textVector` field containing the vectors. You can also open documents in the Capella Free Tier UI to see the same `textVector` field with the vector embeddings.
Here you'll configure Couchbase to perform vector search. You'll create a special search index that understands vector embeddings and can find documents similar to your query based on semantic meaning.
You'll create a search index that can handle both text and vector fields. This is what powers semantic search in your RAG system.
```shell
# Create a search index with vector support
vector create-index knowledge_base_idx textVector 1536
```

Note: The index takes 30-60 seconds to build. You can check its status with:
```shell
query indexes
```

You'll run a search query that finds the documents most similar to your question. This is the core of RAG!
```shell
# Vectorize the question, then search using vectors
let question = vector enrich-text "What are the latest features in Couchbase 8.0"
vector search knowledge_base_idx textVector $question.content.vector.0 --neighbors 10
```

You should see a list of the 10 documents closest in meaning to the question, ranked by similarity.
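Conceptually, this similarity ranking can be sketched with cosine similarity. The following is a toy Python illustration, not what Couchbase runs internally; the 3-dimensional vectors and document names are invented (real embeddings from text-embedding-3-small have 1536 dimensions):

```python
import math

# Documents whose embeddings have the highest cosine similarity
# to the question embedding are returned first.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

question = [0.9, 0.1, 0.0]
docs = {
    "doc_about_releases": [0.8, 0.2, 0.1],  # close in meaning to the question
    "doc_about_cooking":  [0.0, 0.1, 0.9],  # unrelated
}
ranked = sorted(docs, key=lambda d: cosine_similarity(question, docs[d]), reverse=True)
print(ranked)  # ['doc_about_releases', 'doc_about_cooking']
```

The `--neighbors 10` flag simply asks for the top 10 entries of that ranking.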
This is the main reason we are here, to ask a question and get a better answer thanks to RAG.
We run the search, fetch only the text field of each result using the `subdoc get` operation, select the `content` field, and pass the result to the `ask` command.
```shell
# Search, fetch the context, and ask with that context
vector search knowledge_base_idx textVector $question.content.vector.0 --neighbors 10 | subdoc get text | select content | ask $question.content.text.0
```

The result is a precise answer to your question, thanks to the relevant context added by RAG.
This is the culmination of everything! You'll combine vector search with OpenAI chat to create a RAG system. When someone asks a question, you'll search your knowledge base, add relevant context to the prompt, and get accurate, informed responses.
You will create a new Couchbase Shell/Nushell function that wraps up the search and ask part. Create a new file named rag.nu with the following content:
```shell
# RAG-enhanced chat function
def rag-ask [question: string] {
    let vectorized_question = ( vector enrich-text $question | $in.content.vector.0 )
    let search_results = vector search knowledge_base_idx textVector $vectorized_question --neighbors 10
    let context = $search_results | subdoc get text | select content
    $context | ask $question
}
```

To use it, as in most shells, you can source it by running:
```shell
source rag.nu
```

The `rag-ask` command should now be available and usable like so:

```shell
rag-ask "What are the latest features in Couchbase 8.0"
```

You should see a response similar to the following:
Embedding batch 1/1
Couchbase Server 8.0 comes with several new and enhanced features, including:
1. **Search Improvements**: Now supporting the BM25 algorithm for scoring search results, providing better hybrid search support and stable ordering across index partitions.
2. **Platform Support**: Added support for new platforms such as Alma Linux 10, Debian Linux 13, macOS 15 (Sequoia), Oracle Linux 10, RHEL 10, Rocky Linux 10, and Windows Server 2025. Some older platforms are dropped, like Amazon Linux 2 and macOS 12.
3. **Data Storage Changes**: Memcached-type buckets have been removed, encouraging users to replace them before upgrading. Additionally, the default storage engine for new buckets is now Magma with 128 vBuckets and a reduced minimum memory quota.
4. **Enhanced Security**: Added 'Hybrid' authentication mode allowing clients to use certificates or username/password credentials, with mandatory mTLS for inter-node communication. Also supports simultaneous certificate-based client auth and node-to-node encryption.
5. **Indexing Enhancements**: Introduced Hyperscale Vector indexes and Composite Vector indexes in the Index Service, facilitating vector-search workloads, especially for AI applications.
6. **Native Encryption at Rest**: Support for native encryption at rest for bucket data, logs, audit, and configuration, with the option for third-party key management solutions.
7. **Bucket REST API Updates**: The Bucket REST API has new settings and some older parameters have been removed, allowing for greater control and management flexibility.
These features collectively enhance performance, security, and usability of Couchbase Server 8.0, catering to a wider range of scenarios and use cases.
This is the moment of truth! You'll ask the same question using both the regular chat function (from Part 3) and your new RAG function, and compare the results side-by-side.
First, try the original chat function:
```shell
# Regular chat without RAG
ask "What is Couchbase Capella and what does the free tier include?"
```

Now try with RAG:
```shell
# RAG-enhanced chat
rag-ask "What is Couchbase Capella and what does the free tier include?"
```

Compare the results! The RAG version should provide specific details from your knowledge base, such as the 50GB storage and 1GB RAM.
You'll test several more questions to really see the difference. The RAG version should consistently provide more accurate, specific information from your knowledge base.
```shell
# Question about features, ChatGPT
ask "Tell me about Couchbase scopes and collections"

# Question about features, RAG
rag-ask "Tell me about Couchbase scopes and collections"

# Question about vector search, ChatGPT
ask "How does vector search work in Couchbase?"

# Question about vector search, RAG
rag-ask "How does vector search work in Couchbase?"
```

Want to take it further? These optional enhancements will make your RAG system more powerful by adding source citations and conversation memory.
Sometimes `ask` or `vector enrich-text` are not flexible enough. They can easily be reimplemented:
```shell
# Build the message history from a system prompt, context documents, and the user message
def build_messages [sysprompt, context, message] {
    mut messages = [
        {
            role: "system",
            content: $sysprompt
        },
    ];
    $messages = $context | values | get 0 | reduce --fold $messages {|item, acc| $acc | append {role: "system", content: $item} }
    $messages = $messages | append {role: "user", content: $message}
    $messages
}

# Chat with OpenAI through the REST API
def chat [model, message, options] {
    let context = $in
    let url = "https://api.openai.com/v1/chat/completions"
    let messages = build_messages "You are a helpful assistant. Always start by saying Hi Laurent!" $context $message
    let json = {
        model: $model,
        messages: $messages,
        temperature: 0.7
    };
    let json = $json | merge $options
    let response = ( http post -e -f $url $json --headers ["Authorization" $"Bearer ($env.OPENAI_API_KEY)"] --content-type "application/json")
    $response.body.choices.0.message.content
}

# Generate embeddings through the REST API
def generate-embedding [text: string, options] {
    let url = "https://api.openai.com/v1/embeddings"
    let json = {
        model: "text-embedding-3-small",
        input: $text
    };
    let json = $json | merge $options
    let response = ( http post -e -f $url $json --headers ["Authorization" $"Bearer ($env.OPENAI_API_KEY)"] --content-type "application/json")
    $response.body.data.0.embedding
}

# RAG-enhanced chat function using the custom helpers
def rag-ask [question: string] {
    let vectorized_question = generate-embedding $question {}
    let search_results = vector search knowledge_base_idx textVector $vectorized_question --neighbors 10
    let context = $search_results | subdoc get text | select content
    $context | chat "gpt-3.5-turbo" $question {}
}
```
This gives you nearly the same behavior as before, with a twist in the system prompt. Feel free to change it and experiment.
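To make the request shape concrete, here is a Python sketch (for comparison only) of the message list that `build_messages` assembles: retrieved context documents become extra system messages placed before the user's question. The texts below are invented for illustration.

```python
# Sketch of the message history sent to the chat completions endpoint.
def build_messages(sysprompt, context_texts, message):
    messages = [{"role": "system", "content": sysprompt}]
    for text in context_texts:
        # Each retrieved document is injected as an additional system message.
        messages.append({"role": "system", "content": text})
    messages.append({"role": "user", "content": message})
    return messages

msgs = build_messages(
    "You are a helpful assistant.",
    ["Couchbase Server 8.0 introduces Hyperscale Vector indexes."],
    "What is new in Couchbase 8.0?",
)
print([m["role"] for m in msgs])  # ['system', 'system', 'user']
```

Injecting context as system messages is one common convention; another is to concatenate the context into the user prompt itself.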
Congratulations! You've built a complete RAG system. Here's what you accomplished:
✅ Provisioned a fully managed DBaaS with the Capella Free Tier
✅ Installed Couchbase Shell and connected to Capella
✅ Set up OpenAI API integration
✅ Created a simple chat application
✅ Added knowledge documents to Couchbase
✅ Generated embeddings using OpenAI
✅ Configured vector search in Couchbase
✅ Built a RAG system that enhances responses with your data
✅ Compared regular chat vs. RAG-enhanced chat
| Feature | Regular Chat | RAG Chat |
|---|---|---|
| Knowledge Source | OpenAI's training data (cutoff date) | Your custom knowledge base |
| Accuracy | General information | Specific, up-to-date information |
| Control | Limited | Full control over data sources |
| Privacy | Sends queries to OpenAI only | Can use private data |
| Cost | Lower (fewer tokens) | Higher (more tokens for context) |
We added data that was already available in this repo, but you can easily ingest other sources.
- Try different `top_k` values (number of documents retrieved)
- Adjust the OpenAI `temperature` (0 = focused, 1 = creative)
- Use different embedding models (`text-embedding-3-large` for better quality)
- Try different similarity metrics (cosine, dot_product, euclidean)
- Add error handling
- Implement caching for embeddings
- Monitor costs (OpenAI API usage)
- Couchbase Shell Docs: https://couchbase.sh
- Couchbase Capella: https://cloud.couchbase.com
- OpenAI API Docs: https://platform.openai.com/docs
- Vector Search Guide: https://docs.couchbase.com/server/current/vector-search/vector-search.html