Update documentation

fbugarski · fbugarski · commit 5678700be101 · 2025-12-25T17:57:38.000+01:00
Signed-off-by: fbugarski <filipbugarski@gmail.com>
diff --git a/docs/api/embeddings.md b/docs/api/embeddings.md
@@ -1,26 +1,124 @@
 ---
 id: embeddings
-title: Embeddings
+title: Embeddings & RAG
 ---
 
+# Embeddings & Retrieval-Augmented Generation (RAG)
+
+Large Language Models (LLMs) are powerful, but they **do not know your private or internal data**.
+They are trained mostly on publicly available data and cannot access your documents, databases, or source
+code unless you explicitly provide that content as context.
+
+This is the problem that **Embeddings** and **Retrieval-Augmented Generation (RAG)** solve.
+
+---
+
+## The Problem LLMs Have
+
+Without embeddings and RAG:
+
+- LLMs cannot answer questions about private data
+- responses are often generic or inaccurate
+- models may hallucinate answers
+- updating a model's knowledge requires retraining or fine-tuning, which is slow and expensive
+
+---
+
+## What Are Embeddings?
+
+An **embedding** is a numerical (vector) representation of text that captures its meaning.
+
+Embeddings allow Cube AI to:
+
+- compare text by semantic similarity
+- search documents by meaning instead of keywords
+- retrieve relevant context for LLM prompts
+
+In simple terms:
+
+> Embeddings allow Cube AI to understand and search your data.
+
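As a minimal sketch of what "compare text by semantic similarity" means in practice, the snippet below computes cosine similarity between embedding vectors. The three-dimensional vectors and their variable names are hypothetical stand-ins; real embedding models return vectors with hundreds of dimensions, and nothing here is taken from Cube AI's implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three short texts (values are made up).
reset_password = [0.9, 0.1, 0.0]
forgot_login = [0.8, 0.2, 0.1]
quarterly_report = [0.0, 0.2, 0.9]

similar = cosine_similarity(reset_password, forgot_login)
different = cosine_similarity(reset_password, quarterly_report)
```

Texts with related meaning should score closer to 1 than unrelated ones, which is the property semantic search relies on.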
+Cube AI embeddings are generated inside **Trusted Execution Environments (TEEs)**,
+ensuring that both input text and resulting vectors remain confidential.
+
+---
+
+## What Is RAG?
+
+**Retrieval-Augmented Generation (RAG)** is a technique where:
+
+1. Your data is converted into embeddings
+2. Relevant content is retrieved based on a user query
+3. The retrieved content is injected into the LLM prompt
+4. The model generates an answer grounded in your data
+
+Instead of asking the model to guess, RAG lets it **answer using facts you provide**.
+
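Step 3 above (injecting retrieved content into the prompt) could look roughly like the sketch below. The function name `build_prompt`, the prompt wording, and the sample chunks are all illustrative assumptions, not the template Cube AI uses.

```python
def build_prompt(question, retrieved_chunks):
    """Assemble a prompt that asks the model to answer only from the given context."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks from an internal handbook.
prompt = build_prompt(
    "How often is the office Wi-Fi password rotated?",
    [
        "The office Wi-Fi password is rotated every 90 days.",
        "Guests receive a separate network with a daily password.",
    ],
)
```

Numbering the chunks makes it possible to ask the model to cite which piece of context supports its answer.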
+---
+
+## How RAG Works in Cube AI
+
+<!-- IMAGE: rag-flow-diagram -->
+<!-- Diagram: Documents → Embeddings → Vector Store → Retrieved Context → LLM -->
+
+The RAG flow in Cube AI looks like this:
+
+1. Documents are split into chunks
+2. Each chunk is converted into an embedding
+3. Embeddings are stored in a vector database
+4. A user asks a question
+5. Cube AI retrieves the most relevant chunks
+6. The LLM generates an answer using retrieved context
+
+All processing stays inside your Cube AI deployment.
+
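The flow above (chunk, embed, store, retrieve) can be sketched end to end in a few lines. This is a toy under stated assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, the "vector store" is a plain in-memory list, and the chunk sizes are arbitrary. A real deployment would call the embeddings endpoint and a vector database instead.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40, overlap=10):
    """Split text into windows of max_words words that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the question."""
    query_vec = toy_embed(question)
    store = [(chunk, toy_embed(chunk)) for chunk in chunks]  # in-memory "vector store"
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Hypothetical documents, each short enough to be a single chunk.
docs = [
    "Invoices are sent on the first day of each month.",
    "The VPN requires a hardware key.",
    "Lunch is served at noon.",
]
best = retrieve("When are invoices sent?", docs, top_k=1)
```

Overlapping chunks are a common choice because a sentence split across a chunk boundary would otherwise be hard to retrieve.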
+---
+
+## Why Use RAG with Cube AI?
+
+Using RAG enables:
+
+- chat over internal documentation
+- question answering over PDFs and files
+- AI assistants for support and operations
+- safer and more accurate LLM responses, grounded in your own content
+- no data leakage to external providers
+
+---
+
+## Common Use Cases
+
+### Internal Documentation Assistant
+Ask questions about internal docs, wikis, or README files.
+
+### Support & Helpdesk Bots
+Answer customer questions using company knowledge bases.
+
+### Codebase Search
+Query large repositories using natural language.
+
+### Knowledge-Based AI Assistants
+Build enterprise-grade ChatGPT-like systems backed by private data.
+
+---
+
+## Embeddings API Reference
+
 The embeddings endpoint allows you to generate vector representations of text.
 These vectors can be used for semantic search, clustering, retrieval-augmented
 generation (RAG), and similarity comparisons.
 
-Cube AI embeddings are generated inside **Trusted Execution Environments (TEEs)**,
-ensuring that input text and resulting vectors remain confidential.
-
 ---
 
-## Endpoint
+### Endpoint
 
 ```http
 POST /proxy/{domain_id}/v1/embeddings
 ```
 
 ---
 
-## Example Request
+### Example Request
 
 ```bash
 curl -k https://localhost/proxy/<domain_id>/v1/embeddings \
@@ -34,15 +132,23 @@ curl -k https://localhost/proxy/<domain_id>/v1/embeddings \
 
 ---
 
-## Response
+### Response
 
 Returns an OpenAI-compatible `embeddings` response object containing one or more
 embedding vectors.
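As a hedged illustration of handling that response, the sketch below extracts the vectors from an OpenAI-style embeddings payload (`data[].embedding` ordered by `data[].index`). The sample body is made up for the example; the exact fields Cube AI returns beyond the OpenAI-compatible ones are not specified here, and `extract_vectors` is a hypothetical helper name.

```python
import json


def extract_vectors(response_body):
    """Return embedding vectors from an OpenAI-style response, ordered by input index."""
    payload = json.loads(response_body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Illustrative response body; vector values and token counts are invented.
sample = json.dumps({
    "object": "list",
    "model": "nomic-embed-text",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    ],
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
})

vectors = extract_vectors(sample)
```

Sorting by `index` keeps each vector aligned with the input it was generated from when several inputs are sent in one request.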
 
 ---
 
-## Notes
+### Notes
 
 - Embeddings are **domain-scoped**
 - Input text is processed securely inside a TEE
 - Use embedding models such as `nomic-embed-text` for best results
+
+---
+
+## Next Steps
+
+- Combine Embeddings with **Chat Completions**
+- Explore available **Models**
+- Build a complete RAG pipeline using Cube AI
diff --git a/docs/integrations/continue.md b/docs/integrations/continue.md
@@ -4,48 +4,85 @@ title: Continue for VS Code
 sidebar_position: 1
 ---
 
-## Continue Integration for VS Code
+# Continue Integration for VS Code
 
-The **Continue** extension brings Cube AI’s LLM capabilities directly into
-Visual Studio Code, enabling inline completions, refactoring help, and
-chat-based assistance.
+The **Continue** extension brings **Cube AI** LLM capabilities directly into **Visual Studio Code**, enabling:
 
-This guide explains how to connect Continue with a Cube AI domain.
+- inline code completions
+- refactoring assistance
+- chat-based explanations
+- test and documentation generation
+
+This guide shows how to connect **Continue** with a **Cube AI domain** in a few simple steps.
+
+---
+
+## What You Will Get
+
+After completing this guide, you will be able to:
+
+- use Cube AI models inside VS Code
+- chat with your codebase
+- refactor and explain code using enterprise-grade LLMs
+- keep all data inside your Cube AI deployment
+
+---
+
+## Architecture Overview
+
+Continue runs locally inside VS Code and forwards requests to Cube AI. Cube AI handles authentication, model routing, and security, and all data remains inside your Cube AI deployment.
+
+<!-- IMAGE: architecture-diagram -->
+<!-- Add diagram: Continue → Cube AI → Models -->
 
 ---
 
 ## 1. Install Requirements
 
-1. Install **Visual Studio Code**  
-   [https://code.visualstudio.com](https://code.visualstudio.com)
+### Install Visual Studio Code
+https://code.visualstudio.com
 
-2. Install the **Continue** extension  
-   [https://www.continue.dev](https://www.continue.dev)
+### Install the Continue Extension
+https://www.continue.dev
 
 ---
 
 ## 2. Open Continue Configuration
 
-In Visual Studio Code:
+In **Visual Studio Code**:
 
-1. Click the **Continue** icon  
-2. Open the **Settings / gear** menu  
+1. Click the **Continue** icon in the sidebar  
+2. Open the **Settings (⚙️)** menu  
 3. Select **Configure Continue**
 
 This opens the configuration file:
 
-```yaml
+```
 .continue/config.yaml
 ```
 
+<!-- IMAGE: continue-open-config -->
+<!-- Screenshot: Continue icon + Configure option -->
+
 ---
 
-## 3. Configure Continue to Use Cube AI
+## 3. Generate a Cube AI Access Token
 
-Replace the contents of `config.yaml` with the configuration below.
+Before configuring Continue, generate an access token in the **Cube AI UI**:
 
-Before editing the file, make sure you have generated a Cube AI access token.
-You can obtain it from the Cube AI UI under **Profile → Tokens**.
+1. Open the Cube AI UI
+2. Go to **Profile → Tokens**
+3. Click **Generate token**
+4. Copy the token value
+
+<!-- IMAGE: cube-token-generation -->
+<!-- Screenshot: Profile → Tokens -->
+
+---
+
+## 4. Configure Continue to Use Cube AI
+
+Replace the contents of `.continue/config.yaml` with the configuration below.
 
 ```yaml
 name: Cube AI
@@ -57,15 +94,15 @@ models:
     provider: ollama
     model: tinyllama:1.1b
     apiKey: <access_token>
-    apiBase: https://<your-cube-instance>/proxy/<your-domain-id>
+    apiBase: https://<cube-instance>/proxy/<domain-id>
     requestOptions:
       verifySsl: false
 
   - name: starcoder2
     provider: ollama
     model: starcoder2:7b
     apiKey: <access_token>
-    apiBase: https://<your-cube-instance>/proxy/<your-domain-id>
+    apiBase: https://<cube-instance>/proxy/<domain-id>
     requestOptions:
       verifySsl: false
 
@@ -78,54 +115,66 @@ context:
   - provider: docs
 ```
 
-### Replace
+### Replace the placeholders
 
 - `<access_token>` → your Cube AI access token  
-- `<your-cube-instance>` → usually `localhost`  
-- `<your-domain-id>` → the domain ID you want VS Code to use  
+- `<cube-instance>` → usually `localhost`  
+- `<domain-id>` → the ID of the Cube AI domain you want VS Code to use  
 
-> `verifySsl: false` should be used **only for local development**.
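+⚠️ `verifySsl: false` disables TLS certificate verification. Use it for local development only; production deployments should use valid TLS certificates.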
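A quick way to catch a forgotten placeholder before restarting Continue is to scan the configuration text for leftover `<...>` tokens. The sketch below is a hypothetical helper, not part of Continue or Cube AI, and it assumes placeholders follow the `<name>` pattern used in this guide.

```python
import re

# Matches tokens such as <access_token>, <cube-instance>, or <domain-id>.
PLACEHOLDER = re.compile(r"<[A-Za-z][A-Za-z0-9_-]*>")


def unreplaced_placeholders(config_text):
    """Return the sorted set of <placeholder> tokens still present in the text."""
    return sorted(set(PLACEHOLDER.findall(config_text)))


# Example fragment in which only <cube-instance> has been filled in.
draft = (
    "apiKey: <access_token>\n"
    "apiBase: https://localhost/proxy/<domain-id>\n"
)
leftovers = unreplaced_placeholders(draft)
```

To check a real file, read the contents of `.continue/config.yaml` and pass them to `unreplaced_placeholders`; an empty list means every placeholder was replaced.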
 
 ---
 
-## 4. Using Continue With Cube AI
-
-Once configured:
+## 5. Verify the Connection
 
-- Press **Ctrl + L** to open the Continue chat  
-- Ask questions or request explanations  
-- Use inline completions powered by Cube AI models  
+1. Open Continue chat using **Ctrl + L**
+2. Select a configured model
+3. Ask:
 
-Example prompts:
+```
+Explain what this project does
+```
 
-- “Explain this function”  
-- “Refactor this TypeScript file”  
-- “Write unit tests for this module”
+<!-- IMAGE: continue-chat-success -->
+<!-- Screenshot: Continue chat with response -->
 
 ---
 
-## 5. Troubleshooting
+## 6. Example Prompts
+
+- Explain this function
+- Refactor this file
+- Write unit tests
+- Summarize this folder
 
-### Connection issues
+---
 
-- Ensure Cube AI is running (`make up`)  
-- Verify that the domain exists  
-- Check that your access token is valid  
+## 7. Troubleshooting
 
-### SSL issues
+### Connection Issues
+- Ensure Cube AI is running
+- Verify that the domain ID in `apiBase` is correct and that the domain exists
+- Check that your access token is valid
 
-If you are running Cube AI locally without valid TLS certificates, set:
+### Unauthorized (401)
+- The access token is expired or invalid. Generate a new one under **Profile → Tokens** and update `apiKey`.
 
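+### SSL Errors
+If Cube AI runs locally with a self-signed or otherwise untrusted TLS certificate, disable certificate verification (local development only):
+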
 ```yaml
 requestOptions:
   verifySsl: false
 ```
 
-For production deployments, always use valid TLS certificates.
+---
+
+## 8. Video Tutorial
+
+A complete walkthrough is available on YouTube: https://www.youtube.com/watch?v=BGpv_iTB2NE
 
 ---
 
-## 6. Video Tutorial
+## Next Steps
 
-A complete walkthrough is available here:  
-[https://www.youtube.com/watch?v=BGpv_iTB2NE](https://www.youtube.com/watch?v=BGpv_iTB2NE)
+- Embeddings & RAG
+- Models overview
+- API integrations