Add RAG Documentation by SashenkaG · Pull Request #161 · wso2/docs-devant

SashenkaG · 2026-02-18T09:48:15Z

Description

This PR introduces several improvements and updates to the RAG (Retrieval-Augmented Generation) documentation, including:

Documentation Restructuring and Navigation Updates:

Split the original rag-application.md into four focused guides: rag-ingestion.md, rag-retrieval.md, schedule-rag-automation.md, and rag-service.md, each covering a specific aspect of RAG workflows.
Updated the sidebar navigation in mkdocs.yml to group RAG documentation under a new "RAG" section, with clear links to Ingestion, Retrieval, Schedule Automation, and Service guides.
Changed the homepage AI Agents section to link to the new "RAG Ingestion" guide instead of the old "RAG Application" guide.

RAG Ingestion Guide Improvements:

Expanded the ingestion guide with clearer instructions and improved UI navigation directions, and included updated GIFs and screenshots.

New Guides and Features:

Added a new "RAG Retrieval" guide with instructions on retrieving ingested data, including configuration, querying, and reranking details.
Added a new "Schedule RAG Automation" guide describing how to set up scheduled ingestion from external datasources (Google Drive, S3).
Added a new "RAG Service" guide detailing the available REST API endpoints for programmatic ingestion, retrieval, chunking, and health checks, with example requests and responses.

Resolves issues #2201 and #2060

Summary by CodeRabbit

Documentation
- Restructured RAG docs with clearer, reworded step flows, standardized "RAG ingestion" terminology, new "Choose ingestion mode" workflow (Upload Now or Schedule), and renamed/reflowed step headers.
- Expanded supported source types (PDFs, scanned PDFs, Office files, HTML/MD, images, audio, CSV, etc.).
- Added new guides for RAG retrieval, RAG service endpoints, and scheduled RAG automation (Google Drive example), plus updated site navigation and AI Quick Start links with new visuals and video note.

coderabbitai · 2026-02-18T09:48:37Z

Warning

Rate limit exceeded

@SashenkaG has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

This PR splits and expands the single RAG guide into a multi-page suite (ingestion, retrieval, service, scheduled automation), updates navigation and quick-start links to surface the new RAG section, and revises the ingestion flow and UI screenshots.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| RAG docs (rework + additions) `en/docs/ai/rag-ingestion.md`, `en/docs/ai/rag-retrieval.md`, `en/docs/ai/rag-service.md`, `en/docs/ai/schedule-rag-automation.md` | Reworked ingestion guide (terminology, reordered steps, new "Choose ingestion mode", added/updated images/gifs, video note). Added new pages: retrieval (Pinecone-backed flow, query params, reranking note), service (API endpoints: `/upload`, `/retrieve`, `/chunks`, `/health` with payload examples), and scheduled automation (create automation, datasource config, scheduling, logs). |
| Navigation config `en/mkdocs.yml` | Replaced single "RAG Application" entry with a new "RAG" group containing Ingestion, Retrieval, Schedule Automation, and Service pages. |
| Quick Start / homepage `en/theme/material/templates/home-page.html` | Updated AI/Quick Start link and label from "RAG Application" to "RAG Ingestion" and pointed anchor to `rag-ingestion`. |
| Media & UI copies referenced `en/docs/ai/rag-ingestion.md` (images: `rag-ingestion1.gif`, `rag-ingestion2.gif`, `rag-ingestion3.gif`) | Added/repositioned animated images and a video-tutorial reference; removed the prior embedded RAG retrieval section and replaced with notes linking to the new retrieval page. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (UI/API)
    participant Service as RAG Service
    participant Embed as Embedding API
    participant Vector as Vector DB
    participant Store as Object Storage

    rect rgba(200,230,255,0.5)
    Client->>Service: POST /upload (file, metadata)
    Service->>Store: Persist original file (optional)
    Service->>Embed: Request embeddings for chunks
    Embed-->>Service: Return embedding vectors
    Service->>Vector: Upsert vectors into index
    Vector-->>Service: Ack upsert
    Service-->>Client: Upload response (status, ids)
    end

    rect rgba(200,255,200,0.5)
    Client->>Service: POST /retrieve (query, params)
    Service->>Embed: Embed query text
    Embed-->>Service: Query vector
    Service->>Vector: Search(top N, threshold)
    Vector-->>Service: Retrieved chunks
    Service->>Service: (optional) Rerank / filter results
    Service-->>Client: Retrieved results (documents, metadata)
    end
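
The retrieval half of the diagram (embed the query, search with a top-N limit and a threshold, return the surviving chunks) can be sketched in a few lines. This is a toy, in-memory illustration and not the RAG Service implementation: the bag-of-words `embed` function stands in for the Embedding API, a plain Python list stands in for the vector DB, and the names `retrieve`, `top_n`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the Embedding API: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_n: int = 3,
             threshold: float = 0.1) -> list[tuple[str, float]]:
    """Embed the query, score every chunk, drop low scores, keep the top N."""
    query_vector = embed(query)
    scored = [(chunk, cosine(query_vector, embed(chunk))) for chunk in chunks]
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


# Hypothetical chunks standing in for previously ingested content.
chunks = [
    "Devant supports scheduled ingestion from Google Drive",
    "Reranking reorders retrieved chunks by relevance",
    "The health endpoint reports service status",
]
results = retrieve("scheduled ingestion from Google Drive", chunks,
                   top_n=2, threshold=0.2)
for chunk, score in results:
    print(f"{score:.3f}  {chunk}")
```

Only the first chunk shares words with the query, so it is the only one that clears the threshold. A real deployment would replace `embed` with calls to the configured embedding model and the list scan with a vector index query; an optional reranking step would reorder `kept` before truncation.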

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibbled lines and rearranged the trail,
Four little paths now guide the curious snail.
Ingest, retrieve, service, and a scheduler chime —
I hopped, I smiled, I organized in time. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description covers the key changes (documentation restructuring, four new guides, navigation updates) and resolves linked issues, but diverges significantly from the repository's structured template covering Purpose, Goals, Approach, Release notes, Documentation impact, and other formal sections. | Align the description with the template structure by adding formal sections for Purpose, Goals, Approach, Release notes, Documentation, and other relevant sections to meet repository standards. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Add RAG Documentation' is concise and clearly describes the main change: introducing/adding new RAG documentation, which aligns with the PR's primary objective of restructuring and expanding RAG guides. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (1)

en/docs/ai/schedule-rag-automation.md (1)

29-29: Amazon S3 is listed as a supported datasource but has no corresponding setup instructions.

Step 2 only covers Google Drive. Users who select S3 will have no guidance.

Either add S3 configuration steps, or soften the claim to reflect that only Google Drive is currently documented:

✏️ Proposed interim fix

-The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.
+The datasource specifies the location from which files will be ingested. Devant supports Google Drive folders and Amazon S3 buckets as datasources. The steps below walk through Google Drive; refer to the platform documentation for S3 configuration.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@en/docs/ai/schedule-rag-automation.md` at line 29, The docs claim that Devant
supports Google Drive folders and Amazon S3 buckets but Step 2 only documents
Google Drive; either add missing S3 setup instructions (how to create an S3
bucket, IAM policy/credentials, environment variables or config keys the app
expects, and any required path/key format) and link them from the "datasource"
paragraph, or change the "datasource specifies..." sentence to only mention
Google Drive (e.g., "currently supports Google Drive folders") and note S3 is
planned; update the "Step 2" heading to reflect the chosen change so S3 users
are not left without guidance.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/docs/ai/rag-ingestion.md`:
- Line 13: Fix the capitalization and align the supported file-type list with
rag-service.md: change "PDFS" to "PDFs" and expand the list in the sentence that
starts with "Devant RAG ingestion has support for multiple file types" to
include PDF (including scanned), DOCX, PPTX, XLSX, HTML, CSV, MD, images, and
audio formats MP3, WAV, OGG, M4A, FLAC so the documentation in rag-ingestion.md
matches rag-service.md.

In `@en/docs/ai/rag-retrieval.md`:
- Line 50: Change the lowercase phrase "rag retrieval" to capitalized "RAG
retrieval" in the sentence "After completing the RAG ingestion process, you can
also implement a rag retrieval to connect your vector database with user queries
and generate responses using the WSO2 Integrator: BI." so the term matches the
acronym formatting; update the instance in the file content where "rag
retrieval" appears.
- Around line 45-46: Replace the incorrect admonition token "!!!info" with the
correct Markdown admonition syntax "!!! info" in the document (look for the
string "!!!info" in the content snippet) so the block renders properly; update
any identical occurrences in this file (and optionally mirror the same fix in
rag-service.md) to maintain consistent admonition formatting.
- Around line 6-9: Fix the prose and formatting in the RAG retrieval doc: change
the note marker from "!!!note" to "!!! note", correct "in to" to "into", change
"havent" to "haven't", and remove the duplicated phrase so "navigate to your
organization using the **Organization** dropdown in the in the top left of the
Devant console header" becomes "navigate to your organization using the
**Organization** dropdown in the top left of the Devant console header"; update
the note block and the navigation sentence accordingly in the rag-retrieval.md
content.
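
The admonition findings above (`!!!info`, `!!!note`) are the same mechanical fix, so they can be applied in one pass. The following is a minimal sketch, assuming the only problem is a missing space directly after the `!!!` marker; the function name `fix_admonitions` is hypothetical and it rewrites text in memory rather than touching any file.

```python
import re

# "!!!" at the start of a line (after optional indentation) that is
# immediately followed by a word character, i.e. with no separating space.
_BAD_ADMONITION = re.compile(r"^(\s*)!!!(?=\w)")


def fix_admonitions(markdown: str) -> str:
    """Rewrite '!!!note' style markers to the '!!! note' form the review asks for."""
    fixed_lines = [
        _BAD_ADMONITION.sub(r"\1!!! ", line)
        for line in markdown.splitlines()
    ]
    return "\n".join(fixed_lines)


sample = "!!!note\n    Some note text.\n    !!!info\n!!! info\n    Already correct."
print(fix_admonitions(sample))
```

Markers that already contain the space are left untouched, and indentation is preserved so nested admonitions keep their level. Note that `splitlines` followed by `join` drops a trailing newline, so a caller writing the result back to a file would need to re-add it.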

In `@en/docs/ai/rag-service.md`:
- Line 153: Fix the typo in the sentence containing the link text "[RAG
Ingesion]" by changing "RAG Ingesion" to "RAG Ingestion" so the sentence reads
"For more details on RAG ingestion and retrieval, and how to obtain API keys and
credentials refer the [RAG Ingestion](rag-ingestion.md) guide." Ensure only the
displayed link text is corrected without altering the target URL
"rag-ingestion.md".
- Around line 32-33: The admonition uses incorrect syntax "!!!note" which won't
be recognized; update the string in en/docs/ai/rag-service.md to include the
required space so it reads "!!! note" (match the correct usage already present
elsewhere in the file) to ensure Python-Markdown's admonition extension renders
the note block properly.
- Line 26: Fix the spelling mistake in the documentation sentence that currently
reads "On the devolopment environment card…" by changing "devolopment" to
"development" so the sentence becomes "On the development environment card,
click **Test** to open the OpenAPI Console…"; update this text in the
en/docs/ai/rag-service.md content where that sentence appears.

In `@en/docs/ai/schedule-rag-automation.md`:
- Line 36: Update the sentence about creating an API key to clarify that
standard Google Cloud API keys cannot access private Drive folders and to
specify what credential format the product's "API Key" field accepts (e.g.,
service account JSON, OAuth 2.0 client/refresh token, or domain-wide delegated
service account) and then replace or augment the existing link with direct links
to the appropriate workflows for creating service account credentials and OAuth
2.0 credentials (Google Drive service account JSON setup and OAuth
consent/refresh token guides) so users follow the correct credential creation
path for private Drive access.

---

Nitpick comments:
In `@en/docs/ai/schedule-rag-automation.md`:
- Line 29: The docs claim that Devant supports Google Drive folders and Amazon
S3 buckets but Step 2 only documents Google Drive; either add missing S3 setup
instructions (how to create an S3 bucket, IAM policy/credentials, environment
variables or config keys the app expects, and any required path/key format) and
link them from the "datasource" paragraph, or change the "datasource
specifies..." sentence to only mention Google Drive (e.g., "currently supports
Google Drive folders") and note S3 is planned; update the "Step 2" heading to
reflect the chosen change so S3 users are not left without guidance.

en/docs/ai/rag-ingestion.md

en/docs/ai/rag-retrieval.md

en/docs/ai/rag-service.md

en/docs/ai/schedule-rag-automation.md

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

en/docs/ai/rag-service.md (1)
27-30: "Try it out" should be bolded to match the UI-element formatting convention used elsewhere.

Every other interactive UI label in this list (**Test**, **Execute**) is bolded; "Try it out" on Line 28 is the odd one out.
✏️ Proposed fix
-3. Click Try it out to enable it.
+3. Click **Try it out** to enable it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@en/docs/ai/rag-service.md` around lines 27 - 30, Make the UI label "Try it
out" bold to match the existing convention used for other interactive labels;
locate the step text containing the plain "Try it out" in the steps list (the
same block that contains "**Test**" and "**Execute**") and change it so the
label is wrapped in markdown bold markers (i.e., make it "**Try it out**") to
keep formatting consistent with the other UI elements.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: The sentence "Enter the **Collection Name**. The collection will be
automatically created if it does not exist." in step 3 is misleading for
retrieval-only flows; remove the second sentence and replace it with a short
note that the named collection must already exist and contain indexed documents
(e.g., "Enter the **Collection Name**. Note: for retrieval, the collection must
already exist and be populated with indexed documents; an empty or non-existent
collection will return no results."). Update the text in the
en/docs/ai/rag-retrieval.md step 3 accordingly.

In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 29-43: The docs currently say both Google Drive and Amazon S3 are
supported but only show Google Drive steps; add a parallel "Amazon S3"
subsection under the datasource selection that lists the required fields (Bucket
name/URL, Region, Access Key ID, Secret Access Key, optional Endpoint for
S3-compatible providers, and any IAM role or public/private access notes),
provide brief guidance on where to find the bucket name and region, and either
include example steps for entering these values in the UI or add a cross-link to
existing S3 configuration documentation if one exists (mirror the structure used
in the Google Drive steps and include an info block for important notes such as
bucket permissions).

---

Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Around line 45-46: The admonition is written as '!!!info' which won’t render;
change the token to include a space — use '!!! info' (and keep the same
indentation level) so Python-Markdown's admonition extension recognizes it and
renders the reranking note as a styled info block.
- Around line 6-7: Fix the grammar in the note sentence inside the RAG retrieval
guide: change the line that currently reads "If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." to a corrected,
concise phrasing such as "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide to ingest files into your vector store."
Ensure the comma after "already" is added and the trailing informal clause is
replaced with the clearer instruction; update the note block where this sentence
appears.

In `@en/docs/ai/rag-service.md`:
- Around line 152-153: Fix the ungrammatical sentence in the doc: replace the
line that currently reads "For more details on RAG ingestion and retrieval, and
how to obtain API keys and credentials refer the [RAG
Ingestion](rag-ingestion.md) guide." with a corrected version that inserts the
missing comma and corrects "refer the" to "refer to the", e.g. "For more details
on RAG ingestion and retrieval, and how to obtain API keys and credentials,
refer to the [RAG Ingestion](rag-ingestion.md) guide." Ensure you update the
exact sentence text in the en/docs/ai/rag-service.md content.

In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 35-38: Update the guidance in the "schedule-rag-automation.md" doc
so the public-folder limitation is explicit at the top of Step 2 rather than
only inside the collapsible info block: move or duplicate the sentence "The
target folder must be public ('Anyone with the link'), as API keys cannot access
private files." to the start of Step 2 and add a short sentence naming supported
alternatives (OAuth 2.0 or service account credentials) for users who need
private-folder access or note that private access is not supported if that is
intentional; modify the existing info block (the "Info" block shown) to
reference the top-level note and/or provide the OAuth/service-account options.

---

Nitpick comments:
In `@en/docs/ai/rag-service.md`:
- Around line 27-30: Make the UI label "Try it out" bold to match the existing
convention used for other interactive labels; locate the step text containing
the plain "Try it out" in the steps list (the same block that contains
"**Test**" and "**Execute**") and change it so the label is wrapped in markdown
bold markers (i.e., make it "**Try it out**") to keep formatting consistent with
the other UI elements.

en/docs/ai/rag-retrieval.md

coderabbitai · 2026-02-18T12:23:13Z

en/docs/ai/schedule-rag-automation.md

+The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.
+
+1. Select `Google Drive` as the datasource.
+
+2. Enter the **API Key** in the API key field.
+
+    ???+ info "Info"
+        To obtain a key, use the [Google Cloud Console](https://console.cloud.google.com/) to create a project and generate a key as described in the [Google Documentation](https://cloud.google.com/docs/authentication/api-keys#create) and restrict it to the **Google Drive API**.
+
+        **Note:** The target folder must be **public** ("Anyone with the link"), as API keys cannot access private files.
+
+3. Provide the **Folder ID** of the Google Drive folder containing the files to be ingested.
+
+    ???+ info "Info"
+        The Google Drive folder ID can be found in the URL when viewing the folder in Google Drive. It is the string that appears after `/folders/` in the URL.
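
The folder-ID rule quoted in the diff above (the string after `/folders/` in the URL) is easy to express as a helper. This is a minimal sketch: the function name `extract_folder_id` and the example URL are hypothetical, and the assumption that IDs consist only of letters, digits, underscores, and hyphens is mine rather than something the guide states.

```python
import re
from typing import Optional

# Assumption: a folder ID is a run of letters, digits, "_" or "-".
_FOLDER_ID = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(url: str) -> Optional[str]:
    """Return the segment that follows '/folders/' in a Drive URL, or None."""
    match = _FOLDER_ID.search(url)
    return match.group(1) if match else None


# Placeholder URL; the ID below is not a real folder.
example_url = "https://drive.google.com/drive/folders/EXAMPLE_FOLDER_ID_123?usp=sharing"
folder_id = extract_folder_id(example_url)
print(folder_id)
```

Query parameters such as `?usp=sharing` are excluded because `?` is not in the allowed character set, and a URL without a `/folders/` segment yields `None` instead of raising.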


⚠️ Potential issue | 🟡 Minor

S3 datasource is mentioned as supported but its configuration steps are not documented.

Line 29 explicitly states both Google Drive and Amazon S3 are supported, yet the guide only walks through Google Drive. Users choosing Amazon S3 as the datasource have no instructions for providing the bucket URL, region, access key, or secret key.

Either document the S3 configuration in a sub-section (parallel to the Google Drive sub-section), or note that S3 documentation is covered elsewhere and link to it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@en/docs/ai/schedule-rag-automation.md` around lines 29 - 43, The docs currently say both Google Drive and Amazon S3 are supported but only show Google Drive steps; add a parallel "Amazon S3" subsection under the datasource selection that lists the required fields (Bucket name/URL, Region, Access Key ID, Secret Access Key, optional Endpoint for S3-compatible providers, and any IAM role or public/private access notes), provide brief guidance on where to find the bucket name and region, and either include example steps for entering these values in the UI or add a cross-link to existing S3 configuration documentation if one exists (mirror the structure used in the Google Drive steps and include an info block for important notes such as bucket permissions).
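
To make the requested S3 subsection concrete, the fields listed in the comment above can be modelled as a small record with a completeness check. This is only a sketch of what such a subsection would need to cover: the class `S3DatasourceConfig`, its attribute names, and the placeholder values are hypothetical and do not describe Devant's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3DatasourceConfig:
    """Hypothetical container for the S3 fields named in the review comment."""
    bucket: str
    region: str
    access_key_id: str
    secret_access_key: str
    endpoint: Optional[str] = None  # only needed for S3-compatible providers

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are empty or whitespace."""
        required = {
            "bucket": self.bucket,
            "region": self.region,
            "access_key_id": self.access_key_id,
            "secret_access_key": self.secret_access_key,
        }
        return [name for name, value in required.items() if not value.strip()]


# Placeholder values; the region is deliberately left blank.
config = S3DatasourceConfig(
    bucket="example-bucket",
    region="",
    access_key_id="EXAMPLE_KEY_ID",
    secret_access_key="EXAMPLE_SECRET",
)
missing = config.missing_fields()
print(missing)
```

The optional `endpoint` is excluded from the required set, mirroring the comment's note that it applies only to S3-compatible providers.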

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: Fix the typo in the sentence "Enter the **Collection Name** from
which you want to retrive data." by changing "retrive" to "retrieve" so the line
reads "Enter the **Collection Name** from which you want to retrieve data.";
update that exact string in en/docs/ai/rag-retrieval.md.
- Line 24: Replace the incorrect brand spacing in the sentence "Select
`text-embedding-ada-002` embedding model from the **Open AI** dropdown." by
changing "**Open AI**" to "**OpenAI**" so the brand name is a single word
(consistent with the other occurrence on line 28); update the phrase in the same
string or markdown element where that sentence appears (search for
"text-embedding-ada-002" to locate it).

---

Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 7: Update the sentence "This guide assumes you have already ingested
files into your vector store. If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." by inserting a
comma after "already" so it reads "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide..." — locate the sentence in
en/docs/ai/rag-retrieval.md and apply the punctuation fix to the second
sentence.

en/docs/ai/rag-retrieval.md

SashenkaG added 5 commits February 16, 2026 09:47

Merge upstream/main

cbcdf02

Merge remote-tracking branch 'upstream/main'

eb3e6b3

Merge remote-tracking branch 'upstream/main'

985dc35

Add RAG ingestion, retrieval service and scheduled ingestion automation documentation

50d9d6d

Refactor RAG section in navigation and update link

c82260f

SashenkaG requested a review from pcnfernando as a code owner February 18, 2026 09:48

coderabbitai bot reviewed Feb 18, 2026

Resolve CodeRabbit reviews

99dc05e

coderabbitai bot reviewed Feb 18, 2026

Clarify instructions in RAG retrieval docs

f2abae4

coderabbitai bot reviewed Feb 18, 2026

en/docs/ai/rag-retrieval.md (outdated)

en/docs/ai/rag-retrieval.md (outdated)

Fix typos

b641d82

SashenkaG wants to merge 8 commits into wso2:main from SashenkaG:main
