Walkthrough
This PR splits and expands the single RAG guide into a multi-page suite (ingestion, retrieval, service, scheduled automation), updates navigation and quick-start links to surface the new RAG section, and revises the ingestion flow and UI screenshots.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client (UI/API)
participant Service as RAG Service
participant Embed as Embedding API
participant Vector as Vector DB
participant Store as Object Storage
rect rgba(200,230,255,0.5)
Client->>Service: POST /upload (file, metadata)
Service->>Store: Persist original file (optional)
Service->>Embed: Request embeddings for chunks
Embed-->>Service: Return embedding vectors
Service->>Vector: Upsert vectors into index
Vector-->>Service: Ack upsert
Service-->>Client: Upload response (status, ids)
end
rect rgba(200,255,200,0.5)
Client->>Service: POST /retrieve (query, params)
Service->>Embed: Embed query text
Embed-->>Service: Query vector
Service->>Vector: Search(top N, threshold)
Vector-->>Service: Retrieved chunks
Service->>Service: (optional) Rerank / filter results
Service-->>Client: Retrieved results (documents, metadata)
end
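The two flows in the diagram can be sketched in a few lines of Python. This is a minimal, hypothetical illustration only: `embed`, `VectorIndex`, `upload`, and `retrieve` are names invented here, the bag-of-words "embedding" stands in for a real Embedding API, and the in-memory dictionary stands in for a real vector database. It is not the Devant RAG service implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for the Embedding API: a unit-length bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


class VectorIndex:
    """In-memory stand-in for the Vector DB (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, chunk_id, vector, text):
        self.rows[chunk_id] = (vector, text)

    def search(self, query_vector, top_n, threshold):
        scored = [
            (cosine(query_vector, vector), chunk_id, text)
            for chunk_id, (vector, text) in self.rows.items()
        ]
        hits = [row for row in scored if row[0] >= threshold]
        return sorted(hits, reverse=True)[:top_n]


def upload(index, doc_id, text, chunk_size=8):
    """Ingestion flow: chunk the text, embed each chunk, upsert the vectors."""
    words = text.split()
    ids = []
    for start in range(0, len(words), chunk_size):
        chunk = " ".join(words[start:start + chunk_size])
        chunk_id = f"{doc_id}-{start // chunk_size}"
        index.upsert(chunk_id, embed(chunk), chunk)
        ids.append(chunk_id)
    return {"status": "ok", "ids": ids}


def retrieve(index, query, top_n=3, threshold=0.1):
    """Retrieval flow: embed the query, then search with top N and a threshold."""
    hits = index.search(embed(query), top_n, threshold)
    return [{"id": cid, "score": round(score, 3), "text": text}
            for score, cid, text in hits]
```

The optional rerank/filter step from the diagram is omitted; in this sketch the similarity threshold is the only filter applied.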
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 8
🧹 Nitpick comments (1)
en/docs/ai/schedule-rag-automation.md (1)
29-29: Amazon S3 is listed as a supported datasource but has no corresponding setup instructions. Step 2 only covers Google Drive. Users who select S3 will have no guidance.
Either add S3 configuration steps, or soften the claim to reflect that only Google Drive is currently documented:
✏️ Proposed interim fix
-The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.
+The datasource specifies the location from which files will be ingested. Devant supports Google Drive folders and Amazon S3 buckets as datasources. The steps below walk through Google Drive; refer to the platform documentation for S3 configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/ai/schedule-rag-automation.md` at line 29, The docs claim that Devant supports Google Drive folders and Amazon S3 buckets but Step 2 only documents Google Drive; either add missing S3 setup instructions (how to create an S3 bucket, IAM policy/credentials, environment variables or config keys the app expects, and any required path/key format) and link them from the "datasource" paragraph, or change the "datasource specifies..." sentence to only mention Google Drive (e.g., "currently supports Google Drive folders") and note S3 is planned; update the "Step 2" heading to reflect the chosen change so S3 users are not left without guidance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-ingestion.md`:
- Line 13: Fix the capitalization and align the supported file-type list with
rag-service.md: change "PDFS" to "PDFs" and expand the list in the sentence that
starts with "Devant RAG ingestion has support for multiple file types" to
include PDF (including scanned), DOCX, PPTX, XLSX, HTML, CSV, MD, images, and
audio formats MP3, WAV, OGG, M4A, FLAC so the documentation in rag-ingestion.md
matches rag-service.md.
In `@en/docs/ai/rag-retrieval.md`:
- Line 50: Change the lowercase phrase "rag retrieval" to capitalized "RAG
retrieval" in the sentence "After completing the RAG ingestion process, you can
also implement a rag retrieval to connect your vector database with user queries
and generate responses using the WSO2 Integrator: BI." so the term matches the
acronym formatting; update the instance in the file content where "rag
retrieval" appears.
- Around line 45-46: Replace the incorrect admonition token "!!!info" with the
correct Markdown admonition syntax "!!! info" in the document (look for the
string "!!!info" in the content snippet) so the block renders properly; update
any identical occurrences in this file (and optionally mirror the same fix in
rag-service.md) to maintain consistent admonition formatting.
- Around line 6-9: Fix the prose and formatting in the RAG retrieval doc: change
the note marker from "!!!note" to "!!! note", correct "in to" to "into", change
"havent" to "haven't", and remove the duplicated phrase so "navigate to your
organization using the **Organization** dropdown in the in the top left of the
Devant console header" becomes "navigate to your organization using the
**Organization** dropdown in the top left of the Devant console header"; update
the note block and the navigation sentence accordingly in the rag-retrieval.md
content.
In `@en/docs/ai/rag-service.md`:
- Line 153: Fix the typo in the sentence containing the link text "[RAG
Ingesion]" by changing "RAG Ingesion" to "RAG Ingestion" so the sentence reads
"For more details on RAG ingestion and retrieval, and how to obtain API keys and
credentials refer the [RAG Ingestion](rag-ingestion.md) guide." Ensure only the
displayed link text is corrected without altering the target URL
"rag-ingestion.md".
- Around line 32-33: The admonition uses incorrect syntax "!!!note" which won't
be recognized; update the string in en/docs/ai/rag-service.md to include the
required space so it reads "!!! note" (match the correct usage already present
elsewhere in the file) to ensure Python-Markdown's admonition extension renders
the note block properly.
- Line 26: Fix the spelling mistake in the documentation sentence that currently
reads "On the devolopment environment card…" by changing "devolopment" to
"development" so the sentence becomes "On the development environment card,
click **Test** to open the OpenAPI Console…"; update this text in the
en/docs/ai/rag-service.md content where that sentence appears.
In `@en/docs/ai/schedule-rag-automation.md`:
- Line 36: Update the sentence about creating an API key to clarify that
standard Google Cloud API keys cannot access private Drive folders and to
specify what credential format the product's "API Key" field accepts (e.g.,
service account JSON, OAuth 2.0 client/refresh token, or domain-wide delegated
service account) and then replace or augment the existing link with direct links
to the appropriate workflows for creating service account credentials and OAuth
2.0 credentials (Google Drive service account JSON setup and OAuth
consent/refresh token guides) so users follow the correct credential creation
path for private Drive access.
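Several of the inline comments above ask for the same mechanical change: adding the space that Python-Markdown's admonition extension requires between the marker and the type (`!!!note` to `!!! note`). A minimal sketch of how that could be applied across a file, assuming a hypothetical helper named `fix_admonitions`; it also covers the `???` and `???+` collapsible markers that appear in these docs, which is an extension of what the comments request:

```python
import re

# Matches an admonition marker ("!!!", "???", or "???+") at the start of a
# line when it is immediately followed by a letter, i.e. the space is missing.
MISSING_SPACE = re.compile(r"^([ \t]*)(!!!|\?\?\?\+?)(?=[A-Za-z])", re.MULTILINE)


def fix_admonitions(markdown_text):
    """Insert the missing space after the marker; correct lines are untouched."""
    return MISSING_SPACE.sub(r"\1\2 ", markdown_text)
```

Lines that already read `!!! note` or `???+ info "Info"` do not match the pattern, so running the helper more than once is harmless. The typo and wording fixes in the other comments still need to be made by hand.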
Actionable comments posted: 2
🧹 Nitpick comments (1)
en/docs/ai/rag-service.md (1)
27-30: `Try it out` should be bolded to match the UI-element formatting convention used elsewhere.
Every other interactive UI label in this list (**Test**, **Execute**) is bolded; `Try it out` on Line 28 is the odd one out.
✏️ Proposed fix
-3. Click Try it out to enable it.
+3. Click **Try it out** to enable it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/ai/rag-service.md` around lines 27 - 30, Make the UI label "Try it out" bold to match the existing convention used for other interactive labels; locate the step text containing the plain "Try it out" in the steps list (the same block that contains "**Test**" and "**Execute**") and change it so the label is wrapped in markdown bold markers (i.e., make it "**Try it out**") to keep formatting consistent with the other UI elements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: The sentence "Enter the **Collection Name**. The collection will be
automatically created if it does not exist." in step 3 is misleading for
retrieval-only flows; remove the second sentence and replace it with a short
note that the named collection must already exist and contain indexed documents
(e.g., "Enter the **Collection Name**. Note: for retrieval, the collection must
already exist and be populated with indexed documents; an empty or non-existent
collection will return no results."). Update the text in the
en/docs/ai/rag-retrieval.md step 3 accordingly.
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 29-43: The docs currently say both Google Drive and Amazon S3 are
supported but only show Google Drive steps; add a parallel "Amazon S3"
subsection under the datasource selection that lists the required fields (Bucket
name/URL, Region, Access Key ID, Secret Access Key, optional Endpoint for
S3-compatible providers, and any IAM role or public/private access notes),
provide brief guidance on where to find the bucket name and region, and either
include example steps for entering these values in the UI or add a cross-link to
existing S3 configuration documentation if one exists (mirror the structure used
in the Google Drive steps and include an info block for important notes such as
bucket permissions).
---
Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Around line 45-46: The admonition is written as '!!!info' which won’t render;
change the token to include a space — use '!!! info' (and keep the same
indentation level) so Python-Markdown's admonition extension recognizes it and
renders the reranking note as a styled info block.
- Around line 6-7: Fix the grammar in the note sentence inside the RAG retrieval
guide: change the line that currently reads "If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." to a corrected,
concise phrasing such as "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide to ingest files into your vector store."
Ensure the comma after "already" is added and the trailing informal clause is
replaced with the clearer instruction; update the note block where this sentence
appears.
In `@en/docs/ai/rag-service.md`:
- Around line 152-153: Fix the ungrammatical sentence in the doc: replace the
line that currently reads "For more details on RAG ingestion and retrieval, and
how to obtain API keys and credentials refer the [RAG
Ingestion](rag-ingestion.md) guide." with a corrected version that inserts the
missing comma and corrects "refer the" to "refer to the", e.g. "For more details
on RAG ingestion and retrieval, and how to obtain API keys and credentials,
refer to the [RAG Ingestion](rag-ingestion.md) guide." Ensure you update the
exact sentence text in the en/docs/ai/rag-service.md content.
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 35-38: Update the guidance in the "schedule-rag-automation.md" doc
so the public-folder limitation is explicit at the top of Step 2 rather than
only inside the collapsible info block: move or duplicate the sentence "The
target folder must be public ('Anyone with the link'), as API keys cannot access
private files." to the start of Step 2 and add a short sentence naming supported
alternatives (OAuth 2.0 or service account credentials) for users who need
private-folder access or note that private access is not supported if that is
intentional; modify the existing info block (the "Info" block shown) to
reference the top-level note and/or provide the OAuth/service-account options.
The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.

1. Select `Google Drive` as the datasource.

2. Enter the **API Key** in the API key field.

    ???+ info "Info"
        To obtain a key, use the [Google Cloud Console](https://console.cloud.google.com/) to create a project and generate a key as described in the [Google Documentation](https://cloud.google.com/docs/authentication/api-keys#create) and restrict it to the **Google Drive API**.

        **Note:** The target folder must be **public** ("Anyone with the link"), as API keys cannot access private files.

3. Provide the **Folder ID** of the Google Drive folder containing the files to be ingested.

    ???+ info "Info"
        The Google Drive folder ID can be found in the URL when viewing the folder in Google Drive. It is the string that appears after `/folders/` in the URL.
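The folder-ID rule in the snippet above (the string after `/folders/` in the URL) and the API-key requirement can be illustrated with a short Python sketch. The helper names `folder_id_from_url` and `list_files_url` are hypothetical. The second helper only builds a Google Drive API v3 `files.list` request URL for a publicly shared folder; whether Devant issues this exact request is an assumption, not something the docs state.

```python
from urllib.parse import urlencode, urlparse


def folder_id_from_url(url):
    """Return the path segment that follows /folders/, or None if absent."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" in segments:
        position = segments.index("folders")
        if position + 1 < len(segments):
            return segments[position + 1]
    return None


def list_files_url(folder_id, api_key):
    """Build a Drive API v3 files.list URL for the direct children of a folder.

    Assumption: the folder is shared as "Anyone with the link", since an
    API key alone does not grant access to private files.
    """
    params = {
        "q": f"'{folder_id}' in parents and trashed = false",
        "fields": "files(id, name, mimeType)",
        "key": api_key,
    }
    return "https://www.googleapis.com/drive/v3/files?" + urlencode(params)
```

Parsing the URL path rather than splitting the raw string means query parameters such as `?usp=sharing` are not mistaken for part of the folder ID.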
S3 datasource is mentioned as supported but its configuration steps are not documented.
Line 29 explicitly states both Google Drive and Amazon S3 are supported, yet the guide only walks through Google Drive. Users choosing Amazon S3 as the datasource have no instructions for providing the bucket URL, region, access key, or secret key.
Either document the S3 configuration in a sub-section (parallel to the Google Drive sub-section), or note that S3 documentation is covered elsewhere and link to it.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: Fix the typo in the sentence "Enter the **Collection Name** from
which you want to retrive data." by changing "retrive" to "retrieve" so the line
reads "Enter the **Collection Name** from which you want to retrieve data.";
update that exact string in en/docs/ai/rag-retrieval.md.
- Line 24: Replace the incorrect brand spacing in the sentence "Select
`text-embedding-ada-002` embedding model from the **Open AI** dropdown." by
changing "**Open AI**" to "**OpenAI**" so the brand name is a single word
(consistent with the other occurrence on line 28); update the phrase in the same
string or markdown element where that sentence appears (search for
"text-embedding-ada-002" to locate it).
---
Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 7: Update the sentence "This guide assumes you have already ingested
files into your vector store. If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." by inserting a
comma after "already" so it reads "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide..." — locate the sentence in
en/docs/ai/rag-retrieval.md and apply the punctuation fix to the second
sentence.
Description
This PR introduces several improvements and updates to the RAG (Retrieval-Augmented Generation) documentation, including:
Documentation Restructuring and Navigation Updates:
- Split `rag-application.md` into four focused guides: `rag-ingestion.md`, `rag-retrieval.md`, `schedule-rag-automation.md`, and `rag-service.md`, each covering a specific aspect of RAG workflows.
- Updated `mkdocs.yml` to group RAG documentation under a new "RAG" section, with clear links to Ingestion, Retrieval, Schedule Automation, and Service guides.
RAG Ingestion Guide Improvements:
New Guides and Features:
Resolves issue #2201, #2060
Summary by CodeRabbit