refactor: streamline RAG query execution in Hivemind agent by replaci… #49

amindadgar · 2025-09-30T12:06:44Z

…ng deprecated components with new QueryDataSources class and enhancing error handling

Summary by CodeRabbit

Refactor
- Streamlined data querying in Hivemind to call data sources directly, improving robustness and returning a safe fallback when no answer is available.
Chores
- Updated production and staging CI workflows to use a new reusable pipeline.
- Simplified dependencies by removing LangChain packages; functionality remains intact with an optional, no-op decorator when LangChain isn’t present.
Bug Fixes
- Improved error handling during data queries to prevent failures from interrupting the flow and ensure consistent responses.


coderabbitai · 2025-09-30T12:06:51Z

Walkthrough

Updates CI workflows to use a new reusable template (ci2.yml), removes LangChain dependencies from requirements, refactors the Hivemind agent to call QueryDataSources directly instead of a LangChain-based agent, and makes the RAG tool’s LangChain decorator optional via runtime import with a no-op fallback.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI workflow switch to ci2.yml: `.github/workflows/production.yml`, `.github/workflows/start.staging.yml` | Both workflows change the reusable job reference from `ci.yml@main` to `ci2.yml@main`. No other YAML structure or secrets changed. |
| Dependencies cleanup: `requirements.txt` | Removes `langchain` and `langchain-openai`. Reorders `tc-hivemind-backend` after `openai`. |
| Hivemind agent RAG flow refactor: `tasks/hivemind/agent.py` | Replaces LangChain-based agent/tooling with direct `QueryDataSources` usage. Adds `asyncio.run` wrapper, error handling returning "NONE" on failure/empty, updates state assignment and retry_count, retains router stop behavior. Removes imports for LangChain agent components. |
| Optional LangChain decorator for RAG tool: `tasks/hivemind/query_data_sources.py` | Removes static `from langchain.tools import tool`. Adds runtime import of `langchain.tools.tool` as `lc_tool`; falls back to a no-op decorator if unavailable. Applies `@lc_tool(return_direct=True)` to the RAG function. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Router as Router
  participant Agent as AgenticHivemindFlow
  participant QDS as QueryDataSources
  participant DS as Data Sources

  Router->>Agent: do_rag_query(state)
  Note over Agent: Build QueryDataSources with context
  Agent->>QDS: query(prompt, constraints) (async)
  QDS->>DS: fetch/search/aggregate
  DS-->>QDS: results or none
  QDS-->>Agent: answer or null
  alt answer available
    Agent->>Agent: set state.last_answer, inc retry_count
  else error/none
    Agent->>Agent: log error, set "NONE", inc retry_count
  end
  Agent-->>Router: "stop"

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: remove unused codes & making the tool pass the original rag query! #44 — Touches the same Hivemind RAG flow, adjusting agent/query integration.
fix: using langchain tool calling! #42 — Also modifies RAG integration points, contrasting LangChain usage vs. direct querying.
feat: added direct return for hivemind tool! #6 — Refactors query invocation toward QueryDataSources, similar control-path changes.

Poem

A hop, a skip, I trim my pack—
Out goes LangChain, lighter back.
I query streams with nimble paws,
Async whispers, no more flaws.
CI’s new trail, ci2 we tread—
Carrots cached, green lights ahead! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title concisely and accurately describes the main change (refactoring the RAG query execution in the Hivemind agent by replacing deprecated components) without extraneous detail or vague wording. |

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/remove-rag-agent-call


🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

Public repositories are always opted into early access features.
You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tasks/hivemind/agent.py (1)
163-169: Prefer logging.exception for exception logging.

When logging within an exception handler, use logging.exception instead of logging.error to automatically include the stack trace. This provides better debugging context without needing exc_info=True.

Apply this diff:
         except Exception as e:
-            logging.error(f"RAG query execution failed: {e}")
+            logging.exception("RAG query execution failed")
             answer = "NONE"
Note: The broad Exception catch is acceptable here since any failure should fall back to "NONE" per the existing error handling strategy.
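To illustrate the difference this nitpick is pointing at, here is a minimal sketch, not the project's code. The helper `run_query_with_logging`, the failing stand-in `failing_query`, and the logger name `hivemind_demo` are all hypothetical; only the `logging` calls are real standard-library API. It shows that `exception()` logs at ERROR level and appends the traceback without an explicit `exc_info=True`.

```python
import io
import logging


def run_query_with_logging(query_fn, logger: logging.Logger) -> str:
    """Call query_fn; on any failure log the traceback and return "NONE"."""
    try:
        answer = query_fn()
    except Exception:
        # Logs at ERROR level and appends the active traceback to the record.
        logger.exception("RAG query execution failed")
        answer = "NONE"
    return answer


def failing_query() -> str:
    # Hypothetical stand-in for a data-source call that fails.
    raise RuntimeError("data source unreachable")


# Capture log output in memory so the traceback can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("hivemind_demo")
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

result = run_query_with_logging(failing_query, demo_logger)
log_text = stream.getvalue()
```

After this runs, `log_text` should contain both the message and the full traceback, which is the debugging context the plain `logging.error(f"...{e}")` form leaves out.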

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 620bbcf and bb52bd3.

📒 Files selected for processing (5)

.github/workflows/production.yml (1 hunks)
.github/workflows/start.staging.yml (1 hunks)
requirements.txt (1 hunks)
tasks/hivemind/agent.py (2 hunks)
tasks/hivemind/query_data_sources.py (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-07-01T11:14:19.992Z

Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#44
File: tasks/hivemind/agent.py:164-164
Timestamp: 2025-07-01T11:14:19.992Z
Learning: In RAG architectures using LangChain agents with custom RAG tools, the agent should pass queries unchanged to the RAG tool rather than processing them. The RAG tool itself (like `make_rag_tool` in tasks/hivemind/agent.py) handles all query processing, retrieval, and answer generation, while the agent serves as a simple intermediary. The system instruction "Pass the incoming queries without changing it" is appropriate for this separation of concerns.

Applied to files:

tasks/hivemind/agent.py

📚 Learning: 2025-02-26T15:41:54.631Z

Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#2
File: tasks/hivemind/query_data_sources.py:27-31
Timestamp: 2025-02-26T15:41:54.631Z
Learning: In the agents-workflow project, "HivemindWorkflow" and "AgenticHivemindTemporalWorkflow" are two distinct workflows. The QueryDataSources class in tasks/hivemind/query_data_sources.py is designed to interact with "HivemindWorkflow", which is separate from "AgenticHivemindTemporalWorkflow" defined in tasks/agent.py.

Applied to files:

tasks/hivemind/agent.py

🧬 Code graph analysis (1)

tasks/hivemind/agent.py (2)

tasks/hivemind/query_data_sources.py (2)

QueryDataSources (17-82)

query (23-71)

tasks/agent.py (1)

run (219-227)

🪛 Ruff (0.13.1)

tasks/hivemind/query_data_sources.py

100-100: Do not catch blind exception: Exception

(BLE001)

tasks/hivemind/agent.py

167-167: Do not catch blind exception: Exception

(BLE001)

168-168: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: ci / test / Test

🔇 Additional comments (5)

requirements.txt (1)

1-9: LGTM! Dependency cleanup aligns with the refactoring.

The removal of langchain and langchain-openai dependencies is consistent with the PR's objective to decouple from LangChain-based tooling and adopt the new QueryDataSources class approach.

tasks/hivemind/query_data_sources.py (1)

98-106: Well-designed runtime import pattern for optional LangChain support.

The try-except block with a no-op fallback decorator elegantly handles the case where LangChain is not installed, allowing the RAG functionality to work without the dependency. This aligns perfectly with the PR's goal of making LangChain optional.

Regarding the static analysis hint about catching broad Exception at line 100: this is appropriate here since import failures can raise various exception types (ImportError, ModuleNotFoundError, etc.), and the fallback behavior is safe and intentional.

tasks/hivemind/agent.py (1)

157-161: Cleaner refactoring that removes LangChain complexity.

The direct instantiation of QueryDataSources and synchronous wrapper with asyncio.run (line 164) simplifies the RAG query flow by removing the LangChain agent/executor intermediary. This aligns well with the PR objectives.
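The synchronous wrapper described here can be sketched as follows. This is an illustration only: `FakeQueryDataSources` is a hypothetical stand-in for the real `QueryDataSources` class (whose constructor arguments and internals are not reproduced), and `run_rag_query` is an invented helper name. One caveat that holds for the standard library in general: `asyncio.run` raises `RuntimeError` if an event loop is already running in the same thread, and with a broad `except Exception` that condition would also be reported as "NONE".

```python
import asyncio
import logging


class FakeQueryDataSources:
    """Hypothetical stand-in exposing an async query() method."""

    def __init__(self, canned_answer=None, fail=False):
        self.canned_answer = canned_answer
        self.fail = fail

    async def query(self, query: str):
        await asyncio.sleep(0)  # yield control once, like real I/O would
        if self.fail:
            raise RuntimeError("backend unavailable")
        return self.canned_answer


def run_rag_query(source, user_query: str) -> str:
    """Run the async query from sync code, falling back to "NONE"."""
    try:
        answer = asyncio.run(source.query(user_query))
        if answer is None:
            answer = "NONE"
    except Exception:
        logging.exception("RAG query execution failed")
        answer = "NONE"
    return answer


ok = run_rag_query(FakeQueryDataSources(canned_answer="42"), "meaning of life?")
empty = run_rag_query(FakeQueryDataSources(canned_answer=None), "unknown topic")
failed = run_rag_query(FakeQueryDataSources(fail=True), "anything")
```

All three paths return a string, which is what lets the router treat "NONE" as a uniform signal for "no answer available".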

.github/workflows/start.staging.yml (1)

9-9: LGTM! Consistent workflow template update.

This change mirrors the production workflow update to ci2.yml@main, ensuring consistency across staging and production environments.

.github/workflows/production.yml (1)

12-12: Confirm reusable workflow compatibility
ci2.yml in TogetherCrew/operations/.github/workflows exists and declares on: workflow_call, so the updated workflow reference is valid and compatible.

coderabbitai · 2025-09-30T12:32:32Z

tasks/hivemind/agent.py

+        query_data_sources = QueryDataSources(
+            community_id=self.community_id,
+            enable_answer_skipping=self.enable_answer_skipping,
+            workflow_id=self.workflow_id,
        )
-        agent = create_openai_functions_agent(llm, tools, prompt)

-        # Run the agent
-        agent_executor = AgentExecutor(
-            agent=agent,
-            tools=tools,
-            verbose=True,
-            return_intermediate_steps=False,
-            max_iterations=3,
-        )
+        try:
+            answer = asyncio.run(query_data_sources.query(self.state.user_query))
+            if answer is None:
+                answer = "NONE"
+        except Exception as e:
+            logging.error(f"RAG query execution failed: {e}")
+            answer = "NONE"

-        result = agent_executor.invoke({"input": self.state.user_query})
-        self.state.last_answer = result["output"]
+        self.state.last_answer = answer
        self.state.retry_count += 1

        return "stop"


⚠️ Potential issue | 🟠 Major

Type inconsistency: last_answer expects CrewOutput | None but receives str.

Line 171 assigns a string value (answer) to self.state.last_answer, but the field is typed as CrewOutput | None (line 19). This mismatch will cause type checking failures and potential runtime issues downstream where CrewOutput is expected.

Apply this diff to fix the type annotation:

 class AgenticFlowState(BaseModel):
     user_query: str = ""
     retry_count: int = 0
-    last_answer: CrewOutput | None = None
+    last_answer: CrewOutput | str | None = None
     state: str = "continue"
     chat_history: str | None = None

Alternatively, if last_answer should remain CrewOutput | None, wrap the string response in a CrewOutput object before assignment.
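If the union type is chosen, downstream readers of `last_answer` need to handle both shapes. A small sketch of one way to normalize the value is shown below. It is an assumption-laden illustration: `FakeCrewOutput` is a hypothetical stand-in for `CrewOutput`, modelling only a `raw` text field (assumed here to mirror how the real object exposes its text), and `answer_text` is an invented helper name that does not exist in the repository.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeCrewOutput:
    """Hypothetical stand-in for CrewOutput; only a `raw` text field is modelled."""

    raw: str


def answer_text(last_answer: Union[FakeCrewOutput, str, None]) -> Optional[str]:
    """Normalize the three possible shapes of last_answer to plain text."""
    if last_answer is None:
        return None
    if isinstance(last_answer, str):
        return last_answer  # direct QueryDataSources path, including "NONE"
    return last_answer.raw  # object path: assumed to carry its text in `raw`


from_string = answer_text("NONE")
from_object = answer_text(FakeCrewOutput(raw="The answer is 42."))
from_nothing = answer_text(None)
```

Centralizing the check in one helper keeps the `isinstance` branching out of every call site, which is the main cost of widening the annotation instead of wrapping the string.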

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

+        query_data_sources = QueryDataSources(
+            community_id=self.community_id,
+            enable_answer_skipping=self.enable_answer_skipping,
+            workflow_id=self.workflow_id,
         )
-        agent = create_openai_functions_agent(llm, tools, prompt)
-        # Run the agent
-        agent_executor = AgentExecutor(
-            agent=agent,
-            tools=tools,
-            verbose=True,
-            return_intermediate_steps=False,
-            max_iterations=3,
-        )
+        try:
+            answer = asyncio.run(query_data_sources.query(self.state.user_query))
+            if answer is None:
+                answer = "NONE"
+        except Exception as e:
+            logging.error(f"RAG query execution failed: {e}")
+            answer = "NONE"
-        result = agent_executor.invoke({"input": self.state.user_query})
-        self.state.last_answer = result["output"]
+        self.state.last_answer = answer
         self.state.retry_count += 1
         return "stop"

 class AgenticFlowState(BaseModel):
     user_query: str = ""
     retry_count: int = 0
     last_answer: CrewOutput | str | None = None
     state: str = "continue"
     chat_history: str | None = None

🧰 Tools

🪛 Ruff (0.13.1)

167-167: Do not catch blind exception: Exception

(BLE001)

168-168: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents

In tasks/hivemind/agent.py around lines 157 to 174, last_answer is being set to a plain string but its type is declared as CrewOutput | None (line 19); update the code so the assigned value matches the declared type by either (A) wrapping the string into a CrewOutput instance before assigning to self.state.last_answer (create a CrewOutput with the appropriate fields populated from the RAG response and use that), or (B) if the design intent is to allow plain strings, change the state type annotation for last_answer to str | CrewOutput | None (and update any downstream usages to handle the string case). Ensure you choose one approach and make corresponding downstream type/usage updates so type checking passes.

refactor: streamline RAG query execution in Hivemind agent by replaci…

de0eef5

…ng deprecated components with new QueryDataSources class and enhancing error handling

fix: removed using codeClimate!

bb52bd3

coderabbitai bot reviewed Sep 30, 2025


amindadgar merged commit 5262aa3 into main Oct 1, 2025
3 checks passed
