@amindadgar amindadgar commented Sep 30, 2025

…ng deprecated components with new QueryDataSources class and enhancing error handling

Summary by CodeRabbit

  • Refactor

    • Streamlined data querying in Hivemind to call data sources directly, improving robustness and returning a fallback value ("NONE") when no answer is available.
  • Chores

    • Updated production and staging CI workflows to use a new reusable pipeline.
    • Simplified dependencies by removing LangChain packages; functionality remains intact with an optional, no-op decorator when LangChain isn’t present.
  • Bug Fixes

    • Improved error handling during data queries to prevent failures from interrupting the flow and ensure consistent responses.

coderabbitai bot commented Sep 30, 2025

Walkthrough

Updates CI workflows to use a new reusable template (ci2.yml), removes LangChain dependencies from requirements, refactors the Hivemind agent to call QueryDataSources directly instead of a LangChain-based agent, and makes the RAG tool’s LangChain decorator optional via runtime import with a no-op fallback.

Changes

Cohort / File(s) Summary
CI workflow switch to ci2.yml
.github/workflows/production.yml, .github/workflows/start.staging.yml
Both workflows change the reusable job reference from ci.yml@main to ci2.yml@main. No other YAML structure or secrets changed.
Dependencies cleanup
requirements.txt
Removes langchain and langchain-openai. Reorders tc-hivemind-backend after openai.
Hivemind agent RAG flow refactor
tasks/hivemind/agent.py
Replaces LangChain-based agent/tooling with direct QueryDataSources usage. Adds asyncio.run wrapper, error handling returning "NONE" on failure/empty, updates state assignment and retry_count, retains router stop behavior. Removes imports for LangChain agent components.
Optional LangChain decorator for RAG tool
tasks/hivemind/query_data_sources.py
Removes static from langchain.tools import tool. Adds runtime import of langchain.tools.tool as lc_tool; falls back to a no-op decorator if unavailable. Applies @lc_tool(return_direct=True) to the RAG function.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Router as Router
  participant Agent as AgenticHivemindFlow
  participant QDS as QueryDataSources
  participant DS as Data Sources

  Router->>Agent: do_rag_query(state)
  Note over Agent: Build QueryDataSources with context
  Agent->>QDS: query(prompt, constraints) (async)
  QDS->>DS: fetch/search/aggregate
  DS-->>QDS: results or none
  QDS-->>Agent: answer or null
  alt answer available
    Agent->>Agent: set state.last_answer, inc retry_count
  else error/none
    Agent->>Agent: log error, set "NONE", inc retry_count
  end
  Agent-->>Router: "stop"
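The flow in the diagram can be approximated in plain Python. This is a minimal sketch under stated assumptions, not the project's code: FlowState, StubQueryDataSources, and do_rag_query are hypothetical stand-ins named after the walkthrough, and the stub's query coroutine takes only the user query, matching the query_data_sources.query(self.state.user_query) call in the diff.

```python
from __future__ import annotations

import asyncio
import logging
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical stand-in for AgenticFlowState (last_answer simplified to str)."""
    user_query: str = ""
    retry_count: int = 0
    last_answer: str | None = None


class StubQueryDataSources:
    """Hypothetical stand-in for QueryDataSources that returns a canned result."""

    def __init__(self, result: str | None = None, fail: bool = False) -> None:
        self.result = result
        self.fail = fail

    async def query(self, user_query: str) -> str | None:
        if self.fail:
            raise RuntimeError("data source unavailable")
        return self.result


def do_rag_query(state: FlowState, qds: StubQueryDataSources) -> str:
    """Run the async query from sync code, falling back to "NONE"."""
    try:
        # asyncio.run starts a fresh event loop; it cannot be called from
        # code that is already running inside an event loop.
        answer = asyncio.run(qds.query(state.user_query))
        if answer is None:
            answer = "NONE"
    except Exception:
        # Broad catch mirrors the diff: any failure falls back to "NONE".
        logging.exception("RAG query execution failed")
        answer = "NONE"

    state.last_answer = answer
    state.retry_count += 1
    return "stop"
```

Calling do_rag_query(FlowState(user_query="What changed?"), StubQueryDataSources(result="ci2.yml")) returns "stop" and sets last_answer to "ci2.yml"; with result=None or fail=True, last_answer becomes "NONE". In every case retry_count is incremented once.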

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A hop, a skip, I trim my pack—
Out goes LangChain, lighter back.
I query streams with nimble paws,
Async whispers, no more flaws.
CI’s new trail, ci2 we tread—
Carrots cached, green lights ahead! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 40.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title concisely and accurately describes the main change (refactoring the RAG query execution in the Hivemind agent by replacing deprecated components) without extraneous detail or vague wording.

🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tasks/hivemind/agent.py (1)

163-169: Prefer logging.exception for exception logging.

When logging within an exception handler, use logging.exception instead of logging.error to automatically include the stack trace. This provides better debugging context without needing exc_info=True.

Apply this diff:

         except Exception as e:
-            logging.error(f"RAG query execution failed: {e}")
+            logging.exception("RAG query execution failed")
             answer = "NONE"

Note: The broad Exception catch is acceptable here since any failure should fall back to "NONE" per the existing error handling strategy.
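A small, self-contained demonstration of the difference, using only the standard library: logging.exception logs at ERROR level and appends the active traceback, which is equivalent to logging.error(..., exc_info=True). The capture helper, the "rag-demo" logger name, and the ValueError are illustrative only.

```python
import io
import logging


def capture(log_call) -> str:
    """Raise an error, log it via log_call inside the handler, return the log text."""
    stream = io.StringIO()
    logger = logging.getLogger("rag-demo")
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.propagate = False  # keep demo output away from the root logger
    try:
        try:
            raise ValueError("data source timed out")
        except ValueError as e:
            # Must run inside the except block so the traceback is available.
            log_call(logger, e)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


# logging.error with only the message: no traceback in the output.
plain = capture(lambda log, e: log.error(f"RAG query execution failed: {e}"))

# logging.exception: same ERROR level, traceback appended automatically.
with_trace = capture(lambda log, e: log.exception("RAG query execution failed"))
```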

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 620bbcf and bb52bd3.

📒 Files selected for processing (5)
  • .github/workflows/production.yml (1 hunks)
  • .github/workflows/start.staging.yml (1 hunks)
  • requirements.txt (1 hunks)
  • tasks/hivemind/agent.py (2 hunks)
  • tasks/hivemind/query_data_sources.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-01T11:14:19.992Z
Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#44
File: tasks/hivemind/agent.py:164-164
Timestamp: 2025-07-01T11:14:19.992Z
Learning: In RAG architectures using LangChain agents with custom RAG tools, the agent should pass queries unchanged to the RAG tool rather than processing them. The RAG tool itself (like `make_rag_tool` in tasks/hivemind/agent.py) handles all query processing, retrieval, and answer generation, while the agent serves as a simple intermediary. The system instruction "Pass the incoming queries without changing it" is appropriate for this separation of concerns.

Applied to files:

  • tasks/hivemind/agent.py
📚 Learning: 2025-02-26T15:41:54.631Z
Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#2
File: tasks/hivemind/query_data_sources.py:27-31
Timestamp: 2025-02-26T15:41:54.631Z
Learning: In the agents-workflow project, "HivemindWorkflow" and "AgenticHivemindTemporalWorkflow" are two distinct workflows. The QueryDataSources class in tasks/hivemind/query_data_sources.py is designed to interact with "HivemindWorkflow", which is separate from "AgenticHivemindTemporalWorkflow" defined in tasks/agent.py.

Applied to files:

  • tasks/hivemind/agent.py
🧬 Code graph analysis (1)
tasks/hivemind/agent.py (2)
tasks/hivemind/query_data_sources.py (2)
  • QueryDataSources (17-82)
  • query (23-71)
tasks/agent.py (1)
  • run (219-227)
🪛 Ruff (0.13.1)
tasks/hivemind/query_data_sources.py

100-100: Do not catch blind exception: Exception

(BLE001)

tasks/hivemind/agent.py

167-167: Do not catch blind exception: Exception

(BLE001)


168-168: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci / test / Test
🔇 Additional comments (5)
requirements.txt (1)

1-9: LGTM! Dependency cleanup aligns with the refactoring.

The removal of langchain and langchain-openai dependencies is consistent with the PR's objective to decouple from LangChain-based tooling and adopt the new QueryDataSources class approach.

tasks/hivemind/query_data_sources.py (1)

98-106: Well-designed runtime import pattern for optional LangChain support.

The try-except block with a no-op fallback decorator elegantly handles the case where LangChain is not installed, allowing the RAG functionality to work without the dependency. This aligns perfectly with the PR's goal of making LangChain optional.

Regarding the static analysis hint about catching a broad Exception at line 100: the fallback behavior is safe and intentional, but note that ModuleNotFoundError is a subclass of ImportError, so except ImportError already covers a missing package. The broad catch is only needed if errors raised while LangChain itself initializes should also trigger the no-op fallback.

tasks/hivemind/agent.py (1)

157-161: Cleaner refactoring that removes LangChain complexity.

The direct instantiation of QueryDataSources, combined with the synchronous asyncio.run wrapper (line 164), simplifies the RAG query flow by removing the LangChain agent/executor intermediary. This aligns well with the PR objectives.

.github/workflows/start.staging.yml (1)

9-9: LGTM! Consistent workflow template update.

This change mirrors the production workflow update to ci2.yml@main, ensuring consistency across staging and production environments.

.github/workflows/production.yml (1)

12-12: Confirm reusable workflow compatibility
ci2.yml in TogetherCrew/operations/.github/workflows exists and declares on: workflow_call, so the updated workflow reference is valid and compatible.

Comment on lines +157 to 174
+        query_data_sources = QueryDataSources(
+            community_id=self.community_id,
+            enable_answer_skipping=self.enable_answer_skipping,
+            workflow_id=self.workflow_id,
+        )
-        agent = create_openai_functions_agent(llm, tools, prompt)

-        # Run the agent
-        agent_executor = AgentExecutor(
-            agent=agent,
-            tools=tools,
-            verbose=True,
-            return_intermediate_steps=False,
-            max_iterations=3,
-        )
+        try:
+            answer = asyncio.run(query_data_sources.query(self.state.user_query))
+            if answer is None:
+                answer = "NONE"
+        except Exception as e:
+            logging.error(f"RAG query execution failed: {e}")
+            answer = "NONE"

-        result = agent_executor.invoke({"input": self.state.user_query})
-        self.state.last_answer = result["output"]
+        self.state.last_answer = answer
         self.state.retry_count += 1

         return "stop"
⚠️ Potential issue | 🟠 Major

Type inconsistency: last_answer expects CrewOutput | None but receives str.

Line 171 assigns a string value (answer) to self.state.last_answer, but the field is typed as CrewOutput | None (line 19). This mismatch will cause type checking failures and potential runtime issues downstream where CrewOutput is expected.

Apply this diff to fix the type annotation:

 class AgenticFlowState(BaseModel):
     user_query: str = ""
     retry_count: int = 0
-    last_answer: CrewOutput | None = None
+    last_answer: CrewOutput | str | None = None
     state: str = "continue"
     chat_history: str | None = None

Alternatively, if last_answer should remain CrewOutput | None, wrap the string response in a CrewOutput object before assignment.
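If the annotation is widened to CrewOutput | str | None, downstream code has to accept both shapes. A minimal sketch, assuming consumers only need the answer text: FakeCrewOutput is a hypothetical stand-in that models only a raw text attribute (crewAI's CrewOutput exposes its raw output as raw, but check the installed version), and answer_text is an illustrative helper, not part of the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeCrewOutput:
    """Hypothetical stand-in for crewai's CrewOutput (raw text only)."""
    raw: str


def answer_text(last_answer: FakeCrewOutput | str | None) -> str:
    """Normalize last_answer to text, using "NONE" as the empty fallback."""
    if last_answer is None:
        return "NONE"
    if isinstance(last_answer, str):
        return last_answer  # new path: RAG answer stored as a plain string
    return last_answer.raw  # old path: crew result object
```

Centralizing the check in one helper keeps the isinstance branching out of every consumer of last_answer.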

🤖 Prompt for AI Agents
In tasks/hivemind/agent.py around lines 157 to 174, last_answer is being set to
a plain string but its type is declared as CrewOutput | None (line 19); update
the code so the assigned value matches the declared type by either (A) wrapping
the string into a CrewOutput instance before assigning to self.state.last_answer
(create a CrewOutput with the appropriate fields populated from the RAG response
and use that), or (B) if the design intent is to allow plain strings, change the
state type annotation for last_answer to str | CrewOutput | None (and update any
downstream usages to handle the string case). Ensure you choose one approach and
make corresponding downstream type/usage updates so type checking passes.

@amindadgar amindadgar merged commit 5262aa3 into main Oct 1, 2025
3 checks passed

2 participants