feat: enhance Hivemind agent execution with max iterations and improve error handling! #48
Conversation
Walkthrough

Caps AgentExecutor iterations at 3 in do_rag_query. Adds logging and exception handling around Temporal workflow execution in QueryDataSources.query, normalizing failure-shaped results to None. No other logic changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as do_rag_query
    participant Agent as AgentExecutor
    Caller->>Agent: initialize(max_iterations=3)
    loop up to 3 iterations
        Agent->>Agent: plan/act/observe
    end
    Agent-->>Caller: final result or early stop
```
```mermaid
sequenceDiagram
    participant Caller as QueryDataSources.query
    participant Temporal as Temporal Client
    Caller->>Temporal: execute_workflow(payload, id, task_queue, ...)
    alt success
        Temporal-->>Caller: result
        alt result indicates failure-shape
            Caller->>Caller: log error
            Caller-->>Caller: return None
        else valid data
            Caller-->>Caller: return result
        end
    else WorkflowFailureError/Exception
        Temporal-->>Caller: throw
        Caller->>Caller: log error
        Caller-->>Caller: return None
    end
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 4
🔭 Outside diff range comments (1)
tasks/hivemind/agent.py (1)
186-189: Guard agent invocation with try/except and log context

Given the tool invokes Temporal and networked services, agent_executor.invoke can raise. Mirror the improved error handling pattern and avoid flow crashes by catching exceptions here, logging, and setting a safe last_answer fallback.

Example change (outside this hunk):

```python
try:
    result = agent_executor.invoke({"input": self.state.user_query})
    self.state.last_answer = result["output"]
except Exception as e:
    logging.error(
        f"Agent execution failed for workflow_id={self.workflow_id}: {e}",
        exc_info=True,
    )
    self.state.last_answer = "Sorry, I couldn't complete your request."
finally:
    self.state.retry_count += 1
```
🧹 Nitpick comments (3)
tasks/hivemind/agent.py (1)
179-184: Make max_iterations configurable

Hard-coding 3 may be too strict/lenient depending on the task. Consider exposing it via a constructor arg to AgenticHivemindFlow and pass it through here.

Apply within these lines:

```diff
-    max_iterations=3,
+    max_iterations=self.max_agent_iterations,
```

Outside this hunk, add a parameter and store it:

```python
# In __init__(..., max_agent_iterations: int = 3, ...)
self.max_agent_iterations = max_agent_iterations
```

tasks/hivemind/query_data_sources.py (2)
55-61: Consider handling specific Temporal exceptions and adding timeouts

Catching WorkflowFailureError and a generic Exception is fine. Consider also handling known Temporal exceptions (e.g., already started) separately for clearer operator signals, and consider adding a suitable workflow timeout to avoid hanging calls.
Would you like me to propose concrete exception types/timeouts based on the tc_temporal_backend and temporalio versions you’re on?
3-3: Use a module-level logger instead of the root logger

To integrate better with app-wide logging config, prefer a module-specific logger.

Outside this hunk:

```python
logger = logging.getLogger(__name__)
# then replace logging.error/info(...) with logger.error/info(...)
```
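As a quick runnable sketch of why this works (the dotted path below is hypothetical, standing in for what `__name__` resolves to in the real module): `logging.getLogger` returns the same logger object for the same name, so handlers and levels configured once in app-wide setup apply everywhere the module logs.

```python
import logging

# Hypothetical dotted path standing in for __name__ in the real module.
logger = logging.getLogger("tasks.hivemind.query_data_sources")

# getLogger is idempotent: asking for the same name yields the same object,
# so configuration attached elsewhere (e.g. on a parent logger) takes effect here.
same = logging.getLogger("tasks.hivemind.query_data_sources")
```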
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tasks/hivemind/agent.py(1 hunks)tasks/hivemind/query_data_sources.py(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-02-26T15:41:54.631Z
Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#2
File: tasks/hivemind/query_data_sources.py:27-31
Timestamp: 2025-02-26T15:41:54.631Z
Learning: In the agents-workflow project, "HivemindWorkflow" and "AgenticHivemindTemporalWorkflow" are two distinct workflows. The QueryDataSources class in tasks/hivemind/query_data_sources.py is designed to interact with "HivemindWorkflow", which is separate from "AgenticHivemindTemporalWorkflow" defined in tasks/agent.py.
Applied to files:
tasks/hivemind/query_data_sources.py
📚 Learning: 2025-06-17T13:03:41.479Z
Learnt from: amindadgar
PR: TogetherCrew/agents-workflow#37
File: tasks/hivemind/query_data_sources.py:40-43
Timestamp: 2025-06-17T13:03:41.479Z
Learning: The HivemindQueryPayload schema in tc-temporal-backend package version 1.1.3 includes support for the workflow_id field, so assignments like payload.workflow_id = self.workflow_id are valid and won't cause ValidationError.
Applied to files:
tasks/hivemind/query_data_sources.py
🔇 Additional comments (1)
tasks/hivemind/agent.py (1)
179-184: Capping agent iterations at 3 — LGTM

Setting max_iterations=3 is a sensible safeguard against runaway agent loops. No issues spotted with the current usage.
```python
from tc_temporal_backend.client import TemporalClient
from tc_temporal_backend.schema.hivemind import HivemindQueryPayload
from temporalio.common import RetryPolicy
from temporalio.client import WorkflowFailureError
```
Verify import location for WorkflowFailureError

In current temporalio Python SDK versions, WorkflowFailureError is exported from temporalio.client (it is raised when awaiting a workflow handle's result), so this import should work as written; temporalio.exceptions exposes the FailureError hierarchy (ApplicationError, ActivityError, etc.) instead. If you are pinned to an SDK version where this import fails, confirm the symbol's location in your installed temporalio package before changing it.
```python
try:
    result = await client.execute_workflow(
        "HivemindWorkflow",
        payload,
        id=f"hivemind-query-{self.community_id}-{self.workflow_id}",
        task_queue=hivemind_queue,
        retry_policy=RetryPolicy(maximum_attempts=3),
    )
except WorkflowFailureError as e:
```
🛠️ Refactor suggestion
Ensure workflow ID uniqueness when workflow_id is absent

Using id=f"hivemind-query-{community_id}-{workflow_id}" will collide across requests when workflow_id is None, leading to WorkflowAlreadyStarted-like errors. Fall back to a random suffix when no workflow_id is provided.

```diff
-    id=f"hivemind-query-{self.community_id}-{self.workflow_id}",
+    id=f"hivemind-query-{self.community_id}-{self.workflow_id or os.urandom(8).hex()}",
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tasks/hivemind/query_data_sources.py around lines 47 to 55, the workflow id
string uses self.workflow_id directly which can be None and cause collisions;
update the code to compute a unique workflow_id_part (use self.workflow_id if
truthy otherwise generate a short random suffix such as a UUID4 or
timestamp+random) and then use
id=f"hivemind-query-{self.community_id}-{workflow_id_part}" when calling
client.execute_workflow so each started workflow without an explicit workflow_id
gets a unique id to avoid WorkflowAlreadyStarted errors.
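The fallback described above can be sketched as a small helper (build_workflow_id is a hypothetical name; the suggested diff inlines the same expression at the call site):

```python
import os
from typing import Optional


def build_workflow_id(community_id: str, workflow_id: Optional[str] = None) -> str:
    # Fall back to a random hex suffix when no explicit workflow_id is given,
    # so concurrent requests cannot collide on the same Temporal workflow id.
    suffix = workflow_id or os.urandom(8).hex()
    return f"hivemind-query-{community_id}-{suffix}"
```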
```python
# Normalize Temporal failure-shaped responses that may be returned as data
if isinstance(result, dict) and (
    "workflowExecutionFailedEventAttributes" in result or "failure" in result
):
    logging.error(f"WorkflowFailureError: {result} for workflow {self.workflow_id}", exc_info=True)
    return None
if isinstance(result, str) and "workflowExecutionFailedEventAttributes" in result:
    logging.error(f"WorkflowFailureError: {result} for workflow {self.workflow_id}", exc_info=True)
    return None
```
🛠️ Refactor suggestion
Don’t log exc_info for non-exceptions

These branches handle failure-shaped payloads, not exceptions. exc_info=True prints a spurious “NoneType: None” stack trace. Remove exc_info here.

```diff
-    logging.error(f"WorkflowFailureError: {result} for workflow {self.workflow_id}", exc_info=True)
+    logging.error(f"WorkflowFailureError-shaped payload for workflow {self.workflow_id}: {result}")
```

(the same change applies to both branches)

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# Normalize Temporal failure-shaped responses that may be returned as data
if isinstance(result, dict) and (
    "workflowExecutionFailedEventAttributes" in result or "failure" in result
):
    logging.error(f"WorkflowFailureError-shaped payload for workflow {self.workflow_id}: {result}")
    return None
if isinstance(result, str) and "workflowExecutionFailedEventAttributes" in result:
    logging.error(f"WorkflowFailureError-shaped payload for workflow {self.workflow_id}: {result}")
    return None
```
🤖 Prompt for AI Agents
In tasks/hivemind/query_data_sources.py around lines 62 to 70, the logging.error
calls for failure-shaped payloads include exc_info=True which prints a spurious
stack trace for non-exceptions; remove the exc_info=True argument from both
logging.error calls so the message is logged without exception/stack trace
information.
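The failure-shape checks themselves are easy to factor into a standalone predicate for unit testing (looks_like_workflow_failure is a hypothetical helper name mirroring the conditions in the diff above):

```python
def looks_like_workflow_failure(result) -> bool:
    # Mirrors the review's checks: dicts carrying Temporal failure keys,
    # or strings containing the failed-event attribute marker.
    if isinstance(result, dict):
        return (
            "workflowExecutionFailedEventAttributes" in result
            or "failure" in result
        )
    if isinstance(result, str):
        return "workflowExecutionFailedEventAttributes" in result
    return False
```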
```python
# Normalize Temporal failure-shaped responses that may be returned as data
if isinstance(result, dict) and (
    "workflowExecutionFailedEventAttributes" in result or "failure" in result
):
    logging.error(f"WorkflowFailureError: {result} for workflow {self.workflow_id}", exc_info=True)
    return None
if isinstance(result, str) and "workflowExecutionFailedEventAttributes" in result:
    logging.error(f"WorkflowFailureError: {result} for workflow {self.workflow_id}", exc_info=True)
    return None
```
💡 Verification agent
🧩 Analysis chain
Verify return type consistency (may not always be str)

You normalize failure-shaped dict/str responses, but a successful result could also be a dict depending on the workflow. The method signature declares str | None, and the tool wrapper returns response as-is. If success can be non-str, adjust types or coerce to str before returning.

Possible adjustments:

- Change signature to return Any or Union[str, dict, None].
- Or, stringify non-str successes:

```python
if not isinstance(result, str):
    result = json.dumps(result, ensure_ascii=False)
return result
```

If you want, I can scan this repo for consumers that assume a string and propose end-to-end changes.
I’ll inspect the function signature to confirm what types it may return. Running a quick search and file excerpt now.
🏁 Script executed:

```bash
#!/bin/bash
# Locate the function definition and show its signature
rg -n "def " tasks/hivemind/query_data_sources.py
# Show the first 200 lines to capture imports and the method signature
sed -n '1,200p' tasks/hivemind/query_data_sources.py
```

Length of output: 4699
Ensure query always returns a string or update its signature

The async def query(self, …) -> str | None currently returns whatever the workflow yields (often a dict), which violates the declared return type and will break the get_rag_answer tool (it expects a str). You should either serialize non-string results or widen the type hint.

Locations to update in tasks/hivemind/query_data_sources.py:

- After the failure checks (around line 70), coerce dicts to JSON:

```python
# Normalize Temporal failure-shaped responses …
if isinstance(result, dict) and (
    "workflowExecutionFailedEventAttributes" in result or "failure" in result
):
    logging.error(…)
    return None
if isinstance(result, str) and "workflowExecutionFailedEventAttributes" in result:
    logging.error(…)
    return None

# Ensure successful payloads are returned as strings
if not isinstance(result, str):
    import json
    result = json.dumps(result, ensure_ascii=False)
return result
```

- Or, if you intend to surface structured data, change the signature to:

```python
async def query(self, query: str) -> dict | str | None:
```

and update the get_rag_answer wrapper accordingly.
🤖 Prompt for AI Agents
In tasks/hivemind/query_data_sources.py around lines 62–71, the function
annotated as async def query(...) -> str | None sometimes returns dicts (or
other non-string types), violating the signature and breaking callers; after the
existing failure checks, coerce non-string results to strings (e.g., import json
and if isinstance(result, dict) set result = json.dumps(result), and for other
non-str types convert with str(result)) so the function always returns a str or
None, or alternatively change the signature to async def query(self, query: str)
-> dict | str | None and update get_rag_answer to accept structured results—pick
one approach and apply the corresponding code and type-hint changes.
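The serialization option reduces to a small pure function, sketched here in isolation (coerce_to_str is a hypothetical name; the real fix would inline this at the end of query):

```python
import json


def coerce_to_str(result):
    # Normalize successful workflow payloads to str, passing None through
    # unchanged so the str | None contract holds for callers.
    if result is None or isinstance(result, str):
        return result
    return json.dumps(result, ensure_ascii=False)
```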
Summary by CodeRabbit
Bug Fixes
Chores