@Jagriti-student Jagriti-student commented Jan 13, 2026

Description

Improves CSV parsing in load_local_csv by removing hardcoded
string splitting and introducing configurable delimiters.

Changes

  • Added safe parsing helper for list fields
  • Made tools and context delimiters configurable
  • Improved robustness against malformed CSV rows
  • Preserved backward compatibility

Fixes #60

Summary by CodeRabbit

  • New Features

    • CSV loading supports configurable delimiters for tools and context fields and safer parsing that preserves empty/missing list values.
  • Bug Fixes

    • Improved robustness when processing CSV rows with enhanced error reporting for malformed rows and consistent metadata extraction that excludes standard fields.


Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
coderabbitai bot commented Jan 13, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between be0d741 and de69703.

📒 Files selected for processing (1)
  • src/agentunit/datasets/base.py

Walkthrough

Enhanced CSV dataset loader robustness by introducing a helper function for safe field parsing, configurable delimiters for tools and context, and error handling with row context for malformed rows. All changes localized to dataset loading logic.

Changes

Cohort / File(s) Summary
CSV Dataset Loader Enhancements
src/agentunit/datasets/base.py
Added `_parse_list_field(value: str

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • aviralgarg05
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check | ❓ Inconclusive | The PR description covers the key changes but lacks details on testing, code quality checks, and documentation updates required by the template. | Add sections on testing (test commands run, coverage), code quality checks (style, linting), and documentation updates to fully satisfy the template requirements.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly describes the main change: improving robustness of the CSV dataset loader through better parsing and delimiter handling.
Linked Issues check | ✅ Passed | All objectives from issue #60 are addressed: robust parsing helper, configurable delimiters for tools and context, and improved error handling for malformed rows.
Out of Scope Changes check | ✅ Passed | All changes are directly related to the stated objective of improving CSV loader robustness with configurable delimiters and error handling.


codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/agentunit/datasets/base.py | 0.00% | 13 Missing ⚠️


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/agentunit/datasets/base.py:
- Around lines 82-89: In _parse_list_field, the function currently calls
value.split(delimiter) without validating delimiter, which raises ValueError when
delimiter is an empty string; add an early check for an empty or non-string
delimiter and either raise a clear ValueError (e.g., "delimiter must be a
non-empty string") or return None, then proceed with splitting only when
delimiter is valid so runtime errors are prevented.
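
The failure mode described above is easy to reproduce, since str.split rejects an empty separator. The following is a minimal sketch of the suggested early check, assuming the "raise a clear ValueError" option; split_list_field is a hypothetical stand-in name and is not the code from this PR.

```python
def split_list_field(value: str, delimiter: str) -> list[str]:
    """Hypothetical stand-in for the helper under review, not the PR's code."""
    # Validate the delimiter first; str.split("") would otherwise fail
    # with the less helpful message "empty separator".
    if not isinstance(delimiter, str) or not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    return [item.strip() for item in value.split(delimiter)]


# Reproduce the unguarded failure that the review comment warns about.
try:
    "search;calculator".split("")
except ValueError as exc:
    unguarded_error = str(exc)

# The guarded version fails early with a message that names the problem.
try:
    split_list_field("search;calculator", "")
except ValueError as exc:
    guarded_error = str(exc)
```

Raising keeps a misconfigured delimiter visible to the caller, whereas returning None would silently hide it; which trade-off suits the loader is a project decision.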
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23afcf8 and c8f76e6.

📒 Files selected for processing (1)
  • src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/datasets/base.py (2)
src/agentunit/cli/__init__.py (1)
  • get (80-89)
src/agentunit/core/exceptions.py (1)
  • AgentUnitError (8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build Package
🔇 Additional comments (2)
src/agentunit/datasets/base.py (2)

92-96: Good backward-compatible signature update.

The configurable delimiters with sensible defaults maintain backward compatibility while addressing the issue requirements.


107-128: Robust error handling with good context.

The try/except properly wraps row processing, chains the original exception, and provides helpful row context for debugging malformed CSV data. The required query field access (line 110) correctly enforces its presence.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/agentunit/datasets/base.py (1)

82-93: Consider adding unit tests for the new parsing logic.

The Codecov report shows 0% patch coverage. While the implementation looks correct, tests would help ensure:

  • _parse_list_field handles edge cases (empty strings, whitespace-only values, missing delimiters)
  • Error handling for malformed CSV rows works as expected
  • Different delimiter configurations produce correct results

This is particularly valuable since the PR aims to improve robustness.

Also applies to: 112-133

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8f76e6 and be0d741.

📒 Files selected for processing (1)
  • src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/datasets/base.py (2)
src/agentunit/cli/__init__.py (1)
  • get (80-89)
src/agentunit/core/exceptions.py (1)
  • AgentUnitError (8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Test (Python 3.10)
🔇 Additional comments (3)
src/agentunit/datasets/base.py (3)

82-93: LGTM! Well-designed helper for safe field parsing.

The function handles edge cases appropriately:

  • Empty/None values return None
  • Empty delimiter defaults to treating the whole value as a single item
  • Multi-character delimiters like "||" work correctly with str.split()
  • Empty strings after splitting are filtered out

The isinstance(value, str) check at line 84 is defensive: csv.DictReader normally yields strings, but it fills missing fields in short rows with restval (None by default), so the check costs little and adds robustness if the function is reused elsewhere.
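
The behaviors listed in this comment can be sketched as follows. This is a hypothetical reconstruction based only on the review text: the name parse_list_field, the whitespace trimming, and the exact return types are assumptions, and the merged helper may differ in detail.

```python
from typing import Optional


def parse_list_field(value: Optional[str], delimiter: str) -> Optional[list[str]]:
    """Hypothetical reconstruction; the merged helper may differ in detail."""
    # Missing, non-string, or blank values mean "no list".
    if not isinstance(value, str) or not value.strip():
        return None
    # An empty delimiter cannot be passed to str.split, so keep the value whole.
    if not delimiter:
        return [value.strip()]
    # Split, trim whitespace (an assumption), and drop empty fragments.
    items = [part.strip() for part in value.split(delimiter)]
    return [item for item in items if item]


print(parse_list_field("search; calculator;", ";"))  # ['search', 'calculator']
print(parse_list_field("doc one||doc two", "||"))    # ['doc one', 'doc two']
print(parse_list_field("", ";"))                     # None
print(parse_list_field("single value", ""))          # ['single value']
```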


97-101: LGTM! Backward-compatible configurable delimiters.

The default values ";" for tools and "||" for context preserve the original behavior while allowing callers to customize. This addresses the issue #60 requirement for configurable delimiters.


112-133: Good error handling with row context.

Wrapping in try/except and re-raising with row index provides useful debugging information. The row["query"] access (line 115) correctly enforces that query is a required field.

One minor consideration: catching Exception at line 132 is broad, but acceptable here because:

  1. In Python 3, KeyboardInterrupt/SystemExit don't inherit from Exception
  2. The most likely exceptions (KeyError for missing query) are appropriately wrapped
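
The pattern reviewed here (default delimiters, a required query field, and a wrapped error that carries the row index) can be sketched as below. It is an illustration under stated assumptions, not the contents of base.py: load_rows, DatasetLoadError, the dictionary layout, and the error message are hypothetical, with DatasetLoadError standing in for the project's AgentUnitError.

```python
import csv
import io


class DatasetLoadError(Exception):
    """Hypothetical stand-in for the project's AgentUnitError."""


def load_rows(text, tools_delimiter=";", context_delimiter="||"):
    """Sketch of a row loop with configurable delimiters and row-indexed errors."""

    def split(value, delimiter):
        # Blank or missing cells become None; empty fragments are dropped.
        if not value:
            return None
        return [p.strip() for p in value.split(delimiter) if p.strip()]

    cases = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        try:
            cases.append({
                "query": row["query"],  # required: KeyError if the column is absent
                "tools": split(row.get("tools"), tools_delimiter),
                "context": split(row.get("context"), context_delimiter),
            })
        except Exception as exc:
            # Chain the original exception and report which row failed.
            raise DatasetLoadError(f"Failed to parse CSV row {index}: {row}") from exc
    return cases


# Defaults match the delimiters named in the review: ";" and "||".
rows = load_rows("query,tools,context\nWhat is 2+2?,calculator;search,doc a||doc b\n")

# Callers can override a delimiter without touching the other.
custom = load_rows("query,tools\nhi,a|b\n", tools_delimiter="|")
```

Because the wrapper uses raise ... from exc, the original KeyError for a missing query column stays available on the new exception's __cause__ attribute.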

Owner

@aviralgarg05 aviralgarg05 left a comment

LGTM!

@aviralgarg05 aviralgarg05 merged commit 48cfb0d into aviralgarg05:main Jan 14, 2026
12 checks passed
Development

Successfully merging this pull request may close these issues.

Improve Robustness of CSV Dataset Loader

3 participants