-
Notifications
You must be signed in to change notification settings - Fork 13
Improve robustness of CSV dataset loader #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve robustness of CSV dataset loader #64
Conversation
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Learn moreAll Green is an AI agent that automatically: ✅ Addresses code review comments ✅ Fixes failing CI checks ✅ Resolves merge conflicts |
|
Warning Rate limit exceeded@Jagriti-student has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 4 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughEnhanced CSV dataset loader robustness by introducing a helper function for safe field parsing, configurable delimiters for tools and context, and error handling with row context for malformed rows. All changes localized to dataset loading logic. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/agentunit/datasets/base.py:
- Around line 82-89: _in _parse_list_field_, the function currently calls
value.split(delimiter) without validating delimiter which raises ValueError when
delimiter is an empty string; add an early check for an empty or non-string
delimiter and either raise a clear ValueError (e.g., "delimiter must be a
non-empty string") or return None, then proceed with splitting only when
delimiter is valid so runtime errors are prevented.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/datasets/base.py (2)
src/agentunit/cli/__init__.py (1)
get(80-89)src/agentunit/core/exceptions.py (1)
AgentUnitError(8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build Package
🔇 Additional comments (2)
src/agentunit/datasets/base.py (2)
92-96: Good backward-compatible signature update.The configurable delimiters with sensible defaults maintain backward compatibility while addressing the issue requirements.
107-128: Robust error handling with good context.The try/except properly wraps row processing, chains the original exception, and provides helpful row context for debugging malformed CSV data. The required
queryfield access (line 110) correctly enforces its presence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/agentunit/datasets/base.py (1)
82-93: Consider adding unit tests for the new parsing logic.The Codecov report shows 0% patch coverage. While the implementation looks correct, tests would help ensure:
_parse_list_fieldhandles edge cases (empty strings, whitespace-only values, missing delimiters)- Error handling for malformed CSV rows works as expected
- Different delimiter configurations produce correct results
This is particularly valuable since the PR aims to improve robustness.
Also applies to: 112-133
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/datasets/base.py (2)
src/agentunit/cli/__init__.py (1)
get(80-89)src/agentunit/core/exceptions.py (1)
AgentUnitError(8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.10)
🔇 Additional comments (3)
src/agentunit/datasets/base.py (3)
82-93: LGTM! Well-designed helper for safe field parsing.The function handles edge cases appropriately:
- Empty/None values return
None- Empty delimiter defaults to treating the whole value as a single item
- Multi-character delimiters like
"||"work correctly withstr.split()- Empty strings after splitting are filtered out
The
isinstance(value, str)check at line 84 is defensive sincecsv.DictReaderalways yields strings, but it doesn't hurt and adds robustness if the function is reused elsewhere.
97-101: LGTM! Backward-compatible configurable delimiters.The default values
";"for tools and"||"for context preserve the original behavior while allowing callers to customize. This addresses the issue #60 requirement for configurable delimiters.
112-133: Good error handling with row context.Wrapping in try/except and re-raising with row index provides useful debugging information. The
row["query"]access (line 115) correctly enforces thatqueryis a required field.One minor consideration: catching bare
Exceptionat line 132 is broad but acceptable here since:
- In Python 3,
KeyboardInterrupt/SystemExitdon't inherit fromException- The most likely exceptions (
KeyErrorfor missing query) are appropriately wrapped
aviralgarg05
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!

Description
Improves CSV parsing in
load_local_csvby removing hardcodedstring splitting and introducing configurable delimiters.
Changes
Fixes #60
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.