fix(cache): validate cache_index schema collisions in worker dat…#3216
Hitanshi7556 wants to merge 2 commits into kubeflow:master
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification!
…asource
Signed-off-by: Hitanshi Goklani <hitanshigoklani33@gmail.com>
94a4c9d to f3fa59d
Pull request overview
This PR adds schema validation to prevent column name collisions when the data cache worker appends its cache_index column to source table schemas. The validation rejects source tables that already define a cache_index column, addressing the TODO comment that was previously in the code.
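To see why a collision is a problem, here is a minimal sketch that models a schema as a plain list of column names (a hypothetical simplification, not the project's schema type) and assumes columns are resolved by a first-match lookup on name. `index_of` and `count_named` are illustrative helpers, not functions from the PR.

```rust
/// Hypothetical first-match lookup: returns the position of the first
/// column whose name equals `name`, or `None` if there is none.
fn index_of(columns: &[&str], name: &str) -> Option<usize> {
    columns.iter().position(|c| *c == name)
}

/// Hypothetical helper: counts how many columns share `name`.
/// A count above 1 means a lookup by name cannot tell them apart.
fn count_named(columns: &[&str], name: &str) -> usize {
    columns.iter().filter(|c| **c == name).count()
}
```

With a source schema `["id", "cache_index"]` and the worker's column appended, the list becomes `["id", "cache_index", "cache_index"]`: the lookup resolves to position 1 (the source's own column), so the worker's column at position 2 is shadowed and unreachable by name.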
Changes:
- Added validation in `WorkerDataSource::new` to check for a `cache_index` column collision before appending it to the schema
- Removed the TODO comment about validating name collisions
- Added two unit tests to verify the schema collision detection logic
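The validation described in the list above amounts to checking field names before appending. A minimal sketch follows, assuming a simplified name-only `Field` type; `Field`, `append_cache_index`, and `CACHE_INDEX_COLUMN` are hypothetical stand-ins (the real code presumably operates on an Arrow schema inside the worker data source constructor), and the error message wording is illustrative.

```rust
/// Name of the column the data-cache worker appends (taken from the PR text).
const CACHE_INDEX_COLUMN: &str = "cache_index";

/// Hypothetical, simplified stand-in for a schema field; only the name matters here.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

impl Field {
    fn new(name: &str) -> Self {
        Field { name: name.to_string() }
    }
}

/// Hypothetical helper: rejects a source schema that already defines
/// `cache_index`; otherwise returns a copy with `cache_index` appended.
fn append_cache_index(source: &[Field]) -> Result<Vec<Field>, String> {
    // Validate first, so a colliding schema is never extended.
    if source.iter().any(|f| f.name == CACHE_INDEX_COLUMN) {
        return Err(format!(
            "source table already defines reserved column '{}'",
            CACHE_INDEX_COLUMN
        ));
    }
    let mut fields = source.to_vec();
    fields.push(Field::new(CACHE_INDEX_COLUMN));
    Ok(fields)
}
```

The two cases mirror what the unit tests are described as covering: a schema without `cache_index` gains it as the last field, and one that already has it is rejected. The comparison here is exact and case-sensitive; whether the real check should also catch variants such as `Cache_Index` depends on how the downstream engine resolves names, which the PR does not state.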
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Hitanshi7556 <hitanshigoklani33@gmail.com>
What this PR does
Adds schema validation in `WorkerDataSource::try_new` to reject source tables that already define `cache_index`.

Why
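`cache_index` is appended by the data-cache worker logic. If the source table already has a column with that name, the resulting schema contains two fields named `cache_index`, and lookups by name become ambiguous.

Scope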
Testing
`make test-rust`

Fixes #3174