SF-3661 Notify user when training data file format is invalid #3625
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3625   +/-   ##
=======================================
  Coverage   82.80%   82.81%
=======================================
  Files         610      610
  Lines       37446    37462   +16
  Branches     6163     6166    +3
=======================================
+ Hits        31009    31024   +15
  Misses       5487     5487
- Partials      950      951    +1
RaymondLuong3
left a comment
@RaymondLuong3 made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion.
test/SIL.XForge.Scripture.Tests/Services/TrainingDataServiceTests.cs line 603 at r1 (raw file):
text = text.TrimEnd(); // Remove trailing new lines
text = text.Replace("\r\n", " "); // Handle newlines in both linux and windows environments
text = text.Replace("\n", " ");
When I used Environment.NewLine, it worked locally and matched the local newline convention, but when the test ran on GHA, the newlines in the Excel file came back in the Windows convention. I chose to do a plain replace instead, so the test is more robust.
Code quote:
text = text.Replace("\r\n", " "); // Handle newlines in both linux and windows environments
text = text.Replace("\n", " ");
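A minimal sketch of why the two-step replace is independent of the host platform. `FlattenNewLines` is a hypothetical helper name, not something from the PR; it only wraps the three quoted lines so they can be run against both newline conventions.

```csharp
using System;

// Hypothetical helper (not part of the PR): flattens line breaks to spaces
// without consulting Environment.NewLine, so the result does not depend on the OS.
string FlattenNewLines(string text)
{
    text = text.TrimEnd(); // Remove trailing whitespace, including new lines
    text = text.Replace("\r\n", " "); // Windows convention first, so "\r\n" becomes a single space
    text = text.Replace("\n", " "); // Then any remaining Linux-style line breaks
    return text;
}

// The same content with either convention flattens to the same string
Console.WriteLine(FlattenNewLines("a\r\nb\r\n") == FlattenNewLines("a\nb\n")); // True
```

The order matters: replacing "\n" first would leave a stray "\r" behind for Windows-style input.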
pmachapman
left a comment
@pmachapman reviewed 3 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @RaymondLuong3).
test/SIL.XForge.Scripture.Tests/Services/TrainingDataServiceTests.cs line 603 at r1 (raw file):
Previously, RaymondLuong3 (Raymond Luong) wrote…
When I used Environment.NewLine, it worked locally and matched the local newline convention, but when the test ran on GHA, the newlines in the Excel file came back in the Windows convention. I chose to do a plain replace instead, so the test is more robust.
Good thinking.
src/SIL.XForge.Scripture/Services/TrainingDataService.cs line 331 at r1 (raw file):
}
if (
    csvReader.ColumnCount != NumTrainingDataColumns
Given the exception message below, this should be csvReader.ColumnCount < NumTrainingDataColumns, as it is OK to upload a CSV file with 3 or more columns (for example, the third column might be a comment on the sentence pair, or a last updated date, or some other data), where columns 3 and higher are just ignored (this is the behavior of the Convert Excel logic).
Code quote:
csvReader.ColumnCount != NumTrainingDataColumns
9106789 to 79c8f7a
RaymondLuong3
left a comment
@RaymondLuong3 made 1 comment.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @pmachapman).
src/SIL.XForge.Scripture/Services/TrainingDataService.cs line 331 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Given the exception message below, this should be
csvReader.ColumnCount < NumTrainingDataColumns, as it is OK to upload a CSV file with 3 or more columns (for example, the third column might be a comment on the sentence pair, or a last updated date, or some other data), where columns 3 and higher are just ignored (this is the behavior of the Convert Excel logic).
Good thinking. I have updated the logic and tests to support 3 or more columns in csv/tsv files.
pmachapman
left a comment
@pmachapman reviewed 2 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @RaymondLuong3).
79c8f7a to 1a537b3
This PR detects whether a training data file contains invalid rows. Each row must have at least two columns of data. When a row contains fewer than two columns, the file is rejected and the user is notified to check their data. Blank rows are skipped.
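The row rules described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `ReadRows`, the tuple shape, the exception type, and the message text are all hypothetical, and rows are assumed to arrive already split into cells. Only the `NumTrainingDataColumns` name comes from the review thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Name taken from the review thread; the value 2 is assumed from the PR description
const int NumTrainingDataColumns = 2;

// Hypothetical helper: keeps the first two columns of each usable row
List<(string Source, string Target)> ReadRows(IEnumerable<string[]> rows)
{
    var pairs = new List<(string Source, string Target)>();
    foreach (string[] row in rows)
    {
        // A blank row (no cells, or only whitespace cells) is skipped
        if (row.All(string.IsNullOrWhiteSpace))
            continue;

        // Fewer than two columns rejects the whole file; extra columns are ignored
        if (row.Length < NumTrainingDataColumns)
            throw new FormatException("Training data rows must have at least two columns.");

        pairs.Add((row[0], row[1]));
    }
    return pairs;
}

var rows = new[]
{
    new[] { "source", "target", "comment" }, // third column is ignored
    new string[0], // blank row is skipped
    new[] { "hello", "hola" },
};
Console.WriteLine(ReadRows(rows).Count); // 2
```

Using `<` rather than `!=` for the column check is what lets files with a third comment or date column through, as discussed in the review.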
