Collaborator

@RaymondLuong3 RaymondLuong3 commented Jan 6, 2026

This PR detects whether a training data file contains invalid rows. Rows must have two columns of data. When a row contains fewer than two columns of data, the file is rejected and the user is notified to check their data. Blank rows are skipped.
[Image: Spreadsheet Warning]
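
For readers following along, the rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in TrainingDataService.cs: the class TrainingRowValidator, its Validate method, the RequiredColumns constant, and the exception type and message are all hypothetical, and rows are assumed to be already parsed into string arrays. The sketch only counts cells; it does not check whether individual cells are empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the PR's actual implementation.
public static class TrainingRowValidator
{
    // Assumed to play the role of the NumTrainingDataColumns constant in the PR.
    public const int RequiredColumns = 2;

    // Returns the non-blank rows, or throws if any non-blank row has
    // fewer than RequiredColumns cells.
    public static List<string[]> Validate(IEnumerable<string[]> rows)
    {
        var valid = new List<string[]>();
        foreach (string[] row in rows)
        {
            // A row counts as blank when it has no cells or every cell is empty/whitespace.
            if (row.Length == 0 || row.All(string.IsNullOrWhiteSpace))
                continue; // blank rows are skipped, not rejected

            if (row.Length < RequiredColumns)
                throw new FormatException(
                    "The training data file contains a row with fewer than two columns.");

            valid.Add(row);
        }
        return valid;
    }
}
```

With this sketch, a file containing the rows ("a", "b"), a blank row, and ("c", "d") yields two valid rows, while a file containing the single-cell row ("only") is rejected as a whole.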



@RaymondLuong3 RaymondLuong3 added the "will require testing" label (PR should not be merged until testers confirm testing is complete) Jan 6, 2026

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.81%. Comparing base (aac7d83) to head (1a537b3).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...L.XForge.Scripture/Services/TrainingDataService.cs | 94.73% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3625   +/-   ##
=======================================
  Coverage   82.80%   82.81%           
=======================================
  Files         610      610           
  Lines       37446    37462   +16     
  Branches     6163     6166    +3     
=======================================
+ Hits        31009    31024   +15     
  Misses       5487     5487           
- Partials      950      951    +1     


Collaborator Author

@RaymondLuong3 RaymondLuong3 left a comment


@RaymondLuong3 made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion.


test/SIL.XForge.Scripture.Tests/Services/TrainingDataServiceTests.cs line 603 at r1 (raw file):

        text = text.TrimEnd(); // Remove trailing new lines
        text = text.Replace("\r\n", " "); // Handle newlines in both linux and windows environments
        text = text.Replace("\n", " ");

When I used Environment.NewLine, it worked locally because it matched the local newline convention, but when the test ran on GHA, the newlines in the Excel file came back in the Windows convention (\r\n). I chose to replace both newline forms explicitly so the test is more robust.

Code quote:

        text = text.Replace("\r\n", " "); // Handle newlines in both linux and windows environments
        text = text.Replace("\n", " ");
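
As a small illustration of why the two explicit replaces are platform-independent: Environment.NewLine is "\r\n" on Windows and "\n" on Unix-like systems, so splitting or replacing on it only works when the data happens to use the host's convention. The sketch below wraps the quoted lines in a hypothetical helper (NewlineFlattener.Flatten is an illustrative name, not something in the test file) so the behavior can be checked on both kinds of input.

```csharp
using System;

// Hypothetical wrapper around the three lines quoted above, for illustration only.
public static class NewlineFlattener
{
    public static string Flatten(string text)
    {
        text = text.TrimEnd();            // Remove trailing newlines (and any other trailing whitespace)
        text = text.Replace("\r\n", " "); // Windows (CRLF) first, so no stray "\r" is left behind
        return text.Replace("\n", " ");   // then Unix (LF)
    }
}
```

Both Flatten("one\r\ntwo\r\n") and Flatten("one\ntwo\n") produce the same single-line string, regardless of the operating system the test runs on. The order of the two replaces matters: replacing "\n" first would leave a lone "\r" in text that uses the Windows convention.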

@pmachapman pmachapman self-assigned this Jan 11, 2026
Collaborator

@pmachapman pmachapman left a comment


@pmachapman reviewed 3 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @RaymondLuong3).


test/SIL.XForge.Scripture.Tests/Services/TrainingDataServiceTests.cs line 603 at r1 (raw file):

Previously, RaymondLuong3 (Raymond Luong) wrote…

When I used Environment.NewLine, it worked locally because it matched the local newline convention, but when the test ran on GHA, the newlines in the Excel file came back in the Windows convention (\r\n). I chose to replace both newline forms explicitly so the test is more robust.

Good thinking.


src/SIL.XForge.Scripture/Services/TrainingDataService.cs line 331 at r1 (raw file):

                            }
                            if (
                                csvReader.ColumnCount != NumTrainingDataColumns

Given the exception message below, this should be csvReader.ColumnCount < NumTrainingDataColumns. It is OK to upload a CSV file with 3 or more columns (for example, the third column might be a comment on the sentence pair, a last updated date, or some other data); columns 3 and higher are just ignored, which matches the behavior of the Convert Excel logic.

Code quote:

csvReader.ColumnCount != NumTrainingDataColumns
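
To make the difference between the two comparisons concrete, here is a minimal sketch. Everything in it is assumed for illustration: the ColumnCheck class and its method names are hypothetical, the constant is assumed to be 2 (matching the "two columns" rule in the PR description), and the real check sits inside the CSV reading loop rather than in a standalone helper.

```csharp
using System;

// Hypothetical helper contrasting the two predicates; not code from the PR.
public static class ColumnCheck
{
    // Assumed value, based on the "two columns" rule in the PR description.
    public const int NumTrainingDataColumns = 2;

    // The check as quoted at r1: rejects any row that does not have exactly two columns.
    public static bool RejectedByExactCheck(int columnCount) =>
        columnCount != NumTrainingDataColumns;

    // The suggested check: rejects only rows with too few columns.
    public static bool RejectedByMinimumCheck(int columnCount) =>
        columnCount < NumTrainingDataColumns;

    // With the minimum check, extra columns are simply never read.
    public static (string Source, string Target) TakePair(string[] row) =>
        (row[0], row[1]);
}
```

A one-column row is rejected by both checks and a two-column row by neither. A three-column row is rejected only by the exact check; under the minimum check it is accepted and TakePair reads just its first two cells.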

@RaymondLuong3 RaymondLuong3 force-pushed the task/sf-3661-training-data branch from 9106789 to 79c8f7a Compare January 12, 2026 23:56
Collaborator Author

@RaymondLuong3 RaymondLuong3 left a comment


@RaymondLuong3 made 1 comment.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @pmachapman).


src/SIL.XForge.Scripture/Services/TrainingDataService.cs line 331 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Given the exception message below, this should be csvReader.ColumnCount < NumTrainingDataColumns. It is OK to upload a CSV file with 3 or more columns (for example, the third column might be a comment on the sentence pair, a last updated date, or some other data); columns 3 and higher are just ignored, which matches the behavior of the Convert Excel logic.

Good thinking. I have updated the logic and tests to support 3 or more columns in csv/tsv files.

Collaborator

@pmachapman pmachapman left a comment


:lgtm:

@pmachapman reviewed 2 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @RaymondLuong3).

@pmachapman pmachapman added the "ready to test" label and removed the "will require testing" label (PR should not be merged until testers confirm testing is complete) Jan 13, 2026
@Nateowami Nateowami added the "testing complete" label (Testing of PR is complete and should no longer hold up merging of the PR) and removed the "ready to test" label Jan 14, 2026
@Nateowami Nateowami force-pushed the task/sf-3661-training-data branch from 79c8f7a to 1a537b3 Compare January 14, 2026 15:53
@Nateowami Nateowami enabled auto-merge (squash) January 14, 2026 15:53
@Nateowami Nateowami merged commit 562282d into master Jan 14, 2026
21 checks passed
@Nateowami Nateowami deleted the task/sf-3661-training-data branch January 14, 2026 16:00

Labels

testing complete Testing of PR is complete and should no longer hold up merging of the PR


4 participants