@yarikoptic

Changes

Configuration & Infrastructure

  • Added .codespellrc configuration with comprehensive skip patterns
  • Created GitHub Actions workflow to check spelling on push and PRs
  • Configured to skip training data/models, test resources, binary files, and caches
  • Added .npm/ to .gitignore

Domain-Specific Whitelist

Added legitimate terms that codespell flags as typos:

  • serie - French word for "series"
  • blockin - ML feature label
  • punctuations, ther, ist, usre, theses, ue - domain-specific terms
  • consol - abbreviation for "consolidation"
  • countr, inpu - variable names
  • currenty, currentx - lowercased coordinate variable names (currentY, currentX)

Typo Fixes

Ambiguous typos fixed manually (12 fixes with context review):

  • Documentation: prefered choise → preferred choice, nore → more, maked → marked, thei → their
  • Code: limted → limited, extact → exact (3×), positons → positions, preceeding → preceding, wither → either, skipt → skipped

Non-ambiguous typos fixed automatically (221 fixes in 80 files):
Common fixes include occurred, instantiate, immediately, preferred, original, below, recommend, and many others.

Regex Pattern Protection

Added inline codespell:ignore comments for:

  • EmailSanitizer.java: ddress patterns (intentionally match OCR typos in source text)
  • BasicStructureBuilder.java: ment pattern (matches "Acknowledgment"/"Acknowledgement")
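A minimal sketch of why the `ment` fragment has to be protected instead of "fixed". The class name, constant name, and exact comment placement are hypothetical and not taken from BasicStructureBuilder.java; only the pattern text `Acknowledge?ment?` comes from this PR. The trailing comment assumes a codespell version that supports inline `codespell:ignore` comments.

```java
import java.util.regex.Pattern;

public class AcknowledgmentPatternDemo {
    // Hypothetical constant for illustration only.
    // The optional 'e' lets one pattern cover both accepted spellings.
    // The trailing comment asks codespell to skip the flagged fragment on this line.
    static final Pattern ACK = Pattern.compile("Acknowledge?ment?"); // codespell:ignore ment

    // Returns true if the pattern occurs anywhere in the given text.
    static boolean matches(String text) {
        return ACK.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Acknowledgment"));   // true
        System.out.println(matches("Acknowledgement"));  // true
        System.out.println(matches("Acknowledgemeant")); // false
    }
}
```

Rewriting the fragment to `meant` would make the pattern look for a spelling that never occurs in a heading, so the inline comment keeps the regex intact while still letting codespell check the rest of the file.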

Testing

✅ Codespell passes with zero errors after all fixes


🤖 Generated with Claude Code

- Fix 'splitted' to 'split' in comment
- Add inline codespell:ignore for regex patterns that may intentionally match OCR typos
- Add ignore list: consol, countr, inpu, currenty, currentx
- Fix 'inputed' to 'inputted' in javadoc comment
- Add .npm/ to .gitignore to prevent npm cache commits
- Add to ignore list: countr, inpu, currenty, currentx
- Fix 'inputed' to 'inputted' in javadoc comment
Documentation fixes:
- prefered choise → preferred choice
- for nore details → for more details
- table is maked → table is marked
- thei consequences → their consequences

Code fixes:
- limted → limited
- extact → exact (3 occurrences)
- token positons → token positions
- preceeding → preceding
- wither → either
- skipt → skipped
Applied codespell -w to automatically fix clear typos across 81 files:
- Documentation files (29 files)
- Java source files (52 files)

Common fixes include:
- occured → occurred
- instanciate/instanciated → instantiate/instantiated
- immediatly → immediately
- prefered → preferred
- orginal → original
- bellow → below
- recommand/recommanded → recommend/recommended
- intials → initials
And many others
- Runs on push to master and on pull requests
- Uses official codespell-project/actions-codespell@v2
- Checks both file contents and filenames
- Read-only permissions for security
Revert Acknowledge?meant? → Acknowledge?ment? in regex pattern.
The pattern matches 'Acknowledgment' or 'Acknowledgement', not 'Acknowledgemeant'.
Added inline codespell:ignore comment to protect regex pattern.
@lfoppiano
Member

Thanks @yarikoptic. I will merge this in sync with #1358, since the two may conflict with each other.

@yarikoptic
Author

oh -- likely do not bother merging! I should be able to easily redo fixes. Just proceed with refactoring first and then let's fix those typos up
