Fix: Implementing Character-Level Diffing in Limited Situations #46
Open
Introduce a `character_level_diffing` option to refine single token replacements by analyzing prefix and suffix changes. This enables more precise diffing for minor text variations and improves the granularity of redline outputs. Signed-off-by: houfu <houfu@users.noreply.github.com>
…ing trailing punctuation Signed-off-by: houfu <houfu@users.noreply.github.com>
* Update email addresses, remove Matrix references, and revise license Updated contact emails to use a standardized format with "OUTLOOK dot sg". Removed all references to the Matrix chat platform for communication. Updated license to include contributors and extend copyright to 2025. Signed-off-by: houfu <houfu@users.noreply.github.com> * Update workflows and dependencies to use uv and upgrade libraries Replaced Poetry with uv in GitHub workflows for dependency management and streamlined project setup. Updated Python dependencies, libraries, and specified versions in `uv.lock` for improved stability and compatibility. Signed-off-by: houfu <houfu@users.noreply.github.com> --------- Signed-off-by: houfu <houfu@users.noreply.github.com>
Added support for manual workflow dispatch in python-publish.yml with input for version control. Also corrected inconsistent indentation in both workflows for better readability and proper YAML formatting. Signed-off-by: houfu <houfu@users.noreply.github.com>
Outdated actions.
* fix: match using normalized tokens, then apply the result to the original ones * doc: add explanatory comment Co-authored-by: Houfu Ang <houfu@users.noreply.github.com>
…el-diffing-option' into 44-feature-request-character-level-diffing-option
Refactored `Redlines` and `WholeDocumentProcessor` to streamline text processing logic. Introduced handling for `Redline` objects and moved tokenization and processing into a more cohesive structure. This improves the clarity, flexibility, and maintainability of change-tracking functionality. Signed-off-by: houfu <houfu@users.noreply.github.com>
Removed unused attributes from the processor class and updated token handling to use local variables, improving clarity and reducing state complexity. Minor adjustments were also made to standardize formatting and remove unnecessary string prefixes. Signed-off-by: houfu <houfu@users.noreply.github.com>
Signed-off-by: houfu <houfu@users.noreply.github.com>
Change docs workflow trigger from push to release for better control. Add character-level diffing parameter to enhance functionality, and improve clarity in redlines documentation examples. Signed-off-by: houfu <houfu@users.noreply.github.com>
Introduced parameterized tests to validate character-level diffing for various scenarios, such as word changes, capitalization, and punctuation. Also added dedicated tests to handle special cases like trailing and multiple punctuation. Ensures the Redlines library behaves correctly with and without character-level diffing enabled. Signed-off-by: houfu <houfu@users.noreply.github.com>
Closes #44
We take a limited two-pass approach to diffing: after the usual token-level diff, a second character-level diff runs on just the replaced word pairs that share similarities (like "dinner"/"dinners"). This is only activated when the change affects a single word/token and falls at the beginning or end of that token. This balances performance against improvements for common cases. You might find inconsistencies or puzzling cases where character-level diffing is or is not activated. Feedback is always welcome.
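
To illustrate the idea, here is a minimal standalone sketch of the second pass for one replaced token pair. It is not the code in this PR: the helper names `refine_pair` and `render`, the `[-…-]`/`{+…+}` markup, and the exact activation rule (refine only when the tokens share a prefix or a suffix, but not both) are assumptions made for the example.

```python
from typing import Optional, Tuple


def _shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def refine_pair(old: str, new: str) -> Optional[Tuple[str, str, str, str]]:
    """Hypothetical second pass for a single replaced token.

    Returns (prefix, deleted, inserted, suffix) when the change sits at the
    beginning or end of the token, otherwise None so the caller can fall
    back to a whole-token replacement.
    """
    if old == new:
        return None
    p = _shared_prefix_len(old, new)
    # Measure the shared suffix on the remainders only, so it cannot
    # overlap the shared prefix.
    s = _shared_prefix_len(old[p:][::-1], new[p:][::-1])
    # Assumed rule: refine only when exactly one end is shared. Nothing
    # shared means unrelated words; both shared means a mid-token change.
    if (p == 0) == (s == 0):
        return None
    return old[:p], old[p:len(old) - s], new[p:len(new) - s], old[len(old) - s:]


def render(old: str, new: str) -> str:
    """Render a replacement using made-up [-deleted-]{+inserted+} markup."""
    refined = refine_pair(old, new)
    if refined is None:
        return f"[-{old}-]{{+{new}+}}"
    prefix, deleted, inserted, suffix = refined
    out = prefix
    if deleted:
        out += f"[-{deleted}-]"
    if inserted:
        out += f"{{+{inserted}+}}"
    return out + suffix


if __name__ == "__main__":
    for pair in [("dinner", "dinners"), ("Dinner", "dinner"), ("cat", "dog")]:
        print(pair, "->", render(*pair))
```

Under these assumptions, "dinner"/"dinners" is refined to a one-character insertion, while unrelated words such as "cat"/"dog" remain a whole-token replacement. The real option may activate in different cases, which is the kind of inconsistency the description asks for feedback on.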