
Fix: Implementing Character-Level Diffing in Limited Situations #46

Open: houfu wants to merge 15 commits into main from 44-feature-request-character-level-diffing-option

houfu (Owner) commented May 18, 2025

Closes #44

We use a limited two-pass diffing approach: when a replaced word is similar to its replacement (like "dinner"/"dinners"), a second, character-level diff is run on just that pair. The second pass is only activated when the change is confined to a single word/token and falls at the beginning or end of that token. This balances performance against better output in common cases. You may find inconsistencies or puzzling cases where character-level diffing is or is not activated. Feedback is always welcome.

houfu added 2 commits May 19, 2025 00:32
Introduce a `character_level_diffing` option to refine single token replacements by analyzing prefix and suffix changes. This enables more precise diffing for minor text variations and improves the granularity of redline outputs.

Signed-off-by: houfu <houfu@users.noreply.github.com>
…ing trailing punctuation

Signed-off-by: houfu <houfu@users.noreply.github.com>
houfu linked an issue on May 18, 2025 that may be closed by this pull request
houfu added 2 commits May 19, 2025 23:26
* Update email addresses, remove Matrix references, and revise license

Updated contact emails to use a standardized format with "OUTLOOK dot sg". Removed all references to the Matrix chat platform for communication. Updated license to include contributors and extend copyright to 2025.

Signed-off-by: houfu <houfu@users.noreply.github.com>

* Update workflows and dependencies to use uv and upgrade libraries

Replaced Poetry with uv in GitHub workflows for dependency management and streamlined project setup. Updated Python dependencies, libraries, and specified versions in `uv.lock` for improved stability and compatibility.

Signed-off-by: houfu <houfu@users.noreply.github.com>

---------

Signed-off-by: houfu <houfu@users.noreply.github.com>
Added support for manual workflow dispatch in python-publish.yml, with an input to control the version. Also corrected inconsistent indentation in both workflows for better readability and valid YAML formatting.

Signed-off-by: houfu <houfu@users.noreply.github.com>
houfu added this to the 0.5.2 milestone May 19, 2025
houfu and others added 11 commits May 20, 2025 00:10
Outdated actions.
* fix: match using normalized tokens, then apply the result to the original ones

* doc: add explanatory comment

Co-authored-by: Houfu Ang <houfu@users.noreply.github.com>
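
The fix above (match using normalized tokens, then apply the result to the original ones) can be sketched with `difflib`: the matcher only sees normalized tokens, and because the normalized lists line up index-for-index with the originals, the opcode indices can slice the original tokens directly. The commit does not say what the normalization is; `normalize` and `diff_tokens` here are hypothetical, with `normalize` simply ignoring surrounding whitespace, and emitting the new-side text for equal runs is an arbitrary choice of this sketch.

```python
from difflib import SequenceMatcher


def normalize(token: str) -> str:
    # Hypothetical normalization: ignore surrounding whitespace.
    return token.strip()


def diff_tokens(old_tokens: list[str], new_tokens: list[str]) -> list[tuple[str, str]]:
    # Match on normalized copies of the tokens...
    matcher = SequenceMatcher(
        None,
        [normalize(t) for t in old_tokens],
        [normalize(t) for t in new_tokens],
        autojunk=False,
    )
    result: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # ...then slice the ORIGINAL tokens with the same indices, since
        # normalization maps tokens one-to-one.
        if tag == "equal":
            result.append(("equal", "".join(new_tokens[j1:j2])))
            continue
        if i2 > i1:
            result.append(("delete", "".join(old_tokens[i1:i2])))
        if j2 > j1:
            result.append(("insert", "".join(new_tokens[j1:j2])))
    return result
```

Here "dinner\n" and "dinner " count as the same word, so `diff_tokens(["We ", "had ", "dinner\n"], ["We ", "had ", "dinner ", "again"])` reports only the inserted "again"; matching the raw tokens would instead report "dinner\n" as replaced.
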
…el-diffing-option' into 44-feature-request-character-level-diffing-option
Refactored `Redlines` and `WholeDocumentProcessor` to streamline text processing logic. Introduced handling for `Redline` objects and moved tokenization and processing into a more cohesive structure. This improves the clarity, flexibility, and maintainability of change-tracking functionality.

Signed-off-by: houfu <houfu@users.noreply.github.com>
Removed unused attributes from the processor class and updated token handling to use local variables, improving clarity and reducing state complexity. Minor adjustments were also made to standardize formatting and remove unnecessary string prefixes.

Signed-off-by: houfu <houfu@users.noreply.github.com>
Signed-off-by: houfu <houfu@users.noreply.github.com>
Change docs workflow trigger from push to release for better control. Add character-level diffing parameter to enhance functionality, and improve clarity in redlines documentation examples.

Signed-off-by: houfu <houfu@users.noreply.github.com>
Introduced parameterized tests to validate character-level diffing for various scenarios, such as word changes, capitalization, and punctuation. Also added dedicated tests to handle special cases like trailing and multiple punctuation. Ensures the Redlines library behaves correctly with and without character-level diffing enabled.

Signed-off-by: houfu <houfu@users.noreply.github.com>
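
The commit does not say which test framework or which exact cases it uses. As a rough illustration of table-driven tests for the activation rule, this sketch uses the standard library's `unittest` with `subTest` in place of parametrization, against a hypothetical `is_affix_change` gate (an assumed rule: the two tokens share a prefix or a suffix, but not both). Punctuation cases are left out because they depend on how the real tokenizer splits punctuation, which is not shown here.

```python
import io
import os.path
import unittest


def is_affix_change(old: str, new: str) -> bool:
    """Hypothetical gate: True when two tokens differ only at their start or end."""
    if old == new:
        return False
    prefix = os.path.commonprefix([old, new])
    rest_old, rest_new = old[len(prefix):], new[len(prefix):]
    # Common suffix of the remainders, found by comparing them reversed.
    suffix = os.path.commonprefix([rest_old[::-1], rest_new[::-1]])
    return bool(prefix) != bool(suffix)


CASES = [
    # (label, old token, new token, expected)
    ("word change", "dinner", "dinners", True),
    ("capitalization", "Dinner", "dinner", True),
    ("middle change", "color", "colour", False),
    ("unrelated words", "cat", "dog", False),
]


class CharacterLevelGateTest(unittest.TestCase):
    def test_cases(self):
        for label, old, new, expected in CASES:
            # Each case is reported separately, much like a parametrized test.
            with self.subTest(label):
                self.assertEqual(is_affix_change(old, new), expected)


def run_cases() -> unittest.TestResult:
    """Run the table programmatically, discarding the runner's text output."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CharacterLevelGateTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Run it with `python -m unittest` on the file that contains it; `run_cases()` is only a convenience for running the table from code.
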
houfu modified the milestones: 0.5.2, v0.6.0 May 22, 2025
houfu removed this from the v0.6.0 milestone Oct 11, 2025
Development

Successfully merging this pull request may close these issues.

Feature Request: Character-Level Diffing Option

2 participants