Using 'ZERO WIDTH JOINER' char instead of "|" for offset calculations #61
Open
ilanbm wants to merge 2 commits into ichord:master from
This was referenced Nov 28, 2018
I noticed that Caret.js doesn't work well with languages that display connected characters, such as Arabic, Persian, and Malayalam.
The code uses the "|" char to check the offset of a given location. It should use the ZWJ char instead:
https://en.wikipedia.org/wiki/Zero-width_joiner
The advantage of using the 'zero width joiner' char is that it has no width and does not break the connection between adjacent chars.
Example:
العربية
الع|ربية
Notice that "|" breaks the word, so its offset is different from what it is supposed to be.
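To make the comparison concrete, here is a minimal sketch of inserting a marker char at a caret offset. `insertMarker` is a hypothetical helper written for this comment, not a function from Caret.js, and it only shows the string manipulation; how the result renders depends on the browser and font.

```javascript
// Hypothetical helper (not part of Caret.js): put a marker char at a caret offset.
const ZWJ = '\u200D'; // ZERO WIDTH JOINER

function insertMarker(text, offset, marker = ZWJ) {
  // Clamp the offset so out-of-range values land at the start or end of the text.
  const at = Math.max(0, Math.min(offset, text.length));
  return text.slice(0, at) + marker + text.slice(at);
}

const word = 'العربية';
const withPipe = insertMarker(word, 3, '|'); // visible marker that splits the word
const withZwj = insertMarker(word, 3);       // invisible marker at the same index

console.log(withPipe);
console.log(withZwj);
console.log(withZwj.indexOf(ZWJ) === withPipe.indexOf('|')); // same string offset
```

Both strings carry the marker at the same index, so any code that measures the marker's position can stay the same; only the rendered shape of the surrounding letters should differ.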
It might also solve issues with the offset of digits in RTL languages.
12345
will appear in an RTL context, with the "|" char inserted, as: 345|12
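The digit reordering above is consistent with "|" being an ordinary visible symbol, while the ZWJ is an invisible format char. As a rough sketch, the two candidate markers can be compared by their Unicode properties. `describeMarker` is a hypothetical name used only for illustration, and the property escapes (`\p{...}` with the `u` flag) assume a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: report how a single marker char is classified by Unicode.
const ZWJ = '\u200D'; // ZERO WIDTH JOINER
const PIPE = '|';     // VERTICAL LINE

function describeMarker(ch) {
  return {
    // Code point formatted as U+XXXX.
    codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
    // General category Cf: invisible format characters.
    isFormatChar: /^\p{Cf}$/u.test(ch),
    // Join_Control: ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER.
    isJoinControl: /^\p{Join_Control}$/u.test(ch),
  };
}

console.log(describeMarker(ZWJ));  // { codePoint: 'U+200D', isFormatChar: true, isJoinControl: true }
console.log(describeMarker(PIPE)); // { codePoint: 'U+007C', isFormatChar: false, isJoinControl: false }
```

This does not prove how a given browser lays out the text; it only shows that the two markers belong to different classes of character, which is why they can be expected to behave differently inside a run of digits or joined letters.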
Note: I'm only suggesting this in passing. I haven't tested the code (I forked and edited it directly on GitHub), and I didn't change the getPosition function, since it uses a span element that wraps the "|" char; having an element instead of a text node with the ZWJ might not work and could still separate connected chars.
Please check and update, for the sake of the many people who use those languages.