

@diegomura
Owner

@diegomura diegomura commented Dec 29, 2025

Regression from: #3188

#3188 broke emoji and superscript/subscript text rendering.

After a more careful review, I changed a few things. In short:

  • Changing the order of wrapText and glyph generation was the right move
  • But instead of having the hyphenate-word function return soft hyphens, it's better to keep it as before and have it return syllables without them, for several reasons:
    • It makes custom hyphenation functions harder to get right. Many people do not know about this Unicode character
    • It's pointless, risky and unnecessary to generate a glyph for soft hyphens only to remove them from attributed strings later
    • It's much simpler to remove all soft hyphens right after the word-wrapping step, so that subsequent steps do not have to worry about them
  • After this step, a lot of things get simpler, as there's no real need to manipulate attributed strings differently

Also, the previous approach was slightly wrong, as it did not hyphenate the same word more than once when needed
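The behaviour described above, where the hyphenation callback returns plain syllables and the wrapping step decides where (and how often) a word breaks, can be illustrated with a greedy breaker. This is a hypothetical sketch, not react-pdf's textkit code: `wrapSyllables` is an illustrative name, widths are counted in characters, and an added hyphen is assumed to cost one character.

```javascript
// Hypothetical sketch, not react-pdf's textkit: widths are counted in
// characters and an added hyphen is assumed to cost one character.
function wrapSyllables(syllables, maxWidth) {
  const lines = [];
  let current = '';
  syllables.forEach((syllable, i) => {
    const isLast = i === syllables.length - 1;
    // Width needed to keep this syllable on the current line, reserving room
    // for a hyphen unless this is the last syllable of the word.
    const needed = current.length + syllable.length + (isLast ? 0 : 1);
    if (current && needed > maxWidth) {
      lines.push(current + '-'); // break inside the word with a visible hyphen
      current = syllable;
    } else {
      current += syllable;
    }
  });
  if (current) lines.push(current);
  return lines;
}

// The same word can be broken more than once:
console.log(wrapSyllables(['hy', 'phen', 'a', 'tion'], 6)); // [ 'hy-', 'phena-', 'tion' ]
```

A single syllable wider than `maxWidth` still overflows its line in this sketch; a real line breaker would also weigh penalties rather than break greedily.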

@carlobeltrame please take a look if you have a sec and report anything wrong :)

@changeset-bot

changeset-bot bot commented Dec 29, 2025

⚠️ No Changeset found

Latest commit: fea3867

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@diegomura diegomura merged commit e0db9bf into master Dec 29, 2025
@diegomura diegomura deleted the dm/fix-hyphenation branch December 29, 2025 02:42
@carlobeltrame
Contributor

carlobeltrame commented Dec 29, 2025

@diegomura sorry about the broken tests. However, your fix here makes it a lot more complicated for users to set up proper hyphenation, not simpler. Please let me explain.

The goal of any reasonably-sized application in a western language is:

  1. Have most words be broken with a final hyphen on the line (e.g. the word hyphenation should be broken into hyphena- and tion if at the end of the line; note the automatically added hyphen)
  2. Some other words should be broken without an added final hyphen (e.g. the word react-pdf should be broken into react- and pdf, or github.com/diegomura into github.com/ and diegomura, with no automatically added hyphen).

Requirement 1 has always been possible and still is. Requirement 2 was impossible in the past, was made possible with #3188, and is now made much harder by this PR #3267. With #3188, only the hyphenationCallback has to be set properly, once globally for the whole app (or for a component subtree), and we can provide good defaults or examples in the documentation where users learn about hyphenationCallback in the first place. The easiest way with #3188 to fulfill both of the above requirements is for users to set hyphenationCallback to:

(word, original) =>
  original(word).flatMap(part => part.split(/(?<=[-./])/))

I.e. no knowledge of soft hyphens is effectively required, it's all already contained in the original hyphenation callback built into react-pdf.
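For illustration, here is the suggested callback run against a hypothetical stand-in for the built-in callback. The stub `original`, which returns syllables ending in a soft hyphen as described for #3188, and the helper `addsHyphen` are assumptions for the sake of the example, not react-pdf APIs.

```javascript
const SOFT_HYPHEN = '\u00AD';

// Hypothetical stand-in for the built-in callback under #3188: syllables end
// in a soft hyphen, except the last one of the word.
function original(word) {
  if (word === 'hyphenation') {
    return ['hy' + SOFT_HYPHEN, 'phen' + SOFT_HYPHEN, 'a' + SOFT_HYPHEN, 'tion'];
  }
  return [word]; // words the stub cannot hyphenate come back whole
}

// The suggested callback: additionally allow breaks right after "-", "." and "/".
const hyphenationCallback = (word, original) =>
  original(word).flatMap((part) => part.split(/(?<=[-./])/));

// Under #3188, only parts ending in a soft hyphen get a visible "-" when broken.
const addsHyphen = (part) => part.endsWith(SOFT_HYPHEN);

console.log(hyphenationCallback('github.com/diegomura', original));
// [ 'github.', 'com/', 'diegomura' ]
```

So `react-pdf` becomes `['react-', 'pdf']` with no soft hyphen (a hyphen-less break), while `hyphenation` keeps its soft-hyphen-terminated syllables.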

Now with this PR #3267, to implement the hyphen-less word breaking, users need to pre-insert soft hyphens into ALL of their strings in the whole application. That is, every single <Text> instance can no longer have plain text content such as <Text>github.com/diegomura/react-pdf 'hyphenation'</Text>, but must be used as <Text>{insertSoftHyphens('github.com/diegomura/react-pdf \'hyphenation\'')}</Text>. Alternatively, an application can choose to create its own <HyphenatedText> component which pre-processes the text, making the <Text> component effectively useless. The documentation and examples are now confusing for developers of such apps, because they recommend using <Text> and hyphenationCallback, both of which must in reality be avoided in order to get the correct behaviour. The transformation insertSoftHyphens has to be re-implemented from scratch without access to the react-pdf native implementation, and therefore developers now DO have to know about soft hyphens.

I chose the soft hyphen character because it is intended for almost exactly this use case. pdfkit itself already has support for soft hyphens, but it is never used inside react-pdf because the layout and textkit packages perform line breaking, and pdfkit's own line breaking capabilities are never triggered. That's why we have to remove all soft hyphens from the text before passing it to pdfkit.

If you were willing to reconsider my solution or a similar solution, I'd be happy to have a look at how I broke the emoji and superscript tests, as well as the issue with multiple breaks in the same word. Please let me know about your opinion on this.

@diegomura
Owner Author

diegomura commented Dec 29, 2025

@carlobeltrame nothing to be sorry about! And I'll always be open to reconsidering solutions :) Consider this change mostly a way to fix the tests and publish a new version yesterday; it doesn't have to stay like this if it's not ideal. This was a breaking change anyway.

Let me break down your response though as some things I do not get.

About Requirement 1, I agree it was possible, but it was broken due to soft hyphens persisting in the attributed string but not in the syllables, as far as I understand.

About Requirement 2, I was not considering them and you have a good point. But do we know what these cases are? For things like react-pdf where there's already a hyphen it would be easy to handle. URLs as well although slightly harder.

The easiest way with #3188 to fulfill both of the above requirements is for users to set hyphenationCallback to:

My general thought about this is that if users are supposed to set this snippet every time, it should be built into the lib.

Now with this PR #3267, to implement the hyphen-less word breaking, users need to pre-insert soft hyphens into ALL of their strings in the whole application. That is, every single <Text> instance can no longer have plain text content

This part I don't get 😄 Why is this, and why is it different than #3188 ? As far as I can tell, if the passed string has soft-hyphens, those will be used by the hyphenator, and if not they will be computed by hyphen or the custom hyphenation fn the user sets. Just like before and just like in #3188

The documentation and examples are now confusing for developers of such apps, because they recommend using <Text> and hyphenationCallback, both of which must in reality be avoided in order to get the correct behaviour.

This I don't get either. How it's supposed to work now (and, as far as I can tell, does) is:

  • You don't want to think about hyphenation? Do nothing and let hyphen do the work (tailored for English, won't work well for other locales)
  • You want more control or a specific locale? Pass your own hyphenation callback which, given a word, returns "syllables" with no shy Unicode chars or other things users might not be aware of (like in the example). I prefer this over having to return soft hyphens at the end of each syllable.
  • Regardless of what you choose, you want fine control over a word in the text? Pass soft hyphens to <Text /> and the engine will respect them
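A minimal sketch of the lookup order in the list above, assuming words arrive one at a time: soft hyphens already present in the text take precedence, otherwise the default or custom callback supplies plain syllables. `syllablesFor` is a hypothetical name, not part of react-pdf.

```javascript
const SOFT_HYPHEN = '\u00AD';

// Hypothetical sketch of the lookup order, not react-pdf's internals.
function syllablesFor(word, hyphenationCallback) {
  // Explicit soft hyphens passed in the <Text> content win...
  if (word.includes(SOFT_HYPHEN)) {
    return word.split(SOFT_HYPHEN).filter((part) => part.length > 0);
  }
  // ...otherwise the default or custom callback returns plain syllables.
  return hyphenationCallback(word);
}

// Usage: a word pre-split by the author vs. one left to the callback.
syllablesFor('foo' + SOFT_HYPHEN + 'bar', (word) => [word]);
syllablesFor('react', () => ['re', 'act']);
```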

Again, I get that requirement 2 is not being fixed here, but I suspect it can be worked around with this solution.

Long snapshots (answering here to centralize discussions)

I surely agree snapshots are great, and those in particular are helpful. We have some sort of e2e tests with image regression testing, which kinda makes sense because they allow internals to change while just ensuring the output remains the same, which is ultimately what the user cares about, while avoiding huge JSON snapshots that aren't really readable. Obviously they don't cover everything (like bookmarks, links, etc.), but those are tested elsewhere anyway.

What I don't really like about those, though, is that they test things we shouldn't. E.g. they contain the serialized fonts, images (in case they have attachments) or any other internal thing that isn't really relevant to test for layout.

IMO those are essentially unreadable. When they break (which they will, probably falsely, given how many internal things they capture), the difficulty of seeing the diffs means we will just hit u every time, which defeats the purpose of a test in the first place and just increases PR sizes.

Let me know what you think. Again, I'm very much in favor of having as many tests as we can, but they should be manageable so that tests do not get in the way (been there many times). I don't think those were :)

@carlobeltrame
Contributor

carlobeltrame commented Dec 29, 2025

I'll try to explain more, and give you proposed documentation for both versions, which will demonstrate what exactly is possible with #3188 and not possible with #3267.

About Requirement 1, I agree it was possible, but it was broken due to soft hyphens persisting in the attributed string but not in the syllables, as far as I understand.

I think this requirement has always been possible to a satisfying degree. I would ignore soft hyphens here, because most user-provided and developer-provided text does not contain soft hyphens.

About Requirement 2, I was not considering them and you have a good point. But do we know what these cases are? For things like react-pdf where there's already a hyphen it would be easy to handle. URLs as well although slightly harder.

I have listed many issues in the description of this PR. Specifically, the issues #1642, #2456, #2564, #1380, #1416 and #1662 are all solvable with my version #3188, but are not solvable with #3267 (see the documentation proposals below for details).

The easiest way with #3188 to fulfill both of the above requirements is for users to set hyphenationCallback to:

My general thought about this is that if users are supposed to set this snippet every time, it should be built into the lib.

I agree that would be ideal, but applications might have different views on which characters or character combinations should be allowed to break. E.g. should we break on underscores _, slashes /, colons :, question marks ? and should the special character be at the end of the previous line or at the start of the next line? How about breaking after emoji? How about an application that wants to not break on underscores, except inside URLs, which can often become long? How about an application that does not want to split domain names but does want to split at periods later in a URL (e.g. mydomain.com/some-dir-with-a-very-long-name.local/hello.html)? How about an application that wants to never break a specific company name such as react-pdf, due to rules from legal or marketing departments? I don't feel like I can provide an answer for all special characters in Unicode that suits most applications in most languages. So my proposal with part.split(/(?<=[-./])/) is meant as a starting point (split after every single hyphen, period and slash) for applications to take and extend to their needs.
If we include this regex in the built-in react-pdf hyphenation function, developers would have a harder time turning this feature off and only splitting at English syllables.
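To make the point about application-specific rules concrete, here is a hypothetical way an app could layer its own rules on top of a base callback: a set of words that must never break (the company-name case) plus the `[-./]` split as a starting point. `makeHyphenationCallback`, `NEVER_BREAK` and `BREAK_AFTER` are illustrative names only.

```javascript
// Hypothetical app-specific rules, layered on top of a base callback.
const NEVER_BREAK = new Set(['react-pdf']); // e.g. a protected company name
const BREAK_AFTER = /(?<=[-./])/; // starting point: break after "-", "." and "/"

function makeHyphenationCallback(base) {
  return (word) => {
    // A single "syllable" means the word is never split.
    if (NEVER_BREAK.has(word)) return [word];
    return base(word).flatMap((part) => part.split(BREAK_AFTER));
  };
}

// Usage with a trivial base that never hyphenates on its own:
const callback = makeHyphenationCallback((word) => [word]);
callback('react-pdf'); // kept whole
callback('my-domain.com'); // split after "-" and "."
```

Keeping the rules in the application rather than in the built-in function is the design choice argued for above: each app can add or drop characters without fighting a default.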

Now with this PR #3267, to implement the hyphen-less word breaking, users need to pre-insert soft hyphens into ALL of their strings in the whole application. That is, every single <Text> instance can no longer have plain text content

This part I don't get 😄 Why is this, and why is it different than #3188 ? As far as I can tell, if the passed string has soft-hyphens, those will be used by the hyphenator, and if not they will be computed by hyphen or the custom hyphenation fn the user sets. Just like before and just like in #3188

I think I was wrong in this point, in reality it's even worse. With this new PR #3267, hyphen-less word breaking is not possible anymore (or if it is, I may misunderstand your changes). See below the documentation proposals for a clearer explanation.

After #3188, the docs could look like this:

  • Don't want to think of hyphenation? Do nothing and let the built-in syllable hyphenation do the work (tailored for English, won't work well for other locales).
  • Want to also split at some special characters, or after a certain word length has been reached? Extend the built-in hyphenation callback. E.g. for also splitting at existing hyphens -, periods . and slashes /:
    Font.registerHyphenationCallback(
      (word, syllables) => syllables(word).flatMap(part => part.split(/(?<=[-./])/))
    )
    
  • You can decide whether an extra hyphen is inserted when your line is broken after each part. If the last character on a line is a soft hyphen character, react-pdf inserts an extra hyphen, otherwise it does not. Example: If your hyphenation callback splits example.com/index.html?foobar=baz into ['example.com/', 'index.html?', 'foo<SOFT_HYPHEN_CHARACTER>', 'bar=baz'], in a very narrow container, this would result in the following lines:
    example.com/
    index.html?
    foo-         // <-- note the added hyphen due to the returned soft hyphen
    bar=baz
    
    The built-in syllables hyphenator will already return soft hyphens at the end of each syllable (except at the end of words).
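The rule in the proposed #3188 docs can be sketched as a small rendering function. This is a hypothetical illustration, not react-pdf's implementation: `renderLines` takes, for each output line, the callback parts placed on it, strips soft hyphens, and adds a visible "-" only when the line ends in a part that carried a soft hyphen. It reproduces the example lines above.

```javascript
const SOFT_HYPHEN = '\u00AD';

// Hypothetical sketch of the #3188 rule. `lines` holds, for each output
// line, the hyphenation-callback parts that were placed on it.
function renderLines(lines) {
  return lines.map((parts) => {
    const last = parts[parts.length - 1] ?? '';
    // Soft hyphens never reach the PDF...
    const text = parts.join('').split(SOFT_HYPHEN).join('');
    // ...but a line ending right after one gets a visible hyphen.
    return last.endsWith(SOFT_HYPHEN) ? text + '-' : text;
  });
}

console.log(renderLines([['example.com/'], ['index.html?'], ['foo' + SOFT_HYPHEN], ['bar=baz']]));
// [ 'example.com/', 'index.html?', 'foo-', 'bar=baz' ]
```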

After #3267, the docs would rather look something like this:

  • Don't want to think of hyphenation? Do nothing and let the built-in syllable hyphenation do the work (tailored for English, won't work well for other locales).
  • Want to also split at some special characters, or after a certain word length has been reached? Extend the built-in hyphenation callback. E.g. for also splitting at existing hyphens -, periods . and slashes /:
    Font.registerHyphenationCallback(
      (word, syllables) => syllables(word).flatMap(part => part.split(/(?<=[-./])/))
    )
    
    Note: React-pdf will always insert an extra hyphen after your split parts. There is currently no way and no easy workaround to e.g. have URLs be split without having extra hyphens inserted into them. With the above example, the text example.com/index.html?foobar=baz in a narrow container will be split into:
    example.-
    com/-
    index.-
    html?foo-
    bar=baz
    
  • Regardless of what you choose, you want fine control over a word in text? Pass soft-hyphens to <Text /> and the engine will respect those.
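By contrast, the #3267 behaviour described above amounts to adding a hyphen after every intra-word break, because plain syllables carry no information about which breaks should stay hyphen-less. A hypothetical sketch, assuming all the lines belong to one word so that only the last line gets no hyphen:

```javascript
// Hypothetical sketch of the #3267 behaviour: every break inside the word
// gets "-". Assumes all lines belong to the same word.
function renderLines(lines) {
  return lines.map((parts, i) => parts.join('') + (i < lines.length - 1 ? '-' : ''));
}

console.log(renderLines([['example.'], ['com/'], ['index.'], ['html?', 'foo'], ['bar=baz']]));
// [ 'example.-', 'com/-', 'index.-', 'html?foo-', 'bar=baz' ]
```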

@diegomura if I misunderstand your changes, please let me know. For full support of both requirements, the hyphenation callback MUST have a way to specify which split parts should be followed by an extra hyphen if broken, and which should not. In #3188, I introduced this possibility via the soft hyphen character. As far as I can tell, with this PR #3267, you removed this capability again, breaking the solution for many of the issues linked above. If I understood this wrong, I'd be glad if you could show me a code example of how to achieve hyphen-less word splitting with your version.

@carlobeltrame
Contributor

Long snapshots (answering here to centralize discussions)

I surely agree snapshots are great, and those in particular are helpful. We have some sort of e2e tests with image regression testing, which kinda makes sense because they allow internals to change while just ensuring the output remains the same, which is ultimately what the user cares about, while avoiding huge JSON snapshots that aren't really readable. Obviously they don't cover everything (like bookmarks, links, etc.), but those are tested elsewhere anyway.

I agree. I myself am using react-pdf in a special way without the react renderer package, with a Vue.js renderer that generates the input for the layout stage (so, react-pdf with Vue.js instead of React). So that is a reason why I am especially interested in stability of these internal representations. I understand if you cannot currently make stability there a part of the maintenance effort.

What I don't really like about those, though, is that they test things we shouldn't. E.g. they contain the serialized fonts, images (in case they have attachments) or any other internal thing that isn't really relevant to test for layout.

Maybe the specific bytes of the fonts aren't interesting, but if the fonts unexpectedly change or vanish or move to a different place in the internal representation object, would you want to know about it? I think these fonts rarely ever change, so having them in the snapshot shouldn't really make the snapshot test fail often, right? Or on the other hand, if you think these fonts don't really belong to the state worth testing, maybe they shouldn't be this deeply nested inside the internal representation object in the first place?

IMO those are essentially unreadable. When they break (which they will, probably falsely, given how many internal things they capture), the difficulty of seeing the diffs means we will just hit u every time, which defeats the purpose of a test in the first place and just increases PR sizes.

Well, snapshots are specifically a tool for when the output of an algorithm is too large to be readable. But I understand if you don't want them.

Another way forward would be to restore the integration test, then reduce the internal state by recursively throwing away undesired contents such as the fonts, and then snapshot this reduced version, or even write manual assertions for it. This approach may or may not lead to less maintenance of this integration test, who knows... I didn't do this yet because the internal representation is extremely complex and I wouldn't know which parts to keep and which to ignore.
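The reduction idea above could look roughly like this: walk the internal representation, drop keys that are not worth snapshotting, and snapshot the rest. This is a hypothetical sketch; `pruneForSnapshot` is an illustrative name, and the key name `font` used below is an assumption, since which keys to drop is exactly the open question. Cyclic references are replaced with a placeholder so the walk terminates.

```javascript
// Hypothetical helper: recursively copy `value`, dropping any object key
// listed in `ignoredKeys`. Cyclic references become '[Circular]'.
function pruneForSnapshot(value, ignoredKeys, seen = new WeakSet()) {
  if (value === null || typeof value !== 'object') return value;
  if (seen.has(value)) return '[Circular]';
  seen.add(value);
  const result = Array.isArray(value)
    ? value.map((item) => pruneForSnapshot(item, ignoredKeys, seen))
    : Object.fromEntries(
        Object.entries(value)
          .filter(([key]) => !ignoredKeys.has(key))
          .map(([key, child]) => [key, pruneForSnapshot(child, ignoredKeys, seen)]),
      );
  // Only ancestors count as cycles; shared siblings are copied normally.
  seen.delete(value);
  return result;
}

const layout = { type: 'TEXT', font: { data: [0, 1, 2] }, children: [{ font: 'x', width: 100 }] };
console.log(JSON.stringify(pruneForSnapshot(layout, new Set(['font']))));
// {"type":"TEXT","children":[{"width":100}]}
```

In a Jest or Vitest test, the pruned object would then be passed to `expect(...).toMatchSnapshot()`.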

Let me know what you think. Again, I'm very much in favor of having as many tests as we can, but they should be manageable so that tests do not get in the way (been there many times). I don't think those were :)

@diegomura
Owner Author

diegomura commented Dec 29, 2025

I think I was wrong in this point, in reality it's even worse. With this new PR #3267, hyphen-less word breaking is not possible anymore (or if it is, I may misunderstand your changes).

By "hyphen-less" word breaking here you mean essentially Requirement 2 right? Allowing some words (like urls) to break, but not add a hyphen char at the end of the line. If so, I agree #3267 dropped this ability and we need it back. How to get there I feel is where we might have different opinions here :)

I agree that would be ideal, but applications might have different views on which characters or character combinations should be allowed to break.

Lots of examples you list here are already possible with #3267, like not breaking a company name (for which the custom hyphenation fn should just return 1 syllable, e.g. Google => [Google]) or breaking at specific chars like underscore _ (for which the custom hyphenation fn should split syllables accordingly too).

Again, and correct me if I'm wrong, the question is how we give users control to break in a specific place without engine adding a hyphen at the end. Let's say we find a way to achieve this, would you think something else is missing?

If the last character on a line is a soft hyphen character, react-pdf inserts an extra hyphen, otherwise it does not

I feel this is an incorrect use of the soft hyphen, and probably what got me confused about #3188 in the first place. The purpose of the soft hyphen Unicode char is to mark where a long word may be broken across lines if line wrapping is needed, not to flag to the engine whether a visible hyphen should be added or not. From the last part of your response, I gather you acknowledge this but are fine with it.

Going back to my original point of #3188 exposing the soft hyphen as a public API (1st sub-point of the PR description), to which you said it didn't: after reading your potential docs, I believe even more strongly that it does. Not only that, but we (react-pdf) are adding a custom meaning to soft hyphens which is non-standard (at least to my knowledge). Now if users do not add this somewhat mysterious (to their eyes) Unicode char at the end of each syllable, they won't see the line wrapped the way they expect 99% of the time, which is with a -.

In general I wanna think on maximizing DX for the common case, which in this case is hyphenating with a -. Those users should not have to worry about anything but to return the corresponding syllables in the callback. #3188 however asks each user to add a soft-hyphen char to each item in the array every time, just for very few cases where visible hyphen is not desired. And I don't really like that 😄

So how to still support Requirement 2? I can think on some solutions:

1. Allow to set hyphenation char for each word, optionally empty:

const hyphenationCallback = (word) => {
  const syllables = hyphen(word).split(SOFT_HYPHEN)
  return {  syllables, hyphen: '' } // Normally this would be `-`
}

Pros: semantic, backwards compatible (we can still support returning an array of strings for the most common cases)
Cons: Does not support different hyphen chars inside the same word. I feel this is probably not needed, but if so, we can support each syllable defining its own hyphen

2. Accept unicode for non hyphenated words:

Similar approach but opposite direction: user doesn't want a - at the end of a syllable, add a special unicode char. This shouldn't be soft-hyphen. ChatGPT suggests ZERO WIDTH SPACE — U+200B

Pros: Simpler return type of hyphenation callback (not sure)
Cons: I feel unicode chars make a bad interface. Will also mean having to do glyph manipulation, etc

--

Regardless, I think things like URLs where common case is not to add - we can handle them internally so user doesn't have to worry about it

@carlobeltrame thoughts?

@carlobeltrame
Contributor

carlobeltrame commented Dec 29, 2025

@diegomura

I think I was wrong in this point, in reality it's even worse. With this new PR #3267, hyphen-less word breaking is not possible anymore (or if it is, I may misunderstand your changes).

By "hyphen-less" word breaking here you mean essentially Requirement 2 right? Allowing some words (like urls) to break, but not add a hyphen char at the end of the line. If so, I agree #3267 dropped this ability and we need it back. How to get there I feel is where we might have different opinions here :)

Agreed.

I agree that would be ideal, but applications might have different views on which characters or character combinations should be allowed to break.

Lots of examples you list here are already possible with #3267, like not breaking a company name (for which the custom hyphenation fn should just return 1 syllable, e.g. Google => [Google]) or breaking at specific chars like underscore _ (for which the custom hyphenation fn should split syllables accordingly too).

Yes, these are already solvable with #3267, but I wasn't listing impossible cases here, I was listing reasons why react-pdf can't ship with the one and only perfect hyphenation algorithm for all use cases. On the other hand, the GitHub issues #1642, #2456, #2564, #1380, #1416 and #1662 depend on requirement 2 to be solved. I think we are on the same page here.

Again, and correct me if I'm wrong, the question is how we give users control to break in a specific place without engine adding a hyphen at the end. Let's say we find a way to achieve this, would you think something else is missing?

Yes, agreed, this is the only feature that is effectively missing. Once we have a solution for this, we're good.

If the last character on a line is a soft hyphen character, react-pdf inserts an extra hyphen, otherwise it does not

I feel this is an incorrect use of the soft hyphen, and probably what got me confused about #3188 in the first place. The purpose of the soft hyphen Unicode char is to mark where a long word may be broken across lines if line wrapping is needed, not to flag to the engine whether a visible hyphen should be added or not. From the last part of your response, I gather you acknowledge this but are fine with it.

Well, according to the Wikipedia text, it's "for the purpose of breaking words across lines by inserting visible hyphens if they fall on the line end but remain invisible within the line". The hyphen npm package used in react-pdf also simply inserts a soft hyphen in between syllables (where a break WITH added hyphen could occur) and inserts nothing on whitespace (where a break WITHOUT added hyphen could occur). I personally know the soft hyphen from word processors, where you can usually insert a soft hyphen using Ctrl+-, always with the effect of an added hyphen if the word happens to break at that point. I have never seen soft hyphens used in a way that would not add a hyphen when breaking, or only sometimes add a hyphen when breaking, so for me a soft hyphen seemed like an almost perfect fit for the job.
But if my proposed usage of the soft hyphen does not feel as intuitive to you, it probably won't be intuitive for many other devs. We can gladly use another, more explicit system, e.g. with your proposed hyphen: '' syntax. The only downside is, if we still support the rare soft hyphen passed into a <Text> node, I'm not yet sure how much we save on complexity here.

Going back to my original point of #3188 exposing the soft hyphen as a public API (1st sub-point of the PR description), to which you said it didn't: after reading your potential docs, I believe even more strongly that it does. Not only that, but we (react-pdf) are adding a custom meaning to soft hyphens which is non-standard (at least to my knowledge). Now if users do not add this somewhat mysterious (to their eyes) Unicode char at the end of each syllable, they won't see the line wrapped the way they expect 99% of the time, which is with a -.

In general I wanna think on maximizing DX for the common case, which in this case is hyphenating with a -. Those users should not have to worry about anything but to return the corresponding syllables in the callback. #3188 however asks each user to add a soft-hyphen char to each item in the array every time, just for very few cases where visible hyphen is not desired. And I don't really like that 😄

As detailed in the documentation proposal, most devs (for english language apps, or for apps using the npm hyphen library for syllable splitting) wouldn't need to worry about inserting the soft hyphens themselves, because react-pdf and the hyphen library already insert it for them. But of course, more explicit might be more clear here.

So how to still support Requirement 2? I can think on some solutions:

1. Allow to set hyphenation char for each word, optionally empty:

const hyphenationCallback = (word) => {
  const syllables = hyphen(word).split(SOFT_HYPHEN)
  return {  syllables, hyphen: '' } // Normally this would be `-`
}

Pros: semantic, backwards compatible (we can still support returning an array of strings for the most common cases)
Cons: Does not support different hyphen chars inside the same word. I feel this is probably not needed, but if so, we can support each syllable defining its own hyphen

Multiple different hyphen chars inside the same word seems essential to me. As an example, the compound term "yellowish-blue" must be hyphenated as [{ syllable: 'yel', hyphen: '-' }, { syllable: 'low', hyphen: '-' }, { syllable: 'ish-', hyphen: '' }, { syllable: 'blue', hyphen: '' }]. And I already hear you saying, well, we can add an exception to never add a hyphen when there is already a hyphen, and inside URLs, and inside UUIDs, cryptographic public keys, and after emoji, and so on and so forth. But there will always be more technical identifiers, possibly proprietary ones, and then the rules might differ in other languages such as asian languages. I just think handling all this in an opinionated manner will add so many edge cases and maintenance effort to react-pdf.
However, given per-syllable hyphen specifications, the code for a custom hyphenation callback for splitting at dashes, periods and slashes will become more complicated:

Font.registerHyphenationCallback(
  (word, syllables) => syllables(word).flatMap(({ syllable, hyphen }) => {
    const parts = syllable.split(/(?<=[-./])/)
    return parts.map((part, i) => ({ syllable: part, hyphen: (i === parts.length - 1) ? hyphen : '' }))
  })
)
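To check the callback above against the yellowish-blue example, here is a self-contained version. The `syllables` function is a hypothetical stub standing in for the built-in per-syllable hyphenator proposed in this thread (which does not exist yet); the callback logic is the same as above.

```javascript
// Hypothetical stub for the proposed built-in per-syllable hyphenator.
function syllables(word) {
  if (word === 'yellowish-blue') {
    return [
      { syllable: 'yel', hyphen: '-' },
      { syllable: 'low', hyphen: '-' },
      { syllable: 'ish-blue', hyphen: '' }, // last syllable: no break follows
    ];
  }
  return [{ syllable: word, hyphen: '' }];
}

// Same logic as the callback above: split after "-", "." and "/", and keep
// the original hyphen only on the last piece of each syllable.
const hyphenationCallback = (word, syllables) =>
  syllables(word).flatMap(({ syllable, hyphen }) => {
    const parts = syllable.split(/(?<=[-./])/);
    return parts.map((part, i) => ({
      syllable: part,
      hyphen: i === parts.length - 1 ? hyphen : '',
    }));
  });

console.log(hyphenationCallback('yellowish-blue', syllables));
```

The result matches the hyphenation given earlier: `yel` and `low` keep `-`, while `ish-` and `blue` break without an added hyphen.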

Technically, another advantage of this approach is that it meets more closely the feature asked for in #2456 and #1416, although I suspect nobody actually wants an end-of-line character other than -.

2. Accept unicode for non hyphenated words:

Similar approach but opposite direction: user doesn't want a - at the end of a syllable, add a special unicode char. This shouldn't be soft-hyphen. ChatGPT suggests ZERO WIDTH SPACE — U+200B

Pros: Simpler return type of hyphenation callback (not sure)
Cons: I feel unicode chars make a bad interface. Will also mean having to do glyph manipulation, etc

Yes, seems similar to my proposal, with similar downsides.
It's important to make sure the zero width spaces or other unicode symbols do not end up in the final PDF. Otherwise, copy pasting the text out of the PDF could yield unexpected results. A browser may be smart enough to filter zero width spaces out of a URL pasted into it, but other programs consuming URLs or UUIDs or other identifiers may not.

For completeness sake, here is an implementation of a custom hyphenation callback in this scenario, as it might look in the future react-pdf documentation:

const ZERO_WIDTH_SPACE = '\u200B'
Font.registerHyphenationCallback(
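  (word, syllables) => syllables(word).flatMap(part =>
    // Mark each piece as a hyphen-less break, except the last piece of the syllable
    part.split(/(?<=[-./])/).map((p, i, pieces) => (i < pieces.length - 1 ? p + ZERO_WIDTH_SPACE : p))
  )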
)
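On the engine side, the zero-width-space variant would need to interpret the marker and strip it, which addresses the copy-paste concern above. A hypothetical sketch, assuming the callback marks only the pieces after which a hyphen-less break is wanted, and that `lines` holds the parts of a single word grouped by output line:

```javascript
const ZERO_WIDTH_SPACE = '\u200B';

// Hypothetical engine-side handling for the U+200B proposal.
function renderLines(lines) {
  return lines.map((parts, i) => {
    // U+200B markers are stripped so they never end up in the PDF text.
    const text = parts.join('').split(ZERO_WIDTH_SPACE).join('');
    if (i === lines.length - 1) return text; // end of the word: nothing to add
    const last = parts[parts.length - 1];
    // A marked part breaks without a hyphen; any other break gets "-".
    return last.endsWith(ZERO_WIDTH_SPACE) ? text : text + '-';
  });
}

const Z = ZERO_WIDTH_SPACE;
console.log(renderLines([['example.' + Z, 'com/' + Z], ['index.' + Z, 'html?' + Z, 'foo'], ['bar=baz']]));
// [ 'example.com/', 'index.html?foo-', 'bar=baz' ]
```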

Regardless, I think things like URLs where common case is not to add - we can handle them internally so user doesn't have to worry about it

As a dev using react-pdf, I welcome good defaults where appropriate. But I'd prefer customizability over opinionated defaults. As detailed above, I don't think react-pdf can and should handle all the eventualities. I would rather focus on offering the devs the tools to implement the domain-specific correct rules (and also override the default rules for URLs if possible!)

@diegomura
Owner Author

diegomura commented Dec 29, 2025

Thanks for the quick responses here @carlobeltrame :)

The only downside is, if we still support the rare soft hyphen passed into a <Text> node, I'm not yet sure how much we save on complexity here.

I'm not sure I see why. Soft hyphen passed into a Text node would still be a valid way for consumer to instruct engine where he wants potential word breaks to be in.

And I already hear you saying, well, we can add an exception to never add a hyphen when there is already a hyphen, and inside URLs, and inside UUIDs, cryptographic public keys (#2456), and after emoji, and so on and so forth. But there will always be more technical identifiers, possibly proprietary ones, and then the rules might differ in other languages such as Asian languages (#1662). I just think handling all this in an opinionated manner will add so many edge cases and maintenance effort to react-pdf.

Fair. And kinda agree. I don't think we should add defaults for a bunch of things. Just that we can add some of these handy defaults to the built-in hyphenation function so less people have the need to implement their own custom fn. Many people will still need, and for those we should give them all tools needed to customize hyphenation as they want. So about "But I'd prefer customizability over opinionated defaults", both are possible from my point of view. But happy to put a pin on defaults at the moment.

However, given per-syllable hyphen specifications, the code for a custom hyphenation callback for splitting at dashes, periods and slashes will become more complicated:

Agree. For those wanting fine control over hyphenation, code will be more complex. But it's also important to note that for those that don't need it, code will remain simpler. If we go for this spec, all these should be possible:

No special handling:
Probably applies a basic set of defaults

Font.registerHyphenationCallback(word => 
  hyphen(word).split(SOFT_HYPHEN)
)

Define hyphen at word level:
I believe should be sufficient for most requirements, like not hyphenating URLs, UUIDs or any custom value.

Font.registerHyphenationCallback(word => (
  { syllables: hyphen(word).split(SOFT_HYPHEN), hyphen: isUrl(word) ? '' : '-' }
))

All control:
For those who want to really go down the rabbit hole to control every single syllable

Font.registerHyphenationCallback(word => 
  hyphen(word).split(SOFT_HYPHEN).map(syllable => (
    { syllable, hyphen: '<computed>' }
  )
))

I like that it gets more complex as you gain more control, which kinda makes sense.

re/ (i === parts.length - 1) ? hyphen : '', I don't think that will be necessary, as the engine does not add penalties at the end of words, as far as I remember. Might be wrong here, need to check. But if it is, it will only matter for those seeking fine control.

I also thought about #2456 and #1416, and if the goal is to give users all control in the word, it's a very nice to have.

About implementation, it should be very straightforward, as we can normalize to { syllable: string, hyphen: string | number } internally early in the process and forget that the API even supports multiple schemas.
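The early normalization step could look roughly like this. The three accepted shapes (plain string array, word-level `{ syllables, hyphen }`, and per-syllable `{ syllable, hyphen }` objects) and the default "-" come from this discussion; `normalizeHyphenation` is a hypothetical name, and only string hyphens are handled here.

```javascript
// Sketch only: shapes and the default "-" come from the proposal in this
// thread; none of this is a released react-pdf API.
const DEFAULT_HYPHEN = '-';

function normalizeHyphenation(result) {
  // Word level: { syllables: string[], hyphen } applies one hyphen to every break.
  if (!Array.isArray(result)) {
    const hyphen = result.hyphen ?? DEFAULT_HYPHEN;
    return result.syllables.map((syllable) => ({ syllable, hyphen }));
  }
  return result.map((item) => {
    // Plain string[]: the common case, every break gets the default hyphen.
    if (typeof item === 'string') return { syllable: item, hyphen: DEFAULT_HYPHEN };
    // Per-syllable objects: keep the given hyphen, default when missing.
    return { syllable: item.syllable, hyphen: item.hyphen ?? DEFAULT_HYPHEN };
  });
}

console.log(normalizeHyphenation({ syllables: ['example.', 'com'], hyphen: '' }));
```

Every later step then only ever sees `{ syllable, hyphen }` pairs; the hyphen of a word's last syllable is simply never used if no break follows it.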

I'm now the one who can hear you saying that with trailing soft hyphens you can control all of this with a single API 😄 And I agree. But going back to my initial point, we would also be penalizing the DX of users with simple requirements (not to mention it wouldn't be possible to change the default -).

Something else important to me about the implementation: there's no need at all to recompute all glyphs to remove soft hyphens. I did not mention it before as it felt less important, but I didn't really like this import (conceptually speaking, I don't want engines importing layout steps; the dependency between them is the opposite), plus any perf issues it might carry. Textkit is already too complex, and sometimes too expensive, to have internal "re-layouts" if they can be avoided.

I'm not saying it's perfect, but measuring pros and cons I feel it's the best path to follow given the info I have. Still, I value your opinion and thoughts, so please throw what you think about it


Labels

None yet

Projects

None yet


3 participants