@avdoseferovic commented on Jan 16, 2026

⚠️ DISCLAIMER: This PR was mostly produced by AI. I tested it manually and verified that emoji usage does not cause panics.


This PR implements support for emojis and other Unicode characters outside the Basic Multilingual Plane (BMP), i.e. code points above U+FFFF.
It includes:

  1. Data structure updates (map-based widths).
  2. CMAP Format 12 parsing.
  3. CID remapping strategy to handle characters > U+FFFF using Identity-H encoding.
  4. Updates to Text, CellFormat, write, and generateCIDFontMap to support this remapping.
  5. Backward compatibility for existing fonts and tests.
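
For reviewers unfamiliar with item 2: a cmap format 12 subtable stores sequential map groups of `(startCharCode, endCharCode, startGlyphID)`, each field a big-endian uint32, after a 16-byte header. The sketch below is a minimal illustration of that layout and is not the code in `utf8fontfile.go`; the function name `parseCmapFormat12` and its return type are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseCmapFormat12 is a hypothetical helper (not taken from the PR) that
// decodes a cmap format 12 subtable into a code point -> glyph ID map.
func parseCmapFormat12(sub []byte) (map[rune]uint32, error) {
	const headerLen, groupLen = 16, 12
	if len(sub) < headerLen || binary.BigEndian.Uint16(sub) != 12 {
		return nil, errors.New("not a cmap format 12 subtable")
	}
	// Header: format(2) reserved(2) length(4) language(4) numGroups(4).
	numGroups := binary.BigEndian.Uint32(sub[12:])
	if uint64(len(sub)-headerLen) < uint64(numGroups)*groupLen {
		return nil, errors.New("cmap format 12 subtable is truncated")
	}
	glyphs := make(map[rune]uint32)
	for i := 0; i < int(numGroups); i++ {
		g := sub[headerLen+i*groupLen:]
		start := binary.BigEndian.Uint32(g)
		end := binary.BigEndian.Uint32(g[4:])
		gid := binary.BigEndian.Uint32(g[8:])
		if end < start || end > 0x10FFFF {
			return nil, errors.New("invalid character range in group")
		}
		// Glyph IDs increase in step with the code points of a group.
		for c := start; c <= end; c++ {
			glyphs[rune(c)] = gid + (c - start)
		}
	}
	return glyphs, nil
}

func main() {
	sub := []byte{
		0, 12, 0, 0, // format 12, reserved
		0, 0, 0, 28, // subtable length in bytes
		0, 0, 0, 0, // language
		0, 0, 0, 1, // numGroups
		0x00, 0x01, 0xF6, 0x00, // startCharCode U+1F600
		0x00, 0x01, 0xF6, 0x02, // endCharCode U+1F602
		0, 0, 0, 100, // startGlyphID
	}
	glyphs, err := parseCmapFormat12(sub)
	if err != nil {
		panic(err)
	}
	fmt.Println(glyphs[0x1F601]) // prints 101
}
```

Expanding every group into a map is simple but memory-hungry for fonts with large ranges; a real parser might keep the groups and binary-search them instead.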

emoji.pdf

Changes:

- Change fontDefType.Cw and utf8FontFile.CharWidths from slices to map[int]int to support sparse and high Unicode code points (fixes a crash).

- Update utf8toutf16 to correctly handle 4-byte UTF-8 sequences using surrogate pairs.

- Add UnmarshalJSON to fontDefType for backward compatibility with array-based font definitions.

- Remove hardcoded limit checks for character widths.
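
As a reference for the surrogate-pair change: a code point above U+FFFF is encoded in UTF-16 by subtracting 0x10000 and splitting the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The sketch below shows only that arithmetic; `appendUTF16BE` is a hypothetical helper, not the PR's `utf8toutf16`, and big-endian output is an assumption.

```go
package main

import "fmt"

// appendUTF16BE is a hypothetical helper (not the PR's utf8toutf16). It
// appends r to dst as big-endian UTF-16, using a surrogate pair for code
// points above U+FFFF. Validation of r is omitted for brevity.
func appendUTF16BE(dst []byte, r rune) []byte {
	if r < 0x10000 {
		// BMP code point: a single 16-bit code unit.
		return append(dst, byte(r>>8), byte(r))
	}
	v := r - 0x10000
	hi := 0xD800 + (v >> 10)   // high surrogate: top 10 bits
	lo := 0xDC00 + (v & 0x3FF) // low surrogate: bottom 10 bits
	return append(dst, byte(hi>>8), byte(hi), byte(lo>>8), byte(lo))
}

func main() {
	var out []byte
	for _, r := range "😀" { // U+1F600, a 4-byte UTF-8 sequence
		out = appendUTF16BE(out, r)
	}
	fmt.Printf("% X\n", out) // prints D8 3D DE 00
}
```

The standard library's `unicode/utf16.EncodeRune` performs the same split and could replace the manual arithmetic.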

Changes:

- Implement CMAP Format 12 parsing in utf8fontfile.go.

- Implement CID remapping in fpdf.go to support characters outside the BMP (e.g. emojis).

- Add runeToCid map to fontDefType.

- Add helper methods stringToCIDs and getOrAssignCID.

- Update Text, CellFormat, and generateCIDFontMap to use CID remapping and correct width lookup.

- Update parseSymbols to use CIDs as keys for GID lookup.
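
To illustrate the remapping idea: Identity-H text strings carry 2-byte CIDs, so a code point above U+FFFF cannot be written directly and needs a CID that fits in 16 bits. The sketch below is one possible scheme, not necessarily the one in this PR: it assumes every rune gets the next free CID on first use (the PR may instead keep BMP code points unchanged). The names `getOrAssignCID`, `stringToCIDs`, and `runeToCid` come from the change list, but the signatures and the `cidMapper` type are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// cidMapper is a hypothetical stand-in for the runeToCid state that the PR
// adds to fontDefType.
type cidMapper struct {
	runeToCid map[rune]int
	nextCid   int // next unassigned CID; 0 is left for .notdef
}

func newCIDMapper() *cidMapper {
	return &cidMapper{runeToCid: map[rune]int{}, nextCid: 1}
}

// getOrAssignCID returns the CID already bound to r, or binds the next free
// one. Assumed behaviour: sequential assignment on first use.
func (m *cidMapper) getOrAssignCID(r rune) (int, error) {
	if cid, ok := m.runeToCid[r]; ok {
		return cid, nil
	}
	if m.nextCid > 0xFFFF {
		return 0, errors.New("CID space exhausted")
	}
	cid := m.nextCid
	m.nextCid++
	m.runeToCid[r] = cid
	return cid, nil
}

// stringToCIDs encodes s as big-endian 2-byte CIDs.
func (m *cidMapper) stringToCIDs(s string) ([]byte, error) {
	out := make([]byte, 0, 2*len(s))
	for _, r := range s {
		cid, err := m.getOrAssignCID(r)
		if err != nil {
			return nil, err
		}
		out = append(out, byte(cid>>8), byte(cid))
	}
	return out, nil
}

func main() {
	m := newCIDMapper()
	b, err := m.stringToCIDs("a😀a")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% X\n", b) // prints 00 01 00 02 00 01
}
```

With a scheme like this, width lookup and the CID-to-GID map have to be keyed by CID rather than by code point, which matches the parseSymbols and generateCIDFontMap items above.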

@avdoseferovic changed the title from "Refactor: Support emojis and high unicode characters" to "refactor: support emojis and high unicode characters" on Jan 16, 2026