fix: imports encoded in utf-16 break DocxZipper #860
Merged
harbournick merged 1 commit into main on Sep 3, 2025
Pull Request Overview
This PR fixes an encoding issue in DocxZipper where XML files encoded in UTF-16 would break the parser. The fix introduces comprehensive encoding detection and handling utilities.
- Added encoding detection utilities for UTF-8, UTF-16LE, and UTF-16BE with BOM support
- Replaced string-based ZIP entry extraction with byte-level extraction and proper decoding for XML files
- Added comprehensive test coverage for various encoding scenarios
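
The detection-then-decode flow described above can be sketched roughly as follows. This is an illustration only, not the code from `encoding-helpers.js`: the function names `detectBom`, `sniffWithoutBom`, and `decodeXmlBytes` are hypothetical, and the no-BOM fallback (looking at the null-byte pattern of a leading `<?`) is an assumption about how such a helper might guess the encoding.

```javascript
// Hypothetical helper names; the PR's actual encoding-helpers.js may differ.

// Look for a byte order mark (BOM) at the start of the buffer.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: 'utf-8', bomLength: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: 'utf-16le', bomLength: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: 'utf-16be', bomLength: 2 };
  }
  return null;
}

// Assumed fallback: without a BOM, guess from where the zero bytes sit in "<?".
function sniffWithoutBom(bytes) {
  if (bytes.length >= 4) {
    if (bytes[0] === 0x3c && bytes[1] === 0x00 && bytes[2] === 0x3f && bytes[3] === 0x00) {
      return 'utf-16le';
    }
    if (bytes[0] === 0x00 && bytes[1] === 0x3c && bytes[2] === 0x00 && bytes[3] === 0x3f) {
      return 'utf-16be';
    }
  }
  return 'utf-8';
}

// Decode raw XML bytes (a Uint8Array) into a JavaScript string.
function decodeXmlBytes(bytes) {
  const bom = detectBom(bytes);
  const encoding = bom ? bom.encoding : sniffWithoutBom(bytes);
  const body = bom ? bytes.subarray(bom.bomLength) : bytes;

  if (encoding === 'utf-16be') {
    // Swap each byte pair and decode as little-endian, so the sketch does not
    // depend on the runtime shipping a 'utf-16be' decoder.
    const swapped = new Uint8Array(body.length - (body.length % 2));
    for (let i = 0; i + 1 < body.length; i += 2) {
      swapped[i] = body[i + 1];
      swapped[i + 1] = body[i];
    }
    return new TextDecoder('utf-16le').decode(swapped);
  }
  return new TextDecoder(encoding).decode(body);
}
```

Working on bytes rather than on an already-decoded string matters here: once a UTF-16 payload has been read as UTF-8 text, the original code units can no longer be recovered reliably.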
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/super-editor/src/core/encoding-helpers.js | New utility module with encoding detection, BOM handling, and XML string normalization functions |
| packages/super-editor/src/core/encoding-helpers.test.js | Comprehensive test suite covering all encoding scenarios and utility functions |
| packages/super-editor/src/core/DocxZipper.js | Updated to use encoding helpers for proper XML file extraction from ZIP archives |
| packages/super-editor/src/core/DocxZipper.test.js | Added integration test for UTF-16LE XML handling in DOCX files |
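
To make the UTF-16LE test scenario concrete, here is a minimal sketch of how a fixture for such a test could be built and checked. The helper `encodeUtf16LeWithBom` and the sample XML string are hypothetical and are not taken from the PR's test files; the UTF-8 misread shown at the end is only a plausible illustration of the kind of failure the overview describes.

```javascript
// Hypothetical fixture helper: encode a JS string as UTF-16LE bytes
// preceded by the little-endian BOM (FF FE).
function encodeUtf16LeWithBom(text) {
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff;
  bytes[1] = 0xfe;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one UTF-16 code unit
    bytes[2 + i * 2] = unit & 0xff; // low byte first
    bytes[3 + i * 2] = unit >> 8; // high byte second
  }
  return bytes;
}

// Sample content, invented for illustration.
const xml = '<?xml version="1.0" encoding="UTF-16"?><w:document/>';
const fixture = encodeUtf16LeWithBom(xml);

// Decoding with the right encoding round-trips (TextDecoder drops the BOM by default).
const asUtf16 = new TextDecoder('utf-16le').decode(fixture);

// Decoding the same bytes as UTF-8 yields replacement and NUL characters instead.
const asUtf8 = new TextDecoder('utf-8').decode(fixture);

console.log(asUtf16 === xml); // round-trip check
console.log(asUtf8 === xml); // mismatch check
```

A fixture built this way can be written into a ZIP entry in place of `word/document.xml` to exercise the byte-level extraction path end to end.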
harbournick pushed a commit that referenced this pull request on Sep 3, 2025

# [0.16.0-next.7](v0.16.0-next.6...v0.16.0-next.7) (2025-09-03)

### Bug Fixes

* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
🎉 This PR is included in version 0.16.0-next.7 🎉

Your semantic-release bot 📦🚀
harbournick pushed a commit that referenced this pull request on Sep 3, 2025

## [0.16.1](v0.16.0...v0.16.1) (2025-09-03)

### Bug Fixes

* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* imports encoded in utf-16 break DocxZipper ([6d09115](6d09115))
* imports encoded in utf-16 break DocxZipper ([9bc488d](9bc488d))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* semantic release range ([505e27b](505e27b))
* update release naming pattern in .releaserc.json for better version matching ([1fda655](1fda655))
🎉 This PR is included in version 0.16.1 🎉

Your semantic-release bot 📦🚀
harbournick pushed a commit that referenced this pull request on Sep 9, 2025

# [0.16.0](v0.15.18...v0.16.0) (2025-09-09)

### Bug Fixes

* add processing for line-height defined in px ([#880](#880)) ([3b61275](3b61275))
* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* additional fixes to list indent/outdent, split list, toggle list, types and more tests ([02e6cd9](02e6cd9))
* backspaceNextToList, toggleList and tests ([8b33258](8b33258))
* closing dropdown after clicking again ([#835](#835)) ([88ff88d](88ff88d))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* createNewList in input rule to fix new list in tables, lint ([aa79655](aa79655))
* definition possibly missing name key, add jsdoc ([bb714f1](bb714f1))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* do not deploy next on oracle or yjs changes ([a02cf33](a02cf33))
* highlight selected value in font dropdowns ([#869](#869)) ([4a30f59](4a30f59))
* images are missing for the document in edit mode ([#831](#831)) ([a9af47e](a9af47e))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* include package lock on tests folder ([#845](#845)) ([1409d02](1409d02))
* insertContentAt fails if new line characters (\n) inserted ([dd60d91](dd60d91))
* insertContentAt for html ([f6c53d3](f6c53d3))
* inserting html with heading tags does not render as expected (HAR-10430) ([#874](#874)) ([bba5074](bba5074))
* install http server ([#846](#846)) ([1a6e684](1a6e684))
* **internal:** remove pdfjs from build ([#843](#843)) ([021b2c1](021b2c1))
* japanese list numbering ([#882](#882)) ([d256a48](d256a48))
* regex improvements ([ee0333b](ee0333b))
* remove footer line length breaking deployments ([04766cd](04766cd))
* restore stored marks if they exist ([#863](#863)) ([0a2860e](0a2860e))
* restore stored marks if they exist ([#863](#863)) ([1961e5f](1961e5f))
* splitListItem if there are images or other atom nodes in list item, fix tests ([#878](#878)) ([535390f](535390f))
* **table:** add support for table row w:cantSplit ([#890](#890)) ([3467ad5](3467ad5))
* test ([8572b8a](8572b8a))
* test ([65126fd](65126fd))
* test ([42cb383](42cb383))
* test next release ([c3ac7d0](c3ac7d0))
* toggle list ([770998a](770998a))
* toggle list for multiple nodes and active selection ([69b3a1b](69b3a1b))
* toggle list inside tables ([091df80](091df80))
* update condition checks for screenshot updates in CI workflow ([e17fdf0](e17fdf0))

### Features

* add custom toolbar button example (HAR-10436) ([#868](#868)) ([c4fd4d5](c4fd4d5))
* add support for paragraph borders ([#862](#862)) ([2f98c07](2f98c07))
* begin v0.18 development ([ed5030f](ed5030f))
* enable dispatching example apps tests ([#844](#844)) ([8b2bc73](8b2bc73))
* filter out ooxml tags cli to highest priority namespaces ([23b1efa](23b1efa))
* ignore specific docx nodes during import ([#909](#909)) ([0a99a09](0a99a09))
harbournick pushed a commit that referenced this pull request on Sep 9, 2025

# [0.16.0](v0.15.18...v0.16.0) (2025-09-09)

### Bug Fixes

* add processing for line-height defined in px ([#880](#880)) ([3b61275](3b61275))
* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* additional fixes to list indent/outdent, split list, toggle list, types and more tests ([02e6cd9](02e6cd9))
* backspaceNextToList, toggleList and tests ([8b33258](8b33258))
* closing dropdown after clicking again ([#835](#835)) ([88ff88d](88ff88d))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* createNewList in input rule to fix new list in tables, lint ([aa79655](aa79655))
* definition possibly missing name key, add jsdoc ([bb714f1](bb714f1))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* do not deploy next on oracle or yjs changes ([a02cf33](a02cf33))
* highlight selected value in font dropdowns ([#869](#869)) ([4a30f59](4a30f59))
* images are missing for the document in edit mode ([#831](#831)) ([a9af47e](a9af47e))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* include package lock on tests folder ([#845](#845)) ([1409d02](1409d02))
* insertContentAt fails if new line characters (\n) inserted ([dd60d91](dd60d91))
* insertContentAt for html ([f6c53d3](f6c53d3))
* inserting html with heading tags does not render as expected (HAR-10430) ([#874](#874)) ([bba5074](bba5074))
* install http server ([#846](#846)) ([1a6e684](1a6e684))
* **internal:** remove pdfjs from build ([#843](#843)) ([021b2c1](021b2c1))
* japanese list numbering ([#882](#882)) ([d256a48](d256a48))
* regex improvements ([ee0333b](ee0333b))
* remove footer line length breaking deployments ([04766cd](04766cd))
* restore stored marks if they exist ([#863](#863)) ([0a2860e](0a2860e))
* restore stored marks if they exist ([#863](#863)) ([1961e5f](1961e5f))
* splitListItem if there are images or other atom nodes in list item, fix tests ([#878](#878)) ([535390f](535390f))
* **table:** add support for table row w:cantSplit ([#890](#890)) ([3467ad5](3467ad5))
* test ([8572b8a](8572b8a))
* test ([65126fd](65126fd))
* test ([42cb383](42cb383))
* test next release ([c3ac7d0](c3ac7d0))
* toggle list ([770998a](770998a))
* toggle list for multiple nodes and active selection ([69b3a1b](69b3a1b))
* toggle list inside tables ([091df80](091df80))
* update condition checks for screenshot updates in CI workflow ([e17fdf0](e17fdf0))

### Features

* add custom toolbar button example (HAR-10436) ([#868](#868)) ([c4fd4d5](c4fd4d5))
* add support for paragraph borders ([#862](#862)) ([2f98c07](2f98c07))
* begin v0.18 development ([ed5030f](ed5030f))
* enable dispatching example apps tests ([#844](#844)) ([8b2bc73](8b2bc73))
* filter out ooxml tags cli to highest priority namespaces ([23b1efa](23b1efa))
* ignore specific docx nodes during import ([#909](#909)) ([0a99a09](0a99a09))
* new release cycle after version sync ([eb9684a](eb9684a))