Add support for encodings other than UTF-8 by EuclidDivisionLemma · Pull Request #36497 · zed-industries/zed

EuclidDivisionLemma · 2025-08-19T15:17:27Z

Add the ability to open and save files in different encodings. Closes #16965

zed-industries-bot · 2025-08-29T17:25:17Z

	Warnings
⚠️	This PR is missing release notes. Please add a "Release Notes" section that describes the change: `Release Notes: - Added/Fixed/Improved ...` If your change is not user-facing, you can use "N/A" for the entry: `Release Notes: - N/A`

Generated by 🚫 dangerJS against 4330e5f

CrazyboyQCD · 2025-08-30T04:06:10Z

I wonder why you use `encoding` instead of `encoding_rs`, since the former has not been developed for a long time.

EuclidDivisionLemma · 2025-08-30T04:35:32Z

@CrazyboyQCD

Well, I considered it, but eventually decided against it, as the docs explicitly state:

> Both in terms of scope and performance, the focus is on the Web.

CrazyboyQCD · 2025-08-30T08:19:58Z

@EuclidDivisionLemma

The main issue is that it is unmaintained, buggy and legacy, so I think a more modern crate would be better if you don't want to fork and maintain it.
https://rustsec.org/advisories/RUSTSEC-2021-0153.html

EuclidDivisionLemma · 2025-08-30T08:45:15Z

@CrazyboyQCD

Yes, you're right. I'll definitely look into it. Also, what do you think about using ICU? Surely it is more mature and has multi-threaded support. There is a C version of the library, ICU4C, and the Rust version is part of the ICU4X project.

https://docs.rs/icu/2.0.0/icu/
https://icu4x.unicode.org/

CrazyboyQCD · 2025-08-30T09:11:34Z

ICU4X is for i18n and is not suitable for this.

bytes replaced with replacement characters - Fix UTF-16 file handling - Introduce a `ForceOpen` action to allow users to open files despite encoding errors - Add `force` and `detect_utf16` flags - Update UI to provide "Accept the Risk and Open" button for invalid encoding files

ConradIrwin · 2025-11-04T08:07:26Z

@EuclidDivisionLemma

I finally got some time to sit down and make some significant changes. They are in 7e22d05, because for some reason I am not able to push to your fork directly.

The biggest change is to remove the mutexes and have methods return the detected encoding along with the string; but I also tried to make the BOM handling safer (so Zed will not silently remove the BOM).
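A minimal sketch of that shape, not the code from 7e22d05: the decode function returns the text together with the detected encoding and whether a BOM was present, so the caller can store both and write the file back unchanged. The names `Detected`, `Decoded` and `decode` are hypothetical, only UTF-8 and UTF-16 BOMs are sniffed, and anything without a BOM is assumed to be UTF-8.

```rust
/// Hypothetical set of encodings this sketch can detect.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Detected {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// The decoded text plus what is needed to save the file the way it was found.
#[derive(Debug, PartialEq, Eq)]
struct Decoded {
    text: String,
    encoding: Detected,
    had_bom: bool,
}

/// Decode `bytes`, returning the detected encoding alongside the string
/// instead of recording it in shared mutable state.
fn decode(bytes: &[u8]) -> Result<Decoded, String> {
    // Sniff the BOM first; input without a BOM is assumed to be UTF-8.
    let (encoding, had_bom, body) = match bytes {
        [0xEF, 0xBB, 0xBF, rest @ ..] => (Detected::Utf8, true, rest),
        [0xFF, 0xFE, rest @ ..] => (Detected::Utf16Le, true, rest),
        [0xFE, 0xFF, rest @ ..] => (Detected::Utf16Be, true, rest),
        _ => (Detected::Utf8, false, bytes),
    };
    let text = match encoding {
        // Strict decoding: invalid bytes are an error, not replacement characters.
        Detected::Utf8 => String::from_utf8(body.to_vec()).map_err(|e| e.to_string())?,
        Detected::Utf16Le | Detected::Utf16Be => {
            if body.len() % 2 != 0 {
                return Err("odd number of bytes in UTF-16 input".to_string());
            }
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|pair| match encoding {
                    Detected::Utf16Le => u16::from_le_bytes([pair[0], pair[1]]),
                    _ => u16::from_be_bytes([pair[0], pair[1]]),
                })
                .collect();
            String::from_utf16(&units).map_err(|e| e.to_string())?
        }
    };
    Ok(Decoded { text, encoding, had_bom })
}
```

Because `had_bom` travels with the text, a save path can re-emit the BOM only when the file originally had one.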

Next steps:

- Fix the rendering of the selectors
- Add actions to reopen/save with encoding to the editor that open the modal directly, so you can get there from the command palette
- Implement a setting to show/hide the character set indicator
- Rebuild the "Take the risk" button using a fake Latin1 encoding (encoding_rs supports this in its mem module, but doesn't expose it as an encoding for some reason). When you open the character set selector we can show Latin1 (Binary) as one of the options and, if that is the case, open the file.
- (maybe) Stop using a picker for save_or_open and make it use a menu on the status item instead (not sure about this).
- Add some tests! I'm particularly worried about cases where you open a file in Zed and save it and we add or remove a BOM.
- Consider what to do when you've edited a file to contain characters that cannot be represented in the encoding. Currently this writes HTML-escaped characters, but it might also just error. Not sure what we want.
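To make the last item concrete, here is a rough sketch of the two options, assuming "HTML-escaped" means numeric character references such as `&#8364;`. It uses ISO-8859-1 only because its mapping is trivial; `Unmappable` and `encode_latin1` are hypothetical names, not part of this PR or of any crate.

```rust
/// What to do with a character the target encoding cannot represent.
#[derive(Clone, Copy)]
enum Unmappable {
    /// Write a numeric character reference such as `&#8364;`.
    NumericReference,
    /// Refuse to encode and report the offending character.
    Error,
}

/// Encode `text` as ISO-8859-1, where U+0000..=U+00FF map to the same byte value.
/// On failure the error carries the first character that could not be encoded.
fn encode_latin1(text: &str, policy: Unmappable) -> Result<Vec<u8>, char> {
    let mut out = Vec::with_capacity(text.len());
    for ch in text.chars() {
        let code = ch as u32;
        if code <= 0xFF {
            out.push(code as u8);
        } else {
            match policy {
                Unmappable::NumericReference => {
                    out.extend_from_slice(format!("&#{};", code).as_bytes());
                }
                Unmappable::Error => return Err(ch),
            }
        }
    }
    Ok(out)
}
```

The replacing policy never fails but changes the file's contents in a way that does not round-trip; the erroring policy keeps the file intact and leaves the decision to the user.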

Thanks again for all your work on this. If you get time to build some of this, that would be great; otherwise I'll try and pick it up when I get some time.

ConradIrwin · 2025-11-21T04:20:48Z

Hey! I'm going to close this PR for now, as I realistically don't have the time to get this merged right now.

If you'd like to build on this again, I'd love something more like 7e22d05, where we can avoid the shared mutable state; but there are still a lot of details to iron out.

Thank you again for your contributions here, and I hope to work with you again the next time!

EuclidDivisionLemma · 2025-11-21T05:02:18Z

I'm sorry that I couldn't respond to the last comment immediately, as I am caught up with something else. I really wish to make further contributions. I understand that significant portions of the code have been changed, but I still wish to be a part of it, as it matters to me. I can try, if you could tell me exactly what you want me to work on.

ConradIrwin · 2025-11-21T05:24:41Z

Amazing, thank you!

I want to take an approach more like this commit: 7e22d05 (with no mutexes that allow state to change implicitly) and flesh out the rest of the functionality to make sure it's working.

The other question at the back of my mind is about UTF-8 files that start with a zero-width no-break space (U+FEFF). Should we interpret that as a BOM and hide it from the editor (as my commit did), or should we assume that very few people actually use UTF-8 BOMs, and just pass this to the editor as a file that starts with a zero-width no-break space?

The final change that I want to make is to use a fallback "binary" encoding instead of the existing "open anyway" option. I am not sure that encoding_rs provides one, but I think it would be reasonable to map bytes in the range 0x80-0xFF to the corresponding Unicode characters (U+0080 to U+00FF), and the same in reverse.
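A minimal sketch of such a mapping, using only the standard library. The function names are hypothetical. Bytes below 0x80 map to the same ASCII code points, so every byte value round-trips.

```rust
/// Decode arbitrary bytes by mapping each byte 0x00..=0xFF to U+0000..=U+00FF.
/// This cannot fail, which is what makes it usable as an "open anyway" fallback.
fn decode_binary(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Inverse of `decode_binary`. A character above U+00FF has no byte to map to,
/// so it is returned as the error.
fn encode_binary(text: &str) -> Result<Vec<u8>, char> {
    text.chars()
        .map(|ch| {
            let code = ch as u32;
            if code <= 0xFF {
                Ok(code as u8)
            } else {
                Err(ch)
            }
        })
        .collect()
}
```

Saving stays lossless as long as the user only types characters up to U+00FF; anything beyond that needs either an error or a switch to a real encoding.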

EuclidDivisionLemma · 2025-11-24T03:44:40Z

@ConradIrwin
I have implemented the fallback encoding. Please look into it when you have time.

cla-bot bot added the cla-signed label Aug 19, 2025

maxdeviant changed the title ~~Add support for non utf encodings~~ Add support for non-UTF encodings Aug 20, 2025

EuclidDivisionLemma force-pushed the add_support_for_non_utf_encodings branch 2 times, most recently from 02890f2 to 4f7a563 August 23, 2025 14:37

EuclidDivisionLemma changed the title ~~Add support for non-UTF encodings~~ Add support for encodings other than UTF-8 Aug 23, 2025

EuclidDivisionLemma force-pushed the add_support_for_non_utf_encodings branch 10 times, most recently from 3b20229 to ce0128c August 27, 2025 02:46

EuclidDivisionLemma marked this pull request as ready for review August 27, 2025 03:08

EuclidDivisionLemma force-pushed the add_support_for_non_utf_encodings branch 4 times, most recently from f53e006 to 4f0bfa6 August 29, 2025 17:18

EuclidDivisionLemma force-pushed the add_support_for_non_utf_encodings branch 3 times, most recently from 0d9a756 to dbf899a August 30, 2025 02:24

EuclidDivisionLemma marked this pull request as draft August 30, 2025 09:20

EuclidDivisionLemma added 20 commits November 1, 2025 11:30

Update tests in copilot.rs to match the new load method signature
8063144

Pass file path to EncodingSelector via Toggle action, if there is one.
13ea13b

Add a call to open_abs_path to enable opening of files from `InvalidBufferView`
37754b0

- Return an error if the file contains invalid bytes for the specified encoding instead of replacing the invalid bytes with replacement characters
- Add `encoding` field in `Workspace`
44abaed

- Add a field encoding in both Workspace and Project
- Pass encoding to `ProjectRegistry::open_path` and set the `encoding` field in `Project`
183bff5

- Add optional encoding parameter to Worktree::load_file
- Remove the parameter from `BufferStore::open_buffer` as it is not needed
d515ddd

Clicking on Choose another encoding and selecting an encoding should now open the file in the chosen encoding if it is valid or show the invalid screen again if not. (UTF-16 files aren't being handled correctly as of now)
0d3095a

Fix an issue that caused a reopened buffer to use UTF-8 even if the associated file was in a different encoding, rather than showing an error.
25c6af4

Fix an issue that caused the buffer to be in a modified state after choosing the correct encoding from `InvalidBufferView`
0e38704

Create a new crate encodings that will have all that is not related to UI. The `encodings_ui` crate will only have UI related components in the future.
1d95a18

- Move the functionality in fs::encodings to a separate crate `encodings`
- `EncodingWrapper` is replaced with `encodings::Encoding`
8580683

- Fix an issue that caused UTF-8 to be used when a file was closed and re-opened, while retaining the text.
- Fix an issue that prevented `InvalidBufferView` from being shown when an incorrect encoding was chosen from the status bar.
- Centre the error message in `InvalidBufferView`.
b2187e5

Remove calls to lock and unwrap as they are no longer needed
c130110

Fix conflicts
0e89634

Fix conflicts
19b06e5

Move the invalid encoding UI from project to workspace
08032bd

- Use EncodingOptions for parameters
- Implement `From` for `Encoding` and `Clone` for `EncodingOptions`
0b942fe

Use Buffer::update and Buffer::update_encoding to set the encoding field of `Buffer`
2e18a5b

- Change the order in which cx and encoding appear
- Add a licence symlink to `encodings`
4330e5f

EuclidDivisionLemma force-pushed the add_support_for_non_utf_encodings branch from 4a75b86 to 4330e5f November 1, 2025 09:23

ConradIrwin closed this Nov 21, 2025

SomeoneToIgnore mentioned this pull request Nov 24, 2025

Fix UTF-16 BE BOM detection #42786

Closed

SomeoneToIgnore mentioned this pull request Dec 14, 2025

Support opening and saving files with legacy encodings #44819

Merged

ConradIrwin added a commit to zed-industries/encodings-tests that referenced this pull request Dec 15, 2025

Import encoding tests from zed-industries/zed#36497

ad57f2d

EuclidDivisionLemma wants to merge 43 commits into zed-industries:main from EuclidDivisionLemma:add_support_for_non_utf_encodings

5 participants
