Add support for encodings other than UTF-8 #36497
EuclidDivisionLemma wants to merge 43 commits into zed-industries:main
|
I wonder why use |
|
Well, I considered it, but eventually decided against it, as the docs explicitly state
|
|
The main issue is that it is unmaintained, buggy, and legacy, so I think a more modern crate would be better if you don't want to fork and maintain it. |
|
Yes, you're right. I'll definitely look into it. Also, what do you think about using ICU? It is surely more mature and has multi-threaded support. There is a C version of the library, ICU4C, and the Rust version is part of the ICU4X project. |
|
|
`InvalidBufferView`
encoding instead of replacing the invalid bytes with replacement characters
- Add `encoding` field in `Workspace`
- Pass encoding to `ProjectRegistry::open_path` and set the `encoding` field in `Project`
- Remove the parameter from `BufferStore::open_buffer` as it is not needed
now open the file in the chosen encoding if it is valid or show the invalid screen again if not. (UTF-16 files aren't being handled correctly as of now)
bytes replaced with replacement characters
- Fix UTF-16 file handling
- Introduce a `ForceOpen` action to allow users to open files despite encoding errors
- Add `force` and `detect_utf16` flags
- Update UI to provide "Accept the Risk and Open" button for invalid encoding files
associated file was in a different encoding, rather than showing an error.
choosing the correct encoding from `InvalidBufferView`
UI. The `encodings_ui` crate will only have UI related components in the future.
`encodings`
- `EncodingWrapper` is replaced with `encodings::Encoding`
re-opened, while retaining the text.
- Fix an issue that prevented `InvalidBufferView` from being shown when an incorrect encoding was chosen from the status bar.
- Centre the error message in `InvalidBufferView`.
- Implement `From` for `Encoding` and `Clone` for `EncodingOptions`
field of `Buffer`
- Add a licence symlink to `encodings`
|
I finally got some time to sit down and make some significant changes. They are here: 7e22d05, because for some reason I am not able to push to your fork directly. The biggest change is to remove the mutexes and have methods return the detected encoding along with the string; I also tried to make the BOM handling safer (so Zed will not silently remove the BOM). Next steps:
Thanks again for all your work on this. If you get time to build some of this, that would be great; otherwise I'll try and pick it up when I get some time. |
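For readers following along, the idea of returning the detected encoding together with the decoded string, instead of recording it in mutex-guarded shared state, can be sketched roughly as below. This is a minimal illustration using only the Rust standard library, not the code from 7e22d05: `DetectedEncoding`, `Decoded`, `decode_with_detection`, and `encode` are hypothetical names, and the sketch assumes a file without a BOM is UTF-8.

```rust
/// Encodings this sketch can recognise from a byte-order mark (BOM).
/// Hypothetical type, not part of the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectedEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// The decoded text plus everything needed to write the file back unchanged.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Decoded {
    pub text: String,
    pub encoding: DetectedEncoding,
    pub had_bom: bool,
}

const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];
const UTF16_LE_BOM: [u8; 2] = [0xFF, 0xFE];
const UTF16_BE_BOM: [u8; 2] = [0xFE, 0xFF];

/// Decode `bytes`, returning the detection result to the caller rather than
/// storing it in shared mutable state. Returns `None` on invalid input.
/// Assumption: input without a BOM is treated as UTF-8.
pub fn decode_with_detection(bytes: &[u8]) -> Option<Decoded> {
    let (encoding, bom_len) = if bytes.starts_with(&UTF8_BOM) {
        (DetectedEncoding::Utf8, UTF8_BOM.len())
    } else if bytes.starts_with(&UTF16_LE_BOM) {
        (DetectedEncoding::Utf16Le, UTF16_LE_BOM.len())
    } else if bytes.starts_with(&UTF16_BE_BOM) {
        (DetectedEncoding::Utf16Be, UTF16_BE_BOM.len())
    } else {
        (DetectedEncoding::Utf8, 0)
    };
    let body = &bytes[bom_len..];
    let text = match encoding {
        DetectedEncoding::Utf8 => std::str::from_utf8(body).ok()?.to_owned(),
        DetectedEncoding::Utf16Le => decode_utf16(body, u16::from_le_bytes)?,
        DetectedEncoding::Utf16Be => decode_utf16(body, u16::from_be_bytes)?,
    };
    Some(Decoded { text, encoding, had_bom: bom_len > 0 })
}

/// Decode UTF-16 code units; fails on odd length or unpaired surrogates.
fn decode_utf16(body: &[u8], to_unit: fn([u8; 2]) -> u16) -> Option<String> {
    if body.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| to_unit([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

/// Re-encode, writing the BOM back only if the original file had one,
/// so the BOM is never silently removed.
pub fn encode(decoded: &Decoded) -> Vec<u8> {
    let mut out = Vec::new();
    match decoded.encoding {
        DetectedEncoding::Utf8 => {
            if decoded.had_bom {
                out.extend_from_slice(&UTF8_BOM);
            }
            out.extend_from_slice(decoded.text.as_bytes());
        }
        DetectedEncoding::Utf16Le => {
            if decoded.had_bom {
                out.extend_from_slice(&UTF16_LE_BOM);
            }
            for unit in decoded.text.encode_utf16() {
                out.extend_from_slice(&unit.to_le_bytes());
            }
        }
        DetectedEncoding::Utf16Be => {
            if decoded.had_bom {
                out.extend_from_slice(&UTF16_BE_BOM);
            }
            for unit in decoded.text.encode_utf16() {
                out.extend_from_slice(&unit.to_be_bytes());
            }
        }
    }
    out
}
```

Because the caller receives `encoding` and `had_bom` as plain values, it can store them on whatever owns the buffer and pass them back to `encode` on save; nothing changes implicitly behind a lock.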
|
Hey! I'm going to close this PR for now, as I realistically don't have the time to get this merged right now. If you'd like to build on this again, I'd love something more like 7e22d05, where we can avoid the shared mutable state; but there are still a lot of details to iron out. Thank you again for your contributions here, and I hope to work with you again next time! |
|
I'm sorry that I couldn't respond to the last comment immediately, as I was caught up with something else. I really wish to make further contributions. I understand that significant portions of the code have been changed, but I still wish to be a part of it, as it matters to me. I can try, if you could tell me exactly what you want me to work on. |
|
Amazing, thank you! I want to take an approach more like this commit: 7e22d05 (with no mutexes that allow state to change implicitly) and flesh out the rest of the functionality to make sure it's working. The other question in the back of my mind is about UTF-8 files that start with a zero-width no-break space (U+FEFF). Should we interpret that as a BOM and hide it from the editor (as my commit did), or should we assume that very few people actually use UTF-8 BOMs, and just pass this to the editor as a file that starts with a zero-width no-break space? The final change that I want to make is to use a fallback "binary" encoding instead of the existing "open anyway" option. I am not sure that encodings.rs provides one, but I think it would be reasonable to map bytes in the range |
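To make the fallback "binary" encoding idea concrete, here is one possible shape for it. The byte range the comment proposes is cut off above, so the mapping below is purely an assumption chosen for illustration: each byte is mapped to the Unicode code point with the same value (the Latin-1 identity mapping), which makes the round trip lossless. `decode_binary` and `encode_binary` are hypothetical names, not functions from the PR or from any encoding crate.

```rust
/// Hypothetical fallback decoder: never fails, because every byte value
/// 0x00..=0xFF maps to the code point U+0000..=U+00FF of the same value.
/// Assumption: this identity mapping is NOT necessarily the range the
/// comment above had in mind.
pub fn decode_binary(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Inverse of `decode_binary`. Returns `None` if the text contains a
/// character above U+00FF, which cannot be written back as a single byte.
pub fn encode_binary(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let code_point = c as u32;
            if code_point <= 0xFF {
                Some(code_point as u8)
            } else {
                None
            }
        })
        .collect()
}
```

With a mapping like this, a file that is not valid in any detected encoding could still be opened and saved without changing its bytes, as long as the user only types characters that fit in one byte; anything else makes `encode_binary` return `None`, which the editor would have to surface as an error.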
|
@ConradIrwin |
Add the ability to open and save files in different encodings. Closes #16965