@DerkaouiAnas
MODEXPW-508 - Add UTF-8 BOM to files written to S3/Minio

Purpose

The purpose of this change is to address issues with Excel not properly displaying Arabic characters in CSV files exported from our system. By adding a UTF-8 Byte Order Mark (BOM) to the beginning of these files, we ensure that applications like Excel correctly recognize the file encoding as UTF-8, thus displaying Arabic characters properly.
Related JIRA issue: https://issues.folio.org/browse/MODEXPW-508

Approach

To implement this change, we've modified the write method in our S3/Minio file writing utility. The approach involves:

  • Defining the UTF-8 BOM as a byte array at the beginning of the method.
  • Using a ByteArrayOutputStream to combine the BOM with the original file content.
  • Writing the combined byte array (BOM + original content) to S3 or Minio.
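The three steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual module code: the class name `BomPrepender` and the helper `withBom` are hypothetical, and the upload to S3/Minio is left out, so only the BOM-plus-content step is shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used here for illustration only.
final class BomPrepender {

  // UTF-8 byte order mark: the bytes EF BB BF.
  private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  private BomPrepender() {
  }

  // Returns a new array holding the BOM followed by the original content.
  // Assumption: the content is already UTF-8 encoded and has no BOM yet.
  static byte[] withBom(byte[] content) throws IOException {
    try (ByteArrayOutputStream out =
        new ByteArrayOutputStream(UTF8_BOM.length + content.length)) {
      out.write(UTF8_BOM);
      out.write(content);
      return out.toByteArray();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] csv = "id,name\n1,مرحبا\n".getBytes(StandardCharsets.UTF_8);
    byte[] result = withBom(csv);
    // Show the first three bytes of the combined array in hex.
    System.out.printf("%02X %02X %02X%n",
        result[0] & 0xFF, result[1] & 0xFF, result[2] & 0xFF);
  }
}
```

In the real `write` method, the combined array would then be passed to the existing S3/Minio upload call in place of the original `bytes`. Note that this sketch prepends the BOM unconditionally, so content that already starts with a BOM would end up with two.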

@CLAassistant
CLAassistant commented Sep 17, 2024

All committers have signed the CLA.

   * @throws IOException - if an I/O error occurs
   */
  public String write(String path, byte[] bytes, Map<String, String> headers) throws IOException {
    byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

Perhaps it makes sense to make this a constant instead of initializing the array in the method each time?
private static final byte[] UTF8_BOM = new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

@khandramai self-requested a review on November 12, 2024, 12:40