Conversation
ESC currently handles binary data by coercing binary values to a (potentially-invalid) strings. This works okay internally, since Go does not validate the wellformedness of strings when coercing from binary data. However, the standard Go JSON encoder ensures that all strings are [coerced to valid UTF-8 during marshaling](https://pkg.go.dev/encoding/json#Marshal), which replaces any invalid bytes with the replacement character. This corrupts the data. For example, the following environment should write the bytes `dcc5 43bc 1b4f f4fb 8263 d630 a199 2688 166b c084 5b71 1240 24c3 a944 d937 e2fc` to a temporary file: ```yaml values: files: MY_FILE: fn::fromBase64: 3MVDvBtP9PuCY9YwoZkmiBZrwIRbcRJAJMOpRNk34vw= ``` Instead, it writes the bytes `efbf bdef bfbd 43ef bfbd 1b4f efbf bdef bfbd efbf bd63 efbf bd30 efbf bdef bfbd 26ef bfbd 166b efbf bdef bfbd 5b71 1240 24c3 a944 efbf bd37 efbf bdef bfbd`, because the decoded value of `MY_FILE` is marshaled as `"��C�\u001bO���c�0��\u0026�\u0016k��[q\u0012@$éD�7��"` due to the presence of invalid UTF-8 bytes in the decoded value. This commit addresses these challenges through three related changes: 1. `[]byte` values are now allowed inside of `eval.value`. This allows binary data to be faithfully represented within the evaluator. 2. `[]byte` values are now allowed inside of `esc.Value`. If an `esc.Value` contains a `[]byte`, its contents are marshaled as a base64-encoded string and a new `binary` property is set to `true`. This allows binary data to be safely roundtripped without complicated encoding schemes. 3. The `files` stanza now supports binary data. If the value of an entry in `files` is binary data, it is written to the filesystem as-is. Strictly speaking this is a breaking change: prior to these changes, binary data--which can only be manufactured by a provider, rotator, or `fn::fromBase64`--would be marshaled as a plain old JSON string. With these changes, binary data is still marshaled as a JSON string, but its contents are now base64-encoded bytes. If users had binary data that happened to be valid UTF-8, that data will now be encoded using base64 during marshaling instead of being represented verbatim. If this is a major concern, we could change our marshaling behavior to only base64-encode if the binary data is not valid UTF-8. Fixes #480.
|
9ab531a is a bit questionable--might be that the better option is to change the test validation code s.t. it marshals the result of |
|
I'm uneasy about having ToJSON do that base64 encoding. If it returns the underlying []byte directly then it will end up encoding as base64 already during json.Marshal, so I think having it happen early is surprising, and I think it will break roundtripping since FromJSON won't convert it back to []byte. I think it should be unnecessary, because Value is able to roundtrip a []byte correctly thanks to the custom MarshalJSON/UnmarshalJSON implementation. I'd much prefer changing the test validation code to leverage this instead |
|
little note on the |
ESC currently handles binary data by coercing binary values to a (potentially-invalid) strings. This works okay internally, since Go does not validate the wellformedness of strings when coercing from binary data. However, the standard Go JSON encoder ensures that all strings are coerced to valid UTF-8 during marshaling, which replaces any invalid bytes with the replacement character. This corrupts the data.
For example, the following environment should write the bytes
dcc5 43bc 1b4f f4fb 8263 d630 a199 2688 166b c084 5b71 1240 24c3 a944 d937 e2fcto a temporary file:Instead, it writes the bytes
efbf bdef bfbd 43ef bfbd 1b4f efbf bdef bfbd efbf bd63 efbf bd30 efbf bdef bfbd 26ef bfbd 166b efbf bdef bfbd 5b71 1240 24c3 a944 efbf bd37 efbf bdef bfbd, because the decoded value ofMY_FILEis marshaled as"��C�\u001bO���c�0��\u0026�\u0016k��[q\u0012@$éD�7��"due to the presence of invalid UTF-8 bytes in the decoded value.This commit addresses these challenges through three related changes:
[]bytevalues are now allowed inside ofeval.value. This allows binary data to be faithfully represented within the evaluator.[]bytevalues are now allowed inside ofesc.Value. If anesc.Valuecontains a[]byte, its contents are marshaled as a base64-encoded string and a newbinaryproperty is set totrue. This allows binary data to be safely roundtripped without complicated encoding schemes.filesstanza now supports binary data. If the value of an entry infilesis binary data, it is written to the filesystem as-is.Strictly speaking this is a breaking change: prior to these changes, binary data--which can only be manufactured by a provider, rotator, or
fn::fromBase64--would be marshaled as a plain old JSON string. With these changes, binary data is still marshaled as a JSON string, but its contents are now base64-encoded bytes. If users had binary data that happened to be valid UTF-8, that data will now be encoded using base64 during marshaling instead of being represented verbatim. If this is a major concern, we could change our marshaling behavior to only base64-encode if the binary data is not valid UTF-8.Fixes #480.