
Suggestions for "UnicodeEncodeError: 'utf-8' codec can't encode character" #116

@boechat107

I would be glad if someone could give me a suggestion.

I want to encode a large dictionary that contains text in an encoding other than UTF-8. Does the library offer an option to handle this situation, or must I change the data before trying to serialize it?
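If the data has to be changed, a first step could be finding which values in the large dictionary are affected. The sketch below uses a hypothetical helper, `find_unencodable` (not part of the bson package). It walks nested dicts, lists and tuples and reports the path of every string value that a strict `str.encode("utf-8")` call rejects. Only values are checked, not dictionary keys.

```python
from typing import Any, Iterator, Tuple


def find_unencodable(obj: Any, path: Tuple[Any, ...] = ()) -> Iterator[Tuple[Tuple[Any, ...], str]]:
    """Yield (path, value) for each nested str value that strict UTF-8 rejects.

    Hypothetical helper, not part of the bson package. Only values are
    checked; dictionary keys are not.
    """
    if isinstance(obj, str):
        try:
            obj.encode("utf-8")  # the same strict call shown in the traceback
        except UnicodeEncodeError:
            yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_unencodable(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            yield from find_unencodable(item, path + (index,))


data = {"name": "ok", "nested": {"title": "caf\udce9"}, "tags": ["a", "b\udce1"]}
for bad_path, bad_value in find_unencodable(data):
    # ascii() keeps the lone surrogates printable as \udcXX escapes
    print(bad_path, ascii(bad_value))
```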

  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 201, in encode_value
    buf.write(encode_string_element(name, value))
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 170, in encode_string_element
    return b"\x02" + encode_cstring(name) + encode_string(value)
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 125, in encode_string
    value = value.encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce1' in position 13: surrogates not allowed
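The character in the error, '\udce1', is a lone surrogate, and UTF-8 cannot represent it. One common way such characters end up in a str is decoding bytes with Python's surrogateescape error handler, which the standard library uses for undecodable file names and command-line arguments on POSIX. This is an assumption, because the report does not say where the data comes from. The sketch below reproduces the same error without the bson package.

```python
# 0xE1 is "á" in Latin-1, but on its own it is not a complete UTF-8 sequence.
raw = b"ol\xe1"

# surrogateescape maps each undecodable byte 0xNN (0x80-0xFF) to U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'ol\udce1'

# A strict UTF-8 encode, like value.encode("utf-8") in the traceback, rejects it.
message = ""
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't encode character '\udce1' in position 2: surrogates not allowed

# The same error handler reverses the mapping and restores the original bytes.
print(text.encode("utf-8", errors="surrogateescape") == raw)  # True
```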

I read the source code, and it does not seem to offer any quick fix (something like encode(errors="ignore")).
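Because the encoder calls value.encode("utf-8") with no errors argument, one workaround is to clean the data before serializing it. The minimal sketch below uses two hypothetical helpers, `clean_text` and `sanitize`, which are not library functions. It assumes the surrogates came from surrogateescape decoding and that the original bytes were Latin-1. Adjust `source_encoding` if the data uses another codec. Surrogates that surrogateescape could not have produced are replaced with "?".

```python
from typing import Any


def clean_text(value: str, source_encoding: str = "latin-1") -> str:
    """Return value with surrogate escapes decoded using source_encoding.

    Assumes the whole string originally came from bytes in source_encoding
    that were decoded as UTF-8 with errors="surrogateescape".
    """
    try:
        value.encode("utf-8")
        return value  # already valid UTF-8 text, leave untouched
    except UnicodeEncodeError:
        pass
    try:
        # Lossless round trip back to the original bytes.
        raw = value.encode("utf-8", errors="surrogateescape")
    except UnicodeEncodeError:
        # Surrogates outside U+DC80..U+DCFF: replace each one with "?".
        return value.encode("utf-8", errors="replace").decode("utf-8")
    return raw.decode(source_encoding, errors="replace")


def sanitize(obj: Any) -> Any:
    """Recursively apply clean_text to every str key and value."""
    if isinstance(obj, str):
        return clean_text(obj)
    if isinstance(obj, dict):
        return {sanitize(k): sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(sanitize(item) for item in obj)
    return obj  # numbers, None, bytes, etc. pass through unchanged


data = {"city": "Bras\udcedlia", "tags": ["ol\udce1", "ok"], "n": 3}
print(sanitize(data))  # {'city': 'Brasília', 'tags': ['olá', 'ok'], 'n': 3}
```

The cleaned dictionary can then be passed to the serializer as before. If the original characters are not needed, value.encode("utf-8", errors="ignore").decode("utf-8") is simpler, but it silently drops data.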

Could the text be passing this condition?

   if isinstance(value, text_type):
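On Python 3, a compatibility name like text_type is usually an alias for str, as in six. That is an assumption about this library, whose compat definitions are not shown here. A string holding a lone surrogate is still an ordinary str, so it would pass the check and reach value.encode("utf-8"), which is consistent with the traceback.

```python
import unicodedata

text_type = str  # assumption: mirrors six.text_type on Python 3

value = "ol\udce1"  # lone surrogate, as in the error message
print(isinstance(value, text_type))     # True: a normal str object
print(unicodedata.category(value[-1]))  # Cs: the "surrogate" category
```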
