Suggestions for "UnicodeEncodeError: 'utf-8' codec can't encode character" #116

I would appreciate any suggestions.

I want to serialize a large dictionary that contains text originally written in an encoding other than `utf-8`. Does the library offer an option to handle this situation, or must I clean the data before serializing it?

```text
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 201, in encode_value
    buf.write(encode_string_element(name, value))
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 170, in encode_string_element
    return b"\x02" + encode_cstring(name) + encode_string(value)
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 125, in encode_string
    value = value.encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce1' in position 13: surrogates not allowed
```

I read the [source code](https://github.com/py-bson/bson/blob/master/bson/codec.py#L127), and it does not seem to offer any quick fix (something like `encode(errors="ignore")`).
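In case it helps anyone hitting the same error, here is a minimal sketch of the "change the data first" route. It assumes the lone surrogates (such as `'\udce1'`) came from bytes decoded with `errors="surrogateescape"` and that the original text was in a single-byte encoding; `latin-1` is only a guess, and `clean_text`, `clean`, and `source_encoding` are hypothetical names, not part of the library.

```python
def clean_text(value, source_encoding="latin-1"):
    """Return a copy of `value` that the utf-8 codec can encode.

    Code points U+DC80..U+DCFF are how errors="surrogateescape" smuggles
    undecodable bytes into a str. Each one is mapped back to its byte and
    decoded with `source_encoding` (an assumption about the real encoding).
    Any other lone surrogate is replaced with U+FFFD.
    """
    out = []
    for ch in value:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            raw = bytes([code - 0xDC00])
            out.append(raw.decode(source_encoding, errors="replace"))
        elif 0xD800 <= code <= 0xDFFF:
            out.append("\ufffd")
        else:
            out.append(ch)
    return "".join(out)


def clean(obj, source_encoding="latin-1"):
    """Recursively clean every str in nested dicts, lists, and tuples."""
    if isinstance(obj, str):
        return clean_text(obj, source_encoding)
    if isinstance(obj, dict):
        return {
            clean(key, source_encoding): clean(val, source_encoding)
            for key, val in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        # Tuples become lists here; adjust if the distinction matters.
        return [clean(item, source_encoding) for item in obj]
    return obj


doc = {"name": "Jos\udce9", "tags": ["ok", "caf\udce9"], "n": 1}
cleaned = clean(doc)
cleaned["name"].encode("utf-8")  # no longer raises UnicodeEncodeError
```

The cleaned dictionary should then serialize normally. If the original encoding is unknown, replacing the offending characters (for example with `value.encode("utf-8", errors="replace").decode("utf-8")`) is a lossy alternative.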

Could a string that contains such a character still be passing this check?

```python
if isinstance(value, text_type):
```
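A small check suggests the answer is yes. This is a sketch that assumes `text_type` is simply `str` on Python 3, and the byte string is made up for illustration: a `str` holding a lone surrogate is still a `str`, so it passes the `isinstance` test and only fails later, at `.encode("utf-8")`.

```python
# Undecodable bytes turn into lone surrogates when decoded with
# errors="surrogateescape" (Python does this for file names on POSIX, for example).
value = b"caf\xe1".decode("utf-8", errors="surrogateescape")

# Assumption: text_type is str on Python 3, so this mirrors the library's check.
is_text = isinstance(value, str)

try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    # Same failure as in the traceback: surrogates not allowed.
    encodable = False

print(repr(value), is_text, encodable)
```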
