Which OCR model/configuration is used for Chinese handwriting recognition?

Hello InkSight Team,
First of all, thank you for sharing this incredible project! The ability to convert offline handwriting to digital ink is truly impressive.
I'm currently experimenting with the handwriting segmentation part of the pipeline, specifically for pages containing multi-language text. I've been testing with an image sample that includes Chinese, English, and French handwritten text (as shown below).

<img width="1243" height="410" alt="Image" src="https://github.com/user-attachments/assets/386cbef4-f320-4b78-9279-8e1154331700" />

I've noticed that when the doctr option is used for segmentation, `ocr_predictor(pretrained=True)` is called. This default predictor segments the English and French text well, but it ignores the Chinese characters entirely and produces no bounding boxes for them.
My understanding is that the default doctr pretrained model is primarily trained on Latin scripts, which would explain this behavior.

![Image](https://github.com/user-attachments/assets/83ef0594-0869-40d4-b71a-229f49d033f5)
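
To make my assumption about Latin-only training concrete, here is a minimal sketch in plain Python (no doctr dependency). `LATIN_VOCAB` is a hypothetical vocabulary I wrote for illustration, not doctr's actual one, and `uncovered_chars` and `is_cjk` are helper names I made up. It only shows the recognition side: characters outside a fixed vocabulary cannot be output by a recognizer trained on it. It does not by itself explain why the detector returns no boxes.

```python
import unicodedata

# Hypothetical Latin-only vocabulary, for illustration only.
# This is an assumption and NOT the vocabulary shipped with doctr.
LATIN_VOCAB = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " .,;:!?'\"-()"
    "àâçéèêëîïôùûüÿœæÀÂÇÉÈÊËÎÏÔÙÛÜŸŒÆ"
)


def is_cjk(ch: str) -> bool:
    """Return True if the Unicode name marks ch as a CJK unified ideograph."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def uncovered_chars(text: str, vocab: set = LATIN_VOCAB) -> list:
    """Return the characters of text that are not in vocab, in order, without duplicates."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "Hello, bonjour, 你好"
    missing = uncovered_chars(sample)
    print("Not representable with the Latin vocabulary:", missing)
    print("All missing characters are CJK:", all(is_cjk(ch) for ch in missing))
```

Running a check like this over ground-truth transcriptions of a test page would show which characters a given recognition vocabulary can never produce, which is part of what I would like to confirm for the pretrained weights.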

Could you please provide some guidance on what OCR engine, model architecture (det_arch, rec_arch), or specific pretrained weights you recommend or used internally to successfully segment pages that include handwritten Chinese characters?
For example, is there a specific rec_arch from the doctr model zoo that you found works best, or do you recommend a different OCR tool/API altogether for this task?
Any advice would be greatly appreciated. Thank you for your time and for this great contribution to the community!

Which OCR model/configuration is used for Chinese handwriting recognition? #26

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Which OCR model/configuration is used for Chinese handwriting recognition? #26

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions