Python Image-Label Dataset Generator for OCR
python3 generate.py --lang ug --count 100 --out-dir data/
This command writes 100 images into the folder data/images/. The filename pattern is 'word_{}.jpg'.format(line_num), for example:
data/images/word_1.jpg
data/images/word_2.jpg
...
data/images/word_100.jpg
It also writes a gt.txt file. Each line follows the pattern '{}\t{}'.format(filepath, word), that is, the image path and the word separated by a tab, like below:
data/images/word_1.jpg ئانا
data/images/word_2.jpg تىلىم
...
data/images/word_100.jpg گۈللە
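A minimal sketch of reading gt.txt back into (filepath, word) pairs, assuming the file is UTF-8 encoded and tab-separated as in the pattern above. The names parse_gt_lines and load_gt and the default path data/gt.txt are hypothetical and not part of this repo; adjust the path to wherever gt.txt ends up in your output folder.

```python
from typing import Iterable, List, Tuple


def parse_gt_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each '{filepath}\\t{word}' line into a (filepath, word) pair."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so the word part is kept intact.
        filepath, word = line.split("\t", 1)
        pairs.append((filepath, word))
    return pairs


def load_gt(gt_path: str = "data/gt.txt") -> List[Tuple[str, str]]:
    # Assumption: gt.txt is UTF-8; the default path here is hypothetical.
    with open(gt_path, encoding="utf-8") as f:
        return parse_gt_lines(f)


# Two sample lines in the same shape as the gt.txt excerpt above.
sample = [
    "data/images/word_1.jpg\tئانا\n",
    "data/images/word_2.jpg\tتىلىم\n",
]
pairs = parse_gt_lines(sample)
```

Splitting on the first tab only keeps the label intact even if a word were to contain a tab, and opening the file with an explicit encoding avoids platform-dependent defaults when reading Uyghur text.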
- ug - Uyghur (Uighur)
- More languages may be added later
- How do I use my own corpus?
Ref: #2
- Are Uyghur words separated in the image?
Ref: #2
python3 test.py
- Ubuntu 18.04.1
- Python 3.6.9
Salam Hiyali
Feel free
MIT