Configurations for Apache Tika Server.
-
Useful links
- Tika-docker: https://github.com/apache/tika-docker/tree/main
- Tika images: https://hub.docker.com/r/apache/tika/tags
- Tika languages: https://github.com/tesseract-ocr/tessdata
- Python client: https://pypi.org/project/tika/
-
Configuration with tesseract OCR
Uses the Tesseract class to run OCR on PDFs. Languages are included at installation time, when the image is built. To add more languages, download the language files from ... and update the Dockerfile.
Build the image with:
docker build -t tika-ocr-slk .
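Once the image is built and a container is running, a PDF can be sent to the server for OCR. The following is a minimal sketch using only the Python standard library, not part of this repository. It assumes the container was started with the server's default port published (for example `docker run -p 9998:9998 tika-ocr-slk`), that the `/tika` endpoint is reachable at `http://localhost:9998`, and that the `slk` language data was installed in the image. The helper names `build_ocr_request` and `extract_text` are hypothetical.

```python
import urllib.request

# Assumption: Tika Server listens on its default port 9998 on localhost.
TIKA_URL = "http://localhost:9998/tika"


def build_ocr_request(pdf_bytes, lang="slk", url=TIKA_URL):
    """Build a PUT request asking Tika Server for the plain text of a PDF.

    The X-Tika-OCRLanguage header selects the Tesseract language;
    "slk" is an assumption based on the image name used in this README.
    """
    headers = {
        "Content-Type": "application/pdf",
        "Accept": "text/plain",
        "X-Tika-OCRLanguage": lang,
    }
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="PUT")


def extract_text(pdf_path, lang="slk", url=TIKA_URL):
    """Send a local PDF to the server and return the extracted text."""
    with open(pdf_path, "rb") as f:
        request = build_ocr_request(f.read(), lang, url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running container, calling `extract_text("scan.pdf")` (a hypothetical file name) should return the recognized text. The Python client linked above is an alternative to building the request by hand.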
-
Other configs TBD