The following steps have been tested on Google Colab.
-
Create a config.toml file with the following content:

    [MODEL]
    ORGANIZATION = "google"
    MODEL_NAME = "pix2struct-docvqa-base"
    MODELS_DIR = "models"
-
Install the requirements (preferably inside a virtual environment):
pip install -r requirements.txt
-
Download the Hugging Face (HF) model and convert it to ONNX with quantization:
python convert.py
-
Run the inference.
Available model types:

    available_models = {
        "HF_MODEL": Pix2StructHF,
        "ONNX_MODEL": Pix2StructOnnxWithoutPast,
        "ONNX_MODEL_WITH_PAST": Pix2StructOnnxWithPast,
    }

Command:

    python inference.py \
        --m <MODEL_TYPE> \
        --i <PATH_TO_IMAGE_FILE> \
        --q <QUESTION> \
        --quantize [True/False (Default: False)]
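As a rough illustration of how the flags above could map onto the model registry, here is a minimal sketch using `argparse`. The three classes are empty placeholders standing in for the repository's real wrappers, and `str_to_bool`, `build_parser`, and the sample image path and question are hypothetical names chosen for this example. The actual `inference.py` may parse its arguments differently.

```python
import argparse


# Empty placeholders for the repository's model wrappers (assumption: the real
# classes load weights and run inference, which this sketch does not attempt).
class Pix2StructHF:
    pass


class Pix2StructOnnxWithoutPast:
    pass


class Pix2StructOnnxWithPast:
    pass


AVAILABLE_MODELS = {
    "HF_MODEL": Pix2StructHF,
    "ONNX_MODEL": Pix2StructOnnxWithoutPast,
    "ONNX_MODEL_WITH_PAST": Pix2StructOnnxWithPast,
}


def str_to_bool(value: str) -> bool:
    """Turn the literal strings True/False from the command line into a bool."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Pix2Struct DocVQA inference")
    parser.add_argument("--m", required=True, choices=sorted(AVAILABLE_MODELS),
                        help="model type key")
    parser.add_argument("--i", required=True, help="path to the image file")
    parser.add_argument("--q", required=True, help="question about the image")
    parser.add_argument("--quantize", type=str_to_bool, default=False,
                        help="use the quantized model (default: False)")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--m", "ONNX_MODEL", "--i", "sample.png", "--q", "What is the total?",
     "--quantize", "True"]
)
model_cls = AVAILABLE_MODELS[args.m]
print(model_cls.__name__, args.quantize)
```

Restricting `--m` with `choices` makes an unknown model type fail at parse time with a readable message rather than as a `KeyError` later on.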
See the benchmarking results in results.md.
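For readers wondering how the `[MODEL]` table in `config.toml` might be consumed, the following is a minimal sketch, not the repository's actual code. It assumes that `ORGANIZATION` and `MODEL_NAME` are joined into a Hugging Face Hub ID and that the model is stored under `MODELS_DIR/MODEL_NAME`. Both are guesses about `convert.py`, and `load_model_settings` is a hypothetical helper.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # assumed fallback package on older interpreters

# Same content as the config.toml described in the first step.
CONFIG_TEXT = """
[MODEL]
ORGANIZATION = "google"
MODEL_NAME = "pix2struct-docvqa-base"
MODELS_DIR = "models"
"""


def load_model_settings(toml_text: str) -> dict:
    """Hypothetical helper: derive a Hub ID and a local directory from [MODEL]."""
    model = tomllib.loads(toml_text)["MODEL"]
    return {
        # Assumption: the Hub ID has the form "<organization>/<model name>".
        "hub_id": f'{model["ORGANIZATION"]}/{model["MODEL_NAME"]}',
        # Assumption: converted files are stored under MODELS_DIR/MODEL_NAME.
        "local_dir": Path(model["MODELS_DIR"]) / model["MODEL_NAME"],
    }


settings = load_model_settings(CONFIG_TEXT)
print(settings["hub_id"])
print(settings["local_dir"])
```

To read the file from disk instead of a string, open it in binary mode and pass the handle to `tomllib.load`.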