docai_parse_codes

Utility scripts for integrating with the Google Cloud Document AI API and processing the extracted document data.
Built for medtax-ocr-prototype on GCP.

Let us know if anything is missing from this README.

Features

Extracts document data from PDFs/images using Document AI.
Processes and finalizes structured JSON output.
Supports local testing or running in Google Cloud Shell.
Includes a Firestore write step to store the extracted data.

How to Test Locally

Note: If you are using Google Cloud Shell, you can skip the setup section.

Use the following link to clone the repository into Google Cloud Shell automatically:

Cloud Shell Clone

Note: Google Cloud Shell has a usage quota of 50 hours per week.

Setup Google Cloud

Install the Google Cloud SDK:
Download and run GoogleCloudSDKInstaller.exe (the Windows installer).
Then, in your project folder, run:

gcloud init
gcloud auth application-default login

This opens a browser. Sign in with the Google account that has project access. If the browser does not open, copy the link shown in the terminal and paste it into your browser.

Then set the project:

gcloud config set project medtax-ocr-prototype
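
To confirm that the login step left usable credentials on your machine, a small check like the one below can help. It is only a sketch: the function names adc_file_path and find_credentials are hypothetical and not part of this repository. It looks at the GOOGLE_APPLICATION_CREDENTIALS environment variable and at the default file that gcloud auth application-default login writes. It does not handle a custom CLOUDSDK_CONFIG directory, and in Cloud Shell credentials may be supplied differently, so a "not found" result there is not necessarily a problem.

```python
import os
from pathlib import Path
from typing import Optional


def adc_file_path() -> Path:
    """Return the default location of the Application Default Credentials file."""
    if os.name == "nt":
        # On Windows the file lives under %APPDATA%\gcloud.
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
        return base / "gcloud" / "application_default_credentials.json"
    # On Linux and macOS it lives under ~/.config/gcloud.
    return Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


def find_credentials() -> Optional[Path]:
    """Return the credentials file that would be picked up, or None if none is found."""
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)
    default = adc_file_path()
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_credentials()
    if found is None:
        print("No credentials file found. Try: gcloud auth application-default login")
    else:
        print(f"Credentials file found at {found}")
```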

Install Dependencies:

pip install -r requirements.txt

Testing Extraction

In extractor_caller.py, uncomment and update:

gcs_output_uri = "gs://practice_sample_training/docai/"
gcs_input_uri = "gs://run-sources-medtax-ocr-prototype-us-central1/4 form 2307 pictures.pdf"
input_mime_type = "application/pdf"

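gcs_input_uri: Cloud Storage path (gs://...) to the document you want to process.

gcs_output_uri: Cloud Storage path (gs://...) where the processed files will be saved.

input_mime_type: MIME type of the input document, for example application/pdf.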
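
For orientation, here is a minimal sketch of how these three values could feed a Document AI batch request. It is not the code in extractor_caller.py. The helper names (split_gcs_uri, guess_mime_type, processor_name, batch_process) are hypothetical, and project_id, location, and processor_id are placeholders you would need to supply. It also assumes the google-cloud-documentai client library is installed, which this README does not state explicitly.

```python
import mimetypes
from typing import Optional, Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' into (bucket, object_path), raising on bad input."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, object_path


def guess_mime_type(object_path: str) -> Optional[str]:
    """Guess a MIME type from the file extension (None if unknown)."""
    return mimetypes.guess_type(object_path)[0]


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name of a Document AI processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def batch_process(project_id, location, processor_id,
                  gcs_input_uri, input_mime_type, gcs_output_uri, timeout=600):
    """Submit one document for batch processing and wait for the operation."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import documentai

    # Fail early on malformed URIs before calling the API.
    split_gcs_uri(gcs_input_uri)
    split_gcs_uri(gcs_output_uri)

    client = documentai.DocumentProcessorServiceClient(
        client_options={"api_endpoint": f"{location}-documentai.googleapis.com"}
    )
    request = documentai.BatchProcessRequest(
        name=processor_name(project_id, location, processor_id),
        input_documents=documentai.BatchDocumentsInputConfig(
            gcs_documents=documentai.GcsDocuments(
                documents=[
                    documentai.GcsDocument(
                        gcs_uri=gcs_input_uri, mime_type=input_mime_type
                    )
                ]
            )
        ),
        document_output_config=documentai.DocumentOutputConfig(
            gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
                gcs_uri=gcs_output_uri
            )
        ),
    )
    operation = client.batch_process_documents(request=request)
    operation.result(timeout=timeout)  # blocks until processing finishes
    return operation


if __name__ == "__main__":
    # Only the offline helpers run here; no request is sent.
    bucket, path = split_gcs_uri(
        "gs://run-sources-medtax-ocr-prototype-us-central1/4 form 2307 pictures.pdf"
    )
    print(bucket, "|", path, "|", guess_mime_type(path))
```

Validating the URIs before building the request catches typos such as a missing gs:// prefix locally instead of as an API error.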

To see the results, either uncomment the print lines in handle_data.py to print them in your terminal, or check the output files in your GCS bucket. Files ending with _finalized.json contain the extracted values.
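
If you prefer to inspect the finalized output offline, the sketch below loads every file ending with _finalized.json from a local folder. It assumes you have already copied the output folder from the bucket to your machine (for example with gcloud storage cp). The function name load_finalized and the folder name docai_output are hypothetical, and nothing is assumed about the structure of the JSON beyond it being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_finalized(directory: str) -> Dict[str, Any]:
    """Load every *_finalized.json file under `directory`, keyed by relative path."""
    root = Path(directory)
    results: Dict[str, Any] = {}
    for path in sorted(root.rglob("*_finalized.json")):
        with path.open(encoding="utf-8") as fh:
            # Key by path relative to the root so same-named files in
            # different subfolders do not overwrite each other.
            results[path.relative_to(root).as_posix()] = json.load(fh)
    return results


if __name__ == "__main__":
    # "docai_output" is a placeholder for wherever you copied the files.
    for name, data in load_finalized("docai_output").items():
        print(name)
        print(json.dumps(data, indent=2, ensure_ascii=False))
```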

