Simplot

Simplot is a chart question-answering system that uses pre-trained models to extract tables from charts and answer questions about them. The project is organized into phases, including data preparation, model training, and inference.

Dataset

Download the dataset from the ChartQA Dataset repository on HuggingFace.

File Descriptions

dataset.py: Prepares the dataset and generates positive and negative PNG samples.
preprocess.py: Processes the dataset to create inputs for model training.
main.py: Handles training of Phase 1 (Teacher Model) and Phase 2 (Student Model).
inference.py: Extracts tables from charts and generates predictions (saved as prediction.csv).
QA.py: Performs question answering using the Gemini model (results saved in qa_results.csv).

Model Training

Follow these steps to train the models:

1. Download the Dataset
   Download the full dataset from the ChartQA repository.
2. Set Up the Environment
   Create a Python environment using the dependencies listed in requirements.txt.
3. Run Preprocessing
   Execute the preprocessing script:
       python preprocess.py
4. Phase 1: Teacher Model Training
   Train the teacher model using the following command:
       python main.py --phase 1
5. Phase 2: Student Model Training
   Train the student model by loading the best Phase 1 model state:
       python main.py --phase 2 --state_path './state/phase_1_best_model.pth' --lr 1e-5
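The preprocessing and training commands above can also be chained from one script. The following is a minimal sketch, not part of the repository: the helper names build_training_commands and run_pipeline are hypothetical, and it assumes the script is started from the Simplot/ directory. It only uses the flags shown in the steps above, and it defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
import sys


def build_training_commands(state_path="./state/phase_1_best_model.pth", lr="1e-5"):
    """Return the preprocessing and training commands from the README, in order.

    Hypothetical helper: only the flags documented in the README are used.
    """
    python = sys.executable  # interpreter of the currently active environment
    return [
        [python, "preprocess.py"],
        [python, "main.py", "--phase", "1"],
        [python, "main.py", "--phase", "2", "--state_path", state_path, "--lr", lr],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


# Dry run: shows what would be executed without starting any training.
run_pipeline(build_training_commands(), dry_run=True)
```

Calling run_pipeline(build_training_commands(), dry_run=False) would run the three steps in sequence and stop if one of them fails, so Phase 2 never starts from a missing Phase 1 checkpoint.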

Inference

The pre-trained models are available in the state/ folder. To perform inference:

1. Set Up the Environment
   Create the environment using requirements.txt.
2. Extract Tables from Charts
   Run the following command to generate predictions:
       python inference.py
   Sample output can be found in result/prediction.csv.
3. Question Answering
   Use the Gemini model for question answering:
       python QA.py --api_key 'your_api_key' --qa_type 'human'
   Results will be saved in result/qa_results.csv. A sample is already provided in the result/ folder.
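After running the steps above, it can help to look at the generated CSV files before using them further. The sketch below is an illustration only: preview_csv is a hypothetical helper, and it makes no assumption about column names because the README does not document the layout of prediction.csv or qa_results.csv.

```python
import csv
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return (header, rows) with at most max_rows data rows from a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


# Preview whichever output files exist in the result/ folder.
for name in ("result/prediction.csv", "result/qa_results.csv"):
    if Path(name).exists():
        header, rows = preview_csv(name)
        print(name, "columns:", header)
        for row in rows:
            print("  ", row)
    else:
        print(name, "not found; run the corresponding step first")
```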

Folder Structure

Simplot/
│
├── dataset.py
├── preprocess.py
├── main.py
├── inference.py
├── QA.py
├── requirements.txt
├── state/                      # Pre-trained model files
│   ├── phase_2_best_model.pth
│   ├── ...
├── result/                     # Output files
│   ├── prediction.csv          # Sample table extraction results
│   ├── qa_results.csv          # Sample QA results
└── data/                       # Dataset folder
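A quick way to confirm that a checkout matches this layout is to test for the top-level entries listed above. This is a small sketch under the assumption that it is run from the Simplot/ directory; EXPECTED_ENTRIES and missing_entries are hypothetical names, and the list only covers entries named in the tree.

```python
from pathlib import Path

# Top-level files and folders named in the folder structure above.
EXPECTED_ENTRIES = [
    "dataset.py",
    "preprocess.py",
    "main.py",
    "inference.py",
    "QA.py",
    "requirements.txt",
    "state",
    "result",
    "data",
]


def missing_entries(root="."):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_entries(".")
if missing:
    print("Missing entries:", ", ".join(missing))
else:
    print("All expected entries are present.")
```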

Notes

Ensure you have a valid API key for Gemini when running the QA phase.
Modify paths in the commands if your directory structure differs.
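To avoid typing the API key into the command by hand, the key can be read from an environment variable when the QA command is assembled. The sketch below is an assumption-laden example: the variable name GEMINI_API_KEY and the helper build_qa_command are hypothetical, and only the --api_key and --qa_type flags shown in the Inference section are used.

```python
import os
import sys


def build_qa_command(qa_type="human", env_var="GEMINI_API_KEY"):
    """Build the QA.py command, taking the API key from an environment variable.

    env_var is a hypothetical name; QA.py itself only receives --api_key.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running QA.py")
    return [sys.executable, "QA.py", "--api_key", api_key, "--qa_type", qa_type]
```

The returned list can be passed to subprocess.run(cmd, check=True). The key still appears in the process arguments, so this keeps it out of shell history and scripts rather than hiding it from other users on the same machine.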
