Chemical Structure Extraction

A web app that uses deep learning to detect relevant chemical structures in patent documents.

This project was built during my internship at iReadRx.

Learn more by checking out the blog posts linked below.

Models Used

v1 (chemical structure detection):

Dataset

I trained a YOLOv5 model on images of PDF pages containing organic compound structures annotated by our chemistry team.
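
YOLOv5 expects one label text file per image, with one line per box: the zero-based class index followed by the box centre, width and height, all normalized by the image size. The following is a minimal sketch of converting a pixel-space annotation into that format. It assumes the chemistry team's annotations are axis-aligned pixel boxes in (x_min, y_min, x_max, y_max) order, and the helper name to_yolo_label is hypothetical.

```python
def to_yolo_label(class_id, box, page_width, page_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) on a page image into
    one YOLO label line: 'class x_center y_center width height', with every
    coordinate normalized to the [0, 1] range by the page size."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / page_width
    y_center = (y_min + y_max) / 2 / page_height
    width = (x_max - x_min) / page_width
    height = (y_max - y_min) / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical annotation: one structure (class 0) on a 1000 x 1400 px page image.
print(to_yolo_label(0, (100, 200, 300, 600), 1000, 1400))
# prints "0 0.200000 0.285714 0.200000 0.285714"
```

Each page image would get a .txt file with the same base name containing one such line per annotated structure.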

Transfer learning, combined with hyperparameter evolution using a genetic algorithm, produced strong training results.
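
YOLOv5 exposes hyperparameter evolution through the --evolve option of its train.py script. The sketch below is a simplified, mutation-only version of the same idea and not YOLOv5's actual implementation: each generation picks a parent from the fittest candidates so far (weighted by fitness), scales every hyperparameter by a random factor near 1, clips it to a bound, and scores the child. The search space, its ranges and the toy fitness function are illustrative assumptions; in real use, fitness would come from a short training run, for example validation mAP.

```python
import random

# Hypothetical search space: (lower bound, upper bound) per hyperparameter.
# The names follow YOLOv5-style hyperparameters, but the ranges are illustrative.
SEARCH_SPACE = {
    "lr0": (1e-5, 1e-1),
    "momentum": (0.6, 0.98),
    "weight_decay": (0.0, 1e-3),
}


def mutate(parent, rng, sigma=0.2):
    """Scale each value by a random factor near 1 (kept in [0.3, 3.0]), then clip to bounds."""
    child = {}
    for name, value in parent.items():
        low, high = SEARCH_SPACE[name]
        factor = max(0.3, min(3.0, 1 + rng.gauss(0, sigma)))
        child[name] = max(low, min(high, value * factor))
    return child


def evolve(fitness_fn, initial, generations=30, top_k=5, seed=0):
    """Mutation-only genetic search; returns the best (fitness, hyperparameters) pair seen."""
    rng = random.Random(seed)
    population = [(fitness_fn(initial), initial)]
    for _ in range(generations):
        best = sorted(population, key=lambda p: p[0], reverse=True)[:top_k]
        # Fitness-weighted parent selection; shift weights so they are all positive.
        min_fit = min(f for f, _ in best)
        weights = [f - min_fit + 1e-6 for f, _ in best]
        parent = rng.choices([h for _, h in best], weights=weights, k=1)[0]
        child = mutate(parent, rng)
        population.append((fitness_fn(child), child))
    return max(population, key=lambda p: p[0])


def toy_fitness(hyp):
    # Stand-in for "train briefly and return validation mAP"; peaks at lr0=0.01, momentum=0.937.
    return -((hyp["lr0"] - 0.01) ** 2) - (hyp["momentum"] - 0.937) ** 2


start = {"lr0": 0.05, "momentum": 0.8, "weight_decay": 5e-4}
best_fitness, best_hyp = evolve(toy_fitness, start)
print(best_fitness, best_hyp)
```

Because every candidate costs a full training run in practice, the number of generations is the main budget knob; starting from pretrained weights (transfer learning) keeps each run short enough to make many generations affordable.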

Training Notebook

v2 (distinguishing between reactions, relevant structures, intermediates, etc.):

Dataset

Because YOLOv5 did not perform well on the new dataset, v2 uses Detectron2 instead.
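
Detectron2 can train on any dataset expressed as a list of "standard dataset dicts", one per image. The following is a minimal sketch of building such a record for an annotated page. The class names in CLASSES and the helper to_detectron2_record are hypothetical, and boxes are assumed to be pixel coordinates in (x_min, y_min, x_max, y_max) order.

```python
try:
    from detectron2.structures import BoxMode

    XYXY_ABS = BoxMode.XYXY_ABS
except ImportError:
    # detectron2 is not installed; BoxMode is an IntEnum in which XYXY_ABS == 0.
    XYXY_ABS = 0

# Hypothetical v2 label set; the real class list may differ.
CLASSES = ["reaction", "relevant_structure", "intermediate"]


def to_detectron2_record(image_id, file_name, width, height, boxes):
    """Build one Detectron2 standard dataset dict for a page image.

    boxes: list of (class_name, x_min, y_min, x_max, y_max) in absolute pixels.
    """
    annotations = [
        {
            "bbox": [x0, y0, x1, y1],
            "bbox_mode": XYXY_ABS,
            "category_id": CLASSES.index(name),
        }
        for name, x0, y0, x1, y1 in boxes
    ]
    return {
        "file_name": file_name,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": annotations,
    }


record = to_detectron2_record(7, "page_7.png", 1000, 1400, [("intermediate", 10, 20, 110, 220)])
print(record["annotations"])
```

With Detectron2 installed, a list of these records can be registered with DatasetCatalog.register("patents_train", lambda: records) and the class names attached with MetadataCatalog.get("patents_train").set(thing_classes=CLASSES); "patents_train" is a placeholder name.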

Training Notebook

Getting Started

To get a local copy up and running, follow these steps:

git clone https://github.com/shashank524/patent_analysis.git
cd patent_analysis

Colab

If you want to train the models, or just get hands-on with how this project works, Colab is the best place to do so. All the required Colab notebooks are in the colab_notebooks folder.

The first few cells take care of the installation.

Docker

By default, the Dockerfile builds an image that can run both YOLOv5 and Detectron2.

If you want to use Detectron2 alone, or YOLOv5 and Detectron2 together, build the image as is:

docker build -t compoundextractor .

If you don't need Detectron2, comment out the following lines in the Dockerfile before running the build command. This step is optional; it only reduces the size of the Docker image.

RUN pip install --user torch==1.9 torchvision==0.10 -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN pip install --user 'git+https://github.com/facebookresearch/fvcore'
RUN pip install cython pyyaml==5.1
RUN pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
RUN python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Finally, run the container:

docker run -p 8501:8501 compoundextractor:latest

Without Docker

Run the following commands in your terminal to set up everything without Docker.

pip install -r requirements.txt

pip install --user torch==1.9 torchvision==0.10 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install --user 'git+https://github.com/facebookresearch/fvcore'
pip install cython pyyaml==5.1
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Then run the Streamlit app:

streamlit run app.py

Results

How do you build on top of this?

I have divided each part of this project, from training to deployment, into separate Colab notebooks.

For example, if you want to know why I chose to perform inference a certain way, you can look at the relevant Colab notebook and perhaps find a better way to do the same thing.
