A web app that uses deep learning to detect relevant chemical structures in patent documents.
This project was built during my internship at iReadRx.
Learn more by checking out the blog posts linked below.
I trained a YOLOv5 model on images (pdf pages) containing organic compound structures annotated by our chemistry team.
Transfer learning combined with hyperparameter evolution using a genetic algorithm gave good training results.
Because YOLOv5 did not perform well on the new dataset, I switched to Detectron2.
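To make the idea of hyperparameter evolution concrete, here is a minimal, self-contained sketch of a mutation-based genetic search. It is not the code used in this project, and it is not YOLOv5's own implementation (YOLOv5's training script offers an `--evolve` option for that). The search space in `BOUNDS`, the `toy_fitness` function, and the default mutation settings are all assumptions made for illustration. In a real run, the fitness call would be a full train-and-validate cycle returning a metric such as mAP.

```python
import random

# Hypothetical search space: name -> (lower bound, upper bound).
BOUNDS = {
    "lr0": (1e-5, 1e-1),
    "momentum": (0.6, 0.98),
    "weight_decay": (0.0, 1e-3),
}


def toy_fitness(hyp):
    """Stand-in for a real train-and-validate run.

    Peaks at 1.0 for an arbitrary target; replace with an actual training call.
    """
    target = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005}
    error = 0.0
    for name, (low, high) in BOUNDS.items():
        error += ((hyp[name] - target[name]) / (high - low)) ** 2
    return 1.0 / (1.0 + error)


def mutate(parent, rng, prob=0.8, sigma=0.2):
    """Scale each value by a random factor with probability `prob`, then clip."""
    child = {}
    for name, value in parent.items():
        low, high = BOUNDS[name]
        if rng.random() < prob:
            value *= 1.0 + rng.gauss(0.0, sigma)
        child[name] = min(max(value, low), high)
    return child


def evolve(initial, fitness, generations=200, top_n=5, seed=0):
    """Keep a history of (score, hyp) and mutate a parent drawn from the top_n."""
    rng = random.Random(seed)
    history = [(fitness(initial), initial)]
    for _ in range(generations):
        best = sorted(history, key=lambda item: item[0], reverse=True)[:top_n]
        weights = [score for score, _ in best]
        # Fitter parents are more likely to be picked for mutation.
        _, parent = rng.choices(best, weights=weights, k=1)[0]
        child = mutate(parent, rng)
        history.append((fitness(child), child))
    return max(history, key=lambda item: item[0])


start = {"lr0": 0.05, "momentum": 0.7, "weight_decay": 0.0001}
best_score, best_hyp = evolve(start, toy_fitness)
print(best_score, best_hyp)
```

Because the starting point is kept in the history, the returned score is never worse than the initial one. Each generation costs one fitness evaluation, which is why this kind of search is expensive when every evaluation is a training run.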
To get a local copy up and running, follow these steps.
git clone https://github.com/shashank524/patent_analysis.git
cd patent_analysis
If you want to train the models, or just learn hands-on how this project works, Colab is the best place to do so. All the required Colab notebooks are here.
The first few cells take care of the installation.
By default, the Dockerfile builds an image that can run both YOLOv5 and Detectron2.
If you want to use Detectron2 alone, or both YOLOv5 and Detectron2 together, run the following commands directly.
docker build -t compoundextractor .
If you don't want to use Detectron2, comment out the following lines in the Dockerfile. This step is optional; it only reduces the size of the Docker image.
RUN pip install --user torch==1.9 torchvision==0.10 -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN pip install --user 'git+https://github.com/facebookresearch/fvcore'
RUN pip install cython pyyaml==5.1
RUN pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
RUN python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
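Since the Detectron2 lines above are optional, code running inside the image cannot assume the package is present. A minimal sketch of a runtime check follows. The helper name `available_backends` is hypothetical and not part of this repository; it only reports which of the named packages can be imported.

```python
import importlib.util


def available_backends(candidates=("torch", "detectron2")):
    """Return the names from `candidates` that can be imported in this environment."""
    found = []
    for name in candidates:
        # find_spec returns None for a missing top-level package, without importing it.
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return found


print(available_backends())
```

An app could use a check like this to hide the Detectron2 option when the image was built without it.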
Finally run the container:
docker run -p 8501:8501 compoundextractor:latest
To set up everything without Docker, run the following commands in your terminal.
pip install -r requirements.txt
pip install --user torch==1.9 torchvision==0.10 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install --user 'git+https://github.com/facebookresearch/fvcore'
pip install cython pyyaml==5.1
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
Run the Streamlit app:
streamlit run app.py
I have divided each part of this project, from training to deployment, into separate Colab notebooks.
If you want to know why I chose to perform inference a certain way, you can look at the relevant Colab notebook and perhaps find a better way to do the same thing.
- Table Detection Using Layout Parser
- Detectron2 vs. Yolov5 (Which One Suits Your Use Case Better?)
- Chemical Patent Analysis Beyond Simple OCR
- Visualizing Relationships between Chemicals and Patent Data
- Building Similar Patent Recommendations for Chemistry Patents
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
Give a ⭐️ if you like this project!
This project is MIT licensed.






