PDF Extractor using Natural Language Processing
- Download the repository
- Install the requirements:
    pip3 install -r requirements.txt
- Load the language model for spaCy:
    python3 -m spacy download en
- Copy the PDF files to be cleaned into the directory "PDFs"
- Run the extraction tool:
    python3 run.py
- The output is written to the directory "output"
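
Before running the tool, it can help to confirm that the input directory actually contains PDF files. The snippet below is a minimal sketch of such a check and is not part of the repository: the function name `list_input_pdfs` is hypothetical, and the only thing taken from the steps above is the directory name "PDFs".

```python
from pathlib import Path


def list_input_pdfs(pdf_dir="PDFs"):
    """Return the PDF files found directly in pdf_dir.

    Hypothetical helper, not part of the repository. Returns an empty
    list if the directory does not exist.
    """
    directory = Path(pdf_dir)
    if not directory.is_dir():
        return []
    # Match the extension case-insensitively so "report.PDF" is included.
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = list_input_pdfs()
    if not pdfs:
        print('No PDF files found in "PDFs".')
    for pdf in pdfs:
        print(pdf.name)
```

Saved under any file name in the repository root and run with python3 before run.py, this prints the names of the files that are in place for extraction.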