extractpdf

A Python package focused on extracting content from PDF files.

There seem to be many options out there, but no single solution that is easy to install, even on Windows, and focuses specifically on PDF files. So we created the extractpdf package.

It is based on the structure of Textract, but focuses on PDF only and adds other tools to the pipeline, such as PyPDF2 and Camelot.
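This README does not say how the tools in the pipeline are combined. As an illustration only, one common arrangement is to try several extraction backends in order and keep the first usable result. The sketch below assumes that arrangement; `extract_with_fallback` and the backend callables are hypothetical names, not part of extractpdf's API.

```python
def extract_with_fallback(path, backends):
    """Try each (name, extractor) pair in order; return the first non-empty result.

    `backends` is a list of (name, callable) pairs. Each callable takes a path
    and returns extracted text. These are placeholders for illustration, not
    extractpdf's real internals.
    """
    errors = {}
    for name, extractor in backends:
        try:
            text = extractor(path)
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors[name] = exc
            continue
        if text and text.strip():
            return name, text
    raise RuntimeError(f"no backend could extract {path!r}: {errors}")
```

In such a setup, a wrapper around PyPDF2 might come first and a Camelot-based table extractor later, but the actual order and behavior in extractpdf may differ.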

Usage:

To use this package, install it from PyPI using:

pip install extractpdf

Then use it like so:

import extractpdf as epdf

epdf.process('my_file.pdf')
epdf.process('http://www.example.com/some_file.pdf')
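Since `process` is shown accepting both a local path and a URL, a caller handling a mixed list of sources may want to tell the two apart, for example for logging or validation. The following is a minimal sketch under that assumption. `is_url` and `process_many` are hypothetical helpers, not part of extractpdf, and the `process` function is passed in as a parameter so the sketch runs without the package installed.

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True if `source` looks like an http(s) URL rather than a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def process_many(sources, process):
    """Call `process` on each source and collect results keyed by source.

    `process` is injected so this sketch has no hard dependency on extractpdf;
    in real use it would be `extractpdf.process`.
    """
    results = {}
    for source in sources:
        kind = "url" if is_url(source) else "file"
        results[source] = (kind, process(source))
    return results
```

With the package installed, this could be called as `process_many(['my_file.pdf', 'http://www.example.com/some_file.pdf'], epdf.process)`.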

Development

We warmly welcome contributors!

To run this project locally, first install the dependency packages. You can use pipenv to install them:

Installation using pipenv (which combines virtualenv with pip)

Install pipenv

# if you haven't installed pip (easy_install is deprecated; ensurepip ships with Python)
python -m ensurepip --upgrade

# install pipenv
pip install pipenv

On macOS, you can use Homebrew:

brew install pipenv

Set pipenv to create the virtual environment inside the project directory, then install the packages:

# Windows (cmd)
set PIPENV_VENV_IN_PROJECT=true

# macOS/Linux
export PIPENV_VENV_IN_PROJECT=true

# install all packages
pipenv install
