🧠 Local Text Extraction from PDF using Tesseract OCR

This offline Python project extracts text from PDF documents with Tesseract OCR and pdf2image. Everything runs on the local machine, with no dependency on a cloud API such as IBM Watsonx.

📦 Features

- Convert PDFs to images
- Use Tesseract to extract text from each page
- Save output as a structured JSON file
- Fully local: no cloud credentials required
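
The features above can be sketched as a short pipeline. This is only an illustration under stated assumptions, not the contents of text_extraction_local.py: it assumes the pytesseract wrapper is used to call Tesseract (this README does not name the binding), and the function names build_records and extract_pdf are hypothetical.

```python
import json
from pathlib import Path


def build_records(page_texts):
    """Turn a list of per-page strings into [{"page": n, "text": ...}, ...]."""
    return [
        {"page": number, "text": text}
        for number, text in enumerate(page_texts, start=1)
    ]


def extract_pdf(pdf_path, output_path, dpi=300):
    """OCR every page of a PDF and save the result as JSON (hypothetical helper)."""
    # Imported here so build_records stays usable without the OCR stack installed.
    from pdf2image import convert_from_path  # requires poppler
    import pytesseract  # assumed binding; requires the tesseract binary

    # pdf2image returns one PIL image per PDF page.
    images = convert_from_path(str(pdf_path), dpi=dpi)
    texts = [pytesseract.image_to_string(image) for image in images]

    records = build_records(texts)
    Path(output_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records
```

Under these assumptions, calling extract_pdf("sample/input.pdf", "sample/output.json") would read the sample input and write the sample output. The dpi default of 300 is a choice made for this sketch, not a value taken from the project.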

📂 Folder Structure

.
├── text_extraction_local.py       # Main script
├── requirements_local.txt         # Local-only dependencies
├── README.md                      # Project documentation
└── sample/                        # Input/output folder
    ├── input.pdf
    └── output.json

🔧 Installation & Setup

Install system tools
- Install Tesseract OCR
- Install poppler (required by pdf2image)
  - Windows: Poppler for Windows
  - macOS: brew install poppler
  - Linux: sudo apt install poppler-utils
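
Before running the script, it can help to confirm that both system tools are reachable. The check below is a small sketch rather than part of the project: it assumes the Tesseract executable is named tesseract and that poppler provides pdftoppm, one of the utilities pdf2image calls. The helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("tesseract", "pdftoppm")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("Tesseract and poppler were found on PATH.")
```

On Windows, the folders containing these executables usually have to be added to PATH manually before this check passes.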

Clone the repo & install dependencies

git clone https://github.com/yourusername/Local-PDF-Text-Extractor.git
cd Local-PDF-Text-Extractor
pip install -r requirements_local.txt

Run the script
```
python text_extraction_local.py
```

📈 Output Format (JSON)

The output is a JSON array with one object per PDF page, holding the page number and the text extracted from that page:

[
  {
    "page": 1,
    "text": "Extracted text from page 1..."
  },
  ...
]
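
A file in this format can be read back with the standard library alone. The sketch below assumes the output is a JSON array of objects with "page" and "text" keys, as shown above. The helper names load_records and join_pages are hypothetical.

```python
import json


def load_records(path):
    """Read the list of page records from a JSON output file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def join_pages(records, separator="\n\n"):
    """Concatenate page texts in page order into a single string."""
    ordered = sorted(records, key=lambda record: record["page"])
    return separator.join(record["text"] for record in ordered)
```

For example, join_pages(load_records("sample/output.json")) would return the whole document as one string, with a blank line between pages.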

🔒 License

MIT License — Free to use and modify.

Repository: ObliviousK0t/tesseract-pdf-parser
