This Python script allows users to search through PDF documents located in predefined directories for specific keywords. It uses PyPDF2 to extract text from PDFs and supports single or dual keyword searches.

braendma/PDF-finder

📄 PDF Search Tool with Python

A small Python script for automatically searching through PDF files in specific folders for defined keywords.
Perfect for situations where you have many local PDFs and need to quickly find which ones contain relevant content.

🚀 Features

  • 🔍 Path selection: Choose from predefined search paths
  • 📂 Recursive folder search: Includes all subfolders
  • 📝 Full-text search in PDFs (single or double keyword search)
  • 📊 Simple result output in the terminal
  • ⚠️ Error handling for unreadable PDFs
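
The recursive folder search listed above could be sketched as follows. This is a minimal illustration using only the standard library, not the script's actual code; the function name `find_pdfs` is hypothetical, and the case-insensitive suffix check is an assumption.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF below `root`, including subfolders, sorted by path."""
    root = Path(root)
    # rglob("*") walks the whole tree; comparing the lowercased suffix means
    # files named "REPORT.PDF" are picked up as well (an assumption here).
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `find_pdfs(path1)` with one of the preset paths would return `Path` objects that can then be opened one by one.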

📦 Requirements

  • Python 3.x
  • Libraries:
    pip install PyPDF2

⚙️ Installation & Usage

  1. Clone the repository or download the script
  2. Adjust the search paths in the script
    path1 = 'YOUR\\PATH\\HERE'
    path2 = 'YOUR\\PATH\\HERE'
    path3 = 'YOUR\\PATH\\HERE'
  3. Run the script
    python pdf_search.py
  4. Select a path
    The terminal will prompt you to choose one of the three preset search paths.
  5. Enter keyword(s)
    • Only the first keyword → single keyword search
    • Two keywords → both must appear in the document
  6. View the results
    The terminal will list the found files with their full paths.
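
The single versus two-keyword behaviour from step 5 might look roughly like this. It is a sketch under assumptions, not the script's own code: the helper names `extract_text` and `contains_all` are hypothetical, matching is assumed to be case-insensitive, and the `PdfReader` class assumes PyPDF2 2.x or later.

```python
def extract_text(pdf_path):
    """Return the text of all pages, or None if the PDF cannot be read."""
    # Imported here so the matching helper below also works without PyPDF2.
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

    try:
        reader = PdfReader(pdf_path)
        # extract_text() may yield nothing for image-only pages, hence `or ""`.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    except Exception as exc:  # broad on purpose: damaged or encrypted files
        print(f"Could not read {pdf_path}: {exc}")
        return None


def contains_all(text, terms):
    """True if every non-empty term occurs in text (case-insensitive)."""
    terms = [t.strip().lower() for t in terms if t and t.strip()]
    if not terms or text is None:
        return False
    haystack = text.lower()
    return all(term in haystack for term in terms)
```

With one keyword the list has a single entry, so the same check covers both search modes; an unreadable PDF yields `None` and is simply skipped.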

💡 Example Run

Which path should be selected? (1: PATH1, 2: PATH2, 3: PATH3): 1
Keyword 1: invoice
Keyword 2: 2023

Searching for: ['invoice', '2023'] Number of terms: 2

Search results:

C:\Documents\Projects\Finance\invoice_april_2023.pdf
C:\Documents\Invoices\Clients\invoice_may_2023.pdf
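
The prompts in this run could be turned into a search roughly as sketched here. All three helper names are hypothetical and the input handling is an assumption; only the wording of the status line is taken from the example above.

```python
def select_path(choice, paths):
    """Map the user's answer ("1", "2", "3") to one of the preset paths."""
    index = int(choice) - 1  # raises ValueError for non-numeric input
    if not 0 <= index < len(paths):
        raise ValueError(f"Choose a number from 1 to {len(paths)}")
    return paths[index]


def build_terms(keyword1, keyword2=""):
    """Drop empty inputs so a single keyword gives a one-term search."""
    return [k.strip() for k in (keyword1, keyword2) if k.strip()]


def describe_search(terms):
    """Format the status line shown before the results."""
    return f"Searching for: {terms} Number of terms: {len(terms)}"
```

In an interactive script the arguments would come from `input()` calls, for example `build_terms(input("Keyword 1: "), input("Keyword 2: "))`.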

🛠️ Possible Improvements

  • 📈 Progress indicator for large numbers of files
  • 🗂 Export results to CSV or HTML
  • 🧠 Regular expression support for more complex searches
  • 🖥 GUI version for users unfamiliar with the command line
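
As a starting point for two of these ideas, regular expression matching and CSV export could be sketched with the standard library alone. Neither function exists in the script today; the names `matches_all_patterns` and `export_results`, the single `path` column, and the case-insensitive flag are assumptions.

```python
import csv
import re


def matches_all_patterns(text, patterns):
    """True if every regular expression finds a match somewhere in text."""
    # Note: an empty pattern list returns True, so callers should check it.
    return all(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def export_results(paths, csv_file):
    """Write a header row followed by one matching file path per row."""
    with open(csv_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path"])
        writer.writerows([str(p)] for p in paths)
```

A pattern such as `r"20\d{2}"` would then match any year from 2000 to 2099 instead of one fixed keyword.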
