Description
1. Extract Text from PDFs:
• Use a library such as PyMuPDF, PyPDF2 (now continued as pypdf), or pdfminer (maintained as pdfminer.six) to extract the text from each PDF.
2. Preprocess the Text:
• Normalize the text (lowercase it, etc.) so that indexed text and search queries are compared in the same form.
3. Store and index the text using one of the following methods:
• Use SQLite for a simple, SQL-based index.
• Use libraries like Whoosh for full-text search.
• Use distributed systems like Elasticsearch for large-scale search.
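
The three steps above could be sketched roughly as follows, using PyMuPDF for extraction and the SQLite option for the index. This is only a minimal illustration, not a proposed final design: the function names (`extract_text`, `preprocess`, `create_index`, `add_document`, `search`), the `postings` table layout, and the ranking by summed term frequency are all assumptions made for the example. PyMuPDF is imported inside `extract_text` so the rest of the sketch runs with the standard library alone.

```python
import re
import sqlite3
from collections import Counter


def extract_text(pdf_path):
    """Step 1: return the text of every page in a PDF (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the indexing code works without it

    doc = fitz.open(pdf_path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def preprocess(text):
    """Step 2: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def create_index(conn):
    """Step 3: create a hypothetical term -> document table (a small inverted index)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "term TEXT NOT NULL, doc TEXT NOT NULL, freq INTEGER NOT NULL, "
        "PRIMARY KEY (term, doc))"
    )


def add_document(conn, doc_name, text):
    """Index one document, replacing any earlier postings stored for it."""
    counts = Counter(preprocess(text))
    conn.execute("DELETE FROM postings WHERE doc = ?", (doc_name,))
    conn.executemany(
        "INSERT INTO postings (term, doc, freq) VALUES (?, ?, ?)",
        [(term, doc_name, freq) for term, freq in counts.items()],
    )
    conn.commit()


def search(conn, query):
    """Return documents containing every query term, highest summed frequency first."""
    terms = sorted(set(preprocess(query)))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT doc FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY doc HAVING COUNT(DISTINCT term) = ? "
        "ORDER BY SUM(freq) DESC, doc",
        (*terms, len(terms)),
    ).fetchall()
    return [doc for (doc,) in rows]


# Usage with in-memory text standing in for extract_text("some.pdf") output.
conn = sqlite3.connect(":memory:")
create_index(conn)
add_document(conn, "a.pdf", "Full-Text Search with SQLite")
add_document(conn, "b.pdf", "Searching PDFs with Elasticsearch")
print(search(conn, "sqlite search"))  # ['a.pdf']
```

A plain table like this keeps the example dependency-free; Whoosh or Elasticsearch would replace `create_index`, `add_document`, and `search` while the extraction and preprocessing steps stay the same.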