
Indexing - PDFs to records #95

@HaraldCaap

Description

1.	Extract Text from PDFs:
•	Use a library like PyMuPDF, PyPDF2, or pdfminer to extract text from each PDF.
2.	Preprocess the Text:
•	Normalize the text before indexing: lowercase it, collapse whitespace, and split it into word tokens.
3.	Store and Index the Text using one of the following methods:
•	Use SQLite for a simple, SQL-based index.
•	Use libraries like Whoosh for full-text search.
•	Use distributed systems like Elasticsearch for large-scale search.
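The preprocessing step could look roughly like the sketch below. It assumes the text has already been extracted as a plain string, one page at a time. The helper names `normalize_text` and `tokenize` are hypothetical, and the choice of steps (Unicode NFKC folding, rejoining words hyphenated across line breaks, case folding, whitespace collapsing) is an illustration of "lower case, etc", not a fixed recipe.

```python
import re
import unicodedata

# Hypothetical helpers for step 2 (preprocessing); not part of any library.
_HYPHEN_BREAK = re.compile(r"(\w)-\n(\w)")
_WHITESPACE = re.compile(r"\s+")
_TOKEN = re.compile(r"\w+")


def normalize_text(raw: str) -> str:
    """Return a case-folded, whitespace-collapsed version of raw PDF text."""
    # NFKC folds compatibility characters, e.g. the ligature U+FB01 into "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Heuristic: rejoin words split by a hyphen at a line break
    # ("index-\ning" -> "indexing"). This also joins genuine hyphenated
    # compounds that happen to break at a line end.
    text = _HYPHEN_BREAK.sub(r"\1\2", text)
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = text.casefold()
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return _WHITESPACE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens (punctuation is dropped)."""
    return _TOKEN.findall(text)
```

For example, `tokenize(normalize_text("Full-text  Search\nin\tPDFs"))` gives `["full", "text", "search", "in", "pdfs"]`. Stop-word removal or stemming could be added here too, but they change which queries match, so they are left out of this sketch.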
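For the SQLite option, one possible shape is a table of page records plus a hand-rolled inverted index (term → page, with a count), using only Python's standard `sqlite3` module. The schema, the table names `pages` and `postings`, and the functions `open_index`, `add_page`, and `search` are all hypothetical. Most SQLite builds also ship the FTS5 full-text extension (`CREATE VIRTUAL TABLE ... USING fts5`), which handles tokenizing and ranking itself; the plain tables below just avoid depending on it being compiled in.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical schema: one record per PDF page, plus an inverted index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id      INTEGER PRIMARY KEY,
    path    TEXT NOT NULL,
    page_no INTEGER NOT NULL,
    body    TEXT NOT NULL,
    UNIQUE (path, page_no)
);
CREATE TABLE IF NOT EXISTS postings (
    term    TEXT NOT NULL,
    page_id INTEGER NOT NULL REFERENCES pages(id),
    freq    INTEGER NOT NULL,
    PRIMARY KEY (term, page_id)
);
"""


def tokenize(text: str) -> list[str]:
    # Minimal stand-in for the preprocessing step: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_page(conn: sqlite3.Connection, path: str, page_no: int, body: str) -> int:
    """Store one page of extracted text and index its terms."""
    cur = conn.execute(
        "INSERT INTO pages (path, page_no, body) VALUES (?, ?, ?)",
        (path, page_no, body),
    )
    page_id = cur.lastrowid
    counts = Counter(tokenize(body))
    conn.executemany(
        "INSERT INTO postings (term, page_id, freq) VALUES (?, ?, ?)",
        [(term, page_id, n) for term, n in counts.items()],
    )
    conn.commit()
    return page_id


def search(conn: sqlite3.Connection, query: str) -> list[tuple[str, int]]:
    """Return (path, page_no) of pages containing every query term,
    ranked by the summed frequency of those terms."""
    terms = sorted(set(tokenize(query)))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = f"""
        SELECT p.path, p.page_no
        FROM postings AS t JOIN pages AS p ON p.id = t.page_id
        WHERE t.term IN ({placeholders})
        GROUP BY p.id, p.path, p.page_no
        HAVING COUNT(*) = ?
        ORDER BY SUM(t.freq) DESC, p.path, p.page_no
    """
    # (term, page_id) is unique, so COUNT(*) equals the number of matched terms.
    return conn.execute(sql, (*terms, len(terms))).fetchall()
```

Storing one row per page (rather than per file) keeps search hits pointing at a specific page. The `HAVING COUNT(*) = ?` clause gives AND semantics; dropping it would turn the query into an OR search ranked by term frequency.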
