@livnugaraa

This PR introduces functionality for scanning media files, extracting text content using OCR, and managing file operations more robustly. It integrates three new/updated modules:

file_handler.py: Handles file input/output operations, validation, and error management.
ocr_engine.py: Provides OCR capabilities for extracting text from images and other supported formats.
scan_media.py: Implements the core logic for scanning media files, coordinating between file handling and OCR processing.

Key Changes

  • Implemented file handling utilities to safely load, save, and validate media files.

  • Added an OCR engine wrapper that abstracts text extraction and handles failures gracefully.

  • Built a media scanning pipeline that:

      • Accepts image and other supported media types.

      • Uses ocr_engine to extract text content.

      • Provides structured results for downstream processing.

  • Improved error handling and logging for debugging and maintainability.

  • Modularised code to make each component reusable and testable.
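
For reviewers who want a picture of how the pieces listed above might fit together, here is a minimal sketch. It is not the code in this PR. The names `ScanResult`, `validate_media`, `SUPPORTED_EXTENSIONS`, and the `extract_text` callable are all hypothetical, and the extension list is an assumption, since the description does not say which formats are supported. The OCR step is passed in as a plain function, so the sketch does not depend on any particular OCR library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

# Assumption: the PR does not list supported formats; these are placeholders.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


@dataclass
class ScanResult:
    """Structured result handed to downstream processing (hypothetical shape)."""
    path: str
    text: str
    ok: bool
    error: Optional[str] = None


def validate_media(path: Path) -> None:
    """Raise if the path is not an existing file of a supported type."""
    if not path.is_file():
        raise FileNotFoundError(f"not a file: {path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported media type: {path.suffix}")


def scan_media(path: Path, extract_text: Callable[[Path], str]) -> ScanResult:
    """Validate the file, run the injected OCR function, and wrap the outcome.

    Failures in validation or OCR are captured in the result instead of
    propagating, so one bad file does not stop a batch.
    """
    try:
        validate_media(path)
        text = extract_text(path)
    except Exception as exc:
        return ScanResult(path=str(path), text="", ok=False, error=str(exc))
    return ScanResult(path=str(path), text=text, ok=True)
```

Injecting `extract_text` keeps the scanning logic testable without a real OCR backend: a test can pass a stub that returns fixed text or raises, and check the `ok` and `error` fields of the result.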

This scans PDFs and images and converts them to text.
Contributor

@lperry022 lperry022 left a comment

Great work!

Member

@ben-AI-cybersec ben-AI-cybersec left a comment

LGTM too

@lperry022 lperry022 assigned lperry022 and unassigned lperry022 Sep 22, 2025
Member

@ben-AI-cybersec ben-AI-cybersec left a comment

looks great, thanks Liv!

3 participants