Refined code for extracting tabular data from PDFs and images and converting it to JSON format.

vignexshh/tabular_data_extraction

Features

  • Lattice Mode: Ideal for tables with visible cell boundaries (e.g., ruling lines or borders).
  • Stream Mode: Best for tables without visible borders, where columns are separated by whitespace.
  • Hybrid Mode: Combines line-based and whitespace-based detection for complex tables, such as those with only partial borders.
  • Network Mode: Handles advanced extraction scenarios, adapting to unusual PDF structures.
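
As a rough illustration of how the modes above might be chosen, here is a minimal sketch of a rule-of-thumb helper. The function name `suggest_flavor` and its three boolean parameters are hypothetical and not part of this repository or of Camelot. The returned strings assume the modes map onto Camelot's `flavor` argument; `hybrid` and `network` are only accepted by newer Camelot releases.

```python
def suggest_flavor(has_borders, whitespace_separated, irregular_layout=False):
    """Suggest an extraction mode from simple observations about a table.

    Hypothetical heuristic based only on the feature list above; real
    documents may need trial and error across modes.
    """
    if irregular_layout:
        # Unusual structure: fall back to the most adaptive mode.
        return "network"
    if has_borders and whitespace_separated:
        # Mixed cues (some lines, some whitespace columns).
        return "hybrid"
    if has_borders:
        # Clear ruling lines around cells.
        return "lattice"
    # No visible borders: rely on whitespace between columns.
    return "stream"
```

For example, a fully ruled invoice table would map to `suggest_flavor(True, False)`, while a borderless bank statement would map to `suggest_flavor(False, True)`.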

Usage

Camelot allows flexible extraction tailored to the structure and formatting of your PDF. Choose the mode based on whether the table has visible borders and on how it is laid out, since the wrong mode can produce inaccurate results. This project is marked for internal tools (mass extraction) and is to be optimised with GPU acceleration.
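
The extraction-to-JSON flow described here can be sketched as follows. This is a minimal example, not the repository's actual code: the helper names `rows_to_records` and `extract_tables_as_json` are hypothetical, and the sketch assumes that the first row of each extracted table is its header, which is not always true. It uses Camelot's `camelot.read_pdf` with its `pages` and `flavor` arguments and the `df` attribute of each returned table.

```python
import json


def rows_to_records(rows):
    """Convert a list of rows into a list of dicts.

    Assumption: the first row holds the column headers. Blank headers
    are replaced with "column_<index>"; duplicate headers would
    overwrite each other.
    """
    if not rows:
        return []
    header, *body = rows
    keys = [str(h).strip() or f"column_{i}" for i, h in enumerate(header)]
    return [dict(zip(keys, row)) for row in body]


def extract_tables_as_json(pdf_path, flavor="lattice", pages="1"):
    """Extract tables from a PDF with Camelot and return a JSON string."""
    # Imported lazily so rows_to_records can be used without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    result = [
        {
            "table_index": index,
            # table.df is a pandas DataFrame of the extracted cell strings.
            "records": rows_to_records(table.df.values.tolist()),
        }
        for index, table in enumerate(tables)
    ]
    return json.dumps(result, indent=2)
```

A call such as `extract_tables_as_json("report.pdf", flavor="stream", pages="1-3")` would then return one JSON array with an entry per detected table; `report.pdf` is a placeholder file name.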
