ThHanke/PDF2MD

PDF2MD

Containerized application for converting PDF to Markdown

Commercial usage

Marker - Submodule

Due to the licensing of the underlying models, such as LayoutLMv3 and Nougat, this is only suitable for noncommercial usage (quoted from the [marker repo](https://github.com/VikParuchuri/marker)).

  • LayoutLMv3: CC BY-NC-SA 4.0
  • PyMuPDF: GPL

Other dependencies and datasets are openly licensed (DocLayNet, ByT5), or used in a way that is compatible with commercial usage (Ghostscript).

Acknowledgments

This work would not have been possible without marker@vikas.sh and amazing open-source models and datasets, including (but not limited to):

  • Nougat from Meta
  • LayoutLMv3 from Microsoft
  • DocLayNet from IBM
  • ByT5 from Google

Thank you to the authors of these models and datasets for making them available to the community!
