A tool for converting PDF files into Accessible Digital Textbooks (ADTs).
The sample report shows the process and its outputs in more detail, or you can view the final ADT.
Demos of ADTs created from the outputs of ADT Press:
- Momo Multilingual - Momo and the Leopards, multi-lingual reader from Bhutan (no edits, pure AI output).
- Queremos - Informative reader from Uruguay (lightly edited).
- Cuaderno5 Chapter 1 - Uruguay Grade 5 textbook with Activities (more extensively edited).

Features
- PDF document processing and image extraction
- Image analysis using LLMs:
  - Image captioning
  - Intelligent image cropping
  - Image meaningfulness assessment
- HTML report generation
- Visualization of the processing pipeline

Requirements
- Python 3.13 or higher
- uv package manager (recommended)
- You must set the environment variable OPENAI_API_KEY with your OpenAI API key for the application to work.
This project uses uv for dependency management. If you don't have uv installed, you can install it by following the instructions in the uv documentation.
Clone the repository and install dependencies:
git clone git@github.com:unicef/adt-press.git
cd adt-press
uv sync
Run the main script with the default configuration:
uv run adt-press.py label=raven pdf_path=assets/raven.pdf
The application uses OmegaConf for configuration management. The default configuration file is located at config/config.yaml.
To override configuration values from the command line:
uv run adt-press.py label=mydocument pdf_path=/path/to/your/document.pdf page_range.start=0 page_range.end=5
- label: The label for this PDF file; it is used as the subdirectory name under output_dir
- pdf_path: Path to the PDF file to process
- page_range: Range of pages to process (start and end)
- output_dir: Base directory to store outputs
- template_dir: Directory containing HTML templates
- clear_cache: Whether to clear the processing cache before the run
- render_strategy: Controls which strategy to use for layout generation
  - dynamic (default): detects layout_types and routes them to render strategies
  - two_column: works best for novels and storybooks
  - html: works best for textbooks
  - overlay: works best for comic books
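To make the key=value override syntax concrete, here is a minimal standard-library sketch of how dotted overrides can be layered over default values. It is a simplified stand-in for illustration, not OmegaConf's implementation; the `apply_overrides` and `parse_value` helpers and the `defaults` dictionary are hypothetical and do not reflect the real contents of the configuration file.

```python
from copy import deepcopy


def parse_value(raw: str):
    """Turn 'true'/'false' into bool and integers into int; keep anything else as a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_overrides(base: dict, overrides: list[str]) -> dict:
    """Return a copy of `base` with each 'dotted.key=value' override applied."""
    merged = deepcopy(base)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        node = merged
        *parents, leaf = key.split(".")
        for part in parents:
            # Walk (or create) the nested dictionaries named by the dotted path.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return merged


# Hypothetical defaults, standing in for whatever the real config file contains.
defaults = {"output_dir": "output", "page_range": {"start": 0, "end": 0}, "clear_cache": False}

config = apply_overrides(
    defaults,
    ["label=mydocument", "pdf_path=/path/to/your/document.pdf", "page_range.start=0", "page_range.end=5"],
)
print(config["page_range"])  # {'start': 0, 'end': 5}
```

The sketch copies the defaults before merging, so the same base values can be reused across several runs with different overrides.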
The application generates the following outputs in the output/[your label] directory:
- Extracted images from the PDF
- Cropped images
- HTML reports with analysis results
- Visualization of the processing pipeline
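A quick way to locate the generated reports after a run is to scan the label's output directory. The sketch below assumes only that reports are files ending in .html somewhere under output/[your label]; the `find_reports` helper is hypothetical, and the actual file names and subdirectory layout may differ.

```python
from pathlib import Path


def find_reports(label: str, output_dir: str = "output") -> list[Path]:
    """Return every .html file under output_dir/label, sorted by path."""
    run_dir = Path(output_dir) / label
    if not run_dir.is_dir():
        # Nothing has been generated for this label yet.
        return []
    return sorted(run_dir.rglob("*.html"))


for report in find_reports("raven"):
    print(report)
```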
adt-press includes an evaluation tool for measuring the performance of the various LLM tasks against a gold standard. To run the tool, make sure you have the following environment variables set:
LABEL_STUDIO_HOST=[Your LabelStudio Hostname]
LABEL_STUDIO_TOKEN=[Your LabelStudio API Token]
AZURE_STORAGE_ACCOUNT_NAME=[Azure storage account name]
AZURE_STORAGE_ACCOUNT_KEY=[Azure storage account key]
MLFLOW_TRACKING_URI=https://[MLFlow endpoint URL] (optional)
Once the environment is set, you can run the adt-eval.py tool the same way as the adt-press.py tool. By default, output is written to output/eval.
uv run adt-eval.py
This will create new reports with results against the gold standard in the output directory. Start at index.html.
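Before launching an evaluation, it can help to confirm that the variables listed above are present. The following is a small sketch of such a check; the `missing_variables` helper is hypothetical and is not part of adt-press, and it treats an empty value the same as an unset one.

```python
import os

# Names taken from the list above; MLFLOW_TRACKING_URI is documented as optional.
REQUIRED = (
    "LABEL_STUDIO_HOST",
    "LABEL_STUDIO_TOKEN",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
)


def missing_variables(env=None) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_variables()
if missing:
    print("Missing required variables:", ", ".join(missing))
else:
    print("All required evaluation variables are set.")

if not os.environ.get("MLFLOW_TRACKING_URI"):
    print("MLFLOW_TRACKING_URI is not set (optional).")
```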
Alternatively, you can configure various options from the command line. See config/eval_config.yml for a full list, and config/config.yml for global options.
# limit the run to the first 10 test cases and only the text_extraction task
uv run adt-eval.py label=eval3 eval.limit=10 eval.tasks=text_extraction
This project uses Ruff for code formatting and linting. The configuration is specified in ruff.toml.
To check code style:
uv run ruff check --fix
To format code:
uv run ruff format
Run tests with pytest:
uv run pytest
- adt_press/: Main package
  - llm/: LLM integration modules
  - nodes/: Hamilton nodes for the processing pipeline
  - utils/: Utility functions
  - models/: Data models used in adt-press
- assets/: Example files
- config/: Configuration files
- prompts/: LLM prompt templates
- templates/: HTML templates
- tests/: Test files
Build the image:
docker build -t adt-press .
Run the container:
docker run --rm adt-press
To run a specific command inside the container (for example, to execute uv run adt-press.py with a PDF file):
docker run --rm adt-press uv run adt-press.py label=raven pdf_path=assets/raven.pdf
To process your own file, replace assets/raven.pdf with the path to your PDF file inside the container (for example, /data/yourfile.pdf).
If you use Visual Studio Code, you can take advantage of the "Reopen in Container" feature for a full-featured development environment inside Docker.
This allows you to edit, run, and debug your code directly within the container.
To use this, add a .devcontainer configuration to your project and select "Reopen in Container" from the VS Code command palette.
You will need to have the Dev Containers extension installed in VS Code to use this feature.
Note:
The folder .devcontainer needs to be in the root of your project, containing a devcontainer.json file with the following content:
{
    "name": "ADT Press",
    "build": {
        // Sets the run context to one level up instead of the .devcontainer folder.
        "context": "..",
        // Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
        "dockerfile": "../Dockerfile"
    }
}
Environment Variable Required
Note:
You must set the environment variable OPENAI_API_KEY with your OpenAI API key for the application to work.
- When running the Dockerized version, you need to set the OPENAI_API_KEY variable every time you run the container. For example:
docker run --rm -e OPENAI_API_KEY=your-key-here adt-press
- When using VS Code "Reopen in Container", you can add the variable to your .env file or set it in the container terminal before running your scripts.
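Because the key has to be supplied on every container run, it may be convenient to assemble the command in a script. The sketch below combines the two docker run forms shown in this section into one argument list; the `docker_run_command` helper is hypothetical, and the sketch assumes the image was built with the adt-press tag as described above.

```python
import shlex


def docker_run_command(label: str, pdf_path: str, api_key: str, image: str = "adt-press") -> list[str]:
    """Build the argv for a one-off container run that passes the API key through."""
    return [
        "docker", "run", "--rm",
        "-e", f"OPENAI_API_KEY={api_key}",
        image,
        "uv", "run", "adt-press.py",
        f"label={label}",
        f"pdf_path={pdf_path}",
    ]


argv = docker_run_command("raven", "assets/raven.pdf", "your-key-here")
print(shlex.join(argv))
```

The list can be passed directly to subprocess.run(argv, check=True). Keeping it as a list rather than a single string avoids shell quoting problems when paths contain spaces.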
License
This project is licensed under the Apache License 2.0.