AAPB Evaluations

This repository houses the evaluation codebase, results, and reports for the AAPB-CLAMS collaboration project. Evaluations are conducted using individual CLAMS Apps or pipelines of CLAMS Apps that produce evaluable results for various video metadata extraction tasks.

Structure of This Repository

Each subdirectory in this repository represents a distinct evaluation task. Each task directory contains its own set of evaluation-specific files, including code, documentation, and configuration.

Documentation Policy

All evaluation methodology documentation must be written in the Python docstring of the evaluation class. These docstrings automatically appear in the "Evaluation method" section of generated reports via the write_report() method.

No subdirectory README.md files are allowed. Instead:

  • Document evaluation methodology in the class docstring
  • Document input/output formats in the class docstring
  • Document metrics and algorithms in the class docstring
  • Use --help for CLI usage information

This policy ensures documentation stays synchronized with code and appears in generated reports.
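
As a sketch of this policy, a hypothetical evaluation class might look like the following. The class name, the docstring text, and the write_report() signature are illustrative assumptions, not copied from an actual task; only the reliance on the class docstring and the "Evaluation method" report section come from the policy above.

    class SlateDetectionEvaluator:
        """Evaluate slate detection predictions against gold annotations.

        Evaluation method: gold time spans are read from .tsv files and
        predicted TimeFrame annotations from .mmif files; spans are matched
        by temporal overlap, and precision, recall, and F1 are reported.
        """

        def write_report(self, export_path):
            # The class docstring above is what gets rendered into the
            # report's "Evaluation method" section; the exact signature
            # here is an assumption and varies by task.
            ...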

Inputs to Evaluations

  • "gold" - The gold standard, human-annotated files against which app predictions are evaluated.
    • Also known as "reference", "ground truth", or "gold standard."
    • Typically use formats like .tsv, .csv, or .txt.
  • "pred" - Files generated by CLAMS workflows containing predicted annotations of the phenomena to be evaluated (e.g., time durations for slate detection).
    • Also known as "test", "system", or "output" files.
    • Primarily .mmif files, occasionally .json.
  • Both gold and prediction files must be placed in their respective directories. The evaluation scripts require two parameters: path-to-gold-directory and path-to-pred-directory.
  • The internal format of gold files is determined by the processing code in the aapb-annotations repository. Prediction files are generated by CLAMS Apps.
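
A typical on-disk layout for these inputs might look like the sketch below; the directory and file names are placeholders, and only the gold/pred split and the file extensions follow the conventions above.

    golds/
        batch-item-01.tsv
        batch-item-02.tsv
    preds/
        batch-item-01.mmif
        batch-item-02.mmif

The evaluation script is then pointed at golds/ and preds/ through its two directory parameters.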

Outputs from Evaluations

All evaluation scripts generate a Markdown-based report file summarizing the evaluation results. Additionally, script developers can choose to output:

  • Intermediate result files
  • Final output files
  • Side-by-side visualizations
  • Confusion matrices
  • Other useful artifacts
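
The generated report is plain Markdown. A minimal report might be organized roughly as follows; apart from the "Evaluation method" section, which is rendered from the evaluation class docstring, the section names and layout here are illustrative rather than prescribed.

    # Evaluation report: <task name>

    ## Evaluation method
    (rendered from the evaluation class docstring)

    ## Results
    (per-item and aggregate metrics, plus any of the optional artifacts listed above)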

Workflow of Evaluations

  1. Select a gold dataset from the aapb-annotations repository, specifying a batch name.
  2. Run a CLAMS workflow to generate prediction (.mmif) files locally.
  3. Execute the evaluation script. The common command-line pattern is:
    python -m EvaluationTask.evaluate [--batchname <BATCHNAME>] --golds <GOLD_LOCATION> --preds <PRED_LOCATION> --export <MARKDOWN_REPORT_PATH>
    Individual evaluation tasks may include additional parameters for extended functionality or output options. Refer to each evaluation task's --help documentation for specifics.
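    For example, a run for a hypothetical task directory named slate_detection might look like this (the module name, batch name, and paths below are placeholders, not values taken from the repository):
    python -m slate_detection.evaluate --batchname example-batch --golds ./golds --preds ./preds --export ./reports/slate-detection-report.md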

Instructions to Run Apps

  • CLAMS Apps Manual
  • TestDrive Instructions (Alternate)
