This repository houses the evaluation codebase, results, and reports for the AAPB-CLAMS collaboration project. Evaluations are conducted using individual CLAMS Apps or pipelines of CLAMS Apps that produce evaluable results for various video metadata extraction tasks.
Each subdirectory in this repository represents a distinct evaluation task. Each task directory contains its own set of evaluation-specific files, including code, documentation, and configuration.
All evaluation methodology documentation must be written in the Python docstring of the evaluation class. These docstrings automatically appear in the "Evaluation method" section of generated reports via the `write_report()` method.
No subdirectory README.md files are allowed. Instead:
- Document evaluation methodology in the class docstring
- Document input/output formats in the class docstring
- Document metrics and algorithms in the class docstring
- Use `--help` for CLI usage information
This policy ensures documentation stays synchronized with code and appears in generated reports.
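As a loose illustration of this policy, the sketch below shows a hypothetical evaluation class whose docstring carries the methodology, input/output, and metric documentation, plus a `write_report()` method that copies that docstring into the report's "Evaluation method" section. The class name, metric description, and report layout are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical sketch only: the class name, docstring contents, and report
# layout are illustrative assumptions, not the repository's implementation.
import inspect


class SlateDetectionEvaluation:
    """Evaluate slate detection predictions against gold annotations.

    Input: gold `.csv` files with annotated slate time spans and predicted
    `.mmif` files produced by a CLAMS workflow.

    Metric: temporal overlap between gold and predicted slate spans.
    """

    def write_report(self, path: str) -> None:
        """Write a Markdown report; the class docstring becomes the 'Evaluation method' section."""
        method_doc = inspect.cleandoc(self.__doc__ or "")
        with open(path, "w") as report:
            report.write("# Evaluation report\n\n")
            report.write("## Evaluation method\n\n")
            report.write(method_doc + "\n")


SlateDetectionEvaluation().write_report("report.md")
```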
- "gold" - The gold standard, human-annotated files against which app predictions are evaluated.
- Also known as "reference", "ground truth", or "gold standard."
- Typically use formats like
.tsv,.csv, or.txt.
- "pred" - Files generated by CLAMS workflows containing predicted annotations of the phenomena to be evaluated (e.g., time durations for slate detection).
- Also known as "test", "system", or "output" files.
- Primarily
.mmiffiles, occasionally.json.
- Both gold and prediction files must be placed in their respective directories. The evaluation scripts require two parameters:
path-to-gold-directoryandpath-to-pred-directory. - The internal format of gold files is determined by the processing code in the
aapb-annotationsrepository. Prediction files are generated by CLAMS Apps.
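As a rough sketch of how these inputs might be read, the snippet below walks a gold directory and a prediction directory. The directory names and gold column layout are assumptions, and because MMIF is JSON-based the predictions are loaded with `json` purely for illustration; real evaluation code would typically use the mmif-python library instead.

```python
# Illustrative only: directory layout, file names, and the gold TSV columns
# are assumptions, not the format of any specific evaluation task.
import csv
import json
from pathlib import Path

gold_dir = Path("golds")
pred_dir = Path("preds")

# Gold files: human annotations, e.g. tab-separated rows per file.
gold = {}
for tsv in gold_dir.glob("*.tsv"):
    with tsv.open() as f:
        gold[tsv.stem] = [row for row in csv.reader(f, delimiter="\t")]

# Prediction files: MMIF is JSON-based, so json.load suffices for a quick look;
# evaluation code would normally use the mmif-python package.
preds = {}
for mmif_path in pred_dir.glob("*.mmif"):
    with mmif_path.open() as f:
        preds[mmif_path.stem] = json.load(f)

print(f"loaded {len(gold)} gold files and {len(preds)} prediction files")
```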
All evaluation scripts generate a Markdown-based report file summarizing the evaluation results. Additionally, script developers can choose to output:
- Intermediate result files
- Final output files
- Side-by-side visualizations
- Confusion matrices
- Other useful artifacts
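As a small, purely hypothetical example of such a side artifact, the snippet below writes a confusion matrix to a CSV file alongside the report; the labels and counts are dummy values, not results from any evaluation.

```python
# Hypothetical side artifact: a confusion matrix written as CSV next to the
# Markdown report. Labels and counts are dummy placeholders.
import csv

labels = ["slate", "non-slate"]
confusion = [[0, 0], [0, 0]]  # rows = gold label, columns = predicted label

with open("confusion_matrix.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([""] + labels)
    for label, row in zip(labels, confusion):
        writer.writerow([label] + row)
```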
- Select a gold dataset from the `aapb-annotations` repository, specifying a batch name.
- Run a CLAMS workflow to generate prediction (`.mmif`) files locally.
- Execute the evaluation script. The common command-line pattern is:

```
python -m EvaluationTask.evaluate [--batchname <BATCHNAME>] --golds <GOLD_LOCATION> --preds <PRED_LOCATION> --export <MARKDOWN_REPORT_PATH>
```

Individual evaluation tasks may include additional parameters for extended functionality or output options. Refer to each evaluation task's `--help` documentation for specifics.
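For context, here is a minimal sketch of how an evaluation module could expose the common flags above using `argparse`; the actual per-task argument handling may differ.

```python
# Sketch of the common CLI pattern using argparse; the real evaluate modules
# may define or extend these arguments differently per task.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Run an evaluation task.")
    parser.add_argument("--batchname",
                        help="name of the gold annotation batch (optional)")
    parser.add_argument("--golds", required=True,
                        help="path to the directory of gold annotation files")
    parser.add_argument("--preds", required=True,
                        help="path to the directory of predicted .mmif files")
    parser.add_argument("--export", required=True,
                        help="path for the generated Markdown report")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(args.golds, args.preds, args.export)
```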