This repository contains the official evaluation implementation for the ICDAR'25 MapText competition.
(Evaluation for the ICDAR'24 MapText competition is available through repository tag `icdar-2024`.)
Although closely related to previous competitions on robust reading (e.g., ICDAR19-ArT) and document layout analysis (e.g., ICDAR23-HierText), detecting and recognizing text on maps poses new challenges such as complex backgrounds, dense text labels, and various font sizes and styles. The competition features two primary tasks---text detection and end-to-end text recognition---each with a secondary task of linking words into phrase blocks.
Installation depends on whether one is using Conda, VirtualEnv with Pip, or Pipenv. Note that the Robust Reading Challenge (RRC) server hosting the competition currently uses Python 3.9.18; although newer versions should also work, using the same version (as specified below) is more likely to produce matching results.
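As a quick sanity check (a minimal, hypothetical snippet, not part of the repository), you can confirm that your local interpreter is on the same 3.9 line as the server before installing:

```python
import sys

# The RRC server runs Python 3.9.18; staying on the 3.9 line makes it
# more likely that local scores match the server's.
if sys.version_info[:2] != (3, 9):
    print(f"Warning: Python {sys.version.split()[0]} detected; "
          "the RRC server uses 3.9.18, so results may differ slightly.")
```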
If using conda, create a conda environment with the requisite versions and activate it:

```shell
conda env create -f conda-environment.yaml
conda activate maptext-eval
```

If using Pipenv, make sure you have Pipenv (along with Pyenv) installed. Pipenv is a more mature virtual environment manager, and Pyenv makes it possible to install any version of Python.
You can install all dependencies and open a shell in the new virtual environment with the following commands:
```shell
pipenv install
pipenv shell
```

If using virtualenv, create a virtualenv environment and install the requisite versions:
```shell
virtualenv -p /usr/bin/python3.9 maptext-eval
source maptext-eval/bin/activate
pip install -r requirements.txt
```

To run from the command line:
```shell
python3 eval.py --gt GT.json --pred YOURFILE.json --task TASK
```

The options for `TASK` are `det` (Task 1), `detedges` (Task 2), `detrec` (Task 3), or `detrecedges` (Task 4).
For other options (including writing per-image results to a persistent output file), use `--help`.
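To evaluate one prediction file under every task at once, a small driver script like the sketch below works (it assumes, as in the example later in this README, that `eval.py` writes its JSON result to standard output; the file names are placeholders):

```python
import subprocess

# Hypothetical sweep: run the evaluator once per task on the same files.
for task in ["det", "detedges", "detrec", "detrecedges"]:
    result = subprocess.run(
        ["python3", "eval.py", "--gt", "GT.json",
         "--pred", "YOURFILE.json", "--task", task],
        capture_output=True, text=True, check=True,
    )
    print(f"--- {task} ---")
    print(result.stdout.strip())
```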
| Output Key | Metric | Tasks |
|---|---|---|
| `recall` | Fraction of ground truth words (not ignored) that are true positives | all |
| `precision` | Fraction of predicted words (not ignored) that are true positives | all |
| `fscore` | Harmonic mean of recall and precision | all |
| `tightness` | Average IoU among true positive words | all |
| `quality` | Panoptic Quality (PQ); product of fscore and tightness | all |
| `char_accuracy` | 1 − average NED (Normalized Edit Distance) among true positive words | detrec, detrecedges |
| `char_quality` | Panoptic Character Quality (PCQ); product of quality and char_accuracy | detrec, detrecedges |
| `edges_recall` | Fraction of ground truth word links (not ignored) that are true positives | detedges, detrecedges |
| `edges_precision` | Fraction of predicted word links (not ignored) that are true positives | detedges, detrecedges |
| `edges_fscore` | Harmonic mean of edges_recall and edges_precision | detedges, detrecedges |
| `hmean` | Harmonic mean of all quantities for the task-specific evaluation | all |
See Competition Tasks for additional details and metric definitions.
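For intuition, the sketch below shows how the composite metrics in the table combine the primitive ones. It is illustrative only; the function and variable names (`composite_metrics`, `true_pos`, and so on) are hypothetical and do not mirror the repository's internals:

```python
def harmonic_mean(*values):
    # Harmonic mean of the given values; zero if any value is zero.
    return len(values) / sum(1 / v for v in values) if all(values) else 0.0

def composite_metrics(true_pos, num_gt, num_pred):
    """Illustrative only. `true_pos` is a list of (iou, ned) pairs, one per
    matched word: the IoU with its ground truth and the normalized edit
    distance between the transcriptions. `num_gt` and `num_pred` count the
    non-ignored ground-truth and predicted words."""
    recall = len(true_pos) / num_gt
    precision = len(true_pos) / num_pred
    fscore = harmonic_mean(recall, precision)
    tightness = sum(iou for iou, _ in true_pos) / len(true_pos)
    quality = fscore * tightness                    # Panoptic Quality (PQ)
    char_accuracy = 1 - sum(ned for _, ned in true_pos) / len(true_pos)
    char_quality = quality * char_accuracy          # Panoptic Character Quality (PCQ)
    return {"recall": recall, "precision": precision, "fscore": fscore,
            "tightness": tightness, "quality": quality,
            "char_accuracy": char_accuracy, "char_quality": char_quality}
```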
Command:
```shell
python3 eval.py --gt data/example_gt.json --pred data/example_pred.json --task detrecedges
```

Output:
{"recall": 0.875,
"precision": 1.0,
"fscore": 0.9333333333333333,
"tightness": 0.7915550491751623,
"quality": 0.7387847125634848,
"hmean": 0.6939804663478768,
"char_accuracy": 0.8067226890756303,
"char_quality": 0.595994389967181,
"edges_recall": 0.3333333333333333,
"edges_precision": 1.0,
"edges_fscore": 0.5}After downloading sample data (below) and producing predictions:
After downloading sample data (below) and producing predictions:

```shell
python3 eval.py --gt sample.json --pred YOUROUTPUT.json \
    --task det --gt-regex sample
```

See competition downloads for data details and tasks for file format details.
- General Rumsey Data Set
  - Train Split (200 tiles from 196 maps)
  - Validation Split (40 tiles from 40 maps)
- French Cadastre Data Set
  - Train Split (80 tiles from 37 maps)
  - Validation Split (15 tiles from 9 maps)
- Taiwanese Historical Data Set
  - Train Split (1,478 tiles from 169 maps)
  - Validation Split (166 tiles from 30 maps)
- Competition Test Data
  - General Rumsey Data Set (700 tiles from 700 maps)
  - French Cadastre Data Set (coming soon)
  - Taiwanese Historical Data Set (coming soon)
- Sample Data Set
  - General Rumsey Data Set (353 tiles from 31 maps of 9 atlases)