Translate FollowTheMoney Document entities via Apertium or Argos

openaleph/ftm-translate

ftm-translate

Local translations for FollowTheMoney Documents.

Translates bodyText and stores results in translatedText and translatedLanguage properties.
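As a rough illustration of the property mapping described above, the following sketch works on a plain FollowTheMoney-style entity dict. It is not the package's own implementation: `translate_entity` and `fake_translate` are hypothetical names, and the exact form of the language code stored in translatedLanguage is an assumption.

```python
from typing import Callable


def translate_entity(entity: dict, translate: Callable[[str], str], target: str = "en") -> dict:
    """Return a copy of `entity` with translations of its bodyText values added."""
    # Copy the property lists so the input entity is left untouched.
    props = {key: list(values) for key, values in entity.get("properties", {}).items()}
    bodies = props.get("bodyText", [])
    if bodies:
        props["translatedText"] = [translate(text) for text in bodies]
        # Assumption: the target language code is stored as given.
        props["translatedLanguage"] = [target]
    return {**entity, "properties": props}


# Stand-in translator for demonstration only; a real engine would go here.
def fake_translate(text: str) -> str:
    return {"Hallo Welt": "Hello World"}.get(text, text)


doc = {"id": "doc-1", "schema": "Document", "properties": {"bodyText": ["Hallo Welt"]}}
result = translate_entity(doc, fake_translate, target="en")
print(result["properties"]["translatedText"])  # ['Hello World']
```

The original bodyText is kept, so the source text and its translation sit side by side on the same entity.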

Installation

pip install "ftm-translate[argos]"   # or "ftm-translate[apertium]"

The apertium extra requires Apertium itself to be installed on the system separately.

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| FTM_TRANSLATE_ENGINE | argos | Translation engine (argos or apertium) |
| FTM_TRANSLATE_SOURCE_LANGUAGE | - | Source language (ISO 639-1) |
| FTM_TRANSLATE_TARGET_LANGUAGE | en | Target language (ISO 639-1) |
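To make the defaults concrete, here is a minimal sketch of how these variables could be resolved in Python. How ftm-translate reads them internally is not shown in this README; `load_settings` is a hypothetical helper, and only the variable names and defaults come from the table above.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the documented variables, falling back to the documented defaults."""
    engine = env.get("FTM_TRANSLATE_ENGINE", "argos")
    if engine not in ("argos", "apertium"):
        raise ValueError(f"Unknown translation engine: {engine!r}")
    return {
        "engine": engine,
        # No default is documented for the source language, so it may be None.
        "source_language": env.get("FTM_TRANSLATE_SOURCE_LANGUAGE"),
        "target_language": env.get("FTM_TRANSLATE_TARGET_LANGUAGE", "en"),
    }


# Only the source language is set; engine and target fall back to defaults.
settings = load_settings({"FTM_TRANSLATE_SOURCE_LANGUAGE": "de"})
print(settings)
# {'engine': 'argos', 'source_language': 'de', 'target_language': 'en'}
```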

CLI Usage

ftm-translate --help

Translate entities:

ftm-translate entities -i entities.json -o translated.json -s de -t en
ftm-translate entities -i https://data.example.org/entities.ftm.json -s de

Translate text:

echo "Hallo Welt" | ftm-translate text -s de -t en
ftm-translate text -i input.txt -o output.txt -s de -e apertium

Options: -s source language, -t target language (default: en), -e engine, -i input, -o output.
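The text command can also be driven from a Python script. The sketch below only wraps the CLI flags listed above with `subprocess`; `build_text_command` and `translate_text` are hypothetical helpers, and it assumes that, as in the `echo` example, the command reads from stdin and writes to stdout when -i and -o are omitted.

```python
import subprocess
from typing import List, Optional


def build_text_command(source: str, target: str = "en", engine: Optional[str] = None) -> List[str]:
    """Build the argument list for `ftm-translate text` from the documented flags."""
    cmd = ["ftm-translate", "text", "-s", source, "-t", target]
    if engine is not None:
        cmd += ["-e", engine]
    return cmd


def translate_text(text: str, source: str, target: str = "en", engine: Optional[str] = None) -> str:
    """Pipe `text` through the CLI (assumes stdin/stdout are used without -i/-o)."""
    proc = subprocess.run(
        build_text_command(source, target, engine),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout


print(build_text_command("de", engine="apertium"))
# ['ftm-translate', 'text', '-s', 'de', '-t', 'en', '-e', 'apertium']
```

With ftm-translate installed, `translate_text("Hallo Welt", source="de")` would then run the same translation as the `echo` example.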

OpenAleph Worker

To integrate into the OpenAleph ingest pipeline using openaleph-procrastinate:

Install dependencies:

pip install "ftm-translate[openaleph]"

Run the worker:

PROCRASTINATE_APP=ftm_translate.tasks.app procrastinate worker -q translate

Queue name: translate
Task identifier: ftm_translate.tasks.translate

Benchmark

Comparison of Argos and Apertium on German → English translation (10 random Wikipedia articles, 3 rounds):

| Mode | Engine | Throughput | Speed (relative to Argos) |
| --- | --- | --- | --- |
| Full text (~11k chars) | Argos | 1,081 chars/sec | 1x |
| Full text (~11k chars) | Apertium | 8,736 chars/sec | 8.1x |
| Sentences (75 sentences) | Argos | 895 chars/sec | 1x |
| Sentences (75 sentences) | Apertium | 1,145 chars/sec | 1.3x |

Apertium is about 8x faster than Argos for full-text translation. The gap narrows to about 1.3x for sentence-by-sentence translation due to subprocess overhead.
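The relative speeds follow directly from the measured throughput figures. As a quick check, this small sketch recomputes them from the numbers in the table; the 11,000-character document size used for the time estimate is a rounded assumption based on the "~11k chars" label.

```python
# Throughput in characters per second, copied from the benchmark table.
throughput = {
    ("full text", "argos"): 1081,
    ("full text", "apertium"): 8736,
    ("sentences", "argos"): 895,
    ("sentences", "apertium"): 1145,
}


def speedup(mode: str) -> float:
    """Apertium throughput divided by Argos throughput for one mode."""
    return throughput[(mode, "apertium")] / throughput[(mode, "argos")]


def seconds_for(chars: int, mode: str, engine: str) -> float:
    """Estimated time to translate `chars` characters at the measured rate."""
    return chars / throughput[(mode, engine)]


for mode in ("full text", "sentences"):
    print(f"{mode}: {speedup(mode):.1f}x")
# full text: 8.1x
# sentences: 1.3x

# Assumed document size of 11,000 characters (the table says "~11k chars").
print(f"{seconds_for(11_000, 'full text', 'argos'):.1f} s")     # 10.2 s
print(f"{seconds_for(11_000, 'full text', 'apertium'):.1f} s")  # 1.3 s
```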

Run benchmark:

python contrib/benchmark.py -n 10 -r 3
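For readers who want to time their own translator callable, throughput in chars/sec can be measured along these lines. This is a generic sketch, not the contents of contrib/benchmark.py: `measure_throughput` is a hypothetical helper, and counting input characters (rather than output characters) is an assumption.

```python
import time
from typing import Callable, Iterable


def measure_throughput(
    translate: Callable[[str], str],
    texts: Iterable[str],
    rounds: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return input characters translated per second, summed over all rounds."""
    texts = list(texts)
    total_chars = 0
    total_seconds = 0.0
    for _ in range(rounds):
        for text in texts:
            start = clock()
            translate(text)
            total_seconds += clock() - start
            total_chars += len(text)
    # Guard against a zero elapsed time on very fast or coarse clocks.
    return total_chars / total_seconds if total_seconds > 0 else float("inf")


# Stand-in "translator" that only upper-cases; swap in a real engine call.
rate = measure_throughput(str.upper, ["Hallo Welt", "Guten Morgen"], rounds=3)
print(f"{rate:,.0f} chars/sec")
```

Passing the same list once as full documents and once split into sentences would reproduce the two modes compared in the table.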

Acknowledgements

This project is inspired by preliminary work by, and valuable knowledge exchange with, the International Consortium of Investigative Journalists (ICIJ), whose tech team built ES Translator.

License

ftm-translate, (C) 2026 Data and Research Center – DARC

Licensed under AGPLv3+. See NOTICE and LICENSE.
