Research-grade Turkish morphological segmentation system and dataset (roots, suffixes, POS) built from Kaikki, Zemberek, and Wikimedia; optimized for FST-based linguistics.
nlp morphology wikimedia turkish dataset lexicon computational-linguistics finite-state-transducer pos-tagging wiktionary fst zemberek morphological-segmentation kaikki morphotactics
Updated Feb 3, 2026 - Jupyter Notebook