forked from huggingface/olm-datasets
Open
Description
The cleaning function currently has no German stopword list.
Changes need to be made here: data-preparation/preprocessing/training/01a_catalogue_cleaning_and_filtering/clean_helpers/stopwords.py
NLTK ships a German stopword list (part of its 'stopwords' corpus) that can be downloaded and used for this.
Afterwards, add 'remove_references_de' and 'split_sentences_de' as parameters to the clean.py usage in the README.
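A rough sketch of how the German list could be pulled from NLTK is below. The function names `load_german_stopwords` and `remove_stopwords` are hypothetical and only for illustration. The actual layout of stopwords.py may differ, so this would need to be adapted to however the existing languages are stored there.

```python
from typing import Iterable, List, Set


def load_german_stopwords() -> Set[str]:
    """Return NLTK's German stopwords as a lowercase set (hypothetical helper)."""
    # Imported lazily so the filtering helper below works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words("german")
    except LookupError:
        # The corpus is not available locally yet: fetch it once, then retry.
        nltk.download("stopwords", quiet=True)
        words = stopwords.words("german")
    return {w.lower() for w in words}


def remove_stopwords(tokens: Iterable[str], stopword_set: Set[str]) -> List[str]:
    """Drop every token whose lowercase form is in stopword_set."""
    return [t for t in tokens if t.lower() not in stopword_set]
```

Usage would look roughly like `remove_stopwords(text.split(), load_german_stopwords())`. Loading the set once and passing it in avoids re-reading the corpus for every document.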