The Python cleaning script data-preparation/preprocessing/training/01a_catalogue_cleaning_and_filtering/clean.py
currently uses only the train split. It needs to iterate over all splits and apply the filters to each of them.
Hint: simply dropping the split selection from load_from_disk(dataset_path)['train'] by deleting the square brackets will not work, because you then get a DatasetDict object instead of a Dataset. As a consequence, dataset.select() is not available, because that method only exists on the Dataset type.
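
One possible way to handle this is to loop over the splits and run the existing per-split logic on each one. The sketch below is not the actual code in clean.py: `apply_to_each_split`, `keep_first_rows`, and `main` are hypothetical names, and `keep_first_rows` only stands in for whatever filters and `select()` calls the script applies today. It assumes `load_from_disk` returns a `DatasetDict`, as described above, and relies on `DatasetDict` being a `dict` subclass so that the result can be rebuilt with the same type.

```python
def apply_to_each_split(splits, clean_split):
    """Run clean_split on every split and return a container of the same type.

    `splits` is expected to be dict-like (for example a datasets.DatasetDict),
    mapping split names such as "train" or "validation" to a Dataset.
    """
    cleaned = {name: clean_split(split) for name, split in splits.items()}
    # Rebuild the same container type so a DatasetDict stays a DatasetDict.
    return type(splits)(cleaned)


def keep_first_rows(split, limit=1000):
    """Hypothetical stand-in for the per-split cleaning done in clean.py.

    select() is called on a single split (a Dataset), where the method exists.
    """
    return split.select(range(min(limit, len(split))))


def main(dataset_path):
    # Imported here so the helpers above do not depend on the library.
    from datasets import load_from_disk

    dataset_dict = load_from_disk(dataset_path)  # no ['train'] indexing
    cleaned = apply_to_each_split(dataset_dict, keep_first_rows)
    for name, split in cleaned.items():
        print(name, len(split))
    return cleaned
```

For steps that are pure filters, `DatasetDict` also offers its own `filter()` and `map()` methods that run on every split, so the explicit loop is mainly needed for `Dataset`-only methods such as `select()`.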