Hello all,
I am trying to run csv_example on my own file, which has 850 records. I am also trying to find duplicates with a custom comparator based on Levenshtein distance. The goal is to group all names that match by more than 80% under a single entity_num.
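For reference, the kind of Levenshtein-based comparator described above could look like the sketch below. It is a minimal, hypothetical helper (the names `levenshtein`, `name_distance` and `is_same_entity` are mine, not part of dedupe), and it assumes "80% match" means a normalized similarity of 1 - distance / length of the longer name. How a function like this is registered as a custom field depends on your dedupe version, so check its variable-definition docs.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def name_distance(a: str, b: str) -> float:
    """Hypothetical comparator: normalized distance in [0, 1], 0.0 = identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def is_same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumed rule: names belong together if similarity exceeds the threshold."""
    return 1.0 - name_distance(a, b) > threshold


print(name_distance("Jon Smith", "John Smith"))   # 0.1
print(is_same_entity("Jon Smith", "John Smith"))  # True
```

Returning a distance (lower means more similar) rather than a similarity keeps the comparator consistent with the "0 if same" convention of simple custom comparators.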
While preparing the data, I changed the sample size to 50:
deduper.prepare_training(data_d, sample_size=50)
After I finish labeling, I get the following error:
Traceback (most recent call last):
File "C:\Python_Projects\Python_extra_code\csv_example.py", line 132, in <module>
deduper.train()
File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1215, in train
self.predicates = self.active_learner.learn_predicates(recall, index_predicates)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 397, in learn_predicates
return self.blocker.learn_predicates(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 136, in learn_predicates
return self.block_learner.learn(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 72, in learn
candidate_cover = self.random_forest_candidates(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 112, in random_forest_candidates
sample_predicates = random.sample(predicates, pred_sample_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Dev\Python3.11\Lib\random.py", line 453, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
Process finished with exit code 1
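The last frames show that the failure comes from the standard library: `random.sample(population, k)` raises this exact `ValueError` whenever `k` is larger than the population. In `training.py` the population is the `predicates` list, so the traceback suggests that fewer candidate predicates were available than `pred_sample_size` requested. The sketch below only reproduces that stdlib behaviour with a hypothetical three-item list and shows a clamped alternative (`safe_sample` is illustrative, not a dedupe function).

```python
import random


def safe_sample(population, k):
    """Return up to k distinct items; clamp k instead of raising ValueError."""
    population = list(population)
    k = max(0, min(k, len(population)))
    return random.sample(population, k)


predicates = ["pred_a", "pred_b", "pred_c"]  # hypothetical: only 3 candidates

error_message = ""
try:
    random.sample(predicates, 5)  # asks for more items than exist
except ValueError as exc:
    error_message = str(exc)

print(error_message)                     # Sample larger than population or is negative
print(len(safe_sample(predicates, 5)))   # 3
```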