
Getting Error while running csv example for my file  #136

@purnima1612


Hello all,
I am trying to run the csv example on my own file, which has 850 records. I am also trying to find duplicates using a custom function based on Levenshtein distance. The goal is to group all names that share a name match of more than 80% under one entity_num.
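To make the "more than 80% match" criterion concrete, here is a minimal sketch of a Levenshtein-based similarity score in plain Python. It is only an illustration: the function names `levenshtein` and `name_similarity`, the normalization by the longer string's length, and the sample names are assumptions, not code from the script in question or from dedupe.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical names: one edit over ten characters, so above an 0.8 threshold.
is_match = name_similarity("Jon Smith", "John Smith") > 0.8
```

Under this normalization, two names count as a match when fewer than 20% of the characters of the longer name need to be edited.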

While preparing the data I changed the sample size to 50:
deduper.prepare_training(data_d, sample_size=50)

After I finish labeling, I get the following error:


Traceback (most recent call last):
  File "C:\Python_Projects\Python_extra_code\csv_example.py", line 132, in <module>
    deduper.train()
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1215, in train
    self.predicates = self.active_learner.learn_predicates(recall, index_predicates)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 397, in learn_predicates
    return self.blocker.learn_predicates(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 136, in learn_predicates
    return self.block_learner.learn(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 72, in learn
    candidate_cover = self.random_forest_candidates(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 112, in random_forest_candidates
    sample_predicates = random.sample(predicates, pred_sample_size)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\random.py", line 453, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Process finished with exit code 1
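The last frame of the traceback is the standard library's `random.sample`, which raises this `ValueError` whenever the requested sample size is larger than the population (or negative). So in `random.sample(predicates, pred_sample_size)`, `pred_sample_size` was presumably greater than `len(predicates)`. The sketch below reproduces that behavior without dedupe; the predicate names are made-up placeholders, and `safe_sample` is a hypothetical helper shown only to illustrate capping the size, not part of dedupe.

```python
import random

# Placeholder population standing in for the candidate predicates.
predicates = ["pred_a", "pred_b", "pred_c"]

# Asking for more items than exist raises ValueError.
try:
    random.sample(predicates, 5)
    message = None
except ValueError as exc:
    message = str(exc)


def safe_sample(population, k):
    """Hypothetical helper: never request more items than are available."""
    population = list(population)
    return random.sample(population, min(k, len(population)))


capped = safe_sample(predicates, 5)  # returns all three items in random order
```

This only shows where the exception comes from; why so few predicates were available with `sample_size=50` is a separate question about the training data.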
