Cythonize Stuff for Performance #4

@chipmonkey

Building 3 mindexes on 10,000,000 records takes 28.926903247833252 seconds.
enn originally took 0.9821538925170898 seconds for a single query.
enne currently takes 0.1329345703125 seconds for a single query.

By comparison, on the same data using sklearn:
time to fit ball_tree: 15.148420572280884 seconds
time to query ball_tree: 0.0006785392761230469 seconds
time to fit kd_tree: 13.99501895904541 seconds
time to query kd_tree: 0.0007152557373046875 seconds
time to fit brute: 0.002969503402709961 seconds
time to query brute: 0.3765408992767334 seconds

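So we clearly have room for improvement: a single enne query (about 0.13 seconds) is close to 200 times slower than a ball_tree or kd_tree query (about 0.0007 seconds), although it already beats the brute-force query. Assuming that the algorithm holds up and that some of the performance cost is inherent to Python, the goal of this issue is to cythonize the core algorithms.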

See, for example: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/_ball_tree.pyx
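
To make the "inherent to Python" point concrete, here is a minimal sketch of a brute-force nearest-neighbor query written as plain Python loops and timed with `time.perf_counter`. The names `sq_dist`, `brute_nearest`, and `timed` and the small random dataset are hypothetical and are not taken from this repository. A tight numeric loop like this is the kind of code that usually gains the most from static typing in Cython.

```python
import random
import time


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def brute_nearest(points, query):
    """Return (index, squared distance) of the point closest to query."""
    best_i, best_d = -1, float("inf")
    for i, p in enumerate(points):
        d = sq_dist(p, query)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def timed(fn, *args):
    """Call fn(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical data: 10,000 random 3-dimensional points.
    rng = random.Random(0)
    pts = [[rng.random() for _ in range(3)] for _ in range(10_000)]
    (idx, dist), secs = timed(brute_nearest, pts, [0.5, 0.5, 0.5])
    print(f"nearest index {idx}, time: {secs} seconds")
```

Timing the same function before and after a Cython port, on the same data, would give a like-for-like measure of how much of the gap comes from interpreter overhead rather than from the algorithm.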
