Open
Labels
bug (Something isn't working), question (Further information is requested)
Description
We had a use case at Argenta where we worked with a table of about 300 columns and ~2 million rows.
There, preprocessing took a lot of time and, especially, a lot of memory.
What we need is a dataset of similar size that is close to reality (a mixture of categorical, flag and continuous variables, with missing values), so that we can measure how much memory Cobra uses and how slow it is.
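Until a suitable real dataset turns up, a synthetic table could serve as a stand-in. The sketch below is only an illustration: the function names, column names, category values and missing rate are all assumptions, and the default size is kept small so it runs quickly. Raising `n_rows` and `n_cols` towards 2 million and 300 would approximate the Argenta case.

```python
import numpy as np
import pandas as pd


def make_synthetic_table(n_rows=10_000, n_cols=30, missing_rate=0.05, seed=0):
    """Build a mixed-type table (continuous, flag, categorical) with missing values.

    Hypothetical helper for benchmarking; not part of Cobra.
    """
    rng = np.random.default_rng(seed)
    columns = {}
    for i in range(n_cols):
        kind = i % 3  # cycle through the three variable types
        if kind == 0:
            name = f"cont_{i}"
            values = rng.normal(size=n_rows)
        elif kind == 1:
            name = f"flag_{i}"
            # Stored as float so that NaN can mark missing values.
            values = rng.integers(0, 2, size=n_rows).astype(float)
        else:
            name = f"cat_{i}"
            values = rng.choice(["a", "b", "c", "d"], size=n_rows).astype(object)
        # Blank out a random fraction of the values in every column.
        mask = rng.random(n_rows) < missing_rate
        values[mask] = np.nan
        columns[name] = values
    df = pd.DataFrame(columns)
    df["target"] = rng.integers(0, 2, size=n_rows)  # binary target, no missing
    return df


def memory_mb(df):
    """Total memory of a DataFrame in MiB, including string contents."""
    return df.memory_usage(deep=True).sum() / 1024**2


df = make_synthetic_table()
print(df.shape, f"{memory_mb(df):.1f} MiB")
```

Random data will not reproduce real-world cardinalities or correlations, so this is mainly useful for checking how time and memory scale with table size.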
The issue occurs in preprocessor.fit() and preprocessor.transform(), but both do a lot behind the scenes, so I am trying to pinpoint the cause. Is it the binning? The incidence replacement? Or maybe the data types of the intermediate tables are inefficient and take too much memory?
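One way to narrow this down might be to time each step and record its peak memory separately. The following is a minimal sketch using only the standard library; `profile_step` and `build_list` are hypothetical names, and the wrapped callable here is a stand-in rather than an actual Cobra call. Note that `tracemalloc` only reports allocations it can trace, so the numbers are approximate.

```python
import time
import tracemalloc


def profile_step(label, func, *args, **kwargs):
    """Run func(*args, **kwargs); return its result plus time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    stats = {"step": label, "seconds": elapsed, "peak_mb": peak / 1024**2}
    return result, stats


# Stand-in workload; in practice the callable would be the step under
# investigation, e.g. the bound methods preprocessor.fit or preprocessor.transform.
def build_list(n):
    return [float(i) for i in range(n)]


_, stats = profile_step("build_list", build_list, 200_000)
print(stats)
```

Wrapping fit and transform separately, and then the individual steps inside them (binning, incidence replacement), should show where most of the time and memory goes.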
Once we find the cause, we can figure out how to fix it.