
Analyze and improve speed and memory consumption #27

@JanBenisek

Description


We had a use case at Argenta where we worked with a table of about 300 columns and ~2 million rows.
There, the preprocessing took a lot of time and, especially, a lot of memory.

What we’d need is a dataset of similar size that is close to reality (a mixture of categorical, flag, and continuous variables, with missing values), so we can measure how much memory Cobra uses and how slow it is; one way to generate such a table is sketched below.
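A minimal sketch for producing such a test table with numpy/pandas. The column counts, category levels, missing-value fraction, and the `target` column name are all assumptions, not taken from the Argenta data; note that at the full 2M × 300 size the frame itself occupies several GB, so you may want to scale `n_rows` down while iterating:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 2_000_000                     # scale down for quick iterations
n_num, n_cat, n_flag = 150, 100, 50    # ~300 columns in total

# Continuous variables
df = pd.DataFrame(
    rng.normal(size=(n_rows, n_num)),
    columns=[f"num_{i}" for i in range(n_num)],
)

# Categorical variables with a handful of levels each
for i in range(n_cat):
    df[f"cat_{i}"] = rng.choice(list("abcd"), size=n_rows)

# Binary flags
for i in range(n_flag):
    df[f"flag_{i}"] = rng.integers(0, 2, size=n_rows)

# Punch ~5% missing values into a few columns to mimic real data
for col in ("num_0", "cat_0", "flag_0"):
    df.loc[df.sample(frac=0.05, random_state=1).index, col] = np.nan

# Hypothetical binary target, needed for incidence replacement
df["target"] = rng.integers(0, 2, size=n_rows)
```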

The issue occurs in preprocessor.fit() and preprocessor.transform(), but these do a lot behind the scenes, so I am trying to pinpoint the cause (is it the binning? The incidence replacement? Maybe the data types of the intermediate tables are inefficient and take too much memory … ). A rough way to time and memory-profile both calls is sketched below.
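As a first pass, wall time plus peak RSS of each call can be sampled with memory_profiler (a third-party tool, not part of Cobra); `preprocessor`, `basetable`, and the argument lists of fit()/transform() below are placeholders to fill in with the real ones:

```python
import time
from memory_profiler import memory_usage  # pip install memory-profiler

def profile_call(label, func, *args, **kwargs):
    """Run func once; print wall time and the peak RSS (MiB) sampled during the call."""
    t0 = time.perf_counter()
    mem_samples, result = memory_usage((func, args, kwargs), retval=True, interval=0.5)
    print(f"{label}: {time.perf_counter() - t0:.1f} s, peak {max(mem_samples):.0f} MiB")
    return result

# Placeholder calls -- substitute the actual fit()/transform() arguments:
profile_call("fit", preprocessor.fit, basetable)
transformed = profile_call("transform", preprocessor.transform, basetable)
```

Once this tells us which of the two calls dominates, line-level profiling of that call (e.g. memory_profiler's `@profile` decorator on the suspect method) should narrow it down to binning, incidence replacement, or the intermediate tables.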

Once we find the cause, we can figure out how to fix it.
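If the intermediate-dtype hunch turns out to be right, one candidate fix (just a sketch, not a confirmed remedy) is to downcast numeric columns and switch low-cardinality object columns to pandas' category dtype before they enter the preprocessing:

```python
import pandas as pd

def shrink_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns and convert low-cardinality object columns to 'category'."""
    out = df.copy()
    for col in out.select_dtypes(include="float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes(include="integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include="object").columns:
        # Only worthwhile when there are few distinct values relative to row count
        if out[col].nunique(dropna=True) < 0.5 * len(out):
            out[col] = out[col].astype("category")
    return out

print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB before")
df = shrink_dtypes(df)
print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB after")
```

float64 → float32 alone halves the numeric footprint, and category columns store each distinct level only once, which can matter a lot at ~2 million rows.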


Labels: bug (Something isn't working), question (Further information is requested)
