
Analyze and improve speed and memory consumption #27

@JanBenisek

Description


We had a use case at Argenta where we worked with a table of about 300 columns and ~2 million rows.
There, the preprocessing took a lot of time and, especially, a lot of memory.

What we’d need is a dataset of similar size that is close to reality (a mixture of categorical, flag, and continuous variables, with missing values), so we can measure how much memory Cobra uses and how slow it is; one way to generate such a table is sketched below.
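A minimal sketch for producing such a test table with numpy/pandas. The column counts, category levels, missing-value fraction, and the `target` column name are all assumptions, not taken from the Argenta data; note that at the full 2M × 300 size the frame itself occupies several GB, so you may want to scale `n_rows` down while iterating:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 2_000_000                     # scale down for quick iterations
n_num, n_cat, n_flag = 150, 100, 50    # ~300 columns in total

# Continuous variables
df = pd.DataFrame(
    rng.normal(size=(n_rows, n_num)),
    columns=[f"num_{i}" for i in range(n_num)],
)

# Categorical variables with a handful of levels each
for i in range(n_cat):
    df[f"cat_{i}"] = rng.choice(list("abcd"), size=n_rows)

# Binary flags
for i in range(n_flag):
    df[f"flag_{i}"] = rng.integers(0, 2, size=n_rows)

# Punch ~5% missing values into a few columns to mimic real data
for col in ("num_0", "cat_0", "flag_0"):
    df.loc[df.sample(frac=0.05, random_state=1).index, col] = np.nan

# Hypothetical binary target, needed for incidence replacement
df["target"] = rng.integers(0, 2, size=n_rows)
```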

The issue occurs in preprocessor.fit() and preprocessor.transform(), but these do a lot behind the scenes, so I am trying to pinpoint the cause (is it the binning? The incidence replacement? Maybe the data types of the intermediate tables are inefficient and take too much memory … ). A rough way to time and memory-profile both calls is sketched below.
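As a first pass, wall time plus peak RSS of each call can be sampled with memory_profiler (a third-party tool, not part of Cobra); `preprocessor`, `basetable`, and the argument lists of fit()/transform() below are placeholders to fill in with the real ones:

```python
import time
from memory_profiler import memory_usage  # pip install memory-profiler

def profile_call(label, func, *args, **kwargs):
    """Run func once; print wall time and the peak RSS (MiB) sampled during the call."""
    t0 = time.perf_counter()
    mem_samples, result = memory_usage((func, args, kwargs), retval=True, interval=0.5)
    print(f"{label}: {time.perf_counter() - t0:.1f} s, peak {max(mem_samples):.0f} MiB")
    return result

# Placeholder calls -- substitute the actual fit()/transform() arguments:
profile_call("fit", preprocessor.fit, basetable)
transformed = profile_call("transform", preprocessor.transform, basetable)
```

Once this tells us which of the two calls dominates, line-level profiling of that call (e.g. memory_profiler's `@profile` decorator on the suspect method) should narrow it down to binning, incidence replacement, or the intermediate tables.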

Once we find the cause, we can figure out how to fix it.
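If the intermediate-dtype hunch turns out to be right, one candidate fix (just a sketch, not a confirmed remedy) is to downcast numeric columns and switch low-cardinality object columns to pandas' category dtype before they enter the preprocessing:

```python
import pandas as pd

def shrink_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns and convert low-cardinality object columns to 'category'."""
    out = df.copy()
    for col in out.select_dtypes(include="float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes(include="integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include="object").columns:
        # Only worthwhile when there are few distinct values relative to row count
        if out[col].nunique(dropna=True) < 0.5 * len(out):
            out[col] = out[col].astype("category")
    return out

print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB before")
df = shrink_dtypes(df)
print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB after")
```

float64 → float32 alone halves the numeric footprint, and category columns store each distinct level only once, which can matter a lot at ~2 million rows.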


Labels: bug (Something isn't working), question (Further information is requested)
