This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Performance problems with the multi-process implementation of a block of code #28

@wabomo

Description

In the "inside_polygon" function of "cytopy/data/geometry.py", the performance of multi process implementation is very poor. A 10000 line of data takes nearly 30 seconds, but it only takes less than 0.04 seconds to change to normal programming.For the time being, I simply handle it like this, adding a row count judgment.Maybe you have a better way to deal with it.

# Module-level imports this snippet relies on
from functools import partial
from multiprocessing import Pool, cpu_count

if len(df) < 100000:  # row-count check: small frames run single-threaded
    # Single-threaded implementation: cheap bounding-box prefilter,
    # then the exact point-in-polygon test
    xy = df[[x, y]].values
    (min_x, min_y, max_x, max_y) = poly.bounds
    mask = []
    for p in xy:
        inside = min_x <= p[0] <= max_x and min_y <= p[1] <= max_y and point_in_poly(p, poly)
        mask.append(inside)
else:
    # Multi-process implementation: one point_in_poly task per point
    if njobs < 0:
        njobs = cpu_count()
    xy = df[[x, y]].values
    f = partial(point_in_poly, poly=poly)
    with Pool(njobs) as pool:
        mask = list(pool.map(f, xy))
return df.iloc[mask]
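
The slowdown likely comes from per-point task overhead: each element of xy is pickled and sent to a worker individually, so interprocess communication dominates the actual point-in-polygon test. Passing a chunksize to pool.map (e.g. pool.map(f, xy, chunksize=max(1, len(xy) // njobs))) would amortise that overhead, but the test can also be vectorised outright. A minimal sketch, assuming poly is a shapely Polygon (which the poly.bounds call suggests) and using matplotlib's Path as a stand-in for point_in_poly; inside_polygon_vectorized is a hypothetical helper name:

import numpy as np
from matplotlib.path import Path

def inside_polygon_vectorized(df, x, y, poly):
    # Hypothetical replacement: test all points at once instead of
    # dispatching one point_in_poly call per row.
    xy = df[[x, y]].values
    # Path.contains_points evaluates the whole array in compiled code;
    # poly.exterior.coords assumes a shapely Polygon without holes.
    mask = Path(np.asarray(poly.exterior.coords)).contains_points(xy)
    return df.iloc[mask]

On 10,000 rows this should run well under the single-threaded loop's 0.04 seconds, which would remove the need for the row-count threshold altogether.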

My configuration:
Python: 3.8.7
Memory: 32 GB
CPU: i7-1165G7 (4 cores / 8 threads)
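
For reference, a quick way to reproduce the timings above. The inside_polygon signature is assumed here to be (df, x, y, poly); adjust to match the actual function in cytopy/data/geometry.py:

import time
import numpy as np
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical benchmark data: 10,000 random points, unit-square polygon
df = pd.DataFrame(np.random.rand(10_000, 2), columns=["x", "y"])
poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

start = time.perf_counter()
result = inside_polygon(df, "x", "y", poly)  # function under discussion
print(f"{len(df)} rows -> {time.perf_counter() - start:.3f}s, {len(result)} inside")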
