bandsample function drops last sample with "non-binary" stepsize #176

@kuchenrolle

What you were trying to do

Create a bandsample from a population of a target size.

What actually happened

If the step (sum_of_frequencies / sample_size) cannot be represented exactly as a binary floating-point number, the last item is not added to the sample, because rounding error makes the final "accumulator >= step" comparison fail. The same thing might happen at other points in the loop; I am not sure whether that is a problem in practice.

I can work around this by replacing the "accumulator >= step" comparisons with "round(accumulator, 3) >= round(step, 3)", but that seems fishy; presumably there is a better way.
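One possible alternative to rounding, sketched here under assumptions: keep the step and the accumulator as exact rationals with the standard-library fractions.Fraction, so the final comparison is not affected by rounding. The function count_selected below is a hypothetical, simplified stand-in for the accumulator loop described above, not pyndl's actual implementation; it only counts how often the accumulator reaches the step.

```python
from fractions import Fraction


def count_selected(frequencies, sample_size, exact=False):
    """Hypothetical, simplified accumulator loop (not pyndl's real code).

    Counts how many times the running accumulator reaches the step.
    """
    total = sum(frequencies)
    # Assumption: step is total frequency divided by the target sample size.
    step = Fraction(total, sample_size) if exact else total / sample_size

    accumulator = 0
    selected = 0
    for freq in sorted(frequencies):  # lowest to highest frequency
        accumulator += freq
        if accumulator >= step:
            selected += 1
            accumulator -= step
    return selected


freqs = [4, 4, 2, 2, 1, 1]  # frequencies from the population above

# Float step: 14 / 3 is rounded up slightly, so the last comparison fails.
float_count = count_selected(freqs, 3)
# Exact step: Fraction(14, 3) is reached exactly on the last item.
exact_count = count_selected(freqs, 3, exact=True)

print(float_count, exact_count)
```

With the float step, the third comparison falls just short and only two items are counted; with the exact step, all three are. Whether exact rationals are fast enough for large populations is a separate question; comparing with a small tolerance (for example via math.isclose) would be another option.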

How to reproduce

from pyndl.preprocess import bandsample
population = {'a': 4, 'b': 4, 'c': 2, 'd': 2, 'e': 1, 'f': 1}

bandsample(population, 3, cutoff=1)  # step: 4.666666666666667, one item missing
# Counter({'c': 2, 'a': 4})
bandsample(population, 4, cutoff=1)  # step: 3.5, all items included
# Counter({'d': 2, 'b': 4, 'a': 4, 'c': 2})

System details

Pyndl Information

General Information

Python version: 3.7.2
Pyndl version: 0.6.1

Operating System

OS: Linux x86_64
Kernel: 4.15.0-20-generic
CPU: 12
Mem: 2382MiB/10608MiB
Swap: 0MiB/2047MiB

Dependencies

pandas: 0.24.1
xarray: 0.11.3
pip: 19.0.1
numpy: 1.15.4
cython: 0.29.5
netCDF4: 1.4.2
