Open
Description
What you were trying to do
Create a bandsample from a population of a target size.
What actually happened
If the step (sum_of_frequencies / sample_size) cannot be represented exactly in binary floating point, the last item is not added to the sample, because the final "accumulator >= step" comparison fails by a rounding error. This might also happen at other points in the loop, though I'm not sure whether that would be an issue.
I can work around this by replacing the "accumulator >= step" comparisons with "round(accumulator, 3) >= round(step, 3)", but that seems fishy; presumably there is a better way.
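To isolate the rounding problem, here is a minimal sketch of an accumulate-and-compare loop. It is not pyndl's actual implementation: the functions `select_float` and `select_exact` are hypothetical, and the sketch assumes all frequencies are integers. The second variant multiplies both sides of the comparison by sample_size, so everything stays in integer arithmetic and no rounding (or `round()` workaround) is involved.

```python
def select_float(freqs, sample_size):
    """Simplified loop with a float step (hypothetical, not pyndl's code)."""
    step = sum(freqs) / sample_size
    accumulator = 0.0
    picked = []
    for index, freq in enumerate(freqs):
        accumulator += freq
        if accumulator >= step:
            picked.append(index)
            accumulator -= step
    return picked


def select_exact(freqs, sample_size):
    """Same loop with both sides scaled by sample_size (integers only)."""
    total = sum(freqs)  # equals step * sample_size, but exact
    accumulator = 0
    picked = []
    for index, freq in enumerate(freqs):
        accumulator += freq * sample_size
        if accumulator >= total:
            picked.append(index)
            accumulator -= total
    return picked


# Frequencies from the example population, sorted lowest to highest.
freqs = [1, 1, 2, 2, 4, 4]
float_picks = select_float(freqs, 3)  # last comparison falls just short of step
exact_picks = select_exact(freqs, 3)  # last comparison is 14 >= 14
print(float_picks, exact_picks)
```

With the float step, the accumulator ends up one unit in the last place below 14 / 3 on the final item, so only two items are picked; the integer variant picks three. If frequencies can be non-integer, `fractions.Fraction` or a tolerance-based comparison such as `math.isclose` might be alternatives, each with its own trade-offs.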
How to reproduce
from pyndl.preprocess import bandsample
population = {'a': 4, 'b': 4, 'c': 2, 'd': 2, 'e': 1, 'f': 1}
bandsample(population, 3, cutoff=1) # step: 4.666666666666667, one item missing
# Counter({'c': 2, 'a': 4})
bandsample(population, 4, cutoff=1) # step: 3.5, all items included
# Counter({'d': 2, 'b': 4, 'a': 4, 'c': 2})
System details
Pyndl Information
General Information
Python version: 3.7.2
Pyndl version: 0.6.1
Operating System
OS: Linux x86_64
Kernel: 4.15.0-20-generic
CPU: 12
Mem: 2382MiB/10608MiB
Swap: 0MiB/2047MiB
Dependencies
pandas: 0.24.1
xarray: 0.11.3
pip: 19.0.1
numpy: 1.15.4
cython: 0.29.5
netCDF4: 1.4.2