Skip to content

Conversation

@bliksemstraal
Copy link

Hi,

I love your project!!

I changed the stats package to a faster single-pass calculation. sumsq is now truly a sum of squares and mean was replaced with sum. This allows for a much faster Put method which is where most time is spent. The determination of min and max is faster too as there is not overhead call to functions in the math package.

Unit tests pass as they are. A local benchmark showed original package of ~18ns/op and new package is ~8ns/op. Every bit helps, especially in genetic algorithms.

@cbarrick
Copy link
Owner

cbarrick commented Jan 9, 2021

Thanks for the contribution! I'm excited to see real users out there.

IIUC this patch is using the naive algorithm for computing variance.

The original algorithm used here is Welford's algorithm for Put and Chan et al.'s algorithm for Merge.

These are numerically stable algorithms, which minimize rounding error. The idea is that the sum can be very large w.r.t. the other values, and error creeps in when performing computation on values at different scales. We can eliminate this by never using the sum directly.

That said, numerical stability may not be that important. The 2.25x speedup just from switching to the naive algorithm may be worth it for many user cases.

So there are trade offs. I am not sure if I want to straight up replace a numerically stable implementation with an unstable-but-fast implementation. I will need to think about it more deeply.

Case studies or macro benchmarks would help sway me either way.

@cbarrick
Copy link
Owner

cbarrick commented Jan 9, 2021

Also, the contributors file is unnecessary. Git tracks that metadata automatically.

@cbarrick
Copy link
Owner

cbarrick commented Jan 9, 2021

By changing the method signatures from Stats to *Stats, you are changing the semantics.

The original version does not modify the stats object.

It may be worth it to change, but it is backwards incompatible.

(Not that this library is likely to have any real production users.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants