Fix division by zero in WeightedRandomStreamRanker#16
Conversation
| // calculate sampling score | ||
| $r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score())); | ||
| } | ||
| $H[strval($r)] = $element; |
There was a problem hiding this comment.
I don't know if we need to be worried about this line though: $H[strval($r)] = $element;. An element with a score of zero should be a rare occurrence, but is it possible some elements will be lost? It seems that way?
There was a problem hiding this comment.
i feel like we should have some logging somewhere to show us definitively whether a score of 0 actually happens... maybe we should add some here, if not?
There was a problem hiding this comment.
ok, i see that we do have this logging, since the exception is happening in production. in that case, maybe a decision could be made here to just drop the element entirely if the weight ends up being 0. that seems to make sense to me? cc @nicola-barbieri @lucila i guess it's a separate broader question of what we should do here when two elements have the exact same score, because yeah, they'll collide here.
There was a problem hiding this comment.
it might still be possible for two elements to have the same score, even if not zero though
There was a problem hiding this comment.
When score->0 then r->infinity, in which case I think assigning zero to it is far from the actual final score.
It all depends on the requirements of this ranker and what a score=0 means:
- if it's an error it means nothing and we can give it indeed a very low score or even make this element an invalid one (you can do this below)
- If it's not an error and it's an acceptable value which is just not handled then
rshould be a high number and not zero or invalid.
There was a problem hiding this comment.
so what if, theoretically, we get two elements with the same score. Only one of those will end up in $H. Does this mean the other one gets discarded?
There was a problem hiding this comment.
another alternative I'm thinking about, to reduce the chances of having more than 1 element with the same score, is to do something like:
$score = $original_element->get_score();
if ($score === 0) {
$score = 0.001;
}
$r = pow(mt_rand() / $max_rand, (1 / $score));
so that we will still call mt_rand() with a very small value on the exp part.
thoughts? @eltongo
I think this is an alternative @lovemeblender had suggested before (internally on a huddle)
There was a problem hiding this comment.
I would think it is also possible for two elements to have the same non-zero score, no? In that case, what we need is to fix $H, because this would only fix the case for the score being zero.
There was a problem hiding this comment.
mt_rand() <= $max_rand
if that's true we're set with either zero or adding an very small number to avoid collisions ^ collisions will be handled gracefully I think as the elements will just appear adjacently
There was a problem hiding this comment.
I agree with fixing the division by zero, maybe adding an epsilon to all input scores:
$score = $score + $epsilon (0.00001)
score = 0 shouldn't happen in the first place (unless rounding errors). which element is causing this ?
| $original_element = $element->get_original_element(); | ||
| // calculate sampling score | ||
| $r = pow(mt_rand() / $max_rand, (1 / $original_element->get_score())); | ||
| if ($original_element->get_score() === 0.0) { |
There was a problem hiding this comment.
just as a note, if for any reason this comes as 0 instead, 0===0.0 evaluates to false
What and why? 🤔
There's a division by zero bug in
WeightedRandomStreamRankerwhen the score for an element is zero. This PR fixes that.Testing Steps ✍️
A code review, and ensuring tests pass is fine in this case.
You're it! 👑
@Automattic/stream-builders