Description
In the method #serialize_item in similarity_matrix.rb, I found what looks like a bug in how similarity scores are converted to strings. The original code is:
def serialize_item(item_id, max_precision=5)
  items = @write_queue[item_id].to_a
  items.sort!{ |a,b| b[1] <=> a[1] }
  items = items[0..max_neighbors-1]
  items = items.map{ |i,s| s>0 ? "#{i}:#{s.to_s[0..max_precision]}" : nil }
  items.compact * "|"
end
The issue is in the expression "s.to_s[0..max_precision]". In some cases the value of s can be a very small floating-point number, and Ruby then uses scientific notation for the result of s.to_s. For example:
"8.442380751371887e-05"
With max_precision=5, "s.to_s[0..max_precision]" keeps only the first six characters and returns "8.4423". The exponent is cut off, so the stored similarity is far larger than the real value.
I would suggest changing "s.to_s[0..max_precision]" to
sprintf("%.#{max_precision}f", s)
With sprintf, the example above outputs "0.00008".
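
To make the difference concrete, here is a minimal sketch that compares the two conversions side by side. The helper names truncate_score and format_score are hypothetical and exist only for this illustration; they are not part of the gem.

```ruby
# Hypothetical helpers for illustration only; not part of the gem.

# Current behavior: slice the first (max_precision + 1) characters of Float#to_s.
def truncate_score(s, max_precision = 5)
  s.to_s[0..max_precision]
end

# Suggested behavior: fixed-point formatting with max_precision decimal places.
def format_score(s, max_precision = 5)
  sprintf("%.#{max_precision}f", s)
end

small = 8.442380751371887e-05

puts small.to_s             # 8.442380751371887e-05
puts truncate_score(small)  # 8.4423  (exponent lost)
puts format_score(small)    # 0.00008
puts format_score(0.73)     # 0.73000
```

One thing to keep in mind with this approach: a positive score smaller than half of the last decimal place (for example 0.000004 with max_precision=5) is formatted as "0.00000", so it may be worth filtering on the formatted value as well as on s>0.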
This is an edge case, and there may be a better solution, but I wanted to share my findings.
Thank you very much for the great gem!
MetaSync