Incorrect precision handling when similarity value is very small... #25

@metasync

In the method #serialize_item in similarity_matrix.rb, I found what looks like a bug in how similarity scores are converted to strings. The original code is:

def serialize_item(item_id, max_precision=5)
  items = @write_queue[item_id].to_a
  items.sort!{ |a,b| b[1] <=> a[1] }
  items = items[0..max_neighbors-1]
  items = items.map{ |i,s| s>0 ? "#{i}:#{s.to_s[0..max_precision]}" : nil }
  items.compact * "|"
end

The problem is the expression "s.to_s[0..max_precision]". In some cases s is a very small floating-point number (below 0.0001), and Ruby then returns scientific notation from s.to_s. For example:

"8.442380751371887e-05"

With max_precision=5, "s.to_s[0..max_precision]" then returns "8.4423": the exponent is cut off, so the serialized similarity is inflated by roughly five orders of magnitude.
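A minimal sketch that reproduces the truncation in plain Ruby, outside the gem. The variable names full and truncated are only for illustration and do not appear in similarity_matrix.rb.

```ruby
s = 8.442380751371887e-05
max_precision = 5

# Float#to_s switches to scientific notation for values below 1e-4.
full = s.to_s

# Slicing keeps the first max_precision + 1 characters and drops the exponent.
truncated = full[0..max_precision]

puts full
puts truncated

# How much larger the truncated value is than the real one.
puts truncated.to_f / s
```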

I would suggest changing "s.to_s[0..max_precision]" to

sprintf("%.#{max_precision}f", s)

With sprintf, the example above outputs "0.00008".
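Here is a rough sketch of how the serialization could look with fixed-point formatting. It is not the gem's code: serialize_neighbors is a hypothetical standalone function that takes the neighbor hash and max_neighbors as arguments instead of reading @write_queue, and the default of 50 neighbors is an assumption made only so the example runs on its own.

```ruby
# Hypothetical standalone variant of serialize_item (not part of the gem).
# neighbors is a Hash of item id => similarity score.
def serialize_neighbors(neighbors, max_neighbors: 50, max_precision: 5)
  # Sort by score, highest first, and keep the top max_neighbors entries.
  items = neighbors.to_a.sort { |a, b| b[1] <=> a[1] }.first(max_neighbors)

  items = items.map do |id, score|
    next unless score > 0
    # Fixed-point formatting never produces scientific notation.
    format("%s:%.#{max_precision}f", id, score)
  end

  items.compact.join("|")
end

result = serialize_neighbors({ "a" => 0.75, "b" => 8.442380751371887e-05, "c" => 0.0 })
puts result
```

Two side effects are worth keeping in mind. Every score is now padded to max_precision decimals (0.75 becomes "0.75000"), and a positive score smaller than half of the last decimal place is written as "0.00000", so it may be worth filtering such entries out.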

This is an edge case, and there may be a better solution, but I wanted to share my findings.

Thank you very much for the great gem!

MetaSync
