Use priority queue to find n-best sequences #2
rsennrich wants to merge 2 commits into Jekub:master
|
I just found and removed some additional overhead: the code now keeps only Y hypotheses in the priority queue, since we know that for each previous hypothesis the n-best list is already sorted. 50-best tagging is now about 20-30x faster than the original implementation. |
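The idea in this comment can be illustrated with a k-way merge over already-sorted lists. The sketch below is in Python rather than the C of the actual patch, and it assumes that Y is the number of labels and that every previous label carries an n-best list sorted from best to worst. The names `merge_n_best`, `prev_lists`, and `transition` are hypothetical and do not come from the wapiti sources.

```python
import heapq

def merge_n_best(prev_lists, transition, n):
    """Return the n best (score, prev_label, rank) candidates.

    prev_lists[y] holds the scores of the hypotheses ending in previous
    label y, sorted from best (highest) to worst. transition[y] is the
    score added when moving from y to the current label. Both names are
    assumptions made for this sketch.
    """
    heap = []
    for y, scores in enumerate(prev_lists):
        if scores:
            # heapq is a min-heap, so scores are stored negated.
            # Only the best entry of each list enters the queue.
            heap.append((-(scores[0] + transition[y]), y, 0))
    heapq.heapify(heap)

    best = []
    while heap and len(best) < n:
        neg_score, y, rank = heapq.heappop(heap)
        best.append((-neg_score, y, rank))
        # The next candidate from the same list can only be worse,
        # so it is safe to add it lazily after its predecessor is taken.
        if rank + 1 < len(prev_lists[y]):
            nxt = prev_lists[y][rank + 1] + transition[y]
            heapq.heappush(heap, (-nxt, y, rank + 1))
    return best
```

Because the queue never holds more than one entry per previous label, each pop and push costs O(log Y) instead of a term that grows with the total number of candidates.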
|
Thanks for this contribution, it looks cool. Can you please update your code to match the style of the wapiti code? I want to enforce a consistent style across the whole code base. |
|
Do you mean heap.c and heap.h? Those aren't mine; they come from https://github.com/willemt/CHeap (BSD license). The downside of changing their code style is that it would make it harder to merge fixes or optimizations from or to upstream (although the implementation looks robust). |
|
Just a quick follow-up, because I haven't heard back from you regarding the last comment: do you also want to enforce the code style for external code (such as the heap implementation mentioned above)? |
|
@rsennrich Hey, is this compatible with the new version of CHeap? Many variables and definitions have changed in the new one. |
The current implementation (lines 369-378) has a complexity of O(n^2), whereas the priority queue should work in O(n log n).
With a label set of size 50, n-best tagging ran at about the same speed up to n=10; for n=50, the new implementation was about 6x faster in a small benchmark.
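The complexity difference can be sketched in Python as follows. The cited lines 369-378 are not shown here, so `n_best_by_scanning` is only an assumed stand-in for a quadratic selection and not a transcription of the wapiti code. `n_best_by_heap` uses Python's `heapq` in place of the CHeap library used by the patch.

```python
import heapq

def n_best_by_scanning(scores, n):
    """Pick the n highest scores by rescanning the remaining candidates.

    Each of the n selections scans the whole list, so selecting all
    candidates costs quadratic time. Assumed baseline for illustration.
    """
    remaining = list(scores)
    best = []
    while remaining and len(best) < n:
        i = max(range(len(remaining)), key=remaining.__getitem__)
        best.append(remaining.pop(i))
    return best

def n_best_by_heap(scores, n):
    """Pick the n highest scores with a priority queue.

    Building the heap is linear and each pop is logarithmic.
    """
    heap = [-s for s in scores]  # negate: heapq is a min-heap
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(n, len(heap)))]
```

Both functions return the same list; they differ only in how the cost grows with the number of candidates, which fits the observation that the speed-up shows up for larger n.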