@Alexjmsherman

According to the gensim documentation (https://radimrehurek.com/gensim/models/phrases.html#id2) for the models.phrases module, the formula for the phrase model comes from Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

In the paper, the equation does not include N (the size of the corpus vocabulary), which is listed in your notebook. I updated the equation by removing N and its definition.
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
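
For reference, the paper scores a candidate bigram as score(w_i, w_j) = (count(w_i w_j) - δ) / (count(w_i) × count(w_j)), where δ is a discounting coefficient that keeps very infrequent word pairs from forming phrases. Below is a minimal sketch of that formula as I read it; the function name `phrase_score`, its parameter names, and the counts in the usage lines are illustrative assumptions, not taken from the notebook or from gensim.

```python
def phrase_score(count_ab, count_a, count_b, delta=0.0):
    """Bigram score as written in Mikolov et al. (2013), with no N term.

    count_ab: number of times the bigram "a b" occurs (hypothetical input)
    count_a, count_b: unigram counts of the two words
    delta: discounting coefficient from the paper
    """
    if count_a <= 0 or count_b <= 0:
        raise ValueError("unigram counts must be positive")
    return (count_ab - delta) / (count_a * count_b)


# Made-up counts, only to show the call shape.
score = phrase_score(count_ab=10, count_a=20, count_b=25, delta=5)
print(score)
```

Bigrams whose score exceeds a chosen threshold are then treated as phrases, as described in the paper.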

FYI, I saw you present this at PyData D.C. I thought it was a great presentation, and I clearly still refer to this notebook often. Thanks for putting it together.
