Gradient calculation formula of word2vec #34

@DamirTenishev

Description

In line 523 of word2vec there is a formula:

g = (1 - vocab[word].code[d] - f) * alpha;

Can you please help me understand its logic?

Since f is the sigmoid of the dot product of the embedding and the context node's vector, in the case of hierarchical softmax we want it to be as close as possible to the branch (0 or 1) that the Huffman tree takes at this node for the current word, given the previous word's embedding. In that case we would only need

g = (vocab[word].code[d] - f) * alpha;

Given that vocab[word].code[d] can only be 0 or 1, the "1 - vocab[word].code[d]" term simply flips which branch is labeled 0 and which is labeled 1; what is its purpose?

I summed up some details here: https://datascience.stackexchange.com/questions/129865/intuition-behind-g-variable-calculation-in-the-original-word2vec-implementation
