Skip-Gram Model 跳字模型 #29

Open · chengjun opened this issue Aug 9, 2021 · 0 comments

chengjun (Member) commented Aug 9, 2021

Just realized that GitHub issues do not support Markdown-style math formulas.

The skip-gram model assumes that a word can be used to generate its surrounding words in a text sequence. Take the text sequence “the”, “man”, “loves”, “his”, “son” as an example. Let us choose “loves” as the center word and set the context window size to 2. As shown in Fig. 14.1.1, given the center word “loves”, the skip-gram model considers the conditional probability for generating the context words: “the”, “man”, “his”, and “son”, which are no more than 2 words away from the center word:

𝑃("the","man","his","son"∣"loves").

Assume that the context words are independently generated given the center word (i.e., conditional independence). In this case, the above conditional probability can be rewritten as

𝑃("the"∣"loves")⋅𝑃("man"∣"loves")⋅𝑃("his"∣"loves")⋅𝑃("son"∣"loves").

In the skip-gram model, each word has two 𝑑-dimensional vector representations for calculating conditional probabilities. More concretely, for any word with index 𝑖 in the dictionary, denote by 𝐯𝑖 ∈ ℝ𝑑 and 𝐮𝑖 ∈ ℝ𝑑 its two vectors when used as a center word and a context word, respectively.

The conditional probability of generating any context word 𝑤𝑜 (with index 𝑜 in the dictionary) given the center word 𝑤𝑐 (with index 𝑐 in the dictionary) can be modeled by a softmax operation on vector dot products:

𝑃(𝑤𝑜 ∣ 𝑤𝑐) = exp(𝐮𝑜⊤𝐯𝑐) / ∑𝑖∈𝒱 exp(𝐮𝑖⊤𝐯𝑐),

where 𝒱 is the vocabulary index set.

After training, for any word with index 𝑖 in the dictionary, we obtain both word vectors 𝐯𝑖 (as the center word) and 𝐮𝑖 (as the context word). In natural language processing applications, the center word vectors of the skip-gram model are typically used as the word representations.

http://d2l.ai/chapter_natural-language-processing-pretraining/word2vec.html#the-skip-gram-model
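A quick numerical sketch of that softmax (the tensors and indices below are made up for illustration):

import torch

vocab_size, d = 10, 4
V = torch.randn(vocab_size, d)   # center-word vectors v_i
U = torch.randn(vocab_size, d)   # context-word vectors u_i

c, o = 2, 5                      # indices of the center word w_c and a context word w_o
scores = U @ V[c]                # u_i^T v_c for every word i in the vocabulary
probs = torch.softmax(scores, dim=0)
print(probs[o])                  # P(w_o | w_c)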

import torch

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                        # (batch, 1, d) center-word vectors
    u = embed_u(contexts_and_negatives)        # (batch, max_len, d) context/negative vectors
    pred = torch.bmm(v, u.permute(0, 2, 1))    # (batch, 1, max_len) dot products u_o^T v_c
    return pred
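For reference, a hypothetical call (the vocabulary size, embedding size, and batch shapes are made up; shapes follow the d2l convention of one center word and several contexts/negatives per row):

import torch
from torch import nn

embed_v = nn.Embedding(num_embeddings=20, embedding_dim=8)
embed_u = nn.Embedding(num_embeddings=20, embedding_dim=8)
center = torch.randint(0, 20, (2, 1))                  # (batch, 1)
contexts_and_negatives = torch.randint(0, 20, (2, 4))  # (batch, max_len)
pred = skip_gram(center, contexts_and_negatives, embed_v, embed_u)
print(pred.shape)                                      # torch.Size([2, 1, 4])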

or

import torch
from torch import nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embd_size):
        super(SkipGram, self).__init__()
        # A single embedding table is shared between center and context words here.
        self.embeddings = nn.Embedding(vocab_size, embd_size)

    def forward(self, focus, context):
        embed_focus = self.embeddings(focus).view((1, -1))   # center-word ("input") vector
        embed_ctx = self.embeddings(context).view((1, -1))   # context-word ("output") vector
        score = torch.mm(embed_focus, torch.t(embed_ctx))    # dot product between the two
        log_probs = F.logsigmoid(score)                      # log sigmoid score of the pair
        return log_probs

    def extract(self, focus):
        embed_focus = self.embeddings(focus)
        return embed_focus
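A hypothetical training step with this class (sizes and indices are made up), treating the pair as a positive example:

model = SkipGram(vocab_size=20, embd_size=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

focus = torch.tensor([3])             # index of the center word
context = torch.tensor([7])           # index of one context word
loss = -model(focus, context).sum()   # maximize log sigmoid of the positive pair
loss.backward()
optimizer.step()
print(model.extract(focus).shape)     # torch.Size([1, 8])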

Why do we train word2vec with the dot product as the similarity measure, but use cosine similarity after the model is trained?

https://stackoverflow.com/questions/54411020/why-use-cosine-similarity-in-word2vec-when-its-trained-using-dot-product-similar

Cosine similarity and the dot product are both similarity measures, but the dot product is magnitude sensitive while cosine similarity is not. Depending on the occurrence count of a word, it might have a large or small dot product with another word. We normally normalize our vectors to prevent this effect, so that all vectors have unit magnitude. But if your particular downstream task requires occurrence count as a feature, then the dot product might be the way to go; if you do not care about counts, you can simply compute the cosine similarity, which normalizes the vectors.

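A tiny sketch of that point (the vectors are arbitrary): scaling a vector changes its dot product with another vector but not its cosine similarity:

import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([2.0, 0.0, 1.0])

print(torch.dot(a, b))                       # 5.0
print(torch.dot(2 * a, b))                   # 10.0 -> magnitude sensitive
print(F.cosine_similarity(a, b, dim=0))      # ≈ 0.5976
print(F.cosine_similarity(2 * a, b, dim=0))  # same value -> magnitude invariant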
