Example: center = “cut”, context = “government”, vocabulary = {“the”, “government”, “cut”, “deficit”, “spending”}
Step 1 — random initialisation (3 dimensions, values drawn randomly):
| Word | Dim 1 | Dim 2 | Dim 3 |
|---|---|---|---|
| cut | 0.12 | -0.31 | 0.05 |
| government | -0.24 | 0.18 | 0.42 |
| the | 0.33 | 0.07 | -0.15 |
| deficit | -0.11 | 0.29 | -0.38 |
| spending | 0.45 | -0.22 | 0.19 |
Step 2 — numerator: dot product of \(v_{\text{government}}\) and \(v_{\text{cut}}\):
\[(-0.24 \times 0.12) + (0.18 \times -0.31) + (0.42 \times 0.05) = -0.029 - 0.056 + 0.021 = -0.064\]
\[\exp(-0.064) = \mathbf{0.938}\]
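The numerator can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `dot` and the variable names are illustrative choices, not part of the original example.

```python
import math

# Vectors copied from the Step 1 table.
v_cut = [0.12, -0.31, 0.05]
v_government = [-0.24, 0.18, 0.42]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

score = dot(v_government, v_cut)   # raw similarity score
numerator = math.exp(score)        # exponentiated score used in the softmax

print(round(score, 3), round(numerator, 3))  # -0.064 0.938
```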
Step 3 — denominator: dot product of every word with \(v_{\text{cut}}\), exponentiate, sum:
| Word | Dot product with \(v_{\text{cut}}\) | exp |
|---|---|---|
| government | \(-0.064\) | \(0.938\) |
| the | \((0.33 \times 0.12) + (0.07 \times -0.31) + (-0.15 \times 0.05) = 0.010\) | \(1.010\) |
| deficit | \((-0.11 \times 0.12) + (0.29 \times -0.31) + (-0.38 \times 0.05) = -0.122\) | \(0.885\) |
| spending | \((0.45 \times 0.12) + (-0.22 \times -0.31) + (0.19 \times 0.05) = 0.132\) | \(1.141\) |
| cut | \((0.12 \times 0.12) + (-0.31 \times -0.31) + (0.05 \times 0.05) = 0.113\) | \(1.120\) |
Denominator \(= 0.938 + 1.010 + 0.885 + 1.141 + 1.120 = \mathbf{5.094}\)
Step 4 — probability and log probability:
\[p(\text{government}|\text{cut}) = \frac{0.938}{5.094} = 0.184 \qquad \Rightarrow \qquad \log(0.184) = \mathbf{-1.69}\]
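Steps 2 to 4 can be reproduced end to end as below. This is a sketch under the example's own simplification that each word has a single vector used both as center and as context; the function name `softmax_prob` and the `embeddings` dictionary are hypothetical names introduced for illustration.

```python
import math

# Embeddings copied from the Step 1 table (one 3-dimensional vector per word).
embeddings = {
    "cut": [0.12, -0.31, 0.05],
    "government": [-0.24, 0.18, 0.42],
    "the": [0.33, 0.07, -0.15],
    "deficit": [-0.11, 0.29, -0.38],
    "spending": [0.45, -0.22, 0.19],
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax_prob(context, center, emb):
    """Return (p(context | center), denominator) for the full-vocabulary softmax."""
    center_vec = emb[center]
    # Exponentiated dot product of every vocabulary word with the center vector.
    exp_scores = {w: math.exp(dot(vec, center_vec)) for w, vec in emb.items()}
    denominator = sum(exp_scores.values())
    return exp_scores[context] / denominator, denominator

p, denominator = softmax_prob("government", "cut", embeddings)
log_p = math.log(p)

print(round(denominator, 3), round(p, 3), round(log_p, 2))  # 5.094 0.184 -1.69
```

Because the denominator sums over every word in the vocabulary, the cost of one probability grows linearly with vocabulary size, which is easy to see from the dictionary comprehension above.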
After training: the dot product rises from \(-0.064\) to \(0.8\), so \(p(\text{government}|\text{cut})\) rises from \(0.184\) to \(0.45\) and the log probability rises from \(-1.69\) to \(\log(0.45) = -0.80\). ✓ The log probability is less negative, so the objective is improving.
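A quick arithmetic check helps interpret the after-training figures. The sketch below assumes, purely for illustration, that only the government/cut dot product changes while the other four exponentiated scores keep their Step 3 values. Under that assumption the probability reaches about 0.349 rather than 0.45, so the 0.45 figure presumably also reflects the other words' scores falling during training (their exponentiated sum would need to drop from 4.156 to roughly 2.72). The source does not state how the other scores change.

```python
import math

# Exponentiated scores from Step 3, before training (rounded table values).
exp_scores = {
    "government": 0.938,
    "the": 1.010,
    "deficit": 0.885,
    "spending": 1.141,
    "cut": 1.120,
}

# Assumption: only the government/cut dot product changes, to 0.8.
exp_scores_after = dict(exp_scores, government=math.exp(0.8))
p_fixed_others = exp_scores_after["government"] / sum(exp_scores_after.values())
print(round(p_fixed_others, 3))  # 0.349

# Sum of the other exponentiated scores that would give p = 0.45 exactly.
target_p = 0.45
others_needed = math.exp(0.8) * (1 - target_p) / target_p
print(round(others_needed, 2))  # 2.72

# Log probability at the stated after-training value.
print(round(math.log(target_p), 2))  # -0.8
```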