In machine learning, we use the dot product every day.
Its definition is far from revealing. For instance, what does the sum of coordinate products have to do with similarity?
There is a beautiful geometric explanation behind it.
The dot product is one of the most fundamental concepts in machine learning, making appearances almost everywhere. By definition, the dot product (or inner product) of two vectors is the sum of their coordinate products:

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i.$$
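To see the definition in action, here is a minimal Python sketch (the vector values are arbitrary examples) that computes the coordinate-product sum by hand and checks it against NumPy's built-in dot product:

```python
import numpy as np

# Two example vectors (arbitrary values, chosen for illustration).
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -5.0, 6.0])

# The dot product by definition: the sum of coordinate products.
dot_by_definition = sum(x_i * y_i for x_i, y_i in zip(x, y))

# NumPy's built-in dot product should agree.
assert np.isclose(dot_by_definition, np.dot(x, y))
print(dot_by_definition)  # 12.0
```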
The fundamental properties of the dot product
To peek behind the curtain, we have to understand three key properties.
First, the dot product is linear in both variables:

$$\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle.$$

This property is called bilinearity.
Second, the dot product is zero if the vectors are orthogonal. (In fact, the dot product generalizes the concept of orthogonality beyond Euclidean spaces. But that's for another day :) )
Third, the dot product of a vector with itself equals the square of its magnitude: $\langle x, x \rangle = \| x \|^2$.
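Here is a quick numerical sanity check of all three properties, a sketch with arbitrary example vectors and scalars:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
z = np.array([0.5, 4.0])
a, b = 2.0, -3.0

# 1. Bilinearity: linear in the first argument (and, by symmetry, the second).
assert np.isclose(np.dot(a * x + b * y, z), a * np.dot(x, z) + b * np.dot(y, z))

# 2. Orthogonal vectors have zero dot product: (1, 0) and (0, 1) are orthogonal.
assert np.isclose(np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])), 0.0)

# 3. The dot product of a vector with itself is its squared magnitude.
assert np.isclose(np.dot(x, x), np.linalg.norm(x) ** 2)
```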
The geometric interpretation of the dot product
Now comes the interesting part. Given two vectors $x$ and $y$, we can decompose $y$ into the two components $y_{\parallel}$ and $y_{\perp}$. One is parallel to $x$, while the other is orthogonal to it.
In physics, we apply the same decomposition to various forces all the time.
The vectors $y_{\parallel}$ and $y_{\perp}$ are characterized by two properties:
- $y_{\parallel}$ is a scalar multiple of $x$, that is, $y_{\parallel} = cx$ for some scalar $c$,
- and $y_{\perp} = y - y_{\parallel}$ is orthogonal to $x$ (and thus to $y_{\parallel}$).
We are going to use these properties to find an explicit formula for $y_{\parallel}$. Spoiler alert: it is related to the dot product.
Due to $y_{\perp}$ being orthogonal to $x$, we can use the bilinearity of the dot product to express the $c$ in $y_{\parallel} = cx$:

$$\langle x, y \rangle = \langle x, cx + y_{\perp} \rangle = c \langle x, x \rangle + \langle x, y_{\perp} \rangle = c \| x \|^2.$$

By solving for $c$, we get that the magnitude of $y_{\parallel}$ is the ratio of the dot product and the magnitude of $x$:

$$c = \frac{\langle x, y \rangle}{\| x \|^2}, \qquad \| y_{\parallel} \| = c \, \| x \| = \frac{\langle x, y \rangle}{\| x \|}.$$
If both $x$ and $y$ are unit vectors, the dot product simply expresses the magnitude of the orthogonal projection!
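Here is the decomposition as a minimal Python sketch, following the formula above (the vectors are arbitrary examples, and `y_parallel`/`y_orthogonal` are just names I picked):

```python
import numpy as np

x = np.array([3.0, 0.0])
y = np.array([2.0, 2.0])

# The scalar c from solving <x, y> = c * ||x||^2.
c = np.dot(x, y) / np.dot(x, x)

y_parallel = c * x             # the component parallel to x
y_orthogonal = y - y_parallel  # the rest, orthogonal to x

# Sanity checks: the two parts add back up to y, and they are orthogonal.
assert np.allclose(y_parallel + y_orthogonal, y)
assert np.isclose(np.dot(x, y_orthogonal), 0.0)

# The magnitude of the projection equals <x, y> / ||x||.
assert np.isclose(np.linalg.norm(y_parallel), np.dot(x, y) / np.linalg.norm(x))
```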
Dot product as similarity
Do you recall how the famous trigonometric functions sine and cosine are defined?
Let's say that the hypotenuse of our right triangle is a unit vector and one of the legs lies on the $x$-axis. Then the trigonometric functions equal the magnitudes of the projections onto the axes.
Using trigonometric functions, we see that the dot product of two unit vectors is the cosine of their enclosed angle $\alpha$:

$$\langle x, y \rangle = \| y_{\parallel} \| = \cos \alpha.$$

This is how the dot product relates to cosine.
If $x$ and $y$ are not unit vectors, we can scale them by their magnitudes and use our previous discovery to get the cosine of $\alpha$:

$$\cos \alpha = \left\langle \frac{x}{\| x \|}, \frac{y}{\| y \|} \right\rangle = \frac{\langle x, y \rangle}{\| x \| \, \| y \|}.$$
The closer its value is to $1$, the more similar $x$ and $y$ are. (In a sense.) In machine learning, we call this quantity the cosine similarity.
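As a sketch, the quantity can be computed and cross-checked in a few lines of NumPy; `cosine_similarity` here is a hypothetical helper I'm defining for illustration, not a library function:

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle enclosed by x and y."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

# The enclosed angle is 45 degrees, so the similarity should be cos(pi/4).
assert np.isclose(cosine_similarity(x, y), np.cos(np.pi / 4))
print(cosine_similarity(x, y))  # 0.7071...
```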
Now you understand why.