How the dot product measures similarity

The dot product is one of the most fundamental concepts in machine learning, making appearances almost everywhere. In introductory linear algebra classes, we learn that in the vector space $\mathbb{R}^n$, it is defined by the formula

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i.$$
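If you prefer code to summation signs, here is a minimal NumPy sketch (the vectors are made up for illustration) that computes the dot product both from the defining formula and with the built-in np.dot:

```python
import numpy as np

# Two example feature vectors in R^3.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# The defining formula: the sum of elementwise products.
dot_manual = sum(x_i * y_i for x_i, y_i in zip(x, y))

# NumPy's built-in dot product gives the same value.
dot_numpy = np.dot(x, y)

print(dot_manual, dot_numpy)   # 3.5 3.5
```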
One of its most important applications is to measure similarity between feature vectors.
But how are similarity and inner product related? The definition doesn't reveal much. In this post, our goal is to unravel the dot product and provide a simple geometric explanation!
The fundamental properties of the dot product
To see what the dot product has to do with similarity, we have three key observations. First, we can see that it is linear in both variables. This property is called bilinearity:

$$\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle,$$
$$\langle x, \alpha y + \beta z \rangle = \alpha \langle x, y \rangle + \beta \langle x, z \rangle.$$
Second, the dot product of orthogonal vectors is zero: if $x \perp y$, then $\langle x, y \rangle = 0$. Third, the dot product of a vector with itself is its squared magnitude: $\langle x, x \rangle = \| x \|^2$.
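As a quick sanity check, here is a sketch that verifies all three properties numerically on randomly generated vectors (the vectors and scalars are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4))   # three random vectors in R^4
a, b = 2.0, -3.0                    # arbitrary scalars

# Bilinearity: linear in the first argument (and symmetrically in the second).
lhs = np.dot(a * x + b * y, z)
rhs = a * np.dot(x, z) + b * np.dot(y, z)
print(np.isclose(lhs, rhs))        # True

# Orthogonal vectors have zero dot product.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.dot(e1, e2))              # 0.0

# A vector's dot product with itself is its squared magnitude.
print(np.isclose(np.dot(x, x), np.linalg.norm(x) ** 2))   # True
```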
Armed with these, we are ready to explore how similarity is measured!
Dot product as similarity
Suppose that we have two vectors, $x$ and $y$. To see the geometric interpretation of their dot product, we first note that $y$ can be decomposed into the sum of two components: $y = y_x + y_\perp$, where $y_x$ is parallel to $x$, while $y_\perp$ is orthogonal to it.
So, by bilinearity and orthogonality, the dot product equals $\langle x, y \rangle = \langle x, y_x + y_\perp \rangle = \langle x, y_x \rangle$. If we write $y_x$ as $cx$, a scalar multiple of $x$, we can simplify the dot product:

$$\langle x, y \rangle = \langle x, cx \rangle = c \langle x, x \rangle = c \| x \|^2.$$
We can even go one step further. If we assume that both $x$ and $y$ have a magnitude of one, the dot product equals the scaling factor: $\langle x, y \rangle = c$!
Note that this scaling factor lies in the interval $[-1, 1]$. It is negative when $x$ and $y$ point in opposite directions.
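Here is a short sketch of that decomposition; the random vectors and the names y_parallel and y_orthogonal are mine, chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=3)
y = rng.normal(size=3)

# Decompose y into a component parallel to x and one orthogonal to it.
c = np.dot(x, y) / np.dot(x, x)   # the scaling factor of the projection
y_parallel = c * x
y_orthogonal = y - y_parallel

# Sanity checks: the parts add back up, and the residual is orthogonal to x.
print(np.allclose(y, y_parallel + y_orthogonal))      # True
print(np.isclose(np.dot(x, y_orthogonal), 0.0))       # True

# The dot product is the scaling factor times the squared magnitude of x.
print(np.isclose(np.dot(x, y), c * np.linalg.norm(x) ** 2))   # True
```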
Now comes the really interesting part! The scaling factor $c$ has a simple geometric meaning. To see this, let's illustrate what is happening. (Recall we assumed that $x$ and $y$ both have a magnitude of one.) Since $y$ has unit length, the right triangle formed by $y$, its projection $y_x = cx$, and the orthogonal component $y_\perp$ shows that $c = \cos \alpha$, where $\alpha$ is the angle between $x$ and $y$.
It is the reason why cosine similarity is defined this way:

$$\text{cos-similarity}(x, y) = \frac{\langle x, y \rangle}{\| x \| \, \| y \|} = \cos \alpha.$$

Dividing by the two norms simply rescales both vectors to unit length, reducing the general case to the one we just worked out.
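To tie everything together, here is a minimal cosine similarity function (a sketch; in practice you might reach for sklearn.metrics.pairwise.cosine_similarity or a similar library routine):

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between x and y: the dot product of
    the two vectors after rescaling each to unit length."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Parallel vectors score 1, orthogonal ones 0, opposite ones -1.
x = np.array([1.0, 0.0])
print(cosine_similarity(x, np.array([3.0, 0.0])))    # 1.0
print(cosine_similarity(x, np.array([0.0, 2.0])))    # 0.0
print(cosine_similarity(x, np.array([-5.0, 0.0])))   # -1.0
```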
I hope that this short post helps you make sense of this concept, and armed with this knowledge, you'll be more confident when dealing with it!