How the dot product measures similarity

Tivadar Danka
The dot product

Understanding math will make you a better engineer.

So, I am writing the best and most comprehensive book about it.

In machine learning, we use the dot product every day.

Its definition is far from revealing. For instance, what does the sum of coordinate products have to do with similarity?

There is a beautiful geometric explanation behind it.

The dot product is one of the most fundamental concepts in machine learning, making appearances almost everywhere. By definition, the dot product (or inner product) of two vectors is the sum of their coordinate products.

Definition of the dot product
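Writing ⟨x, y⟩ for the dot product of two vectors x, y with n coordinates, the definition reads:

```latex
\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i
```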

The fundamental properties of the dot product

To peek behind the curtain, there are three key properties that we have to understand.

First, the dot product is linear in both variables. This property is called bilinearity.

Bilinearity of the dot product
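Spelled out, bilinearity means the dot product distributes over scalar combinations in each argument separately:

```latex
\langle a x + b y, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle,
\qquad
\langle x, a y + b z \rangle = a \langle x, y \rangle + b \langle x, z \rangle
```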

Second, the dot product is zero if the vectors are orthogonal. (In fact, the dot product generalizes the concept of orthogonality beyond Euclidean spaces. But that's for another day :) )

Orthogonality of two vectors

Third, the dot product of a vector with itself equals the square of its magnitude.

Dot product with itself
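In symbols, with |x| denoting the magnitude (Euclidean norm) of x:

```latex
\langle x, x \rangle = \sum_{i=1}^{n} x_i^2 = |x|^2
```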

The geometric interpretation of the dot product

Now comes the interesting part. Given a vector y, we can decompose x into the two components x_o and x_p. One is parallel to y, while the other is orthogonal to it.

In physics, we apply the same decomposition to various forces all the time.

Orthogonal decomposition of two vectors

The vectors x_o and x_p are characterized by two properties:

  1. x_p is a scalar multiple of y,
  2. and x_o is orthogonal to x_p (and thus to y).

We are going to use these properties to find an explicit formula for x_p. Spoiler alert: it is related to the dot product.

Properties of the orthogonal projection

Due to x_o being orthogonal to y, we can use the bilinearity of the dot product to express the c in x_p = c y.

Properties of the orthogonal projection

By solving for c, we get that it is the ratio of the dot product ⟨x, y⟩ and the squared magnitude of y.

The coefficient of the orthogonal projection
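The calculation goes like this: expand ⟨x, y⟩ using the decomposition x = x_o + c y and bilinearity, then use the orthogonality of x_o and y to make the first term vanish:

```latex
\langle x, y \rangle
= \langle x_o + c y, y \rangle
= \underbrace{\langle x_o, y \rangle}_{=0} + c \langle y, y \rangle
= c |y|^2,
\qquad \text{so} \qquad
c = \frac{\langle x, y \rangle}{|y|^2}.
```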

If both x and y are unit vectors, the dot product simply expresses the magnitude of the orthogonal projection!

Dot product and the orthogonal projection
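To make this concrete, here is a minimal NumPy sketch of the decomposition. The vectors x and y are arbitrary illustrative examples; any nonzero y works.

```python
import numpy as np

# Two example vectors (hypothetical choices, just for illustration).
x = np.array([3.0, 4.0])
y = np.array([2.0, 0.0])

# Projection coefficient: c = <x, y> / <y, y> = <x, y> / |y|^2
c = np.dot(x, y) / np.dot(y, y)

x_p = c * y    # component of x parallel to y (the orthogonal projection)
x_o = x - x_p  # component of x orthogonal to y

print(np.dot(x_o, y))  # 0.0 -- x_o really is orthogonal to y
```

Note that x_p + x_o recovers x exactly, so nothing is lost in the decomposition.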

Dot product as similarity

Do you recall how the famous trigonometric functions sine and cosine are defined?

Let's say that the hypotenuse of our right triangle is a unit vector and one of the legs is on the x-axis. Then the trigonometric functions equal the magnitudes of the projections to the axes.

The trigonometric functions

Using trigonometric functions, we see that the dot product of two unit vectors is the cosine of their enclosed angle α! This is how the dot product relates to cosine.

The dot product and the cosine

If x and y are not unit vectors, we can scale them down to unit length and use our previous discovery to get the cosine of α.

Cosine similarity

The closer its value is to 1, the more similar x and y are. (In a sense.) In machine learning, we call this quantity the cosine similarity.

Now you understand why.

Cosine similarity
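As a quick NumPy sketch of the formula (the function name and test vectors here are illustrative, not from a particular library):

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(alpha) = <x, y> / (|x| |y|) -- the formula derived above."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(cosine_similarity(a, b))  # 1/sqrt(2), i.e. cos(45 degrees)
```

Parallel vectors give 1, orthogonal vectors give 0, and opposite vectors give -1, regardless of their magnitudes.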

Having a deep understanding of math will make you a better engineer.

I want to help you with this, so I am writing a comprehensive book that takes you from high school math to the advanced stuff.
Join me on this journey and let's do this together!