Recommender System - Collaborative Filtering

Original Source: https://www.coursera.org/learn/machine-learning

Assume we are developers working for Netflix. We want to predict how much a user would like a movie they have not seen yet.

First we have to learn a feature vector $x^{(i)}$ for each movie $i$ and a parameter vector $\theta^{(j)}$ for each user $j$. We can think of a user's parameter vector as that user's preference pattern over movies.

Then we can predict the rating by user $j$ on movie $i$ with the following expression: \((\theta^{(j)})^Tx^{(i)}\)
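As a concrete sketch (the numbers and the choice of $n = 2$ features are made up for illustration), the prediction is just a dot product:

```python
import numpy as np

# Hypothetical learned vectors with n = 2 latent features,
# e.g. ("amount of romance", "amount of action").
x_i = np.array([0.9, 0.1])      # movie i: mostly romance, little action
theta_j = np.array([5.0, 0.5])  # user j: strongly prefers romance

# Predicted rating by user j on movie i: (theta^(j))^T x^(i)
predicted_rating = theta_j @ x_i
print(predicted_rating)  # 4.55
```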

Notations

$n_u$ = number of users

$n_m$ = number of movies

$r(i,j) = 1$ if user $j$ has rated movie $i$

$y^{(i,j)}$ = rating given by user $j$ to movie $i$ (defined only if $r(i,j)=1$)

$n$ = number of features per movie

Collaborative Filtering

In previous problems, we knew the feature vectors of the input examples. However, for the movie recommendation problem, we do not know the features of the movies (i.e., we do not know features such as "amount of romance" or "amount of action" in a movie).

So unlike the usual setting, we have to "learn" the feature vectors along with the parameter vectors.

To do that, we combine two objectives: the objective for learning the user parameters and the objective for learning the movie features.

1. Learning User Parameter Vector

Assume that we know feature vectors of movies.

To learn the parameters of all users, we minimize the following cost with respect to $\theta^{(1)},\dots,\theta^{(n_u)}$:

\[J(\theta^{(1)},\dots,\theta^{(n_u)}) = \dfrac{1}{2}\displaystyle \sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \dfrac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^n(\theta_k^{(j)})^2\]

The subscript $i:r(i,j)=1$ on the summation means summing over all $i$ such that $r(i,j)=1$, i.e., over the movies that user $j$ has actually rated.
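A minimal NumPy sketch of this cost, assuming the ratings are stored in an $n_m \times n_u$ matrix `Y` together with a binary mask `R` where `R[i, j] == 1` iff user $j$ rated movie $i$ (all names here are illustrative):

```python
import numpy as np

def user_cost(Theta, X, Y, R, lam):
    """Cost over all user parameter vectors, with movie features X held fixed.

    X:     (n_m, n) movie feature vectors, one row per movie
    Theta: (n_u, n) user parameter vectors, one row per user
    Y:     (n_m, n_u) ratings, meaningful only where R == 1
    R:     (n_m, n_u) indicator, R[i, j] = 1 if user j rated movie i
    """
    errors = (X @ Theta.T - Y) * R               # zero out unrated (i, j) pairs
    squared_error = 0.5 * np.sum(errors ** 2)
    regularization = (lam / 2.0) * np.sum(Theta ** 2)
    return squared_error + regularization
```

Multiplying elementwise by `R` implements the $i:r(i,j)=1$ restriction: unrated entries contribute zero error.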

2. Learning Movie Feature Vector

Assume that we know parameter vectors of users.

To learn the features of all movies, we minimize the following cost with respect to $x^{(1)},\dots,x^{(n_m)}$:

\[J(x^{(1)},\dots,x^{(n_m)}) = \dfrac{1}{2} \displaystyle \sum_{i=1}^{n_m} \sum_{j:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \dfrac{\lambda}{2}\sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2\]
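The sketch for this objective is symmetric to the previous one (same illustrative `Y`/`R` layout); only the regularization term changes, penalizing the features instead of the parameters:

```python
import numpy as np

def movie_cost(X, Theta, Y, R, lam):
    """Cost over all movie feature vectors, with user parameters Theta held fixed."""
    errors = (X @ Theta.T - Y) * R                 # only rated (i, j) pairs contribute
    squared_error = 0.5 * np.sum(errors ** 2)
    regularization = (lam / 2.0) * np.sum(X ** 2)  # regularize X, not Theta
    return squared_error + regularization
```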

Combining Two Objectives

To speed things up, rather than alternating between the two objectives, we can minimize over the features and the parameters simultaneously:

\[J(x,\theta) = \dfrac{1}{2} \displaystyle \sum_{(i,j):r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2 + \dfrac{\lambda}{2}\sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2 + \dfrac{\lambda}{2}\sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2\]
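Under the same illustrative `Y`/`R` layout as above, the combined objective is one squared-error term plus both regularizers:

```python
import numpy as np

def total_cost(X, Theta, Y, R, lam):
    """Combined cost J(x, theta): squared error over rated pairs,
    plus regularization on both the features X and the parameters Theta."""
    errors = (X @ Theta.T - Y) * R
    return (0.5 * np.sum(errors ** 2)
            + (lam / 2.0) * np.sum(X ** 2)
            + (lam / 2.0) * np.sum(Theta ** 2))
```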

If we initialize $x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}$ to small random values (the random initialization breaks symmetry, so the learned features end up distinct from one another) and perform gradient descent on the above cost function, we obtain both the movie features and the user parameters.
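A sketch of that training loop (the gradient expressions follow from differentiating $J(x,\theta)$ above; the feature count, learning rate, and iteration budget are illustrative):

```python
import numpy as np

def fit(Y, R, n_features=10, lam=0.1, alpha=0.005, n_iters=1000):
    """Jointly learn X and Theta by batch gradient descent on J(x, theta)."""
    n_m, n_u = Y.shape
    rng = np.random.default_rng(0)
    # Small random values break the symmetry between features,
    # so the learned columns of X end up distinct from one another.
    X = 0.01 * rng.standard_normal((n_m, n_features))
    Theta = 0.01 * rng.standard_normal((n_u, n_features))

    for _ in range(n_iters):
        errors = (X @ Theta.T - Y) * R           # residuals on rated entries only
        X_grad = errors @ Theta + lam * X        # dJ/dX
        Theta_grad = errors.T @ X + lam * Theta  # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta
```

After training, the full matrix of predicted ratings is simply `X @ Theta.T`, and the entries where `R == 0` are the predictions for the movies each user has not rated yet.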
