<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="description" content="Keras documentation"> <meta name="author" content="Keras Team"> <link rel="shortcut icon" href="https://keras.io/img/favicon.ico"> <link rel="canonical" href="https://keras.io/examples/structured_data/collaborative_filtering_movielens/" />  <meta property="og:title" content="Keras documentation: Collaborative Filtering for Movie Recommendations"> <meta property="og:image" content="https://keras.io/img/logo-k-keras-wb.png"> <meta name="twitter:title" content="Keras documentation: Collaborative Filtering for Movie Recommendations"> <meta name="twitter:image" content="https://keras.io/img/k-keras-social.png"> <meta name="twitter:card" content="summary"> <title>Collaborative Filtering for Movie Recommendations</title>  <link href="/css/bootstrap.min.css" rel="stylesheet">  <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;600;700;800&display=swap" rel="stylesheet">  <link href="/css/docs.css" rel="stylesheet"> <link href="/css/monokai.css" rel="stylesheet">  <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-5DNGF4N'); </script> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-175165319-128', 'auto'); ga('send', 'pageview'); </script>  <script async defer src="https://buttons.github.io/buttons.js"></script> 
</head> <body>  <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5DNGF4N" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>  <div class='k-page'> <div class="k-nav" id="nav-menu"> <a href='/'><img src='/img/logo-small.png' class='logo-small' /></a> <div class="nav flex-column nav-pills" role="tablist" aria-orientation="vertical"> <a class="nav-link" href="/about/" role="tab" aria-selected="">About Keras</a> <a class="nav-link" href="/getting_started/" role="tab" aria-selected="">Getting started</a> <a class="nav-link" href="/guides/" role="tab" aria-selected="">Developer guides</a> <a class="nav-link active" href="/examples/" role="tab" aria-selected="">Code examples</a> <a class="nav-sublink" href="/examples/vision/">Computer Vision</a> <a class="nav-sublink" href="/examples/nlp/">Natural Language Processing</a> <a class="nav-sublink active" href="/examples/structured_data/">Structured Data</a> <a class="nav-sublink2" href="/examples/structured_data/structured_data_classification_with_feature_space/">Structured data classification with FeatureSpace</a> <a class="nav-sublink2" href="/examples/structured_data/feature_space_advanced/">FeatureSpace advanced use cases</a> <a class="nav-sublink2" href="/examples/structured_data/imbalanced_classification/">Imbalanced classification: credit card fraud detection</a> <a class="nav-sublink2" href="/examples/structured_data/structured_data_classification_from_scratch/">Structured data classification from scratch</a> <a class="nav-sublink2" href="/examples/structured_data/wide_deep_cross_networks/">Structured data learning with Wide, Deep, and Cross networks</a> <a class="nav-sublink2" href="/examples/structured_data/customer_lifetime_value/">Deep Learning for Customer Lifetime Value</a> <a class="nav-sublink2" href="/examples/structured_data/classification_with_grn_and_vsn/">Classification with Gated Residual and Variable Selection Networks</a> <a class="nav-sublink2" 
href="/examples/structured_data/classification_with_tfdf/">Classification with TensorFlow Decision Forests</a> <a class="nav-sublink2" href="/examples/structured_data/deep_neural_decision_forests/">Classification with Neural Decision Forests</a> <a class="nav-sublink2" href="/examples/structured_data/tabtransformer/">Structured data learning with TabTransformer</a> <a class="nav-sublink2 active" href="/examples/structured_data/collaborative_filtering_movielens/">Collaborative Filtering for Movie Recommendations</a> <a class="nav-sublink2" href="/examples/structured_data/movielens_recommendations_transformers/">A Transformer-based recommendation system</a> <a class="nav-sublink" href="/examples/timeseries/">Timeseries</a> <a class="nav-sublink" href="/examples/generative/">Generative Deep Learning</a> <a class="nav-sublink" href="/examples/audio/">Audio Data</a> <a class="nav-sublink" href="/examples/rl/">Reinforcement Learning</a> <a class="nav-sublink" href="/examples/graph/">Graph Data</a> <a class="nav-sublink" href="/examples/keras_recipes/">Quick Keras Recipes</a> <a class="nav-link" href="/api/" role="tab" aria-selected="">Keras 3 API documentation</a> <a class="nav-link" href="/2.18/api/" role="tab" aria-selected="">Keras 2 API documentation</a> <a class="nav-link" href="/keras_tuner/" role="tab" aria-selected="">KerasTuner: Hyperparam Tuning</a> <a class="nav-link" href="/keras_hub/" role="tab" aria-selected="">KerasHub: Pretrained Models</a> </div> </div> <div class='k-main'> <div class='k-main-top'> <script> function displayDropdownMenu() { e = document.getElementById("nav-menu"); if (e.style.display == "block") { e.style.display = "none"; } else { e.style.display = "block"; document.getElementById("dropdown-nav").style.display = "block"; } } function resetMobileUI() { if (window.innerWidth <= 840) { document.getElementById("nav-menu").style.display = "none"; document.getElementById("dropdown-nav").style.display = "block"; } else { 
document.getElementById("nav-menu").style.display = "block"; document.getElementById("dropdown-nav").style.display = "none"; } var navmenu = document.getElementById("nav-menu"); var menuheight = navmenu.clientHeight; var kmain = document.getElementById("k-main-id"); kmain.style.minHeight = (menuheight + 100) + 'px'; } window.onresize = resetMobileUI; window.addEventListener("load", (event) => { resetMobileUI() }); </script> <div id='dropdown-nav' onclick="displayDropdownMenu();"> <svg viewBox="-20 -20 120 120" width="60" height="60"> <rect width="100" height="20"></rect> <rect y="30" width="100" height="20"></rect> <rect y="60" width="100" height="20"></rect> </svg> </div> <form class="bd-search d-flex align-items-center k-search-form" id="search-form"> <input type="search" class="k-search-input" id="search-input" placeholder="Search Keras documentation..." aria-label="Search Keras documentation..." autocomplete="off"> <button class="k-search-btn"> <svg width="13" height="13" viewBox="0 0 13 13"><title>search</title><path d="m4.8495 7.8226c0.82666 0 1.5262-0.29146 2.0985-0.87438 0.57232-0.58292 0.86378-1.2877 0.87438-2.1144 0.010599-0.82666-0.28086-1.5262-0.87438-2.0985-0.59352-0.57232-1.293-0.86378-2.0985-0.87438-0.8055-0.010599-1.5103 0.28086-2.1144 0.87438-0.60414 0.59352-0.8956 1.293-0.87438 2.0985 0.021197 0.8055 0.31266 1.5103 0.87438 2.1144 0.56172 0.60414 1.2665 0.8956 2.1144 0.87438zm4.4695 0.2115 3.681 3.6819-1.259 1.284-3.6817-3.7 0.0019784-0.69479-0.090043-0.098846c-0.87973 0.76087-1.92 1.1413-3.1207 1.1413-1.3553 0-2.5025-0.46363-3.4417-1.3909s-1.4088-2.0686-1.4088-3.4239c0-1.3553 0.4696-2.4966 1.4088-3.4239 0.9392-0.92727 2.0864-1.3969 3.4417-1.4088 1.3553-0.011889 2.4906 0.45771 3.406 1.4088 0.9154 0.95107 1.379 2.0924 1.3909 3.4239 0 1.2126-0.38043 2.2588-1.1413 3.1385l0.098834 0.090049z"></path></svg> </button> </form> <script> var form = document.getElementById('search-form'); form.onsubmit = function(e) { e.preventDefault(); var query = 
document.getElementById('search-input').value; window.location.href = '/search.html?query=' + query; return false; } </script> </div> <div class='k-main-inner' id='k-main-id'> <div class='k-location-slug'> ► <a href='/examples/'>Code examples</a> / <a href='/examples/structured_data/'>Structured Data</a> / Collaborative Filtering for Movie Recommendations </div> <div class='k-content'> <h1 id="collaborative-filtering-for-movie-recommendations">Collaborative Filtering for Movie Recommendations</h1> Author: <a href="https://twitter.com/sidd2006">Siddhartha Banerjee</a> Date created: 2020/05/24 Last modified: 2020/05/24 Description: Recommending movies using a model trained on the MovieLens dataset. <div class='example_version_banner keras_3'>ⓘ This example uses Keras 3</div> <img class="k-inline-icon" src="https://colab.research.google.com/img/colab_favicon.ico"/> <a href="https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/structured_data/ipynb/collaborative_filtering_movielens.ipynb">View in Colab</a> •<img class="k-inline-icon" src="https://github.com/favicon.ico"/> <a href="https://github.com/keras-team/keras-io/blob/master/examples/structured_data/collaborative_filtering_movielens.py">GitHub source</a> <hr /> <h2 id="introduction">Introduction</h2> This example demonstrates <a href="https://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> using the <a href="https://www.kaggle.com/c/movielens-100k">MovieLens dataset</a> to recommend movies to users. The MovieLens ratings dataset lists the ratings given by a set of users to a set of movies. Our goal is to predict ratings for movies a user has not yet watched. The movies with the highest predicted ratings can then be recommended to the user. 
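To make this goal concrete, the following is a minimal NumPy sketch of scoring every movie for one user and keeping the highest-scoring unwatched ones. It assumes each user and each movie is already represented by a small vector; the vectors here are random placeholders rather than trained values, the sizes are toy values, and the names <code>predict_scores</code> and <code>recommend_top_k</code> are hypothetical helpers that do not appear in the example's code.
<div class="codehilite"><pre><code>
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen only for illustration (assumption, not from the dataset).
num_users, num_movies, embedding_size = 4, 6, 3

# Random placeholders standing in for learned user and movie vectors.
user_vectors = rng.normal(size=(num_users, embedding_size))
movie_vectors = rng.normal(size=(num_movies, embedding_size))


def predict_scores(user_index):
    """Return one match score in (0, 1) per movie for the given user."""
    # Dot product between the user's vector and every movie vector.
    logits = movie_vectors @ user_vectors[user_index]
    # Sigmoid squashes each raw score into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-logits))


def recommend_top_k(user_index, watched, k=2):
    """Return indices of the k highest-scoring movies the user has not watched."""
    scores = predict_scores(user_index)
    # Exclude movies the user has already watched.
    scores[list(watched)] = -np.inf
    # Sort in descending order of score and keep the first k indices.
    return np.argsort(scores)[::-1][:k].tolist()


recommendations = recommend_top_k(user_index=0, watched={1, 4}, k=2)
print(recommendations)
```
</code></pre></div>
In the full example, the scores come from a trained model rather than from random vectors, but the final ranking step follows the same pattern.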
The steps in the model are as follows:
<ol>
<li>Map user ID to a "user vector" via an embedding matrix</li>
<li>Map movie ID to a "movie vector" via an embedding matrix</li>
<li>Compute the dot product between the user vector and movie vector, to obtain a match score between the user and the movie (predicted rating).</li>
<li>Train the embeddings via gradient descent using all known user-movie pairs.</li>
</ol>
References:
<ul>
<li><a href="https://dl.acm.org/doi/pdf/10.1145/371920.372071">Collaborative Filtering</a></li>
<li><a href="https://dl.acm.org/doi/pdf/10.1145/3038912.3052569">Neural Collaborative Filtering</a></li>
</ul>
<div class="codehilite"><pre><code>import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
from zipfile import ZipFile

import keras
from keras import layers
from keras import ops
</code></pre></div> <hr /> <h2 id="first-load-the-data-and-apply-preprocessing">First, load the data and apply preprocessing</h2> <div class="codehilite"><pre><code># Download the actual data from http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
# Use the ratings.csv file
movielens_data_file_url = (
    "http://files.grouplens.org/datasets/movielens/ml-latest-small.zip"
)
movielens_zipped_file = keras.utils.get_file(
    "ml-latest-small.zip", movielens_data_file_url, extract=False
)
keras_datasets_path = Path(movielens_zipped_file).parents[0]
movielens_dir = keras_datasets_path / "ml-latest-small"

# Only extract the data the first time the script is run.
if not movielens_dir.exists():
    with ZipFile(movielens_zipped_file, "r") as zip_file:
        # Extract files
        print("Extracting all the files now...")
        zip_file.extractall(path=keras_datasets_path)
        print("Done!")

ratings_file = movielens_dir / "ratings.csv"
df = pd.read_csv(ratings_file)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>Downloading data from http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
 978202/978202 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Extracting all the files now...
Done!
</code></pre></div> </div>
First, we need to perform some preprocessing to encode users and movies as integer indices.
<div class="codehilite"><pre><code>user_ids = df["userId"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
userencoded2user = {i: x for i, x in enumerate(user_ids)}
movie_ids = df["movieId"].unique().tolist()
movie2movie_encoded = {x: i for i, x in enumerate(movie_ids)}
movie_encoded2movie = {i: x for i, x in enumerate(movie_ids)}
df["user"] = df["userId"].map(user2user_encoded)
df["movie"] = df["movieId"].map(movie2movie_encoded)

num_users = len(user2user_encoded)
num_movies = len(movie_encoded2movie)
df["rating"] = df["rating"].values.astype(np.float32)
# min and max ratings will be used to normalize the ratings later
min_rating = min(df["rating"])
max_rating = max(df["rating"])

print(
    "Number of users: {}, Number of Movies: {}, Min rating: {}, Max rating: {}".format(
        num_users, num_movies, min_rating, max_rating
    )
)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>Number of users: 610, Number of Movies: 9724, Min rating: 0.5, Max rating: 5.0
</code></pre></div> </div> <hr /> <h2 id="prepare-training-and-validation-data">Prepare training and validation data</h2> <div class="codehilite"><pre><code>df = df.sample(frac=1, random_state=42)
x = df[["user", "movie"]].values
# Normalize the targets between 0 and 1. This makes training easier.
y = df["rating"].apply(lambda x: (x - min_rating) / (max_rating - min_rating)).values
# Assuming training on 90% of the data and validating on 10%.
train_indices = int(0.9 * df.shape[0])
x_train, x_val, y_train, y_val = (
    x[:train_indices],
    x[train_indices:],
    y[:train_indices],
    y[train_indices:],
)
</code></pre></div> <hr /> <h2 id="create-the-model">Create the model</h2>
We embed both users and movies into 50-dimensional vectors. The model computes a match score between user and movie embeddings via a dot product, and adds a per-movie and per-user bias. The match score is scaled to the <code>[0, 1]</code> interval via a sigmoid (since our ratings are normalized to this range).
<div class="codehilite"><pre><code>EMBEDDING_SIZE = 50


class RecommenderNet(keras.Model):
    def __init__(self, num_users, num_movies, embedding_size, **kwargs):
        super().__init__(**kwargs)
        self.num_users = num_users
        self.num_movies = num_movies
        self.embedding_size = embedding_size
        self.user_embedding = layers.Embedding(
            num_users,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.user_bias = layers.Embedding(num_users, 1)
        self.movie_embedding = layers.Embedding(
            num_movies,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.movie_bias = layers.Embedding(num_movies, 1)

    def call(self, inputs):
        # Column 0 holds encoded user indices, column 1 encoded movie indices.
        user_vector = self.user_embedding(inputs[:, 0])
        user_bias = self.user_bias(inputs[:, 0])
        movie_vector = self.movie_embedding(inputs[:, 1])
        movie_bias = self.movie_bias(inputs[:, 1])
        # Per-example dot product: multiply element-wise, then sum over the
        # embedding axis. (tensordot with axes=2 would also contract over the
        # batch axis and return a single scalar for the whole batch.)
        dot_user_movie = ops.sum(user_vector * movie_vector, axis=1, keepdims=True)
        # Add all the components (including bias)
        x = dot_user_movie + user_bias + movie_bias
        # The sigmoid activation forces the rating to be between 0 and 1
        return ops.nn.sigmoid(x)


model = RecommenderNet(num_users, num_movies, EMBEDDING_SIZE)
model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
)
</code></pre></div> <hr /> <h2 id="train-the-model-based-on-the-data-split">Train the model based on the data split</h2> <div class="codehilite"><pre><code>history = model.fit(
    x=x_train,
    y=y_train,
    batch_size=64,
    epochs=5,
    verbose=1,
    validation_data=(x_val, y_val),
)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>Epoch 1/5
 1418/1418 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - loss: 0.6591 - val_loss: 0.6201
Epoch 2/5
 1418/1418 ━━━━━━━━━━━━━━━━━━━━ 1s 894us/step - loss: 0.6159 - val_loss: 0.6191
Epoch 3/5
 1418/1418 ━━━━━━━━━━━━━━━━━━━━ 1s 977us/step - loss: 0.6093 - val_loss: 0.6138
Epoch 4/5
 1418/1418 ━━━━━━━━━━━━━━━━━━━━ 1s 865us/step - loss: 0.6100 - val_loss: 0.6123
Epoch 5/5
 1418/1418 ━━━━━━━━━━━━━━━━━━━━ 1s 854us/step - loss: 0.6072 - val_loss: 0.6121
</code></pre></div> </div> <hr /> <h2 id="plot-training-and-validation-loss">Plot training and validation loss</h2> <div class="codehilite"><pre><code>plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.show()
</code></pre></div> <img alt="png" src="/img/examples/structured_data/collaborative_filtering_movielens/collaborative_filtering_movielens_14_0.png" /> <hr /> <h2 id="show-top-10-movie-recommendations-to-a-user">Show top 10 movie recommendations to a user</h2> <div class="codehilite"><pre><code>movie_df = pd.read_csv(movielens_dir / "movies.csv")

# Let us get a user and see the top recommendations.
user_id = df.userId.sample(1).iloc[0]
movies_watched_by_user = df[df.userId == user_id]
movies_not_watched = movie_df[
    ~movie_df["movieId"].isin(movies_watched_by_user.movieId.values)
]["movieId"]
movies_not_watched = list(
    set(movies_not_watched).intersection(set(movie2movie_encoded.keys()))
)
movies_not_watched = [[movie2movie_encoded.get(x)] for x in movies_not_watched]
user_encoder = user2user_encoded.get(user_id)
user_movie_array = np.hstack(
    ([[user_encoder]] * len(movies_not_watched), movies_not_watched)
)
ratings = model.predict(user_movie_array).flatten()
top_ratings_indices = ratings.argsort()[-10:][::-1]
recommended_movie_ids = [
    movie_encoded2movie.get(movies_not_watched[x][0]) for x in top_ratings_indices
]

print("Showing recommendations for user: {}".format(user_id))
print("====" * 9)
print("Movies with high ratings from user")
print("----" * 8)
top_movies_user = (
    movies_watched_by_user.sort_values(by="rating", ascending=False)
    .head(5)
    .movieId.values
)
movie_df_rows = movie_df[movie_df["movieId"].isin(top_movies_user)]
for row in movie_df_rows.itertuples():
    print(row.title, ":", row.genres)

print("----" * 8)
print("Top 10 movie recommendations")
print("----" * 8)
recommended_movies = movie_df[movie_df["movieId"].isin(recommended_movie_ids)]
for row in recommended_movies.itertuples():
    print(row.title, ":", row.genres)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code> 272/272 ━━━━━━━━━━━━━━━━━━━━ 0s 714us/step
Showing recommendations for user: 249
====================================
Movies with high ratings from user
--------------------------------
Fight Club (1999) : Action|Crime|Drama|Thriller
Serenity (2005) : Action|Adventure|Sci-Fi
Departed, The (2006) : Crime|Drama|Thriller
Prisoners (2013) : Drama|Mystery|Thriller
Arrival (2016) : Sci-Fi
--------------------------------
Top 10 movie recommendations
--------------------------------
In the Name of the Father (1993) : Drama
Monty Python and the Holy Grail (1975) : Adventure|Comedy|Fantasy
Princess Bride, The (1987) : Action|Adventure|Comedy|Fantasy|Romance
Lawrence of Arabia (1962) : Adventure|Drama|War
Apocalypse Now (1979) : Action|Drama|War
Full Metal Jacket (1987) : Drama|War
Amadeus (1984) : Drama
Glory (1989) : Drama|War
Chinatown (1974) : Crime|Film-Noir|Mystery|Thriller
City of God (Cidade de Deus) (2002) : Action|Adventure|Crime|Drama|Thriller
</code></pre></div> </div> </div> <div class='k-outline'> <div class='k-outline-depth-1'> <a href='#collaborative-filtering-for-movie-recommendations'>Collaborative Filtering for Movie Recommendations</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#introduction'>Introduction</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#first-load-the-data-and-apply-preprocessing'>First, load the data and apply preprocessing</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#prepare-training-and-validation-data'>Prepare training and validation data</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#create-the-model'>Create the model</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#train-the-model-based-on-the-data-split'>Train the model based on the data split</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#plot-training-and-validation-loss'>Plot training and validation loss</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#show-top-10-movie-recommendations-to-a-user'>Show top 10 movie recommendations to a user</a> </div> </div> </div> </div> </div> </body> <footer style="float: left; width: 100%; padding: 1em; border-top: solid 1px #bbb;"> <a href="https://policies.google.com/terms">Terms</a> | <a href="https://policies.google.com/privacy">Privacy</a> </footer> </html>