Creating a Hybrid Neural Network for Movie Recommendations using TensorFlow Recommenders
Are you tired of scrolling through endless lists of movies on streaming platforms, only to find nothing that piques your interest? Get ready to say goodbye to the frustration of endless browsing. Keep reading to find out how to build your own hybrid neural network for movie recommendations using TensorFlow Recommenders.
In this article, we will be discussing how deep learning techniques can be used to build personalized recommendations. The purpose of this article is to provide a comprehensive guide on how to design and implement a deep learning-based movie recommendation system using the TensorFlow Recommenders library. We will be using a hybrid neural network model that combines deep learning with traditional collaborative filtering techniques, and the popular MovieLens dataset to train and evaluate our model.
Deep learning: Using artificial neural networks to analyze large amounts of movie data and make personalized movie recommendations based on a user’s past preferences and behaviors.
Traditional Collaborative Filtering: A recommendation system that uses past interactions between users and items to make recommendations. It assumes that similar users will like similar items and generates recommendations based on the past behavior of similar users.
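As a toy illustration of this idea (with made-up ratings, not the article's data), the similarity between users can be measured by comparing their rating vectors:
import numpy as np
# Made-up ratings from three users for the same four movies (0 = unrated)
alice = np.array([5, 4, 0, 1])
bob = np.array([4, 5, 0, 1])
carol = np.array([1, 0, 5, 4])
def cosine(a, b):
    # Cosine similarity: close to 1.0 means very similar taste
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine(alice, bob))   # high: Alice and Bob rate movies similarly
print(cosine(alice, carol)) # low: Carol's taste is very different
A collaborative filter would therefore weight Bob's past ratings more heavily than Carol's when generating recommendations for Alice.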
Explanation of the deep learning techniques used
We will be using deep learning techniques to build a movie recommendation system. Specifically, we will be implementing a hybrid neural network model.
Hybrid neural network model: Combines the power of deep learning with the efficiency of traditional collaborative filtering techniques. The model takes into account both explicit ratings given by users and implicit signals such as genres and uses a combination of neural networks and matrix factorization to make personalized movie recommendations.
By leveraging these signals, the model learns the preferences and tastes of individual users and makes personalized recommendations.
A hybrid neural network is like having the best of both worlds! Picture a superhero who can fly and has super strength. That’s a hybrid neural network in a nutshell. It takes the accuracy of deep learning (flying) and combines it with the efficiency of traditional collaborative filtering techniques (super strength). With this super combination, the hybrid neural network can make incredibly accurate movie recommendations based on both what you explicitly tell it (your favorite genres) and what it can infer from your past behavior (the ratings you’ve given). So next time you’re scrolling through Netflix, think of the hybrid neural network as your trusty sidekick, always there to suggest the perfect movie for you.
The movie recommendation model uses a technique called embeddings to understand the patterns and relationships in the data. Embeddings are like shortcuts that summarize big and complex information into smaller, simpler parts.
To make movie recommendations, the model uses a type of neural network that has multiple layers. The network is trained to predict how much a user would like a movie based on the embeddings of the users and movies.
To do this, the model breaks down the information about users and movies into two smaller parts, called user and movie matrices. These matrices are much easier for the model to understand and work with, which allows it to make better predictions. This technique of breaking down information is called matrix factorization.
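To make matrix factorization concrete, here is a minimal NumPy sketch (toy sizes, with random values standing in for learned weights; an illustration, not the article's model):
import numpy as np
rng = np.random.default_rng(0)
n_users, n_movies, dim = 3, 5, 4
user_matrix = rng.normal(size=(n_users, dim))    # one small embedding per user
movie_matrix = rng.normal(size=(n_movies, dim))  # one small embedding per movie
# A predicted rating is the dot product of a user's and a movie's embedding;
# training adjusts both matrices so these products reproduce the observed ratings.
predicted = user_matrix @ movie_matrix.T
print(predicted.shape)  # (3, 5): a score for every user-movie pair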
With the help of a hybrid neural network, you can now say goodbye to scrolling through never-ending movie lists on streaming platforms. No more frustration and disappointment of finding nothing that interests you. Instead, get ready to bring the drama, the laughter, and the excitement straight to your screen! A hybrid neural network is like your very own movie personal assistant, handpicking the perfect flicks just for you, based on your preferences. So sit back, it’s time to bring the drama (and comedy) to your screen!
Steps:
- Data Collection and Preprocessing
- Building the hybrid neural network
- Training and evaluating the neural network
- Making personalized recommendations
Step 1: Data Collection and Preprocessing
Description of the MovieLens dataset
The MovieLens dataset is a widely used dataset for building recommendation systems, collected by the GroupLens Research group at the University of Minnesota. In this project, we use The Movies Dataset from Kaggle, which packages MovieLens ratings together with rich movie metadata: roughly 26 million ratings applied to about 45,000 movies by over 270,000 users. It includes both explicit ratings given by users and implicit signals such as movie genres. The data was preprocessed to fit the format required by TensorFlow Recommenders for training and evaluation.
Explanation of the data preprocessing steps, such as cleaning and transforming the data
The data preprocessing step involves cleaning and transforming the raw data to make it suitable for training the model. The MovieLens dataset, which is used in this project, requires several preprocessing steps to make it ready for training.
Code:
- Importing the necessary libraries: The code imports several libraries to support the implementation of the movie recommendation system, including pandas, numpy, seaborn, matplotlib, TensorFlow, tensorflow_recommenders, and scikit-learn.
import string # for string manipulation operations
import re # for regular expression operations
import pandas as pd # for data manipulation and analysis
import numpy as np # for numerical computing
import seaborn as sns # for data visualization
import matplotlib.pyplot as plt # for data visualization
import tensorflow as tf # for building and training deep learning models
import tensorflow_recommenders as tfrs # a high-level API for building recommendation models on top of TensorFlow
from collections import Counter # for counting occurrences of elements in an iterable
from typing import Dict, Text # for declaring data types in a function or module
from ast import literal_eval # for evaluating strings as Python literals
from datetime import datetime # for working with dates and times
from wordcloud import WordCloud # for generating word clouds from text
from sklearn.preprocessing import MinMaxScaler # for feature scaling
from sklearn.feature_extraction.text import TfidfVectorizer # for converting text into numerical features
from sklearn.metrics.pairwise import cosine_similarity # for measuring similarity between two non-zero vectors
import random # for picking random movies for the user to rate
import warnings
warnings.filterwarnings('ignore')
2. Importing the data: The code loads three datasets, “credits”, “keywords”, and “movies_metadata”, into memory.
#Import the required datasets
credits = pd.read_csv('../input/the-movies-dataset/credits.csv')
keywords = pd.read_csv('../input/the-movies-dataset/keywords.csv')
movies = pd.read_csv('../input/the-movies-dataset/movies_metadata.csv').\
    drop(['belongs_to_collection', 'homepage', 'imdb_id', 'poster_path', 'status', 'title', 'video'], axis=1).\
    drop([19730, 29503, 35587]) # drop three rows whose 'id' has the wrong data type
movies['id'] = movies['id'].astype('int64')
#Merge all the datasets together
df = movies.merge(keywords, on='id').merge(credits, on='id')
df.head()
- credits.csv: contains information about the cast and crew involved in the production of movies
- keywords.csv: contains a list of keywords or phrases associated with the movies in the dataset
- movies_metadata.csv: contains information about various movies including title, release date, budget, revenue, genres, production companies, and more
The movies_metadata dataset is cleaned and transformed by dropping columns that carry no useful information for this task and by removing three rows whose 'id' values have the wrong data type.
The data is then merged into a single data frame, df, using the merge() method, which combines two data frames based on a common column, in this case 'id'.
3. Preprocessing the data: After merging, the code performs several cleaning and processing steps. It fills in missing values in columns such as original_language, runtime, and tagline, and drops any rows that still contain missing values. The code then applies the get_text function to columns such as genres, production_companies, production_countries, crew, spoken_languages, and keywords to extract the relevant names from their stringified list-of-dicts values. It creates new columns holding the characters and actors extracted from the cast column, drops the original cast column, removes duplicate titles, and resets the index. Finally, it converts release_date, budget, and popularity to appropriate data types and filters the data down to English-language movies released after 2009.
#Fill in the missing values
df['original_language'] = df['original_language'].fillna('')
df['runtime'] = df['runtime'].fillna(0)
df['tagline'] = df['tagline'].fillna('')
df.dropna(inplace=True)
#A function to extract the useful information from the columns
def get_text(text, obj='name'):
    text = literal_eval(text) # parse the stringified Python literal
    if len(text) == 1:
        for i in text:
            return i[obj]
    else:
        s = []
        for i in text:
            s.append(i[obj])
        return ', '.join(s)
#Applying the functions to the columns
df['genres'] = df['genres'].apply(get_text)
df['production_companies'] = df['production_companies'].apply(get_text)
df['production_countries'] = df['production_countries'].apply(get_text)
df['crew'] = df['crew'].apply(get_text)
df['spoken_languages'] = df['spoken_languages'].apply(get_text)
df['keywords'] = df['keywords'].apply(get_text)
#Create new columns
df['characters'] = df['cast'].apply(get_text, obj='character')
df['actors'] = df['cast'].apply(get_text)
df.drop('cast', axis=1, inplace=True)
df = df[~df['original_title'].duplicated()]
df = df.reset_index(drop=True)
#Changing the datatype of certain columns
df['release_date'] = pd.to_datetime(df['release_date'])
df['budget'] = df['budget'].astype('float64')
df['popularity'] = df['popularity'].astype('float64')
#Here, I am choosing movies released after 2009 and which were originally shot in English
df = df[df['release_date'].dt.year>2009]
df = df[df['original_language']=='en']
df = df.reset_index()
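To see what get_text does, here is its effect on a raw genres cell (the sample string below is constructed for illustration, matching the stringified list-of-dicts format used in movies_metadata):
#Hypothetical example of a raw genres cell and what get_text extracts from it
sample = "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]"
print(get_text(sample)) # Action, Adventure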
4. Importing the user ratings dataset: The code cleans and processes the movie ratings dataset. It starts by importing the "ratings" dataset using pd.read_csv(). Then the "timestamp" column is converted into a "date" column using the apply() method with datetime.fromtimestamp(), and the original "timestamp" column is dropped using drop(). The code then merges in additional columns from the df data frame, which include the movie titles, genres, and overviews. Finally, the resulting data frame is filtered to rows with non-null values in the "id" column, the index is reset, and the result is displayed using head().
#Importing the dataset of the ratings
ratings_df = pd.read_csv('../input/the-movies-dataset/ratings.csv')
#Changing the datatype of the 'date' column
ratings_df['date'] = ratings_df['timestamp'].apply(lambda x: datetime.fromtimestamp(x))
ratings_df.drop('timestamp', axis=1, inplace=True)
#Adding the columns from the merged dataset
ratings_df = ratings_df.merge(df[['id', 'original_title', 'genres', 'overview']], left_on='movieId',right_on='id', how='left')
ratings_df = ratings_df[~ratings_df['id'].isna()]
ratings_df.drop('id', axis=1, inplace=True)
ratings_df.reset_index(drop=True, inplace=True)
ratings_df.head()
5. Filtering the data down to the columns required for prediction: This code extracts information from two data frames, df and ratings_df. From df, the 'id' and 'original_title' columns are selected to form movies_df, and the 'id' column is renamed to 'movieId'. From ratings_df, the 'userId', 'original_title', and 'rating' columns are selected. Both resulting data frames, movies_df and ratings_df, are then displayed using the head() function.
#Selecting the id and title column of the merged dataframe
movies_df = df[['id', 'original_title']]
movies_df.rename(columns={'id':'movieId'}, inplace=True)
movies_df.head()
#Selecting the UserId, title, and rating from the ratings dataframe
ratings_df = ratings_df[['userId', 'original_title', 'rating']]
ratings_df.head()
6. Resetting the index: The code resets the index of two data frames, movies_df and df, using the reset_index method, which replaces the current index with a default sequential index starting from 0, and displays the head of each. Finally, the 'level_0' column, which was added while resetting the index, is deleted from df using the del statement.
#Resetting the index of the movies dataframe
movies_df = movies_df.reset_index()
movies_df.head()
#Resetting the index of the merged dataframe
df = df.reset_index()
df.head()
del df['level_0']
7. Creating a new data frame sorted by popularity: The code sorts the data frame df by the 'popularity' column in descending order, resets the index, and displays the first five rows using head(). This data frame is used to let the user rate the most popular movies according to the popularity score.
#Sorting by popularity to let the user rate the most popular ones
pdf = df.sort_values('popularity', ascending=False)
pdf = pdf.reset_index()
pdf.head()
8. Asking the user to rate some of the most popular movies: This code draws up to ten distinct random indices into the most popular movies in the sorted data frame pdf (duplicates from random.randint are skipped, so fewer than ten movies may be shown). It then prompts the user for their name and, for each chosen movie, displays the title, overview, and cast, and asks the user to rate it out of 5. If the user doesn't know the movie, they can enter "dk" and the loop moves on to the next one. Each rating is appended to ratings_df as a new row containing the user's ID, the movie title, and the rating.
#Randomly pick some of the most popular movies for the user to rate out of 5
user_id = 270897
randomList = []
for i in range(10):
    r = random.randint(1, 10)
    if r not in randomList:
        randomList.append(r)

print("Welcome")
name = input("Enter your name:")
for i in randomList:
    movie = pdf['original_title'][i]
    print()
    print(movie)
    print("Overview: ", pdf['overview'][i])
    print("Cast: ", pdf['actors'][i])
    user_r = input("Enter your rating out of 5 (Enter dk if you don't know the movie)")
    if user_r == 'dk':
        continue
    user_r = int(user_r)
    new_row = {'userId': user_id, 'original_title': movie, 'rating': user_r}
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    ratings_df = pd.concat([ratings_df, pd.DataFrame([new_row])], ignore_index=True)
Step 2: Building the Hybrid Neural Network
Introducing embeddings
The recommendation system is built on embeddings: dense, low-dimensional representations of high-dimensional data. They encode discrete variables like movie titles and user IDs into a continuous numerical space, capturing patterns and relationships in the data. In recommendation systems, embeddings represent movies and users in a way that captures their similarities, and they are then used as inputs to a model that predicts user ratings or preferences. Using embeddings allows for more accurate and personalized recommendations.
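As a minimal illustration of the mechanics (with a toy vocabulary, not the article's data), this is how TensorFlow turns a raw string into a trainable embedding vector:
import tensorflow as tf
# Toy vocabulary for illustration only
titles = ["Inception", "Interstellar", "Dunkirk"]
# StringLookup maps each title to an integer id (one extra id is reserved for unknown titles)
lookup = tf.keras.layers.StringLookup(vocabulary=titles, mask_token=None)
# Embedding maps each integer id to a dense, trainable 4-dimensional vector
embed = tf.keras.layers.Embedding(input_dim=len(titles) + 1, output_dim=4)
vector = embed(lookup(tf.constant(["Inception"])))
print(vector.shape) # (1, 4): one 4-dimensional embedding for the queried title
The model below uses exactly this StringLookup-plus-Embedding pattern for both users and movies, just with a 64-dimensional embedding space.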
The architecture of the recommendation model
The recommendation model architecture is a deep neural network with multiple layers, designed to learn complex patterns in the data. In the spirit of matrix factorization, it decomposes the rating matrix into two lower-dimensional matrices, one representing users and one representing movies, from which it learns preferences and makes personalized recommendations. This technique incorporates both explicit and implicit information, and it reduces the dimensionality of the data for more efficient training and deployment.
Discussion of the deep learning techniques used in the model, such as neural networks and collaborative filtering
Deep learning allows the model to learn complex, non-linear relationships between users and items. In the movie recommendation system we are discussing, a hybrid neural network model is used that combines both deep learning and traditional collaborative filtering techniques.
Neural networks are used in the recommendation model to learn the underlying patterns and relationships in the data. The architecture of the model consists of multiple layers of interconnected neurons, each layer representing a different level of abstraction of the data. The model is trained to predict user ratings for movies based on the dense, low-dimensional embeddings of users and movies.
Collaborative filtering is a traditional technique used in recommendation systems, which makes recommendations based on the past behavior of users. In our movie recommendation system, collaborative filtering is used in conjunction with deep learning to make personalized recommendations. The model takes into account both the explicit ratings given by users and implicit signals, here simply which movies a user has chosen to interact with, and uses these signals to learn the preferences and tastes of users. By leveraging the strengths of both deep learning and collaborative filtering, the hybrid model can make more accurate and personalized recommendations.
Code
- Creating the required data for the model from the data frames: Two datasets, “ratings” and “movies”, are created from the ratings_df and movies_df data frames respectively. The “ratings” dataset is reduced to three fields: “original_title”, “userId”, and “rating”. The “movies” dataset contains only the movie titles. These datasets are used as input to the recommendation model.
#Changing the datatype to string
ratings_df['userId'] = ratings_df['userId'].astype(str)
#Creating data for the model
ratings = tf.data.Dataset.from_tensor_slices(dict(ratings_df[['userId', 'original_title', 'rating']]))
movies = tf.data.Dataset.from_tensor_slices(dict(movies_df[['original_title']]))
ratings = ratings.map(lambda x: {
    "original_title": x["original_title"],
    "userId": x["userId"],
    "rating": float(x["rating"])
})
movies = movies.map(lambda x: x["original_title"])
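One note before defining the model: the class below references unique_movie_titles and unique_user_ids, and the training step later uses train and test splits. These are built before the model is defined; since that code is not shown above, here is a minimal sketch of the assumed step, following the standard TensorFlow Recommenders tutorial pattern:
#Vocabularies of unique titles and user ids for the StringLookup layers below
unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1_000))))
unique_user_ids = np.unique(np.concatenate(list(
    ratings.batch(1_000_000).map(lambda x: x["userId"]))))

#Shuffle once, then split the ratings roughly 80/20 into train and test sets
n_ratings = len(ratings_df)
shuffled = ratings.shuffle(n_ratings, seed=42, reshuffle_each_iteration=False)
train = shuffled.take(int(0.8 * n_ratings))
test = shuffled.skip(int(0.8 * n_ratings))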
2. Defining the model class: This defines a TensorFlow class called MovieModel, which implements the hybrid recommendation model. The model is constructed with a user model and a movie model, which produce the user and movie embeddings. These embeddings are passed through a multi-layer neural network to predict ratings. The model also defines two tasks, rating and retrieval, corresponding to the two losses the model optimizes: the rating task predicts rating scores, while the retrieval task retrieves relevant movies from the user and movie embeddings. The final loss is a weighted sum of the two task losses, and the loss weights are passed as parameters to the constructor of the MovieModel class.
#Creating the TensorFlow model
class MovieModel(tfrs.models.Model):

    def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
        # We take the loss weights in the constructor: this allows us to instantiate
        # several model objects with different loss weights.
        super().__init__()
        embedding_dimension = 64

        # User and movie models.
        self.movie_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_movie_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
        ])
        self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
        ])

        # A small model to take in user and movie embeddings and predict ratings.
        # We can make this as complicated as we want as long as we output a scalar
        # as our prediction.
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(1),
        ])

        # The tasks.
        self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()],
        )
        self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )

        # The loss weights.
        self.rating_weight = rating_weight
        self.retrieval_weight = retrieval_weight

    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        # We pick out the user features and pass them into the user model.
        user_embeddings = self.user_model(features["userId"])
        # And pick out the movie features and pass them into the movie model.
        movie_embeddings = self.movie_model(features["original_title"])

        return (
            user_embeddings,
            movie_embeddings,
            # We apply the multi-layered rating model to a concatenation of
            # user and movie embeddings.
            self.rating_model(
                tf.concat([user_embeddings, movie_embeddings], axis=1)
            ),
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        ratings = features.pop("rating")
        user_embeddings, movie_embeddings, rating_predictions = self(features)

        # We compute the loss for each task.
        rating_loss = self.rating_task(labels=ratings, predictions=rating_predictions)
        retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)

        # And combine them using the loss weights.
        return (self.rating_weight * rating_loss + self.retrieval_weight * retrieval_loss)
3. Creating a model based on the class definition: The model is compiled using the Adagrad optimizer, which is an optimization algorithm used in deep learning to update the model weights during training. The optimizer is configured with a learning rate of 0.1. The training data (train) is shuffled, grouped into batches of 1,000 instances each, and cached. The same is done for the test data (test). The model is trained by calling the fit method on the cached training data, with the number of training epochs set to 3. An epoch is a complete iteration over the entire training dataset. During each epoch, the model updates its parameters to minimize the loss function, which measures the difference between the model’s predictions and the actual training labels.
#Creating a model based on the above class definition
model = MovieModel(rating_weight=1.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
cached_train = train.shuffle(100_000).batch(1_000).cache()
cached_test = test.batch(1_000).cache()
model.fit(cached_train, epochs=3)
Step 3: Training and Evaluating the Model
Description of the training process and the evaluation metrics used
The model instance created above, with the rating_weight and retrieval_weight parameters both set to 1.0, weights the rating and retrieval losses equally during training. Training runs for three epochs over the shuffled, batched, and cached training data; performance is then measured on the held-out test data using factorized top-k categorical accuracy for the retrieval task and root mean squared error (RMSE) for the rating task.
Code
- Evaluating the model: This code evaluates the performance of the recommendation model using two metrics. The first, “Retrieval top-100 accuracy”, measures how often a movie a user actually rated appears among the model's top 100 retrieved candidates. The second, “Ranking RMSE”, measures the root mean squared error of the predicted movie ratings. The results of the evaluation are printed to three decimal places.
#Evaluating the model
metrics = model.evaluate(cached_test, return_dict=True)
print(f"\nRetrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}")
Step 4: Making Personalized Recommendations
Explanation of how the trained model can be used to make personalized movie recommendations
The trained model can be used to make personalized movie recommendations by querying the trained embeddings. Embeddings are dense, low-dimensional representations of high-dimensional data that capture the underlying patterns and relationships in the data; in the case of movie recommendations, they represent the movies and the users. The trained model predicts user ratings for movies from these embeddings, in the spirit of matrix factorization. By querying the trained embeddings, the model can identify the movies whose embeddings best match a user's embedding, which tend to be movies similar to the ones the user has liked in the past, and recommend them to the user. The same embeddings also absorb implicit signals, here simply which movies a user chose to rate at all.
- Functions for recommending movies to the users: This code defines two Python functions, predict_movie and predict_rating. The predict_movie function takes in a user ID and the number of recommendations to display (top_n), and uses the trained recommendation model to generate movie recommendations for that user. It builds an index over the movie dataset and uses it to retrieve the top top_n recommendations, which are printed to the console. The predict_rating function takes in a user ID and a movie title, passes them as inputs to the trained model, and prints the rating the model predicts the user would give that movie.
#A function to recommend movies to the user
def predict_movie(user, top_n=3):
    # Create a model that takes in raw query features and
    # recommends movies out of the entire movies dataset.
    index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
    index.index_from_dataset(
        tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
    )

    # Get recommendations.
    _, titles = index(tf.constant([str(user)]))

    print('Top {} recommendations for user {}:\n'.format(top_n, name))
    for i, title in enumerate(titles[0, :top_n].numpy()):
        print('{}. {}'.format(i + 1, title.decode("utf-8")))

#Predict the rating a user will give for a movie
def predict_rating(user, movie):
    # The model's call() returns (user embeddings, movie embeddings, predicted rating)
    trained_user_embeddings, trained_movie_embeddings, predicted_rating = model({
        "userId": np.array([str(user)]),
        "original_title": np.array([movie])
    })
    print("Predicted rating for {}: {}".format(movie, predicted_rating.numpy()[0][0]))
#Predicting the movies for the users
predict_movie(270897, 5)
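The companion predict_rating function can be exercised in the same way; the title below is only an example and is assumed to exist in the processed dataset:
#Predicting the rating the same user would give one specific movie (example title)
predict_rating(270897, 'Inception')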
2. Displaying the data of the recommendations: This code recommends the top 5 movies to the user and fetches their metadata. It starts by creating an index with the tfrs.layers.factorized_top_k.BruteForce layer, passing model.user_model as an argument, and builds the index from the entire movie dataset using the index_from_dataset method. Querying the index yields the recommended titles. The details of the recommendations are stored in a new data frame, pred_df, by merging the recommended titles with the original data frame df on the 'original_title' column. The details of each recommendation are then displayed: the movie title, overview, genres, and cast.
#Get metadata for the recommendations
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# Build the index from the entire movies dataset.
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

#Store the details of the recommendations in a new dataframe
_, titles = index(tf.constant(['270897']))
pred_movies = pd.DataFrame({'original_title': [i.decode('utf-8') for i in titles[0, :5].numpy()]})
pred_df = pred_movies.merge(df[['original_title', 'genres', 'overview', 'actors']], on='original_title', how='left')
pred_df = pred_df[~pred_df['original_title'].duplicated()]
pred_df.reset_index(drop=True, inplace=True)
pred_df.index = np.arange(1, len(pred_df) + 1)

#Displaying the details of the recommendations
print("Details about your recommendations:")
for i in pred_df.index: # iterate over however many unique recommendations survived
    print("Number: ", i)
    print("Movie: ", pred_df['original_title'][i])
    print("Overview: ", pred_df['overview'][i])
    print("Genres: ", pred_df['genres'][i])
    print("Cast: ", pred_df['actors'][i])
    print()
Conclusion
Well, well, if it isn’t the movie recommendation system, giving me 5 movie predictions to watch. To be honest, I have to say I might be a bit biased when it comes to these recommendations. I did spend a ton of time working on it, so, I have to give it some love. I’d say out of those 5, I’d be willing to watch 4 of them. Not gonna lie, I have high standards when it comes to movies, and I won’t settle for anything less than an amazing experience. But hey, you never know, the fifth one might surprise me and be the hidden gem in the bunch. Here’s to hoping for a movie marathon filled with laughter, tears, and all the feels!
Summary of the steps
The article focuses on building a movie recommendation system using deep learning techniques. The TensorFlow Recommenders (TFRS) library is used to build and train the recommendation model, and MovieLens data is used to train and evaluate it. The recommendation model is a hybrid neural network that combines deep learning and traditional collaborative filtering techniques. This allows the model to take into account both the explicit ratings given by users and implicit signals such as which movies a user has chosen to interact with.
The deep learning techniques used in the model include the use of embeddings, which are dense, low-dimensional representations of high-dimensional data. These embeddings are used to capture the underlying relationships between different entities, such as users and movies. Additionally, neural networks model the complex relationships between users and movies. Matrix factorization is also used to decompose the rating matrix into two lower-dimensional matrices, which helps to capture the underlying patterns in the data.
Before training the model, the data must undergo preprocessing steps to ensure it is in a suitable format. These preprocessing steps involve cleaning and transforming the data, such as removing any missing values or transforming categorical data into numerical data.
The results of the model show that it can provide personalized movie recommendations based on user data. These recommendations can be presented to the user in various ways, such as a ranked list of recommended movies or a personalized movie playlist.
Discussion of the potential future developments and improvements to the model
One potential future development is to incorporate reinforcement learning techniques into the model. This can help the model to learn from user interactions with the recommendations and make continuous improvements over time. Additionally, there is potential to integrate the model into existing movie streaming platforms or websites to provide real-time recommendations to users. These developments and improvements can help to take the movie recommendation system to the next level and provide even more value to users.
In conclusion, the article highlights the importance of combining deep learning and traditional collaborative filtering techniques in recommendation systems and provides a comprehensive overview of the process of building such a system. There is always room for improvement in any machine learning model, and future developments could include incorporating additional data sources, such as social media data or demographic data, to improve the model’s accuracy and personalization. Furthermore, the use of newer deep learning techniques, such as self-attention mechanisms, could also be explored to further improve the model’s performance.
Credit (the original code): https://www.kaggle.com/code/mfaaris/hybrid-and-tensorflow-recommender-system (I used this code to build the model and added the user movie-rating step on top of it)
TensorFlow Recommenders library documentation: https://www.tensorflow.org/recommenders
Original dataset: https://grouplens.org/datasets/movielens/
Dataset on Kaggle: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
“Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: http://imlab.postech.ac.kr/dkim/class/csed514_2019s/DeepLearningBook.pdf (Combining collaborative filtering with neural networks)
GitHub code link: https://github.com/Thesavagecoder7784/DeepLearningforMovieRecommendation/blob/main/Deep%20Learning%20Movie%20Recommendation%20System.ipynb