Text-based Emotion Aware Recommender

We apply the concept of users' emotion vectors (UVECs) and movies' emotion vectors (MVECs) as building components of an Emotion Aware Recommender System. We built a comparative platform consisting of five recommenders based on content-based and collaborative filtering algorithms. We employed a Tweets Affective Classifier to classify movies' emotion profiles from movie overviews, and we construct MVECs from those profiles. We track users' movie-watching history to formulate UVECs by taking the average of the MVECs of all the movies a user has watched. With the MVECs, we built an Emotion Aware Recommender as one of the platform's algorithms. We evaluated the top-N recommendation lists generated by these Recommenders and found that the top-N list of the Emotion Aware Recommender showed serendipitous recommendations.


INTRODUCTION
We illustrated in [1] the benefit of using movie emotion vectors (mvec) and user emotion vectors (uvec) to enhance a Recommender's top-N recommendation-making process. The goal of this paper is to use mvec and uvec embeddings as emotional components beyond making top-N recommendations and to develop an end-to-end Emotion Aware Recommender (EAR). In [1], the mvec embeddings represent a movie's emotional features derived from the movie overview. We developed a Tweets Affective Classifier (TAC) capable of classifying six primary human emotions, and we added a neutral mood to TAC for convenience in affective computing. We use TAC to classify movie overviews to obtain each movie's emotional profile, the mvec. A uvec embedding represents the mean value of the mvec embeddings of all the movies a user has watched. In this paper, we expand the coverage of the mvec embeddings to include other textual movie metadata, such as genres. We denote the expanded mvec embeddings as item vectors (ivec). By the same token, we denote the extended coverage of uvec as wvec.
We demonstrated in [1] affective movie recommendation making through an SVD-CF Recommender. In this study, we build a comparative Recommender platform that makes movie recommendations through Content-based (CB) and Collaborative Filtering (CF) Recommender algorithms. For CB, we develop a movie-genres CB Recommender, denoted as the Genres Aware Recommender (GAR). We transform the mvec embeddings of movie overviews into multi-label emotion classifications in One-Hot Encoded (OHE) embeddings, which we name ivec. We build an ivec-based CB Recommender and denote it as the Emotion Aware Recommender (EAR). We then combine the emotions and genres into an expanded ivec for developing a Multi-channel Aware Recommender (MAR). We also construct an Item-based Collaborative Filtering (IBCF) Recommender and a User-based Collaborative Filtering (UBCF) Recommender from scratch. We compare the five recommender algorithms' performance through the recommendation-making process.
We apply the Cosine Similarity depicted in equation 2 as the primary algorithm in building our Recommender platform. When the recommended movies in a top-N list contain genres similar to those of films the active user has watched and liked, Cosine Similarity will reveal the closeness between the recommended movies and the movies the active user has viewed and liked. Similarly, we can apply Cosine Similarity to measure the similarity between the emotion profiles of a top-N movie list and the emotion profiles of the movies the active user has watched and loved. Moreover, in UBCF, we apply a rating matrix, R, to compute the collaborative filtering for recommending movies to an active user. Each row of R represents the rating values of the films a user has watched and rated, whereas each column of R represents a movie and the rating scores it received from the users who viewed and assessed it. By computing the Cosine Similarity between the active user's row and another user's row, we effectively compare two rows of R and thus learn the closeness of the two users. Once we find the user in R most similar to the active user, we scan that user's watched films for movies the active user has not yet watched. Through collaborative filtering, we make the top-N movie recommendations to the active user. Lastly, we evaluate the performance of the Recommenders in the comparative platform by contrasting the top-N recommendation lists generated by the five Recommender algorithms. We find that the top-N recommendation list made by the Emotion Aware Recommender (EAR) shows intriguing results.
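To make the row-comparison concrete, here is a minimal sketch of the UBCF step described above. The toy rating matrix, the 0-for-unrated convention, and the variable names are our own illustrative assumptions, not the paper's data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity, sim(a, b) = (a . b) / (||a|| * ||b||), as in equation 2."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(np.dot(a, b) / denom)

# Toy rating matrix R: rows are users, columns are movies (0 = unrated),
# matching the UBCF layout described above.
R = np.array([
    [5, 4, 0, 1],   # active user
    [5, 5, 4, 0],   # user 1
    [1, 0, 5, 4],   # user 2
])

active = R[0]
scores = [(u, cosine_similarity(active, R[u])) for u in range(1, R.shape[0])]
nearest, _ = max(scores, key=lambda t: t[1])

# Recommend movies the nearest neighbour rated but the active user has not seen.
unseen = [m for m in range(R.shape[1]) if active[m] == 0 and R[nearest][m] > 0]
print(f"nearest user: {nearest}, candidate movies: {unseen}")
```

The same similarity function serves the item-based and content-based variants; only the vectors being compared change.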
With the advent of the Internet era, large conglomerates and small and medium businesses (SMB) have deployed Recommender Systems to gain a competitive advantage, providing customers a personalized, useful business transaction experience while learning customers' tastes and decision-making habits. For customers who leave feedback about the goods and services they received, a Recommender can mine their opinions through sentiment analysis (SA) to better understand what customers like and dislike about the goods and services they consumed, and why. Also, if customers have rated the goods and services, a Recommender can use the rating information along with sentiment analysis of the opinion feedback to make future personalized recommendations of products and services that meet customers' tastes and expectations. One such Recommender is the Hybrid Recommender System using Collaborative Filtering with Sentiment Analysis (CF-SA) [2]. CF-SA Recommenders are also known to outperform the baseline Collaborative Filtering Recommender System in personalized recommendation making [3] [4].
Nevertheless, no Recommender has been built with a design to explicitly collect human emotion data [5] [6]. Also, no publicly available dataset contains explicit affective features for implementing a Recommender System. The alternative for Recommender researchers is to build an affective-aware Recommender by implicitly deriving the needed emotional features from existing datasets [7] [8] [9]. Movie and music datasets are the two most popular dataset families with metadata, such as genres and reviews, suitable for affective feature mining [5].
In the next section, we bring readers up to speed with related work in the fields of affective computing and Emotion Aware Recommenders. In the methodology section, we illustrate the development of the comparative platform for the five Recommender algorithms, the Tweets Affective Classifier, and the datasets. In the implementation section, we highlight the five Recommenders with flowcharts. In the evaluation section, we show the top-N recommendation lists generated by the five Recommender algorithms while contrasting their differences; we also highlight our observations regarding the limitations and deficiencies of developing the comparative platform. We document our future work plan in the future work section before closing our report with a conclusion, followed by the references and the authors' brief biographies.

RELATED WORK
The Emotion Aware Recommender System (EARS) is a field of active research; a few recent works are sampled below. Orellana-Rodriguez [10] [11] advocated that instead of detecting the affective polarity (i.e., positive/negative) of a given short YouTube video, one should detect the eight basic human emotions advocated by Plutchik [12] [13], grouped into four opposing pairs of basic moods: joy-sadness, anger-fear, trust-disgust, and anticipation-surprise. Orellana-Rodriguez [10] also leveraged the automatic extraction of mood context from film metadata for making emotion-aware movie recommendations. Qian et al. [14] proposed an EARS based on hybrid information fusion, using user rating information as explicit data, user social network data as implicit information, and sentiment from user reviews as the source of emotional information. They [14] also claimed the proposed method achieved higher prediction ratings and significantly enhanced recommendation accuracy. Also, Narducci et al. [15] [16] described a general architecture for building an EARS and demonstrated it through a music Recommender with promising results.
Moreover, Mizgajski and Morzy [17] formulated an innovative multi-dimensional EARS model for making recommendations on a large-scale news Recommender; the database consists of over 13 million news pages from 2.7 million unique users, whose self-assessed emotional reactions yielded over 160,000 reactions collected against 85,000 news articles. Katarya and Verma have likewise contributed to this line of research [5] [25].

For image-oriented data, Facial Detection and Recognition (FDR) is the main thrust of research [26] [27] to study basic human emotions through facial expression. For textual data with subjective writing, Sentiment Analysis (SA) takes the lead [28] [29] [30] in extracting emotions from fine-grained sentiment. The aim is to uncover the affective features in texts or images and classify the emotional features into categories of moods. Paul Ekman, a renowned psychologist and professor emeritus at the University of California, San Francisco, advocated the six basic human moods classification: happiness, sadness, disgust, fear, surprise, and anger [31] [32]. Ekman later added "contempt" as the seventh primary human emotion on his list [33] [34]. Another renowned psychologist, Robert Plutchik, invented the Wheel of Emotions, which advocates eight primary emotions: anger, anticipation, joy, trust, fear, surprise, sadness, and disgust [12]. Research at Glasgow University in 2014 found that certain pairs of primary human emotions, such as fear and surprise, elicit similar facial muscle responses, as do disgust and anger; the study reduced the raw human emotions to four fundamental ones: happiness, sadness, fear/surprise, and disgust/anger [35] [36]. This paper adopts Paul Ekman's classification of six primary human emotions: happiness, sadness, disgust, fear, surprise, and anger for modeling the ivec embeddings, adding "neutral" as a seventh emotion feature for convenience in affective computing.
FDR on facial expression has a drawback: it fails to classify an image's emotional features when no human face is present in the image. When using FDR to classify movie poster images, the poster may often contain a faceless image. Thus, we propose to classify the affective features of a poster image indirectly through textual emotion detection and recognition (EDR) on the movie overview, rather than applying facial FDR directly to the poster image.

METHODOLOGY
We propose an innovative method as our contribution to Recommender research, based on the following sources:
• an item's explicit rating information
• an item's implicit affective data embeddings
• a user's emotion and taste profile embeddings
to implement an end-to-end Multi-channel Emotion Aware Recommender System (e2eMcEARS), or McEAR for short. Several researchers have documented that emotions play an essential role in the human decision-making process [46]. We envision that affective embeddings can represent any product or service. In our previous work [1], we illustrated a method to derive an emotion classifier from tweets' affective tags and used the affective model to predict the mood of a movie from the movie overview. We denoted the mood embeddings of the movie as mvec, and we stated that the value of an mvec embedding holds the same value throughout its lifespan. We also denote by uvec the average value of the mvecs of all the movies a user has watched; the value of a uvec changes each time the user watches another movie. We want to expand the coverage of the mvec to other movie metadata, such as genres. We denote the expanded mvec as the item embedding (ivec), which holds the mood embeddings of the movie overview and genres. Similarly, uvec expands into the average value of the ivecs of all the movies a user has consumed; we denote the expanded uvec embedding as wvec.
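A small sketch may help fix these definitions. Everything below is illustrative: the movie ids, emotion values, and genre slots are invented, and only the averaging rule comes from the text above:

```python
import numpy as np

# Hypothetical mvec embeddings: one 7-dimensional emotion profile per movie
# (happiness, sadness, disgust, fear, surprise, anger, neutral), as produced by TAC.
mvec = {
    "movie_a": np.array([0.60, 0.05, 0.02, 0.03, 0.20, 0.05, 0.05]),
    "movie_b": np.array([0.10, 0.50, 0.05, 0.15, 0.05, 0.10, 0.05]),
}

def build_uvec(watched_ids, mvec_table):
    """uvec = element-wise mean of the mvecs of all movies the user has watched."""
    return np.mean([mvec_table[mid] for mid in watched_ids], axis=0)

uvec = build_uvec(["movie_a", "movie_b"], mvec)

# ivec extends mvec with one-hot encoded genres; wvec is the mean of the ivecs.
genres = {"movie_a": np.array([1, 0, 1]),   # e.g. Action / Animation / Comedy slots
          "movie_b": np.array([0, 1, 0])}
ivec = {mid: np.concatenate([mvec[mid], genres[mid]]) for mid in mvec}
wvec = np.mean(list(ivec.values()), axis=0)
```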

Overview of the Tweets Affective Classifier Model
We developed the Tweets Affective Classifier (TAC), as illustrated in [1], which employs an asymmetric butterfly-wing, double-decker bidirectional LSTM-CNN Conv1D architecture to detect and recognize emotional features in tweets' text messages. We preprocessed the seven-emotion word embeddings used as TAC's training input with the pre-trained GloVe embeddings from the glove.twitter.27B.200d.txt dataset. We have two types of input word embeddings: trainable emotion word embeddings and frozen emotion word embeddings. By frozen embeddings, we mean the weights in the embeddings are frozen and cannot be modified during TAC's training session. We started the first half of the butterfly wing by feeding the preprocessed TAC input emotion word embeddings to the double-decker bidirectional LSTM neural nets: the frozen emotion word embeddings to the top bidirectional LSTM and the trainable emotion word embeddings to the bottom bidirectional LSTM. Next, we concatenated the top and bottom bidirectional LSTMs to form the double-decker neural net. We fed the output of the double-decker bidirectional LSTM to seven sets of CNN Conv1D neural nets, with the dropout parameter set at 0.5 in each Conv1D as regularization to prevent the neural net from overfitting. We then concatenated the outputs of all the Conv1Ds to form the overall output of the first half of the butterfly wing.
The second half of the butterfly wing is laid out differently from its peer. We started by setting up seven pairs of CNN Conv1D neural nets. For each pair, we fed in parallel the preprocessed frozen emotion word embeddings to one Conv1D and the trainable emotion word embeddings to the other. We set the dropout parameter at 0.5 for all seven pairs of Conv1D to prevent overfitting. We concatenated the outputs of the seven Conv1D pairs into a single output and fed it to a single bidirectional LSTM with the dropout value set at 0.3. We then concatenated the outputs of the first and second halves of the butterfly wing to form the overall output. This output is then fed in series through a MaxPooling1D with the dropout value set at 0.5, followed by a Flatten neural net, a Dense neural net, and a final Dense neural net with sigmoid activation that expresses the emotion classification as a probabilistic distribution. When predicting a movie's emotion profile, TAC classifies the mood of the movie from its overview and outputs the prediction as a probabilistic distribution of seven values, indicating the percentage of each of the seven emotion classes: the emotion profile of the movie.
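The following Keras sketch approximates the architecture described above. The layer widths, kernel sizes, sequence length, and vocabulary size are our assumptions (hyperparameters are not listed here); only the overall topology — two embedding inputs, the double-decker bidirectional LSTM wing with seven Conv1D branches, the seven-pair Conv1D wing feeding a single bidirectional LSTM, and the pooled sigmoid head — follows the text:

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

MAX_LEN, VOCAB, DIM, N_EMOTIONS = 100, 20000, 200, 7  # assumed hyperparameters

def tac_model(glove_weights):
    """Sketch of the asymmetric butterfly-wing TAC; topology per the text above."""
    inp = Input(shape=(MAX_LEN,))
    frozen = layers.Embedding(VOCAB, DIM, weights=[glove_weights],
                              trainable=False)(inp)   # frozen GloVe embeddings
    trainable = layers.Embedding(VOCAB, DIM)(inp)     # trainable embeddings

    # First wing: double-decker bidirectional LSTMs feeding seven Conv1D branches.
    top = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(frozen)
    bottom = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(trainable)
    decker = layers.Concatenate()([top, bottom])
    branches = [layers.Dropout(0.5)(
                    layers.Conv1D(32, k, padding="same", activation="relu")(decker))
                for k in range(2, 9)]                 # seven branches, kernel sizes assumed
    wing1 = layers.Concatenate()(branches)

    # Second wing: seven Conv1D pairs (frozen | trainable) into one bidirectional LSTM.
    pairs = []
    for k in range(2, 9):
        pairs.append(layers.Dropout(0.5)(
            layers.Conv1D(32, k, padding="same", activation="relu")(frozen)))
        pairs.append(layers.Dropout(0.5)(
            layers.Conv1D(32, k, padding="same", activation="relu")(trainable)))
    wing2 = layers.Bidirectional(
        layers.LSTM(64, return_sequences=True, dropout=0.3))(layers.Concatenate()(pairs))

    # Join the wings, then pool, flatten, and classify seven emotions with sigmoid.
    x = layers.Concatenate()([wing1, wing2])
    x = layers.Dropout(0.5)(layers.MaxPooling1D()(x))
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(N_EMOTIONS, activation="sigmoid")(x)
    return Model(inp, out)

model = tac_model(np.random.rand(VOCAB, DIM))  # random weights stand in for GloVe
```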

Overview of Comparative Platform for Recommenders
Building the comparative platform for Recommenders from scratch provides a way to study and observe the process of making recommendations in different contexts. We apply the most basic methods to build the collection of Recommenders in the comparative platform. Thus, we do not aim for best-practice algorithms with high performance or high throughput in mind; a platform that is easy to modify, adaptable to different information contexts, and highly functional is most desirable. A Recommender is typically built with a specific domain in mind. As we march down the path of researching Emotion Aware Recommenders, we want the comparative platform we are developing for movie-oriented Recommenders to later transfer what we learn to other information domains.
We reckon that in the movie domain, for example, a Genres Aware Recommender (GAR) may be adequate for making movie recommendations through movie genres; but without some adaptability in its processing logic, the movie GAR may not handle music genres well, and it will fail to make recommendations if we feed it data from a domain that lacks genre information. However, primary human emotions are universal across races and cultures. Once we obtain a user's emotion profile from one domain, the same profile should be transferable to other domains with no modification required. The caveat is that the other domain must contain data in which emotion is detectable and recognizable, i.e., the data must be emotion-aware enabled.

Datasets
The success of any machine learning project requires a large enough amount of domain-specific data. For movie-related affective computing, no affectively labeled dataset is readily available; thus, we build the required dataset by deriving it from the following sources. For movie ratings, we obtained datasets from GroupLens' MovieLens repository [47]. We scraped The Movie Database (TMDb) [48] for movie overviews and other metadata. MovieLens contains a "links" file that provides cross-reference links between MovieLens' movie id, TMDb's tmdb id, and IMDb's imdb id; we connect the MovieLens and TMDb datasets through this file.
Using a brute-force method, we scraped the TMDb database for movie metadata, particularly the movie overview or storyline, which contains subjective writing describing the movie whose mood we can classify. We can query the TMDb database by tmdb id, a unique identifier assigned to each movie. The tmdb ids start from 1 and count upward; however, gaps may exist between consecutive numbers in the sequence. Our scraping effort yielded 452,102 records after cleansing the raw data scraped from TMDb.
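As an illustration, a brute-force walk over the tmdb id sequence might look like the sketch below. The helper name, id range, and sleep interval are our own choices; a real run needs a valid TMDb API key and more robust error and rate-limit handling:

```python
import time
import requests

API_KEY = "YOUR_TMDB_API_KEY"   # hypothetical placeholder
BASE = "https://api.themoviedb.org/3/movie/{}"

def scrape_overviews(start_id, end_id):
    """Walk tmdb ids sequentially; gaps in the id sequence return HTTP 404."""
    records = []
    for tmdb_id in range(start_id, end_id + 1):
        resp = requests.get(BASE.format(tmdb_id), params={"api_key": API_KEY})
        if resp.status_code == 404:      # gap in the tmdb id sequence
            continue
        resp.raise_for_status()
        movie = resp.json()
        if movie.get("overview"):        # keep only movies with a usable overview
            records.append({"tmdb_id": tmdb_id,
                            "title": movie.get("title"),
                            "overview": movie["overview"]})
        time.sleep(0.25)                 # stay under TMDb's rate limit
    return records
```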
We developed a text-based emotion classifier capable of classifying seven basic human emotions in tweets, as illustrated in [1]. We apply this Tweets Affective Classifier (TAC) to classify the moods of movie overviews by running TAC over all 452,102 overviews scraped from the TMDb database, creating a movie emotion label dataset.
MovieLens datasets come in different sizes. We work with the following MovieLens datasets: the ml-20m dataset, with 20 million ratings; the ml-latest-small dataset, with ratings of about ten thousand movies from 610 users; the ml-latest-full dataset, with 27 million ratings; and the recently released ml-25m dataset, with 25 million ratings. The name of each MovieLens dataset conveys the number of ratings it contains. Table 1 depicts the number of ratings, users, and movies each MovieLens dataset contains. Each of the depicted MovieLens datasets provides a links file to cross-reference MovieLens with two other movie databases, TMDb and the Internet Movie Database (IMDb) [49], through movie id, tmdb id, and imdb id. MovieLens maintains a small number of data fields, but users can link it to the TMDb and IMDb databases via the links file to access metadata that MovieLens lacks. The ml-latest-full dataset is the largest in the MovieLens collection; however, it changes over time and is not appropriate for reporting research results. We use the ml-latest-small and ml-latest-full datasets for proof of concept and prototyping, not for research reporting. The MovieLens 20M and 25M datasets are stable benchmark datasets, which we use for research reporting.
Although we scraped 452,102 movie overviews from TMDb, after merging with MovieLens we can only make use of about one-eighth of the overviews collected. Table 2 shows the number of movie overviews each MovieLens dataset can extract from the cleansed TMDb data. We merged the MovieLens datasets with the emotion label dataset obtained from TAC. From our cleansed ml-latest-small training dataset of 9,625 rows, extracted from the raw 9,742 rows, the applicable data points drop to 9,613 rows after merging with the emotion label dataset. MovieLens datasets are known to be preprocessed and clean; nevertheless, going through the necessary data preparation steps, we still experienced a 1.32% data loss from the original dataset. Table 3 depicts the first few rows of the final cleansed training dataset.
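The merge itself is a pair of joins through the links file. The sketch below assumes a hypothetical file name for the TAC output (`tmdb_emotion_labels.csv`); the MovieLens file and column names (movieId, tmdbId) are as shipped in the datasets:

```python
import pandas as pd

movies = pd.read_csv("ml-latest-small/movies.csv")    # movieId, title, genres
links = pd.read_csv("ml-latest-small/links.csv")      # movieId, imdbId, tmdbId
emotions = pd.read_csv("tmdb_emotion_labels.csv")     # tmdbId + 7 emotion scores (assumed)

# Join MovieLens to the TAC emotion labels through the links cross-reference.
train = (movies.merge(links, on="movieId")
               .merge(emotions, on="tmdbId", how="inner"))

loss = 1 - len(train) / len(movies)
print(f"rows: {len(train)}, data loss: {loss:.2%}")   # ~1.32% in our run
```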

Recommender Platform
We develop a movie Recommender platform for our study to evaluate five Recommender algorithms in movie recommendation making. We employ the following five Recommender algorithms in the platform; a minimal sketch of the content-based similarity computation follows the list.
• an Item-based Collaborative Filtering (IBCF) movie Recommender that computes pairwise item Cosine Similarity, as depicted in equation 2, to identify the closeness of similar items. The rating matrix, R, is configured with rows representing movie titles and columns representing users.
• a User-based Collaborative Filtering (UBCF) movie Recommender that computes pairwise user Cosine Similarity, as depicted in equation 2, to identify the closeness of similar users. The rating matrix, R, is configured with rows representing users and columns representing movie titles.
• a genres-aware Content-based Recommender (GAR) that uses Cosine Similarity, as depicted in equation 2, to compute the pairwise similarity between two movies' genres.
• an emotion-aware Content-based Recommender (EAR) that uses Cosine Similarity, as defined in equation 2, to compute the pairwise similarity between two emotion-aware movies.
• an emotion- and genres-aware multi-modal Content-based Recommender (MAR) that uses Cosine Similarity, as depicted in equation 2, to compute the pairwise similarity between two items with affective awareness and genres embeddings.
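The sketch below illustrates the content-based variants on toy data: with one-hot genre slots it behaves as GAR, with the OHE emotion labels as EAR, and with both concatenated as MAR. The vectors and movie indices are invented for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy one-hot genre embeddings (GAR); EAR substitutes the OHE emotion labels,
# and MAR concatenates both into a wider ivec.
item_vecs = np.array([
    [1, 0, 1, 0],   # movie 0: Action, Comedy
    [1, 0, 0, 1],   # movie 1: Action, Drama
    [0, 1, 1, 0],   # movie 2: Animation, Comedy
])

def top_n(seed_movie, n=2):
    """Rank all other movies by cosine similarity to the seed movie's embedding."""
    sims = cosine_similarity(item_vecs[seed_movie:seed_movie + 1], item_vecs)[0]
    ranked = np.argsort(-sims)
    return [m for m in ranked if m != seed_movie][:n]

print(top_n(0))  # movies most similar to movie 0 by genre profile
```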

EVALUATION
We deployed the ml-latest-small dataset as the training dataset and randomly picked user id 400 as the active test user. We created the testing dataset by concatenating the other MovieLens datasets: ml-20m, ml-25m, and ml-latest-full. We extracted all data points belonging to user id 400, removed all duplicated data points as well as those found in the training dataset, and named the result the test400 dataset. The list of movies in the test400 dataset represents the movies the active user id 400 has not yet watched.
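A possible construction of test400 in pandas is sketched below. The file paths are hypothetical, and the sketch presumes, as the text does, that user id 400 identifies the active user across the concatenated datasets:

```python
import pandas as pd

# Each MovieLens dataset ships a ratings.csv (userId, movieId, rating, timestamp).
frames = [pd.read_csv(f"{d}/ratings.csv")
          for d in ("ml-20m", "ml-25m", "ml-latest-full")]
train = pd.read_csv("ml-latest-small/ratings.csv")

# Collect every data point for the active user, drop duplicates, then remove
# anything already present in the training dataset.
test400 = pd.concat(frames)
test400 = test400[test400["userId"] == 400].drop_duplicates(subset="movieId")
seen = train.loc[train["userId"] == 400, "movieId"]
test400 = test400[~test400["movieId"].isin(seen)]
```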
We compare the top-20 movie list generated by each Recommender algorithm against the active user's unseen movie list in the testing dataset. We also derive a top-5 list from each top-20 list by computing the closest similarity between the active user's wvec and each movie's ivec on the top-20 list, sorted in descending order. The top-5 list indicates a high probability that the active user may accept one of the recommended movies. However, our assumption that the active user will choose one of the unwatched films from the recommendations has a drawback: if a movie the active user wants to watch does not appear on the list, he would not choose another film but wait until the desired film shows up on the recommendation list.
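The top-20-to-top-5 re-ranking step can be expressed as follows; `ivec_table` and the function name are hypothetical stand-ins for our data structures:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def rerank_top5(wvec, top20_ids, ivec_table):
    """Sort a top-20 list by cosine similarity between the user's wvec and each
    candidate movie's ivec, keeping the five closest (descending order)."""
    sims = [(mid, cosine_similarity([wvec], [ivec_table[mid]])[0][0])
            for mid in top20_ids]
    sims.sort(key=lambda t: -t[1])
    return [mid for mid, _ in sims[:5]]
```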

FUTURE WORK
We started this track of research by studying the impact affective features may have on Recommender Systems, examining how emotional attributes can interplay at the stage where the Recommender makes its top-N recommendations [1]. In this paper, we introduced a way to make a Recommender emotionally aware, focusing on extracting affective features from textual movie metadata. We plan soon to perform an in-depth study of the Multi-channel Emotion Aware Recommender, extracting emotion features from images such as movie posters as a component in building the Recommender. We are also intrigued by the idea of using users' emotion profiles to enhance Group Recommenders in user grouping, group formation, group dynamics, and group decision making.