The Mathematics of Recommendation Systems
Have you ever wondered how streaming services and shopping sites such as Netflix, Amazon Prime and other e-commerce websites seem to magically know what you are in the mood for? How do they know exactly what you might be interested in? Let's find out.
When was the last time you shopped online?
Did you notice the website or application showing you "items you might be interested in", or something like "Customers also bought this along with this"? You guessed it right: that is what a recommendation system does. It is an algorithm that suggests items to buy or watch, based on your search and purchase history and on the history of other, similar users. Data, as they say, is the new oil. Now the obvious question arises: why do websites use recommendation systems?
Here are some of the reasons for using a recommendation system:
a) To expose the user to a broader range of products.
b) To encourage continued usage or repeat purchases.
c) To provide a better user experience.

Let's take a brief look at how a recommendation system works.
Content-Based Recommender System
A content-based system tries to work out what a user favours, that is, their likes, tastes and preferences.
[Figure: Content-Based Recommender System]
Consider the following example, in which a user has rated three movies, Movie 1, Movie 2 and Movie 3, as 2, 10 and 8 respectively. Let's find out which of Movie 4, Movie 5 and Movie 6 will be recommended to this user.
[Figure: Content-Based]
The first step is to build a one-hot encoded matrix from the genres of each movie. Every genre gets its own column. If a movie belongs to a genre, the entry in that column is 1; otherwise it is 0.
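The encoding step can be sketched in a few lines of Python. This is only an illustration: the genre labels (`Genre A` to `Genre D`) and the genre lists per movie are assumptions, since the article's own table is not reproduced in the text.

```python
# Hypothetical genre labels; the real ones appear only in the article's figure.
GENRES = ["Genre A", "Genre B", "Genre C", "Genre D"]

# Assumed genres for the three movies the user has rated.
movie_genres = {
    "Movie 1": ["Genre B", "Genre C"],
    "Movie 2": ["Genre A", "Genre B", "Genre C", "Genre D"],
    "Movie 3": ["Genre A", "Genre C"],
}


def one_hot(genres_of_movie, all_genres):
    """Return a 0/1 row with one entry per genre column."""
    present = set(genres_of_movie)
    return [1 if genre in present else 0 for genre in all_genres]


one_hot_matrix = {
    title: one_hot(genres, GENRES) for title, genres in movie_genres.items()
}

for title, row in one_hot_matrix.items():
    print(title, row)
```

Each row has one column per genre, so movies with any number of genres fit in the same matrix.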
[Figure: One-hot encoding]
Now each movie's row in the one-hot encoded matrix is multiplied by the rating the user gave that movie, which forms the Weighted Genre Matrix.
[Figure: Weighted Genre Matrix]
The Weighted Genre Matrix is then aggregated, by summing each genre column, to form the user profile. The profile is normalised later and used to build the recommendation matrix.
[Figure: User Profile]
The user profile is normalised by dividing each element in a row by the sum of the elements in that row.
Here, 18 + 12 + 20 + 10 = 60, so the normalised value of each element x becomes x/60.
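The weighting, aggregation and normalisation steps can be reproduced with a short sketch. The ratings (2, 10, 8) and the profile values (18, 12, 20, 10) come from the article; the one-hot rows are an assumption, chosen so that they yield exactly that profile.

```python
# Ratings the user gave to Movie 1, Movie 2 and Movie 3 (from the article).
ratings = [2, 10, 8]

# Assumed one-hot genre rows for the rated movies (four genre columns).
one_hot_rated = [
    [0, 1, 1, 0],  # Movie 1
    [1, 1, 1, 1],  # Movie 2
    [1, 0, 1, 0],  # Movie 3
]

# Weighted Genre Matrix: every row is scaled by that movie's rating.
weighted_genre_matrix = [
    [rating * flag for flag in row]
    for rating, row in zip(ratings, one_hot_rated)
]

# User profile: sum each genre column of the weighted matrix.
user_profile = [sum(column) for column in zip(*weighted_genre_matrix)]

# Normalised profile: divide each entry by the row total (60 here).
total = sum(user_profile)
normalized_profile = [value / total for value in user_profile]

print(user_profile)        # [18, 12, 20, 10]
print(total)               # 60
print(normalized_profile)
```

After normalisation the profile entries sum to 1, so they can be read as the relative weight the user puts on each genre.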
[Figure: Normalized User Profile]
Lastly, the normalised user profile is multiplied by the one-hot encoded matrix of the remaining movies, the ones this user has not rated, and each movie's row is then summed to give the recommendation matrix.
[Figure: Normalized User Profile * Movies Matrix]
[Figure: Recommendation Matrix]
The resulting recommendation matrix is used to make the recommendation: the movie with the highest weight is recommended to the user.
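A minimal sketch of this last step follows. The normalised profile uses the article's numbers (x/60); the genre rows for Movie 4, Movie 5 and Movie 6 are hypothetical, because the article's table is not available in the text, so the winner below is only the result for these made-up rows.

```python
# Normalised user profile from the article's numbers (18, 12, 20, 10) / 60.
normalized_profile = [18 / 60, 12 / 60, 20 / 60, 10 / 60]

# Hypothetical one-hot genre rows for the movies the user has not rated.
candidates = {
    "Movie 4": [1, 0, 0, 1],
    "Movie 5": [0, 1, 1, 0],
    "Movie 6": [1, 0, 1, 0],
}

# Score of a movie: sum of the profile weights of the genres it belongs to.
scores = {
    title: sum(weight * flag for weight, flag in zip(normalized_profile, row))
    for title, row in candidates.items()
}

# Recommend the movie with the highest score.
best = max(scores, key=scores.get)

for title, score in scores.items():
    print(f"{title}: {score:.3f}")
print("Recommended:", best)
```

With these assumed rows, Movie 6 scores highest because it covers the two genres that carry the most weight in the profile.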
We have seen how a content-based system makes recommendations, but it has a drawback as well.
A genre that the user has never watched will not appear in their profile, so movies from that genre are never recommended.
A recommendation system based on collaborative filtering overcomes this drawback.
Think of it as the user saying, "tell me what's popular among my neighbours." The system finds a group of similar users and makes recommendations based on the tastes shared within that group.
There are two approaches to collaborative filtering:
a) User-based collaborative filtering, based on the user's neighbourhood
b) Item-based collaborative filtering, based on the similarity of items
[Figure: User-based vs item-based collaborative filtering]
In the figure, the solid lines represent a user's preference, while the dashed lines represent a recommendation.
Both methods rest on the same mathematical fundamentals. Here we look at the user-based method; the item-based method follows by analogy.
The first step is to measure how similar the active user is to each of the other users.
How do we do this?
This can be done with several statistical and vector-based techniques that measure distance or similarity, including Euclidean distance, Pearson correlation and cosine similarity.
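The three measures named above can be sketched in plain Python as follows. The function names and the two rating vectors are illustrative assumptions; each vector holds one user's ratings for the movies that both users have rated.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two rating vectors (0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; undefined for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centred vectors.

    Undefined when either user gave every movie the same rating.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


# Hypothetical ratings of two users for the same three movies.
active_user = [8, 6, 9]
other_user = [7, 5, 10]

print(euclidean_distance(active_user, other_user))
print(cosine_similarity(active_user, other_user))
print(pearson_correlation(active_user, other_user))
```

Note that Euclidean distance is a distance (smaller means more alike), whereas the other two are similarities (larger means more alike), so a distance has to be converted before it can serve as a similarity weight.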
To calculate the similarity between two users, we use the 3 movies that both users have rated in the past. Whichever similarity measure we choose, suppose the similarity between the active user and the other users comes out as 0.7, 0.9 and 0.4. These numbers are similarity weights: they express how close the active user is to each of the other users in the dataset.
[Figure: User's rating matrix]
A blank cell in the table above means that the user has not rated that movie.
Here, User 4 is the active user, for whom we have to recommend either Movie 1 or Movie 5.
Next, we form the weighted rating matrix from the user rating matrix and each user's similarity weight. From the user rating matrix we take the columns of the movies we are choosing between, and multiply each rating by the similarity weight of the user who gave it.
[Figure: Weighted rating matrix]
Now the weighted rating matrix is aggregated, by summing each movie's column, to form the recommendation matrix. The normalised recommendation matrix is obtained by dividing each movie's weighted sum by the sum of the similarity weights.
[Figure: Recommendation matrix]
The movie with the highest weight (here, Movie 5) is recommended to the active user, User 4.
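The user-based steps can be put together in a short sketch. The similarity weights (0.7, 0.9, 0.4) come from the article; which user has which weight and the individual ratings are hypothetical, since the rating table is not reproduced in the text. The sketch also assumes that the normalisation divides by the similarity weights of only those users who actually rated the movie, which is one common reading of "the sum of the similarity index".

```python
# Similarity of the active user (User 4) to the others; values from the article,
# assignment to users assumed.
similarity = {"User 1": 0.7, "User 2": 0.9, "User 3": 0.4}

# Hypothetical ratings for the two candidate movies; None marks a blank cell.
ratings = {
    "User 1": {"Movie 1": 6, "Movie 5": 9},
    "User 2": {"Movie 1": 5, "Movie 5": 8},
    "User 3": {"Movie 1": 9, "Movie 5": None},
}


def predicted_rating(movie):
    """Similarity-weighted average of the ratings that exist for a movie."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for user, weight in similarity.items():
        rating = ratings[user].get(movie)
        if rating is None:
            continue  # this user did not rate the movie
        weighted_sum += weight * rating
        similarity_sum += weight
    if similarity_sum == 0:
        return None  # nobody similar has rated the movie
    return weighted_sum / similarity_sum


predictions = {movie: predicted_rating(movie) for movie in ("Movie 1", "Movie 5")}
best = max(predictions, key=predictions.get)

print(predictions)
print("Recommended to User 4:", best)
```

With these made-up ratings the sketch also picks Movie 5, matching the article's outcome, but the exact scores depend entirely on the assumed table.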
This is how a collaborative recommendation system works. It has some drawbacks as well:
a) Data sparsity: users generally rate only a limited number of items.
b) Cold start: it is difficult to make recommendations for new users or new items.
c) Scalability: as the number of users or items grows, so does the similarity computation, which may lead to performance issues.
These drawbacks can be overcome with a hybrid recommendation system, which mixes the content-based and collaborative filtering methods.
Let’s connect on LinkedIn. You may also reach out to me via ankita2108prasad@gmail.com.
Translated from: https://medium.com/@ankita2108prasad/the-mathematics-of-recommendation-systems-e8922a50bdea

