Collaborative filtering recommendation
Main idea: Use the past behavior or opinions of an existing user community to predict which items the current user is most likely to like or be interested in.
The only input to a pure collaborative filtering method is a user-item rating matrix. The output can take two forms:
- A numerical prediction indicating how much the current user will like or dislike a given item;
- A list of n recommended items (a top-N list). Naturally, this top-N list does not contain items the current user has already purchased.
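The two output forms are related: a top-N list can be produced by ranking numerical predictions. The sketch below assumes the predictions for one user have already been computed and are held in a dictionary; `top_n`, the item IDs, and the scores are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def top_n(predictions: Dict[str, float], purchased: Set[str], n: int) -> List[str]:
    """Return up to n items with the highest predicted score, excluding purchased items."""
    candidates = [(item, score) for item, score in predictions.items()
                  if item not in purchased]
    # Highest predicted score first; ties broken by item ID so the order is deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]


# Hypothetical predicted scores for one user; item3 was already purchased.
predictions = {"item1": 4.2, "item2": 3.1, "item3": 4.8, "item4": 2.0}
print(top_n(predictions, purchased={"item3"}, n=2))  # ['item1', 'item2']
```

Note that item3 has the highest score but is filtered out because the user already owns it.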
2.1 User-based Nearest Neighbor Recommendation
Main idea: Given a rating data set and the ID of the current user as input, first identify other users whose past preferences were similar to those of the current user; these are sometimes called peer users or nearest neighbors. Then, for each item p that the current user has not yet seen, compute a prediction from the ratings that these nearest neighbors gave to p.
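The prediction step can be sketched as follows. This is a minimal sketch, assuming the commonly used mean-centered weighted average `pred(a, p) = mean(a) + Σ sim(a, b)·(r(b, p) − mean(b)) / Σ sim(a, b)`, summed over the neighbors b who rated p, with the neighbor similarities already computed. The toy ratings, the similarity values, and the names `user_mean` and `predict_user_based` are hypothetical.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}; a missing key means "not rated"


def user_mean(ratings: Ratings, user: str) -> float:
    """Average of all ratings the user has given."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def predict_user_based(ratings: Ratings, user: str, item: str,
                       neighbor_sims: Dict[str, float]) -> Optional[float]:
    """Mean-centered, similarity-weighted average of the neighbors' ratings for `item`."""
    numerator = 0.0
    denominator = 0.0
    for neighbor, sim in neighbor_sims.items():
        # Only positively similar neighbors who actually rated the item contribute.
        if sim <= 0 or item not in ratings[neighbor]:
            continue
        numerator += sim * (ratings[neighbor][item] - user_mean(ratings, neighbor))
        denominator += sim
    if denominator == 0:
        return None  # no neighbor can say anything about this item
    return user_mean(ratings, user) + numerator / denominator


ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4, "i4": 4},
    "u1": {"i1": 3, "i2": 1, "i3": 2, "i4": 3, "i5": 3},
    "u2": {"i1": 4, "i2": 3, "i3": 4, "i4": 3, "i5": 5},
}
# Hypothetical, precomputed similarities between alice and her neighbors.
print(round(predict_user_based(ratings, "alice", "i5", {"u1": 0.85, "u2": 0.70}), 2))  # 4.87
```

Subtracting each neighbor's own mean compensates for users who rate systematically higher or lower than others.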
The Pearson correlation coefficient is used to measure the similarity between users. The method rests on two assumptions: (1) if users had similar preferences in the past, they will have similar preferences in the future; (2) user preferences do not change over time.
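A minimal Pearson sketch, assuming ratings are stored as one dictionary per user and that each user's mean is taken over all of that user's ratings (some variants use only the co-rated items). The function name `pearson` and the toy data are illustrative, not taken from the source.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(ratings: Ratings, a: str, b: str) -> float:
    """Pearson correlation between users a and b over the items both have rated."""
    common = sorted(ratings[a].keys() & ratings[b].keys())
    if not common:
        return 0.0
    # Assumption: each user's mean is computed over all of that user's ratings.
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    dev_a = [ratings[a][p] - mean_a for p in common]
    dev_b = [ratings[b][p] - mean_b for p in common]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))
    # Zero variance on the common items leaves the correlation undefined; treat it as 0.
    return numerator / denominator if denominator else 0.0


ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4, "i4": 4},
    "u1": {"i1": 3, "i2": 1, "i3": 2, "i4": 3, "i5": 3},
}
print(round(pearson(ratings, "alice", "u1"), 2))  # 0.84
```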
Experimental analyses show that, for user-based recommender systems, the Pearson correlation coefficient outperforms other similarity measures, whereas for item-based recommendation the cosine similarity measure outperforms Pearson correlation. In practice, nearest-neighbor predictions become inaccurate when the current user and a neighbor have co-rated only a few items, because a similarity computed from so few ratings is unreliable. To address this, researchers proposed significance weighting: a method that linearly reduces the similarity weight when the number of co-rated items is small.
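Significance weighting can be sketched as a single scaling step. The sketch assumes a cutoff of 50 co-rated items, a value often used for this purpose but an assumption here: below the cutoff, the similarity is multiplied by n/50, where n is the number of co-rated items. The names `co_rated_count` and `significance_weight` are hypothetical.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def co_rated_count(ratings: Ratings, a: str, b: str) -> int:
    """Number of items rated by both users."""
    return len(ratings[a].keys() & ratings[b].keys())


def significance_weight(sim: float, n_common: int, cutoff: int = 50) -> float:
    """Scale a similarity down linearly when it rests on fewer than `cutoff` co-rated items.

    The default cutoff of 50 is an assumption; tune it for the data set at hand.
    """
    return sim * min(n_common, cutoff) / cutoff


ratings = {"alice": {"i1": 5, "i2": 3}, "u1": {"i1": 3, "i2": 1, "i3": 2}}
n = co_rated_count(ratings, "alice", "u1")    # 2 co-rated items
print(round(significance_weight(0.9, n), 3))  # 0.036
```

A high similarity backed by only two shared ratings is thus heavily discounted, while pairs with 50 or more co-rated items keep their full weight.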
Choosing nearest neighbors: To reduce the amount of computation while still being able to compute predictions, only users who are positively correlated with the current user are included.
The size of the neighbor set is usually reduced in one of two ways: by defining a minimum similarity threshold, or by limiting the set to a fixed size and considering only the k nearest neighbors. Both approaches have potential problems. If the similarity threshold is too high, the neighborhood becomes very small, which means that no predictions can be made for many items; if it is too low, the neighborhood size is hardly reduced. With a fixed size, the open question is how large k should be. Tests on the MovieLens data set show that in most cases 20 to 50 neighbors seem reasonable.
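The positive-correlation filter and both neighborhood-size strategies fit in one small function. This is a sketch over precomputed similarities; `select_neighbors`, `threshold`, and `k` are hypothetical names, and the similarity values are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def select_neighbors(similarities: Dict[str, float],
                     threshold: Optional[float] = None,
                     k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Choose the current user's neighborhood from precomputed similarities.

    Only positively correlated users are kept; an optional minimum similarity
    threshold and/or a cap of k nearest neighbors is then applied.
    """
    candidates = [(u, s) for u, s in similarities.items() if s > 0]
    if threshold is not None:
        candidates = [(u, s) for u, s in candidates if s >= threshold]
    # Most similar first; ties broken by user ID for a deterministic order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates if k is None else candidates[:k]


sims = {"u1": 0.84, "u2": 0.61, "u3": -0.77, "u4": 0.10}
print(select_neighbors(sims))                 # [('u1', 0.84), ('u2', 0.61), ('u4', 0.1)]
print(select_neighbors(sims, threshold=0.5))  # [('u1', 0.84), ('u2', 0.61)]
print(select_neighbors(sims, k=1))            # [('u1', 0.84)]
```

Passing both arguments combines the strategies: neighbors must clear the threshold, and at most k of them are kept.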
2.2 Item-based Nearest Neighbor Recommendation
The item-based recommendation algorithm computes predictions from the similarity between items rather than between users. Item similarity is computed with the cosine similarity measure.
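An item-based sketch, assuming plain cosine similarity computed over the users who rated both items, and a prediction formed as the similarity-weighted average of the current user's own ratings for items similar to the target. A common refinement, adjusted cosine, first subtracts each user's mean rating; it is not shown here. The function names and toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Cosine similarity of items i and j over the users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of the user's ratings for items similar to `item`."""
    numerator = 0.0
    denominator = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = item_cosine(ratings, other, item)
        if sim <= 0:
            continue  # ignore items that are not positively similar
        numerator += sim * rating
        denominator += sim
    return numerator / denominator if denominator else None


ratings = {
    "alice": {"i1": 5, "i3": 2},
    "bob": {"i1": 4, "i2": 2, "i3": 3},
    "carol": {"i1": 2, "i2": 1, "i3": 5},
}
print(round(predict_item_based(ratings, "alice", "i2"), 2))  # 3.63
```

Because item similarities change slowly, they can be precomputed offline, which is one reason the item-based approach scales well.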
To reduce complexity, only part of the data in the rating matrix may be used. A basic technique is subsampling: randomly selecting a subset of the data, or ignoring records of users who have only a few ratings or of items that are very popular.
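The subsampling variants can be combined in one pass. This is a minimal sketch, assuming a rating matrix stored as a dictionary per user; the function `subsample` and its parameters `user_fraction`, `min_user_ratings`, and `max_item_popularity` are hypothetical, and their defaults are arbitrary illustration values.

```python
import random
from collections import Counter
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def subsample(ratings: Ratings, user_fraction: float = 0.5, min_user_ratings: int = 2,
              max_item_popularity: Optional[int] = None, seed: int = 0) -> Ratings:
    """Return a smaller rating matrix for cheaper neighbor computation.

    1. ignore users with fewer than `min_user_ratings` ratings;
    2. randomly keep a fraction of the remaining users;
    3. optionally drop items rated by more than `max_item_popularity` users.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    eligible = sorted(u for u, r in ratings.items() if len(r) >= min_user_ratings)
    n_keep = max(1, int(len(eligible) * user_fraction)) if eligible else 0
    chosen = rng.sample(eligible, n_keep)
    popularity = Counter(item for r in ratings.values() for item in r)
    return {
        u: {i: s for i, s in ratings[u].items()
            if max_item_popularity is None or popularity[i] <= max_item_popularity}
        for u in chosen
    }


ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "c": 2},
    "u3": {"a": 1},
    "u4": {"a": 3, "b": 2},
}
sample = subsample(ratings, user_fraction=0.5, max_item_popularity=3, seed=42)
print(sample)  # one of u1, u2, u4, with the very popular item "a" removed
```

Subsampling trades some prediction accuracy for speed, so the fractions and cutoffs would need to be tuned against held-out data.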
2.3 About ratings
Explicit ratings: Users state their opinions directly, usually on a 5-point or 7-point scale. Recording these ratings yields the user rating data set.
Implicit ratings: User opinions are inferred by tracking users' browsing behavior, collecting their browsing logs, and analyzing those logs. Recommendations based on such data may be more effective.
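One simple way to turn browsing logs into implicit ratings is to give each event type a weight and sum the weights per user and item. This is a sketch under stated assumptions: the log format `(user, item, event)`, the event names, and the weights are all hypothetical and would have to be chosen for a real application.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def implicit_ratings(log: Iterable[Tuple[str, str, str]],
                     weights: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Turn (user, item, event) log entries into implicit per-user item scores.

    Each event type contributes a fixed weight; event types not listed in
    `weights` are ignored.
    """
    scores: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, item, event in log:
        if event not in weights:
            continue
        scores[user][item] = scores[user].get(item, 0.0) + weights[event]
    return dict(scores)


log = [
    ("u1", "i1", "view"), ("u1", "i1", "view"), ("u1", "i2", "purchase"),
    ("u2", "i1", "view"), ("u2", "i3", "scroll"),
]
weights = {"view": 1.0, "purchase": 5.0}  # hypothetical event weights
print(implicit_ratings(log, weights))  # {'u1': {'i1': 2.0, 'i2': 5.0}, 'u2': {'i1': 1.0}}
```

Unlike explicit ratings, these scores are unbounded and carry no direct negative signal, so they are often normalized or binarized before being fed into a neighborhood method.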
Data sparsity and the cold-start problem: In practical applications, only a few user ratings are usually available, so the rating matrix is sparse. The challenge is to obtain accurate predictions from relatively little data. A straightforward approach is to use additional information about users (age, gender, education level, interests, etc.) to classify them, so that recommendations can draw on this external information rather than on ratings alone.
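As one possible illustration of using such additional information, a new user can be assigned to the group of users who share the same attribute values, and the group's average rating for an item used as the prediction. This is a sketch only; the attribute names `age_group` and `gender`, the function `demographic_prediction`, and the data are hypothetical.

```python
from typing import Dict, Optional, Sequence

Ratings = Dict[str, Dict[str, float]]   # user -> {item: rating}
Profiles = Dict[str, Dict[str, str]]    # user -> {attribute: value}


def demographic_prediction(ratings: Ratings, profiles: Profiles, user: str, item: str,
                           attributes: Sequence[str] = ("age_group", "gender")) -> Optional[float]:
    """Average rating of `item` among other users whose listed attributes all match `user`'s."""
    group = tuple(profiles[user].get(a) for a in attributes)
    peer_ratings = [
        r[item] for u, r in ratings.items()
        if u != user and item in r
        and tuple(profiles.get(u, {}).get(a) for a in attributes) == group
    ]
    return sum(peer_ratings) / len(peer_ratings) if peer_ratings else None


profiles = {
    "new_user": {"age_group": "18-24", "gender": "f"},
    "u1": {"age_group": "18-24", "gender": "f"},
    "u2": {"age_group": "18-24", "gender": "f"},
    "u3": {"age_group": "35-44", "gender": "m"},
}
ratings = {"new_user": {}, "u1": {"i1": 4}, "u2": {"i1": 2}, "u3": {"i1": 5}}
print(demographic_prediction(ratings, profiles, "new_user", "i1"))  # 3.0
```

The new user has no ratings at all, yet still receives a prediction from the two demographically matching users.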