WO2012034606A2

WO2012034606A2 - Multiverse recommendation method for context-aware collaborative filtering

Info

Publication number: WO2012034606A2
Application number: PCT/EP2011/001382
Authority: WO
Inventors: Xabier Amatriain Rubio; Nuria Oliver Ramirez; Alexandros Karatzoglou
Original assignee: Telefonica SA
Current assignee: Telefonica SA
Priority date: 2010-09-15
Filing date: 2011-03-21
Publication date: 2012-03-22
Anticipated expiration: 2013-03-15
Also published as: WO2012034606A3

Abstract

The invention provides a system for context-aware recommendation that performs collaborative filtering by defining the data as user-item-context interactions, models these interactions using tensor factorization, and generates one or more recommendations from the resulting model.

Description

Multiverse recommendation method for context-aware collaborative filtering

Field of the invention

Recommender Systems have become ubiquitous tools for finding our way in the age of information. User preferences are inferred from past consumption patterns or explicit feedback, and predictions are computed either by analyzing other users (Collaborative Filtering, CF) or by categorizing the items by their content (Content-based Recommendations). The simplest formulation of the recommendation problem involves predicting values for (user, item) pairs. In the CF setting, this turns the problem into a sparse two-dimensional matrix in which values are known for only a few (user, item) pairs, and these known values must be used to compute the missing values of interest. The two main CF approaches that emerged as victorious from the Netflix Prize were neighbourhood methods and latent factor models. Neighbourhood methods use similarity functions such as the Pearson Correlation or Cosine Distance to compute sets of neighbours to a user or an item.

Recommendations are then computed using data from those neighbours. Latent factor models [6] such as Matrix Factorization (MF), on the other hand, solve the recommendation problem by decomposing the (user, item) matrix and learning latent factors for each user and item in the data.

The underlying assumption of latent factor models is that both users and items can be modelled by a reduced number of factors. This approach has proven to be the most accurate single method in a variety of settings.

Although the simplified (user, item) recommendation model can be used successfully in many settings, it is not uncommon to find real settings in which additional variables come into play. For instance, there are many situations where time plays an important role in defining a user's preference for an item. In this case, the two-dimensional matrix is turned into a three-dimensional (user, item, time) tensor. The set of variables that influence the user's preference for a given item is referred to as context.

This invention proposes a generic CF (collaborative filtering) model based on a generalization of MF to address contextual recommendation problems. To this end, the concept of matrix factorization is extended to that of tensor factorization, a tensor being a generalization of the matrix concept to multiple dimensions.

Technical Background

Although most traditional Collaborative Filtering systems focus on the two-dimensional user/item setting, there has been a recent increase of interest in adding context to recommendations. This is probably due to the relevance of context in mobile applications and the success that some applications have had using contextual variables such as location (e.g. Foursquare, http://www.foursquare.com). Adomavicius and Tuzhilin [3] give a thorough review of approaches to contextual recommendations in their book chapter and categorize context-aware recommender systems (CARS) into three types: contextual pre-filtering, where context drives data selection; contextual post-filtering, where context is used to filter recommendations once they have been computed using a traditional approach; and contextual modelling, where context is integrated directly into the model. An example of contextual pre-filtering is the so-called user micro-profile, in which a single user is represented by a hierarchy of possibly overlapping contextual profiles [4]. Post-filtering methods can use traditional approaches and then apply filtering or weighting. In their experimental evaluation, Panniello et al. [16] found that the choice of a pre-filtering or post-filtering strategy depends on the particular method, and that sometimes a simple post-filter can outperform an elaborate pre-filtering approach. The approach proposed here belongs to the last category, contextual modelling. Although some standard model-based approaches could theoretically accept more dimensions, the only models to report results in this category are Oku et al.'s Context-aware Support Vector Machines (SVM) [13]. The authors consider support vectors in a multidimensional space and find the separating hyperplane. Their experiments show that contextual recommendations perform better than non-contextual ones.
Other authors who have addressed context-aware recommendations from a multidimensional perspective are Adomavicius et al. [1], who introduce a multidimensional model based on the On-Line Analytical Processing (OLAP) technique for decision support, which is still based on a pre-filtering method; and Palmisano et al. [15], who describe contextual information with K dimensions, each of them having q attributes. Factorization models have recently become one of the preferred approaches to Collaborative Filtering, especially when combined with neighbourhood methods [9]. Even when processing datasets such as that of the Netflix Prize, the importance of context has become clear: Koren has recently introduced a temporal model into the factor model, called timeSVD++, based on singular value decomposition (SVD) [10], which significantly improves the Root Mean Squared Error (RMSE) on the Netflix data compared to previous non-temporal factorization models. Other authors have also introduced time as a specific factor in the model. Xiong et al. present a Bayesian Probabilistic Tensor Factorization (TF) model to capture the temporal evolution of online shopping preferences [23]. The authors show in their experiments that using this third dimension in the form of a tensor does improve accuracy compared to the non-temporal case.

Finally, three-dimensional TF has also been proposed by Rendle et al. for collaborative tag recommendation. Rendle et al. use the third tensor dimension in their (user, item, tag) model to represent the target variable, which in their case are tags coded as binary vectors where the presence of a tag is marked by a 1 and its absence by a 0 [18].

However, to the best of the knowledge of the present inventors, there is no previous work on the use of N-dimensional tensors for CF, which is the main contribution of this work and is presented next in detail.

Current systems that rely on splitting and pre- or post-filtering the data based on context do not utilize all the available ratings to model the users and the items. Splitting or pre- or post-filtering the data based on context can lead to a loss of information about the interactions between the different context settings. Moreover, many of the proposed methods rely on a sequence of techniques, which often proves computationally expensive, rather than on a single, less computationally expensive model. Existing systems also have difficulty scaling to a large number of context dimensions. In terms of performance, the experiments of the present inventors show that Multiverse Recommendation outperforms state-of-the-art context-aware methods by a significant margin.

Description of the invention

The core of the present invention is to model context-aware recommendation by extending the traditional user-item interaction, which is modeled by Matrix Factorization, to a user-item-context interaction, which is modeled using Tensor Factorization (TF).

Considering the example given at the opening of this description, the usual two-dimensional (user, item) matrix is converted here into a three-dimensional tensor (see Figure 1). Tensor Factorization (TF) can be used to add any number, and any kind, of variables to a recommender system. In particular, it could be used to hybridize content and CF in a way similar to the approach by Pilaszy and Tikk [17]. However, the present invention focuses on the use of TF for adding contextual information. The proposed approach to contextual recommendation via TF is termed here Multiverse Recommendation because it can be used to efficiently bridge hidden "worlds" separated by different contexts and therefore collapse parallel dimensions into a coherent model.

TF is an existing N-dimensional extension of Matrix Factorization. However, a straightforward application of this model is unsuitable for the CF case. In the following, the Matrix and Tensor Factorization models are introduced, and the details of how the model has been adapted for N-dimensional CF are explained.

The main idea behind the use of TF in a context-aware recommendation system is that advantage can be taken of the same principles behind Matrix Factorization to deal with N-dimensional information. Before diving into the details of the TF model, however, the two-dimensional MF approach is briefly summarized.

Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches, such as Matrix Factorization, do not provide a straightforward way of integrating context information into the model. This work introduces a system working with a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed system based on the TF-based CF module, or Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor. The factorization of this tensor leads to a compact model of the data which can be used to provide context-aware recommendations.

The invention provides an algorithm that implements the CF module of a Recommendation System addressing the N-dimensional factorization, and it is shown that Multiverse Recommendation improves upon non-contextual Matrix Factorization by up to 30% in terms of the Mean Absolute Error (MAE). It is also compared to two state-of-the-art context-aware methods, and Tensor Factorization is shown to consistently outperform them both on semi-synthetic and real-world data, with improvements ranging from 2.5% to more than 12% depending on the data. Notably, the proposed approach outperforms the other methods by a wider margin whenever more contextual information is available.

Brief description of the drawings

Fig. 1 is a diagram showing the CF module of a system to implement the proposed method based on N-dimensional TF.

Fig. 2 is an illustration of the (3-dimensional) HOSVD tensor factorization model.

Fig. 3 is a comparison of matrix (no context) and tensor (context) factorization on the Adom and Food data.

Fig. 4 is a comparison of context-aware methods on the Yahoo artificial data (β

Fig. 5 is a graphic showing the evolution of MAE values for different methods with increasing influence of the context variable on the Yahoo data.

Fig. 6 is a comparison of context-aware methods on the Adom and Food data.

Detailed description of the invention

The invention provides a system for context-aware recommendation (diagram of Fig. 1). Details of the TF model are provided below; the TF model is crucial for the functionality and performance of the system.

A short description of the previously used Matrix Factorization techniques is presented first, followed by the full TF model used in the CF module of the present method, Multiverse Recommendation.

Matrix Factorization

CF techniques based on MF work by assuming that ratings provided by users on items can be represented in a sparse matrix

$$Y \in \mathcal{Y}^{n \times m},$$

where n is the number of users and m the number of items. The observed values in Y are thus formed by the rating information provided by the users on the items. The CF problem then boils down to a Matrix Completion problem. In MF techniques the aim is to factorize the matrix of observed ratings into two matrices

$$U \in \mathbb{R}^{n \times d}, \qquad M \in \mathbb{R}^{m \times d}$$

such that F := UM^T approximates Y, i.e. such that a loss function L(F, Y) between observed and predicted values is minimized. In most cases, a regularization term is added to the loss function for better generalization performance, and the objective function thus becomes L(F, Y) + Ω[U, M].

Standard choices for L include the least squares loss function, applied pointwise as l(F_ij, Y_ij) = 1/2 (F_ij - Y_ij)², and for Ω the Frobenius norm, i.e.

$$\Omega[U, M] = \frac{\lambda}{2}\left(\|U\|_F^2 + \|M\|_F^2\right)$$
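As an illustration of the MF objective above (not of the invention itself), the following minimal sketch fits F = UM^T by stochastic gradient descent on the squared loss, visiting only observed entries and applying the Frobenius penalty per observed rating, a common SGD simplification. The function names, learning rate, initialization scale and epoch count are illustrative assumptions.

```python
import random

def train_mf(ratings, n_users, n_items, d=2, lr=0.05, lam=0.01, epochs=200, seed=0):
    """SGD on 1/2 (F_ij - y)^2 over observed (i, j, y) plus a per-rating l2 penalty.

    Returns U (n_users x d) and M (n_items x d) as nested lists, with F = U M^T.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_users)]
    M = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, y in rng.sample(ratings, len(ratings)):  # one shuffled pass
            g = predict(U, M, i, j) - y  # derivative of 1/2 (F_ij - y)^2 w.r.t. F_ij
            for a in range(d):
                u, m = U[i][a], M[j][a]  # old values, so both updates use them
                U[i][a] -= lr * (g * m + lam * u)
                M[j][a] -= lr * (g * u + lam * m)
    return U, M

def predict(U, M, i, j):
    """F_ij = <U_i*, M_j*>, i.e. entry (i, j) of U M^T."""
    return sum(ua * ma for ua, ma in zip(U[i], M[j]))

# Usage: a small fully observed 3 x 2 rating matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 0, 1.0), (2, 1, 5.0)]
U, M = train_mf(ratings, n_users=3, n_items=2, epochs=500)
print(round(predict(U, M, 0, 0), 2))
```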

The CF module (diagram of Fig. 1) of the system is based on N-dimensional TF.

N-dimensional TF is a generic model framework for recommendations that is able to handle N-dimensional data and profit from most of the advantages of MF, such as fast prediction computations and simple and efficient optimization techniques. The generic TF model will now be introduced and its specific adaptation to context-aware CF will be discussed later.

Notation

For the sake of simplicity, the model is described for a single contextual variable C, so that Y, the tensor containing the ratings, is a 3-dimensional tensor. The generalization to larger numbers of context variables and N dimensions is straightforward. In the following, the sparse tensor of rating observations is denoted by

$$Y \in \mathcal{Y}^{n \times m \times c},$$

where n is the number of users, m the number of items, and c the number of context conditions (possible values) of the contextual variable C, indexed by k ∈ {1, ..., c}. Typically, the rating is given on a five-star scale and thus

$$\mathcal{Y} = \{0, 1, \ldots, 5\},$$

where the value 0 indicates that a user did not rate an item. In this sense, 0 is special since it does not indicate that a user dislikes an item but rather that data is missing. Two tensor operations are used. The first is a tensor-matrix multiplication operator, denoted by ×_U, ×_M or ×_C, where the subscript shows the direction on the tensor in which to multiply the matrix, i.e. for the item direction and any matrix A with m columns

$$T = Y \times_M A \quad \Longleftrightarrow \quad T_{ijk} = \sum_{l=1}^{m} Y_{ilk} A_{jl}.$$

The second is the tensor outer product, denoted by ⊗, with (u ⊗ v ⊗ w)_abg = u_a v_b w_g.

Finally,

$$D \in \{0, 1\}^{n \times m \times c}$$

is a binary tensor with nonzero entries D_ijk whenever Y_ijk is observed, and U_i* denotes the entries of the ith row of matrix U.

HOSVD decomposition

There are different types of tensor decomposition models in the literature, such as the Canonical Decomposition or Parallel Factors, also known as the CP-decomposition, and the High Order Singular Value Decomposition (HOSVD). The present approach follows the HOSVD formulation shown in Figure 2, where the 3-dimensional tensor is factorized into three matrices

$$U \in \mathbb{R}^{n \times d_U}, \qquad M \in \mathbb{R}^{m \times d_M}, \qquad C \in \mathbb{R}^{c \times d_C}$$

and one central tensor

$$S \in \mathbb{R}^{d_U \times d_M \times d_C}.$$

In this case, the decision function for a single user i, item j, context k combination becomes:

$$F_{ijk} = S \times_U U_{i*} \times_M M_{j*} \times_C C_{k*} \qquad (1)$$

This decomposition model allows for full control over the dimensionality of the factors extracted for the users, items and context by adjusting the d_U, d_M and d_C parameters. This property is valuable in the case of large real-world datasets, where the matrices U and M can grow in size and potentially pose a storage problem.
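Equation (1) contracts the core tensor S with one row of each factor matrix, which amounts to F_ijk = Σ_{a,b,g} S_abg U_ia M_jb C_kg. The following sketch, assuming the factors are stored as plain nested Python lists (an illustrative choice; the helper names are hypothetical), computes F_ijk both as this explicit triple sum and as three successive mode products, and uses d_U = 2, d_M = 3, d_C = 1 to show that the three factor dimensions need not be equal.

```python
from itertools import product

def hosvd_predict(S, u, m, c):
    """Eq. (1) as an explicit sum: F = sum_{a,b,g} S[a][b][g] * u[a] * m[b] * c[g].

    S is a d_U x d_M x d_C core tensor (nested lists); u, m, c are the rows
    U_i*, M_j*, C_k* of the user, item and context factor matrices.
    """
    return sum(S[a][b][g] * u[a] * m[b] * c[g]
               for a, b, g in product(range(len(u)), range(len(m)), range(len(c))))

def contract_last(T, v):
    """Multiply a nested-list tensor by the vector v along its last direction."""
    if isinstance(T[0], list):
        return [contract_last(sub, v) for sub in T]
    return sum(t * x for t, x in zip(T, v))

def hosvd_predict_modewise(S, u, m, c):
    """Same value via successive products: ((S x_C c) x_M m) x_U u."""
    return contract_last(contract_last(contract_last(S, c), m), u)

# d_U = 2, d_M = 3, d_C = 1
S = [[[1.0], [2.0], [0.5]],
     [[0.0], [1.5], [-1.0]]]
u, m, c = [1.0, 2.0], [0.5, 1.0, 2.0], [3.0]
print(hosvd_predict(S, u, m, c), hosvd_predict_modewise(S, u, m, c))  # 7.5 7.5
```

Either form costs O(d_U·d_M·d_C) operations per prediction, independent of n, m and c.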

Tensor factorization for CF is explained in detail below.

Loss Function

In analogy to MF approaches [19, 22], the loss function is defined as:

$$L(F, Y) := \frac{1}{\|D\|_1} \sum_{i,j,k} D_{ijk}\, l(F_{ijk}, Y_{ijk}) \qquad (2)$$

where

$$l : \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$$

is a pointwise loss function penalizing the distance between estimate and observation, and F_ijk is given by Equation (1). Note that the total loss L is defined only on the observed values in the tensor Y. Possible choices for the loss function l include the following:

Squared error: provides an estimate of the conditional mean.

$$l(f, y) = \tfrac{1}{2}(f - y)^2$$

Absolute loss: provides an estimate of the conditional median.

$$l(f, y) = |f - y|$$

ε-insensitive loss: chosen to ignore deviations of up to ε, via

$$l(f, y) = \max(0, |f - y| - \varepsilon)$$
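The three pointwise losses above, together with the (sub)derivatives with respect to the estimate f that a gradient-based optimizer needs, can be sketched as follows. The function names and the default ε = 0.5 are illustrative assumptions; at the kink of the absolute loss a subgradient of 0 is chosen. The final lines check the "conditional mean" versus "conditional median" remark numerically by searching a grid for the best constant estimate.

```python
def squared_loss(f, y):
    return 0.5 * (f - y) ** 2            # minimized in expectation by the mean

def squared_grad(f, y):
    return f - y

def absolute_loss(f, y):
    return abs(f - y)                    # minimized in expectation by the median

def absolute_grad(f, y):
    return float((f > y) - (f < y))      # subgradient, 0.0 at f == y

def eps_insensitive_loss(f, y, eps=0.5):
    return max(0.0, abs(f - y) - eps)    # deviations up to eps cost nothing

def eps_insensitive_grad(f, y, eps=0.5):
    if abs(f - y) <= eps:
        return 0.0
    return 1.0 if f > y else -1.0

# Best constant estimate for a few ratings under each loss, on a grid over [1, 5].
ys = [1.0, 2.0, 2.0, 5.0]
grid = [x / 100 for x in range(100, 501)]
best_sq = min(grid, key=lambda f: sum(squared_loss(f, y) for y in ys))
best_abs = min(grid, key=lambda f: sum(absolute_loss(f, y) for y in ys))
print(best_sq, best_abs)  # mean 2.5 vs median 2.0
```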

These are not the only possible loss functions. For instance, a quantile loss can be used to prioritize estimates based on the confidence with which events are to be recommended. Other possible loss functions include the Huber loss [8] and the hinge loss, which can be useful in the case of implicit taste information [21].

Regularization

Simply minimizing a loss function is known to lead to overfitting. Given the factors U, M, C, S which constitute the proposed model, there are several ways to ensure that the model complexity does not grow without bound. A simple option is to add a regularization term based on the ℓ2 norm of these factors [20]:

$$\Omega[U, M, C] := \frac{1}{2}\left[\lambda_U \|U\|_F^2 + \lambda_M \|M\|_F^2 + \lambda_C \|C\|_F^2\right] \qquad (3)$$

In the case of a matrix, this norm is also known as the Frobenius norm.

In a similar manner, the complexity of the central tensor S can be restricted by imposing an ℓ2 norm penalty:

$$\Omega[S] := \frac{\lambda_S}{2}\|S\|_F^2 \qquad (4)$$

Note here that one can also impose an ℓ1 norm as a regularizer, which is known to lead to sparse solutions [12, 7]:

$$\Omega[S] := \lambda_S \|S\|_1 \qquad (5)$$

Even though the regularizer in Eq. (5) leads to particularly sparse models, its optimization is non-trivial: during the course of the optimization process, a potentially large number of parameters is needed to make progress. Hence, this invention uses regularizer (3) for U, M, C and regularizer (4) for S.

Optimization

Overall, the aim is to minimize a regularized risk functional that combines L(F, Y), Ω[U, M, C] and Ω[S]. The objective function for the minimization problem is:

$$R[U, M, C, S] := L(F, Y) + \Omega[U, M, C] + \Omega[S] \qquad (6)$$
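To make the objective concrete, the following sketch evaluates R[U, M, C, S] from Eq. (6) for nested-list factors, using the normalized loss of Eq. (2) and the ℓ2 regularizers of Eqs. (3) and (4); the ℓ1 norm of Eq. (5) is included for comparison. It assumes that only observed (i, j, k, y) ratings are passed in, so that ‖D‖_1 equals their count; the function names are illustrative.

```python
from itertools import product

def sq_norm(X):
    """Squared Frobenius (l2) norm of a nested-list matrix or tensor."""
    return sum(sq_norm(x) for x in X) if isinstance(X, list) else X * X

def l1_norm(X):
    """l1 norm, as used by the sparse alternative of Eq. (5)."""
    return sum(l1_norm(x) for x in X) if isinstance(X, list) else abs(X)

def predict(S, u, m, c):
    """Eq. (1): F_ijk = S x_U U_i* x_M M_j* x_C C_k*."""
    return sum(S[a][b][g] * u[a] * m[b] * c[g]
               for a, b, g in product(range(len(u)), range(len(m)), range(len(c))))

def objective(ratings, U, M, C, S, loss, lam_u, lam_m, lam_c, lam_s):
    """R[U, M, C, S] = L(F, Y) + Omega[U, M, C] + Omega[S] (Eqs. 2, 3, 4 and 6).

    ratings holds only the observed (i, j, k, y) entries, so ||D||_1 = len(ratings).
    """
    data_term = sum(loss(predict(S, U[i], M[j], C[k]), y)
                    for i, j, k, y in ratings) / len(ratings)                      # Eq. (2)
    omega_umc = 0.5 * (lam_u * sq_norm(U) + lam_m * sq_norm(M) + lam_c * sq_norm(C))  # Eq. (3)
    omega_s = 0.5 * lam_s * sq_norm(S)                                                # Eq. (4)
    return data_term + omega_umc + omega_s

# One observed rating y = 3 with all factors of size 1: 0.5 + 0.15 + 0.2.
squared = lambda f, y: 0.5 * (f - y) ** 2
r = objective([(0, 0, 0, 3.0)], [[1.0]], [[1.0]], [[1.0]], [[[2.0]]],
              squared, 0.1, 0.1, 0.1, 0.1)
print(round(r, 6))
```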

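Minimizing this objective function can be done using many approaches. Subspace descent is a popular choice in MF methods and could be used in the tensor setting. In subspace descent, one optimizes iteratively over individual components of the model while keeping the remaining components fixed, e.g. optimizing over the U matrix while keeping the remaining matrices and the tensor fixed, then over M, etc. This method converges quickly but requires the optimization procedure to be run in a batch setting. As dataset sizes grow, it becomes increasingly infeasible to solve factorization problems by batch optimization. Instead, this invention proposes a simple online algorithm which performs stochastic gradient descent (SGD) in the factors U_i*, M_j*, C_k* and S simultaneously for a given rating Y_ijk. To compute the SGD updates, the gradients of the loss function, and hence of the objective function, with respect to the individual components of the model are needed:

$$\partial_{U_{i*}} l(F_{ijk}, Y_{ijk}) = \partial_{F_{ijk}} l(F_{ijk}, Y_{ijk}) \; S \times_M M_{j*} \times_C C_{k*}$$

$$\partial_{M_{j*}} l(F_{ijk}, Y_{ijk}) = \partial_{F_{ijk}} l(F_{ijk}, Y_{ijk}) \; S \times_U U_{i*} \times_C C_{k*}$$

$$\partial_{C_{k*}} l(F_{ijk}, Y_{ijk}) = \partial_{F_{ijk}} l(F_{ijk}, Y_{ijk}) \; S \times_U U_{i*} \times_M M_{j*}$$

$$\partial_{S} l(F_{ijk}, Y_{ijk}) = \partial_{F_{ijk}} l(F_{ijk}, Y_{ijk}) \; U_{i*} \otimes M_{j*} \otimes C_{k*}$$

The regularizers (3) and (4) add the terms λ_U U_i*, λ_M M_j*, λ_C C_k* and λ_S S, respectively, to these gradients.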
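The per-rating SGD updates built from these gradients can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact listing of Algorithm 1: it uses the squared loss by default (the absolute loss used in the experiments can be plugged in with `dloss=lambda f, y: float((f > y) - (f < y))`), a constant learning rate, Gaussian initialization, a shared λ for U, M, C plus a separate λ_S, and applies the regularizer gradient at every visited rating. All names and default values are illustrative. Each step evaluates the gradients at the current values and then moves U_i*, M_j*, C_k* and S simultaneously, so it touches one row of each factor matrix and costs O(d_U·d_M·d_C) per rating.

```python
import random
from itertools import product

def sgd_step(U, M, C, S, i, j, k, y, lr, lam, lam_s, dloss):
    """Simultaneous SGD update of U_i*, M_j*, C_k* and S for one rating y = Y_ijk."""
    u, m, c = U[i], M[j], C[k]                 # row references, updated in place
    cells = list(product(range(len(u)), range(len(m)), range(len(c))))
    f = sum(S[a][b][g] * u[a] * m[b] * c[g] for a, b, g in cells)  # Eq. (1)
    g_f = dloss(f, y)                          # derivative of l w.r.t. F_ijk
    gu, gm, gc = [0.0] * len(u), [0.0] * len(m), [0.0] * len(c)
    for a, b, g in cells:                      # gradients of F at the old values
        gu[a] += S[a][b][g] * m[b] * c[g]      # S x_M M_j* x_C C_k*
        gm[b] += S[a][b][g] * u[a] * c[g]      # S x_U U_i* x_C C_k*
        gc[g] += S[a][b][g] * u[a] * m[b]      # S x_U U_i* x_M M_j*
    for a, b, g in cells:                      # core first: its gradient uses the old rows
        S[a][b][g] -= lr * (lam_s * S[a][b][g] + g_f * u[a] * m[b] * c[g])
    for row, grad in ((u, gu), (m, gm), (c, gc)):
        for a in range(len(row)):
            row[a] -= lr * (lam * row[a] + g_f * grad[a])

def train_tf(ratings, n_users, n_items, n_ctx, dims=(2, 2, 2), lr=0.02, lam=0.01,
             lam_s=0.01, epochs=200, seed=0, dloss=lambda f, y: f - y):
    """Shuffled SGD passes over the observed (i, j, k, y) ratings only."""
    rng = random.Random(seed)
    d_u, d_m, d_c = dims
    def init(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    U, M, C = init(n_users, d_u), init(n_items, d_m), init(n_ctx, d_c)
    S = [init(d_m, d_c) for _ in range(d_u)]   # d_U x d_M x d_C core tensor
    for _ in range(epochs):
        for i, j, k, y in rng.sample(ratings, len(ratings)):
            sgd_step(U, M, C, S, i, j, k, y, lr, lam, lam_s, dloss)
    return U, M, C, S

def predict(U, M, C, S, i, j, k):
    u, m, c = U[i], M[j], C[k]
    return sum(S[a][b][g] * u[a] * m[b] * c[g]
               for a, b, g in product(range(len(u)), range(len(m)), range(len(c))))

# Toy data: the rating depends only on a binary context (4.0 if k == 1, else 2.0).
ratings = [(i, j, k, 4.0 if k == 1 else 2.0)
           for i in range(3) for j in range(2) for k in range(2)]
U, M, C, S = train_tf(ratings, n_users=3, n_items=2, n_ctx=2)
print(round(predict(U, M, C, S, 0, 0, 1), 2), round(predict(U, M, C, S, 0, 0, 0), 2))
```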

The Multiverse Recommendation TF method of this invention is summarized in Algorithm 1. It is easy to implement since it accesses only one row of U, M and C at a time. In addition, it is easy to parallelize by performing several updates independently, provided that the (i, j, k) sets are all non-overlapping. Note that the algorithm scales linearly in the number of ratings K and in the dimensionality of each of the factors d_U, d_M, d_C; the overall complexity is therefore O(K·d_U·d_M·d_C). Finally, it easily generalizes to the case of N context dimensions by adding one additional update per context variable.

Missing Context Information

TF also allows for an intuitive way of dealing with missing context information. Assume that the context information of a rating Y_i'j' given by user i' on item j' is missing. Intuitively, one would like to add the rating information to the profiles of the user and the item while either not updating the context profiles or applying the update equally to all context profiles. There are thus two options:

• Update the U_i'* and M_j'* factors and skip the update of the C and S factors.

• Update the U_i'* and M_j'* factors while updating all the context profiles in C, but with a step size η divided by the number of context cases c. The central tensor S can then be updated using all the context cases, again dividing the step size by c.
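The second option above can be sketched as follows. This is an interpretation, not a listing from this invention: it assumes that the rating with unknown context is treated as if it had been observed once in every context case k, each time with step size η/c. U_i'* and M_j'* therefore receive c small steps (roughly one step of size η on the gradient averaged over contexts), every row of C is touched once with step η/c, and S is updated with all context cases at step η/c. The squared loss, the absence of regularization and all names are illustrative assumptions.

```python
def sgd_step(U, M, C, S, i, j, k, y, lr):
    """Unregularized squared-loss SGD step on U_i*, M_j*, C_k* and S (Eq. 1)."""
    u, m, c = U[i], M[j], C[k]
    cells = [(a, b, g) for a in range(len(u)) for b in range(len(m)) for g in range(len(c))]
    g_f = sum(S[a][b][g] * u[a] * m[b] * c[g] for a, b, g in cells) - y
    gu, gm, gc = [0.0] * len(u), [0.0] * len(m), [0.0] * len(c)
    for a, b, g in cells:
        gu[a] += S[a][b][g] * m[b] * c[g]
        gm[b] += S[a][b][g] * u[a] * c[g]
        gc[g] += S[a][b][g] * u[a] * m[b]
    for a, b, g in cells:  # core first, using the old rows
        S[a][b][g] -= lr * g_f * u[a] * m[b] * c[g]
    for row, grad in ((u, gu), (m, gm), (c, gc)):
        for a in range(len(row)):
            row[a] -= lr * g_f * grad[a]

def update_missing_context(U, M, C, S, i, j, y, lr):
    """Option 2 (as interpreted here): one step per context case, each with step lr / c."""
    n_ctx = len(C)  # c, the number of context cases
    for k in range(n_ctx):
        sgd_step(U, M, C, S, i, j, k, y, lr / n_ctx)

def predict(U, M, C, S, i, j, k):
    u, m, c = U[i], M[j], C[k]
    return sum(S[a][b][g] * u[a] * m[b] * c[g]
               for a in range(len(u)) for b in range(len(m)) for g in range(len(c)))

# Rating y = 5 by user 0 on item 0 with unknown context; two context cases.
U, M, C, S = [[1.0]], [[1.0]], [[1.0], [0.5]], [[[1.0]]]
err_before = sum(abs(predict(U, M, C, S, 0, 0, k) - 5.0) for k in range(2)) / 2
update_missing_context(U, M, C, S, 0, 0, 5.0, lr=0.1)
err_after = sum(abs(predict(U, M, C, S, 0, 0, k) - 5.0) for k in range(2)) / 2
print(err_before, round(err_after, 3))
```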

From the above, it appears clear that the aim in proposing a recommendation system based on an N-dimensional TF module is to model the context variables in the same way as users and items are modeled in MF techniques, by taking the interactions between users, items and context into account. This approach is referred to as Multiverse Recommendation.

Existing HOSVD TF methods (e.g. [11]) require a dense tensor Y and therefore ignore the sparsity of the input data. Treating Y as a dense tensor with missing entries assumed to be 0 would introduce a bias against unobserved ratings. The Regularized TF model introduced in this section avoids these issues by optimizing only over the observed values in the rating tensor. Also note that, in contrast to standard SVD and HOSVD methods, in CF there is no need to impose orthogonality constraints on the factors. Multiverse Recommendation has a number of advantages compared to many current context-based methods, including:

[1] No need for pre- or post-filtering: in contrast to many current systems, which rely on splitting and pre- or post-filtering the data based on context, a system based on a TF module utilizes all the available ratings to model the users and the items. Splitting or pre- or post-filtering the data based on context can lead to a loss of information about the interactions between the different context settings;

[2] Computational simplicity: many of the proposed methods rely on a sequence of techniques, which often prove computationally expensive, rather than on a single, less computationally expensive model, as is the case in TF;

[3] Ability to handle N dimensions: the TF approach generalizes well to an arbitrary number of context variables while adding relatively little computational overhead;

[4] Very significant performance benefits: the experiments show that Multiverse Recommendation outperforms state-of-the-art context-aware methods by a significant margin.

Therefore the proposed system brings a number of contributions to the area of contextual recommendations, including the ability to:

- Generalize efficient MF approaches to the N-dimensional case in a compact way.

- Include any number of contextual dimensions into the model itself.

- Benefit from several loss functions designed to fine-tune the optimization criteria.

- Train the model with a fast and straightforward algorithm.

- Take advantage of the sparsity of the data while still exploiting the interactions between users, items and context.

Experimental evaluation of the proposed Multiverse Recommendation TF algorithm.

First, the impact of using contextual information was analyzed by comparing the algorithm to standard non-context-aware MF. Then TF was compared to two state-of-the-art context-based collaborative filtering algorithms presented in [2, 5]. The algorithms were evaluated on two real-world datasets, one of them kindly provided by Adomavicius et al. [2]. Moreover, in order to better understand the behavior of the methods, a synthetic dataset was used in which the influence of the context variable was controlled.

Before the results are reported, the experimental setup, including the datasets used, the experimental protocol and the context-aware methods used for comparison, is detailed.

The data

The proposed method was tested on six semi-synthetic data sets with ratings on a {1, ..., 5} scale. The datasets were generated using the Yahoo! Webscope movies data set (Webscope v1.0, http://research.yahoo.com/), which contains 221K ratings for 11,915 movies by 7,642 users. The semi-synthetic data sets are used to analyze context-aware methods when varying the influence of the context on the user ratings. The original Yahoo! data set contains user age and gender features. Three age groups were defined: users below 18, between 18 and 50, and above 50. The original Yahoo! data set was modified by replacing the gender feature with a new artificial feature c ∈ {0, 1} that was randomly assigned the value 1 or 0 for each rating. This feature c represents a contextual condition that can affect the rating. α × 100% of the items were randomly chosen from the dataset, and for these items β × 100% of the ratings were randomly picked to be modified. The rating value was increased (decreased) by one if c = 1 (c = 0), provided that the rating value was not already 5 (1). For example, if α = 0.9 and β = 0.5, the corresponding synthetic data set has 90% of its items altered, with profiles that have 50% of their ratings changed. Six semi-synthetic data sets were generated varying α

Because of the way these datasets have been generated, the contextual condition "influences" the rating value more as α and β increase.
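The perturbation procedure described above can be sketched as follows. This is a minimal sketch under assumptions that the text does not fix: β is applied per selected item (a fraction β of that item's ratings), the random draws use Python's `random` module with a fixed seed, and the function name is hypothetical.

```python
import random
from collections import defaultdict

def make_semi_synthetic(ratings, alpha, beta, seed=0):
    """ratings: list of (user, item, r) with r in {1, ..., 5}.

    Returns (user, item, c, r') tuples in the same order, where c in {0, 1} is a
    random artificial context and, for the chosen ratings, r' = r + 1 if c == 1
    (unless r == 5) and r' = r - 1 if c == 0 (unless r == 1).
    """
    rng = random.Random(seed)
    out = [(u, it, rng.randint(0, 1), r) for u, it, r in ratings]
    positions_by_item = defaultdict(list)
    for pos, (_, it, _, _) in enumerate(out):
        positions_by_item[it].append(pos)
    items = sorted(positions_by_item)
    for it in rng.sample(items, round(alpha * len(items))):     # alpha x 100% of items
        positions = positions_by_item[it]
        for pos in rng.sample(positions, round(beta * len(positions))):  # beta x 100%
            u, item, c, r = out[pos]
            if c == 1 and r < 5:
                r += 1
            elif c == 0 and r > 1:
                r -= 1
            out[pos] = (u, item, c, r)
    return out

base = [(0, 0, 1), (0, 1, 5), (1, 0, 3), (1, 1, 4), (2, 0, 2), (2, 1, 5)]
print(make_semi_synthetic(base, alpha=1.0, beta=1.0))
```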

The second dataset is derived from the one used by Adomavicius et al. [2]. It contains 1464 ratings by 84 users for 192 movies. The ratings were collected in a survey of college students. The students were asked to fill out a questionnaire on movies using a rating scale ranging from 1 (hate) to 13 (absolutely love). They were also queried on additional information about the context of the movie-watching experience. In the current experiments 5 contextual features were used: companion, day of the week, whether it was the opening weekend, season, and year seen. Note that the experiments disclosed here used a slightly different dataset compared to the original experiments in [2]: more ratings were used, and features that were not considered before were taken into account, i.e., the year and season when the movie was seen.

The third dataset was used by Hideki et al. [14] and provided by the authors. It contains food rating data from 212 users on 20 food menus. To acquire the data, the authors designed a two-stage Internet questionnaire survey. The users were asked to rate the food menus while being in different levels of hunger. Moreover, some ratings were given while actually experiencing the situation (i.e., participants were hungry and ordered the menu) and some while imagining the situation. For example, in the virtual situation participants could be full but were asked to rate a food menu while imagining that they were hungry. For this data set two context features were used: the three degrees of hunger, and a binary feature specifying whether the situation was virtual or real.

Evaluation Protocol

The performance of the models was assessed by conducting a 5-fold cross-validation and computing the Mean Absolute Error (MAE), defined as follows:

$$\mathrm{MAE} = \frac{1}{K} \sum_{(i,j,k) \in \text{test set}} \lvert F_{ijk} - Y_{ijk} \rvert$$

where K is the total number of ratings in the test set. This measure is one of the most widely used performance measures in the recommender systems literature.
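A minimal sketch of this evaluation protocol: the ratings are shuffled into five disjoint folds, each fold serves once as the test set while the model is fitted on the other four, and the test MAE is averaged over the folds. The fold assignment, the `fit`/`predict` callables and the global-mean baseline in the usage example are illustrative assumptions, not the setup of the reported experiments.

```python
import random

def mae(predictions, targets):
    """MAE = (1/K) * sum of |f - y| over the K test ratings."""
    return sum(abs(f - y) for f, y in zip(predictions, targets)) / len(targets)

def k_fold_mae(ratings, fit, predict, k=5, seed=0):
    """Average test MAE over k disjoint folds; each rating tuple ends with its value y."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        test = folds[f]
        train = [r for g in range(k) if g != f for r in folds[g]]
        model = fit(train)
        scores.append(mae([predict(model, r) for r in test], [r[-1] for r in test]))
    return sum(scores) / k

# Usage with a trivial global-mean baseline (a hypothetical stand-in for TF).
ratings = [(u, i, 0, float(1 + (u * 3 + i) % 5)) for u in range(5) for i in range(4)]
fit_mean = lambda train: sum(r[-1] for r in train) / len(train)
print(round(k_fold_mae(ratings, fit_mean, lambda model, r: model), 3))
```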

The Absolute loss function defined above was used here. It is important to note that the TF approach allows for direct optimization of the evaluation measure: when the MAE is used, the ideal loss function for minimizing this measure is the Absolute loss. If the error measure were RMSE instead, the least-squares loss function could be optimized. The TF algorithm was trained for 10 epochs.

The ℓ2 norm regularizer was applied both to the matrices and to the central tensor, as explained above. Due to computational and time constraints, only a limited parameter search was conducted for all the methods tested. For the TF algorithm, the regularization parameters were set to λ_U = λ_M = λ_C = λ, which leaves two regularization parameters, λ and λ_S, the initial learning rate, and the dimensionalities of the individual components of the model, d_U, d_M, etc., as parameters.

All reported results are the average of a 5-fold cross-validation. The variance of the results is not reported, since it was insignificant in these experiments and did not qualitatively influence the findings and conclusions. Moreover, all differences between TF and the other methods are statistically significant.

Context-based Methods in Comparison

Two state-of-the-art context-aware collaborative filtering methods based on pre-filtering were selected and compared to N-dimensional Multiverse Recommendation based on TF.

The first one is a reduction-based approach, which is based on OLAP [2] and extends classical Collaborative Filtering methods by adding contextual information to the representation of users and items. This reduction-based method computes recommendations using only the ratings made in the same context as the target one. The exact contextual segments used for the prediction are searched (optimized) among those that improve its accuracy. This is a rather computationally expensive operation, as for each combination of contextual conditions a Collaborative Filtering model needs to be trained and tested. The second method used for this comparison is item splitting [5], which overcomes the computational issues of the reduction-based approaches and provides a more dynamic solution. Item splitting identifies items whose ratings differ significantly across contexts. For each of these items, it splits the ratings into two subsets, creating two new artificial items with the ratings assigned to these two subsets. The split is determined by the value of one contextual variable, i.e., one subset contains all the ratings acquired in a context where that contextual feature took a certain value. The method then determines whether the two subsets of ratings show a statistically significant difference, e.g., in the mean. If this is the case, the split is done and the original item in the ratings matrix is replaced by the two newly generated items.
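The splitting decision of item splitting can be sketched as follows. This is a hypothetical illustration of the idea described above, not the implementation of [5]: it assumes a binary contextual feature, uses Welch's t statistic on the two subsets' means as the significance criterion, and splits when |t| exceeds an illustrative threshold of 2.0. The function names and the new item identifiers are made up for the example.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between the means of samples a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:  # both subsets constant: any mean difference is decisive
        return 0.0 if mean(a) == mean(b) else math.inf
    return (mean(a) - mean(b)) / se

def split_item(item, ratings, t_threshold=2.0):
    """ratings: (user, rating, context) triples of one item, context in {0, 1}.

    Returns {(item, context_value): [(user, rating), ...]} for the two new
    artificial items, or None when the subsets do not differ significantly.
    """
    subsets = {cv: [(u, r) for u, r, ctx in ratings if ctx == cv] for cv in (0, 1)}
    if any(len(s) < 2 for s in subsets.values()):  # variance needs two values
        return None
    t = welch_t([r for _, r in subsets[1]], [r for _, r in subsets[0]])
    if abs(t) <= t_threshold:
        return None
    return {(item, cv): s for cv, s in subsets.items()}

# Ratings are high when context = 1 and low when context = 0, so the item is split.
diff = [(0, 5, 1), (1, 5, 1), (2, 4, 1), (3, 5, 1), (4, 1, 0), (5, 2, 0), (6, 1, 0), (7, 1, 0)]
print(split_item("m1", diff))
```

Because each split creates items with fewer ratings, this kind of pre-filtering trades relevance for sparsity, as noted in the next paragraph.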

Note that both pre-filtering methods use a standard MF approach as the main CF algorithm. Moreover, both methods exploit the tradeoff between less and more relevant data and thus increase data sparsity. This is not the case in the TF model, where all the data is used to model the user, item and context factors.

Results

Some experiments were first conducted to assess the relevance of contextual information. To do so, the TF method was compared with a regular non-context-aware MF method. Then the quality of the proposed approach was measured by comparing it to the other context-aware methods explained previously on different datasets and contextual conditions.

Tensor vs. Matrix Factorization

The N-dimensional TF approach was first compared to standard MF. That is, all the available context information was used in the TF model, while the MF approach was limited to the standard user-item-rating setting, without any contextual variables.

Figure 3 shows the results of Tensor and Matrix Factorization on the real-world Adom and Food data (see the dataset descriptions above for details). Adding information in the form of context variables was observed to make a significant difference in performance (14% for the Adom and 10% for the Food data). Note that the Adom and Food data result in 7-dimensional and 4-dimensional TF models respectively, since the Adom data contains 5 contextual variables and the Food data 2. Figure 4 depicts the results of Tensor (in bold) and Matrix Factorization (in crossed area) on the artificial Yahoo dataset. As expected, the TF model outperforms the standard non-context-aware MF method by 5% in the low-context data (α = 0.1) and by up to 30% in the high-context case (α = 0.9). When the context variable is not used in the Collaborative Filtering model, the contextual information acts as noise added to the data. A non-contextual model such as MF cannot distinguish between noise and a functional dependency on a hidden variable. Since all the information is collapsed into a standard two-dimensional MF model, MF fails to model this influence. In fact, MF alone cannot exploit the additional information contained in this feature and cannot effectively deal with the influence of this variable. It was observed that the stronger the influence of the context variable, the higher the MAE for MF.

Note that training a Python implementation of the Multiverse Recommendation TF method on 80% of the Yahoo! data takes approximately 10 minutes on a Core 2 Duo computer.

Comparison to Context-Aware Methods

The pre-filtering-based context-aware methods are now compared against TF. Figure 5 shows a comparison of the TF method with the various context-aware CF methods. The higher the influence of the context variable, the better TF performs compared to the other context-aware methods. Also note that when context has a strong influence (α ≥ 0.5, β ≥ 0.5), all three context-aware methods outperform the MF method, from 4% in the low-context case up to 12% in the high-context data (α = 0.9). The performance advantage of the context-aware methods increases substantially when more information is covered by the contextual feature c (see Figure 5).
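The two pre-filtering baselines can be sketched as data transformations that feed an ordinary two-dimensional CF model. Both functions are simplified, hypothetical illustrations and the published methods are more elaborate: the reduction-based approach [1] is reduced here to keeping only the ratings from the exact target context, and the item splitting of [5] normally splits an item only when a statistical test detects that its ratings differ across contexts, whereas this sketch always splits. Ratings are assumed to be `(user, item, context, rating)` tuples with a single context variable.

```python
def reduction_prefilter(data, target_context):
    """Simplified reduction-based pre-filtering: keep only the ratings observed in
    the target context, dropping the context column for a 2-D CF model."""
    return [(u, i, r) for u, i, c, r in data if c == target_context]


def naive_item_split(data):
    """Simplified item splitting: replace item i by a fictitious item (i, c) for each
    context value c. The published method splits only when a statistical test finds
    that the item's ratings differ across contexts; this sketch always splits."""
    return [(u, (i, c), r) for u, i, c, r in data]


# (user, item, context, rating) tuples with one hypothetical binary context variable.
sample = [(0, 10, 0, 4.0), (0, 11, 1, 2.0), (1, 10, 1, 5.0)]
print(reduction_prefilter(sample, target_context=1))   # ratings for context 1 only
print(naive_item_split(sample))                         # item 10 becomes (10, 0) and (10, 1)
```

In both cases the output is a plain user-item-rating dataset, so any non-contextual CF model, such as MF, can be trained on it unchanged; TF instead keeps the context as an extra tensor mode.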

The biggest improvement is observed when α = 0.9 and β = 0.9. For this dataset, the reduction-based approach improved MAE by 23.9%, item splitting by 24.2%, and TF by 30.8% compared to the non-context model.
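The percentages above compare MAE values. Assuming that "improved MAE by x%" means the relative reduction of MAE with respect to the non-context baseline, the computation is as follows. The function names are hypothetical, and the MAE values in the example are invented to illustrate the arithmetic; they are not the measured results.

```python
def mae(predicted, observed):
    """Mean absolute error over paired predicted and observed ratings."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def relative_mae_improvement(mae_baseline, mae_method):
    """Percentage by which a method reduces MAE relative to a baseline model."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_method) / mae_baseline


# Illustrative (invented) MAE values, not the measured results.
print(mae([3.0, 4.5, 2.0], [4.0, 4.0, 2.0]))   # (1.0 + 0.5 + 0.0) / 3
print(relative_mae_improvement(1.0, 0.692))     # a 30.8% reduction
```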

More notably, TF uniformly outperforms all other methods on all three synthetic datasets. As the influence of context increases, so does the gain of TF over both the OLAP-based (reduction-based) approach and item splitting. The proposed Multiverse Recommendation method also has the best overall performance when α = 0.9, β = 0.9; it seems to exploit the linear dependency between the ratings and the contextual feature efficiently. The performance of the non-context-aware MF model was also observed to deteriorate with the growing influence of the context variable. As mentioned above, this is because, without the context variable included in the model, the influence of the context in the data acts as a source of noise that simple MF cannot model.

A second set of experiments was carried out using the described real-world data, Adorn and Food (see the previous reference for more details). Figure 6 compares the same methods as in the previous experiment. The best-performing method for these datasets is again TF, which outperforms both contextual pre-filtering methods and the context-free method (MF) by at least 4.5%. For the Adorn data, TF outperforms both context-aware methods, while for the Food data only the reduction-based method was compared, and it is again outperformed by TF by 2%.

References

[1] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 2005.

[2] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23(1):103-145, 2005.

[3] G. Adomavicius and A. Tuzhilin. Recommender Systems Handbook, chapter Context-aware Recommender Systems. Springer, 2010.

[4] L. Baltrunas and X. Amatriain. Towards time-dependant recommendation based on implicit feedback. In Workshop on Context-Aware Recommender Systems (CARS 2009) in ACM RecSys 2009, 2009.

[5] L. Baltrunas and F. Ricci. Context-based splitting of item ratings in collaborative filtering. In L. D. Bergman, A. Tuzhilin, R. Burke, A. Felfernig, and L. Schmidt-Thieme, editors, RecSys, pages 245-248. ACM, 2009.

[6] J. Basilico and T. Hofmann. Unifying collaborative and content-based filtering. In Proc. Intl. Conf. Machine Learning, pages 65-72, New York, NY, 2004. ACM Press.

[7] G. Fung, O. L. Mangasarian, and A. J. Smola. Minimal kernel classifiers. Journal of Machine Learning Research, 3:303-321, 2002.

[8] P. J. Huber. Robust Statistics. John Wiley and Sons, New York, 1981.

[9] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proc. of KDD '08, pages 426-434, 2008.

[10] Y. Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 447-456, New York, NY, USA, 2009. ACM.

[11] L. D. Lathauwer, B. D. Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. & Appl., 21:1253-1278, 2000.

[12] O. L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Oper. Res., 13:444-452, 1965.

[13] K. Oku, S. Nakajima, J. Miyazaki, and S. Uemura. Context-aware SVM for context-dependent information recommendation. In Proceedings of the 7th International Conference on Mobile Data Management, 2006.

[14] C. Ono, Y. Takishima, Y. Motomura, and H. Asoh. Context-aware preference model based on a study of difference between real and supposed situation data. In UMAP '09: Proceedings of the 17th International Conference on User Modeling, Adaptation, and Personalization, pages 102-113, Berlin, Heidelberg, 2009. Springer-Verlag.

[15] C. Palmisano, A. Tuzhilin, and M. Gorgoglione. Using context to improve predictive models of customers in personalization applications. IEEE Transactions on Knowledge and Data Engineering, 20(11):1535

[16] U. Panniello, A. Tuzhilin, M. Gorgoglione, C. Palmisano, and A. Pedone. Experimental comparison of pre- vs. post-filtering approaches in context-aware recommender systems. In Proceedings of the ACM Recommender Systems Conference (RecSys 2009), pages 265-268.

[17] I. Pilaszy and D. Tikk. Recommending new movies: Even a few ratings are more valuable than metadata. In Proc. of the 3rd ACM Conference on Recommender Systems (RecSys 2009). ACM, 2009.

[18] S. Rendle, L. Balby Marinho, A. Nanopoulos, and L. Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 727-736, New York, NY, USA, 2009. ACM.

[19] N. Srebro, J. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, Cambridge, MA, 2005. MIT Press.

[20] N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In P. Auer and R. Meir, editors, Proc. Annual Conf. Computational Learning Theory, number 3559 in Lecture Notes in Artificial Intelligence, pages 545-560. Springer-Verlag, June 2005.

[21] M. Weimer, A. Karatzoglou, and M. Bruch. Maximum margin code recommendation. In Proc. of the 3rd ACM Conference on Recommender Systems (RecSys 2009). ACM, 2009.

[22] M. Weimer, A. Karatzoglou, and A. Smola. Improving maximum margin matrix factorization. Machine Learning, 72(3), September 2008.

[23] L. Xiong, X. Chen, T. Huang, and J. G. C. J. Schneider. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SIAM Data Mining, 2010.

Claims

1. - Multiverse recommendation method for context-aware collaborative filtering, comprising a) performing collaborative filtering introducing a user-item-context interaction as a definition of the data and modelling them using tensor factorization; b) generating one or more recommendations using said modelling; and c) displaying the recommendations to a user.

2. - Multiverse recommendation method according to claim 1, wherein different types of context are considered as additional dimensions in the representation of the data as a tensor.

3. - Multiverse recommendation method according to claim 1, wherein said factorization is an N-dimensional factorization.

4. - Multiverse recommendation method according to claim 1, wherein a CF engine is provided with a user list of item identifications and the relevant context of the recommendations, and in that a TF-based CF model trained on historical data is used to calculate preference scores or predict ratings.

5. - Multiverse recommendation method according to claim 4, wherein in said step c) a list of N items is presented to the user based on the score of the CF model.
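Claims 4 and 5 describe, roughly, a serving flow: the CF engine receives the user, the candidate item identifications and the relevant context, a TF-based model scores each candidate, and the N best items are presented. A minimal sketch of that flow follows. The CP-style scoring function, the names `cp_score`, `recommend_top_n`, `score` and `FACTORS`, and the factor values are hypothetical stand-ins for a model trained on historical data.

```python
import heapq
from math import prod


def cp_score(factors, index):
    """Hypothetical CP (PARAFAC) style score: for each latent dimension k, multiply
    the k-th factor entry of every mode, then sum over k."""
    rows = [factors[n][e] for n, e in enumerate(index)]
    return sum(prod(column) for column in zip(*rows))


def recommend_top_n(score_fn, user, candidate_items, context, n):
    """Score every candidate item for (user, context) and return the n best item ids."""
    scored = ((score_fn((user, item) + tuple(context)), item) for item in candidate_items)
    return [item for _, item in heapq.nlargest(n, scored)]


# Hypothetical "trained" factors: mode 0 users, mode 1 items, mode 2 one context variable.
FACTORS = [
    [[1.0, 1.0]],                          # one user
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],  # three items
    [[1.0, 0.0], [0.0, 1.0]],              # two context values
]


def score(index):
    return cp_score(FACTORS, index)


print(recommend_top_n(score, user=0, candidate_items=range(3), context=(0,), n=2))
print(recommend_top_n(score, user=0, candidate_items=range(3), context=(1,), n=2))
```

The same user receives a different top-N list when the context changes, which is the behaviour the contextual dimensions are meant to capture. Ranking with `heapq.nlargest` avoids sorting the whole candidate list when only N items are needed.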