Matrix Factorization For Collaborative Filtering Is Just Solving An Adjoint Latent Dirichlet Allocation Model After All
LDA-inspired probabilistic method. Our work differs from these in that two additional vectors of parameters are introduced into the original LDA formulation, adding an inductive bias to support the use case: one vector, capturing the popularity of the items, regularizes the item preferences over the user cohorts, while another vector weights this regularization for each user and thus indicates the conformity of the user with the item popularity.

MF and LDA are often considered together to derive collaborative methods that also include content information. Wang and Blei [34] propose a collaborative topic regression (CTRlda) model that combines a textual LDA model for the content information and a probabilistic matrix factorization to jointly explain the observed content and user ratings, respectively. A similar approach is proposed by Nikolenko [23], while Rao et al. [28] extend CTRlda using the special words with background (SWB) model [4] instead of LDA. As these methods use additional content information, they differ from LDA4Rec in this aspect, and they also do not include the aforementioned additional parameters.

Zhang et al. [38] emphasize the interpretability of NMF for collaborative filtering and regard the latent user vector as an additive mixture of different user communities, i.e., cohorts. A similar, but more probabilistic NMF approach, rendering the mixture a distribution over cohorts of users, is presented by Hernando et al. [12]. Like the previously mentioned works that apply LDA directly, these works also lack the additional parameters proposed for LDA4Rec. The lack of interpretability of MF without non-negativity constraints is also recognized by Datta et al. [6] and addressed with a different approach than NMF. The authors propose a shadow model that learns a mapping from interpretable auxiliary features to the latent factors of MF. Therefore, their approach cannot be considered pure collaborative filtering like LDA4Rec, since additional content information is used.

3 NOTATION AND TERMINOLOGY
In this section, we formalize the problem and establish a common notation, which is to some extent based on the work of Rendle et al. [29]. Matrices are denoted by capital letters $X$, transposed matrices by $X^t$, vectors by bold letters $\mathbf{x}$, sets by calligraphic letters $\mathcal{X}$, and the cardinality of a set by $|\mathcal{X}|$. The scalar product of two vectors $\mathbf{x}$ and $\mathbf{y}$ is denoted by $\langle \mathbf{x}, \mathbf{y} \rangle := \sum_{i=1}^{n} x_i y_i$ and the $l_1$-norm by $\|\mathbf{x}\|_1 := \sum_{i=1}^{n} |x_i|$, where $n$ is the dimension of the vector space. The Hadamard element-wise vector multiplication and division are denoted by $\odot$ and $\oslash$, respectively. A concatenation of two vectors $\mathbf{x}, \mathbf{z}$ is denoted by $[\mathbf{x}, \mathbf{z}]$, and $\mathbf{1}$ is the vector of all ones. The $i$-th row vector of a matrix $X$ is denoted by $\mathbf{x}_i$, whereas the $j$-th column vector is expressed with the help of the Kleene star as $\mathbf{x}_{*j}$. The symbol $\mathbb{R}_{\geq 0}$ is used for non-negative real numbers.

Let $\mathcal{U}$ be the set of all users and $\mathcal{I}$ the set of all items. With $\mathcal{S} \subset \mathcal{U} \times \mathcal{I}$ we denote the set of implicit feedback from users $u \in \mathcal{U}$ having interacted with items $i \in \mathcal{I}$. Following the definition of Rendle et al. [29], the task of personalized ranking is to provide each user $u$ with a personalized total ranking $\geq_u$ on $\mathcal{I}$. In particular, since we assume that $\geq_u$ is a total order, we have for $i, j \in \mathcal{I}$ with $i \neq j$ that either $i >_u j$ or $i <_u j$.

4 MATRIX FACTORIZATION
MF-based methods for collaborative filtering share the idea of approximating the sparse matrix of user-item interactions $X \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{I}|}$ by the product of two low-rank matrices $W \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{K}|}$ and $H \in \mathbb{R}^{|\mathcal{I}| \times |\mathcal{K}|}$, i.e.,

$$X \approx \hat{X} := W H^t,$$

where $\mathcal{K} = \{1, \ldots, |\mathcal{K}|\}$ is the index set of the latent dimensions. Derived from this general form, we will define the personalized score of a user $u$ for an item $i$ as

$$\hat{x}_{ui} = \langle \mathbf{w}_u, \mathbf{h}_i \rangle + b_i, \qquad (1)$$

where $b_i \in \mathbb{R}$ is an item bias. Adding an explicit item bias term has been shown to improve MF-based models in many studies [17, 26] and can be interpreted as the popularity of an item independent of a user's preferences. The personalized scores of a user then induce the personalized ranking $\geq_u$ by virtue of $\hat{x}_{ui} \geq \hat{x}_{uj}$ for $i, j \in \mathcal{I}$. Note that in our implicit feedback scenario there is no need for a user bias term $b_u$, as the personalized ranking $\geq_u$ would not change by definition.

The actual approximation depends on the optimization loss $L(X, \hat{X})$, and over the years many losses were derived, most notably SVD++ [16], which was used to win the Netflix Prize, WR-MF [14, 24], and PMF [32]. Since we are ultimately interested in an optimal ranking $\geq_u$ rather than an approximation of the original matrix, the Bayesian Personalized Ranking (BPR) method was proposed by Rendle et al. [29] to reflect this task directly in its loss $L$; it can be considered a differentiable analogue of optimizing the Area Under the ROC Curve (AUC) [29].

Despite the simple formulation (1) of MF, the actual interpretation of the latent vectors $\mathbf{w}_u$ and $\mathbf{h}_i$ is not as easy. The latent elements of an item $i$, i.e., $h_{ik}$, $k \in \mathcal{K}$, might quantify the prevalence of some latent feature in an item, while the corresponding element of a user $u$, i.e., $w_{uk}$, quantifies the user's preference for this feature. The problem with this notion becomes apparent when considering negative elements, since, for example, a strong negative prevalence together with a negative preference can lead to a large positive term in the scalar product. This observation motivates the usage of MF methods that demand non-negativity for $\mathbf{w}_u$ and $\mathbf{h}_i$.
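As a minimal numerical illustration of Eq. (1) (a NumPy sketch with made-up sizes, not the implementation evaluated later in the paper), the following computes one user's personalized scores, the induced ranking, and the sign-cancellation effect just described:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_latent = 5, 3                   # illustrative sizes
w_u = rng.normal(size=n_latent)            # latent user vector w_u
H = rng.normal(size=(n_items, n_latent))   # latent item vectors h_i (as rows)
b = rng.normal(size=n_items)               # item biases b_i

# Eq. (1): personalized score of user u for every item i
scores = H @ w_u + b

# The scores induce the personalized ranking >=_u (best item first)
ranking = np.argsort(-scores)
print("scores :", np.round(scores, 3))
print("ranking:", ranking)

# Sign-cancellation issue: a negative "prevalence" times a negative
# "preference" yields a large *positive* contribution to the score.
w_u_neg = np.array([-2.0, 0.1, 0.1])
h_i_neg = np.array([-3.0, 0.1, 0.1])
print("first latent dimension contributes:", w_u_neg[0] * h_i_neg[0])  # +6.0
```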
4.1 Non-Negative Matrix Factorization
Non-negative matrix factorization (NMF) was introduced by Lee and Seung [19, 20] as a method to learn parts of objects, which can then be combined again to form a whole. NMF differs from MF in (1) only in that we have $\mathbf{w}_u \in \mathbb{R}_{\geq 0}^{|\mathcal{K}|}$, $\mathbf{h}_i \in \mathbb{R}_{\geq 0}^{|\mathcal{K}|}$ and $b_i \in \mathbb{R}_{\geq 0}$. Using the example of faces, Lee and Seung [20] showed that NMF is able to learn localized features, e.g., the eye area, which can then be used again to form a whole face by an additive mixture. Although the notion of feature prevalences within items and user preferences for certain features translates well to NMF, in many practical applications the results achieved with NMF unfortunately fall short of those of MF [21].

A mathematically more rigorous interpretation is that NMF finds a $|\mathcal{K}|$-clustering of the column vectors of $X$, i.e., $\mathbf{x}_{*i}$ for $i \in \mathcal{I}$, in the space of users $u$. Ding et al. [8] prove that NMF with least squares optimization is mathematically equivalent to the minimization of a K-means-type clustering objective.
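This clustering view can be illustrated on a toy interaction matrix. The sketch below is only an illustration and not part of the paper's implementation; it assumes scikit-learn is available and assigns each item to the latent dimension in which its non-negative factor is largest, which is the informal reading of the clustering equivalence:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Toy implicit-feedback matrix X (users x items), 1 = interaction, 0 = none.
X = (rng.random((20, 8)) > 0.6).astype(float)

# Rank-|K| NMF: X ~= W @ H with W, H >= 0 (scikit-learn stores H as |K| x |I|).
model = NMF(n_components=3, init="random", random_state=0, max_iter=500)
W = model.fit_transform(X)   # non-negative user factors, shape (|U|, |K|)
H = model.components_        # non-negative item factors, shape (|K|, |I|)

# Clustering view: read each item column of H as a soft assignment to the
# |K| latent dimensions and take the strongest one as its "cluster".
item_cluster = H.argmax(axis=0)
print("item clusters:", item_cluster)
```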
The hyperparameters $\mu_*$, $\sigma_*^2$ can be used to incorporate prior knowledge about the relations of $\lambda$, $\delta$ and $\varphi_k$. In Step 4b, we see that the probability of a user interacting with an item not only depends on the preference assigned to the item by the cohort, i.e., $\varphi_{z_{us}}$, but also on the popularity $\delta_i$ of the item and the conformity $\lambda_u$ of the user to the general popularity. The graphical model of LDA4Rec is illustrated in Figure 2.

We now show that MF, as introduced in Section 4, has an adjoint formulation that corresponds to the parameters $\varphi_k$, $\theta_u$, $\delta_i$ and $\lambda_u$ of LDA4Rec. Finally, this allows us to intuitively interpret the latent factors of MF.

In the proof of the lemma, each latent user factor $w_{uk}$ of MF is split into two non-negative parts,

$$w^{+}_{uk} = \begin{cases} w_{uk} & \text{if } w_{uk} \geq 0 \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad w^{-}_{uk} = \begin{cases} -w_{uk} & \text{if } w_{uk} < 0 \\ 0 & \text{otherwise.} \end{cases}$$

Proof. Without loss of generality, we assume $\mathbf{w}_u \in \mathbb{R}_{\geq 0}^{|\mathcal{K}'|}$, $\mathbf{h}_i \in \mathbb{R}_{\geq 0}^{|\mathcal{K}'|}$ and $b_i \in \mathbb{R}_{\geq 0}$ by virtue of the lemma. We define

$$\mathbf{v}_u = c_u^{-1}\, \mathbf{w}_u \odot \mathbf{n}_u, \qquad \mathbf{g}_i(u) = (\mathbf{h}_i + t_u^{-1} b_i \mathbf{1}) \oslash \mathbf{n}_u,$$

where $\mathbf{n}_u = (n_{uk})_{k \in \mathcal{K}'}$ with $n_{uk} = \sum_{i \in \mathcal{I}} h_{ik} + t_u^{-1} b_i$, $t_u = \|\mathbf{w}_u\|_1$ and $c_u = \langle \mathbf{w}_u, \mathbf{n}_u \rangle$. We can neglect the pathological cases, i.e., $t_u = 0$ and $n_{uk} = 0$, as in the former case we have a trivial solution and $\geq_u$ only depends on $\mathbf{b}$, whereas in the latter case we have $\sum_{i \in \mathcal{I}} h_{ik} = 0$ and thus the latent vector $(h_{ik})_{i \in \mathcal{I}}$ could just be removed. By construction, we now have $\|\mathbf{v}_u\|_1 = 1$ and $\|\mathbf{g}_i(u)\|_1 = 1$. We can thus conclude that

$$\begin{aligned}
x'_{ui} \geq x'_{uj} &\iff \langle \mathbf{v}_u, \mathbf{g}_i(u) \rangle \geq \langle \mathbf{v}_u, \mathbf{g}_j(u) \rangle \\
&\iff \langle c_u^{-1} \mathbf{w}_u \odot \mathbf{n}_u, (\mathbf{h}_i + t_u^{-1} b_i \mathbf{1}) \oslash \mathbf{n}_u \rangle \geq \langle c_u^{-1} \mathbf{w}_u \odot \mathbf{n}_u, (\mathbf{h}_j + t_u^{-1} b_j \mathbf{1}) \oslash \mathbf{n}_u \rangle \\
&\iff \langle \mathbf{w}_u, \mathbf{h}_i + t_u^{-1} b_i \mathbf{1} \rangle \geq \langle \mathbf{w}_u, \mathbf{h}_j + t_u^{-1} b_j \mathbf{1} \rangle \\
&\iff \langle \mathbf{w}_u, \mathbf{h}_i \rangle + \langle \mathbf{w}_u, t_u^{-1} b_i \mathbf{1} \rangle \geq \langle \mathbf{w}_u, \mathbf{h}_j \rangle + \langle \mathbf{w}_u, t_u^{-1} b_j \mathbf{1} \rangle \\
&\iff \langle \mathbf{w}_u, \mathbf{h}_i \rangle + b_i \geq \langle \mathbf{w}_u, \mathbf{h}_j \rangle + b_j \\
&\iff \hat{x}_{ui} \geq \hat{x}_{uj}.
\end{aligned}$$

Consequently, $x'_{ui}$ induces the same total ranking $\geq_u$ as $\hat{x}_{ui}$. □

In the light of these constructive proofs, and noting that SNMF was an intermediate step in the proof of the lemma, we can also make a statement about the expressive power of MF, NMF, SNMF, and LDA4Rec. In particular, we have seen that each latent dimension indexed by $k$ of MF is split up into two corresponding dimensions in the NMF representation. Following our previous interpretation, those dimensions stand for cohorts having complementary item preferences.

Corollary. The expressive power, i.e., the number of possible total rankings $\geq_u$ that can be encoded, of MF is twice as high as in the case of NMF for a given latent vector length $|\mathcal{K}|$. LDA4Rec has the same expressive power as NMF, and the expressive power of MF is equivalent to the expressive power of SNMF.

It is important to note here that we have only proved that the personalized ranking $\geq_u$ remains constant under some transformations that allow us to express an MF as NMF or as an adjoint LDA4Rec formulation. Since $\geq_u$ is eventually the result of an optimization problem with some loss function $L$, e.g., BPR for MF or the likelihood for LDA4Rec, we make no statement about maintaining the optimality of some solution under these transformations.
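The construction in the proof can be replayed numerically. The sketch below is illustrative only: random non-negative factors stand in for an NMF solution, and one reading of the paper's normalizer $\mathbf{n}_u$ is used (the ranking equivalence holds for any strictly positive $\mathbf{n}_u$). It builds $\mathbf{v}_u$ and $\mathbf{g}_i(u)$ and checks that the adjoint scores induce the same ranking as $\hat{x}_{ui}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_latent = 6, 4

# Non-negative factors, as guaranteed by the lemma (NMF form of an MF solution).
w_u = rng.random(n_latent)            # user factors, w_u >= 0
H = rng.random((n_items, n_latent))   # item factors h_i as rows, H >= 0
b = rng.random(n_items)               # item biases, b_i >= 0

t_u = w_u.sum()                       # t_u = ||w_u||_1
# One reading of the normalizer n_u; any strictly positive n_u works below.
n_u = H.sum(axis=0) + b.sum() / t_u
c_u = w_u @ n_u                       # c_u = <w_u, n_u>

v_u = (w_u * n_u) / c_u                                   # v_u, sums to 1
G = (H + np.outer(b, np.ones(n_latent)) / t_u) / n_u      # rows are g_i(u)

scores_mf = H @ w_u + b               # x̂_ui = <w_u, h_i> + b_i
scores_adj = G @ v_u                  # x'_ui = <v_u, g_i(u)>

# The adjoint scores are the MF scores divided by c_u > 0 (n_u cancels),
# so both induce the same personalized ranking >=_u.
assert (np.argsort(-scores_mf) == np.argsort(-scores_adj)).all()
print("ranking:", np.argsort(-scores_mf), " ||v_u||_1 =", v_u.sum())
```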
6 EVALUATION
To support our theoretical considerations with empirical results, several experiments were conducted with real-world datasets. The source code of our implementation and the detailed results of all experiments are publicly available¹.

¹ https://github.com/FlorianWilhelm/lda4rec

6.1 Datasets & Evaluation Metrics
For our experiments, three different datasets were used. MovieLens-1M encompasses approximately 1 million movie ratings across 6,040 users and 3,706 movies, while MovieLens-100K has roughly 100 thousand interactions across 610 users and 9,724 movies [11]. Goodbooks has approximately 6 million interactions across 53,425 users and 10,001 books [37]. We split these datasets randomly into train, validation, and test sets using 90% of interactions for training and 5% each for validation and testing. The explicit feedback of these datasets was treated as implicit, i.e., the various user ratings were converted to 1, representing an interaction, while no rating means no interaction. Also, we limited the maximum number of interactions that a single user might have to 200 to avoid results that are skewed towards users with a high number of interactions due to our random split. This reduces the number of interactions in MovieLens-1M to approximately 661 thousand and in MovieLens-100K to 60 thousand, while the number of interactions in Goodbooks is unaffected.

As evaluation metrics, we use the mean reciprocal rank (MRR), precision at 10 (Prec@10), and recall at 10 (Recall@10) to measure the quality of our models. We define Prec@10 as the fraction of known positives in the first 10 positions of the ranked list of results and Recall@10 as the number of known positives in the first 10 positions divided by the total number of known positives.
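These definitions translate directly into code. The following is a minimal sketch for a single user (not the paper's evaluation code); `ranked_items` and `positives` are hypothetical inputs, and MRR is obtained by averaging the reciprocal rank over all users:

```python
import numpy as np

def user_metrics(ranked_items, positives, k=10):
    """Reciprocal rank, Prec@k and Recall@k for one user, following the
    definitions above (known positives within the top-k ranked items)."""
    top_k = ranked_items[:k]
    hits = [item in positives for item in top_k]

    # Reciprocal rank of the first known positive anywhere in the ranking;
    # averaging this value over all users gives the MRR.
    rr = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in positives:
            rr = 1.0 / rank
            break

    prec = sum(hits) / k                          # fraction of top-k that are positives
    recall = sum(hits) / max(len(positives), 1)   # fraction of positives found in top-k
    return rr, prec, recall

# Illustrative usage with made-up item ids.
print(user_metrics(ranked_items=[5, 2, 9, 1, 7, 3, 8, 0, 4, 6, 11], positives={2, 4, 12}))
```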
6.2 Experiments
6.2.1 Comparison of the Different Variants of Matrix Factorization. Despite the theoretical results from Subsection 5.2, which allow us to transform a personalized ranking solution found through MF into an NMF formulation, this does not necessarily mean that a solution found through direct application of NMF has the same quality. For this reason, we implemented MF, NMF, and SNMF using BPR as the loss function and the Adam optimizer [15]. The implementations of NMF and SNMF differ from MF only in that they restrict the corresponding parameters to non-negative values using the sigmoid function. Our implementation heavily relies on Spotlight [18] and PyTorch [25]. In our experiments, various batch sizes and at least 3 different seeds for random initialization are used. In order to provide a baseline, we also implemented a purely popularity-based recommender (Pop).
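A minimal PyTorch sketch of this setup, purely illustrative and not the Spotlight-based implementation (model sizes, learning rate, and batch construction are made up): an embedding scorer following Eq. (1), a `nonneg` flag that passes the parameters through a sigmoid as described above for the non-negative variants (for SNMF only the corresponding subset of parameters would be constrained), and one optimization step with the BPR loss and Adam:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizationScorer(nn.Module):
    """Scores x̂_ui = <w_u, h_i> + b_i; `nonneg` squashes the parameters
    through a sigmoid to mimic the non-negative variants described above."""
    def __init__(self, n_users, n_items, n_latent, nonneg=False):
        super().__init__()
        self.user = nn.Embedding(n_users, n_latent)
        self.item = nn.Embedding(n_items, n_latent)
        self.bias = nn.Embedding(n_items, 1)
        self.nonneg = nonneg

    def forward(self, users, items):
        w_u, h_i, b_i = self.user(users), self.item(items), self.bias(items).squeeze(-1)
        if self.nonneg:
            w_u, h_i, b_i = torch.sigmoid(w_u), torch.sigmoid(h_i), torch.sigmoid(b_i)
        return (w_u * h_i).sum(-1) + b_i

def bpr_loss(pos_scores, neg_scores):
    # BPR: maximize the probability that an observed item outranks a sampled one.
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# Illustrative training step with random interactions.
model = FactorizationScorer(n_users=100, n_items=50, n_latent=8, nonneg=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
users = torch.randint(0, 100, (32,))
pos, neg = torch.randint(0, 50, (32,)), torch.randint(0, 50, (32,))
loss = bpr_loss(model(users, pos), model(users, neg))
loss.backward()
opt.step()
print(float(loss))
```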
6.2.2 Transformation of Matrix Factorization to the Adjoint LDA4Rec Formulation. Although we have mathematically proven in Subsection 5.2 that an MF solution can be transformed to NMF and subsequently also to an adjoint LDA formulation, floating-point arithmetic may pose challenges in a practical application. To follow up on this, we implemented the presented transformations as an optional preprocessing step before the evaluation. This allows us to evaluate a solution obtained from MF directly and after the transformation in order to compare the resulting personalized rankings $\geq_u$, which should be equivalent in theory.

6.2.3 Comparison of LDA4Rec to Matrix Factorization. We implemented the LDA4Rec model as presented in Subsection 5.1 with the help of the Pyro deep universal probabilistic programming framework [2]. To cope with the high-dimensionality, low-sample-size setting, we decided on a stochastic variational inference (SVI) [13] approach using the Adam optimizer [15]. Due to the presence of discrete latent variables, a trace implementation of ELBO-based SVI [27, 35] with exhaustive enumeration over discrete sample sites was chosen. To predict the personalized ranking scores $\hat{x}_{ui}$, we sampled items from the posterior predictive distribution [10] and counted the occurrences to obtain a personalized ranking. Ties were broken by adding a small non-negative random number.

Our implementation, and thus the evaluation of LDA4Rec, turned out to be several orders of magnitude slower than the MF-based methods. For this reason, the experiments comparing LDA4Rec to MF-based methods were performed on the smaller MovieLens-100K dataset. While the variational inference makes the training process quite fast, the bottleneck of LDA4Rec is the prediction of the personalized rankings, for which we need a high number of samples per user to compute a stable ranking. Thus, for each user 10,000 items were sampled.
Table 1: Comparison of different variants of matrix factorization with varying number of latent parameters |K|.

|K |  Model   Goodbooks                          MovieLens-1M
              MRR@10    Prec@10   Recall@10      MRR@10    Prec@10   Recall@10
 --   Pop     0.023918  0.027079  0.047867       0.033488  0.033084  0.065908
  4   NMF     0.014388  0.022056  0.038420       0.032927  0.033031  0.066281
  4   SNMF    0.036186  0.040912  0.072584       0.046642  0.046211  0.097219
  4   MF      0.038901  0.044045  0.079124       0.050495  0.048702  0.103090
  8   NMF     0.015121  0.019261  0.033310       0.033445  0.033191  0.066516
  8   SNMF    0.042435  0.047185  0.085326       0.053639  0.052028  0.108542
  8   MF      0.044683  0.049835  0.090115       0.058240  0.057044  0.119924
 16   NMF     0.019945  0.026436  0.046623       0.033461  0.033351  0.066191
 16   SNMF    0.049671  0.055800  0.101652       0.062695  0.061526  0.130733
 16   MF      0.050875  0.057127  0.103747       0.063849  0.062131  0.131973
 32   NMF     0.028766  0.028268  0.050012       0.033223  0.033155  0.064652
 32   SNMF    0.055048  0.062215  0.113179       0.068511  0.066453  0.141667
 32   MF      0.056080  0.062841  0.114629       0.064506  0.064888  0.138600
 48   NMF     0.032190  0.033548  0.060065       0.032996  0.033084  0.066660
 48   SNMF    0.058595  0.066861  0.122292       0.068369  0.068143  0.146653
 48   MF      0.058730  0.066321  0.121440       0.067427  0.065777  0.143905
 64   NMF     0.034171  0.039272  0.070492       0.032925  0.032978  0.066924
 64   SNMF    0.061561  0.070156  0.128119       0.069775  0.069050  0.151497
 64   MF      0.060261  0.068837  0.126254       0.067474  0.066489  0.145744
7 CONCLUSION
From a theoretical point of view, we have discussed several variants of matrix factorization, i.e., MF, SNMF, NMF, and introduced the novel and interpretable LDA4Rec model, which extends traditional LDA by incorporating parameters for the popularity of items and the conformity of users. We have proven that the personalized ranking induced by MF can be transformed so that the same personalized ranking is induced by NMF as well as by an adjoint formulation corresponding to the parameters of LDA4Rec. The adjoint LDA4Rec formulation of an MF allows easy interpretation of its parameters without sacrificing accuracy.

In several experiments, we have shown that SNMF performs slightly better than MF in some cases and is interpretable at the same time. Our evaluations also show that the result obtained by directly solving LDA4Rec outperforms MF with BPR loss while being more interpretable. Our empirical results, combined with the derivation of LDA4Rec as a mathematical model, suggest that its generative process represents reality well and thus provides means to interpret the results of traditional MF-based methods.

Following on from this, and assuming that the unknown, real-world process behind implicit user feedback is actually well represented by LDA4Rec, some conclusions about the effectiveness of Neural Collaborative Filtering (NCF) can also be drawn. NCF replaces the scalar product of MF with a learned similarity, e.g., using a multi-layer perceptron (MLP). Rendle et al. [30] show in a reproducibility paper that the scalar product outperforms several NCF-based methods and that it should thus be the default choice for combining embeddings, i.e., vectors of the latent factors. Similar results demonstrating the effectiveness of simple factorization-based models are shown by Dacrema et al. [5]. Our work underpins these findings, as the scalar product of embeddings can be interpreted as a mixture of several preferences, which explains its effectiveness. Since learning a multiplication and also a scalar product is possible in theory [22] but proves difficult in practice [1, 30, 33] for an MLP, MF-based methods will continue to have an advantage over NCF under this assumption.

REFERENCES
[1] Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, Marina Del Rey, CA, USA, 46–54. https://doi.org/10.1145/3159652.3159727
[2] Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. 2018. Pyro: Deep Universal Probabilistic Programming. arXiv:1810.09538 [cs, stat] (Oct. 2018). http://arxiv.org/abs/1810.09538
[3] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003), 993–1022.
[4] Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers. 2007. Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model. Advances in Neural Information Processing Systems 19 (2007). https://proceedings.neurips.cc/paper/2006/file/ec47a5de1ebd60f559fee4afd739d59b-Paper.pdf
[5] Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19), 101–109. https://doi.org/10.1145/3298689.3347058
[6] Anupam Datta, Sophia Kovaleva, Piotr Mardziel, and Shayak Sen. 2018. Latent Factor Interpretations for Collaborative Filtering. arXiv:1711.10816 [cs] (April 2018). http://arxiv.org/abs/1711.10816
[7] Thiago de Paulo Faleiros and Alneu de Andrade Lopes. 2016. On the equivalence between algorithms for Non-negative Matrix Factorization and Latent Dirichlet Allocation. In 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
[8] Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 606–610. https://doi.org/10.1137/1.9781611972757.70
[9] C. H. Q. Ding, Tao Li, and M. I. Jordan. 2010. Convex and Semi-Nonnegative Matrix Factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 1 (Jan. 2010), 45–55. https://doi.org/10.1109/TPAMI.2008.277
[10] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2004. Bayesian Data Analysis (2nd ed.). Chapman and Hall/CRC.
[11] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems 5, 4 (Jan. 2016), 1–19. https://doi.org/10.1145/2827872
[12] Antonio Hernando, Jesús Bobadilla, and Fernando Ortega. 2016. A non negative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model. Knowledge-Based Systems 97 (April 2016), 188–202. https://doi.org/10.1016/j.knosys.2015.12.018
[13] Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. 2013. Stochastic Variational Inference. Journal of Machine Learning Research 14, 4 (2013), 1303–1347. http://jmlr.org/papers/v14/hoffman13a.html
[14] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, Pisa, Italy, 263–272. https://doi.org/10.1109/ICDM.2008.22
[15] Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (Dec. 2014).
[16] Yehuda Koren. 2009. The BellKor Solution to the Netflix Grand Prize. Netflix Prize documentation 81 (2009), 1–10.
[17] Yehuda Koren and Robert Bell. 2015. Advances in Collaborative Filtering. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer US, Boston, MA, 77–118. https://doi.org/10.1007/978-1-4899-7637-6_3
[18] Maciej Kula. 2017. Spotlight. https://github.com/maciejkula/spotlight
[19] Daniel Lee and Hyunjune Seung. 2001. Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13 (2001).
[20] Daniel D. Lee and H. Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (Oct. 1999), 788–791. https://doi.org/10.1038/44565
[21] Joonseok Lee, Mingxuan Sun, and Guy Lebanon. 2012. A Comparative Study of Collaborative Filtering Algorithms. arXiv:1205.3193 [cs, stat] (May 2012). http://arxiv.org/abs/1205.3193
[22] Henry W. Lin, Max Tegmark, and David Rolnick. 2017. Why does deep and cheap learning work so well? Journal of Statistical Physics 168, 6 (Sept. 2017), 1223–1247. https://doi.org/10.1007/s10955-017-1836-5
[23] Sergey Nikolenko. 2015. SVD-LDA: Topic Modeling for Full-Text Recommender Systems. In Advances in Artificial Intelligence and Its Applications, Obdulia Pichardo Lagunas, Oscar Herrera Alcántara, and Gustavo Arroyo Figueroa (Eds.). Lecture Notes in Computer Science, Vol. 9414. Springer International Publishing, Cham, 67–79. https://doi.org/10.1007/978-3-319-27101-9_5
[24] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, Pisa, Italy, 502–511. https://doi.org/10.1109/ICDM.2008.16
[25] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv:1912.01703 [cs.LG]
[26] Arkadiusz Paterek. 2007. Improving regularized singular value decomposition for collaborative filtering. In Proceedings of KDD Cup and Workshop 2007, 5–8.
[27] Rajesh Ranganath, Sean Gerrish, and David M. Blei. 2013. Black Box Variational Inference. arXiv:1401.0118 [cs, stat] (Dec. 2013). http://arxiv.org/abs/1401.0118
[28] Vidyadhar Rao, KV Rosni, and Vineet Padmanabhan. 2017. Divide and Transfer: Understanding Latent Factors for Recommendation Tasks. In RecSysKTL, 1–8.
[29] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. (2009), 452–461.
[30] Steffen Rendle, Walid Krichene, Li Zhang, and John Anderson. 2020. Neural Collaborative Filtering vs. Matrix Factorization Revisited. In Fourteenth ACM Conference on Recommender Systems (RecSys '20). Association for Computing Machinery, New York, NY, USA, 240–248. https://doi.org/10.1145/3383313.3412488
[31] Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (May 2019), 206–215. https://doi.org/10.1038/s42256-019-0048-x
[32] Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS '07). Curran Associates Inc., Red Hook, NY, USA, 1257–1264.
[33] Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, and Phil Blunsom. 2018. Neural Arithmetic Logic Units. (2018).
[34] Chong Wang and David M. Blei. 2011. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11). ACM Press, San Diego, California, USA, 448. https://doi.org/10.1145/2020408.2020480
[35] David Wingate and Theophane Weber. 2013. Automated Variational Inference in Probabilistic Programming. arXiv:1301.1299 [cs, stat] (Jan. 2013). http://arxiv.org/abs/1301.1299
[36] WenBo Xie, Qiang Dong, and Hui Gao. 2014. A Probabilistic Recommendation Method Inspired by Latent Dirichlet Allocation Model. Mathematical Problems in Engineering 2014 (2014), 1–10. https://doi.org/10.1155/2014/979147
[37] Zygmunt Zajac. 2017. Goodbooks-10k: a new dataset for book recommendations. FastML (2017). http://fastml.com/goodbooks-10k
[38] Sheng Zhang, Weihong Wang, James Ford, and Fillia Makedon. 2006. Learning from Incomplete Ratings Using Non-negative Matrix Factorization. In Proceedings of the 2006 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 549–553. https://doi.org/10.1137/1.9781611972764.58