Abstract
Facilitating a satisfying user experience requires a detailed understanding of user behavior and intentions. The key is to leverage observations of user activities, usually the clicks performed on Web pages. A common approach is to transform user sessions into Markov chains and analyze them using mixture models. However, model selection and the interpretability of the results are often limiting factors. As a remedy, we present a Bayesian nonparametric approach to group user sessions and derive behavioral patterns. Empirical results on a social network and an electronic textbook show that our approach reliably identifies underlying behavioral patterns and is more robust than baseline competitors.
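To make the idea of grouping sessions with a mixture of Markov chains whose number of components is not fixed in advance more concrete, here is a minimal sketch. The abstract does not specify the inference procedure, so this is not the authors' method: it assumes one standard Bayesian nonparametric construction, a Dirichlet process mixture sampled with a collapsed Gibbs sampler over the Chinese restaurant process. It further assumes pages are encoded as integers 0..n_states-1, places an independent symmetric Dirichlet(beta) prior on each row of every transition matrix, and ignores the initial-state distribution. All function and parameter names (`transition_counts`, `log_marginal`, `crp_gibbs`, `alpha`, `beta`) are hypothetical.

```python
import math
import random
from collections import Counter


def transition_counts(session, n_states):
    """First-order transition counts: counts[a][b] = number of clicks a -> b."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(session, session[1:]):
        counts[a][b] += 1
    return counts


def add(a, b, sign=1):
    """Element-wise a + sign * b for two count matrices (returns a new matrix)."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def log_marginal(counts, beta):
    """Log evidence of a count matrix when each row of the transition matrix
    has an independent symmetric Dirichlet(beta) prior (Dirichlet-multinomial)."""
    k = len(counts)
    total = 0.0
    for row in counts:
        total += math.lgamma(k * beta) - math.lgamma(k * beta + sum(row))
        total += sum(math.lgamma(beta + c) - math.lgamma(beta) for c in row)
    return total


def crp_gibbs(sessions, n_states, alpha=0.1, beta=0.5, iters=30, seed=0):
    """Hypothetical collapsed Gibbs sampler for a CRP mixture of first-order
    Markov chains; returns one cluster label per session."""
    rng = random.Random(seed)
    data = [transition_counts(s, n_states) for s in sessions]
    zero = [[0] * n_states for _ in range(n_states)]
    z = [0] * len(data)                  # start with every session in cluster 0
    stats = {0: zero}
    for c in data:
        stats[0] = add(stats[0], c)
    sizes = Counter(z)
    next_label = 1
    for _ in range(iters):
        for i, c in enumerate(data):
            # Remove session i from its current cluster.
            old = z[i]
            stats[old] = add(stats[old], c, -1)
            sizes[old] -= 1
            if sizes[old] == 0:
                del stats[old], sizes[old]
            # CRP prior weight times predictive likelihood of session i.
            labels = list(stats)
            logw = [math.log(sizes[k]) + log_marginal(add(stats[k], c), beta)
                    - log_marginal(stats[k], beta) for k in labels]
            labels.append(next_label)    # option: open a new cluster
            logw.append(math.log(alpha) + log_marginal(c, beta))
            top = max(logw)              # subtract max for numerical stability
            chosen = rng.choices(labels, weights=[math.exp(w - top) for w in logw])[0]
            if chosen == next_label:
                next_label += 1
            stats[chosen] = add(stats.get(chosen, zero), c)
            sizes[chosen] += 1
            z[i] = chosen
    return z
```

For example, `crp_gibbs([[0, 1] * 20] * 8 + [[0] * 40] * 8, n_states=2)` should place sessions that bounce between pages 0 and 1 and sessions that stay on page 0 in different clusters, without fixing the number of clusters beforehand. The concentration parameter `alpha` controls how readily new clusters are opened; this replaces the explicit model selection step that finite mixtures require.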
Notes
1. We focus on first-order dependencies, but the approach generalizes easily to higher-order models, although the notation quickly becomes cumbersome.
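One common way to see why the generalization is straightforward is to treat the tuple of the previous k pages as a single state, so an order-k chain becomes a first-order chain over k-tuples. The sketch below illustrates this assumption; the function names `higher_order_counts` and `n_free_parameters` and the page labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def higher_order_counts(session, order):
    """Count order-k transitions: (tuple of the previous `order` pages) -> next page.

    With order=1 this reduces to ordinary first-order transition counts;
    larger orders simply treat each k-tuple of pages as a state of its own.
    """
    counts = Counter()
    for t in range(order, len(session)):
        context = tuple(session[t - order:t])
        counts[(context, session[t])] += 1
    return counts


def n_free_parameters(n_pages, order):
    """Free transition probabilities of an order-k chain over n_pages pages:
    one distribution (n_pages - 1 free values) per possible k-tuple context."""
    return n_pages ** order * (n_pages - 1)


session = ["home", "news", "home", "news", "sports"]
first_order = higher_order_counts(session, 1)
second_order = higher_order_counts(session, 2)
```

The count code barely changes with the order, but the state space does: for 50 pages, a first-order chain has 2,450 free transition probabilities and a second-order chain already has 122,500.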
Acknowledgements
This research has been funded in part by the German Science Foundation (DFG) under grant GRK/1907 and by the German Federal Ministry of Education and Research (BMBF) under grant QQM/01LSA1503C.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Reubold, J., Boubekki, A., Strufe, T., Brefeld, U. (2018). Infinite Mixtures of Markov Chains. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2017. Lecture Notes in Computer Science, vol 10785. Springer, Cham. https://doi.org/10.1007/978-3-319-78680-3_12
DOI: https://doi.org/10.1007/978-3-319-78680-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78679-7
Online ISBN: 978-3-319-78680-3
eBook Packages: Computer Science, Computer Science (R0)