DOI: 10.1145/2736277.2741080
research-article

HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

Published: 18 May 2015

Abstract

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can be useful for, e.g., improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors to the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including website navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails on the Web.
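
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a first-order Markov chain whose rows each receive an independent Dirichlet prior, and it uses the standard closed-form marginal likelihood for that conjugate model. The function names (`count_transitions`, `elicit_prior`, `log_evidence`), the toy trail, and the two hypotheses are hypothetical. The proportional "chip" allocation in `elicit_prior` is a simplifying stand-in for the (trial) roulette adaptation and may differ from the procedure in the paper.

```python
from math import lgamma


def count_transitions(trails, n_states):
    """Count observed transitions n[i][j] over all trails (lists of state indices)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def elicit_prior(hypothesis, k):
    """Turn row-wise belief weights into Dirichlet pseudo-counts.

    Assumption: each row gets k "chips" split in proportion to the belief
    weights, on top of a uniform base of 1 per cell. A larger k expresses
    stronger belief in the hypothesis; k = 0 yields a flat prior.
    """
    prior = []
    for row in hypothesis:
        total = sum(row)
        if total == 0:
            prior.append([1.0] * len(row))  # no belief stated: stay uniform
        else:
            prior.append([1.0 + k * w / total for w in row])
    return prior


def log_evidence(counts, alpha):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors."""
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += lgamma(sum(a_row)) - sum(lgamma(a) for a in a_row)
        total += sum(lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= lgamma(sum(n_row) + sum(a_row))
    return total


# Toy data: one trail that cycles 0 -> 1 -> 2 -> 0 ...
trails = [[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]]
counts = count_transitions(trails, n_states=3)

# Two competing hypotheses about transitions (hypothetical examples).
forward_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
backward_cycle = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

for k in (0, 1, 5):
    log_bf = (log_evidence(counts, elicit_prior(forward_cycle, k))
              - log_evidence(counts, elicit_prior(backward_cycle, k)))
    print(f"k={k}: log Bayes factor (forward vs. backward) = {log_bf:.3f}")
```

A positive log Bayes factor favors the first hypothesis. In this sketch the two hypotheses are indistinguishable at k = 0, because both priors are flat, and the factor moves away from zero as k grows, which is the prior sensitivity the abstract refers to.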

    Published In

    WWW '15: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1460 pages
    ISBN: 9781450334693

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. bayesian statistics
    2. human trails
    3. hypotheses
    4. markov chain
    5. paths
    6. sequences
    7. sequential human behavior
    8. web

    Qualifiers

    • Research-article

    Funding Sources

    • DFG German Science Fund
    • FWF Austrian Science Fund

    Conference

    WWW '15
    Sponsor:
    • IW3C2

    Acceptance Rates

    WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 19
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 18 Aug 2024

    Cited By

    • (2024) CompTrails: comparing hypotheses across behavioral networks. Data Mining and Knowledge Discovery 38:3 (1258-1288). DOI: 10.1007/s10618-023-00996-8. Online publication date: 1-May-2024
    • (2023) Bayesian estimation of decay parameters in Hawkes processes. Intelligent Data Analysis 27:1 (223-240). DOI: 10.3233/IDA-216283. Online publication date: 30-Jan-2023
    • (2022) Wikipedia Reader Navigation. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (16-26). DOI: 10.1145/3488560.3498496. Online publication date: 11-Feb-2022
    • (2022) Learning the Markov Order of Paths in Graphs. Proceedings of the ACM Web Conference 2022 (1559-1569). DOI: 10.1145/3485447.3512091. Online publication date: 25-Apr-2022
    • (2022) VIBE: A Design Space for VIsual Belief Elicitation in Data Journalism. Computer Graphics Forum 41:3 (477-488). DOI: 10.1111/cgf.14556. Online publication date: 12-Aug-2022
    • (2022) Predictivity of Category Based Human Navigation and the Effect of Navigation Path Length on the Prediction Accuracy in Knowledge Networks. 2022 International Conference on Communications, Information, Electronic and Energy Systems (CIEES) (1-6). DOI: 10.1109/CIEES55704.2022.9990641. Online publication date: 24-Nov-2022
    • (2021) From Symbols to Embeddings: A Tale of Two Representations in Computational Social Science. Journal of Social Computing 2:2 (103-156). DOI: 10.23919/JSC.2021.0011. Online publication date: Jun-2021
    • (2021) Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research. Scientometrics 126:12 (9847-9868). DOI: 10.1007/s11192-021-03922-1. Online publication date: 1-Dec-2021
    • (2021) Detection of Asynchronous Concatenation Emergent Behaviour in Multi-Agent Systems. Agents and Multi-Agent Systems: Technologies and Applications 2021 (77-88). DOI: 10.1007/978-981-16-2994-5_7. Online publication date: 8-Jun-2021
    • (2019) Behavior Analysis for Electronic Commerce Trading Systems: A Survey. IEEE Access 7 (108703-108728). DOI: 10.1109/ACCESS.2019.2933247. Online publication date: 2019
