A Large-Scale Characterization of How Readers Browse Wikipedia

Published: 03 April 2023

Abstract

Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for open knowledge, surprisingly little is known about how people navigate its content when seeking information. To bridge this gap, we present the first systematic large-scale analysis of how readers browse Wikipedia. Using billions of page requests from Wikipedia’s server logs, we measure how readers reach articles, how they transition between articles, and how these patterns combine into more complex navigation paths. We find that navigation behavior is characterized by highly diverse structures. Although most navigation paths are shallow, comprising a single pageload, there is much variety, and the depth and shape of paths vary systematically with topic, device type, and time of day. We show that Wikipedia navigation paths commonly mesh with external pages as part of a larger online ecosystem, and we describe how naturally occurring navigation paths are distinct from targeted navigation in lab-based settings. Our results further suggest that navigation is abandoned when readers reach low-quality pages. Taken together, these insights contribute to a more systematic understanding of readers’ information needs and allow for improving their experience on Wikipedia and the Web in general.

Cited By

  • (2024) A Time-Sensitive Graph Neural Network for Session-Based New Item Recommendation. Electronics 13:1 (223). DOI: 10.3390/electronics13010223. Online publication date: 3-Jan-2024
  • (2024) Architectural styles of curiosity in global Wikipedia mobile app readership. Science Advances 10:43. DOI: 10.1126/sciadv.adn3268. Online publication date: 25-Oct-2024
  • (2023) Understanding Search Behavior Bias in Wikipedia. Advances in Bias and Fairness in Information Retrieval (134-146). DOI: 10.1007/978-3-031-37249-0_11. Online publication date: 15-Jul-2023

Published In

ACM Transactions on the Web, Volume 17, Issue 2
May 2023
170 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/3589222

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 April 2023
Online AM: 13 January 2023
Accepted: 04 November 2022
Revised: 26 October 2022
Received: 02 March 2022
Published in TWEB Volume 17, Issue 2

Author Tags

  1. Wikipedia
  2. web navigation
  3. server logs
  4. information needs

Qualifiers

  • Research-article

Funding Sources

  • Swiss National Science Foundation
  • Swiss Data Science Center
  • Microsoft Swiss Joint Research Center
