
You, the Web, and Your Device: Longitudinal Characterization of Browsing Habits

Published: 27 September 2018

Abstract

    Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging.
    The evolution of the web raises the question of whether browsing behavior is changing and, as a consequence, whether the properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify the actual URLs requested by users among the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths followed by people while navigating the web, the fast-growing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content.
    Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.

        Published In

        ACM Transactions on the Web, Volume 12, Issue 4
        November 2018
        215 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/3281744
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 27 September 2018
        Accepted: 01 June 2018
        Revised: 01 April 2018
        Received: 01 May 2017
        Published in TWEB Volume 12, Issue 4

        Author Tags

        1. Passive measurements
        2. clickstream
        3. surfing behavior
        4. web usage evolution

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • BigDAMA
        • Vienna Science and Technology Fund (WWTF)

        Cited By

        • (2023) Share and Multiply: Modeling Communication and Generated Traffic in Private WhatsApp Groups. IEEE Access 11, 25401-25414. DOI: 10.1109/ACCESS.2023.3254913. Online publication date: 2023.
        • (2023) Toward practical defense against traffic analysis attacks on encrypted DNS traffic. Computers & Security 124, 103001. DOI: 10.1016/j.cose.2022.103001. Online publication date: Jan-2023.
        • (2022) Flow-Based User Click Identification in Encrypted Web Traffic. 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), 1882-1889. DOI: 10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00272. Online publication date: Dec-2022.
        • (2022) RLBrowse: Generating Realistic Packet Traces with Reinforcement Learning. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, 1-6. DOI: 10.1109/NOMS54207.2022.9789851. Online publication date: 25-Apr-2022.
        • (2022) Routines and the Predictability of Day-to-Day Web Use. Media Psychology 26, 3, 229-251. DOI: 10.1080/15213269.2022.2121286. Online publication date: 6-Sep-2022.
        • (2021) How Do Home Computer Users Browse the Web? ACM Transactions on the Web 16, 1, 1-27. DOI: 10.1145/3473343. Online publication date: 28-Sep-2021.
        • (2021) Deployable Models for Approximating Web QoE Metrics From Encrypted Traffic. IEEE Transactions on Network and Service Management 18, 3, 3336-3352. DOI: 10.1109/TNSM.2021.3073672. Online publication date: Sep-2021.
        • (2021) Automatically Inferring User Behavior Models in Large-Scale Web Applications. Information and Software Technology, 106704. DOI: 10.1016/j.infsof.2021.106704. Online publication date: Aug-2021.
        • (2021) Understanding web pornography usage from traffic analysis. Computer Networks 189, 107909. DOI: 10.1016/j.comnet.2021.107909. Online publication date: Apr-2021.
        • (2021) Towards website domain name classification using graph based semi-supervised learning. Computer Networks 188, 107865. DOI: 10.1016/j.comnet.2021.107865. Online publication date: Apr-2021.
