DOI: 10.1145/3485447.3512283
research-article

“Way back then”: A Data-driven View of 25+ years of Web Evolution

Published: 25 April 2022
  • Abstract

    Since the first web page appeared three decades ago, the Web has evolved considerably: from the static HTML pages of its early days to today’s dynamic pages, and from the mainly text-based pages of the 1990s to today’s multimedia-rich pages. Although much of this is known anecdotally, to our knowledge there is no quantitative documentation of the extent and timing of these changes. This paper attempts to address this gap in the literature by examining the top 100 Alexa websites over more than 25 years, as archived by the Internet Archive’s “Wayback Machine” (archive.org). We study changes in popularity, from Geocities and Yahoo! in the mid-to-late 1990s to the likes of Google, Facebook, and TikTok today. We also look at different categories of websites and their popularity over the years, and find evidence of a decline in the popularity of news and education-related websites, which have been replaced by streaming media and social networking sites. We explore the emergence and relative prevalence of different MIME types (text vs. image vs. video vs. JavaScript and JSON) and study whether the use of text on the Internet is declining.


    Cited By

    • (2024) Phishing Vs. Legit: Comparative Analysis of Client-Side Resources of Phishing and Target Brand Websites. Proceedings of the ACM on Web Conference 2024, 1756–1767. DOI: 10.1145/3589334.3645535. Online publication date: 13-May-2024
    • (2023) You Call This Archaeology? Evaluating Web Archives for Reproducible Web Security Measurements. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 3168–3182. DOI: 10.1145/3576915.3616688. Online publication date: 15-Nov-2023
    • (2023) Privacy Lost in Online Education: Analysis of Web Tracking Evolution. Advanced Data Mining and Applications, 440–455. DOI: 10.1007/978-3-031-46664-9_30. Online publication date: 27-Aug-2023


                  Published In
                  WWW '22: Proceedings of the ACM Web Conference 2022
                  April 2022
                  3764 pages
                  ISBN:9781450390965
                  DOI:10.1145/3485447

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States


                  Author Tags

                  1. archive.org
                  2. internet archive
                  3. wayback machine
                  4. web history

                  Qualifiers

                  • Research-article
                  • Research
                  • Refereed limited

                  Conference

                  WWW '22
                  WWW '22: The ACM Web Conference 2022
                  April 25 - 29, 2022
                  Virtual Event, Lyon, France

                  Acceptance Rates

                  Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

                  Article Metrics

                  • Downloads (last 12 months): 69
                  • Downloads (last 6 weeks): 5
                  Reflects downloads up to 27 Jul 2024.
