DOI: 10.1145/3422713.3422745

Predicting Monthly Pageview of Wikipedia Pages by Neighbor Pages

Published: 23 October 2020

Abstract

Traffic prediction is important for the daily operation of websites. Developing efficient models of Wikipedia's page traffic would deepen our knowledge of how people behave on Wikipedia and, potentially, on other crowdsourced sites. This project tests whether incorporating time-series data from a linked page improves the accuracy of predictions of a page's future traffic. The study compares three time-series models. The baseline model uses a page's monthly traffic in 2019 to predict its monthly traffic in January 2020. The random neighbor model randomly selects a page that has a hyperlink to the focal page and uses the 2019 data of both the focal page and the neighboring page to predict the focal page's monthly traffic in January 2020. The similar neighbor model also uses data from the focal page and a neighboring page, but selects the neighbor by its content similarity to the focal page. The results show that, on popular pages, the similar neighbor model predicts better than the random neighbor model. Overall, the baseline model performs best, with the smallest MSE, MAE, and MAPE, while the random neighbor and similar neighbor models have much larger MSE than the baseline model.
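The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the model specifications, so everything below is an assumption made for illustration: the baseline is taken to be a lag-1 autoregression on the focal page, the neighbor model is taken to be the focal-page equation of a lag-1 vector autoregression (the Author Tags mention Vector Auto Regression, but not the lag order), and the function names and pageview numbers are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def forecast_next(target: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """One-step-ahead forecast of `target`.

    Fits target[t] = c + sum_k b_k * predictors[k][t-1] by ordinary least
    squares (normal equations), then applies the fit to the last observed
    values. Assumed model form; collinear predictors make the system singular.
    """
    rows = [[1.0] + [p[t - 1] for p in predictors] for t in range(1, len(target))]
    y = list(target[1:])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    last = [1.0] + [p[-1] for p in predictors]
    return sum(c * v for c, v in zip(coef, last))


if __name__ == "__main__":
    # Hypothetical monthly pageviews for 2019 (made-up numbers).
    focal = [1200, 1150, 1300, 1280, 1400, 1350, 1500, 1480, 1600, 1550, 1700, 1650]
    neighbor = [800, 900, 850, 950, 1000, 980, 1100, 1050, 1150, 1200, 1180, 1250]
    actual_jan_2020 = [1720]  # hypothetical held-out value

    baseline = forecast_next(focal, [focal])             # focal page only
    with_neighbor = forecast_next(focal, [focal, neighbor])  # focal + neighbor page

    for name, pred in [("baseline", baseline), ("neighbor", with_neighbor)]:
        print(name, round(pred, 1),
              "MSE", round(mse(actual_jan_2020, [pred]), 1),
              "MAE", round(mae(actual_jan_2020, [pred]), 1),
              "MAPE", round(mape(actual_jan_2020, [pred]), 2))
```

In a study like the one described, each metric would be averaged over many pages rather than computed for a single page as here. With only twelve monthly observations per page, every added predictor consumes a meaningful share of the available data, which is one plausible reason a two-series model can show larger errors than a one-series baseline.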

Cited By

  • (2023) An Intelligent Network Traffic Prediction Scheme Based on Ensemble Learning of Multi-Layer Perceptron in Complex Networks. Electronics 12:6 (1268). DOI: 10.3390/electronics12061268. Online publication date: 7-Mar-2023
  • (2023) Scientometric and Wikipedia Pageview Analysis. Trends in Data Protection and Encryption Technologies (243-252). DOI: 10.1007/978-3-031-33386-6_39. Online publication date: 27-Apr-2023
  • (2021) Web Traffic Time Series Forecasting Using LSTM Neural Networks with Distributed Asynchronous Training. Mathematics 9:4 (421). DOI: 10.3390/math9040421. Online publication date: 21-Feb-2021

    Published In

    ICBDT '20: Proceedings of the 3rd International Conference on Big Data Technologies
    September 2020
    250 pages
    ISBN:9781450387859
    DOI:10.1145/3422713

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Vector Auto Regression
    2. Wikipedia
    3. time-series models

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBDT 2020
