DOI: 10.1145/3209978.3209984
research-article
Open access

Structuring Wikipedia Articles with Section Recommendations

Published: 27 June 2018

Abstract

Sections are the building blocks of Wikipedia articles. They enhance readability and can be used as a structured entry point for creating and expanding articles. Structuring a new or already existing Wikipedia article with sections is a hard task for humans, especially for newcomers or less experienced editors, as it requires significant knowledge about how a well-written article looks for each possible topic. Inspired by this need, the present paper defines the problem of section recommendation for Wikipedia articles and proposes several approaches for tackling it. Our systems can help editors by recommending what sections to add to already existing or newly created Wikipedia articles. Our basic paradigm is to generate recommendations by sourcing sections from articles that are similar to the input article. We explore several ways of defining similarity for this purpose (based on topic modeling, collaborative filtering, and Wikipedia's category system). We use both automatic and human evaluation approaches for assessing the performance of our recommendation system, concluding that the category-based approach works best, achieving precision@10 of about 80% in the human evaluation.
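
The basic paradigm described above (source sections from articles similar to the input article, then score the ranked list with precision@k) can be illustrated with a small sketch. This is not the paper's implementation: the toy corpus, the function names `recommend_sections` and `precision_at_k`, and the rule "similar means sharing at least one category" are all assumptions made for illustration. The systems in the paper use the Wikipedia category network and other similarity notions in ways this sketch does not reproduce.

```python
from collections import Counter

# Hypothetical toy corpus: article title -> (set of categories, list of section titles).
CORPUS = {
    "Alice Smith": ({"Swiss physicists"}, ["Early life", "Career", "Awards"]),
    "Bob Jones": ({"Swiss physicists"}, ["Early life", "Research", "Awards"]),
    "Carol White": (
        {"Swiss physicists", "Nobel laureates"},
        ["Early life", "Career", "Awards", "Legacy"],
    ),
    "Lake Example": ({"Lakes of Switzerland"}, ["Geography", "Ecology"]),
}


def recommend_sections(categories, existing_sections, corpus, k=10):
    """Rank candidate sections by how many category-sharing articles contain them.

    Assumption: an article counts as "similar" if it shares at least one
    category with the input article.
    """
    counts = Counter()
    for article_categories, sections in corpus.values():
        if categories & article_categories:
            # Count each section at most once per article.
            counts.update(set(sections))
    # Do not recommend sections the input article already has.
    for section in existing_sections:
        counts.pop(section, None)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [section for section, _ in ranked[:k]]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k slots filled with relevant sections (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for section in recommended[:k] if section in relevant)
    return hits / k


recs = recommend_sections({"Swiss physicists"}, ["Early life"], CORPUS, k=3)
score = precision_at_k(recs, {"Awards", "Career"}, k=3)
print(recs, round(score, 3))
```

In this toy run, two of the three recommended sections are judged relevant, so precision@3 is 2/3. The human evaluation reported in the abstract applies the same measure at k = 10.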


    Published In

    SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    June 2018
    1509 pages
    ISBN:9781450356572
    DOI:10.1145/3209978
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. category network
    2. recommender system
    3. sections
    4. wikipedia

    Qualifiers

    • Research-article

    Conference

    SIGIR '18

    Acceptance Rates

    SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 142
    • Downloads (Last 6 weeks): 21
    Reflects downloads up to 10 Nov 2024

    Cited By

    • (2024) Mining the History Sections of Wikipedia Articles on Science and Technology. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, pp. 200-204. DOI: 10.1109/JCDL57899.2023.00037. Online publication date: 26-Jun-2024
    • (2023) A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web 17:2, pp. 1-22. DOI: 10.1145/3580318. Online publication date: 3-Apr-2023
    • (2023) Descartes: Generating Short Descriptions of Wikipedia Articles. Proceedings of the ACM Web Conference 2023, pp. 1446-1456. DOI: 10.1145/3543507.3583220. Online publication date: 30-Apr-2023
    • (2022) Controlled Analyses of Social Biases in Wikipedia Bios. Proceedings of the ACM Web Conference 2022, pp. 2624-2635. DOI: 10.1145/3485447.3512134. Online publication date: 25-Apr-2022
    • (2022) Crosslingual Section Title Alignment in Wikipedia. 2022 IEEE International Conference on Big Data (Big Data), pp. 5892-5901. DOI: 10.1109/BigData55660.2022.10020462. Online publication date: 17-Dec-2022
    • (2021) PerSummRe. Journal of Cases on Information Technology 24:3, pp. 1-18. DOI: 10.4018/JCIT.20220701.oa7. Online publication date: 11-Aug-2021
    • (2021) Language-agnostic Topic Classification for Wikipedia. Companion Proceedings of the Web Conference 2021, pp. 594-601. DOI: 10.1145/3442442.3452347. Online publication date: 19-Apr-2021
    • (2021) Cross-lingual Language Model Pretraining for Retrieval. Proceedings of the Web Conference 2021, pp. 1029-1039. DOI: 10.1145/3442381.3449830. Online publication date: 19-Apr-2021
    • (2021) How Inclusive Are Wikipedia’s Hyperlinks in Articles Covering Polarizing Topics? 2021 IEEE International Conference on Big Data (Big Data), pp. 1300-1307. DOI: 10.1109/BigData52589.2021.9671943. Online publication date: 15-Dec-2021
    • (2020) Quantifying Engagement with Citations on Wikipedia. Proceedings of The Web Conference 2020, pp. 2365-2376. DOI: 10.1145/3366423.3380300. Online publication date: 20-Apr-2020
