DOI: 10.1145/2682571.2797073
Short paper

Filling the Gaps: Improving Wikipedia Stubs

Published: 08 September 2015

    Abstract

    The limited number of contributors to Wikipedia cannot ensure the consistent growth and improvement of the online encyclopedia. Because relevant information is scattered across the web, our goal is to automate the generation of content for Wikipedia. In this work, we propose a technique for improving Wikipedia stubs, i.e., articles that do not contain comprehensive information. A classifier learns features from existing comprehensive Wikipedia articles and recommends content that can be added to a stub to make it more complete. We experiment with several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning architecture (a deep belief network), and a TF-IDF based classifier. Our experiments show that the LDA-based model outperforms the other models (by roughly 6% F-score). Evaluation of our generation approach indicates that the technique can produce comprehensive articles: articles generated by our system achieve higher ROUGE-2 scores than those generated by the baseline. Content generated by our system has been added to several stubs and has been retained on Wikipedia.
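The abstract names a TF-IDF based classifier as one of the compared models but gives no implementation details. The following is a minimal, hypothetical sketch of the general idea only: learn a profile for each section title from paragraphs of comprehensive articles, then route candidate web snippets to the sections a stub is missing. The nearest-centroid design, the smoothed IDF formula, the `recommend` helper, the 0.1 similarity threshold, and the toy training sentences are all assumptions for illustration, not the authors' method.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TfidfSectionClassifier:
    """Hypothetical nearest-centroid classifier over TF-IDF vectors.

    Each training example is (paragraph text, section title) taken from a
    comprehensive article; a section is represented by the sum of the
    L2-normalized TF-IDF vectors of its paragraphs.
    """

    def fit(self, examples: list[tuple[str, str]]) -> TfidfSectionClassifier:
        docs = [Counter(tokenize(text)) for text, _ in examples]
        doc_freq: Counter = Counter()
        for counts in docs:
            doc_freq.update(counts.keys())  # each term counted once per document
        n = len(docs)
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.centroids: dict[str, dict[str, float]] = {}
        for counts, (_, label) in zip(docs, examples):
            centroid = self.centroids.setdefault(label, {})
            for term, weight in self._vectorize(counts).items():
                centroid[term] = centroid.get(term, 0.0) + weight
        return self

    def _vectorize(self, counts: Counter) -> dict[str, float]:
        """Raw term frequency times IDF, L2-normalized; unseen terms are dropped."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, text: str) -> tuple[str | None, float]:
        """Return the most similar section title and its cosine score."""
        vec = self._vectorize(Counter(tokenize(text)))
        best_label, best_score = None, 0.0
        for label, centroid in self.centroids.items():
            score = cosine(vec, centroid)
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score


def recommend(clf: TfidfSectionClassifier, stub_sections: set[str],
              candidates: list[str], threshold: float = 0.1) -> dict[str, list[str]]:
    """Group candidate snippets under section titles the stub does not have yet."""
    suggestions: dict[str, list[str]] = {}
    for snippet in candidates:
        label, score = clf.predict(snippet)
        if label is not None and label not in stub_sections and score >= threshold:
            suggestions.setdefault(label, []).append(snippet)
    return suggestions


# Toy training data (hypothetical), standing in for comprehensive articles.
training = [
    ("He was born in Boston and grew up with his parents and siblings.", "Early life"),
    ("She was born in Ohio; her father was a teacher and her mother a nurse.", "Early life"),
    ("He released his debut album and toured across Europe.", "Career"),
    ("She joined the company as an engineer and was later promoted to director.", "Career"),
    ("He received the national medal and several honorary awards.", "Awards"),
]
clf = TfidfSectionClassifier().fit(training)
suggestions = recommend(
    clf,
    stub_sections={"Early life"},
    candidates=[
        "He released a new album and toured Europe.",
        "She was born in Ohio to a family of teachers.",
    ],
)
print(suggestions)  # {'Career': ['He released a new album and toured Europe.']}
```

The "Early life" snippet is classified correctly but not suggested, because the stub already has that section. A centroid classifier was chosen only because it fits in a few lines of standard-library Python; the abstract reports that an LDA-based model performed best, which this sketch does not attempt to reproduce.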

    Information

    Published In

    DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering
    September 2015
    248 pages
    ISBN:9781450333078
    DOI:10.1145/2682571
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. text summarization
    2. topic modeling
    3. wikipedia generation

    Conference

    DocEng '15: ACM Symposium on Document Engineering 2015
    September 8-11, 2015
    Lausanne, Switzerland

    Acceptance Rates

    DocEng '15 Paper Acceptance Rate 11 of 31 submissions, 35%;
    Overall Acceptance Rate 178 of 537 submissions, 33%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2020) QWiki. Proceedings of the 16th International Symposium on Open Collaboration, pages 1-12. DOI: 10.1145/3412569.3412576. Online publication date: 25-Aug-2020.
    • (2019) EncyCatalogRec: catalog recommendation for encyclopedia article completion. Frontiers of Information Technology & Electronic Engineering, 21(3), pages 436-447. DOI: 10.1631/FITEE.1800363. Online publication date: 5-Sep-2019.
    • (2019) Using Topical Networks to Detect Editor Communities in Wikipedias. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 102-109. DOI: 10.1109/SNAMS.2019.8931865. Online publication date: Oct-2019.
    • (2018) Structuring Wikipedia Articles with Section Recommendations. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 665-674. DOI: 10.1145/3209978.3209984. Online publication date: 27-Jun-2018.
    • (2016) WikiWrite. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2740-2746. DOI: 10.5555/3060832.3061004. Online publication date: 9-Jul-2016.
    • (2015) WikiKreator. AI Matters, 2(1), pages 4-6. DOI: 10.1145/2813536.2813538. Online publication date: 7-Oct-2015.
