DOI: 10.1145/2682571.2797073
short-paper

Filling the Gaps: Improving Wikipedia Stubs

Published: 08 September 2015

Abstract

Wikipedia's limited pool of contributors cannot ensure consistent growth and improvement of the online encyclopedia. Because relevant information is scattered across the web, our goal is to automate the generation of content for Wikipedia. In this work, we propose a technique for improving Wikipedia stubs, that is, articles that do not contain comprehensive information. A classifier learns features from existing comprehensive Wikipedia articles and recommends content that can be added to stubs to make them more complete. We conduct experiments with several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning architecture (deep belief network), and a TF-IDF based classifier. Our experiments reveal that the LDA-based model outperforms the other models (~6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. Articles generated by our system achieve higher ROUGE-2 scores than articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
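
The abstract reports ROUGE-2 scores, which measure bigram overlap between a generated text and a reference text. The sketch below is a simplified illustration of that idea, not the official ROUGE package and not the authors' evaluation setup: it assumes a single reference, lowercased whitespace tokenization, and no stemming or stop-word handling. The function names `bigram_counts` and `rouge_2` are hypothetical.

```python
from collections import Counter


def bigram_counts(text):
    """Count bigrams in a text (assumption: lowercase + whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Simplified single-reference ROUGE-2: recall, precision, and F1."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Counter intersection keeps the minimum count per bigram (clipped overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy sentences, invented for illustration only.
    scores = rouge_2("the cat sat on a mat", "the cat sat on the mat")
    print(scores)
```

In the toy example, three of the five reference bigrams also appear in the candidate, so recall is 3/5. A real evaluation would typically rely on an established ROUGE implementation with its standard preprocessing options.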



Published In

DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering
September 2015
248 pages
ISBN:9781450333078
DOI:10.1145/2682571

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. text summarization
  2. topic modeling
  3. wikipedia generation

Qualifiers

  • Short-paper

Conference

DocEng '15: ACM Symposium on Document Engineering 2015
September 8 - 11, 2015
Lausanne, Switzerland

Acceptance Rates

DocEng '15 Paper Acceptance Rate 11 of 31 submissions, 35%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2020) QWiki. Proceedings of the 16th International Symposium on Open Collaboration, pages 1-12. DOI: 10.1145/3412569.3412576. Online publication date: 25-Aug-2020
  • (2019) EncyCatalogRec: catalog recommendation for encyclopedia article completion. Frontiers of Information Technology & Electronic Engineering, 21:3, pages 436-447. DOI: 10.1631/FITEE.1800363. Online publication date: 5-Sep-2019
  • (2019) Using Topical Networks to Detect Editor Communities in Wikipedias. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 102-109. DOI: 10.1109/SNAMS.2019.8931865. Online publication date: Oct-2019
  • (2018) Structuring Wikipedia Articles with Section Recommendations. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 665-674. DOI: 10.1145/3209978.3209984. Online publication date: 27-Jun-2018
  • (2016) WikiWrite. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2740-2746. DOI: 10.5555/3060832.3061004. Online publication date: 9-Jul-2016
  • (2015) WikiKreator. AI Matters, 2:1, pages 4-6. DOI: 10.1145/2813536.2813538. Online publication date: 7-Oct-2015
