
Filling the Gaps: Improving Wikipedia Stubs

Authors:

Siddhartha Banerjee,

Prasenjit Mitra

DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering

Pages 117 - 120

https://doi.org/10.1145/2682571.2797073

Published: 08 September 2015

Abstract

Wikipedia has only a limited number of contributors, which cannot ensure consistent growth and improvement of the online encyclopedia. With information scattered across the web, our goal is to automate content generation for Wikipedia. In this work, we propose a technique for improving Wikipedia stubs, that is, articles that do not contain comprehensive information. A classifier learns features from existing comprehensive Wikipedia articles and recommends content that can be added to stubs to make them more complete. We conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (deep belief network), and a TFIDF based classifier. Our experiments reveal that the LDA based model outperforms the other models (~6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. Articles generated by our system achieve higher ROUGE-2 scores than articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
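
To make the ROUGE-2 comparison concrete, the following is a minimal sketch of a recall-oriented ROUGE-2 score, computed as clipped bigram overlap divided by the number of bigrams in the reference text. It is an illustration under stated assumptions, not the evaluation setup used in the paper: the whitespace tokenization, the function names `bigrams` and `rouge_2_recall`, and the three toy strings are all hypothetical, and the abstract does not say which ROUGE-2 variant (recall, precision, or F-score) was reported.

```python
from collections import Counter


def bigrams(text):
    """Return a multiset of lowercase word bigrams (whitespace tokenization, an assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2_recall(candidate, reference):
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand, ref = bigrams(candidate), bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference bigram is credited at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


# Toy strings invented for illustration; they are not data from the paper.
reference = "the play was first performed in london in 1895"
system_text = "the play was first performed in paris"
baseline_text = "a play performed in paris"

print(rouge_2_recall(system_text, reference))    # 0.625 (5 of 8 reference bigrams)
print(rouge_2_recall(baseline_text, reference))  # 0.125 (1 of 8 reference bigrams)
```

In this toy case the first candidate shares more bigrams with the reference than the second, so it scores higher. A comparison between system and baseline articles applies the same idea at article scale.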

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation

Published In

DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering

September 2015

248 pages

ISBN:9781450333078

DOI:10.1145/2682571

General Chair: Christine Vanoirbeek, EPFL, Switzerland

Program Chair: Pierre Genevès, CNRS, France

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

DocEng '15

Sponsor:

SIGWEB

DocEng '15: ACM Symposium on Document Engineering 2015

September 8 - 11, 2015

Lausanne, Switzerland

Acceptance Rates

DocEng '15 Paper Acceptance Rate 11 of 31 submissions, 35%;

Overall Acceptance Rate 194 of 564 submissions, 34%

Bibliometrics

Total Citations: 6

Total Downloads: 166

Downloads (Last 12 months): 7

Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Feb 2025

Cited By

Setia S, Iyengar S, Verma A, Robles G, Stol K, Wang X (2020). QWiki. Proceedings of the 16th International Symposium on Open Collaboration, pages 1-12. Online publication date: 25-Aug-2020.
https://dl.acm.org/doi/10.1145/3412569.3412576

Lu W, Liu J, Xu W, Wang P, Wei B (2019). EncyCatalogRec: catalog recommendation for encyclopedia article completion. Frontiers of Information Technology & Electronic Engineering, 21:3, pages 436-447. Online publication date: 5-Sep-2019.
https://doi.org/10.1631/FITEE.1800363

Kretschmer M, Goschlberger B, Klamma R (2019). Using Topical Networks to Detect Editor Communities in Wikipedias. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 102-109. Online publication date: Oct-2019.
https://doi.org/10.1109/SNAMS.2019.8931865

Piccardi T, Catasta M, Zia L, West R, Collins-Thompson K, Mei Q, Davison B, Liu Y, Yilmaz E (2018). Structuring Wikipedia Articles with Section Recommendations. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 665-674. Online publication date: 27-Jun-2018.
https://dl.acm.org/doi/10.1145/3209978.3209984

Banerjee S, Mitra P (2016). WikiWrite. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2740-2746. Online publication date: 9-Jul-2016.
https://dl.acm.org/doi/10.5555/3060832.3061004

Banerjee S, Mitra P (2015). WikiKreator. AI Matters, 2:1, pages 4-6. Online publication date: 7-Oct-2015.
https://dl.acm.org/doi/10.1145/2813536.2813538
