DOI: 10.1145/2362456.2362479

Crowdsourcing research opportunities: lessons from natural language processing

Published: 05 September 2012

Abstract

Although crowdsourcing has already led to promising early results, its use as an integral part of science projects is still regarded with skepticism by some, largely due to a lack of awareness of the opportunities and implications of these new techniques. We address this lack of awareness, first by highlighting the positive impact that crowdsourcing has had on Natural Language Processing research, and second by discussing the challenges posed by more complex methodologies, quality control, and the need to deal with ethical issues. We conclude with future trends and opportunities of crowdsourcing for science, including its potential for disseminating results, making science more accessible, and enriching educational programs.

References

[1]
T. Abekawa, M. Utiyama, E. Sumita, and K. Kageura. Community-based Construction of Draft and Final Translation Corpus through a Translation Hosting Site Minna no Hon'yaku (MNH). In Proc. of LREC, 2010.
[2]
V. Ambati and S. Vogel. Can Crowds Build Parallel Corpora for Machine Translation Systems? In Callison-Burch and Dredze [8], pages 62--65.
[3]
T. S. Behrend, D. J. Sharek, A. W. Meade, and E. N. Wiebe. The viability of crowdsourcing for survey research. Behav. Res., 43(3):800--813, 2011.
[4]
A. Bernstein, M. Klein, and T. W. Malone. Programming the Global Brain. Commun. ACM, 55(5):41--43, 2012.
[5]
A. Brew, D. Greene, and P. Cunningham. Using Crowdsourcing and Active Learning to Track Sentiment in Online Media. In Proc. of the European Conf. on Artificial Intelligence, pages 145--150, 2010.
[6]
C. Callison-Burch. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. In Proc. of the Conf. on Empirical Methods in NLP, pages 286--295, 2009.
[7]
C. Callison-Burch and M. Dredze. Creating Speech and Language Data with Amazon's Mechanical Turk. In Proc. of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk [8], pages 1--12.
[8]
C. Callison-Burch and M. Dredze, editors. Proc. of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, 2010.
[9]
P. Cohn. Can Volunteers Do Real Research? BioScience, 58(3):192--197, 2010.
[10]
S. Cooper, F. Khatib, A. Treuille, J. Barbero, J. Lee, M. Beenen, A. Leaver-Fay, D. Baker, Z. Popovic, and Foldit players. Predicting protein structures with a multiplayer online game. Nature, 466(7307):756--760, 2010.
[11]
A. Doan, R. Ramakrishnan, and A. Y. Halevy. Crowdsourcing Systems on the World-Wide Web. Commun. ACM, 54(4):86--96, April 2011.
[12]
C. Eickhoff and A. de Vries. Increasing Cheat Robustness of Crowdsourcing Tasks. Information Retrieval, 15:1--17, 2012. DOI: 10.1007/s10791-011-9181-9.
[13]
M. El-Haj, U. Kruschwitz, and C. Fox. Using Mechanical Turk to Create a Corpus of Arabic Summaries. In Proc. of LREC, 2010.
[14]
T. A. Finholt. Collaboratories. Annual Review of Info. Science and Technology, 36:74--104, 2002.
[15]
T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau, and M. Dredze. Annotating Named Entities in Twitter Data with Crowdsourcing. In Callison-Burch and Dredze [8], pages 80--88.
[16]
K. Fort, G. Adda, and K. B. Cohen. Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics, 37(2):413--420, 2011.
[17]
D. Gillick and Y. Liu. Non-Expert Evaluation of Summarization Systems is Risky. In Callison-Burch and Dredze [8], pages 148--151.
[18]
E. Hand. Citizen science: People power. Nature, 466:685--687, 2010.
[19]
L. Hoffmann. Crowd Control. Commun. ACM, 52(3):16--17, 2009.
[20]
E. H. Hovy, M. P. Marcus, M. Palmer, L. A. Ramshaw, and R. M. Weischedel. OntoNotes: The 90% Solution. In Proc. of HLT-NAACL, 2006.
[21]
J. Howe. Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business, 2009. http://crowdsourcing.typepad.com/.
[22]
A. Irvine and A. Klementiev. Using Mechanical Turk to Annotate Lexicons for Less Commonly Used Languages. In Callison-Burch and Dredze [8], pages 108--113.
[23]
A. Kawrykow, G. Roumanis, A. Kam, D. Kwak, C. Leung, C. Wu, E. Zarour, and Phylo players. Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. PLoS ONE, 7(3):e31362, 2012.
[24]
A. Koller, K. Striegnitz, A. Gargett, D. Byron, J. Cassell, R. Dale, J. Moore, and J. Oberlander. Report on the Second NLG Challenge on Generating Instructions in Virtual Environments (GIVE-2). In Proc. of the Int. Natural Language Generation Conf., pages 243--250, 2010.
[25]
M. Krause and J. Smeddinck. Human Computation Games: a Survey. In Proc. of 19th European Signal Processing Conference (EUSIPCO 2011), 2011.
[26]
S. A. Kunath and S. H. Weinberger. The Wisdom of the Crowd's Ear: Speech Accent Rating and Annotation with Amazon Mechanical Turk. In Callison-Burch and Dredze [8], pages 168--171.
[27]
F. Laws, C. Scheible, and H. Schütze. Active Learning with Amazon Mechanical Turk. In Proc. of the Conf. on Empirical Methods in NLP, pages 1546--1556, 2011.
[28]
N. Lawson, K. Eustice, M. Perkowitz, and M. Yetisgen-Yildiz. Annotating Large Email Datasets for Named Entity Recognition with Mechanical Turk. In Callison-Burch and Dredze [8], pages 71--79.
[29]
N. Madnani, J. Boyd-Graber, and P. Resnik. Measuring Transitivity Using Untrained Annotators. In Callison-Burch and Dredze [8], pages 188--194.
[30]
T. W. Malone, R. Laubacher, and C. Dellarocas. The Collective Intelligence Genome. MIT Sloan, 51(3):21--31, 2010.
[31]
J. Mrozinski, E. Whittaker, and S. Furui. Collecting a Why-Question Corpus for Development and Evaluation of an Automatic QA-System. In Proc. of ACL: HLT, pages 443--451, 2008.
[32]
R. Munro, S. Bethard, V. Kuperman, V. T. Lai, R. Melnick, C. Potts, T. Schnoebelen, and H. Tily. Crowdsourcing and Language Studies: The New Generation of Linguistic Data. In Callison-Burch and Dredze [8], pages 122--130.
[33]
M. Negri, L. Bentivogli, Y. Mehdad, D. Giampiccolo, and A. Marchetti. Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora. In Proc. of the Conf. on Empirical Methods in NLP, pages 670--679, 2011.
[34]
M. Negri and Y. Mehdad. Creating a Bi-lingual Entailment Corpus through Translations with Mechanical Turk: $100 for a 10-day Rush. In Callison-Burch and Dredze [8], pages 212--216.
[35]
O. Nov, O. Arazy, and D. Anderson. Crowdsourcing for Science: Understanding and Enhancing Science Sourcing Contribution. In WS. on the Changing Dynamics of Scientific Collaborations, 2010.
[36]
S. Novotney and C. Callison-Burch. Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription. In Proc. of HLT-NAACL, pages 207--215, 2010.
[37]
G. Parent and M. Eskenazi. Clustering Dictionary Definitions Using Amazon Mechanical Turk. In Callison-Burch and Dredze [8], pages 21--29.
[38]
G. Parent and M. Eskenazi. Speaking to the Crowd: Looking at Past Achievements in Using Crowdsourcing for Speech and Predicting Future Challenges. In Proc. of INTERSPEECH, pages 3037--3040, 2011.
[39]
M. Poesio, U. Kruschwitz, J. Chamberlain, L. Robaldo, and L. Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. Transactions on Interactive Intelligent Systems, 2012. To Appear.
[40]
A. J. Quinn and B. B. Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. In Proc. of Human Factors in Computing Systems, pages 1403--1412, 2011.
[41]
W. Rafelsberger and A. Scharl. Games with a Purpose for Social Networking Platforms. In Proc. of the Conf. on Hypertext and Hypermedia, pages 193--198, 2009.
[42]
N. Savage. Gaining Wisdom from Crowds. Commun. ACM, 55(3):13--15, 2012.
[43]
A. Scharl, M. Föls, and M. Sabou. ClimateQuiz: A Game for Gathering Environmental Knowledge, 2012. Under Peer Review.
[44]
A. Scharl, M. Sabou, S. Gindl, W. Rafelsberger, and A. Weichselbraun. Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources. In Proc. of LREC, 2012.
[45]
R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and Fast---but is it Good?: Evaluating Non-Expert Annotations for Natural Language Tasks. In Proc. of EMNLP, pages 254--263, 2008.
[46]
S. Thaler, K. Siorpaes, C. Hofer, and E. Simperl. A Survey on Games for Knowledge Acquisition. Technical report, Semantic Technology Institute, Innsbruck, 2011.
[47]
K. Vertanen and P. O. Kristensson. The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources. In Proc. of the Conf. on Empirical Methods in NLP, pages 700--711, 2011.
[48]
L. von Ahn and L. Dabbish. Designing games with a purpose. Commun. ACM, 51(8):58--67, 2008.
[49]
A. Wang, C. D. V. Hoang, and M. Y. Kan. Perspectives on Crowdsourcing Annotations for Natural Language Processing. Language Resources and Evaluation, 2012.
[50]
A. Wiggins. Crowdsourcing Science: Organizing Virtual Participation in Knowledge Production. In Proc. of the Int. Conf. on Supporting Group Work, pages 337--338, 2010.
[51]
A. Wiggins and K. Crowston. From Conservation to Crowdsourcing: A Typology of Citizen Science. In Proc. of the Hawaii Int. Conf. on System Sciences, pages 1--10, 2011.
[52]
O. F. Zaidan and C. Callison-Burch. Crowdsourcing Translation: Professional Quality from Non-Professionals. In Proc. of ACL: HLT, pages 1220--1229, 2011.



Published In

i-KNOW '12: Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
September 2012
244 pages
ISBN:9781450312424
DOI:10.1145/2362456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. crowdsourcing
  2. games with a purpose
  3. natural language processing
  4. resource acquisition

Qualifiers

  • Research-article


Conference

i-KNOW '12

Acceptance Rates

Overall Acceptance Rate 77 of 238 submissions, 32%


