DOI: 10.1145/2362456.2362479

Crowdsourcing research opportunities: lessons from natural language processing

Published: 05 September 2012

Abstract

Although crowdsourcing has already led to promising early results, its use as an integral part of science projects is still regarded with skepticism by some, largely due to a lack of awareness of the opportunities and implications of these new techniques. We address this lack of awareness, first by highlighting the positive impact that crowdsourcing has had on Natural Language Processing research, and second by discussing the challenges posed by more complex methodologies, quality control, and the need to deal with ethical issues. We conclude with future trends and opportunities of crowdsourcing for science, including its potential for disseminating results, making science more accessible, and enriching educational programs.

References

[1]
T. Abekawa, M. Utiyama, E. Sumita, and K. Kageura. Community-based Construction of Draft and Final Translation Corpus through a Translation Hosting Site Minna no Hon'yaku (MNH). In Proc. of LREC, 2010.
[2]
V. Ambati and S. Vogel. Can Crowds Build Parallel Corpora for Machine Translation Systems? In Callison-Burch and Dredze [8], pages 62--65.
[3]
T. S. Behrend, D. J. Sharek, A. W. Meade, and E. N. Wiebe. The viability of crowdsourcing for survey research. Behav. Res., 43(3):800--813, 2011.
[4]
A. Bernstein, M. Klein, and T. W. Malone. Programming the Global Brain. Commun. ACM, 55(5):41--43, 2012.
[5]
A. Brew, D. Greene, and P. Cunningham. Using Crowdsourcing and Active Learning to Track Sentiment in Online Media. In Proc. of the European Conf. on Artificial Intelligence, pages 145--150, 2010.
[6]
C. Callison-Burch. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. In Proc. of the Conf. on Empirical Methods in NLP, pages 286--295, 2009.
[7]
C. Callison-Burch and M. Dredze. Creating Speech and Language Data with Amazon's Mechanical Turk. In Proc. of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk [8], pages 1--12.
[8]
C. Callison-Burch and M. Dredze, editors. Proc. of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, 2010.
[9]
P. Cohn. Can Volunteers Do Real Research? BioScience, 58(3):192--197, 2010.
[10]
S. Cooper, F. Khatib, A. Treuille, J. Barbero, J. Lee, M. Beenen, A. Leaver-Fay, D. Baker, Z. Popovic, and Foldit players. Predicting protein structures with a multiplayer online game. Nature, 466(7307):756--760, 2010.
[11]
A. Doan, R. Ramakrishnan, and A. Y. Halevy. Crowdsourcing Systems on the World-Wide Web. Commun. ACM, 54(4):86--96, April 2011.
[12]
C. Eickhoff and A. de Vries. Increasing Cheat Robustness of Crowdsourcing Tasks. Information Retrieval, 15:1--17, 2012. DOI: 10.1007/s10791-011-9181-9.
[13]
M. El-Haj, U. Kruschwitz, and C. Fox. Using Mechanical Turk to Create a Corpus of Arabic Summaries. In Proc. of LREC, 2010.
[14]
T. A. Finholt. Collaboratories. Annual Review of Info. Science and Technology, 36:74--104, 2002.
[15]
T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau, and M. Dredze. Annotating Named Entities in Twitter Data with Crowdsourcing. In Callison-Burch and Dredze [8], pages 80--88.
[16]
K. Fort, G. Adda, and K. B. Cohen. Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics, 37(2):413--420, 2011.
[17]
D. Gillick and Y. Liu. Non-Expert Evaluation of Summarization Systems is Risky. In Callison-Burch and Dredze [8], pages 148--151.
[18]
E. Hand. Citizen science: People power. Nature, 466:685--687, 2010.
[19]
L. Hoffmann. Crowd Control. Commun. ACM, 52(3):16--17, 2009.
[20]
E. H. Hovy, M. P. Marcus, M. Palmer, L. A. Ramshaw, and R. M. Weischedel. OntoNotes: The 90% Solution. In Proc. of HLT-NAACL, 2006.
[21]
J. Howe. Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business, 2009. http://crowdsourcing.typepad.com/.
[22]
A. Irvine and A. Klementiev. Using Mechanical Turk to Annotate Lexicons for Less Commonly Used Languages. In Callison-Burch and Dredze [8], pages 108--113.
[23]
A. Kawrykow, G. Roumanis, A. Kam, D. Kwak, C. Leung, C. Wu, E. Zarour, and Phylo players. Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. PLoS ONE, 7(3):e31362, 2012.
[24]
A. Koller, K. Striegnitz, A. Gargett, D. Byron, J. Cassell, R. Dale, J. Moore, and J. Oberlander. Report on the Second NLG Challenge on Generating Instructions in Virtual Environments (GIVE-2). In Proc. of the Int. Natural Language Generation Conf., pages 243--250, 2010.
[25]
M. Krause and J. Smeddinck. Human Computation Games: a Survey. In Proc. of 19th European Signal Processing Conference (EUSIPCO 2011), 2011.
[26]
S. A. Kunath and S. H. Weinberger. The Wisdom of the Crowd's Ear: Speech Accent Rating and Annotation with Amazon Mechanical Turk. In Callison-Burch and Dredze [8], pages 168--171.
[27]
F. Laws, C. Scheible, and H. Schütze. Active Learning with Amazon Mechanical Turk. In Proc. of the Conf. on Empirical Methods in NLP, pages 1546--1556, 2011.
[28]
N. Lawson, K. Eustice, M. Perkowitz, and M. Yetisgen-Yildiz. Annotating Large Email Datasets for Named Entity Recognition with Mechanical Turk. In Callison-Burch and Dredze [8], pages 71--79.
[29]
N. Madnani, J. Boyd-Graber, and P. Resnik. Measuring Transitivity Using Untrained Annotators. In Callison-Burch and Dredze [8], pages 188--194.
[30]
T. W. Malone, R. Laubacher, and C. Dellarocas. The Collective Intelligence Genome. MIT Sloan, 51(3):21--31, 2010.
[31]
J. Mrozinski, E. Whittaker, and S. Furui. Collecting a Why-Question Corpus for Development and Evaluation of an Automatic QA-System. In Proc. of ACL: HLT, pages 443--451, 2008.
[32]
R. Munro, S. Bethard, V. Kuperman, V. T. Lai, R. Melnick, C. Potts, T. Schnoebelen, and H. Tily. Crowdsourcing and Language Studies: The New Generation of Linguistic Data. In Callison-Burch and Dredze [8], pages 122--130.
[33]
M. Negri, L. Bentivogli, Y. Mehdad, D. Giampiccolo, and A. Marchetti. Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora. In Proc. of the Conf. on Empirical Methods in NLP, pages 670--679, 2011.
[34]
M. Negri and Y. Mehdad. Creating a Bi-lingual Entailment Corpus through Translations with Mechanical Turk: $100 for a 10-day Rush. In Callison-Burch and Dredze [8], pages 212--216.
[35]
O. Nov, O. Arazy, and D. Anderson. Crowdsourcing for Science: Understanding and Enhancing Science Sourcing Contribution. In WS. on the Changing Dynamics of Scientific Collaborations, 2010.
[36]
S. Novotney and C. Callison-Burch. Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription. In Proc. of HLT-NAACL, pages 207--215, 2010.
[37]
G. Parent and M. Eskenazi. Clustering Dictionary Definitions Using Amazon Mechanical Turk. In Callison-Burch and Dredze [8], pages 21--29.
[38]
G. Parent and M. Eskenazi. Speaking to the Crowd: Looking at Past Achievements in Using Crowdsourcing for Speech and Predicting Future Challenges. In Proc. of INTERSPEECH, pages 3037--3040, 2011.
[39]
M. Poesio, U. Kruschwitz, J. Chamberlain, L. Robaldo, and L. Ducceschi. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. Transactions on Interactive Intelligent Systems, 2012. To Appear.
[40]
A. J. Quinn and B. B. Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. In Proc. of Human Factors in Computing Systems, pages 1403--1412, 2011.
[41]
W. Rafelsberger and A. Scharl. Games with a Purpose for Social Networking Platforms. In Proc. of the Conf. on Hypertext and Hypermedia, pages 193--198, 2009.
[42]
N. Savage. Gaining Wisdom from Crowds. Commun. ACM, 55(3):13--15, 2012.
[43]
A. Scharl, M. Föls, and M. Sabou. ClimateQuiz: A Game for Gathering Environmental Knowledge, 2012. Under Peer Review.
[44]
A. Scharl, M. Sabou, S. Gindl, W. Rafelsberger, and A. Weichselbraun. Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources. In Proc. of LREC, 2012.
[45]
R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and Fast---but is it Good?: Evaluating Non-Expert Annotations for Natural Language Tasks. In Proc. of EMNLP, pages 254--263, 2008.
[46]
S. Thaler, K. Siorpaes, C. Hofer, and E. Simperl. A Survey on Games for Knowledge Acquisition. Technical report, Semantic Technology Institute, Innsbruck, 2011.
[47]
K. Vertanen and P. O. Kristensson. The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources. In Proc. of the Conf. on Empirical Methods in NLP, pages 700--711, 2011.
[48]
L. von Ahn and L. Dabbish. Designing games with a purpose. Commun. ACM, 51(8):58--67, 2008.
[49]
A. Wang, C. D. V. Hoang, and M. Y. Kan. Perspectives on Crowdsourcing Annotations for Natural Language Processing. Language Resources and Evaluation, 2012.
[50]
A. Wiggins. Crowdsourcing Science: Organizing Virtual Participation in Knowledge Production. In Proc. of the Int. Conf. on Supporting Group Work, pages 337--338, 2010.
[51]
A. Wiggins and K. Crowston. From Conservation to Crowdsourcing: A Typology of Citizen Science. In Proc. of the Hawaii Int. Conf. on System Sciences, pages 1--10, 2011.
[52]
O. F. Zaidan and C. Callison-Burch. Crowdsourcing Translation: Professional Quality from Non-Professionals. In Proc. of ACL: HLT, pages 1220--1229, 2011.



Published In

i-KNOW '12: Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
September 2012
244 pages
ISBN:9781450312424
DOI:10.1145/2362456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. crowdsourcing
  2. games with a purpose
  3. natural language processing
  4. resource acquisition

Qualifiers

  • Research-article


Conference

i-KNOW '12

Acceptance Rates

Overall Acceptance Rate 77 of 238 submissions, 32%


