DOI: 10.1145/2675133.2675285
research-article

Turkers, Scholars, "Arafat" and "Peace": Cultural Communities and Algorithmic Gold Standards

Published: 28 February 2015

Abstract

In just a few years, crowdsourcing markets like Mechanical Turk have become the dominant mechanism for building "gold standard" datasets in areas of computer science ranging from natural language processing to audio transcription. The assumption behind this sea change - an assumption that is central to the approaches taken in hundreds of research projects - is that crowdsourced markets can accurately replicate the judgments of the general population for knowledge-oriented tasks. Focusing on the important domain of semantic relatedness algorithms and leveraging Clark's theory of common ground as a framework, we demonstrate that this assumption can be highly problematic. Using 7,921 semantic relatedness judgments from 72 scholars and 39 crowdworkers, we show that crowdworkers on Mechanical Turk produce significantly different semantic relatedness gold standard judgments than people from other communities. We also show that algorithms that perform well against Mechanical Turk gold standard datasets do significantly worse when evaluated against other communities' gold standards. Our results call into question the broad use of Mechanical Turk for the development of gold standard datasets and demonstrate the importance of understanding these datasets from a human-centered point of view. More generally, our findings problematize the notion that a universal gold standard dataset exists for all knowledge tasks.
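
As a rough illustration of what "evaluated against other communities' gold standards" can mean in practice, the sketch below scores one algorithm against two separate gold standards. The abstract does not name the evaluation metric, so Spearman rank correlation is an assumption here (it is a common choice for semantic relatedness benchmarks). The word pairs and every score are invented for illustration and are not data from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical word pairs and scores (NOT from the paper).
pairs = [("arafat", "peace"), ("tiger", "cat"), ("stock", "market"),
         ("king", "cabbage"), ("museum", "theater")]
algorithm_scores = [0.2, 0.9, 0.8, 0.1, 0.5]  # output of some SR algorithm
turker_gold = [1.0, 3.8, 3.5, 0.2, 2.0]       # mean crowdworker ratings
scholar_gold = [3.2, 3.9, 3.0, 0.1, 1.5]      # mean scholar ratings

rho_turkers = spearman(algorithm_scores, turker_gold)
rho_scholars = spearman(algorithm_scores, scholar_gold)
print(f"vs. Turker gold standard:  {rho_turkers:.2f}")
print(f"vs. scholar gold standard: {rho_scholars:.2f}")
```

In this toy data the two communities disagree most on one pair, so the same algorithm ranks perfectly against one gold standard and noticeably worse against the other. That is the kind of gap the abstract reports, though the paper's actual procedure and numbers may differ.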

References

[1]
Babbie, E. R., et al. Survey research methods. Wadsworth Belmont, CA, 1990.
[2]
Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., Pouliquen, B., and Belyaeva, J. Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013).
[3]
Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., and Gergle, D. Omnipedia: Bridging the wikipedia language gap. In CHI '12 (2012).
[4]
Bergstrom, T., and Karahalios, K. Conversation clusters: grouping conversation topics through human-computer dialog. In CHI '09 (Boston, MA, 2009), 2349--2352.
[5]
Bloodgood, M., and Callison-Burch, C. Using mechanical turk to build machine translation evaluation sets. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk (2010).
[6]
Budanitsky, A., and Hirst, G. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics 32, 1 (2006), 13--47.
[7]
Buhrmester, M., Kwang, T., and Gosling, S. D. Amazon's mechanical turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science 6, 1 (Jan. 2011), 3--5.
[8]
Callison-Burch, C., and Dredze, M. Creating speech and language data with amazon's mechanical turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, Association for Computational Linguistics (2010), 1--12.
[9]
Clark, H. H. Using Language. Cambridge University Press, May 1996.
[10]
Dong, W., and Fu, W.-T. Cultural difference in image tagging. In CHI '10 (Atlanta, Georgia, USA, 2010), 981.
[11]
Dong, Z., Shi, C., Sen, S., Terveen, L., and Riedl, J. War versus inspirational in forrest gump: Cultural effects in tagging communities. In ICWSM '12 (May 2012).
[12]
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. Placing search in context: The concept revisited. ACM Transactions on Information Systems 20, 1 (2002), 116--131.
[13]
Freitas, A., Oliveira, J. G., O'Riain, S., da Silva, J. C., and Curry, E. Querying linked data graphs using semantic relatedness: A vocabulary independent approach. Data & Knowledge Engineering 88, 0 (2013), 126--141.
[14]
Gabrilovich, E., and Markovitch, S. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In IJCAI '07 (Hyderabad, India, 2007).
[15]
Gergle, D., Kraut, R. E., and Fussell, S. R. Action as language in a shared visual space. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, CSCW '04, ACM (New York, NY, USA, 2004), 487--496.
[16]
Gergle, D., Millen, D. R., Kraut, R. E., and Fussell, S. R. Persistence matters: Making the most of chat in tightly-coupled work. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '04, ACM (New York, NY, USA, 2004), 431--438.
[17]
Grieser, K., Baldwin, T., Bohnert, F., and Sonenberg, L. Using ontological and document similarity to estimate museum exhibit relatedness. 10:1--10:20.
[18]
Halawi, G., Dror, G., Gabrilovich, E., and Koren, Y. Large-scale learning of word relatedness with constraints. In KDD '12, ACM (New York, NY, USA, 2012), 1406--1414.
[19]
Hecht, B., Carton, S. H., Quaderi, M., Schöning, J., Raubal, M., Gergle, D., and Downey, D. Explanatory semantic relatedness and explicit spatialization for exploratory search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, ACM (2012), 415--424.
[20]
Hecht, B., and Gergle, D. The tower of babel meets web 2.0: User-generated content and its applications in a multilingual context. In CHI '10, ACM (Atlanta, GA, 2010), 291--300. ACM ID: 1753370.
[21]
Heer, J., and Bostock, M. Crowdsourcing graphical perception: using mechanical turk to assess visualization design. In CHI '10 (2010), 203--212.
[22]
Ipeirotis, P. G. Demographics of mechanical turk.
[23]
Kittur, A., Chi, E. H., and Suh, B. What's in wikipedia?: Mapping topics and conflict using socially annotated category structure. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '09, ACM (New York, NY, USA, 2009), 1509--1512.
[24]
Liesaputra, V., and Witten, I. H. Realistic electronic books. International Journal of Human-Computer Studies 70, 9 (Sept. 2012), 588--610.
[25]
Miller, G. A., and Charles, W. G. Contextual correlates of semantic similarity. 1--28.
[26]
Milne, D., and Witten, I. H. Learning to link with wikipedia. In CIKM '08 (Napa Valley, California, USA, 2008), 509--518. ACM ID: 1458150.
[27]
Mooney, C. Z., Duval, R. D., and Duvall, R. Bootstrapping: A nonparametric approach to statistical inference. Sage, 1993.
[28]
Patwardhan, S., Banerjee, S., and Pedersen, T. Using measures of semantic relatedness for word sense disambiguation. In Computational Linguistics and Intelligent Text Processing, A. Gelbukh, Ed. Springer Berlin Heidelberg, Jan. 2003, 241--257.
[29]
Pavlick, E., Post, M., Irvine, A., Kachaev, D., and Callison-Burch, C. The language demographics of amazon mechanical turk. Transactions of the Association for Computational Linguistics 2 (2014), 79--92.
[30]
Pedersen, T., Pakhomov, S. V., Patwardhan, S., and Chute, C. G. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics 40, 3 (2006), 288--299.
[31]
Pirró, G., and Seco, N. Design, implementation and evaluation of a new semantic similarity metric combining features and intrinsic information content. In On the Move to Meaningful Internet Systems: OTM 2008, R. Meersman and Z. Tari, Eds., no. 5332 in Lecture Notes in Computer Science. Springer Berlin Heidelberg, Jan. 2008, 1271--1288.
[32]
Ponzetto, S. P., and Strube, M. Exploiting semantic role labeling, WordNet and wikipedia for coreference resolution. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (2006), 192--199.
[33]
Popescu, A., and Grefenstette, G. Mining user home location and gender from flickr tags. In ICWSM '10 (2010).
[34]
Radinsky, K., Agichtein, E., Gabrilovich, E., and Markovitch, S. A word at a time: Computing word relatedness using temporal semantic analysis. In WWW '11 (Hyderabad, India, 2011), 337--346.
[35]
Resnick, P. Using information content to evaluate semantic similarity in a taxonomy. In IJCAI '95 (Montreal, Quebec, Canada, 1995), 448--453.
[36]
Rubenstein, H., and Goodenough, J. B. Contextual correlates of synonymy. Communications of the ACM 8, 10 (Oct. 1965), 627--633.
[37]
Schöning, J., Hecht, B., Raubal, M., Krüger, A., Marsh, M., and Rohs, M. Improving interaction with virtual globes through spatial thinking: Helping users ask "Why?". In IUI '08 (Maspalomas, Gran Canaria, Spain, 2008), 129--138.
[38]
Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. Y. Cheap and fast - but is it good?: evaluating non-expert annotations for natural language tasks. In EMNLP '08 (2008), 254--263.
[39]
Strube, M., and Ponzetto, S. P. WikiRelate! computing semantic relatedness using wikipedia. In AAAI '06 (Boston, MA, 2006), 1419--1424.
[40]
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. Lexicon-based methods for sentiment analysis. Computational linguistics 37, 2 (2011), 267--307.
[41]
Witten, I., and Milne, D. An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In Proceeding of AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, AAAI Press, Chicago, USA (2008), 25--30.
[42]
Zesch, T., and Gurevych, I. Wisdom of crowds versus wisdom of linguists - measuring the semantic relatedness of words. Natural Language Engineering 16, 1 (2010), 25.

Cited By

  • (2024) Are We Asking the Right Questions?: Designing for Community Stakeholders’ Interactions with AI in Policing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3613904.3642738. Online publication date: 11-May-2024
  • (2024) Trustworthy human computation: a survey. Artificial Intelligence Review 57:12. DOI: 10.1007/s10462-024-10974-1. Online publication date: 12-Oct-2024
  • (2023) Diverse Perspectives Can Mitigate Political Bias in Crowdsourced Content Moderation. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1280-1291. DOI: 10.1145/3593013.3594080. Online publication date: 12-Jun-2023


    Published In

    CSCW '15: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing
    February 2015
    1956 pages
    ISBN:9781450329224
    DOI:10.1145/2675133
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. amazon mechanical turk
    2. cultural communities
    3. gold standard datasets
    4. natural language processing
    5. semantic relatedness
    6. user studies

    Qualifiers

    • Research-article

    Conference

    CSCW '15

    Acceptance Rates

    CSCW '15 Paper Acceptance Rate 161 of 575 submissions, 28%;
    Overall Acceptance Rate 2,235 of 8,521 submissions, 26%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 25
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 17 Oct 2024

    Citations

    Cited By

    • (2022) AI Ethics - A Bird’s Eye View. Applied Sciences 12:9, 4130. DOI: 10.3390/app12094130. Online publication date: 20-Apr-2022
    • (2022) In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers. Frontiers in Artificial Intelligence 5. DOI: 10.3389/frai.2022.828187. Online publication date: 18-May-2022
    • (2022) Diversity in sociotechnical machine learning systems. Big Data & Society 9:1. DOI: 10.1177/20539517221082027. Online publication date: 29-Mar-2022
    • (2022) Documenting Data Production Processes. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2, 1-34. DOI: 10.1145/3555623. Online publication date: 11-Nov-2022
    • (2022) CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2342-2351. DOI: 10.1145/3531146.3534647. Online publication date: 21-Jun-2022
    • (2022) One Rating to Rule Them All? Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 768-779. DOI: 10.1145/3511808.3557410. Online publication date: 17-Oct-2022
    • (2022) A Labeling Task Design for Supporting Recent Algorithmic Needs. 2022 IEEE International Conference on Big Data (Big Data), 2689-2698. DOI: 10.1109/BigData55660.2022.10020415. Online publication date: 17-Dec-2022
