Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems

Published: 01 April 2013

Abstract

In the field of information retrieval (IR), researchers and practitioners often face a demand for valid approaches to evaluating the performance of retrieval systems. The Cranfield experimental paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems and, at the same time, to investigate users' information-searching behaviours. The major drawbacks of laboratory-based user studies for evaluating IIR systems are the high monetary and temporal costs of setting up and running the experiments, the lack of heterogeneity in the user population, and the limited scale of the experiments, which usually involve a relatively small set of users. In this article, we propose an alternative experimental methodology to laboratory-based user studies. Our methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, it can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and it can therefore be used to assess the performance of IIR systems. We show the characteristic differences of our approach with respect to traditional IIR experimental and evaluation procedures, and we present a use case study comparing crowdsourcing-based and laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.
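
As a concrete illustration of the kind of interaction data such a methodology collects, the sketch below shows a minimal, hypothetical logger for query and click events issued by crowdsourced participants. It is not the authors' implementation; all names (InteractionEvent, SessionLog, the worker and task identifiers) are assumptions made for this example.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Dict, List


    @dataclass
    class InteractionEvent:
        """One logged action from a crowdsourced participant (hypothetical schema)."""
        worker_id: str    # anonymised crowd-worker identifier
        task_id: str      # identifier of the search task the worker is completing
        event_type: str   # e.g. "query" or "click"
        payload: str      # the query text or the clicked document id
        timestamp: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat()
        )


    class SessionLog:
        """Accumulates events so that searching behaviour can be analysed afterwards."""

        def __init__(self) -> None:
            self.events: List[InteractionEvent] = []

        def record(self, event: InteractionEvent) -> None:
            self.events.append(event)

        def queries_per_worker(self) -> Dict[str, int]:
            # Count how many queries each worker issued: one simple behavioural
            # measure a crowdsourced IIR study might compare across systems.
            counts: Dict[str, int] = {}
            for e in self.events:
                if e.event_type == "query":
                    counts[e.worker_id] = counts.get(e.worker_id, 0) + 1
            return counts


    if __name__ == "__main__":
        log = SessionLog()
        log.record(InteractionEvent("w01", "task-1", "query", "crowdsourcing IR evaluation"))
        log.record(InteractionEvent("w01", "task-1", "click", "doc-42"))
        print(log.queries_per_worker())  # -> {'w01': 1}

In a real deployment, events of this kind would be emitted by the search interface embedded in the crowdsourcing task and stored server-side for later behavioural analysis.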

Published In

Information Retrieval, Volume 16, Issue 2 (April 2013), 215 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 01 April 2013
Accepted: 26 June 2012
Received: 20 May 2011

Author Tags

1. Crowdsourcing evaluation
2. Interactive IR evaluation
3. Interactions

Qualifiers

• Research-article

Cited By

• (2024) Re-evaluating the Command-and-Control Paradigm in Conversational Search Interactions. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2260-2270. DOI: 10.1145/3627673.3679588. Online publication date: 21-Oct-2024.
• (2023) Predicting Crowd Workers Performance: An Information Quality Case. Web Engineering, 75-90. DOI: 10.1007/978-3-031-34444-2_6. Online publication date: 6-Jun-2023.
• (2022) Authentic versus synthetic. Journal of the Association for Information Science and Technology, 73(3), 362-375. DOI: 10.1002/asi.24554. Online publication date: 7-Feb-2022.
• (2021) The many dimensions of truthfulness. Information Processing and Management, 58(6). DOI: 10.1016/j.ipm.2021.102710. Online publication date: 1-Nov-2021.
• (2019) On Annotation Methodologies for Image Search Evaluation. ACM Transactions on Information Systems, 37(3), 1-32. DOI: 10.1145/3309994. Online publication date: 27-Mar-2019.
• (2019) All Those Wasted Hours. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 321-329. DOI: 10.1145/3289600.3291035. Online publication date: 30-Jan-2019.
• (2019) The impact of result diversification on search behaviour and performance. Information Retrieval, 22(5), 422-446. DOI: 10.1007/s10791-019-09353-0. Online publication date: 1-Oct-2019.
• (2019) Evaluating interactive bibliographic information retrieval systems. Proceedings of the Association for Information Science and Technology, 55(1), 628-637. DOI: 10.1002/pra2.2018.14505501068. Online publication date: 1-Feb-2019.
• (2018) Juggling with Information Sources, Task Type, and Information Quality. Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, 82-91. DOI: 10.1145/3176349.3176390. Online publication date: 1-Mar-2018.
• (2017) A Study of Snippet Length and Informativeness. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 135-144. DOI: 10.1145/3077136.3080824. Online publication date: 7-Aug-2017.
