
Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model

Published: 17 October 2015

Abstract

Existing approaches to training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To use such assessments for either evaluation or learning, we propose a new framework for inferring true document relevance from crowdsourced data, one that is simpler than previous approaches and achieves better performance. For each assessor, we model assessor quality and bias in the form of Gaussian-distributed class conditionals of relevance grades. For each document, we model true relevance and difficulty as continuous variables. We estimate all parameters from crowdsourced data, demonstrating better inference of relevance as well as realistic models for both documents and assessors.
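The abstract does not give the model's equations, so the following is only a minimal sketch of one way Gaussian class conditionals could turn a document's latent relevance into a distribution over an assessor's grades. The function names, the uniform prior over grades, and the choice to let document difficulty widen every Gaussian are assumptions made for illustration, not the authors' formulation.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def grade_probabilities(relevance, difficulty, grade_means, grade_stds):
    """Probability of each grade for one assessor and one document.

    Each grade g has a Gaussian class conditional over latent relevance with
    mean grade_means[g] and standard deviation grade_stds[g] (assessor bias
    and quality). ASSUMPTIONS: a uniform prior over grades, and document
    difficulty acting as extra variance added to every class conditional.
    """
    weights = [
        gaussian_pdf(relevance, m, math.sqrt(s * s + difficulty * difficulty))
        for m, s in zip(grade_means, grade_stds)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical assessor with three grades centred at 0, 1 and 2.
easy = grade_probabilities(2.0, 0.0, [0.0, 1.0, 2.0], [0.5, 0.5, 0.5])
hard = grade_probabilities(2.0, 2.0, [0.0, 1.0, 2.0], [0.5, 0.5, 0.5])
```

Under these assumptions, a harder document yields a flatter grade distribution for the same latent relevance, which is one way difficulty could explain disagreement among assessors.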
A document-pair likelihood model works best, and we extend it to pairwise learning to rank. Because the extension uses more information directly from the input data, it performs better than existing state-of-the-art approaches for learning to rank from crowdsourced assessments. Experimental validation is performed on four TREC datasets.
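As a rough illustration of how a document-pair view can feed a pairwise ranker, the sketch below computes the probability that one document is more relevant than another from Gaussian relevance estimates and uses it as a soft target in a RankNet-style cross-entropy loss. This is a generic technique, not necessarily the paper's method; the function names, the independence of the two estimates, and the use of soft targets are all assumptions.

```python
import math


def preference_probability(mean_a, var_a, mean_b, var_b):
    """P(relevance_a > relevance_b) for independent Gaussian estimates.

    ASSUMPTION: var_a + var_b is strictly positive.
    """
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def soft_pairwise_loss(score_a, score_b, target):
    """Cross-entropy between target P(a > b) and sigmoid(score_a - score_b)."""
    diff = score_a - score_b
    # Numerically stable log(sigmoid(diff)).
    if diff >= 0:
        log_p = -math.log1p(math.exp(-diff))
    else:
        log_p = diff - math.log1p(math.exp(diff))
    # log(1 - sigmoid(diff)) equals log(sigmoid(diff)) - diff.
    log_q = log_p - diff
    return -(target * log_p + (1.0 - target) * log_q)


# Hypothetical pair: document a is estimated to be clearly more relevant.
target = preference_probability(2.0, 0.5, 0.0, 0.5)
agree = soft_pairwise_loss(2.0, 0.0, target)     # ranker orders a above b
disagree = soft_pairwise_loss(0.0, 2.0, target)  # ranker orders b above a
```

A soft target keeps the uncertainty of the aggregated labels in the loss, so a pair whose inferred order is doubtful contributes a weaker training signal than a confidently ordered pair.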


Cited By

  • (2020) Truth Inference With a Deep Clustering-Based Aggregation Model. IEEE Access 8, pages 16662-16675. DOI: 10.1109/ACCESS.2020.2964484. Online publication date: 2020.
  • (2017) Aggregating crowd wisdoms with label-aware autoencoders. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1325-1331. DOI: 10.5555/3171642.3171830. Online publication date: 19-Aug-2017.
  • (2017) Fusion of Valence and Arousal Annotations through Dynamic Subjective Ordinal Modelling. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 331-338. DOI: 10.1109/FG.2017.48. Online publication date: 30-May-2017.

    Published In

    CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
    October 2015
    1998 pages
    ISBN:9781450337946
    DOI:10.1145/2806416
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. assessor modeling
    2. crowdsourcing
    3. informativeness
    4. learning to rank
    5. ordinal label aggregation

    Qualifiers

    • Research-article

    Acceptance Rates

    CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
