research-article

Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model

Authors: Pavel Metrikov, Javed A. Aslam

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Pages 1391 - 1400

https://doi.org/10.1145/2806416.2806492

Published: 17 October 2015

Abstract

Existing approaches to training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To use such assessments for either evaluation or learning, we propose a new framework for inferring true document relevance from crowdsourced data, one that is simpler than previous approaches and achieves better performance. For each assessor, we model quality and bias as Gaussian-distributed class conditionals of relevance grades. For each document, we model true relevance and difficulty as continuous variables. We estimate all parameters from crowdsourced data, demonstrating better inference of relevance as well as realistic models for both documents and assessors.
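As a rough illustration of the modeling idea in the paragraph above, the following minimal sketch computes the probability that an assessor assigns each ordinal grade to a document. It is not the paper's formulation: the uniform prior over grades, the treatment of document difficulty as extra standard deviation on the latent relevance, and the names `normal_pdf` and `grade_probabilities` are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def grade_probabilities(relevance, difficulty, grade_means, grade_stds):
    """Probability of each ordinal grade for one (assessor, document) pair.

    Assumptions of this sketch (not taken from the paper):
      * each grade g of the assessor has a Gaussian class conditional
        N(grade_means[g], grade_stds[g]^2) over latent relevance;
      * the document's latent relevance is N(relevance, difficulty^2),
        so difficulty simply widens every class conditional;
      * the prior over grades is uniform.
    """
    weights = [
        normal_pdf(relevance, m, s * s + difficulty * difficulty)
        for m, s in zip(grade_means, grade_stds)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical assessor with three grades (0, 1, 2) centred at 0.0, 1.0, 2.0.
means = [0.0, 1.0, 2.0]
stds = [0.5, 0.5, 0.5]

easy = grade_probabilities(relevance=2.0, difficulty=0.0,
                           grade_means=means, grade_stds=stds)
hard = grade_probabilities(relevance=2.0, difficulty=2.0,
                           grade_means=means, grade_stds=stds)
print(easy)  # mass concentrated on the top grade
print(hard)  # same relevance, but a harder document spreads the mass
```

Under these assumptions, shifting an assessor's grade means models bias, widening the grade standard deviations models lower quality, and a larger document difficulty makes all grades more nearly equally likely.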

A document-pair likelihood model works best, and we extend it to pairwise learning to rank. Because it uses more information directly from the input data, the extended model outperforms existing state-of-the-art approaches for learning to rank from crowdsourced assessments. Experimental validation is performed on four TREC datasets.
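The abstract does not spell out the document-pair likelihood, so the following is only a hedged sketch of what a pairwise likelihood over latent relevance could look like. It assumes each document's latent relevance is an independent Gaussian whose standard deviation plays the role of difficulty, and scores a preference by the probability that one draw exceeds the other. The function names and the probit form are assumptions of this example, not the authors' model.

```python
import math


def preference_probability(rel_a, diff_a, rel_b, diff_b):
    """P(latent relevance of A > latent relevance of B).

    Assumes independent Gaussians N(rel_a, diff_a^2) and N(rel_b, diff_b^2);
    their difference is Gaussian, so the probability is a normal CDF.
    """
    total_var = diff_a ** 2 + diff_b ** 2
    if total_var <= 0.0:
        raise ValueError("at least one difficulty must be positive")
    z = (rel_a - rel_b) / math.sqrt(total_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pair_log_likelihood(pairs, relevance, difficulty):
    """Log-likelihood of observed (preferred, other) document pairs.

    `relevance` and `difficulty` are dicts keyed by hypothetical document ids.
    Probabilities are clamped away from zero to keep the logarithm finite.
    """
    total = 0.0
    for preferred, other in pairs:
        p = preference_probability(
            relevance[preferred], difficulty[preferred],
            relevance[other], difficulty[other],
        )
        total += math.log(max(p, 1e-300))
    return total


# Hypothetical parameters for two documents.
relevance = {"a": 2.0, "b": 0.0}
difficulty = {"a": 1.0, "b": 1.0}

ll_consistent = pair_log_likelihood([("a", "b")], relevance, difficulty)
ll_reversed = pair_log_likelihood([("b", "a")], relevance, difficulty)
print(ll_consistent, ll_reversed)
```

A pairwise learning-to-rank objective of this kind would weight each training pair by how confidently the aggregated assessments order it, rather than committing to a single hard label per document.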

Cited By

L. Yin, Y. Liu, W. Zhang, and Y. Yu. Truth Inference With a Deep Clustering-Based Aggregation Model. IEEE Access, 8:16662-16675, 2020. Online publication date: 2020. https://doi.org/10.1109/ACCESS.2020.2964484

L. Yin, J. Han, W. Zhang, and Y. Yu. Aggregating crowd wisdoms with label-aware autoencoders. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1325-1331, 2017. Online publication date: 19-Aug-2017. https://dl.acm.org/doi/10.5555/3171642.3171830

A. Ruiz, O. Martinez, X. Binefa, and F. Sukno. Fusion of Valence and Arousal Annotations through Dynamic Subjective Ordinal Modelling. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 331-338, 2017. Online publication date: 30-May-2017. https://dl.acm.org/doi/10.1109/FG.2017.48

Index Terms

1. Information systems
  1. Information retrieval

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

October 2015

1998 pages

ISBN:9781450337946

DOI:10.1145/2806416

General Chairs: James Bailey (The University of Melbourne), Alistair Moffat (The University of Melbourne)

Program Chairs: Charu C. Aggarwal (IBM), Maarten de Rijke (University of Amsterdam), Ravi Kumar (Google), Vanessa Murdock (Microsoft), Timos Sellis (RMIT University), Jeffrey Xu Yu (Chinese University of Hong Kong)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

CIKM'15

CIKM'15: 24th ACM International Conference on Information and Knowledge Management

October 18 - 23, 2015

Melbourne, Australia

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total citations: 3
Total downloads: 279
Downloads (last 12 months): 4
Downloads (last 6 weeks): 0

Reflects downloads up to 25 Dec 2024
