DOI: 10.1145/3234944.3234968
Research Article

On the Theory of Weak Supervision for Information Retrieval

Published: 10 September 2018

Abstract

Neural network approaches have recently been shown to be effective in several information retrieval (IR) tasks. However, neural approaches often require large volumes of training data to perform well, and such data are not always available. To mitigate the shortage of labeled data, training neural IR models with weak supervision has recently been proposed and has received considerable attention in the literature. In weak supervision, an existing model automatically generates labels for a large set of unlabeled data, and a machine learning model is then trained on the generated "weak" data. Surprisingly, prior work has shown that the trained neural model can outperform the weak labeler by a significant margin. Although these improvements have been intuitively justified in previous work, the literature still lacks a theoretical justification for the observed empirical findings. In this paper, we provide theoretical insight into weak supervision for information retrieval, focusing on learning to rank. We model the weak supervision signal as a noisy channel that introduces noise into the correct ranking. Based on the risk minimization framework, we prove that, given some sufficient constraints on the loss function, weak supervision is equivalent to supervised learning under uniform noise. We also derive an upper bound on the empirical risk of weak supervision in the case of non-uniform noise. Following recent work on using multiple weak supervision signals to learn more accurate models, we establish an information-theoretic lower bound on the number of weak supervision signals required to guarantee an upper bound on the pairwise error probability. We empirically verify a set of the presented theoretical findings using synthetic and real weak supervision data.
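To make the uniform-noise result concrete, the following is a minimal sketch in standard pairwise risk-minimization notation. The symbols (a scoring function f, a flip probability \eta, a constant C) and the symmetry condition are assumptions based on the abstract and on the general literature on symmetric losses under label noise, not necessarily the paper's exact formulation.

Suppose the weak labeler flips the true preference of a document pair (d_1, d_2) for query q with probability \eta < 1/2, independently of the pair (a uniform noisy channel). Write the clean and weak pairwise risks of a scoring function f as

    R_{\mathcal{L}}(f) = \mathbb{E}_{(q, d_1, d_2, y)}\big[\mathcal{L}(f; q, d_1, d_2, y)\big], \qquad R^{\eta}_{\mathcal{L}}(f) = \mathbb{E}\big[\mathcal{L}(f; q, d_1, d_2, \tilde{y})\big],

where \tilde{y} is the weak preference label. If the pairwise loss is symmetric, i.e.

    \mathcal{L}(f; q, d_1, d_2, +1) + \mathcal{L}(f; q, d_1, d_2, -1) = C \quad \text{for all } f, q, d_1, d_2,

then conditioning on whether the label was flipped gives

    R^{\eta}_{\mathcal{L}}(f) = (1 - \eta)\, R_{\mathcal{L}}(f) + \eta\,\big(C - R_{\mathcal{L}}(f)\big) = (1 - 2\eta)\, R_{\mathcal{L}}(f) + \eta C.

Since \eta and C do not depend on f, the weak and clean risks are minimized by the same ranker, which is one way to read the claimed equivalence between weak supervision and supervised learning under uniform noise.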



Published In

ICTIR '18: Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval
September 2018
238 pages
ISBN: 9781450356565
DOI: 10.1145/3234944

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. information theory
  2. learning to rank
  3. noisy data
  4. risk minimization
  5. theoretical analysis
  6. weak supervision


Conference

ICTIR '18
Acceptance Rates

ICTIR '18 Paper Acceptance Rate: 19 of 47 submissions, 40%
Overall Acceptance Rate: 235 of 527 submissions, 45%


