Data Augmentation for Sample Efficient and Robust Document Ranking

Published: 29 April 2024

Abstract

Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine-tuning. In this article, we propose data-augmentation methods for effective and robust ranking performance. One of the key benefits of using data augmentation is in achieving sample efficiency or learning effectively when we have only a small amount of training data. We propose supervised and unsupervised data augmentation schemes by creating training data using parts of the relevant documents in the query-document pairs. We then adapt a family of contrastive losses for the document ranking task that can exploit the augmented data to learn an effective ranking model. Our extensive experiments on subsets of the MS MARCO and TREC-DL test sets show that data augmentation, along with the ranking-adapted contrastive losses, results in performance improvements under most dataset sizes. Apart from sample efficiency, we conclusively show that data augmentation results in robust models when transferred to out-of-domain benchmarks. Our performance improvements in in-domain and more prominently in out-of-domain benchmarks show that augmentation regularizes the ranking model and improves its robustness and generalization capability.
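The abstract describes the recipe only at a high level. Below is a minimal, hypothetical sketch (not the paper's implementation) of the two ingredients it names: creating extra positive examples from parts of a relevant document, and an InfoNCE-style contrastive loss adapted to ranking. All function and parameter names (make_augmented_positives, contrastive_ranking_loss, span_len, temperature) are illustrative assumptions.

```python
# Illustrative sketch only: augment a query's relevant document by sampling
# spans from it, then train with an InfoNCE-style contrastive loss that pulls
# the query toward its augmented relevant views and pushes it away from
# non-relevant documents. Names and hyperparameters are assumptions.

import random
import torch
import torch.nn.functional as F


def make_augmented_positives(doc_tokens, num_views=2, span_len=128):
    """Create augmented 'views' of a relevant document by sampling token spans."""
    views = []
    for _ in range(num_views):
        if len(doc_tokens) <= span_len:
            views.append(doc_tokens)
        else:
            start = random.randrange(0, len(doc_tokens) - span_len)
            views.append(doc_tokens[start:start + span_len])
    return views


def contrastive_ranking_loss(query_emb, pos_embs, neg_embs, temperature=0.05):
    """Average per-positive InfoNCE terms, as in supervised contrastive losses."""
    q = F.normalize(query_emb, dim=-1)    # (d,)   query embedding
    pos = F.normalize(pos_embs, dim=-1)   # (P, d) augmented relevant views
    neg = F.normalize(neg_embs, dim=-1)   # (N, d) non-relevant documents
    pos_scores = pos @ q / temperature    # (P,)
    neg_scores = neg @ q / temperature    # (N,)
    denom = torch.logsumexp(torch.cat([pos_scores, neg_scores]), dim=0)
    return (denom - pos_scores).mean()
```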


Cited By

  • (2023) An in-depth analysis of passage-level label transfer for contextual document ranking. Information Retrieval 26, 1-2. DOI: 10.1007/s10791-023-09430-5. Online publication date: 8-Dec-2023.

    Published In

    ACM Transactions on Information Systems, Volume 42, Issue 5
    September 2024, 809 pages
    EISSN: 1558-2868
    DOI: 10.1145/3618083
    Editor: Min Zhang

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 29 April 2024
    Online AM: 29 November 2023
    Accepted: 20 November 2023
    Revised: 15 August 2023
    Received: 01 March 2023
    Published in TOIS Volume 42, Issue 5


    Author Tags

    1. Information retrieval
    2. IR
    3. ranking
    4. document ranking
    5. contrastive loss
    6. data augmentation
    7. interpolation
    8. ranking performance

    Qualifiers

    • Research-article

    Funding Sources

    • European Union - Horizon 2020 Program
    • Integrating Activities for Advanced Communities
    • SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics
    • Science and Engineering Research Board, Department of Science and Technology, Government of India
    • DST-INSPIRE Faculty Fellowship

