DOI: 10.1145/3485447.3512081

UKD: Debiasing Conversion Rate Estimation via Uncertainty-regularized Knowledge Distillation

Published: 25 April 2022

Abstract

In online advertising, conventional post-click conversion rate (CVR) estimation models are trained on clicked samples only. During online serving, however, the models must estimate CVR for all impressed ads, which leads to the sample selection bias (SSB) issue. Intuitively, providing reliable supervision signals for unclicked ads is a feasible way to alleviate SSB. This paper proposes an uncertainty-regularized knowledge distillation (UKD) framework that debiases CVR estimation by distilling knowledge from unclicked ads. A teacher model learns click-adaptive representations and produces pseudo-conversion labels on unclicked ads as supervision signals. A student model is then trained on both clicked and unclicked ads with knowledge distillation, performing uncertainty modeling to alleviate the inherent noise in the pseudo-labels. Experiments on billion-scale datasets show that UKD outperforms previous debiasing methods, and online results verify that UKD achieves significant improvements.
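The teacher/student scheme described in the abstract can be sketched as a per-sample loss: clicked samples use the observed conversion label, while unclicked samples use the teacher's pseudo-conversion label, down-weighted by the teacher's uncertainty. The sketch below is an illustrative assumption, not the paper's exact formulation; in particular the `exp(-u)` weighting with a `+u` regularizer is one common way to realize uncertainty-weighted supervision, and all function names are hypothetical.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def ukd_loss(student_p, clicked, label, uncertainty=0.0):
    """Per-sample student loss under an uncertainty-weighted distillation scheme.

    - Clicked sample: `label` is the observed conversion label (0 or 1).
    - Unclicked sample: `label` is the teacher's pseudo-conversion probability;
      `uncertainty` (u >= 0) damps that supervision via exp(-u), and the +u term
      prevents the model from escaping supervision by claiming huge uncertainty.
    """
    if clicked:
        return bce(student_p, label)
    return math.exp(-uncertainty) * bce(student_p, label) + uncertainty
```

With this form, a noisy pseudo-label (high `u`) contributes little gradient from its distillation term, which matches the abstract's goal of alleviating the inherent noise in pseudo-labels.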




        Published In

        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages
        ISBN:9781450390965
        DOI:10.1145/3485447
        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. CVR Estimation
        2. Debiasing
        3. Distillation
        4. Uncertainty Modeling

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '22
        Sponsor:
        WWW '22: The ACM Web Conference 2022
        April 25 - 29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Article Metrics

        • Downloads (Last 12 months)38
        • Downloads (Last 6 weeks)3
        Reflects downloads up to 08 Feb 2025

        Cited By

        • (2024) Discrepancy and uncertainty aware denoising knowledge distillation for zero-shot cross-lingual named entity recognition. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 18056-18064. DOI: 10.1609/aaai.v38i16.29762. Online publication date: 20-Feb-2024.
        • (2024) Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models. Proceedings of the 18th ACM Conference on Recommender Systems, 507-517. DOI: 10.1145/3640457.3688118. Online publication date: 8-Oct-2024.
        • (2024) Privacy Preserving Conversion Modeling in Data Clean Room. Proceedings of the 18th ACM Conference on Recommender Systems, 819-822. DOI: 10.1145/3640457.3688054. Online publication date: 8-Oct-2024.
        • (2024) SAQRec: Aligning Recommender Systems to User Satisfaction via Questionnaire Feedback. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3165-3175. DOI: 10.1145/3627673.3679643. Online publication date: 21-Oct-2024.
        • (2024) DDPO: Direct Dual Propensity Optimization for Post-Click Conversion Rate Estimation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1179-1188. DOI: 10.1145/3626772.3657817. Online publication date: 10-Jul-2024.
        • (2024) Adversarial-Enhanced Causal Multi-Task Framework for Debiasing Post-Click Conversion Rate Estimation. Proceedings of the ACM Web Conference 2024, 3287-3296. DOI: 10.1145/3589334.3645379. Online publication date: 13-May-2024.
        • (2024) Den-ML: Multi-source cross-lingual transfer via denoising mutual learning. Information Processing & Management 61(6), 103834. DOI: 10.1016/j.ipm.2024.103834. Online publication date: Nov-2024.
        • (2024) Click-through conversion rate prediction model of book e-commerce platform based on feature combination and representation. Expert Systems with Applications 238, 122276. DOI: 10.1016/j.eswa.2023.122276. Online publication date: Mar-2024.
        • (2023) Entire Space Cascade Delayed Feedback Modeling for Effective Conversion Rate Prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4981-4987. DOI: 10.1145/3583780.3615475. Online publication date: 21-Oct-2023.
        • (2023) Dually Enhanced Delayed Feedback Modeling for Streaming Conversion Rate Prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 390-399. DOI: 10.1145/3583780.3614856. Online publication date: 21-Oct-2023.
        • Show More Cited By
