DOI: 10.1145/3534678.3539055
Research article, Open access

An Online Multi-task Learning Framework for Google Feed Ads Auction Models

Published: 14 August 2022

Abstract

In this paper, we introduce a large-scale online multi-task deep learning framework for modeling multiple feed ads auction prediction tasks on an industry-scale feed ads recommendation platform. Multiple prediction tasks are combined into a single model that is continuously trained on real-time new ads data. Running multi-task ads auction models in real time raises many practical challenges: each task may be trained on a different set of training data; the labels of different tasks may arrive at different times due to label delay; different tasks interact with each other; and combining the losses of the individual tasks is non-trivial. We tackle these challenges with practical and novel techniques such as multi-stage training to handle label delay, Multi-gate Mixture-of-Experts (MMoE) to optimize task interaction, and an auto-parameter learning algorithm to optimize the loss weights of the different tasks. We demonstrate that our proposed techniques lead to quality improvements and substantial resource savings compared to modeling each task independently.
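The MMoE architecture named in the abstract is a published design (Ma et al., KDD 2018): a shared pool of expert networks is mixed per task by a task-specific softmax gate, and each task reads its mixture through its own output tower. The NumPy sketch below shows only the forward pass of that general architecture; the layer sizes, two-task setup, and single-layer experts are hypothetical illustrations, not the paper's production model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 3 shared experts, 2 tasks (e.g. click and conversion).
d_in, d_expert, n_experts, n_tasks = 8, 4, 3, 2

W_experts = rng.normal(size=(n_experts, d_in, d_expert))  # one-layer expert nets
W_gates   = rng.normal(size=(n_tasks, d_in, n_experts))   # one softmax gate per task
W_towers  = rng.normal(size=(n_tasks, d_expert, 1))       # per-task output tower

def mmoe_forward(x):
    """x: (batch, d_in) -> list of per-task probability predictions."""
    # All tasks share the same expert outputs: (batch, n_experts, d_expert).
    expert_out = np.tanh(np.einsum('bi,eio->beo', x, W_experts))
    preds = []
    for t in range(n_tasks):
        gate = softmax(x @ W_gates[t], axis=-1)            # (batch, n_experts)
        mixed = np.einsum('be,beo->bo', gate, expert_out)  # task-specific mixture
        preds.append(1.0 / (1.0 + np.exp(-(mixed @ W_towers[t]))))  # sigmoid tower
    return preds

x = rng.normal(size=(5, d_in))
click_pred, conv_pred = mmoe_forward(x)
print(click_pred.shape, conv_pred.shape)  # (5, 1) (5, 1)
```

The key property, relative to fully shared layers, is that the gates let each task weight the experts differently, so tasks can share representation where it helps and diverge where it hurts.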

Supplemental Material

MP4 File
We propose a large-scale online multi-task learning framework for feed ads auction and address the challenges unique to this online multi-tasking setting for online advertising with novel and practical techniques: multi-stage training to handle the differing label delays of different ads tasks, Multi-gate Mixture-of-Experts (MMoE) to optimize task interaction, and an auto-parameter learning algorithm to optimize the loss weights of the different ads tasks. These techniques led to a series of successful launches on a real industry-scale feed ads platform. The framework can also be scaled to take on more ads auction tasks, paving the way toward a practical online multi-task learning framework for online advertising.
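The details of the auto-parameter algorithm for loss weights are not given on this page. As one illustration of the general idea of learning loss weights rather than hand-tuning them, the sketch below uses homoscedastic-uncertainty weighting (Kendall et al., CVPR 2018), where each task t gets a learnable parameter s_t and the combined loss is sum_t [exp(-s_t) * L_t + s_t]. This is a stand-in technique, not necessarily the authors' method, and all numbers are hypothetical.

```python
import numpy as np

# Illustrative per-task losses from one training step (e.g. click, conversion).
task_losses = np.array([0.9, 0.3])
log_vars = np.zeros(2)  # learnable s_t = log(sigma_t^2), one per task
lr = 0.1

def combined_loss(losses, s):
    # L = sum_t exp(-s_t) * L_t + s_t  (uncertainty-based loss weighting)
    return float(np.sum(np.exp(-s) * losses + s))

for _ in range(200):
    # d/ds_t [exp(-s_t) * L_t + s_t] = -exp(-s_t) * L_t + 1
    grad = -np.exp(-log_vars) * task_losses + 1.0
    log_vars -= lr * grad

# At the optimum exp(-s_t) = 1 / L_t, so higher-loss (noisier) tasks
# automatically receive smaller weights.
weights = np.exp(-log_vars)
print(weights)  # approximately [1.11, 3.33]
```

The practical appeal is the same as in the paper's setting: when tasks with very different loss scales and label-arrival patterns share one model, a learned weighting replaces a grid search over manually chosen loss coefficients.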



Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN:9781450393850
DOI:10.1145/3534678
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. computational advertising
  2. multi-task learning
  3. online advertising
  4. recommender systems

Conference

KDD '22

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (Last 12 months)1,222
  • Downloads (Last 6 weeks)143
Reflects downloads up to 22 Sep 2024

Cited By
  • (2024) Self-supervised cognitive learning for multifaced interest in large-scale industrial recommender systems. Information Sciences, article 121338. https://doi.org/10.1016/j.ins.2024.121338. Online publication date: Aug-2024.
  • (2023) Consumer’s Attitude towards Display Google Ads. Future Internet 15(4), 145. https://doi.org/10.3390/fi15040145. Online publication date: 7-Apr-2023.
  • (2023) Workshop on Learning and Evaluating Recommendations with Impressions (LERI). In Proceedings of the 17th ACM Conference on Recommender Systems, 1248-1251. https://doi.org/10.1145/3604915.3608756. Online publication date: 14-Sep-2023.
  • (2023) Rec4Ad: A Free Lunch to Mitigate Sample Selection Bias for Ads CTR Prediction in Taobao. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4574-4580. https://doi.org/10.1145/3583780.3615496. Online publication date: 21-Oct-2023.
  • (2023) COPR: Consistency-Oriented Pre-Ranking for Online Advertising. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4974-4980. https://doi.org/10.1145/3583780.3615465. Online publication date: 21-Oct-2023.
  • (2023) Multi-domain Recommendation with Embedding Disentangling and Domain Alignment. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1917-1927. https://doi.org/10.1145/3583780.3614977. Online publication date: 21-Oct-2023.
  • (2023) Optimizing Airbnb Search Journey with Multi-task Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4872-4881. https://doi.org/10.1145/3580305.3599881. Online publication date: 6-Aug-2023.
  • (2023) Entity-aware Multi-task Learning for Query Understanding at Walmart. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4733-4742. https://doi.org/10.1145/3580305.3599816. Online publication date: 6-Aug-2023.
