DOI: 10.1145/3534678.3539055
Research article, Open access

An Online Multi-task Learning Framework for Google Feed Ads Auction Models

Published: 14 August 2022

Abstract

In this paper, we introduce a large-scale online multi-task deep learning framework for modeling multiple feed ads auction prediction tasks on an industry-scale feed ads recommendation platform. Multiple prediction tasks are combined into a single model that is continuously trained on real-time new ads data. Running multi-task ads auction models in real time raises many practical challenges: each task may be trained on a different set of training data; the labels of different tasks may arrive at different times due to label delay; different tasks interact with each other; and combining the losses of the individual tasks is non-trivial. We tackle these challenges with practical and novel techniques such as multi-stage training to handle label delay, Multi-gate Mixture-of-Experts (MMoE) to optimize task interaction, and an auto-parameter learning algorithm to optimize the loss weights of the different tasks. We demonstrate that our proposed techniques lead to quality improvements and substantial resource savings compared to modeling each task independently.
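The MMoE architecture named in the abstract is a published design (Ma et al., KDD 2018): a shared pool of expert networks is mixed per task by a task-specific softmax gate, and each task reads its mixture through its own output tower. The NumPy sketch below shows only the forward pass of that general architecture; the layer sizes, two-task setup, and single-layer experts are hypothetical illustrations, not the paper's production model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 3 shared experts, 2 tasks (e.g. click and conversion).
d_in, d_expert, n_experts, n_tasks = 8, 4, 3, 2

W_experts = rng.normal(size=(n_experts, d_in, d_expert))  # one-layer expert nets
W_gates   = rng.normal(size=(n_tasks, d_in, n_experts))   # one softmax gate per task
W_towers  = rng.normal(size=(n_tasks, d_expert, 1))       # per-task output tower

def mmoe_forward(x):
    """x: (batch, d_in) -> list of per-task probability predictions."""
    # All tasks share the same expert outputs: (batch, n_experts, d_expert).
    expert_out = np.tanh(np.einsum('bi,eio->beo', x, W_experts))
    preds = []
    for t in range(n_tasks):
        gate = softmax(x @ W_gates[t], axis=-1)            # (batch, n_experts)
        mixed = np.einsum('be,beo->bo', gate, expert_out)  # task-specific mixture
        preds.append(1.0 / (1.0 + np.exp(-(mixed @ W_towers[t]))))  # sigmoid tower
    return preds

x = rng.normal(size=(5, d_in))
click_pred, conv_pred = mmoe_forward(x)
print(click_pred.shape, conv_pred.shape)  # (5, 1) (5, 1)
```

The key property, relative to fully shared layers, is that the gates let each task weight the experts differently, so tasks can share representation where it helps and diverge where it hurts.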

Supplemental Material

MP4 File
We propose a large-scale online multi-task learning framework for feed ads auction and address the challenges unique to this online multi-tasking setting for online advertising with novel and practical techniques: multi-stage training to handle the differing label delays of different ads tasks, Multi-gate Mixture-of-Experts (MMoE) to optimize task interaction, and an auto-parameter learning algorithm to optimize the loss weights of the different ads tasks. These techniques led to a series of successful launches on a real industry-scale feed ads platform. The framework can also be scaled to take on more ads auction tasks, paving the way toward a practical online multi-task learning framework for online advertising.
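The details of the auto-parameter algorithm for loss weights are not given on this page. As one illustration of the general idea of learning loss weights rather than hand-tuning them, the sketch below uses homoscedastic-uncertainty weighting (Kendall et al., CVPR 2018), where each task t gets a learnable parameter s_t and the combined loss is sum_t [exp(-s_t) * L_t + s_t]. This is a stand-in technique, not necessarily the authors' method, and all numbers are hypothetical.

```python
import numpy as np

# Illustrative per-task losses from one training step (e.g. click, conversion).
task_losses = np.array([0.9, 0.3])
log_vars = np.zeros(2)  # learnable s_t = log(sigma_t^2), one per task
lr = 0.1

def combined_loss(losses, s):
    # L = sum_t exp(-s_t) * L_t + s_t  (uncertainty-based loss weighting)
    return float(np.sum(np.exp(-s) * losses + s))

for _ in range(200):
    # d/ds_t [exp(-s_t) * L_t + s_t] = -exp(-s_t) * L_t + 1
    grad = -np.exp(-log_vars) * task_losses + 1.0
    log_vars -= lr * grad

# At the optimum exp(-s_t) = 1 / L_t, so higher-loss (noisier) tasks
# automatically receive smaller weights.
weights = np.exp(-log_vars)
print(weights)  # approximately [1.11, 3.33]
```

The practical appeal is the same as in the paper's setting: when tasks with very different loss scales and label-arrival patterns share one model, a learned weighting replaces a grid search over manually chosen loss coefficients.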



Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN:9781450393850
DOI:10.1145/3534678
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. computational advertising
  2. multi-task learning
  3. online advertising
  4. recommender systems

Conference

KDD '22

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (Last 12 months)1,222
  • Downloads (Last 6 weeks)143
Reflects downloads up to 22 Sep 2024

Cited By
  • (2024) Self-supervised cognitive learning for multifaced interest in large-scale industrial recommender systems. Information Sciences, article 121338. https://doi.org/10.1016/j.ins.2024.121338. Online publication date: Aug-2024.
  • (2023) Consumer’s Attitude towards Display Google Ads. Future Internet 15(4), 145. https://doi.org/10.3390/fi15040145. Online publication date: 7-Apr-2023.
  • (2023) Workshop on Learning and Evaluating Recommendations with Impressions (LERI). In Proceedings of the 17th ACM Conference on Recommender Systems, 1248-1251. https://doi.org/10.1145/3604915.3608756. Online publication date: 14-Sep-2023.
  • (2023) Rec4Ad: A Free Lunch to Mitigate Sample Selection Bias for Ads CTR Prediction in Taobao. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4574-4580. https://doi.org/10.1145/3583780.3615496. Online publication date: 21-Oct-2023.
  • (2023) COPR: Consistency-Oriented Pre-Ranking for Online Advertising. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4974-4980. https://doi.org/10.1145/3583780.3615465. Online publication date: 21-Oct-2023.
  • (2023) Multi-domain Recommendation with Embedding Disentangling and Domain Alignment. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1917-1927. https://doi.org/10.1145/3583780.3614977. Online publication date: 21-Oct-2023.
  • (2023) Optimizing Airbnb Search Journey with Multi-task Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4872-4881. https://doi.org/10.1145/3580305.3599881. Online publication date: 6-Aug-2023.
  • (2023) Entity-aware Multi-task Learning for Query Understanding at Walmart. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4733-4742. https://doi.org/10.1145/3580305.3599816. Online publication date: 6-Aug-2023.
