Scalable hands-free transfer learning for online advertising

B Dalessandro, D Chen, T Raeder, C Perlich… - Proceedings of the 20th …, 2014 - dl.acm.org
Proceedings of the 20th ACM SIGKDD international conference on Knowledge …, 2014 - dl.acm.org
Internet display advertising is a critical revenue source for publishers and online content providers, and is supported by massive amounts of user and publisher data. Targeting display ads can be improved substantially with machine learning methods, but building many models on massive data becomes prohibitively expensive computationally. This paper presents a combination of strategies, deployed by the online advertising firm Dstillery, for learning many models from extremely high-dimensional data efficiently and without human intervention. This combination includes: (i) a method for simple-yet-effective transfer learning, where a model learned from data that is relatively abundant and cheap is taken as a prior for Bayesian logistic regression trained with stochastic gradient descent (SGD) on the more expensive target data; and (ii) a new update rule for automatic learning rate adaptation, to support learning from sparse, high-dimensional data, as well as its integration with adaptive regularization. We present an experimental analysis across 100 different ad campaigns, showing that the transfer learning indeed improves performance across a large number of them, especially at the start of the campaigns. The combined "hands-free" method needs no fiddling with the SGD learning rate, and we show that it is just as effective as using expensive grid search to set the regularization parameter for each campaign.
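
The transfer-learning idea in (i) can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the prior is a Gaussian centered on the source-model weights with a single precision `lam`, so the MAP objective is the log-loss plus (lam/2)·||w − prior_mean||². The function name `sgd_with_prior`, the sparse-dict feature format, and all default values are hypothetical. The step size here is a fixed constant; the adaptive learning rate and adaptive regularization from (ii) are not reproduced.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_with_prior(examples, prior_mean, lam=0.1, lr=0.1, epochs=20, seed=0):
    """Logistic regression fit by plain SGD, regularized toward prior_mean.

    examples:   list of (features, label) pairs, where features is a sparse
                dict {index: value} and label is 0 or 1 (assumed format).
    prior_mean: weights of the source model, used as the prior mean.
    lam:        assumed prior precision (regularization strength).
    """
    rng = random.Random(seed)
    w = list(prior_mean)  # start the target model at the source weights
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = examples[i]
            p = sigmoid(sum(w[j] * v for j, v in x.items()))
            for j, v in x.items():
                # Log-loss gradient plus a pull back toward the prior mean.
                # Simplification: the pull is applied only to features that
                # are active in this example, to keep the update sparse.
                w[j] -= lr * ((p - y) * v + lam * (w[j] - prior_mean[j]))
    return w


# Usage: feature 1 never appears in the target data, so its weight stays
# at the source-model value; feature 0 is updated from the target labels.
source_weights = [0.0, 2.0]
target_data = [({0: 1.0}, 1)] * 5
print(sgd_with_prior(target_data, source_weights))
```

With little target data the weights stay close to the source model, and they move away from it as target examples accumulate, which is consistent with the abstract's observation that the benefit is largest at the start of a campaign.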