
Multi-Task Learning with Sequential Dependence Toward Industrial Applications: A Systematic Formulation

Published: 28 February 2024

Abstract

Multi-task learning (MTL) is widely used in online recommendation and financial services for multi-step conversion estimation, but existing work often overlooks the sequential dependence among tasks. In particular, sequential dependence multi-task learning (SDMTL) struggles to handle complex task correlations and to extract valuable information in real-world scenarios, leading to negative transfer and degraded performance. Herein, a systematic learning paradigm for the SDMTL problem is established for the first time; it applies to general multi-step conversion scenarios with longer conversion paths or varied task dependence relationships. Meanwhile, an SDMTL architecture, named Task-Aware Feature Extraction (TAFE), is designed to enable dynamic task representation learning from a sample-wise view. TAFE selectively reconstructs the implicit shared information for each sample and performs explicit task-specific extraction under dependence constraints, avoiding negative transfer and yielding more effective information sharing and joint representation learning. Extensive experimental results demonstrate the effectiveness and applicability of the proposed theoretical and implementation frameworks. Furthermore, online evaluations at MYbank showed that TAFE achieved average improvements of 9.22% and 3.76% across various scenarios on the post-view click-through & conversion rate (CTCVR) estimation task. TAFE is currently deployed on an online platform serving various traffic services.
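
To make the abstract's mechanism concrete, the sketch below shows one plausible shape for a sequential-dependence multi-task model: shared experts whose outputs are recombined per sample by task-specific gates (the "sample-wise view"), with two conversion-path heads chained so that the joint CTCVR probability is the product of the click probability and the conditional conversion probability. This is an illustrative PyTorch sketch, not TAFE's actual implementation; every module name, dimension, and the two-step path (click, then conversion) is an assumption for exposition.

# Illustrative sketch only: a sample-wise gated multi-task model with a
# sequential dependence constraint on the conversion path. Names, sizes,
# and the gating scheme are assumptions, not the paper's TAFE architecture.
import torch
import torch.nn as nn

class Expert(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class SequentialDependenceMTL(nn.Module):
    """Two sequentially dependent tasks: click, then conversion given click."""
    def __init__(self, in_dim=32, hid_dim=64, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(in_dim, hid_dim) for _ in range(n_experts)])
        # One gate per task; each gate emits per-sample expert weights, so the
        # shared information is reconstructed differently for every sample.
        self.gates = nn.ModuleList([nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList([nn.Linear(hid_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, H)
        task_logits = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)            # (B, E, 1)
            task_repr = (w * expert_out).sum(dim=1)                     # (B, H)
            task_logits.append(tower(task_repr).squeeze(-1))            # (B,)
        p_ctr = torch.sigmoid(task_logits[0])   # p(click | impression)
        p_cvr = torch.sigmoid(task_logits[1])   # p(conversion | click)
        # Sequential dependence: the joint probability factorizes along the
        # conversion path, which guarantees p_ctcvr <= p_ctr by construction.
        p_ctcvr = p_ctr * p_cvr
        return p_ctr, p_ctcvr

if __name__ == "__main__":
    model = SequentialDependenceMTL()
    x = torch.randn(8, 32)
    p_ctr, p_ctcvr = model(x)
    print(p_ctr.shape, p_ctcvr.shape)  # torch.Size([8]) torch.Size([8])

Training both heads on entire-space impression samples (supervising p_ctr with click labels and p_ctcvr with conversion labels) would follow the familiar ESMM-style entire-space recipe; the paper's contribution lies in how the shared information is selected and constrained per sample, which this sketch only gestures at.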



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5 (June 2024), 699 pages.
EISSN: 1556-472X
DOI: 10.1145/3613659

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 28 February 2024
Online AM: 12 January 2024
Accepted: 03 January 2024
Revised: 21 September 2023
Received: 14 April 2023
Published in TKDD Volume 18, Issue 5

      Author Tags

1. Multi-task learning
2. Sequential dependency
3. Negative transfer
4. Industrial applications

      Qualifiers

      • Research-article
