
Multi-Task Learning with Sequential Dependence Toward Industrial Applications: A Systematic Formulation

Published: 28 February 2024

Abstract

Multi-task learning (MTL) is widely used in online recommendation and financial services for multi-step conversion estimation, but existing work often overlooks the sequential dependence among tasks. In particular, sequential dependence multi-task learning (SDMTL) struggles to handle complex task correlations and to extract valuable information in real-world scenarios, leading to negative transfer and degraded performance. Herein, a systematic learning paradigm for the SDMTL problem is established for the first time; it applies to general multi-step conversion scenarios with longer conversion paths or varied task dependence relationships. Meanwhile, an SDMTL architecture, named Task-Aware Feature Extraction (TAFE), is designed to enable dynamic task representation learning from a sample-wise view. TAFE selectively reconstructs the implicit shared information for each sample and performs explicit task-specific extraction under dependence constraints, avoiding negative transfer and yielding more effective information sharing and joint representation learning. Extensive experimental results demonstrate the effectiveness and applicability of the proposed theoretical and implementation frameworks. Furthermore, online evaluations at MYbank showed that TAFE achieved average improvements of 9.22% and 3.76% across various scenarios on the post-view click-through & conversion rate (CTCVR) estimation task. TAFE is currently deployed on an online platform serving various traffic services.
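
To make the abstract's mechanism concrete, the sketch below shows one plausible shape for a sequential-dependence multi-task model: shared experts whose outputs are recombined per sample by task-specific gates (the "sample-wise view"), with two conversion-path heads chained so that the joint CTCVR probability is the product of the click probability and the conditional conversion probability. This is an illustrative PyTorch sketch, not TAFE's actual implementation; every module name, dimension, and the two-step path (click, then conversion) is an assumption for exposition.

# Illustrative sketch only: a sample-wise gated multi-task model with a
# sequential dependence constraint on the conversion path. Names, sizes,
# and the gating scheme are assumptions, not the paper's TAFE architecture.
import torch
import torch.nn as nn

class Expert(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class SequentialDependenceMTL(nn.Module):
    """Two sequentially dependent tasks: click, then conversion given click."""
    def __init__(self, in_dim=32, hid_dim=64, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(in_dim, hid_dim) for _ in range(n_experts)])
        # One gate per task; each gate emits per-sample expert weights, so the
        # shared information is reconstructed differently for every sample.
        self.gates = nn.ModuleList([nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList([nn.Linear(hid_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, H)
        task_logits = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)            # (B, E, 1)
            task_repr = (w * expert_out).sum(dim=1)                     # (B, H)
            task_logits.append(tower(task_repr).squeeze(-1))            # (B,)
        p_ctr = torch.sigmoid(task_logits[0])   # p(click | impression)
        p_cvr = torch.sigmoid(task_logits[1])   # p(conversion | click)
        # Sequential dependence: the joint probability factorizes along the
        # conversion path, which guarantees p_ctcvr <= p_ctr by construction.
        p_ctcvr = p_ctr * p_cvr
        return p_ctr, p_ctcvr

if __name__ == "__main__":
    model = SequentialDependenceMTL()
    x = torch.randn(8, 32)
    p_ctr, p_ctcvr = model(x)
    print(p_ctr.shape, p_ctcvr.shape)  # torch.Size([8]) torch.Size([8])

Training both heads on entire-space impression samples (supervising p_ctr with click labels and p_ctcvr with conversion labels) would follow the familiar ESMM-style entire-space recipe; the paper's contribution lies in how the shared information is selected and constrained per sample, which this sketch only gestures at.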



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5 (June 2024), 699 pages.
EISSN: 1556-472X
DOI: 10.1145/3613659

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 28 February 2024
Online AM: 12 January 2024
Accepted: 03 January 2024
Revised: 21 September 2023
Received: 14 April 2023
Published in TKDD Volume 18, Issue 5

      Author Tags

1. Multi-task learning
2. Sequential dependency
3. Negative transfer
4. Industrial applications

      Qualifiers

      • Research-article
