Research Article
DOI: 10.1145/3485447.3512119

Prototype Feature Extraction for Multi-task Learning

Published: 25 April 2022
    Abstract

    Multi-task learning (MTL) has been widely adopted in industrial scenarios such as recommender systems and search engines. MTL can improve learning efficiency and prediction accuracy by exploiting commonalities and differences across tasks. However, MTL is sensitive to the relationships among tasks and may suffer performance degradation in real-world applications, because existing neural MTL models often share the same network structures and original input features across tasks. To address this issue, we propose a novel multi-task learning model based on Prototype Feature Extraction (PFE), which balances task-specific objectives and inter-task relationships. To better extract features from the original inputs before the gating networks, PFE introduces a new concept, the prototype feature center, to disentangle features for multiple tasks. The extracted prototype features fuse features from different tasks to better capture inter-task relationships, and PFE updates the prototype feature centers and prototype features iteratively. Our model combines the learned prototype features with task-specific experts for MTL. We evaluate PFE on two public datasets, and empirical results show that PFE outperforms state-of-the-art MTL models. Furthermore, we deploy PFE in a real-world recommender system (one of the world’s top-tier short video sharing platforms) to showcase that PFE can be widely applied in industrial scenarios.
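
    The abstract describes the architecture only at a high level. As a rough illustration of the idea, the sketch below shows one way a PFE-style model could be wired up: raw inputs are projected and softly assigned to a set of learnable prototype feature centers, the resulting prototype features drive per-task gating networks, and each gate mixes a pool of experts (shared here for brevity) before a task-specific tower. All names (PrototypeFeatureExtractor, num_centers, hidden_dim, ...) and the expert/gating layout are assumptions for illustration, not the authors' implementation; in particular, the paper's iterative update of prototype feature centers is simplified here to treating the centers as ordinary learnable parameters.

```python
# Illustrative, assumption-based sketch of a PFE-style multi-task model.
# Names (PrototypeFeatureExtractor, PFEMultiTaskModel, num_centers, ...) are
# hypothetical; the paper's actual architecture and center-update rule may differ.
import torch
import torch.nn as nn


class PrototypeFeatureExtractor(nn.Module):
    """Softly assigns projected inputs to learnable prototype feature centers."""

    def __init__(self, input_dim: int, num_centers: int, feature_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, feature_dim)
        # Assumption: centers are plain learnable parameters (the paper instead
        # updates them iteratively during training).
        self.centers = nn.Parameter(torch.randn(num_centers, feature_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)                            # (B, D)
        logits = -torch.cdist(h, self.centers)      # (B, K): closer center -> larger weight
        weights = torch.softmax(logits, dim=-1)     # soft assignment over centers
        return weights @ self.centers               # (B, D) prototype features


class PFEMultiTaskModel(nn.Module):
    """Prototype features drive per-task gates over a pool of experts."""

    def __init__(self, input_dim: int, num_tasks: int,
                 num_experts: int = 4, num_centers: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.pfe = PrototypeFeatureExtractor(input_dim, num_centers, hidden_dim)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        self.gates = nn.ModuleList(nn.Linear(hidden_dim, num_experts) for _ in range(num_tasks))
        self.towers = nn.ModuleList(nn.Linear(hidden_dim, 1) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        proto = self.pfe(x)                                             # (B, D)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, D)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            g = torch.softmax(gate(proto), dim=-1).unsqueeze(-1)        # (B, E, 1)
            mixed = (g * expert_out).sum(dim=1)                         # (B, D)
            outputs.append(tower(mixed))                                # per-task prediction
        return outputs
```

    For a batch x of shape (batch_size, input_dim), model(x) would return one prediction tensor per task; a training loop would then combine the per-task losses, typically as a weighted sum.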


    Cited By

    • (2023) Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning. Electronics 12(9), 2085. https://doi.org/10.3390/electronics12092085 (online 3 May 2023)
    • (2023) Single-shot Feature Selection for Multi-task Recommendations. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 341–351. https://doi.org/10.1145/3539618.3591767 (online 19 July 2023)


    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN:9781450390965
    DOI:10.1145/3485447
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 April 2022


    Author Tags

    1. Multi-task Learning
    2. Neural Network
    3. Recommender System

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the MOE AcRF Tier 1 funding (RG90/20) awarded to Dr. Jie Zhang

    Conference

    WWW '22
    Sponsor:
    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


    Article Metrics

    • Downloads (last 12 months): 103
    • Downloads (last 6 weeks): 5

    Reflects downloads up to 27 Jul 2024

