Research Article
DOI: 10.1145/3485447.3512119

Prototype Feature Extraction for Multi-task Learning

Published: 25 April 2022
    Abstract

    Multi-task learning (MTL) has been widely adopted in industrial scenarios such as recommender systems and search engines. MTL can improve learning efficiency and prediction accuracy by exploiting commonalities and differences across tasks. However, MTL is sensitive to the relationships among tasks and may suffer performance degradation in real-world applications, because existing neural MTL models often share the same network structures and original input features across tasks. To address this issue, we propose a novel multi-task learning model based on Prototype Feature Extraction (PFE), which balances task-specific objectives and inter-task relationships. To better extract features from the original inputs before the gating networks, PFE introduces a new concept, the prototype feature center, to disentangle features for multiple tasks. The extracted prototype features fuse features from different tasks to better capture inter-task relationships, and PFE updates the prototype feature centers and prototype features iteratively. Our model combines the learned prototype features with task-specific experts for MTL. We evaluate PFE on two public datasets, and empirical results show that PFE outperforms state-of-the-art MTL models. Furthermore, we deploy PFE in a real-world recommender system (one of the world’s top-tier short video sharing platforms) to showcase that PFE can be widely applied in industrial scenarios.
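
    The abstract describes the architecture only at a high level. As a rough illustration of the idea, the sketch below shows one way a PFE-style model could be wired up: raw inputs are projected and softly assigned to a set of learnable prototype feature centers, the resulting prototype features drive per-task gating networks, and each gate mixes a pool of experts (shared here for brevity) before a task-specific tower. All names (PrototypeFeatureExtractor, num_centers, hidden_dim, ...) and the expert/gating layout are assumptions for illustration, not the authors' implementation; in particular, the paper's iterative update of prototype feature centers is simplified here to treating the centers as ordinary learnable parameters.

```python
# Illustrative, assumption-based sketch of a PFE-style multi-task model.
# Names (PrototypeFeatureExtractor, PFEMultiTaskModel, num_centers, ...) are
# hypothetical; the paper's actual architecture and center-update rule may differ.
import torch
import torch.nn as nn


class PrototypeFeatureExtractor(nn.Module):
    """Softly assigns projected inputs to learnable prototype feature centers."""

    def __init__(self, input_dim: int, num_centers: int, feature_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, feature_dim)
        # Assumption: centers are plain learnable parameters (the paper instead
        # updates them iteratively during training).
        self.centers = nn.Parameter(torch.randn(num_centers, feature_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)                            # (B, D)
        logits = -torch.cdist(h, self.centers)      # (B, K): closer center -> larger weight
        weights = torch.softmax(logits, dim=-1)     # soft assignment over centers
        return weights @ self.centers               # (B, D) prototype features


class PFEMultiTaskModel(nn.Module):
    """Prototype features drive per-task gates over a pool of experts."""

    def __init__(self, input_dim: int, num_tasks: int,
                 num_experts: int = 4, num_centers: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.pfe = PrototypeFeatureExtractor(input_dim, num_centers, hidden_dim)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        self.gates = nn.ModuleList(nn.Linear(hidden_dim, num_experts) for _ in range(num_tasks))
        self.towers = nn.ModuleList(nn.Linear(hidden_dim, 1) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        proto = self.pfe(x)                                             # (B, D)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, D)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            g = torch.softmax(gate(proto), dim=-1).unsqueeze(-1)        # (B, E, 1)
            mixed = (g * expert_out).sum(dim=1)                         # (B, D)
            outputs.append(tower(mixed))                                # per-task prediction
        return outputs
```

    For a batch x of shape (batch_size, input_dim), model(x) would return one prediction tensor per task; a training loop would then combine the per-task losses, typically as a weighted sum.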


    Cited By

    • (2023) Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning. Electronics 12(9), 2085. https://doi.org/10.3390/electronics12092085 (online 3 May 2023)
    • (2023) Single-shot Feature Selection for Multi-task Recommendations. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 341–351. https://doi.org/10.1145/3539618.3591767 (online 19 July 2023)


    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN:9781450390965
    DOI:10.1145/3485447
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 April 2022


    Author Tags

    1. Multi-task Learning
    2. Neural Network
    3. Recommender System

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the MOE AcRF Tier 1 funding (RG90/20) awarded to Dr. Jie Zhang

    Conference

    WWW '22
    Sponsor:
    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


    Article Metrics

    • Downloads (last 12 months): 103
    • Downloads (last 6 weeks): 5

    Reflects downloads up to 27 Jul 2024

