DOI: 10.1145/3580305.3599253
Research article · Open access

Adaptive Disentangled Transformer for Sequential Recommendation

Published: 04 August 2023

Abstract

Sequential recommendation aims at mining time-aware user interests by modeling sequential behaviors. The Transformer, as an effective architecture designed to process sequential input data, has shown its superiority in capturing sequential relations for recommendation. Nevertheless, existing Transformer architectures lack explicit regularization for layer-wise disentanglement, failing to exploit disentangled representations in recommendation and leading to suboptimal performance. In this paper, we study the problem of layer-wise disentanglement for Transformer architectures and propose the Adaptive Disentangled Transformer (ADT) framework, which adaptively determines the optimal degree of disentanglement of attention heads within different layers. Concretely, we encourage disentanglement by imposing an independence constraint via mutual information estimation over attention heads and by employing auxiliary objectives that prevent the information from collapsing into useless noise. We further propose a progressive scheduler that adaptively adjusts the weights controlling the degree of disentanglement via an evolutionary process. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed ADT framework.
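To make the independence constraint concrete, the sketch below uses a simple correlation penalty between per-head attention outputs as a stand-in for the mutual-information estimator the abstract mentions. The function name, tensor shapes, and the correlation proxy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def head_independence_penalty(head_outputs):
    """Correlation-based proxy for an inter-head independence constraint.

    head_outputs: array of shape (num_heads, seq_len, dim), the per-head
    attention outputs at one Transformer layer. Returns the mean squared
    off-diagonal correlation between flattened head representations
    (0 = fully decorrelated heads, 1 = identical heads).
    """
    H = head_outputs.shape[0]
    flat = head_outputs.reshape(H, -1)
    # Standardize each head's flattened output to zero mean, unit variance.
    flat = flat - flat.mean(axis=1, keepdims=True)
    flat = flat / (flat.std(axis=1, keepdims=True) + 1e-8)
    corr = flat @ flat.T / flat.shape[1]      # (H, H) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))  # zero out the diagonal
    return float((off_diag ** 2).sum() / (H * (H - 1)))

rng = np.random.default_rng(0)
# Independent random heads score near zero ...
indep = rng.standard_normal((4, 10, 8))
# ... while duplicated heads are maximally entangled (penalty 1).
dup = np.stack([indep[0]] * 4)
print(head_independence_penalty(indep) < 0.1)
print(abs(head_independence_penalty(dup) - 1.0) < 1e-6)
```

In the paper's setting this scalar would be added to the training loss with a per-layer weight, which is exactly the weight the progressive scheduler is said to adapt.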

Supplementary Material

MP4 File (rtfp1467-2min-promo.mp4)
This video is about Adaptive Disentangled Transformer for Sequential Recommendation. It discusses sequential recommendation and the use of Transformer architecture for capturing sequential relations in recommendation. However, existing Transformer architectures lack explicit regularization for layer-wise disentanglement, which results in suboptimal performance. To address this problem, the authors propose the Adaptive Disentangled Transformer (ADT) framework, which can adaptively determine the optimal degree of disentanglement of attention heads within different layers.
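The "progressive scheduler ... via an evolutionary process" is only described at a high level here. As one plausible reading, the toy differential-evolution loop below evolves per-layer disentanglement weights against a stand-in validation loss; the function names, hyperparameters, and the quadratic loss are hypothetical illustrations, not the paper's scheduler.

```python
import numpy as np

def evolve_weights(loss_fn, num_layers=3, pop_size=12, gens=60,
                   f=0.5, cr=0.9, seed=0):
    """Toy differential-evolution search over per-layer weights in [0, 1].

    `loss_fn` plays the role of a validation loss observed after training
    with a candidate weight vector.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, size=(pop_size, num_layers))
    fitness = np.array([loss_fn(w) for w in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct partners and build a mutant vector.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + f * (b - c), 0.0, 1.0)
            # Binomial crossover between mutant and current member.
            cross = rng.random(num_layers) < cr
            trial = np.where(cross, mutant, pop[i])
            trial_fit = loss_fn(trial)
            if trial_fit < fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, trial_fit
    return pop[fitness.argmin()]

# Stand-in "validation loss" with optimum at weights (0.2, 0.5, 0.8).
target = np.array([0.2, 0.5, 0.8])
best = evolve_weights(lambda w: float(((w - target) ** 2).sum()))
print(np.round(best, 2))
```

The greedy replace-if-better step gives the "progressive" flavor: each generation's population is at least as good as the last, so the weights tighten gradually rather than jumping.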



    Information & Contributors


    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
    ISBN:9798400701030
    DOI:10.1145/3580305
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 August 2023


    Author Tags

    1. disentangle
    2. sequential recommendation
    3. transformer

    Qualifiers

    • Research-article

    Conference

    KDD '23

    Acceptance Rates

    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (Last 12 months)1,069
    • Downloads (Last 6 weeks)99
    Reflects downloads up to 10 Feb 2025

    Cited By
    • (2025) Fusing temporal and semantic dependencies for session-based recommendation. Information Processing & Management 62:1 (103896). DOI: 10.1016/j.ipm.2024.103896. Online publication date: Jan-2025.
    • (2025) A Review on Deep Learning for Sequential Recommender Systems: Key Technologies and Directions. Big Data (305-318). DOI: 10.1007/978-981-96-1024-2_22. Online publication date: 24-Jan-2025.
    • (2024) Disentangled graph self-supervised learning for out-of-distribution generalization. Proceedings of the 41st International Conference on Machine Learning (28890-28904). DOI: 10.5555/3692070.3693230. Online publication date: 21-Jul-2024.
    • (2024) Federated Recommender System Based on Diffusion Augmentation and Guided Denoising. ACM Transactions on Information Systems 43:2 (1-36). DOI: 10.1145/3688570. Online publication date: 13-Aug-2024.
    • (2024) Automated Disentangled Sequential Recommendation with Large Language Models. ACM Transactions on Information Systems 43:2 (1-29). DOI: 10.1145/3675164. Online publication date: 29-Jun-2024.
    • (2024) DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (816-826). DOI: 10.1145/3637528.3671669. Online publication date: 25-Aug-2024.
    • (2024) Contrastive Learning on Medical Intents for Sequential Prescription Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (748-757). DOI: 10.1145/3627673.3679836. Online publication date: 21-Oct-2024.
    • (2024) FineRec: Exploring Fine-grained Sequential Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1599-1608). DOI: 10.1145/3626772.3657761. Online publication date: 10-Jul-2024.
    • (2024) Disentangling ID and Modality Effects for Session-based Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1883-1892). DOI: 10.1145/3626772.3657748. Online publication date: 10-Jul-2024.
    • (2024) Disentangled Representation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (9677-9696). DOI: 10.1109/TPAMI.2024.3420937. Online publication date: Dec-2024.
