DOI: 10.1145/3580305.3599253
Research article · Open access

Adaptive Disentangled Transformer for Sequential Recommendation

Published: 04 August 2023

Abstract

Sequential recommendation aims at mining time-aware user interests by modeling sequential behaviors. The Transformer, as an effective architecture designed to process sequential input data, has shown its superiority in capturing sequential relations for recommendation. Nevertheless, existing Transformer architectures lack explicit regularization for layer-wise disentanglement, failing to exploit disentangled representations in recommendation and leading to suboptimal performance. In this paper, we study the problem of layer-wise disentanglement for Transformer architectures and propose the Adaptive Disentangled Transformer (ADT) framework, which adaptively determines the optimal degree of disentanglement of attention heads within different layers. Concretely, we encourage disentanglement by imposing an independence constraint via mutual information estimation over attention heads and by employing auxiliary objectives that prevent the information from collapsing into useless noise. We further propose a progressive scheduler that adaptively adjusts the weights controlling the degree of disentanglement via an evolutionary process. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed ADT framework.
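To make the independence constraint concrete, the sketch below uses a simple correlation penalty between per-head attention outputs as a stand-in for the mutual-information estimator the abstract mentions. The function name, tensor shapes, and the correlation proxy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def head_independence_penalty(head_outputs):
    """Correlation-based proxy for an inter-head independence constraint.

    head_outputs: array of shape (num_heads, seq_len, dim), the per-head
    attention outputs at one Transformer layer. Returns the mean squared
    off-diagonal correlation between flattened head representations
    (0 = fully decorrelated heads, 1 = identical heads).
    """
    H = head_outputs.shape[0]
    flat = head_outputs.reshape(H, -1)
    # Standardize each head's flattened output to zero mean, unit variance.
    flat = flat - flat.mean(axis=1, keepdims=True)
    flat = flat / (flat.std(axis=1, keepdims=True) + 1e-8)
    corr = flat @ flat.T / flat.shape[1]      # (H, H) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))  # zero out the diagonal
    return float((off_diag ** 2).sum() / (H * (H - 1)))

rng = np.random.default_rng(0)
# Independent random heads score near zero ...
indep = rng.standard_normal((4, 10, 8))
# ... while duplicated heads are maximally entangled (penalty 1).
dup = np.stack([indep[0]] * 4)
print(head_independence_penalty(indep) < 0.1)
print(abs(head_independence_penalty(dup) - 1.0) < 1e-6)
```

In the paper's setting this scalar would be added to the training loss with a per-layer weight, which is exactly the weight the progressive scheduler is said to adapt.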

Supplementary Material

MP4 File (rtfp1467-2min-promo.mp4)
This video is about Adaptive Disentangled Transformer for Sequential Recommendation. It discusses sequential recommendation and the use of Transformer architecture for capturing sequential relations in recommendation. However, existing Transformer architectures lack explicit regularization for layer-wise disentanglement, which results in suboptimal performance. To address this problem, the authors propose the Adaptive Disentangled Transformer (ADT) framework, which can adaptively determine the optimal degree of disentanglement of attention heads within different layers.
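The "progressive scheduler ... via an evolutionary process" is only described at a high level here. As one plausible reading, the toy differential-evolution loop below evolves per-layer disentanglement weights against a stand-in validation loss; the function names, hyperparameters, and the quadratic loss are hypothetical illustrations, not the paper's scheduler.

```python
import numpy as np

def evolve_weights(loss_fn, num_layers=3, pop_size=12, gens=60,
                   f=0.5, cr=0.9, seed=0):
    """Toy differential-evolution search over per-layer weights in [0, 1].

    `loss_fn` plays the role of a validation loss observed after training
    with a candidate weight vector.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, size=(pop_size, num_layers))
    fitness = np.array([loss_fn(w) for w in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct partners and build a mutant vector.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + f * (b - c), 0.0, 1.0)
            # Binomial crossover between mutant and current member.
            cross = rng.random(num_layers) < cr
            trial = np.where(cross, mutant, pop[i])
            trial_fit = loss_fn(trial)
            if trial_fit < fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, trial_fit
    return pop[fitness.argmin()]

# Stand-in "validation loss" with optimum at weights (0.2, 0.5, 0.8).
target = np.array([0.2, 0.5, 0.8])
best = evolve_weights(lambda w: float(((w - target) ** 2).sum()))
print(np.round(best, 2))
```

The greedy replace-if-better step gives the "progressive" flavor: each generation's population is at least as good as the last, so the weights tighten gradually rather than jumping.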



    Information & Contributors


    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
    ISBN:9798400701030
    DOI:10.1145/3580305
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 August 2023


    Author Tags

    1. disentangle
    2. sequential recommendation
    3. transformer

    Qualifiers

    • Research-article

    Conference

    KDD '23

    Acceptance Rates

    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (Last 12 months)1,069
    • Downloads (Last 6 weeks)99
    Reflects downloads up to 10 Feb 2025

    Cited By
    • (2025) Fusing temporal and semantic dependencies for session-based recommendation. Information Processing & Management 62:1 (103896). DOI: 10.1016/j.ipm.2024.103896. Online publication date: Jan-2025.
    • (2025) A Review on Deep Learning for Sequential Recommender Systems: Key Technologies and Directions. Big Data (305-318). DOI: 10.1007/978-981-96-1024-2_22. Online publication date: 24-Jan-2025.
    • (2024) Disentangled graph self-supervised learning for out-of-distribution generalization. Proceedings of the 41st International Conference on Machine Learning (28890-28904). DOI: 10.5555/3692070.3693230. Online publication date: 21-Jul-2024.
    • (2024) Federated Recommender System Based on Diffusion Augmentation and Guided Denoising. ACM Transactions on Information Systems 43:2 (1-36). DOI: 10.1145/3688570. Online publication date: 13-Aug-2024.
    • (2024) Automated Disentangled Sequential Recommendation with Large Language Models. ACM Transactions on Information Systems 43:2 (1-29). DOI: 10.1145/3675164. Online publication date: 29-Jun-2024.
    • (2024) DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (816-826). DOI: 10.1145/3637528.3671669. Online publication date: 25-Aug-2024.
    • (2024) Contrastive Learning on Medical Intents for Sequential Prescription Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (748-757). DOI: 10.1145/3627673.3679836. Online publication date: 21-Oct-2024.
    • (2024) FineRec: Exploring Fine-grained Sequential Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1599-1608). DOI: 10.1145/3626772.3657761. Online publication date: 10-Jul-2024.
    • (2024) Disentangling ID and Modality Effects for Session-based Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1883-1892). DOI: 10.1145/3626772.3657748. Online publication date: 10-Jul-2024.
    • (2024) Disentangled Representation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (9677-9696). DOI: 10.1109/TPAMI.2024.3420937. Online publication date: Dec-2024.
