research-article

Invariant Representation Learning for Multimedia Recommendation

Authors:

Jinhui Tang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 619 - 628

https://doi.org/10.1145/3503161.3548405

Published: 10 October 2022

Abstract

Multimedia recommendation is a personalized ranking task that relies on multimedia content representations, most of which are extracted by generic encoders. However, these generic representations introduce spurious correlations, that is, correlations that are meaningless from the recommendation perspective. For example, if a user bought two dresses worn by the same model, this co-occurrence produces a correlation between the model and the purchases, but the correlation is spurious from the viewpoint of fashion recommendation. Existing work alleviates this issue by customizing preference-aware representations, which requires costly analysis and design.

In this paper, we propose an Invariant Representation Learning Framework (InvRL) to alleviate the impact of spurious correlations. We use environments to reflect the spurious correlations and define each environment by a set of interactions. We then learn invariant representations (the inherent factors that attract user attention) so that predictions of user-item interactions are consistent across environments. To this end, InvRL uses two iteratively executed modules: one clusters user-item interactions into environments, and the other learns invariant representations. With these, InvRL trains a final recommender model that mitigates the spurious correlations. We demonstrate InvRL on a cutting-edge recommender model, UltraGCN, and conduct extensive experiments on three public multimedia recommendation datasets: Movielens, Tiktok, and Kwai. The experimental results validate the rationality and effectiveness of InvRL. Code is released at https://github.com/nickwzk/InvRL.
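
The alternating procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their released code is at the repository linked in the abstract) and it omits the final recommender training stage. It assumes synthetic data, linear least-squares predictors, a K-means-style reassignment rule as a stand-in for environment clustering, and a coefficient-stability test as a stand-in for the invariance criterion. All names (fit_linear, infer_environments, invariant_mask, alternate, make_toy_data) and the threshold value are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares weights w such that y is approximated by X @ w (no intercept)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def infer_environments(V, r, n_env, n_rounds=10, seed=0):
    """Module 1 stand-in: group interactions so that each environment has its
    own linear relation between the features V and the target r."""
    rng = np.random.default_rng(seed)
    env = rng.integers(n_env, size=len(r))        # random initial partition
    for _ in range(n_rounds):
        W = np.zeros((n_env, V.shape[1]))
        for e in range(n_env):
            idx = env == e
            if idx.sum() >= V.shape[1]:           # skip (near-)empty clusters
                W[e] = fit_linear(V[idx], r[idx])
        sq_err = (V @ W.T - r[:, None]) ** 2      # shape (n_interactions, n_env)
        env = sq_err.argmin(axis=1)               # reassign to best-fitting environment
    return env

def invariant_mask(X, y, env, n_env, threshold=0.5):
    """Module 2 stand-in: flag a feature as invariant when its fitted weight
    is stable (low standard deviation) across environments."""
    W = np.stack([fit_linear(X[env == e], y[env == e])
                  for e in range(n_env)
                  if (env == e).sum() >= X.shape[1]])
    return W.std(axis=0) < threshold

def alternate(X, y, n_env=2, n_outer=3, threshold=0.5, seed=0):
    """Alternate the two stand-in modules for a few outer rounds."""
    mask = np.zeros(X.shape[1], dtype=bool)       # nothing trusted as invariant yet
    env = np.zeros(len(y), dtype=int)
    for _ in range(n_outer):
        if mask.all():                            # no variant features left to cluster on
            break
        if mask.any():
            # residual that the currently trusted features cannot explain
            r = y - X[:, mask] @ fit_linear(X[:, mask], y)
        else:
            r = y
        env = infer_environments(X[:, ~mask], r, n_env, seed=seed)
        mask = invariant_mask(X, y, env, n_env, threshold)
    return env, mask

def make_toy_data(n=400, seed=0):
    """Synthetic interactions: feature 0 relates to the label the same way in
    both hidden groups, while the effect of feature 1 flips sign between them."""
    rng = np.random.default_rng(seed)
    group = np.arange(n) % 2                      # hidden grouping, unknown to the method
    sign = np.where(group == 0, 1.0, -1.0)
    X = rng.normal(size=(n, 2))
    y = X[:, 0] + sign * X[:, 1] + 0.1 * rng.normal(size=n)
    return X, y, group

X, y, group = make_toy_data()
env, mask = alternate(X, y)
print("features flagged as invariant:", mask)
```

In this toy setting, a feature counts as invariant when its weight barely changes from one inferred environment to the next. A real system would replace the linear fits with the recommender's own loss and the synthetic features with encoder outputs.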

Supplementary Material

MP4 File (MM22-fp3071.mp4)

Presentation video for "Invariant Representation Learning for Multimedia Recommendation"

Index Terms

Information systems → Information systems applications → Multimedia information systems

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães (NOVA University of Lisbon, Portugal),
Alberto del Bimbo (University of Florence, Italy),
Shin'ichi Satoh (National Institute of Informatics, Japan),
Nicu Sebe (University of Trento, Italy)

Program Chairs:
Xavier Alameda-Pineda (Inria, Grenoble, France),
Qin Jin (Renmin University of China, China),
Vincent Oria (New Jersey Institute of Technology, USA),
Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
