DOI: 10.1145/3503161.3548405
research-article

Invariant Representation Learning for Multimedia Recommendation

Published: 10 October 2022

Abstract

Multimedia recommendation is a personalized ranking task that relies on multimedia content representations, which are mostly extracted by generic encoders. However, these generic representations introduce spurious correlations, that is, correlations that are meaningless from the recommendation perspective. For example, if a user bought two dresses shown on the same model, this co-occurrence would produce a correlation between the model and the purchases, but that correlation is spurious from the viewpoint of fashion recommendation. Existing work alleviates this issue by customizing preference-aware representations, which requires costly analysis and design.
In this paper, we propose an Invariant Representation Learning Framework (InvRL) to alleviate the impact of spurious correlations. We use environments to reflect the spurious correlations and define each environment by a set of interactions. We then learn invariant representations (the inherent factors that attract user attention) to make consistent predictions of user-item interactions across environments. To this end, InvRL comprises two iteratively executed modules, one that clusters user-item interactions and one that learns invariant representations. With these modules, InvRL trains a final recommender model that mitigates the spurious correlations. We demonstrate InvRL on a cutting-edge recommender model, UltraGCN, and conduct extensive experiments on three public multimedia recommendation datasets: Movielens, Tiktok, and Kwai. The experimental results validate the rationality and effectiveness of InvRL. Code is released at https://github.com/nickwzk/InvRL.
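
To illustrate what a "consistent prediction across environments" can look like in practice, the sketch below scores a toy set of interactions under two hypothetical environments and adds a variance-of-losses penalty across them. The abstract does not state InvRL's actual objective, so the penalty itself, the function names (`bce`, `invariance_objective`), the `penalty_weight` parameter, and the toy data are all assumptions made for illustration, not the released implementation.

```python
import numpy as np


def bce(scores, labels):
    """Mean binary cross-entropy of raw scores (logits) against 0/1 labels."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))


def invariance_objective(scores, labels, env_ids, penalty_weight=1.0):
    """Mean per-environment loss plus the variance of those losses.

    This is an assumed, generic invariance penalty: it is small only when
    the predictor performs similarly well in every environment.
    """
    env_losses = np.array([
        bce(scores[env_ids == e], labels[env_ids == e])
        for e in np.unique(env_ids)
    ])
    return env_losses.mean() + penalty_weight * env_losses.var(), env_losses


# Toy data (hypothetical): eight interactions split into two environments.
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
env_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A feature whose relation to the label is the same in both environments.
invariant_feature = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
# A feature whose relation to the label flips in the second environment.
spurious_feature = np.array([1, -1, 1, -1, -1, 1, -1, 1], dtype=float)

obj_inv, losses_inv = invariance_objective(2.0 * invariant_feature, labels, env_ids)
obj_spur, losses_spur = invariance_objective(2.0 * spurious_feature, labels, env_ids)

print("invariant feature:", obj_inv, losses_inv)
print("spurious feature: ", obj_spur, losses_spur)
```

In this toy setup, scores driven by the feature whose relation to the label flips between environments receive a larger objective than scores driven by the stable feature, which is the behavior an invariance penalty of this kind is meant to produce.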

Supplementary Material

MP4 File (MM22-fp3071.mp4)
Presentation Video for "Invariant Representation Learning for Multimedia Recommendation"

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. invariant learning
    2. multimedia recommendation
    3. multimedia representation learning
    4. spurious correlation

    Qualifiers

    • Research-article

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 190
    • Downloads (last 6 weeks): 13
    Reflects downloads up to 10 Nov 2024

    Cited By

    • (2024) Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1395-1406. DOI: 10.1145/3637528.3671931. Online publication date: 25-Aug-2024
    • (2024) Unleashing the Power of Knowledge Graph for Recommendation via Invariant Learning. Proceedings of the ACM Web Conference 2024, 3745-3755. DOI: 10.1145/3589334.3645576. Online publication date: 13-May-2024
    • (2024) Hierarchical Multi-Modal Attention Network for Time-Sync Comment Video Recommendation. IEEE Transactions on Circuits and Systems for Video Technology 34, 4, 2694-2705. DOI: 10.1109/TCSVT.2023.3309768. Online publication date: Apr-2024
    • (2024) Multimodal recommender system based on multi-channel counterfactual learning networks. Multimedia Systems 30, 5. DOI: 10.1007/s00530-024-01448-z. Online publication date: 13-Aug-2024
    • (2023) Pareto Invariant Representation Learning for Multimedia Recommendation. Proceedings of the 31st ACM International Conference on Multimedia, 6410-6419. DOI: 10.1145/3581783.3612591. Online publication date: 26-Oct-2023
    • (2023) Modal-aware Bias Constrained Contrastive Learning for Multimodal Recommendation. Proceedings of the 31st ACM International Conference on Multimedia, 6369-6378. DOI: 10.1145/3581783.3612568. Online publication date: 26-Oct-2023
    • (2023) Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation. Proceedings of the 31st ACM International Conference on Multimedia, 6234-6242. DOI: 10.1145/3581783.3612362. Online publication date: 26-Oct-2023
    • (2023) Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing. Proceedings of the 31st ACM International Conference on Multimedia, 6274-6282. DOI: 10.1145/3581783.3612337. Online publication date: 26-Oct-2023
    • (2023) Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval. Proceedings of the 31st ACM International Conference on Multimedia, 8922-8931. DOI: 10.1145/3581783.3612009. Online publication date: 26-Oct-2023
    • (2023) Reformulating CTR Prediction: Learning Invariant Feature Interactions for Recommendation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1386-1395. DOI: 10.1145/3539618.3591755. Online publication date: 19-Jul-2023
