DOI: 10.1145/3511808.3557411
research-article

OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction

Published: 17 October 2022

Abstract

A click-through rate (CTR) prediction model usually consists of three components: an embedding table, a feature interaction layer, and a classifier. Learning the embedding table plays a fundamental role in CTR prediction, in terms of both model performance and memory usage. The embedding table is a two-dimensional tensor whose axes correspond to the number of feature values and the embedding dimension, respectively. To learn an efficient and effective embedding table, recent works assign different embedding dimensions to feature fields, reduce the number of embeddings, or mask embedding table parameters. However, none of these approaches yields an optimal embedding table. First, assigning different embedding dimensions still requires a large amount of memory because of the vast number of features in the dataset. Second, decreasing the number of embeddings usually causes performance degradation, which is intolerable in CTR prediction. Third, pruning individual embedding parameters leads to a sparse embedding table, which is hard to deploy. To this end, we propose OptEmbed, an optimal embedding table learning framework that provides a practical and general method for finding an optimal embedding table for various base CTR models. Specifically, we propose pruning redundant embeddings according to the importance of their corresponding features, using learnable pruning thresholds. Furthermore, we treat each assignment of embedding dimensions to fields as a single candidate architecture. To search efficiently for the optimal embedding dimensions, we design a uniform embedding dimension sampling scheme that trains all candidate architectures equally, so that architecture-related parameters and learnable thresholds are trained simultaneously in one supernet. We then propose an evolutionary search method based on the supernet to find the optimal embedding dimension for each field. Experiments on public datasets show that OptEmbed can learn a compact embedding table that also improves model performance.
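
The two masking ideas in the abstract (pruning whole embedding rows by a threshold, and giving each field its own embedding dimension) can be illustrated with a small Python sketch. This is not the authors' implementation. Using the L1 norm as the importance score, keeping the first d dimensions of a field, and all function names (`row_mask`, `sample_dims`, `masked_lookup`, `compact_size`) are assumptions made for illustration, and the thresholds are fixed constants here rather than learned.

```python
import random


def row_mask(table, field_of_row, thresholds):
    """Return 1 for rows to keep, 0 for rows to prune.

    Assumption: a row's importance is its L1 norm, compared against a
    per-field threshold (fixed here; learnable in the paper's framework).
    """
    mask = []
    for row, field in zip(table, field_of_row):
        importance = sum(abs(x) for x in row)
        mask.append(1 if importance > thresholds[field] else 0)
    return mask


def sample_dims(num_fields, max_dim, rng):
    """Uniformly sample one embedding dimension in [1, max_dim] per field."""
    return [rng.randint(1, max_dim) for _ in range(num_fields)]


def masked_lookup(table, field_of_row, row_id, mask, dims):
    """Look up one embedding with pruned rows and unused dimensions zeroed."""
    if mask[row_id] == 0:
        return [0.0] * len(table[row_id])
    d = dims[field_of_row[row_id]]
    return [x if j < d else 0.0 for j, x in enumerate(table[row_id])]


def compact_size(field_of_row, mask, dims):
    """Count parameters that survive both row pruning and dimension masking."""
    return sum(dims[f] for f, keep in zip(field_of_row, mask) if keep)


# Toy table: 3 feature values, 2 fields, embedding dimension 4.
table = [
    [0.5, -0.4, 0.1, 0.0],    # field 0
    [0.01, 0.0, -0.02, 0.0],  # field 0, near-zero row
    [0.3, 0.3, 0.3, 0.3],     # field 1
]
field_of_row = [0, 0, 1]
thresholds = [0.1, 0.5]       # hypothetical per-field thresholds

mask = row_mask(table, field_of_row, thresholds)
dims = [2, 3]                 # one candidate architecture: a dimension per field
sampled = sample_dims(num_fields=2, max_dim=4, rng=random.Random(0))
```

In this toy case the dense table holds 12 parameters, while the masked table keeps 5: the near-zero row is pruned as a whole, and each field uses only its first few dimensions. Because whole rows and trailing dimensions are dropped, the result stays dense within each field instead of becoming a sparse table.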

      Published In

      CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
      October 2022
      5274 pages
      ISBN:9781450392365
      DOI:10.1145/3511808
      General Chairs: Mohammad Al Hasan, Li Xiong
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. ctr prediction
      2. neural architecture search
      3. recommendation

      Qualifiers

      • Research-article

      Conference

      CIKM '22

      Acceptance Rates

      CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 97
      • Downloads (last 6 weeks): 9
      Reflects downloads up to 16 Oct 2024

      Citations

      Cited By

      • (2024) Experimental Analysis of Large-Scale Learnable Vector Storage Compression. Proceedings of the VLDB Endowment 17:4 (808-822). DOI: 10.14778/3636218.3636234. Online publication date: 5-Mar-2024
      • (2024) CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models. Proceedings of the ACM on Management of Data 2:1 (1-28). DOI: 10.1145/3639306. Online publication date: 26-Mar-2024
      • (2024) Embedding Compression in Recommender Systems: A Survey. ACM Computing Surveys 56:5 (1-21). DOI: 10.1145/3637841. Online publication date: 12-Jan-2024
      • (2024) AutoDCS: Automated Decision Chain Selection in Deep Recommender Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (956-965). DOI: 10.1145/3626772.3657818. Online publication date: 10-Jul-2024
      • (2024) MultiFS: Automated Multi-Scenario Feature Selection in Deep Recommender Systems. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (434-442). DOI: 10.1145/3616855.3635859. Online publication date: 4-Mar-2024
      • (2023) MvFS: Multi-view Feature Selection for Recommender System. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (4048-4052). DOI: 10.1145/3583780.3615243. Online publication date: 21-Oct-2023
      • (2023) Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (741-750). DOI: 10.1145/3583780.3615135. Online publication date: 21-Oct-2023
      • (2023) iHAS: Instance-wise Hierarchical Architecture Search for Deep Learning Recommendation Models. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (3030-3039). DOI: 10.1145/3583780.3614925. Online publication date: 21-Oct-2023
      • (2023) Optimizing Feature Set for Click-Through Rate Prediction. Proceedings of the ACM Web Conference 2023 (3386-3395). DOI: 10.1145/3543507.3583545. Online publication date: 30-Apr-2023
      • (2023) Continuous Input Embedding Size Search For Recommender Systems. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (708-717). DOI: 10.1145/3539618.3591653. Online publication date: 19-Jul-2023
