DOI: 10.1145/3487553.3524213
short-paper
Open access

ROSE: Robust Caches for Amazon Product Search

Published: 16 August 2022

Abstract

Product search engines such as Amazon Search often use caches to improve the customer experience: a cache can improve both the system's latency and its search quality. However, as search traffic grows over time, the cache's ever-growing size can degrade overall system performance. Furthermore, the typos, misspellings, and redundancy that are common in real-world product search queries cause unnecessary cache misses, reducing the cache's utility. In this paper, we introduce ROSE, a RObuSt cachE that tolerates misspellings and typos while retaining the look-up cost of traditional caches. The core component of ROSE is a randomized hashing scheme that lets ROSE index and retrieve an arbitrarily large set of queries with constant memory and constant time. ROSE is also robust to any query intent, typos, and grammatical errors, with theoretical guarantees. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of ROSE. ROSE is deployed in the Amazon Search Engine, where it produced a significant improvement over the existing solutions across several key business metrics.
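
The abstract does not spell out the hashing scheme, so the following is only a minimal sketch under stated assumptions, not ROSE's actual algorithm. It assumes MinHash over character trigrams combined with banding, a standard locality-sensitive hashing construction, and a fixed number of slots that are overwritten on collision so that memory stays constant. The names `LshCache`, `shingles`, `stable_hash`, and `minhash`, and the parameters `tables`, `band`, and `buckets`, are hypothetical and chosen for illustration.

```python
import hashlib


def shingles(query, n=3):
    """Return the set of character n-grams of a normalized, padded query."""
    padded = f"  {query.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def stable_hash(seed, token):
    """64-bit hash that is stable across runs (unlike Python's built-in hash)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(query, num_hashes):
    """MinHash signature: similar trigram sets tend to share entries."""
    grams = shingles(query)
    return tuple(min(stable_hash(seed, g) for g in grams)
                 for seed in range(num_hashes))


class LshCache:
    """Illustrative fixed-size cache (an assumption, not the paper's design)."""

    def __init__(self, tables=8, band=2, buckets=1024):
        self.tables, self.band, self.buckets = tables, band, buckets
        # Memory is fixed up front: tables * buckets slots, never resized.
        self.slots = [[None] * buckets for _ in range(tables)]

    def _keys(self, query):
        sig = minhash(query, self.tables * self.band)
        for t in range(self.tables):
            band = sig[t * self.band:(t + 1) * self.band]
            yield t, stable_hash("bucket", band) % self.buckets, band

    def put(self, query, value):
        for t, b, band in self._keys(query):
            # Overwrite on collision, so the slot count never grows.
            self.slots[t][b] = (band, query, value)

    def get(self, query):
        counts, values = {}, {}
        for t, b, band in self._keys(query):
            entry = self.slots[t][b]
            # Comparing the stored band filters out plain bucket collisions.
            if entry is not None and entry[0] == band:
                _, stored_query, value = entry
                counts[stored_query] = counts.get(stored_query, 0) + 1
                values[stored_query] = value
        if not counts:
            return None
        best = max(counts, key=counts.get)  # stored query matched in most tables
        return best, values[best]


cache = LshCache()
cache.put("wireless headphones", ["result-1", "result-2"])
print(cache.get("wireless headphones"))  # exact repeat of the stored query
print(cache.get("wirelss headphones"))   # misspelling: likely, not guaranteed, to hit
```

In this sketch a lookup touches one slot per table, so its cost depends on the query length and the number of tables, not on how many queries have been cached. The trade-off is that a hit on a misspelled query is probabilistic, and older entries can be evicted when a newer query lands in the same slot.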

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN: 9781450391306
DOI: 10.1145/3487553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Amazon Search
  2. Data Mining
  3. Robust Cache

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WWW '22
WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
