Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3539597.3575786acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
abstract

Learning to Infer Product Attribute Values From Descriptive Texts and Images

Published: 27 February 2023 Publication History

Abstract

Online marketplaces are able to offer a staggering array of products that no physical store can match. While this makes it more likely for customers to find what they want, in order for online providers to ensure a smooth and efficient user experience, they must maintain well-organized catalogs, which depends greatly on the availability of per-product attribute values such as color, material, brand, to name a few. Unfortunately, such information is often incomplete or even missing in practice, and therefore we have to resort to predictive models as well as other sources of information to impute missing attribute values.
In this talk we present the deep learning-based approach that we have developed at Rakuten Group to extract attribute values from product descriptive texts and images. Starting from pretrained architectures to encode textual and visual modalities, we discuss several refinements and improvements that we find necessary to achieve satisfactory performance and meet strict business requirements, namely improving recall while maintaining a high precision (>= 95%). Our methodology is driven by a systematic investigation into several practical research questions surrounding multimodality, which we revisit in this talk. At the heart of our multimodal architecture, is a new method to combine modalities inspired by empirical cross-modality comparisons. We present the latter component in details, point out one of its major limitations, namely exacerbating the issue of modality collapse, i.e., when the model forgets one modality, and describe our mitigation to this problem based on a principled regularization scheme.
We present various empirical results on both Rakuten data as well as public benchmark datasets, which provide evidence of the benefits of our approach compared to several strong baselines. We also share some insights to characterise the circumstances in which the proposed model offers the most significant improvements. We conclude this talk by criticising the current model and discussing possible future developments and improvements.
Our model is successfully deployed in Rakuten Ichiba - a Rakuten marketplace - and we believe that our investigation into multimodal attribute value extraction for e-commerce will benefit other researchers and practitioners alike embarking on similar journeys.

Supplementary Material

MP4 File (wsdm2023_special_industry_day_salah_images_01.mp4-streaming.mp4)
Learning to Infer Product Attribute Values From Descriptive Texts and Images

References

[1]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. The International Conference on Learning Representations (ICLR) (2015).
[2]
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.
[3]
 ngelo Cardoso, Fabio Daolio, and Saú l Vargas. 2018. Product Characterisation towards Personalisation: Learning Attributes from Unstructured Data to Recommend Fashion Products. (mar 2018). https://doi.org/10.1145/3219819.3219888 arxiv: 1803.07679
[4]
Alo"is De la Comble, Anuvabh Dutt, Pablo Montalvo, and Aghiles Salah. 2022. Multi-Modal Attribute Extraction for E-Commerce. arXiv preprint arXiv:2203.03441 (2022).
[5]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[6]
Jerome H Friedman. 2017. The elements of statistical learning: Data mining, inference, and prediction. springer open.
[7]
Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. 2016. Densely Connected Convolutional Networks. CoRR, Vol. abs/1608.06993 (2016). arxiv: 1608.06993 http://arxiv.org/abs/1608.06993
[8]
Karin Mauge, Khash Rohanimanesh, and Jean David Ruvini. 2012. Structuring e-commerce inventory. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 805--814.
[9]
Quoc-Tuan Truong, Aghiles Salah, and Hady W Lauw. 2021. Bilateral variational autoencoder for collaborative filtering. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 292--300.
[10]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. CoRR, Vol. abs/1706.03762 (2017). arxiv: 1706.03762 http://arxiv.org/abs/1706.03762
[11]
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (ICML). PMLR, 2048--2057.
[12]
Tiangang Zhu, Yue Wang, Haoran Li, Youzheng Wu, Xiaodong He, and Bowen Zhou. 2020. Multimodal joint attribute prediction and value extraction for E-commerce product. arXiv preprint arXiv:2009.07162 (2020).

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
February 2023
1345 pages
ISBN:9781450394079
DOI:10.1145/3539597
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Check for updates

Author Tags

  1. e-commerce
  2. multimodal deep learning
  3. neural networks
  4. product attribute values extraction
  5. text and image data

Qualifiers

  • Abstract

Conference

WSDM '23

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 101
    Total Downloads
  • Downloads (Last 12 months)48
  • Downloads (Last 6 weeks)2
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media