Learning to Infer Product Attribute Values From Descriptive Texts and Images

Authors: Pablo Montalvo, Aghiles Salah

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

Pages 1293 - 1294

https://doi.org/10.1145/3539597.3575786

Published: 27 February 2023

Abstract

Online marketplaces are able to offer a staggering array of products that no physical store can match. While this makes it more likely for customers to find what they want, online providers can ensure a smooth and efficient user experience only if they maintain well-organized catalogs, which depends greatly on the availability of per-product attribute values such as color, material, and brand. Unfortunately, such information is often incomplete or even missing in practice, so we have to resort to predictive models, as well as other sources of information, to impute missing attribute values.

In this talk we present the deep learning-based approach that we have developed at Rakuten Group to extract attribute values from product descriptive texts and images. Starting from pretrained architectures to encode the textual and visual modalities, we discuss several refinements and improvements that we find necessary to achieve satisfactory performance and meet strict business requirements, namely improving recall while maintaining a high precision (>= 95%). Our methodology is driven by a systematic investigation into several practical research questions surrounding multimodality, which we revisit in this talk. At the heart of our multimodal architecture is a new method to combine modalities, inspired by empirical cross-modality comparisons. We present this component in detail and point out one of its major limitations: it exacerbates the issue of modality collapse, i.e., when the model forgets one modality. We then describe our mitigation of this problem, based on a principled regularization scheme.
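The business requirement stated above (improve recall while keeping precision at or above 95%) can be illustrated with a small threshold-selection routine. This is only a sketch under stated assumptions: the abstract does not say how the operating point is chosen, and the function name `pick_threshold`, its parameters, and the toy scores are hypothetical. It assumes each prediction carries a confidence score and is accepted when the score is at least the threshold.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scores: List[float],
    labels: List[int],
    min_precision: float = 0.95,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least `min_precision`, or None if no
    threshold qualifies. Hypothetical helper, not the authors' procedure."""
    total_positives = sum(labels)
    if total_positives == 0:
        return None
    # Sort predictions by descending confidence score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best = None
    true_positives = 0
    for rank, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Evaluate a cut only once all predictions tied at this score are in.
        if rank < len(ranked) and ranked[rank][0] == score:
            continue
        precision = true_positives / rank
        recall = true_positives / total_positives
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (score, precision, recall)
    return best


# Toy validation data: label 1 means the predicted attribute value is correct.
toy_scores = [0.99, 0.97, 0.95, 0.90, 0.80, 0.60]
toy_labels = [1, 1, 1, 0, 1, 0]
print(pick_threshold(toy_scores, toy_labels, min_precision=0.95))
```

On this toy data, accepting only predictions scored 0.95 or higher keeps precision at 1.0 with recall 0.75; lowering the threshold further would raise recall but break the precision constraint.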

We present various empirical results on both Rakuten data and public benchmark datasets, which provide evidence of the benefits of our approach compared to several strong baselines. We also share some insights that characterise the circumstances in which the proposed model offers the most significant improvements. We conclude this talk by criticising the current model and discussing possible future developments and improvements.

Our model is successfully deployed in Rakuten Ichiba, a Rakuten marketplace, and we believe that our investigation into multimodal attribute value extraction for e-commerce will benefit researchers and practitioners embarking on similar journeys.

Supplementary Material

MP4 File (wsdm2023_special_industry_day_salah_images_01.mp4-streaming.mp4)

Index Terms

Learning to Infer Product Attribute Values From Descriptive Texts and Images
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
    2. Natural language processing
  2. Machine learning
    1. Machine learning approaches

Information & Contributors

Information

Published In

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

February 2023

1345 pages

ISBN:9781450394079

DOI:10.1145/3539597

General Chairs: Tat-Seng Chua (National University of Singapore), Hady Lauw (Singapore Management University)

Program Chairs: Luo Si (Salesforce), Evimaria Terzi (Boston University), Panayiotis Tsaparas (University of Ioannina)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Conference

WSDM '23

WSDM '23: The Sixteenth ACM International Conference on Web Search and Data Mining

February 27 - March 3, 2023

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%
