research-article

A machine learning approach for product matching and categorization

Published: 01 January 2018

Abstract

Consumers today can purchase products from thousands of e-shops. However, e-shops differ both in how complete their product specifications are and in the taxonomies they use to organize products. To improve the consumer experience, for example by making it easy to compare offers from different vendors, approaches for integrating product data on the Web are needed. In this paper, we present an approach that combines neural language models and deep learning techniques with standard classification approaches for product matching and categorization. We use structured product data as supervision to train feature extraction models that extract attribute-value pairs from textual product descriptions. To reduce the amount of supervision data required, we use neural language models to produce word embeddings from large quantities of publicly available product data marked up with Microdata; these embeddings boost the performance of the feature extraction model and thus lead to better product matching and categorization. Furthermore, we use a deep Convolutional Neural Network to produce image embeddings from product images, which further improve the results on both tasks.
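To make the idea of using structured product data as supervision more concrete, here is a minimal Python sketch of one common way to derive training labels for an attribute extractor: token spans in an offer title that exactly match a known attribute value are tagged with BIO labels. This is an illustrative assumption, not the authors' implementation; the abstract does not specify the labeling scheme or the sequence model trained on it. The whitespace tokenizer, the exact-match rule, the function names `tokenize` and `bio_labels`, and the attribute names in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely use a more careful tokenizer.
    return text.lower().split()


def bio_labels(title: str, attributes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Distantly label a product title with BIO tags (illustrative sketch).

    `attributes` maps a hypothetical attribute name to its value as taken
    from structured product data (e.g. Microdata markup). Every contiguous
    token span in the title that exactly matches a value is tagged
    B-<attr> / I-<attr>; all other tokens are tagged O.
    """
    tokens = tokenize(title)
    labels = ["O"] * len(tokens)
    for attr, value in attributes.items():
        value_tokens = tokenize(value)
        n = len(value_tokens)
        if n == 0:
            continue  # skip empty attribute values
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            # Tag exact matches only, and never overwrite an earlier tag.
            if tokens[start:start + n] == value_tokens and all(
                labels[i] == "O" for i in span
            ):
                labels[start] = f"B-{attr}"
                for i in range(start + 1, start + n):
                    labels[i] = f"I-{attr}"
    return list(zip(tokens, labels))


if __name__ == "__main__":
    # Hypothetical offer title and structured attribute values.
    offer_title = "Apple iPhone 6 16GB Space Gray"
    structured = {"brand": "Apple", "model": "iPhone 6", "color": "Space Gray"}
    for token, label in bio_labels(offer_title, structured):
        print(token, label)
```

With the example input, `apple` is tagged `B-brand`, `iphone 6` becomes `B-model I-model`, `16gb` stays `O`, and `space gray` becomes `B-color I-color`. Titles labeled this way could then serve as training data for a sequence labeler. Exact string matching is deliberately simple and will miss abbreviations and spelling variants.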



Published In

Semantic Web, Volume 9, Issue 5 (2018)
180 pages
ISSN: 1570-0844
EISSN: 2210-4968

Publisher

IOS Press, Netherlands


Author Tags

1. Product data
2. data integration
3. vector space embeddings
4. deep learning
5. microdata
6. schema.org


Cited By

• (2024) Investigating Influence of Google-Play Application Titles on Success. Big Data Research 36:C. DOI: 10.1016/j.bdr.2024.100443. Online publication date: 28-May-2024.
• (2023) Multi-Faceted Knowledge-Driven Pre-Training for Product Representation Learning. IEEE Transactions on Knowledge and Data Engineering 35:7, pp. 7239-7250. DOI: 10.1109/TKDE.2022.3200921. Online publication date: 1-Jul-2023.
• (2023) Proportionally Fair Matching with Multiple Groups. Graph-Theoretic Concepts in Computer Science, pp. 1-15. DOI: 10.1007/978-3-031-43380-1_1. Online publication date: 28-Jun-2023.
• (2022) An Exploratory Study on Utilising the Web of Linked Data for Product Data Mining. SN Computer Science 4:1. DOI: 10.1007/s42979-022-01415-3. Online publication date: 17-Oct-2022.
• (2022) An Application of Learned Multi-modal Product Similarity to E-Commerce. Similarity Search and Applications, pp. 25-39. DOI: 10.1007/978-3-031-17849-8_3. Online publication date: 5-Oct-2022.
• (2021) Metric Learning Based Vision Transformer for Product Matching. Neural Information Processing, pp. 3-13. DOI: 10.1007/978-3-030-92185-9_1. Online publication date: 8-Dec-2021.
• (2020) Learning expressive linkage rules from sparse data. Semantic Web 11:3, pp. 549-567. DOI: 10.3233/SW-190356. Online publication date: 1-Jan-2020.
• (2019) The WDC Training Dataset and Gold Standard for Large-Scale Product Matching. Companion Proceedings of The 2019 World Wide Web Conference, pp. 381-386. DOI: 10.1145/3308560.3316609. Online publication date: 13-May-2019.
• (2019) Product Classification Using Microdata Annotations. The Semantic Web – ISWC 2019, pp. 716-732. DOI: 10.1007/978-3-030-30793-6_41. Online publication date: 26-Oct-2019.
