Abstract
This paper considers the problem of semantically typing pet products using only the independent, crowdsourced reviews that purchasing customers post for them on e-commerce websites, rather than detailed product descriptions. Instead of proposing new methods, we assess the feasibility of established text classification algorithms in support of this goal. We conduct a detailed series of experiments, using three different methodologies and a two-level pet product taxonomy. Our results show that classic methods can serve as robust solutions to this problem, and that language models and word embeddings, while promising when more data is available, tend to be both more computationally intensive and susceptible to degraded performance in the long tail.
These authors contributed equally to this work.
Notes
- 1.
- 2. https://huggingface.co/datasets/amazon_us_reviews/viewer/Pet_Products_v1_00/train; Accessed: Jan 25, 2022.
- 3. We could have also done a ‘global’ 80-20 split of the entire manually labeled sample of 1000 products, but this would not have guaranteed that all Level 1 categories were, in fact, represented. This is especially the case due to the ‘long-tail’ nature of the distribution: Level 1 categories like Reptile are significantly less prevalent in the sample and in the overall dataset than Dog or Cat.
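The alternative to a global split that this note implies is an 80-20 split made separately within each Level 1 category. The following is a minimal sketch of that idea, not the authors' procedure: the function name `split_per_category`, the fixed seed, the rule that keeps at least one product on each side, and the toy product counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_category(products, train_fraction=0.8, seed=0):
    """Split (product_id, level1_category) pairs within each category."""
    by_category = defaultdict(list)
    for product_id, category in products:
        by_category[category].append(product_id)

    rng = random.Random(seed)
    train, test = [], []
    for category in sorted(by_category):
        ids = by_category[category][:]
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        # Assumption: keep at least one product on each side of the split
        # whenever the category has two or more products.
        if len(ids) >= 2:
            cut = min(max(cut, 1), len(ids) - 1)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


# Hypothetical long-tailed sample: many Dog products, few Reptile products.
products = [(f"dog-{i}", "Dog") for i in range(50)]
products += [(f"cat-{i}", "Cat") for i in range(30)]
products += [(f"reptile-{i}", "Reptile") for i in range(5)]

train_ids, test_ids = split_per_category(products)
print(len(train_ids), len(test_ids))
```

Because the split is made category by category, a rare category such as Reptile contributes products to both partitions, which a single global shuffle cannot guarantee.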
- 4. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html; Accessed: March 1, 2023.
- 5. https://docs.python.org/3/library/string.html; Accessed: March 1, 2023.
- 6. https://scikit-learn.org/stable/modules/feature_extraction.html#stop-words; Accessed: March 1, 2023.
- 7. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html; Accessed: March 1, 2023.
- 8. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html; Accessed: March 1, 2023.
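Notes 4 to 8 point to scikit-learn's TF-IDF vectorizer, Python's string module, the scikit-learn stop-word list, vector normalization, and LogisticRegressionCV. A minimal sketch of how such components could be combined into a classic text classifier follows. The toy reviews, the two labels, the `strip_punctuation` helper, and every parameter value are assumptions chosen for illustration; they are not the authors' configuration or data.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data: one review text per product, with a Level 1 label.
train_texts = [
    "My dog loves this chew toy, great for puppies!",
    "Sturdy leash; our dog pulls hard and it holds.",
    "The puppy chewed the bone for hours.",
    "Good collar for a large dog.",
    "My cat adores this scratching post.",
    "Clumping litter works well for our kitten.",
    "The cat plays with the feather wand daily.",
    "Kitten sleeps in this cat bed every night.",
]
train_labels = ["Dog"] * 4 + ["Cat"] * 4


def strip_punctuation(text: str) -> str:
    """Lowercase the text and delete every character in string.punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


model = make_pipeline(
    # A custom preprocessor replaces the vectorizer's own lowercasing,
    # so strip_punctuation lowercases explicitly. Rows are L2-normalized.
    TfidfVectorizer(preprocessor=strip_punctuation, stop_words="english", norm="l2"),
    # cv=2 only because the toy set is tiny; a real run would use more folds.
    LogisticRegressionCV(cv=2, max_iter=1000),
)
model.fit(train_texts, train_labels)

predicted = model.predict(["This leash is perfect for walking my dog."])
print(predicted[0])
```

TfidfVectorizer applies L2 normalization by default, so a separate call to `sklearn.preprocessing.normalize` is only needed when feature vectors are built some other way.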
- 9. https://www.nltk.org/api/nltk.tokenize.html; Accessed: March 1, 2023.
- 10. Specifically, “fasttext-wiki-news-subwords-300” accessed at https://fasttext.cc/docs/en/english-vectors.html.
- 11. https://nlp.stanford.edu/projects/glove/; Accessed: March 1, 2023.
- 12. Specifically, the distilbert-base-uncased model accessed at https://huggingface.co/distilbert-base-uncased.
- 13. Note that, because we have partitioned products into training and test sets, reviews would not ‘straddle’ the two sets: either all n reviews for a product would be allocated to the training set partition, or to the test set partition.
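The property described in this note follows from splitting at the product level and letting each review inherit the partition of its product. A short sketch, assuming reviews are available as (product_id, text) pairs; the function name `partition_reviews` and the sample data are hypothetical.

```python
def partition_reviews(reviews, train_product_ids):
    """Route each (product_id, text) review to the partition of its product."""
    train_ids = set(train_product_ids)
    train_reviews, test_reviews = [], []
    for product_id, text in reviews:
        target = train_reviews if product_id in train_ids else test_reviews
        target.append((product_id, text))
    return train_reviews, test_reviews


# Hypothetical reviews: product p1 has two reviews, p2 and p3 have one each.
reviews = [
    ("p1", "Great chew toy."),
    ("p2", "Litter clumps well."),
    ("p1", "My dog destroyed it in a day."),
    ("p3", "Heat lamp works fine."),
]
train_reviews, test_reviews = partition_reviews(reviews, train_product_ids=["p1", "p3"])

# No product contributes reviews to both partitions.
train_products = {pid for pid, _ in train_reviews}
test_products = {pid for pid, _ in test_reviews}
print(train_products.isdisjoint(test_products))
```

Splitting individual reviews at random instead would let text about the same product appear on both sides, which could inflate test scores.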
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X. et al. (2023). Automatic Semantic Typing of Pet E-commerce Products Using Crowdsourced Reviews: An Experimental Study. In: Ortiz-Rodriguez, F., Villazón-Terrazas, B., Tiwari, S., Bobed, C. (eds) Knowledge Graphs and Semantic Web. KGSWC 2023. Lecture Notes in Computer Science, vol 14382. Springer, Cham. https://doi.org/10.1007/978-3-031-47745-4_12
DOI: https://doi.org/10.1007/978-3-031-47745-4_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47744-7
Online ISBN: 978-3-031-47745-4
eBook Packages: Computer Science, Computer Science (R0)