Building Materials Classification Model Based on Text Data Enhancement and Semantic Feature Extraction
Abstract
1. Introduction
- To extract keywords from a corpus covering different building materials and enrich the original building material text, a data augmentation method combining the LDA algorithm with N-grams is proposed.
- To capture contextual semantic information, a novel layered feature extraction network is constructed. In this network, the first convolutional layer extracts the full-text features, and the composite convolutional layers then extract the key local features.
- Experimental comparisons with various machine learning and deep learning models were conducted; the proposed LNBC model achieves the highest macro F1 score (81.33%) among the compared models for classifying building materials.
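The first contribution can be illustrated with a minimal sketch. The text here does not specify how the LDA keywords and the N-grams are combined, so the rule below is an assumption: the top words of a document's dominant LDA topic are appended to the text, together with the document's bigrams that contain one of those words. The toy corpus, the function names (`ngrams`, `lda_gibbs`, `augment`), and all hyperparameter values are hypothetical, and the small collapsed Gibbs sampler merely stands in for whichever LDA implementation the authors used.

```python
import random
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with a small collapsed Gibbs sampler.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample with weight proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def augment(doc, topic_counts, topic_word, top_k=3, n=2):
    """Append dominant-topic keywords and the document's n-grams containing them.

    This combination rule is an assumption made for illustration only.
    """
    dominant = max(range(len(topic_counts)), key=topic_counts.__getitem__)
    keywords = [w for w, c in topic_word[dominant].most_common(top_k) if c > 0]
    phrases = [g for g in ngrams(doc, n) if set(g.split()) & set(keywords)]
    return doc + keywords + phrases


# Hypothetical toy corpus, loosely based on building material descriptions.
docs = [
    "rebar used in reinforced concrete ribbed rebar smooth rebar".split(),
    "flat steel rectangular cross-section steel frames supports".split(),
    "insulated wire covered with insulating layer magnet wire".split(),
    "cotton yarn made from cotton fibers through spinning".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
augmented = [augment(doc, doc_topic[d], topic_word) for d, doc in enumerate(docs)]
```

Each entry of `augmented` starts with the original tokens, so the augmentation only adds material and never rewrites the source description.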
2. Problem Formulation and the Proposed Method
2.1. Problems in Matching Building Material Types
2.2. The Proposed LNBC Model
2.3. Research on the Practicability of the Model
3. The Detailed Process of the Proposed LNBC Model
3.1. Data Preprocessing
3.2. Data Augmentation at Different Levels
3.3. Word Embedding
3.4. Feature Extraction and Aggregation for Full Text
3.4.1. Convolutional Calculations
3.4.2. Aggregation Features
3.5. Feature Extraction for Pre- and Post-Semantics
3.6. Classification Output
4. Experiments and Discussion
4.1. Comparisons and Experimental Environment
4.2. Evaluation Indicators
4.3. Comparative Experiments
4.4. Analysis of the Experimental Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Building Materials Text | Material Types (Level 1) | Material Types (Level 2) | Material Types (Level 3) |
---|---|---|---|
Cotton yarn is made from cotton fibers through spinning. When processed into ply yarn, it becomes cotton thread. There are two types, Carded yarn: Made with a basic spinning system. Combed yarn: Made with a high-quality spinning system, resulting in smoother, stronger yarn used for premium fabrics. | pure cotton carded yarn | pure cotton carded yarn | pure cotton carded yarn |
Insulated wire is wire covered with an insulating layer. It includes magnet wire and general-purpose insulated wire. | wires, cables, optical cables, and electrical | insulated wire | copper core polyethylene insulated wire |
White calico includes various materials like cotton, linen, silk, taffeta, and satin, each with unique characteristics and uses. Cotton: Comfortable, perfect for everyday clothes and bedding. Linen: Light and cool, ideal for summer wear. Silk: Soft and luxurious, great for fancy dresses. Taffeta: Transparent, used for lingerie. Satin: Shiny and elegant, chosen for wedding gowns and curtains. | textile and garment industry | textile products | average textile products |
Floor tiles, made of porcelain or ceramic, are used for indoor and outdoor flooring. The size of the tiles is a key factor and depends on personal preference, design requirements, and room size. Larger tiles make spaces appear more spacious and tidy, and reduce the number of seams, making the floor smoother and easier to clean. | non-metal | non-metallic mineral products | architectural ceramics—porcelain tiles—wet milling process |
Insulation nails are special engineering plastic expansion nails used to fasten insulation boards to walls. They are specifically designed for external wall insulation and are widely used in building decoration, particularly for anchoring wall insulation. They consist of a galvanized screw, a nylon expansion tube, and a fixed round plate. | polymeric chemicals | synthetic resin | plastic-PVC |
Expansion bolts are devices used to anchor into concrete and other materials. They include a bolt, nut, nut sleeve, and spiral casing that together form an expansion anchoring system. The bolt is inserted into a pre-drilled hole and expands inside the hole through the action of the spiral casing and nut sleeve, providing a strong hold. They are used to fix structures like brackets, bridges, and pipes in construction projects. | metal | ferrous metal smelting and rolling products | steel products |
Solder is a common welding material used to join components in electronics, appliances, and communications equipment. It has a low melting point and good wettability and fluidity, enabling reliable welded connections. | metal | ferrous metal smelting and rolling products | refined tin |
Rebar, used in reinforced and prestressed concrete, usually has a round cross-section but can sometimes be square with rounded edges. Types include smooth, ribbed, and twisted rebar. Rebar for concrete can be straight or coiled, and comes in two types: smooth and deformed. Smooth round rebar is simply low-carbon steel in small diameters. | metal | ferrous metal smelting and rolling products | rebar |
Hanging rod, shaped like an ingot and also known as an ingot bar or Yuanbao rod, is used to transfer concentrated forces from the bottom to the top of concrete beam components. This enhances the beam’s ability to resist shear under concentrated loads. | metal | ferrous metal smelting and rolling products | rebar |
Flat steel is a metal with a large width-to-thickness ratio and a rectangular cross-section. Made of steel, it is thin and wide, used for frames, supports, brake pads, and mechanical parts. It is strong, rigid, and easy to process and cut into various shapes for customization. | metal | ferrous metal smelting and rolling products | small-sized steel materials |
Parameter | Value |
---|---|
Learning rate | 0.00001 |
Max_len | 256 |
Dimensions of a word vector | 768 |
Batch_size | 16 |
Epochs | 10 |
Convolution window size | 2, 3, 4 |
Activation function | ReLU |
Optimizer | Adam |
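For readers who want to reuse these settings, the table can be written down as a configuration object. This is a minimal sketch: the class and field names are hypothetical, and the helper functions assume a 1-D convolution with stride 1 and no padding, followed by ReLU and max-over-time pooling, which is a common design for text CNNs rather than a confirmed detail of the LNBC model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the parameter table (field names are hypothetical)."""
    learning_rate: float = 1e-5
    max_len: int = 256
    embedding_dim: int = 768
    batch_size: int = 16
    epochs: int = 10
    conv_window_sizes: tuple = (2, 3, 4)
    activation: str = "relu"
    optimizer: str = "adam"


def conv_output_lengths(cfg):
    """Positions produced per window size, assuming stride 1 and no padding."""
    return {k: cfg.max_len - k + 1 for k in cfg.conv_window_sizes}


def conv_relu_maxpool(seq, kernel):
    """Slide one filter over a sequence of vectors, apply ReLU, then max-pool over time.

    seq is a list of token vectors; kernel is a list of k weight vectors,
    where k is the convolution window size.
    """
    k = len(kernel)
    scores = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        s = sum(x * w for row, wrow in zip(window, kernel) for x, w in zip(row, wrow))
        scores.append(max(0.0, s))  # ReLU
    return max(scores)  # max-over-time pooling (assumed aggregation)


cfg = TrainConfig()
lengths = conv_output_lengths(cfg)

# Toy 2-dimensional example with a window of 2 tokens.
toy_seq = [[1, 0], [0, 1], [1, 1]]
toy_kernel = [[1, 0], [0, 1]]
toy_feature = conv_relu_maxpool(toy_seq, toy_kernel)
```

Under these assumptions, windows of 2, 3, and 4 tokens over a 256-token input yield 255, 254, and 253 positions, respectively.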
Model | P_Macro (%) | R_Macro (%) | F1_Macro (%) |
---|---|---|---|
SVM | 69.13 | 73.72 | 71.35 |
KNN | 65.78 | 72.26 | 68.87 |
Naive Bayes | 68.96 | 73.72 | 71.26 |
CNN | 71.97 | 75.91 | 73.89 |
LSTM | 71.66 | 75.18 | 73.38 |
BERT-CNN | 73.58 | 77.37 | 75.43 |
LSTM-CNN | 70.68 | 78.10 | 74.20 |
LDA-Ngram-BERT-LSTM | 73.91 | 81.02 | 77.30 |
LNBC (none) * | 75.54 | 80.29 | 77.84 |
LNBC (2.3) * | 77.93 | 83.21 | 80.48 |
LNBC (2.4) * | 73.92 | 82.48 | 77.97 |
LNBC (3.4) * | 78.13 | 83.94 | 80.93 |
LNBC * | 78.89 | 83.94 | 81.33 |
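The three evaluation indicators in the results table can be related with a short sketch. The formulas of Section 4.2 are not reproduced here, so the definition below is an assumption; it is chosen because every tabulated macro F1 value agrees, to within 0.01, with the harmonic mean of the macro precision and macro recall in the same row (rather than with an average of per-class F1 scores). The function names and the toy labels are hypothetical.

```python
def harmonic_f1(p, r):
    """Harmonic mean of a precision and a recall value."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, F1 of the two macro averages)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p_macro = sum(precisions) / len(labels)
    r_macro = sum(recalls) / len(labels)
    return p_macro, r_macro, harmonic_f1(p_macro, r_macro)


# Rows copied from the results table: (model, P, R, reported F1), all in percent.
reported = [
    ("SVM", 69.13, 73.72, 71.35),
    ("KNN", 65.78, 72.26, 68.87),
    ("Naive Bayes", 68.96, 73.72, 71.26),
    ("CNN", 71.97, 75.91, 73.89),
    ("LSTM", 71.66, 75.18, 73.38),
    ("BERT-CNN", 73.58, 77.37, 75.43),
    ("LSTM-CNN", 70.68, 78.10, 74.20),
    ("LDA-Ngram-BERT-LSTM", 73.91, 81.02, 77.30),
    ("LNBC (none)", 75.54, 80.29, 77.84),
    ("LNBC (2.3)", 77.93, 83.21, 80.48),
    ("LNBC (2.4)", 73.92, 82.48, 77.97),
    ("LNBC (3.4)", 78.13, 83.94, 80.93),
    ("LNBC", 78.89, 83.94, 81.33),
]
# Largest gap between the reported F1 and the harmonic mean of P and R.
max_gap = max(abs(harmonic_f1(p, r) - f1) for _, p, r, f1 in reported)

# Hypothetical toy predictions over three level-1 material types.
y_true = ["metal", "metal", "non-metal", "textile"]
y_pred = ["metal", "non-metal", "non-metal", "textile"]
p_macro, r_macro, f1_macro = macro_scores(y_true, y_pred)
```

Because macro averaging weights every class equally, a rare material type influences these scores as much as a frequent one.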
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yan, Q.; Jiao, F.; Peng, W. Building Materials Classification Model Based on Text Data Enhancement and Semantic Feature Extraction. Buildings 2024, 14, 1859. https://doi.org/10.3390/buildings14061859