Text Classification of Patient Experience Comments in Saudi Dialect Using Deep Learning Techniques
Abstract
1. Introduction
2. Related Work
2.1. Text Classification in Healthcare
2.2. Text Classification in Arabic
2.3. BERT-Based Arabic Text Classification
2.4. Summary
3. Proposed Methodology
3.1. Data Set Description
3.2. Data Pre-Processing
3.2.1. Data Cleaning
3.2.2. Data Normalization
3.2.3. Tokenization
3.2.4. Data Representation
- Static word embeddings: Words are represented as short, dense vectors in a multi-dimensional space. This representation improves on traditional techniques because words with similar meanings receive similar vectors [57]. In this work, we utilize AraVec, a pre-trained word embedding model published in 2017 by Soliman et al. [31]. It comes in several variants, trained on different data sets that together comprise more than 3,000,000,000 tokens. The AraVec model used here was trained with the skip-gram algorithm on the Twitter data set and has a vector size of 300 [58]. The skip-gram algorithm was chosen because it captures context better than continuous bag of words (CBOW), as previously demonstrated in the literature [36]. Furthermore, we built a patient experience-specific word embedding (PX-Vec) from all 968,985 comments that we obtained. The same data cleaning and normalization processes described above were followed. Stop words were also removed, as they contribute little meaning to static embeddings; we used an existing list of Arabic stop words [59] to eliminate them from the texts. According to previous studies, removing stop words can reduce the size of a corpus by 35–45% while also increasing the accuracy and effectiveness of text mining applications, thus reducing the overall temporal and spatial complexity of the application [60]. We set the vector size to 300, the window size to 5, and the minimum number of occurrences of a word to 2. This yielded a vocabulary of 160,136 unique words, which we used for vectorization in some models. The third quartile of the number of words per comment was 29 without stop word removal and 22 with stop words removed, which implies that the sequence length used with the static word embeddings was sufficient for the case at hand. Figure 5 shows the projection of similar word clusters based on the word sense.
- Contextual Embeddings: One of the main limitations of static word embeddings is that a word with multiple meanings has a single representation. Contextual embeddings solve this problem by capturing the context in which a word appears. Such models are trained on unlabeled data. BERT is a language-dependent pre-trained model that applies the concept of contextual embeddings. It is bidirectional, which means that it captures the context that both precedes and follows the represented word [15]. We utilized four contextual embedding models. Three are existing pre-trained models: AraBERT [46], an Arabic implementation of BERT that tokenizes text using the SentencePiece tokenizer and was trained on 77 GB of unlabeled Arabic text; MarBERT [61], which was trained on 128 GB of text and tokenized using WordPiece; and Qarib [62], a language model trained using 180,000 tweets and tokenized using byte pair encoding (BPE). All three were built using MSA and DA. Table 9 provides the parameters used for their pre-training. The fourth model, the Arabic PX-specific BERT model (PX_BERT), was pre-trained by us on the unannotated data collected by the PX center of the MOH. A total of 968,985 comments written in MSA and DA were utilized, and the BERT BPE tokenizer was used to tokenize them. This model was trained with a masked language modeling (MLM) head only: we utilized BertForMaskedLM from the transformers Python library and configured the model with the values given in Table 10, with the masking percentage set to 15% and without the next-sentence prediction (NSP) head. This choice is based on evidence in the literature that NSP brought no improvement in the performance of downstream NLP tasks [22].
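To make the masked language modeling objective more concrete, the following is a minimal, library-free sketch of how tokens might be selected and corrupted at a 15% masking rate. Only the 15% rate comes from the text above; the 80/10/10 replacement split is the scheme from the original BERT paper and is assumed here, and the function name `mask_tokens`, the special-token set, and the toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}  # assumed special tokens, never masked


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for one MLM training example.

    labels[i] holds the original token at positions the model must predict
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Skip special tokens and positions not selected for prediction.
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:            # assumed: 80% replaced by [MASK]
            masked[i] = MASK
        elif roll < 0.9:          # assumed: 10% replaced by a random token
            masked[i] = rng.choice(vocab)
        # assumed: remaining 10% keep the original token
    return masked, labels


if __name__ == "__main__":
    example = ["[CLS]", "the", "clinic", "was", "very", "crowded", "[SEP]"]
    toy_vocab = ["the", "clinic", "was", "very", "crowded", "doctor"]
    print(mask_tokens(example, toy_vocab, seed=1))
```

In practice, a data collator from the training library would typically perform this step on token IDs rather than strings; the sketch only illustrates the selection logic.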
3.3. Trained Models
3.3.1. Bidirectional Long Short-Term Memory Network
- BiLSTM with AraVec static word embeddings: We utilized the AraVec pre-trained static word embeddings, which requires vectorizing the comments into a compatible format of length 300. The vectorized comments were then fed into a bidirectional LSTM layer through an embedding layer of size 300, using the hyperparameters given in Table 11. An embedding size of 300 provides a high capacity for representing words and allows the model to capture more nuances and semantic relationships in the text. In addition, pre-trained word embeddings such as Word2Vec are often distributed as 300-dimensional vectors, so using the same dimension for our embedding layer was expected to facilitate comparison and knowledge transfer. The number of units in the LSTM layer was set to 128 in order to capture more complex patterns and dependencies, as the data were complex and rich in sequential information. Adam was chosen as the optimizer because it combines the advantages of the Adagrad and RMSprop algorithms: it adapts the learning rate for each parameter, which leads to faster convergence and better optimization and makes it well-suited to complex models. The learning rate was set to 0.001. Adam adjusts the learning rate during training, and a small initial value such as 0.001 is generally considered an appropriate choice: although a lower learning rate can slow convergence, it can also lead to a more stable training process, whereas higher learning rates may speed up convergence at the risk of overshooting the optimal solution. A dropout rate of 0.2 was used, a common choice that represents a moderate level of regularization. The batch size was set to 128 for faster training and smoother gradients, and the number of units in the dense layer was set to 25 in order to reduce model complexity, improve training efficiency, and make the model less prone to overfitting.
- BiLSTM with AraVec static word embeddings and hyperparameter tuning: We tuned the hyperparameters to obtain better performance. Hyperparameter combinations were drawn from the values detailed in Table 12 using the KerasTuner Python library. We ran 30 search trials, each with a different randomly sampled combination of hyperparameters. Our goal was to determine the combination that gives the best performance without excessive cost in time and computing power. Table 13 gives the hyperparameter values for the best of the 30 models.
- BiLSTM with PX-Vec static word embeddings: PX-Vec word embeddings were built especially for this experiment in order to vectorize the comments. Then, the vectorized comments were fed into a Bidirectional LSTM model through an embedding layer, following the same hyperparameter values mentioned in Table 11, which represent the best hyperparameter set found by the hyperparameter tuning algorithm. Additionally, we carried out cross validation to check the reliability of our model, especially as we used an imbalanced data set.
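The vectorization step mentioned in the items above can be sketched as follows: each tokenized comment is mapped to a fixed-length sequence of vocabulary indices that an embedding layer can consume. This is a minimal illustration and not the authors' code. The minimum count of 2 and the length of 300 are taken from the text; the helper names, the reserved padding and unknown-word indices, the use of post-padding, and the English placeholder tokens are all assumptions.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved indices for padding and unknown words


def build_vocab(tokenized_comments, min_count=2):
    """Map every word seen at least `min_count` times to an integer index."""
    counts = Counter(tok for comment in tokenized_comments for tok in comment)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tok, n in sorted(counts.items()):  # sorted for a deterministic mapping
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def vectorize(comment, vocab, max_len=300):
    """Convert one tokenized comment to exactly `max_len` indices.

    Longer comments are truncated; shorter ones are padded at the end
    (post-padding is an assumption, the source does not specify it).
    """
    ids = [vocab.get(tok, UNK_ID) for tok in comment[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    comments = [["long", "wait", "clinic"], ["long", "wait"], ["clean", "clinic"]]
    vocab = build_vocab(comments)
    print(vectorize(["long", "wait", "dirty"], vocab, max_len=5))
```

Row `i` of the embedding matrix would then hold the AraVec or PX-Vec vector of the word with index `i`, with the padding row left at zero.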
3.3.2. Bidirectional Gated Recurrent Unit
- BiGRU with AraVec static word embeddings: Following the same procedure used to build the BiLSTM model, AraVec pre-trained static word embeddings were used to vectorize the comments, which were then fed into a BiGRU model through an embedding layer, using the preset hyperparameter values listed in Table 14.
- BiGRU with AraVec static word embeddings and hyperparameter tuning: To obtain the hyperparameter combination that results in the best performance, we experimented with 30 random combinations of the BiGRU hyperparameters listed in Table 15. Table 16 provides the hyperparameter values for the best model found among the 30 trials in this experiment.
- BiGRU with PX-Vec static word embeddings: The comments were represented using PX-Vec embeddings and then fed into a BiGRU model through an embedding layer. The hyperparameter values mentioned in Table 14 were used. Additionally, cross validation was applied in order to examine the reliability of our model.
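The tuning procedure described for both recurrent models, in which 30 randomly chosen hyperparameter combinations are tried and the best is kept, can be sketched without any tuning library. The search space below copies the BiLSTM values from Table 12 purely as an illustration. The `evaluate` callback is hypothetical and stands in for training a model and returning its validation score, and using a single `units` value for all layers is a simplifying assumption.

```python
import random

# Values taken from Table 12 (BiLSTM); used here only as an example space.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5, 6, 7, 8],
    "units": [8, 16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "rmsprop"],
}
# Hyperparameters that Table 12 lists with a single value.
FIXED = {"dropout": 0.2, "recurrent_dropout": 0.4, "loss": "binary_crossentropy"}


def sample_trials(space, n_trials=30, seed=0):
    """Draw `n_trials` distinct random combinations from the search space."""
    total = 1
    for values in space.values():
        total *= len(values)
    if n_trials > total:
        raise ValueError("search space has fewer combinations than n_trials")
    rng = random.Random(seed)
    seen, trials = set(), []
    while len(trials) < n_trials:
        combo = tuple(rng.choice(values) for values in space.values())
        if combo not in seen:  # skip combinations that were already drawn
            seen.add(combo)
            trials.append(dict(zip(space, combo)))
    return trials


def random_search(space, evaluate, n_trials=30, seed=0):
    """Return (best_config, best_score) over the sampled trials."""
    best_cfg, best_score = None, float("-inf")
    for cfg in sample_trials(space, n_trials, seed):
        full_cfg = {**cfg, **FIXED}
        score = evaluate(full_cfg)  # hypothetical: train and validate a model
        if score > best_score:
            best_cfg, best_score = full_cfg, score
    return best_cfg, best_score
```

With 315 possible combinations in this space, 30 trials cover roughly a tenth of it, which is the trade-off between search quality and computing cost that the text refers to.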
3.3.3. BERT-Based Model
- Fine-tuned AraBERT: We fine-tuned the AraBERTv02 base model using 80% of the training data and the parameters listed in Table 17. We utilized the AraBERT tokenizer to transform the data into an appropriate format for the fine-tuning process.
- Fine-tuned MarBERT: We fine-tuned the MarBERT pre-trained model using the parameters listed in Table 17.
- Fine-tuned Qarib: We fine-tuned the pre-trained Qarib model for the task of multi-label text classification using the parameters listed in Table 17.
- Fine-tuned PX_BERT: We built a customized PX_BERT model by pre-training a BERT model on the unlabeled PX data set provided by the PX center at MOH (the PX_BERT pre-training process is described in Section 3.2.4). The pre-trained model was then fine-tuned using BertForSequenceClassification from the transformers Python library, with the problem type set to multi-label classification and the remaining parameters set as detailed in Table 17.
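In a multi-label setting such as the one described above, each class is usually given its own sigmoid output and the loss is binary cross-entropy averaged over the classes, which is also the pairing that the transformers library uses for the multi-label problem type. The sketch below illustrates the idea in plain Python for a single comment. The 0.5 decision threshold and the function names are assumptions and are not taken from the source.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over all labels of one example.

    `logits` are raw model outputs, `targets` are 0/1 indicators, one per class.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Assign every class whose probability reaches the (assumed) threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.3]   # toy scores for three classes
    targets = [1, 0, 1]
    print(predict_labels(logits), round(binary_cross_entropy(logits, targets), 4))
```

Because each class is decided independently, a comment can receive several labels, or none, which distinguishes this setup from softmax-based multi-class classification.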
4. Experimental Results
4.1. Experimental Settings
4.2. Performance Measures
- Accuracy measures the percentage of correctly classified observations out of the total number of observations, using the following formula:
- Precision measures the percentage of observations correctly classified as positive out of all observations classified as positive, using the following formula:
- Recall measures the percentage of observations correctly classified as positive out of all actual positive observations, using the following formula:
- F1 Score is the harmonic mean of precision and recall, calculated using the following formula:
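The formulas for these four measures are not reproduced in this text. Their standard definitions, written in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), are given below as a reference; this notation is assumed, and how the scores are averaged across classes is not stated here.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```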
4.3. Results
4.3.1. Negative-Only Data Set (13K)
4.3.2. All-Sentiment Data Set (19K)
4.3.3. All-Sentiment Data Set (19K with 20 Classes)
4.3.4. Average Results Based on the Various Data Sets
4.3.5. Computational Time
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wolf, J.A.; Niederhauser, V.; Marshburn, D.; LaVela, S.L. Defining Patient Experience. Patient Exp. J. 2014, 1, 7–19.
- Ferreira, J.; Patel, P.; Guadagno, E.; Ow, N.; Wray, J.; Emil, S.; Poenaru, D. Patient experience or patient satisfaction? A systematic review of child- and family-reported experience measures in pediatric surgery. J. Pediatr. Surg. 2023, 58, 862–870.
- Lumeon’s Report. Available online: https://info.lumeon.com/patient-access-leadership-research-report (accessed on 13 January 2023).
- Ministry of Health Saudi Arabia. Available online: https://www.moh.gov.sa/en/Pages/Default.aspx (accessed on 3 January 2023).
- Alimova, I.; Tutubalina, E.; Alferova, J.; Gafiyatullina, G. A Machine Learning Approach to Classification of Drug Reviews in Russian. In Proceedings of the 2017 Ivannikov ISPRAS Open Conference (ISPRAS), Moscow, Russia, 30 November–1 December 2017; IEEE: Moscow, Russia, 2017; pp. 64–69.
- Tafti, A.P.; Fu, S.; Khurana, A.; Mastorakos, G.M.; Poole, K.G.; Traub, S.J.; Yiannias, J.A.; Liu, H. Artificial intelligence to organize patient portal messages: A journey from an ensemble deep learning text classification to rule-based named entity recognition. In Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 18–21 November 2019; pp. 1380–1387.
- Nawab, K.; Ramsey, G.; Schreiber, R. Natural Language Processing to Extract Meaningful Information from Patient Experience Feedback. Appl. Clin. Inform. 2020, 11, 242–252.
- Joshi, S.; Abdelfattah, E. Multi-Class Text Classification Using Machine Learning Models for Online Drug Reviews. In Proceedings of the 2021 IEEE World AI IoT Congress (AIIoT), Virtual, 10–13 May 2021; IEEE: Seattle, WA, USA, 2021; pp. 0262–0267.
- Khanbhai, M.; Warren, L.; Symons, J.; Flott, K.; Harrison-White, S.; Manton, D.; Darzi, A.; Mayer, E. Using natural language processing to understand, facilitate and maintain continuity in patient experience across transitions of care. Int. J. Med. Inform. 2022, 157, 104642.
- Alorini, D.; Rawat, D.B. Automatic Spam Detection on Gulf Dialectical Arabic Tweets. In Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 18–21 February 2019; IEEE: Honolulu, HI, USA, 2019; pp. 448–452.
- Rachid, B.A.; Azza, H.; Ben Ghezala, H.H. Classification of Cyberbullying Text in Arabic. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–4407.
- Ameur, M.S.H.; Belkebir, R.; Guessoum, A. Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2020, 19, 66:1–66:16.
- Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
- Li, Q.; Peng, H.; Li, J.; Xia, C.; Yang, R.; Sun, L.; Yu, P.S.; He, L. A Survey on Text Classification: From Traditional to Deep Learning. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–41.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
- Wen, Y.; Liang, Y.; Zhu, X. Sentiment analysis of hotel online reviews using the BERT model and ERNIE model—Data from China. PLoS ONE 2023, 18, e0275382.
- Abdel-Salam, S.; Rafea, A. Performance study on extractive text summarization using BERT models. Information 2022, 13, 67.
- Wang, Z.; Ng, P.; Ma, X.; Nallapati, R.; Xiang, B. Multi-passage bert: A globally normalized bert model for open-domain question answering. arXiv 2019, arXiv:1908.08167.
- Zhang, Y.; Shao, Y.; Zhang, X.; Wan, W.; Li, J.; Sun, J. BERT Based Fake News Detection Model. Training 2022, 1530, 383.
- Patient Experience; Ministry of Health Saudi Arabia: Ar Riyad, Saudi Arabia. Available online: https://www.moh.gov.sa/en/Ministry/pxmp/Pages/default.aspx (accessed on 15 December 2022).
- Saudi Healthcare Complaint Taxonomy. Available online: https://www.moh.gov.sa/en/Ministry/MediaCenter/Publications/Pages/Publications-2019-04-01-001.aspx (accessed on 15 December 2022).
- Tarekegn, A.N.; Giacobini, M.; Michalak, K. A review of methods for imbalanced multi-label classification. Pattern Recognit. 2021, 118, 107965.
- El Rifai, H.; Al Qadi, L.; Elnagar, A. Arabic text classification: The need for multi-labeling systems. Neural Comput. Appl. 2021, 34, 1135–1159.
- Alsaleh, D.; Larabi-Marie-Sainte, S. Arabic Text Classification Using Convolutional Neural Network and Genetic Algorithms. IEEE Access 2021, 9, 91670–91685.
- Jbene, M.; Tigani, S.; Saadane, R.; Chehri, A. A Moroccan News Articles Dataset (MNAD) For Arabic Text Categorization. In Proceedings of the 2021 International Conference on Decision Aid Sciences and Application (DASA), Online, 7–8 December 2021; pp. 350–353.
- Biniz, M.; Boukil, S.; Adnani, F.; Cherrat, L.; Moutaouakkil, A. Arabic Text Classification Using Deep Learning Technics. Int. J. Grid Distrib. Comput. 2018, 11, 103–114.
- Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1532–1543.
- Lulu, L.; Elnagar, A. Automatic Arabic Dialect Classification Using Deep Learning Models. Procedia Comput. Sci. 2018, 142, 262–269.
- Zaidan, O.F.; Callison-Burch, C. The Arabic Online Commentary Dataset: An Annotated Dataset of Informal Arabic with High Dialectal Content. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; Association for Computational Linguistics: Portland, OR, USA, 2011; pp. 37–41.
- Wray, S. Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; European Language Resources Association (ELRA): Paris, France, 2018; p. 4.
- Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. Procedia Comput. Sci. 2017, 117, 256–265.
- Alsukhni, B. Multi-Label Arabic Text Classification Based On Deep Learning. In Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 475–477.
- Al-Ayyoub, M.; Selawi, H.; Zaghlol, M.; Al-Natsheh, H.; Suileman, S.; Fadel, A.; Badawi, R.; Morsy, A.; Tuffaha, I.; Aljarrah, M. Mowjaz Multi-Topic Labelling Task. 2021. Available online: https://www.just.edu.jo/icics/icics2021/com/Task%20Description.html (accessed on 15 December 2022).
- Ghourabi, A.; Mahmood, M.A.; Alzubi, Q.M. A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English Messages. Future Internet 2020, 12, 156.
- Al-Laith, A.; Alenezi, M. Monitoring People’s Emotions and Symptoms from Arabic Tweets during the COVID-19 Pandemic. Information 2021, 12, 86.
- Faris, H.; Habib, M.; Faris, M.; Alomari, A.; Castillo, P.A.; Alomari, M. Classification of Arabic healthcare questions based on word embeddings learned from massive consultations: A deep learning approach. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 1811–1827.
- Ikram, A.Y.; Chakir, L. Arabic Text Classification in the Legal Domain. In Proceedings of the 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), Marrakech, Morocco, 28–30 October 2019; pp. 1–6.
- Omar, A.; Mahmoud, T.M.; Mahfouz, A. Multi-label Arabic text classification in Online Social Networks. Inf. Syst. 2021, 100, 101785.
- Elnagar, A.; Al-Debsi, R.; Einea, O. Arabic text classification using deep learning models. Inf. Process. Manag. 2020, 57, 102121.
- Alhawarat, M.; Aseeri, A.O. A Superior Arabic Text Categorization Deep Model (SATCDM). IEEE Access 2020, 8, 24653–24661.
- Saad, M.K.; Ashour, W. OSAC: Open source Arabic Corpora. In Proceedings of the 6th International Conference on Electrical and Computer Systems, Lefke, North Cyprus, 25–26 November 2010.
- Aliwy, A.H.; Taher, H.A.; Abutiheen, Z.A. Arabic Dialects Identification for All Arabic countries. In Proceedings of the Fifth Arabic Natural Language Processing Workshop 2020, Barcelona, Spain, 10 September 2020.
- Abdul-Mageed, M.; Zhang, C.; Bouamor, H.; Habash, N. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 10 September 2020; Association for Computational Linguistics: Barcelona, Spain, 2020; pp. 97–110.
- Touati-Hamad, Z.; Ridda Laouar, M.; Bendib, I.; Hakak, S. Arabic Quran Verses Authentication Using Deep Learning and Word Embeddings. Int. Arab J. Inf. Technol. 2022, 19, 681–688.
- Ghourabi, A. A BERT-based system for multi-topic labeling of Arabic content. In Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 486–489.
- Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv 2021, arXiv:2003.00104.
- Djandji, M.; Baly, F. Multi-Task Learning using AraBert for Offensive Language Detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; European Language Resource Association (ELRA): Marseille, France, 2020; p. 5.
- Althabiti, S.; Alsalka, M.; Atwell, E. SCUoL at CheckThat! 2021: An AraBERT Model for Check-Worthiness of Arabic Tweets. In Proceedings of the Working Notes of CLEF 2021—Conference and Labs of the Evaluation Forum, Bucharest, Romania, 21–24 September 2021; p. 5.
- Faraj, D.; Faraj, D.; Abdullah, M. SarcasmDet at Sarcasm Detection Task 2021 in Arabic using AraBERT Pretrained Model. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Online, 19 April 2021; Association for Computational Linguistics: Kyiv, Ukraine, 2021; p. 6.
- Faris, H.; Faris, M.; Habib, M.; Alomari, A. Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models. Heliyon 2022, 8, e09683.
- Uyangodage, L.; Ranasinghe, T.; Hettiarachchi, H. Transformers to Fight the COVID-19 Infodemic. arXiv 2021, arXiv:2104.12201.
- NLP4IF-2021–Fighting the COVID-19 Infodemic. Available online: https://gitlab.com/NLP4IF/nlp4if-2021 (accessed on 15 December 2022).
- Farghaly, A.; Shaalan, K. Arabic Natural Language Processing: Challenges and Solutions. ACM Trans. Asian Lang. Inf. Process. 2009, 8, 14:1–14:22.
- Pasha, A.; Al-Badrashiny, M.; Diab, M.; Kholy, A.E.; Eskander, R.; Habash, N.; Pooleery, M.; Rambow, O.; Roth, R.M. MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; European Language Resources Association (ELRA): Paris, France, 2014; p. 8.
- Obeid, O.; Zalmout, N.; Khalifa, S.; Taji, D.; Oudah, M.; Alhafni, B.; Inoue, G.; Eryani, F.; Erdmann, A.; Habash, N. CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Marseille, France, 2020; pp. 7022–7032.
- Gensim: Topic Modelling for Humans. Available online: https://radimrehurek.com/gensim/index.html (accessed on 8 April 2022).
- Jurafsky, D.; Martin, J.H. Speech and Language Processing; Prentice Hall: Hoboken, NJ, USA, 2000.
- Soliman, A.B. Bakrianoo/Aravec. 2022. Available online: https://github.com/bakrianoo/aravec (accessed on 3 April 2022).
- Alrefaie, M.T. Arabic-Stop-Words. 2021. Available online: https://github.com/mohataher/arabic-stop-words (accessed on 1 April 2022).
- Ladani, D.J.; Desai, N.P. Stopword Identification and Removal Techniques on TC and IR applications: A Survey. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 466–472.
- Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Bangkok, Thailand, 2021; pp. 7088–7105. Available online: https://aclanthology.org/2021.acl-long.0/ (accessed on 3 April 2022).
- Abdelali, A.; Hassan, S.; Mubarak, H.; Darwish, K.; Samih, Y. Pre-Training BERT on Arabic Tweets: Practical Considerations. arXiv 2021, arXiv:2102.10684.
- Tian, Z.; Rong, W.; Shi, L.; Liu, J.; Xiong, Z. Attention Aware Bidirectional Gated Recurrent Unit Based Framework for Sentiment Analysis. In Proceedings of the Knowledge Science, Engineering and Management; Lecture Notes in Computer Science; Liu, W., Giunchiglia, F., Yang, B., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 67–78.
- Keras: The Python Deep Learning API. Available online: https://keras.io/ (accessed on 1 January 2022).
- TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 1 January 2022).
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthc. 2022, 3, 1–23.
- Rezvani, S.; Wang, X. A broad review on class imbalance learning techniques. Appl. Soft Comput. 2023, 143, 110415.
- Gonçalves, T.; Quaresma, P. The impact of nlp techniques in the multilabel text classification problem. In Proceedings of the Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM ‘04 Conference, Zakopane, Poland, 17–20 May 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 424–428.
- Kaneko, M.; Sakaizawa, Y.; Komachi, M. Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan, 27 November–1 December 2017; Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; pp. 40–48.
Reference | Language | Best Model | Data Set Source | Classification Type | Result of the Best Model |
---|---|---|---|---|---|
Tafti 2019 [6] | English | Ensemble (CNN, RNN, LSTM) | Healthcare (PPM) | Binary | F1: 89.9% |
Nawab 2020 [7] | English | Deep learning Sequential Model | PX survey | Binary | F1: 81% A: 82% |
Alimova 2017 [5] | Russian | Linear SVM | Healthcare (drug reviews) | Binary | F1: 73.3% |
El Rifai 2021 [23] | Arabic | SVM | News | Binary | F1: 97.93% A: 97.9% |
Alsaleh 2021 [24] | Arabic | CNN with GA | News data sets (SNAD, MNAD) | Binary | SNAD A: 88.71%; MNAD A: 98.42% |
Lulu 2018 [28] | Arabic | LSTM | Social Media | Binary | A: 71.4% |
Wray 2018 [30] | Arabic | SVM | Social Media | Binary | A: 65% |
Alorini 2019 [10] | Arabic | Naïve Bayes | Social Media | Binary | A: 86% F1: 92% P: 81% R: 87% |
Rachid 2020 [11] | Arabic | Combination of CNN, LSTM, GRU | Social Media | Binary | F1: 84% |
Ghourabi 2020 [34] | Arabic | Hybrid CNN–LSTM | SMS text | Binary | A: 98.37% F1: 91.48% P: 95.39% R: 87.87% |
Ikram 2019 [37] | Arabic | SVM | Legal text | Binary | A: 98.11% F1: 98.04% |
Omar 2021 [38] | Arabic | SVC | Social Media | Binary | A: 97.8% F1: 97.79% R: 97.79% P: 97.8% |
Elnagar 2020 [39] | Arabic | Attention-GRU | News | Binary | A: 95.94% |
Alhawarat 2020 [40] | Arabic | Multi-kernel CNN | Miscellaneous | | A: (97.58–99.90%) |
Ameur 2020 [12] | Arabic | Combined (RNN–CNN) | Open Source Arabic Corpora | | F1: 98.61% P: 98.63% R: 98.58% |
Touati-Hamad 2022 [44] | Arabic | Hybrid CNN–LSTM | Quran, Arabic Learner Corpus | Binary | F1: 97.86% A: 98.33% P: 97.86% R: 97.86% |
Djandji 2020 [47] | Arabic | AraBERT with MTL | | Binary | F1: 90.15% (offensive) F1: 83.41% (hate-speech) |
Althabiti 2021 [48] | Arabic | AraBERT with TanH function | | Binary | A: 68% (AraBERTv0.2) A: 69% (AraBERTv2) |
Faraj 2021 [49] | Arabic | Ensemble (hard-voting) with AraBERT | ArSarcasm-v2 data set | Binary | F1: 59.8% A: 78.3% |
Uyangodage 2021 [51] | Arabic | AraBERT | | Binary | F1: 69.8% |
Reference | Language | Best Model | Data Set Source | Classification Type | Result of the Best Model |
---|---|---|---|---|---|
Joshi 2021 [8] | English | Linear SVC | Healthcare (drug reviews) | Multi-class | F1: 88% R: 88% P: 88% |
Khanbhai 2022 [9] | English | SVM | Healthcare (PX) | Multi-class | A: 62%+ |
Al-Laith 2021 [35] | Arabic | LSTM | Social Media | Multi-class | A: 75% |
Faris 2021 [36] | Arabic | BiLSTM | Healthcare (Altibbi) | Multi-class | A: 87.2% P: (83–95%) |
Biniz 2018 [26] | Arabic | CNN | News | Multi-class | A: 92.94% |
Aliwy 2020 [42] | Arabic | Ensemble (voting combining LR, NB, DT) | NADI data set | Multi-class | F1: 20.05% |
El Rifai 2021 [23] | Arabic | CNN-GRU | News | Multi-label | A: 94.85% F1: 78.86% |
Alsukhni 2021 [32] | Arabic | LSTM | News | Multi-label | F1: 83.8% |
Omar 2021 [38] | Arabic | Linear SVC | Social Media | Multi-label | A: 81.44% F1: 92.0% R: 90.5% P: 93.52% |
Elnagar 2020 [39] | Arabic | Attention-GRU | News | Multi-label | A: 88.86% |
Ghourabi 2021 [45] | Arabic | AraBERT | News | Multi-label | A: 85.1% F1: 86.42% |
Faris 2022 [50] | Arabic | BiLSTM | Healthcare (Altibbi) | Multi-label | R: 54.4% P: 26.8% F1: 35.46% |
Domain | Category | Sub-Category | Classification |
---|---|---|---|
Clinical | 2 | 8 | 59 |
Management | 2 | 11 | 82 |
Relationship | 2 | 6 | 17 |
Total | 6 | 25 | 158 |
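Domain | Category | Subcategory | Classification |
---|---|---|---|
Relationships Complaints | Communication | Patient–staff communication | Miscommunication with Patient |
Relationships Complaints | Communication | Patient–staff communication | Poor provider–patient communication |
Relationships Complaints | Communication | Patient–staff communication | Not involving patient in clinical decisions |
Relationships Complaints | Communication | Patient–staff communication | Failure to clarify patient case to their family |
Relationships Complaints | Communication | Incorrect Information | Deficient Information |
Relationships Complaints | Communication | Incorrect Information | Communication of wrong information |
Relationships Complaints | Humanness/Caring | Emotional Support | Inadequate emotional support |
Relationships Complaints | Humanness/Caring | Emotional Support | Neglect |
Relationships Complaints | Humanness/Caring | Assault and Harassment | Inappropriate/aggressive behavior |
Relationships Complaints | Humanness/Caring | Assault and Harassment | Provider assaulted patient |
Relationships Complaints | Humanness/Caring | Assault and Harassment | Molesting a patient |
Relationships Complaints | Humanness/Caring | Assault and Harassment | Discrimination |
Relationships Complaints | Humanness/Caring | Assault and Harassment | No apology to the patient |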
| Comment in Arabic | Sub-Category | Translation |
|---|---|---|
| | Assault and harassment | The treatment of the nursing staff was not at the desired and expected level, while other things were very acceptable |
| | Quality of care | It is necessary to monitor the hospital staff and find out the reasons for not caring for patients. Thank you |
| | Delays | My first visit to the center I have an appointment to check for the Coronavirus and so far I have not seen the results for a month |
| | Environment | The building is not qualified to be called a health center |
Class | % |
---|---|
Quality_Care | 16.43% |
Environment | 16.17% |
Delays | 9.34% |
Administrative_Policies_procedures | 8.07% |
Access | 5.35% |
Medication_Vaccination | 5.90% |
Examination | 6.89% |
Staffing | 4.38% |
Resources | 3.76% |
Skills_conducts | 3.72% |
Assault_Harassment | 2.44% |
PatientStaff_Communication | 2.62% |
Safety_Incidents | 2.37% |
Emotional_Support | 2.04% |
Treatment | 1.68% |
Patient_Journey | 1.70% |
Medical_Records | 1.28% |
Diagnosis | 0.81% |
Safety_Security | 1.35% |
Confidentiality | 0.72% |
Incorrect_Information | 1.32% |
Referrals | 0.57% |
Patient_Disposition | 0.86% |
Finance_Billing | 0.30% |
Consent | 0.07% |
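The class shares above are heavily skewed. As a rough illustration, the sketch below computes the ratio between the most and least frequent classes and a set of inverse-frequency weights for a handful of the listed classes. Whether any class weighting was applied in the experiments is not stated in this table; the weighting scheme and all variable names are assumptions for illustration.

```python
# Shares (%) copied from the class distribution table for a few classes.
class_share = {
    "Quality_Care": 16.43,
    "Environment": 16.17,
    "Delays": 9.34,
    "Finance_Billing": 0.30,
    "Consent": 0.07,
}

most = max(class_share, key=class_share.get)
least = min(class_share, key=class_share.get)
imbalance_ratio = class_share[most] / class_share[least]

# Hypothetical inverse-frequency weights, scaled so the largest class gets 1.0.
weights = {name: class_share[most] / share for name, share in class_share.items()}

print(f"{most} is roughly {imbalance_ratio:.0f}x more frequent than {least}")
for name, weight in weights.items():
    print(f"{name}: weight {weight:.1f}")
```

An imbalance of this magnitude is one plausible reason why accuracy and macro-averaged scores diverge so strongly in the result tables.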
Patient Journey | Number of Entries |
---|---|
Primary Healthcare Center | 9967 |
Inpatient | 8582 |
ER | 7129 |
Total | 25,678 |
Model Name | Tokenizer | Approach | Vocabulary Size | Hidden Size | Attention Heads | Hidden Layers | Batch Size | Epochs |
---|---|---|---|---|---|---|---|---|
AraBERT | SentencePiece | MLM/NSP | 64K | 768 | 12 | 12 | 512 | 27 |
MARBERT | WordPiece | MLM | 100K | 768 | 12 | 12 | 256 | 36 |
Qarib | BPE | MLM | - | 768 | 12 | 12 | - | - |
Model Name | Tokenizer | Approach | Vocabulary Size | Hidden Size | Attention Heads | Hidden Layers | Batch Size | Epochs |
---|---|---|---|---|---|---|---|---|
PX_BERT | BPE | MLM | 50K | 768 | 12 | 6 | 32 | 10 |
Hyperparameter | Value |
---|---|
Embedding layer | 300 |
Bidirectional LSTM (activation = linear) | 128 |
Dropout | 0.2 |
Bidirectional LSTM (activation = linear) | 128 |
Dropout | 0.2 |
Dense (activation = sigmoid) | 25 |
Optimizer | Adam |
Loss | Binary Cross-entropy |
Learning Rate | 0.001 |
Epochs | 10 |
Batch Size | 128 |
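To give a sense of the size of the baseline network in the table above, the sketch below counts trainable parameters for the recurrent and output layers. It assumes a Keras-style LSTM (four gates, one bias vector per gate), that each bidirectional layer concatenates its forward and backward outputs, and that the embedding layer is left out because its size depends on the vocabulary. None of these details are confirmed by the table.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM direction: 4 gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    """A bidirectional layer holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim: int, outputs: int) -> int:
    """Weights plus one bias per output unit."""
    return input_dim * outputs + outputs


EMBED_DIM, UNITS, NUM_LABELS = 300, 128, 25

layer1 = bilstm_params(EMBED_DIM, UNITS)       # fed by 300-d embeddings
layer2 = bilstm_params(2 * UNITS, UNITS)       # fed by concatenated 256-d outputs
output = dense_params(2 * UNITS, NUM_LABELS)   # 25 sigmoid units, one per label

total = layer1 + layer2 + output
print(layer1, layer2, output, total)
```

Dropout layers add no parameters, so under these assumptions the count covers everything in the table except the embedding matrix.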
Hyperparameter | Set of Values |
---|---|
Number of Bidirectional LSTM layers | 2, 3, 4, 5, 6, 7, 8 |
Number of units | 8, 16, 32, 64, 128 |
Activation function | ReLU, tanh, sigmoid |
Recurrent dropout | 0.4 |
Optimizer | Adam, SGD, RMSprop |
Dropout | 0.2 |
Loss | Binary Cross-entropy |
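The search space above is small enough to enumerate. Assuming an exhaustive grid in which the four varied hyperparameters are shared across all layers (the table does not say whether the search was exhaustive, random, or guided), the candidate configurations can be counted as follows. The dictionary keys are illustrative names.

```python
from itertools import product

# Values varied during tuning, as listed in the table above.
search_space = {
    "num_bilstm_layers": [2, 3, 4, 5, 6, 7, 8],
    "units": [8, 16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "rmsprop"],
}
# Values held fixed in the table.
fixed = {"recurrent_dropout": 0.4, "dropout": 0.2, "loss": "binary_crossentropy"}

configs = [
    {**dict(zip(search_space, values)), **fixed}
    for values in product(*search_space.values())
]
print(len(configs))  # 7 * 5 * 3 * 3 combinations
```

This count is a lower bound: if units and activations are chosen separately for each layer, the number of possible architectures grows much faster.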
Hyperparameter | Value |
---|---|
Embedding layer | 300 |
Bidirectional LSTM (activation = tanh) | 32 |
Bidirectional LSTM (activation = tanh) | 32 |
Bidirectional LSTM (activation = tanh) | 32 |
Bidirectional LSTM (activation = linear) | 128 |
Dropout | 0.2 |
Dense (activation = sigmoid) | 25 |
Optimizer | RMSprop |
Recurrent dropout for LSTM | 0.4 |
Loss | Binary Cross-entropy |
Learning Rate | 0.001 |
Epochs | 10 |
Batch Size | 128 |
Hyperparameter | Value |
---|---|
Embedding layer | 300 |
Bidirectional GRU (activation = linear) | 128 |
Dropout | 0.2 |
Bidirectional GRU (activation = linear) | 25 |
Dropout | 0.2 |
Dense (activation = sigmoid) | 25 |
Recurrent dropout of GRU | 0 |
Optimizer | Adam |
Loss | Binary cross-entropy |
Learning Rate | 0.001 |
Epochs | 10 |
Batch Size | 128 |
Hyperparameter | Set of Values |
---|---|
Number of Bidirectional GRU layers | 2, 3, 4, 5, 6, 7, 8 |
Number of units | 8, 16, 32, 64, 128 |
Activation function | ReLU, tanh, sigmoid |
Recurrent dropout | 0.4 |
Optimizer | Adam, SGD, RMSprop |
Dropout | 0.2 |
Loss | Binary Cross-entropy |
Hyperparameter | Value |
---|---|
Embedding layer | 300 |
Bidirectional GRU (activation = tanh) | 32 |
Dropout | 0.2 |
Bidirectional GRU (activation = tanh) | 32 |
Dropout | 0.2 |
Bidirectional GRU (activation = linear) | 64 |
Dropout | 0.2 |
Dense (activation = sigmoid) | 64 |
Recurrent dropout for all GRU | 0.4 |
Optimizer | Adam |
Loss | Binary Cross-entropy |
Learning Rate | 0.001 |
Epochs | 10 |
Batch Size | 128 |
Model Name | Version | Batch Size | Epochs | Learning Rate | Sequence Length |
---|---|---|---|---|---|
AraBERT | bert-base-arabertv02 | 8 | 5 | 0.00002 | 512 |
MARBERT | MarBERTv2 | 8 | 5 | 0.00002 | 512 |
Qarib | bert-base-qarib | 8 | 5 | 0.00002 | 512 |
PX_BERT | PX-BERT | 16 | 5 | 0.00002 | 512 |
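Reading the batch column above as the per-step batch size, the number of optimizer updates each model receives during fine-tuning follows directly from the training set size. The sketch below uses a hypothetical `TRAIN_SIZE`, since the exact number of training comments is not given in this table, and it assumes one update per batch (no gradient accumulation).

```python
import math

# Fine-tuning settings from the table above.
fine_tuning = {
    "AraBERT": {"batch_size": 8, "epochs": 5},
    "MARBERT": {"batch_size": 8, "epochs": 5},
    "Qarib": {"batch_size": 8, "epochs": 5},
    "PX_BERT": {"batch_size": 16, "epochs": 5},
}
TRAIN_SIZE = 10_000  # assumed number of training comments, for illustration only


def total_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer updates when every batch triggers exactly one update."""
    return math.ceil(n_examples / batch_size) * epochs


steps = {name: total_steps(TRAIN_SIZE, **cfg) for name, cfg in fine_tuning.items()}
print(steps)
```

Under these assumptions PX_BERT sees half as many updates as the other three models for the same number of epochs, which is worth keeping in mind when comparing their scores.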
Model | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
---|---|---|---|---|
BiLSTM + AraVec | 54.00% | 18.49% | 19.00% | 18.00% |
Tuned BiLSTM + AraVec | 53.34% | 19.00% | 19.00% | 19.00% |
BiGRU + AraVec | 47.57% | 25.25% | 28.00% | 23.00% |
Tuned BiGRU + AraVec | 55.44% | 10.91% | 12.00% | 10.00% |
BiLSTM_PX_Vec | 54.66% | 14.07% | 17.00% | 12.00% |
BiLSTM_PX_Vec (10 folds) | 33.25% | 15.77% | 23.00% | 12.00% |
BiGRU_PX_Vec | 47.84% | 27.27% | 30.00% | 25.00% |
BiGRU_PX_Vec (10 folds) | 34.78% | 30.00% | 40.00% | 24.00% |
AraBERTv02 | 60.24% | 38.83% | 81.22% | 25.51% |
MarBERTv2 | 56.92% | 42.41% | 80.51% | 28.79% |
Qarib | 57.60% | 41.55% | 71.77% | 29.24% |
PX_BERT | 55.14% | 43.06% | 63.61% | 32.54% |
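The macro F1 values in the table above coincide, to two decimals, with the harmonic mean of the macro precision and macro recall columns. The check below illustrates this for the four BERT rows. It is an observation about the reported numbers, not a statement of how the scores were computed, and the function name is an assumption.

```python
def f1_from_macro(precision: float, recall: float) -> float:
    """Harmonic mean of macro precision and macro recall (both in %)."""
    return 2 * precision * recall / (precision + recall)


# (macro precision, macro recall, reported macro F1) from the table above.
reported = {
    "AraBERTv02": (81.22, 25.51, 38.83),
    "MarBERTv2": (80.51, 28.79, 42.41),
    "Qarib": (71.77, 29.24, 41.55),
    "PX_BERT": (63.61, 32.54, 43.06),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: computed {f1_from_macro(p, r):.2f}, reported {f1:.2f}")
```

This differs from the other common definition of macro F1, the unweighted mean of per-class F1 scores, so the figures may not be directly comparable with results from tools that use that definition.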
Model | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
---|---|---|---|---|
BiLSTM_PX_Vec | 61.26% | 26.21% | 38.00% | 20.00% |
BiLSTM_PX_Vec (10 folds) | 53.00% | 32.69% | 44.00% | 26.00% |
BiGRU_PX_Vec | 62.00% | 34.64% | 43.00% | 29.00% |
BiGRU_PX_Vec (10 folds) | 54.00% | 42.88% | 53.00% | 36.00% |
AraBERTv02 | 57.95% | 47.10% | 64.12% | 37.22% |
MarBERTv2 | 57.21% | 42.68% | 80.62% | 29.02% |
Qarib | 67.97% | 47.00% | 63.65% | 37.26% |
PX_BERT | 55.84% | 43.07% | 67.45% | 31.63% |
Model | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
---|---|---|---|---|
BiLSTM_PX_Vec | 54.93% | 32.12% | 42.00% | 26.00% |
BiLSTM_PX_Vec (10 folds) | 40.71% | 42.16% | 53.00% | 35.00% |
BiGRU_PX_Vec | 52.67% | 40.20% | 44.00% | 37.00% |
BiGRU_PX_Vec (10 folds) | 40.26% | 47.25% | 54.00% | 42.00% |
AraBERTv02 | 60.02% | 48.70% | 66.00% | 38.59% |
MarBERTv2 | 57.89% | 42.35% | 64.33% | 31.56% |
Qarib | 59.10% | 46.51% | 59.10% | 38.34% |
PX_BERT | 55.74% | 45.50% | 55.64% | 38.49% |
Measure | Negative (13K), 25 Classes | All-Sentiments (19K), 25 Classes | All-Sentiments (19K), 20 Classes |
---|---|---|---|
Accuracy | 50.9% | 58.65% | 52.67% |
Macro F1 Score | 27.22% | 39.53% | 43.10% |
Precision | 40.43% | 56.73% | 54.76% |
Recall | 21.59% | 30.77% | 35.87% |
Model | Training Time |
---|---|
BiLSTM + AraVec | 48 |
Tuned BiLSTM + AraVec | 1676 |
BiGRU + AraVec | 28 |
Tuned BiGRU + AraVec | 1565 |
AraBERTv02 | 1102 |
AraBERTv02 (tuned) | 1891 |
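Because the table above does not state a time unit, relative comparisons are the safest way to read it. The sketch below computes how much longer each tuned recurrent model took than its untuned counterpart. Whether the tuned figures include the hyperparameter search itself is not stated, so the ratios should be read with that caveat.

```python
# Training times from the table above; the unit is not stated there.
training_time = {
    "BiLSTM + AraVec": 48,
    "Tuned BiLSTM + AraVec": 1676,
    "BiGRU + AraVec": 28,
    "Tuned BiGRU + AraVec": 1565,
}

# Ratio of tuned to untuned training time for each recurrent baseline.
ratios = {
    base: training_time[f"Tuned {base}"] / training_time[base]
    for base in ("BiLSTM + AraVec", "BiGRU + AraVec")
}

for base, ratio in ratios.items():
    print(f"Tuned {base}: {ratio:.1f}x the time of the untuned model")
```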
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alhazzani, N.Z.; Al-Turaiki, I.M.; Alkhodair, S.A. Text Classification of Patient Experience Comments in Saudi Dialect Using Deep Learning Techniques. Appl. Sci. 2023, 13, 10305. https://doi.org/10.3390/app131810305