CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Sampling and Data Collection Method
3.2. Experiment and Problem Definition
3.3. Architecture of the Proposed Solution
3.4. Methods
3.4.1. Long Short-Term Memory (LSTM)
3.4.2. Gated Recurrent Unit (GRU)
3.4.3. The Convolutional Neural Network (CNN)
3.4.4. The Bidirectional Encoder Representations from Transformers (BERT)
3.4.5. Data Loading
- Input: the students' resumes (CVs).
- Output: a model trained on the students' CVs, and one of the five predefined classes for each resume in the test dataset.
1. Import the dataset file (CVs.csv) into a pandas DataFrame.
2. Pre-process the data (cleaning and deleting noisy data).
3. Generate a one-hot encoding for each class, representing the field of study.
4. Split the dataset into a training set and a testing set with an 80:20 ratio.
5. Tokenize the text using either the pretrained BERT model's tokenizer or the Gensim embedding approach, where tokenization is based on unigrams.
6. Add new competency-related tokens and unknown vocabulary to the vocab.txt file of the BERT models.
7. Create an embedding matrix for every word in the vocabulary.
8. Build a simple model or a hybrid model based on a combination of CNN, LSTM, and GRU layers.
9. Add a dropout layer (rate 0.2).
10. Add a dense output layer with 5 units (one per class) and a softmax activation function.
11. Train the model on the training set.
12. Evaluate the model on the test set.
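The loading, cleaning, one-hot encoding, splitting, unigram tokenization, and embedding-matrix steps of the Gensim branch can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names `text` and `field`, the class codes (taken from the department abbreviations of the dataset), the cleaning rules, the regex tokenizer, and the `<pad>`/`<unk>` indices are all hypothetical. `vectors` can be any mapping from word to vector, such as a trained Gensim `KeyedVectors` object, which supports `word in kv` and `kv[word]`; the BERT vocabulary extension is not shown.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical label codes, taken from the department abbreviations of the dataset.
CLASSES = ["CE", "NST", "AutoMec", "Indus", "ELE"]


def split_dataset(df, text_col="text", label_col="field", train_frac=0.8, seed=42):
    """Clean, one-hot encode, and split a DataFrame loaded e.g. with pd.read_csv("CVs.csv")."""
    # Assumed cleaning: drop empty or duplicate CVs, lowercase, collapse whitespace.
    df = df.dropna(subset=[text_col, label_col]).drop_duplicates(subset=[text_col]).copy()
    df[text_col] = df[text_col].str.lower().str.replace(r"\s+", " ", regex=True).str.strip()
    # One-hot labels: one column per field of study, in a fixed class order.
    labels = pd.get_dummies(df[label_col].astype(pd.CategoricalDtype(CLASSES)), dtype="float32")
    # Random (non-stratified) 80:20 split.
    train_idx = df.sample(frac=train_frac, random_state=seed).index
    test_idx = df.index.difference(train_idx)
    return (df.loc[train_idx, text_col], labels.loc[train_idx],
            df.loc[test_idx, text_col], labels.loc[test_idx])


def tokenize(text):
    """Unigram tokenization: each word-character run is one token (assumed rule)."""
    return re.findall(r"\w+", text.lower())


def build_vocab(texts):
    """Map every unigram to an integer id; 0 is padding and 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(texts, vocab, max_len):
    """Turn texts into a zero-padded (n_texts, max_len) matrix of token ids."""
    ids = np.zeros((len(texts), max_len), dtype="int32")
    for i, text in enumerate(texts):
        seq = [vocab.get(token, 1) for token in tokenize(text)][:max_len]
        ids[i, :len(seq)] = seq
    return ids


def embedding_matrix(vocab, vectors, dim):
    """One row per vocabulary entry; words without a vector keep a zero row."""
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix
```

With the real data, the first step would be `df = pd.read_csv("CVs.csv")`, followed by `split_dataset(df)`. Building the vocabulary from the training split only means that words seen exclusively in test CVs map to `<unk>`, so the evaluation does not leak vocabulary from the test set.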
3.4.6. Experimental Settings
- TensorFlow and Keras libraries were used;
- Number of LSTM, GRU, and CNN (Conv1D) layers: 1;
- Dropout rate: 0.2;
- Output activation function: softmax;
- learning_rate = 1 × 10⁻⁵;
- decay = 1 × 10⁻⁶;
- Loss function: CategoricalCrossentropy();
- Learning rate: 0.001;
- Epochs: 15;
- Batch size: 32;
- Optimizer: Adam.
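The settings above, together with the model-building, dropout, and output-layer steps, might translate into Keras roughly as below. This is a sketch, not the authors' code: the filter count, kernel size, recurrent units, sequence length, and embedding size are assumptions, and a name such as CNN-GRU is assumed to list the components from input to output. Because the settings list two learning rates (1 × 10⁻⁵ and 0.001), `learning_rate` is left as a parameter. Recent Keras optimizers no longer support a `decay` argument, so the listed decay is not reproduced. Only the case where token ids feed an embedding layer (optionally initialized from a Gensim-based matrix) is covered; the BERT variant is not shown.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers


def _component(kind, last):
    """Layers for one component of a simple or hybrid model (assumed sizes)."""
    if kind == "cnn":
        # When the CNN is the last component, pool over time so Dense gets a fixed-size vector.
        pool = layers.GlobalMaxPooling1D() if last else layers.MaxPooling1D(2)
        return [layers.Conv1D(128, 5, padding="same", activation="relu"), pool]
    rnn = {"lstm": layers.LSTM, "gru": layers.GRU}[kind]
    # A recurrent layer feeding another component must return the full sequence.
    return [rnn(64, return_sequences=not last)]


def build_model(order, vocab_size, seq_len=256, emb_dim=128, num_classes=5,
                learning_rate=1e-3, embedding_matrix=None):
    """Build e.g. order=("cnn", "gru") for CNN-GRU, or ("lstm",) for a simple LSTM."""
    embedding = layers.Embedding(vocab_size, emb_dim)
    stack = [layers.Input(shape=(seq_len,), dtype="int32"), embedding]
    for i, kind in enumerate(order):
        stack += _component(kind, last=(i == len(order) - 1))
    # Dropout of 0.2, then a 5-way softmax output layer, as in the settings.
    stack += [layers.Dropout(0.2), layers.Dense(num_classes, activation="softmax")]
    model = tf.keras.Sequential(stack)
    if embedding_matrix is not None:
        # Initialise the embedding from a (vocab_size, emb_dim) matrix, e.g. Gensim vectors.
        embedding.set_weights([np.asarray(embedding_matrix, dtype="float32")])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    return model
```

Training would then follow the listed settings, for instance `model.fit(x_train, y_train, epochs=15, batch_size=32)` with padded token ids and one-hot labels, and `model.evaluate(x_test, y_test)` on the test set. Using `GlobalMaxPooling1D` when the CNN is the final component keeps the input to the dropout and dense layers a fixed-size vector regardless of sequence length.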
4. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Nichols, J.A.; Chan, H.W.H.; Baker, M.A.B. Machine learning: Applications of artificial intelligence to imaging and diagnosis. Biophys. Rev. 2019, 11, 111–118.
2. Kaul, V.; Enslin, S.; Gross, S.A. History of artificial intelligence in medicine. Gastrointest. Endosc. 2020, 92, 807–812.
3. Li, Q.; Cai, W.; Wang, X.; Zhou, Y.; Feng, D.D.; Chen, M. Medical image classification with convolutional neural network. In Proceedings of the IEEE 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore, 10–12 December 2014; pp. 844–848.
4. Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11.
5. Nieto, Y.; García-Díaz, V.; Montenegro, C.; Gonzalez, C.C.; Crespo, R.G. Usage of Machine Learning for Strategic Decision Making at Higher Educational Institutions. IEEE Access 2019, 7, 75007–75017.
6. Ramteke, J.; Shah, S.; Godhia, D.; Shaikh, A. Election result prediction using Twitter sentiment analysis. In Proceedings of the IEEE 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016; pp. 1–5.
7. Alaei, A.R.; Becken, S.; Stantic, B. Sentiment Analysis in Tourism: Capitalizing on Big Data. J. Travel Res. 2019, 58, 175–191.
8. Golowko, N. Future Skills in Education: Knowledge Management, AI and Sustainability as Key Factors in Competence-Oriented Education; Sustainable Management, Wertschöpfung und Effizienz; Springer Fachmedien Wiesbaden: Wiesbaden, Germany, 2021; ISBN 978-3-658-33996-8.
9. Huang, A.Y.Q.; Lu, O.H.T.; Huang, J.C.H.; Yin, C.J.; Yang, S.J.H. Predicting students’ academic performance by using educational big data and learning analytics: Evaluation of classification methods and learning logs. Interact. Learn. Environ. 2020, 28, 206–230.
10. Pal, R.; Shaikh, S.; Satpute, S.; Bhagwat, S. Resume Classification using various Machine Learning Algorithms. ITM Web Conf. 2022, 44, 03011.
11. Urdaneta-Ponte, M.C.; Oleagordia-Ruíz, I.; Méndez-Zorrilla, A. Using LinkedIn Endorsements to Reinforce an Ontology and Machine Learning-Based Recommender System to Improve Professional Skills. Electronics 2022, 11, 1190.
12. Cole, M.S.; Feild, H.S.; Giles, W.F.; Harris, S.G. Recruiters’ Inferences of Applicant Personality Based on Resume Screening: Do Paper People have a Personality? J. Bus. Psychol. 2009, 24, 5–18.
13. Kumalasari, L.D.; Susanto, A. Recommendation System of Information Technology Jobs using Collaborative Filtering Method Based on LinkedIn Skills Endorsement. SISFORMA 2020, 6, 63–72.
14. Appadoo, K.; Soonnoo, M.B.; Mungloo-Dilmohamud, Z. Job Recommendation System, Machine Learning, Regression, Classification, Natural Language Processing. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Gold Coast, Australia, 16–18 December 2020; pp. 1–6.
15. Kowsari, K.; Meimandi, J.K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text Classification Algorithms: A Survey. Information 2019, 10, 150.
16. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep Learning-based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2022, 54, 1–40.
17. Sellamy, K.; El Farouki, M.; Sabri, Z.; Nouib, H.; Qostal, A.; Fakhri, Y.; Moumen, A. Exploring the IT’s Needs in Morocco Using Online Job Ads. In Automatic Control and Emerging Technologies; El Fadil, H., Zhang, W., Eds.; Springer Nature: Singapore, 2024; pp. 665–677.
18. Đurđević Babić, I. Machine learning methods in predicting the student academic motivation. Croat. Oper. Res. Rev. 2017, 8, 443–461.
19. Qazdar, A.; Er-Raha, B.; Cherkaoui, C.; Mammass, D. A machine learning algorithm framework for predicting students performance: A case study of baccalaureate students in Morocco. Educ. Inf. Technol. 2019, 24, 3577–3589.
20. Mourdi, Y.; Sadgal, M.; Berrada Fathi, W.; El Kabtane, H. A Machine Learning Based Approach to Enhance Mooc Users’ Classification. Turk. Online J. Distance Educ. 2020, 21, 47–68.
21. Sadqui, A.; Ertel, M.; Sadiki, H.; Amali, S. Evaluating Machine Learning Models for Predicting Graduation Timelines in Moroccan Universities. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 10.
22. Ouatik, F.O.; Erritali, M.E.; Jourhmane, M.J. Student orientation using machine learning under MapReduce with Hadoop. J. Ubiquitous Syst. Pervasive Netw. 2020, 13, 21–26.
23. Qostal, A.; Moumen, A.; Lakhrissi, Y. Systematic Literature Review on Big Data and Data Analytics for Employment of Youth People: Challenges and Opportunities. In Proceedings of the 2nd International Conference on Advanced Technologies for Humanity; SCITEPRESS – Science and Technology Publications: Rabat, Morocco, 2020; pp. 179–185.
24. Casuat, C.D.; Festijo, E.D. Predicting Students’ Employability using Machine Learning Approach. In Proceedings of the 2019 IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS), Kuala Lumpur, Malaysia, 20–21 December 2019; pp. 1–5.
25. Mewburn, I.; Grant, W.J.; Suominen, H.; Kizimchuk, S. A Machine Learning Analysis of the Non-academic Employment Opportunities for Ph.D. Graduates in Australia. High. Educ. Policy 2020, 33, 799–813.
26. ElSharkawy, G.; Helmy, Y.; Yehia, E. Employability Prediction of Information Technology Graduates using Machine Learning Algorithms. Int. J. Adv. Comput. Sci. Appl. 2022, 13.
27. Roy, A. Recent Trends in Named Entity Recognition (NER). arXiv 2021.
28. Narendra, G.O.; Hashwanth, S. Named Entity Recognition based Resume Parser and Summarizer. Int. J. Adv. Res. Sci. Commun. Technol. 2022, 2, 728–735.
29. Gugnani, A.; Misra, H. Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13286–13293.
30. Fareri, S.; Melluso, N.; Chiarello, F.; Fantoni, G. SkillNER: Mining and mapping soft skills from any text. Expert Syst. Appl. 2021, 184, 115544.
31. Casuat, C.D. Predicting Students’ Employability using Support Vector Machine: A SMOTE-Optimized Machine Learning System. Int. J. Emerg. Trends Eng. Res. 2020, 8, 2101–2106.
32. Baffa, M.H.; Miyim, M.A.; Dauda, A.S. Machine Learning for Predicting Students’ Employability. UMYU Sci. 2023, 2, 001–009.
33. Sun, T.; He, Z. Developing intelligent hybrid DNN model for predicting students’ employability – A Machine Learning approach. J. Educ. Humanit. Soc. Sci. 2023, 18, 235–248.
34. Makdoun, I.; Mezzour, G.; Carley, K.M.; Kassou, I. Analyzing the Needs of the Automotive Job Market in Morocco. In Proceedings of the 2018 13th International Conference on Computer Science & Education (ICCSE), Colombo, Sri Lanka, 8–11 August 2018; pp. 1–6.
35. Habous, A.; Nfaoui, E.H. Combining Word Embeddings and Deep Neural Networks for Job Offers and Resumes Classification in IT Recruitment Domain. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 7.
36. Mgarbi, H.; Chkouri, M.; Tahiri, A. Towards a New Job Offers Recommendation System Based on the Candidate Resume. Int. J. Comput. Digit. Syst. 2023, 14, 31–38.
37. Qostal, A.; Sellamy, K.; Sabri, Z.; Nouib, H.; Lakhrissi, Y.; Moumen, A. Perceived employability of Moroccan engineering students: A PLS-SEM approach. Int. J. Instr. 2024, 17, 259–282.
38. Hopfield, J.J. Brain, neural networks, and computation. Rev. Mod. Phys. 1999, 71, S431–S437.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
40. Li, W.; Wu, H.; Zhu, N.; Jiang, Y.; Tan, J.; Guo, Y. Prediction of dissolved oxygen in a fishery pond based on gated recurrent unit (GRU). Inf. Process. Agric. 2021, 8, 185–193.
41. Ren, L.; Cheng, X.; Wang, X.; Cui, J.; Zhang, L. Multi-scale Dense Gate Recurrent Unit Networks for bearing remaining useful life prediction. Future Gener. Comput. Syst. 2019, 94, 601–609.
42. Nosouhian, S.; Nosouhian, F.; Khoshouei, A.K. A Review of Recurrent Neural Network Architecture for Sequence Learning: Comparison between LSTM and GRU. Preprints 2021, 2021070252.
43. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015.
44. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
45. Alaparthi, S.; Mishra, M. Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey. arXiv 2020.
46. Subakti, A.; Murfi, H.; Hariadi, N. The performance of BERT as data representation of text clustering. J. Big Data 2022, 9, 15.
47. Roy, P.K.; Chowdhary, S.S.; Bhatia, R. A Machine Learning approach for automation of Resume Recommendation system. Procedia Comput. Sci. 2020, 167, 2318–2327.
48. Rahhal, I.; Carley, K.M.; Kassou, I.; Ghogho, M. Two Stage Job Title Identification System for Online Job Advertisements. IEEE Access 2023, 11, 19073–19092.
Study | Context | Model | Accuracy |
---|---|---|---|
[24] | Dataset from the career center of the Technological Institute of the Philippines, Manila, containing 27,000 data points of student information (3000 observations with 9 features each) | | |
[31] | Dataset of mock job interview results and performance ratings of on-the-job training students, with 3000 observations and 12 features | | |
[26] | Dataset (296 records) from a survey of graduates and employers in Egypt on training skills, soft skills, and hard skills | | |
[32] | Proposed models for predicting performance and students' employability, using a primary dataset of 218 graduate students of higher educational institutions (HEIs) | | |
[33] | Hybrid DNN model for predicting students' employability using a machine learning approach | | |
Department (ENSA Kenitra) | Total |
---|---|
Computer Engineering (CE) | 263 |
Networks and Systems Telecommunications (NST) | 134 |
Automotive Mechatronics Engineering (AutoMec) | 149 |
Industrial Engineering (Indus) | 114 |
Electrical Engineering (ELE) | 207 |
Total | 867 |
File Type | Total |
---|---|
Docx | 321 |
308 | |
Png | 123 |
Jpg/Jpeg | 115 |
Text Representation Method | Model | Accuracy | Precision | Recall |
---|---|---|---|---|
BERT | GRU-LSTM | 0.8122 | 0.8995 | 0.8331 |
BERT | GRU-CNN | 0.8821 | 0.8722 | 0.8754 |
BERT | LSTM-GRU | 0.8354 | 0.9021 | 0.8463 |
BERT | LSTM-CNN | 0.8531 | 0.8911 | 0.8234 |
BERT | CNN-GRU | 0.9351 | 0.9310 | 0.9411 |
BERT | CNN-LSTM | 0.9242 | 0.9329 | 0.9012 |
BERT | GRU | 0.9013 | 0.9181 | 0.8051 |
BERT | LSTM | 0.8951 | 0.8886 | 0.8125 |
BERT | CNN | 0.9188 | 0.9102 | 0.8963 |
Gensim | GRU-LSTM | 0.8241 | 0.8542 | 0.7741 |
Gensim | GRU-CNN | 0.8321 | 0.8669 | 0.7725 |
Gensim | LSTM-GRU | 0.8214 | 0.8632 | 0.7921 |
Gensim | LSTM-CNN | 0.9025 | 0.8552 | 0.7626 |
Gensim | CNN-GRU | 0.8751 | 0.8224 | 0.7995 |
Gensim | CNN-LSTM | 0.8423 | 0.8821 | 0.7768 |
Gensim | GRU | 0.7742 | 0.8256 | 0.7951 |
Gensim | LSTM | 0.8287 | 0.8413 | 0.7858 |
Gensim | CNN | 0.9021 | 0.8961 | 0.7551 |
Speciality | Precision | Recall |
---|---|---|
Electrical Engineering (ELE) | 0.975 | 0.951 |
Networks and Systems Telecommunications (NST) | 0.933 | 0.933 |
Computer Engineering (CE) | 0.903 | 0.886 |
Industrial Engineering (Indus) | 0.937 | 0.967 |
Automotive Mechatronics Engineering (AutoMec) | 0.935 | 0.966 |
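Per-class precision and recall such as those above, and aggregated scores such as those in the model comparison, can be computed from test-set predictions along the lines of the plain-Python sketch below. It is an illustration, not the authors' evaluation code. The paper does not state how precision and recall were averaged across classes, so macro averaging (the unweighted mean over classes) is an assumption here. With exactly one predicted label per CV, micro-averaged precision and recall would both equal accuracy, so the differing values in the comparison table presumably come from another scheme.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, classes):
    """Return {class: (precision, recall)} from paired true/predicted labels."""
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class truly occurs
    scores = {}
    for c in classes:
        precision = true_pos[c] / predicted[c] if predicted[c] else 0.0
        recall = true_pos[c] / actual[c] if actual[c] else 0.0
        scores[c] = (precision, recall)
    return scores


def summarize(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision and recall (assumed averaging)."""
    scores = per_class_scores(y_true, y_pred, classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_precision = sum(p for p, _ in scores.values()) / len(classes)
    macro_recall = sum(r for _, r in scores.values()) / len(classes)
    return accuracy, macro_precision, macro_recall
```

For example, with true labels `["CE", "CE", "ELE", "NST"]` and predictions `["CE", "ELE", "ELE", "NST"]`, CE has precision 1.0 and recall 0.5, ELE has precision 0.5 and recall 1.0, and the accuracy is 0.75.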
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qostal, A.; Moumen, A.; Lakhrissi, Y. CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students. Data 2024, 9, 74. https://doi.org/10.3390/data9060074