MDPI - Publisher of Open Access Journals

16 pages, 1931 KiB

Open Access Article

CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students

by Aniss Qostal, Aniss Moumen and Younes Lakhrissi

Data 2024, 9(6), 74; https://doi.org/10.3390/data9060074 - 24 May 2024

Viewed by 1388

Deep learning (DL)-oriented document processing is widely used in different fields for extraction, recognition, and classification processes from a raw corpus of data. The article examines the application of deep learning approaches based on different neural network methods, including gated recurrent units (GRU), long short-term memory (LSTM), and convolutional neural networks (CNNs). The compared models were combined with two different word embedding techniques, namely Bidirectional Encoder Representations from Transformers (BERT) and Gensim Word2Vec. The models are designed to evaluate the performance of architectures based on neural network techniques for the classification of CVs of Moroccan engineering students at ENSAK (National School of Applied Sciences of Kenitra, Ibn Tofail University). The dataset consists of CVs collected from engineering students at ENSAK in 2023 for a project on the employability of Moroccan engineers in which new approaches were applied, especially machine learning, deep learning, and big data. Accordingly, 867 resumes were collected from five specialties of study (Electrical Engineering (ELE), Networks and Systems Telecommunications (NST), Computer Engineering (CE), Automotive Mechatronics Engineering (AutoMec), Industrial Engineering (Indus)). The results showed that the proposed models based on the BERT embedding approach were more accurate than models based on the Gensim Word2Vec embedding approach. The CNN-GRU/BERT model achieved slightly better accuracy (0.9351) than the other hybrid models. Single learning models also performed well, especially those based on BERT embeddings, among which the CNN had the best accuracy (0.9188).

30 pages, 2234 KiB

Open Access Review

Contemporary Approaches in Evolving Language Models

by Dina Oralbekova, Orken Mamyrbayev, Mohamed Othman, Dinara Kassymova and Kuralai Mukhsina

Appl. Sci. 2023, 13(23), 12901; https://doi.org/10.3390/app132312901 - 1 Dec 2023

Cited by 9 | Viewed by 2777

Abstract

This article provides a comprehensive survey of contemporary language modeling approaches for natural language processing (NLP) tasks. It analyzes the diverse methodologies employed in the creation of language models, covering the architecture, training processes, and optimization strategies inherent in these models. The detailed discussion covers various models ranging from traditional n-gram and hidden Markov models to state-of-the-art neural network approaches such as BERT, GPT, LLAMA, and Bard. The article examines different modifications and enhancements applied to both standard and neural network architectures for constructing language models. Special attention is given to addressing challenges specific to agglutinative languages within the context of developing language models for various NLP tasks, particularly for Arabic and Turkish. The research highlights that contemporary transformer-based methods demonstrate results comparable to those achieved by traditional methods employing hidden Markov models. These transformer-based approaches have simpler configurations and run faster during both training and analysis. An integral component of the article is the examination of popular and actively evolving libraries and tools essential for constructing language models. Notable tools such as NLTK, TensorFlow, PyTorch, and Gensim are reviewed, with a comparative analysis considering their simplicity and accessibility for implementing diverse language models. The aim is to provide readers with insights into the landscape of contemporary language modeling methodologies and the tools available for their implementation.

(This article belongs to the Special Issue Natural Language Processing: Trends and Challenges)

18 pages, 1654 KiB

Open Access Systematic Review

Exploring Technology- and Sensor-Driven Trends in Education: A Natural-Language-Processing-Enhanced Bibliometrics Study

by Manuel J. Gomez, José A. Ruipérez-Valiente and Félix J. García Clemente

Sensors 2023, 23(23), 9303; https://doi.org/10.3390/s23239303 - 21 Nov 2023

Cited by 1 | Viewed by 1336

Abstract

Over the last decade, there has been a large amount of research on technology-enhanced learning (TEL), including the exploration of sensor-based technologies. This research area has seen significant contributions from various conferences, including the European Conference on Technology-Enhanced Learning (EC-TEL). In this research, we present a comprehensive analysis that aims to identify and understand the evolving topics in the TEL area and their implications in defining the future of education. To achieve this, we use a novel methodology that combines a text-analytics-driven topic analysis and a social network analysis following an open science approach. We collected a comprehensive corpus of 477 papers from the last decade of the EC-TEL conference (including full and short papers), parsed them automatically, and used the extracted text to find the main topics and collaborative networks across papers. Our analysis focused on the following three main objectives: (1) Discovering the main topics of the conference based on paper keywords and topic modeling using the full text of the manuscripts. (2) Discovering the evolution of said topics over the last ten years of the conference. (3) Discovering how papers and authors from the conference have interacted over the years from a network perspective. Specifically, we used Python and the PdfToText library to parse and extract the text and author keywords from the corpus. Moreover, we employed the Gensim library's Latent Dirichlet Allocation (LDA) topic modeling to discover the primary topics from the last decade. Finally, Gephi and the NetworkX library were used to create co-authorship and citation networks. Our findings provide valuable insights into the latest trends and developments in educational technology, underlining the critical role of sensor-driven technologies in leading innovation and shaping the future of this area.

(This article belongs to the Special Issue Advances of Sensors and Human-Centered Intelligent Systems in Education)
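
The co-authorship step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract reports using Gephi and NetworkX; the version below deliberately swaps in plain standard-library Python so that it runs without dependencies. The paper records, author names, and function names are invented for illustration and are not taken from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each paper is a list of author names (invented data).
papers = [
    ["Ana", "Ben", "Chen"],
    ["Ana", "Ben"],
    ["Chen", "Dee"],
]


def coauthorship_edges(papers):
    """Count how many papers each unordered author pair wrote together."""
    edges = Counter()
    for authors in papers:
        # sorted(set(...)) drops duplicate names and gives each pair one canonical order.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct co-authors per author."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {author: len(ns) for author, ns in neighbours.items()}


edges = coauthorship_edges(papers)
degrees = degree(edges)
```

Here the edge weight is the number of shared papers, which is one common convention; a study may weight or filter edges differently.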

22 pages, 3388 KiB

Open Access Feature Paper Article

Automated Generation of Energy Profiles for Urban Simulations

by Tobias Maile, Heiner Steinacker, Matthias W. Stickel, Etienne Ott and Christian Kley

Energies 2023, 16(17), 6115; https://doi.org/10.3390/en16176115 - 22 Aug 2023

Cited by 1 | Viewed by 1290

Abstract

Urban simulations play an important role on the way to a climate-neutral society. To enable early assessment of different energy concepts for urban developments, energy profiles for different building types are needed. This work describes the development and use of a new engineering tool, GenSim, to quickly and reliably generate energy profiles for urban simulations and early building energy predictions. While GenSim is a standalone tool to create energy profiles for early design assessment, it was developed in the context of urban simulations, primarily to support energy-efficient urban developments within Germany. Energy engineers quickly embraced the tool due to its simplicity and comprehensible results. The development of the tool was recently switched to open source to make it available to a broader audience. In order to foster its development and use, a detailed testing framework has been established to ensure the quality of the results of the tool. The paper includes a detailed validation section to demonstrate the validity of the results compared to a detailed building energy simulation model and actual measured performance data.

(This article belongs to the Special Issue Energy Efficiency through Building Simulation)

10 pages, 866 KiB

Open Access Article

Weakly Bound Dimer of a Diaryloxygermylene Derived from a ^tBuPh₂Si-Substituted 2,2′-Methylenediphenol

by Ryo Yamazaki, Ryunosuke Kuriki, Asuka Sugihara, Youichi Ishii and Takuya Kuwabara

Crystals 2022, 12(5), 605; https://doi.org/10.3390/cryst12050605 - 25 Apr 2022

Cited by 1 | Viewed by 2021

Abstract

Novel diaryloxygermylenes have been prepared by the reaction of Lappert’s germylene, Ge[N(SiMe₃)₂]₂, with 2,2′-methylenediphenols bearing different substituents. The bulkiness of the substituents at the positions ortho to the phenolic oxygen (6 and 6′ positions) affects the structure of the products both in the solid state and in solution. When the ortho substituents are Si^tBuPh₂, the diaryloxygermylene crystallizes as a weakly bound dimer with intermolecular Ge…O distances of ca. 3.0 Å and exists as a monomer in solution. In contrast, the germylene with SiMePh₂ groups as the ortho substituents forms a tightly bound dimer featuring a Ge₂O₂ rhombus with cis-oriented terminal aryloxy groups in the crystalline state, which is confirmed to be maintained in solution by variable-temperature (VT) ¹H NMR studies. To the best of our knowledge, the former dimeric structure is unprecedented in the family of dioxytetrylenes.

(This article belongs to the Section Crystal Engineering)

19 pages, 495 KiB

Open Access Article

Sentence Boundary Extraction from Scientific Literature of Electric Double Layer Capacitor Domain: Tools and Techniques

by Md. Saef Ullah Miah, Junaida Sulaiman, Talha Bin Sarwar, Ateeqa Naseer, Fasiha Ashraf, Kamal Zuhairi Zamli and Rajan Jose

Appl. Sci. 2022, 12(3), 1352; https://doi.org/10.3390/app12031352 - 27 Jan 2022

Cited by 13 | Viewed by 3593

Abstract

Given the growth of scientific literature on the web, particularly in materials science, acquiring data precisely from the literature has become more significant. Material information systems, or chemical information systems, play an essential role in discovering data, materials, or synthesis processes using the existing scientific literature. Processing and understanding the natural language of scientific literature is the backbone of these systems, which depend heavily on appropriate textual content. Appropriate textual content means a complete, meaningful sentence from a large chunk of textual content. The process of detecting the beginning and end of a sentence and extracting them as correct sentences is called sentence boundary extraction. The accurate extraction of sentence boundaries from PDF documents is essential for readability and natural language processing. Therefore, this study provides a comparative analysis of different tools for extracting PDF documents into text, which are available as Python libraries or packages and are widely used by the research community. The main objective is to find the most suitable technique among the available techniques that can correctly extract sentences from PDF files as text. The performance of the techniques used (PyPDF2, pdfminer.six, PyMuPDF, pdftotext, Tika, and GROBID) is presented in terms of precision, recall, F1 score, run time, and memory consumption. The natural language processing (NLP) tools NLTK, spaCy, and Gensim are used to identify sentence boundaries. Of all the techniques studied, the GROBID PDF extraction package combined with the NLP tool spaCy achieved the highest F1 score of 93% and consumed the least memory, at 46.13 MB.

(This article belongs to the Special Issue Natural Language Processing: Approaches and Applications)
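
The precision, recall, and F1 metrics named in the abstract above can be sketched for sentence extraction as follows. This is a minimal illustration, assuming that an extracted sentence counts as correct only when it matches a reference sentence exactly; the abstract does not state the matching criterion the study used, and the function name and example sentences are invented.

```python
def boundary_scores(predicted, gold):
    """Precision, recall and F1, treating a sentence as correct only on exact match.

    Exact string matching is an assumption made for this sketch.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: the extractor merged the last two reference sentences into one.
gold = [
    "EDLCs store charge electrostatically.",
    "The electrode was activated carbon.",
    "Capacitance was 120 F/g.",
]
predicted = [
    "EDLCs store charge electrostatically.",
    "The electrode was activated carbon. Capacitance was 120 F/g.",
]
precision, recall, f1 = boundary_scores(predicted, gold)
```

In this example one of two extracted sentences is correct (precision 1/2) and one of three reference sentences is recovered (recall 1/3), so a single missed boundary lowers both scores.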

21 pages, 7975 KiB

Open Access Article

Topic Extraction and Interactive Knowledge Graphs for Learning Resources

by Ahmed Badawy, Jesus A. Fisteus, Tarek M. Mahmoud and Tarek Abd El-Hafeez

Sustainability 2022, 14(1), 226; https://doi.org/10.3390/su14010226 - 26 Dec 2021

Cited by 13 | Viewed by 3895

Abstract

Human development through education is an important means of sustainable development. It supports community development in the present without negative effects in the future and also provides prosperity for future generations. E-learning is a natural development of educational tools in this era and under current circumstances. Thanks to the rapid development of computer science and telecommunication technologies, it has evolved impressively. While facilitating the educational process, this development has also produced a massive amount of learning resources, which makes searching for and extracting useful learning resources difficult. Therefore, new tools need to be developed to address this difficulty. In this paper we present a new algorithm that can extract the main topics from textual learning resources, link related resources, and generate interactive dynamic knowledge graphs. This algorithm accomplishes those tasks accurately and efficiently regardless of the size of the texts. We used Wikipedia Miner, TextRank, and Gensim within our algorithm. Evaluated against Gensim, our algorithm achieved substantially higher accuracy. This could be a step towards strengthening self-learning and supporting the sustainable development of communities, and more broadly of humanity, across different generations.

(This article belongs to the Special Issue Education 4.0: Mobilizing for Sustainable Development)
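
The abstract above names TextRank as one component of the authors' algorithm without describing how it is used. As a rough illustration of the general idea only, and not of the paper's method, the sketch below ranks words with a simplified TextRank-style iteration over an unweighted co-occurrence graph. The function name, parameters, and sample text are hypothetical; unlike typical TextRank keyword extraction, it applies no stop-word or part-of-speech filtering.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())

    # Undirected graph: link each word to the next `window` words that follow it.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Synchronous score updates; every node in the graph has at least one neighbour.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Invented sample text for illustration.
sample_text = "Knowledge graphs link learning resources. Topics link related learning resources."
keywords = textrank_keywords(sample_text)
```

Words that co-occur with many different neighbours accumulate higher scores, which is why a production pipeline would normally filter function words before ranking.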

Search Results (7)
