
Search Results (255)

Search Parameters:
Keywords = Word2Vec

19 pages, 5279 KiB  
Article
Navigating Digital Challenges for SMEs: A Two-Tier Approach to Risks Mitigation and Sustainability
by Arnesh Telukdarie, Thabile Dube, Megashness Munsamy, Khuliso Murulane and Regionald Mongwe
Sustainability 2024, 16(14), 5857; https://doi.org/10.3390/su16145857 - 9 Jul 2024
Viewed by 538
Abstract
SMEs have traditionally been recognized as key drivers of economic growth and sustainability worldwide. The emergence of digital technologies and Information and Communication Technology for Development (ICT4D) holds significant potential to further enhance this impact. However, SMEs in developing countries have faced challenges in adopting sustainable, resource-intensive digital systems. Factors such as limited skills, financial constraints, and the alignment of suitable solutions hinder this adoption. To address these challenges and promote sustainable digital transformation, this study proposes a two-tier approach. The first tier employs Natural Language Processing (NLP) techniques, including Word2Vec, for global analysis and digital systems identification. The second tier involves a country-specific analysis of SMEs’ digital requirements. This two-tier analysis aims to uncover the actual digital needs of SMEs while shedding light on high-intensity global SME activities that, if integrated through ICT4D, could effectively address the risks and challenges SMEs face in adopting, implementing, and maintaining digital systems. In addition, the study develops systems required by SMEs to optimize their business processes and production, thereby promoting their growth and sustainability in the digital era. The results of this study demonstrate the effectiveness of the proposed methods in addressing digital challenges for SMEs and fostering sustainable development. Full article
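To make the Word2Vec step concrete: a minimal sketch, assuming gensim and a toy corpus of tokenized SME activity descriptions (the data, parameters, and vocabulary are illustrative placeholders, not the study's pipeline):

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized SME activity descriptions (placeholder data,
# not the study's corpus).
corpus = [
    ["invoice", "processing", "accounting", "software"],
    ["online", "payment", "gateway", "integration"],
    ["inventory", "tracking", "barcode", "scanner"],
    ["accounting", "ledger", "tax", "reporting"],
]

# Skip-gram Word2Vec; vector_size and window are illustrative defaults.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, seed=42)

# Terms closest to "accounting" hint at related digital-system needs.
print(model.wv.most_similar("accounting", topn=3))
```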

19 pages, 2258 KiB  
Article
Social Network Forensics Analysis Model Based on Network Representation Learning
by Kuo Zhao, Huajian Zhang, Jiaxin Li, Qifu Pan, Li Lai, Yike Nie and Zhongfei Zhang
Entropy 2024, 26(7), 579; https://doi.org/10.3390/e26070579 - 7 Jul 2024
Viewed by 551
Abstract
The rapid evolution of computer technology and social networks has led to massive data generation through interpersonal communications, necessitating improved methods for information mining and relational analysis in areas such as criminal activity. This paper introduces a Social Network Forensic Analysis model that employs network representation learning to identify and analyze key figures within criminal networks, including leadership structures. The model incorporates traditional web forensics and community algorithms, utilizing concepts such as centrality and similarity measures and integrating the DeepWalk, LINE, and node2vec algorithms to map criminal networks into vector spaces. This maintains node features and structural information that are crucial for the relational analysis. The model refines node relationships through modified random walk sampling, using BFS and DFS, and employs a Continuous Bag-of-Words with Hierarchical Softmax for node vectorization, optimizing the value distribution via the Huffman tree. Hierarchical clustering and distance measures (cosine and Euclidean) were used to identify the key nodes and establish a hierarchy of influence. The findings demonstrate the effectiveness of the model in accurately vectorizing nodes, enhancing inter-node relationship precision, and optimizing clustering, thereby advancing the tools for combating complex criminal networks. Full article
(This article belongs to the Section Complexity)
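A minimal sketch of the DeepWalk-style core of such a model, assuming networkx, gensim, and scipy; the toy graph, uniform (unbiased) walks, and clustering parameters are placeholders, and the paper's BFS/DFS-modified sampling is omitted:

```python
import random
import networkx as nx
from gensim.models import Word2Vec
from scipy.cluster.hierarchy import linkage, fcluster

# Toy "communication" graph standing in for a criminal network.
G = nx.karate_club_graph()

def random_walks(graph, num_walks=10, walk_len=8, seed=0):
    """Uniform random walks (DeepWalk-style); the paper biases walks
    via BFS/DFS as in node2vec, which is omitted here."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes():
            walk = [node]
            while len(walk) < walk_len:
                walk.append(rng.choice(list(graph.neighbors(walk[-1]))))
            walks.append([str(n) for n in walk])
    return walks

# Skip-gram over walks with hierarchical softmax (hs=1), echoing the
# CBOW/Hierarchical-Softmax vectorization described in the abstract.
model = Word2Vec(random_walks(G), vector_size=32, window=4, min_count=1, sg=1, hs=1)

# Hierarchical clustering of node vectors with cosine distance.
vectors = [model.wv[str(n)] for n in G.nodes()]
labels = fcluster(linkage(vectors, method="average", metric="cosine"),
                  t=4, criterion="maxclust")
print(dict(zip(G.nodes(), labels)))
```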

30 pages, 1318 KiB  
Article
Malware Classification Using Dynamically Extracted API Call Embeddings
by Sahil Aggarwal and Fabio Di Troia
Appl. Sci. 2024, 14(13), 5731; https://doi.org/10.3390/app14135731 - 30 Jun 2024
Viewed by 812
Abstract
Malware classification stands as a crucial element in establishing robust computer security protocols, encompassing the segmentation of malware into discrete groupings. Recently, the emergence of machine learning has presented itself as an apt approach for addressing this challenge. Models can undergo training employing diverse malware attributes, such as opcodes and API calls, to distill valuable insights for effective classification. Within the realm of natural language processing, word embeddings assume a pivotal role by representing text in a manner that aligns closely with the proximity of similar words. These embeddings facilitate the quantification of word resemblances. This research embarks on a series of experiments that harness hybrid machine learning methodologies. We derive word vectors from dynamic API call logs associated with malware and integrate them as features in collaboration with diverse classifiers. Our methodology involves the utilization of Hidden Markov Models and Word2Vec to generate embeddings from API call logs. Additionally, we amalgamate renowned models like BERT and ELMo, noted for their capacity to yield contextualized embeddings. The resultant vectors are channeled into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNNs), and Convolutional Neural Networks (CNNs). Through two distinct sets of experiments, our objective revolves around the classification of both malware families and categories. The outcomes achieved illuminate the efficacy of API call embeddings as a potent instrument in the domain of malware classification, particularly in the realm of identifying malware families. The best combination was RF and word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach in effectively classifying malware. Full article
(This article belongs to the Collection Innovation in Information Security)
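A minimal sketch of the Word2Vec-plus-Random-Forest variant, assuming gensim and scikit-learn; the API call logs and family labels are invented placeholders:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

# Placeholder dynamic API call logs and family labels (not the study's data).
logs = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"],
]
labels = ["ransomware", "trojan", "ransomware", "trojan"]

# Learn API-call embeddings, then mean-pool each log into one feature vector.
w2v = Word2Vec(logs, vector_size=16, window=2, min_count=1, sg=1, seed=1)
X = np.array([np.mean([w2v.wv[c] for c in log], axis=0) for log in logs])

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, labels)
print(clf.predict(X[:1]))
```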

23 pages, 4962 KiB  
Article
Ensemble Learning with Pre-Trained Transformers for Crash Severity Classification: A Deep NLP Approach
by Shadi Jaradat, Richi Nayak, Alexander Paz and Mohammed Elhenawy
Algorithms 2024, 17(7), 284; https://doi.org/10.3390/a17070284 - 30 Jun 2024
Viewed by 701
Abstract
Transfer learning has gained significant traction in natural language processing due to the emergence of state-of-the-art pre-trained language models (PLMs). Unlike traditional word embedding methods such as TF-IDF and Word2Vec, PLMs are context-dependent and outperform conventional techniques when fine-tuned for specific tasks. This paper proposes an innovative hard voting classifier to enhance crash severity classification by combining machine learning and deep learning models with various word embedding techniques, including BERT, RoBERTa, Word2Vec, and TF-IDF. Our study involves two comprehensive experiments using motorists’ crash data from the Missouri State Highway Patrol. The first experiment evaluates the performance of three machine learning models—XGBoost (XGB), random forest (RF), and naive Bayes (NB)—paired with TF-IDF, Word2Vec, and BERT feature extraction techniques. Additionally, BERT and RoBERTa are fine-tuned with a Bidirectional Long Short-Term Memory (Bi-LSTM) classification model. All models are initially evaluated on the original dataset. The second experiment repeats the evaluation using an augmented dataset to address the severe data imbalance. The results from the original dataset show strong performance for all models in the “Fatal” and “Personal Injury” classes but a poor classification of the minority “Property Damage” class. In the augmented dataset, while the models continued to excel with the majority classes, only XGB/TFIDF and BERT-LSTM showed improved performance for the minority class. The ensemble model outperformed individual models in both datasets, achieving an F1 score of 99% for “Fatal” and “Personal Injury” and 62% for “Property Damage” on the augmented dataset. These findings suggest that ensemble models, combined with data augmentation, are highly effective for crash severity classification and potentially other textual classification tasks. Full article
(This article belongs to the Special Issue AI Algorithms for Positive Change in Digital Futures)
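A minimal sketch of a hard voting classifier over TF-IDF features, assuming scikit-learn; the crash narratives and labels are placeholders, and the paper's XGBoost and fine-tuned transformer members are omitted:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder crash narratives and severity labels (not the Missouri data).
texts = [
    "vehicle overturned driver fatal injury",
    "minor rear end collision property damage",
    "driver transported hospital personal injury",
    "parked car scraped property damage",
]
y = ["Fatal", "Property Damage", "Personal Injury", "Property Damage"]

# Hard voting: each member casts one vote per sample; the majority label wins.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=0)),
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="hard",
    ),
)
print(ensemble.fit(texts, y).predict(["head on collision fatal"]))
```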

31 pages, 4733 KiB  
Article
Enhanced Network Intrusion Detection System for Internet of Things Security Using Multimodal Big Data Representation with Transfer Learning and Game Theory
by Farhan Ullah, Ali Turab, Shamsher Ullah, Diletta Cacciagrano and Yue Zhao
Sensors 2024, 24(13), 4152; https://doi.org/10.3390/s24134152 - 26 Jun 2024
Viewed by 1843
Abstract
Internet of Things (IoT) applications and resources are highly vulnerable to flood attacks, including Distributed Denial of Service (DDoS) attacks. These attacks overwhelm the targeted device with numerous network packets, making its resources inaccessible to authorized users. Such attacks may comprise attack references, attack types, sub-categories, host information, malicious scripts, etc. These details assist security professionals in identifying weaknesses, tailoring defense measures, and responding rapidly to possible threats, thereby improving the overall security posture of IoT devices. Developing an intelligent Intrusion Detection System (IDS) is highly complex due to its numerous network features. This study presents an improved IDS for IoT security that employs multimodal big data representation and transfer learning. First, the Packet Capture (PCAP) files are crawled to retrieve the necessary attacks and bytes. Second, Spark-based big data optimization algorithms handle huge volumes of data. Third, a transfer learning approach such as word2vec retrieves semantically based observed features. Fourth, an algorithm is developed to convert network bytes into images, and texture features are extracted by configuring an attention-based Residual Network (ResNet). Finally, the trained text and texture features are combined and used as multimodal features to classify various attacks. The proposed method is thoroughly evaluated on three widely used IoT-based datasets: CIC-IoT 2022, CIC-IoT 2023, and Edge-IIoT. The proposed method achieves excellent classification performance, with an accuracy of 98.2%. In addition, we present a game theory-based process to validate the proposed approach formally. Full article
(This article belongs to the Section Internet of Things)
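A rough sketch of the two feature views, assuming gensim and NumPy: packet bytes treated as word2vec "tokens", and bytes reshaped into a grayscale image for a CNN/ResNet. The payloads and conversion details are illustrative, not the paper's algorithm:

```python
import numpy as np
from gensim.models import Word2Vec

# Placeholder packet payloads as raw bytes (not parsed from real PCAPs).
payloads = [b"\x16\x03\x01\x02\x00\x01", b"\x47\x45\x54\x20\x2f\x20"]

# Treat each byte as a "word" so Word2Vec can learn co-occurrence features.
token_streams = [[f"{b:02x}" for b in p] for p in payloads]
w2v = Word2Vec(token_streams, vector_size=8, window=2, min_count=1, seed=0)

def bytes_to_image(payload, side=16):
    """Zero-pad the payload and reshape it into a square grayscale image,
    the usual trick before feeding bytes to a CNN/ResNet."""
    buf = np.zeros(side * side, dtype=np.uint8)
    data = np.frombuffer(payload, dtype=np.uint8)[: side * side]
    buf[: len(data)] = data
    return buf.reshape(side, side)

print(w2v.wv.most_similar("16", topn=2), bytes_to_image(payloads[0]).shape)
```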

26 pages, 4703 KiB  
Article
A Novel Approach for the Analysis of Ship Pollution Accidents Using Knowledge Graph
by Junlin Hu, Weixiang Zhou, Pengjun Zheng and Guiyun Liu
Sustainability 2024, 16(13), 5296; https://doi.org/10.3390/su16135296 - 21 Jun 2024
Viewed by 603
Abstract
Ship pollution accidents can cause serious harm to marine ecosystems and economic development. This study proposes a ship pollution accident analysis method based on a knowledge graph to solve the problem that complex accident information is challenging to present clearly. Based on the information of 411 ship pollution accidents along the coast of China, Word2vec word vector models and the BERT–BiLSTM–CRF and BiLSTM–CRF models were applied to extract entities and relations, and the Neo4j graph database was used for knowledge graph data storage and visualization. Furthermore, the case information retrieval and cause correlation of ship pollution accidents were analyzed by a knowledge graph. This method established 3928 valid entities and 5793 valid relationships, and the extraction accuracy of the entities and relationships was 79.45% and 82.47%, respectively. In addition, through visualization and Cypher language queries, we can clearly understand the logical relationship between accidents and causes and quickly retrieve relevant information. Using the centrality algorithm, we can analyze the degree of influence between accident causes and put forward targeted measures based on the relevant causes, which will help improve accident prevention and emergency response capabilities and strengthen marine environmental protection. Full article
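A minimal sketch of storing and querying one extracted relation, assuming the official neo4j Python driver and a locally reachable instance; the URI, credentials, and entity names are placeholders, not data from the 411 accidents:

```python
from neo4j import GraphDatabase

# Assumes a local Neo4j instance; URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Store one extracted (accident)-[:CAUSED_BY]->(cause) relation;
    # names below are illustrative only.
    session.run(
        "MERGE (a:Accident {name: $acc}) "
        "MERGE (c:Cause {name: $cause}) "
        "MERGE (a)-[:CAUSED_BY]->(c)",
        acc="example coastal oil spill", cause="hull breach",
    )
    # Cypher retrieval: all causes linked to a matching accident.
    for record in session.run(
        "MATCH (a:Accident)-[:CAUSED_BY]->(c:Cause) "
        "WHERE a.name CONTAINS $kw RETURN c.name", kw="oil spill"
    ):
        print(record["c.name"])
driver.close()
```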

21 pages, 1861 KiB  
Article
Prominent User Segments in Online Consumer Recommendation Communities: Capturing Behavioral and Linguistic Qualities with User Comment Embeddings
by Apostolos Skotis and Christos Livas
Information 2024, 15(6), 356; https://doi.org/10.3390/info15060356 - 15 Jun 2024
Viewed by 509
Abstract
Online conversation communities have become an influential source of consumer recommendations in recent years. We propose a set of meaningful user segments which emerge from user embedding representations, based exclusively on comments’ text input. Data were collected from three popular recommendation communities on Reddit, covering the domains of book and movie suggestions. We utilized two neural language model methods to produce user embeddings, namely Doc2Vec and Sentence-BERT. Embedding interpretation issues were addressed by examining latent factors’ associations with behavioral, sentiment, and linguistic variables, acquired using the VADER, LIWC, and LFTK libraries in Python. User clusters were identified, having different levels of engagement and linguistic characteristics. The latent features of both approaches were strongly correlated with several user behavioral and linguistic indicators. Both approaches managed to capture significant variability in writing styles and quality, such as length, readability, use of function words, and complexity. However, the Doc2Vec features better described users by varying levels of contribution, while S-BERT-based features were more closely adapted to users’ varying emotional engagement. Prominent segments revealed prolific users with formal, intuitive, emotionally distant, and highly analytical styles, as well as users who were less elaborate, less consistent, but more emotionally connected. The observed patterns were largely similar across communities. Full article
(This article belongs to the Section Information Processes)
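A minimal sketch of the Doc2Vec branch, assuming gensim and scikit-learn; the comments, user ids, and cluster count are placeholders:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

# Placeholder user comments keyed by user id (not the Reddit corpus).
comments = {
    "u1": "i loved this novel the prose was gorgeous and moving",
    "u2": "solid thriller tight plot would recommend",
    "u3": "the pacing dragged but the ending redeemed it",
    "u4": "fantastic world building start with the first trilogy",
}

# One tagged document per user, so each user gets its own embedding.
docs = [TaggedDocument(text.split(), [uid]) for uid, text in comments.items()]
model = Doc2Vec(docs, vector_size=16, min_count=1, epochs=50, seed=3)

# Cluster user vectors into coarse segments.
X = [model.dv[uid] for uid in comments]
print(KMeans(n_clusters=2, n_init=10, random_state=3).fit_predict(X))
```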

20 pages, 5055 KiB  
Article
Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework
by Zongmin Wang, Xujie Shi, Haibo Yang, Bo Yu and Yingchun Cai
ISPRS Int. J. Geo-Inf. 2024, 13(6), 201; https://doi.org/10.3390/ijgi13060201 - 14 Jun 2024
Viewed by 551
Abstract
The development of information technology has led to massive, multidimensional, and heterogeneously sourced disaster data. However, there is currently no universal metadata standard for managing natural disasters, and common pre-trained models for information extraction require extensive training data, so their effectiveness is limited when annotated resources are scarce. This study establishes a unified natural disaster metadata standard, utilizes self-trained universal information extraction (UIE) models and Python libraries to extract metadata stored in both structured and unstructured forms, and analyzes the results using the Word2vec-Kmeans cluster algorithm. The results show that (1) the self-trained UIE model, with a learning rate of 3 × 10−4 and a batch_size of 32, significantly improves extraction results for various natural disasters by over 50%. Our optimized UIE model outperforms many other extraction methods in terms of precision, recall, and F1 scores. (2) The quality assessments of consistency, completeness, and accuracy for ten tables all exceed 0.80, with variances between the three dimensions being 0.04, 0.03, and 0.05. The overall evaluation of data items of tables also exceeds 0.80, consistent with the results at the table level. The metadata model framework constructed in this study demonstrates high-quality stability. (3) Taking the flood dataset as an example, clustering reveals five main themes with high similarity within clusters, and the differences between clusters are deemed significant relative to the differences within clusters at a significance level of 0.01. Overall, this experiment supports effective sharing of disaster data resources and enhances natural disaster emergency response efficiency. Full article
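A minimal sketch of a Word2vec-Kmeans clustering pass, assuming gensim and scikit-learn; the flood-record descriptions and cluster count are placeholders, not the study's metadata:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Placeholder flood-record descriptions (not the study's metadata tables).
records = [
    "heavy rainfall river overflow urban flooding",
    "levee failure river overflow evacuation",
    "flash flood rainfall mountain runoff",
    "coastal storm surge flooding evacuation",
]
tokens = [r.split() for r in records]

# Average word vectors into one document vector per record, then cluster.
w2v = Word2Vec(tokens, vector_size=12, window=3, min_count=1, seed=7)
doc_vecs = np.array([np.mean([w2v.wv[t] for t in ts], axis=0) for ts in tokens])
km = KMeans(n_clusters=2, n_init=10, random_state=7).fit(doc_vecs)

# Words nearest each centroid suggest a theme label for the cluster.
for centroid in km.cluster_centers_:
    print(w2v.wv.similar_by_vector(centroid, topn=3))
```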

38 pages, 2043 KiB  
Article
Boosting Institutional Identity on X Using NLP and Sentiment Analysis: King Faisal University as a Case Study
by Khalied M. Albarrak and Shaymaa E. Sorour
Mathematics 2024, 12(12), 1806; https://doi.org/10.3390/math12121806 - 11 Jun 2024
Viewed by 667
Abstract
Universities increasingly leverage social media platforms, especially Twitter, for news dissemination, audience engagement, and feedback collection. King Faisal University (KFU) is dedicated to enhancing its institutional identity (ID), grounded in environmental sustainability and food security and encompassing nine critical areas. This study aims to assess the impact of KFU’s Twitter interactions on public awareness of its institutional identity using systematic analysis and machine learning (ML) methods. The objectives are to: (1) determine the influence of KFU’s Twitter presence on ID awareness; (2) create a dedicated dataset for real-time public interaction analysis with KFU’s Twitter content; (3) investigate Twitter’s role in promoting KFU’s institutional identity across the nine ID domains and its changing impact over time; (4) utilize k-means clustering and sentiment analysis (TFIDF and Word2vec) to classify data and assess similarities among the identity domains; and (5) apply the categorization method to process and categorize tweets, facilitating the assessment of word meanings and similarities of the nine ID domains. The study also employs four ML models, including Logistic Regression (LR) and Support Vector Machine (SVM), with the Random Forest (RF) model combined with Word2vec achieving the highest accuracy of 100%. The findings underscore the value of KFU’s Twitter data analysis in deepening the understanding of its ID and guiding the development of effective communication strategies. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Decision Making)
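A minimal sketch comparing TFIDF features against averaged-Word2vec features under the same RF classifier, assuming scikit-learn and gensim; the tweets and the domain labels are invented placeholders:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score

# Placeholder tweets mapped to invented identity-domain labels.
tweets = ["campus palm research sustainability", "food security dates initiative",
          "community outreach volunteering event", "water conservation lab results"] * 5
y = ["environment", "food", "community", "environment"] * 5

# Representation 1: sparse TF-IDF vectors.
X_tfidf = TfidfVectorizer().fit_transform(tweets)
print(cross_val_score(RandomForestClassifier(random_state=0), X_tfidf, y, cv=3).mean())

# Representation 2: dense averaged Word2vec vectors.
tok = [t.split() for t in tweets]
w2v = Word2Vec(tok, vector_size=16, min_count=1, seed=0)
X_w2v = np.array([np.mean([w2v.wv[w] for w in ts], axis=0) for ts in tok])
print(cross_val_score(RandomForestClassifier(random_state=0), X_w2v, y, cv=3).mean())
```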

16 pages, 6121 KiB  
Article
Prediction of Machine-Generated Financial Tweets Using Advanced Bidirectional Encoder Representations from Transformers
by Muhammad Asad Arshed, Ștefan Cristian Gherghina, Dur-E-Zahra and Mahnoor Manzoor
Electronics 2024, 13(11), 2222; https://doi.org/10.3390/electronics13112222 - 6 Jun 2024
Viewed by 522
Abstract
With the rise of Large Language Models (LLMs), distinguishing between genuine and AI-generated content, particularly in finance, has become challenging. Previous studies have focused on binary identification of ChatGPT-generated content, overlooking other AI tools used for text regeneration. This study addresses this gap by examining various AI-regenerated content types in the finance domain. Objective: The study aims to differentiate between human-generated financial content and AI-regenerated content, specifically focusing on ChatGPT, QuillBot, and SpinBot. It constructs a dataset comprising real text and AI-regenerated text for this purpose. Contribution: This research contributes to the field by providing a dataset that includes various types of AI-regenerated financial content. It also evaluates the performance of different models, particularly highlighting the effectiveness of the Bidirectional Encoder Representations from Transformers (BERT) Base Cased model in distinguishing between these content types. Methods: The dataset is meticulously preprocessed to ensure quality and reliability. Various models, including BERT Base Cased, are fine-tuned and compared with traditional machine learning models using TFIDF and Word2Vec approaches. Results: The BERT Base Cased model outperforms other models, achieving an accuracy, precision, recall, and F1 score of 0.73, 0.73, 0.73, and 0.72, respectively, in distinguishing between real and AI-regenerated financial content. Conclusions: This study demonstrates the effectiveness of the BERT base model in differentiating between human-generated financial content and AI-regenerated content. It highlights the importance of considering various AI tools in identifying synthetic content, particularly in the finance domain in Pakistan. Full article

20 pages, 2640 KiB  
Article
Enhancing Arabic Dialect Detection on Social Media: A Hybrid Model with an Attention Mechanism
by Wael M. S. Yafooz
Information 2024, 15(6), 316; https://doi.org/10.3390/info15060316 - 28 May 2024
Cited by 1 | Viewed by 608
Abstract
Recently, the widespread use of social media and easy access to the Internet have brought about a significant transformation in the type of textual data available on the Web. This change is particularly evident in Arabic language usage, as the growing number of users from diverse domains has led to a considerable influx of Arabic text in various dialects, each characterized by differences in morphology, syntax, vocabulary, and pronunciation. Consequently, researchers in language recognition and natural language processing have become increasingly interested in identifying Arabic dialects. Numerous methods have been proposed to recognize this informal data, owing to its crucial implications for several applications, such as sentiment analysis, topic modeling, text summarization, and machine translation. However, Arabic dialect identification is a significant challenge due to the vast diversity of the Arabic language in its dialects. This study introduces a novel hybrid machine and deep learning model, incorporating an attention mechanism for detecting and classifying Arabic dialects. Several experiments were conducted using a novel dataset of user-generated comments collected from Twitter in four Arabic dialects, namely Egyptian, Gulf, Jordanian, and Yemeni, to evaluate the effectiveness of the proposed model. The dataset comprises 34,905 rows extracted from Twitter, representing an unbalanced data distribution. The data annotation was performed by native speakers proficient in each dialect. The results demonstrate that the proposed model outperforms long short-term memory, bidirectional long short-term memory, and logistic regression models in dialect classification using different word representations, as follows: term frequency-inverse document frequency, Word2Vec, and global vector for word representation. Full article
(This article belongs to the Special Issue Recent Advances in Social Media Mining and Analysis)
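A minimal sketch of a hybrid BiLSTM-plus-attention text classifier, assuming TensorFlow/Keras; the data are random integers standing in for encoded comments, and the layer sizes are illustrative rather than the paper's architecture:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy integer-encoded comments (vocab of 100) with 4 dialect classes.
X = np.random.randint(1, 100, size=(32, 20))
y = np.random.randint(0, 4, size=(32,))

inputs = layers.Input(shape=(20,))
x = layers.Embedding(input_dim=100, output_dim=32)(inputs)  # could be seeded with Word2Vec
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
x = layers.Attention()([x, x])            # self-attention over BiLSTM states
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(4, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)
```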

22 pages, 516 KiB  
Article
The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study
by Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang and Xiong Luo
Information 2024, 15(6), 302; https://doi.org/10.3390/info15060302 - 24 May 2024
Cited by 1 | Viewed by 529
Abstract
Eliminating vulnerabilities from a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, namely Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability, where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification. Full article
(This article belongs to the Special Issue Machine Learning for the Blockchain)
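A minimal sketch of building the four input types side by side, assuming scikit-learn and gensim; the opcode-like token streams are placeholders, and the CNN classifier stage is omitted:

```python
from gensim.models import FastText, Word2Vec
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder opcode-like token streams from two toy contracts.
contracts = ["push1 mstore callvalue dup1 iszero", "push1 sstore caller delegatecall"]
tokens = [c.split() for c in contracts]

# BoW and TF-IDF each give one sparse vector per contract.
bow = CountVectorizer().fit_transform(contracts)
tfidf = TfidfVectorizer().fit_transform(contracts)

# Word2Vec and FastText give one dense vector per token; FastText also
# covers out-of-vocabulary tokens via character n-grams.
w2v = Word2Vec(tokens, vector_size=16, min_count=1, seed=0)
ft = FastText(tokens, vector_size=16, min_count=1, seed=0)
print(bow.shape, tfidf.shape, w2v.wv["push1"].shape, ft.wv["delegatecall2"].shape)
```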

16 pages, 1931 KiB  
Article
CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students
by Aniss Qostal, Aniss Moumen and Younes Lakhrissi
Data 2024, 9(6), 74; https://doi.org/10.3390/data9060074 - 24 May 2024
Viewed by 845
Abstract
Deep learning (DL)-oriented document processing is widely used in different fields for extraction, recognition, and classification processes from raw corpora of data. The article examines the application of deep learning approaches based on different neural network methods, including Gated Recurrent Unit (GRU), long short-term memory (LSTM), and convolutional neural networks (CNNs). The compared models were combined with two different word embedding techniques, namely Bidirectional Encoder Representations from Transformers (BERT) and Gensim Word2Vec. The models are designed to evaluate the performance of architectures based on neural network techniques for the classification of CVs of Moroccan engineering students at ENSAK (National School of Applied Sciences of Kenitra, Ibn Tofail University). The dataset used included CVs collected from engineering students at ENSAK in 2023 for a project on the employability of Moroccan engineers in which new approaches were applied, especially machine learning, deep learning, and big data. Accordingly, 867 resumes were collected from five specialties of study (Electrical Engineering (ELE), Networks and Systems Telecommunications (NST), Computer Engineering (CE), Automotive Mechatronics Engineering (AutoMec), and Industrial Engineering (Indus)). The results showed that the proposed models based on the BERT embedding approach were more accurate than models based on the Gensim Word2Vec embedding approach. Accordingly, the CNN-GRU/BERT model achieved slightly better accuracy, with 0.9351, compared to other hybrid models. On the other hand, single learning models also performed well, especially those based on BERT embedding architectures, where the CNN had the best accuracy, with 0.9188. Full article
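One way a Gensim Word2Vec model is typically wired into such neural classifiers is via a frozen Keras Embedding matrix; a minimal sketch, assuming gensim and TensorFlow, with placeholder resume tokens:

```python
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

# Placeholder resume token lists (not the ENSAK dataset).
resumes = [["python", "sql", "network", "cisco"],
           ["matlab", "cad", "mechanics", "python"]]
w2v = Word2Vec(resumes, vector_size=32, min_count=1, seed=0)

# Copy gensim vectors into a Keras Embedding weight matrix; index 0 is
# reserved for padding.
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}
weights = np.zeros((len(vocab) + 1, 32))
for word, idx in vocab.items():
    weights[idx] = w2v.wv[word]

embedding = tf.keras.layers.Embedding(
    input_dim=len(vocab) + 1, output_dim=32, weights=[weights], trainable=False
)
print(embedding(np.array([[vocab["python"], 0]])).shape)  # (1, 2, 32)
```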

21 pages, 4782 KiB  
Article
biSAMNet: A Novel Approach in Maritime Data Completion Using Deep Learning and NLP Techniques
by Yong Li and Zhishan Wang
J. Mar. Sci. Eng. 2024, 12(6), 868; https://doi.org/10.3390/jmse12060868 - 23 May 2024
Viewed by 593
Abstract
In the extensive monitoring of maritime traffic, maritime management frequently encounters incomplete automatic identification system (AIS) data. This deficiency poses significant challenges to safety management, requiring effective methods to infer corresponding ship information. We tackle this issue using a classification approach. Unlike on land, there is no fixed road network at sea, so raw trajectories are difficult to discretize and cannot be fed directly into neural networks. We devised a latitude–longitude gridding encoding strategy capable of transforming continuous latitude–longitude data into discrete grid points. Simultaneously, we employed a compression algorithm to further extract significant grid points, thereby shortening the encoding sequence. Utilizing natural language processing techniques, we integrate the Word2vec word embedding approach with our novel biLSTM self-attention chunk-max pooling net (biSAMNet) model, enhancing the classification of vessel trajectories. This method classifies targets into ship types and ship lengths within static information. Employing the Taiwan Strait as a case study and benchmarking against CNN, RNN, and methods based on the attention mechanism, our findings underscore our model’s superiority. The biSAMNet achieves an impressive trajectory classification F1 score of 0.94 in the ship category dataset using only five-dimensional word embeddings. Additionally, through ablation experiments, the effectiveness of the Word2vec pre-trained embedding layer is highlighted. This study introduces a novel method for handling ship trajectory data, addressing the challenge of obtaining ship static information when AIS data are unreliable. Full article
(This article belongs to the Section Ocean Engineering)
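A minimal sketch of the gridding idea, assuming gensim: latitude–longitude points become discrete grid tokens, consecutive duplicates are dropped as a crude stand-in for the paper's compression step, and five-dimensional embeddings are trained. The coordinates and cell size are illustrative:

```python
from gensim.models import Word2Vec

def grid_encode(track, cell=0.1):
    """Map (lat, lon) points to discrete grid tokens, dropping consecutive
    duplicates as a stand-in for the paper's compression step."""
    tokens = []
    for lat, lon in track:
        tok = f"{int(lat // cell)}_{int(lon // cell)}"
        if not tokens or tokens[-1] != tok:
            tokens.append(tok)
    return tokens

# Two toy tracks with made-up coordinates (illustrative only).
tracks = [[(24.50, 118.20), (24.52, 118.31), (24.61, 118.42)],
          [(24.50, 118.21), (24.49, 118.09), (24.40, 118.01)]]
sequences = [grid_encode(t) for t in tracks]

# Five-dimensional embeddings, matching the dimensionality noted in the abstract.
w2v = Word2Vec(sequences, vector_size=5, window=2, min_count=1, seed=0)
print(sequences[0], w2v.wv[sequences[0][0]])
```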

30 pages, 16418 KiB  
Article
Automating Fault Test Cases Generation and Execution for Automotive Safety Validation via NLP and HIL Simulation
by Ayman Amyan, Mohammad Abboush, Christoph Knieke and Andreas Rausch
Sensors 2024, 24(10), 3145; https://doi.org/10.3390/s24103145 - 15 May 2024
Cited by 1 | Viewed by 725
Abstract
The complexity and criticality of automotive electronic embedded systems are steadily increasing, and that is especially the case for automotive software development. ISO 26262 describes requirements for the development process to confirm the safety of such complex systems. Among these requirements, fault injection is a reliable technique to assess the effectiveness of safety mechanisms and verify the correct implementation of the safety requirements. However, the method of injecting the fault into the system under test is in many cases still manual and depends on an expert with a high level of knowledge of the system. In complex systems, manual injection consumes time, is difficult to execute, and takes effort, so testers limit the fault injection experiments and inject the minimum number of possible test cases. Fault injection enables testers to identify and address potential issues with a system under test before they become actual problems. In the automotive industry, failures can pose serious hazards. In these systems, it is essential to ensure that the system can operate safely even in the presence of faults. We propose an approach using natural language processing (NLP) technologies to automatically derive fault test cases from the functional safety requirements (FSRs) and execute them automatically by hardware-in-the-loop (HIL) in real time according to the black-box concept and the ISO 26262 standard. The approach demonstrates effectiveness in automatically identifying fault injection locations and conditions, simplifying the testing process, and providing a scalable solution for various safety-critical systems. Full article
(This article belongs to the Section Vehicular Sensing)