
Search Results (1,069)

Search Parameters:
Keywords = document datasets

35 pages, 409 KiB  
Review
Fault Detection and Diagnosis in Industry 4.0: A Review on Challenges and Opportunities
by Denis Leite, Emmanuel Andrade, Diego Rativa and Alexandre M. A. Maciel
Sensors 2025, 25(1), 60; https://doi.org/10.3390/s25010060 - 25 Dec 2024
Abstract
Integrating Machine Learning (ML) in industrial settings has become a cornerstone of Industry 4.0, aiming to enhance production system reliability and efficiency through Real-Time Fault Detection and Diagnosis (RT-FDD). This paper conducts a comprehensive literature review of ML-based RT-FDD. Out of 805 documents, 29 studies were identified as noteworthy for presenting innovative methods that address the complexities and challenges associated with fault detection. While ML-based RT-FDD offers different benefits, including fault prediction accuracy, it faces challenges in data quality, model interpretability, and integration complexities. This review identifies a gap in industrial implementation outcomes that opens new research opportunities. Future Fault Detection and Diagnosis (FDD) research may prioritize standardized datasets to ensure reproducibility and facilitate comparative evaluations. Furthermore, there is a pressing need to refine techniques for handling unbalanced datasets and improving feature extraction for temporal series data. Implementing Explainable Artificial Intelligence (XAI) tailored to industrial fault detection is imperative for enhancing interpretability and trustworthiness. Subsequent studies must emphasize comprehensive comparative evaluations, reducing reliance on specialized expertise, documenting real-world outcomes, addressing data challenges, and bolstering real-time capabilities and integration. By addressing these avenues, the field can propel the advancement of ML-based RT-FDD methodologies, ensuring their effectiveness and relevance in industrial contexts.

19 pages, 1457 KiB  
Article
Evaluating Neural Network Performance in Predicting Disease Status and Tissue Source of JC Polyomavirus from Patient Isolates Based on the Hypervariable Region of the Viral Genome
by Aiden M. C. Pike, Saeed Amal, Melissa S. Maginnis and Michael P. Wilczek
Viruses 2025, 17(1), 12; https://doi.org/10.3390/v17010012 - 25 Dec 2024
Abstract
JC polyomavirus (JCPyV) establishes a persistent, asymptomatic kidney infection in most of the population. However, JCPyV can reactivate in immunocompromised individuals and cause progressive multifocal leukoencephalopathy (PML), a fatal demyelinating disease with no approved treatment. Mutations in the hypervariable non-coding control region (NCCR) of the JCPyV genome have been linked to disease outcomes and neuropathogenesis, yet few meta-analyses document these associations. Many online sequence entries, including those on NCBI databases, lack sufficient sample information, limiting large-scale analyses of NCCR sequences. Machine learning techniques, however, can augment available data for analysis. This study employs a previously compiled dataset of 989 JCPyV NCCR sequences from GenBank with associated patient PML status and viral tissue source to train multilayer perceptrons for predicting missing information within the dataset. The PML status and tissue source models were 100% and 87.8% accurate, respectively. Within the dataset, 348 samples had an unconfirmed PML status, where 259 were predicted as No PML and 89 as PML sequences. Of the 63 sequences with unconfirmed tissue sources, eight samples were predicted as urine, 13 as blood, and 42 as cerebrospinal fluid. These models can improve viral sequence identification and provide insights into viral mutations and pathogenesis.
(This article belongs to the Special Issue JC Polyomavirus)

22 pages, 9016 KiB  
Article
Leveraging Transformer-Based OCR Model with Generative Data Augmentation for Engineering Document Recognition
by Wael Khallouli, Mohammad Shahab Uddin, Andres Sousa-Poza, Jiang Li and Samuel Kovacic
Electronics 2025, 14(1), 5; https://doi.org/10.3390/electronics14010005 - 24 Dec 2024
Abstract
The long-standing practice of document-based engineering has resulted in the accumulation of a large number of engineering documents across various industries. Engineering documents, such as 2D drawings, continue to play a significant role in exchanging information and sharing knowledge across multiple engineering processes. However, these documents are often stored in non-digitized formats, such as paper and portable document format (PDF) files, making automation difficult. As digital engineering transforms processes in many industries, digitizing engineering documents presents a crucial challenge that requires advanced methods. This research addresses the problem of automatically extracting textual content from non-digitized legacy engineering documents. We introduced an optical character recognition (OCR) system for text detection and recognition that leverages transformer-based generative deep learning models and transfer learning approaches to enhance text recognition accuracy in engineering documents. The proposed system was evaluated on a dataset collected from ships’ engineering drawings provided by a U.S. agency. Experimental results demonstrated that the proposed transformer-based OCR model significantly outperformed pretrained off-the-shelf OCR models.

10 pages, 995 KiB  
Article
The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study
by Eun Jeong Gong, Chang Seok Bang, Jae Jun Lee, Jonghyung Park, Eunsil Kim, Subeen Kim, Minjae Kimm and Seoung-Ho Choi
Bioengineering 2025, 12(1), 1; https://doi.org/10.3390/bioengineering12010001 - 24 Dec 2024
Abstract
Background: The large language model (LLM) has the potential to be applied to clinical practice. However, there have been few studies on this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). Method: We established a customized GPT with the BM25 algorithm using OpenAI’s GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also prepared access to the conventional ChatGPT-4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. Results: While the customized LLM correctly answered 8 out of 15 questions, the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the LLM’s performance improved, answering two additional knowledge-based questions correctly and matching the fellow’s score. However, judgment-based questions remained a challenge for the model. Even with the implementation of ‘Chain of Thought’ prompt engineering, the customized GPT did not achieve improved reasoning. The conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist’s level (15/15), they show promising potential for clinical applications (scores comparable with or higher than that of the gastroenterology fellow). Conclusions: LLMs could be utilized to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice.
(This article belongs to the Special Issue New Technique for Endoscopic Diagnosis in Biomedical Engineering)

21 pages, 51554 KiB  
Article
Airborne LiDAR Applications at the Medieval Site of Castel Fenuculus in the Lower Valley of the Calore River (Benevento, Southern Italy)
by Antonio Corbo
Land 2024, 13(12), 2255; https://doi.org/10.3390/land13122255 - 23 Dec 2024
Abstract
This paper explores the application of Airborne Laser Scanning (ALS) technology in the investigation of the medieval Norman site of Castel Fenuculus, in the lower Calore Valley, Southern Italy. This research aims to assess the actual potential of the ALS dataset provided by the Italian Ministry of the Environment (MATTM) for the detection and visibility of archaeological features in a difficult environment characterised by dense vegetation and morphologically complex terrain. The study focuses on improving the detection and interpretation of archaeological features through a systematic approach that includes the acquisition of ALS point clouds, the implementation of classification algorithms, and the removal of vegetation layers to reveal the underlying terrain and ruined structures. Furthermore, the aim was to test different classification and filtering techniques to identify the best one to use in complex contexts, with the intention of providing a comprehensive and replicable methodological framework. Finally, the Digital Terrain Model (DTM) and various LiDAR-derived models (LDMs) were generated to visualise and highlight topographical features potentially related to archaeological remains. The results obtained demonstrate the significant potential of LiDAR in identifying and documenting archaeological features in densely vegetated and wooded landscapes.
(This article belongs to the Section Landscape Archaeology)

69 pages, 16833 KiB  
Article
Contributions to the Inocybe umbratica–paludinella (Agaricales) Group in China: Taxonomy, Species Diversity, and Molecular Phylogeny
by Xin Chen, Wen-Jie Yu, Tolgor Bau, P. Brandon Matheny, Egon Horak, Yu Liu, Li-Wu Qin, Li-Ping Tang, Yu-Peng Ge, Tie-Zhi Liu and Yu-Guang Fan
J. Fungi 2024, 10(12), 893; https://doi.org/10.3390/jof10120893 - 23 Dec 2024
Abstract
Inocybe is the largest genus in the family Inocybaceae, with approximately 1000 species worldwide. Basic data on the species diversity, geographic distribution, and the infrageneric framework of Inocybe are still incomplete because of the intricate nature of this genus, which includes numerous unrecognized taxa that exist around the world. A multigene phylogeny of the I. umbratica–paludinella group, initially designated as the “I. angustifolia subgroup”, was conducted using the ITS-28S-rpb2 nucleotide datasets. The seven species, I. alabamensis, I. angustifolia, I. argenteolutea, I. olivaceonigra, I. paludinella, I. subangustifolia, and I. umbratica, were confirmed as members of this species group. At the genus level, the I. umbratica–paludinella group is a sister to the lineage of the unifying I. castanea and an undescribed species. Inocybe sect. Umbraticae sect. nov. was proposed to accommodate species in the I. umbratica–paludinella group and the I. castanea lineage. This section now comprises eight documented species and nine new species from China, as described in this paper. Additionally, new geographical distributions of I. angustifolia and I. castanea in China are reported. The nine new species and I. angustifolia, I. castanea, I. olivaceonigra, and I. umbratica are described in detail and illustrated herein with color plates based on Chinese materials. A global key to 17 species in the section Umbraticae is provided. The results of the current study provide a more detailed basis for the accurate identification of species in the I. umbratica–paludinella group and a better understanding of their phylogenetic placement.
(This article belongs to the Section Fungal Evolution, Biodiversity and Systematics)

17 pages, 471 KiB  
Article
Incorporating Global Information for Aspect Category Sentiment Analysis
by Heng Wang, Chen Wang, Chunsheng Li and Changxing Wu
Electronics 2024, 13(24), 5020; https://doi.org/10.3390/electronics13245020 - 20 Dec 2024
Abstract
Aspect category sentiment analysis aims to automatically identify the sentiment polarities of aspect categories mentioned in text, and is widely used in the data analysis of product reviews and social media. Most existing studies typically limit themselves to utilizing sentence-level local information, thereby failing to fully exploit the potential of document-level and corpus-level global information. To address these limitations, we propose a model that integrates global information for aspect category sentiment analysis, aiming to leverage sentence-level, document-level, and corpus-level information simultaneously. Specifically, based on sentences and their corresponding aspect categories, a graph neural network is initially built to capture document-level information, including sentiment consistency within the same category and sentiment similarity between different categories in a review. We subsequently employ a memory network to retain corpus-level information, where the representations of training instances serve as keys and their associated labels as values. Additionally, a k-nearest neighbor retrieval mechanism is used to find training instances relevant to a given input. Experimental results on four commonly used datasets from SemEval 2015 and 2016 demonstrate the effectiveness of our model. The in-depth experimental analysis reveals that incorporating document-level information substantially improves the accuracies of the two primary ‘positive’ and ‘negative’ categories, while the inclusion of corpus-level information is especially advantageous for identifying the less frequently occurring ‘neutral’ category.
(This article belongs to the Section Artificial Intelligence)

23 pages, 4893 KiB  
Article
Enhancing Software Effort Estimation with Pre-Trained Word Embeddings: A Small-Dataset Solution for Accurate Story Point Prediction
by Issa Atoum and Ahmed Ali Otoom
Electronics 2024, 13(23), 4843; https://doi.org/10.3390/electronics13234843 - 8 Dec 2024
Abstract
Traditional software effort estimation methods, such as term frequency–inverse document frequency (TF-IDF), are widely used due to their simplicity and interpretability. However, they struggle with limited datasets, fail to capture intricate semantics, and suffer from dimensionality, sparsity, and computational inefficiency. This study used pre-trained word embeddings, including FastText and GPT-2, to improve estimation accuracy in such cases. Seven pre-trained models were evaluated for their ability to effectively represent textual data, addressing the fundamental limitations of TF-IDF through contextualized embeddings. The results show that combining FastText embeddings with support vector machines (SVMs) consistently outperforms traditional approaches, reducing the mean absolute error (MAE) by 5–18% while achieving accuracy comparable to deep learning models like GPT-2. This approach demonstrated the adaptability of pre-trained embeddings for small datasets, balancing semantic richness with computational efficiency. The proposed method optimized project planning and resource allocation, enhanced software development through accurate story point prediction, and safeguarded privacy and security through data anonymization. Future research will explore task-specific embeddings tailored to software engineering domains and investigate how dataset characteristics, such as cultural variations, influence model performance, ensuring the development of adaptable, robust, and secure machine learning models for diverse contexts.

21 pages, 7023 KiB  
Article
Multi-Scale Network Analysis of Community-Based Senior Centers: Exploring the Intersection of Spatial Embeddedness and Accessibility in Nanjing, China
by Zhixin Xu and Xiaoming Li
Buildings 2024, 14(12), 3922; https://doi.org/10.3390/buildings14123922 - 8 Dec 2024
Abstract
As critical infrastructure of age-friendly cities, senior centers are designed to be embedded in communities and facilitate service accessibility for older adults. However, their underutilization is widely documented, suggesting a need to reassess their effectiveness. Existing studies often analyze the issue focusing on socio-demographic factors, overlooking the spatial contexts in which senior centers are embedded and their impacts on older people’s access. This study aims to address the research gap by investigating the spatial embeddedness of senior centers using Space Syntax methods and examining its influence on older people’s access patterns. Using a geo-behavioral dataset collected in Nanjing, China, we find that about 70% of the senior centers in the research area are embedded in highly localized settings with limited connections to global street networks, which significantly restricts the access of older people from wider spatial contexts. This spatial segregation may force senior centers to incur higher costs to attract users, thereby reducing the effectiveness of community-based services. This study introduces a novel spatial perspective to evaluate community-based services, highlighting the critical influence of the spatial context on service accessibility. The findings provide valuable empirical insights for policymakers and planners striving to create age-friendly cities and communities.
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

22 pages, 1599 KiB  
Article
Single-Stage Entity–Relation Joint Extraction of Pesticide Registration Information Based on HT-BES Multi-Dimensional Labeling Strategy
by Chenyang Dong, Shiyu Xi, Yinchao Che, Shufeng Xiong, Xinming Ma, Lei Xi and Shuping Xiong
Algorithms 2024, 17(12), 559; https://doi.org/10.3390/a17120559 - 6 Dec 2024
Abstract
Pesticide registration information is an essential part of the pesticide knowledge base. However, the large amount of unstructured text data that it contains poses significant challenges for knowledge storage, retrieval, and utilization. To address the characteristics of pesticide registration text such as high information density, complex logical structures, large spans between entities, and heterogeneous entity lengths, as well as to overcome the challenges faced when using traditional joint extraction methods, including triplet overlap, exposure bias, and redundant computation, we propose a single-stage entity–relation joint extraction model based on HT-BES multi-dimensional labeling (MD-SERel). First, in the encoding layer, to address the complex structural characteristics of pesticide registration texts, we employ RoBERTa combined with a multi-head self-attention mechanism to capture the deep semantic features of the text. Simultaneously, syntactic features are extracted using a syntactic dependency tree and graph neural networks to enhance the model’s understanding of text structure. Subsequently, we integrate semantic and syntactic features, enriching the character vector representations and thus improving the model’s ability to represent complex textual data. Secondly, in the multi-dimensional labeling framework layer, we use HT-BES multi-dimensional labeling, where the model assigns multiple labels to each character. These labels include entity boundaries, positions, and head–tail entity association information, which naturally resolves overlapping triplets. Through utilizing a parallel scoring function and fine-grained classification components, the joint extraction of entities and relations is transformed into a multi-label sequence labeling task based on relation dimensions. This process does not involve interdependent steps, thus enabling single-stage parallel labeling, preventing exposure bias and reducing computational redundancy. Finally, in the decoding layer, entity–relation triplets are decoded based on the predicted labels from the fine-grained classification. The experimental results demonstrate that the MD-SERel model performs well on both the Pesticide Registration Dataset (PRD) and the general DuIE dataset. On the PRD, compared to the optimal baseline model, the training time is 1.2 times faster, the inference time is 1.2 times faster, and the F1 score is improved by 1.5%, demonstrating its knowledge extraction capabilities in pesticide registration documents. On the DuIE dataset, the MD-SERel model also achieved better results compared to the baseline, demonstrating its strong generalization ability. These findings will provide technical support for the construction of pesticide knowledge bases.
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))

24 pages, 47033 KiB  
Article
Hybrid Denoising Algorithm for Architectural Point Clouds Acquired with SLAM Systems
by Antonella Ambrosino, Alessandro Di Benedetto and Margherita Fiani
Remote Sens. 2024, 16(23), 4559; https://doi.org/10.3390/rs16234559 - 5 Dec 2024
Abstract
The sudden development of systems capable of rapidly acquiring dense point clouds has underscored the importance of data processing and pre-processing prior to modeling. This work presents the implementation of a denoising algorithm for point clouds acquired with LiDAR SLAM systems, aimed at optimizing data processing and the reconstruction of surveyed object geometries for graphical rendering and modeling. Implemented in a MATLAB environment, the algorithm utilizes an approximate modeling of a reference surface with Poisson’s model and a statistical analysis of the distances between the original point cloud and the reconstructed surface. Tested on point clouds from historically significant buildings with complex geometries scanned with three different SLAM systems, the results demonstrate a satisfactory reduction in point density to approximately one third of the original. The filtering process effectively removed about 50% of the points while preserving essential details, facilitating improved restitution and modeling of architectural and structural elements. This approach serves as a valuable tool for noise removal in SLAM-derived datasets, enhancing the accuracy of architectural surveying and heritage documentation.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)

33 pages, 1325 KiB  
Article
A Centrality-Weighted Bidirectional Encoder Representation from Transformers Model for Enhanced Sequence Labeling in Key Phrase Extraction from Scientific Texts
by Tsitsi Zengeya, Jean Vincent Fonou Dombeu and Mandlenkosi Gwetu
Big Data Cogn. Comput. 2024, 8(12), 182; https://doi.org/10.3390/bdcc8120182 - 4 Dec 2024
Abstract
Deep learning approaches, utilizing Bidirectional Encoder Representation from Transformers (BERT) and advanced fine-tuning techniques, have achieved state-of-the-art accuracies in the domain of term extraction from texts. However, BERT presents some limitations in that it primarily captures the semantic context relative to the surrounding text without considering how relevant or central a token is to the overall document content. There has also been research on the application of sequence labeling on contextualized embeddings; however, the existing methods often rely solely on local context for extracting key phrases from texts. To address these limitations, this study proposes a centrality-weighted BERT model for key phrase extraction from text using sequence labelling (CenBERT-SEQ). The proposed CenBERT-SEQ model utilizes BERT to represent terms with various contextual embedding architectures, and introduces a centrality-weighting layer that integrates document-level context into BERT. This layer leverages document embeddings to influence the importance of each term based on its relevance to the entire document. Finally, a linear classifier layer is employed to model the dependencies between the outputs, thereby enhancing the accuracy of the CenBERT-SEQ model. The proposed CenBERT-SEQ model was evaluated against the standard BERT base-uncased model using three Computer Science article datasets, namely, SemEval-2010, WWW, and KDD. The experimental results show that, although the CenBERT-SEQ and BERT-base models achieved high and closely comparable accuracy, the proposed CenBERT-SEQ model achieved higher precision, recall, and F1-score than the BERT-base model. Furthermore, a comparison of the proposed CenBERT-SEQ model to that of related studies revealed that the proposed CenBERT-SEQ model achieved a higher accuracy, precision, recall, and F1-score of 95%, 97%, 91%, and 94%, respectively, than related studies, showing the superior capabilities of the CenBERT-SEQ model in keyphrase extraction from scientific documents.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

23 pages, 4318 KiB  
Article
Incident Analysis in Micromobility Spaces at Metro Stations: A Case Study in Valparaíso, Chile
by Sebastian Seriani, Vicente Aprigliano, Catalina Toro, Gonzalo Rojas, Felipe Gonzalez, Alvaro Peña and Kamalasudhan Achuthan
Sustainability 2024, 16(23), 10483; https://doi.org/10.3390/su162310483 - 29 Nov 2024
Abstract
This study analyzes passenger incidents in metro stations and their relationship with safety in Valparaiso, Chile. The primary aim is to examine how factors such as station design, passenger flow, and weather conditions influence the frequency and types of incidents in various micromobility spaces within metro stations. A comprehensive data analysis was conducted using records from the Valparaiso Metro between 2022 and 2023. During this period, approximately 500 incidents were documented, providing a substantial dataset for identifying incident patterns and correlations with contributing factors. The analysis revealed that incidents are significantly influenced by peak-hour conditions and weekdays. The platform–train interface emerged as the most complex space for incident occurrences. Specifically, the study found that crowded conditions inside trains during morning and evening rush hours contribute substantially to incidents. In other station spaces, incidents were closely linked to the station type and the presence of stair access. Conversely, stations designed with more accessible features appeared to have fewer incidents. Future studies will expand on this framework by incorporating additional factors and analyzing new data to develop a more comprehensive understanding of incident dynamics.

8 pages, 2726 KiB  
Communication
The Regensburg Dental Trauma Registry: Methodical Framework for the Systematic Collection of Dentoalveolar Trauma Data
by Matthias Widbiller, Gunnar Huppertz, Karolina Müller, Michael Koller, Torsten E. Reichert, Wolfgang Buchalla and Martyna Smeda
J. Clin. Med. 2024, 13(23), 7196; https://doi.org/10.3390/jcm13237196 - 27 Nov 2024
Abstract
Objectives: Traumatic dental injuries (TDIs) are common, particularly in children and adolescents, and require timely, well-documented treatment for optimal long-term functional and esthetic outcomes. Despite their prevalence, comprehensive data on TDI remain limited. The Regensburg Dental Trauma Registry (RDTR) was established to enable structured data collection, documentation and analysis of dentoalveolar trauma cases to improve both research and clinical practice. Methods: The RDTR was developed at the Centre for Dental Traumatology at the University Hospital Regensburg as part of a multi-stage implementation process, which involved creating clinical infrastructure, establishing treatment protocols, providing continuous clinician training, and designing a standardized documentation form to capture essential data, including patient demographics, accident details, clinical assessments, and initial treatment. Data are transferred into a REDCap electronic case report form (eCRF), which is hosted on secure university servers, ensuring efficient administration, controlled access and high data integrity. Quality assurance measures, including automated and manual data checks and regular treatment protocol updates, maintain high data accuracy and consistency. Results: This initial methodological report outlines the systematic approach of the RDTR and its potential to generate large datasets. These will enable in-depth analyses of injury patterns, treatment effectiveness, risk factors, and more. Future expansion includes collaboration with additional university hospitals to broaden the dataset and support multi-center approaches. Conclusions: The RDTR offers a framework for consistent data collection and quality control, laying the foundation for comprehensive analyses that contribute to the development of preventive strategies and treatment protocols. Full article
(This article belongs to the Special Issue Current Advances in Endodontics and Dental Traumatology)

18 pages, 957 KiB  
Article
Layered Query Retrieval: An Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models
by Jie Huang, Mo Wang, Yunpeng Cui, Juan Liu, Li Chen, Ting Wang, Huan Li and Jinming Wu
Appl. Sci. 2024, 14(23), 11014; https://doi.org/10.3390/app142311014 - 27 Nov 2024
Abstract
Retrieval-augmented generation (RAG) addresses the problem of knowledge cutoff and overcomes the inherent limitations of pre-trained language models by retrieving relevant information in real time. However, challenges related to efficiency and accuracy persist in current RAG strategies. A key issue is how to dynamically select appropriate retrieval methods for user queries of varying complexity. This study introduces a novel adaptive retrieval-augmented generation framework termed Layered Query Retrieval (LQR). The LQR framework focuses on query complexity classification, retrieval strategies, and relevance analysis, utilizing a custom-built training dataset to train smaller models that aid the large language model (LLM) in efficiently retrieving relevant information. A central technique in LQR is a semantic rule-based approach that distinguishes between different levels of multi-hop queries. The process begins by parsing the user’s query for keywords, followed by keyword-based document retrieval. Subsequently, we employ a natural language inference (NLI) model to assess whether each retrieved document is relevant to the query. We validated our approach on multiple single-hop and multi-hop datasets, demonstrating significant improvements in both accuracy and efficiency compared to existing single-step, multi-step, and adaptive methods. Our method exhibits high accuracy and efficiency, particularly on the HotpotQA dataset, where it outperforms the Adaptive-RAG method, improving accuracy by 9.4% and the F1 score by 16.14%. The proposed approach carefully balances retrieval efficiency with the accuracy of the LLM’s responses. Full article
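The retrieve-then-verify pipeline the abstract describes (parse query keywords, retrieve documents by keyword match, then check relevance before handing context to the LLM) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the keyword extraction and the overlap-based relevance check are simplified stand-ins for the trained classifier and NLI model, and all function names and the toy corpus are invented for this example.

```python
from typing import List, Set

STOPWORDS = {"the", "a", "an", "of", "is", "in", "to", "and", "what", "which", "who"}

def extract_keywords(text: str) -> Set[str]:
    # Naive keyword parsing: lowercase tokens minus stopwords.
    return {t for t in text.lower().split() if t not in STOPWORDS}

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # Keyword-based retrieval: rank documents by keyword overlap with the query.
    keys = extract_keywords(query)
    ranked = sorted(corpus, key=lambda d: len(keys & extract_keywords(d)), reverse=True)
    return ranked[:k]

def is_relevant(query: str, doc: str, threshold: float = 0.5) -> bool:
    # Stand-in for the NLI step: fraction of query keywords covered by the
    # document. A real system would run an NLI model with the document as
    # premise and the query as hypothesis.
    keys = extract_keywords(query)
    return bool(keys) and len(keys & extract_keywords(doc)) / len(keys) >= threshold

def layered_answer_context(query: str, corpus: List[str]) -> List[str]:
    # Keep only retrieved passages that pass the relevance check; this is
    # the context that would be passed to the LLM for answer generation.
    return [d for d in retrieve(query, corpus) if is_relevant(query, d)]

corpus = [
    "Valparaiso metro stations report incidents at the platform-train interface.",
    "Retrieval-augmented generation retrieves documents before answering.",
    "Dental trauma registries collect dentoalveolar injury data.",
]
print(layered_answer_context("incidents in metro stations", corpus))
```

The relevance gate is what makes the framework adaptive: irrelevant retrievals are discarded rather than fed to the LLM, and (in the full method) failed checks can trigger another retrieval hop for complex multi-hop queries.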
