Search Results (988)

Search Parameters:
Keywords = audio data

20 pages, 5826 KiB  
Article
Novel Method for Detecting Coughing Pigs with Audio-Visual Multimodality for Smart Agriculture Monitoring
by Heechan Chae, Junhee Lee, Jonggwan Kim, Sejun Lee, Jonguk Lee, Yongwha Chung and Daihee Park
Sensors 2024, 24(22), 7232; https://doi.org/10.3390/s24227232 - 12 Nov 2024
Viewed by 276
Abstract
While the pig industry is crucial in global meat consumption, accounting for 34% of total consumption, respiratory diseases in pigs can cause substantial economic losses to pig farms. To alleviate this issue, we propose an advanced audio-visual monitoring system for the early detection of coughing, a key symptom of respiratory diseases in pigs, that will enhance disease management and animal welfare. The proposed system is structured into three key modules: the cough sound detection (CSD) module, which detects coughing sounds using audio data; the pig object detection (POD) module, which identifies individual pigs in video footage; and the coughing pig detection (CPD) module, which pinpoints which pigs are coughing among the detected pigs. These modules, using a multimodal approach, detect coughs from continuous audio streams amidst background noise and accurately pinpoint specific pens or individual pigs as the source. This method enables continuous 24/7 monitoring, leading to efficient action and reduced human labor stress. It achieved a substantial detection accuracy of 0.95 on practical data, validating its feasibility and applicability. The potential to enhance farm management and animal welfare is shown through proposed early disease detection. Full article
(This article belongs to the Section Smart Agriculture)

13 pages, 1449 KiB  
Article
Evaluating the User Experience and Usability of the MINI Robot for Elderly Adults with Mild Dementia and Mild Cognitive Impairment: Insights and Recommendations
by Aysan Mahmoudi Asl, Jose Miguel Toribio-Guzmán, Álvaro Castro-González, María Malfaz, Miguel A. Salichs and Manuel Franco Martín
Sensors 2024, 24(22), 7180; https://doi.org/10.3390/s24227180 - 8 Nov 2024
Viewed by 309
Abstract
Introduction: In recent years, the integration of robotic systems into various aspects of daily life has become increasingly common. As these technologies continue to advance, ensuring user-friendly interfaces and seamless interactions becomes more essential. For social robots to genuinely provide lasting value to humans, a favourable user experience (UX) emerges as an essential prerequisite. This article aimed to evaluate the usability of the MINI robot, highlighting its strengths and areas for improvement based on user feedback and performance. Materials and Methods: In a controlled lab setting, a mixed-method qualitative study was conducted with ten individuals aged 65 and above diagnosed with mild dementia (MD) and mild cognitive impairment (MCI). Participants engaged in individual MINI robot interaction sessions, completing cognitive tasks as per written instructions. Video and audio recordings documented interactions, while post-session System Usability Scale (SUS) questionnaires quantified usability perception. Ethical guidelines were followed, ensuring informed consent, and the data underwent qualitative and quantitative analyses, contributing insights into the MINI robot’s usability for this demographic. Results: The study addresses the ongoing challenges that tasks present, especially for MD individuals, emphasizing the importance of user support. Most tasks require both verbal and physical interactions, indicating that MD individuals face challenges when switching response methods within subtasks. These complexities originate from the selection and use of response methods, including difficulties with voice recognition, tablet touch, and tactile sensors. These challenges persist across tasks, with individuals with MD struggling to comprehend task instructions and provide correct answers and individuals with MCI struggling to use response devices, often due to the limitations of the robot’s speech recognition. Technical shortcomings have been identified. The results of the SUS indicate positive perceptions, although there are lower ratings for instructor assistance and pre-use learning. The average SUS score of 68.3 places device usability in the “good” category. Conclusions: Our study examines the usability of the MINI robot, revealing strengths in quick learning, simple system and operation, and integration of features, while also highlighting areas for improvement. Careful design and modifications are essential for meaningful engagement with people with dementia. The robot could better benefit people with MD and MCI if clear, detailed instructions and instructor assistance were available. Full article
(This article belongs to the Section Sensors and Robotics)

17 pages, 4004 KiB  
Article
Designing a Tactile Document UI for 2D Refreshable Tactile Displays: Towards Accessible Document Layouts for Blind People
by Sara Alzalabny, Omar Moured, Karin Müller, Thorsten Schwarz, Bastian Rapp and Rainer Stiefelhagen
Multimodal Technol. Interact. 2024, 8(11), 102; https://doi.org/10.3390/mti8110102 - 8 Nov 2024
Viewed by 398
Abstract
Understanding document layouts is vital for enhancing document exploration and information retrieval for sighted individuals. However, for blind and visually impaired people, it becomes challenging to have access to layout information using typical assistive technologies such as screen readers. In this paper, we examine the potential benefits of presenting documents on two-dimensional (2D) refreshable tactile displays. These displays enable the tactile perception of 2D data, offering the advantage of dynamic and interactive functionality. Despite their potential, the development of user interfaces (UIs) for such displays has not advanced significantly. Thus, we propose a design of an intelligent tactile user interface (TUI), incorporating touch and audio feedback to represent documents in a tactile format. Our exploratory study for evaluating this approach revealed satisfaction from participants with the experience of directly viewing documents in their true form, rather than relying on screen-reading interpretations. Additionally, participants offered recommendations for incorporating additional features and refining the approach in future iterations. To facilitate further research and development, we have made our dataset and models publicly available. Full article

21 pages, 6346 KiB  
Article
Novel Steganographic Method Based on Hermitian Positive Definite Matrix and Weighted Moore–Penrose Inverses
by Selver Pepić, Muzafer Saračević, Aybeyan Selim, Darjan Karabašević, Marija Mojsilović, Amor Hasić and Pavle Brzaković
Appl. Sci. 2024, 14(22), 10174; https://doi.org/10.3390/app142210174 - 6 Nov 2024
Viewed by 343
Abstract
In this paper, we describe the concept of a new data-hiding technique for steganography in RGB images where a secret message is embedded in the blue layer of specific bytes. To increase security, bytes are chosen randomly using a random square Hermitian positive definite matrix, which serves as the stego-key. The proposed solution represents a very strong key, since the number of positive definite matrices of order 8 is huge. Implementing the proposed steganographic method consists of splitting a color image into its R, G, and B channels and carrying out two segments, which take place in several phases. The first segment embeds a secret message in the carrier (image or text) based on the unique absolute element values of the Hermitian positive definite matrix. The second segment extracts the hidden message based on a stego-key generated from the elements of the Hermitian positive definite matrix. The objective of the data-hiding technique using a Hermitian positive definite matrix is to embed confidential or sensitive data within cover media (such as images, audio, or video) securely and imperceptibly; by doing so, the hidden data remain confidential and tamper-resistant while the cover media’s visual or auditory quality is maintained. Full article
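The basic embedding step the abstract describes (writing message bits into the blue channel at key-selected positions) can be sketched as follows. This is a generic illustration, not the authors' exact scheme: a seeded pseudo-random generator stands in for the pixel-selection key that the paper derives from a Hermitian positive definite matrix, and the function names are hypothetical.

```python
import numpy as np

def embed_message(image, message_bits, key_seed):
    """Hide bits in the least significant bit of the blue channel
    at pseudo-randomly selected pixel positions."""
    stego = image.copy()
    h, w, _ = image.shape
    rng = np.random.default_rng(key_seed)
    # Choose distinct pixel positions from the key (stand-in for the matrix-derived key).
    idx = rng.choice(h * w, size=len(message_bits), replace=False)
    rows, cols = np.unravel_index(idx, (h, w))
    blue = stego[rows, cols, 2]
    stego[rows, cols, 2] = (blue & 0xFE) | np.asarray(message_bits, dtype=np.uint8)
    return stego

def extract_message(stego, n_bits, key_seed):
    """Recover the hidden bits by regenerating the same positions from the key."""
    h, w, _ = stego.shape
    rng = np.random.default_rng(key_seed)
    idx = rng.choice(h * w, size=n_bits, replace=False)
    rows, cols = np.unravel_index(idx, (h, w))
    return [int(b) for b in stego[rows, cols, 2] & 1]
```

Because only least significant bits of one channel change, the stego image is visually indistinguishable from the cover, which is the imperceptibility property the abstract refers to.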

19 pages, 2763 KiB  
Article
DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models
by Seohyun Kim and Kyogu Lee
Appl. Sci. 2024, 14(22), 10116; https://doi.org/10.3390/app142210116 - 5 Nov 2024
Viewed by 373
Abstract
In recent years, the dance field has been able to create diverse content by leveraging technical advancements such as deep learning models, generating content beyond the unique artistic creations that only humans can create. However, in terms of dance data, there is still a lack of video-and-label datasets, and of datasets that contain multiple tags for videos. To address this gap, this paper explores the feasibility of generating dance captions from tags using a pseudo-captioning approach, inspired by the significant improvements large language models (LLMs) have shown in other domains. Various tags are generated from features extracted from videos and audio, and LLMs are then instructed to produce dance captions based on these tags. Captions were generated using both the open dance dataset and Internet dance videos, followed by user evaluations of randomly sampled captions. Participants found the captions effective in describing dance movements, of expert quality, and consistent with video content. Additionally, positive feedback was received on the evaluation of the gap in image extraction and the inclusion of tag data. This paper introduces and validates a novel pseudo-captioning method for generating dance captions using predefined tags, contributing to the expansion of data available for dance research and offering a practical solution to the current lack of datasets in this field. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 2039 KiB  
Article
Overcoming the Challenges of Including Learners with Visual Impairments Through Teacher Collaborations
by Manis Maesala and Ferreira Ronél
Educ. Sci. 2024, 14(11), 1217; https://doi.org/10.3390/educsci14111217 - 4 Nov 2024
Viewed by 491
Abstract
In this article we report on a study undertaken with 255 teachers working with learners with visual impairments. The focus of our discussion is teachers’ implementation of inclusive education policies with learners with visual impairments in full-service schools in South Africa. We foreground the ways in which the teacher participants relied on teacher collaborations to overcome some of the challenges they faced as a result of limited resource provisions in schools in this country. We implemented an instrumental case study design and followed the approach of participatory reflection and action (PRA). The sample included teachers (n = 255) from seven full-service and ten special schools from five provinces in South Africa. In addition, 50 expert stakeholders who work in the field of visual impairment were involved. For data generation and documentation, we utilised PRA-based workshops, the observation-as-context-of-interaction method, audio-visual techniques, field notes, and reflective journals. The findings of our research confirm that full-service schools face distinct challenges regarding limited resources, as well as teachers who lack the experience to accommodate learners with visual impairments. Even though the teachers in our study were initially reluctant to implement inclusive education practices, their collaboration with fellow teachers and other informed stakeholders enabled them to address some of the challenges they experienced and implement inclusive practices. They subsequently formed a team and learnt from one another to facilitate positive changes through the implementation of inclusive practices, thereby following a socio-ecological approach to inclusive practices in full-service schools in South Africa. Full article
(This article belongs to the Special Issue Cultivating Inclusive Classrooms: Practices in Special Education)

20 pages, 2918 KiB  
Article
A Text Generation Method Based on a Multimodal Knowledge Graph for Fault Diagnosis of Consumer Electronics
by Yuezhong Wu, Yuxuan Sun, Lingjiao Chen, Xuanang Zhang and Qiang Liu
Appl. Sci. 2024, 14(21), 10068; https://doi.org/10.3390/app142110068 - 4 Nov 2024
Viewed by 632
Abstract
As consumer electronics evolve towards greater intelligence, their automation and complexity also increase, making it difficult for users to diagnose faults when they occur. To address the problem where users, relying solely on their own knowledge, struggle to diagnose faults in consumer electronics promptly and accurately, we propose a multimodal knowledge graph-based text generation method. Our method begins by using deep learning models like the Residual Network (ResNet) and Bidirectional Encoder Representations from Transformers (BERT) to extract features from user-provided fault information, which can include images, text, audio, and even olfactory data. These multimodal features are then combined to form a comprehensive representation. The fused features are fed into a graph convolutional network (GCN) for fault inference, identifying potential fault nodes in the electronics. These fault nodes are subsequently fed into a pre-constructed knowledge graph to determine the final diagnosis. Finally, this information is processed through the Bias-term Fine-tuning (BitFit) enhanced Chinese Pre-trained Transformer (CPT) model, which generates the final fault diagnosis text for the user. The experimental results show that our proposed method achieves a 4.4% improvement over baseline methods, reaching a fault diagnosis accuracy of 98.4%. Our approach effectively leverages multimodal fault information, addressing the challenges users face in diagnosing faults through the integration of graph convolutional network and knowledge graph technologies. Full article
(This article belongs to the Special Issue State-of-the-Art of Knowledge Graphs and Their Applications)

13 pages, 2404 KiB  
Article
Automated Cough Analysis with Convolutional Recurrent Neural Network
by Yiping Wang, Mustafaa Wahab, Tianqi Hong, Kyle Molinari, Gail M. Gauvreau, Ruth P. Cusack, Zhen Gao, Imran Satia and Qiyin Fang
Bioengineering 2024, 11(11), 1105; https://doi.org/10.3390/bioengineering11111105 - 1 Nov 2024
Viewed by 526
Abstract
Chronic cough is associated with several respiratory diseases and is a significant burden on physical, social, and psychological health. Non-invasive, real-time, continuous, and quantitative monitoring tools are highly desired to assess cough severity, the effectiveness of treatment, and monitor disease progression in clinical practice and research. There are currently limited tools to quantitatively measure spontaneous coughs in daily living settings in clinical trials and in clinical practice. In this study, we developed a machine learning model for the detection and classification of cough sounds. Mel spectrograms are utilized as a key feature representation to capture the temporal and spectral characteristics of coughs. We applied this approach to automate cough analysis using 300 h of audio recordings from cough challenge clinical studies conducted in a clinical lab setting. A number of machine learning algorithms were studied and compared, including decision tree, support vector machine, k-nearest neighbors, logistic regression, random forest, and neural network. We identified that for this dataset, the CRNN approach is the most effective method, reaching 98% accuracy in identifying individual coughs from the audio data. These findings provide insights into the strengths and limitations of various algorithms, highlighting the potential of CRNNs in analyzing complex cough patterns. This research demonstrates the potential of neural network models in fully automated cough monitoring. The approach requires validation in detecting spontaneous coughs in patients with refractory chronic cough in a real-life setting. Full article
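As a rough illustration of the kind of time-frequency feature this work builds on, a log-power spectrogram can be computed as below. This is a hedged sketch, not the authors' pipeline: the study uses mel spectrograms, which additionally pool the FFT bins through a mel filter bank, and the function name here is hypothetical.

```python
import numpy as np

def log_power_spectrogram(signal, n_fft=512, hop=256):
    """Frame the signal, window each frame, take FFT magnitudes,
    and return a log-power spectrogram of shape (n_frames, n_fft//2 + 1).
    A mel spectrogram would further pool these bins with a mel filter bank."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10 * np.log10(power + 1e-10)  # small floor avoids log(0)
```

Stacking such frames over time yields the 2D input that convolutional layers of a CRNN can scan, while the recurrent layers model the temporal evolution across frames.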

16 pages, 3506 KiB  
Article
HADNet: A Novel Lightweight Approach for Abnormal Sound Detection on Highway Based on 1D Convolutional Neural Network and Multi-Head Self-Attention Mechanism
by Cong Liang, Qian Chen, Qiran Li, Qingnan Wang, Kang Zhao, Jihui Tu and Ammar Jafaripournimchahi
Electronics 2024, 13(21), 4229; https://doi.org/10.3390/electronics13214229 - 28 Oct 2024
Viewed by 471
Abstract
Video surveillance is an effective tool for traffic management and safety, but it may face challenges in extreme weather, low visibility, areas outside the monitoring field of view, or during nighttime conditions. Therefore, abnormal sound detection is used in traffic management and safety as an auxiliary tool to complement video surveillance. In this paper, a novel lightweight method for abnormal sound detection based on 1D CNN and Multi-Head Self-Attention Mechanism on the embedded system is proposed, which is named HADNet. First, 1D CNN is employed for local feature extraction, which minimizes information loss from the audio signal during time-frequency conversion and reduces computational complexity. Second, the proposed block based on Multi-Head Self-Attention Mechanism not only effectively mitigates the issue of disappearing gradients, but also enhances detection accuracy. Finally, the joint loss function is employed to detect abnormal audio. This choice helps address issues related to unbalanced training data and class overlap, thereby improving model performance on imbalanced datasets. The proposed HADNet method was evaluated on the MIVIA Road Events and UrbanSound8K datasets. The results demonstrate that the proposed method for abnormal audio detection on embedded systems achieves high accuracy of 99.6% and an efficient detection time of 0.06 s. This approach proves to be robust and suitable for practical applications in traffic management and safety. By addressing the challenges posed by traditional video surveillance methods, HADNet offers a valuable and complementary solution for enhancing safety measures in diverse traffic conditions. Full article
(This article belongs to the Special Issue Fault Detection Technology Based on Deep Learning)

13 pages, 1761 KiB  
Article
Leveraging Multi-Modality and Enhanced Temporal Networks for Robust Violence Detection
by Gwangho Na, Jaepil Ko and Kyungjoo Cheoi
Mach. Learn. Knowl. Extr. 2024, 6(4), 2422-2434; https://doi.org/10.3390/make6040119 - 28 Oct 2024
Viewed by 452
Abstract
In this paper, we present a novel model that enhances performance by extending the dual-modality TEVAD model—originally leveraging visual and textual information—into a multi-modal framework that integrates visual, audio, and textual data. Additionally, we refine the multi-scale temporal network (MTN) to improve feature extraction across multiple temporal scales between video snippets. Using the XD-Violence dataset, which includes audio data for violence detection, we conduct experiments to evaluate various feature fusion methods. The proposed model achieves an average precision (AP) of 83.9%, surpassing the performance of single-modality approaches (visual: 73.9%, audio: 67.1%, textual: 29.9%) and dual-modality approaches (visual + audio: 78.8%, visual + textual: 78.5%). These findings demonstrate that the proposed model outperforms models based on the original MTN and reaffirm the efficacy of multi-modal approaches in enhancing violence detection compared to single- or dual-modality methods. Full article

21 pages, 1089 KiB  
Article
Cloud IaaS Optimization Using Machine Vision at the IoT Edge and the Grid Sensing Algorithm
by Nuruzzaman Faruqui, Sandesh Achar, Sandeepkumar Racherla, Vineet Dhanawat, Prathyusha Sripathi, Md. Monirul Islam, Jia Uddin, Manal A. Othman, Md Abdus Samad and Kwonhue Choi
Sensors 2024, 24(21), 6895; https://doi.org/10.3390/s24216895 - 27 Oct 2024
Viewed by 800
Abstract
Security grids consisting of High-Definition (HD) Internet of Things (IoT) cameras are gaining popularity for organizational perimeter surveillance and security monitoring. Transmitting HD video data to cloud infrastructure requires high bandwidth and more storage space than text, audio, and image data. It becomes more challenging for large-scale organizations with massive security grids to minimize cloud network bandwidth and storage costs. This paper presents an application of Machine Vision at the IoT Edge (Mez) technology in association with a novel Grid Sensing (GRS) algorithm to optimize cloud Infrastructure as a Service (IaaS) resource allocation, leading to cost minimization. Experimental results demonstrated a 31.29% reduction in bandwidth and a 22.43% reduction in storage requirements. The Mez technology offers a network latency feedback module with knobs for transforming video frames to adjust to the latency sensitivity. The association of the GRS algorithm introduces its compatibility in the IoT camera-driven security grid by automatically ranking the existing bandwidth requirements by different IoT nodes. As a result, the proposed system minimizes the entire grid’s throughput, contributing to significant cloud resource optimization. Full article

16 pages, 624 KiB  
Article
Towards the Development of the Clinical Decision Support System for the Identification of Respiration Diseases via Lung Sound Classification Using 1D-CNN
by Syed Waqad Ali, Muhammad Munaf Rashid, Muhammad Uzair Yousuf, Sarmad Shams, Muhammad Asif, Muhammad Rehan and Ikram Din Ujjan
Sensors 2024, 24(21), 6887; https://doi.org/10.3390/s24216887 - 27 Oct 2024
Viewed by 480
Abstract
Respiratory disorders are commonly regarded as complex disorders to diagnose due to their multi-factorial nature, encompassing the interplay between hereditary variables, comorbidities, environmental exposures, and therapies, among other contributing factors. This study presents a Clinical Decision Support System (CDSS) for the early detection of respiratory disorders using a one-dimensional convolutional neural network (1D-CNN) model. The ICBHI 2017 Breathing Sound Database, which contains samples of different breathing sounds, was used in this research. During pre-processing, audio clips were resampled to a uniform rate, and breathing cycles were segmented into individual instances of the lung sound. A One-Dimensional Convolutional Neural Network (1D-CNN) consisting of convolutional layers, max pooling layers, dropout layers, and fully connected layers, was designed to classify the processed clips into four categories: normal, crackles, wheezes, and combined crackles and wheezes. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. Hyperparameters were optimized using grid search with k−fold cross-validation. The model achieved an overall accuracy of 0.95, outperforming state-of-the-art methods. Particularly, the normal and crackles categories attained the highest F1-scores of 0.97 and 0.95, respectively. The model’s robustness was further validated through 5−fold and 10−fold cross-validation experiments. This research highlighted an essential aspect of diagnosing lung sounds through artificial intelligence and utilized the 1D-CNN to classify lung sounds accurately. The proposed advancement of technology shall enable medical care practitioners to diagnose lung disorders in an improved manner, leading to better patient care. Full article
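The class-imbalance step mentioned above can be illustrated with a minimal SMOTE-style oversampler: synthetic minority samples are interpolated between a real sample and one of its k nearest minority-class neighbours. This is a simplified sketch with a hypothetical function name, not the exact implementation the study used (in practice one would typically call `imblearn`'s SMOTE).

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for the minority class X_min
    by interpolating between samples and their k nearest neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))          # interpolation weight per sample
    return X_min[base] + lam * (X_min[picked] - X_min[base])
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the minority class region rather than simply duplicating existing samples.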
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)

18 pages, 3589 KiB  
Article
Improved Patch-Mix Transformer and Contrastive Learning Method for Sound Classification in Noisy Environments
by Xu Chen, Mei Wang, Ruixiang Kan and Hongbing Qiu
Appl. Sci. 2024, 14(21), 9711; https://doi.org/10.3390/app14219711 - 24 Oct 2024
Viewed by 474
Abstract
In urban environments, noise significantly impacts daily life and presents challenges for Environmental Sound Classification (ESC). The structural influence of urban noise on audio signals complicates feature extraction and audio classification for environmental sound classification methods. To address these challenges, this paper proposes a Contrastive Learning-based Audio Spectrogram Transformer (CL-Transformer) that incorporates a Patch-Mix mechanism and adaptive contrastive learning strategies while simultaneously improving and utilizing adaptive data augmentation techniques for model training. Firstly, a combination of data augmentation techniques is introduced to enrich environmental sounds. Then, the Patch-Mix feature fusion scheme randomly mixes patches of the enhanced and noisy spectrograms during the Transformer’s patch embedding. Furthermore, a novel contrastive learning scheme is introduced to quantify loss and improve model performance, synergizing well with the Transformer model. Finally, experiments on the ESC-50 and UrbanSound8K public datasets achieved accuracies of 97.75% and 92.95%, respectively. To simulate the impact of noise in real urban environments, the model is evaluated using the UrbanSound8K dataset with added background noise at different signal-to-noise ratios (SNR). Experimental results demonstrate that the proposed framework performs well in noisy environments. Full article
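Conceptually, Patch-Mix swaps co-located patches between two spectrograms so the model learns from partially mixed inputs. The sketch below shows the idea on raw 2D arrays; it is an assumption-laden illustration (the paper applies the mixing during the Transformer's patch embedding, and the function name is hypothetical).

```python
import numpy as np

def patch_mix(spec_a, spec_b, patch=16, mix_prob=0.5, seed=0):
    """Randomly replace patches of spectrogram A with the co-located
    patches of spectrogram B (per-patch Bernoulli choice).
    Returns the mixed spectrogram and the number of swapped patches."""
    rng = np.random.default_rng(seed)
    mixed = spec_a.copy()
    h, w = spec_a.shape
    n_mixed = 0
    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            if rng.random() < mix_prob:
                mixed[i:i + patch, j:j + patch] = spec_b[i:i + patch, j:j + patch]
                n_mixed += 1
    return mixed, n_mixed
```

The fraction of swapped patches can then weight the training targets (e.g. soft labels or the contrastive loss), so the model is told how much of each source is present in the mixed input.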

14 pages, 267 KiB  
Article
Practices and Barriers in Implementing the Low FODMAP Diet for Irritable Bowel Syndrome Among Malaysian Dietitians: A Qualitative Study
by Tham Jin Ke, Mohd Jamil Sameeha, Kewin Tien Ho Siah, Putri Balqish Qistina Binti Jeffri, Noor Athierah Binti Idrus and Shanthi Krishnasamy
Nutrients 2024, 16(21), 3596; https://doi.org/10.3390/nu16213596 - 23 Oct 2024
Viewed by 662
Abstract
The low fermentable oligo-, di-, mono-saccharides and polyols (FODMAP) diet (LFD) is a second-line dietary intervention for irritable bowel syndrome (IBS) patients, involving FODMAP restriction, reintroduction, and personalization, and it needs to be delivered by dietitians. However, the application of this diet among Malaysian IBS patients is not well understood. This study aimed to explore the practices and barriers in delivering the LFD among Malaysian dietitians. Semi-structured qualitative interviews were conducted online with practicing dietitians until the data reached saturation. All the interview sessions were audio-recorded and transcribed verbatim. Thematic analysis was used to analyze the data. Eleven dietitians were interviewed, with 36.4% (n = 4) having more than 10 years of experience. The following four themes regarding their practices emerged: 1. dietary advice on FODMAP restriction; 2. duration of the FODMAP restriction phase; 3. references used to obtain information about FODMAPs; and 4. strategies for reintroduction. Meanwhile, the following seven barriers were identified: 1. lack of culturally relevant educational materials; 2. limited knowledge about the LFD; 3. inadequate formal training among dietitians; 4. lack of integration in multi-disciplinary care; 5. low health literacy of patients; 6. low compliance rate among patients; and 7. restrictions for certain populations. LFD implementation in Malaysia is not standardized, as only experienced dietitians can provide evidence-based dietary advice. Lack of training and of culturally specific resources were identified as the main barriers limiting implementation of the diet. Therefore, there is a need for training programs and resource development to support Malaysian dietitians in managing IBS patients. Full article
(This article belongs to the Section Clinical Nutrition)
14 pages, 1309 KiB  
Article
Combined Keyword Spotting and Localization Network Based on Multi-Task Learning
by Jungbeom Ko, Hyunchul Kim and Jungsuk Kim
Mathematics 2024, 12(21), 3309; https://doi.org/10.3390/math12213309 - 22 Oct 2024
Viewed by 444
Abstract
The advent of voice assistance technology and its integration into smart devices has facilitated many useful services, such as texting and application execution. However, most assistive technologies lack the capability to enable the system to act as a human who can localize the speaker and selectively spot meaningful keywords. Because keyword spotting (KWS) and sound source localization (SSL) are essential and must operate in real time, the efficiency of a neural network model is crucial for memory and computation. In this paper, a single neural network model for KWS and SSL is proposed to overcome the limitations of sequential KWS and SSL, which require more memory and inference time. The proposed model uses multi-task learning to utilize the limited resources of the device efficiently. A shared encoder is used as the initial layer to extract common features from the multichannel audio data. Subsequently, the task-specific parallel layers utilize these features for KWS and SSL. The proposed model was evaluated on a synthetic dataset with multiple speakers, and a 7-module shared encoder structure was identified as optimal in terms of accuracy, direction of arrival (DOA) accuracy, DOA error, and latency. It achieved a KWS accuracy of 94.51%, DOA error of 12.397°, and DOA accuracy of 89.86%. Consequently, the proposed model requires significantly less memory owing to the shared network architecture, which enhances the inference time without compromising KWS accuracy, DOA error, and DOA accuracy. Full article
(This article belongs to the Special Issue Computational Intelligence and Machine Learning with Applications)
