Search Results (744)

Search Parameters: Keywords = audio processing

18 pages, 2189 KiB  
Article
Data Augmentation for Deep Learning-Based Speech Reconstruction Using FOC-Based Methods
by Bilgi Görkem Yazgaç and Mürvet Kırcı
Fractal Fract. 2025, 9(2), 56; https://doi.org/10.3390/fractalfract9020056 - 21 Jan 2025
Viewed by 44
Abstract
Neural audio reconstruction is an important subtopic of Neural Audio Synthesis (NAS), which is a current emerging topic of modern Artificial Intelligence (AI) applications. The objective of a neural audio reconstruction model is to achieve a viable audio waveform from an audio feature representation that excludes the phase information. Since the data-dependent nature of such systems demands an increased quantity of data, methods of increasing the quantity of data for neural network training arise as a topic of substantial interest. Although the applications of data augmentation methods for classification tasks are well documented, there is still room for development for applications of such methods on signal synthesis tasks. Additionally, the Fractional-Order Calculus (FOC) framework provides possibilities for quality applications for the signal processing domain. Still, it is important to show that the methods based on the FOC framework can be applied to different application domains to show the capabilities of this framework. In this paper, FOC-based methods are applied to a speech dataset for data augmentation purposes to increase the audio reconstruction performance of a neural network, a spectral consistency-based neural audio reconstruction model called Deep Griffin-Lim Iteration (DeGLI), with respect to objective measures PESQ and STOI. An FOC-based method for rescaling linear frequency for augmenting magnitude spectrogram data is proposed. Furthermore, together with an FOC-based phase estimation method, it is shown that an augmentation strategy that has the objective of increased spectral consistency should be considered in data augmentation for audio reconstruction tasks. The test results reveal that this type of strategy increases the performance of a spectral consistency-based neural audio reconstruction model by over 13% for smaller depths. Full article
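As background for readers unfamiliar with spectral-consistency reconstruction, the sketch below shows the classic Griffin-Lim iteration that DeGLI refines: alternate between inverting the magnitude spectrogram with the current phase estimate and re-analysing the result, keeping only the new phase. This is a minimal illustration, not the paper's network or its FOC-based augmentation; the hop and window sizes are arbitrary assumptions.

```python
# Minimal sketch of classic Griffin-Lim phase recovery from a magnitude
# spectrogram S (shape: frequency bins x frames); not the DeGLI network.
import numpy as np
import librosa

def griffin_lim(S, n_iter=32, hop_length=256, win_length=1024):
    """Recover a waveform by alternating projections onto consistent spectrograms."""
    angles = np.exp(2j * np.pi * np.random.rand(*S.shape))  # random initial phase
    for _ in range(n_iter):
        # Inverse STFT with the current phase estimate, then re-analyse.
        y = librosa.istft(S * angles, hop_length=hop_length, win_length=win_length)
        rebuilt = librosa.stft(y, n_fft=win_length, hop_length=hop_length,
                               win_length=win_length)
        # Keep the phase, discard the (inconsistent) magnitude.
        angles = np.exp(1j * np.angle(rebuilt))
    return librosa.istft(S * angles, hop_length=hop_length, win_length=win_length)
```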

34 pages, 740 KiB  
Systematic Review
Exploring the Intersection of ADHD and Music: A Systematic Review
by Phoebe Saville, Caitlin Kinney, Annie Heiderscheit and Hubertus Himmerich
Behav. Sci. 2025, 15(1), 65; https://doi.org/10.3390/bs15010065 - 13 Jan 2025
Viewed by 1008
Abstract
Attention Deficit Hyperactivity Disorder (ADHD) is a highly prevalent neurodevelopmental disorder, affecting both children and adults, which often leads to significant difficulties with attention, impulsivity, and working memory. These challenges can impact various cognitive and perceptual domains, including music perception and performance. Despite these difficulties, individuals with ADHD frequently engage with music, and previous research has shown that music listening can serve as a means of increasing stimulation and self-regulation. Moreover, music therapy has been explored as a potential treatment option for individuals with ADHD. As there is a lack of integrative reviews on the interaction between ADHD and music, the present review aimed to fill the gap in research. Following PRISMA guidelines, a comprehensive literature search was conducted across PsychInfo (Ovid), PubMed, and Web of Science. A narrative synthesis was conducted on 20 eligible studies published between 1981 and 2023, involving 1170 participants, of whom 830 had ADHD or ADD. The review identified three main areas of research: (1) music performance and processing in individuals with ADHD, (2) the use of music listening as a source of stimulation for those with ADHD, and (3) music-based interventions aimed at mitigating ADHD symptoms. The analysis revealed that individuals with ADHD often experience unique challenges in musical tasks, particularly those related to timing, rhythm, and complex auditory stimuli perception, though these deficits did not extend to rhythmic improvisation and musical expression. Most studies indicated that music listening positively affects various domains for individuals with ADHD. Furthermore, most studies of music therapy found that it can generate significant benefits for individuals with ADHD. The strength of these findings, however, was limited by inconsistencies among the studies, such as variations in ADHD diagnosis, comorbidities, medication use, and gender. Despite these limitations, this review provides a valuable foundation for future research on the interaction between ADHD and music. Full article
(This article belongs to the Special Issue Innovations in Music Based Interventions for Psychological Wellbeing)

20 pages, 4168 KiB  
Article
Immersive Haptic Technology to Support English Language Learning Based on Metacognitive Strategies
by Adriana Guanuche, Wilman Paucar, William Oñate and Gustavo Caiza
Appl. Sci. 2025, 15(2), 665; https://doi.org/10.3390/app15020665 - 11 Jan 2025
Viewed by 406
Abstract
One of the most widely used strategies for learning support is the use of Information and Communication Technologies (ICTs), due to the variety of applications and benefits they provide in the educational field. This article describes the design and implementation of an immersive application supported by Senso gloves and 3D environments for learning English as a second language in Ecuador. The following steps should be considered for the app design: (1) the creation of a classroom with characteristics similar to a real classroom and different buttons to navigate through the scenarios; (2) the creation of a virtual environment where text, images, examples, and audio are added according to the grammatical topic; (3) the creation of a dynamic environment for assessment in which multiple choice questions are interacted with, followed by automatic grading with direct feedback. The results showed that the interaction between the physical and virtual environment through navigation tests with the glove in different 3D environments achieved a complete activation and navigation rate. Teachers showed a clear interest in using the application in their classes as an additional teaching tool to complement the English language teaching process, given that it can increase motivation and memorization in students, as it is an easy-to-use application, and the 3D environments designed are attractive, which would make classes more dynamic. In addition, the availability of the application at any place and time represents a support for the current academic community as it adapts to the needs of today’s world. Full article

20 pages, 1849 KiB  
Article
Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
by John Lorenzo Bautista and Hyun Soon Shin
Appl. Sci. 2025, 15(2), 623; https://doi.org/10.3390/app15020623 - 10 Jan 2025
Viewed by 394
Abstract
This paper introduces a novel joint model architecture for Speech Emotion Recognition (SER) that integrates both discrete and dimensional emotional representations, allowing for the simultaneous training of classification and regression tasks to improve the comprehensiveness and interpretability of emotion recognition. By employing a joint loss function that combines categorical and regression losses, the model ensures balanced optimization across tasks, with experiments exploring various weighting schemes using a tunable parameter to adjust task importance. Two adaptive weight balancing schemes, Dynamic Weighting and Joint Weighting, further enhance performance by dynamically adjusting task weights based on optimization progress and ensuring balanced emotion representation during backpropagation. The architecture employs parallel feature extraction through independent encoders, designed to capture unique features from multiple modalities, including Mel-frequency Cepstral Coefficients (MFCC), Short-term Features (STF), Mel-spectrograms, and raw audio signals. Additionally, pre-trained models such as Wav2Vec 2.0 and HuBERT are integrated to leverage their robust latent features. The inclusion of self-attention and co-attention mechanisms allows the model to capture relationships between input modalities and interdependencies among features, further improving its interpretability and integration capabilities. Experiments conducted on the IEMOCAP dataset using a leave-one-subject-out approach demonstrate the model’s effectiveness, with results showing a 1–2% accuracy improvement over classification-only models. The optimal configuration, incorporating the joint architecture, dynamic weighting, and parallel processing of multimodal features, achieves a weighted accuracy of 72.66%, an unweighted accuracy of 73.22%, and a mean Concordance Correlation Coefficient (CCC) of 0.3717. These results validate the effectiveness of the proposed joint model architecture and adaptive balancing weight schemes in improving SER performance. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
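The joint objective described above amounts to a weighted sum of a categorical loss and a regression loss with a tunable trade-off parameter. A minimal sketch follows; the 0.5 default weight and the use of cross-entropy plus MSE are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a joint classification + regression loss with a tunable weight.
import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()   # discrete emotion classes
mse_loss = nn.MSELoss()           # dimensional (e.g., valence/arousal) targets

def joint_loss(class_logits, class_targets, dim_preds, dim_targets, alpha=0.5):
    l_cls = ce_loss(class_logits, class_targets)
    l_reg = mse_loss(dim_preds, dim_targets)
    return alpha * l_cls + (1.0 - alpha) * l_reg
```

A dynamic weighting scheme would update alpha during training according to the relative progress of the two losses rather than keeping it fixed.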

12 pages, 3982 KiB  
Article
Development of a Solar-Powered Edge Processing Perimeter Alert System with AI and LoRa/LoRaWAN Integration for Drone Detection and Enhanced Security
by Mateo Mejia-Herrera, Juan Botero-Valencia, José Ortega and Ruber Hernández-García
Drones 2025, 9(1), 43; https://doi.org/10.3390/drones9010043 - 10 Jan 2025
Viewed by 462
Abstract
Edge processing is a trend in developing new technologies that leverage Artificial Intelligence (AI) without transmitting large volumes of data to centralized processing services. This technique is particularly relevant for security applications where there is a need to reduce the probability of intrusion or data breaches and to decentralize alert systems. Although drone detection has received great research attention, the ability to identify helicopters expands the spectrum of aerial threats that can be detected. In this work, we present the development of a perimeter alert system that integrates AI and multiple sensors processed at the edge. The proposed system can be integrated into a LoRa or LoRaWAN network powered by solar energy. The system incorporates a PDM microphone based on an Arduino Nano 33 BLE with a trained model to identify a drone or a UH-60 from an audio spectrogram to demonstrate its functionality. It is complemented by two PIR motion sensors and a microwave sensor with a range of up to 11 m. Additionally, the DC magnetic field is measured to identify possible sensor movements or changes caused by large bodies, and a configurable RGB light signal visually indicates motion or sound detection. The monitoring system communicates with a second MCU integrated with a LoRa or LoRaWAN communication module, enabling information transmission over distances of up to several kilometers. The system is powered by a LiPo battery, which is recharged using solar energy. The perimeter alert system offers numerous advantages, including edge processing for enhanced data privacy and reduced latency, integrating multiple sensors for increased accuracy, and a decentralized approach to improving security. Its compatibility with LoRa or LoRaWAN networks enables long-range communication, while solar-powered operation reduces environmental impact. These features position the perimeter alert system as a versatile and powerful solution for various applications, including border control, private property protection, and critical infrastructure monitoring. The evaluation results show notable progress in the acoustic detection of helicopters and drones under controlled conditions. Finally, all the original data presented in the study are openly available in an OSF repository. Full article

25 pages, 6169 KiB  
Article
Elephant Sound Classification Using Deep Learning Optimization
by Hiruni Dewmini, Dulani Meedeniya and Charith Perera
Sensors 2025, 25(2), 352; https://doi.org/10.3390/s25020352 - 9 Jan 2025
Viewed by 361
Abstract
Elephant sound identification is crucial in wildlife conservation and ecological research. The identification of elephant vocalizations provides insights into behavior, social dynamics, and emotional expressions, supporting elephant conservation. This study addresses elephant sound classification utilizing raw audio processing. Our focus lies on exploring lightweight models suitable for deployment on resource-constrained edge devices, including MobileNet, YAMNET, and RawNet, alongside introducing a novel model termed ElephantCallerNet. Notably, our investigation reveals that the proposed ElephantCallerNet achieves an impressive accuracy of 89% in classifying raw audio directly without converting it to spectrograms. Leveraging Bayesian optimization techniques, we fine-tuned crucial parameters such as learning rate, dropout, and kernel size, thereby enhancing the model’s performance. Moreover, we scrutinized the efficacy of spectrogram-based training, a prevalent approach in animal sound classification. Through comparative analysis, raw audio processing is shown to outperform spectrogram-based methods. In contrast to other models in the literature that primarily focus on a single caller type or on binary classification that identifies whether a sound is an elephant voice or not, our solution is designed to classify three distinct caller types, namely roar, rumble, and trumpet. Full article
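The hyperparameter search the abstract mentions can be sketched with Optuna, whose default TPE sampler is a Bayesian-optimization variant; `build_and_evaluate` is a hypothetical stand-in for training the caller-type classifier and returning validation accuracy.

```python
# Hedged sketch of tuning learning rate, dropout, and kernel size with Optuna.
import optuna

def build_and_evaluate(lr, dropout, kernel_size):
    # Hypothetical placeholder: train the caller-type classifier with these
    # hyperparameters and return validation accuracy. A dummy value keeps the
    # sketch runnable end-to-end.
    return 0.0

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    kernel_size = trial.suggest_categorical("kernel_size", [3, 5, 7, 9])
    return build_and_evaluate(lr=lr, dropout=dropout, kernel_size=kernel_size)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```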

14 pages, 282 KiB  
Essay
Podcast—The Remediation of Radio: A Media Theoretical Framework for Podcast Research
by Mónika Andok
Journal. Media 2025, 6(1), 7; https://doi.org/10.3390/journalmedia6010007 - 9 Jan 2025
Viewed by 522
Abstract
The study aims to investigate, describe, and contextualize the media theoretical characteristics of podcasts. Initially, it examines the media technological advancements that facilitated the emergence of podcasts, followed by an exploration of the parallel innovations in content production. Subsequently, the analysis focuses on key dimensions relevant to defining podcasts as a medium: the evolution of audio technology, the heterogeneity of content, and the patterns of individual and collective usage. The transition of podcasts from a novel cultural form to a recognized medium was a gradual process spanning approximately two decades. The aim of this study is to examine how podcasts emerged as a distinct medium, combining the characteristics of traditional radio with modern digital technologies, using Bolter and Grusin’s remediation concept as a media theoretical framework. This paper posits that the most suitable theoretical framework for understanding podcasts as a medium is Bolter and Grusin’s theory of remediation. According to this framework, media are defined through processes of remediation. In the case of podcasts, however, a unique form of convergent remediation emerges, wherein the medium integrates elements of traditional radio with the distinctive characteristics of networked communication. Full article
22 pages, 1409 KiB  
Article
Authenticity at Risk: Key Factors in the Generation and Detection of Audio Deepfakes
by Alba Martínez-Serrano, Claudia Montero-Ramírez and Carmen Peláez-Moreno
Appl. Sci. 2025, 15(2), 558; https://doi.org/10.3390/app15020558 - 8 Jan 2025
Viewed by 449
Abstract
Detecting audio deepfakes is crucial to ensure authenticity and security, especially in contexts where audio veracity can have critical implications, such as in the legal, security or human rights domains. Various elements, such as complex acoustic backgrounds, enhance the realism of deepfakes; however, their effect on the processes of creation and detection of deepfakes remains under-explored. This study systematically analyses how factors such as the acoustic environment, user type and signal-to-noise ratio influence the quality and detectability of deepfakes. For this study, we use the WELIVE dataset, which contains audio recordings of 14 female victims of gender-based violence in real and uncontrolled environments. The results indicate that the complexity of the acoustic scene affects both the generation and detection of deepfakes: classifiers, particularly the linear SVM, are more effective in complex acoustic environments, suggesting that simpler acoustic environments may facilitate the generation of more realistic deepfakes and, in turn, make it more difficult for classifiers to detect them. These findings underscore the need to develop adaptive models capable of handling diverse acoustic environments, thus improving detection reliability in dynamic and real-world contexts. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
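A minimal sketch of the kind of linear-SVM detector evaluated above, assuming mean MFCC vectors as features and synthetic stand-in clips; the WELIVE pipeline and its actual features are not reproduced here.

```python
# Hedged sketch: linear SVM over mean MFCC features for deepfake vs. genuine audio.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def mfcc_features(y, sr=16000, n_mfcc=20):
    # Collapse the MFCC matrix to one fixed-length vector per recording.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Synthetic stand-ins for genuine (0) and deepfake (1) clips.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(16000) for _ in range(8)]
X = np.stack([mfcc_features(c) for c in clips])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(X, y)
print(clf.predict(X[:2]))
```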

8 pages, 2712 KiB  
Proceeding Paper
CareTaker.ai—A Smart Health-Monitoring and Caretaker-Assistant System for Elder Healthcare
by Ankur Gupta, Sahil Sawhney and Suhaib Ahmed
Eng. Proc. 2024, 78(1), 7; https://doi.org/10.3390/engproc2024078007 - 8 Jan 2025
Viewed by 256
Abstract
There are several systems for patient care, including elderly healthcare, which rely on sensor data acquisition and analysis. These sensors are typical vital-monitoring sensors and are coupled with Artificial Intelligence (AI) models to quickly analyze emergency situations or even predict them. These systems are deployed in hospitals and require expensive monitoring and analysis equipment. Eldercare specifically encompasses monitoring, smart analysis, and even the emotional aspects of care. Existing systems do not provide a portable, easy-to-use system for at-home eldercare. Further, existing systems do not address advanced analysis capabilities around mood/sentiment/mental state/mental disorder analysis or the analysis of issues around sleep disorders, apnea, etc., based on sound capture and analysis. Also, existing systems disregard the emotional needs of elderly patients, which are a critical aspect of patient wellbeing. A low-cost and effective solution is therefore required for extended use in eldercare. In this paper, the CareTaker.ai system is proposed to address the shortcomings of the existing systems and build a comprehensive caretaker assistant using sensors, audio, video, and AI. It consists of smart bed sheets, pillow covers with embedded sensors, and a processing unit with GPUs, conversational AI, and generative AI capabilities, with associated functional modules. Compared to existing systems, the proposed system has advanced monitoring and analysis capabilities with potential for low-cost mass manufacturing and a widespread commercial application. Full article

18 pages, 2813 KiB  
Article
Multimodal Data Fusion for Depression Detection Approach
by Mariia Nykoniuk, Oleh Basystiuk, Nataliya Shakhovska and Nataliia Melnykova
Computation 2025, 13(1), 9; https://doi.org/10.3390/computation13010009 - 2 Jan 2025
Viewed by 497
Abstract
Depression is one of the most common mental health disorders in the world, affecting millions of people. Early detection of depression is crucial for effective medical intervention. Multimodal networks can greatly assist in the detection of depression, especially in situations where patients are not always aware of or able to express their symptoms. By analyzing text and audio data, such networks are able to automatically identify patterns in speech and behavior that indicate a depressive state. In this study, we propose two multimodal information fusion networks: early and late fusion. These networks were developed using convolutional neural network (CNN) layers to learn local patterns, a bidirectional LSTM (Bi-LSTM) to process sequences, and a self-attention mechanism to improve focus on key parts of the data. The DAIC-WOZ and EDAIC-WOZ datasets were used for the experiments. The experiments compared the precision, recall, f1-score, and accuracy metrics for early and late multimodal data fusion and found that the early-fusion multimodal network achieved higher classification accuracy. On the test dataset, this network achieved an f1-score of 0.79 and an overall classification accuracy of 0.86, indicating its effectiveness in detecting depression. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Public Health: 2nd Edition)
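The early-versus-late distinction can be illustrated with two toy PyTorch modules: early fusion concatenates modality features before a shared classifier, while late fusion averages per-modality predictions. Dimensions and heads are illustrative, not the paper's CNN/Bi-LSTM/self-attention architecture.

```python
# Hedged sketch contrasting early and late fusion of text and audio embeddings.
import torch
import torch.nn as nn

text_dim, audio_dim, n_classes = 128, 64, 2

class EarlyFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # Features are concatenated before any decision is made.
        self.head = nn.Sequential(nn.Linear(text_dim + audio_dim, 64),
                                  nn.ReLU(), nn.Linear(64, n_classes))
    def forward(self, text_feat, audio_feat):
        return self.head(torch.cat([text_feat, audio_feat], dim=-1))

class LateFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # Each modality gets its own classifier; predictions are averaged.
        self.text_head = nn.Linear(text_dim, n_classes)
        self.audio_head = nn.Linear(audio_dim, n_classes)
    def forward(self, text_feat, audio_feat):
        return 0.5 * (self.text_head(text_feat) + self.audio_head(audio_feat))

early, late = EarlyFusion(), LateFusion()
t, a = torch.randn(8, text_dim), torch.randn(8, audio_dim)
print(early(t, a).shape, late(t, a).shape)  # both: (8, n_classes)
```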

15 pages, 4321 KiB  
Article
Feasibility Study of Real-Time Speech Detection and Characterization Using Millimeter-Wave Micro-Doppler Radar
by Nati Steinmetz and Nezah Balal
Remote Sens. 2025, 17(1), 91; https://doi.org/10.3390/rs17010091 - 29 Dec 2024
Viewed by 596
Abstract
This study presents a novel approach to remote speech recognition using a millimeter-wave micro-Doppler radar system operating at 94 GHz. By detecting micro-Doppler speech-related vibrations, the system enables non-contact and privacy-preserving speech recognition. Initial experiments used a piezoelectric crystal to simulate vocal cord vibrations, followed by tests with actual human speech. Advanced signal processing techniques, including short-time Fourier transform (STFT), were used to generate spectrograms and reconstruct speech signals. The system demonstrated high accuracy, with cross-correlation analysis quantitatively confirming a strong correlation between radar-reconstructed and original audio signals. These results validate the effectiveness of detecting and characterizing speech-related vibrations without direct audio recording. The findings have significant implications for applications in noisy industrial environments, enabling robust voice interaction capabilities, as well as in healthcare diagnostics and assistive technologies, where contactless and privacy-preserving solutions are essential. Future research will explore diverse real-world scenarios and the integration of advanced signal processing and machine learning techniques to further enhance accuracy and robustness. Full article
(This article belongs to the Special Issue Remote Sensing in 2024)
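Two of the processing steps named above, STFT-based spectrogram generation and cross-correlation between reconstructed and reference audio, can be sketched with SciPy and NumPy; the synthetic signals stand in for real radar data.

```python
# Hedged sketch: spectrogram via STFT and a normalized cross-correlation score.
import numpy as np
from scipy.signal import stft, correlate

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
reference = np.sin(2 * np.pi * 220 * t)                      # stand-in "original" audio
reconstructed = reference + 0.05 * np.random.randn(t.size)   # stand-in radar reconstruction

# Short-time Fourier transform -> magnitude spectrogram
freqs, times, Z = stft(reconstructed, fs=fs, nperseg=512, noverlap=384)
spectrogram = np.abs(Z)

# Normalized cross-correlation as a similarity score (1.0 = identical)
xcorr = correlate(reconstructed, reference, mode="full")
score = xcorr.max() / (np.linalg.norm(reconstructed) * np.linalg.norm(reference))
print(spectrogram.shape, round(float(score), 3))
```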

18 pages, 5732 KiB  
Article
AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio–Visual Speech Recognition
by Na Che, Yiming Zhu, Haiyan Wang, Xianwei Zeng and Qinsheng Du
Appl. Sci. 2025, 15(1), 199; https://doi.org/10.3390/app15010199 - 29 Dec 2024
Viewed by 522
Abstract
To address the serious information redundancy, complex inter-modal information interaction, and difficult multimodal fusion faced by audio–visual speech recognition systems when dealing with complex multimodal information, this paper proposes an adaptive fusion transformer algorithm (AFT-SAM) based on a sparse attention mechanism. The algorithm adopts the sparse attention mechanism in the feature-encoding process to reduce excessive attention to non-important regions and dynamically adjusts the attention weights through adaptive fusion to capture and integrate the multimodal information more effectively and reduce the impact of redundant information on model performance. Experiments are conducted on the audio–visual speech recognition dataset LRS2 and compared with other algorithms; the experimental results show that the proposed algorithm achieves significantly lower WERs in the audio-only, visual-only, and audio–visual bimodal cases. Full article
(This article belongs to the Special Issue Advances in Audio/Image Signals Processing)
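One common way to make attention sparse is to keep only the top-k scores per query and mask the rest before the softmax. The sketch below illustrates that masking idea only; AFT-SAM's adaptive fusion is more elaborate than this.

```python
# Hedged sketch of top-k sparse attention (masking all but the k largest scores).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=8):
    """q, k, v: (batch, seq_len, dim). Attend to at most `topk` keys per query."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5                   # (batch, seq, seq)
    kth = torch.topk(scores, k=min(topk, scores.size(-1)), dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < kth, float("-inf"))      # drop non-important keys
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 32, 64)
print(topk_sparse_attention(q, k, v).shape)  # (2, 32, 64)
```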

13 pages, 2646 KiB  
Article
Audio Watermarking System in Real-Time Applications
by Carlos Jair Santin-Cruz and Gordana Jovanovic Dolecek
Informatics 2025, 12(1), 1; https://doi.org/10.3390/informatics12010001 - 25 Dec 2024
Viewed by 374
Abstract
Watermarking is widely employed to protect audio files. Previous research has focused on developing systems that balance performance criteria, including robustness, imperceptibility, and capacity. Most existing systems are designed to work with pre-recorded audio signals, where the characteristics of the host signal are known in advance. In such cases, processing time is not a critical factor, as these systems generally do not account for real-time signal acquisition, nor do they report the elapsed time between signal acquisition and watermarking output, known as latency. However, the increasing prevalence of audio sharing through real-time streams or video calls makes low-latency systems a pressing need. This work introduces a low-latency watermarking system that utilizes a spread spectrum technique, a method that spreads the signal energy across a wide frequency band, while embedding the watermark additively in the time domain to minimize latency. The system’s performance was evaluated by simulating real-time audio streams using two distinct methods. The results demonstrate that the proposed system achieves minimal latency during embedding, addressing the urgent need for such systems. Full article
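Additive time-domain spread-spectrum embedding of the kind described above can be sketched in a few lines: a keyed pseudo-random ±1 sequence is scaled and added to the host signal, and detection correlates the received signal against the same sequence. The scaling factor, key, and synthetic host are illustrative assumptions, not the paper's system.

```python
# Hedged sketch of additive time-domain spread-spectrum audio watermarking.
import numpy as np

def embed(host, key=42, alpha=0.005):
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=host.size)  # keyed spreading sequence
    return host + alpha * chips, chips

def detect(signal, chips):
    # Positive normalized correlation indicates the watermark is present.
    return float(signal @ chips) / signal.size

fs = 16000
host = 0.1 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s synthetic host
watermarked, chips = embed(host)
print(detect(watermarked, chips) > detect(host, chips))    # True
```

Because embedding is a single addition in the time domain, no transform is needed per block, which is what keeps the latency low.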

23 pages, 3175 KiB  
Article
Assisting Hearing and Physically Impaired Students in Navigating Immersive Virtual Reality for Library Orientation
by Pakinee Ariya, Yakannut Yensathit, Phimphakan Thongthip, Kannikar Intawong and Kitti Puritat
Technologies 2025, 13(1), 2; https://doi.org/10.3390/technologies13010002 - 24 Dec 2024
Viewed by 823
Abstract
This study aims to design and develop a virtual reality platform (VR-ISLS) tailored to support hearing and physically impaired students at the university library for navigating and utilizing library services. By employing an immersive virtual environment, the platform replicates the physical setting of the university’s library to create a realistic experience that reduces anxiety and enhances familiarity. The platform integrates assistive technology functions, including sign language interpretation, customizable audio cues, vibration feedback, and various locomotion controls to meet the diverse needs of impaired students. The research methodology employs an iterative development process, incorporating feedback from library staff, disability support services, and students to ensure usability and accessibility. Evaluation of the platform using the System Usability Scale (SUS) and user feedback revealed a positive reception, with recommendations for further customization and enhanced assistive features to optimize the user experience. This study underscores the importance of inclusive design and continuous iteration in creating immersive virtual reality tools that provide significant benefits for persons with disabilities, enhancing both accessibility and learning experiences. Full article
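For reference, the System Usability Scale score used in this evaluation is computed from ten 1-5 responses: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. A small sketch with made-up responses:

```python
# Hedged sketch of standard SUS scoring; the sample responses are illustrative.
def sus_score(responses):
    """responses: ten Likert answers (1-5), item 1 first."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly 10 items")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd items positive, even negative
    return total * 2.5  # scale to 0-100

print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # 80.0
```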

20 pages, 1343 KiB  
Article
Fast Design Space Exploration for Always-On Neural Networks
by Jeonghun Kim and Sunggu Lee
Electronics 2024, 13(24), 4971; https://doi.org/10.3390/electronics13244971 - 17 Dec 2024
Viewed by 409
Abstract
An analytical model can quickly predict performance and energy efficiency based on information about the neural network model and neural accelerator architecture, making it ideal for rapid pre-synthesis design space exploration. This paper proposes a new analytical model specifically targeted for convolutional neural networks used in always-on applications. To validate the proposed model, the performance and energy efficiency estimated by the model were compared with actual hardware and post-synthesis gate-level simulations of hardware synthesized with a state-of-the-art electronic design automation (EDA) synthesis tool. Comparisons with hardware created for the Eyeriss neural accelerator showed average execution time and energy consumption error rates of 3.33% and 13.54%, respectively. Comparisons with hardware synthesis results showed an error of 3.18% to 9.44% for two example neural accelerator configurations used to execute MobileNet, EfficientNet, and DarkNet neural network models. Finally, the utility of the proposed model was demonstrated by using it to evaluate the effects of different channel sizes, pruning rates, and batch sizes in several neural network designs for always-on vision, text, and audio processing. Full article
(This article belongs to the Section Artificial Intelligence)
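A toy example of the analytical-model idea, far simpler than the paper's Eyeriss-validated model: estimate a convolution layer's latency and energy from its MAC count and a few assumed accelerator parameters (PE count, clock, utilization, energy per MAC). All numbers are illustrative assumptions.

```python
# Hedged toy analytical estimate of layer latency and energy for pre-synthesis exploration.
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

def estimate(macs, num_pes=168, freq_hz=200e6, utilization=0.75,
             energy_per_mac_j=2.0e-12):
    cycles = macs / (num_pes * utilization)
    return cycles / freq_hz, macs * energy_per_mac_j  # (seconds, joules)

macs = conv_macs(h=56, w=56, c_in=64, c_out=64, k=3)
latency_s, energy_j = estimate(macs)
print(f"{macs:.3e} MACs, {latency_s * 1e3:.2f} ms, {energy_j * 1e3:.3f} mJ")
```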
