Search | arXiv e-print repository

arXiv:2407.18892 [pdf, other]

SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces

Authors: Seunghyeop Nam, Tuan Anh Nguyen, Eunmi Choi, Dugki Min

Abstract: This paper introduces SHANGUS, an advanced framework combining Deep Reinforcement Learning (DRL) with heuristic optimization to improve frontier-based exploration efficiency in unknown environments, particularly for intelligent vehicles in autonomous air services, search and rescue operations, and space exploration robotics. SHANGUS harnesses DRL's adaptability and heuristic prioritization, marked… ▽ More This paper introduces SHANGUS, an advanced framework combining Deep Reinforcement Learning (DRL) with heuristic optimization to improve frontier-based exploration efficiency in unknown environments, particularly for intelligent vehicles in autonomous air services, search and rescue operations, and space exploration robotics. SHANGUS harnesses DRL's adaptability and heuristic prioritization, markedly enhancing exploration efficiency, reducing completion time, and minimizing travel distance. The strategy involves a frontier selection node to identify unexplored areas and a DRL navigation node using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for robust path planning and dynamic obstacle avoidance. Extensive experiments in ROS2 and Gazebo simulation environments show SHANGUS surpasses representative traditional methods like the Nearest Frontier (NF), Novel Frontier-Based Exploration Algorithm (CFE), and Goal-Driven Autonomous Exploration (GDAE) algorithms, especially in complex scenarios, excelling in completion time, travel distance, and exploration rate. This scalable solution is suitable for real-time autonomous navigation in fields such as industrial automation, autonomous driving, household robotics, and space exploration. Future research will integrate additional sensory inputs and refine heuristic functions to further boost SHANGUS's efficiency and robustness. △ Less

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.06142 [pdf, ps, other]

Delay-Aware Robust Edge Network Hardening Under Decision-Dependent Uncertainty

Authors: Jiaming Cheng, Duong Thuy Anh Nguyen, Ni Trieu, Duong Tung Nguyen

Abstract: Edge computing promises to offer low-latency and ubiquitous computation to numerous devices at the network edge. For delay-sensitive applications, link delays can have a direct impact on service quality. These delays can fluctuate drastically over time due to various factors such as network congestion, changing traffic conditions, cyberattacks, component failures, and natural disasters. Thus, it i… ▽ More Edge computing promises to offer low-latency and ubiquitous computation to numerous devices at the network edge. For delay-sensitive applications, link delays can have a direct impact on service quality. These delays can fluctuate drastically over time due to various factors such as network congestion, changing traffic conditions, cyberattacks, component failures, and natural disasters. Thus, it is crucial to efficiently harden the edge network to mitigate link delay variation as well as ensure a stable and improved user experience. To this end, we propose a novel robust model for optimal edge network hardening, considering the link delay uncertainty. Departing from the existing literature that treats uncertainties as exogenous, our model incorporates an endogenous uncertainty set to properly capture the impact of hardening and workload allocation decisions on link delays. However, the endogenous set introduces additional complexity to the problem due to the interdependence between decisions and uncertainties. We present two efficient methods to transform the problem into a solvable form. Extensive numerical results are shown to demonstrate the effectiveness of the proposed approach. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: 14 pages, 18 figures

arXiv:2405.01609 [pdf, ps, other]

doi 10.1109/IPCCC51483.2021.9679398

Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems

Authors: Trung Thanh Nguyen, Truong Thao Nguyen, Dinh Tuan Anh Nguyen, Thanh Hung Nguyen, Phi Le Nguyen

Abstract: We focus on real-time air quality monitoring systems that rely on devices installed on automobiles in this research. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel or via Wi-Fi to adjacent devices or the so-called Road Side Units deployed along the road. We aim to reduce 4G costs… ▽ More We focus on real-time air quality monitoring systems that rely on devices installed on automobiles in this research. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel or via Wi-Fi to adjacent devices or the so-called Road Side Units deployed along the road. We aim to reduce 4G costs while assuring data latency, where the data latency is defined as the amount of time it takes for data to reach the server. We propose an offloading scheme that leverages Q-learning to accomplish the purpose. The experiment results show that our offloading method significantly cuts down around 40-50% of the 4G communication cost while keeping the latency of 99.5% packets smaller than the required threshold. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 2021 IEEE International Conference on Performance, Computing and Communications (IPCCC). arXiv admin note: substantial text overlap with arXiv:2405.01057

arXiv:2404.09621 [pdf, other]

AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and… ▽ More This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and control precision for remote operators. Our VDT framework integrates immersive tele-operation with a high-fidelity aerodynamic database, essential for authentically simulating flight dynamics and control tactics. At the heart of our methodology lies an eVTOL's high-fidelity digital replica, placed within a simulated reality that accurately reflects physical laws, enabling operators to manage the aircraft via a master-slave dynamic, substantially outperforming traditional 2D interfaces. The architecture of the designed system ensures seamless interaction between the operator, the digital twin, and the actual aircraft, facilitating exact, instantaneous feedback. Experimental assessments, involving propulsion data gathering, simulation database fidelity verification, and tele-operation testing, verify the system's capability in precise control command transmission and maintaining the digital-physical eVTOL synchronization. Our findings underscore the VDT system's potential in augmenting AAM efficiency and safety, paving the way for broader digital twin application in autonomous aerial vehicles. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.05610 [pdf, ps, other]

Evidence, Definitions and Algorithms regarding the Existence of Cohesive-Convergence Groups in Neural Network Optimization

Authors: Thien An L. Nguyen

Abstract: Understanding the convergence process of neural networks is one of the most complex and crucial issues in the field of machine learning. Despite the close association of notable successes in this domain with the convergence of artificial neural networks, this concept remains predominantly theoretical. In reality, due to the non-convex nature of the optimization problems that artificial neural netw… ▽ More Understanding the convergence process of neural networks is one of the most complex and crucial issues in the field of machine learning. Despite the close association of notable successes in this domain with the convergence of artificial neural networks, this concept remains predominantly theoretical. In reality, due to the non-convex nature of the optimization problems that artificial neural networks tackle, very few trained networks actually achieve convergence. To expand recent research efforts on artificial-neural-network convergence, this paper will discuss a different approach based on observations of cohesive-convergence groups emerging during the optimization process of an artificial neural network. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.05755 [pdf, other]

SpiRit-LM: Interleaved Spoken and Written Language Model

Authors: Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux

Abstract: We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated… ▽ More We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated speech-text parallel corpus. SPIRIT-LM comes in two versions: a BASE version that uses speech semantic units and an EXPRESSIVE version that models expressivity using pitch and style units in addition to the semantic units. For both versions, the text is encoded with subword BPE tokens. The resulting model displays both the semantic abilities of text models and the expressive abilities of speech models. Additionally, we demonstrate that SPIRIT-LM is able to learn new tasks in a few-shot fashion across modalities (i.e. ASR, TTS, Speech Classification). △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.08041 [pdf, ps, other]

Two-Stage Distributionally Robust Edge Node Placement Under Endogenous Demand Uncertainty

Authors: Jiaming Cheng, Duong Thuy Anh Nguyen, Duong Tung Nguyen

Abstract: Edge computing (EC) promises to deliver low-latency and ubiquitous computation to numerous devices at the network edge. This paper aims to jointly optimize edge node (EN) placement and resource allocation for an EC platform, considering demand uncertainty. Diverging from existing approaches treating uncertainties as exogenous, we propose a novel two-stage decision-dependent distributionally robust… ▽ More Edge computing (EC) promises to deliver low-latency and ubiquitous computation to numerous devices at the network edge. This paper aims to jointly optimize edge node (EN) placement and resource allocation for an EC platform, considering demand uncertainty. Diverging from existing approaches treating uncertainties as exogenous, we propose a novel two-stage decision-dependent distributionally robust optimization (DRO) framework to effectively capture the interdependence between EN placement decisions and uncertain demands. The first stage involves making EN placement decisions, while the second stage optimizes resource allocation after uncertainty revelation. We present an exact mixed-integer linear program reformulation for solving the underlying ``min-max-min" two-stage model. We further introduce a valid inequality method to enhance computational efficiency, especially for large-scale networks. Extensive numerical experiments demonstrate the benefits of considering endogenous uncertainties and the advantages of the proposed model and approach. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2310.05224 [pdf, other]

Generative Spoken Language Model based on continuous word-sized audio tokens

Authors: Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux

Abstract: In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard input of spoken LMs are 20ms or 40ms-long discrete units (shorter than a phoneme). Taking inspiration from word-based LM, we introduce a Generative Spoken Language Model (GSLM) based on word-size continuous-valued audio embeddings that can g… ▽ More In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard input of spoken LMs are 20ms or 40ms-long discrete units (shorter than a phoneme). Taking inspiration from word-based LM, we introduce a Generative Spoken Language Model (GSLM) based on word-size continuous-valued audio embeddings that can generate diverse and expressive language output. This is obtained by replacing lookup table for lexical types with a Lexical Embedding function, the cross entropy loss by a contrastive loss, and multinomial sampling by k-NN sampling. The resulting model is the first generative language model based on word-size continuous embeddings. Its performance is on par with discrete unit GSLMs regarding generation quality as measured by automatic metrics and subjective human judgements. Moreover, it is five times more memory efficient thanks to its large 200ms units. In addition, the embeddings before and after the Lexical Embedder are phonetically and semantically interpretable. △ Less

Submitted 8 October, 2023; originally announced October 2023.

Comments: Conference paper at EMNLP 2023

arXiv:2309.17020 [pdf, other]

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

Authors: Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed

Abstract: Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TT… ▽ More Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing. △ Less

Submitted 4 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: ASRU 2023 SPARKS Workshop

arXiv:2309.07897 [pdf, other]

Nash equilibrium seeking over digraphs with row-stochastic matrices and network-independent step-sizes

Authors: Duong Thuy Anh Nguyen, Mattia Bianchi, Florian Dörfler, Duong Tung Nguyen, Angelia Nedić

Abstract: In this paper, we address the challenge of Nash equilibrium (NE) seeking in non-cooperative convex games with partial-decision information. We propose a distributed algorithm, where each agent refines its strategy through projected-gradient steps and an averaging procedure. Each agent uses estimates of competitors' actions obtained solely from local neighbor interactions, in a directed communicati… ▽ More In this paper, we address the challenge of Nash equilibrium (NE) seeking in non-cooperative convex games with partial-decision information. We propose a distributed algorithm, where each agent refines its strategy through projected-gradient steps and an averaging procedure. Each agent uses estimates of competitors' actions obtained solely from local neighbor interactions, in a directed communication network. Unlike previous approaches that rely on (strong) monotonicity assumptions, this work establishes the convergence towards a NE under a diagonal dominance property of the pseudo-gradient mapping, that can be checked locally by the agents. Further, this condition is physically interpretable and of relevance for many applications, as it suggests that an agent's objective function is primarily influenced by its individual strategic decisions, rather than by the actions of its competitors. In virtue of a novel block-infinity norm convergence argument, we provide explicit bounds for constant step-size that are independent of the communication structure, and can be computed in a totally decentralized way. Numerical simulations on an optical network's power control problem validate the algorithm's effectiveness. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.05725 [pdf, ps, other]

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Authors: Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

Abstract: Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthes… ▽ More Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthesis datasets are read, severely limiting spontaneity and expressivity. Here, we introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potentials of this dataset with an expressive resynthesis benchmark where the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate and invariance to speaker and style. All the dataset, evaluation metrics and baseline models are open source △ Less

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2306.12545 [pdf, other]

Neural Multigrid Memory For Computational Fluid Dynamics

Authors: Duc Minh Nguyen, Minh Chau Vu, Tuan Anh Nguyen, Tri Huynh, Nguyen Tri Nguyen, Truong Son Hy

Abstract: Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye… ▽ More Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye & Bilodeau, 2022) and Multigrid Architecture (MgConv, MgResnet) (Ke et al., 2017). VPTR excels in capturing complex spatiotemporal dependencies and handling large input data, making it a promising choice for turbulent flow prediction. Meanwhile, Multigrid Architecture utilizes multiple grids with different resolutions to capture the multiscale nature of turbulent flows, resulting in more accurate and efficient simulations. Through our experiments, we demonstrate the effectiveness of our proposed approach, named MGxTransformer, in accurately predicting velocity, temperature, and turbulence intensity for incompressible turbulent flows across various geometries and flow conditions. Our results exhibit superior accuracy compared to other baselines, while maintaining computational efficiency. Our implementation in PyTorch is available publicly at https://github.com/Combi2k2/MG-Turbulent-Flow △ Less

Submitted 24 June, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:1911.08655 by other authors

arXiv:2305.13009 [pdf, other]

Textually Pretrained Speech Language Models

Authors: Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

Abstract: Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language models. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model de… ▽ More Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language models. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model design choices such as the speech tokenizer, the pretrained textual model, and the dataset size. We find that model and dataset scale both play an important role in constructing better-performing SpeechLMs. Based on our observations, we present the largest (to the best of our knowledge) SpeechLM both in terms of number of parameters and training data. We additionally introduce two spoken versions of the StoryCloze textual benchmark to further improve model evaluation and advance future research in the field. We make speech samples, code and models publicly available: https://pages.cs.huji.ac.il/adiyoss-lab/twist/ . △ Less

Submitted 30 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2304.13246 [pdf, other]

CrowdCache: A Decentralized Game-Theoretic Framework for Mobile Edge Content Sharing

Authors: Duong Thuy Anh Nguyen, Jiaming Cheng, Duong Tung Nguyen, Angelia Nedich

Abstract: Mobile edge computing (MEC) is a promising solution for enhancing the user experience, minimizing content delivery expenses, and reducing backhaul traffic. In this paper, we propose a novel privacy-preserving decentralized game-theoretic framework for resource crowdsourcing in MEC. Our framework models the interactions between a content provider (CP) and multiple mobile edge device users (MEDs) as… ▽ More Mobile edge computing (MEC) is a promising solution for enhancing the user experience, minimizing content delivery expenses, and reducing backhaul traffic. In this paper, we propose a novel privacy-preserving decentralized game-theoretic framework for resource crowdsourcing in MEC. Our framework models the interactions between a content provider (CP) and multiple mobile edge device users (MEDs) as a non-cooperative game, in which MEDs offer idle storage resources for content caching in exchange for rewards. We introduce efficient decentralized gradient play algorithms for Nash equilibrium (NE) computation by exchanging local information among neighboring MEDs only, thus preventing attackers from learning users' private information. The key challenge in designing such algorithms is that communication among MEDs is not fixed and is facilitated by a sequence of undirected time-varying graphs. Our approach achieves linear convergence to the NE without imposing any assumptions on the values of parameters in the local objective functions, such as requiring strong monotonicity to be stronger than its dependence on other MEDs' actions, which is commonly required in existing literature when the graph is directed time-varying. Extensive simulations demonstrate the effectiveness of our approach in achieving efficient resource outsourcing decisions while preserving the privacy of the edge devices. △ Less

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2303.16385 [pdf, other]

Geometric Convergence of Distributed Heavy-Ball Nash Equilibrium Algorithm over Time-Varying Digraphs with Unconstrained Actions

Authors: Duong Thuy Anh Nguyen, Duong Tung Nguyen, Angelia Nedich

Abstract: This paper presents a new distributed algorithm that leverages heavy-ball momentum and a consensus-based gradient method to find a Nash equilibrium (NE) in a class of non-cooperative convex games with unconstrained action sets. In this approach, each agent in the game has access to its own smooth local cost function and can exchange information with its neighbors over a communication network. The… ▽ More This paper presents a new distributed algorithm that leverages heavy-ball momentum and a consensus-based gradient method to find a Nash equilibrium (NE) in a class of non-cooperative convex games with unconstrained action sets. In this approach, each agent in the game has access to its own smooth local cost function and can exchange information with its neighbors over a communication network. The main novelty of our work is the incorporation of heavy-ball momentum in the context of non-cooperative games that operate on fully-decentralized, directed, and time-varying communication graphs, while also accommodating non-identical step-sizes and momentum parameters. Overcoming technical challenges arising from the dynamic and asymmetric nature of mixing matrices and the presence of an additional momentum term, we provide a rigorous proof of the geometric convergence to the NE. Moreover, we establish explicit bounds for the step-size values and momentum parameters based on the characteristics of the cost functions, mixing matrices, and graph connectivity structures. We perform numerical simulations on a Nash-Cournot game to demonstrate accelerated convergence of the proposed algorithm compared to that of the existing methods. △ Less

Submitted 3 June, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2302.06953 [pdf, other]

A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation

Authors: Jiaming Cheng, Duong Thuy Anh Nguyen, Lele Wang, Duong Tung Nguyen, Vijay K. Bhargava

Abstract: Edge Computing (EC) offers a superior user experience by positioning cloud resources in close proximity to end users. The challenge of allocating edge resources efficiently while maximizing profit for the EC platform remains a sophisticated problem, especially with the added complexity of the online arrival of resource requests. To address this challenge, we propose to cast the problem as a multi-… ▽ More Edge Computing (EC) offers a superior user experience by positioning cloud resources in close proximity to end users. The challenge of allocating edge resources efficiently while maximizing profit for the EC platform remains a sophisticated problem, especially with the added complexity of the online arrival of resource requests. To address this challenge, we propose to cast the problem as a multi-armed bandit problem and develop two novel online pricing mechanisms, the Kullback-Leibler Upper Confidence Bound (KL-UCB) algorithm and the Min-Max Optimal algorithm, for heterogeneous edge resource allocation. These mechanisms operate in real-time and do not require prior knowledge of demand distribution, which can be difficult to obtain in practice. The proposed posted pricing schemes allow users to select and pay for their preferred resources, with the platform dynamically adjusting resource prices based on observed historical data. Numerical results show the advantages of the proposed mechanisms compared to several benchmark schemes derived from traditional bandit algorithms, including the Epsilon-Greedy, basic UCB, and Thompson Sampling algorithms. △ Less

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2212.14615 [pdf, other]

DRG-Net: Interactive Joint Learning of Multi-lesion Segmentation and Classification for Diabetic Retinopathy Grading

Authors: Hasan Md Tusfiqur, Duy M. H. Nguyen, Mai T. N. Truong, Triet A. Nguyen, Binh T. Nguyen, Michael Barz, Hans-Juergen Profitlich, Ngoc T. T. Than, Ngan Le, Pengtao Xie, Daniel Sonntag

Abstract: Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DR… ▽ More Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner. △ Less

Submitted 30 December, 2022; originally announced December 2022.

Comments: First version

arXiv:2211.04691 [pdf, ps, other]

A Solution for a Fundamental Problem of 3D Inference based on 2D Representations

Authors: Thien An L. Nguyen

Abstract: 3D inference from monocular vision using neural networks is an important research area of computer vision. Applications of the research area are various with many proposed solutions and have shown remarkable performance. Although many efforts have been invested, there are still unanswered questions, some of which are fundamental. In this paper, I discuss a problem that I hope will come to be known… ▽ More 3D inference from monocular vision using neural networks is an important research area of computer vision. Applications of the research area are various with many proposed solutions and have shown remarkable performance. Although many efforts have been invested, there are still unanswered questions, some of which are fundamental. In this paper, I discuss a problem that I hope will come to be known as a generalization of the Blind Perspective-n-Point (Blind PnP) problem for object-driven 3D inference based on 2D representations. The vital difference between the fundamental problem and the Blind PnP problem is that 3D inference parameters in the fundamental problem are attached directly to 3D points and the camera concept will be represented through the sharing of the parameters of these points. By providing an explainable and robust gradient-decent solution based on 2D representations for an important special case of the problem, the paper opens up a new approach for using available information-based learning methods to solve problems related to 3D object pose estimation from 2D images. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.02956 [pdf, other]

Are word boundaries useful for unsupervised language learning?

Authors: Tu Anh Nguyen, Maureen de Seyssel, Robin Algayres, Patricia Roze, Ewan Dunbar, Emmanuel Dupoux

Abstract: Word or word-fragment based Language Models (LM) are typically preferred over character-based ones in many downstream applications. This may not be surprising as words seem more linguistically relevant units than characters. Words provide at least two kinds of relevant information: boundary information and meaningful units. However, word boundary information may be absent or unreliable in the case… ▽ More Word or word-fragment based Language Models (LM) are typically preferred over character-based ones in many downstream applications. This may not be surprising as words seem more linguistically relevant units than characters. Words provide at least two kinds of relevant information: boundary information and meaningful units. However, word boundary information may be absent or unreliable in the case of speech input (word boundaries are not marked explicitly in the speech stream). Here, we systematically compare LSTMs as a function of the input unit (character, phoneme, word, word part), with or without gold boundary information. We probe linguistic knowledge in the networks at the lexical, syntactic and semantic levels using three speech-adapted black box NLP psycholinguistically-inpired benchmarks (pWUGGY, pBLIMP, pSIMI). We find that the absence of boundaries costs between 2\% and 28\% in relative performance depending on the task. We show that gold boundaries can be replaced by automatically found ones obtained with an unsupervised segmentation algorithm, and that even modest segmentation performance gives a gain in performance on two of the three tasks compared to basic character/phone based models without boundary information. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: This is an archived version from September 2020

arXiv:2209.15483 [pdf, other]

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

Authors: Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

Abstract: Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensi… ▽ More Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensively investigated. This work focuses on improving the robustness of discrete input representations for generative spoken language modeling. First, we formally define how to measure the robustness of such representations to various signal variations that do not alter the spoken information (e.g., time-stretch). Next, we empirically demonstrate how current state-of-the-art representation models lack robustness to such variations. To overcome this, we propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines when considering encoding and modeling metrics. We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines. △ Less

Submitted 29 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.12561 [pdf, other]

Improving Document Image Understanding with Reinforcement Finetuning

Authors: Bao-Sinh Nguyen, Dung Tien Le, Hieu M. Vu, Tuan Anh D. Nguyen, Minh-Tien Nguyen, Hung Le

Abstract: Successful Artificial Intelligence systems often require numerous labeled data to extract information from document images. In this paper, we investigate the problem of improving the performance of Artificial Intelligence systems in understanding document images, especially in cases where training data is limited. We address the problem by proposing a novel finetuning method using reinforcement le… ▽ More Successful Artificial Intelligence systems often require numerous labeled data to extract information from document images. In this paper, we investigate the problem of improving the performance of Artificial Intelligence systems in understanding document images, especially in cases where training data is limited. We address the problem by proposing a novel finetuning method using reinforcement learning. Our approach treats the Information Extraction model as a policy network and uses policy gradient training to update the model to maximize combined reward functions that complement the traditional cross-entropy losses. Our experiments on four datasets using labels and expert feedback demonstrate that our finetuning mechanism consistently improves the performance of a state-of-the-art information extractor, especially in the small training data regime. △ Less

Submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted to ICONIP 2022

arXiv:2208.07088 [pdf, other]

Enhancing Deep Learning-based 3-lead ECG Classification with Heartbeat Counting and Demographic Data Integration

Authors: Khiem H. Le, Hieu H. Pham, Thao B. T. Nguyen, Tu A. Nguyen, Cuong D. Do

Abstract: Nowadays, an increasing number of people are being diagnosed with cardiovascular diseases (CVDs), the leading cause of death globally. The gold standard for identifying these heart problems is via electrocardiogram (ECG). The standard 12-lead ECG is widely used in clinical practice and the majority of current research. However, using a lower number of leads can make ECG more pervasive as it can be… ▽ More Nowadays, an increasing number of people are being diagnosed with cardiovascular diseases (CVDs), the leading cause of death globally. The gold standard for identifying these heart problems is via electrocardiogram (ECG). The standard 12-lead ECG is widely used in clinical practice and the majority of current research. However, using a lower number of leads can make ECG more pervasive as it can be integrated with portable or wearable devices. This article introduces two novel techniques to improve the performance of the current deep learning system for 3-lead ECG classification, making it comparable with models that are trained using standard 12-lead ECG. Specifically, we propose a multi-task learning scheme in the form of the number of heartbeats regression and an effective mechanism to integrate patient demographic data into the system. With these two advancements, we got classification performance in terms of F1 scores of 0.9796 and 0.8140 on two large-scale ECG datasets, i.e., Chapman and CPSC-2018, respectively, which surpassed current state-of-the-art ECG classification methods, even those trained on 12-lead data. To encourage further development, our source code is publicly available at https://github.com/lhkhiem28/LightX3ECG. △ Less

Submitted 15 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.12381

arXiv:2207.12381 [pdf, other]

LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-lead Electrocardiogram Classification

Authors: Khiem H. Le, Hieu H. Pham, Thao BT. Nguyen, Tu A. Nguyen, Tien N. Thanh, Cuong D. Do

Abstract: Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices… ▽ More Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices and most of the current research, standard 12-lead ECG is mainly used. However, using a lower number of leads can make ECG more prevalent as it can be conveniently recorded by portable or wearable devices. In this research, we develop a novel deep learning system to accurately identify multiple cardiovascular abnormalities by using only three ECG leads. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: Under review at Biomedical Signal Processing and Control

arXiv:2207.10643 [pdf, other]

STOP: A dataset for Spoken Task Oriented Semantic Parsing

Authors: Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

Abstract: End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assi… ▽ More End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders the research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems. Initial experimentation show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction. △ Less

Submitted 18 October, 2022; v1 submitted 28 June, 2022; originally announced July 2022.

arXiv:2203.16502 [pdf, other]

Generative Spoken Dialogue Language Modeling

Authors: Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux

Abstract: We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech,… ▽ More We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking compared to a text-based cascaded model. △ Less

Submitted 22 November, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.05936 [pdf, other]

doi 10.1109/JSTSP.2022.3200909

Are discrete units necessary for Spoken Language Modeling?

Authors: Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux

Abstract: Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the… ▽ More Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we study the role of discrete versus continuous representations in spoken language modeling. We show that discretization is indeed essential for good results in spoken language modeling. We show that discretization removes linguistically irrelevant information from the continuous features, helping to improve language modeling performances. On the basis of this study, we train a language model on the discrete units of the HuBERT features, reaching new state-of-the-art results in the lexical, syntactic and semantic metrics of the Zero Resource Speech Challenge 2021 (Track 1 - Speech Only). △ Less

Submitted 22 August, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

arXiv:2202.07359 [pdf, other]

textless-lib: a Library for Textless Spoken Language Processing

Authors: Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Abstract: Textless spoken language processing research aims to extend the applicability of standard NLP toolset onto spoken language and languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed to facilitate research in this research area. We describe the building blocks that the library provides and demonstrate its usability by discuss three differ… ▽ More Textless spoken language processing research aims to extend the applicability of standard NLP toolset onto spoken language and languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed to facilitate research in this research area. We describe the building blocks that the library provides and demonstrate its usability by discuss three different use-case examples: (i) speaker probing, (ii) speech resynthesis and compression, and (iii) speech continuation. We believe that textless-lib substantially simplifies research the textless setting and will be handful not only for speech researchers but also for the NLP community at large. The code, documentation, and pre-trained models are available at https://github.com/facebookresearch/textlesslib/ . △ Less

Submitted 15 February, 2022; originally announced February 2022.

Comments: The library is available here https://github.com/facebookresearch/textlesslib/

arXiv:2201.02323 [pdf, other]

Distributed Nash Equilibrium Seeking over Time-Varying Directed Communication Networks

Authors: Duong Thuy Anh Nguyen, Duong Tung Nguyen, Angelia Nedić

Abstract: We study distributed algorithms for finding a Nash equilibrium (NE) in a class of non-cooperative convex games under partial information. Specifically, each agent has access only to its own smooth local cost function and can receive information from its neighbors in a time-varying directed communication network. To this end, we propose a distributed gradient play algorithm to compute a NE by utili… ▽ More We study distributed algorithms for finding a Nash equilibrium (NE) in a class of non-cooperative convex games under partial information. Specifically, each agent has access only to its own smooth local cost function and can receive information from its neighbors in a time-varying directed communication network. To this end, we propose a distributed gradient play algorithm to compute a NE by utilizing local information exchange among the players. In this algorithm, every agent performs a gradient step to minimize its own cost function while sharing and retrieving information locally among its neighbors. The existing methods impose strong assumptions such as balancedness of the mixing matrices and global knowledge of the network communication structure, including Perron-Frobenius eigenvector of the adjacency matrix and other graph connectivity constants. In contrast, our approach relies only on a reasonable and widely-used assumption of row-stochasticity of the mixing matrices. We analyze the algorithm for time-varying directed graphs and prove its convergence to the NE, when the agents' cost functions are strongly convex and have Lipschitz continuous gradients. Numerical simulations are performed for a Nash-Cournot game to illustrate the efficacy of the proposed algorithm. △ Less

Submitted 13 March, 2024; v1 submitted 6 January, 2022; originally announced January 2022.

MSC Class: 91A10; 91A43; 05C20; 68W15

arXiv:2104.14700 [pdf, ps, other]

The Zero Resource Speech Challenge 2021: Spoken language modelling

Authors: Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

Abstract: We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting on an encoder based on contrastive predictive coding (C… ▽ More We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting on an encoder based on contrastive predictive coding (CPC), a quantizer ($k$-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment) and semantic levels (similarity judgment). We present an overview of the eight submitted systems from four groups and discuss the main results. △ Less

Submitted 9 August, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588

arXiv:2011.11588 [pdf, other]

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

Authors: Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

Abstract: We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a com… ▽ More We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a composite baseline made of the concatenation of three unsupervised systems: self-supervised contrastive representation learning (CPC), clustering (k-means) and language modeling (LSTM or BERT). The language models learn on the basis of the pseudo-text derived from clustering the learned representations. This simple pipeline shows better than chance performance on all four metrics, demonstrating the feasibility of spoken language modeling from raw speech. It also yields worse performance compared to text-based 'topline' systems trained on the same data, delineating the space to be explored by more sophisticated end-to-end models. △ Less

Submitted 1 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: 14 pages, including references and supplementary material

arXiv:1905.12817 [pdf]

Deep Learning Approach for Receipt Recognition

Authors: Anh Duc Le, Dung Van Pham, Tuan Anh Nguyen

Abstract: Inspired by the recent successes of deep learning on Computer Vision and Natural Language Processing, we present a deep learning approach for recognizing scanned receipts. The recognition system has two main modules: text detection based on Connectionist Text Proposal Network and text recognition based on Attention-based Encoder-Decoder. We also proposed pre-processing to extract receipt area and… ▽ More Inspired by the recent successes of deep learning on Computer Vision and Natural Language Processing, we present a deep learning approach for recognizing scanned receipts. The recognition system has two main modules: text detection based on Connectionist Text Proposal Network and text recognition based on Attention-based Encoder-Decoder. We also proposed pre-processing to extract receipt area and OCR verification to ignore handwriting. The experiments on the dataset of the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction 2019 demonstrate that the accuracies were improved by integrating the pre-processing and the OCR verification. Our recognition system achieved 71.9% of the F1 score for detection and recognition task. △ Less

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1904.04710 [pdf]

Secure Biometric-based Remote Authentication Protocol using Chebyshev Polynomials and Fuzzy Extractor

Authors: Thi Ai Thao Nguyen, Tran Khanh Dang, Quynh Chi Truong, Dinh Thanh Nguyen

Abstract: In this paper, we have proposed a multi factor biometric-based remote authentication protocol. Our proposal overcomes the vulnerabilities of some previous works. At the same time, the protocol also obtains a low false accept rate (FAR) and false reject rate (FRR). In this paper, we have proposed a multi factor biometric-based remote authentication protocol. Our proposal overcomes the vulnerabilities of some previous works. At the same time, the protocol also obtains a low false accept rate (FAR) and false reject rate (FRR). △ Less

Submitted 9 April, 2019; originally announced April 2019.

Comments: RCCIE17

arXiv:1904.00264 [pdf]

A New Biometric Template Protection using Random Orthonormal Projection and Fuzzy Commitment

Authors: Thi Ai Thao Nguyen, Tran Khanh Dang, Dinh Thanh Nguyen

Abstract: Biometric template protection is one of most essential parts in putting a biometric-based authentication system into practice. There have been many researches proposing different solutions to secure biometric templates of users. They can be categorized into two approaches: feature transformation and biometric cryptosystem. However, no one single template protection approach can satisfy all the req… ▽ More Biometric template protection is one of most essential parts in putting a biometric-based authentication system into practice. There have been many researches proposing different solutions to secure biometric templates of users. They can be categorized into two approaches: feature transformation and biometric cryptosystem. However, no one single template protection approach can satisfy all the requirements of a secure biometric-based authentication system. In this work, we will propose a novel hybrid biometric template protection which takes benefits of both approaches while preventing their limitations. The experiments demonstrate that the performance of the system can be maintained with the support of a new random orthonormal project technique, which reduces the computational complexity while preserving the accuracy. Meanwhile, the security of biometric templates is guaranteed by employing fuzzy commitment protocol. △ Less

Submitted 30 March, 2019; originally announced April 2019.

Comments: 11 pages, 6 figures, accepted for IMCOM 2019

Showing 1–33 of 33 results for author: Nguyen, T A