Search | arXiv e-print repository

arXiv:2407.12094 [pdf, other]

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Authors: Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen

Abstract: We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advances in speech recognition, text-based speaker identification (SpeakerID) has received limited attention and lacks large-scale, diverse datasets for effective model training. To address these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID that leverage contextual cues within dialogues to accurately attribute speaker names. In extensive experiments, our best model achieves a precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available at https://github.com/adobe-research/speaker-identification

Submitted 16 July, 2024; originally announced July 2024.

Comments: accepted to INTERSPEECH 2024

arXiv:2407.11771 [pdf, other]

XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach

Authors: Truong Thanh Hung Nguyen, Phuc Truong Loc Nguyen, Hung Cao

Abstract: Recent advancements in deep learning have significantly improved visual quality inspection and predictive maintenance within industrial settings. However, deploying these technologies on low-resource edge devices poses substantial challenges due to their high computational demands and the inherent complexity of Explainable AI (XAI) methods. This paper addresses these challenges by introducing a novel XAI-integrated Visual Quality Inspection framework that optimizes the deployment of semantic segmentation models on low-resource edge devices. Our framework incorporates XAI and a Large Vision Language Model to deliver human-centered interpretability through visual and textual explanations to end-users, which is crucial for end-user trust and model interpretability. We outline a comprehensive methodology consisting of six fundamental modules: base model fine-tuning, XAI-based explanation generation, evaluation of XAI approaches, XAI-guided data augmentation, development of an edge-compatible model, and generation of understandable visual and textual explanations. Through XAI-guided data augmentation, the enhanced model, which incorporates domain expert knowledge along with visual and textual explanations, is successfully deployed on mobile devices to support end-users in real-world scenarios. Experimental results show the effectiveness of the proposed framework, with the mobile model achieving competitive accuracy while significantly reducing model size.
This approach paves the way for the broader adoption of reliable and interpretable AI tools in critical industrial applications, where decisions must be both rapid and justifiable.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 28 pages, preprint submitted to Information Fusion journal

arXiv:2407.07369 [pdf, ps, other]

Viscosity estimation for 2D pipe flows I. Construction, consistency, asymptotic normality

Authors: Thi Hien Nguyen, Armen Shirikyan

Abstract: We consider the motion of an incompressible viscous fluid in a rectangle, imposing the periodicity condition in one direction and the no-slip boundary condition in the other. Assuming that the flow is subject to an external random force, white in time and regular in space, we construct an estimator for the viscosity using only observations of the enstrophy. The goal of the paper is to prove that the estimator is strongly consistent and asymptotically normal. The proof of consistency is based on the explicit formula for the estimator and some bounds for trajectories, while the proof of asymptotic normality additionally uses mixing properties of the Navier-Stokes flow.

Submitted 10 July, 2024; originally announced July 2024.

MSC Class: 35Q30; 37L55; 62M05; 76D06

arXiv:2406.15749 [pdf, ps, other]

Decay of CP-even Higgs $H\rightarrow h\gamma\gamma$ in Two Higgs Doublet Model: (I) one-loop analytic results, ward identity checks

Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

Abstract: We present the first analytical expressions for the one-loop induced contributions to the decay channels of the CP-even Higgs $H\rightarrow h\gamma\gamma$, with $h$ being the standard model-like Higgs boson, within the framework of the Two Higgs Doublet Model. One-loop form factors for the decay processes are written in terms of scalar Passarino-Veltman functions following the general notations of the package LoopTools as well as the library Collier. Physical results for the decay processes can then be generated numerically with either of these packages. The analytical expressions are verified by several numerical checks, for example, the ultraviolet (UV) and infrared (IR) finiteness of the one-loop amplitude. Furthermore, because the final-state photons are on shell, the amplitude must satisfy the Ward identity, which is also tested numerically in this work. We find that the numerical results of these checks are stable. In phenomenological studies, the differential decay rates of $H\rightarrow h\gamma\gamma$ as functions of the invariant mass of the two final-state photons are studied for the first time over the parameter space of all types of Two Higgs Doublet Models.
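
For readers less familiar with the Ward identity check mentioned in this abstract, the generic statement for a decay with two on-shell final-state photons is sketched below. The momentum labels $p$, $k$, $q_1$, $q_2$ and the tensor $\mathcal{M}^{\mu\nu}$ are notation chosen here for illustration; the paper's own form-factor decomposition may be organized differently.

```latex
% H(p) -> h(k) gamma(q_1) gamma(q_2), with p = k + q_1 + q_2 and q_1^2 = q_2^2 = 0
\mathcal{M} = \varepsilon^{*}_{\mu}(q_1)\,\varepsilon^{*}_{\nu}(q_2)\,\mathcal{M}^{\mu\nu}(q_1, q_2)

% Ward identity: replacing either photon polarization vector by its momentum gives zero
q_{1\mu}\,\mathcal{M}^{\mu\nu}\,\varepsilon^{*}_{\nu}(q_2) = 0,
\qquad
\varepsilon^{*}_{\mu}(q_1)\,\mathcal{M}^{\mu\nu}\,q_{2\nu} = 0

% Invariant mass of the photon pair, the variable of the differential decay rate
m_{\gamma\gamma}^{2} = (q_1 + q_2)^{2} = 2\, q_1 \cdot q_2
```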

Submitted 22 June, 2024; originally announced June 2024.

Comments: 39 pages, 8 Figures, 9 Tables

Report number: DTU_2024-03

arXiv:2406.14835 [pdf, other]

ToVo: Toxicity Taxonomy via Voting

Authors: Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Diep Thi-Ngoc Nguyen

Abstract: Existing toxicity detection models face significant limitations, such as a lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanisms. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a high-quality open-source dataset for toxic content detection. Our methodology ensures diverse classification metrics for each sample and includes both classification scores and explanatory reasoning for the classifications. We use the dataset created through our proposed mechanism to train our model, which is then compared against existing widely used detectors. Our approach not only enhances transparency and customizability but also facilitates better fine-tuning for specific use cases. This work contributes a robust framework for developing toxic content detection models, emphasizing openness and adaptability, thus paving the way for more effective and user-specific content moderation solutions.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2405.16623 [pdf, other]

Graph neural networks with configuration cross-attention for tensor compilers

Authors: Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh

Abstract: With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph whose nodes are operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some of which lead to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to traditional heuristics-based compilers. The proposed solution improves the mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
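
Kendall's $\tau$ measures how well a predicted ordering of configurations agrees with their measured ordering. A minimal Python sketch of the plain pairwise variant (tau-a, with tied pairs contributing zero) follows, assuming predicted and measured runtimes are compared directly; the exact variant and per-collection averaging used in the TpuGraphs evaluation are not given in this abstract and may differ.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(predicted, measured):
    """Kendall's tau-a between predicted and measured runtimes.

    Counts concordant minus discordant pairs over all n*(n-1)/2 pairs.
    Tied pairs contribute zero (no tie correction, unlike tau-b).
    """
    if len(predicted) != len(measured):
        raise ValueError("inputs must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two configurations")
    score = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both orderings agree on this pair (concordant).
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            score += 1
        elif sign < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def mean_kendall_tau(collections):
    """Hypothetical helper: average tau over (predicted, measured) pairs, one per graph."""
    return mean(kendall_tau(p, m) for p, m in collections)
```

A perfect ranking gives 1.0 and a fully reversed one gives -1.0. The O(n²) loop is only for illustration; `scipy.stats.kendalltau` is the usual choice in practice.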

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.10659 [pdf, other]

Realistic Evaluation of Toxicity in Large Language Models

Authors: Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen

Abstract: Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it exposes toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.

Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Findings of ACL 2024

arXiv:2405.01609 [pdf, ps, other]

doi 10.1109/IPCCC51483.2021.9679398

Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems

Authors: Trung Thanh Nguyen, Truong Thao Nguyen, Dinh Tuan Anh Nguyen, Thanh Hung Nguyen, Phi Le Nguyen

Abstract: In this research, we focus on real-time air quality monitoring systems that rely on devices installed on automobiles. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel, or via Wi-Fi to adjacent devices or to so-called Road Side Units deployed along the road. We aim to reduce 4G costs while guaranteeing data latency, where data latency is defined as the time it takes for data to reach the server. We propose an offloading scheme that leverages Q-learning to accomplish this goal. Experimental results show that our offloading method reduces the 4G communication cost by around 40-50% while keeping the latency of 99.5% of packets below the required threshold.
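
The abstract does not specify the state space or reward of the offloading scheme. The Python sketch below shows a generic tabular Q-learning agent choosing among three transmission channels; the action names, the state encoding, and the `reward` shape (a fixed 4G cost plus a penalty for missing the latency threshold) are hypothetical choices made for illustration, not the authors' design.

```python
import random
from collections import defaultdict

# Hypothetical action set: send over 4G, relay to a neighbor via Wi-Fi, or to a Road Side Unit.
ACTIONS = ("4g", "wifi_neighbor", "wifi_rsu")


class OffloadingAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated return
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def reward(action, latency, deadline, cost_4g=1.0, penalty=10.0):
    """Hypothetical reward: pay for 4G use, and penalize a missed latency deadline."""
    r = -cost_4g if action == "4g" else 0.0
    if latency > deadline:
        r -= penalty
    return r
```

In a simulation loop, each device would call `choose` on its current state, transmit, observe latency and cost, and then call `update`; a small positive `epsilon` keeps exploring the Wi-Fi options.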

Submitted 2 May, 2024; originally announced May 2024.

Comments: 2021 IEEE International Conference on Performance, Computing and Communications (IPCCC). arXiv admin note: substantial text overlap with arXiv:2405.01057

arXiv:2405.01057 [pdf, other]

doi 10.1109/TNSM.2022.3192397

Fuzzy Q-Learning-Based Opportunistic Communication for MEC-Enhanced Vehicular Crowdsensing

Authors: Trung Thanh Nguyen, Truong Thao Nguyen, Thanh Hung Nguyen, Phi Le Nguyen

Abstract: This study focuses on MEC-enhanced, vehicle-based crowdsensing systems that rely on devices installed on automobiles. We investigate an opportunistic communication paradigm in which devices can transmit measured data directly to a crowdsensing server over a 4G communication channel, or via Wi-Fi to nearby devices or to so-called Road Side Units positioned along the road. We tackle a new problem: how to reduce the cost of 4G while preserving latency. We propose an offloading strategy that combines a reinforcement learning technique known as Q-learning with Fuzzy logic to accomplish this goal. Q-learning helps devices learn to choose the communication channel, while Fuzzy logic is used to optimize the reward function in Q-learning. Experimental results show that our offloading method reduces the 4G communication cost by around 30-40% while keeping the latency of 99% of packets below the required threshold.

Submitted 2 May, 2024; originally announced May 2024.

Comments: IEEE Transactions on Network and Service Management

arXiv:2405.00567 [pdf, other]

Remote Sensing Data Assimilation with a Chained Hydrologic-hydraulic Model for Flood Forecasting

Authors: Thanh Huy Nguyen, Andrea Piacentini, Sophie Ricci, Ludovic Cassan, Simon Munier, Quentin Bonassies, Raquel Rodriguez-Suquet

Abstract: A chained hydrologic-hydraulic model is implemented using predicted runoff from a large-scale hydrologic model (namely ISBA-CTRIP) as input to local hydrodynamic models (TELEMAC-2D) to issue forecasts of water level and flood extent. The uncertainties in the hydrological forcing and in friction parameters are reduced by an Ensemble Kalman Filter that jointly assimilates in-situ water levels and flood extent maps derived from remote sensing observations. The data assimilation framework is cycled in a real-time forecasting configuration. A cycle consists of a reanalysis phase and a forecast phase. During the reanalysis, observations up to the present are assimilated. An ensemble is then initialized from the last analyzed states and used to issue forecasts for the next 36 h. Three strategies for the forcing data used in this forecast are investigated: (i) using CTRIP runoff for both reanalysis and forecast, (ii) using observed discharge for reanalysis, then CTRIP runoff for forecast, and (iii) using observed discharge for reanalysis and keeping a persistent discharge value for forecast. The data assimilation strategy is shown to provide a reliable reanalysis in hindcast mode. The combination of observed discharge and CTRIP runoff provides the most accurate results. For all strategies, the quality of the forecast decreases as the lead time increases. When the errors in CTRIP forcing are non-stationary, the forecast capability may be reduced.
This work demonstrates that the forcing provided by a hydrologic model, while imperfect, can be efficiently used as input to a hydraulic model to issue reanalyses and forecasts, thanks to the assimilation of in-situ and remote sensing observations.
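
The joint state-parameter Ensemble Kalman Filter used in the reanalysis phase can be illustrated with a generic stochastic EnKF analysis step in Python (NumPy). This is a textbook sketch, not the authors' implementation: it assumes a linear observation operator `H`, Gaussian observation errors with covariance `R`, and parameters such as friction coefficients appended to the state vector so that they are corrected alongside water levels.

```python
import numpy as np


def enkf_analysis(ensemble, H, y, R, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    ensemble: (n_state, n_members) forecast ensemble; for joint
              state-parameter estimation, parameters are extra rows.
    H: (n_obs, n_state) linear observation operator.
    y: (n_obs,) observations; R: (n_obs, n_obs) observation error covariance.
    rng: numpy.random.Generator used to perturb the observations.
    """
    n_members = ensemble.shape[1]
    # State anomalies, scaled so that X @ X.T is the sample covariance P.
    X = (ensemble - ensemble.mean(axis=1, keepdims=True)) / np.sqrt(n_members - 1)
    # Predicted observations and their anomalies (Y = H X for a linear H).
    HX = H @ ensemble
    Y = (HX - HX.mean(axis=1, keepdims=True)) / np.sqrt(n_members - 1)
    # Innovation covariance S = H P H^T + R.
    S = Y @ Y.T + R
    # Kalman gain K = P H^T S^-1 = X Y^T S^-1, computed with a linear solve.
    K = np.linalg.solve(S, Y @ X.T).T
    # One perturbed copy of the observations per member keeps the analysis spread consistent.
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=n_members).T
    y_pert = y[:, None] + noise
    return ensemble + K @ (y_pert - HX)
```

Parameter rows are updated through their sample covariance with the observed quantities, so a single gain corrects both states and parameters. Assimilating flood extent maps would additionally need a (typically nonlinear) observation operator that this sketch does not cover.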

Submitted 1 May, 2024; originally announced May 2024.

Comments: 13 pages, 14 figures. Submitted to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

arXiv:2404.13417 [pdf, other]

Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer

Authors: Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Tuong Phan, Hung Cao

Abstract: To address the challenges of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by utilizing activation maps from selected layers and applying a Gaussian kernel to emphasize critical image regions for the predicted object. Compared with other region-based approaches, G-CAME significantly reduces explanation time, to 0.5 seconds, without compromising quality. Our evaluation of G-CAME, using Faster-RCNN and YOLOX on the MS-COCO 2017 dataset, demonstrates its ability to offer highly plausible and faithful explanations, especially in reducing bias in tiny object detection.
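
The idea of weighting an activation map by a Gaussian centered on the detected object can be sketched in a few lines of NumPy. This is an illustrative assumption, not G-CAME itself: the channel weights are taken as given (for instance, Grad-CAM-style pooled gradients), and the Gaussian center and width are hypothetical inputs, whereas the paper selects layers and derives the kernel from the detector's prediction.

```python
import numpy as np


def gaussian_mask(height, width, center, sigma):
    """2D Gaussian (peak value 1.0) centered at (row, col) = center."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def gaussian_weighted_cam(activations, channel_weights, center, sigma):
    """Weighted activation map restricted by a Gaussian around the object.

    activations: (C, H, W) feature maps from a chosen layer.
    channel_weights: (C,) importance weight per channel (assumed given).
    """
    # Weighted sum over channels -> (H, W) map.
    cam = np.tensordot(channel_weights, activations, axes=1)
    # Keep only positive evidence, as in CAM-style methods.
    cam = np.maximum(cam, 0.0)
    # Emphasize the region around the predicted object's center.
    cam = cam * gaussian_mask(cam.shape[0], cam.shape[1], center, sigma)
    # Normalize to [0, 1] when the map is not all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Because the Gaussian suppresses activations far from the predicted object, the resulting map stays concise even when the feature maps respond to several objects in the image.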

Submitted 20 April, 2024; originally announced April 2024.

Comments: Canadian AI 2024

arXiv:2404.02417 [pdf, ps, other]

One-loop contributions for $A^0 \rightarrow \ell \bar{\ell} V$ with $\ell \equiv e, \mu$ and $V\equiv \gamma, Z$ in Higgs Extensions of the Standard Model

Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

Abstract: We present one-loop formulas for the decay of the CP-odd Higgs $A^0 \rightarrow \ell \bar{\ell} V$ with $\ell \equiv e, \mu$ and $V\equiv \gamma, Z$ in Higgs Extensions of the Standard Model, considering the two Higgs doublet model with a complex (and real) scalar, the two Higgs doublet model, and the triplet Higgs model. Analytic results for one-loop amplitudes are expressed in terms of Passarino-Veltman functions following the standard notations of LoopTools, so that physical results can be generated numerically with this package. In the phenomenological analysis, the total decay widths and the differential decay rates with respect to the invariant mass of the lepton pair are analyzed for two typical models, namely the two Higgs doublet model and the triplet Higgs model.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 35 pages

Report number: DTU_2024-01

arXiv:2404.02246 [pdf, ps, other]

Matrix-weighted estimates beyond Calderón-Zygmund theory

Authors: Spyridon Kakaroumpas, Thu Hien Nguyen, Dimitris Vardakis

Abstract: We investigate matrix-weighted bounds for the sublinear non-kernel operators considered by F. Bernicot, D. Frey, and S. Petermichl. We extend their result to sublinear operators acting upon vector-valued functions. First, we dominate these operators by bilinear convex body sparse forms, adapting a recent general principle due to T. Hytönen. Then we use this domination to derive matrix-weighted bounds, adapting arguments of F. Nazarov, S. Petermichl, S. Treil, and A. Volberg. Our requirements on the weight are formulated in terms of two-exponent matrix Muckenhoupt conditions, which surprisingly exhibit a rich structure that is absent in the scalar case. Consequently, we deduce that our matrix-weighted bounds improve the ones that were recently obtained by A. Laukkarinen. The methods we use are flexible, which allows us to complement our results with a limited range extrapolation theorem for matrix weights, extending the results of P. Auscher and J. M. Martell, as well as M. Bownik and D. Cruz-Uribe.

Submitted 25 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 59 pages

arXiv:2403.14918 [pdf, other]

Deep learning-based method for weather forecasting: A case study in Itoshima

Authors: Yuzhong Cheng, Linh Thi Hoai Nguyen, Akinori Ozaki, Ton Viet Ta

Abstract: Accurate weather forecasting is of paramount importance for a wide range of practical applications, drawing substantial scientific and societal interest. However, the intricacies of weather systems pose substantial challenges to accurate predictions. This research introduces a multilayer perceptron model tailored for weather forecasting in Itoshima, Kyushu, Japan. Our architecture outperforms existing models, surpassing benchmarks such as Long Short-Term Memory and Recurrent Neural Networks.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14395 [pdf, other]

Early Flood Warning Using Satellite-Derived Convective System and Precipitation Data -- A Retrospective Case Study of Central Vietnam

Authors: Tran-Vu La, Thanh Huy Nguyen, Patrick Matgen, Marco Chini

Abstract: This paper addresses the challenges of early warning for floods caused by complex convective systems (CSs) by using Low-Earth Orbit and Geostationary satellite data. We focus on a sequence of extreme events that took place in central Vietnam during October 2020, with a specific emphasis on the events leading up to the floods, i.e., those occurring before October 10th, 2020. In this critical phase, several hydrometeorological indicators could be identified thanks to an increasingly advanced and dense observation network composed of Earth Observation satellites, in particular those enabling the characterization and monitoring of a CS in terms of low-temperature clouds and heavy rainfall. Himawari-8 images, both individually and in time series, allow convective clouds to be identified and tracked. This is complemented by the observation of heavy/violent rainfall through GPM IMERG data, as well as the detection of strong winds using radiometers and scatterometers. Collectively, these datasets, along with the estimated intensity and duration of the event from each source, form a comprehensive dataset detailing the intricate behaviors of CSs. All of these factors are significant contributors to the magnitude of flooding and the short-term dynamics anticipated in the studied region.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted for publication in IEEE 2024 International Geoscience & Remote Sensing Symposium (IGARSS 2024)

arXiv:2403.14394 [pdf, other]

Assimilation of SWOT Altimetry and Sentinel-1 Flood Extent Observations for Flood Reanalysis -- A Proof-of-Concept

Authors: Thanh Huy Nguyen, Sophie Ricci, Andrea Piacentini, Charlotte Emery, Raquel Rodriguez Suquet, Santiago Peña Luque

Abstract: In spite of astonishing advances and developments in remote sensing technologies, meeting the spatio-temporal requirements for flood hydrodynamic modeling remains a great challenge for Earth Observation. The assimilation of multi-source remote sensing data in 2D hydrodynamic models helps to overcome this challenge. The recently launched Surface Water and Ocean Topography (SWOT) wide-swath altimetry satellite provides global coverage of water surface elevation at high resolution. SWOT provides observations complementary to radar and optical images, increasing the opportunity to observe and monitor flood events. This research focuses on the assimilation of 2D flood extent maps derived from Sentinel-1 C-SAR imagery, water surface elevation from SWOT, and in-situ water level measurements. An Ensemble Kalman Filter (EnKF) with a joint state-parameter analysis is implemented on top of a 2D hydrodynamic TELEMAC-2D model to account for errors in roughness, input forcing, and water depth in floodplain subdomains. The proposed strategy is carried out in an Observing System Simulation Experiment based on the 2021 flood event over the Garonne Marmandaise catchment. This work makes the most of the large volume of heterogeneous data from space for flood prediction in hindcast mode and paves the way for nowcasting.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted for publication in IEEE 2024 International Geoscience & Remote Sensing Symposium (IGARSS 2024)

arXiv:2403.11496 [pdf, other]

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Authors: Thien-Minh Nguyen, Shenghai Yuan, Thien Hoang Nguyen, Pengyu Yin, Haozhi Cao, Lihua Xie, Maciej Wozniak, Patric Jensfelt, Marko Thiel, Justin Ziegenbein, Noel Blunder

Abstract: Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse challenging environments across three Eurasian university campuses. MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors. Furthermore, in a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, thus providing a novel challenge to existing semantic segmentation research upon this largely unexplored lidar modality. Finally, we propose, for the first time to the best of our knowledge, continuous-time ground truth based on optimization-based registration of lidar-inertial data on large survey-grade prior maps, which are also publicly released, each several times the size of existing ones. We conduct a rigorous evaluation of numerous state-of-the-art algorithms on MCD, report their performance, and highlight the challenges awaiting solutions from the research community.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

arXiv:2403.01225 [pdf, other]

A Cost-Effective Cooperative Exploration and Inspection Strategy for Heterogeneous Aerial System

Authors: Xinhang Xu, Muqing Cao, Shenghai Yuan, Thien Hoang Nguyen, Thien-Minh Nguyen, Lihua Xie

Abstract: In this paper, we propose a cost-effective strategy for heterogeneous UAV swarm systems for cooperative aerial inspection. Unlike previous swarm inspection works, the proposed method does not rely on precise prior knowledge of the environment and can achieve full 3D surface coverage of objects of any shape. In this work, agents are partitioned into teams, with each drone assigned a different task, including mapping, exploration, and inspection. Task allocation is facilitated by assigning optimal inspection volumes to each team, following best-first rules. A voxel map-based representation of the environment is used for pathfinding, and a rule-based path-planning method is the core of this approach. The proposed approach achieved the best performance in all challenging experiments, surpassing all benchmark methods for similar tasks across multiple evaluation trials. The proposed method is open source at https://github.com/ntu-aris/caric_baseline and was used as the baseline of the Cooperative Aerial Robots Inspection Challenge at the 62nd IEEE Conference on Decision and Control 2023.

Submitted 2 March, 2024; originally announced March 2024.

Comments: Baseline method of CARIC at CDC 2023, Singapore

arXiv:2402.12525 [pdf, other]

LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks

Authors: Truong Thanh Hung Nguyen, Tobias Clement, Phuc Truong Loc Nguyen, Nils Kemmerzell, Van Binh Truong, Vo Thanh Khang Nguyen, Mohamed Abdelaal, Hung Cao

Abstract: LangXAI is a framework that integrates Explainable Artificial Intelligence (XAI) with advanced vision models to generate textual explanations for visual recognition tasks. Despite XAI advancements, an understanding gap persists for end-users with limited domain knowledge in artificial intelligence and computer vision. LangXAI addresses this by furnishing text-based explanations for classification, object detection, and semantic segmentation model outputs to end-users. Preliminary results demonstrate LangXAI's enhanced plausibility, with high BERTScore across tasks, fostering a more transparent and reliable AI framework on vision tasks for end-users.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12179 [pdf, other]

Examining Monitoring System: Detecting Abnormal Behavior In Online Examinations

Authors: Dinh An Ngo, Thanh Dat Nguyen, Thi Le Chi Dang, Huy Hoan Le, Ton Bao Ho, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen

Abstract: Cheating in online exams has become a prevalent issue over the past decade, especially during the COVID-19 pandemic. To address this issue of academic dishonesty, our "Exam Monitoring System: Detecting Abnormal Behavior in Online Examinations" is designed to assist proctors in identifying unusual student behavior. Our system demonstrates high accuracy and speed in detecting cheating in real-time scenarios, providing valuable information and aiding proctors in decision-making. This article outlines our methodology and the effectiveness of our system in mitigating the widespread problem of cheating in online exams.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.03706 [pdf, other]

MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats

Authors: Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen, Thien-Minh Nguyen, Jianfei Yang, Fen Liu, Jianping Li, Han Wang, Lihua Xie

Abstract: In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers unique overhead aerial detection, vital for addressing real-world scenarios with higher fidelity than datasets captured from specific vantage points using thermal and RGB imaging. Additionally, MMAUD provides accurate Leica-generated ground truth data, a feature not seen in other datasets, enhancing credibility and enabling confident refinement of algorithms and models. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment with and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond.
Our dataset, code, and designs will be available at https://github.com/ntu-aris/MMAUD.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted by ICRA 2024

arXiv:2401.09900 [pdf, other]

XAI-Enhanced Semantic Segmentation Models for Visual Quality Inspection

Authors: Tobias Clement, Truong Thanh Hung Nguyen, Mohamed Abdelaal, Hung Cao

Abstract: Visual quality inspection systems, crucial in sectors like manufacturing and logistics, employ computer vision and machine learning for precise, rapid defect detection. However, their unexplained nature can hinder trust, error identification, and system improvement. This paper presents a framework to bolster visual quality inspection by using CAM-based explanations to refine semantic segmentation models. Our approach consists of 1) Model Training, 2) XAI-based Model Explanation, 3) XAI Evaluation, and 4) Annotation Augmentation for Model Enhancement, informed by explanations and expert insights. Evaluations show XAI-enhanced models surpass original DeepLabv3-ResNet101 models, especially in intricate object segmentation.

Submitted 18 January, 2024; originally announced January 2024.

Comments: IEEE ICCE 2024

arXiv:2401.09852 [pdf, other]

Enhancing the Fairness and Performance of Edge Cameras with Explainable AI

Authors: Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Quoc Hung Cao, Van Binh Truong, Quoc Khanh Nguyen, Hung Cao

Abstract: The rising use of Artificial Intelligence (AI) in human detection on Edge camera systems has led to accurate but complex models that are challenging to interpret and debug. Our research presents a diagnostic method using Explainable AI (XAI) for model debugging, with expert-driven problem identification and solution creation. Validating the method on the Bytetrack model in a real-world office Edge network, we identified the training dataset as the main source of bias and suggested model augmentation as a solution. Our approach helps identify model biases, which is essential for achieving fair and trustworthy models.

Submitted 18 January, 2024; originally announced January 2024.

Comments: IEEE ICCE 2024

arXiv:2312.11825 [pdf, other]

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Authors: Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma

Abstract: Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that can model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying recurrent neural networks (RNNs) that use traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network because it captures recurrent patterns without using recurrent connections. Our recurrent module mainly comprises a dilated FSMN block enhanced with gated convolutional units (GCU) and dense connections. In addition, a bottleneck layer and an output layer are added for controlling information flow. The recurrent module relies on linear projections and convolutions for seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable enhancements over MossFormer and surpasses other state-of-the-art methods on the WSJ0-2/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, accepted by ICASSP 2024
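To make the "RNN-free" idea in the MossFormer2 abstract concrete, here is a minimal sketch of an FSMN-style memory on a single scalar channel: each output is the current input plus a weighted sum of past inputs taken at a fixed dilation. The tap weights, the dilation, the past-only (causal) window, and the scalar features are all illustrative assumptions, not the paper's configuration. Because no output depends on a previous output, every time step can be computed independently, which is what allows parallel processing of the whole sequence.

```python
from typing import List, Sequence


def fsmn_memory(x: Sequence[float], taps: Sequence[float], dilation: int = 1) -> List[float]:
    """Scalar, causal FSMN-style memory block (illustrative sketch only).

    out[t] = x[t] + sum_{i=1..N} taps[i-1] * x[t - i*dilation]
    Positions before the start of the sequence are treated as zero.
    This is a fixed 1-D dilated convolution plus a residual term:
    no recurrent connection is used, so steps are mutually independent.
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = x[t]  # residual path: keep the current frame
        for i, a in enumerate(taps, start=1):
            j = t - i * dilation  # i-th past frame at the given dilation
            if j >= 0:
                acc += a * x[j]
        out.append(acc)
    return out


# Hypothetical taps and dilation, chosen only for illustration.
print(fsmn_memory([1.0, 2.0, 3.0, 4.0], taps=[0.5, 0.25], dilation=2))  # → [1.0, 2.0, 3.5, 5.0]
```

Raising the dilation widens the temporal context covered by the same number of taps, which is one plausible reason to use dilated blocks for fine-scale patterns over long sequences.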

arXiv:2312.05239 [pdf, other]

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation

Authors: Thuan Hoang Nguyen, Anh Tran

Abstract: Despite their ability to generate high-resolution and diverse images from text prompts, text-to-image diffusion models often suffer from slow iterative sampling processes. Model distillation is one of the most effective directions to accelerate these models. However, previous distillation methods fail to retain the generation quality while requiring a significant number of images for training, either from real data or synthetically generated by the teacher model. In response to this limitation, we present a novel image-free distillation scheme named SwiftBrush. Drawing inspiration from text-to-3D synthesis, in which a 3D neural radiance field that aligns with the input prompt can be obtained from a 2D text-to-image diffusion prior via a specialized loss without the use of any 3D ground-truth data, our approach re-purposes that same loss for distilling a pretrained multi-step text-to-image model to a student network that can generate high-fidelity images with just a single inference step. In spite of its simplicity, our model stands as one of the first one-step text-to-image generators that can produce images of comparable quality to Stable Diffusion without reliance on any training image data. Remarkably, SwiftBrush achieves an FID score of 16.67 and a CLIP score of 0.29 on the COCO-30K benchmark, achieving competitive results or even substantially surpassing existing state-of-the-art distillation techniques.

Submitted 15 July, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024; Github: https://github.com/VinAIResearch/SwiftBrush

arXiv:2311.15341 [pdf, other]

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

Authors: Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham

Abstract: Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in a closed mathematical form. The allocation nature of the problem also favors a stochastic optimal policy, if one exists. In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises due to the network only producing one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state.

Submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted in NeurIPS 2023. Website: https://cameron-chen.github.io/flow-iar/
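The rejection idea in the Flow-IAR abstract can be pictured with a toy categorical policy: sample an action, ask a validity oracle, and resample if the action is invalid. The sketch below is a generic rejection sampler under stated assumptions (a small dictionary policy, a hypothetical `is_valid` oracle, a retry cap); it does not implement the paper's normalizing flow or its modified policy gradient. Under rejection, an accepted action follows the base policy restricted to valid actions and renormalized, which the helper `renormalized_valid_probs` computes explicitly.

```python
import random
from typing import Callable, Dict, Hashable, Optional

Action = Hashable


def sample_valid_action(policy: Dict[Action, float],
                        is_valid: Callable[[Action], bool],
                        rng: random.Random,
                        max_tries: int = 1000) -> Optional[Action]:
    """Draw from `policy`, rejecting actions the (hypothetical) oracle marks invalid."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    for _ in range(max_tries):
        a = rng.choices(actions, weights=weights, k=1)[0]
        if is_valid(a):
            return a
    return None  # give up if valid actions are (almost) never sampled


def renormalized_valid_probs(policy: Dict[Action, float],
                             is_valid: Callable[[Action], bool]) -> Dict[Action, float]:
    """Distribution of accepted actions: base policy restricted to valid actions."""
    mass = sum(p for a, p in policy.items() if is_valid(a))
    return {a: p / mass for a, p in policy.items() if is_valid(a)}


# Toy example: action "A" is invalid in the current (hypothetical) state.
policy = {"A": 0.5, "B": 0.3, "C": 0.2}
valid = lambda a: a != "A"
rng = random.Random(0)
print(sample_valid_action(policy, valid, rng))
print(renormalized_valid_probs(policy, valid))
```

Plain rejection can waste many samples when most probability mass sits on invalid actions, which is presumably why the paper pairs rejection with a policy update rather than relying on resampling alone.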

arXiv:2311.14747 [pdf, other]

HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts

Authors: Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Mohammed Bennamoun

Abstract: Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning uses the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that for the first time combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network creates a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the matching prototype to produce the final composition classification. Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.

Submitted 23 November, 2023; originally announced November 2023.
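The HOMOE abstract describes the Modern Hopfield Network as a memory of label prototypes queried by an input. As a rough illustration, the sketch below applies one step of the standard softmax retrieval rule of modern (continuous) Hopfield networks, new_state = X softmax(beta * X^T query), to toy 2-D vectors. The prototypes, the query, beta = 8, and the single update step are assumptions for illustration; how HOMOE embeds images and prototypes is not specified in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(memories: Sequence[Sequence[float]], query: Sequence[float],
                      beta: float = 8.0) -> Tuple[Vector, Vector]:
    """One modern-Hopfield update: softmax attention over stored patterns,
    then the attention-weighted sum of those patterns."""
    scores = [beta * sum(m_d * q_d for m_d, q_d in zip(m, query)) for m in memories]
    weights = softmax(scores)
    dim = len(query)
    retrieved = [sum(w * m[d] for w, m in zip(weights, memories)) for d in range(dim)]
    return retrieved, weights


# Two hypothetical "label prototypes" and a query close to the first one.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
retrieved, weights = hopfield_retrieve(prototypes, [0.9, 0.1])
print(weights)    # almost all weight on prototype 0
print(retrieved)  # close to [1.0, 0.0]
```

A larger beta makes retrieval sharper (closer to picking a single prototype), while a smaller beta blends several prototypes.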

arXiv:2311.11730 [pdf, ps, other]

Mixing properties for multivariate Hawkes processes

Authors: Ousmane Boly, Felix Cheysson, Thi Hien Nguyen

Abstract: Properties of strong mixing have been established for the stationary linear Hawkes process in the univariate case, and can serve as a basis for statistical applications. In this paper, we provide the technical arguments needed to extend the proof to the multivariate case. We illustrate these properties by establishing a functional central limit theorem for multivariate Hawkes processes.

Submitted 20 November, 2023; originally announced November 2023.
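For readers less familiar with the object studied in the Hawkes-process paper, the conditional intensity of component i of a multivariate linear Hawkes process is lambda_i(t) = mu_i + sum_j sum_{t_k^j < t} h_ij(t - t_k^j). The sketch below assumes exponential kernels h_ij(s) = alpha_ij * beta * exp(-beta * s), so each kernel integrates to alpha_ij; this kernel choice and all numbers are illustrative assumptions, as the paper is not restricted to them. A stationary version exists when the spectral radius of the matrix of kernel integrals is below 1; the helper only checks the simpler sufficient condition that every row sum of alpha is below 1 (the spectral radius never exceeds the maximum row sum).

```python
import math
from typing import Sequence


def intensity(i: int, t: float,
              mu: Sequence[float],
              alpha: Sequence[Sequence[float]],
              beta: float,
              events: Sequence[Sequence[float]]) -> float:
    """lambda_i(t) = mu_i + sum_j sum_{t_k in events[j], t_k < t} alpha[i][j]*beta*exp(-beta*(t - t_k))."""
    total = mu[i]
    for j, times in enumerate(events):
        for tk in times:
            if tk < t:  # only strictly past events excite the process
                total += alpha[i][j] * beta * math.exp(-beta * (t - tk))
    return total


def rows_sum_below_one(alpha: Sequence[Sequence[float]]) -> bool:
    """Sufficient (not necessary) stationarity check: spectral radius <= max row sum < 1."""
    return max(sum(row) for row in alpha) < 1.0


# Two components; one event of component 0 at t = 1.0 (hypothetical data).
mu = [0.2, 0.1]
alpha = [[0.3, 0.1], [0.2, 0.4]]
events = [[1.0], []]
print(intensity(0, 2.0, mu, alpha, beta=1.0, events=events))
print(rows_sum_below_one(alpha))
```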

arXiv:2311.02998 [pdf, ps, other]

One-loop contributions for $h\rightarrow \ell \bar{\ell}\gamma$ and $e^-e^+\rightarrow h\gamma$ in $U(1)_{B-L}$ extension of the standard model

Authors: Dzung Tri Tran, Thanh Huy Nguyen, Khiem Hong Phan

Abstract: We present one-loop contributions to $h\rightarrow \ell \bar{\ell}\gamma$ with $\ell = \nu_{e,\mu,\tau}, e, \mu$ and to $e^-e^+\rightarrow h\gamma$ in the $U(1)_{B-L}$ extension of the standard model. In the phenomenological results, the signal strengths for $h\rightarrow \ell \bar{\ell}\gamma$ at the Large Hadron Collider and for $e^-e^+\rightarrow h\gamma$ at future lepton colliders are analyzed in the physical parameter space for both the vector and chiral $B-L$ models. We find that the contributions from the neutral gauge boson $Z'$ to the signal strengths are rather small; consequently, these effects are hard to probe at future colliders. In contrast, the impacts of the charged Higgs and CP-odd Higgs bosons in the chiral $B-L$ model on the signal strengths are significant and can be measured with the help of initial-beam polarization at future lepton colliders.

Submitted 30 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: 41 pages, to be published in Chinese Physics C

Report number: DTU2023-03

arXiv:2310.16242 [pdf, other]

ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality

Authors: Yonchanok Khaokaew, Kaixin Ji, Thuc Hanh Nguyen, Hiruni Kegalle, Marwah Alaofi, Hao Xue, Flora D. Salim

Abstract: This paper explores the intersection of technology and sleep pattern comprehension, presenting a cutting-edge two-stage framework that harnesses the power of Large Language Models (LLMs). The primary objective is to deliver precise sleep predictions paired with actionable feedback, addressing the limitations of existing solutions. This innovative approach involves leveraging the GLOBEM dataset alongside synthetic data generated by LLMs. The results highlight significant improvements, underlining the efficacy of merging advanced machine-learning techniques with a user-centric design ethos. Through this exploration, we bridge the gap between technological sophistication and user-friendly design, ensuring that our framework yields accurate predictions and translates them into actionable insights.

Submitted 6 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.09811 [pdf, other]

Spacing distribution for quantum Rabi models

Authors: Daniel Braak, Linh Thi Hoai Nguyen, Cid Reyes-Bustos, Masato Wakayama

Abstract: The asymmetric quantum Rabi model (AQRM) is a fundamental model in quantum optics describing the interaction of light and matter. Besides its immediate physical interest, the AQRM possesses an intriguing mathematical structure which is far from being completely understood. In this paper, we focus on the distribution of the level spacing, the difference between consecutive eigenvalues of the AQRM, in the limit of high energies, i.e., large quantum numbers. In the symmetric case, that is, the quantum Rabi model (QRM), the spacing distribution for each parity (given by the $\mathbb{Z}_2$-symmetry) is fully clarified by an asymptotic expression derived by de Monvel and Zielinski, though some questions remain for the full spectrum spacing. However, in the general AQRM case, there is no parity decomposition for the eigenvalues. In connection with numerically exact studies of the first 40,000 eigenstates, we describe the spacing distribution for the AQRM, which is characterized by a new type of periodicity and symmetric behavior of the distribution with respect to the bias parameter. The results reflect the hidden symmetry of the AQRM known to appear for half-integer bias. In addition, we observe in the AQRM the excited-state quantum phase transition for large values of the bias parameter, analogous to the QRM with large qubit energy, and an internal symmetry of the level spacing distribution for fixed bias. This novel symmetry is independent of the symmetry for half-integer bias and is not explained by current theoretical knowledge.

Submitted 9 February, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: 28 pages. 15 figures. The conjecture in Section 4.4 (Theorem 4.5 in the current version) was proved using results published after the previous version. The rest of the manuscript was modified slightly according to this change

MSC Class: 47B06 (Primary) 81V73; 81R40 (Secondary)
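The level spacing studied in the AQRM paper is the gap between consecutive eigenvalues. A minimal sketch of turning a spectrum into spacings, optionally normalized to unit mean, and binning them into a histogram is below. The toy eigenvalue list, the mean normalization, and the bin width are assumptions for illustration; the abstract does not state which normalization or binning the authors use for their 40,000-eigenstate computation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def level_spacings(eigenvalues: Sequence[float], normalize: bool = True) -> List[float]:
    """Differences between consecutive eigenvalues after sorting them ascending.

    With normalize=True the spacings are divided by their mean, so they average to 1.
    """
    ev = sorted(eigenvalues)
    gaps = [b - a for a, b in zip(ev, ev[1:])]
    if normalize and gaps:
        mean = sum(gaps) / len(gaps)
        gaps = [g / mean for g in gaps]
    return gaps


def spacing_histogram(spacings: Sequence[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Count spacings per bin [k*bin_width, (k+1)*bin_width), keyed by the bin's left edge."""
    counts = Counter(int(s // bin_width) for s in spacings)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical (unsorted) toy spectrum, not AQRM data.
toy = [3.0, 0.0, 1.5, 1.0, 3.5]
raw = level_spacings(toy, normalize=False)
print(raw)                     # → [1.0, 0.5, 1.5, 0.5]
print(spacing_histogram(raw))  # → {0.5: 2, 1.0: 1, 1.5: 1}
```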

arXiv:2310.06801 [pdf, other]

Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning

Authors: The Viet Bui, Tien Mai, Thanh Hong Nguyen

Abstract: This paper concerns imitation learning (IL) (i.e., the problem of learning to mimic expert behaviors from demonstrations) in cooperative multi-agent systems. The learning problem under consideration poses several challenges, characterized by high-dimensional state and action spaces and intricate inter-agent dependencies. In a single-agent setting, IL has been shown to be performed efficiently through an inverse soft-Q learning process given expert demonstrations. However, extending this framework to a multi-agent context introduces the need to simultaneously learn both local value functions to capture local observations and individual actions, and a joint value function for exploiting centralized learning. In this work, we introduce a novel multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions. A main advantage of this approach is that the weights of the mixing networks can be trained using information derived from global states. We further establish conditions for the mixing networks under which the multi-agent objective function exhibits convexity within the Q function space. We present extensive experiments conducted on several challenging competitive and cooperative multi-agent game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2), which demonstrate the effectiveness of our proposed algorithm compared to existing state-of-the-art multi-agent IL algorithms.

Submitted 10 October, 2023; originally announced October 2023.
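The abstract above aggregates decentralized Q-values with mixing networks whose weights are derived from the global state. One common way to realize that idea, shown here as a sketch rather than the paper's architecture, is a single-layer QMIX-style mixer: a hypothetical linear "hypernetwork" maps the global state to one weight per agent and a bias, and absolute values keep the weights non-negative so the joint value is monotone in each agent's Q-value. The layer count, the linear hypernetwork, and the abs() constraint are assumptions; the paper's own convexity conditions are not reproduced.

```python
from typing import Sequence


def mix_q_values(agent_qs: Sequence[float],
                 state: Sequence[float],
                 hyper_w: Sequence[Sequence[float]],
                 hyper_b: Sequence[float]) -> float:
    """Q_tot = sum_i |w_i(s)| * Q_i + b(s), with w(s) and b(s) linear in the global state s."""
    # One hypernetwork row per agent turns the global state into that agent's mixing weight;
    # abs() makes every weight non-negative, so Q_tot never decreases when some Q_i increases.
    weights = [abs(sum(h * s for h, s in zip(row, state))) for row in hyper_w]
    # State-dependent bias (may be negative; it does not affect monotonicity).
    bias = sum(h * s for h, s in zip(hyper_b, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


# Hypothetical two-agent example with a 2-D global state.
state = [1.0, 2.0]
hyper_w = [[0.5, 0.0], [-1.0, 0.25]]  # rows give w_0 = |0.5| and w_1 = |-0.5|
hyper_b = [0.1, 0.0]
print(mix_q_values([2.0, 4.0], state, hyper_w, hyper_b))  # 0.5*2 + 0.5*4 + 0.1
```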

arXiv:2309.12608 [pdf, other]

SPGM: Prioritizing Local Features for enhanced speech separation performance

Authors: Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Abstract: Dual-path is a popular architecture for speech separation models (e.g., Sepformer) that splits long sequences into overlapping chunks for its intra- and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, which consists of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB, respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm.

Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: This paper was accepted by ICASSP 2024
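The SPGM abstract describes the block only at a high level: a parameter-free global pooling step followed by a small modulation module. The sketch below shows one plausible reading under explicit assumptions: average-pool each feature channel over all time frames, pass the pooled vector through a hypothetical per-channel affine map and a sigmoid, and use the resulting gate to scale every local frame. The gating form and the `scale`/`shift` parameters are assumptions for illustration, not the published architecture.

```python
import math
from typing import List, Sequence

Frames = List[List[float]]  # indexed as [time][channel]


def global_modulation(frames: Sequence[Sequence[float]],
                      scale: Sequence[float],
                      shift: Sequence[float]) -> Frames:
    """Pool globally over time, compute a per-channel gate, and modulate every frame."""
    n_t = len(frames)
    n_c = len(frames[0])
    # Parameter-free global pooling: mean of each channel across all time steps.
    pooled = [sum(f[c] for f in frames) / n_t for c in range(n_c)]
    # Hypothetical lightweight modulation: per-channel affine map followed by a sigmoid gate.
    gate = [1.0 / (1.0 + math.exp(-(scale[c] * pooled[c] + shift[c]))) for c in range(n_c)]
    # Broadcast the global gate back onto every local frame.
    return [[f[c] * gate[c] for c in range(n_c)] for f in frames]


# With zero scale and shift every gate equals sigmoid(0) = 0.5.
print(global_modulation([[1.0, 2.0], [3.0, 4.0]], scale=[0.0, 0.0], shift=[0.0, 0.0]))
```

The appeal of this kind of design, as the abstract suggests, is that global context costs only a pooling pass plus a few parameters, leaving the transformer layers free for local modelling.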

arXiv:2309.09413 [pdf, other]

Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

Authors: Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter-Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

Abstract: Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, how and why this works remains poorly understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance but also show that this makes them vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which enhances robustness against background noise. Additionally, we propose an effective modification to noise prompts to show that they are capable of zero-shot adaptation to out-of-distribution noise environments.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.09400 [pdf, other]

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Authors: Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, the training datasets for these LLMs, especially the recent state-of-the-art models, are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source, readily usable datasets for effectively training LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous multi-stage pipeline, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication, to achieve the best quality for model training.
CulturaX is fully released to the public on HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX.

Submitted 17 September, 2023; originally announced September 2023.

Comments: Ongoing Work
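As a toy illustration of two stages named in the CulturaX abstract (metric-based cleaning and deduplication), the sketch below drops documents that fail a simple word-count threshold and then removes exact duplicates by hashing whitespace- and case-normalized text with SHA-256. The threshold, the normalization, and exact-hash matching are assumptions for illustration only; the abstract does not specify CulturaX's actual metrics or deduplication method, and large-scale pipelines often add near-duplicate detection as well.

```python
import hashlib
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def clean_and_dedup(docs: Iterable[str], min_words: int = 3) -> List[str]:
    """Keep documents with at least `min_words` words, dropping exact duplicates after normalization."""
    seen = set()
    kept: List[str] = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:  # hypothetical metric-based filter
            continue
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate of an earlier document
            continue
        seen.add(digest)
        kept.append(doc)  # keep the first occurrence in its original form
    return kept


docs = ["Hello world again", "hello   WORLD again", "too short", "Another distinct document here"]
print(clean_and_dedup(docs))  # → ['Hello world again', 'Another distinct document here']
```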

arXiv:2308.10188 [pdf, other]

Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games

Authors: The Viet Bui, Tien Mai, Thanh Hong Nguyen

Abstract: Training agents in multi-agent competitive games presents significant challenges due to their intricate nature. These challenges are exacerbated by dynamics influenced not only by the environment but also by opponents' strategies. Existing methods often struggle with slow convergence and instability. To address this, we harness the potential of imitation learning to comprehend and anticipate opponents' behavior, aiming to mitigate uncertainties with respect to the game dynamics. Our key contributions include: (i) a new multi-agent imitation learning model for predicting the opponents' next moves, which works with hidden opponent actions and local observations; (ii) a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into a single training process; and (iii) extensive experiments in three challenging game environments, including an advanced version of the StarCraft Multi-Agent Challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2307.16039 [pdf, other]

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

Abstract: A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus limiting their impact and accessibility for many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used to instruction-tune LLMs for multiple languages. This has left a significant gap for LLMs fine-tuned with RLHF in diverse languages and raised important questions about how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages.
Our experiments demonstrate the advantages of RLHF over SFT for multilingual instruction tuning across different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.

Submitted 1 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.12949 [pdf, ps, other]

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Authors: Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Abstract: Punctuation restoration is an important task in automatic speech recognition (ASR), which aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant in written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts for training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method that exploits in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test sets of two benchmark datasets for punctuation restoration.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted at INTERSPEECH 2023, 6 pages

arXiv:2307.09069 [pdf, other]

Mitigating Intersection Attacks in Anonymous Microblogging

Authors: Sarah Abdelwahab Gaballah, Thanh Hoang Long Nguyen, Lamya Abdullah, Ephraim Zimmer, Max Mühlhäuser

Abstract: Anonymous microblogging systems are known to be vulnerable to intersection attacks due to network churn. An adversary that monitors all communications can leverage the churn to learn who is publishing what with increasing confidence over time. In this paper, we propose a protocol for mitigating intersection attacks in anonymous microblogging systems by grouping users into anonymity sets based on similarities in their publishing behavior. The protocol provides a configurable communication schedule for users in each set to manage the inevitable trade-off between latency and bandwidth overhead. In our evaluation, we use real-world datasets from two popular microblogging platforms, Twitter and Reddit, to simulate user publishing behavior. The results demonstrate that the protocol can protect users against intersection attacks at low bandwidth overhead when the users adhere to communication schedules. In addition, the protocol can sustain a slow degradation in the size of the anonymity set over time under various churn rates.

Submitted 18 July, 2023; originally announced July 2023.
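The basic intersection attack described in the abstract can be illustrated with a minimal sketch, assuming the adversary records which users are online each time a target pseudonym publishes. The function and user names are hypothetical, and this is not the paper's protocol: it does not model behavior-based grouping or the latency/bandwidth schedule. It only shows why churn shrinks the candidate set and why keeping a whole anonymity set active in the same scheduled rounds stops that shrinkage.

```python
def intersection_attack(observations):
    """Return the users present in every observation.

    Each observation is the set of users the adversary saw online when the
    target pseudonym published. Under churn, this intersection shrinks.
    """
    candidates = None
    for online in observations:
        candidates = set(online) if candidates is None else candidates & set(online)
    return candidates if candidates is not None else set()


# Without anonymity sets: churn lets the intersection single out the author.
rounds_with_churn = [
    {"alice", "bob", "carol", "dave"},
    {"alice", "carol", "erin"},
    {"alice", "bob", "erin"},
]
print(sorted(intersection_attack(rounds_with_churn)))  # ['alice']

# Hypothetical anonymity set whose members all send (real or cover) messages
# in the same scheduled rounds: the intersection never shrinks below the set.
anonymity_set = {"alice", "bob", "carol"}
rounds_with_schedule = [
    anonymity_set | {"dave"},
    anonymity_set | {"erin"},
    anonymity_set,
]
print(sorted(intersection_attack(rounds_with_schedule)))  # ['alice', 'bob', 'carol']
```

The guarantee in this toy only holds while every member keeps to the schedule; a member who drops out shrinks the set, which matches the abstract's condition that users adhere to communication schedules.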

arXiv:2307.04137 [pdf, other]

A Novel Explainable Artificial Intelligence Model in Image Classification problem

Authors: Quoc Hung Cao, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Xuan Phong Nguyen

Abstract: In recent years, artificial intelligence has been increasingly applied in many different fields and has a profound and direct impact on human life. This creates a need to understand the principles by which models make their predictions. Since most current high-precision models are black boxes, neither AI scientists nor end-users deeply understand what is going on inside these models. Therefore, many algorithms have been studied for the purpose of explaining AI models, especially for image classification in computer vision, such as LIME, CAM, and GradCAM. However, these algorithms still have limitations, such as LIME's long execution time and CAM's interpretations, which lack concreteness and clarity. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM) that combines the advantages of the above algorithms while overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, Inception-v3, and VGG16, on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set. The results are outstanding: the algorithm meets all the requirements for a specific explanation in a remarkably short time.

Submitted 9 July, 2023; originally announced July 2023.

Comments: Published in the Proceedings of FAIC 2021

arXiv:2306.10059 [pdf, other]

doi 10.1109/IGARSS52108.2023.10282456

Reducing Uncertainties of a Chained Hydrologic-hydraulic Model to Improve Flood Forecasting Using Multi-source Earth Observation Data

Authors: Thanh Huy Nguyen, Sophie Ricci, Andrea Piacentini, Quentin Bonassies, Raquel Rodriguez Suquet, Santiago Peña Luque, Kevin Marlis, Cédric David

Abstract: The challenges in operational flood forecasting lie in producing reliable forecasts given constrained computational resources and within processing times that are compatible with near-real-time forecasting. Flood hydrodynamic models exploit observed data from gauge networks, e.g., water surface elevation (WSE) and/or discharge, that describe the forcing time series at the upstream and lateral boundary conditions of the model. A chained hydrologic-hydraulic model is thus of interest because it allows extended lead-time forecasts and overcomes the forecast limits of using only observed gauge measurements. This research work focuses on comprehensively reducing the uncertainties in the model parameters, the hydraulic state, and especially the forcing data in order to improve the overall flood reanalysis and forecast performance. It aims at assimilating two main complementary Earth observation (EO) data sources, namely in-situ WSE and SAR-derived flood extent observations.

Submitted 14 June, 2023; originally announced June 2023.

Journal ref: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 2023, pp. 1525-1528

arXiv:2306.08798 [pdf, other]

MPSA-DenseNet: A novel deep learning model for English accent classification

Authors: Tianyu Song, Linh Thi Hoai Nguyen, Ton Viet Ta

Abstract: This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, which combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English-speaking regions (Britain, the United States, Scotland) and non-native English-speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including the DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.08466 [pdf, other]

doi 10.1109/IGARSS52108.2023.10282744

Dealing With Non-Gaussianity of SAR-derived Wet Surface Ratio for Flood Extent Representation Improvement

Authors: Thanh Huy Nguyen, Sophie Ricci, Andrea Piacentini, Ehouarn Simon, Raquel Rodriguez Suquet, Santiago Peña Luque

Abstract: Owing to advances in data assimilation, notably the Ensemble Kalman Filter (EnKF), flood simulation and forecast capabilities have greatly improved in recent years. The motivation of this research work is to comprehensively reduce the uncertainties in the model parameters, forcing, and hydraulic state, and consequently improve the overall flood reanalysis and forecast capability, especially in the floodplain. It aims at assimilating SAR-derived (typically from the Sentinel-1 mission) flood extent observations, expressed in terms of wet surface ratio. The non-Gaussianity of the observation errors associated with the SAR flood observations violates a major hypothesis of the EnKF and jeopardizes the optimality of the filter analysis. Therefore, a special treatment of such non-Gaussianity with a Gaussian anamorphosis process is proposed. This strategy was validated and applied over the Garonne Marmandaise catchment (southwest France), represented with the TELEMAC-2D hydrodynamic model, focusing on a major flood event that occurred in December 2019. The assimilation of the SAR-derived wet surface ratio observations, in addition to the in-situ water surface elevations, is shown to improve the flood representation.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Copyright 2023 IEEE. Published in the IEEE 2023 International Geoscience & Remote Sensing Symposium (IGARSS 2023), scheduled for July 16 - 21, 2023 in Pasadena, California, USA. arXiv admin note: text overlap with arXiv:2304.01058

Journal ref: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 2023, pp. 1595-1598
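One common form of Gaussian anamorphosis can be sketched as follows, assuming an empirical-CDF (rank-based) transform; the paper's exact anamorphosis function may differ, and the sample values are hypothetical. Each value is replaced by the standard-normal quantile of its empirical cumulative probability, so a skewed, bounded quantity such as a wet surface ratio in [0, 1] becomes approximately Gaussian before it enters an EnKF-style update.

```python
from statistics import NormalDist


def gaussian_anamorphosis(samples):
    """Map samples to standard-normal space through their empirical CDF.

    The plotting position (rank + 0.5) / n keeps every probability strictly
    inside (0, 1), where NormalDist.inv_cdf is defined. Tied values get
    distinct ranks here, a simplification a real implementation would refine.
    """
    n = len(samples)
    order = sorted(range(n), key=lambda i: samples[i])
    std_normal = NormalDist(0.0, 1.0)
    transformed = [0.0] * n
    for rank, i in enumerate(order):
        transformed[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return transformed


# Hypothetical, strongly skewed wet surface ratios (fractions in [0, 1]).
ratios = [0.9, 0.02, 0.1, 0.95, 0.05, 0.3]
gaussianized = gaussian_anamorphosis(ratios)
for r, g in zip(ratios, gaussianized):
    print(f"{r:.2f} -> {g:+.3f}")
```

The transform is monotone, so it preserves the ordering of the observations. After the analysis step, values would be mapped back to physical space with the inverse transform (for example, by interpolating the empirical quantiles), which this sketch omits.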

arXiv:2306.04527 [pdf, other]

ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology

Authors: Tan H. Nguyen, Dinkar Juyal, Jin Li, Aaditya Prakash, Shima Nofallah, Chintan Shah, Sai Chowdary Gullapally, Limin Yu, Michael Griffin, Anand Sampat, John Abel, Justin Lee, Amaro Taylor-Weiner

Abstract: Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained on a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization at test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach poses three significant challenges to scaling such a model. First, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Second, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center) are associated with a single image. We introduce ContriMix, a novel domain-label-free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset.
Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. We make our code and trained ContriMix models available for research use. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix

Submitted 8 March, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.03400 [pdf, other]

G-CAME: Gaussian-Class Activation Mapping Explainer for Object Detectors

Authors: Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Quoc Hung Cao

Abstract: Nowadays, deep neural networks for object detection in images are very prevalent. However, due to the complexity of these networks, users find it hard to understand why these objects are detected by the models. We propose the Gaussian Class Activation Mapping Explainer (G-CAME), which generates a saliency map as the explanation for object detection models. G-CAME can be considered a CAM-based method that uses the activation maps of selected layers combined with a Gaussian kernel to highlight the important regions in the image for the predicted box. Compared with region-based methods, G-CAME greatly reduces explanation time, needing only a very short time to explain an object. We also evaluated our method qualitatively and quantitatively with YOLOX on the MS-COCO 2017 dataset, and we provide guidance on applying G-CAME to the two-stage Faster-RCNN model.

Submitted 6 June, 2023; originally announced June 2023.

Comments: 10 figures

arXiv:2306.02744 [pdf, other]

Towards Better Explanations for Object Detection

Authors: Van Binh Truong, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Quoc Khanh Nguyen, Quoc Hung Cao

Abstract: Recent advances in Artificial Intelligence (AI) technology have promoted its use in almost every field. The growing complexity of deep neural networks (DNNs) makes it both increasingly difficult and increasingly important to explain the inner workings and decisions of these networks. However, most current techniques for explaining DNNs focus mainly on interpreting classification tasks. This paper proposes D-CLOSE, a method to explain the decisions of any object detection model. To closely track the model's behavior, we use multiple levels of segmentation on the image and a process to combine them. We performed tests on the MS-COCO dataset with the YOLOX model, which show that our method outperforms D-RISE and gives higher-quality, less noisy explanations.

Submitted 6 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 9 pages, 10 figures

arXiv:2306.02196 [pdf, other]

Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection

Authors: Minh Van Nguyen, Kishan KC, Toan Nguyen, Thien Huu Nguyen, Ankit Chadha, Thuy Vu

Abstract: Answer sentence selection (AS2) in open-domain question answering finds the answer to a question by ranking candidate sentences extracted from web documents. Recent work exploits answer context, i.e., the sentences around a candidate, by incorporating them as additional input strings to Transformer models to improve correctness scoring. In this paper, we propose to improve candidate scoring by explicitly incorporating the dependencies between question-context and answer-context into the final representation of a candidate. Specifically, we use Optimal Transport to compute the question-based dependencies among sentences in the passage from which the answer is extracted. We then represent these dependencies as edges in a graph and use a Graph Convolutional Network to derive the representation of a candidate, a node in the graph. Our proposed model achieves significant improvements on popular AS2 benchmarks, i.e., WikiQA and WDRASS, obtaining new state-of-the-art results on all benchmarks.

Submitted 3 June, 2023; originally announced June 2023.

Comments: final copy for INTERSPEECH 2023

arXiv:2305.18458 [pdf, other]

Conditional Support Alignment for Domain Adaptation with Label Shift

Authors: Anh T Nguyen, Lam Tran, Anh Tong, Tuan-Duy H. Nguyen, Toan Tran

Abstract: Unsupervised domain adaptation (UDA) refers to a domain adaptation framework in which a learning model is trained on labeled samples from the source domain and unlabeled samples from the target domain. The dominant existing methods in the field, which rely on the classical covariate shift assumption to learn domain-invariant feature representations, have yielded suboptimal performance under label distribution shift between the source and target domains. In this paper, we propose a novel conditional adversarial support alignment (CASA) method, which aims to minimize the conditional symmetric support divergence between the source and target domains' feature representation distributions, yielding a more helpful representation for the classification task. We also introduce a novel theoretical target risk bound, which justifies the merits of aligning the supports of conditional feature distributions compared to the existing marginal support alignment approach in UDA settings. We then provide a complete training process in which the optimization objectives are precisely based on the proposed target risk bound. Our empirical results demonstrate that CASA outperforms other state-of-the-art methods on different UDA benchmark tasks under label shift conditions.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.12121 [pdf, other]

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

Authors: Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

Abstract: In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA is able to distill large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability, unlike existing SV models that apply a fixed function for pooling over the temporal dimension, which may obscure information about the signal's non-stationary temporal variability. Our experiments on WSJ0-1talker show that ACA-Net outperforms a strong baseline by a 5% relative improvement in EER using only 1/5 of the parameters.

Submitted 20 May, 2023; originally announced May 2023.

Comments: Accepted to INTERSPEECH 2023
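The general idea behind asymmetric cross attention (a small set of queries attending to a long key/value sequence) can be sketched as follows, assuming plain scaled dot-product attention with no learned projections, no multiple heads, and no Multi-Layer Aggregation block; the function names, query values, and synthetic frames are hypothetical. It shows why the pooled output size depends only on the number of queries and never on the input length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(queries, keys, values):
    """Scaled dot-product cross-attention from m queries to n key/value rows.

    queries: m x d, keys: n x d, values: n x d_v (lists of lists).
    Returns an m x d_v list: its size depends on m, never on n.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    pooled = []
    for q in queries:
        # One attention score per time step (key row).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Convex combination of the value rows for this query.
        pooled.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return pooled


def make_frames(n, dim=3):
    """Hypothetical stand-in for n frames of dim-dimensional acoustic features."""
    return [[math.sin(0.1 * t + c) for c in range(dim)] for t in range(n)]


# Hypothetical "learned" queries: 2 latents of dimension 3.
queries = [[1.0, 0.0, 0.0], [0.0, 0.5, -0.5]]

for n in (50, 400):
    frames = make_frames(n)
    out = cross_attention_pool(queries, frames, frames)  # keys = values = frames here
    print(n, len(out), len(out[0]))  # prints "50 2 3", then "400 2 3"
```

The cost of this pooling grows linearly with the sequence length n (m queries times n keys), rather than quadratically as in full self-attention over the sequence, which is one reason a small query set suits a lightweight extractor.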

arXiv:2305.08217 [pdf, other]

doi 10.1103/PhysRevC.107.065501

Neural Network predictions of inclusive electron-nucleus cross sections

Authors: O. Al Hammal, M. Martini, J. Frontera-Pons, T. H. Nguyen, R. Perez-Ramos

Abstract: We investigate whether a neural network approach can reproduce and predict electron-nucleus cross sections in the kinematical domain of present and future accelerator-based neutrino oscillation experiments. For this purpose, we consider the large amount of data available to the community via the web page "Quasielastic Electron Nucleus Scattering Archive", and use a residual, fully connected feedforward neural network. We illustrate the training performance of the neural network by comparing its results with experimental data for the electron double-differential cross section on carbon. The agreement between predictions and data is remarkable from quasielastic to deep-inelastic scattering. To test the predictive power of the neural network, we consider the numerous kinematical conditions for which experimental cross sections on calcium are available. Furthermore, we show the predictions of the electron scattering cross sections on oxygen, argon, and titanium: nuclei of particular interest in the context of present and future accelerator-based neutrino oscillation programs. The agreement between these predictions and the data is comparable to that of other theoretical models commonly used to calculate electron and neutrino cross sections, such as SuSAv2 and GiBUU. Results obtained with GENIE, a Monte Carlo event generator, are also discussed for comparison.
The good performance obtained with our neural network suggests that neural networks could be exploited for theoretical and experimental investigations of electron- and neutrino-nucleus scattering.

Submitted 14 May, 2023; originally announced May 2023.

Showing 1–50 of 220 results for author: Nguyen, T H