-
Language-driven Grasp Detection with Mask-guided Attention
Authors:
Tuan Van Vo,
Minh Nhat Vu,
Baoru Huang,
An Vuong,
Ngan Le,
Thieu Vo,
Anh Nguyen
Abstract:
Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains challenging and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention by utilizing the transformer attention mechanism with semantic segmentation features. Our approach integrates visual data, segmentation mask features, and natural language instructions, significantly improving grasp detection accuracy. Our work introduces a new framework for language-driven grasp detection, paving the way for language-driven robotic applications. Extensive experiments show that our method outperforms other recent baselines by a clear margin, with a 10.0% improvement in success score. We further validate our method in real-world robotic experiments, confirming the effectiveness of our approach.
Submitted 29 July, 2024;
originally announced July 2024.
-
Lightweight Language-driven Grasp Detection using Conditional Consistency Model
Authors:
Nghia Nguyen,
Minh Nhat Vu,
Baoru Huang,
An Vuong,
Ngan Le,
Thieu Vo,
Anh Nguyen
Abstract:
Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time of diffusion models, we leverage the image and text features as the condition in the consistency model to reduce the number of denoising timesteps during inference. Extensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference capability.
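To make the idea of "fewer denoising timesteps" concrete, here is a minimal sketch of generic multistep consistency sampling (the procedure introduced with consistency models by Song et al., 2023), not the authors' network. It assumes a toy 1-D setting where the data is Gaussian N(mu, s^2) and the condition (standing in for fused image and text features) only sets the mean mu, so the exact consistency function has a closed form. `EPS`, `T_MAX`, and all function names are illustrative assumptions.

```python
import math
import random
from typing import Sequence

EPS = 0.002    # smallest noise level; the consistency function is the identity here
T_MAX = 80.0   # largest noise level, where sampling starts from pure noise

def toy_consistency_fn(x: float, t: float, mu: float, s: float) -> float:
    """Exact consistency function for 1-D Gaussian data N(mu, s^2) under the
    variance-exploding process x_t = x_0 + t * z.

    Along each probability-flow ODE trajectory, (x_t - mu) / sqrt(s^2 + t^2) stays
    constant, so mapping a noisy point back to time EPS only rescales its offset
    from mu. `mu` plays the role of the condition (e.g. fused image/text features).
    """
    return mu + (x - mu) * math.sqrt(s**2 + EPS**2) / math.sqrt(s**2 + t**2)

def multistep_sample(mu: float, s: float, taus: Sequence[float], rng: random.Random) -> float:
    """One jump from pure noise, then a few re-noise/denoise refinements."""
    x = toy_consistency_fn(rng.gauss(0.0, T_MAX), T_MAX, mu, s)
    for tau in taus:  # decreasing noise levels, each larger than EPS
        x_tau = x + math.sqrt(tau**2 - EPS**2) * rng.gauss(0.0, 1.0)
        x = toy_consistency_fn(x_tau, tau, mu, s)
    return x

rng = random.Random(0)
samples = [multistep_sample(3.0, 0.5, [20.0, 5.0], rng) for _ in range(5000)]
print(sum(samples) / len(samples))  # close to 3.0 after only three function calls
```

In a real system a trained network conditioned on the image and text features would replace `toy_consistency_fn`; the sampling loop itself stays this short, which is where the inference-time savings come from.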
Submitted 25 July, 2024;
originally announced July 2024.
-
Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
Authors:
Di Fu,
Thanh Vinh Vo,
Haozhe Ma,
Tze-Yun Leong
Abstract:
Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can continuously adapt to new video data without losing previously acquired knowledge, highlighting the critical role of advanced continual action recognition. To address these challenges, we propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT uniquely balances the generalization benefits of prompt tuning with the plasticity provided by adapters in pretrained vision models, effectively addressing the challenge of maintaining model performance amidst continuous data evolution without necessitating extensive finetuning. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.
Submitted 20 July, 2024;
originally announced July 2024.
-
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
Authors:
Toan Nguyen,
Minh Nhat Vu,
Baoru Huang,
An Vuong,
Quan Vuong,
Ngan Le,
Thieu Vo,
Anh Nguyen
Abstract:
6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Extensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.
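Negative prompts in diffusion models are commonly applied at sampling time by extrapolating from the noise prediction conditioned on the unwanted (negative) prompt toward the one conditioned on the desired prompt. The sketch below shows only that common inference-time combination on plain Python lists; it is an assumption for illustration, not the learned negative prompt guidance strategy proposed in the paper. The function name and the guidance weight `w` are hypothetical.

```python
from typing import List, Sequence

def negative_prompt_guidance(
    eps_pos: Sequence[float], eps_neg: Sequence[float], w: float
) -> List[float]:
    """Combine two noise predictions for the same noisy sample.

    eps_pos: prediction conditioned on the desired (positive) prompt.
    eps_neg: prediction conditioned on the unwanted (negative) prompt.
    w = 1 returns eps_pos unchanged; w > 1 pushes further away from eps_neg.
    """
    if len(eps_pos) != len(eps_neg):
        raise ValueError("noise predictions must have the same length")
    return [n + w * (p - n) for p, n in zip(eps_pos, eps_neg)]

# With w = 3 the guided prediction moves past eps_pos, away from eps_neg.
print(negative_prompt_guidance([1.0, 0.0], [0.5, 0.0], 3.0))  # [2.0, 0.0]
```

The guided prediction would then be used in place of the plain conditional prediction at each denoising step.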
Submitted 25 July, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Noether's normalization in skew polynomial rings
Authors:
Elad Paran,
Thieu N. Vo
Abstract:
We study Noether's normalization lemma for finitely generated algebras over a division algebra. In its classical form, the lemma states that if $I$ is a proper ideal of the ring $R=F[t_1,\ldots,t_n]$ of polynomials over a field $F$, then the quotient ring $R/I$ is a finite extension of a polynomial ring over $F$. We prove that the lemma holds when $R=D[t_1,\ldots,t_n]$ is the ring of polynomials in $n$ central variables over a division algebra $D$. We provide examples demonstrating that Noether's normalization may fail for the skew polynomial ring $D[t_1,\ldots,t_n;\sigma_1,\ldots,\sigma_n]$ with respect to commuting automorphisms $\sigma_1,\ldots,\sigma_n$ of $D$. We give a sufficient condition on $\sigma_1,\ldots,\sigma_n$ under which the normalization lemma holds for such a ring. In the case where $D=F$ is a field, this sufficient condition is proved to be necessary.
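For readers less familiar with the classical commutative statement, a standard textbook illustration (not taken from the paper) is $R=F[x,y]$ with $I=(xy-1)$. Here $R/I \cong F[x,x^{-1}]$ is not finite over $F[x]$, but after the change of variables $x = u + y$ the defining relation becomes monic in $y$:

$$
xy - 1 \;=\; (u+y)\,y - 1 \;=\; y^{2} + u\,y - 1 .
$$

Hence $R/I$ is generated as an $F[u]$-module by $1$ and $\bar y$ (with $\bar x = u + \bar y$), and since $u = x - x^{-1}$ is transcendental over $F$, $R/I$ is a finite extension of the polynomial ring $F[u]$.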
Submitted 17 July, 2024;
originally announced July 2024.
-
Weighted Missing Linear Discriminant Analysis: An Explainable Approach for Classification with Missing Data
Authors:
Tuan L. Vo,
Uyen Dang,
Thu Nguyen
Abstract:
As Artificial Intelligence (AI) models are increasingly adopted in real-life applications, the explainability of the models used is critical, especially in high-stakes areas such as medicine and finance. Among the commonly used models, Linear Discriminant Analysis (LDA) is a widely used classification tool that is also explainable thanks to its ability to model class distributions and maximize class separation through linear feature combinations. Nevertheless, real-world data is frequently incomplete, presenting significant challenges for classification tasks and model explanations. In this paper, we propose a novel approach to LDA under missing data, termed Weighted missing Linear Discriminant Analysis (WLDA), which classifies observations containing missing values directly, without imputation, by estimating the parameters directly from the incomplete data and using a weight matrix to penalize missing entries during classification. Furthermore, we analyze the theoretical properties and examine the explainability of the proposed technique in a comprehensive manner. Experimental results demonstrate that WLDA outperforms conventional methods by a significant margin, particularly in scenarios where missing values are present in both training and test sets.
Submitted 30 June, 2024;
originally announced July 2024.
-
Explainability of Machine Learning Models under Missing Data
Authors:
Tuan L. Vo,
Thu Nguyen,
Hugo L. Hammer,
Michael A. Riegler,
Pal Halvorsen
Abstract:
Missing data is a prevalent issue that can significantly impair model performance and interpretability. This paper briefly summarizes the development of the field of missing data with respect to Explainable Artificial Intelligence and experimentally investigates the effects of various imputation methods on the calculation of Shapley values, a popular technique for interpreting complex machine learning models. We compare different imputation strategies and assess their impact on feature importance and interaction as determined by Shapley values. We also theoretically analyze the effects of missing values on Shapley values. Importantly, our findings reveal that the choice of imputation method can introduce biases that could lead to changes in the Shapley values, thereby affecting the interpretability of the model. Moreover, a lower test prediction mean squared error (MSE) does not necessarily imply a lower MSE in Shapley values, and vice versa. Also, while XGBoost can handle missing data directly, using XGBoost directly on missing data can seriously affect interpretability compared to imputing the data before training XGBoost. This study provides a comprehensive evaluation of imputation methods in the context of model interpretation, offering practical guidance for selecting appropriate techniques based on dataset characteristics and analysis objectives. The results underscore the importance of considering imputation effects to ensure robust and reliable insights from machine learning models.
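The dependence of Shapley values on the imputation choice can be seen even in a two-feature example. The sketch below computes exact Shapley values by enumerating coalitions, using one common value function that replaces absent features with a baseline. The model, the all-zero baseline, and the two fill-in values for the missing feature are hypothetical choices for illustration, not the paper's experimental setup.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(
    model: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A coalition S is evaluated by keeping x for features in S and using the
    baseline for the rest. Enumeration is exponential in len(x), so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def model(z: List[float]) -> float:
    """Toy model whose output is a pure interaction of the two features."""
    return z[0] * z[1]

# Feature 1 is missing in this observation; compare two imputations of it.
for fill in (3.0, 0.0):  # e.g. mean imputation vs. zero imputation
    print(fill, shapley_values(model, [2.0, fill], [0.0, 0.0]))
# 3.0 [3.0, 3.0]
# 0.0 [0.0, 0.0]
```

Both fills are plausible imputations, yet they give entirely different attributions even for the fully observed feature 0.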
Submitted 29 June, 2024;
originally announced July 2024.
-
Low-Crosstalk, Silicon-Fabricated Optical Waveguides for Laser Delivery to Matter Qubits
Authors:
Clayton L. Craft,
Nicholas J. Barton,
Andrew C. Klug,
Kenneth Scalzi,
Ian Wildemann,
Pramod Asagodu,
Joseph D. Broz,
Nikola L. Porto,
Michael Macalik,
Anthony Rizzo,
Garrett Percevault,
Christopher C. Tison,
A. Matthew Smith,
Michael L. Fanto,
James Schneeloch,
Erin Sheridan,
Dylan Heberle,
Andrew Brownell,
Vijay S. S. Sundaram,
Venkatesh Deenadayalan,
Matthew van Niekerk,
Evan Manfreda-Schulz,
Gregory A. Howland,
Stefan F. Preble,
Daniel Coleman
, et al. (8 additional authors not shown)
Abstract:
Reliable control of quantum information in matter-based qubits requires precisely applied external fields, and unaccounted-for spatial crosstalk of these fields between adjacent qubits leads to loss of fidelity. We report a CMOS foundry-produced, micro-fabricated silicon nitride (Si3N4) optical waveguide for addressing a chain of eight unequally spaced trapped barium ions with crosstalk compatible with scalable quantum information processing. The crosstalk mitigation techniques incorporated into the chip design result in a reduction of the measured optical field by at least 50.8(1.3) dB between adjacent waveguide outputs near 650 nm, with similar behavior for devices designed for 493 nm and 585 nm. The waveguide outputs near 650 nm, along with a global laser near 493 nm, were used to laser-cool a chain of eight barium-138 ions, and a camera imaged the resulting fluorescence at 493 nm.
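As a quick sense of scale, and assuming the quoted figure follows the usual decibel convention for optical power ($10\log_{10}$ of a power ratio), a 50.8 dB suppression corresponds to

$$
\frac{P_{\text{adjacent}}}{P_{\text{target}}} = 10^{-50.8/10} = 10^{-5.08} \approx 8.3 \times 10^{-6}.
$$

If the figure were instead an amplitude (field) ratio, the $20\log_{10}$ convention would apply and the ratio would be $10^{-50.8/20} \approx 2.9 \times 10^{-3}$.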
Submitted 27 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Language-driven Grasp Detection
Authors:
An Dinh Vuong,
Minh Nhat Vu,
Baoru Huang,
Nghia Nguyen,
Hieu Le,
Thieu Vo,
Anh Nguyen
Abstract:
Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samples, over 3M objects, and upwards of 10M grasping instructions. We utilize foundation models to create a large-scale scene corpus with corresponding images and grasp prompts. We approach the language-driven grasp detection task as a conditional generation problem. Drawing on the success of diffusion models in generative tasks and given that language plays a vital role in this task, we propose a new language-driven grasp detection method based on diffusion models. Our key contribution is the contrastive training objective, which explicitly contributes to the denoising process to detect the grasp pose given the language instructions. We show that our approach is theoretically grounded. Extensive experiments show that our method outperforms state-of-the-art approaches and allows real-world robotic grasping. Finally, we demonstrate that our large-scale dataset enables zero-shot grasp detection and is a challenging benchmark for future work. Project website: https://airvlab.github.io/grasp-anything/
Submitted 13 June, 2024;
originally announced June 2024.
-
Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces
Authors:
Phone Thiha Kyaw,
Anh Vu Le,
Lim Yi,
Prabakaran Veerajagadheswar,
Mohan Rajesh Elara,
Dinh Tung Vo,
Minh Bui Vu
Abstract:
Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces because, unlike traditional discrete graph-based searches, they do not require prior approximations of the problem domain. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approximating and searching the problem domain through random sampling. However, due to its low sampling efficiency and slow convergence rate, many variants of RRT* have been proposed, incorporating different heuristics and sampling strategies to overcome these constraints in complex planning problems. Yet these approaches each address specific aspects of RRT*'s convergence limitations, leaving a need for a sampling-based algorithm that can quickly find better solutions in complex high-dimensional state spaces with a faster convergence rate for practical motion planning applications. This article unifies and leverages the greedy search and heuristic techniques used in various RRT* variants to develop a greedy version of the anytime Rapidly-exploring Random Trees algorithm, denoted as Greedy RRT* (G-RRT*). It improves the initial solution-finding time of RRT* by maintaining two trees rooted at the start and the goal, which advance toward each other using greedy connection heuristics. It also accelerates the convergence rate of RRT* by introducing a greedy version of the direct informed sampling procedure, which guides sampling toward the promising region of the problem domain based on heuristics. We validate our approach on simulated planning problems, manipulation problems on Barrett WAM arms, and a self-reconfigurable robot, Panthera. Results show that G-RRT* produces asymptotically optimal solution paths and outperforms state-of-the-art RRT* variants, especially in high-dimensional planning problems.
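Direct informed sampling (introduced with Informed RRT*) draws samples only from the states that could still shorten the current best path: points whose distance to the start plus distance to the goal is at most the current best cost `c_best`, an ellipse with the start and goal as foci. Below is a minimal 2-D sketch of the standard, non-greedy version of that sampler, assuming Euclidean path cost; the greedy variant from the paper is not reproduced, and the function name is hypothetical.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]

def sample_informed(start: Point, goal: Point, c_best: float, rng: random.Random) -> Point:
    """Uniform sample from {p : |p - start| + |p - goal| <= c_best}, an ellipse
    with foci start and goal."""
    c_min = math.dist(start, goal)  # straight-line cost, the best possible
    if c_best < c_min:
        raise ValueError("c_best cannot be shorter than the start-goal distance")
    # Ellipse centre, orientation, and semi-axes.
    cx, cy = (start[0] + goal[0]) / 2, (start[1] + goal[1]) / 2
    theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
    r1 = c_best / 2                            # semi-major axis
    r2 = math.sqrt(c_best**2 - c_min**2) / 2   # semi-minor axis
    # Uniform point in the unit disk (the sqrt keeps the density uniform in area).
    rho = math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    ex, ey = r1 * rho * math.cos(phi), r2 * rho * math.sin(phi)
    # Rotate into the start-goal frame and translate to the centre.
    return (
        cx + ex * math.cos(theta) - ey * math.sin(theta),
        cy + ex * math.sin(theta) + ey * math.cos(theta),
    )

rng = random.Random(1)
print(sample_informed((0.0, 0.0), (4.0, 3.0), 7.0, rng))
```

In a full planner, `c_best` is updated whenever a shorter path is found, so the ellipse shrinks and samples concentrate around the optimal path.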
Submitted 6 May, 2024;
originally announced May 2024.
-
VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition
Authors:
Vu Tran,
Ha-Thanh Nguyen,
Trung Vo,
Son T. Luu,
Hoang-Anh Dang,
Ngoc-Cam Le,
Thi-Thuy Le,
Minh-Tien Nguyen,
Truong-Son Nguyen,
Le-Minh Nguyen
Abstract:
In this new era of rapid AI development, especially in language processing, the demand for AI in the legal domain is increasingly critical. While research in other languages such as English, Japanese, and Chinese is well established, we introduce the first fundamental research for the Vietnamese language in the legal domain: legal textual entailment recognition through the Vietnamese Language and Speech Processing (VLSP) workshop. In analyzing participants' results, we discuss certain linguistic aspects critical in the legal domain that pose challenges that need to be addressed.
Submitted 5 March, 2024;
originally announced March 2024.
-
Virtual reassembling of 3D fragments for the data-driven analysis of fracture mechanisms in composite materials
Authors:
Thomas Wilhelm,
Trang Thu Võ,
Orkun Furat,
Urs A. Peuker,
Volker Schmidt
Abstract:
This paper introduces a novel method for characterizing fracture mechanisms in composite materials using 3D image data obtained by computed tomography (CT) measurements. In mineral liberation, the understanding of these mechanisms is crucial, particularly whether fractures occur along the boundaries of mineral phases (intergranular fracture) and/or within mineral phases (transgranular fracture). Conventional techniques for analyzing fracture mechanisms focus on globally comparing the surface exposure of mineral phases extracted from image measurements before and after fracture. Instead, we present a virtual reassembling algorithm based on image registration techniques, which is applied to 3D data of composite materials before and after fracture in order to determine and characterize the individual fracture surfaces. This enables us to conduct a local quantitative analysis of fracture mechanisms by comparing adjacent regions at fracture surfaces voxel by voxel. A quantitative analysis of fracture mechanisms is especially important in the context of geometallurgical recycling processes. As primary deposits are decreasing worldwide, the focus is shifting to secondary raw materials containing low concentrations of valuable elements such as lithium. To extract these elements, they can be enriched as engineered artificial minerals in the slag phase of appropriately designed cooling processes. The subsequent liberation through comminution processes, such as crushing, is essential for the extraction of valuable minerals. A better understanding of crushing processes, especially fracture mechanisms in slags, is crucial for the success of recycling. The reassembling algorithm presented in this paper is evaluated through a simulation study, followed by an application to a naturally occurring ore and a slag resulting from a recycling process.
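Once fragments are reassembled, the local analysis comes down to comparing the phases on the two sides of the fracture surface. The sketch below is a deliberately simplified, hypothetical illustration of that comparison, not the paper's algorithm. It assumes the registration step has already produced pairs of phase labels for voxels facing each other across the fracture, and counts a pair as transgranular when both sides belong to the same phase and as intergranular when they differ. The phase names are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def fracture_fractions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of fracture-surface voxel pairs classified as inter- or transgranular.

    Each pair holds the phase labels of the two voxels that face each other
    across the fracture surface after virtual reassembly (hypothetical input).
    """
    counts = Counter(
        "transgranular" if left == right else "intergranular" for left, right in pairs
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fracture-surface voxel pairs given")
    return {kind: counts[kind] / total for kind in ("intergranular", "transgranular")}

pairs = [("matrix", "matrix"), ("matrix", "mineral"), ("mineral", "mineral"), ("mineral", "matrix")]
print(fracture_fractions(pairs))  # {'intergranular': 0.5, 'transgranular': 0.5}
```

Restricting the pairs to a single phase on one side would give per-phase statistics, which is closer to the question of how a particular mineral is liberated.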
Submitted 11 February, 2024;
originally announced February 2024.
-
E(3)-Equivariant Mesh Neural Networks
Authors:
Thuan Trang,
Nhat Khang Ngo,
Daniel Levy,
Thieu N. Vo,
Siamak Ravanbakhsh,
Truong Son Hy
Abstract:
Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have addressed the need for geometric deep learning on 3D meshes. However, we observe that the complexity of many of these architectures does not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information, and further improve it to account for long-range interactions through hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive pre-processing. Our implementation is available at https://github.com/HySonLab/EquiMesh
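For reference, here is a sketch of one plain EGNN layer (the starting point that EMNN extends) with scalar node features in pure Python. Messages depend only on features and squared distances, so they are invariant; coordinates move along relative position vectors, so they transform with any rotation and translation. The small hand-written functions `phi_e`, `phi_x`, and `phi_h` are hypothetical stand-ins for the learned MLPs, and the mesh-face and hierarchical extensions that define EMNN are not included.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical stand-ins for the learned MLPs of an EGNN layer.
def phi_e(h_i: float, h_j: float, dist2: float) -> float:
    return math.tanh(0.5 * h_i + 0.3 * h_j - 0.1 * dist2)

def phi_x(m_ij: float) -> float:
    return 0.2 * m_ij

def phi_h(h_i: float, m_i: float) -> float:
    return h_i + math.tanh(m_i)

def egnn_layer(
    h: Sequence[float], x: Sequence[Vec3], edges: Sequence[Tuple[int, int]]
) -> Tuple[List[float], List[Vec3]]:
    """One EGNN update; each (i, j) in `edges` sends a message from node j to node i."""
    n = len(h)
    m_sum = [0.0] * n
    shift = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i, j in edges:
        diff = [x[i][k] - x[j][k] for k in range(3)]
        m_ij = phi_e(h[i], h[j], sum(d * d for d in diff))  # invariant message
        m_sum[i] += m_ij
        for k in range(3):
            shift[i][k] += diff[k] * phi_x(m_ij)  # equivariant displacement
    c = 1.0 / max(n - 1, 1)  # normalisation constant, here 1 / (number of nodes - 1)
    new_x = [tuple(x[i][k] + c * shift[i][k] for k in range(3)) for i in range(n)]
    new_h = [phi_h(h[i], m_sum[i]) for i in range(n)]
    return new_h, new_x

def rigid(p: Vec3, angle: float = 0.7, t: Vec3 = (1.0, -2.0, 0.5)) -> Vec3:
    """Rotate about the z-axis, then translate (used to check equivariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1], p[2] + t[2])

h = [0.1, 0.5, -0.3]
x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
edges = [(i, j) for i in range(3) for j in range(3) if i != j]
h1, x1 = egnn_layer(h, x, edges)
h2, x2 = egnn_layer(h, [rigid(p) for p in x], edges)
# h2 matches h1, and x2 matches [rigid(p) for p in x1], up to floating-point rounding.
```

Moving the whole input rigidly leaves the features unchanged and moves the output coordinates by the same rigid motion, which is the property EMNN preserves while adding mesh face information.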
Submitted 18 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Autonomous Catheterization with Open-source Simulator and Expert Trajectory
Authors:
Tudor Jianu,
Baoru Huang,
Tuan Vo,
Minh Nhat Vu,
Jingxuan Kang,
Hoan Nguyen,
Olatunji Omisore,
Pierre Berthet-Rayne,
Sebastiano Fichera,
Anh Nguyen
Abstract:
Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures. In this chapter, we introduce CathSim, the first open-source simulator for endovascular intervention, to address these limitations. CathSim emphasizes real-time performance to enable rapid development and testing of learning algorithms. We validate CathSim against the real robot and show that our simulator can successfully mimic the behavior of the real robot. Based on CathSim, we develop a multimodal expert navigation network and demonstrate its effectiveness in downstream endovascular navigation tasks. Extensive experimental results suggest that CathSim has the potential to significantly accelerate research in the autonomous catheterization field. Our project is publicly available at https://github.com/airvlab/cathsim.
Submitted 19 January, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Recanting twins: addressing intermediate confounding in mediation analysis
Authors:
Tat-Thang Vo,
Nicholas Williams,
Richard Liu,
Kara E. Rudolph,
Ivan Díaz
Abstract:
The presence of intermediate confounders, also called recanting witnesses, is a fundamental challenge to the investigation of causal mechanisms in mediation analysis, preventing the identification of natural path-specific effects. Proposed alternative parameters (such as randomizational interventional effects) are problematic because they can be non-null even when there is no mediation for any individual in the population; i.e., they are not an average of underlying individual-level mechanisms. In this paper, we develop a novel method for mediation analysis in settings with intermediate confounding, with guarantees that the causal parameters are summaries of the individual-level mechanisms of interest. The method is based on recently proposed ideas that view causality as the transfer of information, and thus replace recanting witnesses by draws from their conditional distribution, which we call "recanting twins". We show that, in the absence of intermediate confounding, recanting twin effects recover natural path-specific effects. We present the assumptions required for identification of recanting twin effects under a standard structural causal model, as well as the assumptions under which the recanting twin identification formulas can be interpreted in the context of the recently proposed separable effects models. To estimate recanting twin effects, we develop efficient semi-parametric estimators that allow the use of data-driven methods in the estimation of the nuisance parameters. We present numerical studies of the methods using synthetic data, as well as an application to evaluate the role of new-onset anxiety and depressive disorder in explaining the relationship between gabapentin/pregabalin prescription and incident opioid use disorder among Medicaid beneficiaries with chronic pain.
Submitted 9 January, 2024;
originally announced January 2024.
-
A skew Newton-Puiseux Theorem
Authors:
Elad Paran,
Thieu N. Vo
Abstract:
We prove a skew generalization of the Newton-Puiseux theorem for the field $F = \bigcup_{n=1}^\infty \mathbb{C}((x^\frac{1}{n}))$ of Puiseux series: For any positive real number $α$, we consider the $\mathbb{C}$-automorphism $σ$ of $F$ given by $x \mapsto αx$, and prove that every non-constant polynomial in the skew polynomial ring $F[t,σ]$ factors into a product of linear terms. This generalizes the classical theorem where $σ= {\rm id}$, and gives the first concrete example of a field of characteristic $0$ that is algebraically closed with respect to a non-trivial automorphism -- a notion studied in works of Aryapoor and of Smith. Our result also resolves an open question of Aryapoor concerning such fields. A key ingredient in the proof is a new variant of Hensel's lemma.
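To see what "factors into a product of linear terms" means in the skew setting, assume the usual convention for $F[t,\sigma]$ that $t$ moves past coefficients via $t\,a = \sigma(a)\,t$, and that $\sigma$ acts on fractional powers by $x^{q} \mapsto \alpha^{q} x^{q}$ with $\alpha^{q}$ the positive real power. A product of two linear terms then expands as follows (a routine computation, not a result from the paper):

$$
(t-a)(t-b) = t^{2} - \bigl(a + \sigma(b)\bigr)\,t + ab,
\qquad\text{e.g.}\qquad
(t-x)(t-x) = t^{2} - (1+\alpha)\,x\,t + x^{2}.
$$

For $\alpha = 1$ (so $\sigma = {\rm id}$) this reduces to the commutative identity $(t-x)^{2} = t^{2} - 2xt + x^{2}$.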
Submitted 29 November, 2023;
originally announced November 2023.
-
Imputation using training labels and classification via label imputation
Authors:
Thu Nguyen,
Tuan L. Vo,
Pål Halvorsen,
Michael A. Riegler
Abstract:
Missing data is a common problem in practical settings. Various imputation methods have been developed to deal with missing data. However, even though the label is usually available in the training data, common imputation practice relies only on the input and ignores the label. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation. This allows imputing the label and the input at the same time. The technique can also handle training data with missing labels without any prior imputation and is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
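The label-stacking idea can be sketched with any imputer. The version below assumes a simple k-nearest-neighbour imputer written in pure Python as a stand-in (the paper's choice of imputer is not reproduced here), binary 0/1 labels, and a 0.5 threshold on the imputed label; all function names are hypothetical. Training labels are appended as an extra column, so they also inform the distances used to impute missing training features, and test labels start out as missing.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]

def _distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """RMS difference over the coordinates observed in both rows."""
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((u - v) ** 2 for u, v in shared) / len(shared))

def knn_impute(rows: Sequence[Row], k: int = 3) -> List[Row]:
    """Fill each missing entry with the mean of that column over the k nearest donors."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, val in enumerate(row):
            if val is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[c])
                for j, other in enumerate(rows)
                if j != i and other[c] is not None
            )[:k]
            if not donors:
                raise ValueError(f"column {c} has no observed values to borrow from")
            out[i][c] = sum(v for _, v in donors) / len(donors)
    return out

def classify_via_imputation(
    X_train: Sequence[Row], y_train: Sequence[int], X_test: Sequence[Row], k: int = 3
) -> List[int]:
    """Stack labels as an extra column, mark test labels missing, impute, threshold."""
    stacked: List[Row] = [list(x) + [float(y)] for x, y in zip(X_train, y_train)]
    stacked += [list(x) + [None] for x in X_test]
    imputed = knn_impute(stacked, k)
    return [1 if row[-1] >= 0.5 else 0 for row in imputed[len(X_train):]]

X_train = [[0.0, 0.1], [0.2, None], [0.1, 0.0], [5.0, 5.1], [None, 4.9], [5.2, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.1, None], [5.1, 5.0]]
print(classify_via_imputation(X_train, y_train, X_test, k=2))  # [0, 1]
```

In practice the label column would usually be scaled to match the features, and categorical labels would use a majority vote instead of a mean.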
Submitted 23 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Authors:
An Vuong,
Minh Nhat Vu,
Toan Tien Nguyen,
Baoru Huang,
Dzung Nguyen,
Thieu Vo,
Anh Nguyen
Abstract:
Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed toward synthesizing scenes from human motions, room layouts, or spatial graphs. However, few studies have addressed this problem from multiple modalities, especially in combination with text prompts. In this paper, we propose a language-driven scene synthesis task, a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address this challenge, we present a multi-conditional diffusion model, which, unlike the implicit unification approach of other diffusion literature, explicitly predicts the guiding points for the original data distribution. We show that our approach is theoretically supported. Extensive experimental results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.
Submitted 24 October, 2023;
originally announced October 2023.
-
Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering
Authors:
Tam Minh Vo,
Khiem Vinh Tran
Abstract:
Recent studies have provided empirical evidence of the wide-ranging potential of the Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, research on GPT's application to Vietnamese remains limited. This paper aims to address this gap by presenting an implementation of GPT-2 for community-based question answering focused specifically on COVID-19 related queries in Vietnamese. We introduce a novel approach by conducting a comparative analysis of different Transformers versus SOTA models on the community-based COVID-19 question answering dataset. The experimental findings demonstrate that the GPT-2 models exhibit highly promising outcomes, outperforming other SOTA models as well as previous community-based COVID-19 question answering models developed for Vietnamese.
Submitted 31 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Authors:
Tuan Van Vo,
Minh Nhat Vu,
Baoru Huang,
Toan Nguyen,
Ngan Le,
Thieu Vo,
Anh Nguyen
Abstract:
Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method for 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. Extensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method improves the mIoU score by 7.96% over the baselines. Furthermore, it offers real-time inference, which makes it well suited for robotic manipulation applications.
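One common way to correlate point features with open-vocabulary text labels is to score every point against every label embedding by cosine similarity, scale by a temperature, and take a softmax over labels. The sketch below shows only that generic mechanism; it is an assumption, not the paper's exact text-point correlation module. The point features and text embeddings are hypothetical inputs assumed to live in a shared space, and the temperature of 0.07 is an illustrative value.

```python
import math

# Hypothetical inputs: per-point feature vectors from a point-cloud backbone and
# one text embedding per affordance label, already projected to the same space.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def text_point_probs(point_feats, text_embs, temperature=0.07):
    """Per point, a softmax over labels of cosine(point, text) / temperature."""
    probs = []
    for p in point_feats:
        logits = [cosine(p, t) / temperature for t in text_embs]
        top = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(s - top) for s in logits]
        z = sum(exps)
        probs.append([e / z for e in exps])
    return probs


def assign_affordances(point_feats, text_embs, labels, temperature=0.07):
    """Label each point with its highest-probability affordance text."""
    probs = text_point_probs(point_feats, text_embs, temperature)
    return [labels[max(range(len(labels)), key=row.__getitem__)] for row in probs]
```

Because labels enter only through their text embeddings, supporting a new affordance in this scheme amounts to appending one more embedding to `text_embs`; no per-label classifier head has to be retrained.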
Submitted 19 September, 2023;
originally announced September 2023.
-
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Authors:
Toan Nguyen,
Minh Nhat Vu,
Baoru Huang,
Tuan Van Vo,
Vy Truong,
Ngan Le,
Thieu Vo,
Bac Le,
Anh Nguyen
Abstract:
Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Extensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3DAPNet.github.io
Submitted 19 September, 2023;
originally announced September 2023.
-
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
Authors:
An Dinh Vuong,
Minh Nhat Vu,
Hieu Le,
Baoru Huang,
Binh Huynh,
Thieu Vo,
Andreas Kugi,
Anh Nguyen
Abstract:
Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared with the variety of objects found in the real world. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and scale, with 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.
Submitted 18 September, 2023;
originally announced September 2023.
-
A functional limit theorem for lattice oscillating random walk
Authors:
Marc Peigné,
Tran Duy Vo
Abstract:
The paper is devoted to an invariance principle for Kemperman's model of oscillating random walk on $\mathbb{Z}$. This result appears as an extension of the invariance principle for classical random walks on $\mathbb{Z}$ or reflected random walks on $\mathbb{N}_0$. Relying on a natural Markov sub-process that takes into account the oscillation of the random walk between $\mathbb{Z}^-$ and $\mathbb{Z}^+$, we first construct an aperiodic sequence of renewal operators acting on a suitable Banach space and then apply a powerful theorem proved by S. Gouëzel.
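Kemperman's oscillating random walk changes its jump distribution depending on which side of the origin it currently occupies. The sketch below simulates one common formulation, assuming jumps drawn from one law while $X_k < 0$ and from another while $X_k \ge 0$ (conventions at the origin vary, and the paper's may differ), together with the diffusive rescaling $X_{\lfloor nt \rfloor}/\sqrt{n}$ whose limit an invariance principle describes. The function names are hypothetical, and the Markov sub-process and renewal-operator machinery of the paper are not reproduced.

```python
import math
import random

def oscillating_walk(n_steps, jump_neg, jump_nonneg, start=0, rng=None):
    """Simulate X_{k+1} = X_k + xi_{k+1} on the integers, where the jump xi is
    drawn by jump_neg(rng) when X_k < 0 and by jump_nonneg(rng) when X_k >= 0."""
    rng = rng or random.Random()
    path = [start]
    x = start
    for _ in range(n_steps):
        step = jump_neg(rng) if x < 0 else jump_nonneg(rng)
        x += step
        path.append(x)
    return path


def rescaled(path, t, n):
    """Diffusive rescaling X_{floor(n t)} / sqrt(n), for 0 <= t with n t <= len(path) - 1."""
    return path[math.floor(n * t)] / math.sqrt(n)


# Usage: centred jumps on each side, with different laws below and above 0.
walk = oscillating_walk(
    10_000,
    jump_neg=lambda r: r.choice([-1, 1]),
    jump_nonneg=lambda r: r.choice([-2, -1, 1, 2]),
    rng=random.Random(0),
)
value_at_half = rescaled(walk, 0.5, 10_000)
```

With deterministic jumps of +1 below the origin and -1 at or above it, the walk simply alternates between -1 and 0, which is a quick sanity check on the switching rule.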
Submitted 11 September, 2023;
originally announced September 2023.
-
Federated Causal Inference from Observational Data
Authors:
Thanh Vinh Vo,
Young lee,
Tze-Yun Leong
Abstract:
Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data distributions and missing values within them can potentially introduce bias into the causal estimands. In this article, we propose a framework to estimate causal effects from decentralized data sources. The proposed framework avoids exchanging raw data among the sources, thus contributing towards privacy-preserving causal learning. Three instances of the proposed framework are introduced to estimate causal effects across a wide range of diverse scenarios within a federated setting. (1) FedCI: a Bayesian framework based on Gaussian processes for estimating causal effects from federated observational data sources. It estimates the posterior distributions of the causal effects to compute the higher-order statistics that capture the uncertainty. (2) CausalRFF: an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. It estimates the similarities among the sources through transfer coefficients and hence requires no prior information about the similarity measures. (3) CausalFI: a new approach for federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources. It accounts for the missing data under the missing-at-random assumption, while also estimating higher-order statistics of the causal estimands. The proposed federated framework and its instances are an important step towards a privacy-preserving causal learning model.
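CausalRFF is described as relying on Random Fourier Features. The sketch below shows only the standard Random Fourier Feature construction (Rahimi and Recht) for a Gaussian (RBF) kernel, under the assumption that this kernel family is the one of interest; it does not implement the transfer coefficients, the loss decomposition, or any federated training. The names `make_rff` and `approx_kernel` are hypothetical.

```python
import math
import random

def make_rff(dim, n_features, lengthscale=1.0, seed=0):
    """Random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    Frequencies w ~ N(0, lengthscale^-2 I), phases b ~ Uniform[0, 2*pi), and
    z(x) = sqrt(2 / D) * cos(w . x + b), so that z(x) . z(y) approximates k(x, y)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
         for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
                for w, b_j in zip(W, b)]

    return features


def approx_kernel(phi, x, y):
    """Inner product of the random features, an unbiased estimate of k(x, y)."""
    return sum(a * c for a, c in zip(phi(x), phi(y)))


phi = make_rff(dim=2, n_features=4000, seed=1)
estimate = approx_kernel(phi, [0.0, 0.0], [1.0, 0.0])  # close to exp(-0.5)
```

A property that makes such features convenient when data cannot be pooled is that sources sharing the same seed construct the identical feature map locally, so a kernel model can be expressed through finite-dimensional, per-source quantities.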
Submitted 30 May, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering
Authors:
Triet M. Thai,
Anh T. Vo,
Hao K. Tieu,
Linh N. P. Bui,
Thien T. B. Nguyen
Abstract:
In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical images based on a given question and in providing precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge featured a visual question answering task in the gastrointestinal domain, covering gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge by proposing a multimodal learning method with image enhancement to improve VQA performance on gastrointestinal images. The multimodal architecture combines a BERT encoder with different pre-trained vision models based on convolutional neural network (CNN) and Transformer architectures for feature extraction from the question and the endoscopy image. The results of this study highlight the dominance of Transformer-based vision models over CNNs and demonstrate the effectiveness of the image enhancement process, with six out of the eight vision models achieving a better F1-Score. Our best method, which takes advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score on the development test set, while also producing good results on the private test set with an accuracy of 82.01%.
Submitted 19 November, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
Authors:
An Dinh Vuong,
Toan Tien Nguyen,
Minh Nhat VU,
Baoru Huang,
Dzung Nguyen,
Huynh Thi Thanh Binh,
Thieu Vo,
Anh Nguyen
Abstract:
Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have incorporated human dynamics, leaving a gap between simulation and real-world applications. Furthermore, current 3D simulators that incorporate human dynamics have several limitations, particularly in computational efficiency, which E-AI simulators are expected to provide. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance, while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.
Submitted 29 July, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Dissipation, quantum coherence, and asymmetry of finite-time cross-correlations
Authors:
Tan Van Vu,
Van Tuan Vo,
Keiji Saito
Abstract:
Recent studies have revealed a deep connection between the asymmetry of cross-correlations and thermodynamic quantities in the short-time limit. In this study, we address the finite-time domain of the asymmetry for both open classical and quantum systems. Focusing on Markovian dynamics, we show that the asymmetry observed in finite-time cross-correlations is upper bounded by dissipation. We prove that, for classical systems in a steady state with arbitrary operational durations, the asymmetry exhibits, at most, linear growth over time, with the growth speed determined by the rates of entropy production and dynamical activity. In the long-time regime, the asymmetry exhibits exponential decay, with the decay rate determined by the spectral gap of the transition matrix. Remarkably, for quantum cases, quantum coherence is equally important as dissipation in constraining the asymmetry of correlations. We demonstrate an example where only quantum coherence bounds the asymmetry while the entropy production rate vanishes. Furthermore, we generalize the short-time bounds on correlation asymmetry, as reported by Shiraishi [Phys. Rev. E 108, L042103 (2023)] and Ohga et al. [Phys. Rev. Lett. 131, 077101 (2023)], to encompass finite-time scenarios. These findings offer novel insights into the thermodynamic aspects of correlation asymmetry.
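The quantity being bounded here is the difference $\langle A(\tau)B(0)\rangle - \langle B(\tau)A(0)\rangle$ between the two orderings of a steady-state cross-correlation. The toy sketch below computes it for a discrete-time classical Markov chain, which is a simplifying assumption (the paper treats Markovian dynamics of open classical and quantum systems, not this discrete-time model). It illustrates the qualitative point that the asymmetry vanishes under detailed balance (no dissipation) and is nonzero for a driven, cyclic chain. All function names are hypothetical.

```python
def stationary(P, iters=1000):
    """Stationary distribution of a row-stochastic matrix by power iteration
    (assumes the chain is irreducible and aperiodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def mat_pow(P, k):
    """P^k by repeated multiplication."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][m] * P[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return R


def cross_corr(P, pi, a, b, tau):
    """Steady-state <A(tau) B(0)> = sum_ij pi_i b_i (P^tau)_ij a_j."""
    Pt = mat_pow(P, tau)
    n = len(P)
    return sum(pi[i] * b[i] * Pt[i][j] * a[j] for i in range(n) for j in range(n))


def asymmetry(P, a, b, tau):
    """<A(tau) B(0)> - <B(tau) A(0)>, zero whenever detailed balance holds."""
    pi = stationary(P)
    return cross_corr(P, pi, a, b, tau) - cross_corr(P, pi, b, a, tau)


# Reversible (symmetric) chain versus a chain biased to cycle 0 -> 1 -> 2 -> 0.
P_rev = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
P_cyc = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
a, b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
```

For the cyclic chain at $\tau = 1$ the stationary distribution is uniform, so the asymmetry is $(0.1 - 0.8)/3 \approx -0.233$, while for the symmetric chain it is zero up to rounding.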
Submitted 19 February, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
Authors:
Xuan-Quy Dao,
Ngoc-Bich Le,
The-Duy Vo,
Xuan-Dung Phan,
Bac-Bien Ngo,
Van-Tien Nguyen,
Thi-My-Thanh Nguyen,
Hong-Phuoc Nguyen
Abstract:
The VNHSGE (VietNamese High School Graduation Examination) dataset, developed exclusively for evaluating large language models (LLMs), is introduced in this article. The dataset, which covers nine subjects, was generated from the Vietnamese National High School Graduation Examination and comparable tests. It includes 300 literary essays and over 19,000 multiple-choice questions on a range of topics. By including both textual data and accompanying images, the dataset assesses LLMs in multitask settings such as question answering, text generation, reading comprehension, and visual question answering. We evaluated ChatGPT and BingChat on the VNHSGE dataset and compared their performance with that of Vietnamese students. The results show that ChatGPT and BingChat both perform at a human level in a number of areas, including literature, English, history, geography, and civics education. They still have room to grow, though, especially in mathematics, physics, chemistry, and biology. With its wide-ranging coverage and variety of tasks, the VNHSGE dataset seeks to provide an adequate benchmark for assessing the abilities of LLMs. By making this dataset available to the scientific community, we intend to promote future developments in LLMs, especially in addressing their limitations in mathematics and the natural sciences.
Submitted 20 May, 2023;
originally announced May 2023.
-
Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction
Authors:
Tu T. Do,
Mai Anh Vu,
Tuan L. Vo,
Hoang Thien Ly,
Thu Nguyen,
Steven A. Hicks,
Michael A. Riegler,
Pål Halvorsen,
Binh T. Nguyen
Abstract:
Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially as datasets grow. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, merges the obtained principal components, and then imputes the merged data using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, they show that while applying MICE imputation directly to the missing data may not converge, applying BPI with MICE may lead to convergence.
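The BPI pipeline (PCA per monotone block on its observed rows, merge the component scores, then impute the reduced matrix) can be sketched as below. This is a deliberately small illustration under several assumptions that are not from the paper: columns are already ordered so the missingness is monotone, every column is observed in at least one row, each block keeps only its first principal component (found by power iteration), and column-mean imputation stands in for the "chosen imputation technique" (the paper mentions MICE, for example). All names are hypothetical.

```python
import math

def monotone_blocks(rows):
    """Group consecutive columns observed by exactly the same set of rows."""
    blocks, current, current_obs = [], [], None
    for j in range(len(rows[0])):
        obs = frozenset(i for i, r in enumerate(rows) if r[j] is not None)
        if obs == current_obs:
            current.append(j)
        else:
            if current:
                blocks.append((current, current_obs))
            current, current_obs = [j], obs
    blocks.append((current, current_obs))
    return blocks


def top_component(data, iters=200):
    """Column means and first principal direction (power iteration on the covariance)."""
    d, n = len(data[0]), len(data)
    means = [sum(r[k] for r in data) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in data]
    cov = [[sum(r[p] * r[q] for r in centered) / n for q in range(d)] for p in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[p][q] * v[q] for q in range(d)) for p in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return means, v


def mean_impute(rows):
    """Stand-in imputer: replace None with the column mean of observed values."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(v is not None for v in c)
             for c in cols]
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]


def bpi(rows, impute=mean_impute):
    """PCA on the observed part of each block, merge the scores, then impute."""
    reduced = [[] for _ in rows]
    for cols, obs in monotone_blocks(rows):
        means, v = top_component([[rows[i][j] for j in cols] for i in sorted(obs)])
        for i, row in enumerate(rows):
            if i in obs:
                reduced[i].append(sum((row[j] - m) * vk
                                      for j, m, vk in zip(cols, means, v)))
            else:
                reduced[i].append(None)  # block not observed for this row
    return impute(reduced)
```

The point of the ordering is visible in the code: the imputer only ever sees one score column per block rather than every original column, which is where the reported time savings would come from.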
Submitted 10 January, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
Authors:
Nghia Hieu Nguyen,
Duong T. D. Vo,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
In recent years, visual question answering (VQA) has attracted attention from the research community because of its promising applications (such as virtual assistance in intelligent cars, assistive devices for blind people, or information retrieval from document images using natural language queries) and its difficulty. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets, which mostly target resource-rich languages such as English. However, available datasets narrow the VQA task to answer selection or answer classification. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect of the VQA task by selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consisting of 11,000+ images associated with 37,000+ question-answer pairs (QAs). Moreover, we propose FST, QuMLAG, and MLPAG, which fuse information from images and answers and then use these fused features to construct answers iteratively, as humans do. Our proposed methods achieve results that are competitive with SOTA models such as SAAA, MCAN, LORA, and M4C. The dataset is available to encourage the research community to develop more generalized algorithms, including transformers, for low-resource languages such as Vietnamese.
Submitted 6 May, 2023;
originally announced May 2023.
-
Multipole Expansion for the Electron-Nucleus Scattering at High Energies in the Unified Electroweak Theory
Authors:
Z. P. Luong,
M. T. Vo
Abstract:
The paper presents the multipole expansion for the electron-nucleus scattering cross section at high energies within the framework of the unified electroweak theory. The electroweak currents of the nucleus are expanded into the simple components with definite angular momenta, called the multipole form factors. The multipole expansion of the cross section is a consequence of the above expansion. Besides the familiar electromagnetic form factors, there are also the vector and axial form factors, respectively, related to weak interactions. To determine multipole form factors, general formulas for the calculation of reduced matrix elements are established using the fractional parentage coefficient method and the multiparticle shell model. Calculation of them enables us to obtain more detailed information about the nuclear structure and elucidate the role played by the weak interaction in the high-energy reaction mechanisms.
Submitted 5 August, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting
Authors:
Simon Graham,
Quoc Dang Vu,
Mostafa Jahanifar,
Martin Weigert,
Uwe Schmidt,
Wenhua Zhang,
Jun Zhang,
Sen Yang,
Jinxi Xiang,
Xiyue Wang,
Josef Lorenz Rumberger,
Elias Baumann,
Peter Hirsch,
Lihao Liu,
Chenyang Hong,
Angelica I. Aviles-Rivero,
Ayushi Jain,
Heeyoung Ahn,
Yiyu Hong,
Hussam Azzuni,
Min Xu,
Mohammad Yaqub,
Marie-Claire Blache,
Benoît Piégu,
Bertrand Vernay,
et al. (64 additional authors not shown)
Abstract:
Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.
Submitted 14 March, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Open-Vocabulary Affordance Detection in 3D Point Clouds
Authors:
Toan Nguyen,
Minh Nhat Vu,
An Vuong,
Dzung Nguyen,
Thieu Vo,
Ngan Le,
Anh Nguyen
Abstract:
Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Extensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.
Submitted 23 July, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
-
EVJVQA Challenge: Multilingual Visual Question Answering
Authors:
Ngan Luu-Thuy Nguyen,
Nghia Hieu Nguyen,
Duong T. D Vo,
Khanh Quoc Tran,
Kiet Van Nguyen
Abstract:
Visual Question Answering (VQA) is a challenging task spanning natural language processing (NLP) and computer vision (CV) that has attracted significant attention from researchers. English is a resource-rich language that has seen many developments in datasets and models for visual question answering, whereas other languages still lack comparable resources and models. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address this gap, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ question-answer pairs in three languages (Vietnamese, English, and Japanese) on approximately 5,000 images taken in Vietnam, for evaluating multilingual VQA systems or models. EVJVQA is used as the benchmark dataset for the multilingual visual question answering challenge at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participating teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLEU on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT as the pre-trained vision model and mT5, a powerful pre-trained language model based on the transformer architecture, as the pre-trained language model. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models or systems for visual question answering. We released the challenge on the Codalab evaluation system for further research.
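The challenge reports an F1-score for open-ended answers. A widely used definition for generated answers is token-overlap F1 between the predicted and reference answer, sketched below. The exact tokenization and normalization used by the EVJVQA evaluation are not stated here, so lower-casing plus whitespace splitting is an assumption, and `token_f1` is a hypothetical name.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer.
    Assumes lower-casing and whitespace tokenization."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Dataset-level score: mean F1 over (prediction, reference) pairs.
pairs = [("a red car", "the red car"), ("xe màu đỏ", "xe màu đỏ")]
mean_f1 = sum(token_f1(p, r) for p, r in pairs) / len(pairs)
```

Whitespace splitting happens to suit Vietnamese, whose syllables are space-separated, but Japanese text has no spaces between words, so a real multilingual evaluation would need a language-specific tokenizer for that part of the data.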
Submitted 17 April, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Chemical Mechanical Planarization for Ta-based Superconducting Quantum Devices
Authors:
Ekta Bhatia,
Soumen Kar,
Jakub Nalaskowski,
Tuan Vo,
Stephen Olson,
Hunter Frost,
John Mucci,
Brian Martinick,
Pui Yee Hung,
Ilyssa Wells,
Sandra Schujman,
Satyavolu S. Papa Rao
Abstract:
We report on the development of a chemical mechanical planarization (CMP) process for thick damascene Ta structures with pattern feature sizes down to 100 nm. This CMP process is the core of the fabrication sequence for scalable superconducting integrated circuits at 300 mm wafer scale. This work has established the elements of the various CMP-related design rules that can be followed by a designer for the layout of circuits that include Ta-based coplanar waveguide resonators, capacitors, and interconnects for tantalum-based qubits and single flux quantum (SFQ) circuits. The fabrication of these structures utilizes 193 nm optical lithography, along with 300 mm process tools for dielectric deposition, reactive ion etch, wet-clean, CMP, and in-line metrology, all tools typical of a 300 mm wafer CMOS foundry. Process development was guided by measurements of physical and electrical characteristics of the planarized structures. Physical characterization, such as atomic force microscopy across the 300 mm wafer surface, showed that local topography was less than 5 nm. Electrical characterization confirmed low leakage at room temperature and less than 12% within-wafer sheet resistance variation for damascene Ta line widths ranging from 100 nm to 3 μm. Run-to-run reproducibility was also evaluated. Effects of process integration choices, including the deposited thickness of Ta, are discussed.
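Sheet resistance normalizes a measured line resistance by the line geometry, R_s = R * W / L, so lines of different widths (here 100 nm to 3 μm) can be compared. The abstract does not say how the 12% within-wafer figure is defined; the sketch below assumes a 1-sigma convention (standard deviation over mean), and both helper names are hypothetical. A range-based (max - min) / mean metric is also common and would give a different number for the same data.

```python
import statistics


def sheet_resistance(line_resistance_ohm: float, length_um: float, width_um: float) -> float:
    """R_s = R * W / L: line resistance divided by the number of squares L / W (ohm/sq)."""
    return line_resistance_ohm * width_um / length_um


def within_wafer_variation_pct(values: list[float]) -> float:
    """Assumed 1-sigma non-uniformity: sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical example: a 1000 um long, 2 um wide line measuring 500 ohm spans 500 squares.
rs = sheet_resistance(500.0, 1000.0, 2.0)
spread = within_wafer_variation_pct([0.9, 1.0, 1.1])
```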
Submitted 15 February, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects
Authors:
Thanh Vinh Vo,
Arnab Bhattacharyya,
Young Lee,
Tze-Yun Leong
Abstract:
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated without sharing raw training data among the sources, thus minimizing the risk of privacy leaks. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
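Random Fourier Features (the construction of Rahimi and Recht) replace a shift-invariant kernel with an explicit finite-dimensional feature map, so a kernel model becomes a linear model in those features, and its loss becomes a sum of terms over data points, which is one way it can be split into per-source components. The sketch below is a generic RFF map for the Gaussian (RBF) kernel, not the authors' adaptive transfer algorithm; the kernel choice, lengthscale, and function names are assumptions.

```python
import math
import random


def make_rff(dim: int, num_features: int, lengthscale: float, seed: int = 0):
    """Feature map z with z(x) . z(y) ~= exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = random.Random(seed)
    # Frequencies from the RBF kernel's spectral density N(0, lengthscale^-2 I).
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)] for _ in range(num_features)]
    # Random phases from U[0, 2*pi] make each feature an unbiased kernel estimate.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return feature_map


def rbf(x, y, lengthscale: float) -> float:
    """Exact Gaussian kernel, for comparison with the random-feature approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


phi = make_rff(dim=2, num_features=10_000, lengthscale=1.0)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf(x, y, lengthscale=1.0)
```

The approximation error shrinks roughly like one over the square root of the number of features, so the feature count trades accuracy against communication and compute cost.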
Submitted 31 December, 2022;
originally announced January 2023.
-
Sheaf-theoretic self-filtering network of low-cost sensors for local air quality monitoring: A causal approach
Authors:
Anh-Duy Pham,
Chuong Dinh Le,
Hoang Viet Pham,
Thinh Gia Tran,
Dat Thanh Vo,
Chau Long Tran,
An Dinh Le,
Hien Bich Vo
Abstract:
Sheaf theory, a complex but powerful tool grounded in topology, offers more flexibility and precision than traditional graph theory when modeling relationships between multiple features. In air quality monitoring, this can be very useful for detecting sudden changes in local dust particle density, which can be difficult to measure accurately using commercial instruments. Traditional methods for air quality measurement often rely on calibrating the measurements against public standard instruments or computing a moving average of the measurements over a constant period. However, this can produce an incorrect index at the measurement location, as well as an oversmoothing effect on the signal. In this study, we propose a compact device that uses sheaf theory to detect and count vehicles as a factor causing local air quality changes. By converting the vehicle count into a PM2.5 estimate and propagating it into the PM2.5 index recorded by low-cost air monitoring sensors such as the PMS7003 and BME280, we can achieve real-time self-correction. In addition, the sheaf-theoretic method scales easily to multiple nodes for further filtering. By applying sheaf theory to air quality monitoring, we can overcome the limitations of traditional methods and provide more accurate and reliable results.
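To make the oversmoothing concern concrete, the sketch below applies a constant-window moving average to a hypothetical PM2.5 series containing a short spike, such as one caused by passing traffic. The readings, window size, and function name are illustrative only and do not come from the study.

```python
def moving_average(signal: list[float], window: int) -> list[float]:
    """Mean over each full window of `window` consecutive samples."""
    if window <= 0 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [sum(signal[i:i + window]) / window for i in range(len(signal) - window + 1)]


# Hypothetical readings: a flat background with one short traffic-induced spike.
readings = [12, 12, 12, 42, 12, 12, 12]
smoothed = moving_average(readings, 3)
```

Here the spike of 42 is flattened to 22 and smeared across three output samples, which is the loss of local detail the abstract attributes to fixed-period averaging.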
Submitted 29 December, 2022;
originally announced December 2022.
-
Fronts in the wake of a parameter ramp: slow passage through pitchfork and fold bifurcations
Authors:
Ryan Goh,
Tasso J. Kaper,
Arnd Scheel,
Theodore Vo
Abstract:
This work studies front formation in the Allen-Cahn equation with a parameter heterogeneity which slowly varies in space. In particular, we consider a heterogeneity which mediates the local stability of the zero state and subsequent pitchfork bifurcation to a non-trivial state. For slowly-varying ramps which are either rigidly propagating in time or stationary, we rigorously establish existence and stability of positive, monotone fronts and give leading order expansions for their interface location. For non-zero ramp speeds, and sufficiently small ramp slopes, the front location is determined by the local transition between convective and absolute instability of the base state and leads to an O(1) delay beyond the instantaneous pitchfork location before the system jumps to a nontrivial state. The slow ramp induces a further delay of the interface controlled by a slow-passage through a fold of strong- and weak-stable eigenspaces of the associated linearization. We introduce projective coordinates to de-singularize the dynamics near the trivial state and track relevant invariant manifolds all the way to the fold point. We then use geometric singular perturbation theory and blow-up techniques to locate the desired intersection of invariant manifolds. For stationary ramps, the front is governed by the slow passage through the instantaneous pitchfork bifurcation with inner expansion given by the unique Hastings-McLeod connecting solution of Painlevé's second equation. We once again use geometric singular perturbation theory and blow-up to track invariant manifolds into a neighborhood of the non-hyperbolic point where the ramp passes through zero and to locate intersections.
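The convective-to-absolute transition mentioned above can be located with a short frozen-coefficient calculation. The model form below, a cubic Allen-Cahn equation with a ramp μ moving at speed c, is an assumption for illustration rather than the paper's exact setup; the last line records the standard Hastings-McLeod normalization of Painlevé II.

```latex
% Assumed model: ramp profile mu, slow scale epsilon, and ramp speed c are illustrative
\[ u_t = u_{xx} + \mu\big(\varepsilon (x - c t)\big)\, u - u^3, \qquad 0 < \varepsilon \ll 1. \]
% Linearize about u = 0 with mu frozen, in the comoving frame xi = x - c t
\[ u_t = u_{\xi\xi} + c\, u_{\xi} + \mu\, u, \qquad u \propto e^{\lambda t + \nu \xi}
   \;\Longrightarrow\; \lambda(\nu) = \nu^2 + c\, \nu + \mu. \]
% Pinched double root: lambda'(nu_*) = 0
\[ \nu_* = -\tfrac{c}{2}, \qquad \lambda(\nu_*) = \mu - \tfrac{c^2}{4}. \]
% Stationary ramp (c = 0): Painleve II with Hastings--McLeod asymptotics
\[ q''(s) = s\, q(s) + 2\, q(s)^3, \qquad
   q(s) \sim \operatorname{Ai}(s) \ (s \to +\infty), \qquad
   q(s) \sim \sqrt{-s/2} \ (s \to -\infty). \]
```

So in the comoving frame the zero state is absolutely unstable only where μ > c²/4, rather than where μ > 0; for c ≠ 0 this places the front beyond the instantaneous pitchfork point, consistent with the O(1) delay described above, while for c = 0 the threshold returns to μ = 0.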
Submitted 18 December, 2022;
originally announced December 2022.
-
Scalable, low-cost, and versatile system design for air pollution and traffic density monitoring and analysis
Authors:
Thinh Gia Tran,
Dat Thanh Vo,
Long Chau Tran,
Hoang Viet Pham,
Chuong Dinh Le,
An Dinh Le,
Duy Anh Pham,
Hien Bich Vo
Abstract:
Vietnam requires sustainable urbanization, for which city sensing is used in planning and decision-making. Large cities need portable, scalable, and inexpensive digital technology for this purpose. End-to-end air quality monitoring companies such as AirVisual and Plume Air have shown their reliability with portable devices outfitted with superior air sensors. These devices are pricey, yet homeowners use them to obtain local air data without evaluating the causal effect. Our air quality inspection system is scalable, reasonably priced, and flexible. The system's minicomputer remotely monitors PMS7003 and BME280 sensor data through a microcontroller. The 5-megapixel camera module enables researchers to infer the causal relationship between traffic intensity and dust concentration. The design allows the use of inexpensive, commercial-grade hardware, with Azure Blob storing air pollution data and surrounding-area imagery, which keeps the system from having to expand physically. In addition, by including an air channel that replenishes air and evens out temperature, the design improves ventilation and safeguards electrical components. The device allows for the analysis of the correlation between traffic and air quality data, which might aid in the establishment of sustainable urban development plans and policies.
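The system above reads a PMS7003 particulate sensor. The sketch below decodes one 32-byte frame following the layout in Plantower's PMS7003 datasheet as we understand it (start bytes 0x42 0x4D, a length field, 13 big-endian 16-bit data words, and a 16-bit checksum over the preceding 30 bytes); verify the offsets against the datasheet for your module. The function names and the `build_frame` test helper are hypothetical and are not part of the authors' software.

```python
import struct

FRAME_LEN = 32  # start bytes (2) + length field (2) + 13 data words (26) + checksum (2)
FIELDS = ("pm1_0_cf1", "pm2_5_cf1", "pm10_cf1", "pm1_0_atm", "pm2_5_atm", "pm10_atm")


def parse_pms7003(frame: bytes) -> dict:
    """Decode the six mass-concentration words (ug/m^3) of one PMS7003 frame."""
    if len(frame) != FRAME_LEN or frame[0] != 0x42 or frame[1] != 0x4D:
        raise ValueError("not a PMS7003 frame")
    # Checksum: low 16 bits of the sum of every byte before the checksum field.
    (checksum,) = struct.unpack(">H", frame[30:32])
    if sum(frame[:30]) & 0xFFFF != checksum:
        raise ValueError("checksum mismatch")
    # Big-endian 16-bit words: the length field first, then 13 data words.
    words = struct.unpack(">14H", frame[2:30])
    return dict(zip(FIELDS, words[1:7]))


def build_frame(data_words: list[int]) -> bytes:
    """Hypothetical test helper: assemble a valid frame from 13 data words."""
    body = struct.pack(">2BH13H", 0x42, 0x4D, 28, *data_words)
    return body + struct.pack(">H", sum(body) & 0xFFFF)


frame = build_frame([10, 25, 30, 11, 26, 31] + [0] * 7)
reading = parse_pms7003(frame)
```

The `cf1` fields correspond to the datasheet's CF=1 values and the `atm` fields to its "under atmospheric environment" values; which of the two a deployment should log is a choice the abstract does not specify.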
Submitted 8 December, 2022;
originally announced December 2022.
-
Perspectives on Novel Refractory Amorphous High-Entropy Alloys in Extreme Environments
Authors:
Matheus A. Tunes,
Hi T. Vo,
Jon K. S. Baldwin,
Tarik A. Saleh,
Saryu J. Fensin,
Osman El-Atwani
Abstract:
Two new refractory amorphous high-entropy alloys (RAHEAs) within the W–Ta–Cr–V and W–Ta–Cr–V–Hf systems were synthesized using magnetron sputtering and tested under high-temperature annealing and displacing irradiation using in situ transmission electron microscopy. While the WTaCrV RAHEA was found to be unstable under such tests, adding Hf to this system to form a new quinary WTaCrVHf RAHEA was found to be a route to achieving stability under both annealing and irradiation. A new effect of nanoprecipitate reassembly observed within the WTaCrVHf RAHEA under irradiation indicates that a duplex microstructure, composed of an amorphous matrix with crystalline nanometer-sized precipitates, enhances the radiation response of the system. It is demonstrated that tunable chemical complexity emerges as a new alloy design strategy to foster the use of novel RAHEAs in extreme environments. New perspectives for the alloy design and application of chemically complex amorphous metallic alloys in extreme environments are presented, with a focus on their thermodynamic phase stability when subjected to high-temperature annealing and displacing irradiation.
Submitted 17 November, 2022;
originally announced November 2022.
-
UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese
Authors:
Nghia Hieu Nguyen,
Duong T. D. Vo,
Kiet Van Nguyen
Abstract:
Recognizing handwriting images is challenging due to the vast variation in writing style across people and the distinct linguistic aspects of written languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that confuse state-of-the-art handwriting recognition methods. Moreover, because Vietnamese is a low-resource language, there are few datasets for researching handwriting recognition in it, which creates a barrier for researchers approaching the problem. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset, constructed by connecting pen-stroke coordinates without further processing. This approach cannot effectively measure the ability of recognition methods, as such images are trivial and may lack features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that captures the crucial natural attributes of offline handwriting images. Using our method, we provide the first high-quality synthetic dataset that is complex and natural enough to evaluate handwriting recognition methods efficiently. In addition, we conduct experiments with various state-of-the-art methods to identify the challenges that remain for handwriting recognition in Vietnamese.
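Handwriting recognizers are commonly scored with character error rate (CER), the edit distance between reference and prediction divided by the reference length. For Vietnamese, Unicode normalization changes that computation: a letter such as ệ is one code point in NFC but a base letter plus two combining marks in NFD. The following sketch, with hypothetical function names, shows the effect; the abstract does not state which metric or normalization the authors used.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str, form: str = "NFC") -> float:
    """Character error rate after normalizing both strings to the same Unicode form."""
    ref = unicodedata.normalize(form, reference)
    hyp = unicodedata.normalize(form, hypothesis)
    return levenshtein(ref, hyp) / max(len(ref), 1)


# "việt" written with the precomposed letter U+1EC7; the hypothesis drops both marks.
nfc_score = cer("vi\u1ec7t", "viet")          # one substitution over 4 characters
nfd_score = cer("vi\u1ec7t", "viet", "NFD")   # two deletions over 6 code points
```

Reporting the normalization form alongside CER keeps Vietnamese results comparable across papers.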
Submitted 10 November, 2022;
originally announced November 2022.
-
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Authors:
Nghia Hieu Nguyen,
Duong T. D. Vo,
Minh-Quan Ha
Abstract:
Image captioning is a challenging task that requires both understanding visual information and using human language to describe that information. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based methods by extending the Object Relation Transformer architecture with the Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms its original structure on both the public and private tests of the Image Captioning shared task held by VLSP.
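Attention on Attention, as introduced by Huang et al. for image captioning, passes the result of ordinary attention through an information vector and a sigmoid gate, both computed from the query and the attended vector, and multiplies them element-wise. The sketch below is a single-head, pure-Python version with weights supplied by the caller; how ObjectAoA wires this into the Object Relation Transformer is not described in the abstract, so it should not be read as the authors' implementation.

```python
import math


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def aoa(q, k, v, w_qi, w_vi, b_i, w_qg, w_vg, b_g):
    """Attention on Attention: gate * information, both from the query and attended vector."""
    v_hat = attention(q, k, v)
    # Information vector i = q W_qi + v_hat W_vi + b_i.
    info = [[a + b + c for a, b, c in zip(r1, r2, b_i)]
            for r1, r2 in zip(matmul(q, w_qi), matmul(v_hat, w_vi))]
    # Attention gate g = sigmoid(q W_qg + v_hat W_vg + b_g).
    gate = [[1.0 / (1.0 + math.exp(-(a + b + c))) for a, b, c in zip(r1, r2, b_g)]
            for r1, r2 in zip(matmul(q, w_qg), matmul(v_hat, w_vg))]
    return [[g * i for g, i in zip(g_row, i_row)] for g_row, i_row in zip(gate, info)]
```

The gate lets the model suppress an attended vector that is irrelevant to the query, instead of always passing a weighted average forward.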
Submitted 20 March, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Edge, Fog, and Cloud Computing : An Overview on Challenges and Applications
Authors:
Thong Vo,
Pranjal Dave,
Gaurav Bajpai,
Rasha Kashef
Abstract:
With the rapid growth of the Internet of Things (IoT) and a wide range of mobile devices, the conventional cloud computing paradigm faces significant challenges (high latency, bandwidth cost, etc.). Motivated by those constraints and concerns for the future of the IoT, modern architectures are gearing toward distributing cloud computational resources to remote locations where most end devices are located. Edge and fog computing are considered key enablers for applications where centralized cloud-based solutions are not suitable. In this paper, we review the high-level definitions of edge, fog, and cloud computing and their configurations in various IoT scenarios. We further discuss their interactions and collaborations in applications such as cloud offloading, smart cities, health care, and smart agriculture. Although challenges remain in the development of such distributed systems, early research tackling those limitations has also surfaced.
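Offloading decisions are often framed as a latency comparison between running a task on the device and shipping its input to a remote tier. The sketch below uses a deliberately simple, hypothetical model (upload time plus remote compute time plus one round-trip time, ignoring queuing, energy, and result download); the function name and tier parameters are made up for illustration and are not taken from the survey.

```python
# Hypothetical tiers: (processor speed in Hz, uplink bandwidth in bit/s, round-trip time in s).
TIERS = {
    "edge": (4e9, 1e8, 0.005),
    "cloud": (4e10, 2e7, 0.12),
}


def best_tier(cycles: float, data_bits: float, local_hz: float, tiers: dict):
    """Return the execution site with the lowest estimated latency, plus all estimates."""
    estimates = {"device": cycles / local_hz}
    for name, (remote_hz, bandwidth_bps, rtt_s) in tiers.items():
        # Upload the input, compute remotely, and pay one network round trip.
        estimates[name] = data_bits / bandwidth_bps + cycles / remote_hz + rtt_s
    return min(estimates, key=estimates.get), estimates


light = best_tier(2e8, 8e6, 1e9, TIERS)[0]    # moderate compute, moderate input
heavy = best_tier(4e10, 8e5, 1e9, TIERS)[0]   # heavy compute, small input
tiny = best_tier(1e7, 8e6, 1e9, TIERS)[0]     # trivial compute, large input
```

With these numbers the moderate task goes to the edge, the compute-heavy task goes to the cloud, and the trivial task stays on the device, illustrating why edge and fog tiers complement rather than replace the cloud.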
Submitted 3 November, 2022;
originally announced November 2022.
-
An innovative materials design protocol for the development of novel refractory high-entropy alloys for extreme environments
Authors:
O. El Atwani,
H. T. Vo,
M. Tunes,
C. Lee,
A. Alvarado,
N. Krienke,
J. D. Poplawsky,
A. A. Kohnert,
J. Gigax,
W. -Y. Chen,
M. Li,
Y. Wang,
J. S. Wróbel,
Duc Nguyen-Manh,
J. K. S. Baldwin,
U. Tukac,
E. Aydogan,
S. Fensin,
E. Martinez
Abstract:
In the quest for new materials that can withstand severe irradiation and mechanical extremes for advanced applications (e.g., fission reactors, fusion devices, and space applications), the design, prediction, and control of advanced materials beyond current material designs have become a paramount goal. Here, through a combined experimental and simulation methodology, the design of a new nanocrystalline refractory high-entropy alloy (RHEA) system is established. Compositions of this alloy, assessed under extreme environments and with in situ electron microscopy, revealed high mechanical strength and thermal stability, grain refinement under heavy-ion irradiation, and outstanding resistance to dual-beam irradiation and helium implantation, marked by remarkable resistance to defect generation, growth, and coalescence. The experimental and modeling results, which showed notable agreement, can be applied to design and rapidly assess other alloys subjected to extreme environmental conditions.
Submitted 28 October, 2022;
originally announced October 2022.
-
Radiation-resistant aluminium alloy for space missions in the extreme environment of the solar system
Authors:
Patrick D. Willenshofer,
Matheus A. Tunes,
Ho T. Vo,
Lukas Stemper,
Oliver Renk,
Graeme Greaves,
Peter J. Uggowitzer,
Stefan Pogatscher
Abstract:
Future human exploration of our solar system requires new materials that can resist harsh environments. Age-hardenable aluminium alloys would be attractive candidates for structural components in long-distance spacecraft, but their resistance to solar energetic particle radiation is insufficient. Common hardening phases dissolve and displacement damage occurs in the alloy matrix, which strongly degrades properties. Here we present an alloy in which hardening is achieved by the T-phase, which features a giant unit cell and a highly negative enthalpy of formation. The phase shows record radiation survivability and can stabilize an ultrafine-grained structure in the alloy under elevated temperature and irradiation, thereby successfully preventing displacement damage from occurring. Such a concept can be considered ideal for next-generation space materials and for the design of radiation-resistant alloys.
Submitted 12 October, 2022; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Structural mean models for instrumented difference-in-differences
Authors:
Tat-Thang Vo,
Ting Ye,
Ashkan Ertefaie,
Samrat Roy,
James Flory,
Sean Hennessy,
Stijn Vansteelandt,
Dylan S. Small
Abstract:
In the standard difference-in-differences research design, the parallel trends assumption may be violated when the relationship between the exposure trend and the outcome trend is confounded by unmeasured confounders. Progress can be made if there is an exogenous variable that (i) does not directly influence the change in outcome means (i.e., the outcome trend) except through influencing the change in exposure means (i.e., the exposure trend), and (ii) is not related to the unmeasured exposure-outcome confounders on the trend scale. Such an exogenous variable is called an instrument for difference-in-differences. For continuous outcomes that lend themselves to linear modelling, so-called instrumented difference-in-differences methods have been proposed. In this paper, we suggest novel multiplicative structural mean models for instrumented difference-in-differences, which allow one to identify and estimate the average treatment effect on count and rare binary outcomes, in the whole population or among the treated, when a valid instrument for difference-in-differences is available. We discuss the identifiability of these models, then develop efficient semi-parametric estimation approaches that allow the use of flexible, data-adaptive or machine learning methods to estimate the nuisance parameters. We apply our proposal to health care data to investigate the risk of moderate to severe weight gain under sulfonylurea treatment compared to metformin treatment, among new users of antihyperglycemic drugs.
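For the linear case mentioned above, a binary instrument gives a simple Wald-type ratio: the instrument's effect on the outcome trend divided by its effect on the exposure trend. The sketch below computes that sample ratio from per-unit trends; it is not the multiplicative structural mean model proposed in the paper, which targets count and rare binary outcomes and uses semi-parametric estimators, and the function name is hypothetical.

```python
from statistics import mean


def idid_wald(z, delta_y, delta_d):
    """Wald-type instrumented DiD ratio for a binary instrument z (linear, additive case).

    delta_y: per-unit change in outcome between periods (outcome trend)
    delta_d: per-unit change in exposure between periods (exposure trend)
    """
    def group_mean(values, level):
        return mean(val for zi, val in zip(z, values) if zi == level)

    numerator = group_mean(delta_y, 1) - group_mean(delta_y, 0)
    denominator = group_mean(delta_d, 1) - group_mean(delta_d, 0)
    if denominator == 0:
        raise ValueError("instrument does not shift the exposure trend")
    return numerator / denominator


effect = idid_wald(z=[1, 1, 0, 0], delta_y=[3.0, 3.0, 1.0, 1.0], delta_d=[1.0, 1.0, 0.0, 0.0])
```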
Submitted 21 September, 2022;
originally announced September 2022.
-
UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
Authors:
Triet Minh Thai,
Ngan Ha-Thao Chu,
Anh Tuan Vo,
Son T. Luu
Abstract:
Over the last two years, from 2020 to 2021, COVID-19 has overwhelmed disease prevention measures in many countries, including Vietnam, and negatively impacted various aspects of human life and the social community. In addition, misleading information and fake news about the pandemic circulating in the community are also serious problems. Therefore, we present the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems, called UIT-ViCoV19QA. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of our dataset and establish benchmark results for further research using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effects of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field.
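ROUGE-L, one of the metrics listed above, scores a generated answer by the longest common subsequence (LCS) it shares with a reference. Because each question can have up to four paraphrased answers, the sketch below takes the best score over all references, which is one common convention; whitespace tokenization (Vietnamese syllables), the balanced F1 weighting, and the function names are assumptions.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Balanced ROUGE-L F-measure from LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_rouge_l(candidate: str, references: list[str]) -> float:
    """Score against several paraphrased answers and keep the best match."""
    return max(rouge_l_f1(candidate, ref) for ref in references)


score = best_rouge_l("the cat sat", ["a dog ran", "the cat sat down"])
```

Taking the maximum rewards a model for matching any one valid paraphrase, which is consistent with the abstract's observation that multiple paraphrased answers help.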
Submitted 14 September, 2022;
originally announced September 2022.
-
Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese
Authors:
Thao T. B. Nguyen,
Tam M. Vo,
Thang V. Nguyen,
Hieu H. Pham,
Ha Q. Nguyen
Abstract:
We propose a data collection and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by producing annotations that closely match their endemic diagnosis categories, which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with various model architectures to select the one that delivers the best performance. Our best model (CheXpert-pretrained EfficientNet-B2) yields an F1-score of 0.6989 (95% CI 0.6740, 0.7240), AUC of 0.7912, sensitivity of 0.7064, and specificity of 0.8760 for abnormal diagnosis in general. Finally, we demonstrate that our coarse classification (based on five specific locations of abnormalities) yields results comparable to fine classification (twelve pathologies) on the benchmark CheXpert dataset for general anomaly detection, while delivering better average performance across all classes.
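The sensitivity, specificity, and F1 figures above all derive from the binary confusion matrix of the abnormal-versus-normal decision. The following minimal sketch shows those standard definitions; the function name is hypothetical and the counts in the example are invented for illustration, unrelated to the study's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),       # true positive rate (recall)
        "specificity": tn / (tn + fp),       # true negative rate
        "precision": tp / (tp + fp),         # positive predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


m = binary_metrics(tp=7, fp=2, tn=8, fn=3)
```

Because specificity ignores false negatives and precision ignores true negatives, a model can score well on one while missing many abnormal studies, which is why the abstract reports several metrics together.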
Submitted 11 September, 2022;
originally announced September 2022.
-
Heterogeneity assessment in causal data fusion problems
Authors:
Tat-Thang Vo,
Kara E. Rudolph,
Ivan Diaz
Abstract:
Previous works have formalized the conditions under which findings from a source population can reasonably be extrapolated to a target population, the so-called "transportability" problem. While most of these works focus on a setting with two populations, many recent works have also established the identifiability of a causal parameter when multiple data sources are available, under certain homogeneity assumptions. However, we know of little work examining transportability when data sources are possibly heterogeneous, e.g., in the distribution of mediators of the exposure-outcome relation. The presence of such heterogeneity generally invalidates the transportability assumption required in most of the literature. In this paper, we propose a general approach for heterogeneity assessment when estimating the average exposure effect in a target population, with mediator and outcome data obtained from multiple external sources. To account for heterogeneity, we define different effect estimands when the mediator and outcome information is transported from different sources. We discuss the causal assumptions to identify these estimands, then propose efficient semi-parametric estimation strategies that allow the use of flexible data-adaptive machine learning methods to estimate the nuisance parameters. We also propose two new methods to investigate sources of heterogeneity in the transported estimates. These methods will inform users about how much of the observed statistical heterogeneity in the transported effects is due to differences across data sources in: 1) the conditional distribution of mediator variables, and/or 2) the conditional distribution of the outcome. We illustrate the proposed methods using four sites that were part of the Moving to Opportunity Study, an experiment that randomized housing voucher receipt to participating families living in public housing.
Submitted 10 August, 2022;
originally announced August 2022.
-
Value-Offset Bifiltrations for Digital Images
Authors:
Anway De,
Thong Vo,
Matthew Wright
Abstract:
Persistent homology, an algebraic method for discerning structure in abstract data, relies on the construction of a sequence of nested topological spaces known as a filtration. Two-parameter persistent homology allows the analysis of data simultaneously filtered by two parameters, but requires a bifiltration -- a sequence of topological spaces simultaneously indexed by two parameters. To apply two-parameter persistence to digital images, we first must consider bifiltrations constructed from digital images, which have scarcely been studied. We introduce the value-offset bifiltration for grayscale digital image data. We present efficient algorithms for computing this bifiltration with respect to the taxicab distance and for approximating it with respect to the Euclidean distance. We analyze the runtime complexity of our algorithms, demonstrate the results on sample images, and contrast the bifiltrations obtained from real images with those obtained from random noise.
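The abstract does not define the value-offset bifiltration, so the sketch below assumes one natural reading: for a grayscale threshold v and an offset r, a pixel is present when it lies within taxicab distance r of some pixel whose value is at most v. Under that assumption, a multi-source breadth-first search on the 4-connected pixel grid yields the taxicab distances for one threshold. The function names are hypothetical, and this is not the authors' algorithm.

```python
from collections import deque


def taxicab_offset_distances(image, threshold):
    """Taxicab distance from each pixel to the nearest pixel with value <= threshold."""
    rows, cols = len(image), len(image[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every pixel in the sublevel set of the threshold.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source BFS on the 4-connected grid: shortest path length equals taxicab distance.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == inf:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def in_bifiltration(image, value, offset):
    """Boolean mask of pixels present at index (value, offset) under the assumed definition."""
    return [[d <= offset for d in row] for row in taxicab_offset_distances(image, value)]


image = [[9, 9, 9], [9, 0, 9], [9, 9, 9]]
distances = taxicab_offset_distances(image, 0)
mask = in_bifiltration(image, 0, 1)
```

Raising either the threshold or the offset can only add pixels, so the masks are nested in both parameters, which is the property a bifiltration requires.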
Submitted 6 July, 2022;
originally announced July 2022.