Search | arXiv e-print repository

arXiv:2407.20249

Revisiting the Disequilibrium Issues in Tackling Heart Disease Classification Tasks

Authors: Thao Hoang, Linh Nguyen, Khoi Do, Duong Nguyen, Viet Dung Nguyen

Abstract: In the field of heart disease classification, two primary obstacles arise. First, existing Electrocardiogram (ECG) datasets consistently exhibit imbalances and biases across various modalities. Second, these time-series data consist of diverse lead signals, causing Convolutional Neural Networks (CNNs) to overfit to the leads with higher power and hence diminishing the performance of the Deep Learning (DL) process. In addition, when the dataset is imbalanced, models trained on such high-dimensional data are susceptible to overfitting. Despite these evident challenges, current efforts predominantly focus on enhancing DL models by designing novel architectures, seemingly overlooking the core issues and thereby hindering advancements in heart disease classification. To address these obstacles, we introduce two straightforward methods to enhance classification. To address the high-dimensionality issue, we employ a Channel-wise Magnitude Equalizer (CME) on signal-encoded images. This approach reduces redundancy in the range of the feature data, highlighting changes in the dataset. Simultaneously, to counteract data imbalance, we propose the Inverted Weight Logarithmic Loss (IWL). Applying IWL increases the accuracy of state-of-the-art (SOTA) models by up to 5% on the CPSC2018 dataset. CME combined with IWL also surpasses the classification results of other baseline models by 5% to 10%.
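The abstract does not give the IWL formula. As a minimal sketch of the general idea only (class weights that grow logarithmically as a class gets rarer, plugged into a weighted negative log-likelihood), the block below assumes the hypothetical weight w_c = log(1 + N / n_c), where N is the number of samples and n_c the count of class c. This is an illustrative assumption, not the authors' loss; the class names in the usage are also hypothetical.

```python
import math
from collections import Counter


def inverted_log_weights(labels):
    """Hypothetical class weights w_c = log(1 + N / n_c).

    Rarer classes (small n_c) receive larger weights; the log keeps the
    weights from exploding for extremely rare classes.
    """
    counts = Counter(labels)
    n = len(labels)
    return {c: math.log(1.0 + n / k) for c, k in counts.items()}


def weighted_nll(probs, labels, weights):
    """Mean class-weighted negative log-likelihood over a batch.

    probs[i] maps each class to the predicted probability for sample i.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Usage with a toy imbalanced label set (hypothetical class names).
w = inverted_log_weights(["N", "N", "N", "AF"])
print(w)  # the rare "AF" class gets the larger weight
```

Using a logarithm rather than the raw inverse frequency is one common way to soften reweighting so that a handful of rare samples do not dominate the gradient.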

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.20247

How Homogenizing the Channel-wise Magnitude Can Enhance EEG Classification Model?

Authors: Huyen Ngo, Khoi Do, Duong Nguyen, Viet Dung Nguyen, Lan Dang

Abstract: A significant challenge in electroencephalogram (EEG) classification lies in the fact that current data representations involve multiple electrode signals, resulting in data redundancy and dominant lead information. However, extensive research on EEG classification focuses on designing model architectures without tackling these underlying issues. Moreover, there has been a notable gap in addressing data preprocessing for EEG, leading to considerable computational overhead in Deep Learning (DL) processes. In light of these issues, we propose a simple yet effective approach for EEG data preprocessing. Our method first transforms the EEG data into an encoded image using an Inverted Channel-wise Magnitude Homogenization (ICWMH) to mitigate inter-channel biases. Next, we apply an edge detection technique to the EEG-encoded image, combined with a skip connection, to emphasize the most significant transitions in the data while preserving structural and invariant information. In this way, we can improve the EEG learning process efficiently without using a huge DL network. Our experimental evaluations show significant improvements (from 2% to 5%) over current baselines.
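To make the two preprocessing steps concrete, the sketch below treats a recording as a channels-by-time matrix and (1) rescales each channel by its own peak magnitude so no electrode dominates, then (2) adds a first-order temporal difference (a very simple edge detector) back onto the signal as a skip connection. Both the per-channel peak normalization and the difference-based edge operator are hypothetical stand-ins chosen for illustration; the abstract does not specify the exact ICWMH transform or edge detector.

```python
def homogenize_channels(signal, eps=1e-8):
    """Scale each channel (row) by its maximum absolute value.

    After this step every electrode lies roughly in [-1, 1], so a
    high-power channel no longer dominates the others. (Hypothetical
    stand-in for ICWMH.)
    """
    out = []
    for ch in signal:
        peak = max(abs(v) for v in ch)
        out.append([v / (peak + eps) for v in ch])
    return out


def edges_with_skip(signal):
    """Add a first-order temporal difference back onto the input.

    The difference acts as a simple edge detector that highlights sharp
    transitions; adding it to the original values (skip connection) keeps
    the underlying structure. The first sample gets a zero edge response.
    """
    out = []
    for ch in signal:
        diffs = [0.0] + [ch[t] - ch[t - 1] for t in range(1, len(ch))]
        out.append([x + d for x, d in zip(ch, diffs)])
    return out


# Usage: two channels with very different amplitudes.
eeg = [[0.0, 2.0, -4.0], [0.1, 0.05, 0.0]]
print(edges_with_skip(homogenize_channels(eeg)))
```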

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.18839

Scalable Group Choreography via Variational Phase Manifold Learning

Authors: Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Abstract: Generating group dance motion from music is a challenging task with several industrial applications. Although several methods have been proposed to tackle this problem, most of them prioritize optimizing the fidelity of dance movements and are constrained by the predetermined dancer counts in datasets. This limitation impedes adaptability to real-world applications. Our study addresses the scalability problem in group choreography while preserving naturalness and synchronization. In particular, we propose a phase-based variational generative model that learns a generative manifold for group dance generation. Our method achieves high-fidelity group dance motion and enables generation with an unlimited number of dancers while consuming only a minimal and constant amount of memory. Extensive experiments on two public datasets show that our proposed method outperforms recent state-of-the-art approaches by a large margin and scales to a much larger number of dancers than seen in the training data.

Submitted 31 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2406.07124

CHARME: A chain-based reinforcement learning approach for the minor embedding problem

Authors: Hoang M. Ngo, Nguyen H K. Do, Minh N. Vu, Tamer Kahveci, My T. Thai

Abstract: Quantum Annealing (QA) holds great potential for solving combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms heavily relies on embedding problem instances, represented as logical graphs, into the quantum processing unit (QPU), whose topology is a graph with limited connectivity; this is known as the minor embedding problem. Existing methods for the minor embedding problem suffer from scalability issues when confronted with larger problem sizes. In this paper, we propose CHARME, a novel approach that uses Reinforcement Learning (RL) techniques to address the minor embedding problem. CHARME includes three key components: a Graph Neural Network (GNN) architecture for policy modeling, a state transition algorithm ensuring solution validity, and an order exploration strategy for effective training. Through comprehensive experiments on synthetic and real-world instances, we demonstrate the efficiency of our proposed order exploration strategy as well as of our proposed RL framework, CHARME. In detail, CHARME yields superior solutions compared to fast embedding methods such as Minorminer and ATOM. Moreover, our method surpasses the OCT-based approach, known for its slower runtime but high-quality solutions, in several cases. In addition, our proposed exploration strategy improves the efficiency of training the CHARME framework by providing better solutions than the greedy strategy.
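CHARME's state transition algorithm is meant to keep solutions valid. To make "valid" concrete, the sketch below checks the standard conditions for a minor embedding: every logical node maps to a non-empty chain of physical qubits, chains are pairwise disjoint, each chain is connected in the hardware graph, and every logical edge is realized by at least one physical coupler between the two chains. This is a generic checker written for illustration, not code from CHARME; the function name and the dict-of-lists chain format are assumptions.

```python
def is_valid_minor_embedding(logical_edges, physical_edges, chains):
    """Check the standard minor-embedding conditions.

    logical_edges:  iterable of (u, v) pairs in the problem graph.
    physical_edges: iterable of (p, q) couplers in the hardware graph.
    chains:         dict mapping each logical node to a list of qubits
                    (hypothetical format chosen for this sketch).
    """
    couplers = {frozenset(e) for e in physical_edges}
    adj = {}
    for a, b in physical_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # 1) Chains must be non-empty and pairwise disjoint.
    used = set()
    for qubits in chains.values():
        if not qubits or used & set(qubits):
            return False
        used |= set(qubits)

    # 2) Each chain must induce a connected subgraph (search inside the chain).
    for qubits in chains.values():
        members = set(qubits)
        stack, reached = [qubits[0]], {qubits[0]}
        while stack:
            q = stack.pop()
            for nb in adj.get(q, ()):
                if nb in members and nb not in reached:
                    reached.add(nb)
                    stack.append(nb)
        if reached != members:
            return False

    # 3) Every logical edge needs at least one coupler between its two chains.
    for u, v in logical_edges:
        if not any(frozenset((p, q)) in couplers
                   for p in chains.get(u, []) for q in chains.get(v, [])):
            return False
    return True


# Usage: a triangle cannot map one-to-one onto a 4-cycle, but it fits
# once logical node "c" is represented by the two-qubit chain [2, 3].
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_valid_minor_embedding(triangle, square, {"a": [0], "b": [1], "c": [2, 3]}))
```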

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.01557

Bayesian compositional regression with flexible microbiome feature aggregation and selection

Authors: Satabdi Saha, Liangliang Zhang, Kim-Anh Do, Christine B. Peterson

Abstract: Ongoing advances in microbiome profiling have allowed unprecedented insights into the molecular activities of microbial communities. This has fueled a strong scientific interest in understanding the critical role the microbiome plays in governing human health, by identifying microbial features associated with clinical outcomes of interest. Several aspects of microbiome data limit the applicability of existing variable selection approaches. In particular, microbiome data are high-dimensional, extremely sparse, and compositional. Importantly, many of the observed features, although categorized as different taxa, may play related functional roles. To address these challenges, we propose a novel compositional regression approach that leverages the data-adaptive clustering and variable selection properties of the spiked Dirichlet process to identify taxa that exhibit similar functional roles. Our proposed method, Bayesian Regression with Agglomerated Compositional Effects using a dirichLET process (BRACElet), enables the identification of a sparse set of features with shared impacts on the outcome, facilitating dimension reduction and model interpretation. We demonstrate that BRACElet outperforms existing approaches for microbiome variable selection through simulation studies and an application elucidating the impact of oral microbiome composition on insulin resistance.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.16388

Multi-Reference Preference Optimization for Large Language Models

Authors: Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh

Abstract: How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preferences on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is that it is designed for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models, substantially enhancing preference learning capabilities compared to single-reference DPO. Our experiments demonstrate that LLMs finetuned with MRPO generalize better across various preference datasets, regardless of data scarcity or abundance. Furthermore, MRPO effectively finetunes LLMs to achieve superior performance on several downstream natural language processing tasks such as GSM8K and TruthfulQA.
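For reference, the single-reference baseline that MRPO generalizes is the DPO loss for one preference pair: -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). The sketch below implements only this standard single-reference loss from sequence log-probabilities; how MRPO combines several reference models is the paper's contribution and is not reproduced here. The default β = 0.1 is an illustrative choice.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Single-reference DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (y_w) and rejected
    (y_l) responses under the policy and under the reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))  # policy favors the chosen answer: loss < log 2
```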

Submitted 25 May, 2024; originally announced May 2024.

Comments: 20 pages

arXiv:2404.11870

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

Authors: Hung Le, Dung Nguyen, Kien Do, Svetha Venkatesh, Truyen Tran

Abstract: We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM's exceptional length extrapolation capabilities and improved performance in tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformers achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering, and machine translation tasks.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.05393

PAT: Pixel-wise Adaptive Training for Long-tailed Segmentation

Authors: Khoi Do, Duong Nguyen, Nguyen H. Tran, Viet Dung Nguyen

Abstract: Beyond class frequency, we recognize the impact of class-wise relationships among various class-specific predictions and of the imbalance in label masks on long-tailed segmentation learning. To address these challenges, we propose Pixel-wise Adaptive Training (PAT), a technique tailored for long-tailed segmentation. PAT has two key features: 1) class-wise gradient magnitude homogenization, and 2) pixel-wise class-specific loss adaptation (PCLA). First, class-wise gradient magnitude homogenization helps alleviate the imbalance among label masks by ensuring equal consideration of each class's impact on model updates. Second, PCLA tackles the detrimental impact of both rare classes within the long-tailed distribution and inaccurate predictions from previous training stages by encouraging the learning of classes with low prediction confidence and guarding against forgetting classes with high confidence. This combined approach fosters robust learning while preventing the model from forgetting previously learned knowledge. PAT exhibits significant performance improvements, surpassing the current state of the art by 2.2% on the NYU dataset. Moreover, it enhances overall pixel-wise accuracy by 2.85% and intersection over union by 2.07%, with a particularly notable decline of 0.39% in detecting rare classes compared to Balance Logits Variation, as demonstrated on three popular datasets: OxfordPetIII, CityScape, and NYU.

Submitted 10 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.09986

doi 10.1145/3613904.3642614

Designing Sousveillance Tools for Gig Workers

Authors: Maya De Los Santos, Kimberly Do, Michael Muller, Saiph Savage

Abstract: As independently contracted employees, gig workers disproportionately suffer the consequences of workplace surveillance, which include increased pressure to work, breaches of privacy, and decreased digital autonomy. Despite the negative impacts of workplace surveillance, gig workers lack the tools, strategies, and workplace social support to protect themselves against these harms. Meanwhile, some critical theorists have proposed sousveillance as a potential means of countering such abuses of power, whereby those under surveillance monitor those in positions of authority (e.g., gig workers collect data about requesters/platforms). To understand the benefits of sousveillance systems in the gig economy, we conducted semi-structured interviews and led co-design activities with gig workers. We use "care ethics" as a guiding concept to understand our interview and co-design data, while also focusing on empathic sousveillance technology design recommendations. Through our study, we identify gig workers' attitudes towards and past experiences with sousveillance. We also uncover the types of sousveillance technologies imagined by workers, provide design recommendations, and finish by discussing how to create empowering, empathic spaces on gig platforms.

Submitted 23 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Published as a conference paper at the ACM Conference on Human Factors in Computing Systems, CHI 2024, 3 figures, 30 pages

arXiv:2403.09875

Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting

Authors: Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III

Abstract: In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widely used in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable for directly supervising a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process: it is first coarsely aligned with a depth camera and then finely adjusted to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, the variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and a RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as on reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs
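The abstract names a "variance weighted depth supervised loss" without giving its formula. A minimal sketch of the general idea, assuming an inverse-variance weighting of per-pixel squared depth errors against the fused depth map, is shown below. The specific form (weights 1 / (variance + eps), normalized by their sum) is an assumption for illustration and may differ from the paper's loss; depths are passed as flat lists of pixels.

```python
def variance_weighted_depth_loss(pred, target, variance, eps=1e-6):
    """Inverse-variance weighted squared depth error (hypothetical form).

    pred:     rendered depths, one per pixel.
    target:   fused depth map values, one per pixel.
    variance: per-pixel uncertainty of the fused depth; pixels the
              fused map is unsure about contribute less to the loss.
    """
    num = 0.0
    den = 0.0
    for d, t, v in zip(pred, target, variance):
        w = 1.0 / (v + eps)   # confident pixels (low variance) weigh more
        num += w * (d - t) ** 2
        den += w
    return num / den          # normalize so the loss scale is comparable


# The same depth error costs far less where the fused map is uncertain.
print(variance_weighted_depth_loss([1.0, 2.0], [1.0, 3.0], [1.0, 1.0]))    # about 0.5
print(variance_weighted_depth_loss([1.0, 2.0], [1.0, 3.0], [1.0, 100.0]))  # much smaller
```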

Submitted 18 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 8 pages, 7 figures

arXiv:2403.08997

Caltech Aerial RGB-Thermal Dataset in the Wild

Authors: Connor Lee, Matthew Anderson, Nikhil Raganathan, Xingxing Zuo, Kevin Do, Georgia Gkioxari, Soon-Jo Chung

Abstract: We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments. Our dataset captures a variety of terrain across the United States, including rivers, lakes, coastlines, deserts, and forests, and consists of synchronized RGB, thermal, global positioning, and inertial data. We provide semantic segmentation annotations for 10 classes commonly encountered in natural settings in order to drive the development of perception algorithms robust to adverse weather and nighttime conditions. Using this dataset, we propose new and challenging benchmarks for thermal and RGB-thermal (RGB-T) semantic segmentation, RGB-T image translation, and motion tracking. We present extensive results using state-of-the-art methods and highlight the challenges posed by temporal and geographical domain shifts in our data. The dataset and accompanying code are available at https://github.com/aerorobotics/caltech-aerial-rgbt-dataset.

Submitted 31 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024

arXiv:2402.03577

Revisiting the Dataset Bias Problem from a Statistical Perspective

Authors: Kien Do, Dung Nguyen, Hung Le, Thao Le, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh

Abstract: In this paper, we study the "dataset bias" problem from a statistical standpoint and identify its main cause as a strong correlation between a class attribute u and a non-class attribute b in the input x, represented by p(u|b) differing significantly from p(u). Since p(u|b) appears as part of the sampling distributions in the standard maximum log-likelihood (MLL) objective, a model trained on a biased dataset via MLL inherently incorporates this correlation into its parameters, leading to poor generalization to unbiased test data. Based on this observation, we propose to mitigate dataset bias either by weighting the objective of each sample n by 1/p(u_n|b_n) or by sampling that sample with probability proportional to 1/p(u_n|b_n). While both methods are statistically equivalent, the former proves more stable and effective in practice. Additionally, we establish a connection between our debiasing approach and causal reasoning, reinforcing our method's theoretical foundation. However, when the bias label is unavailable, computing p(u|b) exactly is difficult. To overcome this challenge, we propose to approximate 1/p(u|b) using a biased classifier trained with "bias amplification" losses. Extensive experiments on various biased datasets demonstrate the superiority of our method over existing debiasing techniques in most settings, validating our theoretical analysis.
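When bias labels are available, the per-sample weight 1/p(u_n|b_n) can be estimated directly by counting, since 1/p(u|b) = count(b) / count(u, b). The sketch below shows only this counting estimate; the abstract's approximation via a bias-amplified classifier (for the case without bias labels) is not covered. The function name and toy labels are illustrative.

```python
from collections import Counter


def inverse_conditional_weights(class_labels, bias_labels):
    """Per-sample weights 1 / p(u_n | b_n), with p(u | b) estimated by counting.

    1 / p(u | b) = count(b) / count(u, b), so samples whose class is rare
    *given* their bias attribute (bias-conflicting samples) are up-weighted.
    """
    pair_counts = Counter(zip(class_labels, bias_labels))
    bias_counts = Counter(bias_labels)
    return [bias_counts[b] / pair_counts[(u, b)]
            for u, b in zip(class_labels, bias_labels)]


# Toy biased data: class 0 almost always co-occurs with bias attribute "r".
u = [0, 0, 0, 1, 1]
b = ["r", "r", "r", "r", "g"]
print(inverse_conditional_weights(u, b))  # the lone (1, "r") sample gets weight 4.0
```

In training, these weights would multiply each sample's loss term before averaging, which corresponds to the reweighting variant the abstract reports as more stable than resampling.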

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02977

Variational Flow Models: Flowing in Your Style

Authors: Kien Do, Duc Kieu, Toan Nguyen, Dang Nguyen, Hung Le, Dung Nguyen, Thin Nguyen

Abstract: We introduce "posterior flows", generalizations of "probability flows" to a broader class of stochastic processes that are not necessarily diffusion processes, and propose a systematic training-free method to transform the posterior flow of a "linear" stochastic process characterized by the equation X_t = a_t X_0 + s_t X_1 into a straight constant-speed (SC) flow, reminiscent of Rectified Flow. This transformation facilitates fast sampling along the original posterior flow without training a new model of the SC flow. The flexibility of our approach allows us to extend our transformation to inter-convert two posterior flows from distinct "linear" stochastic processes. Moreover, we can easily integrate high-order numerical solvers into the transformed SC flow, further enhancing sampling accuracy and efficiency. Rigorous theoretical analysis and extensive experimental results substantiate the advantages of our framework.
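To illustrate what "straight constant-speed" means for the linear process above, differentiate X_t = a_t X_0 + s_t X_1 with respect to t for fixed endpoints. In the special case a_t = 1 - t, s_t = t (a Rectified-Flow-style interpolation chosen here for illustration, not the paper's general transformation), the velocity no longer depends on t:

```latex
\begin{aligned}
X_t &= a_t X_0 + s_t X_1,
&\qquad \frac{\mathrm{d}X_t}{\mathrm{d}t} &= \dot{a}_t X_0 + \dot{s}_t X_1, \\
a_t = 1 - t,\; s_t = t \;\Rightarrow\; X_t &= (1 - t) X_0 + t X_1,
&\qquad \frac{\mathrm{d}X_t}{\mathrm{d}t} &= X_1 - X_0 .
\end{aligned}
```

A constant velocity means each trajectory is a straight line traversed at constant speed, which is why even a single Euler step integrates such a flow exactly; for general a_t and s_t the velocity changes over time and the path bends.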

Submitted 29 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2310.18986

Controllable Group Choreography using Contrastive Diffusion

Authors: Nhat Le, Tuong Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Abstract: Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most recent works are unable to generate high-fidelity long-term motions or fail to offer a controllable experience. In this work, we aim to address the demand for high-quality and customizable group dance generation by effectively governing the consistency and diversity of group choreographies. In particular, we utilize a diffusion-based generative approach to enable the synthesis of a flexible number of dancers and long-term group dances, while ensuring coherence with the input music. Finally, we introduce a Group Contrastive Diffusion (GCD) strategy to strengthen the connection between dancers and their group, enabling control over the consistency or diversity level of the synthesized group animation via the classifier-guidance sampling technique. Through extensive experiments and evaluation, we demonstrate the effectiveness of our approach in producing visually captivating and consistent group dance motions. The experimental results show the capability of our method to achieve the desired levels of consistency and diversity while maintaining the overall quality of the generated group choreography. The source code can be found at https://aioz-ai.github.io/GCD

Submitted 3 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.18598

Domain Generalisation via Risk Distribution Matching

Authors: Toan Nguyen, Kien Do, Bao Duong, Thin Nguyen

Abstract: We propose a novel approach for domain generalisation (DG) that leverages risk distributions to characterise domains, thereby achieving domain invariance. We find that risk distributions effectively highlight differences between training domains and reveal their inherent complexities. At test time, we may observe similar, or potentially larger, divergences between risk distributions. Hence, we propose that minimising the divergences between risk distributions across training domains leads to robust invariance for DG. The key rationale is that a model trained on domain-invariant or stable features should consistently produce similar risk distributions across domains. Building upon this idea, we propose Risk Distribution Matching (RDM). Using the maximum mean discrepancy (MMD) distance, RDM aims to minimise the variance of risk distributions across training domains. However, as the number of domains increases, directly optimising this variance requires a linearly growing number of MMD computations, which is inefficient. Instead, we propose an approximation that requires only one MMD computation, by aligning just two distributions: that of the worst-case domain and the aggregated distribution from all domains. Notably, this method empirically outperforms optimising the distributional variance while being computationally more efficient.
Unlike conventional DG matching algorithms, RDM stands out for its enhanced efficacy by concentrating on scalar risk distributions, sidestepping the high-dimensional challenges of feature or gradient matching. Our extensive experiments on standard benchmark datasets demonstrate that RDM shows superior generalisation capability over state-of-the-art DG methods.
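The single-MMD approximation can be sketched on scalar per-sample risks: pick the worst-case domain, pool the risks of all domains, and compute one squared MMD between the two samples. The sketch assumes an RBF kernel with a fixed bandwidth, the biased (V-statistic) MMD estimator, and "worst-case domain" meaning the domain with the highest mean risk; all three are illustrative choices rather than details confirmed by the abstract.

```python
import math


def rbf_mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two 1-D samples (RBF kernel)."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def rdm_penalty(domain_risks, sigma=1.0):
    """Hypothetical RDM-style penalty with a single MMD computation.

    domain_risks: one list of per-sample losses (scalar risks) per domain.
    Compares the risks of the worst-case domain (highest mean risk, an
    assumption) with the risks pooled over all domains.
    """
    worst = max(domain_risks, key=lambda r: sum(r) / len(r))
    pooled = [r for risks in domain_risks for r in risks]
    return rbf_mmd2(worst, pooled, sigma)


# Domains with matching risk distributions give (near) zero penalty.
print(rdm_penalty([[0.1, 0.2], [0.1, 0.2]]))
print(rdm_penalty([[0.1, 0.2], [2.0, 3.0]]))  # mismatched risks: positive penalty
```

Because the risks are scalars, each kernel evaluation is cheap; the cost depends on the number of samples rather than on feature or gradient dimensionality, which matches the abstract's argument for matching risk distributions.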

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

arXiv:2310.18472

doi 10.21428/594757db.8bee12fd

Parameter-Efficient Methods for Metastases Detection from Clinical Notes

Authors: Maede Ashofteh Barabadi, Xiaodan Zhu, Wai Yip Chan, Amber L. Simpson, Richard K. G. Do

Abstract: Understanding the progression of cancer is crucial for defining treatments for patients. The objective of this study is to automate the detection of metastatic liver disease from free-text computed tomography (CT) radiology reports. Our research demonstrates that transferring knowledge using three approaches can improve model performance. First, we utilize generic language models (LMs) pretrained in a self-supervised manner. Second, we use a semi-supervised approach to train our model by automatically annotating a large unlabeled dataset; this approach substantially enhances the model's performance. Finally, we transfer knowledge from related tasks by designing a multi-task transfer learning methodology. We leverage recent advances in parameter-efficient LM adaptation strategies to improve performance and training efficiency. Our dataset consists of CT reports collected at Memorial Sloan Kettering Cancer Center (MSKCC) over the course of 12 years. A total of 2,641 reports were manually annotated by domain experts; among them, 841 reports have been annotated for the presence of liver metastases. Our best model achieved an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.
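The three reported metrics are mutually consistent: the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick arithmetic check with the reported values:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision 84% and recall 65.8%.
print(round(f1_score(0.84, 0.658) * 100, 1))  # → 73.8
```

Since 2 × 0.84 × 0.658 = 1.10544 and 0.84 + 0.658 = 1.498, the ratio is about 0.738, matching the stated F1-score of 73.8%.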

Submitted 27 October, 2023; originally announced October 2023.

Comments: 6 pages, 1 figure, The 36th Canadian Conference on Artificial Intelligence

Journal ref: Barabadi, M. A., Zhu, X., Chan, W. Y., Simpson, A. L., & Do, R. K. G. (2023). Parameter-Efficient Methods for Metastases Detection from Clinical Notes. Proceedings of the Canadian Conference on Artificial Intelligence

arXiv:2309.14053 [pdf, other]

Revisiting LARS for Large Batch Training Generalization of Neural Networks

Authors: Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham

Abstract: This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings and reports several findings. LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers, and gradually transitions to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. Notably, in all self-supervised learning cases, TVLARS dominates LARS and LAMB with performance improvements of up to 10%.
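
For readers unfamiliar with LARS, the sketch below shows the standard layer-wise trust ratio that LARS applies to each layer's learning rate, η·‖w‖ / (‖g‖ + λ‖w‖), in plain NumPy. It is an illustrative assumption, not the authors' code; the TVLARS sigmoid-like schedule is deliberately not reproduced because its exact form is not given here, and the function names and default hyperparameters are hypothetical.

```python
import numpy as np


def lars_local_lr(w, g, eta=0.001, weight_decay=5e-4, eps=1e-9):
    """LARS trust ratio for one layer: eta * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    if w_norm == 0.0 or g_norm == 0.0:
        # Common fallback: skip layer-wise rescaling for an all-zero layer or gradient.
        return 1.0
    return eta * w_norm / (g_norm + weight_decay * w_norm + eps)


def lars_step(w, g, v, global_lr, momentum=0.9, eta=0.001, weight_decay=5e-4):
    """One momentum-LARS update of a single layer; returns (new_w, new_v).

    global_lr is where a schedule such as linear warm-up is usually applied.
    """
    local_lr = lars_local_lr(w, g, eta, weight_decay)
    v = momentum * v + global_lr * local_lr * (g + weight_decay * w)
    return w - v, v
```

With weight_decay=0, the first step moves a layer by global_lr × eta times its weight norm regardless of the gradient's scale; this per-layer normalization is the core idea behind LARS.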

Submitted 15 February, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.08860 [pdf, other]

DenseTact-Mini: An Optical Tactile Sensor for Grasping Multi-Scale Objects From Flat Surfaces

Authors: Won Kyung Do, Ankush Kundan Dhawan, Mathilda Kitzmann, Monroe Kennedy III

Abstract: Dexterous manipulation, especially of small daily objects, continues to pose complex challenges in robotics. This paper introduces the DenseTact-Mini, an optical tactile sensor with a soft, rounded, smooth gel surface and compact design equipped with a synthetic fingernail. We propose three distinct grasping strategies: tap grasping using adhesion forces such as electrostatic and van der Waals, fingernail grasping leveraging rolling/sliding contact between the object and fingernail, and fingertip grasping with two soft fingertips. Through comprehensive evaluations, the DenseTact-Mini demonstrates a lifting success rate exceeding 90.2% when grasping various objects, spanning items from 1mm basil seeds and small paperclips to items nearly 15mm. This work demonstrates the potential of soft optical tactile sensors for dexterous manipulation and grasping.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.08109 [pdf, other]

CAT: a conditional association test for microbiome data using a leave-out approach

Authors: Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert R. Jenq, Christine B. Peterson

Abstract: In microbiome analysis, researchers often seek to identify taxonomic features associated with an outcome of interest. However, microbiome features are intercorrelated and linked by phylogenetic relationships, making it challenging to assess the association between an individual feature and an outcome. Researchers have developed global tests for the association of microbiome profiles with outcomes using beta diversity metrics, which offer robustness to extreme values and can incorporate information on the phylogenetic tree structure. Despite the popularity of global association testing, most existing methods for follow-up testing of individual features only consider the marginal effect and do not provide relevant information for the design of microbiome interventions. This paper proposes a novel conditional association test, CAT, which can account for other features and phylogenetic relatedness when testing the association between a feature and an outcome. CAT adopts a leave-out method, measuring the importance of a feature in predicting the outcome by removing that feature from the data and quantifying how much the association with the outcome is weakened through the change in the coefficient of determination. By leveraging global tests including PERMANOVA and MiRKAT-based methods, CAT allows association testing for continuous, binary, categorical, count, survival, and correlated outcomes. Our simulation and real data application results illustrate the potential of CAT to inform the design of microbiome interventions aimed at improving clinical outcomes.
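
The leave-out idea can be illustrated with ordinary least squares standing in for the PERMANOVA and MiRKAT-based global tests that CAT actually uses: fit the outcome with and without one feature, and report the drop in R². This is a minimal sketch under that simplifying assumption; the function names are hypothetical and no significance test is included.

```python
import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def leave_out_delta_r2(X, y, j):
    """Drop in R^2 when feature j is left out, conditional on all other features."""
    X_without = np.delete(X, j, axis=1)
    return r_squared(X, y) - r_squared(X_without, y)
```

Because the remaining features stay in the model, a feature that is merely correlated with a truly associated one receives little credit, which is the "conditional" part of the test.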

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.16598 [pdf, other]

doi 10.1007/978-3-031-18814-5_11

Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation

Authors: Ramtin Mojtahedi, Mohammad Hamghalam, Richard K. G. Do, Amber L. Simpson

Abstract: Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models built on fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computerized tomography (CT) scans. However, since their convolution layers suffer from limited kernel size, they are not able to capture long-range dependencies and global context. To tackle this restriction, vision transformers have been introduced to overcome the local receptive fields of FCNNs. Although transformers can capture long-range features, their segmentation performance degrades across different tumor sizes because the model is sensitive to the input patch size. While finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, it is a time-consuming and challenging procedure. This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions. We further validated our suggested framework using a transfer-learning technique, demonstrating that the highest Dice similarity coefficient (DSC) performance was obtained by pre-training on training data with a larger tumor volume using the suggested ideal patch size and then training with a smaller one. We experimentally evaluate this idea through pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset with a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at: https://github.com/Ramtin-Mojtahedi/OVTPS.

Submitted 31 August, 2023; originally announced August 2023.

Journal ref: Multiscale Multimodal Medical Imaging. MMMI 2022. Lecture Notes in Computer Science, vol 13594. Springer, Cham

arXiv:2308.16480 [pdf, other]

Inter-finger Small Object Manipulation with DenseTact Optical Tactile Sensor

Authors: Won Kyung Do, Bianca Aumann, Camille Chungyoun, Monroe Kennedy III

Abstract: The ability to grasp and manipulate small objects in cluttered environments remains a significant challenge. This paper introduces a novel approach that utilizes a tactile sensor-equipped gripper with eight degrees of freedom to address this challenge. We employ DenseTact 2.0 for the gripper, enabling precise control and improved grasp success rates, particularly for small objects ranging from 5mm to 25mm. Our integrated strategy incorporates the robot arm, gripper, and sensor to effectively manipulate and orient small objects for subsequent classification. We contribute a specialized dataset designed for classifying these objects based on tactile sensor output and a new control algorithm for in-hand orientation tasks. Our system achieves an 88% grasp success rate and successfully classifies small objects in cluttered scenarios.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.15932 [pdf, other]

doi 10.1117/12.2656072

Attention-based CT Scan Interpolation for Lesion Segmentation of Colorectal Liver Metastases

Authors: Mohammad Hamghalam, Richard K. G. Do, Amber L. Simpson

Abstract: Small liver lesions common to colorectal liver metastases (CRLMs) are challenging for convolutional neural network (CNN) segmentation models, especially when the computed tomography (CT) scans span a wide range of slice thicknesses. Slice thickness of CT images may vary by clinical indication. For example, thinner slices are used for presurgical planning when fine anatomic details of small vessels are required. Because the effective radiation dose to patients must be kept as low as possible, various slice thicknesses are employed in CRLM imaging. However, differences in slice thickness across CTs lead to significant performance degradation in CNN-based CT segmentation models. This paper proposes a novel unsupervised attention-based interpolation model to generate intermediate slices from consecutive triplet slices in CT scans. We integrate segmentation loss during the interpolation model's training to leverage segmentation labels in existing slices to generate middle ones. Unlike common interpolation techniques in CT volumes, our model highlights the regions of interest (liver and lesions) inside the abdominal CT scans in the interpolated slice. Moreover, our model's outputs are consistent with the original input slices while increasing the segmentation performance in two cutting-edge 3D segmentation pipelines. We tested the proposed model on the CRLM dataset to upsample subjects with thick slices and create isotropic volumes for our segmentation model. The produced isotropic dataset increases the Dice score in the segmentation of lesions and outperforms other interpolation approaches in terms of interpolation metrics.

Submitted 30 August, 2023; originally announced August 2023.

Journal ref: Proc. SPIE 12468, Medical Imaging 2023: Biomedical Applications in Molecular, Structural, and Functional Imaging, 124680U (10 April 2023)

arXiv:2308.13737 [pdf, other]

survivalContour: Visualizing predicted survival via colored contour plots

Authors: Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert R. Jenq, Christine B. Peterson

Abstract: Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for graphically illustrating the influence of continuous covariates on predicted survival outcomes. We propose using a colored contour plot to depict the predicted survival probabilities over time, and provide a Shiny app and R package as implementations of this tool. Our approach supports conventional models, including the Cox and Fine-Gray models. However, it is most valuable when coupled with cutting-edge machine learning models such as random survival forests and deep neural networks.
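
The plot itself is easy to picture with a small example. The sketch below is not the survivalContour R package; it is a Python stand-in that evaluates Cox-model survival probabilities S(t | x) = exp(-H0(t)·exp(βx)) over a grid of time t and one continuous covariate x, assuming a Weibull baseline cumulative hazard chosen purely for illustration. Coloring this grid by S gives the kind of contour plot the abstract describes.

```python
import numpy as np


def cox_survival_grid(times, covariate, beta, scale=10.0, shape=1.5):
    """Predicted survival S(t | x) = exp(-H0(t) * exp(beta * x)) on a (time, x) grid.

    Assumes a Weibull baseline cumulative hazard H0(t) = (t / scale) ** shape,
    chosen only for illustration.
    """
    T, X = np.meshgrid(times, covariate, indexing="ij")
    baseline_cum_hazard = (T / scale) ** shape
    return T, X, np.exp(-baseline_cum_hazard * np.exp(beta * X))


times = np.linspace(0.0, 20.0, 101)
x_grid = np.linspace(-2.0, 2.0, 81)  # hypothetical standardized continuous covariate
T, X, S = cox_survival_grid(times, x_grid, beta=0.7)

# With matplotlib installed, a colored contour plot of S over (x, t) could be drawn as:
# import matplotlib.pyplot as plt
# plt.contourf(X, T, S, levels=20); plt.colorbar(label="S(t | x)"); plt.show()
```

The same grid could be filled from any model that predicts S(t | x), which is why the visualization is not tied to the Cox model.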

Submitted 12 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.11087 [pdf, other]

doi 10.1007/s42979-024-02731-6

Embedded Object Detection and Mapping in Soft Materials Using Optical Tactile Sensing

Authors: Jose A. Solano-Castellanos, Won Kyung Do, Monroe Kennedy III

Abstract: In this paper, we present a methodology that uses an optical tactile sensor for efficient tactile exploration of embedded objects within soft materials. The methodology consists of an exploration phase, where a probabilistic estimate of the location of the embedded objects is built using a Bayesian approach. The exploration phase is then followed by a mapping phase, which exploits the probabilistic map to reconstruct the underlying topography of the workspace by sampling in more detail the regions where embedded objects are expected. To demonstrate the effectiveness of the method, we tested our approach on an experimental setup that consists of a series of quartz beads located underneath a polyethylene foam that prevents direct observation of the configuration and requires the use of tactile exploration to recover the location of the beads. We show the performance of our methodology using ten different configurations of the beads, for which the proposed approach is able to approximate the underlying configuration. We benchmark our results against a random sampling policy.
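
A common way to maintain such a probabilistic estimate is a per-cell binary Bayes filter in log-odds form. The sketch below assumes a simple hypothetical sensor model (detection and false-alarm rates) and is not the authors' implementation; the class and parameter names are illustrative.

```python
import numpy as np


class BeliefGrid:
    """Per-cell belief that an embedded object lies beneath, updated with Bayes' rule.

    The sensor model (hit and false-alarm rates) is hypothetical.
    """

    def __init__(self, shape, prior=0.1, p_hit=0.9, p_false=0.2):
        self.log_odds = np.full(shape, np.log(prior / (1.0 - prior)))
        self.p_hit = p_hit      # P(sensor reports object | object present)
        self.p_false = p_false  # P(sensor reports object | no object)

    def update(self, cell, detected):
        """Bayesian update of one cell from a single tactile probe."""
        if detected:
            ratio = self.p_hit / self.p_false
        else:
            ratio = (1.0 - self.p_hit) / (1.0 - self.p_false)
        self.log_odds[cell] += np.log(ratio)

    def probability(self):
        return 1.0 / (1.0 + np.exp(-self.log_odds))

    def most_uncertain_cell(self):
        """Cell whose belief is closest to 0.5, one simple choice for the next probe."""
        p = self.probability()
        return np.unravel_index(np.argmin(np.abs(p - 0.5)), p.shape)
```

Working in log-odds turns each Bayes update into an addition, so repeated probes of the same cell accumulate evidence cheaply.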

Submitted 21 August, 2023; originally announced August 2023.

Journal ref: Springer Nature Computer Science, Vol 5, Article 372, 2024

arXiv:2308.04836 [pdf, other]

Beyond Surprise: Improving Exploration Through Surprise Novelty

Authors: Hung Le, Kien Do, Dung Nguyen, Svetha Venkatesh

Abstract: We present a new method for computing intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven exploration. The reward is the novelty of the surprise rather than the surprise norm. We estimate the surprise novelty as the retrieval error of a memory network wherein the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that the SM combined with various surprise predictors exhibits efficient exploring behaviors and significantly boosts the final performance in sparse reward environments, including Noisy-TV, navigation and challenging Atari games.
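
To make "reward the novelty of the surprise, not its norm" concrete, the toy sketch below replaces the paper's learned memory network with a simple attention-weighted buffer of past surprise vectors (a swapped-in simplification, not the SM architecture). The intrinsic reward is the error in reconstructing the current surprise from memory, so a large but familiar surprise earns little reward.

```python
import numpy as np


class SurpriseMemory:
    """Toy associative memory over surprise vectors (a stand-in, not the paper's SM).

    Intrinsic reward = error in reconstructing the current surprise from memory,
    so surprises resembling ones already stored earn little reward, whatever their norm.
    """

    def __init__(self, capacity=256, temperature=0.1):
        self.capacity = capacity
        self.temperature = temperature
        self.slots = []

    def retrieve(self, surprise):
        """Softmax-weighted average of stored surprises, keyed by squared distance."""
        if not self.slots:
            return np.zeros_like(surprise)
        keys = np.stack(self.slots)
        scores = -np.sum((keys - surprise) ** 2, axis=1) / self.temperature
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ keys

    def intrinsic_reward(self, surprise):
        """Novelty of the surprise; the surprise is then stored (FIFO eviction)."""
        surprise = np.array(surprise, dtype=float)
        reward = float(np.linalg.norm(surprise - self.retrieve(surprise)))
        self.slots.append(surprise)
        if len(self.slots) > self.capacity:
            self.slots.pop(0)
        return reward
```

Calling intrinsic_reward twice with the same vector returns its norm the first time and zero the second, even though the surprise norm is unchanged.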

Submitted 30 January, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: 17 pages including Appendix

arXiv:2307.16044 [pdf, other]

doi 10.3390/quantum5040042

A Schrödinger Equation for Evolutionary Dynamics

Authors: Vi D. Ao, Duy V. Tran, Kien T. Pham, Duc M. Nguyen, Huy D. Tran, Tuan K. Do, Van H. Do, Trung V. Phan

Abstract: We establish an analogy between the Fokker-Planck equation describing evolutionary landscape dynamics and the Schrödinger equation which characterizes quantum mechanical particles, showing how a population with multiple genetic traits evolves analogously to a wavefunction under a multi-dimensional energy potential in imaginary time. Furthermore, we discover within this analogy that the stationary population distribution on the landscape corresponds exactly to the ground-state wavefunction. This mathematical equivalence gives access to a wide range of analytical tools developed by the quantum mechanics community, such as the Rayleigh-Ritz variational method and the Rayleigh-Schrödinger perturbation theory, allowing us to not only make reasonable quantitative assessments but also explore fundamental biological questions. We demonstrate the effectiveness of these tools by estimating the population success on landscapes where precise answers are elusive, and unveiling the ecological consequences of stress-induced mutagenesis, a prevalent evolutionary mechanism in pathogenic and neoplastic systems. We show that, even in an unchanging environment, a sharp mutational burst resulting from stress can always be advantageous, while a gradual increase only enhances population size when the number of relevant evolving traits is limited. Our interdisciplinary approach offers novel insights, opening up new avenues for deeper understanding and predictive capability regarding the complex dynamics of evolving populations.
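
As a hedged illustration of how such an analogy can arise (assuming a linear mutation-selection model with a uniform diffusion constant D and a trait-dependent growth rate r(x); the paper's own equation may include further terms such as a carrying capacity), rotating Schrödinger's time to imaginary time t = -iτ maps one equation onto the other:

```latex
\begin{aligned}
&\text{Population:} && \partial_\tau n(\mathbf{x},\tau) = D\,\nabla^2 n + r(\mathbf{x})\,n,\\
&\text{Schr\"odinger:} && i\hbar\,\partial_t \psi = -\frac{\hbar^2}{2m}\nabla^2\psi + V(\mathbf{x})\,\psi,\\
&t = -i\tau \;\Rightarrow && \partial_\tau \psi = \frac{\hbar}{2m}\nabla^2\psi - \frac{V(\mathbf{x})}{\hbar}\,\psi,\\
&\text{matching:} && D = \frac{\hbar}{2m}, \qquad r(\mathbf{x}) = -\frac{V(\mathbf{x})}{\hbar},\\
&\text{eigen-expansion:} && n(\mathbf{x},\tau) = \sum_k c_k\,\phi_k(\mathbf{x})\,e^{-E_k\tau/\hbar}
\;\xrightarrow{\;\tau\to\infty\;}\; c_0\,\phi_0(\mathbf{x})\,e^{-E_0\tau/\hbar}.
\end{aligned}
```

Matching terms identifies growth with a negative potential, and the slowest-decaying (lowest-energy) mode φ0 dominates at long times, which is why the stationary shape of the population corresponds to the ground state in this linear setting.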

Submitted 31 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

Journal ref: Quantum Rep. 2023, 5(4), 659-682

arXiv:2304.08329 [pdf, ps, other]

Computing the Weil representation of a superelliptic curve

Authors: Irene I. Bouw, Duc Khoi Do, Stefan Wewers

Abstract: We study the Weil representation $ρ$ of a curve over a $p$-adic field with potential reduction of compact type. We show that $ρ$ can be reconstructed from its stable reduction. For superelliptic curves of the form $y^n=f(x)$ at primes $p$ whose residue characteristic is prime to the exponent $n$ we make this explicit.

Submitted 30 October, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

MSC Class: 11F80 (primary); 14H25; 11G20; 11S40 (secondary)

arXiv:2303.15536 [pdf, other]

doi 10.1145/3544548.3581347

"That's important, but...": How Computer Science Researchers Anticipate Unintended Consequences of Their Research Innovations

Authors: Kimberly Do, Rock Yuren Pang, Jiachen Jiang, Katharina Reinecke

Abstract: Computer science research has led to many breakthrough innovations but has also been scrutinized for enabling technology that has negative, unintended consequences for society. Given the increasing discussions of ethics in the news and among researchers, we interviewed 20 researchers in various CS sub-disciplines to identify whether and how they consider potential unintended consequences of their research innovations. We show that considering unintended consequences is generally seen as important but rarely practiced. Principal barriers are a lack of formal process and strategy as well as the academic practice that prioritizes fast progress and publications. Drawing on these findings, we discuss approaches to support researchers in routinely considering unintended consequences, from bringing diverse perspectives through community participation to increasing incentives to investigate potential consequences. We intend for our work to pave the way for routine explorations of the societal implications of technological innovations before, during, and after the research process.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Corresponding author: Rock Yuren Pang, email provided below. Kimberly Do and Rock Yuren Pang contributed equally to this research. The author order is listed alphabetically. To appear in CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-April 28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 16 pages

arXiv:2303.09084 [pdf, other]

Stress-Induced Mutagenesis Can Further Boost Population Success in Static Ecology

Authors: Kien T. Pham, Duc M. Nguyen, Duy V. Tran, Vi D. Ao, Huy D. Tran, Tuan K. Do, Trung V. Phan

Abstract: We have developed a mathematical model that captures stress-induced mutagenesis, a fundamental aspect of pathogenic and neoplastic evolutionary dynamics, on the fitness landscape with multiple relevant genetic traits as a high-dimensional Euclidean space. In this framework, stress-induced mutagenesis manifests as a heterogeneous diffusion process. We show how increasing mutations, and thus reducing exploitation, in a static ecology with fixed carrying capacity and maximum growth rates, can paradoxically boost population size. Remarkably, this unexpected biophysical phenomenon applies universally to any number of traits.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.06751 [pdf, ps, other]

Diagonal cycles and anticyclotomic Iwasawa theory of modular forms

Authors: Francesc Castella, Kim Tuan Do

Abstract: We construct a new anticyclotomic Euler system (in the sense of Jetchev-Nekovar-Skinner) for the Galois representation $V_{f,χ}$ attached to a newform $f$ of weight $k\geq 2$ twisted by an anticyclotomic Hecke character $χ$. We then show some arithmetic applications of the constructed Euler system, including new results on the Bloch-Kato conjecture in ranks zero and one, and a divisibility towards the Iwasawa-Greenberg main conjecture for $V_{f,χ}$. In particular, in the case where the base-change of $f$ to our imaginary quadratic field has root number $+1$ and $χ$ has higher weight (which implies that the complex $L$-function $L(V_{f,χ},s)$ vanishes at the center), our results show that the Bloch-Kato Selmer group of $V_{f,χ}$ is nonzero, and if a certain distinguished class $κ_{f,χ}$ is nonzero, then the Selmer group is one-dimensional. Such applications to the Bloch-Kato conjecture were left wide open by the earlier approaches using Heegner cycles and/or Beilinson-Flach classes. Our construction is based instead on a generalisation of the Gross-Kudla-Schoen diagonal cycles.

Submitted 12 March, 2023; originally announced March 2023.

Comments: 50 pages

arXiv:2301.06926 [pdf, ps, other]

Memory-Augmented Theory of Mind Network

Authors: Dung Nguyen, Phuoc Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran

Abstract: Social reasoning necessitates the capacity of theory of mind (ToM), the ability to contextualise and attribute mental states to others without having access to their internal cognitive structure. Recent machine learning approaches to ToM have demonstrated that we can train the observer to read the past and present behaviours of other agents and infer their beliefs (including false beliefs about things that no longer exist), goals, intentions and future actions. The challenges arise when the behavioural space is complex, demanding skilful space navigation for rapidly changing contexts for an extended period. We tackle the challenges by equipping the observer with novel neural memory mechanisms to encode, and hierarchical attention to selectively retrieve, information about others. The memories allow rapid, selective querying of distal related past behaviours of others to deliberatively reason about their current mental state, beliefs and future behaviours. This results in ToMMY, a theory of mind model that learns to reason while making few assumptions about the underlying mental processes. We also construct a new suite of experiments to demonstrate that memories facilitate the learning process and achieve better theory of mind performance, especially for high-demand false-belief tasks that require inferring through multiple steps of changes.

Submitted 17 January, 2023; originally announced January 2023.

Comments: Accepted for publication at AAAI 2023

arXiv:2212.03063 [pdf, other]

doi 10.1145/3580305.3599270

Causal Inference via Style Transfer for Out-of-distribution Generalisation

Authors: Toan Nguyen, Kien Do, Duc Thanh Nguyen, Bao Duong, Thin Nguyen

Abstract: Out-of-distribution (OOD) generalisation aims to build a model that can generalise well on an unseen target domain using knowledge from multiple source domains. To this end, the model should seek the causal dependence between inputs and labels, which may be determined by the semantics of inputs and remain invariant across domains. However, statistical or non-causal methods often cannot capture this dependence and perform poorly because they do not account for the spurious correlations, induced by unobserved confounders, that the model learns during training. Well-known causal inference methods such as back-door adjustment cannot be applied to remove spurious correlations, as they require the observation of confounders. In this paper, we propose a novel method that effectively deals with hidden confounders by successfully implementing front-door adjustment (FA). FA requires the choice of a mediator, which we regard as the semantic information of images that helps access the causal mechanism without the need for observing confounders. Further, we propose to estimate the combination of the mediator with other observed images in the front-door formula via style transfer algorithms. Our use of style transfer to estimate FA is novel and sensible for OOD generalisation, which we justify by extensive experimental results on widely used benchmark datasets.
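
The front-door formula that the paper estimates with style transfer can be checked on small discrete tables. The sketch below, an assumption-laden toy rather than the authors' method, computes P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x') from an observed joint distribution and compares it with the true interventional distribution of a hypothetical model with a hidden binary confounder U.

```python
import numpy as np


def front_door(p_xmy):
    """P(y | do(x)) from an observed joint table p_xmy[x, m, y] via the front-door formula.

    P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | m, x') P(x')
    """
    p_x = p_xmy.sum(axis=(1, 2))                              # P(x)
    p_m_given_x = p_xmy.sum(axis=2) / p_x[:, None]            # P(m | x), indexed [x, m]
    p_y_given_xm = p_xmy / p_xmy.sum(axis=2, keepdims=True)   # P(y | x, m), indexed [x, m, y]
    inner = np.einsum("a,amy->my", p_x, p_y_given_xm)         # sum_x' P(y | m, x') P(x')
    return p_m_given_x @ inner                                # indexed [x, y]


# Hypothetical toy model with a hidden binary confounder U: U -> X, X -> M, (M, U) -> Y.
p_u = np.array([0.6, 0.4])
p_x_given_u = np.array([[0.8, 0.2], [0.3, 0.7]])      # [u, x]
p_m_given_x = np.array([[0.9, 0.1], [0.2, 0.8]])      # [x, m]
p_y_given_mu = np.array([[[0.7, 0.3], [0.4, 0.6]],    # [m, u, y]
                         [[0.5, 0.5], [0.1, 0.9]]])
joint = np.einsum("u,ux,xm,muy->uxmy", p_u, p_x_given_u, p_m_given_x, p_y_given_mu)
observed = joint.sum(axis=0)                           # U is never observed
truth = np.einsum("u,xm,muy->xy", p_u, p_m_given_x, p_y_given_mu)
print(np.allclose(front_door(observed), truth))  # True
```

In this toy model the naive conditional P(y | x) differs from the interventional answer because U confounds X and Y, while the front-door estimate recovers it exactly without ever observing U.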

Submitted 10 June, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 23), August 6-10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 19 pages

Journal ref: In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 23), August 6-10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 19 pages

arXiv:2211.10812 [pdf, other]

Face Swapping as A Simple Arithmetic Operation

Authors: Truong Vu, Kien Do, Khang Nguyen, Khoat Than

Abstract: We propose a novel high-fidelity face swapping method called "Arithmetic Face Swapping" (AFS) that explicitly disentangles the intermediate latent space W+ of a pretrained StyleGAN into the "identity" and "style" subspaces so that a latent code in W+ is the sum of an "identity" code and a "style" code in the corresponding subspaces. Via our disentanglement, face swapping (FS) can be regarded as a simple arithmetic operation in W+, i.e., the summation of a source "identity" code and a target "style" code. This makes AFS more intuitive and elegant than other FS methods. In addition, our method can generalize over the standard face swapping to support other interesting operations, e.g., combining the identity of one source with styles of multiple targets and vice versa. We implement our identity-style disentanglement by learning a neural network that maps a latent code to a "style" code. We provide a condition for this network which theoretically guarantees identity preservation of the source face even after a sequence of face swapping operations. Extensive experiments demonstrate the advantage of our method over state-of-the-art FS methods in producing high-quality swapped faces. Our source code was made public at https://github.com/truongvu2000nd/AFS
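
The "swap as a sum" idea can be seen with a linear stand-in: if the style map is an orthogonal projector P acting on a (flattened) latent code, then style(w) = Pw and identity(w) = w - Pw, and because P² = P the source identity survives any number of swaps. This projector is a hypothetical simplification for illustration; AFS learns a neural network for the style map, and the paper's own theoretical condition is not reproduced here.

```python
import numpy as np


def make_style_projection(dim, style_dim, seed=0):
    """Orthogonal projector P onto a random style_dim-dimensional 'style' subspace.

    Hypothetical stand-in for the learned style network: style(w) = P w and
    identity(w) = w - P w, so every latent code is identity + style.
    """
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.normal(size=(dim, style_dim)))  # orthonormal columns
    return basis @ basis.T


def swap(w_source, w_target, P):
    """Face swap as arithmetic: source identity code + target style code."""
    identity = w_source - P @ w_source
    style = P @ w_target
    return identity + style
```

Since (I - P)P = 0, swapping the result with a third code changes only its style part, which mirrors the identity-preservation property the abstract describes.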

Submitted 3 February, 2023; v1 submitted 19 November, 2022; originally announced November 2022.
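A toy sketch of the arithmetic view of face swapping, under loudly stated assumptions: latent codes are short plain-Python vectors, and the learned style network is replaced by a hypothetical fixed coordinate projection `style_of`. Since a latent code is the sum of an identity code and a style code, the identity code is taken here as the residual `w - style_of(w)`. Because this toy projection is idempotent, repeated swaps keep the source identity unchanged; this mirrors, but is not, the paper's identity-preservation condition.

```python
from typing import List

Vec = List[float]

# Hypothetical stand-in for the learned style network: keep only the
# coordinates flagged in STYLE_MASK (a coordinate projection, hence idempotent).
STYLE_MASK = [0, 0, 1, 1]  # assumption: the last two dimensions carry "style"


def style_of(w: Vec) -> Vec:
    """Project a latent code onto the (toy) style subspace."""
    return [x * m for x, m in zip(w, STYLE_MASK)]


def identity_of(w: Vec) -> Vec:
    """Identity code = latent code minus its style code, so w = identity + style."""
    return [x - s for x, s in zip(w, style_of(w))]


def swap(source: Vec, target: Vec) -> Vec:
    """Face swap as arithmetic: source identity code + target style code."""
    return [i + s for i, s in zip(identity_of(source), style_of(target))]


src = [1.0, 2.0, 3.0, 4.0]
tgt = [5.0, 6.0, 7.0, 8.0]
out = swap(src, tgt)
print(out)  # [1.0, 2.0, 7.0, 8.0]
# Swapping the result onto another target still carries the source identity.
print(identity_of(swap(out, [0.0, 0.0, -1.0, -1.0])))  # [1.0, 2.0, 0.0, 0.0]
```

Combining one identity with the styles of several targets would, in this toy setting, amount to adding a weighted mix of their style codes.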

arXiv:2209.10359 [pdf, other]

Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation

Authors: Kien Do, Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh

Abstract: Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to its appealing capability of transferring knowledge from a teacher network to a student network without using training data. The main idea is to use a generator to synthesize data for training the student. As the generator gets updated, the distribution of synthetic data will change. Such distribution shift could be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired at previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD) which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be considered as an ensemble of the generator's old versions and often undergoes a smaller change in updates compared to the generator, training on its synthetic samples can help the student recall the past knowledge and prevent the student from adapting too quickly to new updates of the generator. Our experiments on six benchmark datasets including big datasets like ImageNet and Places365 demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases.

Submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022
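A minimal sketch of the exponential moving average that MAD maintains over the generator, assuming generator parameters are a flat list of floats. The momentum values and helper name are hypothetical. The toy run only shows why the EMA copy changes more slowly than the generator it tracks.

```python
from typing import List


def ema_update(ema_params: List[float], params: List[float],
               momentum: float = 0.99) -> List[float]:
    """Return the new EMA copy: momentum * old_ema + (1 - momentum) * current."""
    return [momentum * e + (1.0 - momentum) * p for e, p in zip(ema_params, params)]


# Toy run: the "generator" parameter jumps by 1.0 at every step (a stand-in
# for large adversarial updates), while the EMA copy lags behind it.
gen = [0.0]
ema = list(gen)
for _ in range(10):
    gen = [g + 1.0 for g in gen]
    ema = ema_update(ema, gen, momentum=0.9)

print(gen[0])             # 10.0
print(round(ema[0], 3))   # noticeably smaller: the EMA copy drifts slowly
```

In a training loop, the student would then see a batch that mixes samples from the current generator and from the EMA generator. That mixing step is not shown here.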

arXiv:2209.10122 [pdf, other]

DenseTact 2.0: Optical Tactile Sensor for Shape and Force Reconstruction

Authors: Won Kyung Do, Bianca Jurewicz, Monroe Kennedy III

Abstract: Collaborative robots stand to have an immense impact on both human welfare in domestic service applications and industrial superiority in advanced manufacturing with dexterous assembly. The outstanding challenge is providing robotic fingertips with a physical design that makes them adept at performing dexterous tasks that require high-resolution, calibrated shape reconstruction and force sensing. In this work, we present DenseTact 2.0, an optical-tactile sensor capable of visualizing the deformed surface of a soft fingertip and using that image in a neural network to perform both calibrated shape reconstruction and 6-axis wrench estimation. We demonstrate the sensor accuracy of 0.3633 mm per pixel for shape reconstruction, 0.410 N for forces, 0.387 N·mm for torques, and the ability to calibrate new fingers through transfer learning, which achieves comparable performance with only 12% of the non-transfer learning dataset size.

Submitted 4 March, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.09567 [pdf, other]

doi 10.1007/978-3-031-16919-9_4

Multiple Instance Neuroimage Transformer

Authors: Ayush Singla, Qingyu Zhao, Daniel K. Do, Yuyin Zhou, Kilian M. Pohl, Ehsan Adeli

Abstract: For the first time, we propose using a multiple instance learning based convolution-free transformer model, called Multiple Instance Neuroimage Transformer (MINiT), for the classification of T1-weighted (T1w) MRIs. We first present several variants of transformer models adapted for neuroimages. These models extract non-overlapping 3D blocks from the input volume and perform multi-headed self-attention on a sequence of their linear projections. MINiT, on the other hand, treats each of the non-overlapping 3D blocks of the input MRI as its own instance, splitting it further into non-overlapping 3D patches, on which multi-headed self-attention is computed. As a proof of concept, we evaluate the efficacy of our model by training it to identify sex from T1w MRIs of two public datasets: Adolescent Brain Cognitive Development (ABCD) and the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA). The learned attention maps highlight voxels contributing to identifying sex differences in brain morphometry. The code is available at https://github.com/singlaayush/MINIT.

Submitted 19 August, 2022; originally announced August 2022.
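MINiT's two-level split, with non-overlapping 3D blocks as instances and non-overlapping 3D patches inside each block, can be illustrated by enumerating the origin of every cube. This sketch is not the released implementation. The volume shape and the block and patch sizes are hypothetical, and every dimension is assumed to divide evenly.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def cube_origins(shape: Coord, size: int, offset: Coord = (0, 0, 0)) -> List[Coord]:
    """Origins of the non-overlapping size^3 cubes that tile a region of `shape`."""
    if any(dim % size for dim in shape):
        raise ValueError("each dimension must be divisible by the cube size")
    ranges = [range(o, o + dim, size) for o, dim in zip(offset, shape)]
    return list(product(*ranges))


def blocks_and_patches(volume_shape: Coord, block: int, patch: int) -> Dict[Coord, List[Coord]]:
    """Map each block origin (one 'instance') to the origins of its patches."""
    return {
        b: cube_origins((block, block, block), patch, offset=b)
        for b in cube_origins(volume_shape, block)
    }


layout = blocks_and_patches((64, 64, 64), block=32, patch=16)
print(len(layout))               # 8 blocks (instances)
print(len(layout[(0, 0, 0)]))    # 8 patches per block
print(layout[(32, 0, 0)][:2])    # [(32, 0, 0), (32, 0, 16)]
```

Per the abstract, self-attention would then be computed over the patch sequence of each block. That step is outside this sketch.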

arXiv:2207.14753 [pdf, other]

Estimating Causal Effects with Hidden Confounding using Instrumental Variables and Environments

Authors: James P. Long, Hongxu Zhu, Kim-Anh Do, Min Jin Ha

Abstract: Recent works have proposed regression models which are invariant across data collection environments. These estimators often have a causal interpretation under conditions on the environments and type of invariance imposed. One recent example, the Causal Dantzig (CD), is consistent under hidden confounding and represents an alternative to classical instrumental variable estimators such as Two Stage Least Squares (TSLS). In this work, we derive the CD as a generalized method of moments (GMM) estimator. The GMM representation leads to several practical results, including (1) creation of the Generalized Causal Dantzig (GCD) estimator, which can be applied to problems with continuous environments where the CD cannot be fit; (2) a Hybrid (GCD-TSLS combination) estimator, which has properties superior to GCD or TSLS alone; and (3) straightforward asymptotic results for all methods using GMM theory. We compare the CD, GCD, TSLS, and Hybrid estimators in simulations and an application to a flow cytometry data set. The newly proposed GCD and Hybrid estimators have superior performance to existing methods in many settings.

Submitted 9 November, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: 32 pages, 7 figures, 4 tables

arXiv:2207.12106 [pdf, other]

Black-box Few-shot Knowledge Distillation

Authors: Dang Nguyen, Sunil Gupta, Kien Do, Svetha Venkatesh

Abstract: Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require many labeled training samples and a white-box teacher (whose parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often happens on an external party's side, where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images along with their labels obtained from the teacher are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT

Submitted 25 July, 2022; originally announced July 2022.

Comments: To appear at ECCV 2022
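The MixUp half of the data-expansion step can be sketched as follows. Assumptions: images are flat lists of pixel values, the Beta parameter `alpha=1.0` is hypothetical, and `query_teacher` is an illustrative stand-in for the black-box teacher, which only returns labels. The conditional variational auto-encoder component is not shown.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]


def mixup(x1: Image, x2: Image, lam: float) -> Image:
    """Convex combination lam * x1 + (1 - lam) * x2 of two images."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def expand_with_mixup(images: List[Image], n_new: int,
                      query_teacher: Callable[[Image], int],
                      alpha: float = 1.0, seed: int = 0) -> List[Tuple[Image, int]]:
    """Create n_new synthetic images and label each one by querying the teacher."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x1, x2 = rng.sample(images, 2)           # two distinct real images
        lam = rng.betavariate(alpha, alpha)      # mixing weight ~ Beta(alpha, alpha)
        x = mixup(x1, x2, lam)
        synthetic.append((x, query_teacher(x)))  # black box: only the label comes back
    return synthetic


# Hypothetical black-box teacher: class 1 if the mean intensity exceeds 0.5.
teacher = lambda img: int(sum(img) / len(img) > 0.5)
few_shot = [[0.0, 0.1], [0.9, 1.0], [0.4, 0.6]]
for x, y in expand_with_mixup(few_shot, n_new=3, query_teacher=teacher):
    print([round(v, 2) for v in x], y)
```

The student would then be trained on the real few-shot samples plus these teacher-labelled synthetic pairs.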

arXiv:2207.09991 [pdf, other]

Causal Models, Prediction, and Extrapolation in Cell Line Perturbation Experiments

Authors: James P. Long, Yumeng Yang, Kim-Anh Do

Abstract: In cell line perturbation experiments, a collection of cells is perturbed with external agents (e.g., drugs) and responses such as protein expression are measured. Due to cost constraints, only a small fraction of all possible perturbations can be tested in vitro. This has led to the development of computational (in silico) models which can predict cellular responses to perturbations. Perturbations with clinically interesting predicted responses can be prioritized for in vitro testing. In this work, we compare causal and non-causal regression models for perturbation response prediction in a melanoma cancer cell line. The current best-performing method on this data set is Cellbox, which models how proteins causally affect each other using a system of ordinary differential equations (ODEs). We derive a closed-form solution to the Cellbox system of ODEs in the linear case. These analytic results facilitate comparison of Cellbox to regression approaches. We show that causal models such as Cellbox, while requiring more assumptions, enable extrapolation in ways that non-causal regression models cannot. For example, causal models can predict responses for never-before-tested drugs. We illustrate these strengths and weaknesses in simulations. In an application to the melanoma cell line data, we find that regression models outperform the Cellbox causal model.

Submitted 20 July, 2022; originally announced July 2022.

Comments: 13 pages, 4 figures
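For intuition about why a linear ODE model admits a closed form, consider a generic linear system. Assumption: the linear case is written as $\dot{x} = A x + b$, with a constant matrix $A$ collecting the interaction and decay terms and a constant perturbation input $b$. This is a generic form, not necessarily the exact Cellbox parameterization derived in the paper.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + b, \qquad x(0) = x_0,\\
x(t) &= e^{At}x_0 + A^{-1}\left(e^{At} - I\right)b \quad (A \text{ invertible}),\\
x^{*} &= \lim_{t\to\infty} x(t) = -A^{-1}b \quad \text{if every eigenvalue of } A \text{ has negative real part.}
\end{aligned}
```

Differentiating the second line gives $A e^{At}x_0 + e^{At}b$, which equals $A\,x(t) + b$, so it solves the system. Because the steady state $x^{*}$ is linear in $b$, a fitted $A$ yields predictions for perturbation inputs $b$ that were never observed. This loosely illustrates the kind of extrapolation the abstract attributes to causal models.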

arXiv:2207.03895 [pdf, other]

Defense Against Multi-target Trojan Attacks

Authors: Haripriya Harikumar, Santu Rana, Kien Do, Sunil Gupta, Wei Zong, Willy Susilo, Svetha Venkatesh

Abstract: Adversarial attacks on deep learning-based models pose a significant threat to the current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variant of BadNets-style attacks that introduces Trojan backdoors into multiple target classes and allows triggers to be placed anywhere in the image. The former makes it more potent, and the latter makes it extremely easy to carry out the attack in the physical space. The state-of-the-art Trojan detection methods fail with this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism by measuring the transferability of such recovered triggers. A Trojan trigger will have very high transferability, i.e., it also makes other images go to the same class. We study many practical advantages of our attack method and then demonstrate the detection performance using a variety of image datasets. The experimental results show the superior detection performance of our method over state-of-the-art methods.

Submitted 8 July, 2022; originally announced July 2022.
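The transferability test can be sketched as a simple rate. Paste a recovered candidate trigger onto held-out images and count how often the prediction moves to the trigger's class. Assumptions: images are flat lists, `classify` and `apply_trigger` are hypothetical callables with a toy backdoored model, and the 0.9 decision threshold is illustrative, not a value from the paper.

```python
from typing import Callable, List, Sequence

Image = List[float]


def transferability(trigger: Image, target: int, images: Sequence[Image],
                    classify: Callable[[Image], int],
                    apply_trigger: Callable[[Image, Image], Image]) -> float:
    """Fraction of images not already in `target` that the trigger flips to `target`."""
    candidates = [img for img in images if classify(img) != target]
    if not candidates:
        return 0.0
    flipped = sum(classify(apply_trigger(img, trigger)) == target for img in candidates)
    return flipped / len(candidates)


def looks_trojaned(rate: float, threshold: float = 0.9) -> bool:
    """Flag a recovered trigger whose transferability is suspiciously high."""
    return rate >= threshold


# Toy backdoored model: predicts class 7 whenever pixel 0 is saturated,
# otherwise class 0 or 1 depending on the mean of the remaining pixels.
def classify(img: Image) -> int:
    if img[0] >= 1.0:
        return 7
    return int(sum(img[1:]) / (len(img) - 1) > 0.5)


def apply_trigger(img: Image, trigger: Image) -> Image:
    # Stamp-style trigger: overwrite pixels where the trigger is non-zero.
    return [t if t != 0.0 else x for x, t in zip(img, trigger)]


clean = [[0.1, 0.2, 0.9], [0.0, 0.8, 0.7], [0.3, 0.1, 0.1]]
rate = transferability([1.0, 0.0, 0.0], 7, clean, classify, apply_trigger)
print(rate, looks_trojaned(rate))  # 1.0 True
```

A benign pattern, such as brightening the last pixel, moves none of these images to class 7, so its rate stays at 0.0.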

arXiv:2204.09315 [pdf, ps, other]

Learning to Constrain Policy Optimization with Virtual Trust Region

Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

Abstract: We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust region, we propose forming a second trust region through another virtual policy representing a wide range of past policies. We then constrain the new policy to stay closer to the virtual policy, which is beneficial if the old policy performs poorly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory of past policies, providing a new capability for dynamically learning appropriate virtual trust regions during the optimization process. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined in diverse environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.

Submitted 15 September, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Preprint, 22 pages

arXiv:2204.09047 [pdf, ps, other]

Learning Theory of Mind via Dynamic Traits Attribution

Authors: Dung Nguyen, Phuoc Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran

Abstract: Machine learning of Theory of Mind (ToM) is essential to build social agents that co-live with humans and other agents. This capacity, once acquired, will help machines infer the mental states of others from observed contextual action trajectories, enabling future prediction of goals, intentions, actions, and successor representations. The underlying mechanism for such a prediction remains unclear, however. Inspired by the observation that humans often infer the character traits of others, then use them to explain behaviour, we propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories. This trait vector then multiplicatively modulates the prediction mechanism via a 'fast weights' scheme in the prediction neural network, which reads the current context and predicts the behaviour. We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improve mindreading ability. On the indirect assessment of false-belief understanding, the new ToM model enables more efficient helping behaviours.

Submitted 17 April, 2022; originally announced April 2022.

Comments: Accepted for publication at AAMAS 2022

arXiv:2202.12154 [pdf, other]

Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Authors: Kien Do, Haripriya Harikumar, Hung Le, Dung Nguyen, Truyen Tran, Santu Rana, Dang Nguyen, Willy Susilo, Svetha Venkatesh

Abstract: Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF) which leverage lossy data compression and adversarial learning respectively to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence property of triggers. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC) which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art ones, while being quite robust to small amounts of training data and large-norm triggers.

Submitted 14 February, 2023; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted to ECCV 2022

arXiv:2201.01367 [pdf, other]

DenseTact: Optical Tactile Sensor for Dense Shape Reconstruction

Authors: Won Kyung Do, Monroe Kennedy III

Abstract: Increasing the performance of tactile sensing in robots enables versatile, in-hand manipulation. Vision-based tactile sensors have been widely used as rich tactile feedback has been shown to be correlated with increased performance in manipulation tasks. Existing tactile sensor solutions with high resolution have limitations that include low accuracy, expensive components, or lack of scalability. In this paper, an inexpensive, scalable, and compact tactile sensor with high-resolution surface deformation modeling for surface reconstruction of the 3D sensor surface is proposed. By measuring the image from the fisheye camera, it is shown that the sensor can successfully estimate the surface deformation in real-time (1.8 ms) by using deep convolutional neural networks. This sensor in its design and sensing abilities represents a significant step toward better object in-hand localization, classification, and surface estimation all enabled by high-resolution shape reconstruction.

Submitted 8 March, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

arXiv:2112.01853 [pdf, other]

Episodic Policy Gradient Training

Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

Abstract: We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcome of used hyperparameters and their training contexts. At any policy update step, the policy learner refers to the stored experiences and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed Episodic Policy Gradient Training (EPGT), enables an episodic learning process and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.

Submitted 3 December, 2021; originally announced December 2021.

Comments: 19 pages

arXiv:2110.13414 [pdf, ps, other]

Semantic Host-free Trojan Attack

Authors: Haripriya Harikumar, Kien Do, Santu Rana, Sunil Gupta, Svetha Venkatesh

Abstract: In this paper, we propose a novel host-free Trojan attack with triggers that are fixed in the semantic space but not necessarily in the pixel space. In contrast to existing Trojan attacks which use clean input images as hosts to carry small, meaningless trigger patterns, our attack considers triggers as full-sized images belonging to a semantically meaningful object class. Since in our attack the backdoored classifier is encouraged to memorize the abstract semantics of the trigger images rather than any specific fixed pattern, it can be later triggered by semantically similar but different-looking images. This makes our attack more practical to apply in the real world and harder to defend against. Extensive experimental results demonstrate that with only a small number of Trojan patterns for training, our attack can generalize well to new patterns of the same Trojan class and can bypass state-of-the-art defense methods.

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2107.11635 [pdf, other]

Clustering by Maximizing Mutual Information Across Views

Authors: Kien Do, Truyen Tran, Svetha Venkatesh

Abstract: We propose a novel framework for image clustering that incorporates joint representation learning and clustering. Our method consists of two heads that share the same backbone network: a "representation learning" head and a "clustering" head. The "representation learning" head captures fine-grained patterns of objects at the instance level which serve as clues for the "clustering" head to extract coarse-grained information that separates objects into clusters. The whole model is trained in an end-to-end manner by minimizing the weighted sum of two sample-oriented contrastive losses applied to the outputs of the two heads. To ensure that the contrastive loss corresponding to the "clustering" head is optimal, we introduce a novel critic function called "log-of-dot-product". Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets, improving over the best baseline by about 5-7% in accuracy on CIFAR10/20, STL10, and ImageNet-Dogs. Further, the "two-stage" variant of our method also achieves better results than baselines on three challenging ImageNet subsets.

Submitted 24 July, 2021; originally announced July 2021.

Comments: Accepted at ICCV 2021
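One plausible reading of the "log-of-dot-product" critic is the score $\log(p^\top q)$ between the cluster-probability vectors of two views, used inside an InfoNCE-style contrastive loss over a batch. The sketch below assumes both that reading and the InfoNCE form. The small `eps` is an added safeguard against `log(0)`. Batch construction, any temperature, and the weighting of the two heads are not shown.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]


def log_dot_critic(p: Probs, q: Probs, eps: float = 1e-12) -> float:
    """Critic score log(p . q) for two cluster-probability vectors (eps avoids log(0))."""
    return math.log(sum(a * b for a, b in zip(p, q)) + eps)


def contrastive_loss(view1: List[Probs], view2: List[Probs]) -> float:
    """Mean InfoNCE loss where row i of view1 should match row i of view2."""
    total = 0.0
    for i, p in enumerate(view1):
        scores = [log_dot_critic(p, q) for q in view2]
        # Log of the softmax normalizer over all candidates in the batch.
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[i]  # -log softmax probability of the positive pair
    return total / len(view1)


confident = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(round(contrastive_loss(confident, confident), 6))  # 0.0
print(round(contrastive_loss(uniform, uniform), 6))      # 0.693147
```

Since $\exp(\log(p^\top q)) = p^\top q$, the softmax reduces to ratios of dot products. In this toy example the loss is near zero when both views give confident, consistent assignments and equals $\log N$ for a batch of $N$ uniform assignments.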

arXiv:2106.05735 [pdf, other]

doi 10.1038/s41467-022-30695-9

The Medical Segmentation Decathlon

Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD), a biomedical image analysis challenge in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data, and small objects. The MSD challenge confirmed that algorithms with a consistently good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis.
Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized for non-AI experts.

Submitted 10 June, 2021; originally announced June 2021.

MSC Class: 68T07

arXiv:2105.04722 [pdf, other]

doi 10.2528/PIERL22071401

On the Electrostatic Interaction between Point Charges due to Dielectrical Shielding

Authors: Long T. Nguyen, Kim Tuan Do, Duy V. Nguyen, Trung Phan

Abstract: How will the electrostatic interaction between two point charges change if they are shielded from each other by a dielectric slab? While the physical setting of this electromagnetic problem is relatively simple, it is easy to get wrong, and the correct solution is surprisingly complicated. Here we show a general answer using the method of images, in which the electric field is not found by solving Poisson's equation but by superposing an infinite number of image charges to recurrently satisfy the boundary conditions at all interfaces. We also obtain analytical and algebraic results in some special cases.

Submitted 31 October, 2022; v1 submitted 10 May, 2021; originally announced May 2021.

Journal ref: Progress In Electromagnetics Research Letters, Vol. 107, 111-118, 2022
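The image-charge recurrence can be made concrete for the simplest geometry. This is a textbook setup and not necessarily the paper's general case. Assumptions: both charges lie on the slab's normal axis in the same outer medium of relative permittivity `eps_out`, the slab has relative permittivity `eps_slab` and thickness `t`, and `q1` and `q2` sit at distances `a` and `b` from the near faces. Matching the potential and the normal component of D at each face gives the planar reflection coefficient r = (eps_out - eps_slab)/(eps_out + eps_slab). In this reasoning, each extra round trip inside the slab multiplies the transmitted image by r² and moves it 2t farther away, with an overall transmission factor 1 - r². The sketch truncates that series. The default permittivities are hypothetical, and each charge's self-image attraction to the slab is excluded because only the mutual interaction is computed.

```python
import math

K_VACUUM = 8.9875517923e9  # Coulomb constant 1/(4*pi*eps0), in N*m^2/C^2


def slab_force(q1: float, q2: float, a: float, b: float, t: float,
               eps_out: float = 1.0, eps_slab: float = 4.0,
               n_terms: int = 200) -> float:
    """On-axis interaction force (N) between q1 and q2 across a dielectric slab.

    Distances are in metres; permittivities are relative. A positive result
    means repulsion. Only the first n_terms image charges are summed.
    """
    r = (eps_out - eps_slab) / (eps_out + eps_slab)  # planar reflection coefficient
    total = 0.0
    for n in range(n_terms):
        distance = a + b + t + 2 * n * t             # n-th image lies 2nt farther away
        total += (1 - r * r) * r ** (2 * n) / distance ** 2
    return K_VACUUM / eps_out * q1 * q2 * total


def coulomb_force(q1: float, q2: float, d: float, eps_out: float = 1.0) -> float:
    """Reference force with no slab (the gap filled by the outer medium)."""
    return K_VACUUM / eps_out * q1 * q2 / d ** 2


q = 1e-9
with_slab = slab_force(q, q, a=0.01, b=0.01, t=0.01)
without_slab = coulomb_force(q, q, d=0.03)
print(with_slab < without_slab)  # True: the slab weakens the interaction
```

Two sanity checks follow from the series. For t = 0 the coefficients (1 - r²)r^{2n} sum to 1 and the plain Coulomb force is recovered. For eps_slab = eps_out, r = 0 and only the direct term survives.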

arXiv:2103.01554 [pdf]

doi 10.1002/adom.202001778

Sharp spectral variations of the ultrafast transient light extinction by bimetallic nanoparticles in the near-UV

Authors: Tadele Otomalo, Lorenzo Di Mario, Cyrille Hamon, Doru Constantin, Khanh-Van Do, Patrick O'Keeffe, Daniele Catone, Alessandra Paladini, Bruno Palpant

Abstract: Noble metal nanoparticles exhibit localized plasmon resonance modes that span the visible and near-infrared spectral ranges and have many applications. Modifying the size, shape, and composition of the nanoparticles changes the number of modes and their properties. The characteristics of these modes are transiently affected when illuminating the nano-objects with ultrashort laser pulses. Here, we synthesize core-shell gold-silver nanocuboids and measure their spectral signature in the stationary and ultrafast transient regimes. Their dipolar transverse mode vanishes with increasing Ag-shell thickness, while higher-order modes grow in the near-ultraviolet range where no plasmon resonance can be generated with single noble metal nanoparticles. These higher-energy modes are associated with sharp spectral variations of the ultrafast transient light extinction by the bimetallic nanocuboids. By carrying out a theoretical investigation, we break down the different contributions to this response and …

Submitted 2 March, 2021; originally announced March 2021.

Journal ref: Advanced Optical Materials, Wiley, 2021, pp.2001778

Showing 1–50 of 67 results for author: Do, K