Showing 1–50 of 401 results for author: Nguyen, A

Searching in archive cs.
  1. arXiv:2406.17381  [pdf, other]

    cs.LG cs.CV

    Forget but Recall: Incremental Latent Rectification in Continual Learning

    Authors: Nghia D. Nguyen, Hieu Trung Nguyen, Ang Li, Hoang Pham, Viet Anh Nguyen, Khoa D. Doan

    Abstract: Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper in…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.15098  [pdf, other]

    cs.LG cs.AI

    How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series

    Authors: Simon Dietz, Thomas Altstidl, Dario Zanca, Björn Eskofier, An Nguyen

    Abstract: Mixed-type time series (MTTS) is a bimodal data type that is common in many domains, such as healthcare, finance, environmental monitoring, and social media. It consists of regularly sampled continuous time series and irregularly sampled categorical event sequences. The integration of both modalities through multimodal fusion is a promising approach for processing MTTS. However, the question of ho…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.09489  [pdf, other]

    cs.CV

    Language-driven Grasp Detection

    Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samp…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 19 pages. Accepted to CVPR24

  4. arXiv:2406.09039  [pdf, other]

    cs.RO

    Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning

    Authors: Huy Hoang Nguyen, Minh Nhat Vu, Florian Beck, Gerald Ebmer, Anh Nguyen, Andreas Kugi

    Abstract: Combining a vision module inside a closed-loop control system for a seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between utilized modules. This task is even more difficult in a dynamic environment, e.g., objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects…

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  5. arXiv:2406.02317  [pdf, other]

    cs.LG cs.AI stat.ML

    Generative Conditional Distributions by Neural (Entropic) Optimal Transport

    Authors: Bao Nguyen, Binh Nguyen, Hieu Trung Nguyen, Viet Anh Nguyen

    Abstract: Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 15 pages, 8 figures

  6. arXiv:2406.00973  [pdf, other]

    cs.IR cs.LG

    Cold-start Recommendation by Personalized Embedding Region Elicitation

    Authors: Hieu Trung Nguyen, Duy Nguyen, Khoa Doan, Viet Anh Nguyen

    Abstract: Rating elicitation is a success element for recommender systems to perform well at cold-starting, in which the systems need to recommend items to a newly arrived user with no prior knowledge about the user's preference. Existing elicitation methods employ a fixed set of items to learn the user's preference and then infer the users' preferences on the remaining items. Using a fixed seed set can lim…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at UAI 2024

  7. arXiv:2405.20529  [pdf]

    cs.AI cs.CL

    An Automatic Question Usability Evaluation Toolkit

    Authors: Steven Moore, Eamon Costello, Huy A. Nguyen, John Stamper

    Abstract: Evaluating multiple-choice questions (MCQs) involves either labor intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automa…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Artificial Intelligence in Education 2024

  8. arXiv:2405.20124  [pdf, other]

    stat.ML cs.LG math.OC

    A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

    Authors: Man-Chung Yue, Yves Rychener, Daniel Kuhn, Viet Anh Nguyen

    Abstract: The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a pri…

    Submitted 30 May, 2024; originally announced May 2024.

  9. arXiv:2405.14352  [pdf, other]

    cs.LG

    Explaining Graph Neural Networks via Structure-aware Interaction Index

    Authors: Ngoc Bui, Hieu Trung Nguyen, Viet Anh Nguyen, Rex Ying

    Abstract: The Shapley value is a prominent tool for interpreting black-box machine learning models thanks to its strong theoretical foundation. However, for models with structured inputs, such as graph neural networks, existing Shapley-based explainability approaches either focus solely on node-wise importance or neglect the graph structure when perturbing the input instance. This paper introduces the Myers…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 30 pages, ICML'24

  10. arXiv:2405.14124  [pdf, ps, other]

    cs.LG

    Mixture of Experts Meets Prompt-Based Continual Learning

    Authors: Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Van Ngo, Nhat Ho

    Abstract: Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoret…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 34 pages

  11. Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems

    Authors: Trung Thanh Nguyen, Truong Thao Nguyen, Dinh Tuan Anh Nguyen, Thanh Hung Nguyen, Phi Le Nguyen

    Abstract: We focus on real-time air quality monitoring systems that rely on devices installed on automobiles in this research. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel or via Wi-Fi to adjacent devices or the so-called Road Side Units deployed along the road. We aim to reduce 4G costs…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 2021 IEEE International Conference on Performance, Computing and Communications (IPCCC). arXiv admin note: substantial text overlap with arXiv:2405.01057

  12. arXiv:2404.18496  [pdf, other]

    cs.SE

    AI-powered Code Review with LLMs: Early Results

    Authors: Zeeshan Rasheed, Malik Abdul Sami, Muhammad Waseem, Kai-Kristian Kemell, Xiaofeng Wang, Anh Nguyen, Kari Systä, Pekka Abrahamsson

    Abstract: In this paper, we present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model designed to review code and identify potential issues. Our proposed LLM-based AI agent model is trained on large code repositories. This training includes code reviews, bug reports, and documentation of best practices. It aims to detect code smells, identify pote…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages

  13. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  14. arXiv:2404.13819  [pdf, other]

    cs.CV

    HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild

    Authors: Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang, Minh Hoai

    Abstract: We address the challenging task of identifying, segmenting, and tracking hand-held objects, which is crucial for applications such as human action segmentation and performance evaluation. This task is particularly challenging due to heavy occlusion, rapid motion, and the transitory nature of objects being hand-held, where an object may be held, released, and subsequently picked up again. To tackle…

    Submitted 21 April, 2024; originally announced April 2024.

  15. arXiv:2404.13437  [pdf, other]

    cs.CV

    High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces

    Authors: Baoru Huang, Yida Wang, Anh Nguyen, Daniel Elson, Francisco Vasconcelos, Danail Stoyanov

    Abstract: In surgical oncology, screening colonoscopy plays a pivotal role in providing diagnostic assistance, such as biopsy, and facilitating surgical navigation, particularly in polyp detection. Computer-assisted endoscopic surgery has recently gained attention and amalgamated various 3D computer vision techniques, including camera localization, depth estimation, surface reconstruction, etc. Neural Radia…

    Submitted 20 April, 2024; originally announced April 2024.

  16. arXiv:2404.09621  [pdf, other]

    eess.SY cs.ET cs.HC cs.RO

    AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

    Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

    Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.09206  [pdf, other]

    cs.CL

    DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

    Authors: Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De

    Abstract: Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and doma…

    Submitted 14 April, 2024; originally announced April 2024.

  18. arXiv:2404.07594  [pdf]

    cs.CV cs.LG cs.RO

    Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization

    Authors: Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang

    Abstract: Although robot-assisted cardiovascular catheterization is commonly performed for intervention of cardiovascular diseases, more studies are needed to support the procedure with automated tool segmentation. This can aid surgeons on tool tracking and visualization during intervention. Learning-based segmentation has recently offered state-of-the-art segmentation performances however, generating groun…

    Submitted 11 April, 2024; originally announced April 2024.

  19. arXiv:2404.05929  [pdf, other]

    physics.data-an cs.IT stat.ME

    A feature-based information-theoretic approach for detecting interpretable, long-timescale pairwise interactions from time series

    Authors: Aria Nguyen, Oscar McMullin, Joseph T. Lizier, Ben D. Fulcher

    Abstract: Quantifying relationships between components of a complex system is critical to understanding the rich network of interactions that characterize the behavior of the system. Traditional methods for detecting pairwise dependence of time series, such as Pearson correlation, Granger causality, and mutual information, are computed directly in the space of measured time-series values. But for systems in…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 20 pages, 7 figures

  20. arXiv:2404.05238  [pdf, other]

    cs.CV cs.HC

    Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy

    Authors: Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen

    Abstract: Via thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy…

    Submitted 20 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted for presentation at the XAI4CV Workshop, part of the CVPR 2024 proceedings

  21. arXiv:2404.01705  [pdf]

    cs.CV

    Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

    Authors: Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

    Abstract: High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to…

    Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2404.00439  [pdf, other]

    cs.CL

    DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

    Authors: Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

    Abstract: The application of natural language processing models to PDF documents is pivotal for various business applications yet the challenge of training models for this purpose persists in businesses due to specific hurdles. These include the complexity of working with PDF formats that necessitate parsing text and layout information for curating training data and the lack of privacy-preserving annotation…

    Submitted 30 March, 2024; originally announced April 2024.

  23. arXiv:2403.14592  [pdf, other]

    cs.SE cs.AI cs.HC

    Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals

    Authors: Khanh Nghiem, Anh Minh Nguyen, Nghi D. Q. Bui

    Abstract: As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants. AI coding assistants should set clear expectations for usage, integrate with advanced IDE capabilities and existing extensions, use extendable backend designs, and collect app data responsibly for downstream analyses. We propose open q…

    Submitted 21 March, 2024; originally announced March 2024.

  24. arXiv:2403.12606  [pdf, other]

    cs.LG

    On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

    Authors: Simon Klüttermann, Jérôme Rutinowski, Anh Nguyen, Britta Grimme, Moritz Roidl, Emmanuel Müller

    Abstract: In this contribution, we introduce a novel ensemble method for the re-identification of industrial entities, using images of chipwood pallets and galvanized metal plates as dataset examples. Our algorithms replace commonly used, complex siamese neural networks with an ensemble of simplified, rudimentary models, providing wider applicability, especially in hardware-restricted scenarios. Each ensemb…

    Submitted 19 March, 2024; originally announced March 2024.

  25. arXiv:2403.11376  [pdf, other]

    cs.CV

    ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

    Authors: Minh Tran, Winston Bounsavy, Khoa Vo, Anh Nguyen, Tri Nguyen, Ngan Le

    Abstract: Amodal Instance Segmentation (AIS) presents a challenging task as it involves predicting both visible and occluded parts of objects within images. Existing AIS methods rely on a bidirectional approach, encompassing both the transition from amodal features to visible features (amodal-to-visible) and from visible features to amodal features (visible-to-amodal). Our observation shows that the utiliza…

    Submitted 17 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to IJCNN2024

  26. arXiv:2403.05610  [pdf, ps, other]

    cs.LG cs.CV

    Evidence, Definitions and Algorithms regarding the Existence of Cohesive-Convergence Groups in Neural Network Optimization

    Authors: Thien An L. Nguyen

    Abstract: Understanding the convergence process of neural networks is one of the most complex and crucial issues in the field of machine learning. Despite the close association of notable successes in this domain with the convergence of artificial neural networks, this concept remains predominantly theoretical. In reality, due to the non-convex nature of the optimization problems that artificial neural netw…

    Submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.05297  [pdf, other]

    cs.CV cs.AI cs.CL

    PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

    Authors: Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Totti Nguyen

    Abstract: CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder. Therefore, they perform poorly on new classes or the classes whose names rarely appear on the Internet (e.g., scientific names of birds). For fine-grained classification, we propose PEEB - an explainable and editable classifier to (1) express the class name into a set of text descriptors that des…

    Submitted 12 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Findings of NAACL 2024 (long paper)

  28. arXiv:2402.15073  [pdf, other]

    cs.LG

    Cost-Adaptive Recourse Recommendation by Adaptive Preference Elicitation

    Authors: Duy Nguyen, Bao Nguyen, Viet Anh Nguyen

    Abstract: Algorithmic recourse recommends a cost-efficient action to a subject to reverse an unfavorable machine learning classification decision. Most existing methods in the literature generate recourse under the assumption of complete knowledge about the cost function. In real-world practice, subjects could have distinct preferences, leading to incomplete information about the underlying cost function of…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 30 pages, 7 figures

  29. arXiv:2402.11446  [pdf, other]

    cs.HC

    Penetration Vision through Virtual Reality Headsets: Identifying 360-degree Videos from Head Movements

    Authors: Anh Nguyen, Xiaokuan Zhang, Zhisheng Yan

    Abstract: In this paper, we present the first contactless side-channel attack for identifying 360 videos being viewed in a Virtual Reality (VR) Head Mounted Display (HMD). Although the video content is displayed inside the HMD without any external exposure, we observe that user head movements are driven by the video content, which creates a unique side channel that does not exist in traditional 2D videos. B…

    Submitted 7 March, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Accepted to USENIX Security '24

  30. arXiv:2402.10430  [pdf, other]

    cs.CL

    Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models

    Authors: Dheeraj Mekala, Alex Nguyen, Jingbo Shang

    Abstract: Instruction-tuning language models has become a crucial step in aligning them for general use. Typically, this process involves extensive training on large datasets, incurring high training costs. In this paper, we introduce a novel training data selection based on the learning percentage of the samples. We assert that current language models possess the capability to autonomously select high-qual…

    Submitted 15 February, 2024; originally announced February 2024.

  31. arXiv:2402.09569  [pdf, other]

    cs.CV

    Automated Plaque Detection and Agatston Score Estimation on Non-Contrast CT Scans: A Multicenter Study

    Authors: Andrew M. Nguyen, Jianfei Liu, Tejas Sudharshan Mathai, Peter C. Grayson, Ronald M. Summers

    Abstract: Coronary artery calcification (CAC) is a strong and independent predictor of cardiovascular disease (CVD). However, manual assessment of CAC often requires radiological expertise, time, and invasive imaging techniques. The purpose of this multicenter study is to validate an automated cardiac plaque detection model using a 3D multiclass nnU-Net for gated and non-gated non-contrast chest CT volumes.…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted at SPIE Medical Imaging 2024

  32. arXiv:2402.05755  [pdf, other]

    cs.CL cs.SD eess.AS

    SpiRit-LM: Interleaved Spoken and Written Language Model

    Authors: Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux

    Abstract: We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated…

    Submitted 8 February, 2024; originally announced February 2024.

  33. arXiv:2402.04480  [pdf, other]

    eess.IV cs.CV math.OC

    MIRT: a simultaneous reconstruction and affine motion compensation technique for four dimensional computed tomography (4DCT)

    Authors: Anh-Tuan Nguyen, Jens Renders, Domenico Iuso, Yves Maris, Jeroen Soete, Martine Wevers, Jan Sijbers, Jan De Beenhouwer

    Abstract: In four-dimensional computed tomography (4DCT), 3D images of moving or deforming samples are reconstructed from a set of 2D projection images. Recent techniques for iterative motion-compensated reconstruction either necessitate a reference acquisition or alternate image reconstruction and motion estimation steps. In these methods, the motion estimation step involves the estimation of either comple…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Submitted to the SIAM Journal on Imaging Sciences (SIIMS)

    MSC Class: 65K10; 68U10; 68W01; 92C55; 94A08

  34. arXiv:2401.17824  [pdf, other]

    cs.CL

    A Survey of Pre-trained Language Models for Processing Scientific Text

    Authors: Xanh Ho, Anh Khoa Duong Nguyen, An Tuan Dao, Junfeng Jiang, Yuki Chida, Kaito Sugimoto, Huy Quoc To, Florian Boudin, Akiko Aizawa

    Abstract: The number of Language Models (LMs) dedicated to processing scientific text is on the rise. Keeping pace with the rapid growth of scientific LMs (SciLMs) has become a daunting task for researchers. To date, no comprehensive surveys on SciLMs have been undertaken, leaving this issue unaddressed. Given the constant stream of new SciLMs, appraising the state-of-the-art and how they compare to each ot…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Resources are available at https://github.com/Alab-NII/Awesome-SciLM

  35. arXiv:2401.09059  [pdf, other]

    cs.RO cs.CV

    Autonomous Catheterization with Open-source Simulator and Expert Trajectory

    Authors: Tudor Jianu, Baoru Huang, Tuan Vo, Minh Nhat Vu, Jingxuan Kang, Hoan Nguyen, Olatunji Omisore, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

    Abstract: Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures…

    Submitted 19 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Code: https://github.com/airvlab/cathsim

  36. arXiv:2401.08041  [pdf, ps, other]

    math.OC cs.NI

    Two-Stage Distributionally Robust Edge Node Placement Under Endogenous Demand Uncertainty

    Authors: Jiaming Cheng, Duong Thuy Anh Nguyen, Duong Tung Nguyen

    Abstract: Edge computing (EC) promises to deliver low-latency and ubiquitous computation to numerous devices at the network edge. This paper aims to jointly optimize edge node (EN) placement and resource allocation for an EC platform, considering demand uncertainty. Diverging from existing approaches treating uncertainties as exogenous, we propose a novel two-stage decision-dependent distributionally robust…

    Submitted 15 January, 2024; originally announced January 2024.

  37. arXiv:2312.16414  [pdf, other]

    cs.CV cs.LG

    Bellman Optimal Stepsize Straightening of Flow-Matching Models

    Authors: Bao Nguyen, Binh Nguyen, Viet Anh Nguyen

    Abstract: Flow matching is a powerful framework for generating high-quality samples in various applications, especially image synthesis. However, the intensive computational demands of these models, especially during the finetuning process and sampling processes, pose significant challenges for low-resource scenarios. This paper introduces Bellman Optimal Stepsize Straightening (BOSS) technique for distilli…

    Submitted 20 February, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 21 pages, 14 figures

  38. arXiv:2312.14999  [pdf, other]

    cs.CV

    Leveraging Habitat Information for Fine-grained Bird Identification

    Authors: Tin Nguyen, Anh Nguyen

    Abstract: Traditional bird classifiers mostly rely on the visual characteristics of birds. Some prior works even train classifiers to be invariant to the background, completely discarding the living environment of birds. Instead, we are the first to explore integrating habitat information, one of the four major cues for identifying birds by ornithologists, into modern bird classifiers. We focus on two leadi…

    Submitted 22 December, 2023; originally announced December 2023.

  39. arXiv:2312.13970  [pdf, other]

    cs.LG cs.AI math.OC

    On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods

    Authors: Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Lam M. Nguyen, Kim-Chuan Toh

    Abstract: This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of…

    Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  40. arXiv:2312.10671  [pdf, other]

    cs.CV

    Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

    Authors: Phuc D. A. Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen

    Abstract: We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic…

    Submitted 5 April, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://open3dis.github.io/

  41. WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge

    Authors: Huy Le, Tung Kieu, Anh Nguyen, Ngan Le

    Abstract: Text-video retrieval, a prominent sub-field within the domain of multimodal information retrieval, has witnessed remarkable growth in recent years. However, existing methods assume video scenes are consistent with unbiased descriptions. These limitations fail to align with real-world scenarios since descriptions can be influenced by annotator biases, diverse writing styles, and varying textual per…

    Submitted 10 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  42. arXiv:2312.09241  [pdf, other]

    cs.LG cs.CL

    TinyGSM: achieving >80% on GSM8k with small language models

    Authors: Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

    Abstract: Small-scale models offer various computational advantages, and yet to which extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break the 80\% barrier on the GSM8K benchmark remains to be 34B. Our work studies how high-quality datasets may be the key for small language models to acqui…

    Submitted 14 December, 2023; originally announced December 2023.

  43. arXiv:2312.08999  [pdf, other]

    cs.LG stat.ML

    Conformalised data synthesis with statistical quality guarantees

    Authors: Julia A. Meister, Khuong An Nguyen

    Abstract: With the proliferation of ever more complicated Deep Learning architectures, data synthesis is a highly promising technique to address the demand of data-hungry models. However, reliably assessing the quality of a 'synthesiser' model's output is an open research question with significant associated risks for high-stake domains. To address this challenge, we have designed a unique confident data sy…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Submitted to the Machine Learning journal special issue "Conformal Prediction and Distribution-Free Uncertainty Quantification"

    MSC Class: 68T37

  44. arXiv:2312.05291  [pdf, other]

    cs.CV cs.AI cs.CL

    GlitchBench: Can large multimodal models detect video game glitches?

    Authors: Mohammad Reza Taesiri, Tianjun Feng, Anh Nguyen, Cor-Paul Bezemer

    Abstract: Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap,…

    Submitted 29 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  45. arXiv:2311.13100  [pdf, other]

    eess.IV cs.CV

    Automated Measurement of Pericoronary Adipose Tissue Attenuation and Volume in CT Angiography

    Authors: Andrew M. Nguyen, Tejas Sudharshan Mathai, Liangchen Liu, Jianfei Liu, Ronald M. Summers

    Abstract: Pericoronary adipose tissue (PCAT) is the deposition of fat in the vicinity of the coronary arteries. It is an indicator of coronary inflammation and associated with coronary artery disease. Non-invasive coronary CT angiography (CCTA) is presently used to obtain measures of the thickness, volume, and attenuation of fat deposition. However, prior works solely focus on measuring PCAT using semi-auto…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures, IEEE ISBI 2024 conference

    MSC Class: 62P10

  46. arXiv:2311.11861  [pdf, other]

    cs.CL cs.AI

    Generating Valid and Natural Adversarial Examples with Large Language Models

    Authors: Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen

    Abstract: Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, leading to the loss of semantic maintenance, grammaticality, and human imperceptibility. Based on…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Submitted to the IEEE for possible publication

  47. arXiv:2311.11349  [pdf, other]

    cs.LG math.OC

    Coverage-Validity-Aware Algorithmic Recourse

    Authors: Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen

    Abstract: Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the fut…

    Submitted 19 November, 2023; originally announced November 2023.

  48. arXiv:2311.11238  [pdf, other]

    cs.HC cs.AI

    AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

    Authors: Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman

    Abstract: As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environment…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 15 pages, 14 figures, in submission

    ACM Class: H.5.2; I.2

  49. arXiv:2311.11209  [pdf, other]

    eess.IV cs.CV

    3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images

    Authors: Tudor Jianu, Baoru Huang, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

    Abstract: Endovascular navigation, essential for diagnosing and treating endovascular diseases, predominantly hinges on fluoroscopic images due to the constraints in sensory feedback. Current shape reconstruction techniques for endovascular intervention often rely on either a priori information or specialized equipment, potentially subjecting patients to heightened radiation exposure. While deep learning ho…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 11 pages

  50. arXiv:2311.11205  [pdf, other]

    eess.IV cs.CV

    Shape-Sensitive Loss for Catheter and Guidewire Segmentation

    Authors: Chayun Kongtongvattana, Baoru Huang, Jingxuan Kang, Hoan Nguyen, Olajide Olufemi, Anh Nguyen

    Abstract: We introduce a shape-sensitive loss function for catheter and guidewire segmentation and utilize it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray images dataset. We transform network-derived predictions and their corresponding ground truths into signed distance maps, thereby enabling any networks to concentrate on the essential boundaries rather…

    Submitted 19 January, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: 13 pages