
Showing 1–50 of 173 results for author: Ma, D

Searching in archive cs. Search in all archives.
  1. arXiv:2406.11816  [pdf, other]

    cs.CV

    VideoLLM-online: Online Video Large Language Model for Streaming Video

    Authors: Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou

    Abstract: Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models typically treat videos as predetermined clips, making them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-St… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. This arxiv version is upgraded with Llama-3

  2. arXiv:2406.09923  [pdf, other]

    cs.CL cs.AI cs.LG

    CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

    Authors: Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

    Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophis… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://clibench.github.io

  3. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.06962  [pdf, other]

    cs.CL cs.AI

    Evolving Subnetwork Training for Large Language Models

    Authors: Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu

    Abstract: Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  5. arXiv:2406.01392  [pdf, other]

    cs.CL

    Sparsity-Accelerated Training for Large Language Models

    Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

    Abstract: Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this trai… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  6. arXiv:2405.19338  [pdf, other]

    eess.SP cs.AI cs.CV

    Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

    Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

    Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is constrained by the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited with unnecessarily high imag… ▽ More

    Submitted 1 April, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures and tables

  7. arXiv:2405.06909  [pdf, ps, other]

    cs.LG cs.AI cs.CY

    Fairness in Reinforcement Learning: A Survey

    Authors: Anka Reuel, Devin Ma

    Abstract: While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 10 pages

    ACM Class: A.1; I.2

  8. arXiv:2405.05983  [pdf]

    cs.CV cs.AI cs.LG

    Real-Time Pill Identification for the Visually Impaired Using Deep Learning

    Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu

    Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2404.12634  [pdf]

    cs.CV cs.AI cs.LG

    Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment

    Authors: Danqing Ma, Meng Wang, Ao Xiang, Zongqing Qi, Qin Yang

    Abstract: This study proposes a multi-modal fusion framework Multitrans based on the Transformer architecture and self-attention mechanism. This architecture combines the study of non-contrast computed tomography (NCCT) images and discharge diagnosis reports of patients undergoing stroke treatment, using a variety of methods based on Transformer architecture approach to predicting functional outcomes of str… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2403.18349  [pdf, other]

    cs.CL

    Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

    Authors: Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu

    Abstract: Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduc… ▽ More

    Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  11. arXiv:2403.17421  [pdf, other]

    cs.IR cs.AI

    MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

    Authors: Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin

    Abstract: The objective of search result diversification (SRD) is to ensure that selected documents cover as many different subtopics as possible. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time. These approaches tend to be inefficient and are easily trapped in a suboptimal state. In addition, some other methods aim… ▽ More

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  12. arXiv:2403.14483  [pdf, other]

    cs.LG cs.AI q-fin.ST

    Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research

    Authors: Shaojie Li, Xinqi Dong, Danqing Ma, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Mobile Internet user credit assessment is an important way for communication operators to establish decisions and formulate measures, and it is also a guarantee for operators to obtain expected benefits. However, credit evaluation methods have long been monopolized by financial industries such as banks and credit. As supporters and providers of platform network technology and network resources, co… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.13703  [pdf]

    cs.CV cs.AI

    Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization

    Authors: Danqing Ma, Shaojie Li, Bo Dang, Hengyi Zang, Xinqi Dong

    Abstract: Transmission line detection technology is crucial for automatic monitoring and ensuring the safety of electrical facilities. The YOLOv5 series is currently one of the most advanced and widely used methods for object detection. However, it faces inherent challenges, such as high computational load on devices and insufficient detection accuracy. To address these concerns, this paper presents an enha… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  14. arXiv:2403.12574  [pdf, other]

    cs.CV cs.AI cs.NE

    EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

    Authors: Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang

    Abstract: Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling rem… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  15. arXiv:2403.09035  [pdf, other]

    cs.LG

    DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

    Authors: Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma

    Abstract: Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we i… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  16. arXiv:2403.08511  [pdf]

    cs.CV

    A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product

    Authors: Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, Danqing Ma

    Abstract: This paper introduces a new multi-modal model based on the Transformer architecture and tensor product fusion strategy, combining BERT's text vectors and ViT's image vectors to classify students' psychological conditions, with an accuracy of 93.65%. The purpose of the study is to accurately analyze the mental health status of students from various data sources. This paper discusses modal fusion me… ▽ More

    Submitted 19 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  17. arXiv:2403.08499  [pdf]

    cs.CV

    Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks

    Authors: Zongqing Qi, Danqing Ma, Jingyu Xu, Ao Xiang, Hedi Qu

    Abstract: In recent years, there have been frequent incidents of foreign objects intruding into railway and airport runways. These objects can include pedestrians, vehicles, animals, and debris. This paper introduces an improved YOLOv5 architecture incorporating FasterNet and attention mechanisms to enhance the detection of foreign objects on railways and airport runways. This study proposes a new dataset,… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  18. arXiv:2403.02586  [pdf, other]

    cs.CL

    Improving Event Definition Following For Zero-Shot Event Detection

    Authors: Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

    Abstract: Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In this work, we aim to improve zero-shot event detection by training models to better follow event definitions. We hypothesize that a diverse set of ev… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  19. arXiv:2402.18262  [pdf, other]

    cs.CL cs.CV

    Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

    Authors: Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu

    Abstract: The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  20. arXiv:2402.09264  [pdf, other]

    cs.LG cs.HC

    UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers

    Authors: Hong Jia, Young D. Kwon, Dong Ma, Nhat Pham, Lorena Qendro, Tam Vu, Cecilia Mascolo

    Abstract: Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's outp… ▽ More

    Submitted 12 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  21. arXiv:2402.03557  [pdf, other]

    cs.CV

    Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement

    Authors: Dayou Mao, Yuhao Chen, Yifan Wu, Maximilian Gilles, Alexander Wong

    Abstract: One of the main motivations of MTL is to develop neural networks capable of inferring multiple tasks simultaneously. While countless methods have been proposed in the past decade investigating robust model architectures and efficient training algorithms, there is still a lack of understanding of these methods when applied on smaller feature extraction backbones, the generalizability of the commonly… ▽ More

    Submitted 16 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  22. arXiv:2401.14818  [pdf, other]

    cs.CL cs.DL

    ChemDFM: Dialogue Foundation Model for Chemistry

    Authors: Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu

    Abstract: Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly i… ▽ More

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 10 pages, 12 figures, 13 tables. Under Review

  23. arXiv:2401.13976  [pdf, other]

    cs.CV cs.AI

    Learning to Manipulate Artistic Images

    Authors: Wei Guo, Yuqi Zhang, De Ma, Qian Zheng

    Abstract: Recent advancement in computer vision has significantly lowered the barriers to artistic creation. Exemplar-based image translation methods have attracted much attention due to flexibility and controllability. However, these methods hold assumptions regarding semantics or require semantic information as the input, while accurate semantics is not easy to obtain in artistic images. Besides, these me… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

  24. arXiv:2401.12255  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Instructional Fingerprinting of Large Language Models

    Authors: Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen

    Abstract: The exorbitant cost of training Large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g. restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tu… ▽ More

    Submitted 3 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted at NAACL 2024; 30 pages

  25. arXiv:2401.11504  [pdf, other]

    cs.CL cs.AI

    With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation

    Authors: Y. Wang, D. Ma, D. Cai

    Abstract: Long text generation, such as novel writing and discourse-level translation with extremely long contexts, presents significant challenges to current language models. Existing methods mainly focus on extending the model's context window through strategies like length extrapolation. However, these approaches demand substantial hardware resources during the training and/or inference phases. Our propo… ▽ More

    Submitted 25 March, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  26. arXiv:2401.06994  [pdf, other]

    cs.CV

    UniVision: A Unified Framework for Vision-Centric 3D Perception

    Authors: Yu Hong, Qian Liu, Huayuan Cheng, Danjiao Ma, Hang Dai, Yu Wang, Guangzhi Cao, Yong Ding

    Abstract: The past few years have witnessed the rapid development of vision-centric 3D perception in autonomous driving. Although the 3D perception models share many structural and conceptual similarities, there still exist gaps in their feature representations, data formats, and objectives, posing challenges for unified and efficient 3D perception framework design. In this paper, we present UniVision, a si… ▽ More

    Submitted 13 January, 2024; originally announced January 2024.

  27. arXiv:2401.00534  [pdf, other]

    cs.LG q-fin.ST

    Financial Time-Series Forecasting: Towards Synergizing Performance And Interpretability Within a Hybrid Machine Learning Approach

    Authors: Shun Liu, Kexin Wu, Chufeng Jiang, Bin Huang, Danqing Ma

    Abstract: In the realm of cryptocurrency, the prediction of Bitcoin prices has garnered substantial attention due to its potential impact on financial markets and investment strategies. This paper proposes a comparative study of hybrid machine learning algorithms, with an emphasis on enhancing model interpretability. Specifically, linear regression (OLS, LASSO), long short-term memory (LSTM), decision tree regresso… ▽ More

    Submitted 31 December, 2023; originally announced January 2024.

  28. arXiv:2312.17582  [pdf, other]

    cs.NE cs.AR

    Darwin3: A large-scale neuromorphic chip with a Novel ISA and On-Chip Learning

    Authors: De Ma, Xiaofei Jin, Shichun Sun, Yitao Li, Xundong Wu, Youneng Hu, Fangchao Yang, Huajin Tang, Xiaolei Zhu, Peng Lin, Gang Pan

    Abstract: Spiking Neural Networks (SNNs) are gaining increasing attention for their biological plausibility and potential for improved computational efficiency. To match the high spatial-temporal dynamics in SNNs, neuromorphic chips are highly desired to execute SNNs in hardware-based neuron and synapse circuits directly. This paper presents a large-scale neuromorphic chip named Darwin3 with a novel instruc… ▽ More

    Submitted 29 December, 2023; originally announced December 2023.

  29. arXiv:2312.13108  [pdf, other]

    cs.CV

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Authors: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper… ▽ More

    Submitted 1 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project Page: https://showlab.github.io/assistgui/

  30. arXiv:2311.16678  [pdf, other]

    cs.CL

    Entity-Aspect-Opinion-Sentiment Quadruple Extraction for Fine-grained Sentiment Analysis

    Authors: Dan Ma, Jun Xu, Zongyu Wang, Xuezhi Cao, Yunsen Xian

    Abstract: Product reviews often contain a large number of implicit aspects and object-attribute co-existence cases. Unfortunately, many existing studies in Aspect-Based Sentiment Analysis (ABSA) have overlooked this issue, which can make it difficult to extract opinions comprehensively and fairly. In this paper, we propose a new task called Entity-Aspect-Opinion-Sentiment Quadruple Extraction (EASQE), which… ▽ More

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.09630  [pdf, other]

    cs.CL cs.CY cs.SI

    Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach

    Authors: Yanchen Liu, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang

    Abstract: Susceptibility to misinformation describes the degree of belief in unverifiable claims, a latent aspect of individuals' mental processes that is not observable. Existing susceptibility studies heavily rely on self-reported beliefs, which can be subject to bias, expensive to collect, and challenging to scale for downstream applications. To address these limitations, in this work, we propose a compu… ▽ More

    Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  32. arXiv:2311.08268  [pdf, other]

    cs.CL

    A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

    Authors: Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate potentially harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods eith… ▽ More

    Submitted 6 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by NAACL 2024, 18 pages, 7 figures, 13 tables

  33. arXiv:2310.19620  [pdf, other]

    cs.RO cs.AI cs.CV

    Large Trajectory Models are Scalable Motion Predictors and Planners

    Authors: Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

    Abstract: Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models… ▽ More

    Submitted 28 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  34. arXiv:2310.18662  [pdf, other]

    cs.CL

    ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL

    Authors: Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu

    Abstract: Text-to-SQL aims to generate an executable SQL program given the user utterance and the corresponding database schema. To ensure the well-formedness of output SQLs, one prominent approach adopts a grammar-based recurrent decoder to produce the equivalent SQL abstract syntax tree (AST). However, previous methods mainly utilize an RNN-series decoder, which 1) is time-consuming and inefficient and 2)… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

  35. arXiv:2310.15560  [pdf, ps, other]

    cs.PF

    Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System

    Authors: Zeyang Meng, Dingyou Ma, Shengfeng Wang, Zhiqing Wei, Zhiyong Feng

    Abstract: With the advent of the 5G era, factories are transitioning towards wireless networks to break free from the limitations of wired networks. In 5G-enabled factories, unmanned automatic devices such as automated guided vehicles and robotic arms complete production tasks cooperatively through periodic control loops. In such loops, the sensing data is generated by sensors and transmitted to the contro… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages, 3 figures, received by GlobeCom 2023

    MSC Class: 93C55; 94A99 ACM Class: C.4

  36. arXiv:2310.13347  [pdf, other]

    cs.CV cs.AI

    NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

    Authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge

    Abstract: The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hinder… ▽ More

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  37. Staged Depthwise Correlation and Feature Fusion for Siamese Object Tracking

    Authors: Dianbo Ma, Jianqiang Xiao, Ziyan Gao, Satoshi Yamane

    Abstract: In this work, we propose a novel staged depthwise correlation and feature fusion network, named DCFFNet, to further optimize the feature extraction for visual tracking. We build our deep tracker upon a siamese network architecture, which is offline trained from scratch on multiple large-scale datasets in an end-to-end manner. The model contains a core component, that is, depthwise correlation and… ▽ More

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted in 2023 International Joint Conference on Neural Networks (IJCNN)

  38. Foundation Ark: Accruing and Reusing Knowledge for Superior and Robust Performance

    Authors: DongAo Ma, Jiaxuan Pang, Michael B. Gotway, Jianming Liang

    Abstract: Deep learning nowadays offers expert-level and sometimes even super-expert-level performance, but achieving such performance demands massive annotated data for training (e.g., Google's proprietary CXR Foundation Model (CXR-FM) was trained on 821,544 labeled and mostly private chest X-rays (CXRs)). Numerous datasets are publicly available in medical imaging but individually small and heterogeneous… ▽ More

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: Best Paper Award Runner-Up at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2023

  39. arXiv:2310.08795  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Mitigating Bias for Question Answering Models by Tracking Bias Influence

    Authors: Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng

    Abstract: Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in the question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. There have been datasets to evaluate bias in QA models, while bias mitigation technique for the QA models is still under-explored. In this work, we propose BMBI, an approach to mitigate the bi… ▽ More

    Submitted 17 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at NAACL 2024 main conference

  40. arXiv:2310.02529  [pdf, other]

    cs.SI cs.AI cs.HC

    MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

    Authors: Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

    Abstract: We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users,… ▽ More

    Submitted 20 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: To appear at AAAI'24. System demo video and more info: info-pathways.github.io

  41. arXiv:2309.09627  [pdf, other]

    cs.SD eess.AS

    Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

    Authors: Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: We propose a novel framework for electrolaryngeal speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conv… ▽ More

    Submitted 20 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024. Demo page: lesterphillip.github.io/icassp2024_el_sie

  42. arXiv:2308.16633  [pdf, other]

    cs.CV

    Semi-Supervised SAR ATR Framework with Transductive Auxiliary Segmentation

    Authors: Chenwei Wang, Xiaoyu Liu, Yulin Huang, Siyi Luo, Jifang Pei, Jianyu Yang, Deqing Mao

    Abstract: Convolutional neural networks (CNNs) have achieved high performance in synthetic aperture radar (SAR) automatic target recognition (ATR). However, the performance of CNNs depends heavily on a large amount of training data. The insufficiency of labeled training SAR images limits the recognition performance and even invalidates some ATR methods. Furthermore, under few labeled training data, many exi…

    Submitted 31 August, 2023; originally announced August 2023.

  43. arXiv:2308.16408  [pdf, other]

    math.OC cs.CG

    Last Mile Delivery with Drones and Sharing Economy

    Authors: Mehdi Behroozi, Dinghao Ma

    Abstract: We consider a combined system of regular delivery trucks and crowdsourced drones, available via a sharing economy platform, to provide a technology-assisted crowd-based last-mile delivery experience. We develop analytical models and methods for a system in which package delivery is performed by a big truck carrying many packages to a neighborhood or a town in a metropolitan area and then the packa…

    Submitted 30 August, 2023; originally announced August 2023.

  44. arXiv:2308.14329  [pdf, other]

    cs.RO cs.AI

    End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data

    Authors: Jin Bok Park, Jinkyu Lee, Muhyun Back, Hyunmin Han, David T. Ma, Sang Min Won, Sung Soo Hwang, Il Yong Chun

    Abstract: In autonomous driving, the end-to-end (E2E) driving approach that predicts vehicle control signals directly from sensor data is rapidly gaining attention. To learn a safe E2E driving system, one needs an extensive amount of driving data and human intervention. Vehicle control data is constructed by many hours of human driving, and it is challenging to construct large vehicle control datasets. Ofte…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 20 pages, 8 figures

  45. arXiv:2308.13149  [pdf, other]

    cs.CL

    SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

    Authors: Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu

    Abstract: Recently, there has been growing interest in using Large Language Models (LLMs) for scientific research. Numerous benchmarks have been proposed to evaluate the ability of LLMs for scientific research. However, current benchmarks are mostly based on pre-collected objective questions. This design suffers from a data leakage problem and lacks evaluation of subjective Q/A ability. In this paper, we…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 12 pages, 17 figures, 12 tables. Under Review

  46. arXiv:2308.10521  [pdf, other]

    cs.CV

    PHE-SICH-CT-IDS: A Benchmark CT Image Dataset for Evaluation Semantic Segmentation, Object Detection and Radiomic Feature Extraction of Perihematomal Edema in Spontaneous Intracerebral Hemorrhage

    Authors: Deguo Ma, Chen Li, Lin Qiao, Tianming Du, Dechao Tang, Zhiyu Ma, Marcin Grzegorzek, Hongzan Sun

    Abstract: Intracerebral hemorrhage is one of the diseases with the highest mortality and poorest prognosis worldwide. Spontaneous intracerebral hemorrhage (SICH) typically presents acutely, so prompt and expedited radiological examination is crucial for diagnosis, localization, and quantification of the hemorrhage. Early detection and accurate segmentation of perihematomal edema (PHE) play a critical role in g…

    Submitted 21 August, 2023; originally announced August 2023.

  47. arXiv:2308.08313  [pdf, other]

    eess.IV cs.CV

    ECPC-IDS: A benchmark endometrial cancer PET/CT image dataset for evaluation of semantic segmentation and detection of hypermetabolic regions

    Authors: Dechao Tang, Tianming Du, Deguo Ma, Zhiyu Ma, Hongzan Sun, Marcin Grzegorzek, Huiyan Jiang, Chen Li

    Abstract: Endometrial cancer is one of the most common tumors in the female reproductive system and is the third most common gynecological malignancy that causes death after ovarian and cervical cancer. Early diagnosis can significantly improve the 5-year survival rate of patients. With the development of artificial intelligence, computer-assisted diagnosis plays an increasingly important role in improving…

    Submitted 11 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 14 pages, 6 figures

  48. arXiv:2308.08172  [pdf, other]

    eess.IV cs.CV cs.LG

    AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation

    Authors: Zhiyu Ma, Chen Li, Tianming Du, Le Zhang, Dechao Tang, Deguo Ma, Shanchuan Huang, Yan Liu, Yihao Sun, Zhihao Chen, Jin Yuan, Qianqing Nie, Marcin Grzegorzek, Hongzan Sun

    Abstract: Methods: In this study, a benchmark Abdominal Adipose Tissue CT Image Dataset (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publishes 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices that have the same slice distance to validate denoising methods, train semantic segmentati…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 17 pages, 7 figures

  49. arXiv:2307.15074  [pdf, other]

    cs.IT cs.LG

    ISAC-NET: Model-driven Deep Learning for Integrated Passive Sensing and Communication

    Authors: Wangjun Jiang, Dingyou Ma, Zhiqing Wei, Zhiyong Feng, Ping Zhang

    Abstract: Recent advances in wireless communication with the enormous demands of sensing ability have given rise to the integrated sensing and communication (ISAC) technology, among which passive sensing plays an important role. The main challenge of passive sensing is how to achieve high sensing performance in the condition of communication demodulation errors. In this paper, we propose an ISAC network (IS…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 29 pages, 11 figures

  50. arXiv:2307.10244  [pdf, other]

    cs.IR cs.LG

    Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors

    Authors: Dongning Ma, Xun Jiao, Fred Lin, Mengshi Zhang, Alban Desmaison, Thomas Sellinger, Daniel Moore, Sriram Sankar

    Abstract: Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware error…

    Submitted 17 July, 2023; originally announced July 2023.