Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 739 results for author: Ding, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.19821  [pdf

    eess.IV cs.CV q-bio.TO

    Distilling High Diagnostic Value Patches for Whole Slide Image Classification Using Attention Mechanism

    Authors: Tianhang Nan, Hao Quan, Yong Ding, Xingyu Li, Kai Yang, Xiaoyu Cui

    Abstract: Multiple Instance Learning (MIL) has garnered widespread attention in the field of Whole Slide Image (WSI) classification as it replaces pixel-level manual annotation with diagnostic reports as labels, significantly reducing labor costs. Recent research has shown that bag-level MIL methods often yield better results because they can consider all patches of the WSI as a whole. However, a drawback o… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.19201  [pdf, other

    cs.LG

    Long Range Switching Time Series Prediction via State Space Model

    Authors: Jiaming Zhang, Yang Ding, Yunfeng Gao

    Abstract: In this study, we delve into the Structured State Space Model (S4), Change Point Detection methodologies, and the Switching Non-linear Dynamics System (SNLDS). Our central proposition is an enhanced inference technique and long-range dependency method for SNLDS. The cornerstone of our approach is the fusion of S4 and SNLDS, leveraging the strengths of both models to effectively address the intrica… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 14 pages, 14 figures

  3. arXiv:2407.17349  [pdf, other

    cs.CL

    Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

    Authors: Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (\texttt{SocraticLLM}), which guides l… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted By CIKM 2024

  4. arXiv:2407.17157  [pdf

    cs.CV q-bio.TO

    Establishing Truly Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning

    Authors: Tianhang Nan, Yong Ding, Hao Quan, Deliang Li, Mingchen Zou, Xiaoyu Cui

    Abstract: In the field of deep learning-driven Whole Slide Image (WSI) classification, Multiple Instance Learning (MIL) has gained significant attention due to its ability to be trained using only slide-level diagnostic labels. Previous MIL researches have primarily focused on enhancing feature aggregators for globally analyzing WSIs, but overlook a causal relationship in diagnosis: model's prediction shoul… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  5. arXiv:2407.17126  [pdf

    cs.CL cs.AI

    SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

    Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

    Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  6. arXiv:2407.17030  [pdf, other

    cs.NI

    Applications of Multi-Agent Deep Reinforcement Learning Communication in Network Management: A Survey

    Authors: Yue Pi, Wang Zhang, Yong Zhang, Hairong Huang, Baoquan Rao, Yulong Ding, Shuanghua Yang

    Abstract: With the advancement of artificial intelligence technology, the automation of network management, also known as Autonomous Driving Networks (ADN), is gaining widespread attention. The network management has shifted from traditional homogeneity and centralization to heterogeneity and decentralization. Multi-agent deep reinforcement learning (MADRL) allows agents to make decisions based on local obs… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  7. arXiv:2407.15617  [pdf, other

    cs.CV cs.AI

    Norface: Improving Facial Expression Analysis by Identity Normalization

    Authors: Hanwei Liu, Rudong An, Zhimeng Zhang, Bowen Ma, Wei Zhang, Yan Song, Yujing Hu, Wei Chen, Yu Ding

    Abstract: Facial Expression Analysis remains a challenging task due to unexpected task-irrelevant noise, such as identity, head pose, and background. To address this issue, this paper proposes a novel framework, called Norface, that is unified for both Action Unit (AU) analysis and Facial Emotion Recognition (FER) tasks. Norface consists of a normalization network and a classification network. First, the ca… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  8. arXiv:2407.14996  [pdf, other

    cs.LG

    All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks

    Authors: Ajay Jaiswal, Nurendra Choudhary, Ravinarayana Adkathimar, Muthu P. Alagappan, Gaurush Hiranandani, Ying Ding, Zhangyang Wang, Edward W Huang, Karthik Subbian

    Abstract: Graph Neural Networks (GNNs) have attracted immense attention in the past decade due to their numerous real-world applications built around graph-structured data. On the other hand, Large Language Models (LLMs) with extensive pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data. In this paper,… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  9. arXiv:2407.14643  [pdf, other

    cs.RO

    Double-Layer Soft Data Fusion for Indoor Robot WiFi-Visual Localization

    Authors: Yuehua Ding, Jean-Francois Dollinger, Vincent Vauchey, Mourad Zghal

    Abstract: This paper presents a novel WiFi-Visual data fusion method for indoor robot (TIAGO++) localization. This method can use 10 WiFi samples and 4 low-resolution images ($58 \times 58$ in pixels) to localize a indoor robot with an average error distance about 1.32 meters. The experiment test is 3 months after the data collection in a general teaching building, whose WiFi and visual environments are par… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.13126  [pdf, other

    cs.DC

    Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

    Authors: Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

    Abstract: Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated model (if available) to serve overtime arriving inference requests. It is generally beneficial to co-locate the retraining and inference together to enabl… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.08255  [pdf, other

    cs.CV cs.LG

    GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

    Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

    Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integ… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  12. arXiv:2407.06505  [pdf

    cs.HC

    Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance

    Authors: Wei Lyu, Yaqin Cao, Yi Ding, Jingyu Li, Kai Tian, Hui Zhang

    Abstract: Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 37 pages, 13 figures, 4 tables

  13. CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community

    Authors: Yan Liu, Bin Guo, Nuo Li, Yasan Ding, Zhouyangzi Zhang, Zhiwen Yu

    Abstract: Artificial Intelligence of Things (AIoT) is an emerging frontier based on the deep fusion of Internet of Things (IoT) and Artificial Intelligence (AI) technologies. Although advanced deep learning techniques enhance the efficient data processing and intelligent analysis of complex IoT data, they still suffer from notable challenges when deployed to practical AIoT applications, such as constrained… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted for publication in IEEE Communications Surveys & Tutorials. Copyright will be transferred without notice, after this version may no longer be accessible

  14. arXiv:2407.05739  [pdf, other

    cs.NE cs.AI

    Multi-Bit Mechanism: A Novel Information Transmission Paradigm for Spiking Neural Networks

    Authors: Yongjun Xiao, Xianlong Tian, Yongqi Ding, Pei He, Mengmeng Jing, Lin Zuo

    Abstract: Since proposed, spiking neural networks (SNNs) gain recognition for their high performance, low power consumption and enhanced biological interpretability. However, while bringing these advantages, the binary nature of spikes also leads to considerable information loss in SNNs, ultimately causing performance degradation. We claim that the limited expressiveness of current binary spikes, resulting… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Under review

  15. arXiv:2407.03474  [pdf

    cs.CY

    How high-status women promote repeated collaboration among women in male-dominated contexts

    Authors: Huimin Xu, Jamie Strassman, Ying Ding, Steven Gray, Maytal Saar-Tsechansky

    Abstract: Male-dominated contexts pose a dilemma: they increase the benefits of repeated collaboration among women, yet at the same time, make such collaborations less likely. This paper seeks to understand the conditions that foster repeated collaboration among women versus men in male-dominated settings by examining the critical role of status hierarchies. Using collaboration data on 8,232,769 computer sc… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  16. arXiv:2407.03374  [pdf

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2407.02390  [pdf, other

    cs.DC cs.LG

    Uncertainty-Aware Decarbonization for Datacenters

    Authors: Amy Li, Sihang Liu, Yi Ding

    Abstract: This paper represents the first effort to quantify uncertainty in carbon intensity forecasting for datacenter decarbonization. We identify and analyze two types of uncertainty -- temporal and spatial -- and discuss their system implications. To address the temporal dynamics in quantifying uncertainty for carbon intensity forecasting, we introduce a conformal prediction-based framework. Evaluation… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2407.02159  [pdf, other

    cs.CV eess.IV

    SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images

    Authors: Jintu Zheng, Yi Ding, Qizhe Liu, Yi Cao, Ying Hu, Zenan Wang

    Abstract: Traditional fluorescence staining is phototoxic to live cells, slow, and expensive; thus, the subcellular structure prediction (SSP) from transmitted light (TL) images is emerging as a label-free, faster, low-cost alternative. However, existing approaches utilize 3D networks for one-to-one voxel level dense prediction, which necessitates a frequent and time-consuming Z-axis imaging process. Moreov… ▽ More

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accpeted to ECCV2024

  19. arXiv:2407.00105  [pdf, other

    cs.LG cs.AI

    Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction

    Authors: Yuqing Qian, Ziyu Zheng, Prayag Tiwari, Yijie Ding, Quan Zou

    Abstract: Drug-side effect prediction has become an essential area of research in the field of pharmacology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the rela… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Transactions on Machine Learning Research (TMLR 2024)

  20. arXiv:2406.18938  [pdf, other

    cs.IR

    Towards Personalized Federated Multi-scenario Multi-task Recommendation

    Authors: Yue Ding, Yanbiao Ji, Xun Cai, Xin Xin, Xiaofeng Gao, Hongtao Lu

    Abstract: In modern recommender system applications, such as e-commerce, predicting multiple targets like click-through rate (CTR) and post-view click-through \& conversion rate (CTCVR) is common. Multi-task recommender systems are gaining traction in research and practical use. Existing multi-task recommender systems tackle diverse business scenarios, merging and modeling these scenarios unlocks shared kno… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.18585  [pdf, other

    cs.CV cs.AI

    Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

    Authors: Lin Zuo, Kunshan Yang, Xianlong Tian, Kunbin He, Yongqi Ding, Mengmeng Jing

    Abstract: Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise fro… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  22. arXiv:2406.18345  [pdf, other

    cs.LG eess.SP

    EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

    Authors: Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan

    Abstract: Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2406.17659  [pdf, other

    cs.AI cs.RO

    DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

    Authors: Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

    Abstract: Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding capabilities, their performance is still far from being satisfactory in planning tasks. At the same time, although classical task planners, such as P… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  24. arXiv:2406.16034  [pdf, ps, other

    math.LO cs.LO

    Some General Completeness Results for Propositionally Quantified Modal Logics

    Authors: Yifeng Ding, Yipu Li

    Abstract: We study the completeness problem for propositionally quantified modal logics on quantifiable general frames, where the admissible sets are the propositions the quantifiers can range over and expressible sets of worlds are admissible, and Kripke frames, where the quantifiers range over all sets of worlds. We show that any normal propositionally quantified modal logic containing all instances of th… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  25. arXiv:2406.13951  [pdf, other

    cs.CV

    Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Kuangwei Xu, Qianli Jiang, Q. M. Jonathan Wu, Dalei Song

    Abstract: We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13344  [pdf, other

    cs.CV

    WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation

    Authors: Yilin Ding, Kunqian Li, Han Mei, Shuaixin Liu, Guojia Hou

    Abstract: Depth information serves as a crucial prerequisite for various visual tasks, whether on land or underwater. Recently, self-supervised methods have achieved remarkable performance on several terrestrial benchmarks despite the absence of depth annotations. However, in more challenging underwater scenarios, they encounter numerous brand-new obstacles such as the influence of marine life and degradati… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.12404  [pdf

    cs.CV

    Scan-to-BIM for As-built Roads: Automatic Road Digital Twinning from Semantically Labeled Point Cloud Data

    Authors: Yuexiong Ding, Mengtian Yin, Ran Wei, Ioannis Brilakis, Muyang Liu, Xiaowei Luo

    Abstract: Creating geometric digital twins (gDT) for as-built roads still faces many challenges, such as low automation level and accuracy, limited asset types and shapes, and reliance on engineering experience. A novel scan-to-building information modeling (scan-to-BIM) framework is proposed for automatic road gDT creation based on semantically labeled point cloud data (PCD), which considers six asset type… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.12395  [pdf

    cs.CV

    SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions

    Authors: Yuexiong Ding, Xiaowei Luo

    Abstract: Though current object detection models based on deep learning have achieved excellent results on many conventional benchmark datasets, their performance will dramatically decline on real-world images taken under extreme conditions. Existing methods either used image augmentation based on traditional image processing algorithms or applied customized and scene-limited image adaptation technologies f… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.09998  [pdf, other

    eess.AS cs.AI cs.LG cs.MM cs.SD

    Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors

    Authors: Chaeyeon Han, Pavan Seshadri, Yiwei Ding, Noah Posner, Bon Woo Koo, Animesh Agrawal, Alexander Lerch, Subhrajit Guhathakurta

    Abstract: While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study dis… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: submitted to Urban Informatics

  30. arXiv:2406.07862  [pdf, other

    cs.LG cs.AI cs.CV cs.NE

    Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks

    Authors: Lin Zuo, Yongqi Ding, Mengmeng Jing, Kunshan Yang, Yunqian Yu

    Abstract: Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability. Inspired by knowledge distillation (KD), recent research has improved the performance of the SNN model with a pre-trained teacher model. However, additional teacher models require significant computational resources, and it is tedious to manua… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 6 figures

    ACM Class: I.2.6; I.5.1

  31. 1-D CNN-Based Online Signature Verification with Federated Learning

    Authors: Lingfeng Zhang, Yuheng Guo, Yepeng Ding, Hiroyuki Sato

    Abstract: Online signature verification plays a pivotal role in security infrastructures. However, conventional online signature verification models pose significant risks to data privacy, especially during training processes. To mitigate these concerns, we propose a novel federated learning framework that leverages 1-D Convolutional Neural Networks (CNN) for online signature verification. Furthermore, our… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 11 figures, 1 table

  32. arXiv:2406.06025  [pdf, other

    cs.SE cs.CL cs.LG

    RepoQA: Evaluating Long Context Code Understanding

    Authors: Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, Lingming Zhang

    Abstract: Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a large chunk of raw texts. While effective, current evaluations overlook the insight of how LLMs work with long-context code, i.e., repositories. To this end, we in… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  33. arXiv:2406.04802  [pdf, other

    cs.CV cs.LG

    Predictive Dynamic Fusion

    Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu

    Abstract: Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability.… ▽ More

    Submitted 13 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  34. arXiv:2406.04151  [pdf, other

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  35. arXiv:2406.02133  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    SimulTron: On-Device Simultaneous Speech to Speech Translation

    Authors: Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich

    Abstract: Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the st… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.01006  [pdf, other

    cs.CL cs.AI cs.SE

    SemCoder: Training Code Language Models with Comprehensive Semantics

    Authors: Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with c… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  37. Model-Driven Security Analysis of Self-Sovereign Identity Systems

    Authors: Yepeng Ding, Hiroyuki Sato

    Abstract: Best practices of self-sovereign identity (SSI) are being intensively explored in academia and industry. Reusable solutions obtained from best practices are generalized as architectural patterns for systematic analysis and design reference, which significantly boosts productivity and increases the dependability of future implementations. For security-sensitive projects, architects make architectur… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  38. arXiv:2405.20860  [pdf, other

    cs.LG

    Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

    Authors: Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, Ming Jin

    Abstract: Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the ef… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  39. arXiv:2405.19338  [pdf, other

    eess.SP cs.AI cs.CV

    Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

    Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

    Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imag… ▽ More

    Submitted 1 April, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures and tables

  40. arXiv:2405.18194  [pdf, other

    cs.LG cs.CR

    Delving into Differentially Private Transformer

    Authors: Youlong Ding, Xueyang Wu, Yining Meng, Yonggang Luo, Hao Wang, Weike Pan

    Abstract: Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the mor… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  41. Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification

    Authors: Shujun Yang, Yu Zhang, Yao Ding, Danfeng Hong

    Abstract: Insufficient prior knowledge of a captured hyperspectral image (HSI) scene may lead the experts or the automatic labeling systems to offer incorrect labels or ambiguous labels (i.e., assigning each training sample to a group of candidate labels, among which only one of them is valid; this is also known as partial label learning) during the labeling process. Accordingly, how to learn from such data… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 0

    Journal ref: journal={IEEE Geoscience and Remote Sensing Letters}, year={2023}, publisher={IEEE}

  42. arXiv:2405.16601  [pdf, other

    cs.LG

    A CMDP-within-online framework for Meta-Safe Reinforcement Learning

    Authors: Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin

    Abstract: Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-on… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Journal ref: ICLR 2023

  43. arXiv:2405.16390  [pdf, other

    cs.AI cs.LG

    Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, Ming Jin

    Abstract: In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gr… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.16214  [pdf, other

    cs.CV

    Underwater Image Enhancement by Diffusion Model with Customized CLIP-Classifier

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Qi Qi

    Abstract: Underwater Image Enhancement (UIE) aims to improve the visual quality from a low-quality input. Unlike other image enhancement tasks, underwater images suffer from the unavailability of real reference images. Although existing works exploit synthetic images and manually select well-enhanced images as reference images to train enhancement networks, their upper performance bound is limited by the re… ▽ More

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.13401  [pdf, ps, other

    cs.CR cs.CL

    TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

    Authors: Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu

    Abstract: Large language models (LLMs) have raised concerns about potential security threats despite performing significantly in Natural Language Processing (NLP). Backdoor attacks initially verified that LLM is doing substantial harm at all stages, but the cost and robustness have been criticized. Attacking LLMs is inherently risky in security review, while prohibitively expensive. Besides, the continuous… ▽ More

    Submitted 7 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 4 tables

  46. arXiv:2405.11647  [pdf, other

    cs.AI cs.LG

    Hummer: Towards Limited Competitive Preference Dataset

    Authors: Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

    Abstract: Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback. However, these datasets often demonstrate conflicting alignment objectives, leading to increased vulnerability to jailbreak attacks and challenges in adapting downstream tasks to prioritize specific alignment object… ▽ More

    Submitted 20 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  47. arXiv:2405.06211  [pdf, other

    cs.CL cs.AI cs.IR

    A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

    Authors: Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li

    Abstract: As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, L… ▽ More

    Submitted 17 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: This is the long version of the corresponding survey paper accepted by KDD2024

  48. arXiv:2405.05957  [pdf, other

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities.However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  49. arXiv:2405.04819  [pdf, other

    cs.CL cs.AI

    DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

    Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

    Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on… ▽ More

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Under Review; Incorrect author name revised

  50. arXiv:2405.03144  [pdf, other

    cs.CV cs.LG

    PTQ4SAM: Post-Training Quantization for Segment Anything

    Authors: Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu

    Abstract: Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to th… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: CVPR 2024