
Showing 1–50 of 855 results for author: Jiang, X

Searching in archive cs. Search in all archives.
  1. arXiv:2409.02438  [pdf, other]

    cs.CV

    Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Shanshan Zhao, Xinqi Jiang, Xinyu Gao, Xinbo Gao

    Abstract: Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although various methods have proposed various solutions to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provides an in-depth analysis and evaluation of this iss…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.00920  [pdf, other]

    cs.LG cs.AI cs.CL

    ToolACE: Winning the Points of LLM Function Calling

    Authors: Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, et al. (2 additional authors not shown)

    Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic ag…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 22 figures

  3. arXiv:2408.17424  [pdf, other]

    cs.CV cs.HC

    CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

    Authors: Yiran Chen, Anyi Rao, Xuekun Jiang, Shishi Xiao, Ruiqing Ma, Zeyu Wang, Hui Xiong, Bo Dai

    Abstract: With advancements in video generative AI models (e.g., SORA), creators are increasingly using these techniques to enhance video previsualization. However, they face challenges with incomplete and mismatched AI workflows. Existing methods mainly rely on text descriptions and struggle with camera placement, a key component of previsualization. To address these issues, we introduce CinePreGen, a visu…

    Submitted 30 August, 2024; originally announced August 2024.

  4. arXiv:2408.16256  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9

    Authors: Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky

    Abstract: Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That there are over 300,000 breast cancers newly diagnosed each year suggests that only a fraction of the cancers result in mortality. Thus, most of the women undergo seemingly curative treatment for localized cancers, but a significant later succumb to metastatic disease…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.16186  [pdf, other]

    math.OC cs.LG

    Single-Loop Deterministic and Stochastic Interior-Point Algorithms for Nonlinearly Constrained Optimization

    Authors: Frank E. Curtis, Xin Jiang, Qi Wang

    Abstract: An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear and/or nonconvex, and when constraint values and derivatives are tractable to compute, but objective function values and derivatives can only be estimated. The algor…

    Submitted 28 August, 2024; originally announced August 2024.

    Report number: Lehigh ISE Technical Report 24T-008

  6. arXiv:2408.15498  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network

    Authors: Yijun Zhou, Om Arora-Jain, Xia Jiang

    Abstract: While machine learning has advanced in medicine, its widespread use in clinical applications, especially in predicting breast cancer metastasis, is still limited. We have been dedicated to constructing a DFNN model to predict breast cancer metastasis n years in advance. However, the challenge lies in efficiently identifying optimal hyperparameter values through grid search, given the constraints o…

    Submitted 27 August, 2024; originally announced August 2024.

  7. arXiv:2408.14812  [pdf, other]

    cs.CV

    HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling

    Authors: Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao

    Abstract: Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-related descriptions to enhance prompt effectiveness. However, conventional descriptions lack explicit structured information necessary to represent th…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2312.06323

  8. arXiv:2408.12632  [pdf]

    physics.ao-ph cs.AI

    Generative Diffusion Model-based Downscaling of Observed Sea Surface Height over Kuroshio Extension since 2000

    Authors: Qiuchang Han, Xingliang Jiang, Yang Zhao, Xudong Wang, Zhijin Li, Renhe Zhang

    Abstract: Satellite altimetry has been widely utilized to monitor global sea surface dynamics, enabling investigation of upper ocean variability from basin-scale to localized eddy ranges. However, the sparse spatial resolution of observational altimetry limits our understanding of oceanic submesoscale variability, prevalent at horizontal scales below 0.25° resolution. Here, we introduce a state-of-the-art g…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 28 pages, 7 figures, and 1 table

  9. arXiv:2408.12214  [pdf, other]

    cs.AI

    UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model

    Authors: Xia Jiang, Yaoxin Wu, Yuan Wang, Yingqian Zhang

    Abstract: Recently, applying neural networks to address combinatorial optimization problems (COPs) has attracted considerable research attention. The prevailing methods always train deep models independently on specific problems, lacking a unified framework for concurrently tackling various COPs. To this end, we propose a unified neural combinatorial optimization (UNCO) framework to solve different types of…

    Submitted 22 August, 2024; originally announced August 2024.

  10. arXiv:2408.12142  [pdf, other]

    cs.CL cs.AI

    MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

    Authors: Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, Xun Jiang

    Abstract: The clinical diagnosis of most mental disorders primarily relies on the conversations between psychiatrist and patient. The creation of such diagnostic conversation datasets is promising to boost the AI mental healthcare community. However, directly collecting the conversations in real diagnosis scenarios is near impossible due to stringent privacy and ethical considerations. To address this issue…

    Submitted 22 August, 2024; originally announced August 2024.

  11. arXiv:2408.11849  [pdf, other]

    cs.CL cs.AI eess.AS

    Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

    Authors: Yinghao Aaron Li, Xilin Jiang, Jordan Darefsky, Ge Zhu, Nima Mesgarani

    Abstract: The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resou…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: CoLM 2024

  12. VrdONE: One-stage Video Visual Relation Detection

    Authors: Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He

    Abstract: Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos, a key step for gaining deeper insights into video scenes beyond basic visual tasks. Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying what relation categories are present and another for determining their tem…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, accepted by ACM Multimedia 2024

  13. arXiv:2408.09199  [pdf, other]

    cs.IR

    TC-RAG:Turing-Complete RAG's Case study on Medical LLM Systems

    Authors: Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries. However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retriev…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: version 1.0

  14. arXiv:2408.07673  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning: a Heuristic Three-stage Mechanism for Grid Searches to Optimize the Future Risk Prediction of Breast Cancer Metastasis Using EHR-based Clinical Data

    Authors: Xia Jiang, Yijun Zhou, Chuhan Xu, Adam Brufsky, Alan Wells

    Abstract: A grid search, at the cost of training and testing a large number of models, is an effective way to optimize the prediction performance of deep learning models. A challenging task concerning grid search is the time management. Without a good time management scheme, a grid search can easily be set off as a mission that will not finish in our lifetime. In this study, we introduce a heuristic three-s…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.07471  [pdf, other]

    cs.CL

    Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

    Authors: Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang

    Abstract: Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated isolatedly, leading to weak correlations between them as well as suboptimal alignment performance. To address…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, 8 tables, working in progress

  16. arXiv:2408.06741  [pdf, other]

    cs.CV

    Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective

    Authors: Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Fuli Feng

    Abstract: With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has also engendered certain negative impacts on social platforms, thereby raising an urgent imperative to develop effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, accompanied by an oversight about SI…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.06622  [pdf, other]

    cs.CV

    ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding

    Authors: Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

    Abstract: Video temporal grounding is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLM) to capture detailed characteristics of diverse scenes and objects from video frames. However, as pre-trained on images, VLM may struggle to distinguish action-sensitive patterns from static obje…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures

  18. arXiv:2408.04392  [pdf, other]

    cs.CL

    Open-domain Implicit Format Control for Large Language Model Generation

    Authors: Yiqun Yao, Wenjia Ma, Xuezhi Fang, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang

    Abstract: Controlling the format of outputs generated by large language models (LLMs) is a critical functionality in various applications. Current methods typically employ constrained decoding with rule-based automata or fine-tuning with manually crafted format instructions, both of which struggle with open-domain format requirements. To address this limitation, we introduce a novel framework for controlled…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages

  19. Tackling Noisy Clients in Federated Learning with End-to-end Label Correction

    Authors: Xuefeng Jiang, Sheng Sun, Jia Li, Jingjing Xue, Runhan Li, Zhiyuan Wu, Gang Xu, Yuwei Wang, Min Liu

    Abstract: Recently, federated learning (FL) has achieved wide successes for diverse privacy-sensitive applications without sacrificing the sensitive private information of clients. However, the data quality of client datasets can not be guaranteed since corresponding annotations of different clients often contain complex label noise of varying degrees, which inevitably causes the performance degradation. In…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in ACM CIKM'24 full research paper track

  20. arXiv:2408.04299  [pdf, other]

    cs.CV

    Respiratory Subtraction for Pulmonary Microwave Ablation Evaluation

    Authors: Wan Li, Xinyun Zhong, Wei Li, Song Zhang, Moheng Rong, Yan Xi, Peng Yuan, Zechen Wang, Xiaolei Jiang, Rongxi Yi, Hui Tang, Yang Chen, Chaohui Tong, Zhan Wu, Feng Wang

    Abstract: Currently, lung cancer is a leading cause of global cancer mortality, often necessitating minimally invasive interventions. Microwave ablation (MWA) is extensively utilized for both primary and secondary lung tumors. Although numerous clinical guidelines and standards for MWA have been established, the clinical evaluation of ablation surgery remains challenging and requires long-term patient follo…

    Submitted 8 August, 2024; originally announced August 2024.

  21. arXiv:2408.03297  [pdf, other]

    cs.CL cs.AI

    KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

    Authors: Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao, Runchuan Zhu, Xinke Jiang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: By integrating external knowledge, Retrieval-Augmented Generation (RAG) has become an effective strategy for mitigating the hallucination problems that large language models (LLMs) encounter when dealing with knowledge-intensive tasks. However, in the process of integrating external non-parametric supporting evidence with internal parametric knowledge, inevitable knowledge conflicts may arise, lea…

    Submitted 19 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  22. arXiv:2408.01928  [pdf, other]

    cs.CL cs.AI cs.IR

    A Semi-supervised Multi-channel Graph Convolutional Network for Query Classification in E-commerce

    Authors: Chunyuan Yuan, Ming Pang, Zheng Fang, Xue Jiang, Changping Peng, Zhangang Lin

    Abstract: Query intent classification is an essential module for customers to find desired products on the e-commerce application quickly. Most existing query intent classification methods rely on the users' click behavior as a supervised signal to construct training samples. However, these methods based entirely on posterior labels may lead to serious category imbalance problems because of the Matthew effe…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by WWW2024

  23. arXiv:2408.01319  [pdf, other]

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types-including text, images, videos, audio, and physiological sequences-MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  24. arXiv:2408.00799  [pdf, other]

    cs.IR cs.LG stat.ML

    Deep Uncertainty-Based Explore for Index Construction and Retrieval in Recommendation System

    Authors: Xin Jiang, Kaiqiang Wang, Yinlong Wang, Fengchang Lv, Taiyang Peng, Shuai Yang, Xianteng Wu, Pengye Zhang, Shuo Yuan, Yifan Zeng

    Abstract: In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems,…

    Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

    Comments: accepted by cikm2024

  25. arXiv:2407.21050  [pdf]

    cs.CL

    Artificial Intelligence in Extracting Diagnostic Data from Dental Records

    Authors: Yao-Shun Chuang, Chun-Teh Lee, Oluwabunmi Tokede, Guo-Hao Lin, Ryan Brandon, Trung Duong Tran, Xiaoqian Jiang, Muhammad F. Walji

    Abstract: This research addresses the issue of missing structured data in dental records by extracting diagnostic information from unstructured text. The updated periodontology classification system's complexity has increased incomplete or missing structured diagnoses. To tackle this, we use advanced AI and NLP methods, leveraging GPT-4 to generate synthetic notes for fine-tuning a RoBERTa model. This signi…

    Submitted 12 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 tables, 3 figures, under review

  26. arXiv:2407.19311  [pdf, other]

    cs.LG cs.SI

    Can Modifying Data Address Graph Domain Adaptation?

    Authors: Renhong Huang, Jiarong Xu, Xin Jiang, Ruichuan An, Yang Yang

    Abstract: Graph neural networks (GNNs) have demonstrated remarkable success in numerous graph analytical tasks. Yet, their effectiveness is often compromised in real-world scenarios due to distribution shifts, limiting their capacity for knowledge transfer across changing environments or domains. Recently, Unsupervised Graph Domain Adaptation (UGDA) has been introduced to resolve this issue. UGDA aims to fa…

    Submitted 27 July, 2024; originally announced July 2024.

  27. arXiv:2407.18449  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.…

    Submitted 3 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  28. arXiv:2407.16166  [pdf]

    cs.CL

    Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks

    Authors: Yao-Shun Chuang, Atiquer Rahman Sarkar, Noman Mohammed, Xiaoqian Jiang

    Abstract: This study examines integrating EHRs and NLP with large language models (LLMs) to improve healthcare data management and patient care. It focuses on using advanced models to create secure, HIPAA-compliant synthetic patient notes for biomedical research. The study used de-identified and re-identified MIMIC III datasets with GPT-3.5, GPT-4, and Mistral 7B to generate synthetic notes. Text generation…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 13 pages, 4 figures, 1 table, 1 supplementary, under review

  29. arXiv:2407.13768  [pdf, other]

    cs.CV cs.AI

    Addressing Imbalance for Class Incremental Learning in Medical Image Classification

    Authors: Xuze Hao, Wenqian Ni, Xuhao Jiang, Weimin Tan, Bo Yan

    Abstract: Deep convolutional neural networks have made significant breakthroughs in medical image classification, under the assumption that training samples from all classes are simultaneously available. However, in real-world medical scenarios, there's a common need to continuously learn about new diseases, leading to the emerging field of class incremental learning (CIL) in the medical domain. Typically,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  30. arXiv:2407.13761  [pdf, other]

    cs.CV

    SegPoint: Segment Any Point Cloud via Large Language Model

    Authors: Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen

    Abstract: Despite significant progress in 3D point cloud segmentation, existing methods primarily address specific tasks and depend on explicit instructions to identify targets, lacking the capability to infer and understand implicit user intentions in a unified framework. In this work, we propose a model, called SegPoint, that leverages the reasoning capabilities of a multi-modal Large Language Model (LLM)…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://heshuting555.github.io/SegPoint

  31. arXiv:2407.13584  [pdf, other]

    cs.CV

    Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation

    Authors: Zongrui Li, Minghui Hu, Qian Zheng, Xudong Jiang

    Abstract: Although recent advancements in text-to-3D generation have significantly improved generation quality, issues like limited level of detail and low fidelity still persist, which requires further improvement. To understand the essence of those issues, we thoroughly analyze current score distillation methods by connecting theories of consistency distillation to score distillation. Based on the insight…

    Submitted 20 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Paper accepted by ECCV2024

  32. arXiv:2407.12705  [pdf, other]

    cs.CV

    IMAGDressing-v1: Customizable Virtual Dressing

    Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang

    Abstract: Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we…

    Submitted 6 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  33. arXiv:2407.12025  [pdf, other]

    cs.HC cs.AI

    LLM4DESIGN: An Automated Multi-Modal System for Architectural and Environmental Design

    Authors: Ran Chen, Xueqi Yao, Xuhui Jiang

    Abstract: This study introduces LLM4DESIGN, a highly automated system for generating architectural and environmental design proposals. LLM4DESIGN, relying solely on site conditions and design requirements, employs Multi-Agent systems to foster creativity, Retrieval Augmented Generation (RAG) to ground designs in realism, and Visual Language Models (VLM) to synchronize all information. This system resulting…

    Submitted 28 June, 2024; originally announced July 2024.

  34. arXiv:2407.11882  [pdf, other]

    cs.CR

    Enhancing Covert Communication in Relay Systems Using Multi-Antenna Technique

    Authors: He Zhu, Huihui Wu, Wei Su, Xiaohong Jiang

    Abstract: This paper exploits the multi-antenna technique to enhance the covert communication performance in a relay system, where a source S conducts covert communication with a destination D via a relay R, subjecting to the detections of transmissions in the two hops from a single-antenna warden W. To demonstrate the performance gain from adopting the multi-antenna technique, we first consider the scenari…

    Submitted 16 July, 2024; originally announced July 2024.

  35. arXiv:2407.11325  [pdf, other]

    cs.CV

    VISA: Reasoning Video Object Segmentation via Large Language Models

    Authors: Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves

    Abstract: Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implic…

    Submitted 15 July, 2024; originally announced July 2024.

  36. arXiv:2407.10805  [pdf, other]

    cs.CL cs.AI

    Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

    Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

    Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with…

    Submitted 6 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  37. arXiv:2407.09732  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

    Authors: Xilin Jiang, Yinghao Aaron Li, Adrian Nicolas Florea, Cong Han, Nima Mesgarani

    Abstract: It is too early to conclude that Mamba is a better alternative to transformers for speech before comparing Mamba with transformers in terms of both performance and efficiency in multiple speech-related tasks. To reach this conclusion, we propose and evaluate three models for three tasks: Mamba-TasNet for speech separation, ConMamba for speech recognition, and VALL-M for speech synthesis. We compar…

    Submitted 12 July, 2024; originally announced July 2024.

  38. arXiv:2407.07084  [pdf, other]

    cs.LG math.OC

    Stabilized Proximal-Point Methods for Federated Optimization

    Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

    Abstract: In developing efficient optimization algorithms, it is crucial to account for communication constraints -- a significant challenge in modern federated learning settings. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems in each iteration and that can exploit second-order similarity amon…

    Submitted 9 July, 2024; originally announced July 2024.

  39. arXiv:2407.05758  [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.05619  [pdf, other]

    cs.RO eess.SY

    AIRA: A Low-cost IR-based Approach Towards Autonomous Precision Drone Landing and NLOS Indoor Navigation

    Authors: Yanchen Liu, Minghui Zhao, Kaiyuan Hou, Junxi Xia, Charlie Carver, Stephen Xia, Xia Zhou, Xiaofan Jiang

    Abstract: Automatic drone landing is an important step for achieving fully autonomous drones. Although there are many works that leverage GPS, video, wireless signals, and active acoustic sensing to perform precise landing, autonomous drone landing remains an unsolved challenge for palm-sized microdrones that may not be able to support the high computational requirements of vision, wireless, or active audio…

    Submitted 8 July, 2024; originally announced July 2024.

  41. arXiv:2407.05017  [pdf, other]

    cs.RO

    VIPS-Odom: Visual-Inertial Odometry Tightly-coupled with Parking Slots for Autonomous Parking

    Authors: Xuefeng Jiang, Fangyuan Wang, Rongzhang Zheng, Han Liu, Yixiong Huo, Jinzhang Peng, Lu Tian, Emad Barsoum

    Abstract: Precise localization is of great importance for autonomous parking task since it provides service for the downstream planning and control modules, which significantly affects the system performance. For parking scenarios, dynamic lighting, sparse textures, and the instability of global positioning system (GPS) signals pose challenges for most traditional localization methods. To address these diff…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: A SLAM Method for Autonomous Parking

  42. arXiv:2407.03106  [pdf, other]

    cs.CV

    Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric

    Authors: Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Xian-Sheng Hua, Heng-Tao Shen

    Abstract: Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods to maximize inter-class discrepancy and minimize intra-class diversity. However, these methods tend to suffer from the collapse of the embedding space due to their…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by IEEE Transactions on Multimedia

  43. arXiv:2407.02960  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

    Authors: Ahmed Frikha, Nassim Walha, Ricardo Mendes, Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou

    Abstract: This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Here, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provi…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint

  44. arXiv:2407.02956  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

    Authors: Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou

    Abstract: In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while keeping the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a redu…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint

  45. arXiv:2407.02943  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

    Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou

    Abstract: The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personally identifiable information (PII) contained in their training data. However, reported PII extraction performance varies widely, and there is no conse…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024

  46. arXiv:2407.02783  [pdf, ps, other]

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  47. arXiv:2407.02768  [pdf, other]

    cs.CV

    Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation

    Authors: Tao Chen, XiRuo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao

    Abstract: Though adversarial erasing has prevailed in weakly supervised semantic segmentation to help activate integral object regions, existing approaches still suffer from the dilemma of under-activation and over-expansion due to the difficulty in determining when to stop erasing. In this paper, we propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) a…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: accepted by the European Conference on Computer Vision (ECCV), 2024

  48. arXiv:2407.02476  [pdf, other]

    cs.LG stat.ML

    Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference

    Authors: Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray, Mauricio A. Alvarez

    Abstract: The Multi-Output Gaussian Process (MOGP) is a popular tool for modelling data from multiple sources. A typical choice to build a covariance function for a MOGP is the Linear Model of Coregionalization (LMC), which parametrically models the covariance between outputs. The Latent Variable MOGP (LV-MOGP) generalises this idea by modelling the covariance between outputs using a kernel applied to latent var…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: none

  49. arXiv:2407.02031  [pdf, other]

    cs.DC cs.AI cs.LG

    SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

    Authors: Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin…

    Submitted 2 July, 2024; originally announced July 2024.

  50. arXiv:2407.00708  [pdf, other]

    cs.LG

    Heterogeneous Graph Contrastive Learning with Spectral Augmentation

    Authors: Jing Zhang, Xiaoqian Jiang, Yingjie Xie, Cangqi Zhou

    Abstract: Heterogeneous graphs describe complex real-world entity relationships well. For example, online shopping networks contain multiple entity types, such as consumers and products, as well as multiple relationship types, such as purchasing and favoriting. This area is attracting growing attention from scholars because heterogeneous graph representation learning shows strong application potenti…

    Submitted 30 June, 2024; originally announced July 2024.