Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 190 results for author: Tian, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chen Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.03772  [pdf, other

    eess.IV cs.CV q-bio.QM

    CS3: Cascade SAM for Sperm Segmentation

    Authors: Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

    Abstract: Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model(SAM), are notably inadequate in addressing the complex issue of sperm overlap-a frequent occurrence in clinical samples. Our exploratory stu… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2406.18585  [pdf, other

    cs.CV cs.AI

    Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

    Authors: Lin Zuo, Kunshan Yang, Xianlong Tian, Kunbin He, Yongqi Ding, Mengmeng Jing

    Abstract: Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise fro… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  4. arXiv:2406.14264  [pdf, other

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  5. arXiv:2406.13340  [pdf, other

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.11890  [pdf, other

    cs.LG cs.AI cs.CL

    Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

    Authors: Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.08782  [pdf, other

    eess.IV cs.CV

    Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

    Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

    Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.08343  [pdf, other

    cs.AR cs.AI cs.ET cs.NE

    Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

    Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  9. arXiv:2405.18955  [pdf, other

    cs.CV

    RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

    Authors: Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

    Abstract: Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Atte… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  10. arXiv:2405.18203  [pdf, other

    cs.CL

    IAPT: Instruction-Aware Prompt Tuning for Large Language Models

    Authors: Wei Zhu, Aaron Xuxiang Tian, Congrui Yin, Yuan Ni, Xiaoling Wang, Guotong Xie

    Abstract: Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning is less considered than Low-rank adaptation (LoRA) in the large language modeling (LLM) era. In this work, we propose a novel prompt tuning method, Instruction… ▽ More

    Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL-2024

  11. arXiv:2405.16837  [pdf, ps, other

    stat.ML cs.LG

    Enhancing Accuracy in Generative Models via Knowledge Transfer

    Authors: Xinyu Tian, Xiaotong Shen

    Abstract: This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribu… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2405.13350  [pdf, other

    cs.CL cs.LG

    Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

    Authors: Corinne Aars, Lauren Adams, Xiaokan Tian, Zhaoyu Wang, Colton Wismer, Jason Wu, Pablo Rivas, Korn Sooksatra, Matthew Fendt

    Abstract: This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented w… ▽ More

    Submitted 30 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: LXAI Workshop at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

    ACM Class: I.2.7

  13. arXiv:2405.05702  [pdf, other

    cs.RO

    NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap

    Authors: Mingrui Li, Jingwei Huang, Lei Sun, Aaron Xuxiang Tian, Tianchen Deng, Hongyu Wang

    Abstract: SLAM systems based on Gaussian Splatting have garnered attention due to their capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure detection. To address these issues, we introduce NGM-SLAM, the first 3DGS based SLAM system that utilizes neural radiance… ▽ More

    Submitted 28 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 9pages, 4 figures

  14. arXiv:2405.02807  [pdf

    cs.LG cs.AI cs.CV

    Kinematic analysis of structural mechanics based on convolutional neural network

    Authors: Leye Zhang, Xiangxiang Tian, Hongjun Zhang

    Abstract: Attempt to use convolutional neural network to achieve kinematic analysis of plane bar structure. Through 3dsMax animation software and OpenCV module, self-build image dataset of geometrically stable system and geometrically unstable system. we construct and train convolutional neural network model based on the TensorFlow and Keras deep learning platform framework. The model achieves 100% accuracy… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 9 pages, 13 figures

  15. arXiv:2405.01189  [pdf, other

    cs.LG cs.AI

    Gradient-Congruity Guided Federated Sparse Training

    Authors: Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C. C. Cheung, Shiqi Wang

    Abstract: Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs reg… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  16. arXiv:2404.17890  [pdf, other

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  17. arXiv:2404.12759  [pdf, other

    cs.LG

    decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

    Authors: Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu

    Abstract: Quantization emerges as one of the most promising compression technologies for deploying efficient large models for various real time application in recent years. Considering that the storage and IO of weights take up the vast majority of the overhead inside a large model, weight only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degr… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: quantization for deep models

  18. arXiv:2404.03523  [pdf

    cs.CE

    Integrating Generative AI into Financial Market Prediction for Improved Decision Making

    Authors: Chang Che, Zengyi Huang, Chen Li, Haotian Zheng, Xinyu Tian

    Abstract: This study provides an in-depth analysis of the model architecture and key technologies of generative artificial intelligence, combined with specific application cases, and uses conditional generative adversarial networks ( cGAN ) and time series analysis methods to simulate and predict dynamic changes in financial markets. The research results show that the cGAN model can effectively capture the… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  19. arXiv:2404.02082  [pdf, other

    cs.CV

    WcDT: World-centric Diffusion Transformer for Traffic Scene Generation

    Authors: Chen Yang, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian

    Abstract: In this paper, we introduce a novel approach for autonomous driving trajectory generation by harnessing the complementary strengths of diffusion probabilistic models (a.k.a., diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance t… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  20. arXiv:2403.16187  [pdf, other

    cs.CL

    ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

    Authors: Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, Yvette Graham

    Abstract: Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in the era of large language models. Low-rank adaptation (LoRA) has demonstrated commendable performance as a popular and representative method. However, it is implemented with a fixed intrinsic rank that might not be the ideal setting for the downstream tasks. Recognizing the need for more flexible downs… ▽ More

    Submitted 15 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL-2024

  21. arXiv:2403.09996  [pdf, other

    cs.CV

    MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings

    Authors: Yu Du, Yu Song, Ce Guo, Xiaojing Tian, Dong Liu, Ming Cong

    Abstract: Due to their complex spatial structure and diverse geometric features, achieving high-precision and robust point cloud registration for complex Die Castings has been a significant challenge in the die-casting industry. Existing point cloud registration methods primarily optimize network models using well-established high-quality datasets, often neglecting practical application in real scenarios. T… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  22. arXiv:2403.09412  [pdf, other

    cs.CV cs.AI cs.RO

    OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

    Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

    Abstract: Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary maps, powered by Visual-Language models (VLMs), possess inherent advantages, including zero-shot learning and support for open-set classes. However, existing open-vocabulary maps are prima… ▽ More

    Submitted 28 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  23. arXiv:2403.05801  [pdf, other

    cs.AI

    Enhancing Multi-Hop Knowledge Graph Reasoning through Reward Shaping Techniques

    Authors: Chen Li, Haotian Zheng, Yiping Sun, Cangqing Wang, Liqiang Yu, Che Chang, Xinyu Tian, Bo Liu

    Abstract: In the realm of computational knowledge representation, Knowledge Graph Reasoning (KG-R) stands at the forefront of facilitating sophisticated inferential capabilities across multifarious domains. The quintessence of this research elucidates the employment of reinforcement learning (RL) strategies, notably the REINFORCE algorithm, to navigate the intricacies inherent in multi-hop KG-R. This invest… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT 2024)

  24. arXiv:2402.14430  [pdf, other

    cs.LG

    Robust Training of Federated Models with Extremely Label Deficiency

    Authors: Yonggang Zhang, Zhiqin Yang, Xinmei Tian, Nannan Wang, Tongliang Liu, Bo Han

    Abstract: Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient con… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: ICLR 2024, 22 pages

  25. arXiv:2402.14155  [pdf, other

    cs.CL cs.AI

    Can Similarity-Based Domain-Ordering Reduce Catastrophic Forgetting for Intent Recognition?

    Authors: Amogh Mannekote, Xiaoyi Tian, Kristy Elizabeth Boyer, Bonnie J. Dorr

    Abstract: Task-oriented dialogue systems are expected to handle a constantly expanding set of intents and domains even after they have been deployed to support more and more functionalities. To live up to this expectation, it becomes critical to mitigate the catastrophic forgetting problem (CF) that occurs in continual learning (CL) settings for a task such as intent recognition. While existing dialogue sys… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  26. arXiv:2402.12289  [pdf, other

    cs.CV

    DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

    Authors: Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

    Abstract: A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scen… ▽ More

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Project Page: https://tsinghua-mars-lab.github.io/DriveVLM/

  27. arXiv:2402.11778  [pdf, other

    cs.LG cs.AI

    Towards Theoretical Understandings of Self-Consuming Generative Models

    Authors: Shi Fu, Sen Zhang, Yingjie Wang, Xinmei Tian, Dacheng Tao

    Abstract: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric a… ▽ More

    Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024

  28. arXiv:2402.07011  [pdf, other

    cs.LG cs.AI cs.DC

    FedImpro: Measuring and Improving Client Update in Federated Learning

    Authors: Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

    Abstract: Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved… ▽ More

    Submitted 14 March, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  29. arXiv:2401.12264  [pdf, other

    eess.AS cs.MM cs.SD eess.IV

    CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

    Authors: Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

    Abstract: There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading process of human beings. Humans tends to represent knowledge using two separate systems: one for representing verbal (textual) information and one for representing non-verbal (visual and auditory) information. These two systems… ▽ More

    Submitted 21 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  30. arXiv:2401.02777  [pdf, other

    cs.CL cs.AI

    From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

    Authors: Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. I… ▽ More

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  31. arXiv:2401.02034  [pdf, other

    cs.CL

    Text2MDT: Extracting Medical Decision Trees from Medical Texts

    Authors: Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

    Abstract: Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to build clinical decision support systems. However, the current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelin… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  32. arXiv:2312.13305  [pdf, other

    cs.CV

    DVIS++: Improved Decoupled Framework for Universal Video Segmentation

    Authors: Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

    Abstract: We present the \textbf{D}ecoupled \textbf{VI}deo \textbf{S}egmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS). Unlike previous methods that model video segmentation in an end-to-end manner, our approach decouples video segmentat… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

  33. arXiv:2312.11413  [pdf, other

    cs.LG cs.AI

    DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

    Authors: Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, Bryan Kian Hsiang Low

    Abstract: Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data ownership and data protection regulations, model owners will likely have to fulfil more data deletion requests. This raises issues that have not been addressed by exis… ▽ More

    Submitted 21 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  34. arXiv:2312.07514  [pdf, other

    cs.RO

    Integrated and Lightweight Design of Electro-hydraulic Ankle Prosthesis

    Authors: Yi Wei, Xingjian Wang, Xinyu Tian, Shaoping Wang, Rujun Jia

    Abstract: For lower limb amputees, an active ankle joint prosthesis can provide basic mobility functions. This study focuses on an ankle joint prosthesis system based on the principle of electric-hydraulic actuation. By analyzing the characteristics of human gait cycles and the mechanics of ankle joint movement, a lightweight and integrated ankle joint prosthesis is designed, considering the requirements fo… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 8 pages, 21 figures, conference

  35. arXiv:2312.04877  [pdf, other

    cs.CL cs.DB

    Generating Explanations to Understand and Repair Embedding-based Entity Alignment

    Authors: Xiaobin Tian, Zequn Sun, Wei Hu

    Abstract: Entity alignment (EA) seeks identical entities in different knowledge graphs, which is a long-standing task in the database research. Recent work leverages deep learning to embed entities in vector space and align them via nearest neighbor search. Although embedding-based EA has gained marked success in recent years, it lacks explanations for alignment decisions. In this paper, we present the firs… ▽ More

    Submitted 21 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted in the 40th IEEE International Conference on Data Engineering (ICDE 2024)

  36. arXiv:2312.01598  [pdf, other

    cs.CV

    Good Questions Help Zero-Shot Image Reasoning

    Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

    Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such a ``tunnel vision'' limits LVLMs to exploring other relevant contexts in complex scenes. To add… ▽ More

    Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  37. arXiv:2311.16494  [pdf, other

    cs.CV

    ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

    Authors: Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang

    Abstract: Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts. We address this issue with Attribute-Guided Prompt Tuning (ArGue), making three key contributions. 1) In contrast to the conventional approach of directly appending soft prompts preceding class names, we align the model with p… ▽ More

    Submitted 12 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR2024

  38. arXiv:2311.10902  [pdf, other

    eess.IV cs.CV

    OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy

    Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

    Abstract: Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In-vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex-vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue dama… ▽ More

    Submitted 16 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: 4pages, 5 figures

  39. arXiv:2311.07203  [pdf, other

    quant-ph cs.AI physics.optics

    Optical Quantum Sensing for Agnostic Environments via Deep Learning

    Authors: Zeqiao Zhou, Yuxuan Du, Xu-Fei Yin, Shanshan Zhao, Xinmei Tian, Dacheng Tao

    Abstract: Optical quantum sensing promises measurement precision beyond classical sensors termed the Heisenberg limit (HL). However, conventional methodologies often rely on prior knowledge of the target system to achieve HL, presenting challenges in practical applications. Addressing this limitation, we introduce an innovative Deep Learning-based Quantum Sensing scheme (DQS), enabling optical quantum senso… ▽ More

    Submitted 13 November, 2023; originally announced November 2023.

  40. arXiv:2310.18075  [pdf, other

    cs.CL cs.AI

    DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

    Authors: Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the… ▽ More

    Submitted 24 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  41. arXiv:2310.15097  [pdf, other

    cs.LG

    A Canonical Data Transformation for Achieving Inter- and Within-group Fairness

    Authors: Zachary McBride Lazri, Ivan Brugere, Xin Tian, Dana Dachman-Soled, Antigoni Polychroniadou, Danial Dervovic, Min Wu

    Abstract: Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat in… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  42. arXiv:2310.09625  [pdf, other

    eess.IV cs.CV

    JSMoCo: Joint Coil Sensitivity and Motion Correction in Parallel MRI with a Self-Calibrating Score-Based Diffusion Model

    Authors: Lixuan Chen, Xuanyu Tian, Jiangjie Wu, Ruimin Feng, Guoyan Lao, Yuyao Zhang, Hongjiang Wei

    Abstract: Magnetic Resonance Imaging (MRI) stands as a powerful modality in clinical diagnosis. However, it is known that MRI faces challenges such as long acquisition time and vulnerability to motion-induced artifacts. Despite the success of many existing motion correction algorithms, there has been limited research focused on correcting motion artifacts on the estimated coil sensitivity maps for fast MRI… ▽ More

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 10 pages,8 figures, journal

  43. arXiv:2310.05077  [pdf, other

    cs.LG

    FedFed: Feature Distillation against Data Heterogeneity in Federated Learning

    Authors: Zhiqin Yang, Yonggang Zhang, Yu Zheng, Xinmei Tian, Hao Peng, Tongliang Liu, Bo Han

    Abstract: Federated learning (FL) typically faces data heterogeneity, i.e., distribution shifting among clients. Sharing clients' information has shown great potentiality in mitigating data heterogeneity, yet incurs a dilemma in preserving privacy and promoting model performance. To alleviate the dilemma, we raise a fundamental question: \textit{Is it possible to share partial features in the data to tackle… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 32 pages

    Journal ref: NeurIPS 2023

  44. arXiv:2310.01603  [pdf, other

    cs.CL cs.AI cs.HC

    A Review of Digital Learning Environments for Teaching Natural Language Processing in K-12 Education

    Authors: Xiaoyi Tian, Kristy Elizabeth Boyer

    Abstract: Natural Language Processing (NLP) plays a significant role in our daily lives and has become an essential part of Artificial Intelligence (AI) education in K-12. As children grow up with NLP-powered applications, it is crucial to introduce NLP concepts to them, fostering their understanding of language processing, language generation, and ethical implications of AI and NLP. This paper presents a c… ▽ More

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 24 pages, 13 figures

  45. arXiv:2309.08375  [pdf, other

    cs.LG cs.CY

    Boosting Fair Classifier Generalization through Adaptive Priority Reweighing

    Authors: Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He

    Abstract: With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability i… ▽ More

    Submitted 20 May, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

  46. arXiv:2308.14392  [pdf, other

    cs.CV

    1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

    Authors: Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

    Abstract: Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this report, we present further improvements to the SOTA VIS method, DVIS. First, we introduce a denoising training strategy for the trainable tracker, allowing it to achieve more stable and accurate object tracking in complex and… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

  47. arXiv:2308.09708  [pdf, other

    cs.CV

    Training with Product Digital Twins for AutoRetail Checkout

    Authors: Yue Yao, Xinyu Tian, Zheng Tang, Sujit Biswas, Huan Lei, Tom Gedeon, Liang Zheng

    Abstract: Automating the checkout process is important in smart retail, where users effortlessly pass products by hand through a camera, triggering automatic product detection, tracking, and counting. In this emerging area, due to the lack of annotated training data, we introduce a dataset comprised of product 3D models, which allows for fast, flexible, and large-scale training data generation through graph… ▽ More

    Submitted 18 August, 2023; originally announced August 2023.

  48. arXiv:2308.07771  [pdf, other

    cs.CV

    Dual-path TokenLearner for Remote Photoplethysmography-based Physiological Measurement with Facial Videos

    Authors: Wei Qian, Dan Guo, Kun Li, Xilan Tian, Meng Wang

    Abstract: Remote photoplethysmography (rPPG) based physiological measurement is an emerging yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from facial videos accompanied by noises of illumination variations, facial occlusions, head movements, \etc, in a non-contact manner. Existing mainstream CNN-based models make efforts to detect physiological signals by capturing subt… ▽ More

    Submitted 15 August, 2023; originally announced August 2023.

  49. arXiv:2306.09595  [pdf, other

    cs.LG cs.AI

    Structured Cooperative Learning with Graphical Model Priors

    Authors: Shuangtong Li, Tianyi Zhou, Xinmei Tian, Dacheng Tao

    Abstract: We study how to train personalized models for different tasks on decentralized devices with limited local data. We propose "Structured Cooperative Learning (SCooL)", in which a cooperation graph across devices is generated by a graphical model prior to automatically coordinate mutual learning between devices. By choosing graphical models enforcing different structures, we can derive a rich class o… ▽ More

    Submitted 21 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted by icml 2023

  50. arXiv:2306.04091  [pdf, other

    cs.CV

    1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

    Authors: Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

    Abstract: Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both "thing" and "stuff" objects. In this report, we successfully validated the effectiveness of the decoupling st… ▽ More

    Submitted 8 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.