Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 743 results for author: Cheng, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19055  [pdf, other

    cs.CV

    SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images

    Authors: Ming Chen, Yuxuan Cheng, Xinwei He, Xinyue Wang, Yan Aze, Jinhai Xiang

    Abstract: Integrating visible and infrared images into one high-quality image, also known as visible and infrared image fusion, is a challenging yet critical task for many downstream vision tasks. Most existing works utilize pretrained deep neural networks or design sophisticated frameworks with strong priors for this task, which may be unsuitable or lack flexibility. This paper presents SimpleFusion, a sim… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: code:https://github.com/hxwxss/SimpleFusion-A-Simple-Fusion-Framework-for-Infrared-and-Visible-Images

  2. arXiv:2406.16942  [pdf, other

    eess.IV cs.AI cs.CV

    Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

    Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

    Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

  3. arXiv:2406.16554  [pdf, other

    cs.CL

    LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

    Authors: Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data-hungry and instability problems. Motivated by this limit, we investigate building MoE models from existing dense large language models. Specifically, based on the well-known LLaMA-2 7B mod… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.15480  [pdf, other

    cs.CL cs.AI cs.LG

    On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

    Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. \thm{Can we fine-tune a series of task-specific small models and transfer their… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit under review

  5. arXiv:2406.15479  [pdf, other

    cs.CL cs.AI cs.LG

    Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

    Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these i… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit in review

  6. arXiv:2406.14885  [pdf, other

    cs.HC

    Ink and Algorithm: Exploring Temporal Dynamics in Human-AI Collaborative Writing

    Authors: Kaixun Yang, Yixin Cheng, Linxuan Zhao, Mladen Raković, Zachari Swiecki, Dragan Gašević, Guanliang Chen

    Abstract: The advent of Generative Artificial Intelligence (GAI) has revolutionized the field of writing, marking a shift towards human-AI collaborative writing in education. However, the dynamics of human-AI interaction in the collaborative writing process are not well understood, and thus it remains largely unknown how human learning can be effectively supported with such cutting-edge GAI technologies. In… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.14192  [pdf, other

    cs.CL cs.AI

    Timo: Towards Better Temporal Reasoning for Language Models

    Authors: Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

    Abstract: Reasoning about time is essential for Large Language Models (LLMs) to understand the world. Previous works focus on solving specific tasks, primarily on time-sensitive question answering. While these methods have proven effective, they cannot generalize to a wider spectrum of temporal reasoning tasks. Therefore, we propose a crucial question: Can we build a universal framework to handle a variety… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  8. arXiv:2406.13960  [pdf, other

    cs.CL cs.AI

    Evolving to be Your Soulmate: Personalized Dialogue Agents with Dynamically Adapted Personas

    Authors: Yi Cheng, Wenge Liu, Kaishuai Xu, Wenjun Hou, Yi Ouyang, Chak Tou Leong, Xian Wu, Yefeng Zheng

    Abstract: Previous research on persona-based dialogue agents typically preset the agent's persona before deployment, which remains static thereafter. In this paper, we take a step further and explore a new paradigm called Self-evolving Personalized Dialogue Agents (SPDA), where the agent continuously evolves during the conversation to better align with the user's anticipation by dynamically adapting its per… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Work in progress

  9. arXiv:2406.13934  [pdf, other

    cs.CL cs.AI

    Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment

    Authors: Kaishuai Xu, Yi Cheng, Wenjun Hou, Qiaoyu Tan, Wenjie Li

    Abstract: Medical dialogue systems have attracted significant attention for their potential to act as medical assistants. Enabling these medical systems to emulate clinicians' diagnostic reasoning process has been the long-standing research focus. Previous studies rudimentarily realized the simulation of clinicians' diagnostic process by fine-tuning language models on high-quality dialogue datasets. Nonethe… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 Findings

  10. arXiv:2406.13578  [pdf, other

    cs.CL

    Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

    Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

    Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose \textit{retrieval augmented pretraining}, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Throug… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Findings at ACL 2024

  11. arXiv:2406.13231  [pdf, other

    cs.DS

    Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-Cut

    Authors: Yu Cheng, Max Li, Honghao Lin, Zi-Yi Tai, David P. Woodruff, Jason Zhang

    Abstract: In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed graphs. In this problem, the goal is to build a data structure that $(1 \pm ε)$-approximates cut values in graphs with $n$ vertices. For arbitrary directed graph… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.11353  [pdf, other

    cs.LG cs.CL

    $\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

    Authors: Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, the reliability assessment of MoE lags behind its surging applications. Moreover, when transferred to new domains such as in fine-tuning MoE models sometimes underperform their dense counterparts. Motivated by the research gap and counter-intuitive phenomenon, we… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 9 pages, 8 figures, camera ready on ICML2024

  13. arXiv:2406.11256  [pdf, other

    cs.CL

    Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

    Authors: Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data c… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.10807  [pdf, other

    stat.ML cs.AI cs.LG stat.AP

    Bayesian Networks and Machine Learning for COVID-19 Severity Explanation and Demographic Symptom Classification

    Authors: Oluwaseun T. Ajayi, Yu Cheng

    Abstract: With the prevailing efforts to combat the coronavirus disease 2019 (COVID-19) pandemic, there are still uncertainties that are yet to be discovered about its spread, future impact, and resurgence. In this paper, we present a three-stage data-driven approach to distill the hidden information about COVID-19. The first stage employs a Bayesian network structure learning method to identify the causal… ▽ More

    Submitted 17 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  15. arXiv:2406.10517  [pdf, other

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive model(i.e., pLTV), severely limiting the their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expan… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  16. arXiv:2406.09773  [pdf

    cs.CV cs.AI

    Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

    Authors: Haowei Yang, Liyang Wang, Jingyu Zhang, Yu Cheng, Ao Xiang

    Abstract: With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain mapping, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  17. arXiv:2406.09765  [pdf

    q-fin.RM cs.CL

    Application of Natural Language Processing in Financial Risk Detection

    Authors: Liyang Wang, Yu Cheng, Ao Xiang, Jingyu Zhang, Haowei Yang

    Abstract: This paper explores the application of Natural Language Processing (NLP) in financial risk detection. By constructing an NLP-based financial risk detection model, this study aims to identify and predict potential risks in financial documents and communications. First, the fundamental concepts of NLP and its theoretical foundation, including text mining methods, NLP model design principles, and mac… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  18. arXiv:2406.09295  [pdf, other

    cs.CL cs.CV

    AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

    Authors: Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

    Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically des… ▽ More

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.09072  [pdf, other

    cs.CL

    Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

    Authors: Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min zhang

    Abstract: Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world. Current temporal reasoning datasets are limited to questions about single or isolated events, falling short in mirroring the realistic temporal characteristics involving concurrent nature and intricate temporal interconnections. In this paper, we introduce CoTempQA, a comprehensive co-temporal Question Answ… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to the ACL 2024 main conference

  20. arXiv:2406.08155  [pdf, other

    cs.LG cs.AI cs.CL

    Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark

    Authors: Pingzhi Li, Xiaolong Jin, Yu Cheng, Tianlong Chen

    Abstract: Large Language Models~(LLMs) have become foundational in the realm of natural language processing, demonstrating performance improvements as model sizes increase. The Mixture-of-Experts~(MoE) approach offers a promising way to scale LLMs more efficiently by using fewer computational FLOPs through sparse activation. However, it suffers from significant memory overheads, necessitating model compress… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code for reproducing all our experiments is provided at https://github.com/UNITES-Lab/moe-quantization

  21. arXiv:2406.08035  [pdf, other

    cs.CV cs.AI

    LVBench: An Extreme Long Video Understanding Benchmark

    Authors: Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sport… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.07176  [pdf, other

    cs.CV

    RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection

    Authors: Yuqi Cheng, Yunkang Cao, Rui Chen, Weiming Shen

    Abstract: Robustness against noisy imaging is crucial for practical image anomaly detection systems. This study introduces a Robust Anomaly Detection (RAD) dataset with free views, uneven illuminations, and blurry collections to systematically evaluate the robustness of current anomaly detection methods. Specifically, RAD aims to identify foreign objects on working platforms as anomalies. The collection pro… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  23. arXiv:2406.07001  [pdf, other

    cs.CL cs.AI

    Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

    Authors: Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng xie, Dangyang Chen

    Abstract: Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent bia… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL2024 findings

  24. arXiv:2406.04809  [pdf, other

    cs.CR cs.AI

    A Survey of Fragile Model Watermarking

    Authors: Zhenzhe Gao, Yu Cheng, Zhaoxia Yin

    Abstract: Model fragile watermarking, inspired by both the field of adversarial attacks on neural networks and traditional multimedia fragile watermarking, has gradually emerged as a potent tool for detecting tampering, and has witnessed rapid development in recent years. Unlike robust watermarks, which are widely used for identifying model copyrights, fragile watermarks for models are designed to identify… ▽ More

    Submitted 20 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted Expert Systems with Applications

  25. arXiv:2406.02979  [pdf, other

    cs.LG cs.AI

    Efficient User Sequence Learning for Online Services via Compressed Graph Neural Networks

    Authors: Yucheng Wu, Liyue Chen, Yu Cheng, Shuai Chen, Jinyu Xu, Leye Wang

    Abstract: Learning representations of user behavior sequences is crucial for various online services, such as online fraudulent transaction detection mechanisms. Graph Neural Networks (GNNs) have been extensively applied to model sequence relationships, and extract information from similar sequences. While user behavior sequence data volume is usually huge for online applications, directly applying GNN mode… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE ICWS 2024

  26. arXiv:2406.01913  [pdf, other

    cs.LG cs.AI

    Generating Synthetic Net Load Data with Physics-informed Diffusion Model

    Authors: Shaorong Zhang, Yuanbin Cheng, Nanpeng Yu

    Abstract: This paper presents a novel physics-informed diffusion model for generating synthetic net load data, addressing the challenges of data scarcity and privacy concerns. The proposed framework embeds physical models within denoising networks, offering a versatile approach that can be readily generalized to unforeseen scenarios. A conditional denoising neural network is designed to jointly train the pa… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2406.01388  [pdf, other

    cs.CV

    AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

    Authors: Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation begins to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subject… ▽ More

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Multi-turn interactive image generation

  28. arXiv:2406.00548  [pdf, ps, other

    cs.LG cs.CL

    LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

    Authors: Tianci Liu, Haoyu Wang, Shiyang Wang, Yu Cheng, Jing Gao

    Abstract: Large language models (LLMs) have achieved impressive performance on various natural language generation tasks. Nonetheless, they suffer from generating negative and harmful contents that are biased against certain demographic groups (e.g., female), raising severe fairness concerns. As remedies, prior works intervened the generation by removing attitude or demographic information, inevitably degra… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  29. arXiv:2406.00440  [pdf, other

    cs.CV

    Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

    Authors: X. Li, Y. Cheng, X. Ren, H. Jia, D. Xu, W. Zhu, Y. Yan

    Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  30. arXiv:2406.00426  [pdf, other

    cs.LG

    InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation

    Authors: Jacob Si, Wendy Yusi Cheng, Michael Cooper, Rahul G. Krishnan

    Abstract: Tabular data are omnipresent in various sectors of industries. Neural networks for tabular data such as TabNet have been proposed to make predictions while leveraging the attention mechanism for interpretability. However, the inferred attention masks are often dense, making it challenging to come up with rationales about the predictive signal. To remedy this, we propose InterpreTabNet, a variant o… ▽ More

    Submitted 11 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: ICML 2024 Spotlight

  31. arXiv:2405.20603  [pdf

    cs.LG cs.AI

    Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis

    Authors: Ke Xu, Yu Cheng, Shiqing Long, Junjie Guo, Jue Xiao, Mengfang Sun

    Abstract: This paper focuses on the application and optimization of LSTM model in financial risk prediction. The study starts with an overview of the architecture and algorithm foundation of LSTM, and then details the model training process and hyperparameter tuning strategy, and adjusts network parameters through experiments to improve performance. Comparative experiments show that the optimized LSTM model… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  32. arXiv:2405.18547  [pdf, ps, other

    cs.CR

    User Perception of CAPTCHAs: A Comparative Study between University and Internet Users

    Authors: Arun Reddy, Yuan Cheng

    Abstract: CAPTCHAs are commonly used to distinguish between human and bot users on the web. However, despite having various types of CAPTCHAs, there are still concerns about their security and usability. To address these concerns, we surveyed over 250 participants from a university campus and Amazon Mechanical Turk. Our goal was to gather user perceptions regarding the security and usability of current CAPT… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  33. arXiv:2405.17849  [pdf, other

    cs.LG cs.AI cs.CL

    I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models

    Authors: Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou

    Abstract: Post-training quantization (PTQ) serves as a potent technique to accelerate the inference of large language models (LLMs). Nonetheless, existing works still necessitate a considerable number of floating-point (FP) operations during inference, including additional quantization and de-quantization, as well as non-linear operators such as RMSNorm and Softmax. This limitation hinders the deployment of… ▽ More

    Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  34. arXiv:2405.16444  [pdf, other

    cs.LG

    CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

    Authors: Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang

    Abstract: Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of the long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomput… ▽ More

    Submitted 3 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  35. arXiv:2405.16229  [pdf, other

    cs.CL cs.CR

    No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks

    Authors: Chak Tou Leong, Yi Cheng, Kaishuai Xu, Jian Wang, Hanlin Wang, Wenjie Li

    Abstract: The existing safety alignment of Large Language Models (LLMs) is found fragile and could be easily attacked through different strategies, such as through fine-tuning on a few harmful examples or manipulating the prefix of the generation results. However, the attack mechanisms of these strategies are still underexplored. In this paper, we ask the following question: \textit{while these approaches c… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: work in progress

  36. arXiv:2405.12538  [pdf, other

    cs.CV

    Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

    Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

    Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leadi… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  37. arXiv:2405.12512  [pdf, other

    cs.CV cs.AI cs.MM

    Rethink Predicting the Optical Flow with the Kinetics Perspective

    Authors: Yuhao Cheng, Siru Zhang, Yiqiang Yan

    Abstract: Optical flow estimation is one of the fundamental tasks in low-level computer vision, which describes the pixel-wise displacement and can be used in many other tasks. From the apparent aspect, the optical flow can be viewed as the correlation between the pixels in consecutive frames, so continuously refining the correlation volume can achieve an outstanding performance. However, it will make the m… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  38. arXiv:2405.11729  [pdf

    cs.NE cs.LG

    Optimization of Worker Scheduling at Logistics Depots Using Genetic Algorithms and Simulated Annealing

    Authors: Jinxin Xu, Haixin Wu, Yu Cheng, Liyang Wang, Xin Yang, Xintong Fu, Yuelong Su

    Abstract: This paper addresses the optimization of scheduling for workers at a logistics depot using a combination of genetic algorithm and simulated annealing algorithm. The efficient scheduling of permanent and temporary workers is crucial for optimizing the efficiency of the logistics depot while minimizing labor usage. The study begins by establishing a 0-1 integer linear programming model, with decisio… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  39. arXiv:2405.11204  [pdf, other

    cs.LG stat.ML

    Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling

    Authors: Yuwei Cheng, Fan Yao, Xuefeng Liu, Haifeng Xu

    Abstract: This paper studies Learning from Imperfect Human Feedback (LIHF), motivated by humans' potential irrationality or imperfect perception of true preference. We revisit the classic dueling bandit problem as a model of learning from comparative human feedback, and enrich it by casting the imperfection in human feedback as agnostic corruption to user utilities. We start by identifying the fundamental l… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  40. arXiv:2405.10762  [pdf

    q-fin.RM cs.AI cs.LG

    Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

    Authors: Yu Cheng, Qin Yang, Liyang Wang, Ao Xiang, Jingyu Zhang

    Abstract: In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. T… ▽ More

    Submitted 30 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  41. arXiv:2405.08448  [pdf, other

    cs.LG cs.AI

    Understanding the performance gap between online and offline alignment algorithms

    Authors: Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney

    Abstract: Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline alignment algorithms challenge the need for on-policy sampling in RLHF. Within the context of reward over-optimization, we start with an opening set of experiments that demonstrate the clear advantage of online methods over offline methods. This pro… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  42. arXiv:2405.05616  [pdf, other

    cs.CL cs.AI

    G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

    Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng

    Abstract: Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models(LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  43. arXiv:2405.02686  [pdf, other

    cs.CV cs.AI

    Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

    Authors: Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

    Abstract: Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages

  44. arXiv:2404.19168  [pdf, other

    cs.CV

    PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition

    Authors: Dongyun Lin, Yi Cheng, Shangbo Mao, Aiyuan Guo, Yiqun Li

    Abstract: Large vision-language models have impressively promote the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting the large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of the 3D shape represented by… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  45. arXiv:2404.18919  [pdf, other

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  46. arXiv:2404.18416  [pdf, other

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby , et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G… ▽ More

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  47. arXiv:2404.18225  [pdf, other

    cs.RO

    Quadruped robot traversing 3D complex environments with limited perception

    Authors: Yi Cheng, Hang Liu, Guoping Pan, Linqi Ye, Houde Liu, Bin Liang

    Abstract: Traversing 3-D complex environments has always been a significant challenge for legged locomotion. Existing methods typically rely on external sensors such as vision and lidar to preemptively react to obstacles by acquiring environmental information. However, in scenarios like nighttime or dense forests, external sensors often fail to function properly, necessitating robots to rely on propriocepti… ▽ More

    Submitted 29 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 10 pages, 8 figures,submitted to iros2024

  48. arXiv:2404.17609  [pdf, other

    cs.LG cs.AI cs.CL

    CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning

    Authors: Yinghan Cheng, Qi Zhang, Chongyang Shi, Liang Xiao, Shufeng Hao, Liang Hu

    Abstract: Stance detection seeks to identify the viewpoints of individuals either in favor or against a given target or a controversial topic. Current advanced neural models for stance detection typically employ fully parametric softmax classifiers. However, these methods suffer from several limitations, including lack of explainability, insensitivity to the latent data structure, and unimodality, which gre… ▽ More

    Submitted 19 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages

  49. arXiv:2404.17486  [pdf, other

    cs.CV

    TextGaze: Gaze-Controllable Face Generation with Natural Language

    Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

    Abstract: Generating face image with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describ… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Under review

  50. arXiv:2404.16771  [pdf, other

    cs.CV cs.AI

    ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

    Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

    Abstract: Diffusion-based technologies have made significant strides, particularly in personalized and customized facialgeneration. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID)consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation by fully considering intricate facial de… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Project page: https://ssugarwh.github.io/consistentid.github.io/