Showing 1–50 of 2,288 results for author: Chen, C

Searching in archive cs. Search in all archives.
  1. arXiv:2406.14319  [pdf, other]

    cs.AI cs.CL

    LiveMind: Low-latency Large Language Models with Simultaneous Inference

    Authors: Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

    Abstract: In this paper, we introduce a novel low-latency inference framework for large language models (LLMs) inference which enables LLMs to perform inferences with incomplete prompts. By reallocating computational processes to prompt input phase, we achieve a substantial reduction in latency, thereby significantly enhancing the interactive experience for users of LLMs. The framework adeptly manages the v…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.14130  [pdf, other]

    cs.CV

    ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

    Authors: Zhongjie Duan, Wenmeng Zhou, Cen Chen, Yaliang Li, Weining Qian

    Abstract: Recently, advancements in video synthesis have attracted significant attention. Video synthesis models such as AnimateDiff and Stable Video Diffusion have demonstrated the practical applicability of diffusion models in creating dynamic visual content. The emergence of SORA has further spotlighted the potential of video generation technologies. Nonetheless, the extension of video lengths has been c…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  3. arXiv:2406.14123  [pdf]

    cs.CY

    Mapping AI Ethics Narratives: Evidence from Twitter Discourse Between 2015 and 2022

    Authors: Mengyi Wei, Puzhen Zhang, Chuan Chen, Dongsheng Chen, Chenyu Zuo, Liqiu Meng

    Abstract: Public participation is indispensable for an insightful understanding of the ethics issues raised by AI technologies. Twitter is selected in this paper to serve as an online public sphere for exploring discourse on AI ethics, facilitating broad and equitable public engagement in the development of AI technology. A research framework is proposed to demonstrate how to transform AI ethics-related dis…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures

  4. arXiv:2406.13933  [pdf, other]

    cs.CR

    EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations

    Authors: Jie Ren, Yingqian Cui, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu

    Abstract: Generative models, especially text-to-image diffusion models, have significantly advanced in their ability to generate images, benefiting from enhanced architectures, increased computational power, and large-scale datasets. While the datasets play an important role, their protection has remained as an unsolved issue. Current protection strategies, such as watermarks and membership inference, are e…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.13925  [pdf, other]

    cs.CL cs.AI

    GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models

    Authors: Tao Zhang, Ziqian Zeng, Yuxiang Xiao, Huiping Zhuang, Cen Chen, James Foulds, Shimei Pan

    Abstract: Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicl…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12779  [pdf, other]

    cs.CL

    Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

    Authors: Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen

    Abstract: Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by CSCWD 2024

  7. arXiv:2406.12300  [pdf]

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  8. arXiv:2406.11766  [pdf, other]

    cs.CV

    Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization

    Authors: Huaiji Zhou, Bing Wang, Changhao Chen

    Abstract: Neural implicit representations such as NeRF have revolutionized 3D scene representation with photo-realistic quality. However, existing methods for visual localization within NeRF representations suffer from inefficiency and scalability issues, particularly in large-scale environments. This work proposes MatLoc-NeRF, a novel matching-based localization framework using selected NeRF features. It a…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 2 figures

  9. arXiv:2406.11753  [pdf, other]

    cs.CL cs.LG

    A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models

    Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

    Abstract: Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on "how to finetune" but neglects the issue of "where to finetune". As a pioneering work on answering where to finetune (at the layer level), we conduct a semantic anal…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures, under peer-review

  10. arXiv:2406.10778  [pdf, other]

    cs.CE stat.AP

    Heterogeneous Entity Representation for Medicinal Synergy Prediction

    Authors: Jiawei Wu, Jun Wen, Mingyuan Yan, Anqi Dong, Can Chen

    Abstract: Medicinal synergy prediction is a powerful tool in drug discovery and development that harnesses the principles of combination therapy to enhance therapeutic outcomes by improving efficacy, reducing toxicity, and preventing drug resistance. While a myriad of computational methods has emerged for predicting synergistic drug combinations, a large portion of them may overlook the intricate, yet criti…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

    MSC Class: 92C50; 05C65; 68T07

  11. arXiv:2406.10479  [pdf, other]

    cs.AI

    Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning

    Authors: Wenjun Li, Changyu Chen, Pradeep Varakantham

    Abstract: Large language models (LLMs) have demonstrated impressive task-solving capabilities, achieved through either prompting techniques or system designs. However, concerns have arisen regarding their proficiency in planning tasks, as they often struggle to generate valid plans. This paper investigates the impact of fine-tuning on LLMs' planning capabilities. Our findings indicate that LLMs can achieve…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages of main paper, 2 pages of references

  12. arXiv:2406.10313  [pdf, ps, other]

    cs.CL cs.CV

    CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

    Authors: Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang

    Abstract: The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers. The challenge yielded highly successful results, with the best submission significantly outperforming the baseline,…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  13. arXiv:2406.09760  [pdf, other]

    cs.CL cs.LG

    Bootstrapping Language Models with DPO Implicit Rewards

    Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

    Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that…

    Submitted 14 June, 2024; originally announced June 2024.

  14. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Kai Yu, Aidi Lin, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Wei Chen, Yilong Luo, Yifan Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Yanda Meng, Yukun Zhou, et al. (11 additional authors not shown)

    Abstract: The current retinal artificial intelligence models were trained using data with a limited category of diseases and limited knowledge. In this paper, we present a retinal vision-language foundation model (RetiZero) with knowledge of over 400 fundus diseases. Specifically, we collected 341,896 fundus images paired with text descriptions from 29 publicly available datasets, 180 ophthalmic books, and…

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.09272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    Authors: Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman

    Abstract: Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinat…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://vision.cs.utexas.edu/projects/action2sound

  16. arXiv:2406.08897  [pdf, other]

    cs.LG

    Motif-driven Subgraph Structure Learning for Graph Classification

    Authors: Zhiyao Zhou, Sheng Zhou, Bochao Mao, Jiawei Chen, Qingyun Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, the progresses in this field mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, appl…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  17. arXiv:2406.08638  [pdf, other]

    cs.LG

    Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data

    Authors: Chi-Jane Chen, Haidong Yi, Natalie Stanley

    Abstract: Single-cell technologies enable comprehensive profiling of diverse immune cell-types through the measurement of multiple genes or proteins per cell. In order to translate data from immune profiling assays into powerful diagnostics, machine learning approaches are used to compute per-sample immunological summaries, or featurizations that can be used as inputs to models for outcomes of interest. Cur…

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.07953  [pdf, other]

    cs.CR cs.DS cs.LG

    DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding Windows (Technical Report)

    Authors: Yiping Wang, Yanhao Wang, Cen Chen

    Abstract: The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track some desired statistics over the sliding window using a small space. When data streams contain sensitive information about individuals, the algorithm is also urgen…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication at KDD 2024

  19. arXiv:2406.07854  [pdf, other]

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a 'zero-shot' detection pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  20. arXiv:2406.07754  [pdf, other]

    cs.CV

    HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

    Authors: Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman

    Abstract: We study the problem of precisely swapping objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when objec…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project website: https://vision.cs.utexas.edu/projects/HOI-Swap/

  21. arXiv:2406.07478  [pdf, other]

    quant-ph cs.CC

    Incompressibility and spectral gaps of random circuits

    Authors: Chi-Fang Chen, Jeongwan Haah, Jonas Haferkamp, Yunchao Liu, Tony Metger, Xinyu Tan

    Abstract: Random reversible and quantum circuits form random walks on the alternating group $\mathrm{Alt}(2^n)$ and unitary group $\mathrm{SU}(2^n)$, respectively. Known bounds on the spectral gap for the $t$-th moment of these random walks have inverse-polynomial dependence in both $n$ and $t$. We prove that the gap for random reversible circuits is $\Omega(n^{-3})$ for all $t\geq 1$, and the gap for random qua…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 79 pages, 5 figures

  22. arXiv:2406.06793  [pdf, other]

    cs.LG cs.AI

    PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

    Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline "value function learning", in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulates as the horizon of the task grows. On the other hand, models that ca…

    Submitted 10 June, 2024; originally announced June 2024.

  23. arXiv:2406.06730  [pdf, other]

    cs.CV cs.AI

    TRINS: Towards Multimodal Language Models that Can Read

    Authors: Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun

    Abstract: Large multimodal language models have shown remarkable proficiency in understanding and editing images. However, a majority of these visually-tuned models struggle to comprehend the textual content embedded in images, primarily due to the limitation of training data. In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  24. arXiv:2406.06606  [pdf, other]

    cs.CL cs.AI

    Prototypical Reward Network for Data-Efficient RLHF

    Authors: Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, Kunpeng Liu

    Abstract: The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling sta…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  25. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  26. arXiv:2406.06386  [pdf, other]

    cs.CV

    FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

    Authors: Julia Yang, Alina Jade Barnett, Jon Donnelly, Satvik Kishore, Jerry Fang, Fides Regina Schwartz, Chaofan Chen, Joseph Y. Lo, Cynthia Rudin

    Abstract: Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency t…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures, Accepted for oral presentation at the 2024 CVPR Workshop on Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)

  27. arXiv:2406.05768  [pdf, other]

    cs.CV cs.AI

    MLCM: Multistep Consistency Distillation of Latent Diffusion Model

    Authors: Qingsong Xie, Zhenyi Liao, Chen chen, Zhijie Deng, Shixiang Tang, Haonan Lu

    Abstract: Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To addre…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  28. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  29. arXiv:2406.04662  [pdf, other]

    cs.CV

    Evaluating and Mitigating IP Infringement in Visual Generative AI

    Authors: Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan, Lingjuan Lyu

    Abstract: The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking resemblance to characters protected by intellectual property rights held by major entertainment companies (such as Sony, Marve…

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.03880  [pdf, other]

    cs.LG cs.AI

    Memorization in deep learning: A survey

    Authors: Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Ming Ding, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

    Abstract: Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model…

    Submitted 6 June, 2024; originally announced June 2024.

  31. arXiv:2406.03836  [pdf, other]

    cs.CR cs.AI

    Proactive Detection of Physical Inter-rule Vulnerabilities in IoT Services Using a Deep Learning Approach

    Authors: Bing Huang, Chen Chen, Kwok-Yan Lam, Fuqun Huang

    Abstract: Emerging Internet of Things (IoT) platforms provide sophisticated capabilities to automate IoT services by enabling occupants to create trigger-action rules. Multiple trigger-action rules can physically interact with each other via shared environment channels, such as temperature, humidity, and illumination. We refer to inter-rule interactions via shared environment channels as a physical inter-ru…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE ICWS 2024 Workshop

  32. arXiv:2406.03293  [pdf, other]

    cs.CV

    Text-to-Image Rectified Flow as Plug-and-Play Priors

    Authors: Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin

    Abstract: Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a novel class of generative models, enforces a linear progression from th…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/yangxiaofeng/rectified_flow_prior

  33. arXiv:2406.02884  [pdf, other]

    cs.CV

    PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

    Authors: Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

    Abstract: Layout generation is the keystone in achieving automated graphic design, requiring arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic…

    Submitted 4 June, 2024; originally announced June 2024.

  34. arXiv:2406.02642  [pdf, other]

    cs.LG cs.AI

    E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

    Authors: Zhou Yang, Zhaochun Ren, Chenglong Ye, Yufeng Wang, Haizhou Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao

    Abstract: In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performanc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, 5 tables

  35. arXiv:2406.01987  [pdf, other]

    cs.CV

    Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

    Authors: Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

    Abstract: Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities are available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often e…

    Submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.01394  [pdf, other]

    cs.CR cs.AI

    PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration

    Authors: Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Huiping Zhuang, Cen Chen

    Abstract: The widespread usage of online Large Language Models (LLMs) inference services has raised significant privacy concerns about the potential exposure of private information in user inputs to eavesdroppers or untrustworthy service providers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or severe inference time overhead. In this pap…

    Submitted 3 June, 2024; originally announced June 2024.

  37. arXiv:2406.00654  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    Authors: Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

    Abstract: In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even st…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 19 pages, Preprint

  38. arXiv:2406.00329  [pdf, other]

    eess.IV cs.CV cs.LG

    Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images

    Authors: Yundi Zhang, Chen Chen, Suprosanna Shit, Sophie Starck, Daniel Rueckert, Jiazhen Pan

    Abstract: Cardiac Magnetic Resonance (CMR) imaging serves as the gold-standard for evaluating cardiac morphology and function. Typically, a multi-view CMR stack, covering short-axis (SA) and 2/3/4-chamber long-axis (LA) views, is acquired for a thorough cardiac assessment. However, efficiently streamlining the complex, high-dimensional 3D+T CMR data and distilling compact, coherent representation remains a…

    Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  39. arXiv:2405.20725  [pdf, other]

    cs.AI cs.CV

    GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

    Authors: Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

    Abstract: Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, rese…

    Submitted 31 May, 2024; originally announced May 2024.

  40. arXiv:2405.20708  [pdf, other]

    cs.CL cs.AI

    FinGen: A Dataset for Argument Generation in Finance

    Authors: Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao

    Abstract: Thinking about the future is one of the important activities that people do in daily life. Futurists also pay a lot of effort into figuring out possible scenarios for the future. We argue that the exploration of this direction is still in an early stage in the NLP research. To this end, we propose three argument generation tasks in the financial application scenario. Our experimental results show…

    Submitted 31 May, 2024; originally announced May 2024.

  41. arXiv:2405.20633  [pdf, other]

    cs.CV

    Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection

    Authors: Jing Xu, Anqi Zhu, Jingyu Lin, Qiuhong Ke, Cunjian Chen

    Abstract: Human action recognition is a crucial task in computer vision systems. However, in real-world scenarios, human actions often fall outside the distribution of training data, requiring a model to both recognize in-distribution (ID) actions and reject out-of-distribution (OOD) ones. Despite its importance, there has been limited research on OOD detection in human actions. Existing works on OOD detect…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Under consideration at Computer Vision and Image Understanding

  42. arXiv:2405.20322  [pdf, other]

    quant-ph cs.DS

    Quantum generalizations of Glauber and Metropolis dynamics

    Authors: András Gilyén, Chi-Fang Chen, Joao F. Doriguello, Michael J. Kastoryano

    Abstract: Classical Markov Chain Monte Carlo methods have been essential for simulating statistical physical systems and have proven well applicable to other systems with complex degrees of freedom. Motivated by the statistical physics origins, Chen, Kastoryano, and Gilyén [CKG23] proposed a continuous-time quantum thermodynamic analog to Glauber dynamic that is (i) exactly detailed balanced, (ii) efficient…

    Submitted 30 May, 2024; originally announced May 2024.

  43. arXiv:2405.19888  [pdf, other]

    cs.LG cs.AI

    Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

    Authors: Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu

    Abstract: The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strength of LLM and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: To appear at USENIX OSDI 2024

  44. arXiv:2405.19677  [pdf, other]

    cs.CR cs.AI

    Large Language Model Watermark Stealing With Mixed Integer Programming

    Authors: Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shengshan Hu, Asif Gill, Shirui Pan

    Abstract: The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. The LLM watermark scheme commonly includes generating secret keys to partition the vocabulary into green and red lists, applying a perturbation to the logits of tokens in the green list to increase their…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 12 pages

  45. arXiv:2405.18744  [pdf, other]

    cs.CR

    PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN

    Authors: Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng

    Abstract: The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed t…

    Submitted 29 May, 2024; originally announced May 2024.

  46. arXiv:2405.18655  [pdf, other]

    cs.LG cs.AI q-bio.GN

    CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data

    Authors: Ping-Han Hsieh, Ru-Xiu Hsiao, Katalin Ferenc, Anthony Mathelier, Rebekka Burkholz, Chien-Yu Chen, Geir Kjetil Sandve, Tatiana Belova, Marieke Lydia Kuijjer

    Abstract: Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modaliti…

    Submitted 28 May, 2024; originally announced May 2024.

  47. arXiv:2405.17779  [pdf, other]

    cs.LG cs.RO

    Online Analytic Exemplar-Free Continual Learning with Large Models for Imbalanced Autonomous Driving Task

    Authors: Huiping Zhuang, Di Fang, Kai Tong, Yuchen Liu, Ziqian Zeng, Xu Zhou, Cen Chen

    Abstract: In the field of autonomous driving, even a meticulously trained model can encounter failures when faced with unfamiliar scenarios. One of these scenarios can be formulated as an online continual learning (OCL) problem. That is, data come in an online fashion, and models are updated according to these streaming data. Two major OCL challenges are catastrophic forgetting and data imbalance. To addres…

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.16890  [pdf, other]

    cs.CV

    PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

    Authors: Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, Jun Zhu

    Abstract: Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models. Different from extracting dense meshes from neural representation, some recent works try to model the native mesh distribution (i.e., a set of triangles), which generates more compact results as humans crafted. However, due to the complexity and variety of mesh topology, these methods…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project website: https://whaohan.github.io/pivotmesh

  49. arXiv:2405.15190  [pdf, other]

    cs.IR

    Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search

    Authors: Marie Al Ghossein, Ching-Wei Chen, Jason Tang

    Abstract: Recent advances in the fields of Information Retrieval and Machine Learning have focused on improving the performance of search engines to enhance the user experience, especially in the world of online shopping. The focus has thus been on leveraging cutting-edge learning techniques and relying on large enriched datasets. This paper introduces the Shopping Queries Image Dataset (SQID), an extension…

    Submitted 23 May, 2024; originally announced May 2024.

  50. arXiv:2405.15062  [pdf, other]

    cs.LG

    Model-Agnostic Utility-Preserving Biometric Information Anonymization

    Authors: Chun-Fu Chen, Bill Moriarty, Shaohan Hu, Sean Moran, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati

    Abstract: The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophisticated analytics. While providing better user experi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint of IJIS version, https://link.springer.com/article/10.1007/s10207-024-00862-8