Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 1,252 results for author: Chen, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.20455  [pdf, other

    cs.CV

    Learning Feature-Preserving Portrait Editing from Generated Pairs

    Authors: Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

    Abstract: Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.20256  [pdf

    cs.DB cs.AI cs.LG

    Making LLMs Work for Enterprise Data Tasks

    Authors: Çağatay Demiralp, Fabian Wenz, Peter Baile Chen, Moe Kayali, Nesime Tatbul, Michael Stonebraker

    Abstract: Large language models (LLMs) know little about enterprise database tables in the private data ecosystem, which substantially differ from web text in structure and content. As LLMs' performance is tied to their training data, a crucial question is how useful they can be in improving enterprise database management and analysis tasks. To address this, we contribute experimental results on LLMs' perfo… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Poster at North East Database Day 2024

  3. arXiv:2407.19078  [pdf, other

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  4. HICEScore: A Hierarchical Metric for Image Captioning Evaluation

    Authors: Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen

    Abstract: Image captioning evaluation metrics can be divided into two categories, reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metric… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM2024

  5. arXiv:2407.18046  [pdf, other

    cs.CV cs.AI

    GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

    Authors: Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

    Abstract: Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 13 pages, 12 figures

  6. arXiv:2407.16564  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning

    Authors: Fang-Duo Tsai, Shih-Lun Wu, Haven Kim, Bo-Yu Chen, Hao-Chung Cheng, Yi-Hsuan Yang

    Abstract: Text-to-music models allow users to generate nearly realistic musical audio with textual commands. However, editing music audios remains challenging due to the conflicting desiderata of performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP-Adapter), a lightweight addition to pretrained text-to-m… ▽ More

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)

  7. arXiv:2407.15892  [pdf, other

    cs.LG

    MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

    Authors: Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar

    Abstract: We introduce Mini-Sequence Transformer (MsT), a simple and effective methodology for highly efficient and accurate LLM training with extremely long sequences. MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage. Integrated with activation recomputation, it enables significant memory savings in both forward and backward passes. In experiments… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  8. arXiv:2407.15754  [pdf, other

    cs.CV cs.CL cs.LG

    LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

    Authors: Haoning Wu, Dongxu Li, Bei Chen, Junnan Li

    Abstract: Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Albeit the progress, few public benchmark is available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark that features video-language interleaved inputs up to an hour long. Our benchmark includes 3,763 varying-length web-collected videos with their subti… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 29 pages

  9. arXiv:2407.14098  [pdf, other

    cs.DB

    Top-k Representative Search for Comparative Tree Summarization

    Authors: Yuqi Chen, Xin Huang, Bilian Chen

    Abstract: Data summarization aims at utilizing a small-scale summary to represent massive datasets as a whole, which is useful for visualization and information sipped generation. However, most existing studies of hierarchical summarization only work on \emph{one single tree} by selecting $k$ representative nodes, which neglects an important problem of comparative summarization on two trees. In this paper,… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.13863  [pdf, other

    cs.CV

    A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

    Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image… ▽ More

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  11. arXiv:2407.11699  [pdf, other

    cs.CV

    Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

    Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment o… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  12. arXiv:2407.10179  [pdf, other

    cs.CV

    CLIP-Guided Networks for Transferable Targeted Attacks

    Authors: Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

    Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced \textit{single-target} generative attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. \textit{Multi-targe… ▽ More

    Submitted 22 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  13. arXiv:2407.10081  [pdf, other

    cs.IR

    All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era

    Authors: Bo Chen, Xinyi Dai, Huifeng Guo, Wei Guo, Weiwen Liu, Yong Liu, Jiarui Qin, Ruiming Tang, Yichao Wang, Chuhan Wu, Yaxiong Wu, Hao Zhang

    Abstract: Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader pic… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  14. arXiv:2407.10074  [pdf, ps, other

    cs.IT

    Optimal linear codes with few weights from simplicial complexes

    Authors: Bing Chen, Yunge Xu, Zhao Hu, Nian Li, Xiangyong Zeng

    Abstract: Recently, constructions of optimal linear codes from simplicial complexes have attracted much attention and some related nice works were presented. Let $q$ be a prime power. In this paper, by using the simplicial complexes of ${\mathbb F}_{q}^m$ with one single maximal element, we construct four families of linear codes over the ring ${\mathbb F}_{q}+u{\mathbb F}_{q}$ ($u^2=0$), which generalizes… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages

  15. arXiv:2407.08722  [pdf, other

    cs.RO cs.CV cs.LG

    Unifying 3D Representation and Control of Diverse Robots with a Single Camera

    Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

    Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

  16. arXiv:2407.08019  [pdf, other

    cs.CV

    Coherent and Multi-modality Image Inpainting via Latent Space Optimization

    Authors: Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: With the advancements in denoising diffusion probabilistic models (DDPMs), image inpainting has significantly evolved from merely filling information based on nearby regions to generating content conditioned on various prompts such as text, exemplar images, and sketches. However, existing methods, such as model fine-tuning and simple concatenation of latent vectors, often result in generation fail… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  17. arXiv:2407.05425  [pdf, other

    cs.RO

    ClutterGen: A Cluttered Scene Generator for Robot Learning

    Authors: Yinsen Jia, Boyuan Chen

    Abstract: We introduce ClutterGen, a physically compliant simulation scene generator capable of producing highly diverse, cluttered, and stable scenes for robot learning. Generating such scenes is challenging as each object must adhere to physical laws like gravity and collision. As the number of objects increases, finding valid poses becomes more difficult, necessitating significant human engineering effor… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  18. arXiv:2407.05125  [pdf, other

    cs.DC cs.LG

    A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning

    Authors: Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen, Zhi Wang

    Abstract: Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) for global aggregation. Traditional approaches mitigating the staleness of updates typically focus on either adjusting the local updating or gradient co… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  19. arXiv:2407.04960  [pdf, other

    cs.IR

    MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in the user's histo… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  20. arXiv:2407.04147  [pdf, other

    cs.SE

    ALPINE: An adaptive language-agnostic pruning method for language models for code

    Authors: Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma

    Abstract: Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain as significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce th… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  21. arXiv:2407.03963  [pdf, other

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, :, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano , et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  22. arXiv:2407.01955  [pdf, other

    cs.CL

    S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

    Authors: Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Deployment of autoregressive large language models (LLMs) is costly, and as these models increase in size, the associated costs will become even more considerable. Consequently, different methods have been proposed to accelerate the token generation process and reduce costs. Speculative decoding (SD) is among the most promising approaches to speed up the LLM decoding process by verifying multiple… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  23. arXiv:2407.01392  [pdf, other

    cs.LG cs.CV cs.RO

    Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

    Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

    Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of… ▽ More

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://boyuan.space/diffusion-forcing Code: https://github.com/buoyancy99/diffusion-forcing

  24. arXiv:2407.00467  [pdf, other

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  25. arXiv:2406.19995  [pdf, other

    cs.CL cs.AI cs.LG

    Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

    Authors: Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu

    Abstract: This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally decompressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  26. arXiv:2406.19971  [pdf, other

    cs.RO

    Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

    Authors: Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

    Abstract: Vision-based imitation learning has shown promising capabilities of endowing robots with various motion skills given visual observation. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching that enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our k… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  27. arXiv:2406.19963  [pdf, other

    cs.RO cs.AI cs.LG

    Text2Robot: Evolutionary Robot Design from Text Descriptions

    Authors: Ryan P. Ringel, Zachary S. Charlick, Jiaxun Liu, Boxi Xia, Boyuan Chen

    Abstract: Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D mo… ▽ More

    Submitted 1 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/Text2Robot

  28. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  29. arXiv:2406.18825  [pdf, other

    cs.IR

    ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

    Authors: Jizheng Chen, Kounianhua Du, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang

    Abstract: Large language models have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has been paid much attention to. Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns due to their innate weakness in interpreting numerical features and the overhead for long cont… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  30. arXiv:2406.18204  [pdf, other

    cs.NI

    Analysis of Channel Uncertainty in Trusted Wireless Services via Repeated Interactions

    Authors: Bingwen Chen, Xintong Ling, Weihang Cao, Jiaheng Wang, Zhi Ding

    Abstract: The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across vario… ▽ More

    Submitted 2 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  31. arXiv:2406.17932  [pdf, other

    cs.RO cs.MM cs.SD eess.AS

    SonicSense: Object Perception from In-Hand Acoustic Vibration

    Authors: Jiaxun Liu, Boyuan Chen

    Abstract: We introduce SonicSense, a holistic design of hardware and software to enable rich robot object perception through in-hand acoustic vibration sensing. While previous studies have shown promising results with acoustic sensing for object perception, current solutions are constrained to a handful of objects with simple geometries and homogeneous materials, single-finger sensing, and mixing training a… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/SonicSense

  32. arXiv:2406.17600  [pdf, other

    cs.CL

    "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

    Authors: Beiduo Chen, Xinpeng Wang, Siyao Peng, Robert Litschko, Anna Korhonen, Barbara Plank

    Abstract: Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI) earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or use expert linguists to provide detailed explanations for their c… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  33. arXiv:2406.17577  [pdf, other

    eess.IV cs.CV

    Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images

    Authors: Boyu Chen, Ameenat L. Solebo, Paul Taylor

    Abstract: Anterior uveitis, a common form of eye inflammation, can lead to permanent vision loss if not promptly diagnosed. Monitoring this condition involves quantifying inflammatory cells in the anterior chamber (AC) of the eye, which can be captured using Anterior Segment Optical Coherence Tomography (AS-OCT). However, manually identifying cells in AS-OCT images is time-consuming and subjective. Moreover… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  34. Detecting Frames in News Headlines and Lead Images in U.S. Gun Violence Coverage

    Authors: Isidora Chara Tourni, Lei Guo, Hengchang Hu, Edward Halim, Prakash Ishwar, Taufiq Daryanto, Mona Jalal, Boqi Chen, Margrit Betke, Fabian Zhafransyah, Sha Lai, Derry Tanti Wijaya

    Abstract: News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called \say{frames} in communication research. We study, for the first time, the value of combining lead i… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: published at Findings of the Association for Computational Linguistics: EMNLP 2021

  35. arXiv:2406.16370  [pdf, other

    cs.RO

    An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets

    Authors: Chuanxiang Gao, Xinyi Wang, Xi Chen, Ben M. Chen

    Abstract: The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within contin… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  36. arXiv:2406.15513  [pdf, other

    cs.AI cs.CL

    PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

    Authors: Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

    Abstract: In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs). As a sibling project to SafeRLHF and BeaverTails, we separate annotations of helpfulness and harmlessness for question-answering pairs, providing distinct perspectives on these coupled attributes. Overall, we provide 44.6k refined prompts and 265k question-answer p… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: a sibling project to SafeRLHF and BeaverTails

  37. arXiv:2406.15043  [pdf, other

    cs.LG

    Discovering Common Information in Multi-view Data

    Authors: Qi Zhang, Mingfei Lu, Shujian Yu, Jingmin Xin, Badong Chen

    Abstract: We introduce an innovative and mathematically rigorous definition for computing common information from multi-view data, drawing inspiration from Gács-Körner common information in information theory. Leveraging this definition, we develop a novel supervised multi-view learning framework to capture both common and unique information. By explicitly minimizing a total correlation term, the extracted… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Manuscript accepted by Information Fusion (\url{https://www.sciencedirect.com/science/article/pii/S1566253524001787}). We have updated a few descriptions for clarity. Code is available at \url{https://github.com/archy666/CUMI}

  38. arXiv:2406.14909  [pdf, other

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  39. arXiv:2406.14874  [pdf, other

    cs.CV

    TraceNet: Segment one thing efficiently

    Authors: Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt

    Abstract: Efficient single instance segmentation is essential for unlocking features in the mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications scope the segmentation task to portraits or the salient subject due to the computational constraints. Instance segmentation, despite its recent developments towards efficient networks, is still heavy due to the co… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  40. MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization

    Authors: Zhaozhe Hu, Jia-Li Yin, Bin Chen, Luojun Lin, Bo-Hao Chen, Ximeng Liu

    Abstract: Self-ensemble adversarial training methods improve model robustness by ensembling models at different training epochs, such as model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely affects the generalization performance. Empirically, in the late phases of training, the A… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  41. arXiv:2406.13923  [pdf, other

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.12433  [pdf, other

    cs.IR

    LLM-enhanced Reranking in Recommender Systems

    Authors: Jingtong Gao, Bo Chen, Xiangyu Zhao, Weiwen Liu, Xiangyang Li, Yichao Wang, Zijian Zhang, Wanyu Wang, Yuyang Ye, Shanru Lin, Huifeng Guo, Ruiming Tang

    Abstract: Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms. Traditional reranking models have focused predominantly on accuracy, but modern applications demand consideration of additional criteria such as diversity and fairness. Existing reranking approaches often fail to harmonize these diverse criteria effectively at th… ▽ More

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  43. arXiv:2406.12292  [pdf, other

    cs.SD cs.AI eess.AS

    JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

    Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

    Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  44. arXiv:2406.11784  [pdf, other

    cs.CL cs.AI

    MDCR: A Dataset for Multi-Document Conditional Reasoning

    Authors: Peter Baile Chen, Yi Zhang, Chunwei Liu, Sejal Gupta, Yoon Kim, Michael Cafarella

    Abstract: The same real-life questions posed to different individuals may lead to different answers based on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as major or degree required. ConditionalQA was proposed to evaluate models' capability of reading a document and answering eligibility questions, considering unmentioned cond… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  45. arXiv:2406.11147  [pdf, other

    cs.SE cs.AI

    Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

    Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

    Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerability for the given code in… ▽ More

    Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  46. arXiv:2406.10873  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies

    Authors: Chung-Wen Wu, Berlin Chen

    Abstract: Automatic Speech Assessment (ASA) has seen notable advancements with the utilization of self-supervised features (SSL) in recent research. However, a key challenge in ASA lies in the imbalanced distribution of data, particularly evident in English test datasets. To address this challenge, we approach ASA as an ordinal classification task, introducing Weighted Vectors Ranking Similarity (W-RankSim)… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  47. arXiv:2406.10393  [pdf, other

    cs.CL

    EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

    Authors: Mohammad Dehghan, Mohammad Ali Alomrani, Sunyam Bagga, David Alfonso-Hermelo, Khalil Bibi, Abbas Ghaddar, Yingxue Zhang, Xiaoguang Li, Jianye Hao, Qun Liu, Jimmy Lin, Boxing Chen, Prasanna Parthasarathi, Mahdi Biparva, Mehdi Rezagholizadeh

    Abstract: The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems are suffering from two shortcomings. First, they usually… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  48. arXiv:2406.10208  [pdf, other

    cs.CV

    Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

    Authors: Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Lin Liang, Lijuan Wang, Ji Li, Yuhui Yuan

    Abstract: Recently, Glyph-ByT5 has achieved highly accurate visual text rendering performance in graphic design images. However, it still focuses solely on English and performs relatively poorly in terms of visual appeal. In this work, we address these two fundamental limitations by presenting Glyph-ByT5-v2 and Glyph-SDXL-v2, which not only support accurate visual text rendering for 10 different languages b… ▽ More

    Submitted 12 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://glyph-byt5-v2.github.io/

  49. arXiv:2406.09357  [pdf, other

    cs.LG stat.ML

    Advancing Graph Generation through Beta Diffusion

    Authors: Yilin He, Xinyang Liu, Bo Chen, Mingyuan Zhou

    Abstract: Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that mo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  50. arXiv:2406.09082  [pdf

    eess.SY cs.AI

    Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles

    Authors: Hao Zhang, Nuo Lei, Boli Chen, Bingbing Li, Rulong Li, Zhi Wang

    Abstract: Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven mo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.