Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 4,610 results for author: Zhang, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.02353  [pdf, other

    eess.SP cs.AR eess.SY

    Roadmap to Neuromorphic Computing with Emerging Technologies

    Authors: Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung, Hou, Akhil Varri , et al. (28 additional authors not shown)

    Abstract: The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 90 pages, 22 figures, roadmap

  2. arXiv:2407.02188  [pdf, other

    cs.LG cs.CV

    Structure-Aware Consensus Network on Graphs with Few Labeled Nodes

    Authors: Shuaike Xu, Xiaolin Zhang, Peng Zhang, Kun Zhan

    Abstract: Graph node classification with few labeled nodes presents significant challenges due to limited supervision. Conventional methods often exploit the graph in a transductive learning manner. They fail to effectively utilize the abundant unlabeled data and the structural information inherent in graphs. To address these issues, we introduce a Structure-Aware Consensus Network (SACN) from three perspec… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: under review

  3. arXiv:2407.02061  [pdf, other

    cs.RO

    LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection

    Authors: Yansong Gong, Xinglian Zhang, Jingyi Feng, Xiao He, Dan Zhang

    Abstract: In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach is proposed with real-time performance, in which an adaptive segmenta… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2407.02047  [pdf, other

    cs.CV

    CountFormer: Multi-View Crowd Counting Transformer

    Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

    Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios.In this work, we propose a concise… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted By ECCV2024

  5. arXiv:2407.01925  [pdf, other

    cs.CV

    Looking From the Future: Multi-order Iterations Can Enhance Adversarial Attack Transferability

    Authors: Zijian Ying, Qianmu Li, Tao Wang, Zhichao Lian, Shunmei Meng, Xuyun Zhang

    Abstract: Various methods try to enhance adversarial transferability by improving the generalization from different perspectives. In this paper, we rethink the optimization process and propose a novel sequence optimization concept, which is named Looking From the Future (LFF). LFF makes use of the original optimization process to refine the very first local optimization choice. Adapting the LFF concept to t… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01531  [pdf, other

    cs.RO cs.LG

    Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

    Authors: Yixiao Wang, Yifei Zhang, Mingxiao Huo, Ran Tian, Xiang Zhang, Yichen Xie, Chenfeng Xu, Pengliang Ji, Wei Zhan, Mingyu Ding, Masayoshi Tomizuka

    Abstract: The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). B… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.01094  [pdf, other

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2407.00933  [pdf, other

    cs.DC eess.SP

    Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

    Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

    Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  9. arXiv:2407.00925  [pdf, other

    cs.MM

    SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture

    Authors: Xuling Zhang, Ziru Zhang, Yuyang Wang, Lik-hang Lee, Pan Hui

    Abstract: Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm for changing people's lifestyles. Motion capture has become a reliable approach to achieve seamless synchronization of the movements between avatars and human beings, which plays an important role in diverse Metaverse applications. However, due to the continuous growth of data, current communication… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  10. arXiv:2407.00769  [pdf, other

    quant-ph cs.DC

    Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

    Authors: Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

    Abstract: Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and de… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2407.00737  [pdf, other

    cs.CV

    LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

    Authors: Mushui Liu, Yuhang Ma, Xinfeng Zhang, Yang Zhen, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan

    Abstract: Diffusion Models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts that involve multiple objects, attribute binding, and long descriptions. This paper proposes a framework called \textbf{LLM4GEN}, which enhances the semantic understanding ability of text-to-image diffusion models by leveraging the se… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 11 pages, 13 figures

  12. arXiv:2407.00717  [pdf, other

    cs.LG cs.AI eess.SY

    Learning System Dynamics without Forgetting

    Authors: Xikun Zhang, Dongjin Song, Yushan Jiang, Yixin Chen, Dacheng Tao

    Abstract: Predicting the trajectories of systems with unknown dynamics (\textit{i.e.} the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with di… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  13. UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips

    Authors: Yuqiao Yang, Zhongjie Wu, Yongzhao Zhang, Ting Chen, Jun Li, Jie Yang, Wenhao Liu, Xiaosong Zhang, Ruicong Shi, Jingwei Li, Yu Jiang, Zhuo Su

    Abstract: UWB ranging systems have been adopted in many critical and security sensitive applications due to its precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits the vulnerability of the adoption of the normalized cross-correlation process in UWB ranging and can selectively and quickly block rangin… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

  14. arXiv:2407.00466  [pdf, other

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  15. arXiv:2407.00042  [pdf

    q-bio.NC cs.SI eess.SY

    Module control of network analysis in psychopathology

    Authors: Chunyu Pan, Quan Zhang, Yue Zhu, Shengzhou Kong, Juan Liu, Changsheng Zhang, Fei Wang, Xizhe Zhang

    Abstract: The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributed to dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the contr… ▽ More

    Submitted 30 May, 2024; originally announced July 2024.

  16. arXiv:2407.00005  [pdf, other

    cs.DC

    Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, GPU and CSD

    Authors: Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao

    Abstract: Most existing data preprocessing is done at the CPU. Although some studies use techniques such as multi-processing and double buffering to accelerate CPU preprocessing, CPU computational speed and storage bandwidth still limit the processing speed. Other studies try to use intelligent data storage devices, such as computational storage devices, to complete data preprocessing instead of CPUs. The c… ▽ More

    Submitted 17 April, 2024; originally announced July 2024.

  17. arXiv:2406.19741  [pdf, other

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect… ▽ More

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  18. arXiv:2406.19650  [pdf, other

    cs.CL

    DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting

    Authors: Xuanming Zhang, Anthony Diaz, Zixun Chen, Qingyang Wu, Kun Qian, Erik Voss, Zhou Yu

    Abstract: Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeki… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 21 pages, 5 figures, 20 tables

  19. arXiv:2406.19433  [pdf, other

    cs.CR cs.CY

    Private Hierarchical Governance for Encrypted Messaging

    Authors: Armin Namavari, Barry Wang, Sanketh Menda, Ben Nassi, Nirvan Tyagi, James Grimmelmann, Amy Zhang, Thomas Ristenpart

    Abstract: The increasing harms caused by hate, harassment, and other forms of abuse online have motivated major platforms to explore hierarchical governance. The idea is to allow communities to have designated members take on moderation and leadership duties; meanwhile, members can still escalate issues to the platform. But these promising approaches have only been explored in plaintext settings where commu… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Published in IEEE Security and Privacy 2024

  20. arXiv:2406.18966  [pdf, other

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap… ▽ More

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.18462  [pdf, other

    cs.CV cs.GR

    GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

    Authors: Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://taoranyi.com/gaussiandreamerpro/

  22. arXiv:2406.18394  [pdf, other

    q-fin.CP cs.AI

    AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors

    Authors: Hao Shi, Cuicui Luo, Weili Song, Xinting Zhang, Xiang Ao

    Abstract: The variability and low signal-to-noise ratio in financial data, combined with the necessity for interpretability, make the alpha factor mining workflow a crucial component of quantitative investment. Transitioning from early manual extraction to genetic programming, the most advanced approach in this domain currently employs reinforcement learning to mine a set of combination factors with fixed w… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  23. arXiv:2406.18152  [pdf, other

    cs.MA

    Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning

    Authors: Junkai Zhang, Yifan Zhang, Xi Sheryl Zhang, Yifan Zang, Jian Cheng

    Abstract: Efficient collaboration in the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency, requiring a large number of training samples to achieve a unified consensus on agents' policies. This divergence stems from the lack of a… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The AAAI-2024 paper with the appendix

  24. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  25. arXiv:2406.17681  [pdf, other

    cs.CL

    VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

    Authors: Kun Qian, Shunji Wan, Claudia Tang, Youzhi Wang, Xuanming Zhang, Maximillian Chen, Zhou Yu

    Abstract: As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  26. arXiv:2406.17679  [pdf, other

    cs.CV

    Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation

    Authors: Xuming Zhang, Naoto Yokoya, Xingfa Gu, Qingjiu Tian, Lorenzo Bruzzone

    Abstract: Hyperspectral image (HSI) classification has recently reached its performance bottleneck. Multimodal data fusion is emerging as a promising approach to overcome this bottleneck by providing rich complementary information from the supplementary modality (X-modality). However, achieving comprehensive cross-modal interaction and fusion that can be generalized across different sensing modalities is ch… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  27. arXiv:2406.17675  [pdf, other

    cs.CL

    Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

    Authors: Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun

    Abstract: Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable-inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  28. arXiv:2406.17659  [pdf, other

    cs.AI cs.RO

    DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

    Authors: Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

    Abstract: Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding capabilities, their performance is still far from being satisfactory in planning tasks. At the same time, although classical task planners, such as P… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  29. arXiv:2406.17419  [pdf, other

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  30. arXiv:2406.17404  [pdf, other

    cs.CL cs.LG

    Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

    Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning st… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures

  31. arXiv:2406.17328  [pdf, other

    cs.CL cs.AI

    Dual-Space Knowledge Distillation for Large Language Models

    Authors: Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu

    Abstract: Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are fro… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages, 11 figures, code available at: https://github.com/songmzhang/DSKD

  32. arXiv:2406.16855  [pdf, other

    cs.CV

    DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

    Authors: Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia

    Abstract: Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advan… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project page: https://dreambenchplus.github.io/

  33. arXiv:2406.16845  [pdf, other

    cs.CL

    RaTEScore: A Metric for Radiology Report Generation

    Authors: Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: This paper introduces a novel, entity-aware metric, termed as Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Technically, we developed a comprehens… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  34. arXiv:2406.16756  [pdf, other

    cs.LG cs.AI cs.CY

    Addressing Polarization and Unfairness in Performative Prediction

    Authors: Kun Jin, Tian Xie, Yang Liu, Xueru Zhang

    Abstract: When machine learning (ML) models are used in applications that involve humans (e.g., online recommendation, school admission, hiring, lending), the model itself may trigger changes in the distribution of targeted data it aims to predict. Performative prediction (PP) is a framework that explicitly considers such model-dependent distribution shifts when learning ML models. While significant efforts… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  35. arXiv:2406.16743  [pdf, other

    cs.CL

    Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization

    Authors: Zhengyue Zhao, Xiaoyun Zhang, Kaidi Xu, Xing Hu, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

    Abstract: With the widespread application of Large Language Models (LLMs), it has become a significant concern to ensure their safety and prevent harmful responses. While current safe-alignment methods based on instruction fine-tuning and Reinforcement Learning from Human Feedback (RLHF) can effectively reduce harmful responses from LLMs, they often require high-quality datasets and heavy computational over… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  36. arXiv:2406.16601  [pdf, other

    cs.CV

    Do As I Do: Pose Guided Human Motion Copy

    Authors: Sifan Wu, Zhenguang Liu, Beibei Zhang, Roger Zimmermann, Zhongjie Ba, Xiaosong Zhang, Kui Ren

    Abstract: Human motion copy is an intriguing yet challenging task in artificial intelligence and computer vision, which strives to generate a fake video of a target person performing the motion of a source person. The problem is inherently challenging due to the subtle human-body texture details to be generated and the temporal consistency to be considered. Existing approaches typically adopt a conventional… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  37. arXiv:2406.16505  [pdf, other

    q-fin.CP cs.AI

    $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

    Authors: Feng Xu, Yan Yin, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Zongzhang Zhang

    Abstract: Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on gen… ▽ More

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  38. arXiv:2406.16416  [pdf, other

    cs.CL

    Multilingual Knowledge Editing with Language-Agnostic Factual Neurons

    Authors: Xue zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

    Abstract: Multilingual knowledge editing (MKE) aims to simultaneously revise factual knowledge across multilingual languages within large language models (LLMs). However, most existing MKE methods just adapt existing monolingual editing methods to multilingual scenarios, overlooking the deep semantic connections of the same factual knowledge between different languages, thereby limiting edit performance. To… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, 7 tables

  39. arXiv:2406.16328  [pdf, other

    cs.CE

    Convolutional neural network based reduced order modeling for multiscale problems

    Authors: Xuhan Zhang, Lijian Jiang

    Abstract: In this paper, we combine convolutional neural networks (CNNs) with reduced order modeling (ROM) for efficient simulations of multiscale problems. These problems are modeled by partial differential equations with high-dimensional random inputs. The proposed method involves two separate CNNs: Basis CNNs and Coefficient CNNs (Coef CNNs), which correspond to two main parts of ROM. The method is calle… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 35 pages, 29 figures

  40. arXiv:2406.15817  [pdf, ps, other

    cs.CC cs.IT math.LO

    Computable one-way functions on the reals

    Authors: George Barmpalias, Xiaoyan Zhang

    Abstract: A major open problem in computational complexity is the existence of a one-way function, namely a function from strings to strings which is computationally easy to compute but hard to invert. Levin (2023) formulated the notion of one-way functions from reals (infinite bit-sequences) to reals in terms of computability, and asked whether partial computable one-way functions exist. We give a strong p… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  41. arXiv:2406.15718  [pdf, other

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to \textit{duplex models} so that these LLMs can lis… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  42. arXiv:2406.15486  [pdf, other

    cs.CL cs.AI cs.LG

    SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

    Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

    Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for nea… ▽ More

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  43. arXiv:2406.15330  [pdf, other

    cs.AI cs.CL

    Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance

    Authors: Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Yujiu Yang, Qi Chen, Peng Cheng

    Abstract: Large language models (LLMs) have revolutionized lots of fields of research. Although it is well-known that fine-tuning is essential for enhancing the capabilities of LLMs, existing research suggests that there is potential redundancy in the fine-tuning process and therefore proposes to update only a subset of parameters. However, these methods fail to leverage the task-specific information to ide… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  44. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  45. arXiv:2406.15056  [pdf, ps, other

    cs.IT eess.SP

    Continuous Aperture Array (CAPA)-Based Wireless Communications: Capacity Characterization

    Authors: Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

    Abstract: The capacity limits of continuous-aperture array (CAPA)-based wireless communications are characterized. To this end, an analytically tractable transmission framework is established for both uplink and downlink CAPA systems. Based on this framework, closed-form expressions for the single-user channel capacity are derived. The results are further extended to a multiuser case by characterizing the c… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  46. arXiv:2406.15019  [pdf, other

    cs.CL

    MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens

    Authors: Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan

    Abstract: Numerous advanced Large Language Models (LLMs) now support context lengths up to 128K, and some extend to 200K. Some benchmarks in the generic domain have also followed up on evaluating long-context capabilities. In the medical domain, tasks are distinctive due to the unique contexts and need for domain expertise, necessitating further evaluation. However, despite the frequent presence of long tex… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  47. arXiv:2406.14991  [pdf, other

    cs.CL cs.SE

    SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

    Authors: Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

    Abstract: We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Homepage: https://spreadsheetbench.github.io/

  48. arXiv:2406.14721  [pdf, other

    cs.CL

    1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

    Authors: Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement. This paper introduces a method to enhance the multilingual performance of LLMs by aggregating kn… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  49. arXiv:2406.14422  [pdf, other

    cs.CV cs.AI

    FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

    Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

    Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to e… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages

  50. arXiv:2406.14275  [pdf, other

    cs.CL cs.AI

    Step-Back Profiling: Distilling User History for Personalized Scientific Writing

    Authors: Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark Gerstein

    Abstract: Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we introduce Step-Back Profiling to personalize LLMs by distilling user history into concise profiles, including essential traits and preferences of users. R… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.