Showing 1–50 of 4,787 results for author: li, Z

Searching in archive cs.
  1. arXiv:2407.20920  [pdf, other]

    cs.CV

    SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition

    Authors: Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei, Stan Z. Li

    Abstract: Multi-label image recognition is a fundamental task in computer vision. Recently, Vision-Language Models (VLMs) have made notable advancements in this area. However, previous methods fail to effectively leverage the rich knowledge in language models and often incorporate label semantics into visual features unidirectionally. To overcome these problems, we propose a Split-and-Synthesize Prompting w…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures

  2. arXiv:2407.20898  [pdf, other]

    cs.SE

    ThinkRepair: Self-Directed Automated Program Repair

    Authors: Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, Xiaohu Yang

    Abstract: Though many approaches have been proposed for Automated Program Repair (APR) and indeed achieved remarkable performance, they still have limitations in fixing bugs that require analyzing and reasoning about the logic of the buggy program. Recently, large language models (LLMs) instructed by prompt engineering have attracted much attention for their powerful ability to address many kinds of tasks i…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted By ISSTA'24

  3. arXiv:2407.20224  [pdf, other]

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing techniques have been increasingly adopted to efficiently correct the false or outdated knowledge in Large Language Models (LLMs), due to the high cost of retraining from scratch. Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat f…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  4. arXiv:2407.20053  [pdf, other]

    cs.LG physics.ao-ph

    Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

    Authors: Zhe Li, Ronghui Xu, Jilin Hu, Zhong Peng, Xi Lu, Chenjuan Guo, Bin Yang

    Abstract: Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for various applications, e.g., marine energy development, fishery, early warning systems for potential risks, etc. Traditional SWH estimation methods that are based on numerical models and physical theories are hindered by computational inefficiencies. Recently, machine learning has emerged a…

    Submitted 29 July, 2024; originally announced July 2024.

  5. arXiv:2407.19789  [pdf, other]

    cs.CV

    Interpreting Low-level Vision Models with Causal Effect Maps

    Authors: Jinfan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong

    Abstract: Deep neural networks have significantly improved the performance of low-level vision tasks but also increased the difficulty of interpretability. A deep understanding of deep models is beneficial for both network design and practical reliability. To take up this challenge, we introduce causality theory to interpret low-level vision models and propose a model-/task-agnostic method called Causal Eff…

    Submitted 29 July, 2024; originally announced July 2024.

  6. arXiv:2407.19638  [pdf, other]

    cs.CL

    From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks?

    Authors: Tao Feng, Lizhen Qu, Niket Tandon, Zhuang Li, Xiaoxi Kang, Gholamreza Haffari

    Abstract: Recent advances in artificial intelligence have seen Large Language Models (LLMs) demonstrate notable proficiency in causal discovery tasks. This study explores the factors influencing the performance of LLMs in causal discovery tasks. Utilizing open-source LLMs, we examine how the frequency of causal relations within their pre-training corpora affects their ability to accurately respond to causal…

    Submitted 28 July, 2024; originally announced July 2024.

  7. arXiv:2407.19497  [pdf, other]

    cs.CV

    Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph

    Authors: Zhengcen Li, Xinle Chang, Yueran Li, Jingyong Su

    Abstract: Group Activity Recognition aims to understand collective activities from videos. Existing solutions primarily rely on the RGB modality, which encounters challenges such as background variations, occlusions, motion blurs, and significant computational overhead. Meanwhile, current keypoint-based methods offer a lightweight and informative representation of human motions but necessitate accurate indi…

    Submitted 28 July, 2024; originally announced July 2024.

  8. arXiv:2407.19323  [pdf, other]

    cs.CV

    MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo

    Authors: Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang

    Abstract: Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within a fixed patch. Although certain methods employ patch deformation to expand the receptive field, their patches mistakenly skip depth edges to calculate areas with depth discontinuity, thereby causing ambiguity. Consequently, we introduce Multi-granularity Segmentation Prior Multi-View…

    Submitted 27 July, 2024; originally announced July 2024.

  9. arXiv:2407.19205  [pdf, other]

    cs.CV cs.AI

    Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions

    Authors: Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Zinuo Li, Hamid Laga, Farid Boussaid

    Abstract: This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion (SVD) framework, focusing on their impact on video generation quality and computational efficiency. Our findings indicate that CLIP embeddings, while crucial for aesthetic quality, do not significantly contribute towards the subject and background consistency of video outputs. Moreover, the computationally…

    Submitted 27 July, 2024; originally announced July 2024.

  10. arXiv:2407.18854  [pdf, other]

    cs.CV cs.AI

    Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment

    Authors: Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng

    Abstract: Image classification models often demonstrate unstable performance in real-world applications due to variations in image information, driven by differing visual perspectives of subject objects and lighting discrepancies. To mitigate these challenges, existing studies commonly incorporate additional modal information matching the visual data to regularize the model's learning process, enabling the…

    Submitted 26 July, 2024; originally announced July 2024.

  11. arXiv:2407.18209  [pdf, other]

    cs.ET cs.AR

    SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

    Authors: Yanyue Xie, Peiyan Dong, Geng Yuan, Zhengang Li, Masoud Zabihi, Chao Wu, Sung-En Chang, Xufeng Zhang, Xue Lin, Caiwen Ding, Nobuyuki Yoshikawa, Olivia Chen, Yanzhi Wang

    Abstract: Superconducting circuits, like Adiabatic Quantum-Flux-Parametron (AQFP), offer exceptional energy efficiency but face challenges in physical design due to sophisticated spacing and timing constraints. Current design tools often neglect the importance of constraint adherence throughout the entire design flow. In this paper, we propose SuperFlow, a fully-customized RTL-to-GDS design flow tailored fo…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by DATE 2024

  12. arXiv:2407.18175  [pdf, other]

    cs.LG cs.AI cs.CV

    Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

    Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICS 2024

  13. arXiv:2407.18003  [pdf, other]

    cs.CL

    Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption

    Authors: Luohe Shi, Hongyi Zhang, Yao Yao, Zuchao Li, Hai Zhao

    Abstract: Large Language Models (LLMs), epitomized by ChatGPT's release in late 2022, have revolutionized various industries with their advanced language comprehension. However, their efficiency is challenged by the Transformer architecture's struggle with handling long texts. KV-Cache has emerged as a pivotal solution to this issue, converting the time complexity of token generation from quadratic to lin…

    Submitted 28 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: to be published in CoLM 2024

  14. arXiv:2407.17905  [pdf, other]

    cs.CV cs.RO

    StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory

    Authors: Zhiheng Li, Yubo Cui, Jiexi Zhong, Zheng Fang

    Abstract: Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they often focus on transferring temporal cues in a single inference and regard every prediction as independent of others. This may cause inconsistent…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures

  15. arXiv:2407.17757  [pdf, other]

    cs.CV cs.RO

    CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

    Authors: Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To…

    Submitted 25 July, 2024; originally announced July 2024.

  16. arXiv:2407.17730  [pdf, other]

    cs.CL

    Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?

    Authors: Hao Shen, Zihan Li, Minqiang Yang, Minghui Ni, Yongfeng Tao, Zhengyang Yu, Weihao Zheng, Chen Xu, Bin Hu

    Abstract: In contemporary society, the issue of psychological health has become increasingly prominent, characterized by the diversification, complexity, and universality of mental disorders. Cognitive Behavioral Therapy (CBT), currently the most influential and clinically effective psychological treatment method with no side effects, has limited coverage and poor quality in most countries. In recent years,…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.17392  [pdf, other]

    cs.RO eess.SY

    Sampling-Based Hierarchical Trajectory Planning for Formation Flight

    Authors: Qingzhao Liu, Bailing Tian, Xuewei Zhang, Junjie Lu, Zhiyu Li

    Abstract: Formation flight of unmanned aerial vehicles (UAVs) poses significant challenges in terms of safety and formation keeping, particularly in cluttered environments. However, existing methods often struggle to simultaneously satisfy these two critical requirements. To address this issue, this paper proposes a sampling-based trajectory planning method with a hierarchical structure for formation flight…

    Submitted 24 July, 2024; originally announced July 2024.

  18. arXiv:2407.17152  [pdf, other]

    cs.CV cs.AI

    XMeCap: Meme Caption Generation with Sub-Image Adaptability

    Authors: Yuyan Chen, Songzhou Yan, Zhihong Zhu, Zhixu Li, Yanghua Xiao

    Abstract: Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. While advances have been made in natural language processing, real-world humor often thrives in a multi-modal context, encapsulated distinctively by memes. This paper poses a particular emphasis on the impact of multi-images on meme captioning. After that, we introduce the XMeCap framewo…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to MM 2024

  19. arXiv:2407.16955  [pdf, other]

    cs.CV cs.RO

    DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

    Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou

    Abstract: Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research faces challenges in balancing between enlarging receptive fields and reducing interference when aggregating multi-view features. Moreover, different poses of cameras present challenges in training global attention models. To address these problems, this paper proposes…

    Submitted 23 July, 2024; originally announced July 2024.

  20. arXiv:2407.16940  [pdf, other]

    cs.LG q-bio.GN

    GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning

    Authors: Zehui Li, Vallijah Subasri, Guy-Bart Stan, Yiren Zhao, Bo Wang

    Abstract: Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians who must efficiently prioritize patient-specific GVs and integrate them with exist…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Preprint

  21. arXiv:2407.16928  [pdf, other]

    cs.CR

    From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM

    Authors: Lingzhi Wang, Jiahui Wang, Kyle Jung, Kedar Thiagarajan, Emily Wei, Xiangmin Shen, Yan Chen, Zhenyuan Li

    Abstract: The escalating battles between attackers and defenders in cybersecurity make it imperative to test and evaluate defense capabilities from the attackers' perspective. However, constructing full-life-cycle cyberattacks and performing red team emulations requires significant time and domain knowledge from security experts. Existing cyberattack simulation frameworks face challenges such as limited tec…

    Submitted 23 July, 2024; originally announced July 2024.

  22. arXiv:2407.16833  [pdf, other]

    cs.CL cs.AI cs.LG

    Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

    Authors: Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and L…

    Submitted 23 July, 2024; originally announced July 2024.

  23. arXiv:2407.16719  [pdf, other]

    cs.OH

    A Brief Discussion on the Philosophical Principles and Development Directions of Data Circulation

    Authors: Zhi Li, Lei Zhang, Junyi Xin, Jianfei He, Yan Li, Zhenjun Ma, Qi Sun

    Abstract: The data circulation is a complex scenario involving a large number of participants and different types of requirements, which not only has to comply with the laws and regulations, but also faces multiple challenges in technical and business areas. In order to systematically and comprehensively address these issues, it is essential to have a comprehensive and profound understanding of 'data circul…

    Submitted 23 July, 2024; originally announced July 2024.

  24. arXiv:2407.16394  [pdf, other]

    cs.CV

    SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval

    Authors: Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li

    Abstract: Different from traditional video retrieval, sign language retrieval is more biased towards understanding the semantic information of human actions contained in video clips. Previous works typically only encode RGB videos to obtain high-level semantic features, resulting in local action details drowned in a large amount of visual information redundancy. Furthermore, existing RGB-based sign retrieva…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

  25. arXiv:2407.16277  [pdf, other]

    cs.CV cs.HC

    When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, KaHou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, thi…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  26. arXiv:2407.16161  [pdf, other]

    cs.LG

    TransFeat-TPP: An Interpretable Deep Covariate Temporal Point Processes

    Authors: Zizhuo Meng, Boyu Li, Xuhui Fan, Zhidong Li, Yang Wang, Fang Chen, Feng Zhou

    Abstract: The classical temporal point process (TPP) constructs an intensity function by taking the occurrence times into account. Nevertheless, occurrence time may not be the only relevant factor; other contextual data, termed covariates, may also impact the event evolution. Incorporating such covariates into the model is beneficial, while distinguishing their relevance to the event dynamics is of great pr…

    Submitted 23 July, 2024; originally announced July 2024.

  27. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  28. arXiv:2407.16115  [pdf, other]

    cs.LG cs.AI

    Transformer-based Graph Neural Networks for Battery Range Prediction in AIoT Battery-Swap Services

    Authors: Zhao Li, Yang Liu, Chuan Zhou, Xuanwu Liu, Xuming Pan, Buqing Cao, Xindong Wu

    Abstract: The concept of the sharing economy has gained broad recognition, and within this context, Sharing E-Bike Batteries (SEBs) have emerged as a focal point of societal interest. Despite the popularity, a notable discrepancy remains between user expectations regarding the remaining battery range of SEBs and the reality, leading to a pronounced inclination among users to find an available SEB during emerge…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures, accepted by IEEE ICWS 2024 (the International Conference on Web Services)

  29. arXiv:2407.15729  [pdf, other]

    cs.IT eess.SP

    Self-Sustainable Metasurface-Assisted mmWave Indoor Communication System

    Authors: Zhenyu Li, Ozan Alp Topal, Özlem Tuğfe Demir, Emil Björnson, Cicek Cavdar

    Abstract: In the design of a metasurface-assisted system for indoor environments, it is essential to take into account not only the performance gains and coverage extension provided by the metasurface but also the operating costs brought by its reconfigurability, such as powering and cabling. These costs can present challenges, particularly in indoor dense spaces (IDSs). A self-sustainable metasurface (SSM)…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 13 pages, 16 figures, submitted to IEEE Transactions on Wireless Communications, under review

  30. arXiv:2407.15489  [pdf, other]

    cs.CL

    Two Stacks Are Better Than One: A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives

    Authors: Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann

    Abstract: Pretrained language models (PLMs) display impressive performances and have captured the attention of the NLP community. Establishing the best practices in pretraining has therefore become a major point of focus for much of NLP research -- especially since the insights developed for monolingual English models need not carry to more complex multilingual. One significant caveat of the current state o…

    Submitted 22 July, 2024; originally announced July 2024.

  31. arXiv:2407.15441  [pdf, other]

    cs.CL

    Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

    Authors: Song Wang, Xun Wang, Jie Mei, Yujia Xie, Sean Muarray, Zhang Li, Lingfeng Wu, Si-Qing Chen, Wayne Xiong

    Abstract: Hallucination, a phenomenon where large language models (LLMs) produce output that is factually incorrect or unrelated to the input, is a major challenge for LLM applications that require accuracy and dependability. In this paper, we introduce a reliable and high-speed production system aimed at detecting and rectifying the hallucination issue within LLMs. Our system encompasses named entity recog…

    Submitted 22 July, 2024; originally announced July 2024.

  32. arXiv:2407.15273  [pdf, other]

    cs.LG cs.AI

    Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has considerable real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraphs a…

    Submitted 21 July, 2024; originally announced July 2024.

  33. arXiv:2407.15098  [pdf, other]

    cs.CR cs.LG

    SeqMIA: Sequential-Metric Based Membership Inference Attack

    Authors: Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang

    Abstract: Most existing membership inference attacks (MIAs) utilize metrics (e.g., loss) calculated on the model's final state, while recent advanced attacks leverage metrics computed at various stages, including both intermediate and final stages, throughout the model training. Nevertheless, these attacks often process multiple intermediate states of the metric independently, ignoring their time-dependent…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM CCS 2024

  34. arXiv:2407.15025  [pdf, other]

    physics.soc-ph cs.LG

    Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics for Urban Transportation Management

    Authors: Tao Li, Zilin Bian, Haozhe Lei, Fan Zuo, Ya-Ting Yang, Quanyan Zhu, Zhenning Li, Zhibin Chen, Kaan Ozbay

    Abstract: Traditional mobility management strategies emphasize macro-level mobility oversight from traffic-sensing infrastructures, often overlooking safety risks that directly affect road users. To address this, we propose a Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics (DT-DIMA) system. The DT-DIMA system integrates real-time traffic information from pan-tilt-cameras (PTCs), synchron…

    Submitted 2 July, 2024; originally announced July 2024.

  35. arXiv:2407.14768  [pdf, other]

    cs.LG cs.AI

    Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation

    Authors: Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li

    Abstract: To bridge the gaps between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptron (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major per…

    Submitted 20 July, 2024; originally announced July 2024.

  36. arXiv:2407.14532  [pdf, other]

    cs.DC cs.LG

    A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management

    Authors: Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Hengmao Chen, Yongnan Luo, Dan Pei

    Abstract: AIOps algorithms play a crucial role in the maintenance of microservice systems. Many previous benchmarks' performance leaderboard provides valuable guidance for selecting appropriate algorithms. However, existing AIOps benchmarks mainly utilize offline datasets to evaluate algorithms. They cannot consistently evaluate the performance of algorithms using real-time datasets, and the operation scena…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/MicroServo/microservo, datasets are available at https://github.com/MicroServo/hot-plugging

  37. arXiv:2407.14507  [pdf, other]

    cs.CL

    Internal Consistency and Self-Feedback in Large Language Models: A Survey

    Authors: Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li

    Abstract: Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and Self-Refine have been initiated. They share a commonality: involving LLMs evaluating and updating itself to mitigate the issues. Nonetheless, these efforts lack a unifie…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 27 pages, 9 figures, 10 tables, 14 equations

  38. arXiv:2407.14505  [pdf, other]

    cs.CV

    T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

    Authors: Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

    Abstract: Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability for evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first be…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages (30 in total), project page: https://t2v-compbench.github.io/

  39. arXiv:2407.14419  [pdf, other]

    cs.CV

    HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

    Authors: Zezeng Li, Weimin Wang, WenHai Li, Na Lei, Xianfeng Gu

    Abstract: Recent CLIP-guided 3D generation methods have achieved promising results but struggle with generating faithful 3D shapes that conform with input text due to the gap between text and image embeddings. To this end, this paper proposes HOTS3D which makes the first attempt to effectively bridge this gap by aligning text features to the image features with spherical optimal transport (SOT). However, in…

    Submitted 19 July, 2024; originally announced July 2024.

  40. arXiv:2407.14266  [pdf, other]

    cs.IR cs.LG

    L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering

    Authors: Xinzhou Jin, Jintang Li, Liang Chen, Chenyun Yu, Yuanzhen Xie, Tao Xie, Chengxiang Zhuo, Zang Li, Zibin Zheng

    Abstract: Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering. Towards this research line, graph contrastive learning (GCL) demonstrates robust capabilities to address the supervision label shortage issue through generating massive self-supervised signals. Despite its effectiveness, GCL for recommendation suffers seriously from…

    Submitted 19 July, 2024; originally announced July 2024.

  41. arXiv:2407.14197  [pdf, other]

    cs.CV

    A Benchmark for Gaussian Splatting Compression and Quality Assessment Study

    Authors: Qi Yang, Kaifa Yang, Yuke Xing, Yiling Xu, Zhu Li

    Abstract: To fill the gap of traditional GS compression method, in this paper, we first propose a simple and effective GS data compression anchor called Graph-based GS Compression (GGSC). GGSC is inspired by graph signal processing theory and uses two branches to compress the primitive center and attributes. We split the whole GS sample via KDTree and clip the high-frequency components after the graph Fouri…

    Submitted 19 July, 2024; originally announced July 2024.

  42. arXiv:2407.14007  [pdf, other]

    cs.CV cs.AI

    Multi-modal Relation Distillation for Unified 3D Representation Learning

    Authors: Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang

    Abstract: Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward solutions often overlook intricate structural relations among samples, potentially limiting the full capabilities of multi-modal learning. To address…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  43. arXiv:2407.14003  [pdf, other]

    stat.ML cs.LG eess.IV stat.ME

    Time Series Generative Learning with Application to Brain Imaging Analysis

    Authors: Zhenghao Li, Sanyou Wu, Long Feng

    Abstract: This paper focuses on the analysis of sequential image data, particularly brain imaging data such as MRI, fMRI, CT, with the motivation of understanding the brain aging process and neurodegenerative diseases. To achieve this goal, we investigate image generation in a time series context. Specifically, we formulate a min-max problem derived from the $f$-divergence between neighboring pairs to learn…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 45 pages

  44. arXiv:2407.13764  [pdf, other]

    cs.CV

    Shape of Motion: 4D Reconstruction from a Single Video

    Authors: Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa

    Abstract: Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-seq…

    Submitted 18 July, 2024; originally announced July 2024.

  45. arXiv:2407.13759  [pdf, other]

    cs.CV cs.GR

    Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

    Authors: Boyang Deng, Richard Tucker, Zhengqi Li, Leonidas Guibas, Noah Snavely, Gordon Wetzstein

    Abstract: We present a method for generating Streetscapes: long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned by language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories,…

    Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: *Equal Contributions; Fixed few duplicated references from 1st upload; Project Page: https://boyangdeng.com/streetscapes

  46. arXiv:2407.13675  [pdf, other]

    cs.CV

    MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis

    Authors: Ziming Zhong, Yanxu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao

    Abstract: We present MeshSegmenter, a simple yet effective framework designed for zero-shot 3D semantic segmentation. This model successfully extends the powerful capabilities of 2D segmentation models to 3D meshes, delivering accurate 3D segmentation across diverse meshes and segment descriptions. Specifically, our model leverages the Segment Anything Model (SAM) to segment the target regions from im…

    Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: The paper was accepted by ECCV2024

  47. arXiv:2407.13584  [pdf, other]

    cs.CV

    Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation

    Authors: Zongrui Li, Minghui Hu, Qian Zheng, Xudong Jiang

    Abstract: Although recent advancements in text-to-3D generation have significantly improved generation quality, issues like limited level of detail and low fidelity persist and require further improvement. To understand the essence of those issues, we thoroughly analyze current score distillation methods by connecting theories of consistency distillation to score distillation. Based on the insight…

    Submitted 20 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Paper accepted by ECCV2024

  48. arXiv:2407.13553  [pdf, other]

    cs.CV

    SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching

    Authors: Xingyue Zhao, Peiqi Li, Xiangde Luo, Meng Yang, Shi Chang, Zhongyu Li

    Abstract: Automated nodule segmentation is essential for computer-assisted diagnosis in ultrasound images. Nevertheless, most existing methods depend on precise pixel-level annotations by medical professionals, a process that is both costly and labor-intensive. Recently, segmentation foundation models like SAM have shown impressive generalizability on natural images, suggesting their potential as pseudo-lab…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ISBI 2024 Oral

  49. arXiv:2407.13439  [pdf, other]

    cs.SD cs.AI eess.AS

    Reducing Barriers to the Use of Marginalised Music Genres in AI

    Authors: Nick Bryan-Kinns, Zijin Li

    Abstract: AI systems for high-quality music generation typically rely on extremely large musical datasets to train the AI models. This creates barriers to generating music beyond the genres represented in dominant datasets such as Western Classical music or pop music. We undertook a 4-month international research project summarised in this paper to explore the eXplainable AI (XAI) challenges and opportuniti…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: In Proceedings of Explainable AI for the Arts Workshop 2024 (XAIxArts 2024) arXiv:2406.14485

    Report number: XAIxArts/2024/3

  50. arXiv:2407.13185  [pdf, other]

    cs.CV

    KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter

    Authors: Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng

    Abstract: We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering. Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions. We introduce a novel plug-in Kalman filter guided d…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024