
Showing 1–50 of 1,478 results for author: Li, G

Searching in archive cs.
  1. arXiv:2409.02111  [pdf, other]

    cs.LG

    Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions

    Authors: Yangfan Hu, Qian Zheng, Guoqi Li, Huajin Tang, Gang Pan

    Abstract: Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the…

    Submitted 19 August, 2024; originally announced September 2024.

  2. arXiv:2409.02010  [pdf, other]

    quant-ph cs.ET

    Ternary Tree Fermion-to-Qubit Mapping with Hamiltonian Aware Optimization

    Authors: Yuhao Liu, Kevin Yao, Jonathan Hong, Julien Froustey, Yunong Shi, Ermal Rrapaj, Costin Iancu, Gushu Li

    Abstract: This paper introduces the Hamiltonian-Aware Ternary Tree (HATT) framework to compile optimized Fermion-to-qubit mapping for specific Fermionic Hamiltonians. In the simulation of Fermionic quantum systems, efficient Fermion-to-qubit mapping plays a critical role in transforming the Fermionic system into a qubit system. HATT utilizes ternary tree mapping and a bottom-up construction procedure to gen…

    Submitted 3 September, 2024; originally announced September 2024.

  3. arXiv:2409.01534  [pdf, other]

    cs.CV cs.AI cs.MM

    Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

    Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to the complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multi…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2409.00553  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Multi-Output Distributional Fairness via Post-Processing

    Authors: Gang Li, Qihang Lin, Ayush Ghosh, Tianbao Yang

    Abstract: The post-processing approaches are becoming prominent techniques to enhance machine learning models' fairness because of their intuitiveness, low computational cost, and excellent scalability. However, most existing post-processing methods are designed for task-specific fairness measures and are limited to single-output models. In this paper, we introduce a post-processing method for multi-output…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 17 pages, 4 figures

  5. arXiv:2408.17334  [pdf]

    q-bio.NC cs.CE cs.SC q-bio.TO

    Role of Data-driven Regional Growth Model in Shaping Brain Folding Patterns

    Authors: Jixin Hou, Zhengwang Wu, Xianyan Chen, Dajiang Zhu, Tianming Liu, Gang Li, Xianqiao Wang

    Abstract: The surface morphology of the developing mammalian brain is crucial for understanding brain function and dysfunction. Computational modeling offers valuable insights into the underlying mechanisms for early brain folding. While previous studies generally assume uniform growth, recent findings indicate significant regional variations in brain tissue growth. However, the role of these variations in…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 43 pages, 16 figures

  6. arXiv:2408.17258  [pdf, other]

    cs.LG

    Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach

    Authors: Tong Nie, Junlin He, Yuewen Mei, Guoyang Qin, Guilong Li, Jian Sun, Wei Ma

    Abstract: The proliferation of e-commerce and urbanization has significantly intensified delivery operations in urban areas, boosting the volume and complexity of delivery demand. Data-driven predictive methods, especially those utilizing machine learning techniques, have emerged to handle these complexities in urban delivery demand management problems. One particularly pressing problem that has not yet bee…

    Submitted 30 August, 2024; originally announced August 2024.

  7. arXiv:2408.16765  [pdf, ps, other]

    cs.LG cs.AI math.PR math.ST stat.ML

    A Score-Based Density Formula, with Applications in Diffusion Generative Models

    Authors: Gen Li, Yuling Yan

    Abstract: Score-based generative models (SGMs) have revolutionized the field of generative modeling, achieving unprecedented success in generating realistic and diverse content. Despite empirical advances, the theoretical basis for why optimizing the evidence lower bound (ELBO) on the log-likelihood is effective for training diffusion generative models, such as DDPMs, remains largely unexplored. In this pap…

    Submitted 29 August, 2024; originally announced August 2024.

  8. arXiv:2408.16537  [pdf, other]

    cs.LG cs.AI

    SFR-GNN: Simple and Fast Robust GNNs against Structural Attacks

    Authors: Xing Ai, Guanyu Zhu, Yulin Zhu, Yu Zheng, Gaolei Li, Jianhua Li, Kai Zhou

    Abstract: Graph Neural Networks (GNNs) have demonstrated commendable performance for graph-structured data. Yet, GNNs are often vulnerable to adversarial structural attacks as embedding generation relies on graph topology. Existing efforts are dedicated to purifying the maliciously modified structure or applying adaptive aggregation, thereby enhancing the robustness against adversarial structural attacks. I…

    Submitted 1 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16273  [pdf, other]

    cs.CV

    SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models

    Authors: Guangxi Li, Yinsheng Song, Mingkai Zheng

    Abstract: Long-tailed distributions in image recognition pose a considerable challenge due to the severe imbalance between a few dominant classes with numerous examples and many minority classes with few samples. Recently, the use of large generative models to create synthetic data for image classification has been realized, but utilizing synthetic data to address the challenge of long-tailed recognition re…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 15 pages

  10. arXiv:2408.15915  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

    Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

    Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) mode…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 28 pages, 12 tables, 10 figures

  11. arXiv:2408.14925  [pdf, other]

    cs.NE cs.AI

    Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning

    Authors: Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li

    Abstract: The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computational benefits. However, it suffers from suboptimal performance and poor generalization, largely due to inadequate theoretical support and a lack of effective learning str…

    Submitted 27 August, 2024; originally announced August 2024.

  12. arXiv:2408.14270  [pdf, other]

    eess.IV cs.CV

    Reliable Multi-modal Medical Image-to-image Translation Independent of Pixel-wise Aligned Data

    Authors: Langrui Zhou, Guang Li

    Abstract: The current mainstream multi-modal medical image-to-image translation methods face a contradiction. Supervised methods with outstanding performance rely on pixel-wise aligned training data to constrain the model optimization. However, obtaining pixel-wise aligned multi-modal medical image datasets is challenging. Unsupervised methods can be trained without paired data, but their reliability cannot…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted as a research article by Medical Physics

  13. arXiv:2408.14051  [pdf, other]

    cs.CV

    Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

    Authors: Yuncheng Jiang, Zixun Zhang, Jun Wei, Chun-Mei Feng, Guanbin Li, Xiang Wan, Shuguang Cui, Zhen Li

    Abstract: AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture the inter-frame context but are computationally expensive. To mitigate this contradiction, we delve into Video-to-Image knowledge distillation leveraging DEtectio…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: BIBM2024

  14. arXiv:2408.13976  [pdf, other]

    cs.SE

    Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates

    Authors: Zhihong Sun, Yao Wan, Jia Li, Hongyu Zhang, Zhi Jin, Ge Li, Chen Lyu

    Abstract: Large Language Models (LLMs), such as GPT-4, StarCoder, and CodeLlama, are transforming the way developers approach programming by automatically generating code based on given natural language descriptions. Despite advancements, generating syntactically and semantically correct code remains challenging, especially for complex programming tasks. Typically, individuals generate multiple candidate so…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

  15. arXiv:2408.13459  [pdf, other]

    cs.CV

    Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model

    Authors: Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang

    Abstract: Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the f…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: accepted by ECCV2024

    ACM Class: I.4.4

  16. arXiv:2408.12558  [pdf, other]

    cs.MM

    Exploring the Role of Audio in Multimodal Misinformation Detection

    Authors: Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li

    Abstract: With the rapid development of deepfake technology, especially the deep audio fake technology, misinformation detection on the social media scene meets a great challenge. Social media data often contains multimodal information which includes audio, video, text, and images. However, existing multimodal misinformation detection methods tend to focus only on some of these modalities, failing to compre…

    Submitted 22 August, 2024; originally announced August 2024.

  17. arXiv:2408.12329  [pdf, ps, other]

    cs.IT eess.SP

    Asynchronous Cell-Free Massive MIMO-OFDM: Mixed Coherent and Non-Coherent Transmissions

    Authors: Guoyu Li, Shaochuan Wu, Changsheng You, Wenbin Zhang, Guanyu Shang

    Abstract: In this letter, we analyze the performance of mixed coherent and non-coherent transmissions approach, which can improve the performance of cell-free multiple-input multiple-output orthogonal frequency division multiplexing (CF mMIMO-OFDM) systems under asynchronous reception. To this end, we first obtain the achievable downlink sum-rate for the mixed coherent and non-coherent transmissions, and th…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: This work is submitted to IEEE for possible publication

  18. arXiv:2408.12245  [pdf, other]

    cs.CV

    Scalable Autoregressive Image Generation with Mamba

    Authors: Haopeng Li, Jinyue Yang, Kexin Wang, Xuerui Qiu, Yuhong Chou, Xin Li, Guoqi Li

    Abstract: We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Un…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures

  19. arXiv:2408.11971  [pdf, other]

    cs.DC

    HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data

    Authors: Tripti Agarwal, Sheng Di, Jiajun Huang, Yafan Huang, Ganesh Gopalakrishnan, Robert Underwood, Kai Zhao, Xin Liang, Guanpeng Li, Franck Cappello

    Abstract: Error-bounded lossy compression has been a critical technique to significantly reduce the sheer amounts of simulation datasets for high-performance computing (HPC) scientific applications while effectively controlling the data distortion based on user-specified error bound. In many real-world use cases, users must perform computational operations on the compressed data (a.k.a. homomorphic compress…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures, 9 tables

  20. arXiv:2408.11611  [pdf, other]

    cs.IR cs.LG

    DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

    Authors: Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Guanxing Zhang

    Abstract: Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis acro…

    Submitted 23 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  21. arXiv:2408.11324  [pdf, other]

    cs.SE

    HITS: High-coverage LLM-based Unit Test Generation via Method Slicing

    Authors: Zejun Wang, Kaibo Liu, Ge Li, Zhi Jin

    Abstract: Large language models (LLMs) have behaved well in generating unit tests for Java projects. However, the performance for covering the complex focal methods within the projects is poor. Complex methods comprise many conditions and loops, requiring the test cases to be various enough to cover all lines and branches. However, existing test generation methods with LLMs provide the whole method-to-test…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: to be published in ASE 24' Research Track

  22. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  23. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  24. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  25. arXiv:2408.10575  [pdf, other]

    cs.CV

    MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

    Authors: Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang

    Abstract: Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inherent plain structure of CLIP, few TVR methods explore the multi-scale representations which offer richer contextual information for a more thorough under…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages

  26. arXiv:2408.10473  [pdf, other]

    cs.CL cs.LG

    Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

    Authors: Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum

    Abstract: Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general da…

    Submitted 19 August, 2024; originally announced August 2024.

  27. arXiv:2408.10123  [pdf, other]

    cs.RO cs.CV

    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

    Authors: Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara

    Abstract: Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks. A deep understanding of affordance can lead to more intelligent AI systems. For example, such knowledge directs an agent to grasp a knife by the handle for cutting and by the blade when passing it to someone. In this paper, we present a streamlined affordance learning system that encompas…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Project page: https://reagan1311.github.io/affgrasp

  28. arXiv:2408.10115  [pdf, other]

    cs.CL

    GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

    Authors: Ran Liu, Ming Liu, Min Yu, Jianguo Jiang, Gang Li, Dan Zhang, Jingyuan Li, Xiang Meng, Weiqing Huang

    Abstract: Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised app…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures. Accepted by ECAI 2024

  29. arXiv:2408.08870  [pdf, other]

    cs.CV

    SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

    Authors: Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li

    Abstract: Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Technical Report

  30. arXiv:2408.08707  [pdf, other]

    cs.LG cs.AI

    Beam Prediction based on Large Language Models

    Authors: Yucheng Sheng, Kai Huang, Le Liang, Peng Liu, Shi Jin, Geoffrey Ye Li

    Abstract: Millimeter-wave (mmWave) communication is promising for next-generation wireless networks but suffers from significant path loss, requiring extensive antenna arrays and frequent beam training. Traditional deep learning models, such as long short-term memory (LSTM), enhance beam tracking accuracy however are limited by poor robustness and generalization. In this letter, we use large language models…

    Submitted 16 August, 2024; originally announced August 2024.

  31. arXiv:2408.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation Based on Diffusion Model

    Authors: Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative m…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

  32. arXiv:2408.07397  [pdf, other]

    cs.MA

    Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems

    Authors: Zhuohui Zhang, Bin He, Bin Cheng, Gang Li

    Abstract: Multi-agent systems must learn to communicate and understand interactions between agents to achieve cooperative goals in partially observed tasks. However, existing approaches lack a dynamic directed communication mechanism and rely on global states, thus diminishing the role of communication in centralized training. Thus, we propose the transformer-based graph coarsening network (TGCNet), a novel…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures

  33. arXiv:2408.06969  [pdf, ps, other]

    cs.NI cs.LG

    IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

    Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

    Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforce learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of…

    Submitted 13 August, 2024; originally announced August 2024.

  34. arXiv:2408.06601  [pdf, other]

    cs.HC cs.GR

    HiRegEx: Interactive Visual Query and Exploration of Multivariate Hierarchical Data

    Authors: Guozheng Li, Haotian Mi, Chi Harold Liu, Takayuki Itoh, Guoren Wang

    Abstract: When using exploratory visual analysis to examine multivariate hierarchical data, users often need to query data to narrow down the scope of analysis. However, formulating effective query expressions remains a challenge for multivariate hierarchical data, particularly when datasets become very large. To address this issue, we develop a declarative grammar, HiRegEx (Hierarchical data Regular Expres…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures, accepted at IEEE VIS 2024

    MSC Class: 65D18; ACM Class: I.3.6

  35. arXiv:2408.05416  [pdf, other]

    cs.CV cs.AI cs.MM

    High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model

    Authors: Weizhi Zhong, Junfan Lin, Peixin Chen, Liang Lin, Guanbin Li

    Abstract: Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmark…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE Transactions on Image Processing(TIP)

  36. arXiv:2408.05412  [pdf, other]

    cs.CV cs.AI cs.MM

    Style-Preserving Lip Sync via Audio-Aware Style Reference

    Authors: Weizhi Zhong, Jichang Li, Yinqi Cai, Liang Lin, Guanbin Li

    Abstract: Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance, attributed to the unique speaking styles of individuals, posing a notable challenge for audio-driven lip sync. Earlier methods for such task often bypassed the modeling of personalized speaking styles, r…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE Transactions on Image Processing(TIP)

  37. arXiv:2408.05109  [pdf, other]

    cs.DB

    A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

    Authors: Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang

    Abstract: Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its e…

    Submitted 9 August, 2024; originally announced August 2024.

  38. arXiv:2408.03429  [pdf, other]

    quant-ph cs.ET

    MarQSim: Reconciling Determinism and Randomness in Compiler Optimization for Quantum Simulation

    Authors: Xiuqi Cao, Junyu Zhou, Yuhao Liu, Yunong Shi, Gushu Li

    Abstract: Quantum simulation, fundamental in quantum algorithm design, extends far beyond its foundational roots, powering diverse quantum computing applications. However, optimizing the compilation of quantum Hamiltonian simulation poses significant challenges. Existing approaches fall short in reconciling deterministic and randomized compilation, lack appropriate intermediate representations, and struggle…

    Submitted 6 August, 2024; originally announced August 2024.

  39. Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models

    Authors: Tongtong Feng, Qing Li, Xin Wang, Mingzi Wang, Guangyao Li, Wenwu Zhu

    Abstract: Cross-view geo-localization in GNSS-denied environments aims to determine an unknown location by matching drone-view images with the correct geo-tagged satellite-view images from a large gallery. Recent research shows that learning discriminative image representations under specific weather conditions can significantly enhance performance. However, the frequent occurrence of unseen extreme weather…

    Submitted 27 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM24 workshop

  40. arXiv:2408.02320  [pdf, ps, other]

    cs.LG eess.SP math.NA math.ST stat.ML

    A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

    Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen

    Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a diffusion process, have become a cornerstone in contemporary generative modeling. In this work, we develop non-asymptotic convergence theory for a popular diffusion-based sampler (i.e., the probability flow ODE sampler) in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This manuscript presents improved theory for probability flow ODEs compared to its earlier version arXiv:2306.09251

  41. arXiv:2408.02085  [pdf, other]

    cs.CV cs.AI cs.CL eess.SP

    Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

    Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

    Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training a LLM on all existing instructions may not be optimal and practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: review, survey, 28 pages, 2 figures, 4 tables

  42. arXiv:2408.01929  [pdf, other]

    eess.IV cs.CV

    Advancing H&E-to-IHC Stain Translation in Breast Cancer: A Multi-Magnification and Attention-Based Approach

    Authors: Linhao Qu, Chengsheng Zhang, Guihui Li, Haiyong Zheng, Chen Peng, Wei He

    Abstract: Breast cancer presents a significant healthcare challenge globally, demanding precise diagnostics and effective treatment strategies, where histopathological examination of Hematoxylin and Eosin (H&E) stained tissue sections plays a central role. Despite its importance, evaluating specific biomarkers like Human Epidermal Growth Factor Receptor 2 (HER2) for personalized treatment remains constraine…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE CIS-RAM 2024 Invited Session Oral

  43. arXiv:2408.00788  [pdf, other]

    cs.NE cs.LG

    SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network

    Authors: Kexin Wang, Jiahong Zhang, Yong Ren, Man Yao, Di Shang, Bo Xu, Guoqi Li

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) have demonstrated their effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to "see", "listen", and "read". In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNNs, to explore the potential of SNNs to "speak". A major obstacle to using SNN for such…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 9 pages

  44. arXiv:2408.00465  [pdf, ps, other]

    cs.DS cs.LG math.OC

    Infrequent Resolving Algorithm for Online Linear Programming

    Authors: Guokai Li, Zizhuo Wang, Jingwei Zhang

    Abstract: Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auctions, network revenue management, and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantees better performance, even offering a constant regret, but requi…

    Submitted 1 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 35 pages, 7 figures

  45. arXiv:2407.21465  [pdf, other]

    cs.CV

    MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

    Authors: Kuo Wang, Lechao Cheng, Weikai Chen, Pingping Zhang, Liang Lin, Fan Zhou, Guanbin Li

    Abstract: Learning from pseudo-labels generated with Vision Language Models (VLMs) has been shown in recent studies to be a promising way to assist open-vocabulary detection (OVD). However, due to the domain gap between VLM and vision-detection tasks, pseudo-labels produced by the VLMs tend to be noisy, while the training design of the detector further amplifies the bias. In this work, we invest…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Code is available at https://github.com/wkfdb/MarvelOVD

  46. arXiv:2407.21282  [pdf, ps, other]

    cs.LG cs.HC

    FedBChain: A Blockchain-enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights

    Authors: Gaoxuan Li, Chern Hong Lim, Qiyao Ma, Xinyu Tang, Hwa Hui Tew, Fan Ding, Xuewen Luo

    Abstract: Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures; when it runs in large-scale distributed training, data security and privacy issues must be reconsidered, and its prediction performance is unkn…

    Submitted 7 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  47. arXiv:2407.20853  [pdf, other]

    cs.CV

    NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

    Authors: Hongjia Zhai, Gan Huang, Qirui Hu, Guanglin Li, Hujun Bao, Guofeng Zhang

    Abstract: In recent years, the paradigm of neural implicit representations has gained substantial attention in the field of Simultaneous Localization and Mapping (SLAM). However, a notable gap remains in existing approaches when it comes to scene understanding. In this paper, we introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system that leverages a pre-trained 2D segmentation netwo…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by TVCG (ISMAR 2024 Journal Track)

  48. arXiv:2407.20708  [pdf, other]

    cs.AI

    Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

    Authors: Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around network architecture and spiking…

    Submitted 5 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024; 19 pages, 4 figures

  49. arXiv:2407.20693  [pdf, other]

    cs.CV cs.AI cs.MM

    Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

    Authors: Guangyao Li, Henghui Du, Di Hu

    Abstract: The Audio Visual Question Answering (AVQA) task aims to answer questions related to various visual objects, sounds, and their interactions in videos. Such naturally multimodal videos contain rich and complex dynamic audio-visual components, with only a portion of them closely related to the given questions. Hence, effectively perceiving audio-visual cues relevant to the given questions is crucial…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  50. arXiv:2407.20679  [pdf, other]

    cs.CE

    Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems

    Authors: Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma

    Abstract: With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic effici…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 33 pages, 31 figures