Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 288 results for author: He, D

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.04697  [pdf, other

    cs.CV cs.MM

    VCoME: Verbal Video Composition with Multimodal Editing Effects

    Authors: Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu

    Abstract: Verbal videos, featuring voice-overs or text overlays, provide valuable content but present significant challenges in composition, especially when incorporating editing effects to enhance clarity and visual appeal. In this paper, we introduce the novel task of verbal video composition with editing effects. This task aims to generate coherent and visually appealing verbal videos by integrating mult… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.03157  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    Let the Code LLM Edit Itself When You Edit the Code

    Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He

    Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequ… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in Progress

  3. arXiv:2407.00743  [pdf, other

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion… ▽ More

    Submitted 12 April, 2024; originally announced July 2024.

  4. arXiv:2406.18099  [pdf, other

    cs.DB

    CompassDB: Pioneering High-Performance Key-Value Store with Perfect Hash

    Authors: Jin Jiang, Dongsheng He, Yu Hu, Dong Liu, Chenfan Xiao, Hongxiao Bi, Yusong Zhang, Chaoqu Jiang, Zhijun Fu

    Abstract: Modern mainstream persistent key-value storage engines utilize Log-Structured Merge tree (LSM-tree) based designs, optimizing read/write performance by leveraging sequential disk I/O. However, the advent of SSDs, with their significant improvements in bandwidth and IOPS, shifts the bottleneck from I/O to CPU. The high compaction cost and large read/write amplification associated with LSM trees hav… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.16853  [pdf, other

    cs.LG cond-mat.mtrl-sci cs.AI q-bio.BM

    GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

    Authors: Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang

    Abstract: Molecular modeling, a central topic in quantum mechanics, aims to accurately calculate the properties and simulate the behaviors of molecular systems. The molecular model is governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. While numerous deep learning approaches have been developed to learn molecular represent… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 25 pages, 13 tables, l figure; ICML 2024 camera ready version

  6. arXiv:2406.15474  [pdf, other

    cs.AI cs.CL cs.HC

    WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist

    Authors: Chenyu Ren, Yazhou Zhang, Daihai He, Jing Qin

    Abstract: Large language models (LLMs) are raging over the medical domain, and their momentum has carried over into the mental health domain, leading to the emergence of few mental health LLMs. Although such mental health LLMs could provide reasonable suggestions for psychological counseling, how to develop an authentic and effective doctor-patient relationship (DPR) through LLMs is still an important probl… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.14064  [pdf, other

    cs.IT eess.SP

    PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiple

    Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

    Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique based on discrete affine Fourier transform (DAFT). By properly tuning pre-chirp parameter and post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely avoid overlap of different paths, thus constitutes a full representation of delay-Doppler profile, which significantly improves… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.11775  [pdf, other

    cs.CV cs.AI

    Task Me Anything

    Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

    Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their spec… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: website: https://www.task-me-anything.org

  9. arXiv:2406.09771  [pdf, other

    cs.DS

    Block Coordinate Descent Methods for Optimization under J-Orthogonality Constraints with Applications

    Authors: Di He, Ganzhao Yuan, Xiao Wang, Pengxiang Xu

    Abstract: The J-orthogonal matrix, also referred to as the hyperbolic orthogonal matrix, is a class of special orthogonal matrix in hyperbolic space, notable for its advantageous properties. These matrices are integral to optimization under J-orthogonal constraints, which have widespread applications in statistical learning and data science. However, addressing these problems is generally challenging due to… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  10. arXiv:2405.17229  [pdf, other

    cs.HC

    InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning

    Authors: Guozheng Li, Peng He, Xinyu Wang, Runfei Li, Chi Harold Liu, Chuangxin Ou, Dong He, Guoren Wang

    Abstract: Embedding visual representations within original hierarchical tables can mitigate additional cognitive load stemming from the division of users' attention. The created hierarchical table visualizations can help users understand and explore complex data with multi-level attributes. However, because of many options available for transforming hierarchical tables and selecting subsets for embedding, t… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.16851  [pdf, other

    cs.NE cs.AI cs.LG

    Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning

    Authors: Mingqing Xiao, Yixin Zhu, Di He, Zhouchen Lin

    Abstract: Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency due to precise spiking times and sparse spikes with event-driven computation. A significant question is how SNNs can emulate human-like graph-based reasoning of concepts and relations, especially leveraging the temporal domain… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  12. arXiv:2405.13388  [pdf, other

    cs.CV

    Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation

    Authors: Dingwen Zhang, Hao Li, Diqi He, Nian Liu, Lechao Cheng, Jingdong Wang, Junwei Han

    Abstract: In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when trained on large-scale datasets. Nevertheless, the effectiveness of these QEIS methods diminishes significantly when confronted with limited training data. This limitation arises from… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: https://github.com/lifuguan/UPLVP

  13. arXiv:2405.13179  [pdf, other

    cs.CL

    RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

    Authors: Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He

    Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learni… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  14. arXiv:2405.07047  [pdf, other

    cs.CV

    Unsupervised Density Neural Representation for CT Metal Artifact Reduction

    Authors: Qing Wu, Xu Guo, Lixuan Chen, Dongming He, Hongjiang Wei, Xudong Wang, S. Kevin Zhou, Yifeng Zhang, Jingyi Yu, Yuyao Zhang

    Abstract: Emerging unsupervised reconstruction techniques based on implicit neural representation (INR), such as NeRP, CoIL, and SCOPE, have shown unique capabilities in CT linear inverse imaging. In this work, we propose a novel unsupervised density neural representation (Diner) to tackle the challenging problem of CT metal artifacts when scanned objects contain metals. The drastic variation of linear atte… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 13 pages

  15. arXiv:2405.06201  [pdf, other

    cs.CV

    PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

    Authors: Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

    Abstract: Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  16. arXiv:2405.03136  [pdf, other

    cs.CR

    FOBNN: Fast Oblivious Binarized Neural Network Inference

    Authors: Xin Chen, Zhili Chen, Benchang Dong, Shiwen Wei, Lin Chen, Daojing He

    Abstract: The superior performance of deep learning has propelled the rise of Deep Learning as a Service, enabling users to transmit their private data to service providers for model execution and inference retrieval. Nevertheless, the primary concern remains safeguarding the confidentiality of sensitive user data while optimizing the efficiency of secure protocols. To address this, we develop a fast oblivi… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  17. arXiv:2404.18922  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    DPO Meets PPO: Reinforced Token Optimization for RLHF

    Authors: Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang

    Abstract: In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still larg… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  18. arXiv:2404.16313  [pdf, ps, other

    cs.IT

    Further Investigations on Nonlinear Complexity of Periodic Binary Sequences

    Authors: Qin Yuan, Chunlei Li, Xiangyong Zeng, Tor Helleseth, Debiao He

    Abstract: Nonlinear complexity is an important measure for assessing the randomness of sequences. In this paper we investigate how circular shifts affect the nonlinear complexities of finite-length binary sequences and then reveal a more explicit relation between nonlinear complexities of finite-length binary sequences and their corresponding periodic sequences. Based on the relation, we propose two algorit… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  19. arXiv:2404.10384  [pdf, other

    cs.CL cs.AI cs.IR

    Reasoning on Efficient Knowledge Paths:Knowledge Graph Guides Large Language Model for Domain Question Answering

    Authors: Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao

    Abstract: Large language models (LLMs), such as GPT3.5, GPT4 and LLAMA2 perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific evaluations, these LLMs often suffer from hallucination problems due to insufficient training of relevant corpus. Furthermore, fine-tuning large models may face problems such as the LLMs are not open source or the construction of high-… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  20. arXiv:2404.08674  [pdf, other

    cs.CL cs.AI cs.HC

    Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions

    Authors: Zhuochun Li, Bo Xie, Robin Hilsabeck, Alyssa Aguirre, Ning Zou, Zhimeng Luo, Daqing He

    Abstract: Evidence suggests that different prompts lead large language models (LLMs) to generate responses with varying quality. Yet, little is known about prompts' effects on response quality in healthcare domains. In this exploratory study, we address this gap, focusing on a specific healthcare domain: dementia caregiving. We first developed an innovative prompt template with three components: (1) system… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  21. arXiv:2404.06563  [pdf, other

    cs.DB cs.LG cs.MM

    Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows

    Authors: Lindsey Linxi Wei, Chung Yik Edward Yeung, Hongjian Yu, Jingchuan Zhou, Dong He, Magdalena Balazinska

    Abstract: We demonstrate MaskSearch, a system designed to accelerate queries over databases of image masks generated by machine learning models. MaskSearch formalizes and accelerates a new category of queries for retrieving images and their corresponding masks based on mask properties, which support various applications, from identifying spurious correlations learned by models to exploring discrepancies bet… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.04848  [pdf, other

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an… ▽ More

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  23. arXiv:2403.19907  [pdf, ps, other

    cs.LG cs.AI

    Beyond the Known: Novel Class Discovery for Open-world Graph Learning

    Authors: Yucheng Jin, Yun Xiong, Juncheng Fang, Xixi Wu, Dongxiao He, Xing Jia, Bingchen Zhao, Philip Yu

    Abstract: Node classification on graphs is of great importance in many applications. Due to the limited labeling capability and evolution in real-world open scenarios, novel classes can emerge on unlabeled testing nodes. However, little attention has been paid to novel class discovery on graphs. Discovering novel classes is challenging as novel and known class nodes are correlated by edges, which makes thei… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  24. arXiv:2403.12063  [pdf, other

    cs.CV cs.LG

    Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers

    Authors: Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_θ(X_0|y)$, with a predefined diffusion model $p_θ(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ with an approximated posterior sample drawn from $p_θ(X_0|X_t)$. However, most pr… ▽ More

    Submitted 1 June, 2024; v1 submitted 8 February, 2024; originally announced March 2024.

  25. arXiv:2403.08551  [pdf, other

    eess.IV cs.AI cs.CV cs.MM

    GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

    Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

    Abstract: Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation an… ▽ More

    Submitted 10 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  26. arXiv:2403.08505  [pdf, other

    eess.IV cs.AI cs.CV cs.MM

    Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi… ▽ More

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  27. arXiv:2403.05916  [pdf, other

    cs.CV cs.AI

    GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

    Authors: Hao Lu, Xuesong Niu, Jiyao Wang, Yin Wang, Qingyong Hu, Jiaqi Tang, Yuting Zhang, Kaishen Yuan, Bin Huang, Zitong Yu, Dengbo He, Shuiguang Deng, Hao Chen, Yingcong Chen, Shiguang Shan

    Abstract: Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite its success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing,… ▽ More

    Submitted 10 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  28. arXiv:2403.03472  [pdf, other

    cs.LG cs.CV

    Boosting Meta-Training with Base Class Information for Few-Shot Learning

    Authors: Weihao Jiang, Guodong Liu, Di He, Kun He

    Abstract: Few-shot learning, a challenging task in machine learning, aims to learn a classifier adaptable to recognize new, unseen classes with limited labeled examples. Meta-learning has emerged as a prominent framework for few-shot learning. Its training framework is originally a task-level learning method, such as Model-Agnostic Meta-Learning (MAML) and Prototypical Networks. And a recently proposed trai… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures, submitted to a journal

  29. arXiv:2402.18152  [pdf, other

    eess.IV cs.AI cs.CV

    Boosting Neural Representations for Videos with a Conditional Decoder

    Authors: Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

    Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting frame… ▽ More

    Submitted 16 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accept by CVPR 2024

  30. arXiv:2402.15853  [pdf, other

    cs.CV

    RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation

    Authors: Jiawei Zhou, Linye Lyu, Daojing He, Yu Li

    Abstract: Adversarial camouflage is a widely used physical attack against vehicle detectors for its superiority in multi-view attack performance. One promising approach involves using differentiable neural renderers to facilitate adversarial camouflage optimization through gradient back-propagation. However, existing methods often struggle to capture environmental characteristics during the rendering proces… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  31. arXiv:2402.13934  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Do Efficient Transformers Really Save Computation?

    Authors: Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang

    Abstract: As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This ma… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  32. arXiv:2402.11984  [pdf, other

    cs.NE cs.AI cs.LG

    Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks

    Authors: Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin

    Abstract: Neuromorphic computing with spiking neural networks is promising for energy-efficient artificial intelligence (AI) applications. However, different from humans who continually learn different tasks in a lifetime, neural network models suffer from catastrophic forgetting. How could neuronal operations solve this problem is an important question for AI and neuroscience. Many previous studies draw in… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  33. arXiv:2402.09730  [pdf, ps, other

    cs.LG

    DOF: Accelerating High-order Differential Operators with Forward Propagation

    Authors: Ruichen Li, Chuwei Wang, Haotian Ye, Di He, Liwei Wang

    Abstract: Solving partial differential equations (PDEs) efficiently is essential for analyzing complex physical systems. Recent advancements in leveraging deep learning for solving PDE have shown significant promise. However, machine learning methods, such as Physics-Informed Neural Networks (PINN), face challenges in handling high-order derivatives of neural network-parameterized functions. Inspired by For… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  34. arXiv:2401.16421  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

    Authors: Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He

    Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute pos… ▽ More

    Submitted 17 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 17 pages, 7 figures, 8 tables; ICML 2024 Camera Ready version; Code: https://github.com/zhenyuhe00/BiPE

  35. arXiv:2401.14717  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    Authors: Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

    Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning… ▽ More

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE ICASSP 2024

  36. arXiv:2401.08920  [pdf, other

    eess.IV cs.CV

    Idempotence and Perceptual Image Compression

    Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Idempotence is the stability of image codec to re-compression. At the first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we prop… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  37. Two-pass Endpoint Detection for Speech Recognition

    Authors: Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

    Abstract: Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade-off between accuracy and latency, since waiting longer reduces the cases of users being cut-off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first pass endpointer is verified b… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ASRU 2023

  38. arXiv:2401.08514  [pdf, other

    cs.LG cs.DM cs.DS math.CO

    Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness

    Authors: Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, Liwei Wang

    Abstract: Designing expressive Graph Neural Networks (GNNs) is a fundamental topic in the graph learning community. So far, GNN expressiveness has been primarily assessed via the Weisfeiler-Lehman (WL) hierarchy. However, such an expressivity measure has notable limitations: it is inherently coarse, qualitative, and may not well reflect practical requirements (e.g., the ability to encode substructures). In… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 73 pages, 9 figures, 9 tables; Extended from ICLR 2024 (Oral Presentation). This version polishes all proofs for better readability

  39. arXiv:2401.06176  [pdf, other

    cs.LG cs.AI

    GOODAT: Towards Test-time Graph Out-of-Distribution Detection

    Authors: Luzhi Wang, Dongxiao He, He Zhang, Yixin Liu, Wenjie Wang, Shirui Pan, Di Jin, Tat-Seng Chua

    Abstract: Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD sa… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures

  40. arXiv:2401.03862  [pdf, other

    physics.chem-ph cs.LG

    End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction

    Authors: Qingsi Lai, Lin Yao, Zhifeng Gao, Siyuan Liu, Hongshuai Wang, Shuqi Lu, Di He, Liwei Wang, Cheng Wang, Guolin Ke

    Abstract: Crystal structure prediction (CSP) has made significant progress, but most methods focus on unconditional generations of inorganic crystal with limited atoms in the unit cell. This study introduces XtalNet, the first equivariant deep generative model for end-to-end CSP from Powder X-ray Diffraction (PXRD). Unlike previous methods that rely solely on composition, XtalNet leverages PXRD as an additi… ▽ More

    Submitted 1 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  41. arXiv:2311.15296  [pdf, other

    cs.CL

    UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

    Authors: Xun Liang, Shichao Song, Simin Niu, Zhiyu Li, Feiyu Xiong, Bo Tang, Yezhaohui Wang, Dawei He, Peng Cheng, Zhonghao Wang, Haiying Deng

    Abstract: Large language models (LLMs) have emerged as pivotal contributors in contemporary natural language processing and are increasingly being applied across a diverse range of industries. However, these large-scale probabilistic statistical models cannot currently ensure the requisite quality in professional content generation. These models often produce hallucinated text, compromising their practical… ▽ More

    Submitted 23 May, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Accepted by ACL 2024

  42. arXiv:2311.11646  [pdf, other

    cs.CV

    Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning

    Authors: Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Dunyun He, Jiaqi Zhou, Wenxian Yu

    Abstract: An increasingly massive number of remote-sensing images spurs the development of extensible object detectors that can detect objects beyond training categories without costly collecting new labeled data. In this paper, we aim to develop open-vocabulary object detection (OVD) technique in aerial images that scales up object vocabulary size beyond training data. The fundamental challenges hinder ope… ▽ More

    Submitted 13 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  43. arXiv:2311.08252  [pdf, other

    cs.CL cs.AI cs.IR cs.LG

    REST: Retrieval-Based Speculative Decoding

    Authors: Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

    Abstract: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieva… ▽ More

    Submitted 4 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024, camera ready

  44. arXiv:2311.04769  [pdf

    eess.IV cs.CV

    An attention-based deep learning network for predicting Platinum resistance in ovarian cancer

    Authors: Haoming Zhuang, Beibei Li, Jingtong Ma, Patrice Monkam, Shouliang Qi, Wei Qian, Dianning He

    Abstract: Background: Ovarian cancer is among the three most frequent gynecologic cancers globally. High-grade serous ovarian cancer (HGSOC) is the most common and aggressive histological type. Guided treatment for HGSOC typically involves platinum-based combination chemotherapy, necessitating an assessment of whether the patient is platinum-resistant. The purpose of this study is to propose a deep learning… ▽ More

    Submitted 8 November, 2023; originally announced November 2023.

  45. arXiv:2310.17425  [pdf, ps, other

    eess.SP cs.IT

    Detecting Abrupt Change of Channel Covariance Matrix in IRS-Assisted Communication

    Authors: Runnan Liu, Liang Liu, Yin Xu, Dazhi He, Wenjun Zhang, Chang Wen Chen

    Abstract: The knowledge of channel covariance matrices is crucial to the design of intelligent reflecting surface (IRS) assisted communication. However, channel covariance matrices may change suddenly in practice. This letter focuses on the detection of the above change in IRS-assisted communication. Specifically, we consider the uplink communication system consisting of a single-antenna user (UE), an IRS,… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: accepted by IEEE Wireless Communications Letters

  46. arXiv:2310.15566  [pdf, other

    cs.IT eess.SP

    RIS-Aided Receive Generalized Spatial Modulation Design with Reflecting Modulation

    Authors: Xinghao Guo, Yin Xu, Hanjiang Hong, De Mi, Ruiqi Liu, Dazhi He, Wenjun Zhang, Yi-yan Wu

    Abstract: Spatial modulation (SM) transmits additional information bits by the selection of antennas. Generalized spatial modulation (GSM), as an advanced type of SM, can be divided into diversity and multiplexing (MUX) schemes according to the symbols carried on the selected antennas are identical or different. Recently, reconfigurable intelligent surface (RIS) assisted SM exhibits better reception perform… ▽ More

    Submitted 15 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages, submitted to Conference

  47. arXiv:2310.15565  [pdf, other

    cs.IT eess.SP

    Capacity-based Spatial Modulation Constellation and Pre-scaling Design

    Authors: Xinghao Guo, Hanjiang Hong, Yin Xu, Yi-yan Wu, Dazhi He, Wenjun Zhang

    Abstract: Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of SM system is firstly derived. The constellation and pre-scali… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages,conference

  48. arXiv:2310.13913  [pdf, other

    cs.LG cs.CE q-bio.BM

    Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

    Authors: Lihang Liu, Shanzhuo Zhang, Donglong He, Xianbin Ye, Jingbo Zhou, Xiaonan Zhang, Yaoyao Jiang, Weiming Diao, Hang Yin, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang

    Abstract: Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, it raises conce… ▽ More

    Submitted 22 May, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

  49. arXiv:2309.13307  [pdf, other

    cs.LG

    CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity

    Authors: Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, Liwei Wang, Zhouchen Lin, Song-chun Zhu

    Abstract: With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers. In this paper, we propose a new technique named Common randOm REconstruction(CORE), which can be used to compress the information transmitted between machines in order to reduce communic… ▽ More

    Submitted 23 September, 2023; originally announced September 2023.

  50. arXiv:2309.08895  [pdf, other

    cs.IT eess.SP

    CDDM: Channel Denoising Diffusion Models for Wireless Semantic Communications

    Authors: Tong Wu, Zhiyong Chen, Dazhi He, Liang Qian, Yin Xu, Meixia Tao, Wenjun Zhang

    Abstract: Diffusion models (DM) can gradually learn to remove noise, which have been widely used in artificial intelligence generated content (AIGC) in recent years. The property of DM for eliminating noise leads us to wonder whether DM can be applied to wireless communications to help the receiver mitigate the channel noise. To address this, we propose channel denoising diffusion models (CDDM) for semantic… ▽ More

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: submitted to IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:2305.09161