Showing 1–50 of 263 results for author: Zheng, K

Searching in archive cs.
  1. arXiv:2409.02908  [pdf, other]

    cs.LG cs.AI cs.CL

    Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

    Authors: Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang

    Abstract: Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data, thanks to their superior performance over other discrete diffusion models, and are rivaling the auto-regressive models (ARMs) for language modeling tasks. The recent effort in simplifying the masked diffusion framework further leads to alignment with continuous-space diffusion models a…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 40 pages

  2. arXiv:2408.12153  [pdf, other]

    cs.IR cs.LG

    DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

    Authors: Wuchao Li, Rui Huang, Haijun Zhao, Chi Liu, Kai Zheng, Qi Liu, Na Mou, Guorui Zhou, Defu Lian, Yang Song, Wentian Bao, Enyun Yu, Wenwu Ou

    Abstract: Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this…

    Submitted 22 August, 2024; originally announced August 2024.

  3. arXiv:2408.10636  [pdf]

    eess.IV cs.CV

    UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

    Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

    Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection with potential risks hamper its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its…

    Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 22 pages, 2 figures

  4. arXiv:2408.09706  [pdf, other]

    cs.CV

    MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model

    Authors: Xinyang Wang, Yi Yang, Minfeng Zhu, Kecheng Zheng, Shi Liu, Wei Chen

    Abstract: Recent advancements in pre-trained Vision-Language Models (VLMs) have highlighted the significant potential of prompt tuning for adapting these models to a wide range of downstream tasks. However, existing prompt tuning methods typically map an image to a single representation, limiting the model's ability to capture the diverse ways an image can be described. To address this limitation, we invest…

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.09429  [pdf, other]

    cs.LG cs.CL cs.CV

    Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

    Authors: Kening Zheng, Junkai Chen, Yibo Yan, Xin Zou, Xuming Hu

    Abstract: Hallucination issues persistently plagued current multimodal large language models (MLLMs). While existing research primarily focuses on object-level or attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that necessitate advanced reasoning abilities from MLLMs. Besides, recent benchmarks regarding relation hallucinations lack in-depth evaluation and effective…

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.08134  [pdf, other]

    cs.CV

    CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

    Authors: Wei Zhu, Yicheng Liu, Yuping He, Tangfei Liao, Kang Zheng, Xiaoqiu Xu, Tao Wang, Tong Lu

    Abstract: In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose Cor…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, accepted by ECAI

  7. arXiv:2408.06359  [pdf, other]

    eess.SP cs.AI cs.LG

    An Adaptive CSI Feedback Model Based on BiLSTM for Massive MIMO-OFDM Systems

    Authors: Hongrui Shen, Long Zhao, Kan Zheng, Yuhua Cao, Pingzhi Fan

    Abstract: Deep learning (DL)-based channel state information (CSI) feedback has the potential to improve the recovery accuracy and reduce the feedback overhead in massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. However, the length of input CSI and the number of feedback bits should be adjustable in different scenarios, which can not be efficiently achie…

    Submitted 26 July, 2024; originally announced August 2024.

    Comments: 13 pages, 14 figures, 3 tables

  8. arXiv:2407.21771  [pdf, other]

    cs.CV

    Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs

    Authors: Shi Liu, Kecheng Zheng, Wei Chen

    Abstract: Existing Large Vision-Language Models (LVLMs) primarily align image features of vision encoder with Large Language Models (LLMs) to leverage their superior text generation capabilities. However, the scale disparity between vision encoder and language model may led to LLMs assuming a predominant role in multi-modal comprehension. This imbalance in LVLMs may result in the instances of hallucinatory.…

    Submitted 31 July, 2024; originally announced July 2024.

  9. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 16 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  10. arXiv:2407.15819  [pdf, other]

    cs.CV

    Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

    Authors: Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

    Abstract: This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs). Our approach employs a sequence of visual resamplers that capture visual details at various spacial scales. This architecture not only leverages global and local visual contexts effectively, but also facilitates the flexible extension of visual tokens…

    Submitted 22 July, 2024; originally announced July 2024.

  11. arXiv:2407.15273  [pdf, other]

    cs.LG cs.AI

    Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has considerable real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraphs a…

    Submitted 21 July, 2024; originally announced July 2024.

  12. arXiv:2407.12248  [pdf, other]

    cs.DC

    Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

    Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we ob…

    Submitted 16 July, 2024; originally announced July 2024.

  13. arXiv:2407.12067  [pdf, other]

    cs.CV cs.LG

    MaskVD: Region Masking for Efficient Video Object Detection

    Authors: Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel

    Abstract: Video tasks are compute-heavy and thus pose a challenge when deploying in real-time applications, particularly for tasks that require state-of-the-art Vision Transformers (ViTs). Several research efforts have tried to address this challenge by leveraging the fact that large portions of the video undergo very little change across frames, leading to redundant computations in frame-based video proces…

    Submitted 16 July, 2024; originally announced July 2024.

  14. arXiv:2407.09024  [pdf, other]

    cs.LG

    Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

    Authors: Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

    Abstract: Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generaliz…

    Submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2406.15735  [pdf, other]

    cs.CV cs.AI

    Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

    Authors: Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu

    Abstract: Diffusion models have obtained substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Project page: https://cond-image-leak.github.io/

  16. arXiv:2406.14015  [pdf, other]

    cs.LG

    CohortNet: Empowering Cohort Discovery for Interpretable Healthcare Analytics

    Authors: Qingpeng Cai, Kaiping Zheng, H. V. Jagadish, Beng Chin Ooi, James Yip

    Abstract: Cohort studies are of significant importance in the field of healthcare analysis. However, existing methods typically involve manual, labor-intensive, and expert-driven pattern definitions or rely on simplistic clustering techniques that lack medical relevance. Automating cohort studies with interpretable patterns has great potential to facilitate healthcare analysis but remains an unmet need in p…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages, 12 figures

  17. arXiv:2406.12707  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

    Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

    Abstract: Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, ACL24 accepted

  18. arXiv:2406.11357  [pdf, other]

    cs.CL cs.AI

    Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

    Authors: Zhonghao Li, Xuming Hu, Aiwei Liu, Kening Zheng, Sirui Huang, Hui Xiong

    Abstract: Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing information from document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  19. arXiv:2406.09305  [pdf, other]

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for specific subject from arbitrary testing image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag…

    Submitted 7 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  20. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 29 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  21. arXiv:2406.07146  [pdf, other]

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  22. arXiv:2406.06776  [pdf, other]

    cs.CV cs.LG

    SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models

    Authors: James Lowman, Kelly Liu Zheng, Roydon Fraser, Jesse Van Griensven The, Mojtaba Valipour

    Abstract: SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We specifically curated this dataset for training geospatial foundation models, unconstrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever. More earth-observing satellites have been launched in the last five years than in t…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Work in Progress!

  23. arXiv:2406.03403  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo…

    Submitted 4 June, 2024; originally announced June 2024.

  24. arXiv:2405.18756  [pdf, other]

    cs.LG cs.AI cs.CV stat.AP stat.ML

    Provable Contrastive Continual Learning

    Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang

    Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  25. arXiv:2405.16928  [pdf]

    cs.SI cs.GT

    TopoLa: a novel embedding framework for understanding complex networks

    Authors: Kai Zheng, Qilong Feng, Yaohang Li, Qichang Zhao, Jinhui Xu, Jianxin Wang

    Abstract: Complex networks, which are the abstractions of many real-world systems, present a persistent challenge across disciplines for people to decipher their underlying information. Recently, hyperbolic geometry of latent spaces has gained traction in network analysis, due to its ability to preserve certain local intrinsic properties of the nodes. In this study, we explore the problem from a much broade…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 85 pages, 17 figures

  26. arXiv:2405.15885  [pdf, other]

    cs.LG stat.ML

    Diffusion Bridge Implicit Models

    Authors: Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

    Abstract: Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluatio…

    Submitted 24 May, 2024; originally announced May 2024.

  27. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  28. arXiv:2405.10951  [pdf, other]

    cs.CV cs.LG

    Block Selective Reprogramming for On-device Training of Vision Transformers

    Authors: Sreetama Sarkar, Souvik Kundu, Kai Zheng, Peter A. Beerel

    Abstract: The ubiquity of vision transformers (ViTs) for various edge applications, including personalized learning, has created the demand for on-device fine-tuning. However, training with the limited memory and computation power of edge devices remains a significant challenge. In particular, the memory required for training is much higher than that needed for inference, primarily due to the need to store…

    Submitted 25 March, 2024; originally announced May 2024.

  29. arXiv:2405.04844  [pdf, ps, other]

    cs.IR

    Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

    Authors: Kai Zheng, Haijun Zhao, Rui Huang, Beichuan Zhang, Na Mou, Yanan Niu, Yang Song, Hongning Wang, Kun Gai

    Abstract: The Probability Ranking Principle (PRP) has been considered as the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through eve…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by WWW 2024

  30. arXiv:2405.04233  [pdf, other]

    cs.CV cs.LG

    Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

    Authors: Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu

    Abstract: We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as un…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Project page at https://www.shengshu-ai.com/vidu

  31. arXiv:2405.03409  [pdf, other]

    cs.LG

    LightTR: A Lightweight Framework for Federated Trajectory Recovery

    Authors: Ziqiao Liu, Hao Miao, Yan Zhao, Chenxi Liu, Kai Zheng, Huan Li

    Abstract: With the proliferation of GPS-equipped edge devices, huge trajectory data is generated and accumulated in various domains, motivating a variety of urban applications. Due to the limited acquisition capabilities of edge devices, a lot of trajectories are recorded at a low sampling rate, which may lead to the effectiveness drop of urban applications. We aim to recover a high-sampled trajectory based…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: The paper was accepted by ICDE 2024

  32. arXiv:2404.14999  [pdf, other]

    cs.DB cs.LG

    A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data

    Authors: Hao Miao, Yan Zhao, Chenjuan Guo, Bin Yang, Kai Zheng, Feiteng Huang, Jiandong Xie, Christian S. Jensen

    Abstract: The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability. Many recent proposals that target deep learning for spatio-temporal prediction suff…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  33. arXiv:2404.11450  [pdf, other]

    cs.DB cs.CR

    Real-Time Trajectory Synthesis with Local Differential Privacy

    Authors: Yujia Hu, Yuntao Du, Zhikun Zhang, Ziquan Fang, Lu Chen, Kai Zheng, Yunjun Gao

    Abstract: Trajectory streams are being generated from location-aware devices, such as smartphones and in-vehicle navigation systems. Due to the sensitive nature of the location data, directly sharing user trajectories suffers from privacy leakage issues. Local differential privacy (LDP), which perturbs sensitive data on the user side before it is shared or analyzed, emerges as a promising solution for priva…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024. Code is available at: https://github.com/ZJU-DAILY/RetraSyn

  34. arXiv:2404.09520  [pdf, other]

    cs.IR

    UniSAR: Modeling User Transition Behaviors between Search and Recommendation

    Authors: Teng Shi, Zihua Si, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Dewei Leng, Yanan Niu, Yang Song

    Abstract: Nowadays, many platforms provide users with both search and recommendation services as important tools for accessing information. The phenomenon has led to a correlation between user search and recommendation behaviors, providing an opportunity to model user interests in a fine-grained way. Existing approaches either model user search and recommendation behaviors separately or overlook the differe…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  35. arXiv:2404.07441  [pdf, ps, other]

    cs.CC cs.DS

    Near Optimal Alphabet-Soundness Tradeoff PCPs

    Authors: Dor Minzer, Kai Zhe Zheng

    Abstract: We show that for all $\varepsilon>0$, for sufficiently large prime power $q$, for all $\delta>0$, it is NP-hard to distinguish whether a 2-Prover-1-Round projection game with alphabet size $q$ has value at least $1-\delta$, or value at most $1/q^{(1-\varepsilon)}$. This establishes a nearly optimal alphabet-to-soundness tradeoff for 2-query PCPs with alphabet size $q$, improving upon a result of [Chan 2016]. Our resu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: STOC 2024, 91 pages

  36. arXiv:2404.05673  [pdf, other]

    cs.CV

    CoReS: Orchestrating the Dance of Reasoning and Segmentation

    Authors: Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang

    Abstract: The reasoning segmentation task, which demands a nuanced comprehension of intricate queries to accurately pinpoint object regions, is attracting increasing attention. However, Multi-modal Large Language Models (MLLM) often find it difficult to accurately localize the objects described in complex reasoning contexts. We believe that the act of reasoning segmentation should mirror the cognitive stage…

    Submitted 10 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at ECCV 2024

  37. arXiv:2404.01177  [pdf, other]

    cs.CR cs.IR

    Poisoning Decentralized Collaborative Recommender System and Its Countermeasures

    Authors: Ruiqi Zheng, Liang Qu, Tong Chen, Kai Zheng, Yuhui Shi, Hongzhi Yin

    Abstract: To make room for privacy and efficiency, the deployment of many recommender systems is experiencing a shift from central servers to personal devices, where the federated recommender systems (FedRecs) and decentralized collaborative recommender systems (DecRecs) are arguably the two most representative paradigms. While both leverage knowledge (e.g., gradients) sharing to facilitate learning local m…

    Submitted 1 April, 2024; originally announced April 2024.

  38. arXiv:2403.17688  [pdf, other]

    cs.IR

    Large Language Models Enhanced Collaborative Filtering

    Authors: Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu

    Abstract: Recent advancements in Large Language Models (LLMs) have attracted considerable interest among researchers to leverage these models to enhance Recommender Systems (RSs). Existing work predominantly utilizes LLMs to generate knowledge-rich texts or utilizes LLM-derived embeddings as features to improve RSs. Although the extensive world knowledge embedded in LLMs generally benefits RSs, the applicat…

    Submitted 23 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CIKM 2024

  39. arXiv:2403.17007  [pdf, other]

    cs.CV

    DreamLIP: Language-Image Pre-training with Long Captions

    Authors: Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

    Abstract: Language-image pre-training largely relies on how precisely and thoroughly a text describes its paired image. In practice, however, the contents of an image can be so rich that well describing them requires lengthy captions (e.g., with 10 sentences), which are usually missing in existing datasets. Consequently, there are currently no clear evidences on whether and how language-image pre-training c…

    Submitted 25 March, 2024; originally announced March 2024.

  40. arXiv:2403.14151  [pdf, other]

    cs.LG cs.AI cs.CY cs.DB

    Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond

    Authors: Wei Chen, Yuxuan Liang, Yuanshao Zhu, Yanchuan Chang, Kang Luo, Haomin Wen, Lei Li, Yanwei Yu, Qingsong Wen, Chao Chen, Kai Zheng, Yunjun Gao, Xiaofang Zhou, Yu Zheng

    Abstract: Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptabili…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 25 pages, 12 figures, 5 tables

  41. arXiv:2403.12995  [pdf, other]

    q-bio.BM cs.CE cs.LG

    ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

    Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

    Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small mole…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: ICML2024 camera-ready, update some experimental results, add github url, fix some typos

  42. arXiv:2403.12922  [pdf, other]

    cs.CV

    Contextual AD Narration with Interleaved Multimodal Sequence

    Authors: Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

    Abstract: The Audio Description (AD) task aims to generate descriptions of visual elements for visually impaired individuals to help them access long-form video contents, like movie. With video feature, text, character bank and context information as inputs, the generated ADs are able to correspond to the characters by name and provide reasonable, contextual descriptions to help audience understand the stor…

    Submitted 19 March, 2024; originally announced March 2024.

  43. arXiv:2403.09167  [pdf, other]

    cs.CL

    Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

    Authors: Jianwei Sun, Chaoyang Mei, Linlin Wei, Kaiyu Zheng, Na Liu, Ming Cui, Tianyi Li

    Abstract: The efficacy of large language models (LLMs) is heavily dependent on the quality of the underlying data, particularly within specialized domains. A common challenge when fine-tuning LLMs for domain-specific applications is the potential degradation of the model's generalization capabilities. To address these issues, we propose a two-stage approach for the construction of production prompts designe…

    Submitted 14 March, 2024; originally announced March 2024.

  44. arXiv:2403.09039  [pdf, other]

    cs.LG cs.AI

    Detecting Anomalies in Dynamic Graphs via Memory enhanced Normality

    Authors: Jie Liu, Xuequn Shang, Xiaolin Han, Kai Zheng, Hongzhi Yin

    Abstract: Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes. The conventional approaches that tackle this problem typically employ an unsupervised learning framework, capturing normality patterns with exclusive normal data during training and identifying deviations as anomalies during testing. However, these methods face cri…

    Submitted 14 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  45. arXiv:2402.11769  [pdf, other]

    eess.SY cs.GT math.OC

    Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection

    Authors: Cheng Feng, Kedi Zheng, Lanqing Shan, Hani Alers, Lampros Stergioulas, Hongye Guo, Qixin Chen

    Abstract: Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can re…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE PES Transactions

  46. arXiv:2402.11148  [pdf, other]

    cs.LG cs.CV

    Knowledge Distillation Based on Transformed Teacher Matching

    Authors: Kaixiang Zheng, En-Hui Yang

    Abstract: As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both teacher's logits and student's logits in KD. Motivated by some recent works, in this paper, we drop instead temperature scaling on the student side, and systematically study the resulting varia…

    Submitted 7 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  47. arXiv:2402.10398  [pdf, ps, other]

    cs.SE

    Prompt Learning for Multi-Label Code Smell Detection: A Promising Approach

    Authors: Haiyang Liu, Yang Zhang, Vidya Saikrishna, Quanquan Tian, Kun Zheng

    Abstract: Code smells indicate the potential problems of software quality so that developers can identify refactoring opportunities by detecting code smells. State-of-the-art approaches leverage heuristics, machine learning, and deep learning to detect code smells. However, existing approaches have not fully explored the potential of large language models (LLMs). In this paper, we propose \textit{PromptSmel…

    Submitted 15 February, 2024; originally announced February 2024.

  48. arXiv:2402.09165  [pdf, other]

    cs.LG

    Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has a massive of real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraph an…

    Submitted 14 February, 2024; originally announced February 2024.

  49. arXiv:2402.05154  [pdf, other]

    cs.SI cs.AI

    Adaptive Hypergraph Network for Trust Prediction

    Authors: Rongwei Xu, Guanfeng Liu, Yan Wang, Xuyun Zhang, Kai Zheng, Xiaofang Zhou

    Abstract: Trust plays an essential role in an individual's decision-making. Traditional trust prediction models rely on pairwise correlations to infer potential relationships between users. However, in the real world, interactions between users are usually complicated rather than pairwise only. Hypergraphs offer a flexible approach to modeling these complex high-order correlations (not just pairwise connect…

    Submitted 7 February, 2024; originally announced February 2024.

  50. arXiv:2401.14583  [pdf, other]

    cs.IR

    Physical Trajectory Inference Attack and Defense in Decentralized POI Recommendation

    Authors: Jing Long, Tong Chen, Guanhua Ye, Kai Zheng, Nguyen Quoc Viet Hung, Hongzhi Yin

    Abstract: As an indispensable personalized service within Location-Based Social Networks (LBSNs), the Point-of-Interest (POI) recommendation aims to assist individuals in discovering attractive and engaging places. However, the accurate recommendation capability relies on the powerful server collecting a vast amount of users' historical check-in data, posing significant risks of privacy breaches. Although s…

    Submitted 25 January, 2024; originally announced January 2024.