Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 578 results for author: Ding, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.18375  [pdf, other

    cs.CV

    From Majority to Minority: A Diffusion-based Augmentation for Underrepresented Groups in Skin Lesion Analysis

    Authors: Janet Wang, Yunsung Chung, Zhengming Ding, Jihun Hamm

    Abstract: AI-based diagnoses have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems are prone to under-performing when tested on data from minority groups that lack sufficient representation in the training sets. Although data collection and annotation offer the best means for promoting minority groups, these processes are costly and time-consuming. Prior works h… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.18284  [pdf, other

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.18204  [pdf, other

    cs.NI

    Analysis of Channel Uncertainty in Trusted Wireless Services via Repeated Interactions

    Authors: Bingwen Chen, Xintong Ling, Weihang Cao, Jiaheng Wang, Zhi Ding

    Abstract: The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across vario… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.17764  [pdf, other

    cs.CL cs.AI

    BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning

    Authors: Ercong Nie, Bo Shao, Zifeng Ding, Mingyang Wang, Helmut Schmid, Hinrich Schütze

    Abstract: Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learn… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  5. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  6. arXiv:2406.14556  [pdf, other

    cs.RO cs.CV

    Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

    Authors: Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang, Lijun Zhang, Si Liu

    Abstract: Despite real-time planners exhibiting remarkable performance in autonomous driving, the growing exploration of Large Language Models (LLMs) has opened avenues for enhancing the interpretability and controllability of motion planning. Nevertheless, LLM-based planners continue to encounter significant challenges, including elevated resource consumption and extended inference times, which pose substa… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.11455  [pdf, other

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Yanghua Xiao, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, emerging issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.09881  [pdf, other

    cs.CL

    A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation

    Authors: Yongkang Liu, Ercong Nie, Zheng Hua, Zifeng Ding, Daling Wang, Yifei Zhang, Hinrich Schütze

    Abstract: Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data \textbf{A}ugmentation framework for \textbf{M}ulti-\textbf{D}omain \textbf{D}ialogue \textbf{G}eneration, referred to as \textbf{AMD$^2$G}. The AMD… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 17pages,ECML-PKDD

    Journal ref: 2024 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

  9. arXiv:2406.09606  [pdf, other

    cs.LG cs.AI cs.AR

    Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

    Authors: Zongyue Qin, Yunsheng Bai, Atefeh Sograbizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong

    Abstract: In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures. arXiv admin note: text overlap with arXiv:2305.10838

  10. Practical, Automated Scenario-based Mobile App Testing

    Authors: Shengcheng Yu, Chunrong Fang, Mingzhe Du, Zimin Ding, Zhenyu Chen, Zhendong Su

    Abstract: The importance of mobile application (app) quality insurance is increasing with the rapid development of the mobile Internet. Automated test generation approaches, as a dominant direction of app quality insurance, follow specific models or strategies, targeting at optimizing the code coverage. Such approaches lead to a huge gap between testing execution and app business logic. Test scripts develop… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transaction on Software Engineering in 2024

  11. arXiv:2406.03508  [pdf, other

    cs.LG cs.AI cs.CR

    Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders

    Authors: Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang

    Abstract: Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. The pre-trained encoders by SSL, however, are vulnerable to backdoor attacks as demonstrated by existing studies. Numerous backdoor mitigation techniques are designed for down… ▽ More

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  12. arXiv:2406.02081  [pdf, other

    cs.MA cs.AI cs.LG

    FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

    Authors: Wenzhe Li, Zihan Ding, Seth Karten, Chi Jin

    Abstract: Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent sy… ▽ More

    Submitted 23 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  13. arXiv:2406.01956  [pdf, other

    cs.CV

    Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt

    Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li

    Abstract: This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. Thi… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  14. arXiv:2406.01213  [pdf, other

    cs.CL cs.AI

    Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

    Authors: Zhuojun Ding, Wei Wei, Xiaoye Qu, Dangyang Chen

    Abstract: Cross-lingual named entity recognition (NER) aims to train an NER model for the target language leveraging only labeled source language data and unlabeled target language data. Prior approaches either perform label projection on translated source language data or employ a source model to assign pseudo labels for target language data and train a target model on these pseudo-labeled data to generali… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024

  15. arXiv:2406.00990  [pdf, other

    cs.LG cs.RO

    Constraint-Aware Diffusion Models for Trajectory Optimization

    Authors: Anjian Li, Zihan Ding, Adji Bousso Dieng, Ryne Beeson

    Abstract: The diffusion model has shown success in generating high-quality and diverse solutions to trajectory optimization problems. However, diffusion models with neural networks inevitably make prediction errors, which leads to constraint violations such as unmet goals or collisions. This paper presents a novel constraint-aware diffusion model for trajectory optimization. We introduce a novel hybrid loss… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  16. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  17. arXiv:2405.19027  [pdf, other

    cs.DC

    A Dual-functional Blockchain Framework for Solving Distributed Optimization

    Authors: Weihang Cao, Xintong Ling, Jiaheng Wang, Xiqi Gao, Zhi Ding

    Abstract: Proof of Work (PoW) has been extensively utilized as the foundation of blockchain's security, consistency, and tamper-resistance. However, long has it been criticized for its tremendous and inefficient utilization of computational power and energy. In this work, we design a dual-functional blockchain framework that uses solving optimization problems to reach consensus as an alternative to PoW, cha… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  18. arXiv:2405.17512  [pdf, other

    cs.LG cs.AI cs.CY

    On Fairness of Low-Rank Adaptation of Large Models

    Authors: Zhoujie Ding, Ken Ziyu Liu, Pura Peetathawatchai, Berivan Isik, Sanmi Koyejo

    Abstract: Low-rank adaptation of large models, particularly LoRA, has gained traction due to its computational efficiency. This efficiency, contrasted with the prohibitive costs of full-model fine-tuning, means that practitioners often turn to LoRA and sometimes without a complete understanding of its ramifications. In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.17069  [pdf, other

    cs.CV cs.LG

    Training-free Editioning of Text-to-Image Models

    Authors: Jinqi Wang, Yunfei Fu, Zhangcan Ding, Bailin Deng, Yu-Kun Lai, Yipeng Qin

    Abstract: Inspired by the software industry's practice of offering different editions or versions of a product tailored to specific user groups or use cases, we propose a novel task, namely, training-free editioning, for text-to-image models. Specifically, we aim to create variations of a base text-to-image model without retraining, enabling the model to cater to the diverse needs of different user groups o… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  20. arXiv:2405.17067  [pdf, other

    cs.CL cs.AI

    Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization

    Authors: Dixuan Wang, Yanda Li, Junyuan Jiang, Zepeng Ding, Guochao Jiang, Jiaqing Liang, Deqing Yang

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities in language understanding and generation. Nonetheless, it was also witnessed that LLMs tend to produce inaccurate responses to specific queries. This deficiency can be traced to the tokenization step LLMs must undergo, which is an inevitable limitation inherent to all LLMs. In fact, incorrect tokenization is the critical point that hi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures, this paper is submitted to neurips 2024

  21. arXiv:2405.16283  [pdf, other

    cs.DC

    TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload

    Authors: Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido Botran, Christopher Jermaine, Yuxin Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois

    Abstract: An obvious way to alleviate memory difficulties in GPU-based AI computing is via CPU offload, where data are moved between GPU and CPU RAM, so inexpensive CPU RAM is used to increase the amount of storage available. While CPU offload is an obvious idea, it can greatly slow down a computation, due to the relatively slow transfer rate between CPU RAM and GPU RAM. Thus, any system for CPU offload nee… ▽ More

    Submitted 27 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  22. arXiv:2405.16085  [pdf, other

    cs.CV

    Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration

    Authors: Junjie Gao, Chongjian Wang, Zhongjun Ding, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

    Abstract: In the realm of point cloud registration, the most prevalent pose evaluation approaches are statistics-based, identifying the optimal transformation by maximizing the number of consistent correspondences. However, registration recall decreases significantly when point clouds exhibit a low overlap rate, despite efforts in designing feature descriptors and establishing correspondences. In this paper… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 22 pages, 16 figures

  23. arXiv:2405.15336  [pdf, other

    cs.RO eess.IV

    An iterative closest point algorithm for marker-free 3D shape registration of continuum robots

    Authors: Matthias K. Hoffmann, Julian Mühlenhoff, Zhaoheng Ding, Thomas Sattel, Kathrin Flaßkamp

    Abstract: Continuum robots have emerged as a promising technology in the medical field due to their potential of accessing deep sited locations of the human body with low surgical trauma. When deriving physics-based models for these robots, evaluating the models poses a significant challenge due to the difficulty in accurately measuring their intricate shapes. In this work, we present an optimization based… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures, 2 algorithms, journal

  24. arXiv:2405.12751  [pdf, other

    cs.CR

    A Stealthy Backdoor Attack for Without-Label-Sharing Split Learning

    Authors: Yuwen Pu, Zhuoyuan Ding, Jiahao Chen, Chunyi Zhou, Qingming Li, Chunqiang Hu, Shouling Ji

    Abstract: As a novel privacy-preserving paradigm aimed at reducing client computational costs and achieving data utility, split learning has garnered extensive attention and proliferated widespread applications across various fields, including smart health and smart transportation, among others. While recent studies have primarily concentrated on addressing privacy leakage concerns in split learning, such a… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages

  25. arXiv:2405.10515  [pdf

    cs.LG

    Improved AdaBoost for Virtual Reality Experience Prediction Based on Long Short-Term Memory Network

    Authors: Wenhan Fan, Zhicheng Ding, Ruixin Huang, Chang Zhou, Xuyang Zhang

    Abstract: A classification prediction algorithm based on Long Short-Term Memory Network (LSTM) improved AdaBoost is used to predict virtual reality (VR) user experience. The dataset is randomly divided into training and test sets in the ratio of 7:3.During the training process, the model's loss value decreases from 0.65 to 0.31, which shows that the model gradually reduces the discrepancy between the predic… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  26. arXiv:2405.07977  [pdf, other

    q-bio.QM cs.LG q-bio.NC

    A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

    Authors: Anton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping Wang

    Abstract: Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized re… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 12 pages

  27. arXiv:2405.05512  [pdf, other

    cs.LG cs.AI math.NA math.ST

    Characteristic Learning for Provable One Step Generation

    Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, We estimate the velocity field t… ▽ More

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  28. arXiv:2405.04960  [pdf, other

    cs.CL

    P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models

    Authors: Guochao Jiang, Zepeng Ding, Yuchen Shi, Deqing Yang

    Abstract: In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we… ▽ More

    Submitted 17 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  29. arXiv:2405.03764  [pdf, other

    cs.CL cs.IR

    GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation

    Authors: Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin

    Abstract: Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. For practical deployment, it is critical to carry out knowledge distillation to preserve high performance under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student performance, how does one effe… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  30. arXiv:2404.18196  [pdf, other

    cs.HC

    Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types

    Authors: Zijian Ding

    Abstract: Technological advances continue to redefine the dynamics of human-machine interactions, particularly in task execution. This proposal responds to the advancements in Generative AI by outlining a research plan that probes intent-AI interaction across a diverse set of tasks: fixed-scope content curation task, atomic creative tasks, and complex and interdependent tasks. This exploration aims to infor… ▽ More

    Submitted 2 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  31. arXiv:2404.17123  [pdf

    cs.CL cs.LG

    Text Sentiment Analysis and Classification Based on Bidirectional Gated Recurrent Units (GRUs) Model

    Authors: Wei Xu, Jianlong Chen, Zhicheng Ding, Jinyin Wang

    Abstract: This paper explores the importance of text sentiment analysis and classification in the field of natural language processing, and proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model. The study firstly analyses the word cloud model of the text with six sentiment labels, and then carries out data preprocessing, including the… ▽ More

    Submitted 12 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: accepted by the 2nd International Conference on Software Engineering and Machine Learning (CONF-SEML 2024)

  32. arXiv:2404.14441  [pdf

    cs.CV cs.AI cs.LG eess.IV

    Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding

    Authors: Qunwei Lin, Qian Leng, Zhicheng Ding, Chao Yan, Xiaonan Xu

    Abstract: In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  33. arXiv:2404.13880  [pdf, other

    cs.CV

    Regional Style and Color Transfer

    Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li, Qingtian Gong

    Abstract: This paper presents a novel contribution to the field of regional style transfer. Existing methods often suffer from the drawback of applying style homogeneously across the entire image, leading to stylistic inconsistencies or foreground object twisted when applied to image with foreground elements such as person figures. To address this limitation, we propose a new approach that leverages a segme… ▽ More

    Submitted 26 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 5th International Conference on Computer Vision, Image and Deep Learning

  34. arXiv:2404.13812  [pdf, other

    cs.SI cs.AI

    A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation

    Authors: Qikai Yang, Panfeng Li, Xinhe Xu, Zhicheng Ding, Wenjing Zhou, Yi Nian

    Abstract: In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework of social network advertising… ▽ More

    Submitted 28 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering (MLISE)

  35. arXiv:2404.13565  [pdf, other

    cs.CV cs.AI cs.CL

    Exploring Diverse Methods in Visual Question Answering

    Authors: Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian

    Abstract: This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. Firstly, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more com… ▽ More

    Submitted 20 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  36. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  37. arXiv:2404.09593  [pdf, other

    cs.CL

    Improving Recall of Large Language Models: A Model Collaboration Approach for Relational Triple Extraction

    Authors: Zepeng Ding, Wenhao Huang, Jiaqing Liang, Deqing Yang, Yanghua Xiao

    Abstract: Relation triple extraction, which outputs a set of triples from long sentences, plays a vital role in knowledge acquisition. Large language models can accurately extract triples from simple sentences through few-shot learning or fine-tuning when given appropriate instructions. However, they often miss out when extracting from complex sentences. In this paper, we design an evaluation-filtering fram… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024 main conference

  38. arXiv:2404.09387  [pdf, other

    cs.CV cs.AI cs.LG

    RankCLIP: Ranking-Consistent Language-Image Pretraining

    Authors: Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun

    Abstract: Self-supervised contrastive learning models, such as CLIP, have set new benchmarks for vision-language models in many downstream tasks. However, their dependency on rigid one-to-one mappings overlooks the complex and often multifaceted relationships between and within texts and images. To this end, we introduce RANKCLIP, a novel pretraining method that extends beyond the rigid one-to-one matching… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures, 6 tables. Code and model checkpoints are available at https://github.com/Jam1ezhang/RankCLIP

  39. arXiv:2404.09115  [pdf, other

    cs.CV

    GCC: Generative Calibration Clustering

    Authors: Haifeng Xia, Hai Huang, Zhengming Ding

    Abstract: Deep clustering as an important branch of unsupervised representation learning focuses on embedding semantically similar samples into the identical feature space. This core demand inspires the exploration of contrastive learning and subspace clustering. However, these solutions always rely on the basic assumption that there are sufficient and category-balanced samples for generating valid high-lev… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  40. arXiv:2404.06970  [pdf, other

    cs.CL

    Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning

    Authors: Peipei Liu, Gaosheng Wang, Ying Tong, Jian Liang, Zhenquan Ding, Hongsong Zhu

    Abstract: Few-shot named entity recognition can identify new types of named entities based on a few labeled examples. Previous methods employing token-level or span-level metric learning suffer from the computational burden and a large number of negative sample spans. In this paper, we propose the Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning (MsFNER), which splits the… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  41. arXiv:2404.04012  [pdf, ps, other

    cs.IT eess.SP

    Next Generation Multiple Access for IMT Towards 2030 and Beyond

    Authors: Zhiguo Ding, Robert Schober, Pingzhi Fan, H. Vincent Poor

    Abstract: Multiple access techniques are fundamental to the design of wireless communication systems, since many crucial components of such systems depend on the choice of the multiple access technique. Because of the importance of multiple access, there has been an ongoing quest during the past decade to develop next generation multiple access (NGMA). Among those potential candidates for NGMA, non-orthogon… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  42. arXiv:2404.03885  [pdf, ps, other

    cs.IT cs.DS eess.SP math.ST

    The ESPRIT algorithm under high noise: Optimal error scaling and noisy super-resolution

    Authors: Zhiyan Ding, Ethan N. Epperly, Lin Lin, Ruizhe Zhang

    Abstract: Subspace-based signal processing techniques, such as the Estimation of Signal Parameters via Rotational Invariant Techniques (ESPRIT) algorithm, are popular methods for spectral estimation. These algorithms can achieve the so-called super-resolution scaling under low noise conditions, surpassing the well-known Nyquist limit. However, the performance of these algorithms under high-noise conditions… ▽ More

    Submitted 22 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  43. arXiv:2404.03411  [pdf, ps, other

    cs.LG cs.CL cs.CR

    Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

    Authors: Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproductio… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: technical report

  44. arXiv:2404.01843  [pdf, other

    cs.CV

    Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

    Authors: Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding

    Abstract: Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To ove… ▽ More

    Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  45. arXiv:2404.00144  [pdf, other

    eess.IV cs.CV

    An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis

    Authors: Ziyu Zhou, Anton Orlichenko, Gang Qu, Zening Fu, Vince D Calhoun, Zhengming Ding, Yu-Ping Wang

    Abstract: Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorder. However, combining complementary information from these two modalities is challenging due to their heterogeneity. Many existing methods fall short of capturing the interaction between these modalities, frequently defaulting to a simple combination of latent features. In t… ▽ More

    Submitted 29 March, 2024; originally announced April 2024.

  46. arXiv:2403.16635  [pdf, other

    cs.CV

    V2X-PC: Vehicle-to-everything Collaborative Perception via Point Cluster

    Authors: Si Liu, Zihan Ding, Jiahui Fu, Hongyu Li, Siheng Chen, Shifeng Zhang, Xu Zhou

    Abstract: The objective of the collaborative vehicle-to-everything perception task is to enhance the individual vehicle's perception capability through message communication among neighboring traffic agents. Previous methods focus on achieving optimal performance within bandwidth limitations and typically adopt BEV maps as the basic collaborative message units. However, we demonstrate that collaboration wit… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  47. arXiv:2403.16498  [pdf, ps, other

    cs.IT eess.SP

    BackCom Assisted Hybrid NOMA Uplink Transmission for Ambient IoT

    Authors: Zhiguo Ding

    Abstract: Hybrid non-orthogonal multiple access (H-NOMA) has recently received significant attention as a general framework of multiple access, where both conventional orthogonal multiple access (OMA) and pure NOMA are its special cases. This paper focuses on the application of H-NOMA to ambient Internet of Things (IoT) with energy-constrained devices, where a new backscatter communication (BackCom) assiste… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  48. arXiv:2403.14611  [pdf, other

    cs.CV

    Explorative Inbetweening of Time and Space

    Authors: Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang

    Abstract: We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strateg… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: project page at https://time-reversal.github.io

  49. arXiv:2403.13846  [pdf, other

    cs.LG cs.AI

    A Clustering Method with Graph Maximum Decoding Information

    Authors: Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, Jin Yan, Shan Jiang, Zhiming Ding

    Abstract: The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of rela… ▽ More

    Submitted 18 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages, 9 figures, IJCNN 2024

  50. arXiv:2403.12011  [pdf, other

    cs.CV

    HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

    Authors: Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang

    Abstract: 3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more contr… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://mq-zhang1.github.io/HOIDiffusion