Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 617 results for author: Cao, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.20870  [pdf, other

    cs.CV cs.CG

    Mean of Means: A 10-dollar Solution for Human Localization with Calibration-free and Unconstrained Camera Settings

    Authors: Tianyi Zhang, Wengyu Zhang, Xulu Zhang, Jiaxin Wu, Xiao-Yong Wei, Jiannong Cao, Qing Li

    Abstract: Accurate human localization is crucial for various applications, especially in the Metaverse era. Existing high precision solutions rely on expensive, tag-dependent hardware, while vision-based methods offer a cheaper, tag-free alternative. However, current vision solutions based on stereo vision face limitations due to rigid perspective transformation principles and error propagation in multi-sta… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.17940  [pdf, other

    cs.CL cs.AI

    Positive Text Reframing under Multi-strategy Optimization

    Authors: Shutong Jia, Biwei Cao, Qingqing Gao, Jiuxin Cao, Bo Liu

    Abstract: Differing from sentiment transfer, positive reframing seeks to substitute negative perspectives with positive expressions while preserving the original meaning. With the emergence of pre-trained language models (PLMs), it is possible to achieve acceptable results by fine-tuning PLMs. Nevertheless, generating fluent, diverse and task-constrained reframing text remains a significant challenge. To ta… ▽ More

    Submitted 27 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  3. arXiv:2407.17120  [pdf, other

    cs.LG cs.AI

    Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

    Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

    Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics fo… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  4. arXiv:2407.16670  [pdf, other

    cs.CV cs.CY cs.MM

    FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process

    Authors: Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li

    Abstract: As short-form video-sharing platforms become a significant channel for news consumption, fake news in short videos has emerged as a serious threat in the online information ecosystem, making developing detection methods for this new scenario an urgent need. Compared with that in text and image formats, fake news on short video platforms contains rich but heterogeneous information in various modali… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Will appear at ACM Multimedia 2024 (MM 2024), 13 pages, 15 figures

  5. arXiv:2407.16224  [pdf, other

    cs.CV

    OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

    Authors: Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, Daiheng Gao

    Abstract: Virtual Try-On (VTON) has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. While diffusion models, such as Stable Diffusion series, have shown their capability in creating high-quality and photorealistic images, they… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  6. arXiv:2407.15481  [pdf, other

    cs.CV

    Diverse Image Harmonization

    Authors: Xinhao Tao, Tianyuan Qiu, Junyan Cao, Li Niu

    Abstract: Image harmonization aims to adjust the foreground illumination in a composite image to make it harmonious. The existing harmonization methods can only produce one deterministic result for a composite image, ignoring that a composite image could have multiple plausible harmonization results due to multiple plausible reflectances. In this work, we first propose a reflectance-guided harmonization net… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  7. arXiv:2407.15464  [pdf, other

    cs.LG cs.DC

    The Diversity Bonus: Learning from Dissimilar Distributed Clients in Personalized Federated Learning

    Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang, Xiaotian Li, Jiannong Cao

    Abstract: Personalized Federated Learning (PFL) is a commonly used framework that allows clients to collaboratively train their personalized models. PFL is particularly useful for handling situations where data from different clients are not independent and identically distributed (non-IID). Previous research in PFL implicitly assumes that clients can gain more benefits from those with similar data distribu… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  8. arXiv:2407.14829  [pdf, other

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu , et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data… ▽ More

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  9. arXiv:2407.13205  [pdf, ps, other

    cs.CL

    Transformer-based Single-Cell Language Model: A Survey

    Authors: Wei Lan, Guohang He, Mingyang Liu, Qingfeng Chen, Junyue Cao, Wei Peng

    Abstract: The transformers have achieved significant accomplishments in the natural language processing as its outstanding parallel processing capabilities and highly flexible attention mechanism. In addition, increasing studies based on transformers have been proposed to model single-cell data. In this review, we attempt to systematically summarize the single-cell language models and applications based on… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  10. arXiv:2407.12421  [pdf, other

    cs.LG cs.AI

    SafePowerGraph: Safety-aware Evaluation of Graph Neural Networks for Transmission Power Grids

    Authors: Salah Ghamizi, Aleksandar Bojchevski, Aoxiang Ma, Jun Cao

    Abstract: Power grids are critical infrastructures of paramount importance to modern society and their rapid evolution and interconnections has heightened the complexity of power systems (PS) operations. Traditional methods for grid analysis struggle with the computational demands of large-scale RES and ES integration, prompting the adoption of machine learning (ML) techniques, particularly Graph Neural Net… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.12248  [pdf, other

    cs.DC

    Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

    Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we ob… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.11385  [pdf, other

    cs.RO cs.GR

    Grasping Diverse Objects with Simulated Humanoids

    Authors: Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

    Abstract: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. T… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Project page: https://www.zhengyiluo.com/Omnigrasp/

  13. arXiv:2407.10486  [pdf, other

    cs.AI cs.CL

    IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

    Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. With the advent of large language models (LLMs), shows their impressive capability of textual understanding through large-scale pretraining, which implies the great potential of extractive snippet generation. In this paper, we systematically i… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.08136  [pdf, other

    cs.CV

    EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

    Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma

    Abstract: The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos, while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  15. arXiv:2407.03937  [pdf, other

    cs.CL

    TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

    Authors: Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin

    Abstract: Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowle… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  16. arXiv:2407.02759  [pdf

    cs.LG cs.AI

    Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

    Authors: Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

    Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a sh… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation IEEE (ISBN: 979-8-3503-6617-4)

  17. arXiv:2407.00187  [pdf, other

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics

  18. arXiv:2406.19434  [pdf, other

    cs.GR cs.AI

    Lightweight Predictive 3D Gaussian Splats

    Authors: Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

    Abstract: Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space.… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Page: https://plumpuddings.github.io/LPGS//

  19. arXiv:2406.18069  [pdf, other

    eess.SP cs.AI cs.CL

    Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

    Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

    Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press… ▽ More

    Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  20. arXiv:2406.17624  [pdf, other

    cs.CL cs.AI

    Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

    Authors: Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, Shuaiqi Liu

    Abstract: As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across differen… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  21. arXiv:2406.17309  [pdf, other

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

  22. arXiv:2406.15781  [pdf, other

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  23. arXiv:2406.15769  [pdf, other

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  24. arXiv:2406.14644  [pdf, other

    cs.CL

    Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

    Authors: Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan

    Abstract: Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contamination--has been the focus of significant recent research. This body of work aims to identify contamination, understand its impacts, and explore mitig… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Camera-Ready Version

  25. arXiv:2406.14558  [pdf, other

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  26. arXiv:2406.13201  [pdf, other

    cs.LG cs.SI

    Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach

    Authors: Yicong Li, Yu Yang, Jiannong Cao, Shuaiqi Liu, Haoran Tang, Guandong Xu

    Abstract: Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.12902  [pdf, other

    cs.LG cs.AI cs.PL cs.SE

    Can AI Beat Undergraduates in Entry-level Java Assignments? Benchmarking Large Language Models on JavaBench

    Authors: Jialun Cao, Zhiyong Chen, Jiarong Wu, Shing-chi Cheung, Chang Xu

    Abstract: Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  28. arXiv:2406.12200  [pdf, other

    cs.LG cs.DC cs.ET cs.MM cs.NE

    SFedCA: Credit Assignment-Based Active Client Selection Strategy for Spiking Federated Learning

    Authors: Qiugang Zhan, Jinbo Cao, Xiurui Xie, Malu Zhang, Huajin Tang, Guisong Liu

    Abstract: Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 9 pages

  29. arXiv:2406.11517  [pdf, other

    cs.LG cs.AI

    Revisiting Spurious Correlation in Domain Generalization

    Authors: Bin Qin, Jiangmeng Li, Yi Li, Xuesong Wu, Yupeng Wang, Wenwen Qiang, Jianwen Cao

    Abstract: Without loss of generality, existing machine learning techniques may learn spurious correlation dependent on the domain, which exacerbates the generalization of models in out-of-distribution (OOD) scenarios. To address this issue, recent works build a structural causal model (SCM) to describe the causality within data generation process, thereby motivating methods to avoid the learning of spurious… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  30. arXiv:2406.11501  [pdf, other

    cs.LG cs.AI stat.ME

    Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality

    Authors: Jiangmeng Li, Bin Qin, Qirui Ji, Yi Li, Wenwen Qiang, Jianwen Cao, Fanjiang Xu

    Abstract: Leveraging the development of structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionism causal analysis encounters theoretical adaptation limitations. Accordingly, cross-world counterfactual approach extends our understanding… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11087  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    MemDPT: Differential Privacy for Memory Efficient Language Models

    Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages first version

  32. arXiv:2406.10744  [pdf, other

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  33. arXiv:2406.10424  [pdf, other

    cs.CV cs.AI

    What is the Visual Cognition Gap between Humans and Multimodal LLMs?

    Authors: Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, the appendix will be updated soon

    MSC Class: 68T01

  34. arXiv:2406.10239  [pdf

    cs.IR cs.LG

    Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising

    Authors: Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng

    Abstract: This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to tradi… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

  35. arXiv:2406.09828  [pdf, other

    cs.RO

    Dynamic Decentralized 3D Urban Coverage and Patrol with UAVs

    Authors: Wai Lun Leong, Jiawei Cao, Rodney Teo

    Abstract: In the event of natural or man-made disasters in an urban environment, such as fires, floods, and earthquakes, a swarm of unmanned aerial vehicles (UAVs) can rapidly sweep and provide coverage to monitor the area of interest and locate survivors. We propose a modular framework and patrol strategy that enables a swarm of UAVs to perform cooperative and periodic coverage in such scenarios. Our appro… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted for the 2024 International Conference on Unmanned Aircraft Systems (ICUAS 2024) in Chania, Greece

  36. arXiv:2406.09779  [pdf, other

    cs.AI cs.CL cs.CV

    OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

    Authors: Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

    Abstract: Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Langu… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2406.07944  [pdf, other

    cs.SE cs.AI

    DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis

    Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung

    Abstract: Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel dif… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    ACM Class: D.2.5; I.2.5

  38. arXiv:2406.07472  [pdf, other

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.05572  [pdf, other

    cs.RO cs.AI

    Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction

    Authors: Aidan Curtis, Nishanth Kumar, Jing Cao, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constra… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  40. arXiv:2406.04844  [pdf, other

    cs.CV

    Multi-Granularity Language-Guided Multi-Object Tracking

    Authors: Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

    Abstract: Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as o… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  41. arXiv:2406.04658  [pdf, other

    cs.CR cs.AI cs.LG

    Advanced Payment Security System:XGBoost, LightGBM and SMOTE Integrated

    Authors: Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin

    Abstract: With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically based on XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model. To enhance data reliability, we meticulously processed the data sources a… ▽ More

    Submitted 26 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: This paper is received by https://ieee-metacom.org

  42. arXiv:2406.04466  [pdf, other

    cs.HC

    Dog Heart Rate and Blood Oxygen Metaverse Interaction System

    Authors: Yanhui Jiang, Jin Cao, Chang Yu

    Abstract: This study developed an improved dog heart rate and blood oxygen sensor system using Arduino. Traditional methods face accuracy and reliability issues. Our system integrates advanced computational techniques with hardware-based sensing to enhance measurement precision. An Arduino microcontroller connected to a heart rate and blood oxygen sensor collects raw data, which is preprocessed and filtered… ▽ More

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures, conference for IEEE metacom accepted (https://ieee-metacom.org/)

  43. arXiv:2406.04465  [pdf, other

    cs.HC

    Rough Set improved Therapy-Based Metaverse Assisting System

    Authors: Jin Cao, Yanhui Jiang, Chang Yu, Feiwei Qin, Zekun Jiang

    Abstract: Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Therapy Theory (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and opti… ▽ More

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, conference for IEEE metacom accepted (https://ieee-metacom.org/)

  44. arXiv:2406.04333  [pdf, other

    cs.CV

    BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

    Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this wor… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/BitsFusion

  45. arXiv:2406.04324  [pdf, other

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  46. arXiv:2406.03807  [pdf, other

    cs.AI cs.CL cs.RO

    Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

    Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 46pages first version

  47. arXiv:2406.03733  [pdf, other

    cs.LG cs.AI

    Credit Card Fraud Detection Using Advanced Transformer Model

    Authors: Chang Yu, Yongshun Xu, Jin Cao, Ye Zhang, Yinxin Jin, Mengran Zhu

    Abstract: With the proliferation of various online and mobile payment systems, credit card fraud has emerged as a significant threat to financial security. This study focuses on innovative applications of the latest Transformer models for more robust and precise fraud detection. To ensure the reliability of the data, we meticulously processed the data sources, balancing the dataset to address the issue of d… ▽ More

    Submitted 26 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This paper have been received by https://ieee-metacom.org/

  48. arXiv:2406.02013  [pdf, other

    cs.LG

    Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

    Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

    Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretica… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  49. arXiv:2406.01125  [pdf, other

    cs.CV

    $Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

    Authors: Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

    Abstract: Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful results achieved by diffusion transformers (DiT), there is still a lack of exploration regarding the impact of DiT structure on generation, as well as the absence of… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 6 tables

  50. arXiv:2406.00490  [pdf, other

    cs.CV cs.AI

    Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology

    Authors: Jingyu Zhang, Jin Cao, Jinghao Chang, Xinjin Li, Houze Liu, Zhenglin Li

    Abstract: This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time… ▽ More

    Submitted 3 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.