
Showing 1–50 of 2,754 results for author: Yang, J

Searching in archive cs.
  1. arXiv:2409.12727  [pdf, ps, other]

    cs.SC

    A Generalization of Habicht's Theorem for Subresultants of Several Univariate Polynomials

    Authors: Hoon Hong, Jiaqi Meng, Jing Yang

    Abstract: Subresultants of two univariate polynomials are one of the most classic and ubiquitous objects in computational algebra and algebraic geometry. In 1948, Habicht discovered and proved interesting relationships among subresultants. Those relationships were found to be useful for both structural understanding and efficient computation. Often one needs to consider several (possibly more than two) poly…

    Submitted 19 September, 2024; originally announced September 2024.

  2. arXiv:2409.12186  [pdf, other]

    cs.CL

    Qwen2.5-Coder Technical Report

    Authors: Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin

    Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes two models: Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B. As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and continues pretrained on a vast corpus of over 5.5 trillion tokens. Through meticulous data cleaning, scalable synthetic data genera…

    Submitted 18 September, 2024; originally announced September 2024.

  3. arXiv:2409.11744  [pdf, other]

    cs.CV cs.AI cs.HC

    Exploring Gaze Pattern in Autistic Children: Clustering, Visualization, and Prediction

    Authors: Weiyan Shi, Haihong Zhang, Jin Yang, Ruiqing Ding, YongWei Zhu, Kenny Tsu Wei Choo

    Abstract: Autism Spectrum Disorder (ASD) significantly affects the social and communication abilities of children, and eye-tracking is commonly used as a diagnostic tool by identifying associated atypical gaze patterns. Traditional methods demand manual identification of Areas of Interest in gaze patterns, lowering the performance of gaze behavior analysis in ASD subjects. To tackle this limitation, we prop…

    Submitted 18 September, 2024; originally announced September 2024.

  4. arXiv:2409.10824  [pdf, other]

    cs.RO

    Evaluating and Improving the Robustness of LiDAR-based Localization and Mapping

    Authors: Bo Yang, Tri Minh Triet Pham, Jinqiu Yang

    Abstract: LiDAR is one of the most commonly adopted sensors for simultaneous localization and mapping (SLAM) and map-based global localization. SLAM and map-based localization are crucial for the independent operation of autonomous systems, especially when external signals such as GNSS are unavailable or unreliable. While state-of-the-art (SOTA) LiDAR SLAM systems could achieve 0.5% (i.e., 0.5m per 100m) of…

    Submitted 16 September, 2024; originally announced September 2024.

  5. arXiv:2409.10587  [pdf, other]

    cs.CV

    SoccerNet 2024 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, et al. (59 additional authors not shown)

    Abstract: The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely loca…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 1 figure

  6. arXiv:2409.10094  [pdf, other]

    cs.CV cs.LG

    DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection

    Authors: Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang

    Abstract: Out-of-Distribution (OoD) detection determines whether the given samples are from the training distribution of the classifier-under-protection, i.e., the In-Distribution (InD), or from a different OoD. Latest researches introduce diffusion models pre-trained on InD data to advocate OoD detection by transferring an OoD image into a generated one that is close to InD, so that one could capture the d…

    Submitted 16 September, 2024; originally announced September 2024.

  7. arXiv:2409.10022  [pdf, ps, other]

    cs.DS

    Entrywise Approximate Laplacian Solving

    Authors: Jingbang Chen, Mehrdad Ghadiri, Hoai-An Nguyen, Richard Peng, Junzhao Yang

    Abstract: We study the escape probability problem in random walks over graphs. Given vertices, $s,t,$ and $p$, the problem asks for the probability that a random walk starting at $s$ will hit $t$ before hitting $p$. Such probabilities can be exponentially small even for unweighted undirected graphs with polynomial mixing time. Therefore current approaches, which are mostly based on fixed-point arithmetic, r…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 22 pages

  8. arXiv:2409.09588  [pdf, other]

    cs.CV

    GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection

    Authors: Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo

    Abstract: Recently, biological perception has been a powerful tool for handling the camouflaged object detection (COD) task. However, most existing methods are heavily dependent on the local spatial information of diverse scales from convolutional operations to optimize initial features. A commonly neglected point in these methods is the long-range dependencies between feature pixels from different scale sp…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Accepted at TNNLS 2024

  9. arXiv:2409.09350  [pdf, other]

    cs.CV

    OPUS: Occupancy Prediction Using a Sparse Set

    Authors: Jiabao Wang, Zhaojiang Liu, Qiang Meng, Liujiang Yan, Ke Wang, Jie Yang, Wei Liu, Qibin Hou, Ming-Ming Cheng

    Abstract: Occupancy prediction, aiming at predicting the occupancy status within voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection on sample data reveals that the vast majority of voxels is unoccupied. Perform…

    Submitted 14 September, 2024; originally announced September 2024.

  10. arXiv:2409.08905  [pdf, other]

    eess.IV cs.CV

    D2-MLP: Dynamic Decomposed MLP Mixer for Medical Image Segmentation

    Authors: Jin Yang, Xiaobing Yu, Peijie Qiu

    Abstract: Convolutional neural networks are widely used in various segmentation tasks in medical images. However, they are challenged to learn global features adaptively due to the inherent locality of convolutional operations. In contrast, MLP Mixers are proposed as a backbone to learn global information across channels with low complexity. However, they cannot capture spatial features efficiently. Additio…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 5 pages, 2 figures

  11. arXiv:2409.08609  [pdf, other]

    cs.LG

    Optimizing Item-based Marketing Promotion Efficiency in C2C Marketplace with Dynamic Sequential Coupon Allocation Framework

    Authors: Jie Yang, Padunna Valappil Krishnaraj Sekhar, Sho Sekine, Yilin Li

    Abstract: In e-commerce platforms, coupons play a crucial role in boosting transactions. In the customer-to-customer (C2C) marketplace, ensuring the satisfaction of both buyers and sellers is essential. While buyer-focused marketing strategies often receive more attention, addressing the needs of sellers is equally important. Additionally, the existing strategies tend to optimize each promotion independentl…

    Submitted 13 September, 2024; originally announced September 2024.

    Journal ref: ACM SIGKDD 3rd Workshop on End-to-End Customer Journey Optimization, 2024

  12. arXiv:2409.07972  [pdf, other]

    cs.CV

    Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

    Authors: Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang

    Abstract: The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, the current grid representing features within a…

    Submitted 12 September, 2024; originally announced September 2024.

  13. arXiv:2409.07464  [pdf, other]

    cs.CV cs.AI

    Reflective Human-Machine Co-adaptation for Enhanced Text-to-Image Generation Dialogue System

    Authors: Yuheng Feng, Yangfan He, Yinghui Xia, Tianyu Shi, Jun Wang, Jinsong Yang

    Abstract: Today's image generation systems are capable of producing realistic and high-quality images. However, user prompts often contain ambiguities, making it difficult for these systems to interpret users' potential intentions. Consequently, machines need to interact with users multiple rounds to better understand users' intents. The unpredictable costs of using or learning image generation models throu…

    Submitted 27 August, 2024; originally announced September 2024.

  14. arXiv:2409.07406  [pdf, other]

    cs.HC

    Trust Dynamics in Human-Autonomy Interaction: Uncover Associations between Trust Dynamics and Personal Characteristics

    Authors: Hyesun Chung, X. Jessie Yang

    Abstract: While personal characteristics influence people's snapshot trust towards autonomous systems, their relationships with trust dynamics remain poorly understood. We conducted a human-subject experiment with 130 participants performing a simulated surveillance task aided by an automated threat detector. A comprehensive pre-experimental survey collected data on participants' personal characteristics ac…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2409.07268  [pdf, other]

    cs.LG

    Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

    Authors: Ziang Liu, Junjie Xu, Xingjiao Wu, Jing Yang, Liang He

    Abstract: Preference-Based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily from explicit preferences, neglecting the possibility that teachers may choose equal preferences. This neglect may hinder the understanding of the agent regarding…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 7 pages, 6 figures

  16. arXiv:2409.07067  [pdf, other]

    cs.CV

    Edge Modeling Activation Free Fourier Network for Spacecraft Image Denoising

    Authors: Jingfan Yang, Hu Gao, Ying Zhang, Bowen Ma, Depeng Dang

    Abstract: Spacecraft image denoising is a crucial basic technology closely related to aerospace research. However, the existing deep learning-based image denoising methods lack deep consideration of the characteristics of spacecraft image. To address the aforementioned shortcomings, we analyses spacecraft noise image and identifies two main characteristics. One is that there are a large number of low-light…

    Submitted 11 September, 2024; originally announced September 2024.

  17. arXiv:2409.06851  [pdf, other]

    cs.CV cs.AI

    LIME-M: Less Is More for Evaluation of MLLMs

    Authors: Kang Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shuyue Guo, Tianyu Zheng, Bo Li, Haoning Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

    Abstract: With the remarkable success achieved by Multimodal Large Language Models (MLLMs), numerous benchmarks have been designed to assess MLLMs' ability to guide their development in image perception tasks (e.g., image captioning and visual question answering). However, the existence of numerous benchmarks results in a substantial computational burden when evaluating model performance across all of them.…

    Submitted 10 September, 2024; originally announced September 2024.

  18. arXiv:2409.06644  [pdf]

    cs.CV cs.AI

    EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis

    Authors: Danli Shi, Weiyi Zhang, Jiancheng Yang, Siyu Huang, Xiaolan Chen, Mayinuer Yusufu, Kai Jin, Shan Lin, Shunming Liu, Qing Zhang, Mingguang He

    Abstract: Early detection of eye diseases like glaucoma, macular degeneration, and diabetic retinopathy is crucial for preventing vision loss. While artificial intelligence (AI) foundation models hold significant promise for addressing these challenges, existing ophthalmic foundation models primarily focus on a single modality, whereas diagnosing eye diseases requires multiple modalities. A critical yet oft…

    Submitted 11 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

  19. arXiv:2409.06030  [pdf, other]

    cs.GR cs.CV

    NESI: Shape Representation via Neural Explicit Surface Intersection

    Authors: Congyi Zhang, Jinfan Yang, Eric Hedlin, Suzuran Takikawa, Nicholas Vining, Kwang Moo Yi, Wenping Wang, Alla Sheffer

    Abstract: Compressed representations of 3D shapes that are compact, accurate, and can be processed efficiently directly in compressed form, are extremely useful for digital media applications. Recent approaches in this space focus on learned implicit or parametric representations. While implicits are well suited for tasks such as in-out queries, they lack natural 2D parameterization, complicating tasks such…

    Submitted 9 September, 2024; originally announced September 2024.

  20. arXiv:2409.05864  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Neural MP: A Generalist Neural Motion Planner

    Authors: Murtaza Dalal, Jiahui Yang, Russell Mendonca, Youssef Khaky, Ruslan Salakhutdinov, Deepak Pathak

    Abstract: The current paradigm for motion planning generates solutions from scratch for every new problem, which consumes significant amounts of time and computational resources. For complex, cluttered scenes, motion planning approaches can often take minutes to produce a solution, while humans are able to accurately and safely reach any goal in seconds by leveraging their prior experience. We seek to do th…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Website at mihdalal.github.io/neuralmotionplanner. Main paper: 7 pages, 4 figures, 2 tables. Appendix: 9 pages, 5 figures, 6 tables

  21. arXiv:2409.05393  [pdf, other]

    cs.CV

    TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation

    Authors: Jiaqi Yang, Ye Huang, Xiangjian He, Linlin Shen, Guoping Qiu

    Abstract: Under the backdrop of large-scale pre-training, large visual models (LVM) have demonstrated significant potential in image understanding. The recent emergence of the Segment Anything Model (SAM) has brought a qualitative shift in the field of image segmentation, supporting flexible interactive cues and strong learning capabilities. However, its performance often falls short in cross-domain and few…

    Submitted 9 September, 2024; originally announced September 2024.

  22. arXiv:2409.05166  [pdf, other]

    cs.CV

    CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes

    Authors: Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu

    Abstract: We present CD-NGP, which is a fast and scalable representation for 3D reconstruction and novel view synthesis in dynamic scenes. Inspired by continual learning, our method first segments input videos into multiple chunks, followed by training the model chunk by chunk, and finally, fuses features of the first branch and subsequent branches. Experiments on the prevailing DyNeRF dataset demonstrate t…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: 23 pages, full version

  23. arXiv:2409.04597  [pdf, other]

    cs.SE cs.LG cs.PL

    Detecting Buggy Contracts via Smart Testing

    Authors: Sally Junsong Wang, Jianan Yao, Kexin Pei, Hidedaki Takahashi, Junfeng Yang

    Abstract: Smart contracts are susceptible to critical vulnerabilities. Hybrid dynamic analyses, such as concolic execution assisted fuzzing and foundation model assisted fuzzing, have emerged as highly effective testing techniques for smart contract bug detection recently. This hybrid approach has shown initial promise in real-world benchmarks, but it still suffers from low scalability to find deep bugs bur…

    Submitted 6 September, 2024; originally announced September 2024.

  24. arXiv:2409.04585  [pdf, other]

    cs.LG cs.AI cs.DC

    CubicML: Automated ML for Distributed ML Systems Co-design with ML Prediction of Performance

    Authors: Wei Wen, Quanyu Zhu, Weiwei Chu, Wen-Yen Chen, Jiyan Yang

    Abstract: Scaling up deep learning models has been proven effective to improve intelligence of machine learning (ML) models, especially for industry recommendation models and large language models. The co-design of distributed ML systems and algorithms (to maximize training performance) plays a pivotal role for its success. As it scales, the number of co-design hyper-parameters grows rapidly which brings ch…

    Submitted 6 September, 2024; originally announced September 2024.

  25. arXiv:2409.04504  [pdf, other]

    cs.CR

    Comment on Revisiting Neural Program Smoothing for Fuzzing

    Authors: Dongdong She, Kexin Pei, Junfeng Yang, Baishakhi Ray, Suman Jana

    Abstract: MLFuzz, a work accepted at ACM FSE 2023, revisits the performance of a machine learning-based fuzzer, NEUZZ. We demonstrate that its main conclusion is entirely wrong due to several fatal bugs in the implementation and wrong evaluation setups, including an initialization bug in persistent mode, a program crash, an error in training dataset collection, and a mistake in fuzzing result collection. Ad…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Comment on 10.1145/3611643.3616308

  26. arXiv:2409.03597  [pdf, other]

    cs.SD cs.AI eess.AS

    Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis

    Authors: Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Faya Liang, Ming Li

    Abstract: This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements.…

    Submitted 5 September, 2024; originally announced September 2024.

  27. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures

  28. arXiv:2409.02634  [pdf, other]

    cs.CV

    Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

    Authors: Jianwen Jiang, Chao Liang, Jiaqi Yang, Gaojie Lin, Tianyun Zhong, Yanbo Zheng

    Abstract: With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals to stabilize movements, which may compromise t…

    Submitted 5 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: Homepage: https://loopyavatar.github.io/

  29. arXiv:2409.02616  [pdf, other]

    cs.IT

    Group Information Geometry Approach for Ultra-Massive MIMO Signal Detection

    Authors: Jiyuan Yang, Yan Chen, Xiqi Gao, Xiang-Gen Xia, Dirk Slock

    Abstract: We propose a group information geometry approach (GIGA) for ultra-massive multiple-input multiple-output (MIMO) signal detection. The signal detection task is framed as computing the approximate marginals of the a posteriori distribution of the transmitted data symbols of all users. With the approximate marginals, we perform the maximization of the a posteriori marginals (MPM) detection…

    Submitted 4 September, 2024; originally announced September 2024.

  30. arXiv:2409.02581  [pdf, other]

    cs.CV

    Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

    Authors: Luqing Luo, Shichu Sun, Jiangang Yang, Linfang Zheng, Jinwei Du, Jian Liu

    Abstract: Monocular object pose estimation, as a pivotal task in computer vision and robotics, heavily depends on accurate 2D-3D correspondences, which often demand costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, among which recent advancements in 3D Gaussian Splatting (3DGS) afford a compelling potential. Yet its performance still suffers and tend…

    Submitted 4 September, 2024; originally announced September 2024.

  31. arXiv:2409.01944  [pdf, other]

    cs.CL

    FuzzCoder: Byte-level Fuzzing Test via Large Language Model

    Authors: Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng ChaI, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

    Abstract: Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem and the best approaches often apply uniform random mutations to p…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 11 pages

  32. arXiv:2409.01876  [pdf, other]

    cs.CV cs.AI

    CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention

    Authors: Gaojie Lin, Jianwen Jiang, Chao Liang, Tianyun Zhong, Jiaqi Yang, Yanbo Zheng

    Abstract: Diffusion-based video generation technology has advanced significantly, catalyzing a proliferation of research in human animation. However, the majority of these studies are confined to same-modality driving settings, with cross-modality human body animation remaining relatively underexplored. In this paper, we introduce, an end-to-end audio-driven human animation framework that ensures hand integ…

    Submitted 4 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: Homepage: https://cyberhost.github.io/

  33. arXiv:2409.01686  [pdf, other]

    cs.CV

    Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

    Authors: Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo

    Abstract: Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, bu…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted at ECCV 2024

  34. arXiv:2409.01661  [pdf, other]

    cs.CR cs.CV cs.LG

    $S^2$NeRF: Privacy-preserving Training Framework for NeRF

    Authors: Bokang Zhang, Yanglin Zhang, Zhikun Zhang, Jinglan Yang, Lingying Huang, Junfeng Wu

    Abstract: Neural Radiance Fields (NeRF) have revolutionized 3D computer vision and graphics, facilitating novel view synthesis and influencing sectors like extended reality and e-commerce. However, NeRF's dependence on extensive data collection, including sensitive scene image data, introduces significant privacy risks when users upload this data for model training. To address this concern, we first propose…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  35. arXiv:2409.01607  [pdf, other]

    cs.LG math.OC

    Data-driven topology design based on principal component analysis for 3D structural design problems

    Authors: Jun Yang, Kentaro Yaji, Shintaro Yamasaki

    Abstract: Topology optimization is a structural design methodology widely utilized to address engineering challenges. However, sensitivity-based topology optimization methods struggle to solve optimization problems characterized by strong non-linearity. Leveraging the sensitivity-free nature and high capacity of deep generative models, data-driven topology design (DDTD) methodology is considered an effectiv…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 19 pages, 18 figures

  36. arXiv:2409.01566  [pdf, other]

    cs.IT eess.SP

    Exploring Hannan Limitation for 3D Antenna Array

    Authors: Ran Ji, Chongwen Huang, Xiaoming Chen, Wei E. I. Sha, Zhaoyang Zhang, Jun Yang, Kun Yang, Chau Yuen, Mérouane Debbah

    Abstract: Hannan Limitation successfully links the directivity characteristics of 2D arrays with the aperture gain limit, providing the radiation efficiency upper limit for large 2D planar antenna arrays. This demonstrates the inevitable radiation efficiency degradation caused by mutual coupling effects between array elements. However, this limitation is derived based on the assumption of infinitely large 2…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 16 figures

  37. arXiv:2409.01251  [pdf, ps, other]

    cs.LG cs.DC

    GAS: Generative Activation-Aided Asynchronous Split Federated Learning

    Authors: Jiarong Yang, Yuan Liu

    Abstract: Split Federated Learning (SFL) splits and collaboratively trains a shared model between clients and server, where clients transmit activations and client-side models to server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to server. However, due to significant variations in computational and communication capabilities among clien…

    Submitted 2 September, 2024; originally announced September 2024.

  38. arXiv:2409.00872  [pdf, other]

    cs.CL

    Self-evolving Agents with reflective and memory-augmented abilities

    Authors: Xuechen Liang, Meiling Tao, Yinghui Xia, Tianyu Shi, Jun Wang, JingSong Yang

    Abstract: Large language models (LLMs) have made significant advances in the field of natural language processing, but they still face challenges such as continuous decision-making. In this research, we propose a novel framework by integrating iterative feedback, reflective mechanisms, and a memory optimization mechanism based on the Ebbinghaus forgetting curve, it significantly enhances the agents' capabil…

    Submitted 1 September, 2024; originally announced September 2024.

  39. arXiv:2409.00855  [pdf, other]

    cs.CL stat.ML

    LanguaShrink: Reducing Token Overhead with Psycholinguistics

    Authors: Xuechen Liang, Meiling Tao, Yinghui Xia, Tianyu Shi, Jun Wang, JingSong Yang

    Abstract: As large language models (LLMs) improve their capabilities in handling complex tasks, the issues of computational cost and efficiency due to long prompts are becoming increasingly prominent. To accelerate model inference and reduce costs, we propose an innovative prompt compression framework called LanguaShrink. Inspired by the observation that LLM performance depends on the density and position o…

    Submitted 1 September, 2024; originally announced September 2024.

  40. arXiv:2409.00591  [pdf, other]

    cs.CV

    Attention-Guided Multi-scale Interaction Network for Face Super-Resolution

    Authors: Xujie Wan, Wenjie Li, Guangwei Gao, Huimin Lu, Jian Yang, Chia-Wen Lin

    Abstract: Recently, CNN and Transformer hybrid networks demonstrated excellent performance in face super-resolution (FSR) tasks. Since numerous features at different scales in hybrid networks, how to fuse these multi-scale features and promote their complementarity is crucial for enhancing FSR. However, existing hybrid network-based FSR methods ignore this, only simply combining the Transformer and CNN. To…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 12 pages, 8 figures, 8 tables

  41. arXiv:2409.00084  [pdf]

    cs.CL cs.AI

    Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models

    Authors: Seyed Amir Ahmad Safavi-Naini, Shuhaib Ali, Omer Shahab, Zahra Shahhoseini, Thomas Savage, Sara Rafiee, Jamil S Samaan, Reem Al Shabeeb, Farah Ladak, Jamie O Yang, Juan Echavarria, Sumbal Babar, Aasma Shaukat, Samuel Margolis, Nicholas P Tatonetti, Girish Nadkarni, Bara El Kurdi, Ali Soroush

    Abstract: Background and Aims: This study evaluates the medical reasoning performance of large language models (LLMs) and vision language models (VLMs) in gastroenterology. Methods: We used 300 gastroenterology board exam-style multiple-choice questions, 138 of which contain images to systematically assess the impact of model configurations and parameters and prompt engineering strategies utilizing GPT-3.…

    Submitted 4 September, 2024; v1 submitted 25 August, 2024; originally announced September 2024.

    Comments: Manuscript Pages: 34, Figures: 7, Tables: 2, Supplementary File Pages: 35, Data Transparency Statement: Code is available at: https://github.com/Sdamirsa/LLM-VLM-in-Gastroenterology . Study data from American College of Gastroenterology (ACG) are restricted and available upon request with ACG permission. Correction: updated abstract considering Llama3.1 results

    MSC Class: 92C50; 68T50 ACM Class: J.3

  42. arXiv:2408.16760  [pdf, other]

    cs.CV

    OmniRe: Omni Urban Scene Reconstruction

    Authors: Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang

    Abstract: We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipelin…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: See the project page for code, video results and demos: https://ziyc.github.io/omnire/

  43. arXiv:2408.16469  [pdf, other]

    cs.CV

    Multi-source Domain Adaptation for Panoramic Semantic Segmentation

    Authors: Jing Jiang, Sicheng Zhao, Jiankun Zhu, Wenbo Tang, Zhaopan Xu, Jidong Yang, Pengfei Xu, Hongxun Yao

    Abstract: Panoramic semantic segmentation has received widespread attention recently due to its comprehensive 360° field of view. However, labeling such images demands greater resources compared to pinhole images. As a result, many unsupervised domain adaptation methods for panoramic semantic segmentation have emerged, utilizing real pinhole images or low-cost synthetic panoramic images. But, the segm…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures, 5 tables

  44. arXiv:2408.15915  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

    Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yuchen Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

    Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) mode…

    Submitted 7 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 29 pages, 12 tables, 10 figures

  45. arXiv:2408.15217  [pdf, other]

    eess.IV cs.AI cs.CV

    Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

    Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

    Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and less accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

  46. arXiv:2408.14423  [pdf, other]

    eess.AS cs.SD

    DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance

    Authors: Jinhyeok Yang, Junhyeok Lee, Hyeong-Seok Choi, Seunghun Ji, Hyeongju Kim, Juheon Lee

    Abstract: Text-to-Speech (TTS) models have advanced significantly, aiming to accurately replicate human speech's diversity, including unique speaker identities and linguistic nuances. Despite these advancements, achieving an optimal balance between speaker-fidelity and text-intelligibility remains a challenge, particularly when diverse control demands are considered. Addressing this, we introduce DualSpeech…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: Accepted to INTERSPEECH 2024

  47. arXiv:2408.14087  [pdf, other]

    cs.CV

    LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

    Authors: Zhongwen Yu, Qiu Guan, Jianmin Yang, Zhiqiang Yang, Qianwei Zhou, Yang Chen, Feng Chen

    Abstract: In existing medical Region of Interest (ROI) detection, there lacks an algorithm that can simultaneously satisfy both real-time performance and accuracy, not meeting the growing demand for automatic detection in medicine. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above probl…

    Submitted 26 August, 2024; originally announced August 2024.

  48. arXiv:2408.13771  [pdf, other]

    cs.CV

    ICFRNet: Image Complexity Prior Guided Feature Refinement for Real-time Semantic Segmentation

    Authors: Xin Zhang, Teodor Boyadzhiev, Jinglei Shi, Jufeng Yang

    Abstract: In this paper, we leverage image complexity as a prior for refining segmentation features to achieve accurate real-time semantic segmentation. The design philosophy is based on the observation that different pixel regions within an image exhibit varying levels of complexity, with higher complexities posing a greater challenge for accurate segmentation. We thus introduce image complexity as prior g…

    Submitted 25 August, 2024; originally announced August 2024.

  49. arXiv:2408.13713  [pdf, other]

    quant-ph cs.LG

    Verifiable cloud-based variational quantum algorithms

    Authors: Junhong Yang, Banghai Wang, Junyu Quan, Qin Li

    Abstract: Variational quantum algorithms (VQAs) have shown potential for quantum advantage with noisy intermediate-scale quantum (NISQ) devices for quantum machine learning (QML). However, given the high cost and limited availability of quantum resources, delegating VQAs via cloud networks is a more practical solution for clients with limited quantum capabilities. Recently, Shingu et al.[Physical Review A,…

    Submitted 3 September, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

  50. arXiv:2408.13686  [pdf, other]

    cs.SE

    Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems

    Authors: Tri Minh Triet Pham, Bo Yang, Jinqiu Yang

    Abstract: Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercializing trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, includ…

    Submitted 24 August, 2024; originally announced August 2024.