Showing 1–50 of 243 results for author: Gao, R

Searching in archive cs.
  1. arXiv:2407.11084  [pdf, other]

    eess.IV cs.CV

    A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation

    Authors: Maohan Liang, Ryan Wen Liu, Ruobin Gao, Zhe Xiao, Xiaocai Zhang, Hua Wang

    Abstract: Vessel trajectory clustering, a crucial component of the maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. I…

    Submitted 13 July, 2024; originally announced July 2024.

  2. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, et al. (34 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  3. arXiv:2407.10374  [pdf, other]

    cs.CV cs.AI

    An Empirical Study of Mamba-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

    Abstract: Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: In Peer Review

  4. arXiv:2407.08751  [pdf, other]

    q-bio.NC cs.LG

    Latent Diffusion for Neural Spiking Data

    Authors: Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, Jakob H. Macke

    Abstract: Modern datasets in neuroscience enable unprecedented inquiries into the relationship between complex behaviors and the activity of many simultaneously recorded neurons. While latent variable models can successfully extract low-dimensional embeddings from such recordings, using them to generate realistic spiking data, especially in a behavior-dependent manner, still poses a challenge. Here, we pres…

    Submitted 27 June, 2024; originally announced July 2024.

  5. arXiv:2407.06479  [pdf, other]

    cs.CL cs.SI

    Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations

    Authors: Rena Gao, Carsten Roever, Jey Han Lau

    Abstract: We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Given our annotated data, we study how the micro-level features influence the (higher lev…

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  7. arXiv:2407.01851  [pdf, other]

    cs.CV cs.AI cs.LG eess.AS

    Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

    Authors: Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha

    Abstract: Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks that only require a coarse-grained understanding of the audio-visual semantics. We present Meerkat, an audio-visual LLM equipped with a fine-grained un…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  8. arXiv:2407.01552  [pdf]

    cs.NI physics.optics

    High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

    Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

    Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM…

    Submitted 29 April, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures

  9. arXiv:2406.16083  [pdf, other]

    eess.IV cs.CV

    Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

    Authors: Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

    Abstract: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 17 pages, 7 figures

  10. arXiv:2406.15758  [pdf, other]

    cs.LG cs.DC

    EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

    Authors: Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin

    Abstract: Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and ef…

    Submitted 22 June, 2024; originally announced June 2024.

  11. arXiv:2406.08204  [pdf, other]

    cs.CV

    Diffusion-Promoted HDR Video Reconstruction

    Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

    Abstract: High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Arxiv Preprint

  12. arXiv:2406.07532  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Hearing Anything Anywhere

    Authors: Mason Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

    Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. The first two authors contributed equally. Project page: https://masonlwang.com/hearinganythinganywhere/

    ACM Class: I.2.10; I.4.8

  13. arXiv:2406.07393  [pdf, other]

    cs.CL

    Limited Out-of-Context Knowledge Reasoning in Large Language Models

    Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  14. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  15. arXiv:2406.01853  [pdf, other]

    cs.LG cs.AI cs.MA

    Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy

    Authors: Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen

    Abstract: In contemporary radiotherapy planning (RTP), a key module, leaf sequencing, is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model termed as Reinforced Leaf Sequencer (RLS) in a multi-agent framework for leaf sequencing. The RLS model offers improvements to time-consuming iterative optimization steps via large-scale trai…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2405.16865  [pdf, other]

    q-bio.NC cs.LG stat.ML

    An Investigation of Conformal Isometry Hypothesis for Grid Cells

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.19192

  17. arXiv:2405.16852  [pdf, other]

    cs.LG cs.AI stat.ML

    EM Distillation for One-step Diffusion Models

    Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

    Abstract: While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Disti…

    Submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box functions poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  19. arXiv:2405.14475  [pdf, other]

    cs.CV cs.AI

    MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

    Authors: Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu

    Abstract: While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including B…

    Submitted 23 May, 2024; originally announced May 2024.

  20. arXiv:2405.13206  [pdf, other]

    cs.CV

    Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

    Authors: Rong Gao, Xin Liu, Bohao Xing, Zitong Yu, Bjorn W. Schuller, Heikki Kälviäinen

    Abstract: In this work, we focus on a special group of human body language -- the micro-gesture (MG), which differs from the range of ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings. This characteristic introduces two novel challenges regarding micro-gestures that are worth rethin…

    Submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.13045  [pdf, other]

    cs.HC cs.AI

    CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

    Authors: Chin-Yi Cheng, Ruiqi Gao, Forrest Huang, Yang Li

    Abstract: Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limit their adoption in practice. Firstly, the limited expressiveness of individual condition types used in previous works restricts designers' ability to convey complex design i…

    Submitted 18 May, 2024; originally announced May 2024.

  22. arXiv:2405.10314  [pdf, other]

    cs.CV

    CAT3D: Create Anything in 3D with Multi-View Diffusion Models

    Authors: Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole

    Abstract: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent nov…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://cat3d.github.io

  23. arXiv:2405.03950  [pdf, other]

    cs.LG cs.AI

    Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships

    Authors: Qi Zou, Na Yu, Daoliang Zhang, Wei Zhang, Rui Gao

    Abstract: Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a p…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 16 pages, 6 figures, 9 tables

  24. arXiv:2404.10763  [pdf, other]

    cs.AI cs.CL cs.CV

    LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

    Authors: Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

    Abstract: Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With…

    Submitted 16 April, 2024; originally announced April 2024.

  25. arXiv:2404.10595  [pdf, other]

    cs.CV

    Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

    Authors: Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia

    Abstract: Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the autom…

    Submitted 26 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://coda-dataset.github.io/coda-lm/

  26. arXiv:2404.01296  [pdf, other]

    cs.CV

    MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space

    Authors: Armand Comas-Massagué, Di Qiu, Menglei Chai, Marcel Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark Matthews, Paulo Gotardo, Octavia Camps, Sergio Orts-Escolano, Thabo Beeler

    Abstract: We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to…

    Submitted 1 April, 2024; originally announced April 2024.

  27. arXiv:2403.19221  [pdf, other]

    cs.CV cs.AI

    Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

    Authors: Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

    Abstract: Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of a single auxiliary modality, which is impractical given the diversity and unpredictable nature of real-world scenarios. To this end, we propose a Miss…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/lancopku/MR-VPC

  28. Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

    Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens

    Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we…

    Submitted 28 March, 2024; originally announced March 2024.

  29. arXiv:2403.16848  [pdf, other]

    cs.CV

    Multiple Object Tracking as ID Prediction

    Authors: Ruopeng Gao, Yijun Zhang, Limin Wang

    Abstract: In Multiple Object Tracking (MOT), tracking-by-detection methods have stood the test for a long time, which split the process into two parts according to the definition: object detection and association. They leverage robust single-frame detectors and treat object association as a post-processing step through hand-crafted heuristic algorithms and surrogate tasks. However, the nature of heuristic t…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 71.4 HOTA on DanceTrack (with CrowdHuman), 67.5/70.0 HOTA on DanceTrack built upon Deformable DETR and DAB-Deformable DETR respectively (without additional data). The code repository will be created within several days

  30. arXiv:2403.14822  [pdf, other]

    stat.ML cs.LG math.OC

    Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets

    Authors: Jie Wang, Rui Gao, Yao Xie

    Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-c…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 26 pages, 2 figures

  31. arXiv:2403.13304  [pdf, other]

    cs.CV

    DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

    Authors: Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang

    Abstract: Current perceptive models heavily depend on resource-intensive datasets, prompting the need for innovative solutions. Leveraging recent advances in diffusion models, synthetic data, by constructing image inputs from various annotations, proves beneficial for downstream tasks. While prior methods have separately addressed generative and perceptive models, DetDiffusion, for the first time, harmonize…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  32. arXiv:2403.12636  [pdf, other]

    cs.LG stat.ML

    A Practical Guide to Statistical Distances for Evaluating Generative Models in Science

    Authors: Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K Lappalainen, Jakob H Macke, Guy Moss, Matthijs Pals, Felix Pei, Rachel Rapp, A Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, Julius Vetter

    Abstract: Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular notions of statistical distances, requiring only foundati…

    Submitted 19 March, 2024; originally announced March 2024.

  33. Holistic HMI Design for Automated Vehicles: Bridging In-Vehicle and External Communication

    Authors: Haoyu Dong, Tram Thi Minh Tran, Pavlo Bazilinskyy, Marius Hoggenmüller, Debargha Dey, Silvia Cazacu, Mervyn Franssen, Ruolin Gao

    Abstract: As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic approach to HMI designs, which promotes the integration of both in-vehicle user and external road user perspectives. This approach aims to create a u…

    Submitted 17 March, 2024; originally announced March 2024.

  34. arXiv:2403.09363  [pdf, other]

    cs.CV

    Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

    Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

    Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both m…

    Submitted 14 March, 2024; originally announced March 2024.

  35. arXiv:2403.02867  [pdf, other]

    cs.SI cs.LG

    Scalable Continuous-time Diffusion Framework for Network Inference and Influence Estimation

    Authors: Keke Huang, Ruize Gao, Bogdan Cautis, Xiaokui Xiao

    Abstract: The study of continuous-time information diffusion has been an important area of research for many applications in recent years. When only the diffusion traces (cascades) are accessible, cascade-based network inference and influence estimation are two essential problems to explore. Alas, existing methods exhibit limited capability to infer and process networks with more than a few thousand nodes,…

    Submitted 20 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  36. arXiv:2403.01446  [pdf, other]

    cs.CV

    GuardT2I: Defending Text-to-Image Models from Adversarial Prompts

    Authors: Yijun Yang, Ruiyuan Gao, Xiao Yang, Jianyuan Zhong, Qiang Xu

    Abstract: Recent advancements in Text-to-Image (T2I) models have raised significant safety concerns about their potential misuse for generating inappropriate or Not-Safe-For-Work (NSFW) contents, despite existing countermeasures such as NSFW classifiers or model fine-tuning for inappropriate concept removal. Addressing this challenge, our study unveils GuardT2I, a novel moderation framework that adopts a ge…

    Submitted 3 March, 2024; originally announced March 2024.

  37. arXiv:2402.17718  [pdf]

    cs.LG eess.SP

    Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

    Authors: Vispi Karkaria, Anthony Goeckner, Rujing Zha, Jie Chen, Jianjing Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

    Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

  38. arXiv:2402.07808  [pdf, other]

    cs.LG

    Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation

    Authors: Julius Vetter, Guy Moss, Cornelius Schröder, Richard Gao, Jakob H. Macke

    Abstract: Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations - an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid so…

    Submitted 15 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  39. arXiv:2402.06841  [pdf]

    eess.IV cs.CV

    Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

    Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

    Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point c…

    Submitted 9 February, 2024; originally announced February 2024.

  40. arXiv:2402.00575  [pdf, other]

    cs.CV

    Diffusion-based Light Field Synthesis

    Authors: Ruisheng Gao, Yutong Liu, Zeyu Xiao, Zhiwei Xiong

    Abstract: Light fields (LFs), conducive to comprehensive scene radiance recorded across angular dimensions, find wide applications in 3D reconstruction, virtual reality, and computational photography. However, the LF acquisition is inevitably time-consuming and resource-intensive due to the mainstream acquisition strategy involving manual capture or laborious software synthesis. Given such a challenge, we int…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 11 pages, 9 figures

  41. arXiv:2401.01217  [pdf, other]

    cs.DC

    KCES: A Workflow Containerization Scheduling Scheme Under Cloud-Edge Collaboration Framework

    Authors: Chenggang Shan, Runze Gao, Qinghua Han, Zhen Yang, Jinhui Zhang, Yuanqing Xia

    Abstract: As more IoT applications gradually move towards the cloud-edge collaborative mode, the containerized scheduling of workflows extends from the cloud to the edge. However, given the high delay of the communication network, loose coupling of structure, and resource heterogeneity between cloud and edge, workflow containerization scheduling in the cloud-edge scenarios faces the difficulty of resource c…

    Submitted 2 January, 2024; originally announced January 2024.

  42. arXiv:2312.12870  [pdf, other]

    cs.CV

    The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

    Authors: Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

    Abstract: In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking th…

    Submitted 3 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  43. arXiv:2312.02981  [pdf, other]

    cs.CV

    ReconFusion: 3D Reconstruction with Diffusion Priors

    Authors: Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Holynski

    Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for nove…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://reconfusion.github.io/

  44. arXiv:2312.00820  [pdf, other]

    cs.LG cs.AI

    Non-Cross Diffusion for Semantic Consistency

    Authors: Ziyang Zheng, Ruiyuan Gao, Qiang Xu

    Abstract: In diffusion models, deviations from a straight generative flow are a common issue, resulting in semantic inconsistencies and suboptimal generations. To address this challenge, we introduce 'Non-Cross Diffusion', an innovative approach in generative modeling for learning ordinary differential equation (ODE) models. Our methodology strategically incorporates an ascending dimension of input to effec…

    Submitted 30 November, 2023; originally announced December 2023.

  45. arXiv:2312.00651  [pdf, other]

    cs.CV cs.AI

    TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

    Authors: Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-Yan Yeung, Huchuan Lu, Xu Jia

    Abstract: Despite remarkable achievements in video synthesis, achieving granular control over complex dynamics, such as nuanced movement among multiple interacting objects, still presents a significant hurdle for dynamic world modeling, compounded by the necessity to manage appearance and disappearance, drastic scale changes, and ensure consistency for instances across frames. These challenges hinder the de…

    Submitted 20 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  46. arXiv:2311.17516  [pdf, other]

    cs.CR cs.CV

    MMA-Diffusion: MultiModal Attack on Diffusion Models

    Authors: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu

    Abstract: In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effecti…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024. Our codes and benchmarks are available at https://github.com/cure-lab/MMA-Diffusion

  47. arXiv:2311.17404  [pdf, other]

    cs.CV cs.AI cs.CL

    VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

    Authors: Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou

    Abstract: The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of static visual shortcuts. To remedy this issue, we present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStandin…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 23 pages, 6 figures, 18 tables, data is available at https://github.com/lscpku/VITATECS

  48. arXiv:2311.07955  [pdf, other]

    cs.CV cs.AI

    Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons

    Authors: Chenjie Zhao, Ryan Wen Liu, Jingxiang Qu, Ruobin Gao

    Abstract: With the advancement of maritime unmanned aerial vehicles (UAVs) and deep learning technologies, the application of UAV-based object detection has become increasingly significant in the fields of maritime industry and ocean engineering. Endowed with intelligent sensing capabilities, the maritime UAVs enable effective and efficient maritime surveillance. To further promote the development of mariti…

    Submitted 14 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 32 pages, 18 figures

  49. arXiv:2311.04533  [pdf, ps, other]

    cs.DS

    Improved Approximations for Ultrametric Violation Distance

    Authors: Moses Charikar, Ruiquan Gao

    Abstract: We study the Ultrametric Violation Distance problem introduced by Cohen-Addad, Fan, Lee, and Mesmay [FOCS, 2022]. Given pairwise distances $x\in \mathbb{R}_{>0}^{\binom{[n]}{2}}$ as input, the goal is to modify the minimum number of distances so as to make it a valid ultrametric. In other words, this is the problem of fitting an ultrametric to given data, where the quality of the fit is measured b…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: SODA 2024

  50. arXiv:2311.03517  [pdf, other]

    cs.SD cs.CV eess.AS

    SoundCam: A Dataset for Finding Humans Using Room Acoustics

    Authors: Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu

    Abstract: A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions. A room's acoustic properties can be characterized by its impulse response (RIR) between a source and listener location, or roughly inferred from recordings of natural signals present in the room. Variations in the positions of objects in a room can effect measurable changes…

    Submitted 15 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: In NeurIPS 2023 Datasets and Benchmarks Track. Project page: https://masonlwang.com/soundcam/. Wang and Clarke contributed equally to this work