
Showing 1–50 of 141 results for author: Ge, S

Searching in archive cs.
  1. arXiv:2407.17678  [pdf, other]

    cs.CL

    Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

    Authors: Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Xia Song

    Abstract: Existing LLM training and inference frameworks struggle in boosting efficiency with sparsity while maintaining the integrity of context and model architecture. Inspired by the sharding concept in database and the fact that attention parallelizes over heads on accelerators, we propose Sparsely-Sharded (S2) Attention, an attention algorithm that allocates heterogeneous context partitions for differe…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages

  2. arXiv:2407.14304  [pdf, ps, other]

    cs.IT

    MDS Generalized Convertible Code

    Authors: Songping Ge, Han Cai, Xiaohu Tang

    Abstract: In this paper, we consider the convertible codes with the maximum distance separable (MDS) property, which can adjust the code rate according to the failure rates of devices. We first extend the notion of convertible codes to allow initial and final codes with different parameters. Then, we investigate the relationship between these parameters and thus establish new lower bounds on the access cost…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2407.10550  [pdf, other]

    cs.CV

    Learning Natural Consistency Representation for Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge

    Abstract: Face Forgery videos have elicited critical social public concerns and various detectors have been proposed. However, fully-supervised detectors may lead to easily overfitting to specific forgery methods or videos, and existing self-supervised detectors are strict on auxiliary tasks, such as requiring audio or multi-modalities, leading to limited generalization and robustness. In this paper, we exa…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2406.14846  [pdf, other]

    cs.LG

    Graph Edge Representation via Tensor Product Graph Convolutional Representation

    Authors: Bo Jiang, Sheng Ge, Ziyan Zhang, Beibei Wang, Jin Tang, Bin Luo

    Abstract: Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on adjacency matrix and node features and generally focus on obtaining effective node embeddings which cannot be utilized to address the graphs with (high-dimensional) edge features. To address thi…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.09417  [pdf, other]

    cs.CV cs.GR cs.LG

    Rethinking Score Distillation as a Bridge Between Image Distributions

    Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sds-bridge.github.io/

  6. arXiv:2406.06279  [pdf, other]

    cs.CL

    Multi-Prompting Decoder Helps Better Language Understanding

    Authors: Zifeng Cheng, Zhaoling Chen, Zhiwei Jiang, Yafeng Yin, Shiping Ge, Yuliang Liu, Qing Gu

    Abstract: Recent Pre-trained Language Models (PLMs) usually only provide users with the inference APIs, namely the emerging Model-as-a-Service (MaaS) setting. To adapt MaaS PLMs to downstream tasks without accessing their parameters and gradients, some existing methods focus on the output-side adaptation of PLMs, viewing the PLM as an encoder and then optimizing a task-specific decoder for decoding the outp…

    Submitted 10 June, 2024; originally announced June 2024.

  7. arXiv:2406.04337  [pdf, other]

    cs.CV cs.AI

    Coherent Zero-Shot Visual Instruction Generation

    Authors: Quynh Phung, Songwei Ge, Jia-Bin Huang

    Abstract: Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language…

    Submitted 8 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: https://instruct-vis-zero.github.io/

  8. arXiv:2406.01063  [pdf, other]

    cs.CV

    DANCE: Dual-View Distribution Alignment for Dataset Condensation

    Authors: Hansong Zhang, Shikun Li, Fanzhao Lin, Weiping Wang, Zhenxing Qian, Shiming Ge

    Abstract: Dataset condensation addresses the problem of data burden by learning a small synthetic training set that preserves essential knowledge from the larger real training set. To date, the state-of-the-art (SOTA) results are often yielded by optimization-oriented methods, but their inefficiency hinders their application to realistic datasets. On the other hand, the Distribution-Matching (DM) methods sh…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by IJCAI-24

  9. arXiv:2405.16761  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Face Recognition with Generative-to-Discriminative Representations

    Authors: Shiming Ge, Weijia Guo, Chenyu Li, Junzheng Zhang, Yong Li, Dan Zeng

    Abstract: Masked face recognition is important for social good but challenged by diverse occlusions that cause insufficient or inaccurate representations. In this work, we propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition. To this end, we split the network into three modules and learn them on synthetic masked faces in a greedy modul…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Machine Learning 2024

  10. arXiv:2405.14709  [pdf, other]

    cs.CV cs.MM

    OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance

    Authors: Shuheng Ge, Haoyu Xing, Li Zhang, Xiangqian Wu

    Abstract: Creating realistic, natural, and lip-readable talking face videos remains a formidable challenge. Previous research primarily concentrated on generating and aligning single-frame images while overlooking the smoothness of frame-to-frame transitions and temporal dependencies. This often compromised visual quality and effects in practical settings, particularly when handling complex facial data and…

    Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.10832  [pdf, other]

    cs.CV

    Open-Vocabulary Spatio-Temporal Action Detection

    Authors: Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, Limin Wang

    Abstract: Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model…

    Submitted 17 May, 2024; originally announced May 2024.

  12. arXiv:2404.12608  [pdf, other]

    cs.DB cs.CL cs.PL

    Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

    Authors: Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation, with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: full version of a paper to appear in SIGMOD 2024

  13. arXiv:2404.12391  [pdf, other]

    cs.CV cs.GR cs.LG

    On the Content Bias in Fréchet Video Distance

    Authors: Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang

    Abstract: Fréchet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to conflict with human perception occasionally. In this paper, we aim to explore the extent of FVD's bias toward per-frame quality over temporal realism and identify its sources. We first quantify the FVD's sensitivity to the temporal axis by decoupling the frame and motion quality and find that the F…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project webpage: https://content-debiased-fvd.github.io/

  14. arXiv:2402.14301   

    cs.IR cs.LG

    GenSERP: Large Language Models for Whole Page Presentation

    Authors: Zhenning Zhang, Yunan Zhang, Suyu Ge, Guangwei Weng, Mridu Narang, Xia Song, Saurabh Tiwary

    Abstract: The advent of large language models (LLMs) brings an opportunity to minimize the effort in search engine result page (SERP) organization. In this paper, we propose GenSERP, a framework that leverages LLMs with vision in a few-shot setting to dynamically organize intermediate search results, including generated chat answers, website snippets, multimedia data, knowledge panels into a coherent SERP l…

    Submitted 16 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Microsoft corp policy

  15. arXiv:2402.00404  [pdf, other]

    cs.NE

    Improving Critical Node Detection Using Neural Network-based Initialization in a Genetic Algorithm

    Authors: Chanjuan Liu, Shike Ge, Zhihan Chen, Wenbin Pei, Enqiang Zhu, Yi Mei, Hisao Ishibuchi

    Abstract: The Critical Node Problem (CNP) is concerned with identifying the critical nodes in a complex network. These nodes play a significant role in maintaining the connectivity of the network, and removing them can negatively impact network performance. CNP has been studied extensively due to its numerous real-world applications. Among the different versions of CNP, CNP-1a has gained the most popularity…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 14 pages, 13 figures

  16. arXiv:2401.12237  [pdf, other]

    math.AT cs.LG q-bio.QM

    A distribution-guided Mapper algorithm

    Authors: Yuyang Tao, Shufei Ge

    Abstract: Motivation: The Mapper algorithm is an essential tool to explore the shape of data in topological data analysis. With a dataset as an input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a Reeb graph of data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fa…

    Submitted 19 January, 2024; originally announced January 2024.

  17. arXiv:2312.15927  [pdf, other]

    cs.CV cs.LG

    M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy

    Authors: Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, Shiming Ge

    Abstract: Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs. To address these challenges, dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset. Nowadays, optimization-oriented methods have been the primary method in the field of dataset co…

    Submitted 25 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: This work has been accepted by AAAI-24

  18. arXiv:2312.07331  [pdf, other]

    cs.LG cs.CV cs.HC

    Coupled Confusion Correction: Learning from Crowds with Sparse Annotations

    Authors: Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge

    Abstract: As the size of datasets gets larger, accurately annotating such datasets is becoming more impractical due to the expense in both time and money. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of collecting labels, which also inevitably introduces label noise and eventually degrades the performance of the model. To learn from crowd-sourcing annotations, model…

    Submitted 20 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: This work has been accepted by AAAI-24

  19. arXiv:2311.12228  [pdf, ps, other]

    cs.CC quant-ph

    A Collapsible Polynomial Hierarchy for Promise Problems

    Authors: Chirag Falor, Shu Ge, Anand Natarajan

    Abstract: The polynomial hierarchy has been widely studied in classical complexity theory. In this paper, we will generalize some commonly known results about the polynomial hierarchy to a version of the hierarchy extended to promise problems. This paper proposes new definitions of existential and universal operators for classes of promise problems. Applying these to BQP, we recover the hierarchy proposed b…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 11 pages

  20. arXiv:2311.10585  [pdf, ps, other]

    cs.CC

    Popularity on the 3D-Euclidean Stable Roommates

    Authors: Steven Ge, Toshiya Itoh

    Abstract: We study the 3D-Euclidean Multidimensional Stable Roommates problem, which asks whether a given set $V$ of $s\cdot n$ agents with a location in 3-dimensional Euclidean space can be partitioned into $n$ disjoint subsets $\pi = \{R_1, \dots, R_n\}$ with $|R_i| = s$ for each $R_i \in \pi$ such that $\pi$ is (strictly) popular, where $s$ is the room size. A partitioning is popular if there does not exist an…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 27 pages, 23 figures

    MSC Class: 91A68; 91A10; 68Q25; 68Q17

    ACM Class: F.2.2; G.2.1

  21. arXiv:2311.09566  [pdf, other]

    cs.LG

    A Knowledge Distillation Approach for Sepsis Outcome Prediction from Multivariate Clinical Time Series

    Authors: Anna Wong, Shu Ge, Nassim Oufattole, Adam Dejl, Megan Su, Ardavan Saeedi, Li-wei H. Lehman

    Abstract: Sepsis is a life-threatening condition triggered by an extreme infection response. Our objective is to forecast sepsis patient outcomes using their medical history and treatments, while learning interpretable state representations to assess patients' risks in developing various adverse outcomes. While neural networks excel in outcome prediction, their limited interpretability remains a key issue.…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 12 pages

  22. arXiv:2311.07689  [pdf, other]

    cs.CL

    MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

    Authors: Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao

    Abstract: Red-teaming is a common practice for mitigating unsafe behaviors in Large Language Models (LLMs), which involves thoroughly assessing LLMs to identify potential flaws and addressing them with responsible and accurate responses. While effective, manual red-teaming is costly, and existing automatic red-teaming typically discovers safety risks without addressing them. In this paper, we propose a Mult…

    Submitted 13 November, 2023; originally announced November 2023.

  23. arXiv:2310.09263  [pdf, other]

    cs.CL cs.AI cs.DB

    Table-GPT: Table-tuned GPT for Diverse Table Tasks

    Authors: Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks. However, when probing language models using a range of basic table-understanding tasks, we observe that today's language models are still sub-optimal in many table-related tasks, likely because they are pre-trained predominantly on one-dimensi…

    Submitted 13 October, 2023; originally announced October 2023.

  24. arXiv:2310.07602  [pdf, other]

    cs.CV

    Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autonomous Driving

    Authors: Xinyu Zhang, Li Wang, Jian Chen, Cheng Fang, Lei Yang, Ziying Song, Guangqi Yang, Yichen Wang, Xiaofei Zhang, Jun Li, Zhiwei Li, Qingshan Yang, Zhenlin Zhang, Shuzhi Sam Ge

    Abstract: Radar has stronger adaptability in adverse scenarios for autonomous driving environmental perception compared to widely adopted cameras and LiDARs. Compared with commonly used 3D radars, the latest 4D radars have precise vertical resolution and higher point cloud density, making it a highly promising sensor for autonomous driving in complex environmental perception. However, due to the much higher…

    Submitted 9 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  25. arXiv:2310.01801  [pdf, other]

    cs.CL

    Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

    Authors: Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao

    Abstract: In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). Different from the conventional KV cache that retains key and value vectors for all context tokens, we conduct targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we t…

    Submitted 29 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  26. arXiv:2309.12706  [pdf, other]

    cs.LG cs.AI cs.CV

    Multi-Label Noise Transition Matrix Estimation with Label Correlations: Theory and Algorithm

    Authors: Shikun Li, Xiaobo Xia, Hansong Zhang, Shiming Ge, Tongliang Liu

    Abstract: Noisy multi-label learning has garnered increasing attention due to the challenges posed by collecting large-scale accurate labels, making noisy labels a more practical alternative. Motivated by noisy multi-class learning, the introduction of transition matrices can help model multi-label noise and enable the development of statistically consistent algorithms for noisy multi-label learning. Howeve…

    Submitted 22 September, 2023; originally announced September 2023.

  27. arXiv:2309.04795  [pdf, other]

    cs.CV

    Self-Supervised Transformer with Domain Adaptive Reconstruction for General Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Jianmin Li, Shiming Ge

    Abstract: Face forgery videos have caused severe social public concern, and various detectors have been proposed recently. However, most of them are trained in a supervised manner with limited generalization when detecting videos from different forgery methods or real source videos. To tackle this issue, we explore to take full advantage of the difference between real and forgery videos by only exploring th…

    Submitted 9 September, 2023; originally announced September 2023.

  28. arXiv:2308.05005  [pdf, other]

    eess.SP cs.CV

    Deep Learning Model Transfer in Forest Mapping using Multi-source Satellite SAR and Optical Images

    Authors: Shaojia Ge, Oleg Antropov, Tuomas Häme, Ronald E. McRoberts, Jukka Miettinen

    Abstract: Deep learning (DL) models are gaining popularity in forest variable prediction using Earth Observation images. However, in practical forest inventories, reference datasets are often represented by plot- or stand-level measurements, while high-quality representative wall-to-wall reference data for end-to-end training of DL models are rarely available. Transfer learning facilitates expansion of the…

    Submitted 9 August, 2023; originally announced August 2023.

  29. arXiv:2307.06354  [pdf, other]

    quant-ph cs.ET

    Faster-than-Clifford Simulations of Entanglement Purification Circuits and Their Full-stack Optimization

    Authors: Vaishnavi L. Addala, Shu Ge, Stefan Krastanov

    Abstract: Quantum Entanglement is a fundamentally important resource in Quantum Information Science; however, generating it in practice is plagued by noise and decoherence, limiting its utility. Entanglement distillation and forward error correction are the tools we employ to combat this noise, but designing the best distillation and error correction circuits that function well, especially on today's imperf…

    Submitted 12 July, 2023; originally announced July 2023.

  30. arXiv:2306.05427  [pdf, other]

    cs.CV

    Grounded Text-to-Image Synthesis with Attention Refocusing

    Authors: Quynh Phung, Songwei Ge, Jia-Bin Huang

    Abstract: Driven by the scalable diffusion models trained on large-scale datasets, text-to-image synthesis methods have shown compelling results. However, these models still fail to precisely follow the text prompt involving multiple objects, attributes, or spatial compositions. In this paper, we reveal the potential causes in the diffusion model's cross-attention and self-attention layers. We propose two n…

    Submitted 1 December, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Project page: https://attention-refocusing.github.io/

  31. arXiv:2306.04932  [pdf, other]

    cs.RO

    Jigsaw-based Benchmarking for Learning Robotic Manipulation

    Authors: Xiaobo Liu, Fang Wan, Sheng Ge, Haokun Wang, Haoran Sun, Chaoyang Song

    Abstract: Benchmarking provides experimental evidence of the scientific baseline to enhance the progression of fundamental research, which is also applicable to robotics. In this paper, we propose a method to benchmark metrics of robotic manipulation, which addresses the spatial-temporal reasoning skills for robot learning with the jigsaw game. In particular, our approach exploits a simple set of jigsaw pie…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 7 pages, 7 figures, accepted to 2023 IEEE International Conference on Advanced Robotics and Mechatronics (ICARM)

  32. arXiv:2306.03116  [pdf, other]

    cs.HC cs.AI cs.LG

    Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds

    Authors: Shikun Li, Xiaobo Xia, Jiankang Deng, Shiming Ge, Tongliang Liu

    Abstract: Learning from crowds describes that the annotations of training data are obtained with crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, where labeling mistakes that depend on annotators occur frequently. Modeling the label-noise generation process by the noise transition matrix is a powerful tool to tackle the label noise. In real-world crowd-sourcin…

    Submitted 14 April, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE TPAMI. 22 pages, 4 figures, and 8 tables

  33. arXiv:2306.02421  [pdf, other]

    cs.DB cs.LG

    Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

    Authors: Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications. Crucially, these pipelines are recurring (e.g., daily or hourly) in production settings to keep data updated so that ML models can be re-trained regularly, and BI dashboards refreshed frequently. However, data quality (DQ) issues can often creep i…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: full version of a paper accepted to KDD 2023

  34. arXiv:2305.10662  [pdf, other]

    cs.CV cs.CR

    Learning Differentially Private Probabilistic Models for Privacy-Preserving Image Generation

    Authors: Bochao Liu, Shiming Ge, Pengju Wang, Liansheng Zhuang, Tongliang Liu

    Abstract: A number of deep models trained on high-quality and valuable images have been deployed in practical applications, which may pose a leakage risk of data privacy. Learning differentially private generative models can sidestep this challenge through indirect data access. However, such differentially private generative models learned by existing approaches can only generate images with a low-resolutio…

    Submitted 17 May, 2023; originally announced May 2023.

  35. arXiv:2305.10474  [pdf, other]

    cs.CV cs.GR cs.LG

    Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

    Authors: Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is co…

    Submitted 25 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: ICCV 2023. Project webpage: https://research.nvidia.com/labs/dir/pyoco

  36. arXiv:2305.01034  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-agnostic Measure of Generalization Difficulty

    Authors: Akhilan Boopathy, Kevin Liu, Jaedong Hwang, Shu Ge, Asaad Mohammedsaleh, Ila Fiete

    Abstract: The measure of a machine learning algorithm is the difficulty of the tasks it can perform, and sufficiently difficult tasks are critical drivers of strong machine learning models. However, quantifying the generalization difficulty of machine learning benchmarks has remained challenging. We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty o…

    Submitted 2 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Published at ICML 2023, 28 pages, 6 figures

  37. arXiv:2304.12528  [pdf, other]

    cs.CR

    Model Conversion via Differentially Private Data-Free Distillation

    Authors: Bochao Liu, Pengju Wang, Shikun Li, Dan Zeng, Shiming Ge

    Abstract: While massive valuable deep models trained on large-scale data have been released to facilitate the artificial intelligence community, they may encounter attacks in deployment which leads to privacy leakage of training data. In this work, we propose a learning approach termed differentially private data-free distillation (DPDFD) for model conversion that can convert a pretrained model (teacher) in…

    Submitted 3 August, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: Published at IJCAI 2023

  38. arXiv:2304.06720  [pdf, other]

    cs.CV cs.GR cs.LG

    Expressive Text-to-Image Generation with Rich Text

    Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

    Abstract: Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to wri…

    Submitted 28 May, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Project webpage: https://rich-text-to-image.github.io/

  39. arXiv:2303.17210  [pdf, other]

    cs.CR cs.NI eess.SY

    DecentRAN: Decentralized Radio Access Network for 5.5G and beyond

    Authors: Hao Xu, Xun Liu, Qinghai Zeng, Qiang Li, Shibin Ge, Guohua Zhou, Raymond Forbes

    Abstract: Radio Access Network faces challenges from privacy and flexible wide area and local area network access. RAN is limited from providing local service directly due to centralized design of cellular network and concerns of user privacy and data security. DecentRAN or Decentralized Radio Access Network offers an alternative perspective to cope with the emerging demands of 5G Non-public Network and the…

    Submitted 30 March, 2023; originally announced March 2023.

  40. arXiv:2303.02456  [pdf, other]

    cs.RO eess.SY

    Fixed-time Adaptive Neural Control for Physical Human-Robot Collaboration with Time-Varying Workspace Constraints

    Authors: Yuzhu Sun, Mien Van, Stephen McIlvanna, Nguyen Minh Nhat, Sean McLoone, Dariusz Ceglarek, Shuzhi Sam Ge

    Abstract: Physical human-robot collaboration (pHRC) requires both compliance and safety guarantees since robots coordinate with human actions in a shared workspace. This paper presents a novel fixed-time adaptive neural control methodology for handling time-varying workspace constraints that occur in physical human-robot collaboration while also guaranteeing compliance during intended force interactions. Th…

    Submitted 26 April, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

  41. arXiv:2303.00396  [pdf, other]

    cs.CV cs.AI cs.LG

    Controlling Class Layout for Deep Ordinal Classification via Constrained Proxies Learning

    Authors: Cong Wang, Zhiwei Jiang, Yafeng Yin, Zifeng Cheng, Shiping Ge, Qing Gu

    Abstract: For deep ordinal classification, learning a well-structured feature space specific to ordinal classification is helpful to properly capture the ordinal nature among classes. Intuitively, when Euclidean distance metric is used, an ideal ordinal layout in feature space would be that the sample clusters are arranged in class order along a straight line in space. However, enforcing samples to conform…

    Submitted 26 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2023

  42. arXiv:2302.08510  [pdf, other]

    cs.CV

    Text-driven Visual Synthesis with Latent Diffusion Prior

    Authors: Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang

    Abstract: There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use…

    Submitted 3 April, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Project website: https://latent-diffusion-prior.github.io/

  43. arXiv:2302.03754  [pdf, other]

    cs.CL

    Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

    Authors: Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

    Abstract: In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time. We develop a joint learning mechanism that trains the augmentation component with latent labels derived from t…

    Submitted 7 February, 2023; originally announced February 2023.

  44. arXiv:2212.14747

    eess.IV cs.CV

    VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

    Authors: Hongye Zeng, Kang Zhou, Songhan Ge, Yuchong Gao, Jianhao Zhao, Shenghua Gao, Rui Zheng

    Abstract: Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: 15 pages, 8 figures

  45. arXiv:2212.00653

    cs.CV cs.LG

    Hyperbolic Contrastive Learning for Visual Representations beyond Objects

    Authors: Songwei Ge, Shlok Mishra, Simon Kornblith, Chun-Liang Li, David Jacobs

    Abstract: Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that th…

    Submitted 1 December, 2022; originally announced December 2022.

  46. A Novel Semisupervised Contrastive Regression Framework for Forest Inventory Mapping with Multisensor Satellite Data

    Authors: Shaojia Ge, Hong Gu, Weimin Su, Anne Lönnqvist, Oleg Antropov

    Abstract: Accurate mapping of forests is critical for forest management and carbon stocks monitoring. Deep learning is becoming more popular in Earth Observation (EO), however, the availability of reference data limits its potential in wide-area forest mapping. To overcome those limitations, here we introduce contrastive regression into EO based forest mapping and develop a novel semisupervised regression f…

    Submitted 30 November, 2022; originally announced December 2022.

  47. arXiv:2211.13229

    eess.IV cs.CL cs.CV cs.LG

    DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis

    Authors: Xian Wu, Shuxin Yang, Zhaopeng Qiu, Shen Ge, Yangtian Yan, Xingwang Wu, Yefeng Zheng, S. Kevin Zhou, Li Xiao

    Abstract: Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate…

    Submitted 12 November, 2022; originally announced November 2022.

  48. arXiv:2211.12148

    cs.CV cs.LG

    Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

    Authors: Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun

    Abstract: Training supervised video captioning model requires coupled video-caption pairs. However, for many targeted languages, sufficient paired data are not available. To this end, we introduce the unpaired video captioning task aiming to train models without coupled video-caption pairs in target language. To solve the task, a natural choice is to employ a two-step pipeline system: first utilizing video-…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: Published at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  49. arXiv:2211.11427

    cs.CV

    Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

    Authors: Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

    Abstract: Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs. However, such learned shared latent spaces are not often optimal, and the modality gap between visual and textual representation can not be fully eliminated. In this paper, w…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2022

  50. arXiv:2210.16431

    cs.CV cs.CL

    DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

    Authors: Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu Sun, Yuexian Zou

    Abstract: Vision-and-language (V-L) tasks require the system to understand both vision content and natural language, thus learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models are proposed to learn V-L representations and achieve improved results in many tasks. However, the mainstream models proces…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Published in ACM TKDD 2022 (ACM Transactions on Knowledge Discovery from Data)