
Showing 1–50 of 129 results for author: Feng, M

Searching in archive cs. Search in all archives.
  1. arXiv:2407.19421  [pdf, other]

    cs.LG

    Improved physics-informed neural network in mitigating gradient related failures

    Authors: Pancheng Niu, Yongming Chen, Jun Guo, Yuqian Zhou, Minfu Feng, Yanchao Shi

    Abstract: Physics-informed neural networks (PINNs) integrate fundamental physical principles with advanced data-driven techniques, driving significant advancements in scientific computing. However, PINNs face persistent challenges with stiffness in gradient flow, which limits their predictive capabilities. This paper presents an improved PINN (I-PINN) to mitigate gradient-related failures. The core of I-PIN…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Elsevier-LaTeX v1.2, 26 pages with 12 figures

    MSC Class: 35Q68; 35Q90 ACM Class: G.4

  2. arXiv:2407.12538  [pdf, other]

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.08299  [pdf, other]

    cs.SI eess.SY

    Evolving Network Modeling Driven by the Degree Increase and Decrease Mechanism

    Authors: Yuhan Li, Minyu Feng, Jürgen Kurths

    Abstract: Ever since the Barabási-Albert (BA) scale-free network has been proposed, network modeling has been studied intensively in light of the network growth and the preferential attachment (PA). However, numerous real systems are featured with a dynamic evolution including network reduction in addition to network growth. In this paper, we propose a novel mechanism for evolving networks from the perspect…

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2407.05376  [pdf, other]

    cs.RO

    Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

    Authors: Jiayu Guo, Mingyue Feng, Pengfei Zhu, Chengjun Li, Jian Pu

    Abstract: In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and pl…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  5. arXiv:2407.04996  [pdf, other]

    cs.LG cs.CV

    The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition

    Authors: Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang

    Abstract: This paper presents a data-free, parameter-isolation-based continual learning algorithm we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. S…

    Submitted 6 July, 2024; originally announced July 2024.

  6. arXiv:2407.03205  [pdf, other]

    cs.CV

    Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

    Authors: Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

    Abstract: Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrary oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based o…

    Submitted 3 July, 2024; originally announced July 2024.

  7. arXiv:2407.03159  [pdf, other]

    cs.SI eess.SY physics.soc-ph

    Protection Degree and Migration in the Stochastic SIRS Model: A Queueing System Perspective

    Authors: Yuhan Li, Ziyan Zeng, Minyu Feng, Jürgen Kurths

    Abstract: With the prevalence of COVID-19, the modeling of epidemic propagation and its analyses have played a significant role in controlling epidemics. However, individual behaviors, in particular the self-protection and migration, which have a strong influence on epidemic propagation, were always neglected in previous studies. In this paper, we mainly propose two models from the individual and population…

    Submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2406.13487  [pdf, other]

    cs.LG

    An evidential time-to-event prediction model based on Gaussian random fuzzy numbers

    Authors: Ling Huang, Yucheng Xing, Thierry Denoeux, Mengling Feng

    Abstract: We introduce an evidential model for time-to-event prediction with censored data. In this model, uncertainty on event time is quantified by Gaussian random fuzzy numbers, a newly introduced family of random fuzzy subsets of the real line with associated belief functions, generalizing both Gaussian random variables and Gaussian possibility distributions. Our approach makes minimal assumptions about…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: BELIEF2024

  9. arXiv:2406.12663  [pdf, other]

    cs.CV cs.AI

    Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

    Authors: Mingqian Feng, Yunlong Tang, Zeliang Zhang, Chenliang Xu

    Abstract: Large Vision-Language Models (LVLMs) excel in integrating visual and linguistic contexts to produce detailed content, facilitating applications such as image captioning. However, using LVLMs to generate descriptions often faces the challenge of object hallucination (OH), where the output text misrepresents actual objects in the input image. While previous studies attribute the occurrence of OH to…

    Submitted 18 June, 2024; originally announced June 2024.

  10. arXiv:2406.07741  [pdf, other]

    cs.CV

    Back to the Color: Learning Depth to Specific Color Transformation for Unsupervised Depth Estimation

    Authors: Yufan Zhu, Chongzhi Ran, Mingtao Feng, Fangfang Wu, Le Dong, Weisheng Dong, Antonio M. López, Guangming Shi

    Abstract: Virtual engines can generate dense depth maps for various synthetic scenes, making them invaluable for training depth estimation models. However, discrepancies between synthetic and real-world colors pose significant challenges for depth estimation in real-world scenes, especially in complex and uncertain environments encountered in unsupervised monocular depth estimation tasks. To address this is…

    Submitted 26 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.03763  [pdf, other]

    cs.SI physics.soc-ph

    The impact of nodes of information dissemination on epidemic spreading in dynamic multiplex networks

    Authors: Minyu Feng, Xiangxi Li, Yuhan Li, Qin Li

    Abstract: Epidemic spreading processes on dynamic multiplex networks provide a more accurate description of natural spreading processes than those on single layered networks. To describe the influence of different individuals in the awareness layer on epidemic spreading, we propose a two-layer network-based epidemic spreading model, including some individuals who neglect the epidemic, and we explore how ind…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages, 10 figures

  12. A memory-based spatial evolutionary game with the dynamic interaction between learners and profiteers

    Authors: Bin Pi, Minyu Feng, Liang-Jian Deng

    Abstract: Spatial evolutionary games provide a valuable framework for elucidating the emergence and maintenance of cooperative behavior. However, most previous studies assume that individuals are profiteers and neglect to consider the effects of memory. To bridge this gap, in this paper, we propose a memory-based spatial evolutionary game with dynamic interaction between learners and profiteers. Specificall…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 11 pages, 9 figures

  13. Information Dynamics in Evolving Networks Based on the Birth-Death Process: Random Drift and Natural Selection Perspective

    Authors: Minyu Feng, Ziyan Zeng, Qin Li, Matjaž Perc, Jürgen Kurths

    Abstract: Dynamic processes in complex networks are crucial for better understanding collective behavior in human societies, biological systems, and the internet. In this paper, we first focus on the continuous Markov-based modeling of evolving networks with the birth-death of individuals. A new individual arrives at the group by the Poisson process, while new links are established in the network through ei…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 14 pages, 9 figures

  14. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  15. arXiv:2405.16393  [pdf, other]

    cs.CV cs.AI

    Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation

    Authors: Jinlin Liu, Kai Yu, Mengyang Feng, Xiefan Guo, Miaomiao Cui

    Abstract: Recent advancements in human video synthesis have enabled the generation of high-quality videos through the application of stable diffusion models. However, existing methods predominantly concentrate on animating solely the human element (the foreground) guided by pose information, while leaving the background entirely static. Contrary to this, in authentic, high-quality videos, backgrounds often…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  16. arXiv:2405.12357  [pdf]

    eess.IV cs.CV

    Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

    Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

    Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampling k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI rec…

    Submitted 20 May, 2024; originally announced May 2024.

  17. arXiv:2403.18878  [pdf, other]

    cs.CV cs.LG eess.IV

    AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation

    Authors: Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng

    Abstract: Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening effective receptive fields (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers,…

    Submitted 27 March, 2024; originally announced March 2024.

  18. arXiv:2403.15603   

    cs.CV cs.AI

    Forward Learning for Gradient-based Black-box Saliency Map Generation

    Authors: Zeliang Zhang, Mingqian Feng, Jinyang Jiang, Rongyi Zhu, Yijie Peng, Chenliang Xu

    Abstract: Gradient-based saliency maps are widely used to explain deep neural network decisions. However, as models become deeper and more black-box, such as in closed-source APIs like ChatGPT, computing gradients becomes challenging, hindering conventional explanation methods. In this work, we introduce a novel unified framework for estimating gradients in black-box settings and generating saliency maps to…

    Submitted 2 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: The evaluation is based on small datasets and limited models, of which bias leads to misleading conclusions

  19. arXiv:2403.14133  [pdf, other]

    cs.CV

    3D Object Detection from Point Cloud via Voting Step Diffusion

    Authors: Haoran Hou, Mingtao Feng, Zijie Wu, Weisheng Dong, Qing Zhu, Yaonan Wang, Ajmal Mian

    Abstract: 3D object detection is a fundamental task in scene understanding. Numerous research efforts have been dedicated to better incorporate Hough voting into the 3D object detection pipeline. However, due to the noisy, cluttered, and partial nature of real 3D scans, existing voting-based methods often receive votes from the partial surfaces of individual objects together with severe noises, leading to s…

    Submitted 21 March, 2024; originally announced March 2024.

  20. arXiv:2403.14121  [pdf, other]

    cs.CV

    External Knowledge Enhanced 3D Scene Generation from Sketch

    Authors: Zijie Wu, Mingtao Feng, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian

    Abstract: Generating realistic 3D scenes is challenging due to the complexity of room layouts and object geometries. We propose a sketch based knowledge enhanced diffusion architecture (SEK) for generating customized, diverse, and plausible 3D scenes. SEK conditions the denoising process with a hand-drawn sketch of the target scene and cues from an object relationship knowledge base. We first construct an ex…

    Submitted 10 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV2024

  21. arXiv:2403.13238  [pdf, other]

    cs.CV

    Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

    Authors: Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian

    Abstract: Directly learning to model 4D content, including shape, color and motion, is challenging. Existing methods depend on skeleton-based motion control and offer limited continuity in detail. To address this, we propose a novel framework that generates coherent 4D sequences with animation of 3D shapes under given conditions with dynamic evolution of shape and color over time through integrative latent…

    Submitted 19 March, 2024; originally announced March 2024.

  22. arXiv:2403.12777  [pdf, other]

    cs.CV cs.AI

    Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

    Authors: Zeliang Zhang, Mingqian Feng, Zhiheng Li, Chenliang Xu

    Abstract: Machine learning models can perform well on in-distribution data but often fail on biased subgroups that are underrepresented in the training data, hindering the robustness of models for reliable applications. Such subgroups are typically unknown due to the absence of subgroup labels. Discovering biased subgroups is the key to understanding models' failure modes and further improving models' robus…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Code is available at https://github.com/ZhangAIPI/DIM

  23. arXiv:2403.12728  [pdf, other]

    cs.CV

    Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation

    Authors: Jingtao Sun, Yaonan Wang, Mingtao Feng, Chao Ding, Mike Zheng Shou, Ajmal Saeed Mian

    Abstract: Fully-supervised category-level pose estimation aims to determine the 6-DoF poses of unseen instances from known categories, requiring expensive manual labeling costs. Recently, various self-supervised category-level pose estimation methods have been proposed to reduce the requirement of the annotated datasets. However, most methods rely on synthetic data or 3D CAD model for self-supervised train…

    Submitted 19 March, 2024; originally announced March 2024.

  24. arXiv:2402.13011  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.GT nlin.AO

    An evolutionary game with reputation-based imitation-mutation dynamics

    Authors: Kehuan Feng, Songlin Han, Minyu Feng, Attila Szolnoki

    Abstract: Reputation plays a crucial role in social interactions by affecting the fitness of individuals during an evolutionary process. Previous works have extensively studied the result of imitation dynamics without focusing on potential irrational choices in strategy updates. We now fill this gap and explore the consequence of such kind of randomness, or one may interpret it as an autonomous thinking. In…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 13 pages, 8 figures, to be published in Applied Mathematics and Computation

    Journal ref: Appl. Math. Comput. 472 (2024) 128618

  25. arXiv:2402.11816  [pdf, other]

    cs.CV cs.LG

    Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

    Authors: Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi

    Abstract: Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data. However, a major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression, a phenomenon where the trained model captures only a limited portion of the information from the input data while overlooking other potentially valuable content. This…

    Submitted 15 July, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: ECCV 2024 Camera-Ready

  26. arXiv:2402.09804  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.GT nlin.PS

    Coevolution of relationship and interaction in cooperative dynamical multiplex networks

    Authors: Xiaojin Xiong, Ziyan Zeng, Minyu Feng, Attila Szolnoki

    Abstract: While actors in a population can interact with anyone else freely, social relations significantly influence our inclination towards particular individuals. The consequence of such interactions, however, may also form the intensity of our relations established earlier. These dynamical processes are captured via a coevolutionary model staged in multiplex networks with two distinct layers. In a so-ca…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 11 two-column pages, 6 figures, to be published in Chaos

    Journal ref: Chaos 34(2) (2024) 023118

  27. arXiv:2401.02046  [pdf, other]

    eess.AS cs.SD

    CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

    Authors: Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

    Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method th…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: accepted by ASRU 2023

  28. arXiv:2312.17432  [pdf, other]

    cs.CV cs.CL

    Video Understanding with Large Language Models: A Survey

    Authors: Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

    Abstract: With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding that harness the power of LLMs (Vid-LL…

    Submitted 24 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  29. arXiv:2312.16995  [pdf, other]

    cs.CV

    FlowDA: Unsupervised Domain Adaptive Framework for Optical Flow Estimation

    Authors: Miaojie Feng, Longliang Liu, Hao Jia, Gangwei Xu, Xin Yang

    Abstract: Collecting real-world optical flow datasets is a formidable challenge due to the high cost of labeling. A shortage of datasets significantly constrains the real-world performance of optical flow models. Building virtual datasets that resemble real scenarios offers a potential solution for performance enhancement, yet a domain gap separates virtual and real datasets. This paper introduces FlowDA, a…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 11 pages, 5 figures

  30. arXiv:2312.16837  [pdf, other]

    cs.CV

    DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

    Authors: Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

    Abstract: Text-guided domain adaptation and generation of 3D-aware portraits find many applications in various fields. However, due to the lack of training data and the challenges in handling the high variety of geometry and appearance, the existing methods for these tasks suffer from issues like inflexibility, instability, and low fidelity. In this paper, we propose a novel framework DiffusionGAN3D, which…

    Submitted 12 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024

  31. arXiv:2312.16002  [pdf, other]

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for ICMC-ASR Challenge include the multi-channel front-end enhancement and diarization, training data augmentation, speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 set, our best system achieves…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  32. arXiv:2312.05107  [pdf, other]

    cs.CV

    DreaMoving: A Human Video Generation Framework based on Diffusion Models

    Authors: Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie

    Abstract: In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos. Specifically, given target identity and posture sequences, DreaMoving can generate a video of the target identity moving or dancing anywhere driven by the posture sequences. To this end, we propose a Video ControlNet for motion-controlling and a Content G…

    Submitted 11 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 5 pages, 5 figures, Tech. Report

  33. arXiv:2312.03790  [pdf, other]

    cs.CV

    Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume

    Authors: Gangwei Xu, Shujun Chen, Hao Jia, Miaojie Feng, Xin Yang

    Abstract: The full 4D cost volume in Recurrent All-Pairs Field Transforms (RAFT) or global matching by Transformer achieves impressive performance for optical flow estimation. However, their memory consumption increases quadratically with input resolution, rendering them impractical for high-resolution images. In this paper, we present MeFlow, a novel memory-efficient method for high-resolution optical flow…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 10 pages, 9 figures

  34. arXiv:2311.13617  [pdf, other]

    cs.CV

    Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning

    Authors: Kai Yu, Jinlin Liu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

    Abstract: We present Boosting3D, a multi-stage single image-to-3D generation method that can robustly generate reasonable 3D objects in different data domains. The point of this work is to solve the view consistency problem in single image-guided 3D generation by modeling a reasonable geometric structure. For this purpose, we propose to utilize better 3D prior to training the NeRF. More specifically, we tra…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 8 pages, 7 figures, 1 table

  35. arXiv:2311.13141  [pdf, other]

    cs.CV

    Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

    Authors: Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie

    Abstract: This is a technical report on the 360-degree panoramic image generation task based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic images capture the entire $360^\circ\times 180^\circ$ field of view. So the rightmost and the leftmost sides of the 360 panoramic image should be continued, which is the main challenge in this field. However, the current diffusion pipeline is not a…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 2 pages, 8 figures, Tech. Report

  36. arXiv:2311.02340  [pdf, other]

    cs.CV

    MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching

    Authors: Miaojie Feng, Junda Cheng, Hao Jia, Longliang Liu, Gangwei Xu, Qingyong Hu, Xin Yang

    Abstract: Stereo matching is a fundamental task in scene comprehension. In recent years, the method based on iterative optimization has shown promise in stereo matching. However, the current iteration framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during the iteration process limits the final convergence effects…

    Submitted 27 January, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted to 3DV 2024

  37. arXiv:2310.06873  [pdf, other]

    eess.IV cs.CV

    A review of uncertainty quantification in medical image analysis: probabilistic and non-probabilistic methods

    Authors: Ling Huang, Su Ruan, Yucheng Xing, Mengling Feng

    Abstract: The comprehensive integration of machine learning healthcare models within clinical practice remains suboptimal, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption pertains to an insufficiency of evidence affirming the reliability of the aforementioned models. Recently, uncertainty quantification methods hav…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.03736 by other authors

  38. arXiv:2310.05694  [pdf, other]

    cs.CL

    A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

    Authors: Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria

    Abstract: The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to freetext queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development…

    Submitted 11 June, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  39. arXiv:2309.13404  [pdf, other]

    eess.IV cs.CV

    Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos

    Authors: Rongfeng Wei, Jinlin Wu, Xuexue Bai, Ming Feng, Zhen Lei, Hongbin Liu, Zhen Chen

    Abstract: In minimally invasive surgery, surgical instrument localization is a crucial task for endoscopic videos, which enables various applications for improving surgical outcomes. However, annotating the instrument localization in endoscopic videos is tedious and labor-intensive. In contrast, obtaining the category information is easy and efficient in real-world applications. To fully utilize the categor…

    Submitted 20 June, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted by ICRA 2024 Workshop on C4 Surgical Robotic Systems in the Embodied AI Era; Surgical Tool Localization in Endoscopic Videos Challenge of MICCAI2023

  40. arXiv:2308.10401  [pdf, other]

    cs.RO eess.SY

    Model-Free Large-Scale Cloth Spreading With Mobile Manipulation: Initial Feasibility Study

    Authors: Xiangyu Chu+, Shengzhi Wang+, Minjian Feng, Jiaxi Zheng, Yuxuan Zhao, Jing Huang, K. W. Samuel Au

    Abstract: Cloth manipulation is common in domestic and service tasks, and most studies use fixed-base manipulators to manipulate objects whose sizes are relatively small with respect to the manipulators' workspace, such as towels, shirts, and rags. In contrast, manipulation of large-scale cloth, such as bed making and tablecloth spreading, poses additional challenges of reachability and manipulation control…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 6 pages, 6 figures, submit to CASE2023

    Journal ref: 2023 IEEE International Conference on Automation Science and Engineering (CASE)

  41. arXiv:2308.02874  [pdf, other]

    cs.CV cs.MM

    Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation

    Authors: Zijie Wu, Yaonan Wang, Mingtao Feng, He Xie, Ajmal Mian

    Abstract: Diffusion probabilistic models have achieved remarkable success in text guided image generation. However, generating 3D shapes is still challenging due to the lack of sufficient data containing 3D models along with their descriptions. Moreover, text based descriptions of 3D shapes are inherently ambiguous and lack details. In this paper, we propose a sketch and text guided probabilistic diffusion…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  42. arXiv:2307.13566  [pdf, other]

    cs.HC cs.AI

    The Impact of Imperfect XAI on Human-AI Decision-Making

    Authors: Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, Adam Perer

    Abstract: Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered expl…

    Submitted 8 May, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to ACM CSCW 2024. 27 pages, 9 figures, 1 table, additional figures/table in the appendix

  43. arXiv:2306.12079  [pdf, other]

    cs.LG cs.DC

    FLGo: A Fully Customizable Federated Learning Platform

    Authors: Zheng Wang, Xiaoliang Fan, Zhaopeng Peng, Xueheng Li, Ziqi Yang, Mingkuan Feng, Zhicheng Yang, Xiao Liu, Cheng Wang

    Abstract: Federated learning (FL) has found numerous applications in healthcare, finance, and IoT scenarios. Many existing FL frameworks offer a range of benchmarks to evaluate the performance of FL under realistic conditions. However, the process of customizing simulations to accommodate application-specific settings, data heterogeneity, and system heterogeneity typically remains unnecessarily complicated.…

    Submitted 21 June, 2023; originally announced June 2023.

  44. arXiv:2306.02006  [pdf, other]

    cs.LG

    MA2CL:Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

    Authors: Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li

    Abstract: Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. However, in multi-agent reinforcement learning (MARL), these techniques face challenges because each agent only receives partial observation from an environment influenced by others, r…

    Submitted 3 June, 2023; originally announced June 2023.

  45. Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

    Authors: Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin

    Abstract: The transformer model is known to be computationally demanding, and prohibitively costly for long sequences, as the self-attention module uses a quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation, however a large portion of them prohibits the model to i…

    Submitted 24 May, 2023; originally announced May 2023.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  46. arXiv:2304.09395  [pdf, other]

    cs.AI

    H-TSP: Hierarchically Solving the Large-Scale Travelling Salesman Problem

    Authors: Xuanhao Pan, Yan Jin, Yuandong Ding, Mingxiao Feng, Li Zhao, Lei Song, Jiang Bian

    Abstract: We propose an end-to-end learning framework based on hierarchical reinforcement learning, called H-TSP, for addressing the large-scale Travelling Salesman Problem (TSP). The proposed H-TSP constructs a solution of a TSP instance starting from the scratch relying on two components: the upper-level policy chooses a small subset of nodes (up to 200 in our experiment) from all nodes that are to be tra…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted by AAAI 2023, February 2023

  47. arXiv:2304.07898  [pdf, ps, other]

    cs.LG cs.AI

    Time-series Anomaly Detection via Contextual Discriminative Contrastive Learning

    Authors: Katrina Chen, Mingbin Feng, Tony S. Wirjanto

    Abstract: Detecting anomalies in temporal data is challenging due to anomalies being dependent on temporal dynamics. One-class classification methods are commonly used for anomaly detection tasks, but they have limitations when applied to temporal data. In particular, mapping all normal instances into a single hypersphere to capture their global characteristics can lead to poor performance in detecting cont…

    Submitted 16 April, 2023; originally announced April 2023.

  48. arXiv:2303.17408  [pdf, other]

    cs.CL

    P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data

    Authors: Yucheng Ruan, Xiang Lan, Daniel J. Tan, Hairil Rizal Abdullah, Mengling Feng

    Abstract: Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, there are still problems remained for existing work to be effectively adapted into medical domain, such as under-utilization…

    Submitted 9 January, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

  49. arXiv:2303.04473  [pdf, other]

    cs.CV

    DANet: Density Adaptive Convolutional Network with Interactive Attention for 3D Point Clouds

    Authors: Yong He, Hongshan Yu, Zhengeng Yang, Wei Sun, Mingtao Feng, Ajmal Mian

    Abstract: Local features and contextual dependencies are crucial for 3D point cloud analysis. Many works have been devoted to designing better local convolutional kernels that exploit the contextual dependencies. However, current point convolutions lack robustness to varying point cloud density. Moreover, contextual modeling is dominated by non-local or self-attention models which are computationally expens…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 9

  50. arXiv:2302.14434  [pdf, other]

    cs.CV

    A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

    Authors: Biwen Lei, Jianqiang Ren, Mengyang Feng, Miaomiao Cui, Xuansong Xie

    Abstract: Limited by the nature of the low-dimensional representational capacity of 3DMM, most of the 3DMM-based face reconstruction (FR) methods fail to recover high-frequency facial details, such as wrinkles, dimples, etc. Some attempt to solve the problem by introducing detail maps or non-linear operations, however, the results are still not vivid. To this end, we in this paper present a novel hierarchic…

    Submitted 28 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted by CVPR2023