
Showing 1–26 of 26 results for author: Zhan, G

Searching in archive cs.
  1. arXiv:2408.12009

    cs.CV

    CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

    Authors: Yunlong Tang, Gen Zhan, Li Yang, Yiting Liao, Chenliang Xu

    Abstract: Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a crucial role in guiding attention by shaping how visual information is interpreted. Existing methods primarily focus on modeling perceptual information…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2407.15083

    cs.LG

    Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning

    Authors: Yuxuan Jiang, Yujie Yang, Zhiqian Lan, Guojian Zhan, Shengbo Eben Li, Qi Sun, Jian Ma, Tianwen Yu, Changwu Zhang

    Abstract: Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet goal-oriented nature of the problem pose…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Oral

  3. arXiv:2407.06168

    cs.RO cs.CV

    TARGO: Benchmarking Target-driven Object Grasping under Occlusions

    Authors: Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers

    Abstract: Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contr…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 19 pages, 17 figures

  4. arXiv:2406.04882

    cs.RO cs.AI cs.CL cs.CV

    InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

    Authors: Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong

    Abstract: Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all co…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024

  5. arXiv:2406.02283

    cs.RO

    Broadcasting Support Relations Recursively from Local Dynamics for Object Retrieval in Clutters

    Authors: Yitong Li, Ruihai Wu, Haoran Lu, Chuanruo Ning, Yan Shen, Guanqi Zhan, Hao Dong

    Abstract: In our daily life, cluttered objects are everywhere, from scattered stationery and books cluttering the table to bowls and plates filling the kitchen sink. Retrieving a target object from clutters is an essential while challenging skill for robots, for the difficulty of safely manipulating an object without disturbing others, which requires the robot to plan a manipulation sequence and first move…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: RSS 2024

  6. arXiv:2404.03451

    cs.CV

    How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks

    Authors: Dongang Wang, Peilin Liu, Hengrui Wang, Heidi Beadnall, Kain Kyle, Linda Ly, Mariano Cabezas, Geng Zhan, Ryan Sullivan, Weidong Cai, Wanli Ouyang, Fernando Calamante, Michael Barnett, Chenyu Wang

    Abstract: Training deep neural networks reliably requires access to large-scale datasets. However, obtaining such datasets can be challenging, especially in the context of neuroimaging analysis tasks, where the cost associated with image acquisition and annotation can be prohibitive. To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data…

    Submitted 4 April, 2024; originally announced April 2024.

  7. arXiv:2403.01768

    eess.SY cs.AI

    Canonical Form of Datatic Description in Control Systems

    Authors: Guojian Zhan, Ziang Zheng, Shengbo Eben Li

    Abstract: The design of feedback controllers is undergoing a paradigm shift from modelic (i.e., model-driven) control to datatic (i.e., data-driven) control. Canonical form of state space model is an important concept in modelic control systems, exemplified by Jordan form, controllable form and observable form, whose purpose is to facilitate system analysis and controller synthesis. In the realm of datatic…

    Submitted 4 March, 2024; originally announced March 2024.

  8. arXiv:2312.17247

    cs.CV

    Amodal Ground Truth and Completion in the Wild

    Authors: Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

    Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually predicted by manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for part…

    Submitted 29 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  9. arXiv:2310.07211

    cs.LG

    Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration

    Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li

    Abstract: Regularization is one of the most important techniques in reinforcement learning algorithms. The well-known soft actor-critic algorithm is a special case of regularized policy iteration where the regularizer is chosen as Shannon entropy. Despite some empirical success of regularized policy iteration, its theoretical underpinnings remain unclear. This paper proves that regularized policy iteration…

    Submitted 11 October, 2023; originally announced October 2023.

  10. arXiv:2310.06836

    cs.CV

    A General Protocol to Probe Large Vision Models for 3D Physical Understanding

    Authors: Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

    Abstract: Our objective in this paper is to probe large vision models to determine to what extent they 'understand' different physical properties of the 3D scene depicted in an image. To this end, we make the following contributions: (i) We introduce a general and lightweight protocol to evaluate whether features of an off-the-shelf large vision model encode a number of physical 'properties' of the 3D scene…

    Submitted 10 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  11. arXiv:2309.07510

    cs.RO cs.AI cs.CV

    Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions

    Authors: Kai Cheng, Ruihai Wu, Yan Shen, Chuanruo Ning, Guanqi Zhan, Hao Dong

    Abstract: Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agen…

    Submitted 20 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: In 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Website at https://chengkaiacademycity.github.io/EnvAwareAfford/

  12. arXiv:2309.04220

    cs.CV

    Score-PA: Score-based 3D Part Assembly

    Authors: Junfeng Cheng, Mingdong Wu, Ruiyuan Zhang, Guanqi Zhan, Chao Wu, Hao Dong

    Abstract: Autonomous 3D part assembly is a challenging task in the areas of robotics and 3D computer vision. This task aims to assemble individual components into a complete shape without relying on predefined instructions. In this paper, we formulate this task from a novel generative perspective, introducing the Score-based 3D Part Assembly framework (Score-PA) for 3D part assembly. Knowing that score-base…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: BMVC 2023

  13. arXiv:2308.16376

    eess.IV cs.CV cs.DC

    Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

    Authors: Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

    Abstract: Accurately measuring the evolution of Multiple Sclerosis (MS) with magnetic resonance imaging (MRI) critically informs understanding of disease progression and helps to direct therapeutic strategy. Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. Obtaining sufficient data from a single clin…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 4 figures, journal submission

  14. Learning with linear mixed model for group recommendation systems

    Authors: Baode Gao, Guangpeng Zhan, Hanzhang Wang, Yiming Wang, Shengxin Zhu

    Abstract: Accurate prediction of users' responses to items is one of the main aims of many computational advising applications. Examples include recommending movies, news articles, songs, jobs, clothes, books and so forth. Accurate prediction of inactive users' responses still remains a challenging problem for many applications. In this paper, we explore the linear mixed model in recommendation system. The…

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: 5 pages, 9 figures, published

    ACM Class: G.3

    Journal ref: In Proceedings of the 2019 11th International Conference on Machine Learning and Computing (pp. 81-85) (2019, February)

  15. arXiv:2210.10046

    cs.CV

    A Tri-Layer Plugin to Improve Occluded Detection

    Authors: Guanqi Zhan, Weidi Xie, Andrew Zisserman

    Abstract: Detecting occluded objects still remains a challenge for state-of-the-art object detectors. The objective of this work is to improve the detection for such objects, and thereby improve the overall performance of a modern object detector. To this end we make the following four contributions: (1) We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve th…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  16. arXiv:2209.14106

    cs.CV cs.LG

    Cyclegan Network for Sheet Metal Welding Drawing Translation

    Authors: Zhiwei Song, Hui Yao, Dan Tian, Gaohui Zhan

    Abstract: In intelligent manufacturing, the quality of machine translation engineering drawings will directly affect its manufacturing accuracy. Currently, most of the work is manually translated, greatly reducing production efficiency. This paper proposes an automatic translation method for welded structural engineering drawings based on Cyclic Generative Adversarial Networks (CycleGAN). The CycleGAN netwo…

    Submitted 28 September, 2022; originally announced September 2022.

  17. arXiv:2205.12633

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  18. arXiv:2205.01509

    eess.IV cs.CV

    MS Lesion Segmentation: Revisiting Weighting Mechanisms for Federated Learning

    Authors: Dongnan Liu, Mariano Cabezas, Dongang Wang, Zihao Tang, Lei Bai, Geng Zhan, Yuling Luo, Kain Kyle, Linda Ly, James Yu, Chun-Chien Shieh, Aria Nguyen, Ettikan Kandasamy Karuppiah, Ryan Sullivan, Fernando Calamante, Michael Barnett, Wanli Ouyang, Weidong Cai, Chenyu Wang

    Abstract: Federated learning (FL) has been widely employed for medical image analysis to facilitate multi-client collaborative learning without sharing raw data. Despite great success, FL's performance is limited for multiple sclerosis (MS) lesion segmentation tasks, due to variance in lesion characteristics imparted by different scanners and acquisition parameters. In this work, we propose the first FL MS…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 10 pages, 3 figures, and 7 tables

  19. arXiv:2204.04403

    cs.RO eess.SY

    Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning

    Authors: Yangang Ren, Guojian Zhan, Liye Tang, Shengbo Eben Li, Jianhua Jiang, Jingliang Duan

    Abstract: Intersections are quite challenging among various driving scenes wherein the interaction of signal lights and distinct traffic actors poses great difficulty to learn a wise and robust driving policy. Current research rarely considers the diversity of intersections and stochastic behaviors of traffic participants. For practical applications, the randomness usually leads to some devastating events,…

    Submitted 9 April, 2022; originally announced April 2022.

  20. arXiv:2104.10781

    eess.IV cs.CV

    NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

    Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

    Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at…

    Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Corrected the MOS values in Table 2, and corrected some minor typos

  21. arXiv:2006.07793

    cs.CV

    Generative 3D Part Assembly via Dynamic Graph Learning

    Authors: Jialei Huang, Guanqi Zhan, Qingnan Fan, Kaichun Mo, Lin Shao, Baoquan Chen, Leonidas Guibas, Hao Dong

    Abstract: Autonomous part assembly is a challenging yet crucial task in 3D computer vision and robotics. Analogous to buying an IKEA furniture, given a set of 3D parts that can assemble a single shape, an intelligent agent needs to perceive the 3D part geometry, reason to propose pose estimations for the input parts, and finally call robotic planning and control routines for actuation. In this paper, we foc…

    Submitted 23 December, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  22. arXiv:2005.04854

    cs.CV

    Scope Head for Accurate Localization in Object Detection

    Authors: Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

    Abstract: Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance. However, they still encounter the design difficulty in hand-crafted 2D anchor definition and the learning complexity in 1D direct location regression. To tackle these issues, in this paper, we propose a novel detector coined as ScopeNet, which models anch…

    Submitted 11 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  23. arXiv:1911.09943

    cs.CV cs.LG

    DLGAN: Disentangling Label-Specific Fine-Grained Features for Image Manipulation

    Authors: Guanqi Zhan, Yihao Zhao, Bingchan Zhao, Haoqi Yuan, Baoquan Chen, Hao Dong

    Abstract: Recent studies have shown how disentangling images into content and feature spaces can provide controllable image translation/ manipulation. In this paper, we propose a framework to enable utilizing discrete multi-labels to control which features to be disentangled, i.e., disentangling label-specific fine-grained features for image manipulation (dubbed DLGAN). By mapping the discrete label-specifi…

    Submitted 25 August, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

  24. arXiv:1909.02902

    cs.LG cs.CV stat.ML

    Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction

    Authors: Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, Liang Lin

    Abstract: As a crucial component in intelligent transportation systems, traffic flow prediction has recently attracted widespread research interest in the field of artificial intelligence (AI) with the increasing availability of massive traffic mobility data. Its key challenge lies in how to integrate diverse factors (such as temporal rules and spatial dependencies) to infer the evolution trend of traffic f…

    Submitted 12 June, 2020; v1 submitted 1 September, 2019; originally announced September 2019.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems. arXiv admin note: text overlap with arXiv:1809.00101

  25. arXiv:1907.09160

    cs.CV

    Extended Local Binary Patterns for Efficient and Robust Spontaneous Facial Micro-Expression Recognition

    Authors: Chengyu Guo, Jingyun Liang, Geng Zhan, Zhong Liu, Matti Pietikäinen, Li Liu

    Abstract: Facial Micro-Expressions (MEs) are spontaneous, involuntary facial movements when a person experiences an emotion but deliberately or unconsciously attempts to conceal his or her genuine emotions. Recently, ME recognition has attracted increasing attention due to its potential applications such as clinical diagnosis, business negotiation, interrogations, and security. However, it is expensive to b…

    Submitted 17 September, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

  26. arXiv:1907.03590

    cs.CL cs.AI

    Multiple Generative Models Ensemble for Knowledge-Driven Proactive Human-Computer Dialogue Agent

    Authors: Zelin Dai, Weitang Liu, Guanhua Zhan

    Abstract: Multiple sequence to sequence models were used to establish an end-to-end multi-turns proactive dialogue generation agent, with the aid of data augmentation techniques and variant encoder-decoder structure designs. A rank-based ensemble approach was developed for boosting performance. Results indicate that our single model, in average, makes an obvious improvement in the terms of F1-score and BLEU…

    Submitted 6 April, 2020; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: 7 pages, 3 figures submitted to journal