
Showing 1–50 of 61 results for author: Wan, F

Searching in archive cs.
  1. On Flange-based 3D Hand-Eye Calibration for Soft Robotic Tactile Welding

    Authors: Xudong Han, Ning Guo, Yu Jie, He Wang, Fang Wan, Chaoyang Song

    Abstract: This paper investigates the direct application of standardized designs on the robot for conducting robot hand-eye calibration by employing 3D scanners with collaborative robots. The well-established geometric features of the robot flange are exploited by directly capturing its point cloud data. In particular, an iterative method is proposed to facilitate point cloud processing toward a refined cal…

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 25 pages, 14 figures, 2 tables, Accepted by Measurement

  2. arXiv:2407.12449  [pdf, other]

    cs.CV cs.AI

    Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

    Authors: Kaixin Bai, Lei Zhang, Zhaopeng Chen, Fang Wan, Jianwei Zhang

    Abstract: Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 7 pages, 2024 IEEE International Conference on Robotics and Automation

  3. arXiv:2407.01094  [pdf, other]

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2407.01050  [pdf, other]

    cs.RO cs.AI

    Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning

    Authors: Yenan Chen, Chuye Zhang, Pengxi Gu, Jianuo Qiu, Jiayi Yin, Nuofan Qiu, Guojing Huang, Bangchao Huang, Zishang Zhang, Hui Deng, Wei Zhang, Fang Wan, Chaoyang Song

    Abstract: While the animals' Fin-to-Limb evolution has been well-researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of 'intelligent design under constraints'…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 13 pages, 5 figures, Accepted and Presented at ReMAR2024

  5. arXiv:2406.14136  [pdf, other]

    cs.RO

    One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging

    Authors: Linhan Yang, Lei Yang, Haoran Sun, Zeqing Zhang, Haibin He, Fang Wan, Chaoyang Song, Jia Pan

    Abstract: Fabric manipulation dynamically is commonly seen in manufacturing and domestic settings. While dynamically manipulating a fabric piece to reach a target state is highly efficient, this task presents considerable challenges due to the varying properties of different fabrics, complex dynamics when interacting with environments, and meeting required goal conditions. To address these challenges, we pr…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.10813  [pdf, other]

    cs.CL

    Self-Evolution Fine-Tuning for Policy Optimization

    Authors: Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan

    Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement lea…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  8. arXiv:2406.10594  [pdf, other]

    cs.CL

    BlockPruner: Fine-grained Pruning for Large Language Models

    Authors: Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li

    Abstract: With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they…

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  9. arXiv:2405.16071  [pdf, other]

    cs.CV

    DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

    Authors: Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

    Abstract: Region-level multi-modality methods can translate referred image regions to human preferred language descriptions. Unfortunately, most of existing methods using fixed visual inputs remain lacking the resolution adaptability to find out precise language descriptions. In this study, we propose a dynamic resolution approach, referred to as DynRefer, to pursue high-accuracy region-level referring thro…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/callsys/DynRefer

  10. arXiv:2403.09363  [pdf, other]

    cs.CV

    Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

    Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

    Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both m…

    Submitted 14 March, 2024; originally announced March 2024.

  11. arXiv:2402.16107  [pdf, other]

    cs.CL

    Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

    Authors: Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

    Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake kno…

    Submitted 28 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Technical Report, work in progress

  12. arXiv:2402.03634  [pdf, other]

    cs.CV

    Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

    Authors: Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou

    Abstract: Multi-view 3D object detection systems often struggle with generating precise predictions due to the challenges in estimating depth from images, increasing redundant and incorrect detections. Our paper presents Ray Denoising, an innovative method that enhances detection accuracy by strategically sampling along camera rays to construct hard negative examples. These examples, visually challenging to…

    Submitted 12 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  13. arXiv:2402.01950  [pdf, other]

    cs.CV

    ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Fan Wan, Yawen Huang, Yang Long, Yefeng Zheng

    Abstract: Most of the existing works on arbitrary 3D NeRF style transfer required retraining on each single style condition. This work aims to achieve zero-shot controlled stylization in 3D scenes utilizing text or visual input as conditioning factors. We introduce ConRF, a novel method of zero-shot stylization. Specifically, due to the ambiguity of CLIP features, we employ a conversion process that maps th…

    Submitted 6 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  14. arXiv:2401.17910  [pdf, other]

    cs.CV

    ControlCap: Controllable Region-level Captioning

    Authors: Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

    Abstract: Region-level captioning is challenged by the caption degeneration issue, which refers to that pre-trained multimodal models tend to predict the most frequent captions but miss the less frequent ones. In this study, we propose a controllable region-level captioning (ControlCap) approach, which introduces control words to a multimodal model to address the caption degeneration issue. In specific, Con…

    Submitted 9 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: https://github.com/callsys/ControlCap

  15. arXiv:2401.10768  [pdf, other]

    cs.CL

    Knowledge Verification to Nip Hallucination in the Bud

    Authors: Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external know…

    Submitted 16 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Work in progress

  16. arXiv:2401.10491  [pdf, other]

    cs.CL

    Knowledge Fusion of Large Language Models

    Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weigh…

    Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  17. CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Yang Long, Yefeng Zheng

    Abstract: The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiation fields. However, these methods have limitations when it comes to accurately modeling the motion of complex objects, which can lead to inaccurate and blurry renderings of d…

    Submitted 26 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by Pattern Recognition

  18. arXiv:2312.12295  [pdf, other]

    cs.RO

    Describing Robots from Design to Learning: Towards an Interactive Lifecycle Representation of Robots

    Authors: Nuofan Qiu, Fang Wan, Chaoyang Song

    Abstract: The robot development process is divided into several stages, which create barriers to the exchange of information between these different stages. We advocate for an interactive lifecycle representation, extending from robot morphology design to learning, and introduce the role of robot description formats in facilitating information transfer throughout this pipeline. We analyzed the relationship…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 11 pages, 8 figures, 2 tables, submitted to ICRA2024 for review

  19. arXiv:2312.09863  [pdf, other]

    cs.RO

    Proprioceptive State Estimation for Amphibious Tactile Sensing

    Authors: Ning Guo, Xudong Han, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jian S. Dai, Fang Wan, Chaoyang Song

    Abstract: This paper presents a novel vision-based proprioception approach for a soft robotic finger that can estimate and reconstruct tactile interactions in both terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compac…

    Submitted 21 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 24 pages, 11 figures, 1 table, Conditionally Accepted for the Special Collection on Tactile Robotics in IEEE Transactions on Robotics

  20. arXiv:2312.09822  [pdf, other]

    cs.RO

    SeeThruFinger: See and Grasp Anything with a Soft Touch

    Authors: Fang Wan, Chaoyang Song

    Abstract: We present SeeThruFinger, a soft robotic finger with an in-finger vision for multi-modal perception, including visual perception and tactile sensing, for geometrically adaptive and real-time reactive grasping. Multi-modal perception of intrinsic and extrinsic interactions is critical in building intelligent robots that learn. Instead of adding various sensors for different modalities, a preferred…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures, 1 table, submitted to Soft Robotics under review

  21. arXiv:2311.14974  [pdf, other]

    cs.RO

    Active Surface with Passive Omni-Directional Adaptation of Soft Polyhedral Fingers for In-Hand Manipulation

    Authors: Sen Li, Fang Wan, Chaoyang Song

    Abstract: Track systems effectively distribute loads, augmenting traction and maneuverability on unstable terrains, leveraging their expansive contact areas. This tracked locomotion capability also aids in hand manipulation of not only regular objects but also irregular objects. In this study, we present the design of a soft robotic finger with an active surface on an omni-adaptive network structure, which…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 10 pages, 6 figures, 2 tables, submitted to ICRA 2024

  22. arXiv:2310.20256  [pdf, other]

    cs.CL

    PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection

    Authors: Tao Yang, Tianyuan Shi, Fanqi Wan, Xiaojun Quan, Qifan Wang, Bingzhe Wu, Jiaxiang Wu

    Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have showcased remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs in personality detection, which involves identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from Psychological Questionnaires, which are carefully designed by psychol…

    Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  23. arXiv:2310.09824  [pdf, other]

    cs.RO

    Overconstrained Locomotion

    Authors: Haoran Sun, Bangchao Huang, Zishang Zhang, Ronghan Xu, Guojing Huang, Shihao Feng, Guangyi Huang, Jiayi Yin, Nuofan Qiu, Hua Chen, Wei Zhang, Jia Pan, Fang Wan, Chaoyang Song

    Abstract: This paper studies the design, control, and learning of a novel robotic limb that produces overconstrained locomotion by employing the Bennett linkage for motion generation, capable of parametric reconfiguration between a reptile- and mammal-inspired morphology within a single quadruped. In contrast to the prevailing focus on planar linkages, this research delves into adopting overconstrained link…

    Submitted 30 July, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: 30 pages, 20 figures, 2 tables

  24. arXiv:2310.09168  [pdf, other]

    cs.CL

    Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

    Authors: Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a nove…

    Submitted 24 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference)

  25. arXiv:2310.08877  [pdf, other]

    cs.CL

    Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

    Authors: Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

    Abstract: Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality…

    Submitted 20 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Main Conference

  26. Proprioceptive Learning with Soft Polyhedral Networks

    Authors: Xiaobo Liu, Xudong Han, Wei Hong, Fang Wan, Chaoyang Song

    Abstract: Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of ada…

    Submitted 27 July, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 20 pages, 10 figures, 2 tables, Published in the International Journal of Robotics Research

  27. arXiv:2308.08510  [pdf, other]

    cs.RO cs.LG

    Autoencoding a Soft Touch to Learn Grasping from On-land to Underwater

    Authors: Ning Guo, Xudong Han, Xiaobo Liu, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jiansheng Dai, Fang Wan, Chaoyang Song

    Abstract: Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerging under a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of g…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 17 pages, 5 figures, 1 table, submitted to Advanced Intelligent Systems for review

  28. DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Xinxing Xu, Yang Long, Yefeng Zheng

    Abstract: Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold in dynamic objects in scenarios, leading to errors during the view synthesis stage, such as feature mismatch and occlusion, which can significantly reduce the accuracy of the generated…

    Submitted 14 August, 2023; originally announced August 2023.

  29. arXiv:2307.09756  [pdf, other]

    cs.CV

    Generative Prompt Model for Weakly Supervised Object Localization

    Authors: Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

    Abstract: Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative obje…

    Submitted 19 July, 2023; originally announced July 2023.

    Journal ref: International Conference on Computer Vision Conference (ICCV2023)

  30. arXiv:2306.04932  [pdf, other]

    cs.RO

    Jigsaw-based Benchmarking for Learning Robotic Manipulation

    Authors: Xiaobo Liu, Fang Wan, Sheng Ge, Haokun Wang, Haoran Sun, Chaoyang Song

    Abstract: Benchmarking provides experimental evidence of the scientific baseline to enhance the progression of fundamental research, which is also applicable to robotics. In this paper, we propose a method to benchmark metrics of robotic manipulation, which addresses the spatial-temporal reasoning skills for robot learning with the jigsaw game. In particular, our approach exploits a simple set of jigsaw pie…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 7 pages, 7 figures, accepted to 2023 IEEE International Conference on Advanced Robotics and Mechatronics (ICARM)

  31. arXiv:2306.04928  [pdf, other]

    cs.RO cs.HC

    Underwater Intention Recognition using Head Motion and Throat Vibration for Supernumerary Robotic Assistance

    Authors: Yuqin Guo, Rongzheng Zhang, Wanghongjie Qiu, Harry Asada, Fang Wan, Chaoyang Song

    Abstract: This study presents a multi-modal mechanism for recognizing human intentions while diving underwater, aiming to achieve natural human-robot interactions through an underwater superlimb for diving assistance. The underwater environment severely limits the divers' capabilities in intention expression, which becomes more challenging when they intend to operate tools while keeping control of body post…

    Submitted 16 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 6 pages, 9 figures, 3 tables, accepted to IEEE CASE 2023

  32. arXiv:2305.10149  [pdf, other]

    cs.CL

    Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

    Authors: Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi

    Abstract: Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address th…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

  33. arXiv:2305.09892  [pdf, other]

    cs.CL cs.AI

    Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

    Authors: Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang

    Abstract: Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negative…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023, 16 pages

  34. arXiv:2210.03592  [pdf, other]

    cs.CV

    Specialized Re-Ranking: A Novel Retrieval-Verification Framework for Cloth Changing Person Re-Identification

    Authors: Renjie Zhang, Yu Fang, Huaxin Song, Fangbin Wan, Yanwei Fu, Hirokazu Kato, Yang Wu

    Abstract: Cloth changing person re-identification (Re-ID) can work under more complicated scenarios with higher security than normal Re-ID and biometric techniques and is therefore extremely valuable in applications. Meanwhile, higher flexibility in appearance always leads to more similar-looking confusing images, which is the weakness of the widely used retrieval methods. In this work, we shed light on how…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted by Pattern Recognition

  35. arXiv:2205.09613  [pdf, other]

    cs.CV

    Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

    Authors: Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

    Abstract: Modern object detectors have taken the advantages of backbone networks pre-trained on large scale datasets. Except for the backbone networks, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models. In this study, we propose to integrally migrate pre-trained transformer…

    Submitted 2 December, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: 6 figures, 6 tables

  36. arXiv:2202.11319  [pdf, other]

    cs.CV cs.CR cs.LG

    Absolute Zero-Shot Learning

    Authors: Rui Gao, Fan Wan, Daniel Organisciak, Jiyao Pu, Junyan Wang, Haoran Duan, Peng Zhang, Xingsong Hou, Yang Long

    Abstract: Considering the increasing concerns about data copyright and privacy issues, we present a novel Absolute Zero-Shot Learning (AZSL) paradigm, i.e., training a classifier with zero real data. The key innovation is to involve a teacher model as the data safeguard to guide the AZSL model training without data leaking. The AZSL model consists of a generator and student network, which can achieve date-f…

    Submitted 23 February, 2022; originally announced February 2022.

  37. arXiv:2108.13969  [pdf, other]

    cs.CV cs.LG

    Semi-Supervised Crowd Counting from Unlabeled Data

    Authors: Haoran Duan, Fan Wan, Rui Sun, Zeyu Wang, Varun Ojha, Yu Guan, Hubert P. H. Shum, Bingzhang Hu, Yang Long

    Abstract: Automatic Crowd behavior analysis can be applied to effectively help the daily transportation statistics and planning, which helps the smart city construction. As one of the most important keys, crowd counting has drawn increasing attention. Recent works achieved promising performance but relied on the supervised paradigm with expensive crowd annotations. To alleviate the annotation cost in real-w…

    Submitted 26 March, 2024; v1 submitted 31 August, 2021; originally announced August 2021.

  38. arXiv:2104.02324  [pdf, other]

    cs.CV cs.AI cs.LG

    Multiple instance active learning for object detection

    Authors: Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye

    Abstract: Despite the substantial progress of active learning for image recognition, there still lacks an instance-level active learning method specified for object detection. In this paper, we propose Multiple Instance Active Object Detection (MI-AOD), to select the most informative images for detector training by observing instance-level uncertainty. MI-AOD defines an instance uncertainty learning module,…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 10 pages, 7 figures, 5 tables. Code is available at https://github.com/yuantn/MI-AOD

  39. arXiv:2103.14862  [pdf, other]

    cs.CV

    TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

    Authors: Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye

    Abstract: Weakly supervised object localization (WSOL) is a challenging problem when given image category labels but requires to learn object localization models. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is cause…

    Submitted 3 August, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: Accepted by ICCV2021 (poster)

  40. arXiv:2101.12379  [pdf, other]

    cs.RO

    Learning-based Optoelectronically Innervated Tactile Finger for Rigid-Soft Interactive Grasping

    Authors: Linhan Yang, Xudong Han, Weijie Guo, Fang Wan, Jia Pan, Chaoyang Song

    Abstract: This paper presents a novel design of a soft tactile finger with omni-directional adaptation using multi-channel optical fibers for rigid-soft interactive grasping. Machine learning methods are used to train a model for real-time prediction of force, torque, and contact using the tactile data collected. We further integrated such fingers in a reconfigurable gripper design with three fingers so tha…

    Submitted 28 January, 2021; originally announced January 2021.

    Comments: 8 pages, 9 figures, Submitted to RAL and ICRA2021

  41. arXiv:2012.03168  [pdf, other]

    cs.RO

    Design of an Optoelectronically Innervated Gripper for Rigid-Soft Interactive Grasping

    Authors: Linhan Yang, Xudong Han, Weijie Guo, Zixin Zhang, Fang Wan, Jia Pan, Chaoyang Song

    Abstract: Over the past few decades, efforts have been made towards robust robotic grasping, and therefore dexterous manipulation. The soft gripper has shown their potential in robust grasping due to their inherent properties-low, control complexity, and high adaptability. However, the deformation of the soft gripper when interacting with objects bring inaccuracy of grasped objects, which causes instability…

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: 11 pages, 6 figures, submitted to IEEE ICRA 2021

  42. arXiv:2007.14557  [pdf, other]

    cs.CV

    Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

    Authors: Jinlong Peng, Changan Wang, Fangbin Wan, Yang Wu, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yanwei Fu

    Abstract: Existing Multiple-Object Tracking (MOT) methods either follow the tracking-by-detection paradigm to conduct object detection, feature extraction and data association separately, or have two of the three subtasks integrated to form a partially end-to-end solution. Going beyond these sub-optimal frameworks, we propose a simple online model named Chained-Tracker (CTracker), which naturally integrates…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: European Conference on Computer Vision 2020 (Spotlight)

  43. arXiv:2006.14863  [pdf, other]

    cs.CV cs.LG

    Domain Contrast for Domain Adaptive Object Detection

    Authors: Feng Liu, Xiaoxong Zhang, Fang Wan, Xiangyang Ji, Qixiang Ye

    Abstract: We present Domain Contrast (DC), a simple yet effective approach inspired by contrastive learning for training domain adaptive detectors. DC is deduced from the error bound minimization perspective of a transferred model, and is implemented with cross-domain contrast loss which is plug-and-play. By minimizing cross-domain contrast loss, DC guarantees the transferability of detectors while naturall…

    Submitted 26 June, 2020; originally announced June 2020.

  44. arXiv:2005.02588  [pdf, other]

    cs.RO

    DeepClaw: A Robotic Hardware Benchmarking Platform for Learning Object Manipulation

    Authors: Fang Wan, Haokun Wang, Xiaobo Liu, Linhan Yang, Chaoyang Song

    Abstract: We present DeepClaw as a reconfigurable benchmark of robotic hardware and task hierarchy for robot learning. The DeepClaw benchmark aims at a mechatronics perspective of the robot learning problem, which features a minimum design of robot cell that can be easily reconfigured to host robot hardware from various vendors, including manipulators, grippers, cameras, desks, and objects, aiming at a stre…

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: 13 pages, 6 figures, 2 tables, accepted for AIM 2020

  45. arXiv:2004.00163  [pdf, other]

    cs.CV cs.LG stat.ML

    Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning

    Authors: Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu

    Abstract: Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). Since only the bag's label is known, the main challenge is assigning which key instances within the bag to trigger t…

    Submitted 25 August, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: Accepted at European Conference on Computer Vision (ECCV), 2020

  46. arXiv:2003.04070  [pdf, other]

    cs.CV

    When Person Re-identification Meets Changing Clothes

    Authors: Fangbin Wan, Yang Wu, Xuelin Qian, Yixiong Chen, Yanwei Fu

    Abstract: Person re-identification (ReID) is now an active research topic for AI-based video surveillance applications such as specific person search, but the practical issue that the target person(s) may change clothes (the clothes inconsistency problem) has long been overlooked. For the first time, this paper systematically studies this problem. We first overcome the difficulty of lack of suitable dataset…

    Submitted 24 May, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted by CVPRW 2020

  47. Hybrid Actuator Design for a Gait Augmentation Wearable

    Authors: Fang Wan, Zheng Wang, Brooke Franchuk, Xinyao Hu, Zhenglong Sun, Chaoyang Song

    Abstract: We describe a fluidic actuator design that replaces the sealed chamber of a hydraulic cylinder using a soft actuator to provide compliant linear compression with a large force ($\geq$100 N) at a low operation pressure ($\leq$50 kPa) for a lower-limb wearable. The external shells constrain the deformation of the soft actuator under fluidic pressurization. This enables us to use latex party balloons…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 5 pages, 7 figures, published at IEEE ROBIO 2017

  48. Robotic Cane as a Soft SuperLimb for Elderly Sit-to-Stand Assistance

    Authors: Xia Wu, Haiyuan Liu, Ziqi Liu, Mingdong Chen, Fang Wan, Chenglong Fu, Harry Asada, Zheng Wang, Chaoyang Song

    Abstract: Many researchers have identified robotics as a potential solution to the aging population faced by many developed and developing countries. If so, how should we address the cognitive acceptance and ambient control of elderly assistive robots through design? In this paper, we proposed an explorative design of an ambient SuperLimb (Supernumerary Robotic Limb) system that involves a pneumatically-dri…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: 8 pages, 9 figures, accepted for IEEE RoboSoft 2020

    Journal ref: 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft)

  49. Rigid-Soft Interactive Learning for Robust Grasping

    Authors: Linhan Yang, Fang Wan, Haokun Wang, Xiaobo Liu, Yujia Liu, Jia Pan, Chaoyang Song

    Abstract: Inspired by the wide use of soft fingers in grasping, we propose a method of rigid-soft interactive learning, aiming at reducing the time of data collection. In this paper, we classify the interaction categories into Rigid-Rigid, Rigid-Soft, and Soft-Rigid according to the interaction surface between grippers and target objects. We find experimental evidence that the interaction types between grippers and…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: 8 pages, 5 figures, Accepted for IEEE RAL and IEEE ICRA 2020

  50. Scalable Tactile Sensing for an Omni-adaptive Soft Robot Finger

    Authors: Zeyi Yang, Sheng Ge, Fang Wan, Yujia Liu, Chaoyang Song

    Abstract: Robotic fingers made of soft material and compliant structures usually lead to superior adaptation when interacting with the unstructured physical environment. In this paper, we present an embedded sensing solution using optical fibers for an omni-adaptive soft robotic finger with exceptional adaptation in all directions. In particular, we managed to insert a pair of optical fibers inside the fing…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: 8 pages, 6 figures, full-length version of a submission to IEEE RoboSoft 2020

    Journal ref: 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft)