Showing 1–21 of 21 results for author: Mu, B

Searching in archive cs.
  1. arXiv:2407.16151  [pdf, other]

    cs.RO

    Optimal camera-robot pose estimation in linear time from points and lines

    Authors: Guangyang Zeng, Biqiang Mu, Qingcheng Zeng, Yuchen Song, Chulin Dai, Guodong Shi, Junfeng Wu

    Abstract: Camera pose estimation is a fundamental problem in robotics. This paper focuses on two issues of interest: First, point and line features have complementary advantages, and it is of great value to design a uniform algorithm that can fuse them effectively; Second, with the development of modern front-end techniques, a large number of features can exist in a single image, which presents a potential…

    Submitted 22 July, 2024; originally announced July 2024.

  2. arXiv:2407.08931  [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  3. arXiv:2405.03152  [pdf, other]

    eess.AS cs.SD

    MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

    Authors: Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. H…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  5. arXiv:2403.07372  [pdf, other]

    cs.CV

    Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection

    Authors: Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng Mu, Si Liu

    Abstract: Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird's-eye view (BEV) representation space. However, our empirical findings indicate that previous methods have limitations in generating fusion BEV features free from cross-modal conflicts. These conflicts encompass extrinsic conflicts caused by BEV feature construction and inherent conflic…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  6. arXiv:2403.01174  [pdf, other]

    cs.CV

    Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation

    Authors: Guangyang Zeng, Qingcheng Zeng, Xinghan Li, Biqiang Mu, Jiming Chen, Ling Shi, Junfeng Wu

    Abstract: Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental issue in the computer vision community. The existing works generally set out from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we dive into the original measurement model with respect to the rotation matrix and no…

    Submitted 2 March, 2024; originally announced March 2024.

  7. arXiv:2401.00475  [pdf, other]

    cs.SD eess.AS

    E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

    Authors: Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie

    Abstract: This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emo…

    Submitted 27 July, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 5 pages, 3 figures

  8. arXiv:2312.09746  [pdf, other]

    cs.SD eess.AS

    Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

    Authors: Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies each with multiple recording devices. The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing various array topologies that have multiple recording devices and offering reliable recognition perf…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  9. arXiv:2312.03408  [pdf, other]

    cs.CV

    Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

    Authors: Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

    Abstract: With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. Current autonomous driving datasets can broadly be categorized into two generations. The first-generation autonomous driving datasets are characterized by relatively sim…

    Submitted 22 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: This article is a simplified English translation of corresponding Chinese article. Please refer to Chinese version for the complete content

  10. arXiv:2308.09346  [pdf, other]

    cs.CV

    Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

    Authors: Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong Liu

    Abstract: Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, they ignored the value of class prototype construction and matching, leading to unsatisfactory performance in recognizing similar categories in every ta…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  11. arXiv:2305.12493  [pdf, other]

    eess.AS cs.CL cs.SD

    Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

    Authors: Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

    Abstract: Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context…

    Submitted 12 July, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  12. arXiv:2301.07944  [pdf, other]

    cs.CV cs.AI

    Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition

    Authors: Jiazheng Xing, Mengmeng Wang, Yong Liu, Boyu Mu

    Abstract: Spatial and temporal modeling is one of the most core aspects of few-shot action recognition. Most previous works mainly focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. Actually, the former feature could bring rich local semantic information, and the latter featu…

    Submitted 7 April, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

  13. arXiv:2209.06779  [pdf, ps, other]

    cs.RO eess.SY

    Efficient Planar Pose Estimation via UWB Measurements

    Authors: Haodong Jiang, Wentao Wang, Yuan Shen, Xinghan Li, Xiaoqiang Ren, Biqiang Mu, Junfeng Wu

    Abstract: State estimation is an essential part of autonomous systems. Integrating the Ultra-Wideband (UWB) technique has been shown to correct the long-term estimation drift and bypass the complexity of loop closure detection. However, few works on robotics adopt UWB as a stand-alone state estimation solution. The primary purpose of this work is to investigate planar pose estimation using only UWB range mea…

    Submitted 27 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Update the content and improve consistency with the ICRA version

  14. arXiv:2209.05824  [pdf, other]

    cs.CV cs.RO

    CPnP: Consistent Pose Estimator for Perspective-n-Point Problem with Bias Elimination

    Authors: Guangyang Zeng, Shiyu Chen, Biqiang Mu, Guodong Shi, Junfeng Wu

    Abstract: The Perspective-n-Point (PnP) problem has been widely studied in both computer vision and photogrammetry societies. With the development of feature extraction techniques, a large number of feature points might be available in a single shot. It is promising to devise a consistent estimator, i.e., the estimate can converge to the true camera pose as the number of points increases. To this end, we pr…

    Submitted 13 September, 2022; originally announced September 2022.

  15. arXiv:2202.06312  [pdf, other]

    cs.CV

    Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks

    Authors: Bingxu Mu, Zhenxing Niu, Le Wang, Xue Wang, Rong Jin, Gang Hua

    Abstract: Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks as well as adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, in this paper we find an intriguing connection between them: for a model planted with backdo…

    Submitted 26 December, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

  16. arXiv:2106.03947  [pdf, other]

    cs.LG

    TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

    Authors: Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

    Abstract: This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical application by reducing the Fisher matrix inversion cost with…

    Submitted 3 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  17. arXiv:1911.03615  [pdf, ps, other]

    cs.RO

    Universal Flying Objects: Modular Multirotor System for Flight of Rigid Objects

    Authors: Bingguo Mu, Pakpong Chirarattananon

    Abstract: We introduce UFO, a modular aerial robotic platform for transforming a rigid object into a multirotor robot. To achieve this, we develop flight modules, in the form of a control module and propelling modules, that can be affixed to an object. The object, or payload, serves as the airframe of the vehicle. The modular design produces a highly versatile platform as it is reconfigurable by the additio…

    Submitted 13 January, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: submitted to Transactions on Robotics

  18. arXiv:1905.06496  [pdf, ps, other]

    cs.RO

    Trajectory Generation for Underactuated Multirotor Vehicles with Tilted Propellers via a Flatness-based Method

    Authors: Bingguo Mu, Pakpong Chirarattananon

    Abstract: This paper considers a class of rotary-wing aerial robots with unaligned propellers. By studying the dynamics of these vehicles, we show that the position and heading angle remain flat outputs of the system (similar to conventional quadrotors). The implication is that they can be commanded to follow desired trajectory setpoints in 3D space. We propose a numerical strategy based on the collocation…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: 6 pages, 6 figures, accepted at IEEE AIM 2019

  19. arXiv:1704.05959  [pdf, other]

    cs.CV cs.RO

    SLAM with Objects using a Nonparametric Pose Graph

    Authors: Beipeng Mu, Shih-Yuan Liu, Liam Paull, John Leonard, Jonathan How

    Abstract: Mapping and self-localization in unknown environments are fundamental capabilities in many robotic applications. These tasks typically involve the identification of objects as unique features or landmarks, which requires the objects both to be detected and then assigned a unique identifier that can be maintained when viewed from different perspectives and in different images. The \textit{data asso…

    Submitted 19 April, 2017; originally announced April 2017.

    Comments: published at IROS 2016

  20. arXiv:1509.08155  [pdf, ps, other]

    cs.RO

    Information-based Active SLAM via Topological Feature Graphs

    Authors: Beipeng Mu, Matthew Giamou, Liam Paull, Ali-akbar Agha-mohammadi, John Leonard, Jonathan How

    Abstract: Active SLAM is the task of actively planning robot paths while simultaneously building a map and localizing within. Existing work has focused on planning paths with occupancy grid maps, which do not scale well and suffer from long term drift. This work proposes a Topological Feature Graph (TFG) representation that scales well and develops an active SLAM algorithm with it. The TFG uses graphical mo…

    Submitted 29 August, 2016; v1 submitted 27 September, 2015; originally announced September 2015.

    Comments: published in CDC 2016

  21. arXiv:1409.7808  [pdf, ps, other]

    cs.IT

    Resource-Constrained Adaptive Search for Sparse Multi-Class Targets with Varying Importance

    Authors: Gregory E. Newstadt, Beipeng Mu, Dennis Wei, Jonathan P. How, Alfred O. Hero III

    Abstract: In sparse target inference problems it has been shown that significant gains can be achieved by adaptive sensing using convex criteria. We generalize previous work on adaptive sensing to (a) include multiple classes of targets with different levels of importance and (b) accommodate multiple sensor models. New optimization policies are developed to allocate a limited resource budget to simultaneous…

    Submitted 27 September, 2014; originally announced September 2014.

    Comments: 49 pages, 9 figures