DOI: 10.1145/3664647.3680583
Research article

Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models

Published: 28 October 2024

Abstract

Autonomous driving (AD) is a typical application that requires effectively exploiting multimedia information. For AD, it is critical to ensure safety by detecting unknown objects in an open world, driving the demand for open-world object detection (OWOD). However, existing OWOD methods treat generic objects beyond the known classes in the training set as unknown objects and prioritize recall in evaluation. This encourages excessive false positives and endangers the safety of AD. To address this issue, we restrict the definition of unknown objects to threatening objects in AD, and introduce a new evaluation protocol, built upon a new metric named U-ARecall, to alleviate the biased evaluation caused by neglecting false positives. Under the new protocol, we re-evaluate existing OWOD methods and find that they typically perform poorly in AD. We then propose a novel OWOD paradigm for AD based on fine-tuning foundational open-vocabulary models (OVMs), as they can exploit rich linguistic and visual prior knowledge for OWOD. Following this new paradigm, we propose a new OWOD solution that effectively addresses two core challenges of fine-tuning OVMs via two novel techniques: 1) maintaining open-world generic knowledge with a dual-branch architecture; 2) acquiring scenario-specific knowledge with a visual-oriented contrastive learning scheme. In addition, a dual-branch prediction fusion module is proposed to avoid post-processing and hand-crafted heuristics. Extensive experiments show that our proposed method not only surpasses classic OWOD methods in unknown object detection by a large margin (∼× U-ARecall), but also notably outperforms OVMs without fine-tuning in known object detection (∼20% K-mAP). Our code is available at https://github.com/harrylin-hyl/AD-OWOD.
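The abstract's core critique is that recall-only evaluation of unknown-object detection rewards indiscriminate predictions. The sketch below is an illustration of that failure mode, not the paper's actual U-ARecall computation (which is defined in the paper itself); the box values, greedy IoU matching at a 0.5 threshold, and the `recall_and_fp` helper are all assumptions made here for demonstration.

```python
# Toy illustration: why recall-only evaluation of unknown objects rewards
# spamming predictions. Boxes are axis-aligned (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def recall_and_fp(gt_boxes, pred_boxes, thr=0.5):
    """Greedy matching: each ground-truth box is matched at most once.
    Returns (recall over unknowns, number of false-positive predictions)."""
    matched = set()
    tp = 0
    for p in pred_boxes:
        best, best_iou = None, thr
        for i, g in enumerate(gt_boxes):
            score = iou(p, g)
            if i not in matched and score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            tp += 1
    recall = tp / len(gt_boxes) if gt_boxes else 1.0
    return recall, len(pred_boxes) - tp

# Two unknown (ground-truth) objects in an image.
gt = [(20, 20, 60, 60), (100, 100, 140, 140)]

# A cautious detector: one accurate detection, no false positives.
precise = [(22, 22, 58, 58)]

# A "spamming" detector: a dense 8x8 grid of guesses covering the image.
spam = [(20 * i, 20 * j, 20 * i + 40, 20 * j + 40)
        for i in range(8) for j in range(8)]

print(recall_and_fp(gt, precise))  # (0.5, 0)  -- half the unknowns, clean
print(recall_and_fp(gt, spam))     # (1.0, 62) -- perfect recall, 62 FPs
```

Under a recall-only protocol the spamming detector scores strictly better, even though 62 false alarms would be unusable for driving safety; this is the evaluation bias the paper's U-ARecall metric is introduced to counteract.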


Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. auto-driving
2. open world object detection
3. open-vocabulary model

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
