Structuring Visual Words in 3D for Arbitrary-View Object Localization

Xiao, Jianxiong; Chen, Jingni; Yeung, Dit-Yan; Quan, Long

doi:10.1007/978-3-540-88690-7_54

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5304)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a novel and efficient method for generic arbitrary-view object class detection and localization. In contrast to existing single-view and multi-view methods, which use complicated mechanisms to relate the structural information in different parts of the objects or across different viewpoints, we aim to represent the structural information at its true 3D locations. Uncalibrated multi-view images from a hand-held camera are used to reconstruct the 3D visual word models in the training stage. In the testing stage, our method goes beyond bounding boxes: it automatically determines the locations and outlines of multiple objects in the test image with occlusion handling, and accurately estimates both the intrinsic and extrinsic camera parameters in an optimized way. With exemplar models, our method can also handle shape deformation due to intra-class variance. To handle the large data sets arising from the models, we propose several speedup techniques that make prediction efficient. Experimental results on standard data sets demonstrate the effectiveness of the proposed approach.
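
As an illustration only of what it means to keep visual words at 3D locations and relate them to a test image through intrinsic and extrinsic camera parameters, the following minimal sketch projects a few 3D points through a standard pinhole camera model, x ~ K(RX + t). The names `VisualWord3D` and `project`, the word ids, and all numeric values are hypothetical and are not taken from the chapter; the authors' model construction and parameter optimization are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


@dataclass
class VisualWord3D:
    """Hypothetical container: a quantized descriptor id anchored at a 3D point."""
    word_id: int
    position: Vec3  # coordinates in the object model frame


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def project(word: VisualWord3D, K: Mat3, R: Mat3, t: Vec3) -> Optional[Tuple[float, float]]:
    """Pinhole projection x ~ K (R X + t).

    K holds the intrinsic parameters; R and t are the extrinsic rotation and
    translation. Returns pixel coordinates, or None if the point lies behind
    the camera.
    """
    rotated = mat_vec(R, word.position)
    cam = (rotated[0] + t[0], rotated[1] + t[1], rotated[2] + t[2])
    if cam[2] <= 0.0:
        return None
    u, v, w = mat_vec(K, cam)
    return (u / w, v / w)


# Assumed example values: focal length 500 px, principal point (320, 240),
# camera looking down the +z axis at a model placed 5 units away.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 5.0)

model = [
    VisualWord3D(word_id=17, position=(0.0, 0.0, 0.0)),
    VisualWord3D(word_id=42, position=(1.0, 0.0, 0.0)),
    VisualWord3D(word_id=99, position=(0.0, 0.0, -6.0)),  # behind the camera
]

for word in model:
    print(word.word_id, project(word, K, R, t))
```

Under a given camera hypothesis, each 3D word lands at a predicted pixel position, which is the kind of correspondence a localization method could compare against words detected in the test image. How the chapter scores or optimizes such hypotheses is not stated in the abstract.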

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Jianxiong Xiao, Jingni Chen, Dit-Yan Yeung & Long Quan

Editor information

Editors and Affiliations

Computer Science Department, University of Illinois at Urbana Champaign, 3310 Siebel Hall, IL 61801, Urbana, USA
David Forsyth
Department of Computing, Oxford Brookes University, OX33 1HX, Wheatley, Oxford, UK
Philip Torr
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

About this paper

Cite this paper

Xiao, J., Chen, J., Yeung, DY., Quan, L. (2008). Structuring Visual Words in 3D for Arbitrary-View Object Localization. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88690-7_54

DOI: https://doi.org/10.1007/978-3-540-88690-7_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88689-1
Online ISBN: 978-3-540-88690-7
eBook Packages: Computer Science, Computer Science (R0)
