Abstract
We propose a novel and efficient method for generic arbitrary-view object class detection and localization. Existing single-view and multi-view methods rely on complicated mechanisms to relate structural information across different parts of an object or across different viewpoints; in contrast, we represent the structural information at its true 3D locations. In the training stage, uncalibrated multi-view images from a hand-held camera are used to reconstruct 3D visual word models. In the testing stage, our method goes beyond bounding boxes: it automatically determines the locations and outlines of multiple objects in the test image, handles occlusion, and accurately estimates both the intrinsic and extrinsic camera parameters by optimization. With exemplar models, the method also handles shape deformation due to intra-class variance. Because the models produce large data sets, we propose several speedup techniques to make prediction efficient. Experimental results on standard data sets demonstrate the effectiveness of the proposed approach.
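To make the idea of visual words stored at 3D locations concrete, the following is a minimal sketch and not the authors' implementation. It assumes an ideal pinhole camera with a single focal length, square pixels, and no lens distortion. The function names `rotation_about_y` and `project_word`, the toy model, and all numeric values are hypothetical and chosen only for illustration. The sketch shows how one 3D model yields different 2D layouts of the same visual words once the intrinsic parameters (focal length, principal point) and extrinsic parameters (rotation, translation) are fixed.

```python
import math


def rotation_about_y(angle_rad):
    """Return a 3x3 rotation matrix (nested lists) for a turn about the y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def project_word(point, focal, principal, rotation, translation):
    """Project one 3D point into the image with a pinhole camera.

    point:       (X, Y, Z) location of a visual word in model coordinates
    focal:       focal length in pixels (intrinsic, assumed equal in x and y)
    principal:   (cx, cy) principal point in pixels (intrinsic)
    rotation:    3x3 rotation matrix, model to camera (extrinsic)
    translation: (tx, ty, tz) translation, model to camera (extrinsic)

    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    # Move the point into camera coordinates: X_c = R X + t.
    xc = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0.0:
        return None
    # Perspective division followed by the intrinsic mapping.
    u = focal * xc[0] / xc[2] + principal[0]
    v = focal * xc[1] / xc[2] + principal[1]
    return (u, v)


# Hypothetical toy model: visual word id -> 3D location.
toy_model = {
    17: (0.0, 0.0, 0.0),
    42: (1.0, 0.0, 0.0),
    99: (0.0, 1.0, 0.5),
}

focal = 500.0
principal = (320.0, 240.0)
translation = (0.0, 0.0, 5.0)

# The same 3D model viewed from two hypothetical viewpoints.
for angle_deg in (0.0, 30.0):
    rotation = rotation_about_y(math.radians(angle_deg))
    for word_id, location in toy_model.items():
        uv = project_word(location, focal, principal, rotation, translation)
        print(angle_deg, word_id, uv)
```

In this sketch the viewpoint is given and the image positions follow from it. Localization as described in the abstract runs the other way: the camera parameters are the unknowns, estimated from matches between visual words in the test image and their 3D locations in the model.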
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiao, J., Chen, J., Yeung, DY., Quan, L. (2008). Structuring Visual Words in 3D for Arbitrary-View Object Localization. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88690-7_54
DOI: https://doi.org/10.1007/978-3-540-88690-7_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88689-1
Online ISBN: 978-3-540-88690-7
eBook Packages: Computer Science (R0)