Abstract
The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure for the quality of a match and present a more general hypothesis score that attempts to reflect how likely each hypothesized transformation is to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features, we are able to deal much more robustly with the challenges of cross-media analysis, leading to correct matches where conventional methods fail.
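The idea can be illustrated with a minimal sketch: a RANSAC-style loop in which each randomly generated hypothesis is scored by a learned, feature-based relevancy function rather than by its consensus-set size alone. The sketch below is illustrative only and is not the implementation described in the paper; the choice of an affine model, the particular contextual features, and the linear weight vector w (assumed to be learned offline from hypotheses labeled correct or incorrect) are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of RANSAC-style matching in
# which each hypothesis is scored by a learned relevancy function instead of
# consensus-set size alone.  Feature choices and the weight vector `w` are
# illustrative assumptions.
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (3+ point pairs)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)
    A[0::2, 0:2] = src   # x-equations use parameters p0, p1, p2
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src   # y-equations use parameters p3, p4, p5
    A[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[p[0], p[1], p[2]],
                     [p[3], p[4], p[5]]])

def hypothesis_features(T, src, dst, inlier_thresh=3.0):
    """Contextual cues for a hypothesized transform T (illustrative choices):
    inlier ratio, mean inlier residual, and deviation from unit scale."""
    proj = src @ T[:, :2].T + T[:, 2]
    residuals = np.linalg.norm(proj - dst, axis=1)
    inliers = residuals < inlier_thresh
    inlier_ratio = inliers.mean()
    mean_residual = residuals[inliers].mean() if inliers.any() else inlier_thresh
    scale = np.sqrt(abs(np.linalg.det(T[:, :2])))
    return np.array([inlier_ratio, -mean_residual, -abs(np.log(scale + 1e-9))])

def ransac_learned_score(src, dst, w, iters=500, seed=0):
    """Sample minimal sets, fit hypotheses, and keep the one whose feature
    vector scores highest under the pre-learned weight vector w."""
    rng = np.random.default_rng(seed)
    best_T, best_score = None, -np.inf
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        T = fit_affine(src[idx], dst[idx])
        score = float(w @ hypothesis_features(T, src, dst))
        if score > best_score:
            best_score, best_T = score, T
    return best_T, best_score
```

Here w stands in for whatever scoring model is learned for a given matching task; in practice, a classifier or regressor trained on hypotheses labeled as correct or incorrect would play this role.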
Notes
Please see the project webpage for available resources, including our MATLAB functions for rendering and computing the transformations. URL: http://www.openu.ac.il/home/hassner/projects/ransaclearn.
Source: http://www.minecraft.net.
Acknowledgments
TH was partially funded by General Motors (GM).
Cite this article
Hassner, T., Assif, L. & Wolf, L. When standard RANSAC is not enough: cross-media visual matching with hypothesis relevancy. Machine Vision and Applications 25, 971–983 (2014). https://doi.org/10.1007/s00138-013-0571-4