Abstract
The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure for the quality of a match and present a more general hypothesis score that attempts to reflect how likely each hypothesized transformation is to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features, we are able to deal much more robustly with the challenges of cross-media analysis, leading to correct matches where conventional methods fail.
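The idea can be illustrated with a minimal sketch: a RANSAC-style loop in which each randomly generated hypothesis is scored by a learned, feature-based relevancy function rather than by its consensus-set size alone. The sketch below is illustrative only and is not the implementation described in the paper; the choice of an affine model, the particular contextual features, and the linear weight vector w (assumed to be learned offline from hypotheses labeled correct or incorrect) are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of RANSAC-style matching in
# which each hypothesis is scored by a learned relevancy function instead of
# consensus-set size alone.  Feature choices and the weight vector `w` are
# illustrative assumptions.
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (3+ point pairs)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)
    A[0::2, 0:2] = src   # x-equations use parameters p0, p1, p2
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src   # y-equations use parameters p3, p4, p5
    A[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[p[0], p[1], p[2]],
                     [p[3], p[4], p[5]]])

def hypothesis_features(T, src, dst, inlier_thresh=3.0):
    """Contextual cues for a hypothesized transform T (illustrative choices):
    inlier ratio, mean inlier residual, and deviation from unit scale."""
    proj = src @ T[:, :2].T + T[:, 2]
    residuals = np.linalg.norm(proj - dst, axis=1)
    inliers = residuals < inlier_thresh
    inlier_ratio = inliers.mean()
    mean_residual = residuals[inliers].mean() if inliers.any() else inlier_thresh
    scale = np.sqrt(abs(np.linalg.det(T[:, :2])))
    return np.array([inlier_ratio, -mean_residual, -abs(np.log(scale + 1e-9))])

def ransac_learned_score(src, dst, w, iters=500, seed=0):
    """Sample minimal sets, fit hypotheses, and keep the one whose feature
    vector scores highest under the pre-learned weight vector w."""
    rng = np.random.default_rng(seed)
    best_T, best_score = None, -np.inf
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        T = fit_affine(src[idx], dst[idx])
        score = float(w @ hypothesis_features(T, src, dst))
        if score > best_score:
            best_score, best_T = score, T
    return best_T, best_score
```

Here w stands in for whatever scoring model is learned for a given matching task; in practice, a classifier or regressor trained on hypotheses labeled as correct or incorrect would play this role.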
Notes
Please see the project webpage for available resources, including our MATLAB functions for rendering and computing the transformations. URL: http://www.openu.ac.il/home/hassner/projects/ransaclearn.
Source: http://www.minecraft.net.
Acknowledgments
TH was partially funded by General Motors (GM).
Cite this article
Hassner, T., Assif, L. & Wolf, L. When standard RANSAC is not enough: cross-media visual matching with hypothesis relevancy. Machine Vision and Applications 25, 971–983 (2014). https://doi.org/10.1007/s00138-013-0571-4