Abstract
Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features are available for tasks like classification, but they are not tailored to the requirements of semantic matching. We present a weakly supervised learning approach that generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method that is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer that is a learned mixture of differently strided convolutions and allows the network to encode much more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network, which generates more natural deformations than models based on parametric transformations and also predicts foreground regions. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features perform remarkably well with simple nearest-neighbor matching.
N. Ufer and K. T. Lui contributed equally.
Acknowledgment
This work has been supported in part by the DFG grant OM81/1-1 and a hardware donation from NVIDIA Corporation.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ufer, N., Lui, K.T., Schwarz, K., Warkentin, P., Ommer, B. (2019). Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_32
DOI: https://doi.org/10.1007/978-3-030-33676-9_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer Science, Computer Science (R0)