Abstract
Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. Existing work has attempted to improve the quality of these correspondence maps by carefully mixing features from different sources, for example by combining the features of different layers or networks. We point out that a better correspondence strategy is available, one that directly imposes structure on the correspondence field: the functional map. Using this simple mathematical tool, we lift the correspondence problem from pixel space to function space and directly optimize for mappings that are globally coherent. We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, and that may better reflect the knowledge embedded in the large-scale vision models under study. Our approach sets a new state of the art on various dense correspondence tasks. We also demonstrate its effectiveness in keypoint correspondence and affordance map transfer.
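To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It places per-pixel nearest-neighbor matching next to a generic least-squares functional map in the sense of Ovsjanikov et al. (2012). The function names, the flattened `(n_pixels, d)` feature layout, and the choice of basis are all assumptions made for illustration; the abstract does not specify how the basis is built, so any matrix with `k` columns (for instance, low-frequency eigenvectors of a graph Laplacian) can be passed in.

```python
import numpy as np


def nearest_neighbor_matches(query, reference):
    """For each row of `query`, index of the most cosine-similar row of `reference`."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    return np.argmax(q @ r.T, axis=1)


def fit_functional_map(basis_src, basis_tgt, feat_src, feat_tgt):
    """Least-squares k x k matrix C such that C @ A ~= B.

    A and B hold the coefficients of the feature channels expressed in the
    source and target bases (hypothetical inputs of shape (n_pixels, k)).
    """
    A = np.linalg.pinv(basis_src) @ feat_src  # (k, d) source coefficients
    B = np.linalg.pinv(basis_tgt) @ feat_tgt  # (k, d) target coefficients
    # Solve C A = B by rewriting it as A^T C^T = B^T.
    C_T, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return C_T.T


def functional_map_matches(C, basis_src, basis_tgt):
    """For each target pixel, the source pixel with the closest spectral embedding."""
    mapped = basis_tgt @ C  # (n_tgt, k) target rows expressed in the source basis
    dists = np.linalg.norm(mapped[:, None, :] - basis_src[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```

In this sketch the only unknown is the small matrix `C`, so every pixel's match is tied to the same `k` basis functions. That coupling is one way to read "directly imposes structure on the correspondence field", whereas the nearest-neighbor routine decides each pixel independently.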
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, X., Deng, C., Harley, A.W., Zhu, Y., Guibas, L. (2025). Zero-Shot Image Feature Consensus with Deep Functional Maps. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15105. Springer, Cham. https://doi.org/10.1007/978-3-031-72970-6_16
DOI: https://doi.org/10.1007/978-3-031-72970-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72969-0
Online ISBN: 978-3-031-72970-6
eBook Packages: Computer Science (R0)