DOI: 10.1145/3394171.3413870
research-article

SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space

Published: 12 October 2020

Abstract

The crux of homography estimation is that the homography is characterized by the geometric correspondences between two related images rather than by appearance features, which sets it apart from typical image recognition tasks. Existing methods either decompose homography estimation into several individual sub-problems and optimize them sequentially, or tackle it end to end by delegating the whole task to deep convolutional networks (CNNs). However, it is difficult for CNNs to learn the mapping from the appearance features of related images to the homography directly. In this paper, we propose to parse the geometric correspondences between related images explicitly to bridge the gap between deep appearance features and the homography. Furthermore, we propose a coarse-to-fine estimation framework that captures homography transformations at different scales and thus predicts the homography in a stepwise-refining manner. Additionally, we propose a pyramidal supervision scheme to leverage an important prior concerning homography estimation. Extensive experiments on two large-scale datasets demonstrate that our model significantly advances the state of the art.
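
The stepwise-refining idea in the abstract can be illustrated with a small sketch. This is not the authors' network: it assumes that each stage is replaced by a classical direct linear transform (DLT) solve on four corner correspondences, and that the "coarse" stage only sees corner offsets rounded to an 8-pixel grid. The function names, the corner coordinates, and the grid size are all hypothetical, chosen only to show how a coarse estimate and a residual estimate compose into one homography.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 H with dst ~ H @ src using the DLT on point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply a homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Corners of a hypothetical 128x128 patch and their true displaced positions.
src = np.array([[0, 0], [128, 0], [128, 128], [0, 128]], dtype=float)
offsets = np.array([[5, -3], [-7, 4], [10, 12], [-2, -9]], dtype=float)
dst = src + offsets

# Stage 1 (coarse): assume only offsets rounded to an 8-pixel grid are known.
coarse_dst = src + np.round(offsets / 8.0) * 8.0
H_coarse = homography_from_points(src, coarse_dst)

# Stage 2 (fine): estimate the residual homography after the coarse warp.
H_residual = homography_from_points(warp_points(H_coarse, src), dst)

# The stages compose by matrix multiplication, coarse first.
H_total = H_residual @ H_coarse
max_error = np.abs(warp_points(H_total, src) - dst).max()
print("max corner error after refinement:", max_error)
```

The point of the sketch is the composition step: each stage only has to explain what the previous stages left unexplained, so the final homography is the product of the per-stage estimates.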

Supplementary Material

MP4 File (3394171.3413870.mp4)

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. convolutional networks
    2. homography estimation

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Shenzhen Research Council

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
