Technical Note
Peer-Review Record

Self-Supervised Monocular Depth Learning in Low-Texture Areas

Remote Sens. 2021, 13(9), 1673; https://doi.org/10.3390/rs13091673
by Wanpeng Xu 1,2, Ling Zou 3,*, Lingda Wu 1 and Zhipeng Fu 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 16 March 2021 / Revised: 18 April 2021 / Accepted: 21 April 2021 / Published: 26 April 2021
(This article belongs to the Special Issue Big Remotely Sensed Data)

Round 1

Reviewer 1 Report

This article modifies ResNet to improve depth estimation in low-texture areas. Customized ResBlocks and loss functions are used, and finally a U-Net is used for depth estimation. It seems that this model engineering can bring benefits.


In Figure 3, the graphics cannot be linked to the text description.

Figure 6 shows a good profile highlight, but it would be better if you could also provide the ground truth for comparison.

"5." Results 

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

Generally, the paper is too wordy and has many writing mistakes, which makes reading the manuscript a challenging task. It should be completely revised and checked by a native English speaker.

Although the authors mention the inspiring references, the originality aspects of the manuscript are not highlighted.

It seems that the Experiments and Results sections should be combined, and a Conclusion section is necessary to wrap up the manuscript.

The current version of the manuscript cannot be accepted.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

In this paper, the authors focus their work on low-texture areas, which are often overlooked when stereo pairs are used as input. They use feature maps instead of images to perform photometric loss supervision. The authors introduce a dataset with point-by-point depth labels, which is appropriate for their evaluation.

Also, in order to show the generalization performance of the proposed depth estimation model, two other datasets are selected for testing.

With their results, the authors propose a solution for the problematic pixels in low-texture image regions. These pixels are usually ignored, since most published self-supervised methods that take stereo pairs as input assume that no pixels violate the camera-motion assumption, which leads to an optimization problem in those regions.
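As an illustration of the idea of supervising the photometric loss on feature maps rather than on raw image intensities, a minimal NumPy sketch follows. Note that `toy_features` is a hypothetical stand-in for the paper's learned feature extractor, and the two "views" are simplified to a flat patch with a brightness shift; this shows only the mechanics, not the authors' actual network.

```python
import numpy as np

def photometric_l1(a, b):
    """Mean absolute difference between two aligned tensors (L1 photometric loss)."""
    return float(np.mean(np.abs(a - b)))

def toy_features(img):
    """Hypothetical feature extractor: the image itself plus horizontal gradients.
    In the paper a learned CNN would produce such maps; this is purely illustrative."""
    grad = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    return np.stack([img, grad], axis=0)

# Two toy "views" of a low-texture (flat) patch, identical up to a brightness shift.
target = np.full((4, 4), 0.5)
warped = target + 0.1  # uniform illumination change between the views

loss_image = photometric_l1(target, warped)                        # loss on raw intensities
loss_feat = photometric_l1(toy_features(target), toy_features(warped))  # loss on feature maps
```

In this toy case the gradient channel is invariant to the brightness shift, so the feature-space loss differs from the raw-intensity loss, illustrating why supervising on features can behave differently from supervising on pixel intensities in low-texture regions.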

I have some reviewer notes:

If possible, add a "Conclusion" section. There you can include the practical importance of your findings, as well as your future work.

Add more comparisons and discussion with results from other authors. For example, in Section "4.1. KITTI Eigen Split" you cite datasets and the disadvantage of the SqRel metric, but there is no discussion of the advantage of your findings over existing ones.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 4 Report

This is a very interesting topic, and it is important to solve this kind of problem. However, I have several questions and suggestions for your paper.

 

  1. I have some difficulty understanding your proposed method for monocular learning. Do you use stereo pairs as input in your study, as mentioned just after formula (3)? I am somewhat confused.
  2. In Chapter 3 (Method), you mainly explain the architecture of your proposed system in Section 3.1 and then describe the training data in Section 3.2. For me, it is very difficult to understand how you apply your proposed method. Can you explain more clearly how your training data are applied in the architecture of Section 3.1?
  3. In Section 3.2.1, what is "equation 2 below"? Does it refer to formulas (3) and (4)?
  4. In Section 3.2.1, what is an L1 pixel? Where can I find its definition?
  5. Regarding Table 1 in Chapter 4, I believe it is very important to know the performance comparison for the experiment. Do you have any benchmark (processing time) comparison?
  6. In the results, there is not enough explanation of Figures 6, 7 and 8. Can you refer to them clearly?

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 4 Report

Regarding my previous point 2, I would like to suggest adding some text to your paper.

Comments for author File: Comments.docx

Author Response

Please see the attachment.

Author Response File: Author Response.docx
