Cloud Based Automatic Building and Road Extraction From Large Scale Open Geospatial Datasets
1 Introduction
We propose a half-day tutorial at WACV 2021 focused on infrastructure identification from open geospatial datasets. This proposal is a collaboration between the AWS Machine Learning Solutions Lab and CosmiQ Works teams: CosmiQ focuses on the datasets, algorithms, and applications, while the AWS Machine Learning Solutions Lab team focuses on cloud implementation and scaling of the algorithms. Details about the proposal team and course implementation follow.
2 Proposal Team
• Yunzhi Shi, Data Scientist, AWS ML Solutions Lab, shiyunzh@amazon.com.
Yunzhi helps AWS customers address business problems with AI and cloud
capabilities. Recently, he has been building CV, search, and forecast solu-
tions for various customers. Prior to Amazon, Yunzhi obtained his Ph.D.
in Geophysics from The University of Texas at Austin.
• Tianyu Zhang, Data Scientist, AWS ML Solutions Lab, ttizha@amazon.com.
Tianyu helps AWS customers solve business problems by applying ML
and AI techniques. Most recently, he has built NLP and predictive models for procurement and sports.
• Daniel Hogan, Data Scientist, In-Q-Tel CosmiQ Works, dhogan@iqt.org.
Daniel is a data scientist with a geospatial focus. His research has looked
at dataset development and synthetic aperture radar. Daniel received a
Ph.D. in Physics from the University of California, Berkeley.
• Jake Shermeyer, Research Scientist, In-Q-Tel CosmiQ Works, jshermeyer@iqt.org.
Jake is a researcher and geographer specializing in geospatial machine
learning and computer vision. His research with satellite imagery focuses
on time series analysis, super-resolution, the value of synthetic data, and
object detection. Jake served as the lead for SpaceNet 6, a sensor fu-
sion challenge and dataset focused on both synthetic aperture radar and
electro-optical remote sensing data and their application to foundational
mapping problems.
• Adam Van Etten, Chief Data Scientist, In-Q-Tel, avanetten@iqt.org.
Adam focuses on applied machine learning topics of interest to the US
Government. His most recent research lies in the geospatial analytics
realm, where he applies machine learning and computer vision techniques
to satellite imaging data. Other recent foci for Adam are helping run the
SpaceNet initiative, and exploring the limitations and utility functions of
machine learning techniques.
• Xin Chen, Senior Manager, AWS ML Solutions Lab, xcaa@amazon.com.
Xin leads his team to help AWS customers identify and build machine
learning solutions to address their organization’s highest return-on-investment
machine learning opportunities. Prior to Amazon, Xin was a Director
of Engineering at HERE Technologies whose team completed pioneering
work to achieve the automation of next generation map creation using
computer vision and machine learning technologies. Xin is an adjunct faculty member at Northwestern University and the Illinois Institute of Technology.
3 Course Description
The course will consist of five sections (plus a break), organized as follows.
1. SpaceNet Dataset, Algorithms, Applications (80 minutes) In the
first section, we will introduce the SpaceNet [ELB18] dataset along with open source algorithms developed from it, and discuss applications. The SpaceNet dataset is a large corpus of imagery and labels that
is hosted as an Amazon Web Services (AWS) Public Dataset. It contains
70,000 square km of high-resolution imagery, 11,000,000 building foot-
prints, and 20,000 km of road labels to ensure that there is adequate open
source data available for geospatial machine learning research. Seven pub-
lic data science challenges have been run with this data, tackling various
problems from building footprint extraction to road travel time prediction
to urban change detection. The winning algorithms of these challenges are open source and address a whole host of humanitarian use cases (disaster response, evacuation planning, urban planning, etc.) that we will discuss in detail.
2. Synthetic Data and Rare Objects (35 minutes) The second section
will focus on the Rareplanes dataset and study. RarePlanes is a unique
open-source machine learning dataset that incorporates both real and syn-
thetically generated satellite imagery, and is the largest openly-available
high resolution dataset built to test the value of synthetic data from an
overhead perspective. The real portion of the dataset consists of > 250
satellite images spanning > 100 locations with 15,000 hand-annotated
aircraft. The accompanying synthetic dataset features 50,000 synthetic
satellite images with 600,000 aircraft annotations. Both the real and syn-
thetically generated aircraft feature fine-grained attributes such as length,
wingspan, engine type, etc. We conduct extensive experiments to evalu-
ate the real and synthetic datasets and compare performances, and show
the value of synthetic data for the task of detecting and classifying air-
craft from an overhead perspective. The lessons learned from this study
translate readily to other objects and modalities.
3. Break (5 minutes)
4. Cloud Services (25 minutes) In this section we will talk about Amazon SageMaker, a fully managed ML service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. Amazon SageMaker Ground Truth is a data labeling service that makes it easy to build highly accurate training datasets during data preparation. An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app, offering an ML development environment in which users can prepare and process data and write code to train, deploy, and validate models. SageMaker also provides several built-in ML algorithm images that make the training process smoother and simpler. During training, Amazon SageMaker hyperparameter tuning automatically tunes hyperparameters to find the best version of a model. After training, Amazon SageMaker can deploy the trained model into production with a single click so that it can start generating predictions for real-time or batch data, and can monitor the model's performance. A minimal code sketch of this workflow appears after this course outline. (Tianyu)
5. Cloud Notebooks (75 minutes) In this section, we will walk through deep learning models that extract building footprints and road networks using Jupyter notebooks developed by the AWS Machine Learning Solutions Lab team. The notebooks reproduce winning algorithms from the SpaceNet challenges. In addition to the SpaceNet satellite images [ELB18], we introduce USGS 3D Elevation Program (3DEP) light detection and ranging (LiDAR) data to the workflow. We demonstrate using satellite images, LiDAR data, or a combination of both to train and test deep learning models for map feature extraction. This tutorial shares the notebooks and provides instructions for running ML services on large-scale geospatial data on Amazon SageMaker. At the end of this section, attendees can reproduce the notebook content, apply the models to other areas of interest, and innovate with new ideas to improve the results. Attendees can also appreciate the benefits of cloud computing and storage first-hand. (Yunzhi)
6. Summary and Conclusions (10 minutes)
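As referenced in item 4 above, the following is a minimal sketch of the SageMaker workflow the tutorial will cover, using the SageMaker Python SDK. The entry point script, S3 paths, instance types, and hyperparameters shown here are illustrative placeholders, not the tutorial's exact notebook code.

    import sagemaker
    from sagemaker.pytorch import PyTorch

    # Illustrative SageMaker training job; script name, S3 paths, and
    # hyperparameters are placeholders rather than the tutorial's exact values.
    role = sagemaker.get_execution_role()  # IAM role of the notebook instance

    estimator = PyTorch(
        entry_point="train_building_segmentation.py",  # hypothetical training script
        role=role,
        instance_type="ml.p3.8xlarge",                  # multi-GPU training instance
        instance_count=1,
        framework_version="1.6.0",
        py_version="py3",
        hyperparameters={"epochs": 100, "batch-size": 20},
    )

    # Training data previously staged to S3 (e.g., merged RGB + LiDAR tiles).
    estimator.fit({"training": "s3://<your-bucket>/spacenet-vegas/train/"})

    # Single-call deployment to a real-time inference endpoint.
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.g4dn.xlarge")

Hyperparameter tuning and batch inference follow the same estimator-based pattern, which the notebooks in section 5 will walk through step by step.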
4 Related Works
The SpaceNet dataset and challenges were featured at CVPR EarthVision 2017, 2019, and 2020. The authors of this proposal also helped organize the DeepGlobe Workshop at CVPR 2018, which used SpaceNet data. Our proposed tutorial builds directly on these previous workshops, with the added layer of focusing on the application of computer vision by deploying models in cloud environments.
References
[ELB18] Adam Van Etten, Dave Lindenbaum, and Todd M. Bacastow. “SpaceNet: A Remote Sensing Dataset and Challenge Series”. In: CoRR abs/1807.01232 (2018). arXiv: 1807.01232. URL: http://arxiv.org/abs/1807.01232.
5 Appendix
Attachment 1: Technical abstract by AWS Machine Learning Solutions Lab,
”Cloud Based Automatic Building and Road Extraction from Large Scale Open
Geospatial Datasets”
Cloud Based Automatic Building and Road Extraction from
Open Datasets of Satellite and LiDAR
Yunzhi Shi, Amazon Web Services, Austin, TX, USA, shiyunzh@amazon.com
Xin Chen, Amazon Web Services, Chicago, IL, USA, xcaa@amazon.com
Tianyu Zhang, Amazon Web Services, Austin, TX, USA, ttizha@amazon.com
Entwine Point Tiles format, which is a lossless, full-density, streamable octree based on LASzip encoding. This format is suitable for online visualization [10]; Fig. 1 shows a visualization example in Las Vegas. The second resource is in LAZ (compressed LAS) format with requester-pays access.

Figure 2: RGB value aggregated histogram of all images after the white balancing and 8-bit conversion.

While satellite images are 2D images, the USGS LiDAR data is in 3D point cloud format and thus requires conversion and projection. We use Matlab and LAStools [11] to map each 3D LiDAR point to the pixel-wise location corresponding to the SpaceNet tiles, and generate two sets of attribute images: elevation and reflectivity intensity. The elevation ranges from ~2000–3000 feet, and the intensity ranges from 0–5000 units. Fig. 3 shows the aggregated histogram of all images for elevation and reflectivity intensity values.

Figure 3: Aggregated histogram of all images for LiDAR elevation and reflectivity intensity values.

Finally, we take either one of the LiDAR attributes and merge it with the RGB images. The merged images are saved in 16-bit because LiDAR attribute values can be larger than 255, the 8-bit upper limit. We make these processed and merged data available via an AWS S3 bucket for this tutorial. Fig. 4 shows three sample images.

3 BUILDING EXTRACTION

The 1st and 2nd SpaceNet challenges [4,5] aimed to extract building footprints from the satellite images in various AOIs. The 4th SpaceNet challenge [6] posed a similar task with more challenging off-nadir (i.e., oblique look angle) imagery. In this section, we reproduce a winning algorithm and evaluate its performance with both RGB images and LiDAR data.

3.1 Training Data

In the Las Vegas AOI, SpaceNet data is tiled to 200 m × 200 m. We locate 3084 tiles where both SpaceNet imagery and LiDAR data are available and merge them together. Unfortunately, the labels of the test data used for scoring in the SpaceNet challenges are not published, so we split the merged data 70%/30% for training and evaluation. We select the elevation attribute in this case because it is more representative for extracting buildings than reflectivity intensity.

3.2 Model

We reproduce a winning algorithm from SpaceNet challenge 4 [6] by XD_XD. The model has a U-Net [12] architecture with skip connections between encoder and decoder, and a modified VGG16 [13] as the backbone encoder. The model takes three different types of input: (1) a 3-channel RGB image, same as the original contest; (2) a 1-channel LiDAR elevation image; and (3) a 4-channel RGB + LiDAR merged image. We train three models and compare their performance in the evaluation section.
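To make the three input configurations concrete, below is a minimal sketch, not the exact tutorial notebook code, of how the 4-channel RGB + LiDAR elevation tile described above might be assembled with rasterio and numpy; the file names and the uint16 conversion are illustrative assumptions.

    import numpy as np
    import rasterio

    rgb_path = "vegas_tile_0001_rgb.tif"    # hypothetical 3-band 8-bit RGB tile
    elev_path = "vegas_tile_0001_elev.tif"  # hypothetical 1-band LiDAR elevation tile

    with rasterio.open(rgb_path) as src:
        rgb = src.read()        # shape (3, H, W)
        profile = src.profile
    with rasterio.open(elev_path) as src:
        elev = src.read(1)      # shape (H, W); values can exceed 255

    # Stack into a 4-channel image and keep 16-bit depth so LiDAR values
    # above the 8-bit limit are preserved.
    merged = np.concatenate(
        [rgb.astype(np.uint16), elev[np.newaxis].astype(np.uint16)], axis=0)

    profile.update(count=4, dtype="uint16")
    with rasterio.open("vegas_tile_0001_rgb_elev.tif", "w", **profile) as dst:
        dst.write(merged)       # 4-channel RGB + LiDAR input for the U-Net

The 3-channel and 1-channel model inputs are simply the rgb and elev arrays used on their own.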
The label for training is a binary mask converted from the polygon GeoJSON by Solaris [14], a machine learning pipeline library developed by CosmiQ Works. We select a combined loss of binary cross-entropy and Jaccard loss with a weight factor α = 0.8:

ℒ = α·ℒ_BCE + (1 − α)·ℒ_Jaccard

The model is implemented with Solaris and deployed on an Amazon SageMaker p3.8xlarge instance (4× V100 GPUs). We train the models with batch size 20, the Adam optimizer, and a 10&' learning rate for 100 epochs. Fig. 5 shows some examples of the input image (RGB + LiDAR), the predicted building mask output by the model trained with both RGB and LiDAR data, and the ground truth building mask.

Figure 5: Examples of building extraction model inputs and outputs. Columns from left to right: RGB image, LiDAR elevation image, model prediction trained by both RGB and LiDAR data, and ground truth building footprint mask.

3.3 Evaluation

4 ROAD EXTRACTION

The 3rd SpaceNet challenge [7] aimed to extract road networks from the satellite images, and the 5th SpaceNet challenge [8] added road speed prediction to the network extraction task in order to minimize travel time and plan optimal routing. Similar to the previous section, we reproduce a top algorithm, train different models with either RGB images, LiDAR attributes, or both, and evaluate their performance.

4.1 Training Data

The road network extraction uses larger tiles of size 400 m × 400 m. We generate 918 merged tiles and split them 70%/30% for training and evaluation. In this case, we select reflectivity intensity for road extraction because road surfaces often consist of materials whose reflectivity is distinctive against the background, e.g., paved surfaces, dirt roads, asphalt.

4.2 Model

We reproduce the CRESI algorithm [15] for road network extraction. It also has a U-Net architecture but uses ResNet [16] as the backbone encoder. Again, we train the model with three different types of input: (1) a 3-channel RGB image, (2) a 1-channel LiDAR intensity image, and (3) a 4-channel RGB + LiDAR merged image.

To extract road location and speed together, a binary road mask does not provide enough information for training. As described in the CRESI paper [15], we can convert the speed metadata to either a continuous mask (0–1 values) or a multi-class binary mask. Because their test results show that the multi-class binary mask performs better, we use the latter conversion scheme. Figs. 6 and 7 show visualizations of the multi-class road mask.

We train the model with the same setup as in building extraction. Fig. 8 shows some examples of the input image (RGB + LiDAR), the predicted road mask output by the model trained with both RGB and LiDAR data, and the ground truth road mask.
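As a concrete illustration of the multi-class conversion scheme, the following is a minimal sketch, our own simplification rather than the CRESI implementation, that bins a per-pixel road speed raster into the stacked binary masks described above; the bin edges and variable names are illustrative assumptions.

    import numpy as np

    def speed_to_multiclass_mask(speed, n_bins=7, max_speed=65.0):
        # speed: 2D array of per-pixel road speeds in mph, 0 = background.
        # Returns (n_bins + 1, H, W): one binary mask per speed bin plus an
        # aggregate mask of all road pixels.
        h, w = speed.shape
        masks = np.zeros((n_bins + 1, h, w), dtype=np.uint8)
        edges = np.linspace(0.0, max_speed, n_bins + 1)  # 7 bins within 0-65 mph
        road = speed > 0
        for i in range(n_bins):
            in_bin = road & (speed > edges[i]) & (speed <= edges[i + 1])
            masks[i][in_bin] = 1
        masks[n_bins][road] = 1  # final channel aggregates all road pixels
        return masks

    # Example: tiny raster with a 30 mph and a 60 mph road pixel.
    example = np.array([[0.0, 30.0, 0.0],
                        [0.0, 60.0, 0.0]])
    print(speed_to_multiclass_mask(example).shape)  # (8, 2, 3)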
Figure 7: Breakdown of the 8-class road masks. The first 7 binary masks represent roads corresponding to 7 bins of speed within 0–65 mph. The 8th mask (bottom right) represents the aggregation of all previous masks.

Figure 8: Examples of road extraction model inputs and outputs. Columns from left to right: RGB image, LiDAR reflectivity intensity image, model prediction trained by both RGB and LiDAR data, and ground truth road mask.

We convert the multi-class road mask predictions to a skeleton and a speed-weighted graph and compute APLS scores. Table 2 shows the APLS scores of the three models. Similar to the building extraction results, the LiDAR-only result achieves scores close to the RGB-only result, while RGB + LiDAR gives the best performance.

Table 2: APLS scores of road extraction models

Training data type      APLS_length   APLS_time
RGB images              0.59624       0.54298
LiDAR intensity         0.57811       0.52697
RGB + LiDAR merged      0.63651       0.58518

5 SUMMARY

We present reproductions of SpaceNet winning algorithms, implementing machine learning models on Amazon SageMaker instances to automatically extract buildings and roads from geospatial data. In addition to RGB satellite imagery, we process USGS 3DEP LiDAR data and incorporate the LiDAR attributes into those models. Using the dataset in the Las Vegas AOI, we show that LiDAR data can be used to perform the same tasks with similar accuracy, and outperforms RGB-only models when combined with RGB imagery.

We prepare Jupyter notebooks and will share them in the tutorial to provide a step-by-step guide. At the end of the tutorial, attendees can reproduce the building and road extraction tasks, apply the models to other areas of interest, and innovate with new ideas to improve the performance. Attendees can also appreciate the benefits of cloud computing and storage first-hand.

This tutorial teaches cloud computing in a large-scale geospatial data analysis context, highlighting multimodal models that process both satellite image and LiDAR data. Our future work is to generate and share tooling on AWS to streamline the processing of geospatial data.

ACKNOWLEDGMENTS

LiDAR data courtesy of U.S. Geological Survey.

REFERENCES

[1] "Registry of Open Data on AWS," [Online]. Available: https://registry.opendata.aws/.
[2] A. Van Etten, D. Lindenbaum and T. M. Bacastow, "SpaceNet: A remote sensing dataset and challenge series," arXiv preprint arXiv:1807.01232, 2018.
[3] "USGS 3D Elevation Program," [Online]. Available: https://www.usgs.gov/core-science-systems/ngp/3dep.
[4] "SpaceNet Round 1 Challenge Implementations," 2017. [Online]. Available: https://github.com/SpaceNetChallenge/BuildingDetectors/.
[5] "SpaceNet Round 2 Challenge Implementations," 2017. [Online]. Available: https://github.com/SpaceNetChallenge/BuildingDetectors_Round2.
[6] "SpaceNet Round 4 Challenge Implementations," 2018. [Online]. Available: https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions.
[7] "SpaceNet Round 3 Challenge Implementations," 2018. [Online]. Available: https://github.com/SpaceNetChallenge/RoadDetector.
[8] "SpaceNet Round 5 Challenge Implementations," 2019. [Online]. Available: https://github.com/SpaceNetChallenge/SpaceNet_Optimized_Routing_Solutions.
[9] "SpaceNet Round 7 Challenge," 2020. [Online]. Available: https://www.cosmiqworks.org/current-projects/spacenet-7/.
[10] "USGS & Entwine," [Online]. Available: https://usgs.entwine.io/.
[11] M. Isenburg, "LAStools: efficient LiDAR processing software," 2014.
[12] O. Ronneberger, P. Fischer and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[14] CosmiQ Works, "Solaris: An open source ML pipeline for overhead imagery," 2019. [Online]. Available: https://github.com/CosmiQ/solaris.
[15] A. Van Etten, "City-scale road extraction from satellite imagery v2: Road speeds and travel times," in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
[16] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[17] CosmiQ Works, "CosmiQ/apls: Python code to evaluate the APLS metric," 2017. [Online]. Available: https://github.com/CosmiQ/apls.