Python Geospatial Analysis Cookbook - Sample Chapter
This book begins by tackling the installation of the necessary software dependencies and libraries needed
to perform spatial analysis with Python. From there, the next logical step is to prepare our data for analysis;
we will do this by building our toolbox to deal with data preparation, transformations, and projections. When
our data is ready for analysis, we will tackle problems such as indoor routing and diverse overlay analysis
methods. To validate our results, we will explore how topology checks can ensure top-quality results.
P U B L I S H I N G
Michael Diener
$ 49.99 US
31.99 UK
Geospatial development links your data to places on the Earth's surface. Its analysis is used in almost every
industry to answer location questions.
Python Geospatial
Analysis Cookbook
60 recipes to work with topology, overlays, indoor routing, and
web application analysis with Python
Preface
Geospatial analysis is not special; it is just different when compared to other types of analysis
such as financial market analysis. We work with geometry objects, such as lines, points, and
polygons, and connect these geometries to attributes such as business data. We ask "where"
questions, such as "Where is the nearest pub?", "Where are all my customers located?", and
"Where is my competition located?". Other location questions include "Will this new
building cast a shadow over the park?", "What is the shortest way to school?", "What is the
safest way to school for my kids?", "Will this building block my view of the mountains?", and
"Where is the optimal place to build my next store?". Another typical task is to identify the areas
that fire trucks can reach from their station in 5, 10, or 20 minutes.
One thing all these questions have in common is the fact that you need to know where certain
objects are located in order to answer them. Without the spatial component, you cannot
answer such questions and this is what geospatial analysis is all about.
Geospatial features are laid over each other and patterns or trends are easily identified. This
ability to see a pattern or trend is geospatial analysis in its simplest form.
Throughout this book, simple and complex code recipes are provided as small working models
that can easily be integrated or expanded into a larger project or model.
Analysis is the fun part of GIS, and involves visualizing relationships, identifying trends, and
seeing patterns that are not visible in a spreadsheet.
The Python programming language is clean, clear, and concise, making it great for beginners.
It also has advanced powers for professionals to help them quickly code solutions to complex
problems. Python makes visualization quick and easy for experts or beginners who work with
geospatial data. It's that simple.
Appendix A, Other Geospatial Python Libraries, explains how Python flourishes with geospatial
libraries, and you will also find a listing of many popular libraries that are used for data
analysis, regardless of whether they're spatial or not. This may trigger your interest.
Appendix B, Mapping Icon Libraries, briefly covers the icon libraries that play a
special role in the Python geospatial working environment.
Setting Up Your
Geospatial Python
Environment
In this chapter, we will cover the following topics:
Introduction
This chapter will get the grunt work done for you so that you can freely and actively complete
all the recipes in this book. We will start off by installing each of the libraries you will be using,
one by one. Once each step is completed, we will test each library installation to make sure
it works. Since this book is directed toward those of you already working with spatial data,
you can skip this chapter if you have these libraries installed already. If not, you will find the
installation instructions here useful as a reference.
The choice of Python libraries is based on industry-proven reliability and functionality. The
wealth of functions in these libraries has led to flourishing Python support in many top
desktop GIS systems, such as QGIS and Esri ArcGIS.
Getting ready
Before anything else, we are going to assume that you already have a Linux/Ubuntu machine or a
VirtualBox instance running Linux/Ubuntu so you can follow these instructions.
I also suggest trying out Vagrant (http://www.vagrantup.com),
which uses VirtualBox to package and standardize your development
environment.
Chapter 1
Ubuntu 14.04 comes with Python 2.7.6 and Python 3.4 preinstalled; the other libraries are
your responsibility as explained in the following sections.
Windows users need to download and install Python 2.7.x from the Python home page at
https://www.python.org/downloads/windows/; please download the newest version
of the 2.7.x series since this book is written with 2.7.x in mind. The installer includes a
bundled version of pip, so make sure that the pip option stays selected during installation.
Take a close look at the correct version to download, making sure that you get either the
32-bit or 64-bit download. You cannot mix and match the versions, so be careful and
remember to install the correct version.
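If you are not sure which build you already have, Python itself can tell you. The following is a minimal sketch that uses only the standard library; the helper name interpreter_bits is made up for this illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s, %d-bit" % (sys.version.split()[0], interpreter_bits()))
```

Binary packages that you download later need to match the bitness reported here.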
A great site for other kinds of Windows binaries can be found at
http://www.lfd.uci.edu/~gohlke/pythonlibs/. Wheel files are the new norm for installation and can be
installed from the command line as follows:
pip install libraryName.whl
Python 3 would be awesome to use, and many Python GIS libraries are ready for
show time. Unfortunately, at the time of writing this, not all GIS libraries (pyproj, for example)
work with Python 3 as one would love to see. If you want, feel free to go for Python 3.x and give it
a go. A great web page to check the compatibility of a library can be found at
https://caniusepython3.com/.
To install virtualenv, you need to have a running installation of Python and pip. The
pip package manager manages and installs Python packages, making our lives easier.
Throughout this book, if we need to install a package, pip will be our tool of choice for this
job. The official installation instructions for pip can be found at
https://pip.pypa.io/en/latest/installing.html. To install pip from the command line, we first need to install
easy_install. Let's try it out from the Terminal:
$ sudo apt-get install python-setuptools python-pip
The sudo command runs the command that follows it as the superuser. If this fails, you will need
to get the ez_setup.py file, which is available at https://bootstrap.pypa.io/ez_setup.py.
After downloading the file, you can run it from the command line:
$ python ez_setup.py
Now pip should be up and running and you can execute commands to complete the
installations of virtualenv and virtualenvwrapper. The virtualenvwrapper creates shortcuts
that are faster ways to create or delete your virtual environments. You can test it as follows:
$ pip install virtualenv
How to do it...
The steps to install your Python virtualenv and virtualenvwrapper packages are
as follows:
1. Install virtualenv using the pip installer:
$ sudo pip install virtualenv
3. Assign the WORKON_HOME variable to your home directory with the folder name
venvs. Create a single folder where you want to store all your different Python virtual
environments; in my case, the folder is located at /home/mdiener/venvs:
$ export WORKON_HOME=~/venvs
$ mkdir $WORKON_HOME
4. Run the source command to execute the virtualenvwrapper.sh bash file:
$ source /usr/local/bin/virtualenvwrapper.sh
5. Next, we create a new virtual environment called pygeoan_cb, and this is also the
name of the new folder where the virtual environment is installed:
$ mkvirtualenv pygeoan_cb
To use virtualenvwrapper the next time you start up your machine, we need to
set it up so that your bash terminal runs the virtualenvwrapper.sh script when
your computer starts.
6. First, put it in your ~/.bashrc file:
$ echo "export WORKON_HOME=$WORKON_HOME" >> ~/.bashrc
7.
How it works...
Step one shows how pip installs the virtualenv package into your system-wide Python
installation. Step two shows how the virtualenvwrapper helper package is installed with
easy_install because the virtualenvwrapper.sh file is not created using the pip
installer. This will help us create, enter, and generally, work or switch between Python virtual
environments with ease. Step three assigns the WORKON_HOME variable to a directory where
we want to have all of our virtual environments. Then, we'll create a new directory to hold all
the virtual environments. In step four, the command source is used to execute the shell script
to set up the virtualenvwrapper package. In step five, we see how to actually create a
new virtualenv called pygeoan_cb in our /home/mdiener/venvs directory. This final
step automatically starts our virtualenv session.
Once the virtualenv session starts, we can now see the name of virtualenv in brackets
like this:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$
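The prompt is one clue; the same check can also be made from inside Python. The sketch below relies only on attributes of the sys module, and the function name in_virtualenv is a hypothetical helper, not part of virtualenv. It is written to run on both Python 2.7 and 3.x.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the venv module set sys.base_prefix instead;
    # it differs from sys.prefix only inside an environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("prefix: %s" % sys.prefix)
    print("inside a virtual environment: %s" % in_virtualenv())
```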
Inside the /venvs folder, you will find specific individual virtual environments for each project
in the form of a subfolder. The virtualenvwrapper package will always create a new folder
for each new project you create. You can, therefore, easily delete a folder and it will remove
your virtual environment.
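Since every environment is simply a subfolder, listing them takes only a few lines. This is a rough sketch under two assumptions: the folder comes from the WORKON_HOME variable, and the fallback is ~/.virtualenvs, the virtualenvwrapper default. The function name list_environments is invented for the example.

```python
import os


def list_environments(workon_home=None):
    """Return the names of the subfolders found in the environments folder."""
    if workon_home is None:
        # Assumption: fall back to ~/.virtualenvs when WORKON_HOME is not set.
        workon_home = os.environ.get(
            "WORKON_HOME", os.path.expanduser("~/.virtualenvs"))
    if not os.path.isdir(workon_home):
        return []
    # Keep directories only; plain files in the folder are not environments.
    return sorted(
        entry for entry in os.listdir(workon_home)
        if os.path.isdir(os.path.join(workon_home, entry))
    )


if __name__ == "__main__":
    for name in list_environments():
        print(name)
```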
To quickly write a list of all the installed libraries to a file, we'll use the pip command:
$ pip freeze > requirements.txt
This will create a text file called requirements.txt in the current folder. The text file
contains a list of all the installed Python packages inside the Python virtual environment
currently running.
To install the same packages into a new virtualenv from a requirements file, activate it and use the following command:
$ pip install -r /path/to/requirements.txt
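The requirements file is plain text, so it is easy to inspect from Python. The following sketch reads only the simple name==version lines and ignores everything else; parse_pins is a hypothetical helper, and the sample versions are placeholders rather than values recommended here.

```python
def parse_pins(text):
    """Map package names to pinned versions for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and option lines such as '-e ...'.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


if __name__ == "__main__":
    # Illustrative content only; a real file comes from 'pip freeze'.
    sample = "numpy==1.9.2\n# a comment\nexample-package==0.1\n"
    for name, version in sorted(parse_pins(sample).items()):
        print("%s %s" % (name, version))
```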
There's more
For those of you who are just starting out with geospatial Python development, it should be
noted that you should keep your project-specific code at another location outside your Python
virtual environment folder. For example, I always have each project-related code contained in a
separate folder called 01_projects, which is my main folder. The path to my projects folder is
/home/mdiener/01_projects, and the structure of two of my projects is as follows:
01_projects/Name_project1
01_projects/Name_project2
All virtual environments are located under /home/mdiener/venvs/. Usually, I give them the
same name as a project to keep things organized, as follows:
/home/mdiener/venvs/Name_project1
/home/mdiener/venvs/Name_project2
Getting ready
Fire up your virtual environment, if it is not already running, using the following standard
start command:
$ workon pygeoan_cb
Now, we need to install some Python tools for development that allow us to install NumPy, so
run this command:
$ sudo apt-get install -y python-dev
You are now ready to move on and install pyproj and NumPy inside your running virtual
environment.
How to do it...
Simply fire up virtualenv and we will use the pip installer to do all the heavy lifting as follows:
1. Use pip to go ahead and install NumPy; this can take a couple of minutes as many
lines of installation verbosity are written on screen:
$ pip install numpy
Windows users can grab the .whl file for NumPy and install it using the following
command:
pip install numpy-1.9.2+mkl-cp27-none-win32.whl
3. Wait a few minutes; NumPy should now be running along with pyproj. To test whether it has
worked, enter the following commands in the Python console. The output should
look like this:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~/venv$ python
Python 2.7.3 (default,
copyright, credits, or
No errors, I hope. You have now successfully installed NumPy and pyproj.
All sorts of errors could show up, so please take a look at the
respective installation links to help you solve them:
For pyproj: https://pypi.python.org/pypi/pyproj/
For NumPy: http://www.numpy.org
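The import test can also be scripted so that it is easy to repeat after each installation. Here is a minimal sketch; check_imports is a hypothetical helper, and it assumes that a successful import is a good enough sign of a working installation.

```python
import importlib


def check_imports(names):
    """Try to import each module; return its version, 'ok', or the error text."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            results[name] = "FAILED: %s" % exc
        else:
            # Many, but not all, packages expose a __version__ attribute.
            results[name] = getattr(module, "__version__", "ok")
    return results


if __name__ == "__main__":
    for name, status in sorted(check_imports(["numpy", "pyproj"]).items()):
        print("%-10s %s" % (name, status))
```

The list of names can be extended as more libraries are installed in the later recipes.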
How it works...
This easy installation works using the standard pip installation method. No tricks or special
commands are needed. You need to simply execute the pip install <library_name>
command and you are off to the races.
Library names can be found by visiting the
https://pypi.python.org/pypi web page if
you are unsure of the exact name you want to install.
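If you ever need to trigger an installation from a script, one option is to call pip through the interpreter that is currently running, so the package lands in the active virtual environment. This is only a sketch; both function names are invented for the example.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs a package with the running interpreter's pip."""
    # sys.executable is the Python of the active environment, so 'python -m pip'
    # installs into that environment rather than the system-wide one.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the installation; raises CalledProcessError if pip reports a failure."""
    subprocess.check_call(pip_install_command(package))


if __name__ == "__main__":
    # Only print the command here; call pip_install() to actually run it.
    print(" ".join(pip_install_command("pyshp")))
```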
Getting ready
To prepare for installation, it is necessary to install some global packages, such as
libgeos_c, as these are required by Shapely. NumPy is also a requirement that we have
already met and is also used by Shapely.
Install the requirements of matplotlib from the command line like this:
$ sudo apt-get install freetype* libpng-dev libjpeg8-dev
These are the dependencies of matplotlib, as found on an Ubuntu 14.04 machine.
How to do it...
Follow these instructions:
1. Run pip to install shapely:
$ pip install shapely
Another test to see if all has gone well is to simply enter the Python console and try to import
the packages, and if no errors occur, your console should show an empty Python cursor. The
output should look like what is shown in the following screenshot:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~/venv$ python
Python 2.7.3 (default,
copyright, credits, or
If any errors occur, Python usually provides some good clues as to where the problem
is located and there is always Stack Overflow. For example, have a look at
http://stackoverflow.com/questions/19742406/could-not-find-library-geos-c-or-load-any-of-its-variants/23057508#23057508.
How it works...
Here, the order in which you install the packages is very important. The descartes package
depends on matplotlib, and matplotlib depends on NumPy plus freetype and libpng. This
narrows you down to installing NumPy first, then matplotlib and its dependencies, and
finally, descartes.
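The ordering rule described here is a small dependency-resolution problem. As an illustration only (pip works this out for you), a depth-first sketch can derive the order from the dependencies named above; install_order is a hypothetical helper.

```python
def install_order(depends_on):
    """Return package names ordered so that dependencies come first."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("circular dependency at %r" % name)
        visiting.add(name)
        for dep in depends_on.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in sorted(depends_on):
        visit(name)
    return order


# Python-level dependencies named in the text; the system libraries
# (freetype, libpng) are installed with apt-get and are left out here.
deps = {
    "descartes": ["matplotlib"],
    "matplotlib": ["numpy"],
    "numpy": [],
}

if __name__ == "__main__":
    print(install_order(deps))
```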
The installation itself is simple with pip and should be quick and painless. The tricky
parts occur if libgeos_c is not installed properly, and you might need to install the
libgeos-dev library.
Getting ready
Enter your virtual environment using the following command:
$ workon pygeoan_cb
How to do it...
The three installations are as follows:
1. Pyshp will first be installed by simply using pip as follows:
$ pip install pyshp
To test your installation of pyshp, type import shapefile in the Python console. The output should look
like the following:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~/venv$ python
Python 2.7.3 (default,
copyright, credits, or
How it works...
As with the other modules, we've used the standard pip installer to carry out the
installation. There are no other dependencies to worry about, making for fast progress.
Getting ready
The dependency jungle we looked at earlier is back, and we need to add three more system-wide
packages to our Ubuntu system using apt-get install as follows:
$ sudo apt-get install libblas-dev liblapack-dev gfortran
Three dependencies are used for the SciPy installation. PySAL depends on SciPy so make
sure to install SciPy first. Only IPython does not need any extra installations.
Start up your Python virtual environment with the following code:
mdiener@mdiener-VirtualBox:~$ workon pygeoan_cb
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$
How to do it...
Let's look at these steps:
1. First, we'll install SciPy since PySAL depends on it. This will take a while to install; it
took my machine 5 minutes to go through so take a break:
$ pip install scipy
3. As usual, we'd like to see whether everything's working, so let's fire up the Python
shell as follows:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$ python
>>> import scipy
>>> import pysal
>>>
How it works...
SciPy and PySAL libraries are both geared to help accomplish various spatial analysis duties.
The choice of tool is based on the task at hand, so make sure that you check which library
offers what function at the command prompt as follows:
>>> from scipy import spatial
>>> help(spatial)
The output should look like what is shown in the following screenshot:
Currently, GDAL covers working with raster data, and OGR covers working with vector data.
With GDAL 2.x now here, the two sides, raster and vector, are merged under one hat. GDAL
and OGR are the so-called Swiss Army knives of geospatial data transformations, covering
over 200 different spatial data formats.
Getting ready
GDAL isn't known to be the friendliest beast to install on Windows, Linux, or OSX. There are
many dependencies and even more ways to install them. The descriptions are not all very
straightforward. Keep in mind that this description is just one way of doing things and will not
always work on all machines, so please refer to the online instructions for the latest and best
ways to get your system up and running.
To start with, we will install some dependencies globally on our machine. After the
dependencies have been installed, we will go into the global installation of GDAL for
Python in our global site packages.
How to do it...
To globally install GDAL into our Python site packages, we will proceed with the following steps:
1. The following command is used when installing build and XML tools:
$ sudo apt-get install -y build-essential libxml2-dev libxslt1-dev
3. The following command installs the GDAL package into the main Python installation. This
means that GDAL will be installed globally. The global installation of GDAL is usually
not a bad thing since, as far as I am aware, there are no backward incompatible
versions, which is very rare these days. The installation of GDAL directly and only in
virtualenv is painful, to say the least, and if you are interested in attempting it, I've
mentioned some links for you to try out.
$ sudo apt-get install python-gdal
4. To get GDAL in the Python virtual environment, we only need to run a simple
virtualenvwrapper command:
toggleglobalsitepackages
5. Now, activate the global Python site packages in your current virtual environment:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$ toggleglobalsitepackages
enable global site-packages
7.
Users of Windows 7 or later should use the OSGeo4W Windows installer
(https://trac.osgeo.org/osgeo4w/). Find the download section on the web page and download the
32-bit or 64-bit installer for your Windows version. Follow the graphical installer instructions and the
GDAL installation will then be complete.
Windows users can also directly get binaries if all fails at
http://www.gisinternals.com/sdk/. This installer
should help avoid any other Windows specific problems that can
arise and this site can help get you going in the right direction.
How it works...
The GDAL installation encompasses both the raster (GDAL) and vector (OGR) tools in one.
Within the GDAL install are five modules that can be separately imported into your project
depending on your needs:
>>> from osgeo import gdal
>>> from osgeo import ogr
>>> from osgeo import osr
>>> from osgeo import gdal_array
>>> from osgeo import gdalconst
$ python
>>> import osgeo
>>> help(osgeo)
At the time of writing this, the GDAL version has been bumped up to 2.0, and in developer land,
this is old even before it gets printed. Beware that GDAL 2.0 has compatibility issues, and
for this book, version 1.x.x is recommended.
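A script can guard against running on the wrong series with a quick version check. The sketch below works on a plain version string so that it runs without GDAL installed; the function names are hypothetical, the version numbers in the usage are placeholders, and reading the string from osgeo.gdal.__version__ is an assumption to verify on your own installation.

```python
def major_version(version_string):
    """Return the leading integer of a dotted version string such as '1.11.2'."""
    return int(version_string.split(".")[0])


def is_recommended_gdal(version_string):
    """True for the 1.x.x series that this book recommends."""
    return major_version(version_string) == 1


if __name__ == "__main__":
    # With GDAL installed, the string could come from osgeo.gdal.__version__
    # (assumed here; not imported so the sketch runs without GDAL).
    for version in ("1.11.2", "2.0.0"):
        print("%s recommended: %s" % (version, is_recommended_gdal(version)))
```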
See also
The http://www.gdal.org homepage is always the best place for reference regarding any
information about GDAL. OSGeo includes GDAL as a supported project, and you can find more
information on it at http://www.osgeo.org.
We will use PostgreSQL and PostGIS since they are the open source industry go-to spatial
databases. The installations are not 100% necessary, but skipping them limits the operations
available to you, and they're definitely needed if you plan to store your
spatial data in a spatial database. The combination of PostgreSQL and PostGIS is the most
common spatial database setup for GeoDjango. This installation is definitely more involved
and can lead to some hiccups depending on your system.
Getting ready
To use GeoDjango, we will need to have a spatial database installed, and in our case, we
will be using PostgreSQL with the PostGIS extension. GeoDjango also supports Oracle,
Spatialite, and MySQL. The dependencies of PostGIS include GDAL, GEOS, PROJ.4, LibXML2,
and JSON-C.
Start up your Python virtual environment as follows:
mdiener@mdiener-VirtualBox:~$ workon pygeoan_cb
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$
How to do it...
Follow these steps. These are taken from the PostgreSQL homepage for Ubuntu Linux:
1. Create a new file called pgdg.list using the standard gedit text editor. This file stores
the repository entry that tells the Ubuntu package installer where to find the PostgreSQL packages:
$ sudo gedit /etc/apt/sources.list.d/pgdg.list
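2. Add this line to the file, save, and then close it. The release codename must match your
system; precise is Ubuntu 12.04, so on Ubuntu 14.04 use trusty-pgdg instead:
deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main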
6. To install PostGIS 2.1, we will have one unmet dependency, libgdal1, so go ahead
and install it:
$ sudo apt-get install libgdal1
7. Now we can install PostGIS 2.1 for PostgreSQL 9.3 on our machine:
$ sudo apt-get install postgresql-9.3-postgis-2.1
10. Install the Python database adapter, psycopg2, to connect to your PostgreSQL
database from Python:
$ sudo apt-get install python-psycopg2
12. Using the psql command-line tool, we can create a PostGIS extension to our newly
created database to give it all the PostGIS functions as follows:
(pygeoan_cb)mdiener@mdiener-VirtualBox:~$ psql -d [NewDatabaseName] -c "CREATE EXTENSION postgis;"
13. Moving on, we can finally install Django in one line directly in our activated
virtual environment:
$ pip install django
14. Test out your install of Django and GDAL and, as always, try to import them as follows:
>>> from django.contrib.gis import gdal
>>> gdal.HAS_GDAL
True
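The database side can be tested from Python in a similar spirit. The following is a minimal sketch, assuming psycopg2 is importable and that a PostGIS-enabled database exists; the database name and user in the usage are placeholders, and both function names are hypothetical.

```python
def build_dsn(dbname, user, host="localhost", port=5432):
    """Build a libpq-style connection string for psycopg2.connect()."""
    return "dbname=%s user=%s host=%s port=%d" % (dbname, user, host, port)


def postgis_version(dsn):
    """Return the PostGIS version string reported by the database."""
    import psycopg2  # imported lazily so build_dsn works without the adapter

    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        # PostGIS_Version() is available once the postgis extension is created.
        cur.execute("SELECT PostGIS_Version();")
        return cur.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder values; replace them with your own database and user.
    print(build_dsn("my_database", "my_user"))
```

Calling postgis_version() with a real connection string is a reasonable smoke test that PostgreSQL, PostGIS, and psycopg2 are all talking to each other.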
How it works...
Installations using the apt-get Ubuntu installer and the Windows installers are simple enough
to get PostgreSQL, PostGIS, and Django up and running. However, the inner
workings of the installers are beyond the scope of this book.
There's more...
To summarize all the installed libraries, take a look at this table:
Library name
Description
Reason to install
NumPy
pyproj
It transforms projections
shapely
matplotlib
descartes
pandas
SciPy
PySAL
IPython
Django
pyshp
GeoJSON
PostgreSQL
PostGIS