Python and GIS Course
As with GEOG 483 and GEOG 484, the lessons in this course are project-based
with key concepts embedded within. However, because of the nature of computer
programming, there is no way this course can follow the step-by-step instruction
design of the previous courses. You will probably find the course to be more
challenging than the others. For that reason, it is more important than ever that
you stay on schedule and take advantage of the course message boards and private
e-mail. It's quite likely that you will get stuck somewhere during the course, so
before getting hopelessly frustrated, please seek help from me or your classmates!
I hope that by now you have reviewed our Orientation and Syllabus for an
important course site overview. Before we begin our first project, let me share some
important information about the textbook and a related Esri course.
The textbook for this course is Python Scripting for ArcGIS by Paul A. Zandbergen.
This book came out in 2012 and has been a hot item among Esri software users; I
suggest you order your copy immediately in case of shortages or delays.
Back when Geog 485 was rewritten as a Python course, there was no textbook
available that tied together ArcGIS and Python scripting. As you read through
Zandbergen's book, you'll see material that closely parallels what is in the Geog 485
lessons. This isn't necessarily a bad thing; when you are learning a subject like
programming, it can be helpful to have the same concept explained from two
angles.
My advice about the readings is this: Read the material on the Geog 485 lesson
pages first. If you feel like you have a good understanding from the lesson pages,
you can skim through some of the more lengthy Zandbergen readings. If you
struggled with understanding the lesson pages, you should pay close attention to
the Zandbergen readings and try some of the related code snippets and exercises. I
suggest you plan about 1 - 2 hours per week of reading if you are going to study the
chapters in detail.
In all cases, you should get a copy of the textbook because it is a relevant and
helpful reference.
There is a free Esri Virtual Campus course, Using Python in ArcGIS Desktop 10 [1],
that introduces a lot of the same things you'll learn this quarter in Geog 485. The
course consists of a one-hour recorded seminar and a walkthrough exercise. If you
want to get a head start, or you feel you want some reinforcement of what we're
learning from a different point of view, it would be worth your time to complete
this Virtual Campus course.
All you need in order to access this course is an Esri Global Account, which you can
create for free. You do not need to obtain an access code from Penn State.
The video moves very quickly and covers a range of concepts that we'll spend 10
weeks studying in depth, so don't worry if you don't understand it all immediately.
You might find it helpful to watch the video again near the end of Geog 485 to
review what you've learned.
Questions?
If you have any questions now or at any point during this week, please feel free to
post them to the Lesson 1 Discussion Forum. (To access the forums, return to
ANGEL via the ANGEL link in the Resources menu. Once in ANGEL, you can
navigate to the Communicate tab and then scroll down to the Discussion
Forums section.) While you are there, feel free to post your own responses if you,
too, are able to help a classmate.
Lesson 1 checklist
This lesson is two weeks in length. (See the Calendar in ANGEL for specific due
dates.) To finish this lesson, you must complete the activities listed below. You may
find it useful to print this page so that you can follow along with the directions.
chapters.
4. Complete Project 1, Part I and submit the deliverables to the course drop
box.
5. Complete Project 1, Part II and submit the deliverables to the course drop
box.
Successful GIS analysis requires selecting the right tools to operate on your data.
ArcGIS uses a toolbox metaphor to organize its suite of tools. You pick the tools you
need and run them in the proper order to make your finished product.
Suppose you’re responsible for selecting sites for video stores. You might use one
tool to select land parcels along a major thoroughfare, another tool to select parcels
no smaller than 0.25 acres, and other tools for other selection criteria. If this
selection process were limited to a small area, it would probably make sense to
perform the work manually.
However, let’s suppose you’re responsible for carrying out the same analysis for
several areas around the country. Because this scenario involves running the same
sequence of tools for several areas, it is one that lends itself well to automation.
There are several major benefits to automating tasks like this:
Automation makes work easier. Once you automate a process, you don't
have to put in as much effort remembering which tools to use or the proper
Automation makes work faster. A computer can open and execute tools in
sequence much faster than you can accomplish the same task by pointing
and clicking.
Automation makes work more accurate. Any time you perform a manual
task on a computer, there is a chance for error. The chance multiplies with
the number and complexity of the steps in your analysis. In contrast, once
ArcGIS provides three ways for users to automate their geoprocessing tasks. These
three options differ in the amount of skill required to produce the automated
solution and in the range of scenarios that each can address.
The first option is to construct a model using ModelBuilder. ModelBuilder is an
interactive program that allows the user to “chain” tools together, using the output
of one tool as input in another. Perhaps the most attractive feature of
ModelBuilder is that users can automate rather complex GIS workflows without the need
for programming. You will learn how to use ModelBuilder early in this course.
Some automation tasks require greater flexibility than is offered by ModelBuilder,
and for these scenarios it's recommended that you write scripts. The bulk of this
course is concerned with script writing.
Several languages are designed for writing scripts, including Python, JScript,
and Perl. These languages often have simpler syntax and are easier to learn than
other languages such as C, Java, or Visual Basic.
Although ArcGIS supports various scripting languages for working with its tools,
Esri emphasizes Python in its documentation and includes Python with the ArcGIS
install. In this course we’ll be working strictly with Python. You’ll learn the basics of
the Python language, how to write a script, and how to manipulate and analyze GIS
data using scripts. Finally, you’ll apply your new Python knowledge to a final
project, where you write a script of your choosing that you may be able to apply
directly to your work.
The tools that you run in ModelBuilder and Python actually use ArcObjects "under
the hood" to run GIS functions; however, the advantage of Python scripting with
ArcGIS is that you don't need to learn all the ArcObjects logic behind the tools.
Your job is just to learn the tools and how to run them in the appropriate order to
accomplish your task.
This first lesson will introduce you to concepts in both model building and script
writing. We’ll start by just getting familiar with how tools run in ArcGIS and how
you can use those tools in the ModelBuilder interface. Then, we’ll cover some of the
basics of Python and see how the tools can be run within scripts.
Although you may have seen them before, let’s take a quick look at the toolboxes:
1. Open ArcMap.
2. If the Catalog window isn't visible, click the Windows menu, then click
Catalog. (If you've used previous versions of ArcGIS, this is a new window
at version 10 that allows you to have a lot of the ArcCatalog functionality
available in ArcMap.) If you hover over or click the Catalog item on the right
side of your screen, you can make the Catalog window appear. Optionally,
3. In the Catalog, expand the nodes Toolboxes > System Toolboxes and
continue expanding the toolboxes of your choice until you see some of the
available tools. Notice that they’re organized into toolboxes and toolsets.
Sometimes it’s faster to use the Search window to find the tool you need
4. Let’s examine a tool. Expand Analysis Tools > Proximity > Buffer, and
double-click the Buffer tool to open it.
At this point, you’re looking at a dialog with many fields. Each geoprocessing
tool has required inputs and outputs. Those are indicated by the green dots.
They represent the minimum amount of information you need to supply in
order to run a tool. For the Buffer tool, as inputs, you’re required to supply
an input features location (the features that will be buffered) and a buffer
distance. You’re also required to indicate an output feature class location
(for the new buffered features).
Many tools also have optional parameters. You can modify these if you want,
but if you don’t supply them, the tool will still run using default values. For
the Buffer tool, optional parameters are the Side Type, End Type, Dissolve
Type, and Dissolve Fields. Optional parameters are typically specified after
required parameters.
5. Click the Show Help button in the lower-right corner of the tool (if it says
Hide Help then you’re already viewing help). You can now click on any
parameter in the dialog to see an explanation of that parameter appear in
the right-hand window.
If you’re not sure what a parameter means, this is a good way to learn. For
example, with the help still open, click the Side Type input box on the
Buffer tool (right where it says "FULL"). The Help explains what the Side
Type parameter means and lists the different options: FULL, LEFT, RIGHT,
and OUTSIDE_ONLY.
If you need even more help, each tool is fully documented in the ArcGIS Desktop
Help. You could go directly to the Buffer tool help by clicking the Tool Help
button in the tool dialog box, but in this course you'll often want to get to these help
pages without opening the tool itself. Below are the steps for doing so.
1. From the main menu of ArcMap, click Help > ArcGIS Desktop Help.
Optionally, for the most up-to-date help, you can use the Web-based help at
http://webhelp.esri.com [3]. (All links to the Help in this course will open
Notice that the help topics in this section are organized into toolboxes and
Proximity toolset > Buffer. Scroll through the entire topic examining all
the information that is given about the Buffer tool. Here you get tips about
what the Buffer tool does, how to use it, a full list of parameters, and
and you should always check the Geoprocessing Tool Reference in the Help
a tool and its parameters. In this case, you can open the tool directly from
the Catalog and use the tool’s graphical user interface (GUI, pronounced
ModelBuilder is also a GUI application where you can set up tools to run in a
given sequence, using the output of one tool as input to another tool.
If you’re familiar with the tool and want to use it quickly in ArcMap, you
may prefer the Python window approach. You type the tool name and
required parameters into a command window. You can use this window to
run several tools in a row and declare variables, effectively doing simple
scripting.
logical sequence of tools, you can run it from a script. Running a tool from a
We’ll start with the simplest of these cases, running a tool from its GUI, and work
our way up to scripting.
1. If by chance you still have the Buffer tool open from the previous section,
5. Click the Add Data button and browse to the data you just extracted.
6. Open the Catalog window if necessary and browse to the Buffer tool as you
8. Examine the first required parameter: Input Features. Click the Browse
A more convenient way to supply the Input Features is to just select the
cities map layer from the dropdown menu. This dropdown automatically
contains all the layers in your map document. However, in this example we
browsed to the path of the data because it’s conceptually similar to how we’ll
provide the paths in the command line and scripting environments.
9. Now you need to supply the Distance parameter for the buffer. For this run
of the tool, set a Linear unit of 5 miles. When we run the tool from the
10. The rest of the parameters are optional. The Side Type and End Type
parameters apply only to lines and polygons, so they are not even available
for setting in the GUI environment when working with city points. However,
12. The tool should take just a few seconds to complete. Examine the output
that appears on the map, and do a “sanity check” to make sure that buffers
appear around the cities and they appear to be about 5 miles in radius. You
13. Click the Geoprocessing menu and click Results. This window lists
messages about successes or failures of all recent tools that you've run.
14. Expand the Buffer tool until you can see all the messages. They list the tool
parameters, the time of completion, and any problems that occurred when
running the tool. (See Figure 1.1.) These messages can be a big help later
when you troubleshoot your Python scripts. The text of these messages is
available whether you run the tool from the GUI, from the Python window in
Figure 1.1 Screen capture showing the Buffer tool and all messages.
A set of tools chained together in this way is called a model. Models can be simple,
consisting of just a few tools, or complex, consisting of many tools and parameters
and occasionally some iterative logic. Whether big or small, the benefit of a model
is that it solves a unique geographic problem that cannot be addressed by one of
the “out-of-the-box” tools.
In ArcGIS, modeling can be done either through the ModelBuilder graphical user
interface (GUI) or through code, using Python. To keep our terms clear, we’ll refer
to anything built in ModelBuilder as a “model” and anything built through Python
as a “script.” However, it’s important to remember that both are forms of
modeling.
ModelBuilder is a nice environment for exploring the ArcGIS tools, learning how
tool inputs and outputs are used, and visually understanding how GIS modeling
works. When you begin using Python, you will not have the same visual assistance
to see how the tools you’re using are connected, but you may still want to draw your
model on a whiteboard in a similar fashion to what you saw in ModelBuilder.
ModelBuilder can frequently reduce the amount of Python coding that you need to
do. If your GIS problem does not require advanced conditional and iterative logic,
you may be able to get your work done in ModelBuilder without writing a script.
ModelBuilder also allows you to export any model to Python code, so even if you do
need to write a script, you may be able to use ModelBuilder to get a head start.
You can double-click this model any time in the Catalog window and run it just as
you would a tool. If you do this, you’ll notice that the model has no parameters; you
can’t change the buffer distance or input features. The truth is, our model is useful
for solving this particular site-selection problem with these particular datasets, but
it’s not very flexible. In the next section of the lesson, we’ll make this model more
versatile by configuring some of the variables as input and output parameters.
1. Create a new map document in ArcMap and add the us_cities, us_roads,
and us_boundaries shapefiles from the Lesson 1 data folder that you
C:\WCGIS\Geog485\Lesson1\ModelPractice.mxd.
2. In ArcGIS, all models are stored in toolboxes. The first thing you need to do
is create a toolbox to hold your new model. If the Catalog window is not
visible already, display it by clicking the menu item Windows > Catalog.
3. In the Catalog window, expand the nodes until you see Toolboxes > My
Toolboxes.
5. Right-click the Lesson 1 toolbox and click New > Model. You’ll see
ModelBuilder appear.
7. For the Name, type SuitableLand and for the Label, type Find Suitable
Land. The label is what everyone will see when they open your tool from the
Catalog. That’s why it can contain spaces. The name is what people will use
if they ever run your model from Python. That’s why it cannot contain
spaces.
8. Click OK to dismiss the Model Properties dialog.
You now have a blank canvas on which you can drag and drop the tools.
When creating a model (and when writing Python scripts), it’s best to break
your problem into manageable pieces. The simple site selection problem
here can be thought of as four steps:
Let’s tackle these items one at a time, starting with buffering the cities.
10. Click the Buffer tool and drag it onto the ModelBuilder canvas. You’ll see a
white rectangular box representing the buffer tool and a white oval
representing the output buffers. These are connected with a line, showing
that the Buffer tool will always produce an output dataset.
11. In your ModelBuilder window, double-click the Buffer box. The tool dialog
here is the same as if you had opened the Buffer tool directly out of ArcToolbox.
12. For Input Features, browse to the path of your us_cities shapefile on disk.
14. For Dissolve Type, select ALL, then click OK to close the Buffer dialog.
The model elements (tools and variables) should be filled in with color, and
you should see a new element to the left of the tool representing the input
15. An important part of working with ModelBuilder is supplying clear labels for
all the elements. This way, if you share your model, others can easily
understand what will happen when it runs. Supplying clear labels also helps
you remember what the model does, especially if you haven’t worked with
the model for a while.
16. Right-click the Buffer tool (yellow-orange box, at center) and click Rename.
17. Right-click the us_citiesBuffer1.shp element (green oval, at far right) and
click Rename. Name this “Buffered cities.” Your model should look like
this.
18. Save your model (Model > Save). This is the kind of activity where you
19. Practice what you just learned by adding another Buffer tool to your model.
This time, configure the tool so that it buffers the us_roads shapefile by 10
miles. Remember to set the Dissolve Type to ALL and to add meaningful
labels. Your model should now look like this.
Figure 1.3 The model's appearance following step 19, above.
20. The next task is to intersect the buffers. In the Catalog window's list of
toolboxes, browse to Analysis Tools > Overlay and drag the Intersect
tool onto your model. Position it to the right of your existing Buffer tools.
21. Here’s the pivotal moment when you chain the tools together, setting the
outputs of your Buffer tools as the inputs of the Intersect tool. Click the
Connect tool, then click the Buffered cities element followed by the
Intersect element. If you see a small menu appear, click Input Features to
denote that the buffered cities will act as inputs to the Intersect tool. An
arrow will now point from the Buffered cities element to the Intersect
element.
22. Use the same process to connect the Buffered roads to the Intersect element.
24. The final step is to clip the intersected buffers to the outline of the United
States. This prevents any of the selected area from falling outside the
> Extract and drag the Clip tool into ModelBuilder. Position this tool to
25. Use the Connect tool again to set the Intersected buffers as an input to the
Clip tool, choosing Input Features when prompted. Notice that even when
you do this, the Clip tool is not ready to run (it’s still shown as a white
rectangle, located at right). You need to supply the clip features, which is the
27. Set meaningful labels for the remaining tools. Below is an
example of how you can label and arrange the model elements.
Figure 1.5 The completed model with the Clip tool included.
28. Double-click the final output element (named "Suitable land" in the image
This is where you can expect your model output feature class to be written to
disk.
31. Test the model by clicking the Run button. You’ll see the by-now-familiar
geoprocessing message window that will report any errors that may
occur. ModelBuilder also gives you a visual cue of which tool is running by
turning the tool red. (If the model crashes, try closing ModelBuilder and
running the model by double-clicking it from the Catalog window. You'll get
a message that the model has no parameters. This is okay [and true, as you'll
32. When the model has finished running (it may take a while), examine the
output in ArcMap. Zoom in to Washington state to verify that the Clip has
worked on the coastal areas. The output should look similar to this.
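The model built in the steps above can also be sketched as a short script. This is an illustration only, not the code that ModelBuilder would export: the function names plan_suitable_land and run_plan, the out_folder argument, and the output file names are hypothetical, and the 10-mile distances are taken from the lesson text. The sketch builds the ordered list of tool calls so that you can see how the outputs of one tool feed the next; run_plan executes them only when arcpy is available.

```python
import os

try:
    import arcpy  # installed with ArcGIS; missing from a plain Python install
except ImportError:
    arcpy = None


def plan_suitable_land(cities, roads, boundaries, out_folder,
                       cities_distance="10 Miles", roads_distance="10 Miles"):
    """Return the model's tool calls, in run order, as (tool name, arguments)."""
    # Hypothetical output names, chosen only for this illustration.
    cities_buffer = os.path.join(out_folder, "cities_buffer.shp")
    roads_buffer = os.path.join(out_folder, "roads_buffer.shp")
    intersected = os.path.join(out_folder, "intersected_buffers.shp")
    suitable_land = os.path.join(out_folder, "suitable_land.shp")

    return [
        # Buffer each input; "" keeps the default side and end type, ALL dissolves.
        ("Buffer_analysis", [cities, cities_buffer, cities_distance, "", "", "ALL"]),
        ("Buffer_analysis", [roads, roads_buffer, roads_distance, "", "", "ALL"]),
        # The outputs of both Buffer calls become the inputs of Intersect.
        ("Intersect_analysis", [[cities_buffer, roads_buffer], intersected]),
        # Clip the intersected buffers to the boundaries dataset.
        ("Clip_analysis", [intersected, boundaries, suitable_land]),
    ]


def run_plan(plan):
    """Run each planned tool through arcpy, in order."""
    if arcpy is None:
        raise RuntimeError("arcpy is not available; run this with ArcGIS's Python.")
    for tool_name, arguments in plan:
        getattr(arcpy, tool_name)(*arguments)


plan = plan_suitable_land("us_cities.shp", "us_roads.shp",
                          "us_boundaries.shp", "output")
for tool_name, arguments in plan:
    print(tool_name, arguments)
```

Keeping the plan separate from its execution is a design choice made for this sketch: it lets you inspect the chain of inputs and outputs, much as you would read the diagram in ModelBuilder, before any tool actually runs.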
Let’s modify that model to use some parameters, so that you can easily run it with
different datasets and buffer distances.
C:\WCGIS\Geog485\Lesson1\ModelPractice.mxd in ArcMap.
2. In the Catalog window, find the model you created in the previous lesson,
which should be under Toolboxes > My Toolboxes > Lesson 1 > Find
Suitable Land.
3. Right-click the model Find Suitable Land and click Copy. Now right-click
the Lesson 1 toolbox and click Paste. This creates a new copy of your model
that you can work with to create model parameters. Using a copy of the
model like this allows you to easily start over if you make a mistake.
4. Rename the copy of your model Find Suitable Land With Parameters
or something similar.
Parameters and click Edit. You'll see the model appear in ModelBuilder.
6. Right-click the element US Cities (should be a blue oval) and click Model
Parameter. This means that whoever runs the model must specify the
7. You need a more general name for this parameter now, so right-click the US
Cities element and click Rename. Change the name to just "Cities."
8. Even though you "parameterized" the cities, your model still defaults to
using the C:\WCGIS\Geog485\Lesson1\us_cities.shp dataset. This isn't
going to make much sense if you share your model or toolbox with other
people because they may not have the same us_cities shapefile, and even if
they do, it probably won't be sitting at the same path on their machines.
To remove the default dataset, double-click the Cities element and delete the
path, then click OK. Some of the elements in your model may turn white.
This signifies that a value has to be provided before the model can
successfully run.
9. Now you need to create a parameter for the distance of the buffer to be
created around the cities. Right-click the element that you named "Buffer
the cities" and click Make Variable > From Parameter > Distance
[value or field].
10. The previous step created a new element Distance [value or field]. Rename
(Review the steps above if you're unsure about how to rename an element or
make it a model parameter.) For this element, you can leave the default at 10
miles. Your model should look similar to this, although the title bar of your
11. Repeating what you learned above, rename the US Roads element to
12. Repeating what you learned above, make a parameter for the Roads buffer
13. Repeating what you learned above, rename the US Boundaries element to
Boundaries, make it a model parameter, and remove the default value. Your
model should look like this (notice the five parameters indicated by "P"s):
Figure 1.8 The "Find Suitable Land With Parameters" model following Step
Figure 1.9 The model interface, or tool dialog, for the model "Find Suitable
People who run this model will be able to browse to any cities, roads, and
boundaries datasets, and will be able to control the buffer distance. The
green dots indicate parameters that must be supplied with valid values
before the model can run.
16. Test your model by supplying the us_cities, us_roads, and us_boundaries
shapefiles for the model parameters. If you like, you can try changing the
buffer distance.
Note that sometimes the result does not add itself to the display as it
should. You should just be able to add it to the display by using the Add
GIS analysis sometimes gets messy. Most of the tools that you run produce an
output dataset, and when you chain many tools together those datasets start piling
up on disk. Even if you're diligent about naming your datasets intuitively, it's easy
to wind up with a folder full of datasets with names like buffers1, clippedbuffers1,
intersectedandclippedbuffers1, raster2reclassified, etc.
In most cases, you are concerned with just the final output dataset. The
intermediate data is just temporary; you only need to keep it around for as long as
it takes to run the model, and then it can be deleted.
ModelBuilder can manage your intermediate data for you, placing it in a temporary
directory called the scratch workspace. By default, the scratch workspace is your
operating system's temp directory, but you can configure it to exist in another
location.
You can force data to go into the scratch workspace by using the
%SCRATCHWORKSPACE% variable in the path. For example:
%SCRATCHWORKSPACE%\myOutput.shp
You can also mark any element in ModelBuilder as Intermediate and it will be
deleted after the model is run. By default, all derived data is Intermediate.
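In a script, the same idea can be sketched as writing intermediate datasets to a scratch folder and keeping only the final output elsewhere. The helper name scratch_path and the folder name scratch_data are hypothetical; falling back to the operating system's temp directory mirrors the default described above. When ArcGIS is available, the folder would more typically be read from arcpy.env.scratchWorkspace.

```python
import os
import tempfile


def scratch_path(file_name, scratch_workspace=None):
    """Return a path for an intermediate dataset inside the scratch workspace."""
    # With no workspace configured, fall back to the OS temp directory.
    folder = scratch_workspace or tempfile.gettempdir()
    return os.path.join(folder, file_name)


# Intermediate data goes to the default scratch location...
intermediate = scratch_path("myOutput.shp")
# ...or to a folder you configure yourself (hypothetical folder name).
configured = scratch_path("myOutput.shp", "scratch_data")

print(intermediate)
print(configured)
```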
The following topics from Esri go into more detail on intermediate data and are
important to understand as you work with the geoprocessing framework. I suggest
reading them once now and returning to them occasionally throughout the course.
Some of the concepts in them are easier to understand once you've worked with
geoprocessing for a while.
A quick tour of managing intermediate data [4]
Managing intermediate data in shared models [7] (Skip the section about
ArcGIS Server)
Looping in ModelBuilder
To take a peek at how iteration works in ModelBuilder, you can visit the ArcGIS
Desktop help book for model iteration [8]. If you're having trouble understanding
looping in later lessons, ModelBuilder might be a good environment to visualize
what a loop does. You can come back and visit this book as needed.
Readings
Read Zandbergen Chapter 2.1 - 2.9 to reinforce what you learned about
geoprocessing and ModelBuilder.
3. On the Standard toolbar, click the Python window button. Once the
window appears, drag it over to the side or bottom of the screen to dock it.
4. Type the following in the Python window. (Don't type the >>>. These are just
included to show you where the new lines begin in the Python window.)
>>> import arcpy
You’ve just run your first bit of Python. You don’t have to understand everything
about the code you wrote in this window, but here are a few important things to
note.
The first line of the script import arcpy tells the Python interpreter (which was
installed when you installed ArcGIS) that you’re going to work with some special
scripting functions and tools included with ArcGIS. Without this line of code,
Python knows nothing about ArcGIS, so you'll put it at the top of all ArcGIS-related
code that you write in this class. You technically don't need this line when you work
with the Python window in ArcMap because arcpy is already imported, but I
wanted to show you this pattern early; you'll use it in all the scripts you write
outside the Python window.
The second line of the script actually runs the tool. You can type arcpy, plus a dot,
plus any tool name to run a tool in Python. Notice here that you also put an
underscore followed by the name of the toolbox that includes the buffer tool. This is
necessary because some tools in different toolboxes actually have the same name
(like Clip, which is in both the Data Management and Analysis toolboxes).
After you typed arcpy.Buffer_analysis, you typed all the parameters for the
tool. Each parameter was separated by a comma, and the whole list of parameters
was enclosed in parentheses. Get used to this pattern, since you'll follow it with
every tool you run in this course.
In this code, we also supplied some optional parameters, leaving empty quotes
where we wanted to take the default values, and truncating the parameter list at the
final optional parameter we wanted to set.
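The tool call itself is not reproduced on this page. Going by the description above, it would look something like the following sketch. The layer names us_cities and us_cities_buffered and the 15-mile distance come from the surrounding text; the choice of ALL for the dissolve option is an assumption borrowed from the ModelBuilder exercise. The try/except guard and the buffer_args variable exist only so that the sketch can be read and run outside ArcGIS; in the Python window you would type just the import and the tool call.

```python
try:
    import arcpy  # already available inside the ArcMap Python window
except ImportError:
    arcpy = None  # plain Python install without ArcGIS

# Required parameters come first.
in_features = "us_cities"            # layer name in the map document
out_features = "us_cities_buffered"  # name for the new layer
distance = "15 miles"                # quoted because it is text

# Optional parameters follow; empty quotes accept the default value.
line_side = ""
line_end_type = ""
dissolve_option = "ALL"  # assumption: the last optional parameter that was set

buffer_args = (in_features, out_features, distance,
               line_side, line_end_type, dissolve_option)

if arcpy is not None:
    # Same as typing all six values, separated by commas, inside the parentheses.
    arcpy.Buffer_analysis(*buffer_args)
else:
    # Without ArcGIS, just show what the call would look like.
    print("arcpy.Buffer_analysis" + repr(buffer_args))
```

The parameter list stops at the dissolve option, so the one optional parameter after it (the dissolve fields) takes its default value.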
How do you know the syntax, or structure, of the parameters to enter? For
example, for the buffer distance, should you enter 15MILES, '15MILES', 15 Miles,
or '15 Miles'? The best way to answer questions like these is to return to the
Geoprocessing tool reference help topic for the Buffer tool [9]. All of the topics in
this reference section have a command line usage and example section to help you
understand how to structure the parameters. All the required parameters are
shown inside angle brackets (<>), while the optional parameters are shown inside
braces ({}). From the example in this topic, you can see that the buffer distance
should be specified as '15 miles'. Because this is text, or a string, containing a
space, you need to surround it with quotes (single or double).
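Part of this question can be checked with plain Python, which will at least tell you whether a candidate is valid syntax. The sketch below runs the built-in compile function on the four candidates; the helper name is_valid_expression is hypothetical. It tests Python syntax only. Whether the Buffer tool accepts a particular string is still something to confirm in the tool reference.

```python
def is_valid_expression(source):
    """Return True if Python can parse the text as an expression."""
    try:
        compile(source, "<candidate>", "eval")
        return True
    except SyntaxError:
        return False


# The four spellings of the buffer distance discussed above.
candidates = ["15MILES", "'15MILES'", "15 Miles", "'15 Miles'"]
for text in candidates:
    print(text, is_valid_expression(text))
```

Only the quoted candidates parse, because Python reads an unquoted 15 as a number and cannot make sense of the letters that follow it.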
You might have noticed that the Python window helps you by popping up different
options you can type for each parameter. This is called autocompletion, and it can
be very helpful if you're trying to run a tool for the first time and you don't know
exactly how to type the parameters. When you write code in PythonWin, you don't
get the autocompletion, so you may want to return to the Python window for tips as
you practice writing lines of code. If you can get a line of code to work in the Python
window, it will probably work in your script that you're writing in PythonWin.
There are a couple of differences between writing code in the Python window and
writing code in some other program, such as Notepad or PythonWin (which we'll
use throughout the course). In the Python window, you can reference layers in the
map document by their names only, instead of their file paths. Thus, we were able
to type "us_cities" instead of something like "C:\\data\\us_cities.shp". We were
also able to make up the name of a new layer "us_cities_buffered" and get it added
to the map by default after the code ran. If you're going to use your code outside the
Python window, make sure you use the full paths.
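The doubled backslashes in a path like that are there because a single backslash starts an escape sequence in an ordinary Python string. A minimal illustration in plain Python follows; the path C:\temp\new.shp is made up to show what goes wrong when the backslashes are left single.

```python
# Three ways to spell the same Windows path in Python source code.
escaped = "C:\\data\\us_cities.shp"  # each backslash doubled
raw = r"C:\data\us_cities.shp"       # raw string: backslashes taken literally
forward = "C:/data/us_cities.shp"    # forward slashes need no escaping

# The escaped and raw spellings produce identical text.
print(escaped == raw)

# Made-up path showing the pitfall: \t becomes a tab and \n a newline.
broken = "C:\temp\new.shp"
print(repr(broken))
```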
When you write more complex scripts, it will be helpful to use an integrated
development environment (IDE), meaning a program specifically designed to help
you write and test Python code. Later in this course we’ll explore the PythonWin
IDE.
Earlier in this lesson you saw how tools can be chained together to solve a problem
using ModelBuilder. The same can be done in Python, but it’s going to take a little
groundwork to get to that point. For this reason we’ll spend the rest of Lesson 1
covering some of the basics of Python.
Readings
Take a few minutes to read Zandbergen Chapter 3, a fairly short chapter where he
explains the Python window and some things you can do with it.
In ArcGIS, Python can be used for coarse-grained programming, meaning that you
can use it to easily run geoprocessing tools such as the Buffer tool that we just
worked with. You could code all the buffer logic yourself, using more detailed, fine-
grained programming with ArcObjects, but this would be time consuming and
unnecessary in most scenarios; it’s easier just to call the Buffer tool from a Python
script using one line of code.
In this course we’ll be working with Python version 2.6 (if you have ArcGIS 10.0) or
version 2.7 (if you have ArcGIS 10.1). If you download Python from its home page
at www.python.org [10], you’ll see that there are actually newer versions of Python
available. Python versions 3 and above contain some big changes and are going to
take some time for the Python user community to adopt. You may see some
information about Python 3 in your textbook that will give you an idea of the
changes coming in that version. You can read this information if you're interested,
but it's not applicable to this course.
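If you are unsure which version of Python a given interpreter is running, you can ask it directly. The short check below uses the standard sys module; the variable name matches_course is hypothetical, and the version pairs are the ones named above.

```python
import sys

# version_info holds the interpreter's version; the first two items
# are the major and minor numbers.
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python %d.%d" % (major, minor))

# The course text above names 2.6 (ArcGIS 10.0) and 2.7 (ArcGIS 10.1).
matches_course = (major, minor) in [(2, 6), (2, 7)]
print("Matches a course version: %s" % matches_course)
```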
Python comes with a simple default editor called IDLE; however, in this course
you’ll use the PythonWin integrated development environment (IDE) to help you
write code. PythonWin is free, has basic debugging capabilities, and is included
with ArcGIS. The only catch is that it is not installed by default; you
have to install it manually by following the steps below. If you are using ArcGIS 10.1,
replace any instances of 2.6 or 26 below with 2.7 or 27, respectively.
2. Dismiss any welcome screens that appear and choose Start > My
3. Find your DVD drive, right-click it, and click Open. Your goal is to get to the
folder structure of the DVD, not run the Auto Play that shows the Esri
welcome screen.
4. Once you’ve successfully displayed the folder structure, open the Desktop
folder.
not the PythonWin readme file). If you are using Windows Vista or Windows
one yourself. Once the install completes, use My Computer (or "Computer")
C:\Python26\ArcGIS10.0\Lib\site-packages\pythonwin.
9. Right-click the item Pythonwin and click Create Shortcut. You should
10. Drag and drop the shortcut onto your Desktop or wherever else you want to
put it.
On Windows Vista or Windows 7, if you see error messages during install, it’s likely
that you did not run the install as an Administrator. When you launch the install,
make sure you right-click and choose Run as Administrator.
When PythonWin opens, you’ll see what’s known as the Interactive Window. You
can type a line of Python at the >>> prompt and it will immediately execute and
print the result, if there is a printable result. The Interactive Window can be a good
place to practice with Python in this course, and whenever you see some Python
code next to the >>> prompt in the lesson materials, this means you can type it in
the Interactive Window to follow along. In these ways, the Interactive Window is
very similar to the Python window in ArcGIS.
To actually write a new script, click File > New and choose Python Script.
Notice a blank page opens that looks a whole lot like Notepad. However, the nice
thing about this interface is that the code is color-coded and the default font,
Courier, is one typically used by programmers. Spacing and indentation, which are
important in Python, are also easy to keep track of in this interface.
Remember your first introductory algebra class where you learned that a letter
could represent any number, like in the statement x + 3? This may have been your
first exposure to variables. (Sorry if the memory is traumatic!) In computer science,
variables represent values or objects you want the computer to store in its memory
for use later in the program.
Variables are frequently used to represent not only numbers, but also text and
“Boolean” values (‘true’ or ‘false’). A variable might be used to store input from the
program’s user, to store values returned from another program, to represent
constant values, and so on.
Variables make your code readable and flexible. If you hard-code your values,
meaning that you always use the literal value, your code is useful only in one
particular scenario. You could manually change the values in your code to fit a
different scenario, but this is tedious and exposes you to greater risk of making a
mistake (suppose you forget to change a value). Variables, on the other hand, allow
your code to be useful in many scenarios and are easy to parameterize, meaning
you can let users change the values to whatever they need.
To see some variables in action, open PythonWin and type this in the Interactive
Window:
>>> x = 2
You’ve just created, or declared, a variable, x, and set its value to 2. In some
statically typed programming languages, such as Java, you would be required to tell
the program that you were creating a numerical variable, but Python assumes this
when it sees the 2.
When you hit Enter, nothing happens, but the program now has this variable in
memory. To prove this, type:
>>> x + 3
You see the answer of this mathematical expression, 5, appear immediately in the
Interactive Window, proving that your variable was remembered and used.
You can also use the print command to write the results of operations. We’ll use
this a lot when practicing and testing code.
>>> print x + 3
5
Variables can also represent words, or strings, as they are referred to by
programmers. Try typing this in the Interactive Window:
In this example, the quotation marks tell Python that you are declaring a string
variable. Python is a powerful language for working with strings. A very simple
example of string manipulation is to add, or concatenate, two strings, like this:
You can include a number in a string variable by putting it in quotes, but you must
thereafter treat it like a string; you cannot treat it like a number. For example, this
results in an error:
>>> myValue = "3"
>>> print myValue + 2
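To make that error concrete, here is a small sketch of what goes wrong and two common ways around it. The int() and str() functions are built into Python; the names result, asNumber, and asText are made up for this illustration. print is written with parentheses so the same lines run under Python 2.6/2.7 and Python 3.

```python
myValue = "3"

# Adding a number to a string is not allowed, so Python raises a TypeError.
try:
    result = myValue + 2
except TypeError:
    result = None   # the addition never completed

# Option 1: convert the string to a number, then do the math.
asNumber = int(myValue) + 2

# Option 2: convert the number to a string, then concatenate.
asText = myValue + str(2)

print(asNumber)
print(asText)
```

Which option you want depends on whether you mean arithmetic (5) or text joined end to end ("32").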
In these examples you’ve seen the use of the = sign to assign the value of the
variable. You can always reassign the variable. For example:
>>> x = 5
>>> x = x - 2
>>> print x
3
When naming your variables, the following tips will help you avoid errors.
MyVariable.
beginning with a lower-case letter, then begin each subsequent word with a
"import" or "print."
Make variable names meaningful so that others can easily read your code. This will
also help you read your code and avoid making mistakes.
You’ll get plenty of experience working with variables throughout this course and
will learn more in future lessons.
5. Get a knife.
8. etc.
1. mySandwich = Sandwich.Make
2. mySandwich.Bread = Wheat
3. mySandwich.Add(PeanutButter)
4. mySandwich.Add(Jelly)
In the object-oriented example, the bulk of the steps have been eliminated. The
sandwich object "knows how" to build itself, given just a few pieces of information.
This is an important feature of object-oriented languages known as encapsulation.
Notice that you can define the properties of the sandwich (like the bread type) and
perform methods (remember that these are actions) on the sandwich, such as
adding the peanut butter and jelly.
1.5.3 Classes
The reason it’s so easy to "make a sandwich" in an object-oriented language is that
some programmer, somewhere, already did the work to define what a sandwich is
and what you can do with it. He or she did this using a class. A class defines how to
create an object, the properties and methods available to that object, how the
properties are set and used, and what each method does.
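As a rough illustration of what such a class might look like in Python, here is a minimal sketch. The Sandwich class, its bread property, and its add and describe methods are all hypothetical; they exist only to mirror the sandwich analogy and are not part of Python or ArcGIS.

```python
class Sandwich(object):
    """A hypothetical class: defines what a sandwich is and what it can do."""

    def __init__(self, bread="White"):
        # Properties: values that describe the object
        self.bread = bread
        self.fillings = []

    def add(self, filling):
        # Method: an action the object can perform on itself
        self.fillings.append(filling)

    def describe(self):
        return "%s bread with %s" % (self.bread, " and ".join(self.fillings))


# Create an object from the class, set a property, and call its methods
mySandwich = Sandwich()
mySandwich.bread = "Wheat"
mySandwich.add("Peanut butter")
mySandwich.add("Jelly")
print(mySandwich.describe())
```

The code that uses the class is only four lines long; the details of storing fillings and building the description are hidden inside the class.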
In Python, classes are grouped together into modules. You import modules into
your code to tell your program what objects you’ll be working with. You can write
modules yourself, but most likely you'll bring them in from other parties or
software packages. For example, the first line of most scripts you write in this
course will be:
import arcpy
Here you're using the import keyword to tell your script that you’ll be working
with the arcpy module, which is provided as part of ArcGIS. After importing this
module, you can create objects that leverage ArcGIS in your scripts.
Other modules that you may import in this course are os (allows you to work with
the operating system), random (allows for generation of random numbers), and
math (allows you to work with advanced math operations). These modules are
included with Python, but they aren't imported by default. A best practice for
keeping your scripts fast is to import only the modules that you need for that
particular script. For example, although it might not cause any errors in your
script, you wouldn't include import arcpy in a script not requiring any ArcGIS
functions.
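Here is a short sketch of importing and using those three standard modules. The file path is a made-up example, and the variable names are invented for illustration.

```python
import os
import random
import math

# os: pull the file name off the end of a (hypothetical) path
fileName = os.path.basename("C:/Data/USA/StateBoundaries.shp")

# math: take a square root
root = math.sqrt(16)

# random: pick a whole number from 1 to 6, like rolling a die
roll = random.randint(1, 6)

print(fileName)
print(root)
print(roll)
```

Notice that each function is called through its module name (os.path.basename, math.sqrt, random.randint), which is why the import lines must come first.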
1.5.4 Inheritance
Another important feature of object-oriented languages is inheritance. Classes are
arranged in a hierarchical relationship such that each class inherits its properties
and methods from the class above it in the hierarchy (its parent class or
superclass). A class also passes along its properties and methods to the class below
it (its child class or subclass). A real-world analogy involves the classification of
animal species. As a species, we have many characteristics that are unique to
humans. However, we also inherit many characteristics from classes higher in the
class hierarchy. We have some characteristics as a result of being vertebrates. We
have other characteristics as a result of being mammals. To illustrate the point,
think of the ability of humans to run. Our bodies respond to our command to run
not because we belong to the "human" class, but because we inherit that trait from
some class higher in the class hierarchy.
Back in the programming context, the lesson to be learned is that it pays to know
where a class fits into the class hierarchy. Without that piece of information, you
will be unaware of all of the operations available to you. This information about
inheritance can often be found in informational posters called object model
diagrams.
Here's an example of the object model diagram for ArcGIS Python scripting at 9.3
[11] (unfortunately, there is no poster at ArcGIS 10, but the 9.3 poster still comes in
handy for some things like this). Take a look at the green box titled FeatureClass
and notice at the bottom it says Dataset properties. This is because FeatureClass
inherits all properties from Dataset. Therefore any properties on a Dataset object,
such as Extent or SpatialReference, can also be obtained if you create a
FeatureClass object. Apart from all the properties it inherits from Dataset, the
FeatureClass has its own specialized properties such as FeatureType and
ShapeType.
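The same parent/child idea can be sketched in plain Python. The two classes below are simplified stand-ins written just for this illustration; they are not the real ArcGIS classes, and the property values are made up.

```python
class Dataset(object):
    """Hypothetical parent class with properties shared by all datasets."""

    def __init__(self, name, spatialReference):
        self.name = name
        self.spatialReference = spatialReference


class FeatureClass(Dataset):
    """Hypothetical child class: inherits from Dataset and adds its own property."""

    def __init__(self, name, spatialReference, shapeType):
        # Let the parent class set up the properties it owns
        Dataset.__init__(self, name, spatialReference)
        # Then add the property that is specific to feature classes
        self.shapeType = shapeType


states = FeatureClass("StateBoundaries", "GCS_North_American_1983", "Polygon")

print(states.spatialReference)        # inherited from Dataset
print(states.shapeType)               # defined on FeatureClass itself
print(isinstance(states, Dataset))    # a FeatureClass is also a Dataset
```

Because FeatureClass is declared as a child of Dataset, the states object has the spatialReference property even though FeatureClass never defines it directly.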
means it’s important whether you use upper or lower-case. The all lower-
case "print" is a reserved word in Python that will print a value, while "Print"
sensitive about case and will return an error if you try to run a tool without
You end a Python statement by pressing Enter and literally beginning a new
denotes the end of a statement.) It’s okay to add empty lines to divide your
If you have a long statement that you want to display on multiple lines for
is a backslash (\). You can then continue typing on the line below and
Python will interpret the line sequence as one statement. One exception is if
blocks, of code. You should indent your code four spaces inside loops,
You can add a comment to your code by beginning the line with a pound (#)
sign. Comments are lines that you include to explain what the code is doing.
Comments are ignored by Python when it runs the script, so you can add
them at any place in your code without worrying about their effect.
Comments help others who may have to work with your code in the future;
and they may even help you remember what the code does.
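Several of the rules above can be seen together in a few lines. This is only an illustrative sketch; the variable names and values are invented.

```python
# Names are case sensitive: these are two different variables.
myValue = 10
MyValue = 20

# A backslash at the end of a line continues the statement on the next line.
total = myValue + \
        MyValue

# Indentation (four spaces) marks the lines that belong to each branch.
if total > 25:
    message = "Large total"
else:
    message = "Small total"

print(message)
```

If you removed the indentation under if and else, Python would refuse to run the script, which is one reason an editor that keeps spacing visible is so helpful.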
import arcpy
featureClass = "C:/Data/USA/USA.gdb/StateBoundaries"
This may look intimidating at first, so let’s go through what’s happening in this
script, line by line. Watch this video to get a visual walkthrough of the code.
Again, notice that:
The variable names featureClass, desc, and spatialRef that the programmer
assigned are short, but intuitive. By looking at the variable name, you can
The script creates objects and uses a combination of properties and methods
programming works.
The best way to get familiar with a new programming language is to look at
example code and practice with it yourself. See if you can modify the script above to
report the spatial reference of a feature class on your computer. In my example the
feature class is in a file geodatabase; you’ll need to modify the structure of the
featureClass path if you are using a shapefile (for example, you'll put .shp at the end
of the file name, and you won't have .gdb in your path).
3. Paste in the code above and modify it to fit your data (change the path).
5. Click the Run button to run the script. Make sure the Interactive Window is
visible when you do this, because this is where you’ll see the output from the
print keyword. The print keyword does not actually cause a hard copy to be
printed!
Something you may not recognize below is the expression Raster(inRaster). This
function just tells ArcGIS that it needs to treat your inRaster variable as a raster
dataset so that you can perform map algebra on it. If you didn't do this, the script
would treat inRaster as just a literal string of characters (the path) instead of a
raster dataset.
import arcpy
from arcpy.sa import *
Begin by examining this script and trying to figure out as much as you can based on
what you remember from the previous scripts you’ve seen.
Notice the lines of code that check out the Spatial Analyst extension before
doing any map algebra and check it back in after finishing. Because each line
of code takes some time to run, avoid putting unnecessary code between
checkout and checkin. This allows others in your organization to use the
back in when your script ends, thus some of the Esri code examples you will
see do not check it in. However, it is a good practice to explicitly check it in,
just in case you have some long code that needs to execute afterward, or in
case your script crashes and against your intentions "hangs onto" the
license.
inRaster begins as a string, but is then cast to, or treated as, a Raster
used for working with raster datasets in ArcGIS. It's not available in just any
Python script: you can use it only if you import the arcpy module at the top
of your script.
cutoffElevation is a number variable that you declare early in your script and
then use later on when you build the map algebra expression for your
outRaster.
outRaster. Do this by taking all the cells of the raster dataset at the path of
inRaster that are greater than the number I assigned to the variable
cutoffElevation."
outRaster is also a Raster object, but you have to call the method
takes one argument, which is the path to which you want to save.
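To get a feel for what the "greater than" map algebra expression does, it can help to imitate it on a tiny grid. The sketch below uses plain Python lists rather than arcpy, with made-up elevation values and a hypothetical cutoff of 3500; it assumes the comparison marks each cell 1 where the condition is true and 0 where it is false. The real work in the script is done by Spatial Analyst, not by loops like these.

```python
cutoffElevation = 3500          # hypothetical cutoff value

# A made-up 2 x 2 "raster" of elevation values
elevations = [[3400, 3550],
              [3600, 3200]]

outCells = []
for row in elevations:
    outRow = []
    for cell in row:
        # Compare each cell against the cutoff, one at a time
        if cell > cutoffElevation:
            outRow.append(1)
        else:
            outRow.append(0)
    outCells.append(outRow)

print(outCells)
```

Map algebra lets you express this whole cell-by-cell comparison in a single line, which is the kind of coarse-grained programming described earlier in the lesson.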
Now try to run the script yourself using the FoxLake digital elevation model (DEM)
in your Lesson 1 data folder. If it doesn’t work the first time, verify that:
Your path name contains forward slashes (/) or double backslashes (\\), not
You have the Spatial Analyst Extension installed and enabled. To check this,
Analyst is checked.
The output data does not exist yet. If you want to be able to overwrite the
You can experiment with this script using different values in the map algebra
expression (try 3000 for example).
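On the path tip in the checklist above: the reason single backslashes cause trouble is that Python treats a backslash inside ordinary quotes as the start of a special character. Here is a small sketch using a made-up folder name. The r prefix shown on the last path is standard Python for a "raw" string, in which backslashes are taken literally.

```python
# \t becomes a tab and \n becomes a new line, so this path is silently damaged
riskyPath = "C:\temp\newdem"

# Three safe ways to write the same (hypothetical) path
forwardSlashPath = "C:/temp/newdem"
doubleBackslashPath = "C:\\temp\\newdem"
rawPath = r"C:\temp\newdem"

print(len(riskyPath))                      # shorter than expected
print(len(forwardSlashPath))
print(doubleBackslashPath == rawPath)      # these two are the same string
```

The damaged path comes out two characters short because each backslash pair collapsed into a single tab or new line character.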
This third example is a little different. Instead of hard-coding the values needed for
the tool (in other words, literally including the values in the script) we’ll use some
user input variables, or parameters. This allows people to try different values in the
script without altering the code itself. Just like in ModelBuilder, parameters make
your script available to a wider audience.
The simple example below just runs the Buffer tool, but it allows the user to enter
the path of the input and output datasets as well as the distance of the buffer. The
user-supplied parameters make their way into the script with the
arcpy.GetParameterAsText() method.
Examine the script below carefully, but don't try to run it yet. You'll do that in the
next part of the lesson.
# This script runs the Buffer tool. The user supplies the input
# and output paths, and the buffer distance.
import arcpy
arcpy.env.overwriteOutput = True
try:
    # Get the input parameters for the Buffer tool
    inPath = arcpy.GetParameterAsText(0)
    outPath = arcpy.GetParameterAsText(1)
    bufferDistance = arcpy.GetParameterAsText(2)
    # Run the Buffer tool with the three user-supplied parameters
    arcpy.Buffer_analysis(inPath, outPath, bufferDistance)
    # Report a success message
    arcpy.AddMessage("All done!")
except:
    # Report an error message
    arcpy.AddError("Could not complete the buffer")
    # Report any error messages that the Buffer tool might have generated
    arcpy.AddMessage(arcpy.GetMessages())
Again, examine the above code line by line and figure out as much as you can about
what the code does. If necessary, print the code and write notes next to each line.
Here are some of the main points to understand:
ahead and make a tool out of this script, as we are going to do in the next
page of this lesson, then it’s important you define the parameters in the
When we called the Buffer tool in this script, we supplied only three
parameters. By not supplying any more, we accepted the default values for
the rest of the tool’s parameters (Side Type, End Type, etc.).
The try and except blocks of code are a way that you can prevent your script
from crashing if there is an error. Your script attempts to run all of the code
in the try block. If the script cannot continue for some reason, it jumps down
and runs the code in the except block. Inserting try/except blocks like this is
a good practice to do once you think you've gotten all the errors out of your
script, or when you want to make sure your code will run a certain line at the
When you are first writing and debugging your script, sometimes it's more
useful to leave out try/except and let the code crash, because the (red) error
clues on how to diagnose the problem in your code. Suppose you put a print
statement in your except block saying "There was an error. Please try again."
For the end user of your script, this is nicer than seeing a nasty (red) error
the (red) error message to get any insight you can about what went wrong.
Projects that you submit in this course require error handling using
additional messages to the user of the tool. Whenever you run a tool, the
geoprocessor prints messages, which you have probably seen before (for
2009”). You have the power to add more messages through these methods.
The messages have differing levels of severity, hence different methods for
When you use arcpy.GetMessages(), you get all the messages generated by
the tool itself. These will tell you things such as whether the user entered
invalid parameters. Notice in this script the somewhat complex syntax you
math operations: you start by working inside the parentheses first to get the
The AddError and AddMessage methods are only used when making script
tools (which you'll learn about in the very next section). When you are just
running a script in PythonWin (not making a script tool), you can still get
the messages using a print statement with GetMessages(), like this: print
arcpy.GetMessages().
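If you want to watch try and except at work without involving ArcGIS at all, here is a minimal sketch in plain Python. The text being converted and the status messages are invented for the example. Unlike the buffer script, this sketch names the specific error it expects (ValueError), which is a common refinement.

```python
status = "Not started"

try:
    # float() cannot turn this text into a number, so it raises a ValueError...
    distance = float("five miles")
    # ...and this line is skipped
    status = "Converted the distance"
except ValueError:
    # Python jumps here instead of crashing
    status = "Could not convert the distance"

print(status)
```

Try changing the text to "5" and running it again; the except block is then skipped entirely.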
Before you begin this exercise, I strongly recommend that you read the first four
topics in the ArcGIS Desktop Help section Creating script tools with Python scripts
[13]. You likely will not understand all the parts of this section yet, but it will give
you some familiarity with script tools that will be helpful during the exercise.
1. Copy the code from Lesson 1.6.4 "Example: Creating Buffers" into a new
7. Fill in the Name, Label, and Description properties for your Script tool
as shown below:
Figure 1.10 Entering information for your script tool.
8. Click Next and supply the Script File. To do this, click the folder icon and
9. Click Next and examine the dialog that appears. This is where you can
specify the parameters of your script. The parameters are the values for
outPath, and bufferDistance. You will use this dialog to list those parameters in
the same order, except you can give the parameters names that are easier to
understand.
10. In the Display Name column that you see at the top of this wizard, click
11. Immediately to the right, click the first empty cell in the Data Type column
and choose Feature Class. Here is one of the huge advantages of making a
script tool. Instead of accepting any string as input (which could contain an
error), your tool will now enforce the requirement that a feature class be
used as input. ArcGIS will help you by confirming that the value entered is a
path to a valid feature class. It will even supply the users of your tool with a
12. Just as you did in the previous steps, add a second parameter named
“Output Feature Class”. The data type should again be Feature Class.
13. With the Output Feature Class parameter still highlighted, look down at
property to Output.
14. Add a third parameter named “Buffer Distance”. Choose Linear Unit as the
data type. This data type will allow the user of the tool to select both the
distance value and the units (for example, miles, kilometers, etc.).
15. With the Buffer Distance parameter still highlighted, look down at the
Miles” (do not include the quotes). Your dialog should look like what you see
below:
16. Click Finish and, in the Catalog window, open your new script tool by
double-clicking it.
17. Try out your tool by buffering any feature class on your computer. Notice
that once you supply the input feature class, an output feature class path is
suggested for you. This is because you specifically set Output Feature Class
Results window for the custom message "All done!" that you added in your
code.
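One detail worth knowing about the Buffer Distance parameter: because GetParameterAsText() hands everything to the script as text, the distance and its units arrive together as one string. The sketch below assumes that string looks like "5 Miles" (a number, a space, then the unit name) and uses a hard-coded stand-in instead of a real tool parameter. The names distanceValue and distanceUnits are made up.

```python
# Stand-in for what arcpy.GetParameterAsText(2) might return
bufferDistance = "5 Miles"

# Split the text at the space to separate the number from the units
parts = bufferDistance.split(" ")
distanceValue = float(parts[0])
distanceUnits = parts[1]

print(distanceValue)
print(distanceUnits)
```

The Buffer tool accepts the combined string as it is, so the script in this lesson has no need to split it; the sketch just shows what is inside the variable.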
This is a very simple example and obviously you could just run the out-of-the-box
Buffer tool with similar results. Normally when you create a script tool, it will be
backed with a script that runs a combination of tools and applies some logic that
makes those tools uniquely useful.
There’s another benefit to this example, though. Notice the simplicity of our script
tool dialog compared to the main Buffer tool:
Figure 1.14 Comparison of our script tool with the main buffer tool.
At some point you may need to design a set of tools for beginning GIS users where
only the most necessary parameters are exposed. You may also do this to enforce
quality control if you know that some of the parameters must always be set to
certain defaults and you want to avoid the scenario where a beginning user (or a
rogue user) might change the required values. A simple script tool is effective for
simplifying the tool dialog in this way.
Readings
Read Zandbergen 2.10 - 2.13 to reinforce what you learned during this lesson about
scripts and script tools.
1. Say hello
Create a string variable called x and assign it the value "Hello". Display the
Create a string variable called first and assign to it your first name. Likewise,
create a string variable called last and assign to it your last name.
Concatenate (merge) the two strings together, making sure to also include a
This method is typically used in conjunction with an ArcGIS script tool that
has been designed to prompt the user to enter the required parameters.
However, you may have noticed that the little dialog that appears after
arguments.
For this exercise, write a script that accepts a single string value using the
the Interactive Window. Test the script from within PythonWin, entering a
name (in quotes) in the Arguments text box after clicking the Run button.
Example 1.6.2 demonstrates the use of the Describe method to report the
that has a number of properties that can vary depending on what type of
virtue of the fact that it is a type of Dataset. (Rasters are another type of
The Describe method's page in the Help [17] lists the types of objects that
the method can be used on. Clicking the Dataset link pulls up a list of the
For this exercise, use the Describe method again; this time, to determine the
won't tell you the name of the property that returns this information. But I
will give you the hint that feature classes have this mystery property not
You need to do several tasks in order to get this data ready for mapping:
dataset with estimated precipitation values for your entire area of interest.
You've already planned for this, knowing that you are going to use inverse
distance weighted (IDW) interpolation. Click the following link to learn how
the IDW technique works. [20] You've also selected your points to include
precipitation "zones" that delineate relatively dry, medium, and wet regions.
It's very possible that you'll want to repeat the above process in order to test
different IDW interpolation parameters or make similar maps with other datasets
(such as next year's precipitation data). Therefore, the above series of tasks is well-
suited to ModelBuilder. Your job is to create a model that can complete the
above series of steps without you having to manually open four
different tools.
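If you're curious what IDW is doing behind the scenes, the core idea fits in a few lines: the estimate at a location is a weighted average of nearby readings, where each weight is 1 divided by the distance raised to a power. The function below is a simplified sketch, not the ArcGIS implementation. The function name, the power of 2, and the sample readings are assumptions for illustration, and it ignores the search radius entirely.

```python
import math


def idwEstimate(x, y, points, power=2):
    """Estimate a value at (x, y) from a list of (px, py, value) readings."""
    numerator = 0.0
    denominator = 0.0
    for (px, py, value) in points:
        distance = math.sqrt((px - x) ** 2 + (py - y) ** 2)
        if distance == 0:
            # The location sits exactly on a reading, so use that reading
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two made-up precipitation readings, 2 units apart
readings = [(0, 0, 10.0), (2, 0, 20.0)]

halfway = idwEstimate(1, 0, readings)      # equally far from both readings
nearFirst = idwEstimate(0.5, 0, readings)  # closer to the first reading

print(halfway)
print(nearFirst)
```

The halfway point gets a plain average of the two readings, while the point nearer the first reading is pulled strongly toward its value of 10.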
Model parameters
readings point data. This is a model parameter so that the model can be
points are included in the interpolation of a point. The search radius can be
fall within, or its distance can vary in order for it to always include a
minimum number of points. When you use ModelBuilder, you don't have to
set up any of these choices; ModelBuilder does it for you when you set the
4. Zone boundaries- This is a table allowing the user of the model to specify
the zone boundaries. For example, you could configure precipitation values
2), and so on. The way to get this table is to make a variable from the
parameter.
5. Output precipitation zones- This is the location where you want the
As you build your model, you will need to configure some settings that will not be
exposed as parameters. These include the clip feature, which is the state of
Nebraska outline Nebraska.shp in your Lesson 1 data folder. There are many
other settings such as "Z Value field" and "Input barrier polyline features" (for
IDW) or "Reclass field" (for Reclassify) that should not be exposed as parameters.
You should just set these values once when you build your model. If you ever ask
someone else to run this model, you don't want them to be overwhelmed with
choices stemming from every tool in the model; you should just expose the
essential things they might want to change.
For this particular model, you should assume that any input dataset will conform to
the same schema as your Precip2008Readings.shp feature class. For example, an
analyst should be able to submit a similar Precip2009Readings dataset with the
same fields, field names, and data types. However, he or she should not expect to
provide any feature class with a different set of fields and field names, etc. As you
might discover, handling all types of feature class schemas would make your model
more complex than we want for this assignment.
When you double-click the model to run it, the interface should look like the
following:
Deliverables
The .tbx file of the toolbox containing your model. The easiest way to find it
note the Location. If you can't browse to this path in Windows Explorer,
you'll need to enable the Windows option to show hidden files and folders.
A screen capture of the model interface before you run the model (it should
look a lot like the above image, although you can set your own
A screen capture of your model result in ArcMap, with zones symbolized
in different colors. You don't have to use the Layout view for this project.
Tips
The following tips may help you as you build your model:
Your model needs to include the following tools in this order: IDW (from
An easy way to find the tools you need in ArcMap is to click Windows >
Search and type the name of the tool you want in the search box. Be careful
when multiple tools have the same name. You'll typically be using tools from
Once you drag and drop a tool onto the ModelBuilder canvas, double-click it
and set all the parameters the way you want. These will be the default
If there is a certain parameter for a tool that you want to expose as a model
parameter, right-click the tool in the ModelBuilder canvas, then click Make
Variable > From Parameter and choose the parameter. Once the oval
If you receive errors that a tool is not able to run, or that no Spatial Analyst
click Customize > Extensions and then check the Spatial Analyst
checkbox.
Project 1, Part II: Creating contours for the Fox Lake DEM
The second part of Project 1 will help you get some practice with Python. At the end
of Lesson 1, you saw three simple scripting examples; now your task is to write your
own script. This script will create vector contour lines from a raster elevation
dataset. Don't forget that the ArcGIS Desktop Help [21] can indeed be helpful if you
need to figure out the syntax for a particular command.
Earlier in the lesson you were introduced to the Fox Lake DEM in your Lesson 1
data folder. It represents elevation in the Fox Lake Quadrangle, Utah. Write a
script that uses the Contour tool in the Spatial Analyst toolbox to create contour
lines for the quadrangle. The contour interval should be 25 meters and the base
contour should be 0. Remember that the native units of the DEM are meters, so no
unit conversions are required.
Running the script should immediately create a shapefile of contour lines on disk.
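Before writing the script, it may help to picture what a contour interval of 25 and a base contour of 0 mean. The plain-Python sketch below lists the elevations at which lines would be drawn for a made-up elevation range; the minimum and maximum values are hypothetical and are not taken from the Fox Lake DEM. It does not call the Contour tool, so it is not a substitute for your project script.

```python
contourInterval = 25
baseContour = 0

# Hypothetical elevation range, in meters
minElevation = 1302.5
maxElevation = 1398.0

# Start at the base contour and step upward by the interval,
# keeping only the levels that fall inside the elevation range
levels = []
level = baseContour
while level <= maxElevation:
    if level >= minElevation:
        levels.append(level)
    level = level + contourInterval

print(levels)
```

Changing the base contour shifts every line by the same amount, while changing the interval changes how many lines you get.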
The purpose of this exercise is just to get you some practice writing Python
get the input parameters. Go ahead and hard-code the values (such as the
Consequently, you are not required to create a script tool for this exercise.
Your code should run correctly from PythonWin. For full credit, it should
also contain comments, attempt to handle errors, and use legal and intuitive
variable names.
Deliverables
project and how you approached the problem. These writeups will be
Finishing Lesson 1
To complete Lesson 1, please zip all your Project 1 deliverables (for parts I and II)
into one file and submit it to the Lesson 1 Drop Box in ANGEL. Then take the
Lesson 1 Quiz if you haven't taken it already.
Penn State Professional Masters Degree in GIS: Winner of the 2009 Sloan
Consortium award for Most Outstanding Online Program
Please address questions and comments about this resource to the site editor.
Links:
[1] http://training.esri.com/acb2000/showdetl.cfm?DID=6&Product_ID=971
[2] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson1.zip
[3] http://webhelp.esri.com
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/A_quick_tour_of_managing_intermediate_data/002w0000000z000000/
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0021/002100000037000000.htm
[6] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Setting_Current_and_Scratch_Workspace_environments/002w00000037000000/
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00570000000q000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002w/002w0000001w000000.htm
[9] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//000800000019000000.htm
[10] http://www.python.org
[11] http://webhelp.esri.com/arcgisdesktop/9.3/pdf/Geoprocessor_93.pdf
[12] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00p6/00p60000000r000000.htm
[13] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0015/001500000006000000.htm
[14] https://www.e-education.psu.edu/geog485/../L01_Prac1.html
[15] https://www.e-education.psu.edu/geog485/../L01_Prac2.html
[16] https://www.e-education.psu.edu/geog485/../L01_Prac3.html
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000026000000.htm
[18] https://www.e-education.psu.edu/geog485/../L01_Prac4.html
[19] http://www.prismclimate.org
[20] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_IDW_works/009z00000075000000/
[21] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html
Lesson 2: Python and programming basics
In Lesson 1 you received an introduction to Python. Lesson 2 builds on that
experience, diving into Python fundamentals. Many of the things you'll learn are
common to programming in other languages. If you already have coding
experience, this lesson may contain some review.
This lesson has a relatively large amount of reading from the course materials, the
Zandbergen text, and the ArcGIS help. I believe you will get a better understanding
of the Python concepts as they are explained and demonstrated from several
different perspectives. Whenever the examples use the Interactive Window, I
strongly suggest that you type in the code yourself as you follow the examples. This
can take some time, but you'll be amazed at how much more information you retain
if you try the examples yourself instead of just reading them.
At the end of the lesson you'll be required to write a Python script that puts
together many of the things you've learned. This will go much faster if you've taken
the time to read all the required text and work through the examples.
Lesson 2 checklist
Lesson 2 covers Python fundamentals (many of which are common to other
programming languages) and gives you a chance to practice these in a project. To
complete Lesson 2, you are required to do the following:
C:\WCGIS\Geog485\Lesson2.
3. Read Zandbergen chapters 4 - 6, 11.1 - 11.5, and 11.11. In the online lesson
each of these chapters. There is more reading this lesson than in a typical
week. If you are new to Python, please plan some extra time to read these
chapters. There are also some readings this week from the ArcGIS Help.
4. Complete Project 2 and upload its deliverables to the Lesson 2 drop box. The
In Lesson 1, you saw your first Python scripts and were introduced to the basics,
such as importing modules, using arcpy, working with properties and methods, and
indenting your code in try/except blocks. In the following sections, you'll learn about
more Python programming fundamentals such as working with lists, looping,
if/then decision structures, manipulating strings, and casting variables.
Although this might not be the most thrilling section of the course, it's probably the
most important section for you to spend time understanding and experimenting
with on your own, especially if you are new to programming.
Learning a programming language is the same way. When faced with a problem,
you'll be forced to draw on your fundamental skills to come up with a workable
plan. You may need to include a loop in your program, store items in a list, or make
the program do one of four different things based on certain user input. If you
know how to do each of these things individually, you'll be able to fit the pieces
together, even if the required task seems daunting.
Take time to make sure you understand what's happening in each line of the code
examples, and if you run into a question, please jot it down and post to the forums.
2.1.1 Lists
In Lesson 1 you learned about some common data types in Python, such as strings
and integers. Sometimes you need a type that can store multiple related values
together. Python offers several ways of doing this, and the first one we'll learn
about is the list.
Here's a simple example of a list. You can type this in the PythonWin Interactive
Window to follow along:
>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
This list named 'suits' stores four related string values representing the suits in a
deck of cards. In many programming languages, storing a group of objects in
sequence like this is done with arrays. While the Python list could be thought of as
an array, it's a little more flexible than the typical array in other programming
languages. This is because you're allowed to put multiple data types into one list.
For example, suppose we wanted to make a list for the card values you could draw.
The list might look like this:
>>> values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']
Notice that you just mixed string and integer values in the list. Python doesn't care.
However, each item in the list still has an index, meaning an integer that denotes
each item's place in the list. The list starts with index 0 and for each item in the list,
the index increments by one. Try this:
>>> print suits[0]
Spades
>>> print values[12]
King
In the above lines, you just requested the item with index 0 in the suits list and got
'Spades'. Similarly, you requested the item with index 12 in the values list and got
'King'.
It may take some practice initially to remember that your lists start with a 0 index.
Testing your scripts can help you avoid off-by-one errors that might result from
forgetting that lists are zero-indexed. For example, you might set up a script to
draw 100 random cards and print the values. If none of them is an Ace, you've
probably stacked the deck against yourself by making the indices begin at 1.
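The off-by-one pitfall described above can be sketched as follows. This is only an illustration, assuming a values list with 'Ace' at index 0 and 'King' at index 12; the counter names are made up for this example, and print is written with parentheses so the code runs under both Python 2 and Python 3.

```python
import random

values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

acesCorrect = 0
acesOffByOne = 0

for draw in range(1000):
    # Correct: the valid indices are 0 through 12
    if values[random.randrange(0, 13)] == 'Ace':
        acesCorrect += 1
    # Off by one: starting at index 1 can never reach 'Ace' at index 0
    if values[random.randrange(1, 13)] == 'Ace':
        acesOffByOne += 1

print("Aces drawn with indices 0-12: " + str(acesCorrect))
print("Aces drawn with indices 1-12: " + str(acesOffByOne))
```

The second count is always zero, which is the kind of clue that points to an indexing mistake.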
Remember you learned that everything is an object in Python? That applies to lists
too. In fact, lists have a lot of useful methods that you can use to change the order
of the items, insert items, sort the list, and so on. Try this:
>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
>>> suits.sort()
>>> print suits
['Clubs', 'Diamonds', 'Hearts', 'Spades']
Notice that the items in the list are now in alphabetical order. You may have also
noticed that when you typed "suits." you got some help to see what methods were
available for the list. This is called autocompletion, and it can be a great help when
you're writing code because it shows you what you can and cannot do with an
object. It can also be a way of avoiding typing errors because you can arrow down
and press Tab to insert the method you want. Autocompletion is a feature of
PythonWin, but this type of help can be found in other integrated development
environments (IDEs) like Microsoft Visual Studio. Microsoft has branded its version
of autocompletion "IntelliSense," and this name has been catchy enough that you
may hear people using it conversationally even when using non-Microsoft IDEs.
The sort() method you used above allowed you to do something in one line of code
that would have otherwise taken many lines. Another helpful method like this is
reverse(), which reverses the current order of the items. Because the list is already
sorted, reversing it puts it in reverse alphabetical order:
>>> suits.reverse()
>>> print suits
['Spades', 'Hearts', 'Diamonds', 'Clubs']
Before you attempt to write list-manipulation code, check your textbook or the
Python list reference documentation [2] to see if there's an existing method that
might simplify your work.
What happens when you want to combine two lists? Type this in the Interactive
Window:
Notice that you did not get [205,207,209]; rather, Python treats the addition as
appending listTwo to listOne. Next, try these other ways of adding items to the list:
If you need to insert some items in the middle of the list, you can use the insert()
method:
Notice that the insert() method above took two parameters. You might have even
noticed a tooltip that shows you what the parameters mean.
The first parameter is the index position that the new item will take. This method
call inserts 999 between 104 and 105. Now 999 is at index 4.
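The list operations discussed above can be sketched in one short script. This is a minimal illustration, assuming listOne and listTwo hold 101-103 and 104-106 (values inferred from the sums and index positions quoted in the text); listThree is a name chosen for this example.

```python
listOne = [101, 102, 103]
listTwo = [104, 105, 106]

# Adding two lists joins them end to end; it does not add item by item
listThree = listOne + listTwo
print(listThree)

# += extends the list in place with the items of another list
listThree += [107]

# append() adds a single item to the end of the list
listThree.append(108)

# insert(index, item) places 999 at index 4, between 104 and 105
listThree.insert(4, 999)
print(listThree)
```

Note that + builds a new list and leaves listOne and listTwo unchanged, while +=, append(), and insert() all modify listThree itself.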
Sometimes you'll need to find out how many items are in a list, particularly when
looping. Here's how you can get the length of a list:
Notice that len() gives you the exact number of items in the list. To get the index of
the final item, you would need to use len(myList) - 1. Again, this distinction can
lead to off-by-one errors if you're not careful.
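As a small sketch of the difference between a list's length and its final index (the list contents here are arbitrary sample numbers, not taken from the lesson):

```python
myList = [4, 9, 12, 3, 56, 133, 27, 3]

# len() returns the number of items in the list
print(len(myList))

# The last valid index is always one less than the length
lastIndex = len(myList) - 1
print(myList[lastIndex])

# A negative index counts from the end, so -1 also refers to the last item
print(myList[-1])
```

Asking for myList[len(myList)] would raise an IndexError, because that index is one past the end of the list.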
Lists are not the only way to store collections of items in Python; you can
also use tuples and dictionaries. Tuples are like lists, but you can't change the
objects inside a tuple over time. In some cases a tuple might actually be a better
structure for storing values like the suits in a deck of cards, because this is a fixed
list that you wouldn't want your program to change by accident.
Dictionaries differ from lists in that items are not indexed; instead, each item is
stored with a key value which can be used to retrieve the item. We'll use
dictionaries later in the course, and your reading assignment for this lesson covers
dictionary basics. The best way to understand how dictionaries work is to play with
some of the textbook examples in the Interactive Window (see Zandbergen 6.8).
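To make the contrast concrete, here is a brief sketch of a tuple and a dictionary. The dictionary name and the point values in it are invented for illustration only.

```python
# A tuple is written with parentheses and cannot be modified once created
suits = ('Spades', 'Clubs', 'Diamonds', 'Hearts')
print(suits[0])

try:
    suits[0] = 'Stars'
except TypeError:
    print("Tuples cannot be changed")

# A dictionary stores each item under a key instead of a numeric index
faceCardPoints = {'Jack': 11, 'Queen': 12, 'King': 13}
print(faceCardPoints['Queen'])

# Assigning to a new key adds an item to the dictionary
faceCardPoints['Ace'] = 1
print(len(faceCardPoints))
```

Items in a tuple are still read by index, just as in a list; only the attempt to change one fails.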
2.1.2 Loops
A loop is a section of code that repeats an action. Remember, the power of scripting
(and computing in general) is the ability to quickly repeat a task that might be
time-consuming or error-prone for a human. Looping is how you repeat tasks with
code, whether it's reading a file, searching for a value, or performing the same
action on each item in a list.
for Loop
A for loop does something with each item in a list. Type this in the PythonWin
Interactive Window to see how a simple for loop works:
After typing this, you'll have to hit Enter twice in a row to tell PythonWin that you
are done working on the loop and that the loop should be executed. You should see:
Notice a couple of important things about the loop above. First, you declared a new
variable, "name," to represent each item in the list as you iterated through. This is
okay to do; in fact, it's expected that you'll do this at the beginning of the for loop.
The second thing to notice is that after the condition, or the first line of the loop,
you typed a colon (:), then started indenting subsequent lines. Some programming
languages require you to type some kind of special line or character at the end of
the loop (for example, "Next" in Visual Basic, or "}" in JavaScript), but Python just
looks for the place where you stop indenting. By pressing Enter twice, you told
Python to stop indenting and that you were ready to run the loop.
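The role of indentation in ending a loop can be sketched with a short script. The list contents are hypothetical, and print is written with parentheses so the code runs under both Python 2 and Python 3.

```python
names = ['Amy', 'Ben', 'Carla']   # hypothetical values

for name in names:
    # Both indented lines belong to the loop and run once per item
    greeting = "Hello, " + name
    print(greeting)

# This line is not indented, so it runs only once, after the loop finishes
print("Done")
```

In a script file, the first unindented line marks the end of the loop; pressing Enter twice is only needed in the Interactive Window.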
for Loops can also work with lists of numbers. Try this one in the Interactive
Window:
>>> x = 2
>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
...     print x * num
...
2
4
6
8
In the loop above, you multiplied each item in the list by 2. Notice that you can set
up your list before you start coding the loop.
You could have also done the following with the same result:
>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
...     x = 2
...     print x * num
The above code, however, is less efficient than what we did initially. Can you see
why? This time you are declaring and setting the variable x = 2 inside the loop. The
Python interpreter will now have to read and execute that line of code four times
instead of one. You might think this is a trivial amount of work, but if your list
contained thousands or millions of items the difference in execution time would
become noticeable. Declaring and setting variables outside a loop, whenever
possible, is a best practice in programming.
While we're on the subject, what would you do if you wanted to multiply 2 by every
number from 1 to 1000? It would definitely be too much typing to manually set up
a multipliers list as in the previous example. In this case, you can use Python's
built-in range function. Try this:
>>> x = 2
>>> for num in range(1,1001):
...     print x * num
The range function is your way of telling Python, "Start here and stop there." We
used 1001 because the loop stops one item before the function's second argument
(the arguments are the values you put in parentheses to tell the function how to
run). If you need the function to multiply by 0 at the beginning as well, you could
even get away with using one argument:
>>> x = 2
>>> for num in range(1001):
...     print x * num
The range function has many interesting uses which are detailed in this section's
reading assignment in Zandbergen.
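As a brief sketch of a few of those uses: range also accepts an optional third argument that sets the step. The variable names below are made up for this example, and list() is wrapped around range so the results print the same way under Python 2 and Python 3.

```python
# range(start, stop, step): the third argument sets the increment
evens = list(range(2, 11, 2))
print(evens)

# A negative step counts down; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)

# range(len(someList)) produces every valid index of a list
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
for i in range(len(suits)):
    print(str(i) + ": " + suits[i])
```

The first two lists come out as [2, 4, 6, 8, 10] and [5, 4, 3, 2, 1].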
while Loops
A while loop keeps executing as long as some condition remains true. Here's how to
code our example above using a while loop:
>>> x = 0
>>> while x < 1001:
...     print x * 2
...     x += 1
while loops often involve the use of some counter that keeps track of how many
times the loop has run. Sometimes you'll perform operations with the counter. For
example, in the above loop, x was the counter, and we also multiplied the counter
by 2 each time during the loop. To increment the counter we used x += 1 which is
shorthand for x = x + 1, or "add one to x".
Nesting loops
Some situations call for putting one loop inside another, a practice called nesting.
Nested loops could help you print every card in a deck (minus the Jokers):
In the above example you start with a suit, then loop through each value in the suit,
printing out the card name. When you've reached the end of the list of values, you
jump out of the nested loop and go back to the first loop to get the next suit. Then
you loop through all values in the second suit and print the card names. This
process continues until all the suits and values have been looped through.
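A nested loop along these lines might look like the following sketch. It assumes the suits and values lists from the Lists section; the deck list is an addition for this example so the cards can be counted afterward.

```python
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

deck = []
for suit in suits:
    # The inner loop runs all the way through once for each suit
    for value in values:
        # str() lets the integer values be joined to the other strings
        card = str(value) + " of " + suit
        print(card)
        deck.append(card)

print(len(deck))
```

Four suits times thirteen values gives 52 cards, starting with "Ace of Spades" and ending with "King of Hearts".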
You will use looping repeatedly (makes sense!) as you write GIS scripts in Python.
Often you'll need to iterate through every row in a table, every field in a table, or
every feature class in a folder or a geodatabase. You might even need to loop
through the vertices of a geographic feature.
You saw above that loops work particularly well with lists. arcpy has some methods
that can help you create lists. Here's an example you can try that uses
arcpy.ListFeatureClasses(). First, manually create a new folder
C:\WCGIS\Geog485\Lesson2\PracticeData. Then copy the code below into a new
script in PythonWin and run the script. The script copies all the data in your
Lesson1 folder into the new Lesson2\PracticeData folder you just created.
# Copies all feature classes from one folder to another
import arcpy

try:
    arcpy.env.workspace = "C:/WCGIS/Geog485/Lesson1"

    # List the feature classes in the Lesson 1 folder
    fcList = arcpy.ListFeatureClasses()

    # Loop through the list and copy the feature classes to the Lesson 2 PracticeData folder
    for featureClass in fcList:
        arcpy.CopyFeatures_management(featureClass, "C:/WCGIS/Geog485/Lesson2/PracticeData/" + featureClass)

except:
    print "Script failed to complete"
    print arcpy.GetMessages(2)
Notice above that once you have a Python list of feature classes (fcList), it's very
easy to set up the loop condition (for featureClass in fcList:).
Another common operation in GIS scripts is looping through tables. In fact, the
arcpy module contains some special objects called cursors that help you do this.
Here's a short script showing how a cursor can loop through each row in a feature
class and print the name. We'll cover cursors in detail in the next lesson, so don't
worry if some of this code looks confusing right now. The important thing is to
notice how a loop is used to iterate through each record:
import arcpy
inTable = "C:/WCGIS/Geog485/Lesson2/CityBoundaries.shp"
inField = "NAME"
rows = arcpy.SearchCursor(inTable)
# Print the name of each record
for row in rows:
    print row.getValue(inField)
In the above example, a search cursor named rows retrieves records from the table.
The for loop makes it possible to perform an action on each individual record.
2.1.3 Decision structures
Many scripts need to run certain code only when a condition is true. In Python, you
do this with an if statement. Try this in the Interactive Window:
>>> x = 3
>>> if x > 2:
...     print "Greater than two"
...
Greater than two
In the above example, the keyword "if" denotes that some conditional test is about
to follow. In this case, the condition of x being greater than two was met, so the
script printed "Greater than two." Notice that you are required to put a colon (:)
after the condition and indent any code executing because of the condition. For
consistency in this class, all indentation is done using four spaces.
Using "else" is a way to run code if the condition isn't met. Try this:
>>> x = 1
>>> if x > 2:
...     print "Greater than two"
... else:
...     print "Less than or equal to two"
...
Less than or equal to two
Notice that you don't have to put any condition after "else." It's a way of catching all
other cases. Again the conditional code is indented four spaces, which makes the
code very easy for a human to scan. The indentation is required because Python
doesn't require any type of "end if" statement (like many other languages) to
denote the end of the code you want to execute.
If you want to run through multiple conditions, you can use "elif", which is Python's
abbreviation for "else if":
>>> x = 2
>>> if x > 2:
...     print "Greater than two"
... elif x == 2:
...     print "Equal to two"
... else:
...     print "Less than two"
...
Equal to two
In the code above, "elif x == 2:" tests whether x is equal to two. The == is a way to
test whether two values are equal. Using a single = in this case would result in an
error because = is used to assign values to variables. In the code above, you're not
trying to assign x the value of 2; you want to check if x is already equal to 2, hence
you use ==.
You can also use if, elif, and else to handle multiple possibilities in a set. The code
below picks a random school from a list (notice we had to import the random
module to do this and call a special method random.randrange()). After the school
is selected and its name is printed, a series of if/elif/else statements appears that
handles each possibility. Notice that the else statement is left in as an error
handler; you should not run into that line if your code works properly, but you can
leave the line in there to fail gracefully if something goes wrong.
import random
Some other programming languages have special keywords for doing the above,
such as switch or select case. In Python, however, it's usually just done with a long
list of "if"s and "elif"s.
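A sketch of this pattern follows. The list of schools and the printed messages are placeholders chosen for illustration; only the use of random.randrange() and the if/elif/else structure come from the description above.

```python
import random

# Hypothetical list; any set of names would work
schools = ["Penn State", "Michigan", "Ohio State", "Indiana"]

# randrange(0, 4) returns 0, 1, 2, or 3
randomSchoolIndex = random.randrange(0, 4)
chosenSchool = schools[randomSchoolIndex]
print(chosenSchool)

# Handle each possibility in turn
if chosenSchool == "Penn State":
    mascot = "Nittany Lion"
elif chosenSchool == "Michigan":
    mascot = "Wolverine"
elif chosenSchool == "Ohio State":
    mascot = "Buckeye"
elif chosenSchool == "Indiana":
    mascot = "Hoosier"
else:
    # Should never run; kept so the script fails gracefully
    mascot = "unknown"
    print("This program has an error")

print("Mascot: " + mascot)
```

If a school were later added to the list without a matching elif, the else branch would catch it instead of letting the script crash on an undefined variable.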
2.1.4 String manipulation
Python has some very useful string manipulation abilities. We won't get into all of
them in this course, but following are a few techniques that you need to know.
Concatenating strings
To concatenate two strings means to append or add one string on to the end of
another. For example, you could concatenate the strings "Python is " and "a
scripting language" to make the complete sentence "Python is a scripting
language." Since you are adding one string to another, it's intuitive that in Python
you can use the + sign to concatenate strings.
You may need to concatenate strings when working with path names. Sometimes
it's helpful or required to store one string representing the folder or geodatabase
from which you're pulling datasets and a second string representing the dataset
itself. You put both together to make a full path.
The following example, modified from one in the ArcGIS Help, demonstrates this
concept. Suppose you already have a list of strings representing feature classes that
you want to clip. The list is represented by "featureClasses" in this script:
inFolder = "c:\\data\\inputShapefiles\\"
resultsFolder = "c:\\data\\results\\"
clipFeature = "c:\\data\\states\\Nebraska.shp"
The above example shows that string concatenation can be useful in looping.
Constructing the output path by using a set workspace or folder name followed by a
feature class name from a list gives much more flexibility than trying to create
output path strings for each dataset individually. You may not know how many
feature classes are in the list or what their names are. You can get around that if
you construct the output paths on the fly through string concatenation.
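The path-building idea can be tried without ArcGIS using the sketch below. The shapefile names are hypothetical stand-ins for the featureClasses list, and the loop only prints each pair of paths where a real script would pass them to a geoprocessing tool such as Clip.

```python
# Hypothetical shapefile names standing in for the featureClasses list
featureClasses = ["roads.shp", "streams.shp", "cities.shp"]

inFolder = "c:\\data\\inputShapefiles\\"
resultsFolder = "c:\\data\\results\\"

outputPaths = []
for featureClass in featureClasses:
    # Build the full input and output paths on the fly
    inputPath = inFolder + featureClass
    outputPath = resultsFolder + featureClass
    outputPaths.append(outputPath)
    print(inputPath + " -> " + outputPath)
```

Because each folder string already ends with a doubled backslash, the file name can be appended directly. The same loop works unchanged whether the list holds three names or three hundred.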
Casting to a string
Sometimes in programming you have a variable of one type that needs to be treated
as another type. For example, 5 can be represented as a number or as a string.
Python can only perform math on 5 if it is treated as a number, and it can only
concatenate 5 onto an existing string if it is treated as a string.
x = 0
while x < 10:
    print x
    x += 1
print "You ran the loop " + x + " times."
Now try to run it. The script attempts to concatenate strings with the variable x to
print how many times you ran a loop, but it results in an error: "TypeError: cannot
concatenate 'str' and 'int' objects." Python doesn't have a problem when you want
to print the variable x on its own, but Python cannot join a string and an integer
with the + sign. To get the code to work, you have to cast the variable x to a string
when you concatenate it.
x = 0
while x < 10:
    print x
    x += 1
print "You ran the loop " + str(x) + " times."
You can force Python to think of x as a string by using str(x). Python has other
casting functions such as int() and float() that you can use if you need to go from a
string to a number. Use int() for integers and float() for decimals.
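The three casting functions can be compared side by side in a short sketch. The variable names and sample values are invented for this example.

```python
# str() turns a number into text so it can be concatenated
count = 10
message = "You ran the loop " + str(count) + " times."
print(message)

# int() and float() go the other way, from text to numbers
cherriesText = "7"
weightText = "2.5"
cherries = int(cherriesText) + 1
weight = float(weightText) * 2
print(cherries)
print(weight)
```

Without the casts, "7" + 1 would fail with a TypeError, just as the loop example did.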
Readings
It's time to take a break and do some readings from another source. If you are new
to Python scripting this will help you see the concepts from a second angle.
Read Zandbergen chapters 4 - 6. This can take a few hours but it will save you
hours of time if you make sure you understand this material now.
Chapter 4 covers the basics of Python syntax, loops, strings and other things
we just learned.
Chapter 5 talks about working with arcpy and ArcGIS tools, which you
If you still don't feel like you understand the material after reading the above
chapters, don't re-read it just yet. Try some coding from the Lesson 2 practice
exercises and assignments, then come back and re-read if necessary. If you are
really struggling with a particular concept, type the examples in the interactive
window. Programming is like a sport in the sense that you cannot learn all about it
by reading; at some point you have to get up and do it.
2.1.5 Putting it all together
In this section of the lesson you've learned the basic programming concepts of lists,
loops, decision structures, and string manipulation. You might be surprised at what
you can do with just these skills. In this section, we'll practice putting them all
together to address a scenario. This will give us an opportunity to talk about
strategies for approaching programming problems in general.
Remove 1 cherry
Remove 2 cherries
Remove 3 cherries
Remove 4 cherries
You continue taking turns until you have 0 cherries left on your tree, at which point
you have won the game. Your objective here is to write a script that simulates the
game, printing the following:
The number of cherries on your tree after each turn. This must always be
Although this example may seem juvenile, it's an excellent way to practice
everything you just learned. As a beginner, you may feel overwhelmed by the
above problem. A common question is, "Where do I start?" The best approach is to
break down the problem into smaller chunks of things you know how to do.
One of the most important programming skills you can acquire is the ability to
verbalize a problem and translate it into a series of small programming steps.
Here's a list of things you would need to do in this script. Programmers call this
pseudocode because it's not written in code, but it follows the sequence their code
will need to take.
6. Take another turn or print the number of turns it took to win the game
It also helps to list the variables you'll need to keep track of:
Let's try to address each of the pseudocode steps. Don't worry about the full flow of
the script yet. Rather, try to understand how each step of the problem should be
solved with code. Assembling the blocks of code at the end is relatively trivial.
How do you simulate a random spin? In one of our previous examples, we used the
random module to generate a random number within a range of integers; however,
the choices on this spinner are not linear. A good approach here is to store all spin
possibilities in a list and use the random number generator to pick the index for
one of the possibilities. On its own, the code would look like this:
import random
spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
spinIndex = random.randrange(0, 7)
spinResult = spinnerChoices[spinIndex]
The list spinnerChoices holds all possible mathematical results of a spin (remove 1
cherry, remove 2 cherries, etc.). The final value 10 represents the spilled bucket
(putting all cherries back on the tree).
You need to pick one random value out of this list to simulate a spin. The variable
spinIndex represents a random integer from 0 to 6 that is the index of the item
you'll pull out of the list. For example, if spinIndex turns out to be 2, your spin is -3
(remove 3 cherries from the tree). The spin is held in the variable spinResult.
Once you have a spin result, it only takes one line of code to print it. You'll have to
use the str() function to cast it to a string, though.
As mentioned above, you need to have some variable to keep track of the number of
cherries on your tree. This is one of those variables that it helps to name intuitively:
cherriesOnTree = 10
After you complete a spin, you need to modify this variable based on the result.
Remember that the result is held in the variable spinResult and that a negative
spinResult removes cherries from your tree. So your code to modify the number of
cherries on the tree would look like:
cherriesOnTree += spinResult
If you win the game you have 0 cherries. You don't have to reach 0 exactly, but it
doesn't make sense to say that you have negative cherries. Similarly, you might spin
the spilled bucket, which for simplicity we represented with positive 10 in the
spinnerChoices. You are not allowed to have more than 10 cherries on the tree.
A simple if/elif decision structure can help you keep the cherriesOnTree within 0
and 10:
if cherriesOnTree > 10:
    cherriesOnTree = 10
elif cherriesOnTree < 0:
    cherriesOnTree = 0
This means, if you wound up with more than 10 cherries on the tree, set
cherriesOnTree back to 10. If you wound up with fewer than 0 cherries, set
cherriesOnTree to 0.
All you have to do for this step is to print your cherriesOnTree variable, casting it to
a string so it can legally be inserted into a sentence.
Take another turn or print the number of turns it took to win the game
You probably anticipated that you would have to figure out a way to take multiple
turns. This is the perfect scenario for a loop.
What is the loop condition? There have to be some cherries left on the tree in order
to start another turn, so you could begin the loop this way:
while cherriesOnTree > 0:
Much of the code we wrote above would go inside the loop to simulate a turn. Since
we need to keep track of the number of turns taken, at the end of the loop we need
to increment a counter:
turns += 1
This turns variable would have to be initialized at the beginning of the script,
before the loop.
This code could print the number of turns at the end of the game:
print "It took you " + str(turns) + " turns to win the game."
Final code
Your only remaining task is to assemble the above pieces of code into a script.
Below is an example of how the final script would look. Copy this into a new
PythonWin script and try to run it:
import random
spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
turns = 0
cherriesOnTree = 10
turns += 1
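One way the pieces developed in this section might fit together is sketched below. It uses only the fragments already shown, with print written in the parenthesized form so that it runs under both Python 2 and Python 3. The exact wording of the printed messages is an assumption, and the closing raw_input line is left out.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
turns = 0
cherriesOnTree = 10

# Take a turn as long as there are cherries left on the tree
while cherriesOnTree > 0:

    # Spin the spinner
    spinIndex = random.randrange(0, 7)
    spinResult = spinnerChoices[spinIndex]
    print("You spun " + str(spinResult) + ".")

    # Add or remove cherries based on the result
    cherriesOnTree += spinResult

    # Keep the number of cherries between 0 and 10
    if cherriesOnTree > 10:
        cherriesOnTree = 10
    elif cherriesOnTree < 0:
        cherriesOnTree = 0

    print("You have " + str(cherriesOnTree) + " cherries on your tree.")

    turns += 1

# Print the number of turns once the game is over
print("It took you " + str(turns) + " turns to win the game.")
```

Because a single spin removes at most four cherries, a game can never finish in fewer than three turns.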
Review the final code closely and consider the following things.
The first thing you do is import whatever supporting modules you need, in this case
it's the random module.
Next, you declare the variables that you'll use throughout the script. Each variable
has a scope, which determines how broadly it is used throughout the script. The
variables spinnerChoices, turns, and cherriesOnTree are needed through the entire
script, so they are declared at the beginning, outside the loop. Variables used
throughout your entire program like this have global scope. On the other hand, the
variables spinIndex and spinResult are used only inside the loop, so you can think
of them as local to it. (Strictly speaking, a Python loop does not create a separate
scope, and these names remain accessible after the loop ends.) Each time the loop
runs, these variables are reassigned and their values change.
You could potentially declare the variable spinnerChoices inside the loop and get
the same end result, but performance would be slower because the variable would
have to be re-initialized every time you ran the loop. When possible, you should
declare variables outside loops for this reason.
If you had declared the variables turns or cherriesOnTree inside the loop, your code
would have logical errors. You would essentially be starting the game anew on
every turn with 10 cherries on your tree, having taken 0 turns. In fact, you would
create an infinite loop because there is no way to remove 10 cherries during one
turn, and the loop condition would always evaluate to true. Again, be very careful
about where you declare your variables when the script contains loops.
Notice that the total number of turns is printed outside the loop once the game has
ended. The final line lastline = raw_input(">") gives you an empty cursor
prompting for input and is just a trick to make sure the application doesn't
disappear when it's finished (if you run the script from a command console).
Summary
In the above example, you saw how lists, loops, decision structures, and variable
casting can work together to help you solve a programming challenge. You also
learned how to approach a problem one piece at a time and assemble those pieces
into a working script. You'll have a chance to practice these concepts on your own
during this week's assignment. The next and final section of this lesson will provide
you with some sources of help if you get stuck.
Challenge activity
If the above activity made you enthusiastic about writing some code yourself, take
the above script and try to find the average number of turns it takes to win a game
of Hi-Ho! Cherry-O. To do this, add another loop that runs the game a large
number of times, say 10000. You'll need to record the total number of turns
required to win all the games, then divide by the number of games (use "/" for the
division). Send me your final result and I'll let you know if you've found the correct
average.
The best candidates for software engineering jobs are not the ones who list the
most languages or acronyms on their resumes. Instead, the most desirable
candidates are self-sufficient, meaning they know how to learn new things and find
answers to problems on their own. This doesn't mean that they never ask for help;
on the contrary, a good programmer knows when to stop banging his or her head
against the wall and consult peers or a supervisor for advice. However, most
everyday problems can be solved using the help documentation, online code
examples, online forums, existing code that works, programming books, and
debugging tools in the software.
Suppose you're in a job interview and your prospective employer asks, "What do
you do when you run into a 'brick wall' when programming? What sources do you
first go to for help?" If you answer, "My supervisor" or "My co-workers," this is a
red flag signifying that you could be a potential time sink to the development team.
Although the more difficult problems require group collaboration, a competitive
software development team cannot afford to hold an employee's hand through
every issue that he or she encounters. From the author's experience, many of the
most compelling candidates answer this question, "Google." They know that most
programming problems, although vexing, are common and the answer may be at
their fingertips in less than 30 seconds through a well-phrased Internet search.
Believe it or not, this can actually be faster than walking down the hall and asking a
co-worker, and it saves everybody time.
In this section of the lesson, you'll learn about places where you can go for help
when working with Python and when programming in general. You will have a
much easier experience in this course if you remember these resources and use
them as you complete your assignments.
Your code doesn't run at all, usually because of a syntax error (you typed
Your code runs, but the script doesn't complete and reports an error.
Your code runs, but the script never completes. Often this occurs when
result. This is called a logical error and it is often the type of error that takes
Errors happen. There are very few programmers who can sit down and, off the top
of their heads, write dozens of lines of bug-free code. This means a couple of things
for you:
Expect to spend some time dealing with errors during the script-writing
this takes. To get an initial estimate, you can take the amount of time it takes
Don't be afraid to run your script and hit the errors. A good strategy is to
write a small piece of functionality, run it to make sure it's working, then add
on the next piece. It's less effective to write dozens of lines of code off the top
of your head before you ever run the script. Think of it this way: it's much
harder to find the errors in 50 new lines of code than in 10 new lines of code.
If you're building your script piece by piece and debugging often, you'll have
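Syntax errors occur when you type something incorrectly and your code refuses to
run. Common syntax errors include forgetting a colon when setting up a loop or an if
condition, leaving a quotation mark or parenthesis unclosed, or indenting lines
inconsistently. Other common mistakes, such as using single backslashes in a file
name, providing the wrong number of arguments to a function, or mixing variable
types incorrectly (for example, dividing a number by a string), usually do not stop
the script from starting; they surface as errors or unexpected results once the
offending line runs.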
When you try to run code with a syntax error in PythonWin, you may not notice
anything happen. At the bottom of the window, look for a message such as "Failed
to run script - syntax error - invalid syntax."
Sometimes the message is clearer. For example, if you indent a line only three
spaces instead of four, you get: "Failed to run script - syntax error - unexpected
indent."
You can check for syntax errors before you run your code using the Check button
on the PythonWin Standard toolbar. This button checks for errors and reports
them in a small message at the bottom of the window, just as you would see if you
tried to run your code with a syntax error. If there are no errors, you'll see a
message such as "Python and the TabNanny successfully checked the file
'myScript.py.'" (The TabNanny is a module that PythonWin uses to check for
correct indentation.)
If your code crashes, you may see an error message in the Interactive Window or
the console. Instead of allowing your eyes to glaze over or banging your head
against the desk, you should rejoice at the fact that the software possibly reported
to you exactly what went wrong! Scour the message for clues as to what line of code
caused the error and what the problem was. Do this even if the message looks
intimidating. For example, see if you can understand what caused this error
message:
Although the message begins with some content you probably don't understand
and uses a term you may not recognize ("modulo," the remainder operation), you
can reasonably guess that the error was caused by line 4: x = x / 0. Dividing by
0 is not possible, and the computer won't try to do it.
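If you want to practice reading this kind of message, a small sketch like the following reproduces a divide-by-zero crash and captures the report as text using the standard traceback module. The function name divide_by_zero is hypothetical, and the exact wording of the final line differs between Python versions.

```python
import traceback

def divide_by_zero():
    x = 10
    x = x / 0  # this line raises ZeroDivisionError
    return x

try:
    divide_by_zero()
except ZeroDivisionError:
    # format_exc() returns the same text Python would print when the script crashes.
    report = traceback.format_exc()
    # The last line of the report names the error type and gives a short description.
    last_line = report.strip().splitlines()[-1]
    print(report)
    print("Error summary: " + last_line)
```

Read the report from the bottom up: the last line says what went wrong, and the lines above it say where.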
It's easier to interpret messages like this if you've displayed line numbers for your
code in PythonWin. To get the line numbers:
The line numbers are also helpful if you make an e-mail or forum posting about
your code and include the script. You can immediately point out the line of the
crash to your colleagues. If you e-mail code to the instructors during this course
asking for help, be prepared to get a response pointing out specific line numbers
that need attention.
Ad-hoc debugging
Sometimes it's easy to sprinkle a few 'print' statements throughout your code to
figure out how far it got before it crashed, or what's happening to certain values in
your script as it runs. This can also be helpful to verify that your loops are doing
what you expect and that you are avoiding off-by-one errors.
Suppose you are trying to find the mean (average) value of the items in a list with
the code below.
list = [22,343,73,464,90]
The script reports "Average is 18," which doesn't look right. From a quick visual
check of this list you could guess that the average would be over 100. The script
isn't erroneously getting the number 18 from the list; it's not one of the values. So
where is it coming from? You can place a few strategic print statements in the
script to get a better report of what's going on:
list = [22,343,73,464,90]
for item in list:
    print "Processing loop..."
    total = 0
    total = total + item
    print total
print len(list)
average = total / len(list)
print "Performing division..."
print "Average is " + str(average)
Now when you run the script, you see:
Processing loop...
22
Processing loop...
343
Processing loop...
73
Processing loop...
464
Processing loop...
90
5
Performing division...
Average is 18
The error now becomes clearer. The running total isn't being kept successfully;
instead, it's resetting each time the loop runs. This causes the last value, 90, to be
divided by 5, yielding an answer of 18. You need to initialize the variable for the
total outside the loop to prevent this from happening. After fixing the code and
removing the print statements, you get:
list = [22,343,73,464,90]
total = 0
The resulting "Average is 198" looks a lot better. You've fixed a logical error in your
code: an error that doesn't make your script crash, but produces the wrong result.
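The difference between the broken and the repaired loop can be sketched side by side. The function names and the variable name values are hypothetical (values is used instead of list to avoid hiding Python's built-in list type), and the // operator is used so that the division truncates to a whole number in both Python 2.x and 3.x, matching the results quoted in this section.

```python
values = [22, 343, 73, 464, 90]  # the list from the example

def buggy_average(items):
    for item in items:
        total = 0            # bug: the running total is reset on every pass
        total = total + item
    return total // len(items)

def fixed_average(items):
    total = 0                # initialize once, before the loop starts
    for item in items:
        total = total + item
    return total // len(items)

print("Buggy average is " + str(buggy_average(values)))
print("Fixed average is " + str(fixed_average(values)))
```

The only change is where total is initialized, which is why this kind of logical error is easy to miss when reading the code and easy to see when printing values inside the loop.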
Although debugging with print statements is quick and easy, you need to be careful
with it. Once you've fixed your code, you need to remember to remove the
statements in order to make your code faster and less cluttered. Also, adding print
statements becomes impractical for long or complex scripts. You can pinpoint
problems more quickly and keep track of many variables at a time using the
PythonWin debugger, which is covered in the next section of this lesson.
The best way to explain the aspects of debugging is to work through an example.
This time we'll look at some code that tries to calculate the factorial of an integer
(the integer is hard-coded to 5 in this case). In mathematics, a factorial is the
product of an integer and all positive integers below it. Thus, 5! (or "5 factorial")
should be 5 * 4 * 3 * 2 * 1 = 120.
The code below attempts to calculate a factorial through a loop that increments the
multiplier by 1 until it reaches the original integer. This is a valid approach since 1 *
2 * 3 * 4 * 5 would also yield 120.
number = 5
multiplier = 1
while multiplier < number:
    number *= multiplier
    multiplier += 1
print number
Even if you can spot the error, follow along with the steps below to get a feel for the
debugging process and the PythonWin Debugging toolbar.
1. Open PythonWin and copy the above code into a new script.
the script, but you won't get a result and you may have to shut down
3. Click View > Toolbars and ensure Debugging is checked. Many IDEs
have debugging toolbars like this, and the tools they contain are pretty
standard: a way to run the code, a way to set breakpoints, a way to step
through the code line by line, and a way to watch the value of variables while
stepping through the code. We'll cover each of these in the steps below.
4. Set your cursor on the first line (number = 5) and click the Toggle
to stop running so you can examine it line by line using the debugger. Often
you'll set a breakpoint deep in the middle of your script so you don't have to
examine every single line of code. In this example, the script is very short so
represented by a circle next to the line of code and this is common in other
debuggers too.
5. Press the Go button. This runs your script up to the breakpoint. You
now have a small yellow arrow indicating which line of the script you are
about to run.
appears. This will help you track what happens to your variables as you
execute the code line by line. Before you run any more code, however, you
Item> and type the name of your first variable "number" (omit the quotes).
In the Value column you'll see "NameError: name 'number' is not defined."
This makes sense because you haven't run the line of code yet that creates
this variable.
8. Similar to the previous step, click <New Item> again and set up a watch
for the "multiplier" variable. You should get the same error about the
your watch window that the variable "number" now has a value of 5.
10. Click the Step button again. This time the "multiplier" variable has been
assigned a value.
11. Click the Step button a few more times to cycle through the loop. Go slowly
and use the watch window to understand the effect that each line has on the
two variables.
12. Step through the loop until "multiplier" reaches a value of 10. It should be
obvious at this point that the loop has not exited at the desired point. Our
intent was for it to quit when "number" reached 120.
Can you spot the error now? The fact that the loop has failed to exit should
draw your attention to the loop condition. The loop will only exit when
"multiplier" is greater than or equal to "number." That is obviously never
going to happen as "number" keeps getting bigger and bigger as it is
multiplied each time through the loop.
In this example, the code contained a logical error. It re-used the variable for
which we wanted to find the factorial (5) as a variable in the loop condition,
without considering that the number would be repeatedly increased within
the loop. Changing the loop condition to the following would cause the script
to work:
while multiplier < 5:
Even better than hard-coding the value 5 in this line would be to initialize a
variable early and set it equal to the number whose factorial we want to find.
The number could then get multiplied independent of the loop condition
variable.
13. Close PythonWin and re-open to a new script. Paste in the code below and
save the script as debugger_walkthrough2.py.
# This script calculates the factorial of a given integer.

number = 5
loopStop = number
multiplier = 1

while multiplier < loopStop:
    number *= multiplier
    multiplier += 1

print number
14. Display the Debugging toolbar and step through the loop a few times as
you did above. Watch the values of the "number" and "multiplier" variables,
but this time, also add a watch on the "loopStop" variable. This variable
should stay at 5 while "number" increases to 120.
15. Keep stepping until "number" reaches 120 and you reach the "print number"
line. At this line, don't press the Step button; instead, just press Go to finish
out the script. (You don't want to step through all the internal Python code
required to print the variable.) At this point the value of "number" should be
120, which is 5 factorial. If you want, you can try substituting other integers
In the above example you used the Debugging toolbar to find a logical error that
had caused an endless loop in your code. Debugging tools are often your best
resource for hunting down subtle errors in your code.
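One way to guard against this kind of endless loop is to keep the stopping value in a variable that the loop body never changes. The sketch below wraps that idea in a function; the names factorial, loop_stop, and result are hypothetical, and it differs from the walkthrough script in that it accumulates the product in a separate variable rather than in "number."

```python
def factorial(number):
    """Multiply 1 * 2 * ... * number using a loop with a fixed stop value."""
    loop_stop = number   # never modified inside the loop, so the loop must end
    result = 1
    multiplier = 1
    while multiplier <= loop_stop:
        result *= multiplier
        multiplier += 1
    return result

print(factorial(5))
```

Because the loop condition compares against a value that stays fixed, stepping through this version in a debugger shows "multiplier" closing the gap on every pass instead of chasing a moving target.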
If you reach an internal Python function such as print while using the Debugger,
the debugger will dive right into all the Python code needed to run the function.
You'll know when this happens because you'll see one or more windows open with
code that's difficult to understand. This is also the case sometimes when you run
arcpy functions. The problem is compounded because this type of code tends to call
other functions, which winds up opening many windows.
If you don't want to see all this code, you can try shortcutting around it by using the
Step Over button to jump over a complex function or Step Out to get out
of the function. If stepping over or through or out of all that code is too confusing,
you can set another breakpoint one or two lines beyond the line with the function
and just press the Go button again to run to that next breakpoint. When you press
the Go button, the debugger doesn't stop until it hits the next breakpoint.
You can and should practice using the Debugging toolbar in the script-writing
assignments that you receive in this course. You may save a lot of time this way. As
a teaching assistant in a university programming lab years ago, the author of this
course saw many students wait a long time to get one-on-one help, when a simple
walk through their code using the debugger would have revealed the problem.
Readings
Read Zandbergen 11.1 - 11.5 to get his tips for debugging. Then read 11.11 and dog-
ear this section as a checklist for you to review any time you hit a problem in your
code during the next few weeks.
Esri has configured its geoprocessing tools to frequently report what they're doing.
When you run a geoprocessing tool from ArcMap or ArcCatalog, you see a box with
these messages, sometimes accompanied by a progress bar. You learned in Lesson 1
that you can use arcpy.GetMessages() to access these messages from your script. If
you only want to view the messages when something goes wrong, you can include
them in an except block of code, like this.
try:
    . . .
except:
    print arcpy.GetMessages()
Geoprocessing messages have three levels of severity: Message, Warning, and
Error. You can pass an index to the arcpy.GetMessages() method to filter through
only the messages that reach a certain level of severity. For example,
arcpy.GetMessages(2) would return only the messages with a severity of "Error."
Error and warning messages sometimes include a unique code that you can use to
look up more information about the message. The ArcGIS Desktop Help contains
topics that list the message codes and provide details on each. Some of the entries
have tips for fixing the problem.
Further reading
Please take a look at the official ArcGIS documentation for more detail about
geoprocessing messages. Be sure to read these topics:
gateway into the error and warning reference section of the help that
explains all the error codes. Sometimes you'll see these codes in the
messages you get back, and the specific help topic for the code can help you
understand what went wrong. The article also talks about how you can trap
for certain conditions in your own scripts and cause specific error codes to
appear. This type of thing is optional, for over and above credit, in this
course.
Drawing on the resources below takes time and effort. Many people don't like
combing through computer documentation, and this is understandable. However,
you may ultimately save time if you look up the answer for yourself instead of
waiting for someone to help you. Even better, you will have learned something new
from your own experience, and things you learn this way are much easier to
remember in the future.
Sources of help
Search engines - Search engines are useful for both quick answers and
obscure problems. Did you forget the syntax for a loop? The quickest remedy
may be to Google "for loop python" or "while loop python" and examine one
of the many code examples returned. Search engines are extremely useful
for diagnosing error messages. Google the error message in quotes and you
can read experiences from others who have had the same issue. If you don't
get enough hits, remove the quotes to broaden the search.
One risk you run from online searches is finding irrelevant information.
Even more dangerous is using irrelevant information. Research any sample
code to make sure it is applicable to the version of Python you're using.
Some syntax in Python 3.x is different from the Python 2.x that you're using
in this course.
Esri online help - Esri maintains their entire help system online, and
you'll find most of their scripting topics in the sections Geoprocessing with
Tool Reference [8], which describes every tool in the toolbox and contains
before you do a random Internet search. You will have to visit the
online. Some of it gets very detailed and takes the tone of being written by
programmers for programmers. The part you'll probably find most helpful is
the Python Standard Library reference [10], which is a good place to learn
Before you post a question on the Esri forums, do a little research to make
sure the question hasn't been answered already, at least recently. I also
suggest that you post the question to our class forums first, since your peers
are working on the same problems and you are more likely to find someone
who's familiar with your situation and has found a solution.
There are many other online forums that address GIS or programming
questions. You'll see them all over the Internet if you perform a Google
search on how to do something in Python. Some of these sites are laden with
annoying banner ads or require logins, while others are more immediately
helpful. Stack Exchange [13] is an example of a well-traveled technical
forum, light on ads, that allows readers to promote or demote answers
depending on their helpfulness. One of its child sites, GIS Stack Exchange
[14], specifically addresses GIS and cartography issues.
Class forums - Our course has discussion boards that you may use to
consult your peers about any Python problem that you encounter. I
encourage you to check them often and to participate by both asking and
answering questions. I request that you make your questions focused and
avoid pasting large blocks of code that would rob someone of the benefit of
completing the assignment on their own. Short, focused blocks of code that
solve a specific question are definitely okay. Code blocks that are not copied
directly from your assignment are also okay.
I ask that you try some of the many troubleshooting and help resources
above before you contact me. If the issue is with your code and I cannot
immediately see the problem, the resources we will use to find the answer
will be the same that I listed above: the debugger, printing geoprocessing
messages, looking for online code examples, etc. If you feel unsure about
what you're doing, I'm available to talk through these approaches with you.
Python String objects have an index method that enables you to find a
substring within the larger string. For example, if I had a variable defined as
For this practice exercise, start by creating a list of names like the following:
Harrison"]
Then write code that will loop through all the items in the list, printing a
where the first blank is filled in with the name currently being processed by
the loop and the second blank is filled in with the position of the first space
in the name as returned by the index method. (You should obtain values of
This is a good example in which it is smart to write and test versions of the
script that incrementally build toward the desired result rather than trying
to write the final version in one fell swoop. For example, you might start by
setting up a loop and simply printing each name. If you get that to work, give
yourself a little pat on the back and then see if you can simply print the
positions of the space. Once you get that working, then try plugging the
Build on Exercise 1 by printing each name in the list in the following format:
Last, First
To do this, you'll need to find the position of the space just as before. To
extract part of a string, you can specify the start character and the end
One quirky thing about this syntax is that you need to specify the end
character as 1 beyond the one you really want. The "o" in "Paterno" is really
One handy feature of the syntax is that you may omit the end character
index if you want everything after the start character. Thus, name[4:] will
return the same string as name[4:11] in this example. Likewise, the start
Write a script that accepts a score from 1-100 as an input parameter, then
reports the letter grade for that score. Assign letter grades as follows:
A: 90-100
B: 80-89
C: 70-79
D: 60-69
F: <60
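If you are unsure how to structure the decision logic, one possible shape is a chain of if/elif tests that checks the highest cutoff first. This is only a sketch: the function name letter_grade and the sample scores are hypothetical, and it takes the score as a function argument rather than as a script input parameter.

```python
def letter_grade(score):
    """Map a numeric score to a letter using the cutoffs listed above."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"

# Try a few hypothetical scores, including values right at a boundary.
for sample in [95, 80, 79, 42]:
    print(str(sample) + " -> " + letter_grade(sample))
```

Testing the boundary values (such as 80 and 79) is a quick way to catch off-by-one mistakes in the comparisons.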
Imagine that you're again working with the Nebraska precipitation data
from Lesson 1 and that you want to create copies of the Precip2008Readings
schema of the 2008 shapefile, but not the data points themselves. Those will
be added later. The tool for automating this kind of operation is the Create
Feature Class tool in the Data Management toolbox. Look up this tool in the
Help system and examine its syntax and the example script. Note the
To complete this exercise, you should invoke the Create Feature Class tool
inside a loop that will cause the tool to be run once for each desired year.
Note that Esri uses some inconsistent casing with this tool and you will have
"class." If you follow the examples in the Geoprocessing Tool Reference help
The data for this practice exercise consists of two file geodatabases: one for
the USA and one for just the state of Iowa. The USA dataset contains
miscellaneous feature classes. The Iowa file geodatabase is empty except for
Your task is to write a script that programmatically clips all the feature
classes in the USA geodatabase to the Iowa state boundary. The clipped
feature classes in the USA geodatabase. For example, if there were 15 feature
classes in the USA geodatabase instead of three, your final code should not
In this project you'll practice Python fundamentals by writing a script that re-
projects the vector datasets in a folder. From this script, you will then create a
script tool that can easily be shared with others.
The tool you will write should look like the image below. It has two input
parameters and no output parameters. The two input parameters are:
2. The path to a vector dataset whose spatial reference will be used in the re-
projection. For example, if you want to re-project into NAD 1983 UTM Zone
10, you would browse to some vector dataset already in NAD 1983 UTM
Zone 10. This could be one of the datasets in the folder you supplied in the
parameters.
Running the tool causes re-projected datasets to be placed on disk in the target
folder.
Requirements
Must re-project shapefile vector datasets in the folder to match the target
dataset's projection.
Must append "_projected" to the end of each projected dataset name. For
example: CityBoundaries_projected.shp.
Must skip projecting any datasets that are already in the target projection.
message, do not include datasets that were skipped because they were
Must not contain any hard-coded values such as dataset names, path names,
or projection names.
Must be made available as a script tool that can be easily run from
Side panel help documentation is provided for your script tool. This means
that when you open the tool dialog and click Show Help, instructions for
each parameter appear on the side. The ArcGIS Desktop Help can teach you
how to do this.
Your script tool uses relative paths to the .py file and is easily deployed
without having to "re-wire" the toolbox to the script. See A structure for
You are not required to handle datum transformations in this script. It is assumed
that each dataset in the folder uses the same datum, although the datasets may be
in different projections. Handling transformations would cause you to have to add
an additional parameter in the Project tool and would make your script more
complicated than you would probably like for this assignment.
Sample data
The Lesson 2 data [1] folder contains a set of vector shapefiles for you to work with
when completing this project (delete any subfolders in your Lesson 2 data folder—
you may have one called PracticeData—before beginning this project). These
shapefiles were obtained from the Washington State Department of Transportation
GeoData Distribution Catalog [22], and they represent various geographic features
around Washington state. For the purpose of this project, I have put these datasets
in various projections. These projections share the same datum (NAD 83) so that
you do not have to deal with datum transformations.
NAD_1983_StatePlane_Washington_South_FIPS_4602
CountyLines - NAD_1983_UTM_Zone_10N
Ferries - USA_Contiguous_Lambert_Conformal_Conic
PopulatedPlaces - GCS_NorthAmerican_1983
Deliverables
A short writeup (about 300 words) describing what you learned during this
project and how you approached the problem. You should include which
requirements you met, or failed to meet, including "over and above" efforts.
Tips
The following tips can help improve your chances of success with this project:
Do not use the Esri Batch Project tool in this project. In essence, you're
required to make your own variation of a batch project tool in this project by
running the Project tool inside a loop. Your tool will be easier to use because
There are a lot of ways to insert "_projected" in the name of a dataset, but
you might find it useful to start by temporarily removing ".shp" and adding
it back on later. To remove ".shp" you can use syntax like this:
rootName = ""
if fc.endswith(".shp"):
    rootName = fc[:-4]
In the above code, fc is your shapefile name with .shp included, and fc[:-4]
removes the last four characters. If your script were to handle all types of file
extensions you would avoid hard-coding a number like -4, but your script is
not required to handle any extensions other than .shp.
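For comparison, here is a sketch of the more general approach that avoids hard-coding -4, using os.path.splitext from the Python standard library to separate the root name from the extension. The function name projected_name and its suffix parameter are hypothetical; the project does not require this approach.

```python
import os

def projected_name(dataset_name, suffix="_projected"):
    """Insert a suffix between a file's root name and its extension."""
    root, extension = os.path.splitext(dataset_name)
    return root + suffix + extension

print(projected_name("CityBoundaries.shp"))
```

Because splitext finds the extension itself, the same function would work for an extension of any length.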
projected and the target dataset). You will then need to compare the spatial
objects themselves. This is because you can have two spatial reference
objects that are different entities (and are thus "not equal"), but have the
If you want to show all the messages from each run of the Project tool, add
where you run the Project tool. Each time the loop runs, it will add the
messages from the current run of the Project tool into the results window.
It's been my experience that if you wait to add this line until the end of your
script, you only get the messages from the last run of the tool, so it's
important to put the line inside the loop. Remember that while you are first
writing your script you can use print statements to debug, then switch to
arcpy.AddMessage() when you have verified that your script works and you
If you need extra help with making the script tool, refer back to Lesson 1.7.1
and also read Zandbergen 13.1 - 13.10 where he goes in depth about making
If, after all your best efforts, you ran out of time and could not meet one of
the requirements, comment out the code that is not working (using a # sign
at the beginning of each line) and send the code anyway. Then explain in
your brief writeup which section is not working and what troubles you
encountered. If your commented code shows that you were heading down
Links:
[1] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson2.zip
[2] http://docs.python.org/tutorial/datastructures.html
[3] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z00000011000000.htm
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z0000000p000000.htm
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00r90000009v000000.htm
[6] http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=An_overview_of_writing_geoprocessing_scripts
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v000000v7000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002t/002t0000000z000000.htm
[9] http://www.python.org/doc/
[10] http://docs.python.org/library
[11] http://forums.arcgis.com/forums/117-Python
[12] http://forums.arcgis.com/forums/31-Geoprocessing
[13] https://www.e-education.psu.edu/geog485/stackexchange.com
[14] http://gis.stackexchange.com/
[15] https://www.e-education.psu.edu/geog485/../L02_Prac1.html
[16] https://www.e-education.psu.edu/geog485/../L02_Prac2.html
[17] https://www.e-education.psu.edu/geog485/../L02_Prac3.html
[18] https://www.e-education.psu.edu/geog485/../L02_Prac4.html
[19] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson2PracticeExercise.zip
[20] https://www.e-education.psu.edu/geog485/../L02_Prac5.html
[21] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0057/005700000004000000.htm
[22] http://www.wsdot.wa.gov/mapsdata/GeoDataCatalog/default.htm
Lesson 3: GIS data access and manipulation with Python
An essential part of a GIS is the data that represents both the geometry
(coordinates) of geographic features and the attributes of those features. This
combination of features and attributes is what makes GIS go beyond just
"mapping." Much of your work as a GIS analyst involves adding, modifying, and
deleting features and their attributes from the GIS.
Beyond maintaining the data, you also need to know how to query and select the
data that is most important to your projects. Sometimes you'll want to query a
dataset to find only the records that match certain criteria (for example,
single-family homes constructed before 1980) and calculate some statistics based on only
the selected records (for example, percentage of those homes that experienced
termite infestation).
All of the above tasks of maintaining, querying, and summarizing data can become
tedious and error-prone if performed manually. Python scripting is often a faster
and more accurate way to read and write large amounts of data. There are already
many tools for data selection and management in ArcToolbox. Any of these can be
used in a Python script. For more customized scenarios where you want to read
through a table yourself and modify records one-by-one, the Geoprocessor
programming model contains special objects, called cursors, that you can use to
examine each record in a table. You'll quickly see how the looping logic that you
learned in Lesson 2 becomes useful when you are cycling through tables using
cursors.
Using a script to work with your data introduces some other subtle advantages over
manual data entry. For example, in a script you can add checks to ensure that the
data entered conforms to a certain format. You can also chain together multiple
steps of selection logic that would be time-consuming to perform in ArcMap.
This lesson explains ways to read and write GIS data using Python. We'll start off
by looking at how you can create and open datasets within a script. Then we'll
practice reading and writing data using both geoprocessing tools and cursor
objects. Although this is most applicable to vector datasets, we'll also look at some
ways you can manipulate rasters with Python. Once you're familiar with these
concepts, Project 3 will give you a chance to practice what you've learned.
Lesson 3 checklist
Lesson 3 explains how to read and manipulate both vector and raster data with
Python. To complete Lesson 3 you are required to do the following:
1. Work through the course lesson materials.
2. Read Zandbergen chapter 7.1 - 7.3 and all of chapter 9. In the online lesson
drop box.
5. Read the Final Project proposal assignment and begin working on your
Geodatabases
Over the years, Esri has developed various ways of storing spatial data. They
encourage you to put your data in geodatabases, which are organizational
structures for storing datasets and defining relationships between those datasets.
Different flavors of geodatabase are offered for storing different magnitudes of data.
that store data on the local file system. The data is held in a Microsoft Access
database, which limits how much data can be stored in the geodatabase.
File geodatabases are a newer way of storing data on the local file system.
terabytes.
serving data not just to one computer, but to an entire enterprise. Since
working with an RDBMS can be a job in itself, Esri has developed ArcSDE as
For actions where ArcSDE is required but where it would be too heavy-
a smaller "workgroup" version of ArcSDE that works with the free database
SQL Server Express. This can be configured directly from ArcCatalog or the
In recent years, Esri has also promoted a new feature called query layers [1],
which allow you to pull data directly out of an RDBMS using SQL queries,
Standalone datasets
Although geodatabases are essential for long-term data storage and organization,
it's sometimes convenient to access datasets in a "standalone" format on the local
file system. Esri's shapefile is probably the most ubiquitous standalone vector data
format (it even has its own Wikipedia article [2]). A shapefile actually consists of
several files that work together to store vector geometries and attributes. The files
all have the same root name, but use different extensions. You can zip the
participating files together and easily e-mail them or post them in a folder for
download. In the Esri file browsers in ArcCatalog or ArcMap, the shapefiles just
appear as one file.
Another type of standalone dataset dating back to the early days of ArcGIS is the
ArcInfo coverage. Like the shapefile, the coverage consists of several files that work
together. Coverages are becoming more and more rare, but you might encounter
them if your organization has used (or still uses!) ArcInfo Workstation.
Raster datasets are also often stored in standalone format instead of being loaded
into a geodatabase. A raster dataset can be a single file, such as a JPEG or a TIFF,
or, like a shapefile, it can consist of multiple files that work together.
Often in a script you'll need to provide the path to a dataset. Knowing the syntax for
specifying the path is sometimes a challenge because of the many different ways of
storing data listed above. For example, below is an example of what a file
geodatabase looks like if you just browse the file system of Windows Explorer. How
do you specify the path to the dataset you need? This same challenge could occur
with a shapefile, which, although more intuitively named, actually has three or
more participating files.
Figure 3.1 A file geodatabase as viewed via the file system of Windows Explorer.
The safest way to get the paths you need is to browse to the dataset in ArcCatalog
and take the path that appears in the Location toolbar. Here's what the same file
geodatabase would look like in ArcCatalog. The circled path shows how you would
refer to a feature class within the geodatabase.
Below is an example of how you could access the feature class in a Python script
using this path. This is similar to one of the examples in Lesson 1.
import arcpy
featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
desc = arcpy.Describe(featureClass)
spatialRef = desc.SpatialReference
print spatialRef.Name
Remember that the backslash (\) is a reserved character in Python, so you'll need to
use either the double backslash (\\) or forward slash (/) in the path. Another
technique you can use for paths is the raw string, which allows you to put
backslashes and other reserved characters in your string as long as you put "r"
before your quotation marks.
featureClass = r"C:\Data\USA\USA.gdb\Cities"
. . .
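The three ways of writing a path can be compared directly in plain Python, with no ArcGIS required. The sketch below reuses the path from the example; the variable names and the "risky" path are hypothetical and chosen only to show what goes wrong with single backslashes.

```python
# Three spellings of the same feature class path.
escaped = "C:\\Data\\USA\\USA.gdb\\Cities"   # doubled backslashes
raw = r"C:\Data\USA\USA.gdb\Cities"          # raw string
forward = "C:/Data/USA/USA.gdb/Cities"       # forward slashes

# The escaped and raw versions produce exactly the same string.
print(escaped == raw)

# Without the r prefix, some backslash pairs turn into control characters.
risky = "C:\temp\new"   # \t becomes a tab and \n becomes a newline
print(repr(risky))
```

Printing with repr() is a handy way to reveal hidden tabs and newlines when a path that looks correct is not being found.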
Workspaces
The Esri geoprocessing framework often uses the notion of a workspace to denote
the folder or geodatabase where you're currently working. When you specify a
workspace in your script, you don't have to list the full path to every dataset. When
you run a tool, the geoprocessor sees the feature class name and assumes that it
resides in the workspace you specified.
Workspaces are especially useful for batch processing, when you perform the same
action on many datasets in the workspace. For example, you may want to clip all
the feature classes in a folder to the boundary of your county. The workflow for this
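is:
1. Define a workspace.
2. Create a list of feature classes in the workspace.
3. Define a clip feature.
4. Configure a loop to run on each feature class in the list.
5. Inside the loop, run the Clip tool.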
Here's some code that clips each feature class in a file geodatabase to the Alabama
state boundary, then places the output in a different file geodatabase. Note how the
five lines of code after import arcpy correspond to the five steps listed above.
import arcpy
arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"
featureClassList = arcpy.ListFeatureClasses()
clipFeature = "C:\\Data\\Alabama\\Alabama.gdb\\StateBoundary"
for featureClass in featureClassList:
    arcpy.Clip_analysis(featureClass, clipFeature,
                        "C:\\Data\\Alabama\\Alabama.gdb\\" + featureClass)
Notice that you designated the path to the workspace using the location of the file
geodatabase "C:\\Data\\USA\\USA.gdb". If you were working with shapefiles, you
would just use the path to the containing folder as the workspace.
If you were working with ArcSDE, you would use the path to the .sde connection
file when creating your workspace. This is a file that is created when you connect to
ArcSDE in ArcCatalog, and is placed in your local profile directory. We won't be
accessing ArcSDE data in this course, but if you do this at work, remember that you
can use the Location toolbar in ArcCatalog to help you understand the paths to
datasets in ArcSDE.
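The clip script above builds each output path by gluing strings together with +. As a possible alternative, here is a small sketch that joins the output workspace and the feature class name with Python's standard ntpath module (the Windows flavor of os.path), so it behaves the same on any operating system. The list of names is hypothetical and simply stands in for what arcpy.ListFeatureClasses() would return.

```python
import ntpath

outputWorkspace = "C:\\Data\\Alabama\\Alabama.gdb"

# Hypothetical names standing in for the result of arcpy.ListFeatureClasses()
featureClassList = ["Cities", "Roads", "Rivers"]

outputPaths = []
for featureClass in featureClassList:
    # ntpath.join inserts the backslash separator for you
    outputPaths.append(ntpath.join(outputWorkspace, featureClass))

for path in outputPaths:
    print(path)
```

On a Windows machine, os.path.join gives the same result, so you would normally just import os.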
As we work with the data, it will be helpful for you to follow along, cutting and
pasting the example code into practice scripts. Throughout the lesson you'll
encounter exercises that you can do to practice what you just learned. You're not
required to turn in these exercises, but if you complete them you will have a greater
familiarity with the code that will be helpful when you begin working on this week's
project. It's impossible to read a book or a lesson, then sit down and write perfect
code. Much of what you learn comes through trial and error and learning from
mistakes. Thus, it's wise to write code often as you complete the lesson.
There are two fields in the table that you cannot delete. One of the fields (usually
called SHAPE) contains the geometry information for the features. This includes
the coordinates of each vertex in the feature and allows the feature to be drawn on
the screen. The geometry is stored in binary format; if you were to see it printed on
the screen, it wouldn't make any sense to you. However, you can read and work
with geometries using objects that are provided with arcpy.
The other field included in every feature class is an object ID field (OBJECTID or
FID). This contains a unique number, or identifier for each record that is used by
ArcGIS to keep track of features. The object ID helps avoid confusion when
working with data. Sometimes records have the same attributes. For example, both
Los Angeles and San Francisco could have a STATE attribute of 'California,' or a
USA cities dataset could contain multiple cities with the NAME attribute of
'Portland;' however, the OBJECTID field can never have the same value for two
records.
The rest of the fields contain attribute information that describe the feature. These
attributes are usually stored as numbers or text.
When you write a script, you'll need to provide the names of the particular fields
you want to read and write. You can get a Python list of field names using
arcpy.ListFields().
import arcpy
featureClass = "C:\\Data\\Alabama\\Alabama.gdb\\Cities"
fieldList = arcpy.ListFields(featureClass)
# Loop through each field in the list and print the name
for field in fieldList:
    print field.name
The above would yield a list of the fields in the Cities feature class in a file
geodatabase named Alabama. If you ran this script in PythonWin (try it with one of
your own feature classes!) you would see something like the following in the
Interactive Window.
>>> OBJECTID
Shape
UIDENT
POPCLASS
NAME
CAPITAL
STATEABB
COUNTRY
Notice the two special fields we already talked about: OBJECTID, which holds the
unique identifying number for each record, and Shape, which holds the geometry
for the record. Additionally, this feature class has fields that hold the name
(NAME), the state (STATEABB), whether or not the city is a capital (CAPITAL),
and so on.
Arcpy treats each field as an object, so the field has properties that describe it.
That's why you can print field.name. The help reference topic Using fields and
indexes [3] lists all the properties that you can read from a field. These include
aliasName, length, type, scale, precision, and others.
Properties of a field are read-only, meaning that you can find out what the field
properties are, but you cannot change those properties in a script using the Field
object. If you wanted to change the scale and precision of a field, for instance, you
would have to programmatically add a new field.
The arcpy module contains some objects called cursors that allow you to move
through records in a table. Cursors are not unique to ArcGIS scripting; in fact, if
you've worked in ArcObjects before, this concept of a cursor is probably familiar to
you. The first cursor we'll look at is the search cursor, since it's designed for simple
reading of data. The common workflow is:
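1. Create the search cursor. This is done through arcpy.SearchCursor(), in which you
specify which dataset and, optionally, which specific rows you want to read.
2. Call SearchCursor.next() to read the first row.
3. Start a loop that will exit when there are no more rows available to read.
4. Do something with the values in the current row.
5. Call SearchCursor.next() to move on to the next row. Because you created a
loop, this puts you back at the previous step if there is another row available
to be read. If there are no more rows, the loop condition is not met and the
loop terminates.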
When you first try to understand cursors, it may help to visualize the attribute table
with an arrow pointing at the "current row." When the cursor is first created, that
arrow is pointing just above the first row in the table. The first time the next()
method is called, the arrow moves down to the first row (and returns a reference to
that row). Each time next() is called, the arrow moves down one row. If next() is
called when the arrow is pointing at the last row, the special Python value None is
returned.
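If the arrow picture is hard to hold in your head, it may help to play with a toy version. The class below is a made-up stand-in written in plain Python; it is not arcpy.SearchCursor and only imitates the next() behavior described above. The city names are hypothetical sample values.

```python
class ToyCursor(object):
    """Toy stand-in that mimics the next() behavior described above.
    It is NOT arcpy.SearchCursor; it only illustrates the moving arrow."""

    def __init__(self, records):
        self._records = list(records)
        self._position = -1   # the arrow starts just above the first row

    def next(self):
        self._position += 1   # move the arrow down one row
        if self._position < len(self._records):
            return self._records[self._position]
        return None           # no more rows to read


# Hypothetical city names used only for the demonstration
rows = ToyCursor(["Birmingham", "Mobile", "Montgomery"])
namesRead = []
row = rows.next()
while row:
    namesRead.append(row)
    row = rows.next()
print(namesRead)
```

Because None counts as false in a while condition, the loop stops by itself once the arrow moves past the last row.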
Here's a very simple example of a search cursor that reads through a point dataset
of cities and prints the name of each.
import arcpy
featureClass = "C:\\Data\\Alabama\\Alabama.gdb\\Cities"
rows = arcpy.SearchCursor(featureClass)
row = rows.next()
while row:
    print row.NAME
    row = rows.next()
The last five lines of the above script correspond to the five steps in the above
workflow. Cursors can be tricky to understand at first, so let's look at those lines
more closely. Below are the five lines again with comments so you can see exactly
what's happening:
# Start a loop that will exit when there are no more rows available
while row:
whether the loop should continue. If a row object exists, the statement
evaluates to true and the loop continues. If a row object doesn't exist, the
You can read a field value as a property of a row. For example, row.NAME
gave you the value in the NAME field. If your table had a POPULATION
The names "rows" and "row" are just variable names that represent the
The Esri examples tend to name them rows and row, and we'll do the same.
However, if you needed to use two search cursors at the same time, you'd
Here's another example where something more complex is done with the row
values. This script finds the average population for counties in a dataset. To find
the average, you need to divide the total population by the number of counties. The
code below loops through each record and keeps a running total of the population
and the number of records counted. Once all the records have been read, only one
line of division is necessary to find the average.
import arcpy
featureClass = "C:\\Data\\Iowa\\Counties.shp"
rows = arcpy.SearchCursor(featureClass)
row = rows.next()
average = 0
totalPopulation = 0
recordsCounted = 0
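while row:
    totalPopulation += row.POP2008
    recordsCounted += 1
    row = rows.next()
# float() avoids integer division in Python 2
average = float(totalPopulation) / recordsCounted
print "Average population for a county is " + str(average)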
Although the above script is longer than the first one, it's still following the general
pattern of creating a search cursor, advancing to the first row, doing something
with the row, and repeating the process until there are no records left.
You can make your scripts more versatile by using variables to represent field
names. You could declare a variable, such as populationField to reference the
population field name, whether it were POP2008, POP2009, or simply
POPULATION. The Python interpreter isn't going to recognize
row.populationField, so you need to use Row.getValue() instead and pass in the
variable as a parameter.
The script below uses a variable name to get the population for each record. Lines
changed from the script above are in bold. Notice how a variable named
populationField is created and the method call
row.getValue(populationField) that retrieves the population of each
record.
import arcpy
featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"
rows = arcpy.SearchCursor(featureClass)
row = rows.next()
average = 0
totalPopulation = 0
recordsCounted = 0
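while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()
# float() avoids integer division in Python 2
average = float(totalPopulation) / recordsCounted
print "Average population for a county is " + str(average)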
To update the above script, you would just have to set populationField =
"POP2009" near the top of the script. This is certainly easier than searching
through the body of the script for row.POP2008; however, you can go one step
further and allow the user to enter any field name that he or she wants as an
argument when running the script.
import arcpy
featureClass = arcpy.GetParameterAsText(0)
populationField = arcpy.GetParameterAsText(1)
rows = arcpy.SearchCursor(featureClass)
row = rows.next()
average = 0
totalPopulation = 0
recordsCounted = 0
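while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()
# float() avoids integer division in Python 2
average = float(totalPopulation) / recordsCounted
print "Average population for a county is " + str(average)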
Here's how you could run the above script in PythonWin by supplying the path
name and population field as the arguments.
Figure 3.3 Running the above script in PythonWin.
Although the above examples use a while loop in conjunction with the next()
method to advance the cursor, it's often easier to iterate through each record using
a for loop. This became possible with ArcGIS 10. Here's how the above sample
could be modified to use a for loop. Notice the syntax for row in rows.
import arcpy
featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"
rows = arcpy.SearchCursor(featureClass)
average = 0
totalPopulation = 0
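recordsCounted = 0
for row in rows:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
# float() avoids integer division in Python 2
average = float(totalPopulation) / recordsCounted
print "Average population for a county is " + str(average)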
In this example, the next() method is not even required because it is implied by the
for loop that the script will iterate through every record. The object named row is
declared when the for loop is declared.
This syntax is more compact than using a while loop, and you are welcome to
experiment with it and use it in your projects.
The Geog 485 lesson examples have not all been converted to use this technique.
There is some benefit to seeing how the next() method works, especially if you ever
work with ArcGIS 9.3.x Python scripts or if you use cursors in ArcObjects (which
has conceptually similar methods for advancing a cursor row by row). However,
once you get accustomed to using a for loop to traverse a table, it's unlikely you'll
want to go back to using while loops.
If you're using ArcGIS 10.1, you can use the above code for search cursors, or you
can use a new data access module that was introduced into arcpy. These data access
functions are prefixed with arcpy.da and give you faster performance along with
more robust behavior when crashes or errors are encountered with the cursor.
The data access module arcpy.da allows you to create cursor objects, just like arcpy,
but you create them a little differently. Take a close look at the following example
code, which repeats the scenario above to calculate the average population of a
county.
import arcpy
featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"
average = 0
totalPopulation = 0
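recordsCounted = 0
with arcpy.da.SearchCursor(featureClass, (populationField,)) as cursor:
    for row in cursor:
        totalPopulation += row[0]
        recordsCounted += 1
# float() avoids integer division in Python 2
average = float(totalPopulation) / recordsCounted
print "Average population for a county is " + str(average)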
This example uses the same basic structure as the previous one, with a few
important changes. One thing you probably noticed is that the cursor is created
using a "with" statement. Although the explanation of "with" is somewhat
technical, the key thing to understand is that it allows your cursor to exit the
dataset gracefully, whether it crashes or completes its work successfully. This is a
big issue with cursors, which can sometimes maintain locks on data if they are not
exited properly.
The "with" statement requires that you indent all the code beneath it. After you
create the cursor in your "with" statement, you'll initiate a for loop to run through
all the rows in the table. This requires additional indentation.
Notice that this "with" statement creates a SearchCursor object, and declares that it
will be named "cursor" in any subsequent code. The search cursors you create with
arcpy.da have some different initialization parameters from the search cursors you
create with arcpy. The biggest difference is that when you create a cursor with
arcpy.da, you have to supply a tuple of field names that will be returned by the
cursor. Remember that a tuple is a Python data structure much like a list, except it
is enclosed in parentheses and its contents cannot be modified.
Supplying this tuple speeds up the work of the cursor because it does not have to
deal with the potentially dozens of fields included in your dataset. In the example
above, the tuple contains just one field, populationField. A tuple with just one item
in it needs a comma after the item, so our tuple above looks like this:
(populationField,). If the tuple were to have multiple items in it, we might see
something like: (populationField, nameField).
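The trailing comma is easy to forget, so here is a short plain-Python sketch showing what happens with and without it. The variable names oneField, notATuple, and twoFields are made up for this illustration.

```python
populationField = "POP2008"
nameField = "NAME"

oneField = (populationField,)             # trailing comma makes a one-item tuple
notATuple = (populationField)             # without the comma this is just the string
twoFields = (populationField, nameField)  # two items, no trailing comma needed

print(type(oneField).__name__)
print(type(notATuple).__name__)
print(len(twoFields))
```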
Notice that with arcpy.da, you use row objects like with arcpy; however, you do not
use the getValue method to retrieve values out of the rows. Instead, you use the
index position of the field name in the tuple you submitted when you created the
object. Since the above example submits only one item in the tuple, then the index
position of "populationField" within that tuple is 0 (remember that we start
counting from 0 in Python). Therefore, you can use row[0] to get the value of
populationField for a particular row.
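The same indexing idea extends to more than one field. In the plain-Python sketch below, an ordinary tuple stands in for a row returned by the cursor; the county name and population are hypothetical values, not data from the lesson.

```python
# Field names in the order they would be given to the cursor
fields = ("NAME", "POP2008")

# Hypothetical values standing in for one row returned by the cursor
row = ("Example County", 12345)

nameIndex = fields.index("NAME")      # position 0
popIndex = fields.index("POP2008")    # position 1

countyName = row[nameIndex]
countyPopulation = row[popIndex]
print(countyName + ": " + str(countyPopulation))
```

Looking up the position with index() is optional; with only a couple of fields, most scripts simply write row[0] and row[1].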
If you don't understand all this right away, keep an eye out for the other arcpy.da
examples that I've included throughout Lesson 3. Once you see a few different
examples, you'll start to understand the pattern, and then you can experiment with
it in your own code. If you're using ArcGIS 10.1, it's worth it to learn the arcpy.da
functions. Your code will be faster, more compact, and more robust.
For review, this is how you construct a search cursor to operate on every record in a
dataset:
rows = arcpy.SearchCursor(featureClass)
If you want the search cursor to retrieve only a subset of the records based on some
criteria, you can supply a SQL expression as the second argument in the
constructor (the constructor is the method that creates the SearchCursor). For
example:
rows = arcpy.SearchCursor(featureClass, '"POP2008" > 100000')
Note that the SQL expression you supply for a search cursor is for attribute queries,
not spatial queries. You could not use a SQL expression to select records that fall
"west of the Mississippi River," or "inside the boundary of Canada" unless you had
previously added and populated some attribute stating whether that condition were
true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson
we'll talk about how to make spatial queries using the Select By Location
geoprocessing tool.
Once you retrieve the subset of records, you can follow the same pattern of iterating
through them using SearchCursor.next() within a loop.
If you're using the arcpy.da data access functions in ArcGIS 10.1, you pass in the
SQL expression as the third parameter when you create the search cursor, like this:
with arcpy.da.SearchCursor(featureClass, (populationField,), '"POP2008" > 100000') as cursor:
When you include a SQL expression in your SearchCursor constructor, you must
supply it as a string. This is where things can get tricky with quotation marks. SQL
requires single and double quotes in specific places, but you also need to enclose
the entire expression with quotes because it is a string. How do you keep from
getting confused?
You may have noticed that in the above example the SQL expression is enclosed in
single quotes, not double quotes: '"POP2008" > 100000'. In Python you
can use either single quotes or double quotes to enclose a string. Because I knew
the double quotes were required to surround the field name in the SQL statement, I
used single quotes to surround the entire string. This is not just to keep things easy
to read; the Python interpreter would read two double quotes in a row as an empty
string and then fail on the text that follows. Therefore, it was not an option to use
""POP2008" > 100000".
The situation gets a bit more difficult when your SQL expression has to use both
single and double quotes, for instance, when you query for a string value.
Suppose your script allows the user to enter the ID of a parcel and you need to find
it with a search cursor. Some of the parcel IDs include letters and others don't,
therefore you need to always treat the parcel ID as a string. Your SQL expression
would probably look like this: "PARCEL" = 'A2003KSW'.
Because your expression starts with double quotes and ends with single quotes,
which style of quotes do you use to enclose the entire expression? In this case, you
cannot simply enclose the entire expression in one style of quotes; you need to
break up the expression into separate strings. Take a close look at this example:
ID = arcpy.GetParameterAsText(0)
whereClause = '"PARCEL"' + " = '" + str(ID) + "'"
rows = arcpy.SearchCursor(featureClass, whereClause)
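Because the quoting is fiddly, you may find it helpful to build the expression in one small function and print the result before handing it to a cursor. The helper below is hypothetical (it is not part of arcpy) and assumes double-quote field delimiters, as used by shapefiles and file geodatabases, and a text value.

```python
def buildWhereClause(fieldName, value):
    """Return an expression such as "PARCEL" = 'A2003KSW'.
    Hypothetical helper; assumes double-quote field delimiters
    (shapefiles and file geodatabases) and a text value."""
    return '"' + fieldName + '"' + " = '" + str(value) + "'"


whereClause = buildWhereClause("PARCEL", "A2003KSW")
print(whereClause)
```

Note that this sketch does nothing special if the value itself contains a single quote, so treat it as a starting point rather than a finished tool.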
Field delimiters
In the examples above, field names are surrounded with double quotes (for
example, "STATE_NAME"). This is correct syntax for shapefiles and file
geodatabases, which are the only data types we'll use in this course. If you use
personal geodatabases in your daily work, there are different ways to delimit the
field name. If you're interested in the correct syntax for different data types, or
ways to make your script flexible for any data type, take a look at the topic SQL
reference for query expressions used in ArcGIS [5] in the ArcGIS Desktop Help.
Note: A few relational databases such as SQL Server 2008 expose spatial data
types that can be spatially queried with SQL. Support for these spatial types in
ArcGIS is still maturing, and in this course we will assume that the way to make a
spatial query is through Select Layer By Location. Since we are not using ArcSDE,
this is actually true.
Here's where you need to know a little bit about how ArcGIS works with layers and
selections. Suppose you want to select all states whose boundaries touch Wyoming.
In most cases you won't need to create an entirely new feature class to hold those
particular states; you probably only need to maintain those particular state records
in the computer's memory for a short time while you update some attribute.
ArcGIS uses the concept of feature layers to represent in-memory sets of records
from a feature class.
The Make Feature Layer tool creates a feature layer from some or all of the records
in a feature class. You can apply a SQL expression when you run Make Feature
Layer to narrow down the records included in the feature layer based on attributes.
You can subsequently use Select Layer By Location to narrow down the records in
the feature layer based on some spatial criteria.
Opening a search cursor on Wyoming and all states bordering it would take four
steps:
1. Use Make Feature Layer to make a feature layer of all US States. Let's call
this the All States layer.
2. Use Make Feature Layer to create a second feature layer of just Wyoming.
(To get Wyoming alone, you would apply an SQL expression when making
the feature layer.) Let's call this the Selection State layer.
3. Use Select Layer By Location to narrow down the All States layer (the layer
you created in Step 1) to just those states that touch the Selection State layer.
4. Open a search cursor on the All States layer. The cursor will include only
Wyoming and the states that touch it because there is a selection applied to
the All States layer. Remember that the feature layer is just a set of records
held in memory. Even if you called it the All States layer, it no longer
includes all the states once the selection is applied.
import arcpy
try:
    # Make a feature layer with all the US States
    arcpy.MakeFeatureLayer_management(usaLayer, "AllStatesLayer")
    arcpy.SelectLayerByLocation_management("AllStatesLayer", "BOUNDARY_TOUCHES", "SelectionStateLayer")
except:
    print arcpy.GetMessages()
You can choose from many spatial operators when running SelectLayerByLocation.
The code above uses "BOUNDARY_TOUCHES". Other available relationships are
"INTERSECT", "WITHIN_A_DISTANCE" (may save you a buffering step),
"CONTAINS", "CONTAINED_BY", and others.
Once you open the search cursor on your selected records, you can perform
whatever action you want on them. The code above just prints the state name, but
more likely you'll want to summarize or update attribute values. You'll learn how to
write attribute values later in this lesson.
Notice the above code example deletes the feature layers using the Delete tool and
the cursors using the del command.
Feature layers and cursors can maintain locks on your data, preventing other
applications from using the data until your script is done. Arcpy is supposed to
clean up cursors and feature layers at the end of the script, but it's a good idea to
delete them yourself in case this doesn't happen or in case there is a crash. In the
case above, the except block will catch a crash, then the script will continue and run
the Delete tool.
If you're using the arcpy data access module (arcpy.da), the above example could be
written as follows:
import arcpy
try:
    # Make a feature layer with all the US States
    arcpy.MakeFeatureLayer_management(usaLayer, "AllStatesLayer")
    arcpy.SelectLayerByLocation_management("AllStatesLayer", "BOUNDARY_TOUCHES", "SelectionStateLayer")
except:
    print arcpy.GetMessages()
You might have noticed that this sample is a little more brief. You don't have to
delete the cursor because the "with" statement cleans it up for you. There is no use
of getValue; rather, the Row object "row" returns only one field ("NAME"), which is
accessed using its index position in the list of fields. Since there's only one field,
that index is 0, and the syntax looks like this: row[0].
To keep things short, I've written the example using a "for" loop. Remember that
you could also use a for loop with the original arcpy cursors in ArcGIS 10.0.
Required reading
Before you move on, examine the following tool reference pages. You can ignore the
Command Line Syntax section, but pay particular attention to the Usage Tips and
the Script Examples.
records
In the following sections you'll learn about both of these cursors and get some tips
for using them.
Required reading
The ArcGIS Desktop Help has some explanation of cursors. Get familiar with this
help now, as it will prepare you for the next sections of the lesson. You'll also find it
helpful to return to the code examples while working on Project 3:
Also follow the three links in the table at the beginning of the above topic. These
briefly explain the InsertCursor [9], SearchCursor [10], and UpdateCursor [11] and
provide a code example for each. You've already worked with SearchCursor, but
closely examine the code examples for all three cursor types and see if you can
determine what is happening in each.
a good way to narrow down the rows you want to edit if you are not
3. Modify the field values in the row that need updating (see tips below).
When you create an UpdateCursor and advance it to a row, you can then modify
field values. There are two ways you can modify a value:
Use the syntax Row.<the field name> = <the new value>. For example:
Using the second way, Row.setValue(), is especially useful if the field name is a
variable, and is a good way to avoid hard-coding field names into your script.
Row.setValue() is similar to Row.getValue() that you use with search cursors, but
it's important to remember that with Row.setValue(), you have to supply two
arguments: the field to update, and the new value for that field.
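To see the difference between the two ways side by side, here is a plain-Python sketch. The ToyRow class is a made-up stand-in that only imitates setValue() and getValue(); it is not the real arcpy row, and BANK_NAME is a hypothetical field name.

```python
class ToyRow(object):
    """Toy stand-in for a row object; NOT the real arcpy class."""

    def setValue(self, fieldName, value):
        setattr(self, fieldName, value)

    def getValue(self, fieldName):
        return getattr(self, fieldName)


row = ToyRow()

# Way 1: the field name is written directly in the code
row.BANK_NAME = "Old Bank"

# Way 2: the field name is held in a variable
affectedField = "BANK_NAME"
row.setValue(affectedField, "New Bank")

print(row.getValue(affectedField))
```

The second way is the one to reach for when the field name arrives as a script parameter, because you cannot write a variable after the dot.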
Example
The script below performs a "search and replace" operation on an attribute table.
For example, suppose you have a dataset representing local businesses, including
banks. One of the banks was recently bought out by another bank. You need to find
every instance of the old bank name and replace it with the new name. This script
could perform that task automatically.
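import arcpy
# Retrieve the input parameters as text: the feature class, the affected field,
# the old value, and the new value (the parameter order here is an assumption)
fc = arcpy.GetParameterAsText(0)
affectedField = arcpy.GetParameterAsText(1)
oldValue = arcpy.GetParameterAsText(2)
newValue = arcpy.GetParameterAsText(3)
# Create the SQL expression for the update cursor. Here this is
# done on a separate line for readability.
queryString = '"' + affectedField + '" = ' + "'" + oldValue + "'"
# Create the update cursor and advance the cursor to the first row
rows = arcpy.UpdateCursor(fc, queryString)
row = rows.next()
# Perform the update and move to the next row as long as there are
# rows left
while row:
    row.setValue(affectedField, newValue)
    rows.updateRow(row)
    row = rows.next()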
Notice that this script is relatively flexible because it gets all the parameters as text
and uses Row.setValue() instead of hard-coding the field name. However, this
script can only be run on string variables because of the way the query string is set
up. Notice that the old value is put in quotes, like this: "'" + oldValue + "'".
Handling other types of variables, such as integers, would have made the example
longer.
Dataset locking
You can remove the possibility of locks affecting your work by deleting your cursors
when you are done using them. Use Python's del statement to do this. You can
even delete multiple objects on the same line. Notice that our find and replace
example also deletes the row just to be safe:
del row, rows
If you forget to delete your cursor, the script may maintain a lock on your data even
when the script has finished executing. If you think that a lock from your script is
affecting your dataset (by preventing you from viewing it, making it look like all
rows have been deleted, and so on), you must close PythonWin to remove the lock.
If you think that ArcGIS has a lock on your data, check to see if ArcMap or
ArcCatalog are using the data in any way. This could possibly occur through having
an open edit session on the data, having the data open in the Preview tab in
ArcCatalog, or having the layer in the table of contents in an open map document
(MXD).
For the Esri explanation of how locking works, you can review the section "Cursors
and locking" in the topic Accessing data using cursors [8] in the ArcGIS Desktop
Help.
When you use the arcpy data access module to update records, you do not use the
setValue method. Instead, you just use an = sign to set the value in the row object.
Take a look at how the above "search and replace" example would look using
arcpy.da:
# Create the update cursor and update each row returned by the SQL
# expression
with arcpy.da.UpdateCursor(fc, (affectedField,), queryString) as cursor:
    for row in cursor:
        row[0] = newValue
        cursor.updateRow(row)
Here it's critical to understand the tuple of affected fields that you pass in when you
create the update cursor. In this example, there is only one affected field (which we
named affectedField), so its index position is 0 in the tuple. Therefore, you set that
field value using row[0] = newValue. Cursor cleanup is not required at the end of
the script because this is accomplished through the "with" statement.
Set attributes on the row. This can include assigning its geometry, or shape.
As with the update cursor, you can avoid data locking problems by deleting the
insert cursor when you've finished using it.
Insert cursors differ from search and update cursors in that you cannot provide an
SQL expression when you create the insert cursor. This makes sense because an
insert cursor is only concerned with adding records to the table. It does not need to
"know" about the existing records or any subset thereof.
Example
The example below uses an insert cursor to create one new point in the dataset and
assign it one attribute: a string description. This script could potentially be used
behind a public-facing 311 [12] application, in which members of the public can
click a point on a Web map and type a description of an incident that needs to be
resolved by the municipality, such as a broken streetlight.
# Create point
inPoint = arcpy.Point(inX, inY)
In the above example, the insert cursor is called rowInserter and the row is called
newIncident. Take a moment to ensure that you know exactly where the following
things are happening in the code:
Besides creating the insert cursor, the idea of creating geometry (the point) may be
new to you. In this example, arcpy creates a new Point object whose X and Y
coordinates we can assign right at the time the point is created. Our script gets
those original X and Y coordinate values as input parameters. If this script really
were powering an interactive 311 application, the X and Y values could be derived
from a point a user clicked on the Web map.
Once you create the geometry, you have to write it to the special field that the
dataset uses to hold geometry. This field is usually called SHAPE, and for simplicity
the field name SHAPE is hard-coded in the lesson examples. If you need your code
to be bullet-proof or to work with many types of databases, you can
programmatically determine the name of the geometry field by calling the Describe
method on the feature class, then retrieving the ShapeFieldName property.
Notice that the description attribute is assigned using Row.setValue(), the same
method you used with the update cursor. Since both types of cursors are used for
writing data, setValue() can be used with both.
The arcpy data access module in ArcGIS 10.1 contains insert cursors. You can use
them with a "with" statement like you used the other cursors:
When you insert a row using arcpy.da, you provide a tuple (or list) of values for
the new row. The order of these values must match the order of the
tuple of affected fields you provided when you created the cursor.
Another thing you might have noticed is that the string "SHAPE@XY" is used to
specify the SHAPE field. You might expect that this would just be "SHAPE," but
arcpy.da provides a list of "tokens" that you can use if the field will be specified in a
certain way. In our case, it would be very convenient just to provide the X and Y
values of the points using a tuple of coordinates. It turns out that the token
"SHAPE@XY" allows you to do just that. See help topic for InsertCursor [13] to
learn about other tokens you can use.
Putting this all together, the example creates a tuple of affected fields:
("SHAPE@XY", "COMMENTS"). When the row is inserted, the values for these
items are provided in the same order, wrapped in a single tuple:
cursor.insertRow(((inX, inY), inDescription)).
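You can check the ordering rule in plain Python before involving arcpy at all. The sketch below pairs each field in the tuple with the value it would receive; the coordinates and description are hypothetical input values, and no cursor is actually created.

```python
fields = ("SHAPE@XY", "COMMENTS")

# Hypothetical input values for the illustration
inX = -120.5
inY = 46.6
inDescription = "Broken streetlight"

# One value per field, in the same order as the fields tuple
newRow = ((inX, inY), inDescription)

# Pair each field with the value it will receive
pairs = dict(zip(fields, newRow))
print(pairs["SHAPE@XY"])
print(pairs["COMMENTS"])
```

If the two tuples ever have different lengths, or the description lands under "SHAPE@XY", you know the order needs fixing before you run the insert.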
Readings
Take a few minutes to read Zandbergen 7.1 - 7.3 to reinforce your learning about
cursors.
It's unlikely that you will ever need to cycle through a raster cell by cell on your own
using Python, and that technique is outside the scope of this course. Instead, you'll
most often use predefined tools to read and manipulate rasters. These tools have
been designed to operate on various types of rasters and perform the cell-by-cell
computations so that you don't have to.
In ArcGIS, most of the tools you'll use when working with rasters are in either the
Data Management > Raster toolset or the Spatial Analyst toolbox. These
tools can reproject, clip, mosaic, and reclassify rasters. They can calculate slope,
hillshade, and aspect rasters from DEMs.
The Spatial Analyst toolbox also contains tools for performing map algebra on
rasters. Multiplying or adding many rasters together using map algebra is
important for GIS site selection scenarios. For example, you may be trying to find
the best location for a new restaurant and you have seven criteria that must be met.
If you can create a boolean raster (containing 1 for suitable, 0 for unsuitable) for
each criterion, you can use map algebra to multiply the rasters and determine which
cells receive a score of 1, meeting all the criteria. (Alternatively you could add the
rasters together and determine which areas received a value of 7.) Other courses in
the Penn State GIS certificate program walk through these types of scenarios in
more detail.
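The multiply-versus-add idea can be tried on a toy scale in plain Python. The sketch below is not arcpy code: it treats small nested lists as stand-ins for boolean rasters, and the two grids and the helper names (combineCells, product) are invented for illustration.

```python
# Two tiny "boolean rasters" as nested lists (1 = suitable, 0 = unsuitable).
# These grids are invented for illustration only.
nearRoads = [[1, 1, 0],
             [0, 1, 1]]
lowSlope = [[1, 0, 0],
            [1, 1, 1]]
criteria = [nearRoads, lowSlope]


def combineCells(rasters, cellFunction):
    """Apply cellFunction to the stack of values at each cell position."""
    numRows = len(rasters[0])
    numCols = len(rasters[0][0])
    return [[cellFunction([raster[row][col] for raster in rasters])
             for col in range(numCols)]
            for row in range(numRows)]


def product(values):
    """Multiply a list of numbers together."""
    result = 1
    for value in values:
        result *= value
    return result


# Multiplying: a cell is 1 only where every criterion is 1
multiplied = combineCells(criteria, product)

# Adding: a cell equals the number of criteria where all of them are met
added = combineCells(criteria, sum)

print(multiplied)
print(added)
```

With two criteria, the cells that score 1 in the multiplied grid are the same cells that score 2 in the added grid. The added grid also shows how close the remaining cells came to qualifying.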
The tricky part of map algebra is constructing the expression, which is a string
stating what the map algebra operation is supposed to do. ArcGIS Desktop contains
interfaces for constructing an expression for one-time runs of the tool. But what if
you want to run the analysis several times, or with different datasets? It's
challenging even in ModelBuilder to build a flexible expression into the map
algebra tools. With Python, you can manipulate the expression as much as you
need.
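As a rough sketch of what "manipulating the expression" can mean, the function below assembles an expression string from a raster name and two threshold values. The function name, the raster name, and the exact expression syntax are assumptions made for illustration; check the documentation of the tool you are calling for the syntax it actually expects.

```python
def buildRangeExpression(rasterName, minValue, maxValue):
    """Return an expression string selecting cells whose value is
    greater than minValue and less than maxValue.

    The thresholds may arrive as text (for example from a script tool
    parameter), so they are cast to integers before being used.
    """
    return "({0} > {1}) & ({0} < {2})".format(rasterName,
                                              int(minValue),
                                              int(maxValue))


# Hypothetical inputs: the same function serves any raster or range
print(buildRangeExpression("elevation", "3000", "3500"))
print(buildRangeExpression("rainfall", 20, 40))
```

Because the expression is built by a function, running the analysis on another dataset or with other thresholds is a matter of changing the arguments rather than editing the expression by hand.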
Example
But what if you don't want those 0 values cluttering your raster? This script gets rid
of the 0's by running the Reclassify tool with a very simple remap table stating that
input raster values of 1 should remain 1. Because 0 is left out of the remap table, it
gets reclassified as NoData:
import arcpy
from arcpy.sa import *
arcpy.env.overwriteOutput = True
arcpy.env.workspace = "C:/Data/Elevation"
arcpy.CheckOutExtension("Spatial")
# Set up remap table and call Reclassify, leaving all values not 1 as NODATA
remap = RemapValue([[1,1]])
remappedRaster = Reclassify(tempRaster, "Value", remap, "NODATA")
arcpy.CheckInExtension("Spatial")
Read the example above carefully, as many times as necessary for you to
understand what is occurring in each line. Notice the following things:
There is one intermediate raster (in other words, not the final output) that
you don't want to have cluttering the output directory. This is referred to as
ArcCatalog after you run the script, but it goes away after you close
PythonWin.
Notice the expression contains > and < signs, as well as the & operator. You
you have to cast the input to an integer before you can do map algebra with
it. If we just used inMin, the software would see "3000," for example, and
have to use int(inMin). Then the software sees the number 3000 instead of
Map algebra can perform many types of math and operations on rasters, not
limited to "greater than" or "less than." For example, you can use map
Analyst extension to a certain number of users, you must check out the
extension in your script, then check it back in. Notice the calls to
Notice that the script doesn't call these spatial analyst functions using arcpy.
directly. This can be confusing for beginners (and old pros) so be sure to
check the Esri samples closely for each tool you plan to run. Follow the
table in this example was created. Whenever you run Reclassify, you have to
create a remap table stating how the old values should be reclassified to new
values. This example has about the simplest remap table possible, but if you
want a more complex remap table you'll need to study the documentation.
The above example script doesn't use any file extensions for the rasters. This is
because the rasters use the Esri GRID format, which doesn't use extensions. If you
have rasters in another format, such as .jpg, you will need to add the correct file
extension. If you're unsure of the syntax to use when providing a raster file name,
highlight the raster in ArcCatalog and note how the path appears in the Location
bar.
If you look at rasters such as an Esri GRID in Windows Explorer, you may see that
they actually consist of several supporting files with different extensions,
sometimes even contained in a series of folders. Don't try to guess one of the files to
reference; instead, use ArcCatalog to get the path to the raster. When you use this
path, the supporting files and folders will work together automatically.
Figure 3.4 When using ArcCatalog for the path to a raster, the supporting files
and folders work together automatically.
Readings
Zandbergen chapter 9 covers a lot of additional functions you can perform with
rasters and has some good code examples. You don't have to understand everything
in this chapter, but it might give you some good ideas for your final project.
Don't spend so much time on the practice exercises that you neglect Project 3.
However, successfully completing the practice exercises will make Project 3 much
easier and quicker for you.
The data for the Lesson 3 practice exercises is very simple and, like some of the
Project 2 practice exercise data, was derived from Washington State Department of
Transportation [15] datasets. Download the data here [16].
Using the discussion forums is a great way to work towards figuring out the
practice exercises. You are welcome to post blocks of code on the forums relating to
these exercises.
When completing the actual Project 3, avoid posting blocks of code longer than a
few lines. If you have a question about your Project 3 code, please e-mail the
instructor, or you can post general questions to the forums that don't contain more
than a few lines of code.
Getting ready
If the practice exercises look daunting to you, you might start by practicing with
your cursors a little bit using the sample data:
Try to loop through the CityBoundaries and print the name of each city.
Try using an SQL expression with a search cursor to print the OBJECTIDs of
all the park and rides in Chelan county (notice there is a County field that
Use an update cursor to find the park and ride with OBJECTID 336. Assign
The objective
You want to find out which cities contain park and ride facilities and what
percentage of cities have at least one facility.
to "False" by default. Your job is to mark this field "True" for every city
containing at least one park and ride facility within its boundaries.
Your script should also calculate the percentage of cities that have a park
and ride facility and print this figure for the user.
You do not have to make a script tool for this assignment. You can hard-code the
variable values. Try to group the hard-coded string variables at the beginning of the
script.
For the purposes of these practice exercises, assume that each point in the
ParkAndRide dataset represents one valid park and ride (ignore the value in the
TYPE field).
Tips
You can jump into the assignment at this point, or read the following tips to give
you some guidance.
narrow down your cities feature layer list to only the cities that contain park
and rides.
To calculate the percentage of cities with park and rides, you'll need to know
the total number of cities. You can use the GetCount [17] tool to get a total
without writing a loop. Beware that you may have to monkey around with
the output a bit to get it in a format you can use. See the example in the Esri
Similarly, you may have to play around with your Python math a little to get
import arcpy
arcpy.env.overwriteOutput = True
cityBoundaries = "C:\\Data\\Lesson3PracticeExerciseA\\Washington.gdb\\CityBoundaries"
parkAndRide = "C:\\Data\\Lesson3PracticeExerciseA\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasParkAndRide"
citiesWithParkAndRide = 0
try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")
except:
    print "Could not create feature layers"
try:
    # Narrow down the cities layer to only the cities that contain a park and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS",
                                           "ParkAndRideLayer")
Alternate solution using the arcpy data access module in ArcGIS 10.1
The following solution shows how you could take advantage of the arcpy data
access module in ArcGIS 10.1 to solve this problem. The general approach is the
same as the other solution above.
import arcpy
arcpy.env.overwriteOutput = True
cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasParkAndRide"
citiesWithParkAndRide = 0
try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")
except:
    print "Could not create feature layers"
try:
    # Narrow down the cities layer to only the cities that contain a park and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS",
                                           "ParkAndRideLayer")
    # Count the total number of cities (this tool saves you a loop)
    numCitiesCount = arcpy.GetCount_management(cityBoundaries)
    numCities = int(numCitiesCount.getOutput(0))
The objective
In Practice Exercise B your assignment is to find which cities have at least two park
and rides within their boundaries.
Mark the "HasTwoParkAndRides" field as "True" for all cities that have at
Calculate the percentage of cities that have at least two park and rides within
Tips
Create an update cursor for the cities and start a loop that will examine each
city.
Make a feature layer with all the park and ride facilities.
Make a feature layer for just the current city. You'll have to make an SQL
get values, so you can use it to get the name of the current city.
the current city. Your result will be a narrowed-down park and ride feature
layer. This is different from Practice Exercise A where you narrowed down
Open a search cursor on your now narrowed-down park and ride layer. Your
through all the park and rides contained by the one city boundary. If there
were no park and rides, calling next() on your search cursor would return
nothing. One way to see if there are two park and rides is to call next() twice
within an if statement (if there's one found, check for a second). If you get a
result, you know there are at least two park and rides and you can use your
update cursor to mark the row "True." You can then go on to the next city.
Another approach is to run the GetCount tool to find out how many features
Be sure to delete your feature layers before you loop on to the next city. For
example: arcpy.Delete_management("ParkAndRideLayer")
Keep a tally for every row you mark "True" and calculate the percentage as you did in
Practice Exercise A.
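The "call next() twice" idea from the tips can be practiced without any GIS data. The sketch below uses plain Python lists as stand-ins for the rows a search cursor would return; the function name hasAtLeastTwo and the sample values are invented for illustration. It stops as soon as a second item is found instead of counting every item.

```python
# Unique marker meaning "the iterator had nothing more to give"
_MISSING = object()


def hasAtLeastTwo(rows):
    """Return True if rows yields two or more items."""
    iterator = iter(rows)
    # Look for a first item
    if next(iterator, _MISSING) is _MISSING:
        return False
    # A first item was found, so look for a second one
    return next(iterator, _MISSING) is not _MISSING


# Lists standing in for the park and rides selected in three cities
print(hasAtLeastTwo([]))
print(hasAtLeastTwo(["Lot A"]))
print(hasAtLeastTwo(["Lot A", "Lot B", "Lot C"]))
```

The same early-exit reasoning applies to a real cursor: once a second row has been found, there is no need to read the rest.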
Lesson 3 Practice Exercise B
Solution
Below is one possible solution to Practice Exercise B, with comments. If you find a
more efficient way to code a solution, please share it through the discussion
forums.
import arcpy
arcpy.env.overwriteOutput = True
cityBoundaries = "C:\\Data\\Lesson3PracticeExerciseB\\Washington.gdb\\CityBoundaries"
parkAndRide = "C:\\Data\\Lesson3PracticeExerciseB\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"
cityIDStringField = "CI_FIPS"
citiesWithTwoParkAndRides = 0
numCities = 0
while city:
    # Narrow down the park and ride layer by selecting only the park and rides
    # in the current city
    arcpy.SelectLayerByLocation_management("ParkAndRideLayer",
                                           "CONTAINED_BY", "CurrentCityLayer")
    try:
        # Try to get the first park and ride in the list
        selectedParkAndRideRows = arcpy.SearchCursor("ParkAndRideLayer")
        firstParkAndRide = selectedParkAndRideRows.next()
        # If a first park and ride was found, look for a second one
        if firstParkAndRide:
            secondParkAndRide = selectedParkAndRideRows.next()
            # Mark the park and ride field TRUE if a second park and ride was found
            if secondParkAndRide:
                city.setValue(parkAndRideField, "TRUE")
    finally:
        # Delete feature layer to get ready for next run of loop
        arcpy.Delete_management("CurrentCityLayer")
# Clean up update cursor and feature layer containing all park and rides
del cityRows
arcpy.Delete_management("ParkAndRideLayer")
# Calculate and report the number of cities with two park and rides
percentCitiesWithParkAndRide = ((1.0 * citiesWithTwoParkAndRides) / numCities) * 100
Below is an explanatory video of the solution. Note that the video was recorded
showing a slightly less efficient technique of making the "ParkAndRideLayer"
feature layer each time the loop runs. Since recording this video, I have discovered
that you only have to create ParkAndRideLayer once (before the loop, as shown
above), then the SelectLayerByLocation just performs a new selection on it each
time the loop runs.
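The percentage line at the end of the solution multiplies the tally by 1.0 before dividing. In the Python 2 interpreter that ships with ArcGIS 10.x, dividing one integer by another discards the remainder, so a tally divided by a larger total would come out as 0. The sketch below isolates that calculation; the function name percentOf and the sample counts are invented for illustration.

```python
def percentOf(part, total):
    """Return part / total as a percentage.

    Multiplying by 1.0 first forces floating-point division, so the
    result is not truncated to a whole number under Python 2.
    """
    if total == 0:
        # Avoid dividing by zero when there is nothing to count
        return 0.0
    return ((1.0 * part) / total) * 100


# Hypothetical tallies: 1 of 4 cities, then 1 of 3 cities
print(percentOf(1, 4))
print(round(percentOf(1, 3), 1))
```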
Alternate solution using the arcpy data access module in ArcGIS 10.1
The following alternate solution uses the arcpy data access module from ArcGIS
10.1. It also takes the approach of running the GetCount tool to figure out whether
two or more park and rides were selected for each city. Also, as an improvement on
the above solution, it creates the ParkAndRideLayer just once, before the loop runs.
import arcpy
arcpy.env.overwriteOutput = True
cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"
cityIDStringField = "CI_FIPS"
citiesWithTwoParkAndRides = 0
numCities = 0
try:
    # Narrow down the park and ride layer by selecting only the park and rides
    # in the current city
    arcpy.SelectLayerByLocation_management("ParkAndRideLayer",
                                           "CONTAINED_BY", "CurrentCityLayer")
    # If more than one park and ride found, update the row to TRUE
    if numSelectedCities >= 2:
        city[1] = "TRUE"
finally:
    # Delete current cities layer to prepare for next run of loop
    arcpy.Delete_management("CurrentCityLayer")
numCities += 1
The application has already been wildly successful in its first few months of
operation and your department has amassed a large amount of point data showing
graffiti incidents. However, the police chief is now interested in seeing an
aggregation of this data by patrol zones. The goal is to set a priority on each zone
and allot more resources to fighting graffiti in the high priority zones.
Your task
You have a point feature class of graffiti incidents and a polygon feature class of
patrol zones with some empty attributes already created for you. You must write a
script that updates the attributes of the patrol zones with:
The number of graffiti incidents falling within the patrol zone. This is an
The priority ranking for the patrol zone. This is a string that goes in the
PRIORITY field. You will derive this string using some simple math that
compares the number of incidents in the zone with the area of the zone.
HIGH CONCERN— At least 12 but less than 15 incidents per square mile
SOME CONCERN— At least 6 but less than 12 incidents per square mile
Deliverables
Your Python script (.py file) that performs the above tasks
A screenshot of your patrol zones attribute table after running the script.
A short writeup (about 300 words) describing what you learned and how
you approached the problem. If you included any "over and above" efforts,
please describe these here so the graders know to look for them.
Challenges
In this project, you need to manage an update cursor and perform repeated
SelectLayerByLocation operations in order to figure out how many incidents each
zone contains. You then need to use the number of incidents to calculate the
incidents per square mile, and make a decision about which priority to assign.
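The arithmetic behind "incidents per square mile" is sketched below in plain Python. It assumes the zone area is available in square meters (as it is for this dataset) and uses the standard conversion of 2,589,988.11 square meters per square mile. The function name and the sample numbers are invented for illustration, and the sketch deliberately stops short of assigning a priority.

```python
# One mile is 1609.344 meters, so one square mile is about this many square meters
SQUARE_METERS_PER_SQUARE_MILE = 2589988.11


def incidentsPerSquareMile(incidentCount, areaSquareMeters):
    """Convert an area in square meters to square miles and return the
    number of incidents per square mile."""
    areaSquareMiles = areaSquareMeters / SQUARE_METERS_PER_SQUARE_MILE
    return incidentCount / areaSquareMiles


# Hypothetical zone: 30 incidents in a zone of about two square miles
density = incidentsPerSquareMile(30, 5179976.22)
print(round(density, 2))
```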
Approach
The approach you take for finding the number of incidents per zone should be very
similar to what you did in Lesson 3 Practice Exercise B to find the number of park
and rides inside a city boundary. This time, instead of checking whether two items
were found, you will need to get a count of the total number of incidents selected.
This is easily done using the GetCount tool. I encourage you to spend some time
studying Practice Exercise B and its accompanying solution.
There are many ways to approach the script, and the majority of available credit
will be awarded just for solving the problem. More points will be awarded for
scripts that solve the problem in the most efficient way possible. For example, if
you can get a job done with one loop instead of two, this is more efficient. If you
loop through 10 objects instead of 100 and accomplish the same thing, this is more
efficient. Look for ways to economize what your script is doing. It is a wonderful
feeling to delete unnecessary code.
Hints
Do the practice exercises. It is worth your time to study their solutions to the
Before you begin working, manually make a copy of the patrol areas and
you test, you need a convenient way to restore the data to its original state.
After each test run, you can make a new copy of the dataset. If you fail to do
this, you'll have to download the data and unzip it again each time you re-
Not only do the cursors place locks on your data, but the feature layers do as
the feature layers; however, at the time of this writing I had still not
determined how to get rid of all the locks. Be prepared to close PythonWin
and/or ArcCatalog or ArcMap if you're having trouble restoring your original
Use the debugging toolbar to help you. Set up watches on your variables and
observe how your variable values change as you step through each line of
their time.
File geodatabases always have an area field called SHAPE_Area. Use this to
get the initial area, which for this dataset will be in square meters. Divide (/)
If you get stuck or burned out, you may want to take a break and read
through the lesson again, front to back. The lesson material will take on a
whole new meaning once you have actually tried to write some code using
the described techniques, and you may read something that will help you get
past your brick wall. Also, the ArcGIS Help topics linked to in the lesson are
very helpful for this assignment, especially the ones about making feature
Note: Although I did have a pre-college job painting over graffiti in this geographic
area, the data for this project is completely fabricated and does not represent actual
graffiti incidents or police patrol zones.
Please address questions and comments about this resource to the site editor.
Links:
[1] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s50000000n000000.htm
[2] http://en.wikipedia.org/wiki/Shapefile
[3] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Using_fields_and_indexes/002z00000019000000/
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s50000002t000000.htm
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s500000033000000.htm
[6] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0017/00170000006p000000.htm
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0017/001700000072000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z0000001q000000.htm
[9] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000038000000.htm
[10] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000039000000.htm
[11] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v0000003m000000.htm
[12] http://en.wikipedia.org/wiki/3-1-1
[13] http://resources.arcgis.com/en/help/main/10.1/018w/018w0000000t000000.htm
[14] http://help.arcgis.com/en/arcgisdesktop/10.0/help/005m/005m0000001s000000.htm
[15] http://www.wsdot.wa.gov/mapsdata/GeoDataCatalog/default.htm
[16] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson3PracticeExercises.zip
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//0017000000n7000000.htm
[18] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Project3.zip
Final Project proposal assignment
At some point during this course you've hopefully felt "the lightbulb go on"
regarding how you might apply the lesson material to your own tasks in the GIS
workplace. To conclude this course, you will be expected to complete an individual
project that uses Python automation to make some GIS task easier, faster, or more
accurate.
The project goal is up to you, but it is preferably one that relates to your current
field of work or a field in which you have a personal interest. Since you're defining
the requirements from the beginning, there is no "over and above" credit factored
into this project grade. The number of lines of code you write is not as important as
the problem you solved. However, we encourage you to propose a project that
meets or even slightly exceeds your relative level of experience with programming.
You will have two weeks at the end of the quarter to dedicate completely toward the
project and the Review Quiz. This is your chance to apply what you've learned
about Python to a problem that really interests you.
One week into Lesson 4 you are required to submit a project proposal to the Final
Project Proposal Drop Box in ANGEL. This proposal must clearly explain:
2. How your proposed solution will make the task easier, faster, and/or more
accurate. Also explain why your task could not simply be accomplished using
the "out-of-the-box" tools from Esri, or why your script gives a particular
3. The deliverables you will submit for the project. A well-documented script
tool is highly encouraged. If the script requires data, describe how the
instructors will be able to evaluate your script. Possible solutions are to zip a
sample dataset for the instructors, demonstrate your script during an Adobe
Connect session, or make the script flexible enough that it could be used
As you work on your project, you're encouraged to seek help from all resources
discussed in this class, including existing code samples and scripts on the Internet.
If you re-use any long sections of code that you found on the Internet, please
thoroughly explain in your project writeup how you found it, tested it, and
extracted only the parts you needed.
Project ideas
If you're having trouble thinking up a project, you can derive a proposal from one
of the suggestions here. You may have to spend a little bit of time acquiring or
making up some test datasets to fit these project ideas. I also suggest that you read
through the Lesson 4 material before selecting a project, just so you have a better
idea of what types of things are possible with Python.
Compare dataset statistics: Make a tool or script that takes two feature
classes as input, along with a field name. The tool should check whether the
field is numeric and exists in both feature classes. If both these conditions
are met, the tool should calculate statistics for that field for both feature
classes and report the difference. Statistics could be sum, average, standard
that reads two feature classes based on a key field (such as OBJECTID). The
tool should figure out which features only appear in one of the feature
classes and write them to a third feature class. As a variation on this, the tool
could figure out which features appear in both feature classes and write
them to a third feature class. You could even allow the tool user to set a
and reports the difference. For example, this tool might compare "Acres of
in 2009."
Find and replace: Make a tool flexible enough to search for any term in
any field in a feature class and replace it with another user-provided term.
Ensure in your code that users cannot modify the critical fields' OBJECTIDs
or SHAPEs. Also ensure that partial strings are supported, such that if the
Parse KML, XML, or JSON and write to a feature class: Make a tool
or script that reads a KML file, or an XML or JSON response from a Web
service, and writes the geometries to a feature class. (You'll get some
Concatenate name fields: Write a tool or script that takes a feature class
parameters that represent fields in the feature class. Your tool should add a
new field for each record that contains the first, middle, and last names
separated by one space. Your tool should intuitively handle blank records
that takes a raw elevation dataset (such as a DEM), clips it to your study
area, creates both hillshade and slope rasters, and projects them into your
organization's most commonly used projection. Expose the study area
Parse raw textual data and write to feature class: Find some data
format with no Esri feature class available (for example, weather station
readings or GPS tracks). If you need to, you can copy the HTML out of the
Web page and paste it in a .txt file to help you get started. Read the .txt file
and write the data to a new feature class. (You'll get some exposure to
Make an MXD repair tool: Make a tool that takes an old and new
workspace path as inputs and then repairs all the broken data links in an
MXD. (You can do this using the arcpy.mapping module described in Lesson
4.)
Make a "map book": Make a tool that opens a series of MXDs, data
frames, or map extents and constructs a multi-page PDF from them (You
Penn State Professional Masters Degree in GIS: Winner of the 2009 Sloan
Consortium award for Most Outstanding Online Program
© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons Attribution-Non-
Commercial-Share-Alike 3.0 License and is freely available through Penn State's
College of Earth and Mineral Sciences' Open Educational Resources Initiative.
Lesson 4 checklist
Lesson 4 explores some more advanced Python concepts, including reading and
parsing text. To complete Lesson 4, do the following:
1. One week into the lesson, submit your Final Project proposal [1] to the
instructors using the ANGEL e-mail system. For the exact due date, see the
3. Read Zandbergen chapters 7.6, 8.1 - 8.6, 10, and 12.1 - 12.5. In the online
drop box.
Functions exist in many programming languages, and each has its way of defining a
function. In Python, you define a function using the def statement. Each line in the
function that follows the def is indented. Here's a simple function that reads the
radius of a circle and reports the circle's approximate area. (Remember that the
area is equal to pi [3.14159...] multiplied by the square [** 2] of the radius.)
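A minimal sketch of such a function, consistent with the description above and with the findArea name used in the rest of this section, might look like this (the comments are additions for illustration):

```python
def findArea(radius):
    # Area of a circle: pi (approximated as 3.14159) times the radius squared
    area = 3.14159 * radius ** 2
    # Hand the result back to whoever called the function
    return area


# Call the function with a radius of 3
print(findArea(3))
```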
Notice from the above example that functions can take parameters, or arguments.
When you call the above function, you supply the radius of the circle in
parentheses. The function returns the area (notice the return statement, which is
new to you).
Thus, to find the area of a circle with a radius of 3 inches, you could make the
function call findArea(3) and get the return value 28.27431 (square inches).
It's common to assign the returned value to a variable and use it later in your code.
For example, you could add these lines in the Interactive Window:
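A sketch of such lines follows. The variable name aLargerCircle is illustrative, and findArea is repeated here only so the snippet runs on its own:

```python
def findArea(radius):
    # Same approximation of pi as in the description of the function
    return 3.14159 * radius ** 2


# Store the returned value in a variable, then use it later
aLargerCircle = findArea(4)
print(aLargerCircle)
```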
A function is not required to return any value. For example, you may have a
function that takes the path of a text file as a parameter, reads the first line of the
file, and prints that line to the Interactive Window. Since all the printing logic is
performed inside the function, there is really no return value.
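The sketch below shows what such a function could look like. The function name printFirstLine and the sample file contents are invented for illustration, and the snippet writes its own throwaway text file so it can run anywhere.

```python
import os
import tempfile


def printFirstLine(filePath):
    """Print the first line of a text file; nothing is returned."""
    with open(filePath, "r") as textFile:
        firstLine = textFile.readline()
    print(firstLine.rstrip("\n"))


# Create a small throwaway file so the sketch has something to read
handle, samplePath = tempfile.mkstemp(suffix=".txt")
os.close(handle)
with open(samplePath, "w") as sampleFile:
    sampleFile.write("StationID,X,Y\n101,-122.3,47.6\n")

# The call prints the header line; the value handed back is None
result = printFirstLine(samplePath)
print(result is None)

# Clean up the throwaway file
os.remove(samplePath)
```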
Neither is a function required to take a parameter. For example, you might write a
function that retrieves or calculates some static value. Try this in the Interactive
Window:
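A sketch of such a function, using the getCurrentPresident name and the value discussed in the next section, could be:

```python
def getCurrentPresident():
    # No parameters are needed: the function just hands back a fixed value
    return "Barack Obama"


president = getCurrentPresident()
print(president)
```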
Modules
You may be wondering what advantage you gain by putting the above
getCurrentPresident() logic in a function. Why couldn't you just define a string
currentPresident and set it equal to "Barack Obama"? The big reason is reusability.
Suppose you maintain 20 different scripts, each of which works with the name of
the current President in some way. You know that the name of the current
President will eventually change. Therefore, you could put this function in what's
known as a module file and reference that file inside your 20 different scripts.
When the name of the President changes, you don't have to open 20 scripts and
change them. Instead, you just open the module file and make the change once.
You may remember that you've already worked with some of Python's built-in
modules. The Hi Ho! Cherry O example in Lesson 2 imported the random module
so that the script could generate a random number for the spinner result. This
spared you the effort of writing or pasting any random number generating code
into your script.
You've also probably gotten used to the pattern of importing the arcpy site package
at the beginning of your scripts. A site package can contain numerous modules. In
the case of arcpy, these modules include Esri functions for geoprocessing.
As you use Python in your GIS work, you'll probably write functions that are useful
in many types of scripts. These functions might convert a coordinate from one
projection to another, or create a polygon from a list of coordinates. These
functions are perfect candidates for modules. If you ever want to improve on your
code, you can make the change once in your module instead of finding each script
where you duplicated the code.
Creating a module
To create a module, create a new script in PythonWin and save it with the standard
.py extension; but instead of writing start-to-finish scripting logic, just write some
functions. Here's what a simple module file might look like. This module only
contains one function, which adds a set of points to a feature class given a Python
list of coordinates.
# This module is saved as practiceModule1.py
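A minimal sketch of such a module is shown here. It assumes the classic (pre-arcpy.da) insert cursor and a point feature class; the docstring and the try/finally cleanup are additions for illustration. Running the function requires ArcGIS, but because arcpy is imported inside the function, the module itself can be loaded without it.

```python
def createPoints(coordinateList, featureClass):
    """Add one point to featureClass for each (x, y) pair in coordinateList."""
    # arcpy is imported here, inside the function
    import arcpy

    rowInserter = arcpy.InsertCursor(featureClass)
    try:
        for coordinate in coordinateList:
            # Build a point geometry from the x and y values
            pointGeometry = arcpy.Point(coordinate[0], coordinate[1])
            # Create a new row, give it the geometry, and insert it
            newPoint = rowInserter.newRow()
            newPoint.shape = pointGeometry
            rowInserter.insertRow(newPoint)
    finally:
        # Delete the cursor to release its lock on the data
        del rowInserter
```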
(Note that if you're using ArcGIS 10.1 with the data access module arcpy.da, you
could write it like the following:)
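A sketch of the arcpy.da variant follows, under the same assumptions (a point feature class, and coordinates supplied as (x, y) pairs). It uses the "SHAPE@XY" token so that each point can be inserted as a plain tuple of coordinates. The docstring and comments are additions for illustration.

```python
def createPoints(coordinateList, featureClass):
    """Add one point to featureClass for each (x, y) pair in coordinateList."""
    # arcpy is imported here, inside the function
    import arcpy

    # The with statement releases the cursor's lock when the block ends
    with arcpy.da.InsertCursor(featureClass, ["SHAPE@XY"]) as rowInserter:
        for coordinate in coordinateList:
            # One field was requested, so each row holds one value:
            # the (x, y) tuple for SHAPE@XY
            rowInserter.insertRow([(coordinate[0], coordinate[1])])
```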
The above function createPoints could be useful in various scripts, so it's very
appropriate for putting in a module. Notice that this script has to work with insert
cursors and point objects, so it requires arcpy. It's legal to import a site package or
module within a module.
Also notice that arcpy is imported within the function, not at the very top of the
module like you are accustomed to seeing. This is done for performance reasons.
You may add more functions to this module later that do not require arcpy. You
should only do the work of importing arcpy when necessary, that is, if a function is
called that requires it.
The arcpy site package is only available inside the scope of this function. If other
functions in your practice module were called, the arcpy module would not be
available to those functions. Scope applies also to variables that you create in this
function, such as rowInserter. Note that loops do not limit scope any further in
Python: the variable pointGeometry is created inside the for loop, but it remains
available anywhere in the function after the loop has run. If you tried to use it
outside this particular function, it would be out of scope and unavailable.
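Function scope can be checked with a few lines of plain Python. In this sketch (all names are invented for illustration), a name assigned inside a for loop is still usable later in the same function, while none of the function's local names exist outside it.

```python
def sumCoordinates(coordinateList):
    total = 0
    for coordinate in coordinateList:
        lastSeen = coordinate          # created inside the loop
        total += coordinate[0] + coordinate[1]
    # lastSeen is still available here, after the loop has finished
    return total, lastSeen


result = sumCoordinates([(1, 2), (3, 4)])
print(result)

# Outside the function, its local names do not exist
try:
    total
    localNameVisible = True
except NameError:
    localNameVisible = False
print(localNameVisible)
```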
Using a module
So how could you use the above module in a script? Imagine that the module above
is saved on its own as practiceModule1.py. Below is an example of a separate script
that imports practiceModule1.
The above script is simple and easy to read because you didn't have to include all
the logic for creating the points. That is taken care of by the createPoints function
in the module you imported, practiceModule1. Notice that to call a function from a
module, you need to use the syntax module.function().
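The module.function() syntax can be tried without ArcGIS. The sketch below writes a tiny stand-in module to a temporary folder, makes that folder importable, and calls a function from it. The module name practiceModuleDemo and its milesToKilometers function are invented for illustration. In everyday use you would simply save the module next to your script instead of generating it.

```python
import os
import sys
import tempfile

# Write a tiny stand-in module to a temporary folder
moduleFolder = tempfile.mkdtemp()
modulePath = os.path.join(moduleFolder, "practiceModuleDemo.py")
with open(modulePath, "w") as moduleFile:
    moduleFile.write("def milesToKilometers(miles):\n"
                     "    return miles * 1.609344\n")

# Make the folder importable, then import the module by its file name
# (without the .py extension)
sys.path.insert(0, moduleFolder)
import practiceModuleDemo

# Call the function using the module.function() syntax
kilometers = practiceModuleDemo.milesToKilometers(10)
print(kilometers)
```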
Readings
To reinforce the material in this section, read Zandbergen 12.1 - 12.5, which talks
about creating Python functions and modules.
Practice
Before moving ahead, get some practice in PythonWin by trying to write the
following functions. These functions are not graded, but the experience of writing
them will help you in Project 4. Use the course forums to help each other.
A function that returns the perimeter of a square given the length of one
side.
A function that takes a path to a feature class as a parameter and returns a
Python list of the fields in that feature class. Practice calling the function and
printing the list. However, do not print the list within the function.
A function that returns the Euclidean distance between any two coordinates.
The coordinates can be supplied as parameters in the form (x1, y1, x2, y2).
59468), your function call might look like this: findDistance(312088, 60271,
The best practice is to put your functions inside a module and see if you can
successfully call them from a separate script. If you try to step through your code
using the debugger, you'll notice that the debugger helpfully moves back and forth
between the script and the module whenever you call a function in the module.
When faced with these files, you should first understand if your GIS software
already comes with a tool or script that can read or convert the data to a format it
can use. If no tool or script exists, you'll need to do some programmatic work to
read the file and separate out the pieces of text that you really need. This is called
parsing the text.
For example, a Web service may return you many lines of XML describing all the
readings at a weather station, when all you're really interested in are the
coordinates of the weather station and the annual average temperature. Parsing the
response involves writing some code to read through the lines and tags in the XML
and isolating only those three values.
When you parse, you cycle through lines of text, treating them as strings, and pull
out the useful information from those strings. In an XML file, for example, you may
know that the information you want falls inside a particular tag, such as
<AvgTemp>46</AvgTemp>. One approach to getting the value 46 might be to
search for the line containing the substring "AvgTemp," then take all the characters
that fall between the first > coming from the left of the string and the first < coming
from the right.
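The tag-based approach just described could be sketched as follows. This is a minimal illustration that assumes the whole tag sits on one line; the variable names are made up for the example.

```python
line = "<AvgTemp>46</AvgTemp>"

if "AvgTemp" in line:
    start = line.find(">") + 1   # position just after the first > from the left
    end = line.rfind("<")        # position of the first < from the right
    avgTemp = line[start:end]    # the characters in between

print(avgTemp)
```

Note that avgTemp is still a string at this point. You would convert it with int() or float() before doing any arithmetic.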
In another case, you might know that the values you want fall after the second and
third commas in a line of comma-separated values. You can split up the line based
on comma locations and put all the segments of the string in a Python list. You can
then take the third and fourth items in the list to get the values you want.
(Remember the third and fourth items would come after the second and third
commas, respectively.)
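Here is a small sketch of the comma-splitting idea, using a shortened copy of one reading from the GPS file shown later in this lesson. The variable names are only for illustration.

```python
line = "TRACK,ACTIVE LOG,40.78966141,-77.85948515,4627251.76270444"

values = line.split(",")   # one list item per comma-separated value

thirdItem = values[2]      # the value after the second comma
fourthItem = values[3]     # the value after the third comma

print(thirdItem)
print(fourthItem)
```

Because Python lists start counting at zero, the third and fourth items are at index positions 2 and 3.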
In both cases, the keys to effective parsing are to know how to read lines in a file
and know your string manipulation methods. It's helpful to know how to read a
string, search for values in a string, split up a string based on some delimiter, and
extract particular segments of a string.
Sometimes you can import helper modules, or libraries, into your code to make it
easier to parse certain types of text. In the XML example, it may be easier to import
xml.dom (described here in Chapter 1 of the book Python & XML [3]), which puts
all the XML elements in the file into a series of lists. Searching through those lists is
easier than repeatedly searching for the < and > characters. If you're preparing for
a big parsing project with XML or some other type of well-known format, it may be
worth your while to investigate whether there's a third-party library that can help
you.
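To give a feel for what a helper library buys you, here is a minimal sketch that uses xml.dom.minidom, a lightweight implementation that ships inside Python's xml.dom package. The XML text and its tag names (Station, Lat, Lon) are invented for this example; a real Web service response would look different.

```python
from xml.dom import minidom

# Made-up XML standing in for a weather station response
xmlText = "<Station><Lat>40.79</Lat><Lon>-77.86</Lon><AvgTemp>46</AvgTemp></Station>"

doc = minidom.parseString(xmlText)

# getElementsByTagName returns a list-like collection of matching elements
tempElements = doc.getElementsByTagName("AvgTemp")

# The text inside the tag is stored in the element's first child node
avgTemp = tempElements[0].firstChild.data

print(avgTemp)
```

Compare this with searching for the < and > characters yourself: the library handles the tag matching, so your code only has to ask for the element by name.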
There are an infinite number of parsing scenarios that you can encounter. This
lesson will attempt to teach you the general approach by walking through just one
example. In your final project for this course, you may choose to explore parsing
other types of files.
This example reads a text file collected from a GPS unit. The lines in the file
represent readings taken from the GPS unit as the user traveled along a path. In
this section of the lesson, you'll learn one way to parse out the coordinates from
each reading. The next section of the lesson uses a variation of this example to
show how you could write the user's track to a polyline feature class.
The file for this example is called gps_track.txt and it looks something like the text
string shown below. (Please note, line breaks have been added to the file shown
below to ensure that the text fits within the page margins. Click on this link to the
gps_track.txt file [4] to see what the text file actually looks like.)
type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,te
mp,time,model,filename,ltime
TRACK,ACTIVE LOG,40.78966141,-
77.85948515,4627251.76270444,1779451.21349775,True,False,
255,358.228393554688,0,0,2008/06/11-14:08:30,eTrex Venture,
,2008/06/11 09:08:30
TRACK,ACTIVE LOG,40.78963995,-
77.85954952,4627248.40489401,1779446.18060893,False,False,
255,358.228393554688,0,0,2008/06/11-14:09:43,eTrex Venture,
,2008/06/11 09:09:43
TRACK,ACTIVE LOG,40.78961849,-
77.85957098,4627245.69008772,1779444.78476531,False,False,
255,357.747802734375,0,0,2008/06/11-14:09:44,eTrex Venture,
,2008/06/11 09:09:44
TRACK,ACTIVE LOG,40.78953266,-
77.85965681,4627234.83213242,1779439.20202706,False,False,
255,353.421875,0,0,2008/06/11-14:10:18,eTrex Venture, ,2008/06/11
09:10:18
TRACK,ACTIVE LOG,40.78957558,-
77.85972118,4627238.65402635,1779432.89982442,False,False,
255,356.786376953125,0,0,2008/06/11-14:11:57,eTrex Venture,
,2008/06/11 09:11:57
TRACK,ACTIVE LOG,40.78968287,-
77.85976410,4627249.97592111,1779427.14663093,False,False,
255,354.383178710938,0,0,2008/06/11-14:12:18,eTrex Venture,
,2008/06/11 09:12:18
TRACK,ACTIVE LOG,40.78979015,-
77.85961390,4627264.19055204,1779437.76243578,False,False,
255,351.499145507813,0,0,2008/06/11-14:12:50,eTrex Venture,
,2008/06/11 09:12:50
etc. ...
Notice that the file starts with a header line, explaining the meaning of the values
contained in the readings from the GPS unit. Each subsequent line contains one
reading. The goal for this example is to create a Python list containing the X,Y
coordinates from each reading. Specifically, the script should be able to read the
above file and print a text string like the one shown below.
Before you start parsing a file, it's helpful to outline what you're going to do and
break up the task into manageable chunks. Here's some pseudocode for the
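approach we'll take in this example:
1. Open the file.
2. Read the header line.
3. Loop through the header line to find the index positions of the "lat" and
"long" values.
4. Read the rest of the lines.
5. Split each line into a list of values, using the comma as a delimiter.
6. Find the values in the list that correspond to the lat and long coordinates
and write them to a new list.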
The first thing the script needs to do is open the file. Python contains a built-in
open() [5] function for doing this. The parameters for this function are the path to
the file and the mode in which you want to open the file (read, write, etc.). In this
example, "r" stands for read-only mode.
Opening the file with the open() function gets you a file object (called gpsTrack in
our case). You can read the first line by calling the file.readline() [6] method, like
this:
headerLine = gpsTrack.readline()
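If you want to try opening a file and reading its first line without downloading anything, the sketch below writes a tiny two-line stand-in file to a temporary folder first. The file name gps_track_sample.txt and its shortened contents are assumptions made for this example; only the last three lines mirror what the lesson's script does.

```python
import os
import tempfile

# Write a two-line stand-in for gps_track.txt to a temporary folder
samplePath = os.path.join(tempfile.mkdtemp(), "gps_track_sample.txt")
sampleFile = open(samplePath, "w")
sampleFile.write("type,ident,lat,long\n")
sampleFile.write("TRACK,ACTIVE LOG,40.78966141,-77.85948515\n")
sampleFile.close()

gpsTrack = open(samplePath, "r")    # "r" opens the file in read-only mode
headerLine = gpsTrack.readline()    # the first line, including its trailing newline
gpsTrack.close()

print(headerLine)
```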
Looping through the header line to find the index positions of the
"lat" and "long" values
You need to search through this string and find the position of "lat" and "long." If
you start counting comma-separated values in this string beginning from zero, it's
easy to see that "lat" is at index position 2 and "long" is at index position 3.
However, it's a good practice not to hard-code numbers like 2 and 3 into your
script. Hard-coded numbers other than 0 or 1 are sometimes derided as magic
numbers, suggesting that if you're not the programmer, you might have to use
magic to know where the numbers came from!
Avoiding magic numbers gives you greater flexibility. If you wanted to re-use this
script with a file in which "lat" and "long" were in different positions, you wouldn't
have to modify your code. Even if "lat" and "long" went by some other name, it
would be easier to find and change a string in your script instead of finding and
changing a "magic number".
So how can you programmatically determine that "lat" is at index 2 and "long" is at
index 3? Below is one way that uses the string.split() [7] method. This method puts
each "item" in the line into a list. The parameter you pass to the split method
determines the delimiter, or the character that determines a new list item. In our
case, it's the comma:
valueList = headerLine.split(",")
The above method call returns: ['type', 'ident', 'lat', 'long', 'y_proj', 'x_proj',
'new_seg', 'display', 'color', 'altitude', 'depth', 'temp', 'time', 'model', 'filename',
'ltime'] (strictly, the last item also carries the line's trailing newline character
unless you strip it off first). The key is that now you can cycle through this list and discover the position
of "lat" and "long." To do this, you could write a loop that searched through the list
for "lat" and "long," but a quicker way is to use the helper method index() that gets
you the index position of any item in the list:
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
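You can check these index positions yourself with the short sketch below. It types the header line in as a string instead of reading it from the file, and it calls strip() before splitting so the last item doesn't keep its newline character. That strip() call is an addition for this example, not part of the lesson's script.

```python
headerLine = ("type,ident,lat,long,y_proj,x_proj,new_seg,display,color,"
              "altitude,depth,temp,time,model,filename,ltime\n")

# strip() removes the trailing newline, split() breaks the line at each comma
valueList = headerLine.strip().split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

print(latValueIndex)
print(lonValueIndex)
```

Be aware that index() raises a ValueError if the item is not in the list, so a header with different column names would stop the script at this point.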
When you have an open text file, you can always call file.readline() to go to the next
line. In our case, we know we're going to use all the rest of the lines in the file, so
it's more efficient to call file.readlines() [8] to read them all at once. (This might
not be efficient with an extremely long file.) The readlines() method returns a list of
all the remaining lines in the file.
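The difference between readline() and readlines() can be seen in the sketch below. To keep it self-contained, io.StringIO stands in for an open text file; it offers the same two methods. The sample text is a shortened, made-up version of the GPS file.

```python
import io

sampleText = (u"type,ident,lat,long\n"
              u"TRACK,ACTIVE LOG,40.78966141,-77.85948515\n"
              u"TRACK,ACTIVE LOG,40.78963995,-77.85954952\n")

gpsTrack = io.StringIO(sampleText)   # behaves like an open text file

headerLine = gpsTrack.readline()       # reads one line and moves past it
remainingLines = gpsTrack.readlines()  # reads everything after the header

print(len(remainingLines))
```

Because the header was already consumed by readline(), the list returned by readlines() contains only the GPS readings.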
Now you can cycle through each GPS reading and split it up based on commas the
same way you split up the header. You specifically need to pull out the values in
index positions 2 and 3 of your list (represented by latValueIndex and
lonValueIndex, respectively) and write those to a new list (coordList).
print coordList
Note a few important things about the above code:
coordList actually contains a bunch of small lists within a big list. Each small
list holds the coordinate pair from one reading.
The list.append() method is used to add items to coordList. Notice again that
you can append a list itself (representing the coordinate pair) using this
method.
Here's the full code for the example. Feel free to download the text file [4] and try it
out on your computer.
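# Reads a GPS-produced text file and writes the lat and long values
# to a list of coordinates
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
coordList = []
for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    coordList.append([segmentedLine[latValueIndex], segmentedLine[lonValueIndex]])
print coordList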
You might be asking at this point, "What good does this list of coordinates do for
me?" Admittedly, the data is still very "raw." It cannot be read directly in this state
by a GIS. However, having the coordinates in a Python list makes them easy to get
into other formats that can be visualized. For example, these coordinates could be
written to points in a feature class, or vertices in a polyline or polygon feature class.
The list of points could also be sent to a Web service for reverse geocoding, or
finding the address associated with each point. The points could also be plotted on
top of a Web map using programming tools like the Google Maps API. Or, if you
were feeling really ambitious, you might use Python to write a new file in KML
format, which could be viewed in 3D in Google Earth.
Summary
Parsing any piece of text requires you to be familiar with file opening and reading
methods, the structure of the text you're going to parse, and string manipulation
methods. In the preceding example, we parsed a simple text file, extracting
coordinates collected by a handheld GPS unit. We used the string.split() method to
break up each GPS reading and find the latitude and longitude values. In the next
section of the lesson, you'll learn how you could do more with this information by
writing the coordinates to a polyline dataset.
As you use Python in your GIS work, you could encounter a variety of parsing tasks.
As you approach these, don't be afraid to seek help from Internet examples, code
reference topics such as the ones linked to in this lesson, and your textbook.
Readings
It's worth your time to read Zandbergen 7.6, which talks about parsing text files.
Any examples you can pick up with text parsing will help you when you encounter a
new file that you need to read. You'll have this experience in the practice exercises
and projects this week.
You've already had some experience writing point geometries when we learned
about insert cursors. To review, you use arcpy.Point() to create a Point object, then
you use an insert cursor to assign it to the geometry field of the feature class (called
"shape" for shapefiles).
# Create point
inPoint = arcpy.Point(-121.34, 47.1)
For polylines and polygons, you create multiple Point objects that you add to an
Array object. Then you make a Polyline or Polygon object using the array. With
polygons it's a good practice to make the end vertex the same as the start vertex if
possible.
The code below creates an empty array and adds three points using the Array.add()
method. Then the array is used to create a Polyline object.
The first parameter you pass in when creating a polyline is the array containing the
points for the polyline. The second parameter is a spatial reference of the
coordinates, which you should always pass in to ensure that the precision of your
data is maintained.
Of course, you usually won't create points manually in your code like this with
hard-coded coordinates. It's more likely that you'll parse out the coordinates from a
file or capture them from some external source, such as a series of mouse clicks on
the screen.
Here's how you could parse out coordinates from a GPS-created text file like the
one in the previous section of the lesson. This code reads all the points captured by
the GPS and adds them to one long polyline. The polyline is then written to an
empty, pre-existing polyline shapefile named tracklines.shp that uses a geographic
coordinate system. If you didn't have a shapefile already on disk, you could use the
Create Feature Class tool to create one with your script.
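# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile
import arcpy
# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
polylineFC = "C:\\Data\\GPS\\tracklines.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
vertexArray = arcpy.Array()
for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    vertex = arcpy.Point(float(segmentedLine[lonValueIndex]), float(segmentedLine[latValueIndex]))
    vertexArray.add(vertex)
cursor = arcpy.InsertCursor(polylineFC)
feature = cursor.newRow()
feature.shape = arcpy.Polyline(vertexArray, spatialRef)
cursor.insertRow(feature)
del cursor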
The above script starts out the same as the one in the previous section of the lesson.
First, it parses the header line of the file to determine the position of the latitude
and longitude coordinates in each reading. But then, notice that an array is created
to hold the points for the polyline:
vertexArray = arcpy.Array()
After that, a loop is initiated that reads each line and creates a point object from the
latitude and longitude values. At the end of the loop, the point is added to the array.
Once all the lines have been read, the loop exits and an insert cursor is created. The
cursor is used to create a new row. Then a Polyline object is created and assigned to
the shape field, thereby giving the row some geometry.
cursor.insertRow(feature)
del cursor
Remember that the cursor places a lock on your dataset, so this script doesn't
create the cursor until absolutely necessary (in other words, after the loop). After
the row is inserted, the cursor is deleted to remove the lock.
Just for fun, suppose your GPS allows you to mark the start and stop of different
tracks. How would you handle this in the code? You can download this modified
text file with multiple tracks [9] if you want to try out the following example.
type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,te
mp,time,model,filename,ltime
new_seg is a boolean property that determines whether the reading begins a new
track. If new_seg = true, you need to write the existing polyline to the shapefile and
start creating a new one. Take a close look at this code example and notice how it
differs from the previous one in order to handle multiple polylines:
# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile. Handles multiple polylines.
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")
del cursor
The first thing you should notice is that this script uses a function. The addPolyline
function adds a polyline to a feature class, given three parameters: (1) an existing
insert cursor, (2) an array, and (3) a spatial reference. This function cuts down on
repeated code and makes the script more readable.
Notice it's okay to use arcpy in the above function, since it is going inside the body
of a script that imports arcpy. However, you want to avoid using variables in the
function that are not defined within the function or passed in as parameters.
The addPolyline function is called twice in the script: once within the loop, which
we would expect, and once at the end to make sure the final polyline is added to the
shapefile. This is where writing a function cuts down on repeated code.
As you read each line of the text file, how do you determine whether it begins a new
track? First of all, notice that we've added one more value to look for in this script:
newTrackIndex = valueList.index("new_seg")
The variable newTrackIndex shows us which position in the line is held by the
boolean new_seg property that tells us whether a new polyline is beginning. If you
have sharp eyes, you'll notice we check for this later in the code:
segmentedLine = line.split(",")
isNew = segmentedLine[newTrackIndex].upper()
In the above code, the upper() method converts the string into all upper-case, so we
don't have to worry about whether the line says "true," "True," or "TRUE." But
there's another situation we have to handle: What about the first line of the file?
This line should read "true," but we can't add the existing polyline to the file at that
time, because there isn't one yet. Notice that a second check is performed to make
sure there are more than zero points in the array before the array is written to the
shapefile:
The above code checks to make sure there's at least one point in the array, then it
calls the addPolyline function, passing in the cursor and the array.
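The track-splitting logic can be tried without ArcGIS using the sketch below. It is only an analogy: a plain Python list stands in for the arcpy Array, appending to a tracks list stands in for calling addPolyline, and the four readings are made up. The index positions are typed in directly to keep the example short; in the real script they come from valueList.index().

```python
# Made-up readings: type, ident, lat, long, new_seg
lines = [
    "TRACK,ACTIVE LOG,40.1,-77.1,True",
    "TRACK,ACTIVE LOG,40.2,-77.2,False",
    "TRACK,ACTIVE LOG,40.3,-77.3,true",
    "TRACK,ACTIVE LOG,40.4,-77.4,False",
]
latValueIndex, lonValueIndex, newTrackIndex = 2, 3, 4

tracks = []       # finished tracks (stands in for rows written to the shapefile)
vertexList = []   # plain list standing in for the arcpy Array

for line in lines:
    segmentedLine = line.split(",")
    isNew = segmentedLine[newTrackIndex].strip().upper()

    # A new track is starting: store the previous one, but only if it has points
    if isNew == "TRUE" and len(vertexList) > 0:
        tracks.append(vertexList)
        vertexList = []

    lon = float(segmentedLine[lonValueIndex])
    lat = float(segmentedLine[latValueIndex])
    vertexList.append((lon, lat))

# Store the final track, since no later "true" reading will trigger it
if len(vertexList) > 0:
    tracks.append(vertexList)

print(len(tracks))
```

The very first reading is flagged as a new track, yet nothing is stored at that moment because the list is still empty. That is the same situation the count check guards against in the script.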
Here's another question to consider: How did we know that the Array object has a
count property that tells us how many items are in it? This comes from the ArcGIS
Desktop Help topic describing the Array class [10]. In this section of the help there
are topics describing each class in arcpy, and you'll come here often if you work
with ArcGIS geometries in Python.
In the above-linked Array topic, find the Properties table and notice
that Array has a read-only count property. If we were working with a Python list,
we could use len(vertexArray), but in our case vertexArray is an Array object that is
native to the ArcGIS geoprocessing programming model. This means it is a
specialized object designed by Esri, and you can only learn its methods and
properties by examining the documentation. Bookmark these pages!
The GPS parsing example using ArcGIS 10.1 and the arcpy data access
module
For reference only, below is how you could write the above script using ArcGIS 10.1
and the data access module arcpy.da. This example handles multiple polylines in
the file. The syntax ("SHAPE@",) is a tuple with one item, indicating that just the
SHAPE field will be updated using the insert cursor.
# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile. Handles multiple polylines.
# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("D:/Data/GPS/gps_track_multiple.txt", "r")
polylineFC = "D:/Data/GPS/tracklines_sept25.shp"
spatialRef = arcpy.SpatialReference("WGS 1984")
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")
Summary
You may be wondering how you might create a multi-part feature (such as the state
of Hawaii containing multiple islands), or a polygon with a "hole" in it. There are
special rules for ordering and nesting Points and Arrays to create these types of
geometries. These are covered in the course textbook, which brings us to...
Readings
Read Zandbergen 8.1 - 8.6, which contains a good summary of how to read and
write Esri geometries.
Most of the time we've run scripts in this course, it's been through PythonWin.
Your operating system (Windows) can run scripts directly. Maybe you've tried to
double-click a .py file to run a script. As long as Windows understands that .py files
represent a Python script and that it should use the Python interpreter to run the
script, the script will launch immediately.
When you try to launch a script automatically by double-clicking it, it's possible
you'll get a message saying Windows doesn't know which program to use to open
your file. If this happens to you, use the Browse button on the error dialog box to
browse to the Python executable, most likely located in
C:\Python26\ArcGIS10.0\Python.exe. Make sure "Always use the selected
program to open this kind of file" is checked and click OK. Windows now
understands that .py files should be run using Python.
Double-clicking a .py file gives your operating system the simple command to run
that Python script. You can alternatively tell your operating system to run a script
using the Windows command line interface. This environment just gives you a
blank window with a blinking cursor and allows you to type the path to a script or
program, followed by a list of parameters. It's a clean, minimalist way to run a
script. In Windows XP, you can open the command line by clicking Start > Run
and typing cmd. In Windows Vista or Windows 7, just type cmd in the Search box.
Advanced use of the command line is outside the scope of this course. For now, it's
sufficient to say that you can run a script from the command line by typing the path
of the Python executable, followed by the full path to the script, like this:
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py
If the script takes parameters, you must also type each argument separated by a
space. Remember that arguments are the values you supply for the script's
parameters. Here's an example of a command that runs a script with two
arguments, both strings that represent pathnames. Notice that you should use the
regular \ in your paths when providing arguments from the command line (not / or
\\ as you would use in PythonWin).
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py
C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
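On the script side, one way to pick up arguments typed at the command line is Python's standard sys.argv list. This is a sketch using sys.argv rather than arcpy.GetParameterAsText(), and the names readArguments, workspace, and featureClass are hypothetical. The example simulates the command shown above instead of reading the real command line.

```python
import sys

def readArguments(argv):
    """Return the values typed after the script path on the command line."""
    # argv[0] is the script path itself; the arguments follow it
    return argv[1:]

# Simulated contents of sys.argv for the command shown above
exampleArgv = ["C:\\WCGIS\\Geog485\\Lesson2\\Project2.py",
               "C:\\WCGIS\\Geog485\\Lesson2\\",
               "C:\\WCGIS\\Geog485\\Lesson2\\CityBoundaries.shp"]

workspace, featureClass = readArguments(exampleArgv)

print(workspace)
print(featureClass)
```

In a real script you would pass sys.argv to the function in place of exampleArgv.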
If the script executes successfully, you often won't see anything except a new
command prompt (remember, this is minimalist!). If your script is designed to
print a message, you should see the message. If your script is designed to modify
files or data, you can check those files or data (perhaps using ArcCatalog) to make
sure the script ran correctly.
You'll also see messages if your script fails. Sometimes these are the same messages
you would see in the Python Interactive Window. At other times, the messages are
more helpful than what you would see in PythonWin, making the command line
another useful tool for debugging. Unfortunately, at some times the messages are
less helpful.
Batch files
Why is the command line so important in a discussion about automation? After all,
it still takes work to open the command line and type the commands. The beautiful
thing about commands is that they, too, can be scripted. You can list multiple
commands in a simple text-based file, called a batch file. Running the batch file
runs all the commands in it.
Here's an example of a simple batch file that runs the two scripts above. To make
this batch file, you could put the text below inside an empty Notepad file and save it
with a .bat extension. Remember that this is not Python; it's command syntax:
@ECHO OFF
REM Runs both my project scripts
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py
ECHO Ran project 1
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py
C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
ECHO Ran project 2
PAUSE
Here are some notes about the above batch file, starting from the top:
@ECHO OFF prevents all the lines in your batch file from being printed to
the command line window, or console, when you run the file. It's standard
procedure to use this as the first line of your batch file, unless you really
want to see which line of the file is executing (perhaps for debugging
purposes).
REM is how you put a comment in your batch file, the same way # denotes a
comment in Python.
You put commands in your batch file using the same syntax you used from
the command line.
ECHO prints something to the console. This can be useful for debugging,
especially when you've used @ECHO OFF because you don't want to see
every line of the file printed.
PAUSE gives a "Press any key to continue..." prompt. If you don't put this at
the end of your batch file, the console will immediately close after the file is
done executing. When you're writing and debugging the batch file, it's useful
to put PAUSE at the end so you can see any error messages that were
printed when running the file. Once your batch file is tested and working,
you can remove PAUSE.
Batch files can contain variables, loops, comments, and conditional logic, all of
which are beyond the scope of this lesson. However, if you'll be writing and running
many scripts for your organization, it's worthwhile to spend some time learning
more about batch files. Fortunately, batch files have been around for a long time
(they are older than Windows itself), so there's an abundance of good information
available on the Internet to help you.
Scheduling tasks
At this point we've come pretty close to reaching true automation, but there's still
that need to launch the Python script or the batch file, either by double-clicking it,
invoking it from the command line, or otherwise telling the operating system to run
it. To truly automate the running of scripts and batch files, you can use an
operating system utility such as Windows Task Scheduler.
Task Scheduler is one of those items hidden in Windows System Tools that you
may not have paid any attention to before. It's a relatively simple program that
allows you to schedule your scripts and batch files to run on a regular basis. This is
helpful if the task needs to run often enough that it would be burdensome to launch
the batch file manually; but it's even more helpful if the task takes some of your
computing resources and you want to run it during the night or weekend to
minimize impact on others who may be using the computer.
Here's a real-world scenario where Task Scheduler (or a comparable utility if you're
running on a Mac, Linux, or UNIX) is very important: Fast Web maps tend to use a
server-side cache of pregenerated map images, or tiles, so that the server doesn't
have to draw the map each time someone navigates to an area. A Web map
administrator who has ArcGIS Server can run the tool Manage Map Server Cache
Tiles to make the tiles before he or she deploys the Web map. After deployment, the
server quickly sends the appropriate tiles to people as they navigate the Web map.
So far so good.
As the source GIS data for the map changes, however, the cache tiles become out of
date. They are just images and do not know how to update themselves
automatically. The cache needs to be updated periodically, but cache tile creation is
a time consuming and CPU-intensive operation. For this reason, many server
administrators use Task Scheduler to update the cache. This usually involves
writing a script or batch file that runs Manage Map Server Cache Tiles and other
caching tools, then scheduling that script to run on nights or weekends when it
would be least disruptive to users of the Web map.
Let's take a quick look inside Windows Task Scheduler. The instructions below are
for Windows Vista (and probably Windows 7). Other versions of Windows have a
very similar Task Scheduler, and with some adaptation you can also use the
instructions below to understand how to schedule a task.
2. Click Create Basic Task. This walks you through a simple wizard to set up
the task. You can configure advanced options on the task later.
3. Give your task a Name that will be easily remembered and optionally, a
Description.
4. Choose how often you want the task to run. For this example, choose Daily.
5. Choose a Start time and a recurrence frequency. If you want, choose a time
a few minutes ahead of the current time, so you can see what it looks like
when the task runs.
7. Here's the moment of truth where you specify which script or batch file you
want to run. Click Browse and navigate to one of the Python scripts you've
written during this course. It's going to be easiest here if you pick a script
that doesn't take any arguments, such as your project 1 script that makes
contour lines from hard-coded datasets, but if you are feeling brave you can
also add arguments in this panel of the wizard. Then click Next.
highlight the task to see its properties, or right-click the task and click
Properties to actually set those properties. You can use the advanced
properties to get your script to run even more frequently than daily, for
10. Wait for your scheduled time to occur, or if you don't want to wait, right-
click the task and click Run. Either way, you'll see a console window appear
when the script begins and disappear once the script has finished. (If you're
running a Python script and you don't want the console window to disappear
at the end, you can put a line at the end of the script such as lastline =
raw_input(">"). This stops the script until the user presses Enter. Once
you're comfortable with the script running on a regular basis, you'll probably
want to remove this line to keep open console windows from cluttering your
screen. After all, the idea of a scheduled task is that it happens in the
background.)
Summary
To make your scripts run automatically, you use Windows Task Scheduler to create
a task that the operating system runs at regular intervals. The task can point at
either a .py file (for a single script), or a .bat file (for multiple scripts). Using
scheduled tasks, you can achieve full automation of your GIS processes.
The approach for both of these situations is the same. Here are some suggested
steps for running any tool in the ArcGIS toolboxes using Python:
1. Find the tool reference documentation. We've seen this already during the
course. Each tool has its own topic in the Geoprocessing tool reference [11]
section of the ArcGIS Help. Open that topic and read it before you do
anything else. Read the "Usage" section at the beginning to make sure that
it's the right tool for you and that you are about to employ it correctly.
2. Examine the parameters. Scroll down to the "Syntax" section of the topic
and read which parameters the tool accepts. Note which parameters are
required and which are optional, and decide which parameters your script is
going to supply.
3. In your Python script, create variables for each parameter. Note that each
parameter in the "Syntax" section of the topic has a data type listed. If the
data type for a certain parameter is listed as "String," you need to create a
Python string variable for that parameter.
Sometimes the translation from data type to Python variable is not direct.
For example, sometimes the tool reference will say that the required variable
is a "Feature Class." What this really means for your Python script is that
you need to create a string variable containing the path to a feature class.
Another example is if the tool reference says that the required data type is a
"Long." What this means in Python is that you need to create a numerical
variable (as opposed to a string) for that particular parameter.
If you have doubts about how to create your variable to match the required
data type, scroll down to the "Code Sample" in the tool reference topic. Try
to find the place where the example script defines the variable you're having
trouble with. Copy the patterns that you see in the example script and
usually you'll be okay.
Most of the commonly used tools have excellent example scripts, but others
are hit or miss. If your tool of interest doesn't have a good example script,
you may be able to find something on the Esri forums, ArcScripts [12], or a
well-phrased Google search.
4. Run the tool...with error handling. You can run your script without
you're still not getting anything helpful, a next resort is to add the
block.
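The pattern in steps 3 and 4 might look something like the sketch below. Because arcpy is not assumed to be available here, a made-up function named runTool stands in for a real geoprocessing tool, and the paths and distance are invented. What matters is the shape: one variable per parameter, matched to the data type in the tool reference, and a try/except block around the call.

```python
def runTool(inFeatures, outFeatures, distance):
    """Hypothetical stand-in for a geoprocessing tool; it only checks its inputs."""
    if not isinstance(distance, int):
        raise TypeError("distance must be a number, not a string")
    return outFeatures

# "Feature Class" in the tool reference: a string holding a path
inFeatures = "C:\\Data\\roads.shp"
outFeatures = "C:\\Data\\roads_buffer.shp"

# "Long" in the tool reference: a numeric variable, not a string
bufferDistance = 100

try:
    result = runTool(inFeatures, outFeatures, bufferDistance)
    message = "Tool ran successfully"
except Exception as e:
    result = None
    message = "Tool failed: " + str(e)

print(message)
```

If you changed bufferDistance to the string "100", the except block would catch the error and report it rather than letting the script crash.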
In Project 4 you'll get a chance to practice these skills to run a tool you previously
haven't worked with in a script.
Repairing layers that are referencing data sources using the wrong paths.
For example, your map was sitting on a computer where all the data was in
D:\myfolder\mydata.
Exporting a series of maps to PDF and joining them to create a "map book."
Esri map documents are binary files, meaning they can't be easily read and parsed
using the techniques we covered earlier in this lesson. Until very recently the only
way to automate anything with a map document was to use ArcObjects, which is
somewhat challenging for beginners and requires using a language other than
Python. With the release of ArcGIS 10.0, Esri added a Python module for
automating common tasks with map documents.
arcpy.mapping is a module you can use in your scripts to work with map
documents. Please take a detour at this point to read the Esri overview of
arcpy.mapping, which is found in the topic Geoprocessing scripts for map
document management and output [13].
The most important object in this module is MapDocument. This tells your script
which map you'll be working with. You can get a MapDocument by referencing a
path, like this:
mxd = arcpy.mapping.MapDocument(r"C:\data\Alabama\UtilityNetwork.mxd")
Notice the use of r in the line above to denote a raw string literal. In other
words, if you include r right before you begin your string, Python treats each
backslash literally, so it's safe to use the single backslash \ in a path. I've
done it here because you'll see it in a lot of the Esri examples with
arcpy.mapping.
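Here is a short sketch of what the r prefix changes. It needs no ArcGIS installation; the second path (C:\temp\new.mxd) is made up for illustration.

```python
# Two ways to write the same Windows path.
rawPath = r"C:\data\Alabama\UtilityNetwork.mxd"        # raw string
escapedPath = "C:\\data\\Alabama\\UtilityNetwork.mxd"  # doubled backslashes

# Both strings contain exactly the same characters.
print(rawPath == escapedPath)

# Without the r, Python reads "\t" as a tab and "\n" as a new line,
# so this hypothetical path is silently corrupted.
brokenPath = "C:\temp\new.mxd"
print("\t" in brokenPath)
print("\n" in brokenPath)
```

All three print statements report True: the raw and escaped forms are equal, and the unprefixed path really does contain a tab and a new line character.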
Instead of directly using a string path, you could alternatively put a variable
holding the path. This would be useful if you were iterating through all the map
documents in a folder using a loop, or if you previously obtained the path in your
script using something like arcpy.GetParameterAsText().
If you are working in the Python window in ArcMap, you can instead pass the
keyword "CURRENT" to get the map document that is currently open:
mxd = arcpy.mapping.MapDocument("CURRENT")
Once you get a MapDocument, then you do something with it. Most of the
functions in arcpy.mapping take a MapDocument object as a parameter. Let's look
at this first script from the Esri help topic linked above and scrutinize what is going
on. I've added comments to each line.
The first line in the above example gets a MapDocument object referencing
C:\GIS\TownCenter_2009.mxd. The example then employs two functions from
arcpy.mapping. The first is ListLayoutElements. Notice that the parameters for this
function are a MapDocument and the type of layout element you want to get back,
in this case, "TEXT_ELEMENT". (Examine the documentation for List Layout
Elements [14] to understand the other types of elements you can get back.)
The function returns a Python list of TextElement [15] objects representing all the
text elements in the map document. You know what to do if you want to
manipulate every item in a Python list. In this case, the example uses a for loop to
check the TextElement.text property of each element. This property is readable and
writeable, meaning if you want to set some new text, you can do so by simply using
the equals sign assignment operator, as in textElement.text = "GIS Services
Division 2010".
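The loop-and-assign pattern described above can be sketched without ArcGIS. This is not the Esri script itself: StandInTextElement is a hypothetical stand-in that only mimics the text property, and the "2009" starting text and the "Town Center" element are assumptions made for illustration.

```python
class StandInTextElement(object):
    """Hypothetical stand-in for a layout text element; only has .text."""

    def __init__(self, text):
        self.text = text


def replaceText(elements, oldText, newText):
    """Set newText on every element whose text equals oldText.

    Returns the number of elements that were changed.
    """
    changed = 0
    for element in elements:
        if element.text == oldText:
            element.text = newText  # the property is writeable
            changed += 1
    return changed


# In ArcMap, this list would come from
# arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT").
elements = [StandInTextElement("GIS Services Division 2009"),
            StandInTextElement("Town Center")]
changedCount = replaceText(elements, "GIS Services Division 2009",
                           "GIS Services Division 2010")
print(changedCount)
print(elements[0].text)
```

Only the element whose text matches exactly is updated; the other element is left alone.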
Learning arcpy.mapping
The best way to learn arcpy.mapping is to try to use it. Because of its simple, "one-
line-fix" nature, it's a good place to practice your Python. It's also a good way to get
used to the Python window in ArcMap, because you can immediately see the results
of your actions.
By now you'll probably have experienced the reality that your code does not always
run as expected on the first try. Before you start running arcpy.mapping commands
on your production MXDs, I suggest making backup copies.
Here are a few additional places where you can find excellent help on learning
arcpy.mapping:
Zandbergen chapter 10. I recommend that you at least skim this chapter to
The Arcpy Mapping module book [17] in the ArcGIS Desktop Help
Video from the 2010 Esri Developer Summit: Python scripting for map
ArcGIS 10 [19]
To conclude this lesson, however, it's important to talk about what's not available
through Python scripting in ArcGIS.
In ArcGIS, Python interaction is mainly limited to reading and writing
data, editing the properties of map documents, and running the tools that are
included with ArcGIS. Although the ArcGIS tools are useful, they are somewhat of
a black box, meaning you put things in and get things out without knowing or being
concerned about what is happening inside. If you want a greater degree of control
over how ArcGIS is manipulating your data, you need to work with ArcObjects.
In this course we have done nothing with customizing ArcMap to add special
buttons, toolbars, and so on that trigger our programs. Our foray into user interface
design has been limited to making a script tool and toolbox. Although script tools
are useful, there are times when you want to take the functionality out of the
toolbox and put it directly into ArcMap as a button on a toolbar. You may want that
button to launch a new window with text boxes, labels, and buttons that you design
yourself.
In ArcGIS 10.0, if you want to put custom functionality or programs directly into
ArcMap, you need to use Visual Basic for Applications (VBA), C++, or a .NET
language (VB.NET or C#) working with ArcObjects. The functionality may be as
simple as putting some custom actions behind a button (zoom to a certain
bookmark, for example), or you may open a full-blown program you develop with
multiple forms, options, and menus. The aforementioned languages have IDEs in
which you can design custom user interfaces with text boxes, labels, buttons, and so
on.
Geog 489, another elective course in the GIS certificate program, covers GIS
customization using ArcObjects.
To allow a greater degree of interactivity between the ArcMap user interface and
Python scripts, ArcGIS 10.1 introduces the concept of a Python add-in. These allow
you to attach Python logic to a limited set of actions you perform in ArcMap, such
as zooming the map, opening a new map document, or clicking a button on a
custom toolbar. For example, you might create an add-in that automatically adds a
particular set of layers any time someone pushes a certain button on your toolbar.
With Python add-ins, you get access to a number of user interface elements to use
as a front end to your Python scripts, including toolbars, buttons, menus, combo
boxes, and basic file browsing and Yes/No confirmation dialog boxes. There's also a
set of common events that you can detect and respond to in your code, such as the
map opening, the map extent changing, or the spatial reference changing. Although
this is far from the full realm of ArcObjects and .NET customization possibilities, it
gives a lot more possibilities than were available in previous versions of ArcGIS.
The nice thing about add-ins is that they are easily shareable. You download the
Python Add-In Wizard from Esri, and it helps you prepare and package up your
add-in into a .esriaddin file. Other people with ArcGIS can then install the add-in
from the .esriaddin file.
Working with Python add-ins is currently not included in the scope of this course,
but you can learn all about them in the help book ArcGIS Desktop Python add-ins
[20]. After reading this material and getting a basic understanding of what's
required to create add-ins, you're welcome to incorporate them into your final
project if you have ArcGIS 10.1 and you are confident that you can work somewhat
independently to test and create the add-ins. If you have struggled in the course, I
recommend that you wait until after completing Geog 485 to further explore add-
ins, so that you can give them the necessary amount of time and testing.
Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll
read some coordinate points and make a polygon from those points. In Practice
Exercise B, you'll work with dictionaries to manage information that you parse
from the text file.
Example solutions are provided for both practice exercises. You'll get the most
value out of the exercises if you make your best attempt to complete them on your
own before looking at the solutions. In any case, the patterns shown in the solution
code can help you approach Project 4.
boundary.
The objective
Your job is to write a script that reads the text file and creates a state boundary
polygon out of the coordinates. When you successfully complete this exercise, you
should be able to preview the shapefile in ArcCatalog and see the state boundary.
Tips
If you're up for the challenge of this script, go ahead and start coding. But if you're
not sure how to get started, here are some tips:
This script will differ from some of the examples you've seen. There is no
header line for the file, and there is only one line of text to read. This should
Remember that when you call the split() method, you can pass in any
delimiter. Previously we have used a comma (",") but you can use the | just
as easily ("|").
Before you start looping through the coordinates, create an Array object to
Loop through each coordinate and create a Point object from the coordinate
Once you start looping through the coordinates, you'll be dealing with
coordinate pairs such as -109.05,31.33. You need to split this again (this
the first row and assign your Array to the SHAPE field.
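The two-stage split described in the tips can be sketched in plain Python. The sample line below is made up; it only assumes the layout the tips describe (pairs separated by "|", x and y separated by ","). In the actual script, each (x, y) pair would become a Point object added to your Array.

```python
# Made-up sample in the shape the tips describe: one line of text,
# coordinate pairs separated by "|", x and y separated by ",".
lineOfText = "-109.05,31.33|-109.05,37.00|-103.00,37.00|-103.00,32.00\n"


def parseCoordinates(text):
    """Return a list of (x, y) float tuples from a pipe-delimited line."""
    points = []
    # strip() removes the trailing new line before the first split
    for pair in text.strip().split("|"):
        xText, yText = pair.split(",")   # second split, this time on ","
        points.append((float(xText), float(yText)))
    return points


points = parseCoordinates(lineOfText)
print(len(points))
print(points[0])
```

Converting to float right away means the values are ready to be used as numeric coordinates rather than as text.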
import arcpy
shapefile = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference
# Open the file and read the first (and only) line
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
Alternate solution using the arcpy data access module in ArcGIS 10.1
Here's an example of how you might solve this practice exercise using the arcpy
data access module in ArcGIS 10.1. In this case the "SHAPE@" token is used to
assign the geometry to a row. The syntax ("SHAPE@",) is a tuple with one item
indicating that just the SHAPE field will be updated.
import arcpy
shapefile = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference
# Open the file and read the first (and only) line
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
The objective
You've been given a text file of (completely fabricated) soccer scores from some of
the most popular teams in Buenos Aires. Write a script that reads through the
scores and prints each team name, followed by the maximum number of goals that
team scored in a game, for example:
River: 5
Racing: 4
etc.
Keep in mind that the maximum number of goals scored might have come during a
loss.
You are encouraged to use dictionaries to complete this exercise. This is probably
the most efficient way to solve the problem. You'll also be able to write at least one
function that will cut down on repeated code.
I have purposely kept this text file short to make things simple to debug. This is
an excellent exercise in using the debugger, especially to watch your dictionary as
you step through each line of code.
Tips
If you want a challenge, go ahead and start coding. Otherwise, here are some tips
that can help you get started:
Your approach should be to read through each line and split it using a space
make a dictionary that has a key for each team, and an associated value that
dictionary in the debugger it would look like {'River': '5', 'Racing': '4', etc.}
You can write a function that takes in three things: the key (team name), the
number of goals, and the dictionary name. This function should then check
if the key has an entry in the dictionary. If not, a key should be added and its
value set to the current number of goals. If a key is found, you should
perform a check to see if the current number of goals is higher than the
value associated with that key. If so, you should set a new value. Notice how
Some of the lines of text end with the new line character "\n". This can
happen with some text files that come out of Notepad. You can get rid of this
# Also check the losing number of goals against the team's max
checkGoals(loser, loserGoals, maxGoalsDictionary)
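The function-plus-dictionary approach from the tips can be sketched as follows. This is a minimal sketch, not the course's example solution: the score lines are made up, and the line layout (winner, winner's goals, loser, loser's goals, separated by spaces) is an assumption. The goals are converted to integers so that comparisons are numeric.

```python
def checkGoals(team, goals, dictionary):
    """Store goals for team if it is the team's highest total so far."""
    if team not in dictionary or goals > dictionary[team]:
        dictionary[team] = goals


# Made-up lines; assumed layout: winner, winner goals, loser, loser goals.
scoreLines = ["River 5 Boca 3\n", "Racing 4 River 1\n", "Boca 2 Racing 0\n"]

maxGoalsDictionary = {}
for line in scoreLines:
    # strip() removes the trailing "\n" before splitting on spaces
    winner, winnerGoals, loser, loserGoals = line.strip().split(" ")
    checkGoals(winner, int(winnerGoals), maxGoalsDictionary)
    # Also check the losing number of goals against the team's max
    checkGoals(loser, int(loserGoals), maxGoalsDictionary)

for team in sorted(maxGoalsDictionary):
    print(team + ": " + str(maxGoalsDictionary[team]))
```

In this sample, Boca's highest total (3) comes from a game it lost, which is why both the winner and the loser are checked on every line.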
You’ll start with the code below that builds the dictionary. Copy and paste this into
an empty script and start writing your code below it. Your dictionary name will be
animals:
# function to load dictionary
def BuildDictionary():
    #create lists
    dogList = ["Dalmatian", "German Shepherd"]
    catList = ["American Shorthair"]
    birdList = ["Robin", "Canary", "Bluebird"]
    #use dict() constructor to create dictionary and add keys and values
    return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])
# Call the function and assign the result to the variable 'animals'.
animals = BuildDictionary()
# New code to print the average length of the animal names for each
# animal type (dogs, cats, and birds) should be inserted after this line.
Tips
If you're up for the challenge of this script, go ahead and start coding. But if you're
not sure how to get started, here are some tips:
o You can find the length of a string by using the function len, for
example, len(MyString)
o There are many ways to solve this problem; the answer gives two.
Two ways of solving the problem are shown below; both accomplish the same
thing.
# Name: dictionaries.py
# Description: Solves sample problem using dictionaries.
# Author: Frank Hardisty
# function to load dictionary
def BuildDictionary():
    #create lists
    dogList = ["Dalmatian", "German Shepherd"]
    catList = ["American Shorthair"]
    birdList = ["Robin", "Canary", "Bluebird"]
    #use dict() constructor to create dictionary and add keys and values
    return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])
# call the function and assign the result to the variable 'animals'
animals = BuildDictionary()
#find average length of names for different animal types two different ways
total = 0.0
cList = animals['cats']
total = 0.0
bList = animals['birds']
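The listing above breaks off before the averages are computed. As a minimal sketch of one possible approach (not necessarily either of the author's two ways), the snippet below rebuilds the same dictionary with a literal and uses a small helper function; averageNameLength is a name introduced here for illustration.

```python
# Same data as BuildDictionary() returns, written as a dictionary literal.
animals = {
    "dogs": ["Dalmatian", "German Shepherd"],
    "cats": ["American Shorthair"],
    "birds": ["Robin", "Canary", "Bluebird"],
}


def averageNameLength(names):
    """Return the average number of characters in a list of names."""
    total = 0.0   # float so the division below is never integer division
    for name in names:
        total += len(name)
    return total / len(names)


for animalType in sorted(animals):
    print(animalType + ": " + str(averageNameLength(animals[animalType])))
```

Spaces count as characters here, so "German Shepherd" has a length of 15.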
You want to write a script that will turn the readings in the spreadsheet into a
vector dataset that you can place on a map. This will be a polyline dataset showing
the tracks the rhinos followed over the time the data was collected.
Please carefully read all the following instructions before beginning the project.
Deliverables
any text editor. This should consist only of short, focused steps describing
what you are going to do to solve the problem. This is a separate deliverable
2. A Python script that reads the data from the spreadsheet and creates, from
spreadsheet. Each polyline should also have a text attribute containing the
rhino's name. The shapefile should use the WGS 1984 geographic coordinate
system.
3. A short writeup (~300 words) explaining what you learned during this
project and which requirements you met, or failed to meet. Also describe any
"over and above" efforts here so that the graders can look for them.
Challenges
The data is in a format (XLSX) that you cannot easily parse. Your first step
is to manually open the file in Excel and save it in a comma-delimited
format that you can easily read with a script. Choose the option
CSV (comma-delimited) (*.csv).
If you are so inclined, you can attempt to download and use a Python library
that works directly with XLSX files. Be aware that you will have less
comprehensive "technical support" from your fellow students if you use this
route.
The rhinos in the spreadsheet appear in no guaranteed order, and not all the
rhinos appear at the beginning of the spreadsheet. As you parse each line,
you must determine which rhino the reading belongs to and update that
rhino's polyline track accordingly. You are not allowed to sort the
Rhino column in Excel before you export to the CSV file. Your
their names are. Although you could visually comb the spreadsheet for this
handle all the rhino names programmatically. The idea is that you should be
able to run this script on a different file, possibly containing more rhinos,
You have not previously created a feature class programmatically. You must
find and run ArcGIS geoprocessing tools that will create an empty polyline
shapefile with a text field for storing the rhino's name. You must also assign
the WGS 1984 geographic coordinate system as the spatial reference for this
shapefile.
Hints
Before you start writing code, write a plan of attack describing the logic your
script will use to accomplish this task. Break up the original task into small,
focused chunks. You can write this in Word or even Notepad. Your objective
is not to write fancy prose, but rather short, terse statements of what your
code will do: in other words, pseudocode. Here's an example of some
pseudocode that might appear in your file:
...
Add the array to the dictionary using the rhino name as the key.
...
If you do a good job writing your pseudocode, you'll find that each line
translates into about one line of code. Writing your script then becomes a
matter of translating from English to code. You may also find it helpful to
sketch out a diagram of the workflow and logical branches in your script.
You will have a much easier time with this assignment if you first create the
array objects representing each rhino track, then use insert cursors to add
the arrays once they are completed. Not only is this easier to code, it's better
for performance to open the insert cursor only once near the end of the
script.
A Python dictionary is an excellent structure for storing a rhino name
coupled with the rhino's array of observed locations. A dictionary is similar
to a list, but it stores items in key-value pairs. For example, a key could be a
string representing the rhino name, and that key's corresponding value
could be an array object containing all the points where the rhino was
observed. You can retrieve any value based on its key, and you can also
check whether a key exists using a simple if key in dictionary:
check.
We have not worked with dictionaries much in this course, but your
Zandbergen text has an excellent section about them and there are abundant
Python dictionary examples on the Internet.
You can alternatively use lists to keep track of the information, but this will
probably take more code. Using dictionaries I was able to write this script in
under 60 lines (including comments and whitespace). If you find yourself
getting confused or writing a lot of code with lists, you may try to switch to
dictionaries.
reference object that you can assign to the shapefile at the time you create it.
Systems\. Be warned that if you do not correctly apply the spatial reference,
If you do things right, your polylines should look like this (points are included only
for reference):
Note: Although I have placed the data in an African context (who heard of rhinos
wandering New York City?) it is completely fabricated and does not resemble the
path of any actual rhino, living or dead. If you exhibit a stellar performance on this
project, you may choose the option of having a rhino named after you in a future
offering of this course!
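The dictionary hint above, with a rhino name as the key and that rhino's observed locations as the value, can be sketched in plain Python. The rows, column names, and rhino names below are made up and are not the real spreadsheet layout. Ordinary lists of (x, y) tuples stand in for the array objects your script would build.

```python
import csv

# Made-up rows; the column names and their order are assumptions.
csvLines = [
    "X,Y,Rhino",
    "26.98,-19.10,Bo",
    "27.01,-19.13,Tulip",
    "26.99,-19.11,Bo",
]

rhinoTracks = {}
reader = csv.reader(csvLines)

# Look up column positions from the header instead of hard-coding them
header = next(reader)
xIndex = header.index("X")
yIndex = header.index("Y")
nameIndex = header.index("Rhino")

for row in reader:
    name = row[nameIndex]
    # First reading for this rhino: start an empty track
    if name not in rhinoTracks:
        rhinoTracks[name] = []
    rhinoTracks[name].append((float(row[xIndex]), float(row[yIndex])))

for name in sorted(rhinoTracks):
    print(name + ": " + str(len(rhinoTracks[name])) + " readings")
```

Because the key check happens on every row, the rows can arrive in any order and new rhino names are picked up without being listed in the code.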
© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons Attribution-Non-
Commercial-Share-Alike 3.0 License and is freely available through Penn State's
College of Earth and Mineral Sciences' Open Educational Resources Initiative.
Links:
[1] https://www.e-education.psu.edu/geog485/?q=node/149
[2] http://www.w3schools.com/XML/xml_whatis.asp
[3] http://oreilly.com/catalog/pythonxml/chapter/ch01.html
[4] https://www.e-education.psu.edu/drupal6/files/geog485py/data/gps_track.txt
[5] http://docs.python.org/library/functions.html#open
[6] http://docs.python.org/library/stdtypes.html?highlight=readline#file.readline
[7] http://docs.python.org/library/string.html?highlight=split#string.split
[8] http://docs.python.org/library/stdtypes.html?highlight=readline#file.readlines
[9] https://www.e-education.psu.edu/drupal6/files/geog485py/data/gps_track_multiple.txt
[10] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v0000005r000000.htm
[11] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002t/002t0000000z000000.htm
[12] http://arcscripts.esri.com/
[13] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Geoprocessing_scripts_for_map_document_management_and_output/00s300000032000000/
[14] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/ListLayoutElements/00s30000003w000000/
[15] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/TextElement/00s30000000m000000/
[16] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s3/00s300000027000000.htm
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s3/00s300000032000000.htm
[18] http://proceedings.esri.com/library/userconf/devsummit10/tech/tech_56.html
[19] http://geochalkboard.wordpress.com/2010/08/02/introducing-the-arcpy-mapping-module-in-arcgis-10/
[20] http://resources.arcgis.com/en/help/main/10.1/014p/014p00000025000000.htm
[21] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson4PracticeExercises.zip
[22] https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx