Chapter4 SM
Figure 4-1: Maps have served to store geographic knowledge for at least the past 4000 years. This early map
of northern Europe shows approximate shapes and relative locations.
132 GIS Fundamentals
Figure 4-4: Maps often depict lines representing (a) a graticule of constant latitude and longitude or (b) a
grid of constant x and y coordinates.
and y directions, and appear straight on most maps (Figure 4-4b). Graticules and grids are useful because they provide a reference against which location may be quickly estimated. Graticules are particularly useful for depicting the distortion inherent in a map projection, because they show how geographic north or east lines are deformed, and how this distortion varies across the map. Grids may establish a map-projected north, in contrast to geographic north, and may be useful when trying to navigate or locate a position on the map.
Historical and current images are valuable sources of geographic data, and although they are not maps, the line is becoming blurred as aerial and satellite photographs become common backdrops for digital maps. Photographs do not typically provide an orthographic (flat, undistorted) view, and houses, rivers, or features of interest are not explicitly identified. However, images are a rich source of geographic information, and standard techniques may be used to remove major systematic distortions and extract features, through manual digitizing, described later in this chapter, or through image classification, described in Chapter 6.
Digital spatial data are those provided in a computer-compatible format. These include complete raster and vector data layers, text files, lists of coordinates, and digital images. Files and export formats can be used to transfer them to a local GIS system. Global Navigation Satellite Systems (GNSS), such as the U.S. Global Positioning System (GPS), and the coordinate survey devices described in Chapter 5 are direct measurement systems that can be used to record coordinates in the field and report them directly into digital formats. Finally, a number of digital image sources are available, such as satellite or airborne images that are collected in a digital raster format, or hardcopy aerial photographs that have been scanned to produce digital images.
Hardcopy data are an important source of geographic information for many reasons. First, most geographic information produced before 1980 was recorded in hardcopy form. Advances in optics, metallurgy, and industry during the 18th and 19th centuries allowed the mass production of precise surveying devices, and by the mid 20th century much of the world had been plotted on cartometric-quality maps. Cartometric maps are those that faithfully represent the relative position of objects and thus may be suitable as a source of spatial data.
While much spatial data has been collected from hardcopy sources, data entry
from digital sources now dominates. Coordinates are increasingly captured via interpretation of digital image sources (these sources are described in Chapter 6) or collected directly in the field by satellite-based positioning services (Chapter 5).
Our objective in this chapter is to introduce spatial data entry via digitizing and coordinate surveying. We will also cover basic editing methods and data documentation, and rudimentary cartography and output.

Map Types

Many types of maps are produced, and the types are often referred to by the way features are depicted on the map. Feature maps are among the simplest, because they map points, lines, or areas and provide nominal information (Figure 4-5, upper left). A road may be plotted with a symbol defining the type of road, or a point may be plotted indicating the location of a city center, but the width of the road or number of city dwellers are not provided in the shading or other symbology on the map. Feature maps are perhaps the most common map form, and examples include most road maps and standard map series such as the 7.5 minute topographic maps produced by the U.S. Geological Survey.
Choropleth maps depict quantitative information for areas. A mapped variable such as population density may be represented in the map (Figure 4-5, top right). Polygons define area boundaries, such as counties, states, census tracts, or other standard administrative units. Each polygon is given a color, shading, or pattern corresponding to values for a mapped variable; for example, in Figure 4-5, top right, the
Figure 4-5: Common hardcopy map types depicting New England, in the northeastern United States.
Chapter 4: Maps and Data Entry 135
Figure 4-7: Coverage, relative distance, and detail change from larger-scale (top) to smaller-scale (bottom) maps.
Figure 4-8: A mapmaker chooses the materials and methods used to produce a map, and so imposes a
limit on spatial detail. Here, the choice of an input image with a 250 meter resolution (left) renders it
impossible to represent all the details of the real lake boundaries (right). In this example, features
smaller than approximately 250 meters on a side may not be faithfully represented on the map.
Figure 4-10: Examples of map generalization. Portions are shown for three maps of an area in central Minnesota: excerpts from a large-scale (a, 1:24,000), intermediate-scale (b, 1:62,500), and small-scale (c, 1:250,000) map. Note that the maps are not drawn at true scale, to facilitate comparison; the smaller-scale maps (b and c) have been magnified more than map a to better show the effects of generalization. Each map has a different level of map generalization. Generalization increases with smaller-scale maps, and includes omission of smaller lakes, successively greater road-width exaggeration, and increasingly generalized shorelines as one moves from maps a through c.
ried to the present. These errors are disappearing as newer data are collected with digital methods, but will be encountered and should be understood.
Large-scale, high-quality maps generally cover small areas. This is because of the trade-off between scale and area coverage, and because of limits on the practical size of a map. Cartometric maps larger than a meter in any dimension have proven to be impractical for most organizations. Maps above this size are expensive and difficult to print, store, or view. Thus, human ergonomics set a practical limit on the physical size of a map. The fixed maximum map dimension, when coupled with a fixed map scale, defines the area coverage of the hardcopy map. Larger-scale maps generally cover smaller areas. A 1:100,000-scale map that is 18 inches (47 centimeters) on a side spans approximately 28 miles (47 kilometers). A 1:24,000-scale map that is 18 inches on a side represents approximately 7 miles (11 kilometers) on the Earth's surface. Because spatial data in a GIS often span several large-scale maps, these map boundaries may occur in a spatial database. Problems often arise when adjacent maps are entered into a spatial database because features do not align or have mismatched attributes across map boundaries.
Differences in the time of data collection for adjacent map sheets may lead to inconsistencies across map borders. Landscape change through time is a major source of differences across map boundaries. For example, the U.S. Geological Survey has produced 1:24,000-scale map sheets for all of the lower 48 United States of America. The original mapping took place over several decades, and there were inevitable time lags between mapping some adjacent areas. As much as two decades passed between mapping or updating adjacent map sheets. Thus, many features, such as roads, canals, or municipal boundaries, are discontinuous or inconsistent across map sheets.
Different interpreters may also cause differences across map boundaries. Large-area mapping projects typically employ several interpreters, each working on different map sheets for a region. All professional, large-area mapping efforts should have protocols specifying the scale, sources, equipment, methods, classification, keys, and cross-correlation to ensure consistent mapping across map sheet boundaries. In spite of these efforts, however, some differences due to human interpretation occur. Feature placement, category assignment, and generalization vary among interpreters. These problems are compounded when extensive checking and guidelines are not enforced across map sheet boundaries, especially when adjacent areas are mapped at different times or by two different organizations.
Finally, differences in coordinate registration can lead to spatial mismatch across map sheets. Registration, discussed later in this chapter, is the conversion of digitizer or other coordinate data to an earth-surface coordinate system. These registrations contain unavoidable errors that translate into spatial uncertainty. There may be mismatches when data from two separate registrations are joined along the edge of a map.
Spatial data stored in a GIS are not bound by the same constraints that limit the physical dimensions of hardcopy maps. Digital storage enables the production of seamless digital maps of large areas. However, the inconsistencies that exist on hardcopy maps may be transferred to the digital data. Inconsistencies at map sheet edges need to be identified and resolved when maps are converted to digital formats.
ware allows the human operator to specify the type of feature to be recorded, the extent and magnification of the image on screen, the mode of digitizing, and other options to control how data are input. The operator typically guides a cursor over points to be recorded using a mouse, and depresses a button or sequence of buttons to collect the point coordinates. On-screen digitizing can be used for recording information from scanned aerial photographs, digital photographs, satellite images, or other images.
On-screen digitizing offers advantages over hardcopy and scan-digitizing, methods that are described in the following sections. Many data sources are inherently digital, for example, image data collected from aerial photographs and airborne or satellite scanners. These data may be magnified on screen to any desired scale. Converting the image to a paper or other hardcopy form would likely introduce error through the slight deformation of the paper or printing media, reduce flexibility when digitizing, and add the cost of printing.
On-screen digitizing is often more accurate than manual digitizing because manual map digitization is often limited by the visual acuity and pointing ability of the operator. The pointing imprecision of the operator and digitizing systems translates to a fixed ground distance when manually digitizing a hardcopy map. For example, consider an operator who can reliably digitize a location to the nearest 0.4 millimeters (0.016 inch) on a 1:20,000-scale map. Also assume the best hardcopy digitizing table available is being used, and we know the observed error is larger than the error in the map. The 0.4 millimeter error in precision translates to approximately 8 meters of error on the Earth's surface. The precision cannot be appreciably improved when using a digitizing table, because a majority of the imprecision is due to operator abilities. In contrast, once the map is scanned, the image may be displayed on a computer screen at any map
Figure 4-11: An example of on-screen digitizing. Images or maps are displayed on a computer screen, and feature data are digitized manually. Buildings, roads, or any other features that may be distinguished on the image may be digitized.
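The ground-distance effect of pointing imprecision is straightforward arithmetic: multiply the map-unit error by the scale denominator. A minimal sketch of the calculation (the function name is ours, not from the text):

```python
def ground_error_m(map_error_mm, scale_denominator):
    """Meters of ground error produced by a map error of `map_error_mm` millimeters."""
    return map_error_mm * scale_denominator / 1000.0

# 0.4 mm pointing precision on a 1:20,000-scale map
print(ground_error_m(0.4, 20000))  # approximately 8 meters, as in the example above
```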
scale. The operator may zoom to a 1:5,000-scale or greater on screen, and digitizing accuracy and precision are improved. While other factors affect the accuracy of the derived spatial data (for example, map plotting or production errors, or scanner accuracy), on-screen digitizing may be used to limit operator-induced positional error when digitizing. On-screen digitizing also removes or reduces the need for digitizing tables or map scanners, the specialized equipment used for capturing coordinates from maps.

Hardcopy Map Digitization

Hardcopy digitizing is human-guided coordinate capture from a paper, plastic, or other hardcopy map. An operator securely attaches a map to a digitizing surface and traces lines or points with an electrically sensitized puck (Figure 4-12). The most common digitizers are based on a wire grid embedded in or under a table. Depressing a button specifies the puck location relative to the digitizer coordinate system. Digitizing tables can be quite accurate, with a resolution of between 0.25 and 0.025 millimeters (0.01 and 0.001 inches).
While once a major method for capturing spatial data, hardcopy map digitizing is diminishing in importance as most paper documents have been converted to digital forms. The tables are large, somewhat expensive, and now little used. However, because data from hardcopy sources are likely to persist for many decades, and there are still many specialized documents to convert, you should be familiar with the process.
Not all maps are appropriate as a source of information for GIS. The type of map, how it was produced, and the intended purpose must be considered when interpreting the information on maps. Only cartometric maps should be directly digitized, and even a cartometric map may not be suitable. Consider the dot-density map
described in Figure 4-2. Population is depicted by points, but the points are plotted with random offsets or using some method that does not reflect the exact location of the population within each polygon. Before the information in the dot-density map is entered into a GIS, the map should be interpreted correctly. The number of dots in a polygon should be counted, this number multiplied by the population per dot, and the population value assigned to the entire polygon.
Maps may be unsuitable for digitizing due to the media. Most hardcopy maps are on paper because it is ubiquitous, inexpensive, and easily printed. Creases, folds, and wrinkles can lead to non-uniform deformation of paper maps.

Characteristics of Manual Digitizing

Manual digitizing, whether from a digital image on screen or from a hardcopy source, is common because it provides sufficiently accurate data for many, if not most, applications. Manual digitizing may be at least as accurate as most maps or images, so the equipment, if properly used, does not add substantial error. Manual digitizing also requires low equipment investment, often just the software for image display and coordinate capture. The human ability to interpret images or hardcopy maps in poor condition is a unique and important benefit of manual digitizing. Humans are usually better than machines at interpreting the information contained on faded, stained, or poor-quality maps and images. Finally, manual digitizing is often best because short training periods are required, data quality may be frequently evaluated, and digitizing equipment is commonly available. For these reasons manual digitization is likely to remain an important data entry method for some time to come.
There are a number of characteristics of manual digitization that may negatively affect the positional quality of spatial data. As noted earlier, map or image scale and resolution impact the spatial accuracy of digitized data. This scale may be the production scale for hardcopy maps, or the display scale for digital images or scanned maps. Table 4-1 illustrates the effects of map scale on data quality. Errors of one millimeter (0.039 inches) on a 1:24,000-scale map correspond to 24 meters (79 feet) on the surface of the Earth. This same one millimeter error on a 1:1,000,000-scale map corresponds to 1000 meters (3281 feet) on the Earth's surface. Thus, small errors in map production or interpretation may cause significant positional errors when scaled to distances on the Earth, and these errors are greater for smaller-scale maps. Errors due to human pointing ability are reduced for on-screen digitizing, because the operator can zoom in to larger scales as needed. However, this does not overcome errors inherent in original images or scanned documents.
Both device precision and map scale should be considered when selecting a digitizing tablet. Map scale and repeatability both set an upper limit on the positional quality of digitized data. The most precise digitizers may be required when attempting to meet a stringent error standard while digitizing small-scale maps.

Table 4-1: The surface error caused by a one millimeter (0.039 inch) map error will change as map scale changes. Note the larger error at smaller map scales.

  Map Scale     Error (m)   Error (ft)
  1:24,000             24           79
  1:50,000             50          164
  1:62,500             63          205
  1:100,000           100          328
  1:250,000           250          820
  1:1,000,000       1,000        3,281
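The values in Table 4-1 can be regenerated from the same scale arithmetic. A small sketch (the helper function is our own; the meters-to-feet factor is the standard 3.28084, and note the table rounds 62.5 m up to 63):

```python
def ground_error_m(map_error_mm, scale_denominator):
    """Meters of ground error produced by a map error of `map_error_mm` millimeters."""
    return map_error_mm * scale_denominator / 1000.0

M_TO_FT = 3.28084

# One millimeter of map error at each scale listed in Table 4-1
for scale in (24_000, 50_000, 62_500, 100_000, 250_000, 1_000_000):
    err_m = ground_error_m(1.0, scale)
    print(f"1:{scale:<9,}  {err_m:8,.1f} m  {err_m * M_TO_FT:8,.0f} ft")
```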
automatically sampled at a fixed time or distance frequency, perhaps once each meter. Stream mode helps when large numbers of lines are digitized, because vertices may be sampled more quickly and the operator may become less fatigued.
The stream sampling rate must be specified with care to avoid over- or under-sampled lines. Too short a collection interval results in redundant points not needed to accurately represent line or polygon shape. Too long a collection interval may result in the loss of important spatial detail. In addition, when using time-triggered stream digitizing, the operator must remember to continuously move the digitizing puck; if the operator rests the digitizing puck for a period longer than the sampling interval, there will be multiple points clustered together. These will redundantly represent a portion of the line and may result in overlapping segments. Pausing for an extended period of time often creates a "rat's nest" of lines that must later be removed.
Minimum distance digitizing is a variant of stream mode digitizing that avoids some of the problems inherent with time-sampled streaming. In minimum distance digitizing, a new point is not recorded unless it is more than some minimum threshold distance from the previously digitized point. The operator may pause without creating a rat's nest of line segments. The threshold must be chosen carefully: neither too large, missing useful detail, nor too small, in effect reverting to stream digitizing.

Digitizing Errors, Node and Line Snapping

Positional errors are inevitable when data are manually digitized. These errors may be "small" relative to the intended use of the data; for example, the positional errors may be less than 2 meters when only 5 meter accuracy is required. However, these relatively small errors may still prevent the generation of correct networks or polygons. For example, a data layer representing a river system may not be correct because major tributaries may not connect. Polygon features may not be correctly defined because their boundaries may not completely close. These small errors must be removed or avoided during digitizing. Figure 4-15 shows some common digitizing errors.
Undershoots and overshoots are common errors that occur when digitizing. Undershoots are nodes that do not quite reach the line or another node, and overshoots are lines that cross over existing nodes or lines (Figure 4-15). Undershoots
cause unconnected networks and unclosed polygons. Overshoots typically do not cause problems when defining polygons, but they may cause difficulties when defining and analyzing line networks.
Node snapping and line snapping are used to reduce undershoots and overshoots while digitizing. Snapping is a process of automatically setting nearby points to have the same coordinates. Snapping relies on a snap tolerance or snap distance. This distance may be interpreted as a minimum distance between features. Nodes or vertices closer than this distance are moved to occupy the same location (Figure 4-16). Node snapping prevents a new node from being placed within the snap distance of an already existing node; instead, the new node is joined or "snapped" to the existing node. Remember that nodes are used to define the ending points of a line. By snapping two nodes together, we ensure a connection between digitized lines.
Line snapping may also be specified. Line snapping inserts a node at a line crossing and clips the end when a small overshoot is digitized. Line snapping forces a node to connect to a nearby line while digitizing, but only when the undershoot or overshoot is less than the snapping distance. Line snapping requires the calculation of an intersection point on an already existing line. The snap process places a new node at the intersection point, and connects the digitized line to the existing line at the intersection point. This splits the existing line into two new lines. When used properly, line and node snapping reduce the number of undershoots and overshoots. Closed polygons or intersecting lines are easier to digitize accurately and efficiently when node and line snapping are in force.
The snap distance must be carefully selected for snapping to be effective. If the snap distance is too short, then snapping has little impact. Consider a system where the operator may digitize with better than 5 meter accuracy only 10% of the time. This means 90% of the digitized points will be more than 5 meters from the intended location. If the snap tolerance is set to the equivalent of 0.1 meters, then very few nodes will
Figure 4-16: Undershoots, overshoots, and snapping. Snapping may join nodes, or may place a node onto
a nearby line segment. Snapping does not occur if the nodes and/or lines are separated by more than the
snap tolerance.
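Both minimum-distance digitizing and node snapping reduce to simple distance tests. A minimal sketch of each (function names and the coordinate-tuple format are our own, not from any particular GIS package):

```python
import math

def min_distance_filter(points, threshold):
    """Minimum-distance digitizing: keep a vertex only if it is farther than
    `threshold` from the last vertex kept, so a pause in time-triggered
    stream mode does not deposit a cluster of near-identical points."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) > threshold:
            kept.append(p)
    return kept

def snap_node(new_node, existing_nodes, tolerance):
    """Node snapping: return the nearest existing node within the snap
    tolerance, or the new node unchanged if none is close enough."""
    nearest = min(existing_nodes, key=lambda n: math.dist(new_node, n), default=None)
    if nearest is not None and math.dist(new_node, nearest) <= tolerance:
        return nearest
    return new_node

# A pause in stream mode yields clustered points; the filter collapses them:
stream = [(0, 0), (0.1, 0.0), (0.15, 0.05), (2, 0), (4, 0.1)]
print(min_distance_filter(stream, threshold=1.0))  # [(0, 0), (2, 0), (4, 0.1)]

# An undershoot within tolerance is snapped onto the existing node:
nodes = [(100.0, 200.0), (150.0, 210.0)]
print(snap_node((100.3, 199.8), nodes, tolerance=0.5))  # (100.0, 200.0)
```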
be within the snap tolerance, and snapping has little effect. Another problem comes from setting the snap tolerance too large. If the snap tolerance in our previous example is set to 10 meters, and we want the data accurate to the nearest 5 meters, then we may lose significant spatial information that is contained in the hardcopy map. Lines less than 10 meters apart cannot be digitized as separate objects. Many features may not be represented in the digital data layer. The snap distance should be smaller than the desired positional accuracy, such that significant detail contained in the digitized map is recorded. It is also important that the snap distance is not below the capabilities of the system used for digitizing. Careful selection of the snap distance may reduce digitizing errors and significantly reduce the time required for later editing.

Reshaping: Line Smoothing and Thinning

Digitizing software may provide tools to smooth, densify, or thin points while entering data. One common technique uses spline functions to smoothly interpolate curves between digitized points and thereby both smooth and densify the set of vertices used to represent a line. A spline is a set of polynomial functions that join smoothly (Figure 4-17). Polynomial functions are fit to successive sets of points along the vertices in a line; for example, a function may be fit to points 1 through 5, and a separate polynomial function fit to points 5 through 11 (Figure 4-17). Constraints force these functions to connect smoothly, usually by requiring the first and second derivatives of the functions to be continuous at the intersection point. This means the lines have the same slope at the intersection point, and the slope is changing at the same rate for both lines at the intersection point. Once the spline functions are calculated, they may be used to add vertices. For example, several new vertices may be automatically placed on the line between digitized vertices 8 and 9, leading to the "smooth" curve shown in Figure 4-17.

Figure 4-17: Spline interpolation to smooth digitized lines.

Data may also be digitized with too many vertices. High densities may occur when data are manually digitized in stream mode, and the operator moves slowly relative to the time interval. High vertex densities may also be found when data are derived from spline or smoothing functions that specify too high a point density. Finally, automated scanning and then raster-to-vector conversion may result in coordinate pairs spaced at absurdly high densities. Many of these coordinate data are redundant and may be removed without sacrificing spatial accuracy. Too many vertices may be a problem in that they slow processing, although this has become less important as computing power has increased. Point thinning algorithms have been developed to reduce the number of points while maintaining the line shape.
Many point thinning methods use a perpendicular "weed" distance, measured from a spanning line, to identify redundant points (Figure 4-18, top). The Lang method exemplifies this approach. A spanning line connects two non-adjacent vertices in a line. A
Figure 4-18: The Lang algorithm is a common line-thinning method. In the Lang method vertices are removed, or thinned, when they are within a weed distance of a spanning line (adapted from Weibel, 1997).
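A simplified rendition of the Lang idea can be sketched in a few lines: span a line over a small window of vertices, shrink the window until every intermediate vertex lies within the weed distance of the spanning line, then drop the intermediates. This is our own reduced version for illustration, not a full production implementation:

```python
import math

def perp_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def lang_thin(points, weed, lookahead=4):
    """Simplified Lang thinning: span a line over up to `lookahead` vertices,
    shrink the span until all intermediate vertices fall within the weed
    distance of it, then remove those intermediates."""
    kept = [points[0]]
    i = 0
    while i < len(points) - 1:
        j = min(i + lookahead, len(points) - 1)
        while j > i + 1 and any(
            perp_distance(points[k], points[i], points[j]) > weed
            for k in range(i + 1, j)
        ):
            j -= 1
        kept.append(points[j])
        i = j
    return kept

# Small wiggles along the first four segments are thinned away; the sharp
# bend at the end is preserved.
line = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (5, 1.5), (6, 3)]
print(lang_thin(line, weed=0.1))  # [(0, 0), (4, 0), (6, 3)]
```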
Figure 4-20: Skeletonizing, a form of line thinning that is often applied after scan-digitizing.
most instances. A short distance for an undershoot is subjectively defined, but typically it is below the error inherent in the source map, or at least a distance that is insignificant when considering the intended use of the spatial data.

Features Common to Several Layers

One common problem in digitizing derives from representation of features that occur on different maps or images. These features rarely have identical locations on each map or image, and often occur in different locations when digitized into their respective data layers (Figure 4-22). For example, water boundaries on soil survey maps rarely correspond exactly to water boundaries found on USGS topographic maps.
Features may appear differently on different maps for many reasons. Perhaps the maps were made for different purposes or at different times. Features may differ because the maps were from different source materials; for example, one map may have been based on ground surveys while another was based on aerial photographs. Digitizing can also compound the problem due to differences in digitizing methods or operators.
There are several ways to remove this "common feature" inconsistency. One involves re-drafting the data from conflicting sources onto one base map. Inconsistencies are removed at the drafting stage. For example, vegetation and roads data may show vegetation type boundaries at road edges that are inconsistent with the road locations. Both of these data layers may be drafted onto the same base, and the common boundaries fixed by a single line. This line is digitized once, and used to specify the location of both the road and the vegetation boundary when digitizing. Re-drafting, although labor intensive and time consuming, forces a resolution of inconsistent boundary locations. Re-drafting also allows several maps to be combined into a single data layer.
A second, often preferable method involves establishing a "master" boundary which is the highest-accuracy composite of the available data sets. A digital copy or overlay operation establishes the common features as a base in all the data layers, and this base may be used as each new layer is produced. For example, water boundaries might be extracted from the soil survey and USGS quad maps and these data combined in a third data layer. The third data layer would be edited to produce a composite, high-quality water layer. The composite water layer would then be copied back into both the soils and USGS quad layers. This second approach, while resulting in visually consistent spatial data layers, is in many instances only a cosmetic improvement of the data. If there are large discrepancies ("large" is defined relative to the required spatial data accuracy), then the source of the discrepancies should be identified and the most accurate data used, or new, higher-accuracy data collected from the field or original sources.

Figure 4-22: Common features may be spatially inconsistent in different spatial data layers.
Coordinate Transformation
Coordinate transformation is a common operation in the development of spatial data for GIS. A coordinate transformation brings spatial data into an Earth-based map coordinate system so that each data layer aligns with every other data layer. This alignment ensures features fall in their proper relative position when digital data from different layers are combined. Within the limits of data accuracy, a good transformation helps avoid inconsistent spatial relationships such as farm fields on freeways, roads under water, or cities in the middle of swamps, except where these truly exist. Coordinate transformation is also referred to as registration, because it "registers" the layers to a map coordinate system.
Coordinate transformation is most commonly used to convert newly digitized data from the digitizer/scanner coordinate system to a standard map coordinate system (Figure 4-23). The input coordinate system is usually based on the digitizer- or scanner-assigned values. An image may be scanned and coordinates recorded as a cursor is moved across the image surface. These coordinates are usually recorded in pixel, inch, or centimeter units relative to an origin located near the lower left corner of the image. The absolute values of the coordinates depend on where the image happened to be placed on the table prior to scanning, but the relative position of digitized points does not change. Before these newly digitized data may be used with other data, these "inch-space" or "digitizer" coordinates must be transformed into an Earth-based map coordinate system.

Control Points

A set of control points is used to transform the digitized data from the digitizer or photo coordinate system to a map-projected coordinate system. Control points are different from other digitized features. When we digitize most points, lines, or areas, we do not know the map projection coordinates for these features. We simply collect the digitizer x and y coordinates that are established with reference to some arbitrary origin on the digitizing tablet or photo. Control points differ from other digitized points in that we
Figure 4-23: Control points in a coordinate transformation. Control points are used to guide the trans-
formation of a source, input set of coordinates to a target, output set of coordinates. There are five con-
trol points in this example. Corresponding positions are shown in both coordinate systems.
know both the map projection coordinates and the digitizer coordinates for these points.
These two sets of coordinates for each control point, one for the map projection and one for the digitizer system, are used to estimate the coefficients for transformation equations, usually through a statistical, least-squares process. The transformation equations are then used to convert coordinates from the digitizer system to the map projection system.
The transformation may be estimated in the initial digitizing steps, and applied as the coordinates are digitized from the map or image. This "on-the-fly" transformation allows data to be output and analyzed with reference to map-projected coordinates. A previously registered data layer or image may be displayed on screen just prior to digitizing a new map. Control points may then be entered, the new map attached to the digitizing table, and the map registered. The new data may then be displayed on top of the previously registered data. This allows a quick check on the location of the newly digitized objects against corresponding objects in the study area.
In contrast to on-the-fly transformations, data can also be recorded in digitizer coordinates and the transformation applied later. All data are digitized, including the control point locations. The digitizer coordinates of the control points may then be matched to corresponding map projection coordinates, and transformation equations estimated. These transformation equations are then applied to convert all digitized data to map projection coordinates.
Control points should meet or exceed the minimum number required; the minimum depends on the mathematical form of the transformation, but additional control points above the minimum number are usually collected. This usually improves the quality and accuracy of the statistically fit transformation functions.
The x, y (horizontal), and sometimes z (vertical or elevation) coordinates of control points are known to a high degree of accuracy and precision. Because high precision and accuracy are subjectively defined, there are many methods to determine control point locations. Sub-centimeter accuracy may be required for control points used in property boundary layers, while accuracies of a few meters may be acceptable for large-area vegetation mapping. Common sources of control point coordinates are traditional transit and distance surveys, global positioning system measurements, existing cartometric-quality maps, or existing digital data layers on which suitable features may be identified.

The Affine Transformation

The affine coordinate transformation employs linear equations to calculate map coordinates. Map projection coordinates are often referred to as eastings (E) and northings (N), and are related to the x and y digitizer coordinates by the equations:

  E = TE + a1x + a2y    (4.1)

  N = TN + b1x + b2y    (4.2)

Equations 4.1 and 4.2 allow us to move
several criteria. First, control points should from the arbitrary digitizer coordinate sys-
be from a source that provides the highest tem to the project map coordinate system.
feasible coordinate accuracy. Second, con- We know the x and y coordinates for every
trol point accuracy should be at least as good digitized point, line vertex, or polygon ver-
as the desired overall positional accuracy tex. We may calculate the E and N coordi-
required for the spatial data. Third, control nates by applying the above equations to
points should be as evenly distributed as every digitized point.
possible throughout the data area. A suffi-
TE and TN are translation changes
cient number of control points should be col-
between the coordinate systems, and can be
lected. The minimum number of points
thought of as shifts in the origins from one
Chapter 4: Maps and Data Entry 155
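The least-squares estimation of the six affine parameters can be sketched in a few lines of standard-library Python. This is an illustration of the process described above, not code from any GIS package; the function names and the control point interface are our own.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= f * M[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][c] * x[c] for c in range(i + 1, 3))) / M[i][i]
    return x

def fit_affine(control):
    """Estimate (TE, a1, a2) and (TN, b1, b2) of equations 4.1 and 4.2
    by least squares. control is a list of (x, y, E, N) tuples, one per
    control point; at least three non-collinear points are required."""
    AtA = [[0.0] * 3 for _ in range(3)]
    AtE = [0.0] * 3
    AtN = [0.0] * 3
    for x, y, E, N in control:
        row = (1.0, x, y)                      # design-matrix row for [T, a1, a2]
        for i in range(3):
            for j in range(3):
                AtA[i][j] += row[i] * row[j]   # accumulate the normal equations
            AtE[i] += row[i] * E
            AtN[i] += row[i] * N
    return solve3(AtA, AtE), solve3(AtA, AtN)
```

With the parameters in hand, any digitized point converts as E = TE + a1·x + a2·y and N = TN + b1·x + b2·y, which is exactly the forward application of equations 4.1 and 4.2.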
The ai and bi parameters incorporate the changes in scale and the rotation angle between one coordinate system and the next. The affine is the most commonly applied coordinate transformation because it provides for these three main effects of translation, rotation, and scaling, and because it often introduces less error than higher-order polynomial transformations.

The affine system of equations has six parameters to be estimated: TE, TN, a1, a2, b1, and b2. Each control point provides E, N, x, and y coordinates, and allows us to write two equations. For example, we may have a control point consisting of a precisely surveyed center of a road intersection. This point has digitizer coordinates of x = 103.0 centimeters and y = -100.1 centimeters, and corresponding Earth-based map projection coordinates of E = 500,083.4 and N = 4,903,683.5. We may then write two equations based on this control point:

500,083.4 = TE + a1(103.0) + a2(-100.1)    (4.3)

4,903,683.5 = TN + b1(103.0) + b2(-100.1)    (4.4)

We cannot find a unique solution to these equations, because there are six unknowns (TE, TN, a1, a2, b1, b2) and only two equations. We need as many equations as unknowns to solve a linear system of equations. Each control point gives us two equations, so we need a minimum of three control points to estimate the parameters of an affine transformation. Statistical estimation requires a total of four control points. As with all statistical estimates, more control points are better than fewer, but we will reach a point of diminishing returns after some number of points, typically somewhere between 18 and 30 control points.

The affine coordinate transformation is usually fit using a statistical method that minimizes the root mean square error, RMSE. The RMSE is defined as:

RMSE = sqrt[ (e1² + e2² + e3² + ... + en²) / n ]    (4.5)

where the ei are the residual distances between the true E and N coordinates and the E and N coordinates in the output data layer:

e = sqrt[ (xt - xd)² + (yt - yd)² ]    (4.6)

This residual is the difference between the true coordinates xt, yt, and the transformed output coordinates xd, yd. Figure 4-24 shows examples of this lack of fit. Individual residuals may be observed at each control point location.

A statistical method for estimating the transformation equations is preferred because it also identifies transformation error. Control point coordinates contain unavoidable measurement errors. A statistical process provides an RMSE, a summary of the difference between the "true" (measured) and predicted control point coordinates, and so provides one index of transformation quality.

Figure 4-24: Examples of control points, predicted control locations, and residuals from coordinate transformation.
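Equations 4.5 and 4.6 translate directly into code. A minimal sketch, with function and argument names of our own choosing:

```python
from math import hypot, sqrt

def rmse(true_xy, output_xy):
    """RMSE over control points (equation 4.5), where each residual e is
    the distance between a true and a transformed position (equation 4.6)."""
    errors = [hypot(xt - xd, yt - yd)
              for (xt, yt), (xd, yd) in zip(true_xy, output_xy)]
    return sqrt(sum(e * e for e in errors) / len(errors))
```

For example, two control points missed by 3 and 4 map units give rmse([(0, 0), (10, 0)], [(0, 3), (10, 4)]) = sqrt((3² + 4²)/2) ≈ 3.54.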
Transformations are fit iteratively (Figure 4-25). The RMSE will usually be less than the true transformation error at a randomly selected point, because we are actively minimizing the N and E residual errors when we statistically fit the transformation equations. However, the RMSE is an index of accuracy, and a lower RMSE generally indicates a more accurate affine transformation.

Estimating the coordinate transformation parameters is often an iterative process. Control points are rarely exact, and x and y coordinates may not be precisely digitized. Poor eyesight, a shaky hand, fatigue, lack of attention, mis-identification of the control location, or a blunder may result in erroneous x and y values. There may also be errors in the E and N coordinates. Typically, control points are entered, the affine transformation parameters estimated, and the overall RMSE and individual point E and N errors evaluated (Figure 4-24, Figure 4-25). Suspect points are fixed, and the transformation re-estimated and errors evaluated, until a final transformation is reached. The transformation is then applied to all features to convert them from digitizer to map coordinates.

Other Coordinate Transformations

Other coordinate transformations are sometimes used. The conformal coordinate transformation is similar to the affine, and has the form:

E = TE + cx - dy    (4.7)

N = TN + dx + cy    (4.8)

The coefficients TE, TN, c, and d are estimated from control point data. Like the affine transformation, the conformal transformation is a first-order polynomial. Unlike the affine, the conformal transformation requires equal scale changes in the x and y directions. Note the symmetry in equations 4.7 and 4.8: the x and y coefficients match across equations, with a change in sign for the d coefficient. This results in a system of equations with only four unknown parameters, so the conformal may be estimated when only two control points are available.

Higher-order polynomial transformations are sometimes used to transform among coordinate systems. An example of a 2nd-order polynomial is:

E = b1 + b2x + b3y + b4x² + b5y² + b6xy    (4.9)

Note that the combined powers of the x and y variables may be up to 2. This allows for curvature in the transformation in both the x and y directions. A minimum of six control points is required to fit this 2nd-order polynomial transformation, and seven are required when using a statistical fit. The estimated parameters TE, TN, a1, a2, b1, and b2 in equations 4.1 and 4.2 will differ from those estimated for equation 4.9, even if the same set of control points is used for both statistical fits. We change the form of the equations by including the higher-order squared and xy cross-product terms, and all estimated parameters will vary.
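The difference between model forms is just the set of terms in each design-matrix row, and fitting any of them reduces to the same least-squares machinery. The sketch below, again standard-library Python with names of our own, fits the same target with affine rows and 2nd-order polynomial rows; adding the squared and cross-product columns changes all of the fitted coefficients, as the text notes.

```python
def gauss_solve(A, b):
    """Solve an n x n linear system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def least_squares(rows, target):
    """Coefficients minimizing squared residuals, via the normal equations."""
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(n)]
    return gauss_solve(AtA, Atb)

def affine_row(x, y):
    return [1.0, x, y]                        # terms of equation 4.1

def poly2_row(x, y):
    return [1.0, x, y, x * x, y * y, x * y]   # terms of equation 4.9
```

Because poly2_row has six unknowns per axis, at least six control points are needed for an exact fit and at least seven for a statistical fit, matching the counts given above.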
A Caution When Evaluating Transformations

Selecting the "best" coordinate transformation to apply is a subjective process, guided by multiple goals. We hope to develop an accurate transformation based on a large set of well-distributed control points. Isolated control points that substantially improve our coverage may also contribute substantially to our transformation error.

There are no clear rules on the trade-off between the number of points and the distribution of points, but it is typically best to strive for the widest distribution of points. We want at least two control points in each quadrant of the working area, with a target of 20% in each quadrant.

Figure 4-25: Iterative fitting of an affine transformation. Control points were examined after each fit, to discover blunders in entry or poor matching of points. Control points with large residuals were examined to determine whether the cause of the error could be identified. If so, the control point coordinates were modified and the transformation re-fit.
This is often not possible, although such problems have become less common with the development of GNSS. The transformation equation should be developed with the following observations in mind.

First, bad control points happen, but we should thoroughly justify the removal of any control point. Every attempt should be made to identify the source of the error, whether in the collection or processing of field coordinates, in the collection of image coordinates, or in some blunder in coordinate transcription. A common error is the mis-identification of the control location on the image or map, for example, when the control location is placed on the wrong side of a road.

Second, a lower RMSE does not mean a better transformation. The RMSE is a useful tool when comparing among transformations that have the same model form, for example, when comparing one affine to another affine, as in Figure 4-25. The RMSE is not useful when comparing among different model forms, for example, when comparing an affine to a 2nd-order polynomial. The RMSE is typically lower for 2nd- and other higher-order polynomials than for an affine transformation, but this does not mean the higher-order polynomial provides a more accurate transformation. The higher-order polynomial will introduce more error than an affine transformation on most orthographic maps, and an affine transformation is preferred. High-order polynomials allow more flexibility in warping the surface to fit the control points. Unfortunately, this warping may significantly deform the non-control-point coordinates, and add large errors when the transformation is applied to all data in a layer (Figure 4-26). Thus, high-order polynomials should be used with caution.

Finally, independent tests of the transformations make the best comparisons.

Figure 4-26: An illustration that RMSE should not be used to compare transformations of different order, nor as the sole criterion for selecting the best transformation. Above are portions of a transformed image that was registered to a road network. This area is interstitial to 18 well-distributed control points. Because the 3rd-order polynomial is quite flexible in fitting the points and reducing RMSE, it distorts areas between the control points. This is shown by the poor match between image and vector roads, above right. Although it has a higher RMSE, the first-order transformation on the left is better overall.
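One simple form of independent testing is to withhold control points: fit the transformation without a point, then measure the error at that withheld point. The sketch below scores any fitter this way using leave-one-out; the translation-only "transformation" is a deliberately trivial stand-in (our own invention) so the example stays self-contained, and any fitter with the same interface, control points in, transform function out, could be plugged in.

```python
from math import hypot, sqrt

def fit_translation(ctrl):
    """Toy transformation: a mean shift from (x, y) to (E, N).
    Stands in for any fitted transformation with this interface."""
    n = len(ctrl)
    dE = sum(E - x for x, y, E, N in ctrl) / n
    dN = sum(N - y for x, y, E, N in ctrl) / n
    return lambda px, py: (px + dE, py + dN)

def loo_rmse(ctrl, fitter):
    """Leave-one-out RMSE: withhold each control point in turn, fit the
    transformation to the rest, and score it at the withheld point."""
    errors = []
    for i, (x, y, E, N) in enumerate(ctrl):
        transform = fitter(ctrl[:i] + ctrl[i + 1:])
        Ep, Np = transform(x, y)
        errors.append(hypot(E - Ep, N - Np))
    return sqrt(sum(e * e for e in errors) / len(errors))
```

Because each point is scored by a model that never saw it, this statistic is not flattered by flexible, high-order models the way the ordinary RMSE is.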
Figure 4-28: Potential control points, indicated here by arrows, may be extracted from digital reference
images. Permanent, well-defined features are identified and coordinates determined from the digital image.
Note the white cross, circled in the lower right corner. This is a photogrammetric panel, typically a plastic
or painted wooden target placed prior to photo capture, and with precisely surveyed coordinates. These tar-
gets are used to create the corrected digital image with a known coordinate system, a process described in
Chapter 6.
Second, we must have precise ground coordinates in our target map projection. The first requirement, visibility on the source map or photograph, is often not met for survey-defined control. Therefore, we must often obtain additional control points.

One option for obtaining control points is to perform additional surveys that measure the coordinates of features that are visible on the source materials. Precise surveys are used to establish the coordinate locations of a well-distributed, sufficient set of points throughout the area covered by the source map. While sometimes expensive, new surveys are the chosen method when the highest accuracies are required. Costs were prohibitive with traditional optical surveying methods; however, GNSS positioning technologies allow more frequent, custom collection of control points.

Control Points from Existing Maps and Digital Data

Registered digital image data are common sources of ground control points, particularly when natural resource or municipal databases are to be developed for managing large areas. Digital images often provide a richly detailed depiction of surface features (Figure 4-28). Digital image data may be obtained that are registered to a known coordinate system. Typically, the coordinates of a corner pixel are provided, and the lines and columns of the image run parallel to the easting (E) and northing (N) directions of the coordinate system. Because the pixel dimensions are known, the calculation of a pixel coordinate involves multiplying the row and column numbers by the pixel size, and applying the corner offset, either by addition or subtraction. In this manner, an image row/column may be converted to an E, N coordinate pair, and control point coordinates determined.
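That corner-offset arithmetic is a one-liner in each direction. A sketch, assuming the provided corner coordinate (E0, N0) is the center of the upper-left pixel and that rows increase southward; conventions vary by image format, so the header should always be checked:

```python
def pixel_to_map(row, col, E0, N0, pixel_size):
    """Convert an image row/column to an (E, N) coordinate pair."""
    return E0 + col * pixel_size, N0 - row * pixel_size
```

For a 30 m image anchored at E0 = 500,000, N0 = 4,900,000, pixel_to_map(2, 3, 500000.0, 4900000.0, 30.0) gives (500090.0, 4899940.0).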
Existing maps are another common source of control points. Point locations are plotted, and coordinates often printed, on maps; for example, the corner location coordinates are printed on USGS quadrangle maps. Road intersections and other well-defined locations are often represented on maps. If enough recognizable features can be identified, then control points may be obtained from the maps. Control points derived in this manner typically come only from cartometric maps, those maps produced with the intent of giving an accurate, map-projected representation of features on the Earth's surface.

Existing digital data may also provide control points. A short description of these digital data sources is provided here, and expanded descriptions of these and other digital data are provided in Chapter 7. For example, the USGS has produced Digital Raster Graphics (DRG) files, which are scanned images of the 1:24,000-scale quadrangle maps. These DRGs come referenced to a standard coordinate system, so it is a simple and straightforward task to extract the coordinates of road intersections or other well-defined features that have been plotted on the USGS quadrangle maps. Vector data of roads are often widely available and, if of sufficient accuracy, may be used as a source of control points at road intersections and other distinct locations.

GNSS Control Points

The Global Positioning System (GPS), GLONASS, and Galileo are Global Navigation Satellite Systems (GNSS) that allow us to establish control points. GNSS, discussed in detail in Chapter 5, can help us obtain the coordinates of control points that are visible on a map or image. GNSS are particularly useful because we may quickly survey widely spaced points. GNSS positional accuracy depends on the technology and methods employed; it typically ranges from sub-centimeter (tenths of inches) to a few meters (tens of feet). Most points recently added to the NGS and other government-maintained networks were measured using GNSS technologies.

To sum up: control points are necessary for coordinate transformation, and typically a number of control points are identified for a study area. The x and y coordinates for the control points are obtained from a digitized map or image, and the map projection coordinates, E and N, are determined from survey, GNSS, or other sources (Figure 4-29). These coordinate pairs are then used with a set of transformation equations to convert data layers into the desired map coordinate system.

Figure 4-29: An example of control point locations from a road data layer, and corresponding digitizer and map projection coordinates.
Raster Geometry and Resampling

Data often must be resampled when converting between coordinate systems, or when changing the cell size of a raster data set (Figure 4-30). Resampling involves reassigning the cell values when changing raster coordinates or geometry. Resampling is required when changing cell sizes because the new cell centers will not align exactly with the old cell centers. Changing coordinate systems may change the direction of the x and y axes, and GIS systems often require that the cell edges align with the coordinate system axes. Hence, the new cells often do not correspond to the same locations or extents as the old cells.

Common resampling approaches include nearest neighbor (taking the output layer value from the nearest input layer cell center), bilinear interpolation (a distance-based average of the four nearest cells), and cubic convolution (a weighted average of the sixteen nearest cells; Figure 4-30).

Figure 4-30: Raster resampling. When the orientation or cell size of a raster data set is changed, output cell values are calculated based on the closest (nearest neighbor), four nearest (bilinear interpolation), or sixteen closest (cubic convolution) input cell values.

An example of a bilinear interpolation is shown in Figure 4-31. This algorithm uses a distance-weighted average of the four nearest cells in the input to calculate the value for the output. The new output location is represented by the black post. Initially, the height of the output location, or Zout value, is unknown. Zout is calculated based on the distances between the output location and the input locations. The distance in the x direction is denoted in Figure 4-31 by d1, and the distance in the y direction by d2. The values in the input are shown as gray posts and are labeled Z1 through Z4. Intermediate heights Zu and Zb are shown. These represent the averages of the input values taken in pairs in the x direction: Z1 and Z2 yield Zu, and Z3 and Z4 yield Zb. Zu and Zb are then averaged to calculate Zout, using the distance d2 between the input and output locations to weight the values at each input location. The cubic convolution resampling calculation is similar, except that more cells are used, and the weighting is not an average based on linear distance.

Figure 4-31: The bilinear interpolation method uses a distance-weighted average to assign the output value, Zout, based on input values Z1 through Z4.
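The bilinear calculation of Figure 4-31 can be written directly. A sketch, with d1 and d2 expressed as fractions of a cell (0 at the Z1 row and column, 1 at the far side); the variable names follow the figure, and the function name is ours:

```python
def bilinear(z1, z2, z3, z4, d1, d2):
    """Distance-weighted average of the four surrounding cell values:
    z1 and z2 bracket the output location in x on one row, z3 and z4
    on the other row."""
    zu = z1 + d1 * (z2 - z1)    # pair z1, z2 -> Zu along x
    zb = z3 + d1 * (z4 - z3)    # pair z3, z4 -> Zb along x
    return zu + d2 * (zb - zu)  # weight Zu and Zb by the y offset d2
```

At d1 = d2 = 0 the function returns z1 exactly, so bilinear resampling passes through the input cell values; bilinear(10, 20, 30, 40, 0.5, 0.5) returns 25.0, the center value.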
Map Projection vs. Transformation

Map transformations should not be confused with map projections. A map transformation typically employs a statistically fit linear equation to convert coordinates from one Cartesian coordinate system to another. A map projection, described in Chapter 3, differs from a transformation in that it is an analytical, formula-based conversion between coordinate systems, usually from a curved latitude/longitude coordinate system to a Cartesian coordinate system. No statistical fitting process is used with a map projection.

Map transformations should rarely be used in place of map projection equations when converting geographic data between map projections. Consider an example in which data are delivered to an organization in Universal Transverse Mercator (UTM) coordinates and are to be converted to State Plane coordinates prior to integration into a GIS database. Two paths may be chosen.
The first involves projection from UTM to geographic coordinates (latitude and longitude), and then from these geographic coordinates to the appropriate State Plane coordinates. This is the correct, most accurate approach.

An alternate, and often less accurate, approach involves using a transformation to convert between the different map projections. In this case a set of control points would be identified and the coordinates determined in both the UTM and State Plane coordinate systems. The transformation coefficients would be estimated, and these equations applied to all data in the UTM data layer. This new output data layer would be in State Plane coordinates. This transformation process should be avoided, as a transformation may introduce additional positional error.

Transformation between projections is used quite often, inadvertently, when digitizing data from paper maps. For example, USGS 1:24,000-scale maps are cast on a polyconic projection. If these maps are digitized, it would be preferable to register them to the appropriate polyconic projection, and then re-project these data to the desired end projection. This is often not done, because the error in ignoring the projection over the size of the mapped area is typically less than the positional error associated with digitizing. Experience and specific calculations have shown that the spatial errors in using a transformation instead of a projection are small at these map scales under typical digitizing conditions.

This second approach, using a transformation when a projection is called for, should not be used until it has been tested as appropriate for each new set of conditions. Each map projection distorts the surface geometry, and these distortions are complex and nonlinear. Affine or polynomial transformations are unlikely to remove this nonlinear distortion. Exceptions to this rule occur when the area being transformed is small, particularly when the projection distortion is small relative to the random uncertainties, transformation errors, or errors in the spatial data. However, there are no guidelines on what constitutes a sufficiently "small" area. In our example above, USGS 1:24,000 maps are often digitized directly into a UTM coordinate system with no obvious ill effects, because the errors in map production and digitizing are often much larger than the projection distortion for the map area. However, you should not infer that this practice is appropriate under all conditions, particularly when working with smaller-scale maps.
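The reason a linear transformation cannot stand in for a projection is the curvature in projection equations. The toy sketch below, which we constructed for illustration and which is not a real projection, uses R·tan(x/R), a function that grows nearly linearly near the origin the way a transverse Mercator easting does near its central meridian: a line fit exactly through two "control points" 500 km apart still misses by a few hundred meters between them.

```python
from math import tan

R = 6_378_000.0   # an Earth-like radius, in meters

def pseudo_projection(x):
    """Toy nonlinear mapping: nearly linear for small x, but curved."""
    return R * tan(x / R)

def line_through(x1, x2, f):
    """The unique line agreeing with f at the control points x1 and x2."""
    slope = (f(x2) - f(x1)) / (x2 - x1)
    return lambda x: f(x1) + slope * (x - x1)

line = line_through(0.0, 500_000.0, pseudo_projection)
midpoint_error = abs(pseudo_projection(250_000.0) - line(250_000.0))
# Exact at the control points, but off by a few hundred meters midway.
```

Shrinking the control span shrinks this misfit rapidly, which is exactly the "sufficiently small area" exception described above.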
devoted to the science and art of cartography. Our aim in the next few pages is to provide a brief overview of cartography, with a particular focus on map design. This is both to acquaint new students with the most basic concepts of cartography, and to help them apply these concepts in the consumption and production of spatial information. Readers interested in a more complete treatment should consult the references listed at the end of this chapter.

A primary purpose of cartography is to communicate spatial information. This requires identification of the

- intended audience,
- information to communicate,
- area of interest, and
- physical and resource limitations,

in short, the whom, what, where, and how we may present our information.

These considerations drive the major cartographic design decisions we make each time we produce a map. We must consider the

- scale, size, shape, and other general map properties,
- data to plot,
- symbol shapes, sizes, or patterns,
- labeling, including type font and size,
- legend properties, size, and borders, and
- placement of all these elements on the map.

Map scale, size, and shape depend primarily on the intended map use. Wall maps for viewing at a distance of a meter or more may have few, large, boldly colored features. In contrast, commonly produced street maps for navigation in metropolitan areas are detailed, to be viewed at short range, and have a rich set of additional tables, lists, or other features.

Map scale is often determined in part by the size of the primary objects we wish to display, and in part by the most appropriate media size, such as the page or screen size possible for a document. As noted earlier, the map scale is the ratio of lengths on a map to true lengths. If we wish to display an area that spans 25 kilometers (25,000 meters) on a screen that spans 25 centimeters (0.25 meters), the map scale will be near 0.25 to 25,000, or 1:100,000. This decision on size, area, and scale then drives further map design. For example, scale limits the features we may display, and the size, number, and labeling of features. At a 1:100,000 scale we may not be able to show all cities, burgs, and towns, as there may be too many to fit at a readable size.

Maps typically have a primary theme or purpose that is determined by the intended audience. Is the map for a general population, or for a target audience with specific expectations for map features and design? General-purpose maps typically have a wide range of features represented, including transportation networks, towns, elevation, or other common features (Figure 4-32a). Special-purpose maps, such as road maps, focus on a more limited set of features, in this instance road locations and names, town names, and large geographic features (Figure 4-32b).

Once the features to include on a map are defined, we must choose the symbols used to draw them. Symbology depends in part on the type of feature. For example, we have a different set of options when representing continuous features, such as elevation or pollution concentration, than when representing discrete features. We also must choose among symbols for each type of discrete feature; for example, the symbols for points are generally different from those for line or area features.

Symbol size is an important attribute of map symbology, often specified in a unit called a point. One point is approximately equal to 0.35 mm, or 1/72 of an inch. A specific point number is most often used to specify the size of symbols, for example, the dimensions of small squares to represent houses on a map, or the characteristics of a specific pattern used to fill areas on a map.
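The two quantitative rules in this section, the representative fraction and the point unit, each reduce to one line of arithmetic. A small sketch, with function names of our own:

```python
def scale_denominator(ground_distance_m, map_distance_m):
    """Representative-fraction denominator, i.e. the N of a 1:N scale."""
    return ground_distance_m / map_distance_m

def points_to_mm(size_pt):
    """Convert a symbol size in points to millimeters (1 point = 1/72 inch)."""
    return size_pt * 25.4 / 72.0
```

scale_denominator(25_000, 0.25) returns 100000.0, the 1:100,000 of the screen example above, and points_to_mm(1) is about 0.353 mm.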
A line width may also be specified in points. Setting a line width of two points means we want that particular line plotted with a width of about 0.71 mm. It is unfortunate that "point" is both the name of the distance unit and a general property of a geographic feature, as in "a tree is a point feature." This forces us to talk about the "point size" of symbols used to represent points, lines, or area fills or patterns, but if we are careful, we may communicate these specifications clearly.

The best size, pattern, shape, and color used to symbolize each feature depend on the viewing distance; the number, density, and type of features; and the purpose of the map. Generally, we use larger, bolder, or thicker symbols for maps to be viewed from longer distances, while we reduce symbol sizes when producing maps for viewing at 50 cm (about 20 inches). Most people with normal vision under good lighting may resolve lines down to near 0.2 points at close distances, provided the lines show good contrast with the background. Although size limits depend largely on background color and contrast, point features are typically not resolvable at sizes smaller than about one half a point, and distinguishing between shapes is difficult for point features smaller than approximately two points in their largest dimension.

The pattern and color of symbols must also be chosen, generally from a set provided by the software (Figure 4-33). Symbols generally distinguish among feature types by their characteristics; although most symbols are not associated with a particular feature type, some are, such as plane outlines for airports, numbered shields for highways, or a hatched line for a railroad.

We also must often choose whether and how to label features. Most GIS software provides a range of tools for creating and placing labels, and in all cases we must choose the label font type and size, the location relative to the feature, and the orientation. Primary considerations when labeling point

Figure 4-32: Example of a) a detailed, general-purpose map, here a portion of a US Geological Survey map, and b) a specialized map focusing on a specific set of selected features, here showing roads. The features chosen for depiction on the map depend on the intended map use.
angled for clarity. Cities near the coast show legend elements and symbols which may be
both, to avoid labels crossing the water/land used. Typically these tools allow a wide
boundary where practical. Semi-transparent range of symbolizations, and a compact way
background shading is added for Parainen of describing the symbolization in a legend
and Hanko, cities placed in the island matrix. (Figure 4-36).
This example demonstrates the individual The specific layout of legend features
editing often required when placing labels. must be defined, for example the point fea-
Most maps should have legends. The ture symbol size may be graduated based on
legend identifies map features succinctly and some attribute for the points. Successively
describes the symbols used to depict those larger features may be assigned for succes-
features. Legends often include or are sively larger cities. This must be noted in the
grouped with additional map information legend, and the symbols nested, shown
such as scale bars, north arrows, and descrip- sequentially, or otherwise depicted (Figure
tive text. The cartographer must choose the 4-36, top left).
size and shape of the descriptive symbol, The legend should be exhaustive. Exam-
and the font type, size, and orientation for ples of each different symbol type that
each symbol in the legend. The primary goal appears on the map should appear in the leg-
is to have a clear, concise, and complete leg- end. This means each point, line, or area
end. symbol is drawn in the legend with some
The kind of symbols appropriate for descriptive label. Labels may be next to,
map legends depends on the types of fea- wrapped around, or embedded within the
tures depicted. Different choices are avail- features, and sometimes descriptive numbers
able for point, line, and polygon features, or are added, for example, a range of continu-
for continuously variable features stored as ous variables (Figure 4-36, upper left). Scale
rasters. Most software provides a range of bars, north arrows, and descriptive text
boxes are typically included in the legend.
Map composition or layout is another primary task. Composition consists of determining the map elements, their size, and their placement. Typical map elements, shown in Figure 4-3 and Figure 4-4, include one or more main data panes or areas, a legend, a title, a scale bar and north arrow, a grid or graticule, and perhaps descriptive text. These each must be sized and placed on the map.

These map elements should be positioned and sized in accordance with their importance. The map's most important data pane should be largest, and it is often centered or otherwise given visual dominance. Other elements are typically smaller and located around the periphery or embedded within the main data pane. These other elements include map insets, which are smaller data panes that show larger or smaller scale views of a region in the primary data pane.

Figure 4-35: Example label placement for cities in southern Finland.

Good map compositions usually group related elements and use empty space effectively. Data panes are often grouped and legend elements placed near each other, and grouping is often indicated with enclosing boxes.

Neophyte cartographers should avoid two tendencies in map composition, both depicted in Figure 4-37. First, it is generally easy to create a map with automatic label and legend generation and placement. The map shown at the top of Figure 4-37 is typical of this automatic composition, and includes poorly placed legend elements and too small, poorly placed labels. Labels crowd each other, are ambiguous, cross water/land or other feature boundaries, and fonts are poorly chosen. You should note that automatic map symbol selection and placement is nearly always sub-optimal, and the novice cartographer should scrutinize these choices and manually improve them.

The second common error is poor use of empty space, those parts of the map without map elements. There are two opposite tendencies: either to leave too much or unbalanced empty space, or to clutter the map in an attempt to fill all empty space. Note that the map shown at the top of Figure 4-37 leaves large empty spaces on the left (western) edge, with the Atlantic Ocean devoid of features. The cartographer may address this in several ways, either by changing the size, shape, or extent of the area mapped, adding new features, such as data panes as insets, additional text boxes, or other elements, or moving the legend or other map elements to that space. The map shown at the bottom of Figure 4-37, while not perfect, fixes these design flaws, in part by moving the legend and scale bar, and in part by adding labels for the Atlantic Ocean and Mediterranean Sea. The empty space is more balanced in that it appears around the major map elements in approximately equal proportions.

As noted earlier, this is only a brief introduction to cartography, a subject covered by many good books, some listed at the end of this chapter. Perhaps the best compendium of examples is the Map Book Series, by ESRI, published annually since 1984. Examples are available at the time of this writing at www.esri.com/mapmuseum. You should leaf through several volumes in

Chapter 4: Maps and Data Entry 169
Figure 4-36: Examples of legend elements and representation of symbols. Some symbols may be
grouped in a compact way to communicate the values associated with each symbol, e.g., sequential or
nested graduated circles to represent city population size, area pattern or color fills to distinguish
among different polygon features, line and point symbols, and informative elements such as scale bars
and north arrows.
Figure 4-37: An example of poor map design (top). This top panel shows a number of mistakes common
for the neophyte cartographer, including small labels (cities) and mismatched fonts (graticule labels,
title), poor labeling (city labels overlapping, ambiguously placed, and crossing distinctly shaded areas),
unlabeled features (oceans and seas), poorly placed scale bar and legend, and unbalanced open space on
the left side of the map. These problems are not present in the improved map design, shown in the lower
panel.
this series, with an eye towards critical map design. Each volume contains many beautiful and informative maps, and provides techniques worth emulating.

Digital Data Output

We often must transfer the digital data we create to another organization or user. Given the number of different GIS software, operating systems, and computer types, transferring data is not always a straightforward process. Digital data output typically includes two components: the data themselves in some standard, defined format, and metadata, or data about the digital data. We will describe data formats and metadata in turn.

Digital data are the data in some electronic form. As described at the end of the first chapter, there are many file formats, or ways of encoding the spatial and attribute data in digital files. Digital data output often consists of recording or converting data into one of these file formats. These data are typically converted with a utility, tool, or option available in the data development software (Figure 4-38). The most useful of these utilities support a broad range of input and output options, each fully described in the program documentation.

Figure 4-38: An example of a conversion utility, here from the ESRI ArcGIS software. Data may be converted from one of several formats to an ESRI-specific digital data format.

All formats strive for complete data transfer without loss. They must transmit the spatial and attribute data, the metadata, and all other information necessary to effectively use the spatial data. There are many digital data output formats, although many are legacy formats that are used with decreasing frequency.

A common contemporary format is the Spatial Data Transfer Standard. This transfer format, also known by the abbreviation SDTS, is a specification first defined by the U.S. Government in 1992. This standard has three basic parts: 1) a logical specification, 2) a description of the types of spatial features supported, and 3) the International Standards Organization (ISO) encoding used. There are four additional parts which define profiles, or descriptions of how vector, raster, point, and computer-aided design data are stored for transfer. Digital data in SDTS formats typically span multiple data files, each holding various data components. At the time of this writing, the U.S. Geological Survey was in charge of maintaining the SDTS, with the full specification found at mcmcweb.er.usgs.gov/sdts/standard.html.

There are many legacy digital data transfer formats that were widely used before the publication of the SDTS. Among these are several U.S. Geological Survey formats for the transfer of digital elevation models or digital vector data, or software-specific formats, such as an ASCII format known as the GEN/UNGEN format that was developed by ESRI. These were useful for a limited set of transfers, but shortcomings in each of these transfer formats led to the development of the SDTS. They will not be discussed further here.

Metadata: Data Documentation

Metadata are information about spatial data. Metadata describe the content, source, lineage, methods, developer, coordinate system, extent, structure, spatial accuracy, attributes, and responsible organization for spatial data.

Metadata are required for the effective use of spatial data. Metadata allow the efficient transfer of information about data, and inform new users about the geographic extent, coordinate system, quality, and other data characteristics. Metadata aid organizations in evaluating data to determine if they are suitable for an intended use -- are they accurate enough, do they cover the area of interest, do they provide the necessary information? Metadata may also aid in data updates by guiding the choice of appropriate collection methods and formats for new data.

Most governments have established, or are in the process of establishing, standard methods for reporting metadata. In the United States, the Federal Geographic Data Committee (FGDC) has defined a Content Standard for Digital Geospatial Metadata (CSDGM) to specify the content and format for metadata. The CSDGM ensures that spatial data are clearly described so that they may be used effectively within an organization. The use of the CSDGM also ensures that data may be described to other organizations in a standard manner, and that spatial data may be more easily evaluated by and transferred to other organizations.

The CSDGM consists of a standard set of elements that are presented in a specified order. The standard is exhaustive in the information it provides, and is flexible in that it may be extended to include new elements for new categories of information in the future. There are over 330 different elements in the CSDGM. Some of these elements contain information about the spatial data, and some elements describe or provide linkages to other elements. Elements have standardized long and short names and are provided in a standard order with a hierarchical numbering system. For example, the western-most bounding coordinate of a data set is element 1.5.1.1, defined as follows:

1.5.1.1 West Bounding Coordinate – western-most coordinate of the limit of coverage, expressed in longitude.
Type: real
Domain: -180.0 <= West Bounding Coordinate < 180.0
Short Name: westbc

The numbering system is hierarchical. Here, 1 indicates it is basic identification information, 1.5 indicates identification information about the spatial domain, 1.5.1 is for bounding coordinates, and 1.5.1.1 is the western-most bounding coordinate.

There are 10 basic types of information in the CSDGM:

1) identification, describing the data set,
2) data quality,
3) spatial data organization,
4) spatial reference coordinate system,
5) entity and attribute,
6) distribution and options for obtaining the data set,
7) currency of metadata and responsible party,
8) citation,
9) time period information, used with other sections to provide temporal information, and
10) contact organization or person.

The CSDGM is a content standard and does not specify the format of the metadata. As long as the elements are included, properly numbered, and identified with correct values describing the data set, the metadata are considered to conform with the CSDGM. Indentation and spacing are not specified. However, because metadata may be quite complex, a number of conventions are emerging in the presentation of metadata. These conventions seek to ensure that metadata are presented in a clear, logical way to humans, and are also easily ingested by computer software. There is a Standard Generalized Markup Language (SGML) for the exchange of metadata. An example of a portion of the metadata for a 1:100,000-scale digital line graph data set is shown in Figure 4-39.
Figure 4-39: Example of a small portion of the FGDC recommended metadata for a 1:100,000 scale
derived digital data set.
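The bounding-coordinate element above implies simple checks that metadata software can automate. The following is a minimal sketch in Python; the dictionary keys follow the CSDGM short names (westbc, eastbc, southbc, northbc), but the validation function itself is hypothetical, not drawn from any particular FGDC tool:

```python
def validate_bounding_coordinates(meta):
    """Check CSDGM-style bounding coordinates (short names westbc, eastbc,
    southbc, northbc) against their declared domains and for consistency.
    This simple check ignores data sets that legitimately cross the
    180-degree meridian."""
    errors = []
    west, east = meta["westbc"], meta["eastbc"]
    south, north = meta["southbc"], meta["northbc"]
    # Domain: -180.0 <= West Bounding Coordinate < 180.0
    if not (-180.0 <= west < 180.0):
        errors.append("westbc outside [-180, 180)")
    if not (-180.0 <= east <= 180.0):
        errors.append("eastbc outside [-180, 180]")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        errors.append("latitude bounds outside [-90, 90]")
    # Consistency: the western-most boundary should lie west of the eastern-most
    if west >= east:
        errors.append("westbc is not west of eastbc")
    if south >= north:
        errors.append("southbc is not south of northbc")
    return errors

# A plausible bounding box in the upper Midwest passes all checks:
print(validate_bounding_coordinates(
    {"westbc": -93.5, "eastbc": -92.0, "southbc": 44.0, "northbc": 45.5}))  # → []
```

Checks of this kind are one reason metadata are best produced with dedicated entry tools rather than a plain text editor.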
Metadata are most often created using specialized software tools. Although metadata may be produced using a text editor, the numbering system, names, and other conventions are laborious to type. There are often complex linkages between metadata elements, and some elements are repeated or redundant. Software tools may ease the task of metadata entry by reducing redundant entries, ensuring correct linkages, and checking elements for contradictory information or errors. For example, the metadata entry tool may check to make sure the western-most boundary is west of the eastern-most boundary. Metadata are most easily and effectively produced when their development is integrated into the workflow of data production.

Although not all organizations in the United States adhere to the CSDGM metadata standard, most organizations record and organize a description and other important information about their data, and many organizations consider a data set incomplete if it lacks metadata. All U.S. government units are required to adhere to the CSDGM when documenting and distributing spatial data.

Many other national governments are developing metadata standards. One example is the spatial metadata standard developed by the Australia and New Zealand Land Information Council (ANZLIC), known as the ANZLIC Metadata Guidelines. ANZLIC is a group of government, business, and academic representatives working to develop spatial data standards. The ANZLIC metadata guidelines define the core elements of metadata, and describe how to write, store, and disseminate these core elements. Data entry tools, examples, and a spatial data directory have been developed to assist in the use of the ANZLIC spatial metadata guidelines.

There is a parallel effort to develop and maintain international standards for metadata, known as the ISO 19115 International Standards for Metadata. According to the International Standards Organization, ISO 19115 “defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data”.

There is a need to reconcile international and national metadata standards, because they may differ. National standards may require information not contained in international standards, or vice versa. Governments typically create metadata profiles that are consistent with the international standard. These profiles establish the correspondence between elements in the different standards, and identify elements of the international profile that are not in the national profile.

Summary

Spatial data entry is a common activity for many GIS users. Although data may be derived from several sources, maps are a common source, and care must be taken to choose appropriate map types and to interpret the maps correctly when converting them to spatial data in a GIS.

Maps are used for spatial data entry due to several unique characteristics. These include our long history of hardcopy map production, so centuries of spatial information are stored there. In addition, maps are inexpensive, widely available, and easy to convert to digital forms, although the process is often time consuming and may be costly. Maps are usually converted to digital data through a manual digitization process, whereby a human analyst traces and records the location of important features. Maps may also be digitized via a scanning device.

The quality of data derived from a map depends on the type and size of the map, how the map was produced, the map scale, and the methods used for digitizing. Large-scale maps generally provide more accurate positional data than comparable small-scale maps. Large-scale maps often have less map generalization, and small horizontal errors in plotting, printing, and digitizing are magnified less during conversion of large-scale maps.
Snapping, smoothing, vertex thinning,
and other tools may be used to improve the
quality and utility of digitized data. These
methods are used to ensure positional data
are captured efficiently and at the proper
level of detail.
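Vertex thinning is commonly implemented with the Douglas-Peucker algorithm (see Douglas & Peuker, 1973, in the Suggested Reading): endpoints are kept, and intermediate vertices are removed when they fall within a tolerance distance of the line joining the retained points. A brief sketch, assuming simple planar (x, y) coordinate pairs:

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg = math.hypot(dx, dy)
    if seg == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((px - ax) * dy - (py - ay) * dx) / seg

def douglas_peucker(points, tolerance):
    """Thin a digitized line: keep the endpoints, and recursively keep any
    vertex farther than `tolerance` from the line joining the current
    endpoints."""
    if len(points) < 3:
        return list(points)
    # find the vertex farthest from the line between the current endpoints
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]  # all in-between vertices thinned
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop duplicate shared vertex

# A nearly straight digitized line collapses to its endpoints:
line = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(douglas_peucker(line, tolerance=0.5))  # → [(0, 0), (4, 0)]
```

The tolerance controls the trade-off: a small value keeps near-redundant vertices from closely spaced digitizing, while too large a value over-generalizes the line.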
Map and other data often need to be
converted to a target coordinate system via
a map transformation. Transformations are
different from map projections, which were
discussed in Chapter 3, in that a transfor-
mation uses an empirical, least-squares
process to convert coordinates from one
Cartesian system to another. Transformations are often used when registering digitized data to a known coordinate system.
Map transformations should not be used
when a map projection is called for.
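An affine transformation of the form x' = a0 + a1*x + a2*y, y' = b0 + b1*x + b2*y is fit by least squares from matched control points, and the root mean square error (RMSE) summarizes the residuals at those points. A pure-Python sketch (the control-point values below are hypothetical):

```python
import math

def fit_affine(src, dst):
    """Least-squares fit of x' = a0 + a1*x + a2*y, y' = b0 + b1*x + b2*y
    from matched control points; returns (a, b, rmse)."""
    rows = [(1.0, x, y) for x, y in src]

    def normal_solve(target):
        # build and solve the 3x3 normal equations (A^T A) p = A^T t
        M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        v = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
        for col in range(3):  # Gaussian elimination with partial pivoting
            piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
            M[col], M[piv] = M[piv], M[col]
            v[col], v[piv] = v[piv], v[col]
            for r in range(col + 1, 3):
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
                v[r] -= f * v[col]
        p = [0.0, 0.0, 0.0]
        for r in (2, 1, 0):  # back substitution
            p[r] = (v[r] - sum(M[r][c] * p[c] for c in range(r + 1, 3))) / M[r][r]
        return p

    a = normal_solve([x for x, _ in dst])
    b = normal_solve([y for _, y in dst])
    # RMSE of the residuals at the control points
    sq = [((a[0] + a[1] * x + a[2] * y) - xd) ** 2 +
          ((b[0] + b[1] * x + b[2] * y) - yd) ** 2
          for (x, y), (xd, yd) in zip(src, dst)]
    return a, b, math.sqrt(sum(sq) / len(sq))

# digitizer (table) coordinates matched to map-projected coordinates
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
dst = [(100.0, 200.0), (300.0, 205.0), (105.0, 400.0), (305.0, 405.0)]
a, b, rmse = fit_affine(src, dst)
print(a, b, rmse)
```

Because the affine model has six unknowns, at least three control points are required; more are used in practice so that the RMSE provides a meaningful estimate of fit.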
Cartography is an important aspect of
GIS, because we often communicate spa-
tial information through maps. Map design
depends on both the target audience and
purpose, setting and modes of map view-
ing, and available resources. Proper map
design considers the scale, symbols, labels,
legend, and placement to effectively com-
municate the desired information.
Metadata are the “data about data.”
They describe the content, origin, form,
coordinate system, spatial and attribute
data characteristics, and other relevant
information about spatial data. Metadata
facilitate the proper use, maintenance, and
transfer of spatial data. Metadata standards
have been developed, both nationally and
internationally, with profiles used to cross-
reference elements between metadata stan-
dards. Metadata are a key component of
spatial data, and many organizations do not consider data complete until metadata have been created.
Suggested Reading
Bolstad, P., Gessler, P., & Lillesand, T.M. (1990). Positional uncertainty in manually
digitized map data. International Journal of Geographical Information Systems,
4:399-412.
Burrough, P.A., & Frank, A.U. (1996). Geographical Objects with Indeterminate Boundaries. Taylor & Francis: London.
Chrisman, N.R. (1984). The role of quality information in the long-term functioning
of a geographic information system. Cartographica, 21:79-87.
Douglas, D.H. & Peucker, T.K. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Canadian Cartographer, 10:112-122.
Gesch, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., & Tyler, C. (2002). The National Elevation Dataset. Photogrammetric Engineering and Remote Sensing, 68:5-32.
Holroyd, F. & Bell, S.B.M. (1992). Raster GIS: Models of raster encoding. Computers
and Geosciences, 18:419-426.
Joao, E. M. (1998). Causes and Consequences of Map Generalization. Taylor & Fran-
cis: London.
Maguire, D.J., Goodchild, M.F., & Rhind, D. (Eds.). (1991). Geographical Information Systems: Principles and Applications. Longman Scientific: Harlow.
McBratney, A.B., Santos, M.L.M., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117:3-52.
Muehrcke, P.C. & Muehrcke, J.P. (1992). Map Use: Reading, Analysis, and Interpre-
tation (3rd ed.). J.P. Publications: Madison.
Nagy, G. & Wagle, S.G. (1979). Approximation of polygonal maps by cellular maps.
Communications of the Association for Computing Machinery, 22:518-525.
Peuquet, D.J. (1984). A conceptual framework and comparison of spatial data models,
Cartographica, 21:66-113.
Shaeffer, C.A., Samet, H., & Nelson R.C. (1990). QUILT: a geographic information
system based on quadtrees, International Journal of Geographical Information
Systems, 4:103-132.
Shea, K.S., & McMaster, R.B. (1989). Cartographic generalization in a digital envi-
ronment: when and how to generalize. Proceedings AutoCarto 9, pp.56-67.
Warner, W. & Carson, W. (1991). Errors associated with a standard digitizing tablet.
ITC Journal, 2:82-85.
Zeiler, M. (1999). Modeling Our World: The ESRI Guide to Geodatabase Design.
ESRI Press: Redlands.
Study Questions
4.1 - Why have so many digital spatial data been derived from hardcopy maps?
4.4 - Identify the kind of generalization at the labeled locations a through d in the map
below, left, compared to the “truth” in the image, below right. Categorize the general-
izations as fused, simplified, displaced, omitted, or exaggerated.
4.5 - Identify the kind of generalization at the labeled locations a through d in the map
below, left, compared to the “truth” in the image, below right. Categorize the general-
izations as fused, simplified, displaced, omitted, or exaggerated, or if it doesn’t fit in
one of these categories, then categorize it as “other,” and describe the generalization.
4.7 - Is media deformation more problematic with large-scale maps or small-scale maps? Why?
4.8 - Which map typically shows more detail -- a large-scale map or a small-scale map? Can you give three reasons why?
4.9 - Complete the following table that shows scale measurements and calculations:
4.10 - What is snapping in the context of digitizing? What are undershoots and over-
shoots, and why are they undesirable?
4.11 - Identify a characteristic feature or error in digitizing at each of the labeled letter
locations in the drawing below, e.g., node, overshoot, missing label, etc.:
4.12 - Identify a characteristic feature or error in digitizing at each of the labeled letter
locations in the drawing below, e.g., node, overshoot, missing label, etc.:
4.13 - Sketch the results of combined node (open circle), vertex (closed circle), and edge (lines) snapping with a snap tolerance of a) a distance of 5 units, and b) a distance of 10 units, as shown by the snap circles. Note that the radius, not the diameter, of these circles defines the snapping distance.
4.14 - What is a spline, and how is it used during digitizing?
4.16 - Can you contrast manual digitizing with the various forms of scan digitizing?
What are the advantages and disadvantages of each?
4.17 - What is the “common feature problem” when digitizing, and how might it be
overcome?
4.18 - Can you describe the general goal and process of map registration?
4.19 - What are control points, and where do they come from?
4.20 - Can you define an affine transformation, including the form of the equation?
Why is it called a linear transformation?
4.21 - What is the root mean square error (RMSE), and how does it relate to a coordi-
nate transformation?
4.22 - Is the average positional error likely to be larger, smaller, or about equal to the RMSE? Why?
4.23 - Why are higher-order (polynomial) transformations to be avoided under most circumstances?
4.24 - Which of the following transformations will likely have the smallest average
error at a set of independent test points?
a) affine, RMSE = 10.23 b) affine, RMSE = 9.8
c) 2nd order polynomial, RMSE = 4.7 d) 3rd order polynomial, RMSE = 0.45
4.25 - Which of the following transformations will likely have the smallest average
error at a set of independent test points?
a) 1st order polynomial, RMSE = 5.3 b) affine, RMSE = 9.8
c) 2nd order polynomial, RMSE = 2.9 d) 1st order polynomial, RMSE = 9.9