
Chapter4 SM



4 Maps, Data Entry, Editing, and Output
Building a GIS Database
Introduction

Spatial data entry and editing are frequent activities for many GIS users. A large number of coordinates is needed to represent features in a GIS, and each coordinate value must be entered into the GIS database. This is often painstakingly slow, even with automated techniques, and spatial data entry and editing take significant time for most organizations.

Most spatial data sources may be categorized as either hardcopy or digital. Hardcopy forms are any drawn, written, or printed documents, including hand-drawn maps, manually measured survey data, legal records, and coordinate lists with associated tabular data. Most historical spatial data were recorded on maps (Figure 4-1), and although not all maps are suitable for conversion to digital formats, many maps are. Much data were created from hardcopy sources in the early years of GIS via digitizing, the process of collecting

Figure 4-1: Maps have served to store geographic knowledge for at least the past 4000 years. This early map
of northern Europe shows approximate shapes and relative locations.
132 GIS Fundamentals

digital coordinates. Digitizing is a common data entry method today, although primarily from satellite and aerial images.

Digital maps, electronic, graphic depictions of spatial data, are by far the most common map form today (Figure 4-2). Millions of electronic maps are generated each hour, composed on demand in response to web queries, on automobile navigation systems, and for commerce and advertising. These maps are flexible, easily customized, inexpensively distributed, and often dynamic.

Figure 4-2: An example of commonly produced digital maps.

Most maps, whether digital or hardcopy, contain several components (Figure 4-3). A data area or pane occupies the largest part of the map, and contains most of the depicted spatial data. A neatline is often included to provide a frame around all map elements, and insets may contain additional map elements. Scalebars, legends, titles, and other graphic elements such as a north arrow are often included. All maps have a map scale, defined as the ratio of the distance on the map to the corresponding distance on the ground (Figure 4-3).

Maps often depict coordinate lines (Figure 4-4). When the lines represent constant latitude and longitude, a set of coordinate lines is called a graticule (Figure 4-4a). These lines may appear curved, depending on the map scale, the map coordinate system, and the location of the area on the Earth’s surface. Maps may also depict a grid consisting of lines of constant coordinates. Grid lines are typically drawn in both the x

Figure 4-3: An example of a map and its components.


Chapter 4: Maps and Data Entry 133

Figure 4-4: Maps often depict lines representing (a) a graticule of constant latitude and longitude or (b) a
grid of constant x and y coordinates.

and y directions, and appear straight on most maps (Figure 4-4b). Graticules and grids are useful because they provide a reference against which location may be quickly estimated. Graticules are particularly useful for depicting the distortion inherent in a map projection, because they show how geographic north or east lines are deformed, and how this distortion varies across the map. Grids may establish a map-projected north, in contrast to geographic north, and may be useful when trying to navigate or locate a position on the map.

Historical and current images are valuable sources of geographic data, and although they are not maps, the distinction is becoming blurred, as aerial and satellite photographs become common backdrops for digital maps. Photographs do not typically provide an orthographic (flat, undistorted) view, and houses, rivers, or features of interest are not explicitly identified. However, images are a rich source of geographic information, and standard techniques may be used to remove major systematic distortions and extract features, through manual digitizing, described later in this chapter, or through image classification, described in Chapter 6.

Digital spatial data are those provided in a computer-compatible format. These include complete raster and vector data layers, text files, lists of coordinates, and digital images. Files and export formats can be used to transfer them to a local GIS system. Global Navigation Satellite Systems (GNSS), such as the U.S. Global Positioning System (GPS) and coordinate survey devices described in Chapter 5, are direct measurement systems that can be used to record coordinates in the field and report them directly into digital formats. Finally, a number of digital image sources are available, such as satellite or airborne images that are collected in a digital raster format, or hardcopy aerial photographs that have been scanned to produce digital images.

Hardcopy data are an important source of geographic information for many reasons. First, most geographic information produced before 1980 was recorded in hardcopy form. Advances in optics, metallurgy, and industry during the 18th and 19th centuries allowed the mass production of precise surveying devices, and by the mid 20th century, much of the world had been plotted on cartometric quality maps. Cartometric maps are those that faithfully represent the relative position of objects and thus may be suitable as a source of spatial data.

While much spatial data has been collected from hardcopy sources, data entry

from digital sources now dominates. Coordinates are increasingly captured via interpretation of digital image sources (these sources are described in Chapter 6) or collected directly in the field by satellite-based positioning services (Chapter 5).

Our objective in this chapter is to introduce spatial data entry via digitizing and coordinate surveying. We will also cover basic editing methods and data documentation, and rudimentary cartography and output.

Map Types

Many types of maps are produced, and the types are often referred to by the way features are depicted on the map. Feature maps are among the simplest, because they map points, lines, or areas and provide nominal information (Figure 4-5, upper left). A road may be plotted with a symbol defining the type of road or a point may be plotted indicating the location of a city center, but the width of the road or number of city dwellers are not provided in the shading or other symbology on the map. Feature maps are perhaps the most common map form, and examples include most road maps, and standard map series such as the 7.5 minute topographic maps produced by the U.S. Geological Survey.

Choropleth maps depict quantitative information for areas. A mapped variable such as population density may be represented in the map (Figure 4-5, top right). Polygons define area boundaries, such as counties, states, census tracts, or other standard administrative units. Each polygon is given a color, shading, or pattern corresponding to values for a mapped variable; for example, in Figure 4-5, top right, the

Figure 4-5: Common hardcopy map types depicting New England, in the northeastern United States.

darkest polygons have a population density greater than 1000 persons per square mile.

Dot-density maps are another map form commonly used to show quantitative data (Figure 4-5, bottom left). Dots or other point symbols are plotted to represent values. Dots are placed in the polygon such that the number of dots, multiplied by the value each dot represents, equals the total value for the polygon. Note that the dots are typically placed randomly within the polygon area. Each dot on the map in the lower left of Figure 4-5 represents 50,000 people; however, each point is not a city or other concentration of inhabitants. Note the position of points in the dot-density map relative to the city locations in the feature map directly above it in Figure 4-5.

Isopleth maps, also known as contour maps, display lines of equal value (Figure 4-5, bottom right). Isopleth maps are used to represent continuous surfaces. Rainfall, elevation, and temperature are features that are commonly represented using isopleth maps. A line on the isopleth map represents a specified value; for example, a 10°C isopleth defines the position on the landscape that is at that temperature. Lines typically do not cross, in that there cannot be two different temperatures at the same location. However, isopleth maps are commonly used to depict elevation, and cliffs or overhanging terrain do have multiple elevations at the same location. In this case the lower elevations typically pass “under” the higher elevations, and the isopleth is labeled with the tallest height (Figure 4-6). Note that the isopleths are typically estimated surfaces, with the lines drawn based on measurements at a set of point locations; various methods for estimating isopleth lines from points are described in Chapter 12.

Figure 4-6: Lines on isopleth maps typically do not cross. However, as shown at the arrow in this image, lines may coincide when there is a common value. Here cliffs or overhangs result in converging isopleth lines.

Map Scale

All maps have a scale, a relationship between a distance on the map and a corresponding distance projected on Earth. Map scale is often reported as a distance conversion, such as one inch to a mile, or as a unitless ratio, such as 1:24,000, indicating a unit distance on the map is equal to 24,000 units on the Earth’s surface (Figure 4-7). Digital maps most often use a third method to report scale, as a bar or line of known distance, labeled on the map.

Note that depicting map scale was unambiguous when only hardcopy maps were produced. A written ratio or conversion, e.g., 1 inch to the mile, was true because the features were fixed on paper or other physical medium. A fixed scale may be erroneous on an electronic document, because the document may be altered by zooming, which changes the magnification on an electronic display. One inch as displayed may not correspond to a mile. Digital documents should most often include a fixed scale bar, depicting an equivalent surface distance, e.g., 1 kilometer, embedded in the map, or some mechanism for re-calculating the scale as the digital map changes size on screen.

The notion of large vs. small scale is often confused because scale implies a ratio, or fraction. A larger ratio signifies a large-scale map, so a 1:24,000-scale map is considered large-scale relative to a 1:100,000-scale map. Many people mistakenly refer to a 1:100,000-scale map as larger scale than a 1:24,000-scale map because it covers a larger area. A 1:100,000-scale map that is 50 cm (20 inches) on a side covers more ground than a 1:24,000-scale map that is 50 cm on a

Figure 4-7: Coverage, relative distance, and detail change from larger-scale (top) to smaller-scale (bot-
tom) maps.

side. However, it is the size of the ratio or fraction, and not the area covered, that determines the map scale. It is helpful to remember that features appear larger on a large-scale map (Figure 4-7). It is also helpful to remember that large-scale maps of a given paper size show more detail, but less area. Notice in Figure 4-7 that the larger-scale map at the top shows details of the city of Tokyo. Tokyo shrinks in the successively smaller-scale maps, but large additional areas are covered. The larger the ratio (and smaller the denominator), the larger the map scale.

Because maps often report an average scale, and because there are upper limits on the accuracy with which data can be plotted on a map, large-scale maps generally have less geometric error than small-scale maps if the same methods were used to produce them. Small errors in measurement, plotting, printing, and paper deformation are magnified by the scale factor. These errors, which occur during map production, are magnified more on a small-scale map than a large-scale map.

Map Generalization

Maps are abstractions of reality, as are spatial data in a GIS database. This abstraction introduces map generalization, the unavoidable approximation of real features when they are represented on a map. Not all the geometric or attribute detail of the physical world is recorded; only the most important characteristics are included. The set of features that are most important is subjectively defined and will differ among users. The mapmaker determines the set of features to place on the map, and selects the methods to collect and represent the shape and location of these features on the map.

The choice of data sources and mapmaking methods will unavoidably set limits on the size and shape of features that may be represented. Consider a project to map lakes, based on image data with a 250 meter cell size (Figure 4-8). The abstraction of the shoreline will not represent bays and peninsulas that are smaller than approximately 250 meters across, by conscious choice of the mapmakers. Small features will be missed, edge detail will be lost, and distances along boundaries will depend on the resolution of the source image.

Figure 4-8: A mapmaker chooses the materials and methods used to produce a map, and so imposes a
limit on spatial detail. Here, the choice of an input image with a 250 meter resolution (left) renders it
impossible to represent all the details of the real lake boundaries (right). In this example, features
smaller than approximately 250 meters on a side may not be faithfully represented on the map.
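
The dependence of measured boundary length on source resolution can be illustrated with a small numerical sketch. The example below is an illustration only, not a method from this chapter: the shoreline is a synthetic sine curve, and the function names, the 100 meter wiggle size, and the 400 meter wavelength are all assumptions chosen for demonstration. It measures the same boundary sampled every 10 meters and every 250 meters.

```python
import math

def polyline_length(points):
    """Sum of straight-line segment lengths along an ordered list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sample_shoreline(step_m, total_m=5000.0, wiggle_m=100.0, wavelength_m=400.0):
    """Sample a synthetic (hypothetical) wavy shoreline every step_m meters along x."""
    n = int(total_m // step_m)
    xs = [i * step_m for i in range(n + 1)]
    return [(x, wiggle_m * math.sin(2 * math.pi * x / wavelength_m)) for x in xs]

# The same boundary, captured at a fine and a coarse sampling interval
fine = polyline_length(sample_shoreline(10.0))
coarse = polyline_length(sample_shoreline(250.0))

print(f"10 m sampling:  {fine:.0f} m")
print(f"250 m sampling: {coarse:.0f} m")
```

The coarse sampling skips over bays and points narrower than the sampling interval, so the measured boundary is shorter than the one obtained from the fine sampling, even though the underlying feature has not changed.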

A finer resolution source, such as a 30 meter resolution, may more faithfully depict map detail, but may not be an appropriate choice. The finer resolution may be more expensive, difficult to reproduce, unavailable for the entire mapping area, or inappropriate because it does not show important features, for example, vegetation types or recent developments. Cartographers often must balance several factors in map design, and their choices inevitably lead to some form of map generalization.

Feature generalization is one common form of generalization. Feature generalization is a modification of features when representing them on a map. The geographic aspects of features are generalized because there are limits on the time, methods, or materials available when collecting geographic data. These limits also apply when compiling or printing a map. These feature generalizations, depicted in Figure 4-9, may be classed as:

Fused: multiple features may be grouped to form a larger feature,
Simplified: boundary or shape details are lost or “rounded off”,
Displaced: features may be offset to prevent overlap or to provide a standard distance between mapping symbols,
Omitted: small features in a group may be excluded from the map, or
Exaggerated: standard symbol sizes are often chosen, e.g., standard road symbol widths, which are much larger when scaled than the true road width.

Figure 4-9: Generalizations common in maps.

Generalization is present at some level in every map, and should be recognized and evaluated for each map that is used as a source for data in a GIS (Figure 4-10). If generalization results in omission or degradation of data beyond acceptable levels, then the analyst or organization should switch to a larger-scale map if appropriate and available, or return to the field or original source materials to collect data at the required precision.

Figure 4-10: Examples of map generalization. Portions are shown for three maps for an area in central Minnesota. Excerpts from a large scale (a, 1:24,000), intermediate scale (b, 1:62,500), and small scale (c, 1:250,000) map are shown. Note that the maps are not drawn at true scale to facilitate comparison. The smaller-scale maps (b and c) have been magnified more than a to better show the effects of generalization. Each map has a different level of map generalization. Generalizations increase with smaller-scale maps, and include omissions of smaller lakes, successively greater road width exaggerations, and increasingly generalized shorelines as one moves from maps a through c.

Map Boundaries and Spatial Data

One final characteristic of maps affects their use as a source of spatial data: hardcopy maps have edges, and discontinuities often occur at these edges. Much digital data have been converted from legacy paper maps, so edge discontinuities have been carried to the present. These errors are disappearing as newer data are collected with digital methods, but will be encountered and should be understood.

Large-scale, high-quality maps generally cover small areas. This is because of the trade-off between scale and area coverage, and because of limits on the practical size of a map. Cartometric maps larger than a meter in any dimension have proven to be impractical for most organizations. Maps above this size are expensive and difficult to print, store, or view. Thus, human ergonomics set a practical limit on the physical size of a map.

The fixed maximum map dimension when coupled with a fixed map scale defines the area coverage of the hardcopy map. Larger scale maps generally cover smaller areas. A 1:100,000-scale map that is 18 inches (46 centimeters) on a side spans approximately 28 miles (46 kilometers). A 1:24,000-scale map that is 18 inches on a side represents approximately 7 miles (11 kilometers) on the Earth’s surface. Because spatial data in a GIS often span several large-scale maps, these map boundaries may occur in a spatial database. Problems often arise when adjacent maps are entered into a spatial database because features do not align or have mismatched attributes across map boundaries.

Differences in the time of data collection for adjacent map sheets may lead to inconsistencies across map borders. Landscape change through time is a major source of differences across map boundaries. For example, the U.S. Geological Survey has produced 1:24,000-scale map sheets for all of the lower 48 states of the United States. The original mapping took place over several decades, and there were inevitable time lags between mapping some adjacent areas. As much as two decades passed between mapping or updating adjacent map sheets. Thus, many features, such as roads, canals, or municipal boundaries, are discontinuous or inconsistent across map sheets.

Different interpreters may also cause differences across map boundaries. Large-area mapping projects typically employ several interpreters, each working on different map sheets for a region. All professional, large-area mapping efforts should have protocols specifying the scale, sources, equipment, methods, classification, keys, and cross-correlation to ensure consistent mapping across map sheet boundaries. In spite of these efforts, however, some differences due to human interpretation occur. Feature placement, category assignment, and generalization vary among interpreters. These problems are compounded when extensive checking and guidelines are not enforced across map sheet boundaries, especially when adjacent areas are mapped at different times or by two different organizations.

Finally, differences in coordinate registration can lead to spatial mismatch across map sheets. Registration, discussed later in this chapter, is the conversion of digitizer or other coordinate data to an earth-surface coordinate system. These registrations contain unavoidable errors that translate into spatial uncertainty. There may be mismatches when data from two separate registrations are joined along the edge of a map.

Spatial data stored in a GIS are not bound by the same constraints that limit the physical dimensions of hardcopy maps. Digital storage enables the production of seamless digital maps of large areas. However, the inconsistencies that exist on hardcopy maps may be transferred to the digital data. Inconsistencies at map sheet edges need to be identified and resolved when maps are converted to digital formats.
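
Because a fixed sheet size and a fixed map scale together define ground coverage, the number of sheets a project must span can be estimated with simple arithmetic. The following is a minimal sketch, assuming a square sheet and ignoring projection distortion; the function name ground_side_km is hypothetical, and the 50 cm sheet size is taken from the map scale discussion earlier in this chapter.

```python
def ground_side_km(sheet_side_cm, scale_denominator):
    """Ground distance (km) spanned by one side of a square map sheet."""
    # 1 cm on the map equals scale_denominator cm on the ground; 100,000 cm = 1 km
    return sheet_side_cm * scale_denominator / 100_000

for denom in (24_000, 100_000):
    side = ground_side_km(50, denom)
    print(f"1:{denom:,} sheet, 50 cm on a side: {side:g} km per side, {side * side:g} sq km")
```

Under these assumptions, covering the ground of one 1:100,000-scale sheet requires many 1:24,000-scale sheets of the same paper size, which is why sheet boundaries appear so often in databases built from large-scale maps.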

Digitizing: Coordinate Capture

Digitizing is the process by which coordinates from a map, image, or other sources are converted into a digital format in a GIS. Points, lines, and areas on maps or images represent real-world entities or phenomena, and these must be recorded in digital forms before they can be used in a GIS. The coordinate values that define the locations and shapes of entities must be captured, that is, recorded as numbers and structured in the spatial database. There is a wealth of spatial data in existing maps and photographs, and new imagery and maps add to this source of information on a nearly continuous basis.

Manual digitization is human-guided coordinate capture from a map or image source. The operator guides an electronic device over a map or image and signals the capture of important coordinates, often by pressing a button on the digitizing device. Important point, line, or area features are traced on the source materials, and the coordinates are recorded in GIS-compatible formats. Valuable data on historical maps may be converted to digital forms through the use of manual digitizing. On-screen digitizing and hardcopy digitizing are the two most common forms of manual digitization.

On-screen Digitizing

On-screen digitizing, also known as heads-up digitizing, involves manually digitizing on a computer screen, using a digital image as a backdrop. Digitizing software allows the operator to trace the points, lines, or polygons that are identified on the scanned map (Figure 4-11). Digitizing software allows the human operator to specify the type of feature to be recorded, the extent and magnification of the image on screen, the mode of digitizing, and other options to control how data are input. The operator typically guides a cursor over points to be recorded using a mouse, and depresses a button or sequence of buttons to collect the point coordinates. On-screen digitizing can be used for recording information from scanned aerial photographs, digital photographs, satellite images, or other images.

On-screen digitizing offers advantages over hardcopy and scan-digitizing, methods that are described in the following sections. Many data sources are inherently digital, for example, image data collected from aerial photographs and airborne or satellite scanners. These data may be magnified on screen to any desired scale. Converting the image to a paper or other hardcopy form would likely introduce error through the slight deformation of the paper or printing media, reduce flexibility when digitizing, and add the cost of printing.

On-screen digitizing is often more accurate than hardcopy digitizing because hardcopy map digitization is often limited by the visual acuity and pointing ability of the operator. The pointing imprecision of the operator and digitizing systems translates to a fixed ground distance when manually digitizing a hardcopy map. For example, consider an operator who can reliably digitize a location to the nearest 0.4 millimeters (about 0.016 inch) on a 1:20,000-scale map. Also assume the best hardcopy digitizing table available is being used, and we know the observed error is larger than the error in the map. The 0.4 millimeter error in precision translates to approximately 8 meters of error on the Earth’s surface. The precision cannot be appreciably improved when using a digitizing table, because a majority of the imprecision is due to operator abilities. In contrast, once the map is scanned, the image may be displayed on a computer screen at any map

Figure 4-11: An example of on-screen digitizing. Images or maps are displayed on a computer screen and
feature data digitized manually. Buildings, roads, or any other features that may be distinguished on the
image may be digitized.

scale. The operator may zoom to a scale of 1:5,000 or larger on-screen, and digitizing accuracy and precision are improved. While other factors affect the accuracy of the derived spatial data (for example map plotting or production errors, or scanner accuracy), on-screen digitizing may be used to limit operator-induced positional error when digitizing. On-screen digitizing also removes or reduces the need for digitizing tables or map scanners, the specialized equipment used for capturing coordinates from maps.

Hardcopy Map Digitization

Hardcopy digitizing is human-guided coordinate capture from a paper, plastic, or other hardcopy map. An operator securely attaches a map to a digitizing surface and traces lines or points with an electrically sensitized puck (Figure 4-12). The most common digitizers are based on a wire grid embedded in or under a table. Depressing a button specifies the puck location relative to the digitizer coordinate system. Digitizing tables can be quite accurate, with a resolution of between 0.25 and 0.025 millimeters (0.01 and 0.001 inches).

While once a major method for capturing spatial data, hardcopy map digitizing is diminishing in importance as most paper documents have been converted to digital forms. The tables are large, somewhat expensive, and now little-used. However, because data from hardcopy sources are likely to persist for many decades, and there are still many specialized documents to convert, you should be familiar with the process.

Not all maps are appropriate as a source of information for GIS. The type of map, how it was produced, and the intended purpose must be considered when interpreting the information on maps. Only cartometric maps should be directly digitized, and even though cartometric, a map may not be suitable. Consider the dot-density map

Figure 4-12: Manual digitizing on a digitizing table.



described in Figure 4-5. Population is depicted by points, but the points are plotted with random offsets or using some method that does not reflect the exact location of the population within each polygon. Before the information in the dot density map is entered into a GIS, the map should be interpreted correctly. The number of dots in a polygon should be counted, this number multiplied by the population per dot, and the population value assigned to the entire polygon.

Maps may be unsuitable for digitizing due to the media. Most hardcopy maps are on paper because it is ubiquitous, inexpensive, and easily printed. Creases, folds, and wrinkles can lead to non-uniform deformation of paper maps.

Characteristics of Manual Digitizing

Manual digitizing, whether from a digital image on screen or from a hardcopy source, is common because it provides sufficiently accurate data for many, if not most, applications. Manual digitizing may be at least as accurate as most maps or images, so the equipment, if properly used, does not add substantial error. Manual digitizing also requires low equipment investment, often just the software for image display and coordinate capture. The human ability to interpret images or hardcopy maps in poor condition is a unique and important benefit of manual digitizing. Humans are usually better than machines at interpreting the information contained on faded, stained, or poor quality maps and images. Finally, manual digitizing is often best because short training periods are required, data quality may be frequently evaluated, and digitizing equipment is commonly available. For these reasons manual digitization is likely to remain an important data entry method for some time to come.

There are a number of characteristics of manual digitization that may negatively affect the positional quality of spatial data. As noted earlier, map or image scale and resolution impact the spatial accuracy of digitized data. This scale may be the production scale for hardcopy maps, or the display scale for digital images or scanned maps. Table 4-1 illustrates the effects of map scale on data quality. Errors of one millimeter (0.039 inches) on a 1:24,000-scale map correspond to 24 meters (79 feet) on the surface of the Earth. This same one millimeter error on a 1:1,000,000-scale map corresponds to 1000 meters (3281 feet) on the Earth’s surface. Thus, small errors in map production or interpretation may cause significant positional errors when scaled to distances on the Earth, and these errors are greater for smaller-scale maps. Errors due to human pointing ability are reduced for on-screen digitizing, because the operator can zoom in to larger scales as needed. However, this does not overcome errors inherent in original images or scanned documents.

Both device precision and map scales should be considered when selecting a digitizing tablet. Map scale and repeatability both set an upper limit on the positional quality of digitized data. The most precise digitizers may be required when attempting to meet a stringent error standard while digitizing small-scale maps.

Table 4-1: The surface error caused by a one millimeter (0.039 inch) map error will change as map scale changes. Note the larger error at smaller map scales.

Map Scale      Error (m)   Error (ft)
1:24,000       24          79
1:50,000       50          164
1:62,500       63          205
1:100,000      100         328
1:250,000      250         820
1:1,000,000    1,000       3,281
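
The surface errors listed in Table 4-1 are the map error multiplied by the scale denominator. The short sketch below reproduces that arithmetic; the function name ground_error_m and the constant METERS_TO_FEET are hypothetical names introduced here for illustration, and the calculation assumes the nominal map scale holds everywhere on the sheet.

```python
METERS_TO_FEET = 3.28084

def ground_error_m(map_error_mm, scale_denominator):
    """Ground distance (m) corresponding to a map error given in millimeters."""
    # map error (mm) times the scale denominator gives ground mm; 1000 mm = 1 m
    return map_error_mm * scale_denominator / 1000

# Ground error for a one millimeter map error at the scales used in Table 4-1
for denom in (24_000, 50_000, 62_500, 100_000, 250_000, 1_000_000):
    err_m = ground_error_m(1.0, denom)
    print(f"1:{denom:<11,} {err_m:>8,.1f} m {err_m * METERS_TO_FEET:>8,.0f} ft")
```

The same function applies to any pointing or plotting error: a 0.4 millimeter error on a 1:20,000-scale map gives about 8 meters on the ground, and the table rounds its values to whole meters and feet.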

The abilities and attitude of the person digitizing (the operator) may also affect the geometric quality of manually digitized data. Operators vary in their visual acuity, steadiness of hand, attention to detail, and ability to concentrate. Some operators will more accurately capture the coordinate information contained in maps. The abilities of any single operator will also vary through time, due to fatigue or difficulty maintaining focus on a repetitive task. Operators should take frequent breaks from digitizing, and comparisons among operators and quality and consistency checks should be integrated into any manual digitization process to ensure accurate and consistent data collection.

The combined errors from both operators and equipment have been well-characterized and may be quite small. One test using a high-precision digitizing table revealed digitizing errors averaging approximately 0.067 millimeters (Figure 4-13). Errors followed a random normal distribution, and varied significantly among operators. These average errors translated to an approximately 1.6 meter error when scaled from the 1:24,000 map to a ground-equivalent distance. This average error is less than the acceptable production error for the map, and is suitable for many spatial analyses.

Figure 4-13: Digitizing error, defined by repeat digitizing. Points repeatedly digitized cluster around the true location, and follow a normal probability distribution (from Bolstad et al., 1990).

The Digitizing Process

Manual digitizing involves displaying a digital image on screen or placing a map on a digitizing surface, and tracing the location of feature boundaries. Coordinate data are sampled by manually positioning the puck or cursor over each target point and collecting coordinate locations. This position/collect step is repeated for every point to be captured, and in this manner the locations and shapes of all required map features are defined. Features that are viewed as points are represented by digitizing a single location. Lines are represented by digitizing an ordered set of points, and polygons by digitizing a connected set of lines. Lines have a starting point, often called a starting node, a set of vertices defining the line shape, and an ending node (Figure 4-14). Hence, lines may be viewed as a series of straight line segments connecting vertices and nodes.

Figure 4-14: Nodes define the starting and ending points of lines. Vertices define line shape.

Digitizing may be in point mode, where the operator must depress a button or otherwise signal to the computer to sample each point, or in stream mode, where points are
automatically sampled at a fixed time or distance frequency, perhaps once each meter. Stream mode helps when large numbers of lines are digitized, because vertices may be sampled more quickly and the operator may become less fatigued.

The stream sampling rate must be specified with care to avoid over- or under-sampled lines. Too short a collection interval results in redundant points not needed to accurately represent line or polygon shape. Too long a collection interval may result in the loss of important spatial detail. In addition, when using time-triggered stream digitizing, the operator must remember to continuously move the digitizing puck; if the operator rests the digitizing puck for a period longer than the sampling interval there will be multiple points clustered together. These will redundantly represent a portion of the line and may result in overlapping segments. Pausing for an extended period of time often creates a “rat’s nest” of lines that must later be removed.

Minimum distance digitizing is a variant of stream mode digitizing that avoids some of the problems inherent with time-sampled streaming. In minimum distance digitizing a new point is not recorded unless it is more than some minimum threshold distance from the previously digitized point. The operator may pause without creating a rat’s nest of line segments. The threshold must be chosen carefully: neither too large, missing useful detail, nor too small, in effect reverting back to stream digitizing.

Digitizing Errors, Node and Line Snapping

Positional errors are inevitable when data are manually digitized. These errors may be “small” relative to the intended use of the data; for example, the positional errors may be less than 2 meters when only 5 meter accuracy is required. However, these relatively small errors may still prevent the generation of correct networks or polygons. For example, a data layer representing a river system may not be correct because major tributaries may not connect. Polygon features may not be correctly defined because their boundaries may not completely close. These small errors must be removed or avoided during digitizing. Figure 4-15 shows some common digitizing errors.

Figure 4-15: Common digitizing errors.

Undershoots and overshoots are common errors that occur when digitizing. Undershoots are nodes that do not quite reach the line or another node, and overshoots are lines that cross over existing nodes or lines (Figure 4-15). Undershoots
cause unconnected networks and unclosed polygons. Overshoots typically do not cause problems when defining polygons, but they may cause difficulties when defining and analyzing line networks.

Node snapping and line snapping are used to reduce undershoots and overshoots while digitizing. Snapping is a process of automatically setting nearby points to have the same coordinates. Snapping relies on a snap tolerance or snap distance. This distance may be interpreted as a minimum distance between features. Nodes or vertices closer than this distance are moved to occupy the same location (Figure 4-16). Node snapping prevents a new node from being placed within the snap distance of an already existing node; instead, the new node is joined or “snapped” to the existing node. Remember that nodes are used to define the ending points of a line. By snapping two nodes together, we ensure a connection between digitized lines.

Line snapping may also be specified. Line snapping inserts a node at a line crossing and clips the end when a small overshoot is digitized. Line snapping forces a node to connect to a nearby line while digitizing, but only when the undershoot or overshoot is less than the snapping distance. Line snapping requires the calculation of an intersection point on an already existing line. The snap process places a new node at the intersection point, and connects the digitized line to the existing line at the intersection point. This splits the existing line into two new lines. When used properly, line and node snapping reduce the number of undershoots and overshoots. Closed polygons or intersecting lines are easier to digitize accurately and efficiently when node and line snapping are in force.

The snap distance must be carefully selected for snapping to be effective. If the snap distance is too short, then snapping has little impact. Consider a system where the operator may digitize with better than 5 meter accuracy only 10% of the time. This means 90% of the digitized points will be more than 5 meters from the intended location. If the snap tolerance is set to the equivalent of 0.1 meters, then very few nodes will

Figure 4-16: Undershoots, overshoots, and snapping. Snapping may join nodes, or may place a node onto
a nearby line segment. Snapping does not occur if the nodes and/or lines are separated by more than the
snap tolerance.
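
The node and line snapping rules described in this section can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: coordinates are planar (x, y) tuples, `snap_node` and `snap_to_segment` are hypothetical helper names rather than functions from any GIS package, and line snapping is simplified to moving a point onto the nearest location of a single existing segment instead of computing a true intersection of the digitized line with the existing line.

```python
import math

def snap_node(new_node, existing_nodes, snap_distance):
    """Return the nearest existing node within snap_distance, else new_node."""
    nearest = None
    nearest_dist = snap_distance
    for node in existing_nodes:
        d = math.hypot(new_node[0] - node[0], new_node[1] - node[1])
        if d <= nearest_dist:
            nearest, nearest_dist = node, d
    return nearest if nearest is not None else new_node

def snap_to_segment(point, seg_start, seg_end, snap_distance):
    """Move point onto the segment if it lies within snap_distance of it.

    Simplifying assumption: the snapped location is the closest point on
    the segment, found by projecting the point onto the segment.
    """
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: treat it as a single point
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))  # stay within the segment ends
    qx, qy = ax + t * dx, ay + t * dy
    if math.hypot(px - qx, py - qy) <= snap_distance:
        return (qx, qy)
    return point

# Illustrative values only: a 0.5-unit snap tolerance.
nodes = [(10.0, 5.0), (20.0, 5.0)]
print(snap_node((10.3, 5.1), nodes, 0.5))   # close to an existing node
print(snap_node((12.0, 5.0), nodes, 0.5))   # too far away to snap
print(snap_to_segment((5.0, 0.2), (0.0, 0.0), (10.0, 0.0), 0.5))
```

A point farther than the tolerance from every node or segment is returned unchanged, which mirrors the behavior noted in the caption of Figure 4-16.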

be within the snap tolerance, and snapping has little effect. Another problem comes from setting the snap tolerance too large. If the snap tolerance in our previous example is set to 10 meters, and we want the data accurate to the nearest 5 meters, then we may lose significant spatial information that is contained in the hardcopy map. Lines less than 10 meters apart cannot be digitized as separate objects. Many features may not be represented in the digital data layer. The snap distance should be smaller than the desired positional accuracy, such that significant detail contained in the digitized map is recorded. It is also important that the snap distance is not below the capabilities of the system used for digitizing. Careful selection of the snap distance may reduce digitizing errors and significantly reduce time required for later editing.

Reshaping: Line Smoothing and Thinning

Digitizing software may provide tools to smooth, densify, or thin points while entering data. One common technique uses spline functions to smoothly interpolate curves between digitized points and thereby both smooth and densify the set of vertices used to represent a line. A spline is a set of polynomial functions that join smoothly (Figure 4-17). Polynomial functions are fit to successive sets of points along the vertices in a line; for example, a function may be fit to points 1 through 5, and a separate polynomial function fit to points 5 through 11 (Figure 4-17). Constraints force these functions to connect smoothly, usually by requiring the first and second derivatives of the functions to be continuous at the intersection point. This means the lines have the same slope at the intersection point, and the slope is changing at the same rate for both lines at the intersection point. Once the spline functions are calculated they may be used to add vertices. For example, several new vertices may be automatically placed on the line between digitized vertices 8 and 9, leading to the “smooth” curve shown in Figure 4-17.

Figure 4-17: Spline interpolation to smooth digitized lines.

Data may also be digitized with too many vertices. High densities may occur when data are manually digitized in stream mode, and the operator moves slowly relative to the time interval. High vertex densities may also be found when data are derived from spline or smoothing functions that specify too high a point density. Finally, automated scanning and then raster-to-vector conversion may result in coordinate pairs spaced at absurdly high densities. Many of these coordinate data are redundant and may be removed without sacrificing spatial accuracy. Too many vertices may be a problem in that they slow processing, although this has become less important as computing power has increased. Point thinning algorithms have been developed to reduce the number of points while maintaining the line shape.

Many point thinning methods use a perpendicular “weed” distance, measured from a spanning line, to identify redundant points (Figure 4-18, top). The Lang method exemplifies this approach. A spanning line connects two non-adjacent vertices in a line. A
pre-determined number of vertices is spanned initially. The initial spanning number has been set to 4 in Figure 4-18, meaning four points will be considered at each starting point. Areas closer than the weed distance are shown in gray in the figure. A straight line is drawn between a starting point and an endpoint that is the 4th point down the line (Figure 4-18a). Any intermediate points that are closer than the weed distance are marked for removal. In Figure 4-18a, no points are within the weed distance, therefore none are marked. The endpoint is then moved to the next closest remaining point (Figure 4-18b), and all intermediate points tested for removal. Again, any points closer than the weed distance are marked for removal. Note that in Figure 4-18b, one point is within the weed distance, and is removed. Once all points in the initial spanning distance are checked, the last remaining endpoint becomes the new starting point, and a new spanning line drawn to connect 4 points (Figure 4-18c, d).

The process may be repeated for successive sets of points in a line segment until all vertices have been evaluated (Figure 4-18e to h). All close vertices are viewed as not recording a significant change in the line shape, and hence are expendable. Increasing the weed distance thins more vertices, and at some upper weed distance too many vertices may be removed. A balance must be struck between the removal of redundant vertices and the loss of shape-defining points, usually through a careful set of test cases with successively larger weed distances.

There are many variants on this basic concept. Some look only at three immediately adjacent points, testing the middle point against the line spanned by its two neighboring points. Others constrain or expand the search based on the complexity of the line. Rather than always looking at four points, as in our example above, more points are scrutinized when the line is not complex (nearly straight), and fewer when the line is complex (many changes in direction).

Figure 4-18: The Lang algorithm is a common line-thinning method. In the Lang method vertices are removed, or thinned, when they are within a weed distance to a spanning line (adapted from Weibel, 1997).
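
A Lang-style thinning pass can be sketched in Python as follows. This is a hedged illustration rather than the exact procedure drawn in Figure 4-18: it assumes the commonly described formulation in which the spanning line is shortened one vertex at a time until every intermediate vertex lies within the weed distance, after which those intermediate vertices are dropped and the endpoint becomes the new starting point. The names `perpendicular_distance` and `lang_thin`, and the interpretation of a span of 4 as "the start vertex plus the next three", are assumptions made for this example.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def lang_thin(vertices, weed_distance, span=4):
    """Thin a line, keeping vertices that record a change in shape.

    span is the number of vertices covered by the initial spanning line
    (assumed here to include the starting vertex).
    """
    if len(vertices) < 3:
        return list(vertices)
    kept = [vertices[0]]
    start = 0
    last = len(vertices) - 1
    while start < last:
        end = min(start + span - 1, last)
        # Shorten the spanning line until all intermediate vertices
        # fall within the weed distance (or none remain to test).
        while end > start + 1 and any(
            perpendicular_distance(vertices[i], vertices[start], vertices[end])
            > weed_distance
            for i in range(start + 1, end)
        ):
            end -= 1
        kept.append(vertices[end])  # intermediate vertices are dropped
        start = end                 # the endpoint becomes the new start
    return kept

# Made-up line: nearly straight, a sharp bend, then nearly straight again.
line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2), (5, 2.02), (6, 2)]
print(lang_thin(line, weed_distance=0.1))
```

Running the sketch with successively larger weed distances on a test line is one simple way to explore the balance between removing redundant vertices and losing shape-defining points.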

Scan Digitizing

Optical scanning is another method for converting hardcopy documents into digital formats (Figure 4-19). Scanners have elements that emit and sense light. Most scanners pass a sensing element over an illuminated map. This device measures both the precise location of the point being sensed and the strength of the light reflected or transmitted from that point. Reflected light intensities are sensed and converted to numbers.

Figure 4-19: A map scanner (courtesy Calcomp).

A threshold is often applied to determine if the sensed point is part of a feature. For example, a map may consist of dark lines on a white background. A threshold might be set such that if less than 10% of the light striking the map is returned to the sensor, the sensed point is considered part of a line. If 10% or more of the energy is reflected back to the sensor, the point is considered part of the white space between lines. The scanner then produces a raster representation of the map. Values are recorded where points or lines exist on the map and null or zero values are recorded in the intervening spaces.

Most scanners are either bed or drum designs. Bed scanners provide a flat surface on which the map is placed. A mat or hinged cover is then placed on top of the map, flattening and securing the map to the bed. On some bed scanners an optical train is passed over the map, emitting light and sensing the light reflected back from the map. Sensing arrays are typically used to measure the reflectance so that one to several rows of cells may be scanned simultaneously. A motor then moves the optical train to the adjacent lines and the process is repeated.

Drum scanners differ from flatbed scanners in that they employ a rotating cylinder. A map is fixed onto the surface of this cylinder, and the cylinder set to rotate at a uniform velocity. The angular velocity of a rotating cylinder is easier to control than the straight-line motion of a bed scanner, so many of the early high-precision scanners used drums. Many drum scanners are similar to bed scanners in that they use optical detection of reflected light to sense map elements.

Scanners work best when very clean maps are available. Even the most expensive scanners may report a significant number of spurious lines or points when old, marked, folded, or wrinkled maps are used. These spurious features must be subsequently removed via manual editing, which can negate the speed advantage of scanning over manual digitizing. Scanning also works best when maps are available as map separates, with one thematic feature type on each map. Editing takes less time when maps do not contain writing or other annotation. Strongly contrasting colors are preferred, such as black lines on a white background, rather than dark grey on light grey. Finally, scanning is most advantageous when a large number of cartographic elements is found on the maps.

Scan digitization usually requires some form of skeletonizing, or line thinning, particularly if the data are to be converted to a vector data format. Scanned lines are often wider than a single pixel (Figure 4-20). One of several pixels may be selected to specify the position of a given portion of the line. The same holds true for points. A pixel near the “center” of the point or line is typically chosen, with the center of a line defined as the pixel nearest the center of the local perpendicular bisector of the line. Skeletonizing reduces the widths of lines or points to a single pixel.
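
The thresholding step described in this section, in which cells returning less than 10% of the incident light are treated as part of a line, can be illustrated with a short Python sketch. The function name `threshold_scan`, the representation of a scan as nested lists of reflectance fractions between 0 and 1, and the sample values are all assumptions chosen for illustration; real scanner software will differ.

```python
def threshold_scan(reflectance, threshold=0.10):
    """Convert reflectance fractions into a binary raster.

    Cells reflecting less than the threshold are coded 1 (feature);
    cells at or above the threshold are coded 0 (background).
    """
    return [
        [1 if value < threshold else 0 for value in row]
        for row in reflectance
    ]

# Made-up reflectance values for a 2 x 3 block of scanned cells.
scan = [
    [0.92, 0.05, 0.88],
    [0.90, 0.10, 0.03],
]
for row in threshold_scan(scan):
    print(row)
```

The output of such a step is the raster on which a skeletonizing routine would then operate; skeletonizing itself is not shown here.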

Figure 4-20: Skeletonizing, a form of line thinning that is often applied after scan-digitizing.

Editing Geographic Data

Spatial data may be edited, or changed, for several reasons. Errors and inconsistencies are inevitably introduced during spatial data entry. Undershoots, overshoots, missing or extra lines, missing or extra points or labels are all errors that must be corrected. Spatial data can change over time. Parcels are subdivided, roads extended or moved, forests grow or are cut, and these changes may be entered in the spatial database through editing. New technologies may be developed that provide more accurate positional information, and even though existing data may be consistent and current, the more accurate data may be more useful, leading to data editing.

Identifying errors is the first step in editing. Errors may be identified by printing a map of the digitized data and verifying that each point, line, and area feature is present and correctly located. Plots are often printed both at a similar scale and at a significantly larger scale than the original source materials. The large-scale plots are often paneled with some overlap among panels. Plots at scale are helpful for identifying missing features, and large-scale plots aid in identifying undershoots, overshoots, and small omissions or additions. Operators typically annotate these plots as they are checked systematically for each feature.

Software helps operators identify potential errors. Line features typically begin and end with a node, and nodes may be classified as connecting or dangling. A connecting node joins two or more lines, while a dangling node is attached to only one line. Some dangling nodes may be intentional, for example, a cul-de-sac in a street network, while others will be the result of under- or overshoots. Dangling nodes that are plotted with unique symbols can be quickly evaluated, and if appropriate, corrected.

Attribute consistency may also be used to identify errors. Operators note areas in which contradictory theme types occur in different data layers. The two layers are either graphically or cartographically overlain. Contradictory co-occurrences are identified, such as water in one layer and upland areas in a second. These contradictions are then either resolved manually, or automatically via some pre-defined precedence hierarchy.

Many GIS software packages provide a comprehensive set of editing tools (Figure 4-21). Editing typically includes the ability to select, split, update, and add features. Selection may be based on geometric attributes, or with a cursor guided by the operator. Selections may be made individually, by geographic extent (select all features in a box, circle, or within a certain distance of the pointer) or by geometric attributes (e.g., select all nodes that connect to only one
line). Once a feature is selected, various operations may be available, including erasing all or part of the feature, changing the coordinate values defining the feature, and in the case of lines, splitting or adding to the feature. A line may be split into parts, either to isolate a segment for future deletion, or to modify only a portion of the line. Coordinates are typically altered by interactively selecting and dragging points, nodes or vertices to their best shape and location. Points or line segments are added as needed.

Figure 4-21: GIS software provides a flexible and complete set of editing tools. These tools provide for the rapid, precise, controlled creation and modification of coordinates and attributes of spatial data (courtesy ESRI).

Groups of features in an area may be adjusted through interactive rubbersheeting. Rubbersheeting involves fitting a local equation to adjust the coordinates of features. Polynomial equations are often used due to their flexibility and ease of application. Anchor points are selected, again on the graphics screen, and other points are selected by dragging interactively on the screen to match point locations. All lines and points except the anchor points are interactively adjusted. One common application of rubbersheeting involves adjusting linework representing cultural features, such as a road network, when higher geometric-accuracy photo or satellite image data are available. The linework is overlain on an image backdrop and subsequently adjusted.

All edits should be made with due attention to the magnitude of positional change introduced during editing. On-screen editing to eliminate undershoots should only be performed when the “true” locations of features may be identified accurately, and the new features can be confidently placed in the correct location. Automatic removal of “short” undershoots may be performed without introducing additional spatial error in
most instances. A short distance for an undershoot is subjectively defined, but typically it is below the error inherent in the source map, or at least a distance that is insignificant when considering the intended use of the spatial data.

Features Common to Several Layers

One common problem in digitizing derives from representation of features that occur on different maps or images. These features rarely have identical locations on each map or image, and often occur in different locations when digitized into their respective data layers (Figure 4-22). For example, water boundaries on soil survey maps rarely correspond exactly to water boundaries found on USGS topographic maps.

Figure 4-22: Common features may be spatially inconsistent in different spatial data layers.

Features may appear differently on different maps for many reasons. Perhaps the maps were made for different purposes or at different times. Features may differ because the maps were from different source materials, for example, one map may have been based on ground surveys while another was based on aerial photographs. Digitizing can also compound the problem due to differences in digitizing methods or operators.

There are several ways to remove this “common feature” inconsistency. One involves re-drafting the data from conflicting sources onto one base map. Inconsistencies are removed at the drafting stage. For example, vegetation and roads data may show vegetation type boundaries at road edges that are inconsistent with the road locations. Both of these data layers may be drafted onto the same base, and the common boundaries fixed by a single line. This line is digitized once, and used to specify the location of both the road and vegetation boundary when digitizing. Re-drafting, although labor intensive and time consuming, forces a resolution of inconsistent boundary locations. Re-drafting also allows several maps to be combined into a single data layer.

A second, often preferable method involves establishing a “master” boundary which is the highest accuracy composite of the available data sets. A digital copy or overlay operation establishes the common features as a base in all the data layers, and this base may be used as each new layer is produced. For example, water boundaries might be extracted from the soil survey and USGS quad maps and these data combined in a third data layer. The third data layer would be edited to produce a composite, high-quality water layer. The composite water layer would then be copied back into both the soils and USGS quad layers. This second approach, while resulting in visually consistent spatial data layers, is in many instances only a cosmetic improvement of the data. If there are large discrepancies (“large” is defined relative to the required spatial data accuracy), then the source of the discrepancies should be identified and the most accurate data used, or new, higher accuracy data collected from the field or original sources.

Coordinate Transformation
Coordinate transformation is a common operation in the development of spatial data for GIS. A coordinate transformation brings spatial data into an Earth-based map coordinate system so that each data layer aligns with every other data layer. This alignment ensures features fall in their proper relative position when digital data from different layers are combined. Within the limits of data accuracy, a good transformation helps avoid inconsistent spatial relationships such as farm fields on freeways, roads under water, or cities in the middle of swamps, except where these truly exist. Coordinate transformation is also referred to as registration, because it “registers” the layers to a map coordinate system.

Coordinate transformation is most commonly used to convert newly digitized data from the digitizer/scanner coordinate system to a standard map coordinate system (Figure 4-23). The input coordinate system is usually based on the digitizer or scanner-assigned values. An image may be scanned and coordinates recorded as a cursor is moved across the image surface. These coordinates are usually recorded in pixel, inch, or centimeter units relative to an origin located near the lower left corner of the image. The absolute values of the coordinates depend on where the image happened to be placed on the table prior to scanning, but the relative position of digitized points does not change. Before these newly digitized data may be used with other data, these “inch-space” or “digitizer” coordinates must be transformed into an Earth-based map coordinate system.

Figure 4-23: Control points in a coordinate transformation. Control points are used to guide the transformation of a source, input set of coordinates to a target, output set of coordinates. There are five control points in this example. Corresponding positions are shown in both coordinate systems.

Control Points

A set of control points is used to transform the digitized data from the digitizer or photo coordinate system to a map-projected coordinate system. Control points are different from other digitized features. When we digitize most points, lines, or areas, we do not know the map projection coordinates for these features. We simply collect the digitizer x and y coordinates that are established with reference to some arbitrary origin on the digitizing tablet or photo. Control points differ from other digitized points in that we
know both the map projection coordinates and the digitizer coordinates for these points.

These two sets of coordinates for each control point, one for the map projection and one for the digitizer system, are used to estimate the coefficients for transformation equations, usually through a statistical, least-squares process. The transformation equations are then used to convert coordinates from the digitizer system to the map projection system.

The transformation may be estimated in the initial digitizing steps, and applied as the coordinates are digitized from the map or image. This “on-the-fly” transformation allows data to be output and analyzed with reference to map-projected coordinates. A previously registered data layer or image may be displayed on screen just prior to digitizing a new map. Control points may then be entered, the new map attached to the digitizing table, and the map registered. The new data may then be displayed on top of the previously registered data. This allows a quick check on the location of the newly digitized objects against corresponding objects in the study area.

In contrast to on-the-fly transformations, data can also be recorded in digitizer coordinates and the transformation applied later. All data are digitized, including the control point locations. The digitizer coordinates of the control points may then be matched to corresponding map projection coordinates, and transformation equations estimated. These transformation equations are then applied to convert all digitized data to map projection coordinates.

Control points should meet or exceed several criteria. First, control points should be from a source that provides the highest feasible coordinate accuracy. Second, control point accuracy should be at least as good as the desired overall positional accuracy required for the spatial data. Third, control points should be as evenly distributed as possible throughout the data area. A sufficient number of control points should be collected. The minimum number of points depends on the mathematical form of the transformation, but additional control points above the minimum number are usually collected; this usually improves the quality and accuracy of the statistically-fit transformation functions.

The x, y (horizontal), and sometimes z (vertical or elevation) coordinates of control points are known to a high degree of accuracy and precision. Because high precision and accuracy are subjectively defined, there are many methods to determine control point locations. Sub-centimeter accuracy may be required for control points used in property boundary layers, while accuracies of a few meters may be acceptable for large-area vegetation mapping. Common sources of control point coordinates are traditional transit and distance surveys, global positioning system measurements, existing cartometric quality maps, or existing digital data layers on which suitable features may be identified.

The Affine Transformation

The affine coordinate transformation employs linear equations to calculate map coordinates. Map projection coordinates are often referred to as eastings (E) and northings (N), and are related to the x and y digitizer coordinates by the equations:

$$E = T_E + a_1 x + a_2 y \tag{4.1}$$

$$N = T_N + b_1 x + b_2 y \tag{4.2}$$

Equations 4.1 and 4.2 allow us to move from the arbitrary digitizer coordinate system to the project map coordinate system. We know the x and y coordinates for every digitized point, line vertex, or polygon vertex. We may calculate the E and N coordinates by applying the above equations to every digitized point.

$T_E$ and $T_N$ are translation changes between the coordinate systems, and can be thought of as shifts in the origins from one
coordinate system to the next. The $a_i$ and $b_i$ parameters incorporate the change in scales and rotation angle between one coordinate system and the next. The affine is the most commonly applied coordinate transformation because it provides for these three main effects of translation, rotation, and scaling, and because it often introduces less error than higher-order polynomial transformations.

The affine system of equations has six parameters to be estimated, $T_E$, $T_N$, $a_1$, $a_2$, $b_1$, and $b_2$. Each control point provides E, N, x, and y coordinates, and allows us to write two equations. For example, we may have a control point consisting of a precisely surveyed center of a road intersection. This point has digitizer coordinates of x = 103.0 centimeters and y = -100.1 centimeters, and corresponding Earth-based map projection coordinates of E = 500,083.4 and N = 4,903,683.5. We may then write two equations based on this control point:

$$500{,}083.4 = T_E + a_1(103.0) + a_2(-100.1) \tag{4.3}$$

$$4{,}903{,}683.5 = T_N + b_1(103.0) + b_2(-100.1) \tag{4.4}$$

We cannot find a unique solution to these equations, because there are six unknowns ($T_E$, $T_N$, $a_1$, $a_2$, $b_1$, $b_2$) and only two equations. We need as many equations as unknowns to solve a linear system of equations. Each control point gives us two equations, so we need a minimum of three control points to estimate the parameters of an affine transformation. Statistical estimation requires at least four control points, one more than this minimum. As with all statistical estimates, more control points are better than fewer, but we will reach a point of diminishing returns after some number of points, typically somewhere between 18 and 30 control points.

The affine coordinate transformation is usually fit using a statistical method that minimizes the root mean square error, RMSE. The RMSE is defined as:

$$RMSE = \sqrt{\frac{e_1^2 + e_2^2 + e_3^2 + \cdots + e_n^2}{n}} \tag{4.5}$$

where the $e_i$ are the residual distances between the true E and N coordinates and the E and N coordinates in the output data layer:

$$e = \sqrt{(x_t - x_d)^2 + (y_t - y_d)^2} \tag{4.6}$$

This residual is the difference between the true coordinates $x_t$, $y_t$, and the transformed output coordinates $x_d$, $y_d$. Figure 4-24 shows examples of this lack of fit. Individual residuals may be observed at each control point location.

A statistical method for estimating transformation equations is preferred because it also identifies transformation error. Control point coordinates contain unavoidable measurement errors. A statistical process provides an RMSE, a summary of the difference between the “true” (measured) and predicted control point coordinates. It provides one index of transformation quality. Transformations are

Figure 4-24: Examples of control points, predicted control locations, and residuals from coordinate transformation.
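
As a rough illustration of the least-squares fit and of the RMSE in Equations 4.5 and 4.6, the Python sketch below estimates the six affine parameters of Equations 4.1 and 4.2 from a handful of control points by solving the normal equations. It is a sketch under stated assumptions, not the method of any particular GIS package: the function names (`solve3`, `fit_affine`, `apply_affine`, `rmse`) are hypothetical, and the control point values are invented and error-free, so the fitted RMSE should be close to zero.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (a[r][n] - sum(a[r][c] * p[c] for c in range(r + 1, n))) / a[r][r]
    return p

def fit_affine(digitizer_xy, map_en):
    """Least-squares estimates of (TE, a1, a2) and (TN, b1, b2)."""
    if len(digitizer_xy) < 3:
        raise ValueError("an affine fit needs at least three control points")
    rows = [(1.0, x, y) for x, y in digitizer_xy]
    # Normal matrix shared by the easting and northing fits.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def solve_for(targets):
        atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
        return solve3(ata, atb)

    east = solve_for([e for e, _ in map_en])
    north = solve_for([n for _, n in map_en])
    return tuple(east), tuple(north)

def apply_affine(params, x, y):
    """Apply Equations 4.1 and 4.2 to one digitizer coordinate pair."""
    (te, a1, a2), (tn, b1, b2) = params
    return te + a1 * x + a2 * y, tn + b1 * x + b2 * y

def rmse(params, digitizer_xy, map_en):
    """Root mean square of the control point residual distances."""
    squared = []
    for (x, y), (e_true, n_true) in zip(digitizer_xy, map_en):
        e_fit, n_fit = apply_affine(params, x, y)
        squared.append((e_true - e_fit) ** 2 + (n_true - n_fit) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Invented control points: digitizer (x, y) and matching map (E, N) values
# generated from a known affine so the fit can be checked by eye.
digitizer = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 3.0)]
known = ((500000.0, 2.0, 0.5), (4900000.0, -0.5, 2.0))
map_coords = [apply_affine(known, x, y) for x, y in digitizer]
fitted = fit_affine(digitizer, map_coords)
print(fitted)
print(rmse(fitted, digitizer, map_coords))
```

With real control points the residuals will not vanish, and inspecting the per-point residuals alongside the overall RMSE is what makes it possible to spot a suspect control point.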

fit (Figure 4-25). The RMSE will usually be less than the true transformation error at a randomly selected point, because we are actively minimizing the N and E residual errors when we statistically fit the transformation equations. However, the RMSE is an index of accuracy, and a lower RMSE generally indicates a more accurate affine transformation.

Estimating the coordinate transformation parameters is often an iterative process. Control points are rarely exact, and x and y coordinates may not be precisely digitized. Poor eyesight, a shaky hand, fatigue, lack of attention, mis-identification of the control location, or a blunder may result in erroneous x and y values. There may also be errors in the E and N coordinates. Typically, control points are entered, the affine transformation parameters estimated, and the overall RMSE and individual point E and N errors evaluated (Figure 4-24, Figure 4-25). Suspect points are fixed, and the transformation re-estimated and errors evaluated until a final transformation is estimated. The transformation is then applied to all features to convert them from digitizer to map coordinates.

Figure 4-25: Iterative fitting of an affine transformation. Control points were examined after each fit, to discover blunders in entry or poor matching of points. Control points with large residuals were examined to determine if the cause for the error may be identified. If so, the control point coordinates may be modified, and the transformation re-fit.

Other Coordinate Transformations

Other coordinate transformations are sometimes used. The conformal coordinate transformation is similar to the affine, and has the form:

E = T_E + cx - dy    (4.7)

N = T_N + dx + cy    (4.8)

The coefficients T_E, T_N, c, and d are estimated from control point data. Like the affine transformation, the conformal transformation is also a first-order polynomial. Unlike the affine, the conformal transformation requires equal scale changes in the x and y directions. Note the symmetry in equations 4.7 and 4.8, in that the x and y coefficients match across equations, and there is a change in sign for the d coefficient. This results in a system of equations with only four unknown parameters, and so the conformal transformation may be estimated when only two control points are available.

Higher-order polynomial transformations are sometimes used to transform among coordinate systems. An example of a 2nd-order polynomial is:

E = b_1 + b_2 x + b_3 y + b_4 x^2 + b_5 y^2 + b_6 xy    (4.9)

Note that the combined powers of the x and y variables may be up to 2. This allows for curvature in the transformation in both the x and y directions. A minimum of six control points is required to fit this 2nd-order polynomial transformation, and seven are required when using a statistical fit. The estimated parameters T_E, T_N, a_1, a_2, b_1, and b_2 will be different in equations 4.1 and 4.2 when compared to 4.9, even if the same set of control points is used for both statistical fits. We change the form of the equations by including the higher-order squared and xy cross-product terms, and all estimated parameters will vary.

A Caution When Evaluating Transformations

Selecting the “best” coordinate transformation to apply is a subjective process, guided by multiple goals. We hope to develop an accurate transformation based on a large set of well-distributed control points. Isolated control points that substantially improve our coverage may also contribute substantially to our transformation error.

There are no clear rules on the number of points versus distribution of points trade-off, but it is typically best to strive for the widest distribution of points. We want at least two control points in each quadrant of the working area, with a target of 20% in each quadrant. This is often not possible. This latter reason is less common with the development of GNSS. The transformation equation should be developed with the following observations in mind.

First, bad control points happen, but we should thoroughly justify the removal of any control point. Every attempt should be made to identify the source of the error, either in the collection or in the processing of field coordinates, the collection of image coordinates, or in some blunder in coordinate transcription. A common error is the mis-identification of coordinate location on the image or map, for example, when the control location is placed on the wrong side of a road.

Second, a lower RMSE does not necessarily mean a better transformation. The RMSE is a useful tool when comparing among transformations that have the same model form, for example, when comparing one affine to another affine as in Figure 4-25. The RMSE is not useful when comparing among different model forms, for example, when comparing an affine to a 2nd-order polynomial. The RMSE is typically lower for 2nd- and other higher-order polynomials than for an affine transformation, but this does not mean the higher-order polynomial provides a more accurate transformation. The higher-order polynomial will introduce more error than an affine transformation on most orthographic maps, and an affine transformation is preferred. High-order polynomials allow more flexibility in warping the surface to fit the control points. Unfortunately, this warping may significantly deform the non-control-point coordinates, and add large errors when the transformation is applied to all data in a layer (Figure 4-26). Thus, high-order polynomials and others should be used with caution.

Figure 4-26: An illustration that RMSE should not be used to compare different order transformations, nor should it be used as the sole criterion for selecting the best transformation. Above are portions of a transformed image that was registered to a road network (left panel: first-order transformation, RMSE = 6.7 m; right panel: 3rd-order transformation, RMSE = 4.2 m). This area is interstitial to 18 well distributed control points. Because the 3rd-order polynomial is quite flexible in fitting the points and reducing RMSE, it distorts areas between the control points. This is shown by the poor match between image and vector roads, above right. Although it has a higher RMSE, the first-order transformation on the left is better overall.

Finally, independent tests of the transformations make the best comparisons

among transformations. A completely independent set of well distributed test points would appear to be ideal, but these rarely exist. The extra points either haven’t been collected, or suitable locations do not exist. The best way to test the accuracy of the transformation typically uses a “bootstrap” approach that treats each point as an independent test point. One point is withheld, the transformation estimated, and the error at the withheld point calculated. The point is replaced in the estimation set, and the next point withheld, fitting the same type of transformation. The equations will be slightly different. The error at this second withheld point is then calculated. This process is repeated for each control point, and a mean error calculated.

Control Point Sources: Surveying

Traditional ground surveys based on optical surface measurements are a common, although decreasingly used, method for determining control point locations. Modern surveys use complex instruments such as transits and theodolites to precisely measure the relative location of points. If the survey starts from a known point, then the coordinate location of any survey station may be determined via simple trigonometric functions. Federal, state, county, and local governments all maintain a set of accurately surveyed locations (Figure 4-27), and these points may be used as control points or as starting points for additional surveys. Many of these known points have been established using traditional surveying techniques. Indeed, the development of this “control network” infrastructure is one of the first and most important responsibilities of government. These survey points form the basis for distance, location, and area measurements used to define property, political, and municipal boundaries. As a result, this control network underlies most commerce, transportation, and land ownership and management. Coordinates, general location, and descriptions are documented for these control networks, and may be obtained from a number of government sources. In the United States these sources include county surveyors, state surveyors, departments of transportation, and the National Geodetic Survey (NGS).

Figure 4-27: Previous surveys are a common source of control points.

The ground survey network is often quite sparse and insufficient for registering many large-scale maps or images. Even when there is a sufficient number of ground-surveyed points in an area, many may not be suitable for use as control points in a coordinate transformation of spatial data. The control points may not be visible on the maps or images to be registered. For example, a surveyed point may fall along the edge of a road. If the control point is at a mapped road intersection, we may use the easting and northing coordinates of the road intersection as a control point during map registration. However, if the surveyed point is along the edge of a road that is not near any mapped feature such as a road intersection, building, or water tower, then it may not be used as a control point. Our control points must have two characteristics to be useful: first, the point must be visible on the map, data layer, or image that we wish to register, and second, we must have precise ground coordinates in our target map projection. The first requirement, visibility on the source map or photograph, is often not met for survey-defined control. Therefore, we must often obtain additional control points.

One option for obtaining control points is to perform additional surveys that measure the coordinates of features that are visible on the source materials. Precise surveys are used to establish the coordinate locations of a well-distributed, sufficient set of points throughout the area covered by the source map. While sometimes expensive, new surveys are the chosen method when the highest accuracies are required. Costs were prohibitive with traditional optical surveying methods; however, GNSS positioning technologies allow more frequent, custom collection of control points.

Control Points from Existing Maps and Digital Data

Figure 4-28: Potential control points, indicated here by arrows, may be extracted from digital reference images. Permanent, well-defined features are identified and coordinates determined from the digital image. Note the white cross, circled in the lower right corner. This is a photogrammetric panel, typically a plastic or painted wooden target placed prior to photo capture, and with precisely surveyed coordinates. These targets are used to create the corrected digital image with a known coordinate system, a process described in Chapter 6.

Registered digital image data are common sources of ground control points, particularly when natural resources or municipal databases are to be developed for managing large areas. Digital images often provide a richly detailed depiction of surface features (Figure 4-28). Digital image data may be obtained that are registered to a known coordinate system. Typically, the coordinates of a corner pixel are provided, and the lines and columns for the image run parallel to the easting (E) and northing (N) direction of the coordinate system. Because the pixel dimensions are known, the calculation of a pixel coordinate involves multiplying the row and column number by the pixel size, and applying the corner offset, either by addition or subtraction. In this manner, the image row/column may be converted to an
image row/column may be converted to an

E, N coordinate pair, and control point coordinates determined.

Existing maps are another common source of control points. Point locations are plotted and coordinates often printed on maps, for example the corner location coordinates are printed on USGS quadrangle maps. Road intersections and other well-defined locations are often represented on maps. If enough recognizable features can be identified, then control points may be obtained from the maps. Control points derived in this manner typically come only from cartometric maps, those maps produced with the intent of giving an accurate, map-projected representation of features on the Earth’s surface.

Existing digital data may also provide control points. A short description of these digital data sources is provided here, and expanded descriptions of these and other digital data are provided in Chapter 7. For example, the USGS has produced Digital Raster Graphics (DRG) files that are scanned images of the 1:24,000-scale quadrangle maps. These DRGs come referenced to a standard coordinate system, so it is a simple and straightforward task to extract the coordinates of road intersections or other well-defined features that have been plotted on the USGS quadrangle maps. Vector data of roads are often widely available, and if of sufficient accuracy, may be used as a source of control points at road intersections and other distinct locations.

GNSS Control Points

The global positioning system (GPS), GLONASS, and Galileo are Global Navigation Satellite Systems (GNSS) that allow us to establish control points. GNSS, discussed in detail in Chapter 5, can help us obtain the coordinates of control points that are visible on a map or image. GNSS are particularly useful because we may quickly survey widely-spaced points. GNSS positional accuracy depends on the technology and methods employed; it typically ranges from sub-centimeter (tenths of inches) to a few meters (tens of feet). Most points recently added to the NGS and other government-maintained networks were measured using GNSS technologies.

To sum up: control points are necessary for coordinate transformation, and typically a number of control points are identified for a study area. The x and y coordinates for the control points are obtained from a digitized map or image, and the map projection coordinates, E and N, are determined from survey, GNSS, or other sources (Figure 4-29). These coordinate pairs are then used with a set of transformation equations to convert data layers into a desirable map coordinate system.

Figure 4-29: An example of control point locations from a road data layer, and corresponding digitizer and map projection coordinates.
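The workflow summarized above, pairing digitizer (x, y) with map (E, N) coordinates and estimating a transformation, can be sketched in a few lines. This is a minimal illustration, not the method of any particular GIS package: it assumes an ordinary least-squares affine fit, takes RMSE as the root of the mean squared point distance (an assumption about the exact definition), and estimates an independent error by withholding one control point at a time, as described earlier in this chapter. All coordinates, the "true" parameters, and the perturbations are hypothetical.

```python
import math


def fit_affine(points):
    """Least-squares fit of E = TE + a1*x + a2*y and N = TN + b1*x + b2*y.

    points is a list of (x, y, E, N) tuples; at least four are required
    so that residuals remain after the fit.
    """
    n = len(points)
    if n < 4:
        raise ValueError("a statistical affine fit needs at least four control points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Centering on the means separates the offsets from the slope terms.
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    det = sxx * syy - sxy ** 2
    if det <= 1e-12 * sxx * syy:
        raise ValueError("control points are collinear")

    def coefficients(index):
        mv = sum(p[index] for p in points) / n
        sxv = sum((p[0] - mx) * (p[index] - mv) for p in points)
        syv = sum((p[1] - my) * (p[index] - mv) for p in points)
        c1 = (sxv * syy - syv * sxy) / det
        c2 = (syv * sxx - sxv * sxy) / det
        return (mv - c1 * mx - c2 * my, c1, c2)

    return coefficients(2), coefficients(3)


def transform(params, x, y):
    east, north = params
    return (east[0] + east[1] * x + east[2] * y,
            north[0] + north[1] * x + north[2] * y)


def point_error(params, point):
    x, y, e, n = point
    pe, pn = transform(params, x, y)
    return math.hypot(pe - e, pn - n)


def rmse(params, points):
    return math.sqrt(sum(point_error(params, p) ** 2 for p in points) / len(points))


def leave_one_out_error(points):
    """Mean error at each control point when it is withheld from the fit."""
    errors = []
    for i, held in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(point_error(fit_affine(rest), held))
    return sum(errors) / len(errors)


# Hypothetical digitizer coordinates, "true" parameters, and E/N perturbations.
true_east = (500000.0, 0.98, 0.05)
true_north = (4903000.0, -0.05, 0.98)
digitized = [(103.0, -100.1), (812.4, -95.3), (455.0, 610.2),
             (120.7, 580.9), (790.3, 602.5), (460.2, 250.0)]
noise = [(0.4, -0.3), (-0.6, 0.2), (0.1, 0.5), (0.3, -0.4), (-0.2, 0.1), (0.0, -0.1)]
control = []
for (x, y), (de, dn) in zip(digitized, noise):
    e, n = transform((true_east, true_north), x, y)
    control.append((x, y, e + de, n + dn))

params = fit_affine(control)
print("RMSE at control points:", rmse(params, control))
print("Mean withheld-point error:", leave_one_out_error(control))
```

The withheld-point error is the more honest of the two summaries, because each test point played no part in the fit that is evaluated at it.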

Raster Geometry and Resampling

Data often must be resampled when converting between coordinate systems, or changing the cell size of a raster data set (Figure 4-30). Resampling involves reassigning the cell values when changing raster coordinates or geometry. Resampling is required when changing cell sizes because the new cell centers will not align exactly with old cell centers. Changing coordinate systems may change the direction of the x and y axes, and GIS systems often require that the cell edges align with the coordinate system axes. Hence, the new cells often do not correspond to the same locations or extents as the old cells.

Common resampling approaches include the nearest neighbor (taking the output layer value from the nearest input layer cell center), bilinear interpolation (distance-based averaging of the four nearest cells), and cubic convolution (a weighted average of the sixteen nearest cells, Figure 4-30).

Figure 4-30: Raster resampling. When the orientation or cell size of a raster data set is changed, output cell values are calculated based on the closest (nearest neighbor), four nearest (bilinear interpolation), or sixteen closest (cubic convolution) input cell values.

An example of a bilinear interpolation is shown in Figure 4-31. This algorithm uses a distance-weighted average of the four nearest cells in the input to calculate the value for the output. The new output location is represented by the black post. Initially, the height, or Z_out value, of the output location is unknown. Z_out is calculated based on the distances between the output locations and the input locations. The distance in the x direction is denoted in Figure 4-31 by d_1, and the distance in the y direction by d_2. The values in the input are shown as gray posts and are labeled as Z_1 through Z_4. Intermediate heights Z_b and Z_u are shown. These represent the averages of the input values, weighted by the distance d_1, when taken in pairs in the x direction. These pairs are Z_1 and Z_2, to yield Z_u, and Z_3 and Z_4, to yield Z_b. Z_u and Z_b are then averaged to calculate Z_out, using the distance d_2 between the input and output locations to weight values at each input location. The cubic convolution resampling calculation is similar, except that more cells are used, and the weighting is not an average based on linear distance.

Figure 4-31: The bilinear interpolation method uses a distance weighted average to assign the output value, Z_out, based on input values, Z_1 through Z_4.

Map Projection vs. Transformation

Map transformations should not be confused with map projections. A map transformation typically employs a statistically-fit linear equation to convert coordinates from one Cartesian coordinate system to another. A map projection, described in Chapter 3, differs from a transformation in that it is an analytical, formula-based conversion between coordinate systems, usually from a curved, latitude/longitude coordinate system to a Cartesian coordinate system. No statistical fitting process is used with a map projection.

Map transformations should rarely be used in place of map projection equations when converting geographic data between map projections. Consider an example when data are delivered to an organization in Universal Transverse Mercator (UTM) coordinates and are to be converted to State Plane coordinates prior to integration into a GIS database. Two paths may be chosen. The first involves projection from UTM to

geographic coordinates (latitude and longitude), and then from these geographic coordinates to the appropriate State Plane coordinates. This is the correct, most accurate approach.

An alternate and often less-accurate approach involves using a transformation to convert between different map projections. In this case a set of control points would be identified and the coordinates determined in both UTM and State Plane coordinate systems. The transformation coefficients would be estimated and these equations applied to all data in the UTM data layer. This new output data layer would be in State Plane coordinates. This transformation process should be avoided, as a transformation may introduce additional positional error.

Transforming between projections is used quite often, inadvertently, when digitizing data from paper maps. For example, USGS 1:24,000-scale maps are cast on a polyconic projection. If these maps are digitized, it would be preferable to register them to the appropriate polyconic projection, and then re-project these data to the desired end projection. This is often not done, because the error in ignoring the projection over the size of the mapped area is typically less than the positional error associated with digitizing. Experience and specific calculations have shown that the spatial errors in using a transformation instead of a projection are small at these map scales under typical digitizing conditions.

This second approach, using a transformation when a projection is called for, should not be used until it has been tested as appropriate for each new set of conditions. Each map projection distorts the surface geometry. These distortions are complex and nonlinear. Affine or polynomial transformations are unlikely to remove this non-linear distortion. Exceptions to this rule occur when the area being transformed is small, particularly when the projection distortion is small relative to the random uncertainties, transformation errors, or errors in the spatial data. However, there are no guidelines on what constitutes a sufficiently “small” area. In our example above, USGS 1:24,000 maps are often digitized directly into a UTM coordinate system with no obvious ill effects, because the errors in map production and digitizing are often much larger than those in the projection distortion for the map area. However, you should not infer this practice is appropriate under all conditions, particularly when working with smaller-scale maps.
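The dependence on area size can be explored with a rough numerical sketch. The example below fits a least-squares affine transformation between two simple spherical projections over a small and a large area, then reports the RMSE of each fit. Several simplifications are assumed: a spherical Earth, Mercator and sinusoidal formulas standing in for UTM and State Plane (which use more involved ellipsoidal equations), and an arbitrary center at 45° N, 93° W. The function names are hypothetical.

```python
import math

R = 6371000.0  # spherical Earth radius in meters (simplifying assumption)


def mercator(lat, lon):
    """Spherical Mercator coordinates, standing in for the source projection."""
    phi = math.radians(lat)
    return R * math.radians(lon), R * math.log(math.tan(math.pi / 4 + phi / 2))


def sinusoidal(lat, lon, lon0):
    """Spherical sinusoidal coordinates, standing in for the target projection."""
    phi = math.radians(lat)
    return R * math.radians(lon - lon0) * math.cos(phi), R * phi


def affine_rmse(lat0, lon0, half_width, steps=6):
    """RMSE (m) of the best-fit affine from Mercator to sinusoidal over a square block."""
    pairs = []
    for i in range(steps + 1):
        for j in range(steps + 1):
            lat = lat0 - half_width + 2 * half_width * i / steps
            lon = lon0 - half_width + 2 * half_width * j / steps
            pairs.append((mercator(lat, lon), sinusoidal(lat, lon, lon0)))
    n = len(pairs)
    mx = sum(src[0] for src, _ in pairs) / n
    my = sum(src[1] for src, _ in pairs) / n
    sxx = sum((src[0] - mx) ** 2 for src, _ in pairs)
    syy = sum((src[1] - my) ** 2 for src, _ in pairs)
    sxy = sum((src[0] - mx) * (src[1] - my) for src, _ in pairs)
    det = sxx * syy - sxy ** 2
    squared = 0.0
    for k in (0, 1):  # fit the easting-like, then the northing-like coordinate
        mv = sum(dst[k] for _, dst in pairs) / n
        sxv = sum((src[0] - mx) * (dst[k] - mv) for src, dst in pairs)
        syv = sum((src[1] - my) * (dst[k] - mv) for src, dst in pairs)
        c1 = (sxv * syy - syv * sxy) / det
        c2 = (syv * sxx - sxv * sxy) / det
        # Residuals are computed in centered form, which is equivalent
        # to using the fitted offset explicitly.
        squared += sum(((dst[k] - mv) - c1 * (src[0] - mx) - c2 * (src[1] - my)) ** 2
                       for src, dst in pairs)
    return math.sqrt(squared / n)


small = affine_rmse(45.0, -93.0, 0.0625)  # a 7.5-minute block, about one 1:24,000 quadrangle
large = affine_rmse(45.0, -93.0, 5.0)     # a 10-degree block
print(f"small area RMSE: {small:.2f} m")
print(f"large area RMSE: {large:.1f} m")
```

Because these are not the projections named in the text, the printed values are only indicative. The point of the sketch is the contrast: the affine residual grows rapidly as the area widens, which is why a transformation should be tested before it is substituted for a projection.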

Output: Hardcopy Maps, Digital Data, and Metadata


We create spatial data to use, share, and archive. Maps are often produced during data creation and distribution, as intermediate documents while editing, for analysis, or as finished products to communicate some aspect of our data. To be widely useful, we must also generate information, or “metadata,” about the spatial data we’ve created, and we may have to convert our data to standard forms. This section describes some characteristics of data output. We start with a brief treatment of cartography and map design, by which we produce hardcopy and digital maps. We then provide a description of metadata, and some observations on data conversion and data transfer standards.

Cartography and Map Design

Cartography is the art and techniques of making maps. It encompasses both mapmaking tools and how these tools may be combined to communicate spatial information. Cartography is a discipline of much depth and breadth, and there are many books, journal articles, conferences, and societies

devoted to the science and art of cartography. Our aim in the next few pages is to provide a brief overview of cartography with a particular focus on map design. This is both to acquaint new students with the most basic concepts of cartography, and help them apply these concepts in the consumption and production of spatial information. Readers interested in a more complete treatment should consult the references listed at the end of this chapter.

A primary purpose of cartography is to communicate spatial information. This requires identification of the

-intended audience,
-information to communicate,
-area of interest,
-physical and resource limitations,

in short, the whom, what, where, and how we may present our information.

These considerations drive the major cartographic design decisions we make each time we produce a map. We must consider the:

-scale, size, shape, and other general map properties,
-data to plot,
-symbol shapes, sizes, or patterns,
-labeling, including type font and size,
-legend properties, size, and borders, and
-the placement of all these elements on a map.

Map scale, size, and shape depend primarily on the intended map use. Wall maps for viewing at a distance of a meter or more may have few, large, boldly colored features. In contrast, commonly produced street maps for navigation in metropolitan areas are detailed, to be viewed at short ranges, and have a rich set of additional tables, lists, or other features.

Map scale is often determined in part by the size of the primary objects we wish to display, and in part by the most appropriate media sizes, such as the page or screen size possible for a document. As noted earlier, the map scale is the ratio of lengths on a map to true lengths. If we wish to display an area that spans 25 kilometers (25,000 meters) on a screen that spans 25 centimeters (0.25 meters), the map scale will be near 0.25 to 25,000, or 1:100,000. This decision on size, area, and scale then drives further map design. For example, scale limits the features we may display, and the size, number, and labeling of features. At a 1:100,000 scale we may not be able to show all cities, burgs, and towns, as there may be too many to fit at a readable size.

Maps typically have a primary theme or purpose that is determined by the intended audience. Is the map for a general population, or for a target audience with specific expectations for map features and design? General purpose maps typically have a wide range of features represented, including transportation networks, towns, elevation or other common features (Figure 4-32a). Special purpose maps, such as road maps, focus on a more limited set of features, in this instance road locations and names, town names, and large geographic features (Figure 4-32b).

Figure 4-32: Example of a) a detailed, general-purpose map, here a portion of a US Geological Survey map, and b) a specialized map focusing on a specific set of selected features, here showing roads. The features chosen for depiction on the map depend on the intended map use.

Once the features to include on a map are defined, we must choose the symbols used to draw them. Symbology depends in part on the type of feature. For example, we have a different set of options when representing continuous features such as elevation or pollution concentration than when representing discrete features. We also must choose among symbols for each of the types of discrete features, for example, the set of symbols for points is generally different from those for line or area features.

Symbol size is an important attribute of map symbology, often specified in a unit called a point. One point is approximately equal to 0.353 mm, or about 1/72 of an inch. A specific point number is most often used to specify the size of symbols, for example, the dimensions of small squares to represent houses on a map, or the characteristics of a specific pattern used to fill areas on a map. A line width may also be specified in points. Setting a line width of two points means we want that particular line plotted with a width of about 0.71 mm. It is unfortunate that “point” is both the name of the distance unit and a general property of a geographic feature, as in “a tree is a point feature.” This forces us to talk about the “point size” of symbols to represent points, lines, or area fills or patterns, but if we are careful, we may communicate these specifications clearly.

The best size, pattern, shape, and color used to symbolize each feature depend on the viewing distance, the number, density, and type of features, and the purpose of the map. Generally, we use larger, bolder, or thicker symbols for maps to be viewed from longer distances, while we reduce the symbol size when producing maps for viewing at 50 cm (about 20 inches). Most people with normal vision under good lighting may resolve lines down to near 0.2 points at close distances, provided the lines show good contrast with the background. Although size limits depend largely on background color and contrast, point features are typically not resolvable at sizes smaller than about one half a point, and distinguishing between shapes is difficult for point features smaller than approximately two points in their largest dimension.

The pattern and color of symbols must also be chosen, generally from a set provided by the software (Figure 4-33). Symbols generally distinguish among feature types by characteristics, and although most symbols are not associated with a feature type, some are, such as plane outlines for airports, numbered shields for highways, or a hatched line for a railroad.

We also must often choose whether and how to label features. Most GIS software provides a range of tools for creating and placing labels, and in all cases we must choose the label font type and size, location relative to the feature, and orientation. Primary considerations when labeling point

features are label placement relative to the point location, label size, and label orientation (Figure 4-34). We may also use graduated labels, that is, resize them according to some variable associated with the point feature. For example, it is common to have larger features and label fonts for larger cities (Figure 4-34). Labels may be bent, angled, or wrapped around features to improve clarity and more efficiently use space in a map.

Figure 4-33: Examples of point (top), line (mid), and area (bottom) symbols used to distinguish among features of different types. Most GIS software provides a set of standard symbols for point, line, area, and continuous surface features.

Figure 4-34: Common labeling options, including straight, angled, wrapped text, and graduated labels for points (top two sets), and angled, wrapped, fronting, and embedded labels for line and polygon features (bottom two sets).

Label placement is very much an art, and there is often much individual editing required when placing and sizing labels for finished maps. Most software provides for automatic label placement, usually specified relative to feature location. For example, one may specify labels above and to the right of all points, or line labels placed over line features, or polygon labels placed near the polygon centroid. However, these automatic placements may not be satisfactory because labels may overlap, labels may fall in cluttered areas of the map, or features associated with labels may be ambiguous. Some software provides options for simple to elaborate automatic label placement, including automatic removal or movement of overlapping labels. These often reduce manual editing, but sometimes increase it.

Figure 4-35 shows a portion of a map of southern Finland. This region presents several mapping problems, including the high density of cities near the upper right, an irregular coastline, and dense clustering of islands along the coast. Most labels are placed above and to the right of their corresponding city; however, some are moved or

angled for clarity. Cities near the coast show legend elements and symbols which may be
both, to avoid labels crossing the water/land used. Typically these tools allow a wide
boundary where practical. Semi-transparent range of symbolizations, and a compact way
background shading is added for Parainen of describing the symbolization in a legend
and Hanko, cities placed in the island matrix. (Figure 4-36).
This example demonstrates the individual The specific layout of legend features
editing often required when placing labels. must be defined, for example the point fea-
Most maps should have legends. The ture symbol size may be graduated based on
legend identifies map features succinctly and some attribute for the points. Successively
describes the symbols used to depict those larger features may be assigned for succes-
features. Legends often include or are sively larger cities. This must be noted in the
grouped with additional map information legend, and the symbols nested, shown
such as scale bars, north arrows, and descrip- sequentially, or otherwise depicted (Figure
tive text. The cartographer must choose the 4-36, top left).
size and shape of the descriptive symbol, The legend should be exhaustive. Exam-
and the font type, size, and orientation for ples of each different symbol type that
each symbol in the legend. The primary goal appears on the map should appear in the leg-
is to have a clear, concise, and complete leg- end. This means each point, line, or area
end. symbol is drawn in the legend with some
The kind of symbols appropriate for descriptive label. Labels may be next to,
map legends depends on the types of fea- wrapped around, or embedded within the
tures depicted. Different choices are avail- features, and sometimes descriptive numbers
able for point, line, and polygon features, or are added, for example, a range of continu-
for continuously variable features stored as ous variables (Figure 4-36, upper left). Scale
rasters. Most software provides a range of bars, north arrows, and descriptive text
boxes are typically included in the legend.
Map composition or layout is another
primary task. Composition consists of deter-
mining the map elements, their size, and
their placement. Typical map elements
shown in Figure 4-3 and Figure 4-4, include
one or more main data panes or areas, a leg-
end, a title, a scale bar and north arrow, a
grid or graticule, and perhaps descriptive
text. These each must be sized and placed on
the map.
These map elements should be posi-
tioned and sized in accordance with their
importance. The map’s most important data
pane should be largest, and it is often cen-
tered or otherwise given visual dominance.
Other elements are typically smaller and
located around the periphery or embedded
within the main data pane. These other ele-
ments include map insets, which are smaller
data panes that show larger or smaller scale
views of a region in the primary data pane.

Figure 4-35: Example label placement for cities in southern Finland.

Good map compositions usually group related elements and use empty space effectively. Data panes are often grouped and legend elements placed near each other, and grouping is often indicated with enclosing boxes.

Neophyte cartographers should avoid two tendencies in map composition, both depicted in Figure 4-37. First, it is generally easy to create a map with automatic label and legend generation and placement. The map shown at the top of Figure 4-37 is typical of this automatic composition, and includes poorly placed legend elements and labels that are too small and poorly placed. Labels crowd each other, are ambiguous, and cross water/land or other feature boundaries, and fonts are poorly chosen. You should note that automatic map symbol selection and placement is nearly always sub-optimal, and the novice cartographer should scrutinize these choices and manually improve them.

The second common error is poor use of empty space, those parts of the map without map elements. There are two opposite tendencies: either to leave too much or unbalanced empty space, or to clutter the map in an attempt to fill all empty space. Note that the map shown at the top of Figure 4-37 leaves large empty spaces on the left (western) edge, with the Atlantic Ocean devoid of features. The cartographer may address this in several ways: by changing the size, shape, or extent of the area mapped; by adding new features, such as data panes as insets, additional text boxes, or other elements; or by moving the legend or other map elements to that space. The map shown at the bottom of Figure 4-37, while not perfect, fixes these design flaws, in part by moving the legend and scale bar, and in part by adding labels for the Atlantic Ocean and Mediterranean Sea. The empty space is more balanced in that it appears around the major map elements in approximately equal proportions.

As noted earlier, this is only a brief introduction to cartography, a subject covered by many good books, some listed at the end of this chapter. Perhaps the best compendium of examples is the Map Book Series, by ESRI, published annually since 1984. Examples are available at the time of this writing at www.esri.com/mapmuseum. You should leaf through several volumes in

Figure 4-36: Examples of legend elements and representation of symbols. Some symbols may be
grouped in a compact way to communicate the values associated with each symbol, e.g., sequential or
nested graduated circles to represent city population size, area pattern or color fills to distinguish
among different polygon features, line and point symbols, and informative elements such as scale bars
and north arrows.

Figure 4-37: An example of poor map design (top). This top panel shows a number of mistakes common
for the neophyte cartographer, including small labels (cities) and mismatched fonts (graticule labels,
title), poor labeling (city labels overlapping, ambiguously placed, and crossing distinctly shaded areas),
unlabeled features (oceans and seas), poorly placed scale bar and legend, and unbalanced open space on
the left side of the map. These problems are not present in the improved map design, shown in the lower
panel.

this series, with an eye towards critical map design. Each volume contains many beautiful and informative maps, and provides techniques worth emulating.

Digital Data Output

We often must transfer the digital data we create to another organization or user. Given the number of different GIS software packages, operating systems, and computer types, transferring data is not always a straightforward process. Digital data output typically includes two components: the data themselves in some standard, defined format, and metadata, or data about the digital data. We will describe data formats and metadata in turn.

Digital data are the data in some electronic form. As described at the end of the first chapter, there are many file formats, or ways of encoding the spatial and attribute data in digital files. Digital data output often consists of recording or converting data into one of these file formats. These data are typically converted with a utility, tool, or option available in the data development software (Figure 4-38). The most useful of these utilities support a broad range of input and output options, each fully described in the program documentation.

Figure 4-38: An example of a conversion utility, here from the ESRI ArcGIS software. Data may be converted from one of several formats to an ESRI-specific digital data format.

All formats strive for complete data transfer without loss. They must transmit the spatial and attribute data, the metadata, and all other information necessary to effectively use the spatial data. There are many digital data output formats, although many are legacy formats that are used with decreasing frequency.

A common contemporary format is the Spatial Data Transfer Standard. This transfer format, also known by the abbreviation SDTS, is a specification first defined by the U.S. Government in 1992. This standard has three basic parts: 1) a logical specification, 2) a description of the types of spatial features supported, and 3) the International Standards Organization (ISO) encoding used. There are four additional parts which define profiles, or descriptions of how vector, raster, point, and computer aided design data are stored for transfer. Digital data in SDTS formats typically span multiple data files, each holding various data components. At the time of this writing the U.S. Geological Survey was in charge of maintaining the SDTS, with the full specification found at mcmcweb.er.usgs.gov/sdts/standard.html.

There are many legacy digital data transfer formats that were widely used before the publication of the SDTS. Among these are several US Geological Survey formats for the transfer of digital elevation models or digital vector data, or software-specific formats, such as an ASCII format known as the GEN/UNGEN format that was developed by ESRI. These were useful for a limited set of transfers, but shortcomings in each of these transfer formats led to the development of the SDTS. They will not be discussed further here.

Metadata: Data Documentation

Metadata are information about spatial data. Metadata describe the content, source, lineage, methods, developer, coordinate system, extent, structure, spatial accuracy, attributes, and responsible organization for spatial data.

Metadata are required for the effective use of spatial data. Metadata allow the efficient transfer of information about data, and inform new users about the geographic extent, coordinate system, quality, and other data characteristics. Metadata aid organizations in evaluating data to determine if they are suitable for an intended use: are they accurate enough, do they cover the area of interest, do they provide the necessary information? Metadata may also aid in data updates by guiding the choice of appropriate collection methods and formats for new data.

Most governments have or are in the process of establishing standard methods for reporting metadata. In the United States, the Federal Geographic Data Committee (FGDC) has defined a Content Standard for Digital Geospatial Metadata (CSDGM) to specify the content of metadata. The CSDGM ensures that spatial data are clearly described so that they may be used effectively within an organization. The use of the CSDGM also ensures that data may be described to other organizations in a standard manner, and that spatial data may be more easily evaluated by and transferred to other organizations.

The CSDGM consists of a standard set of elements that are presented in a specified order. The standard is exhaustive in the information it provides, and is flexible in that it may be extended to include new elements for new categories of information in the future. There are over 330 different elements in the CSDGM. Some of these elements contain information about the spatial data, and some elements describe or provide linkages to other elements. Elements have standardized long and short names and are provided in a standard order with a hierarchical numbering system. For example, the western-most bounding coordinate of a data set is element 1.5.1.1, defined as follows:

1.5.1.1 West Bounding Coordinate – western-most coordinate of the limit of coverage expressed in longitude.
Type: real
Domain: -180.0 <= West Bounding Coordinate < 180.0
Short Name: westbc

The numbering system is hierarchical. Here, 1 indicates it is basic identification information, 1.5 indicates identification information about the spatial domain, 1.5.1 is for bounding coordinates, and 1.5.1.1 is the western-most bounding coordinate.

There are 10 basic types of information in the CSDGM:
1) identification, describing the data set,
2) data quality,
3) spatial data organization,
4) spatial reference coordinate system,
5) entity and attribute,
6) distribution and options for obtaining the data set,
7) currency of metadata and responsible party,
8) citation,
9) time period information, used with other sections to provide temporal information, and
10) contact organization or person.

The CSDGM is a content standard and does not specify the format of the metadata. As long as the elements are included, properly numbered, and identified with correct values describing the data set, the metadata are considered to conform with the CSDGM. Indentation and spacing are not specified. However, because metadata may be quite complex, there are a number of conventions that are emerging in the presentation of metadata. These conventions seek to ensure that metadata are presented in a clear, logical way to humans, and are also easily ingested by computer software. There is a Standard Generalized Markup Language (SGML) for the exchange of metadata. An example of a portion of the metadata for a 1:100,000 scale digital line graph data set is shown in Figure 4-39.
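The hierarchical numbering and the stated domain of element 1.5.1.1 can be made concrete with a small sketch. The `Element` class and the `ancestors` and `validate` helpers are hypothetical names introduced here for illustration; only the element number, long name, short name, and domain come from the definition quoted above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Element:
    """One metadata element (hypothetical container, not part of the CSDGM)."""
    number: str                        # hierarchical number, e.g. "1.5.1.1"
    long_name: str
    short_name: str
    in_domain: Callable[[float], bool]

# Only the element quoted in the text; a real tool would load many more.
WEST_BC = Element(
    number="1.5.1.1",
    long_name="West Bounding Coordinate",
    short_name="westbc",
    in_domain=lambda lon: -180.0 <= lon < 180.0,
)

def ancestors(number: str) -> List[str]:
    """Expand '1.5.1.1' into ['1', '1.5', '1.5.1', '1.5.1.1']."""
    parts = number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

def validate(element: Element, value: float) -> bool:
    """Return True when value satisfies the element's stated domain."""
    return element.in_domain(value)
```

Calling `ancestors(WEST_BC.number)` walks the same path the text describes, from identification information (1) down to the western-most bounding coordinate (1.5.1.1), and `validate(WEST_BC, 180.0)` is rejected because the domain excludes 180.0.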

Figure 4-39: Example of a small portion of the FGDC recommended metadata for a 1:100,000 scale
derived digital data set.

Metadata are most often created using specialized software tools. Although metadata may be produced using a text editor, the numbering system, names, and other conventions are laborious to type. There are often complex linkages between metadata elements, and some elements are repeated or redundant. Software tools may ease the task of metadata entry by reducing redundant entries, ensuring correct linkages, and checking elements for contradictory information or errors. For example, the metadata entry tool may check to make sure the western-most boundary is west of the eastern-most boundary. Metadata are most easily and effectively produced when their development is integrated into the workflow of data production.

Although not all organizations in the United States adhere to the CSDGM metadata standard, most organizations record and organize a description and other important information about their data, and many organizations consider a data set incomplete if it lacks metadata. All U.S. government units are required to adhere to the CSDGM when documenting and distributing spatial data.

Many other national governments are developing metadata standards. One example is the spatial metadata standard developed by the Australia and New Zealand Land Information Council (ANZLIC), known as the ANZLIC Metadata Guidelines. ANZLIC is a group of government, business, and academic representatives working to develop spatial data standards. The ANZLIC metadata guidelines define the core elements of metadata, and describe how to write, store, and disseminate these core elements. Data entry tools, examples, and a spatial data directory have been developed to assist in the use of the ANZLIC spatial metadata guidelines.

There is a parallel effort to develop and maintain international standards for metadata. The standards are known as the ISO 19115 International Standards for Metadata. According to the International Standards Organization, the ISO 19115 “defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data”.

There is a need to reconcile international and national metadata standards, because they may differ. National standards may require information not contained in international standards, or vice versa. Governments typically create metadata profiles that are consistent with the international standard. These profiles establish the correspondence between elements in the different standards, and identify elements of the international profile that are not in the national profile.

Summary

Spatial data entry is a common activity for many GIS users. Although data may be derived from several sources, maps are a common source, and care must be taken to choose appropriate map types and to interpret the maps correctly when converting them to spatial data in a GIS.

Maps are used for spatial data entry due to several unique characteristics. These include our long history of hardcopy map production, so centuries of spatial information are stored there. In addition, maps are inexpensive, widely available, and easy to convert to digital forms, although the process is often time consuming, and may be costly. Maps are usually converted to digital data through a manual digitization process, whereby a human analyst traces and records the location of important features. Maps may also be digitized via a scanning device.

The quality of data derived from a map depends on the type and size of the map, how the map was produced, the map scale, and the methods used for digitizing. Large-scale maps generally provide more accurate positional data than comparable small-scale maps. Large-scale maps often have less map generalization, and small horizontal errors in plotting, printing, and digitizing are magnified less during conversion of large-scale maps.

Snapping, smoothing, vertex thinning, and other tools may be used to improve the quality and utility of digitized data. These methods are used to ensure positional data are captured efficiently and at the proper level of detail.

Map and other data often need to be converted to a target coordinate system via a map transformation. Transformations are different from map projections, which were discussed in Chapter 3, in that a transformation uses an empirical, least-squares process to convert coordinates from one Cartesian system to another. Transformations are often used when registering digitized data to a known coordinate system. Map transformations should not be used when a map projection is called for.

Cartography is an important aspect of GIS, because we often communicate spatial information through maps. Map design depends on the target audience and purpose, the setting and modes of map viewing, and available resources. Proper map design considers the scale, symbols, labels, legend, and placement to effectively communicate the desired information.

Metadata are the “data about data.” They describe the content, origin, form, coordinate system, spatial and attribute data characteristics, and other relevant information about spatial data. Metadata facilitate the proper use, maintenance, and transfer of spatial data. Metadata standards have been developed, both nationally and internationally, with profiles used to cross-reference elements between metadata standards. Metadata are a key component of spatial data, and many organizations do not consider data complete until metadata have been created.
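As an illustration of the kind of automated consistency check a metadata entry tool might run, such as confirming that the western-most boundary lies west of the eastern-most boundary, consider the minimal sketch below. The function name and record layout are hypothetical, and the short names `eastbc`, `southbc`, and `northbc` are assumed by analogy with `westbc`. The check is also a simplification: a data set that spans the 180° meridian could legitimately have a west value larger than its east value.

```python
def check_bounding(record):
    """Return a list of human-readable problems found in a bounding-box record.

    `record` is a hypothetical dict keyed by assumed short names.
    """
    problems = []
    west, east = record["westbc"], record["eastbc"]
    south, north = record["southbc"], record["northbc"]
    if not west < east:
        problems.append("westbc must be west of (less than) eastbc")
    if not south < north:
        problems.append("southbc must be south of (less than) northbc")
    return problems

# Invented values for illustration only.
good = {"westbc": -93.5, "eastbc": -92.9, "southbc": 44.8, "northbc": 45.2}
swapped = {"westbc": -92.9, "eastbc": -93.5, "southbc": 44.8, "northbc": 45.2}
```

Here `check_bounding(good)` returns an empty list, while `check_bounding(swapped)` reports the contradictory west and east values so they can be corrected before the metadata are distributed.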

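The empirical, least-squares nature of a map transformation, as recalled in the summary, can be illustrated with an affine fit to control points. This is a sketch under stated assumptions rather than any particular software's method: the control point coordinates are invented, the function names are hypothetical, and the normal equations are solved with a plain Gaussian elimination to keep the example free of external libraries.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system m * p = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    size = 3
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, size):
            f = a[r][col] / a[col][col]
            for c in range(col, size + 1):
                a[r][c] -= f * a[col][c]
    p = [0.0] * size
    for r in range(size - 1, -1, -1):
        s = sum(a[r][c] * p[c] for c in range(r + 1, size))
        p[r] = (a[r][size] - s) / a[r][r]
    return p

def fit_affine(src, dst):
    """Least-squares affine fit: E = a0 + a1*x + a2*y, N = b0 + b1*x + b2*y."""
    rows = [(1.0, x, y) for x, y in src]
    # Normal equations (A^T A) p = A^T t, built separately for E and N.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    ate = [sum(r[i] * e for r, (e, _) in zip(rows, dst)) for i in range(3)]
    atn = [sum(r[i] * n for r, (_, n) in zip(rows, dst)) for i in range(3)]
    return solve3(ata, ate), solve3(ata, atn)

def rmse(src, dst, a, b):
    """Root mean square of the control-point residual distances."""
    sq = 0.0
    for (x, y), (e, n) in zip(src, dst):
        de = a[0] + a[1] * x + a[2] * y - e
        dn = b[0] + b[1] * x + b[2] * y - n
        sq += de * de + dn * dn
    return math.sqrt(sq / len(src))

# Hypothetical control points: digitizer (x, y) and target (E, N) coordinates.
src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
dst = [(500.0, 1000.0), (520.0, 1001.0), (519.0, 1021.0), (499.0, 1020.0)]
a, b = fit_affine(src, dst)
error = rmse(src, dst, a, b)
```

The invented target coordinates were generated from an exact affine relationship, so the fitted residuals are essentially zero. With real digitized control points the RMSE would be larger, and it describes the fit at the control points rather than the error at independent test points.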
Suggested Reading

Aronoff, S. (1989). Geographic Information Systems, A Management Perspective.
WDL Publications: Ottawa.

Bolstad, P., Gessler, P., & Lillesand, T.M. (1990). Positional uncertainty in manually
digitized map data. International Journal of Geographical Information Systems,
4:399-412.

Burrough, P.A., & Frank, A.U. (1996). Geographical Objects with Indeterminate
Boundaries. Taylor & Francis: London.

Chrisman, N.R. (1984). The role of quality information in the long-term functioning
of a geographic information system. Cartographica, 21:79-87.

Chrisman, N.R. (1987). Efficient digitizing through the combination of appropriate
hardware and software for error detection and editing. International Journal of
Geographical Information Systems, 1:265-277.

DeMers, M. (2000). Fundamentals of Geographic Information Systems (2nd ed.).
Wiley: New York.

Douglas, D.H. & Peuker, T.K. (1973). Algorithms for the reduction of the number of
points required to represent a digitized line or its caricature. Canadian Cartogra-
pher, 10:112-122.

Gesch, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., & Tyler, C. (2002). The
National Elevation Dataset. Photogrammetric Engineering and Remote Sensing,
68:5-32.

Holroyd, F. & Bell, S.B.M. (1992). Raster GIS: Models of raster encoding. Computers
and Geosciences, 18:419-426.

Joao, E. M. (1998). Causes and Consequences of Map Generalization. Taylor & Fran-
cis: London.

Laurini, R. & Thompson, D. (1992). Fundamentals of Spatial Information Systems.
Academic Press: London.

Maguire, D. J., Goodchild, M. F., & Rhind, D. (Eds.). (1991). Geographical Informa-
tion Systems: Principles and Applications. Longman Scientific: Harlow.

McBratney, A.B., Santos, M.L.M., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117:3-52.

Muehrcke, P.C. & Muehrcke, J.P. (1992). Map Use: Reading, Analysis, and Interpre-
tation (3rd ed.). J.P. Publications: Madison.

Nagy, G. & Wagle, S.G. (1979). Approximation of polygonal maps by cellular maps.
Communications of the Association for Computing Machinery, 22:518-525.

Peuquet, D.J. (1984). A conceptual framework and comparison of spatial data models,
Cartographica, 21:66-113.

Peuquet, D.J. (1981). An examination of techniques for reformatting digital carto-
graphic data. Part II: the raster to vector process. Cartographica, 18:21-33.

Peuker, T. K. & Chrisman, N. (1975). Cartographic data structures. The American
Cartographer, 2:55-69.

Shaffer, C.A., Samet, H., & Nelson, R.C. (1990). QUILT: a geographic information
system based on quadtrees. International Journal of Geographical Information
Systems, 4:103-132.

Shea, K.S., & McMaster, R.B. (1989). Cartographic generalization in a digital envi-
ronment: when and how to generalize. Proceedings AutoCarto 9, pp.56-67.

Warner, W. & Carson, W. (1991). Errors associated with a standard digitizing tablet.
ITC Journal, 2:82-85.

Weibel, R. (1997). Generalization of spatial data: principles and selected algorithms.
In van Kreveld, M., Nievergelt, J., Roos, T., & Widmayer, P. (Eds.), Algorithmic
Foundations of Geographic Information Systems, Springer-Verlag: Berlin.

Wolf, P.R., & Ghilani, C. (2002). Elementary Surveying, an Introduction to Geomatics
(10th ed.). Prentice-Hall: New Jersey.

Zeiler, M. (1999). Modeling Our World: The ESRI Guide to Geodatabase Design.
ESRI Press: Redlands.

Study Questions

4.1 - Why have so many digital spatial data been derived from hardcopy maps?

4.2 - Which is a larger scale map,
a) 1:20,000 or b) 1:1,000,000?
c) 1 inch equals 1 mile, or d) 1:100,000?
e) 1 mm to 1 kilometer, or f) 1:1,500,000?

4.3 - Can you describe three different types of generalization?

4.4 - Identify the kind of generalization at the labeled locations a through d in the map
below, left, compared to the “truth” in the image, below right. Categorize the general-
izations as fused, simplified, displaced, omitted, or exaggerated.

4.5 - Identify the kind of generalization at the labeled locations a through d in the map
below, left, compared to the “truth” in the image, below right. Categorize the general-
izations as fused, simplified, displaced, omitted, or exaggerated, or if it doesn’t fit in
one of these categories, then categorize it as “other,” and describe the generalization.

4.6 - What are the most common map media? Why?

4.7 - Is media deformation more problematic with large scale maps or small scale
maps? Why?

4.8 - Which map typically shows more detail, a large-scale map or a small-scale
map? Can you give three reasons why?

4.9 - Complete the following table that shows scale measurements and calculations:

4.10 - What is snapping in the context of digitizing? What are undershoots and over-
shoots, and why are they undesirable?

4.11 - Identify a characteristic feature or error in digitizing at each of the labeled letter
locations in the drawing below, e.g., node, overshoot, missing label, etc.:

4.12 - Identify a characteristic feature or error in digitizing at each of the labeled letter
locations in the drawing below, e.g., node, overshoot, missing label, etc.:

4.13 - Sketch the results of combined node (open circle), vertex (closed circle) and
edge (lines) snapping with a snap tolerance of a) a distance of 5 units, and b) a dis-
tance of 10 units, as shown by the snap circles. Note that the radius, not the diameter,
of these circles defines the snapping distance.

4.14 - What is a spline, and how are splines used during digitizing?

4.15 - a) Why is line thinning sometimes necessary?
b) Does increasing the width of the line thinning band tend to increase, decrease,
or not affect the number of points removed?
c) Does increasing the number of points initially spanned tend to increase,
decrease, or not affect the number of points removed?

4.16 - Can you contrast manual digitizing to the various forms of scan digitizing?
What are the advantages and disadvantages of each?

4.17 - What is the “common feature problem” when digitizing, and how might it be
overcome?

4.18 - Can you describe the general goal and process of map registration?

4.19 - What are control points, and where do they come from?

4.20 - Can you define an affine transformation, including the form of the equation?
Why is it called a linear transformation?

4.21 - What is the root mean square error (RMSE), and how does it relate to a coordi-
nate transformation?

4.22 - Is the average positional error likely to be larger, smaller, or about equal to the
RMSE? Why?

4.23 - Why are higher order (polynomial) transformations to be avoided under most cir-
cumstances?

4.24 - Which of the following transformations will likely have the smallest average
error at a set of independent test points?
a) affine, RMSE = 10.23 b) affine, RMSE = 9.8
c) 2nd order polynomial, RMSE = 4.7 d) 3rd order polynomial, RMSE = 0.45

4.25 - Which of the following transformations will likely have the smallest average
error at a set of independent test points?
a) 1st order polynomial, RMSE = 5.3 b) affine, RMSE = 9.8
c) 2nd order polynomial, RMSE = 2.9 d) 1st order polynomial, RMSE = 9.9

4.26 - Define and describe metadata. Why are metadata important?
