Elements of Geoinformatics

What is GIS?
“A system for capturing, storing, checking, integrating, manipulating, analysing and displaying
data which are spatially referenced to the Earth. This is normally considered to involve a
spatially referenced computer database and appropriate applications software.”

A GIS is a system of hardware, software and procedures to facilitate the management,
manipulation, analysis, modelling, representation and display of georeferenced data to solve
complex problems regarding planning and management of resources (NCGIA, 1990).


Why is GIS Unique?
1. GIS handles SPATIAL information: information referenced by its location in space.
2. GIS makes connections between activities based on spatial proximity.

Historical Background
This technology has developed from:
a. Digital cartography and CAD
b. Data Base Management Systems

[Figure: CAD system and database management system]

The Need for GIS

1. The real world has a lot of spatial data
a. Manipulation, analysis and modeling can be carried out effectively and efficiently
with a GIS
b. The neighborhood of a house one intends to purchase
c. The route for fire-fighting vehicles to the fire area
d. Location of historical sites to visit
e. The Earth's surface for military purposes

2. The Earth's surface is a limited resource
a. Rational decisions on space utilization
b. Fast, high-quality information for decision making

3. Complexity of management
Many data sets must be combined and processed, and as many potential situations as
possible must be assessed.
4. Intense competition
Technology is needed to support decision making and strategy in a highly competitive
world.

1. Data visualisation
A. Tabular

S/N   State    Population
1     Kano     11,290,000
2     Lagos    14,380,000
3     Bauchi    6,000,000

B. Map

2. Location
a) Where is object X?
Answer: X is at 1, A South.
b) What can be found at a certain location?
Example: What can be found at 1, C East?
Answer: Y.


3. Question: Pattern and Relationship

a) Is object X in a pattern?

Yes, it forms a line running from northwest to southeast.

b) Is there a relationship between X and Y?

Yes, Y is always near X.

c) What other spatial patterns exist?

Object Z is always near the borders, and its size increases from left to right.

4. Attribute Question

a) Attribute explanation
Example: What is the attribute of item 2?

b) Where might a certain scenario happen?

Example: Who has the highest quality of minerals?

5. Question: Relational Database

Selection of an area (according to rules)

Example: Which item has:
a) area > 40,000 hectares;
b) owner: not Silima;
c) tax code: B; and
d) mineral quality: High?
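The rule-based selection above is the kind of query a relational DBMS answers by combining conditions with AND. A minimal sketch in Python, assuming a small hypothetical attribute table (item numbers, owners and values are invented for illustration):

```python
# Hypothetical parcel attribute table (values invented for illustration)
parcels = [
    {"item": 1, "area_ha": 52000, "owner": "Silima", "tax_code": "B", "mineral_quality": "High"},
    {"item": 2, "area_ha": 45000, "owner": "Musa", "tax_code": "B", "mineral_quality": "High"},
    {"item": 3, "area_ha": 61000, "owner": "Okafor", "tax_code": "A", "mineral_quality": "High"},
    {"item": 4, "area_ha": 30000, "owner": "Bello", "tax_code": "B", "mineral_quality": "Low"},
]

def select(rows):
    """Return the items that satisfy all four rules at once (logical AND)."""
    return [r["item"] for r in rows
            if r["area_ha"] > 40000
            and r["owner"] != "Silima"
            and r["tax_code"] == "B"
            and r["mineral_quality"] == "High"]

print(select(parcels))  # [2]
```

In SQL the same selection would read roughly `SELECT item FROM parcels WHERE area_ha > 40000 AND owner <> 'Silima' AND tax_code = 'B' AND mineral_quality = 'High';` (table and column names again hypothetical).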

6. Trend Question

1. What changed in A, B and C from 1980 to 1990?
a) A: increase in size
b) B: decrease in size and change in location
c) C: change in shape

2. What has changed since 1980?

a) A and B have changed in size
b) B has changed location
c) C has changed shape
d) Item D has been added

Benefits of GIS
1. Geospatial data are better maintained in a standard format.
2. Revision and updating are easier.
3. Geospatial data and information are easier to search, analyze and represent.
4. More value-added products can be produced.
5. Geospatial data can be shared and exchanged freely.
6. Staff productivity and efficiency can be improved.
7. Time and money are saved.
8. Better decisions can be made.

GIS Application
Because GIS are designed as generic systems for handling any kind of spatial data, they are
used in a wide range of applications in urban and rural environments.


Facilities Management
Locating underground pipes and cables, planning facility maintenance, telecommunication
network services
Environmental and Natural Resources Management
• Agricultural Planning/Conservation - agricultural capability analysis, market analysis,
whole farm planning
• Forestry - timber assessment and management, harvest scheduling and planning,
environmental impact assessment, pest management
• Wildlife - habitat assessment and management, identification of rare/endangered species
and habitats, impact assessment
Street Network
Locating houses and streets, car navigation, transportation planning
Planning and Engineering
• Utilities management - water, gas, electricity, sewer
• Flow demand/analysis - hydrologic performance of utility networks
• Storm water management - modelling surface/subsurface flows, effects of
retention/detention basins on peak flows
Land Information
Taxation, zoning of land use, land acquisition, etc.
Hazard modeling
Landslides, floods, pollution, volcanoes, earthquakes, land subsidence, forest fires, etc.

Some Ways GIS is Employed
• Emergency Services – Fire & Police
• Environmental – Monitoring & Modelling
• Business – Site Location, Delivery Systems
• Industry – Transportation, Communication, Mining, Pipelines, Healthcare
• Government – Local, State, Federal, Military
• Education – Research, Teaching Tool, Administration
• Wherever spatial data analysis is needed

GIS subsystems
1. Data input subsystem;
2. Data storage and retrieval subsystem;
3. Data manipulation and analysis subsystem; and
4. Data output and display subsystem.

1. Data acquisition
• Definition, entry and preliminary processing of data
• Digitizer, scanner, survey equipment, etc.
2. Data management
• Controls the existence, updating, retrieval and deletion of data
• Database management system
3. Analysis of data
• Generation of information from data archives
• Spatial analysis software
• The value of a GIS depends on the analytic functions it provides
4. Information output
• Generated information is conveyed to the users
• Information managed by computers can be plotted, printed or exported into other

The basic elements of a GIS

1. A GIS is a 5-part system:
a) People
b) Data
c) Hardware
d) Software
e) Methods

2. Six functions of a GIS:
a) Capture data
b) Store data
c) Query data
d) Analyse data
e) Display data
f) Produce output

a) People
• GIS technology is of limited value without the people who manage the system and
develop plans for applying it to real-world problems.

• GIS users range from technical specialists who design and maintain the system to those
who use it to help them perform their everyday work.

• The identification of GIS specialists versus end users is often critical to the proper
implementation of GIS technology.

b) Data
• Perhaps the most important component of a GIS is the data.

• Geographic data and related tabular data can be collected in-house, compiled to
custom specifications and requirements, or occasionally purchased from a
commercial data provider.

• A GIS can integrate spatial data with other existing data resources, often stored in a
corporate DBMS. The integration of spatial data (often proprietary to the GIS
software) and tabular data stored in a DBMS is a key functionality afforded by GIS.

c) Hardware
Hardware is the computer system on which a GIS operates. Today, GIS software runs on
a wide range of hardware types, from centralized computer servers to desktop computers
used in stand-alone or networked configurations.

d) Software
GIS software provides the functions and tools needed to store, analyse, and display
geographic information.

Desktop GIS:
• Free (GRASS GIS, Quantum GIS, ILWIS, JUMP GIS and MapWindow GIS)
• Commercial (ERDAS IMAGINE, ESRI (products include ArcGIS, ArcSDE
and ArcIMS), Intergraph and SuperMap Inc.), etc.

Web map servers:

• Free (GeoServer, MapGuide Open Source and OpenMap), etc.
• Commercial (ESRI (ArcWeb Services and ArcGIS Server)), etc.

Spatial database management systems:

• Free (PostGIS, SpatiaLite and TerraLib), etc.
• Commercial (Microsoft SQL Server 2008 and Oracle Spatial), etc.

e) Methods
• A successful GIS operates according to a well-designed implementation plan and
business rules, which are the models and operating practices unique to each
organization.

• As in all organizations dealing with sophisticated technology, new tools can only be
used effectively if they are properly integrated into the entire business strategy and
operation. To do this properly requires not only the necessary investments in
hardware and software, but also in the retraining and/or hiring of personnel to utilize
the new technology in the proper organizational context.

• Implementing your GIS without regard for a proper organizational commitment will
result in an unsuccessful system!

Database management

A. Traditional (non-spatial) database management systems provide:
• Persistence across failures
• Concurrent access to data
• Scalability of search queries to very large datasets which do not fit inside the main
memory of computers
• Efficiency for non-spatial queries, but not for spatial queries

B. Non-spatial queries:
• List the names of all bookstores with more than ten thousand titles.
• List the names of the top ten customers, in terms of sales, in the year 2001.

C. Spatial queries:
• List the names of all bookstores within ten miles of a city.
• List all customers who live in a given state and its adjoining states.
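The first spatial query needs a distance calculation that a traditional DBMS does not provide. A rough sketch, assuming point locations in latitude/longitude, a spherical Earth with a mean radius of about 3958.8 miles, and the haversine great-circle formula; the store names and coordinates are made up for illustration:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (spherical approximation)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_distance(features, centre, max_miles):
    """Names of features whose point location lies within max_miles of centre."""
    lat0, lon0 = centre
    return [f["name"] for f in features
            if haversine_miles(lat0, lon0, f["lat"], f["lon"]) <= max_miles]

# Hypothetical bookstores and city centre (illustrative coordinates only)
bookstores = [
    {"name": "Store A", "lat": 12.00, "lon": 8.52},
    {"name": "Store B", "lat": 12.05, "lon": 8.60},
    {"name": "Store C", "lat": 12.50, "lon": 8.52},
]
city_centre = (12.00, 8.52)
print(within_distance(bookstores, city_centre, 10))  # ['Store A', 'Store B']
```

This loop measures the distance to every store, which is only acceptable for small tables; an SDBMS would normally use a spatial index to avoid checking every record.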

Spatial Data Examples

A. Examples of non-spatial data
Names, phone numbers and email addresses of people

B. Examples of spatial data

• Census data
• NASA satellite imagery: terabytes of data per day
• Weather and climate data
• Rivers, farms, ecological impact
• Medical imaging

Spatial database management system (SDBMS)

An SDBMS is a software module that:
• can work with an underlying DBMS;
• supports spatial data models, spatial abstract data types (ADTs) and a query language from
which these ADTs are callable; and
• supports spatial indexing, efficient algorithms for processing spatial operations, and
domain-specific rules for query optimization.

A. Consider a spatial dataset with:

• County boundary (dashed white line)
• Census blocks: name, area, population, boundary (dark line)
• Water bodies (dark polygons)
• Satellite imagery (gray-scale pixels)

B. Storage in an SDBMS table:

• Name: string
• Area: float
• Population: number
• Boundary: polygon
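The table layout above mixes ordinary columns with a polygon column. A minimal sketch of one such row, assuming a Python dataclass stands in for the SDBMS table and that the boundary is stored as an ordered list of planar (x, y) vertices; the area is derived from the boundary with the shoelace formula. The block name and values are hypothetical. (In PostGIS, for example, the boundary would instead be a `geometry` column and the area could be obtained with `ST_Area`.)

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class CensusBlock:
    """One row of a hypothetical census-block table: three plain columns plus a polygon column."""
    name: str
    area: float
    population: int
    boundary: List[Point]  # polygon vertices in order; the first vertex is not repeated at the end

def shoelace_area(polygon: List[Point]) -> float:
    """Planar polygon area by the shoelace formula (in squared coordinate units)."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
block = CensusBlock(name="Block 1", area=shoelace_area(square), population=1200, boundary=square)
print(block.area)  # 4.0
```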


A GIS is software for visualizing and analysing spatial data using spatial analysis functions such as:
• Search: thematic search, search by region, (re-)classification
• Location analysis: buffer, corridor, overlay
• Terrain analysis: slope/aspect, catchment, drainage network
• Flow analysis: connectivity, shortest path
• Distribution: change detection, proximity, nearest neighbor
• Spatial analysis/statistics: pattern, centrality, autocorrelation, indices of similarity,
topology (hole description)
• Measurements: distance, perimeter, shape, adjacency, direction
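Shortest path, listed under flow analysis, works on a network rather than a plain table. A minimal sketch using Dijkstra's algorithm with Python's `heapq` priority queue, assuming a hypothetical road network stored as a dictionary of junctions with edge lengths in kilometres (the kind of routing question raised for fire-fighting vehicles in the section on the need for GIS):

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbour: cost}} with non-negative costs.
    Returns (total_cost, path) or (inf, []) if target cannot be reached."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break  # the target's distance is final once it is popped
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:  # walk the predecessor links back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical road network: junctions A to E, edge lengths in km
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8, "E": 3},
    "E": {"D": 3},
}
print(shortest_path(roads, "A", "E"))  # (11.0, ['A', 'C', 'B', 'D', 'E'])
```

The detour through C beats the direct A to B road because 2 + 1 km is shorter than 4 km; this is the kind of network reasoning that a vector data model supports well.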

A GIS uses an SDBMS to store, search, query and share large spatial data sets.

Spatial Data Structure

Spatial Data Types

1. Continuous:
Elevation, rainfall, ocean salinity

2. Areas:
• Unbounded: land use, market areas, soils, rock type
• Bounded: city/county/state boundaries, ownership parcels, zoning
• Moving: air masses, animal herds, schools of fish

3. Networks:
Roads, transmission lines, streams

4. Points:
• Fixed: wells, street lamps, addresses
• Moving: cars, fish, deer

GIS Data Types

The basic data type in a GIS reflects traditional data found on a map. Accordingly, GIS
technology utilizes two basic types of data. These are:

1. Spatial data: describes the absolute and relative location of geographic features.

2. Attribute data: describes characteristics of the spatial features. These characteristics can
be quantitative and/or qualitative in nature. Attribute data is often referred to as tabular data.

The coordinate location of a forestry stand would be spatial data, while the characteristics of that
forestry stand, e.g. cover group, dominant species, crown closure, height, etc., would be attribute
data.

Other data types, in particular image and multimedia data, are becoming more prevalent with
changing technology. Depending on the specific content of the data, image and multimedia data
may be considered either spatial, e.g. photographs, animation, movies, etc., or attribute, e.g.
sound, descriptions, narrations, etc.

Attribute data types

1. Categorical (name):
• Nominal: no inherent ordering, e.g. land use types, county names
• Ordinal: inherent order, e.g. road class, stream class
• Often coded to numbers, e.g. SSN, but arithmetic cannot be done on them

2. Numerical (known difference between values):
• Interval: no natural zero, so one can't say "twice as much", e.g. temperature (Celsius or
Fahrenheit)
• Ratio: natural zero, so ratios make sense (e.g. twice as much), e.g. income, age, rainfall
• May be expressed as integer [whole number] or floating point [decimal fraction]

Attribute data tables can contain locational information, such as addresses or a list of X, Y
coordinates. ArcView refers to these as event tables. However, these must be converted to true
spatial data (shapefile), for example by geocoding, before they can be displayed as a map.
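Converting an event table into point features can be sketched as follows. The sketch assumes the table already carries X/Y coordinates, so no address geocoding is needed. It writes GeoJSON rather than a shapefile; GeoJSON Point coordinates are in [x, y] order, i.e. [longitude, latitude]. The well records are hypothetical:

```python
import json

def event_table_to_geojson(rows, x_field="x", y_field="y"):
    """Turn attribute rows that carry X/Y columns into a GeoJSON FeatureCollection of Points.
    All remaining columns become each feature's properties."""
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [row[x_field], row[y_field]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical event table of wells (x = longitude, y = latitude)
wells = [
    {"id": 1, "depth_m": 45.0, "x": 8.52, "y": 12.00},
    {"id": 2, "depth_m": 60.5, "x": 8.60, "y": 12.05},
]
fc = event_table_to_geojson(wells)
print(json.dumps(fc["features"][0]["geometry"]))
```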

Spatial Data Models

Traditionally spatial data has been stored and presented in the form of a map.
Two basic types of spatial data models have evolved for storing geographic data digitally. These
are referred to as:
1. Vector:
Point, line and polygon (area)

2. Raster:
Grid cells identified by row and column

The selection of a particular data model, vector or raster, depends on the source and type of
data, as well as the intended use of the data. Certain analytical procedures require raster data
while others are better suited to vector data.

[Figure: Concept of vector and raster]

Vector Data Formats

• All spatial data models are approaches for storing the spatial location of geographic
features in a database. Vector storage implies the use of vectors (directional lines) to
represent a geographic feature.

• Vector data is characterized by the use of sequential points or vertices to define a linear
segment.

• Each vertex consists of an X coordinate and a Y coordinate.

• Vector lines are often referred to as arcs and consist of a string of vertices terminated by a
node.

• A node is defined as a vertex that starts or ends an arc segment. Point features are defined
by one coordinate pair, a vertex.

• Polygonal features are defined by a set of closed coordinate pairs. In vector
representation, the storage of the vertices for each feature is important, as well as the
connectivity between features, e.g. the sharing of common vertices where features
connect.

• The most popular method of retaining spatial relationships among features is to explicitly
record adjacency information in what is known as the topologic data model.

• Topology is a mathematical concept that has its basis in the principles of feature
adjacency and connectivity.
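A minimal sketch of the topologic (arc-node) idea. It assumes each arc stores its start and end nodes plus the polygons on its left and right, and uses "O" as a hypothetical label for the area outside all polygons. Adjacency and connectivity are then read directly from the arc records, without any coordinate geometry:

```python
from collections import defaultdict

# Hypothetical arc-node topology for two polygons, A and B, sharing arc 3.
arcs = {
    1: {"from": "n1", "to": "n2", "left": "A", "right": "O"},
    2: {"from": "n2", "to": "n3", "left": "B", "right": "O"},
    3: {"from": "n2", "to": "n4", "left": "A", "right": "B"},
    4: {"from": "n4", "to": "n1", "left": "A", "right": "O"},
    5: {"from": "n3", "to": "n4", "left": "B", "right": "O"},
}

def polygon_adjacency(arcs, outside="O"):
    """Two polygons are adjacent if some arc has one on its left and the other on its right."""
    adj = defaultdict(set)
    for arc in arcs.values():
        left, right = arc["left"], arc["right"]
        if outside in (left, right) or left == right:
            continue  # border with the outside area, or a dangling arc inside one polygon
        adj[left].add(right)
        adj[right].add(left)
    return dict(adj)

def node_connectivity(arcs):
    """List the arcs that meet at each node."""
    nodes = defaultdict(list)
    for arc_id, arc in arcs.items():
        nodes[arc["from"]].append(arc_id)
        nodes[arc["to"]].append(arc_id)
    return dict(nodes)

print(polygon_adjacency(arcs))        # {'A': {'B'}, 'B': {'A'}}
print(node_connectivity(arcs)["n2"])  # [1, 2, 3]
```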

Raster Data Formats

• Raster data models incorporate the use of a grid-cell data structure where the geographic
area is divided into cells identified by row and column. This data structure is commonly
called raster.

• While the term raster implies a regularly spaced grid, other tessellated data structures do
exist in grid-based GIS systems. In particular, the quad-tree data structure has found some
acceptance as an alternative raster data model.

• The size of cells in a tessellated data structure is selected on the basis of the data accuracy
and the resolution needed by the user.

• There is no explicit coding of geographic coordinates required since that is implicit in the
layout of the cells.

• A raster data structure is in fact a matrix where any coordinate can be quickly calculated
if the origin point and the size of the grid cells are known.

• Since grid cells can be handled as two-dimensional arrays in computer encoding, many
analytical operations are easy to program. This makes tessellated data structures a
popular choice for many GIS software packages.

• Topology is not a relevant concept with tessellated structures since adjacency and
connectivity are implicit in the location of a particular cell in the data matrix.

• Several tessellated data structures exist; however, only two are commonly used in GISs.
The most popular cell structure is the regularly spaced matrix or raster structure.

• This data structure involves a division of spatial data into regularly spaced cells. Each
cell is of the same shape and size. Squares are most commonly utilized.

• Since geographic data is rarely distinguished by regularly spaced shapes, cells must be
classified as to the most common attribute for the cell.

• The problem of determining the proper resolution for a particular data layer can be a
concern.

• If one selects too coarse a cell size then data may be overly generalized.

• If one selects too fine a cell size then too many cells may be created, resulting in a large
data volume, slower processing times, and a more cumbersome data set. As well, one can
imply accuracy greater than that of the original data capture process and this may result
in some erroneous results during analysis.

• As well, since most data is captured in a vector format, e.g. digitizing, data must be
converted to the raster data structure. This is called vector-raster conversion.

• Most GIS software allows the user to define the raster grid (cell) size for vector-raster
conversion. It is imperative that the original scale, e.g. accuracy, of the data be known
prior to conversion.

• The accuracy of the data, often referred to as the resolution, should determine the cell
size of the output raster map during conversion.

• Most raster-based GIS software requires that the raster cell contain only a single discrete
value.

• Accordingly, a data layer, e.g. forest inventory stands, may be broken down into a series
of raster maps, each representing an attribute type, e.g. a species map, a height map, a
density map, etc. These are often referred to as one attribute maps.

• This is in contrast to most conventional vector data models that maintain data as multiple
attribute maps, e.g. forest inventory polygons linked to a database table containing all
attributes as columns. This basic distinction of raster data storage provides the foundation
for quantitative analysis techniques. This is often referred to as raster or map algebra.

• The use of raster data structures allows for sophisticated mathematical modelling
processes, while vector-based systems are often constrained by the capabilities and
language of a relational DBMS.

• These differences are the major distinguishing factor between vector and raster based
GIS software. It is also important to understand that the selection of a particular data
structure can provide advantages and disadvantages during the analysis stage.

• For example, the vector data model does not handle continuous data, e.g. elevation, very
well, while the raster data model is more ideally suited for this type of analysis.

• Conversely, the raster structure does not handle linear data analysis, e.g. shortest path,
very well, while vector systems do.

• It is important for the user to understand that there are certain advantages and
disadvantages to each data model.

• A matrix consists of regular grid cells

• Positions are defined by column and row numbers

• Each cell has a single value
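Because a raster is a matrix, converting between a cell's row/column and map coordinates needs only the origin and the cell size. Map algebra then reduces to cell-by-cell operations on one-attribute maps. A minimal sketch, with these assumptions:

- The origin is the upper-left corner and rows are counted downward. Many formats use this convention; a bottom-left origin, as in the example given for raster advantages, works the same way with the row offset added instead of subtracted.
- Cells are 30-unit squares.
- The two small layers and their land-use codes are hypothetical.

```python
# Hypothetical georeferencing: upper-left origin and square cells (map units, e.g. metres)
X_ORIGIN, Y_ORIGIN = 500000.0, 1340000.0
CELL_SIZE = 30.0

def cell_to_xy(row, col):
    """Map coordinates of the centre of cell (row, col); rows are counted downward."""
    x = X_ORIGIN + (col + 0.5) * CELL_SIZE
    y = Y_ORIGIN - (row + 0.5) * CELL_SIZE
    return x, y

def xy_to_cell(x, y):
    """Row and column of the cell that contains map point (x, y)."""
    col = int((x - X_ORIGIN) // CELL_SIZE)
    row = int((Y_ORIGIN - y) // CELL_SIZE)
    return row, col

# Two hypothetical one-attribute maps on the same 3 x 3 grid
elevation = [[310, 325, 340],
             [300, 315, 330],
             [290, 305, 320]]   # metres
landuse = [[1, 1, 2],
           [1, 2, 2],
           [3, 3, 2]]           # assumed codes: 1 = forest, 2 = farmland, 3 = urban

# Local map-algebra operation: 1 where the cell is farmland AND higher than 310 m, else 0
suitable = [[int(lu == 2 and el > 310) for lu, el in zip(lu_row, el_row)]
            for lu_row, el_row in zip(landuse, elevation)]

print(suitable)                          # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
print(cell_to_xy(1, 2))                  # (500075.0, 1339955.0)
print(xy_to_cell(500075.0, 1339955.0))   # (1, 2)
```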

Example Raster Formats

a) Digital Elevation Models (DEM)
b) Satellite Images
c) Digital Orthophotos
d) Scanned Maps
e) Graphics Files (scanned graphics)

Common Raster data file formats

a) BMP (bitmaps) - no compression
b) DIB (Device Independent Bitmaps)
c) GIF (CompuServe's Graphics Interchange Format)
d) RLE (Run Length Encoding)
e) CCITT Group 3 and 4 (International Consultative Committee for Telephone and Telegraph)
- RLE, Huffman coding
f) JPEG (Joint Photographic Experts Group)
g) TIFF (Tagged Image File Format)
h) DEM (Digital Elevation Model)
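Run-length encoding, listed above, exploits the fact that neighbouring raster cells often share a value: each run of identical values is stored once with its length. A minimal sketch of the idea on one hypothetical raster row, using `itertools.groupby`. Real file formats pack runs into bytes in their own ways, which this sketch does not reproduce:

```python
from itertools import groupby

def rle_encode(row):
    """Run-length encode one raster row as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

row = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1]
encoded = rle_encode(row)
print(encoded)  # [(1, 4), (2, 2), (3, 5), (1, 1)]
assert rle_decode(encoded) == row  # lossless round trip
```

Twelve cell values shrink to four pairs here. A row in which every cell differs would instead double in size, so RLE suits categorical layers with large uniform areas better than noisy imagery.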

Vector vs. Raster

Vector data – Advantages
• Data can be represented at its original resolution and form without generalization.
• Graphic output is usually more aesthetically pleasing (traditional cartographic
representation).
• Since most data, e.g. hard copy maps, is in vector form, no data conversion is required.
• Accurate geographic location of data is maintained.
• Allows for efficient encoding of topology, and as a result more efficient operations that
require topological information, e.g. proximity, network analysis.

Vector data – Disadvantages
• The location of each vertex needs to be stored explicitly.
• For effective analysis, vector data must be converted into a topological structure. This is
often processing intensive and usually requires extensive data cleaning.
• Algorithms for manipulative and analysis functions are complex and may be processing
intensive. Often, this inherently limits the functionality for large data sets, e.g. a large
number of features.
• Continuous data, such as elevation data, is not effectively represented in vector form.
Usually substantial data generalization or interpolation is required for these data layers.
• Spatial analysis and filtering within polygons is impossible.

Raster data – Advantages
• The geographic location of each cell is implied by its position in the cell matrix.
Accordingly, other than an origin point, e.g. bottom left corner, no geographic
coordinates are stored.
• Due to the nature of the data storage technique, data analysis is usually easy to program
and quick to perform.
• The inherent nature of raster maps, e.g. one attribute maps, is ideally suited for
mathematical modeling and quantitative analysis.
• Discrete data, e.g. forestry stands, is accommodated as well as continuous data, e.g.
elevation data, which facilitates the integration of the two data types.
• Grid-cell systems are very compatible with raster-based output devices, e.g.
electrostatic plotters, graphic terminals.

Raster data – Disadvantages
• The cell size determines the resolution at which the data is represented.
• It is especially difficult to adequately represent linear features depending on the cell
resolution. Accordingly, network linkages are difficult to establish.
• Processing of associated attribute data may be cumbersome if large amounts of data
exist. Raster maps naturally reflect only one attribute or characteristic for an area.
• Since most input data is in vector form, data must undergo vector-to-raster conversion.
Besides increased processing requirements, this may introduce data integrity concerns due
to generalization and choice of inappropriate cell size.
• Most output maps from grid-cell systems do not conform to high-quality cartographic
needs.

It is often difficult to compare or rate GIS software that uses different data models. Some
personal computer (PC) packages utilize vector structures for data input, editing, and display but
convert to raster structures for any analysis. Other more comprehensive GIS offerings provide
both integrated raster and vector analysis techniques. They allow users to select the data structure
appropriate for the analysis requirements. Integrated raster and vector processing capabilities are
most desirable and provide the greatest flexibility for data manipulation and analysis.
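Vector-to-raster conversion can be sketched with the simplest cell-assignment rule: a cell takes the polygon's value if the cell centre falls inside the polygon. The inside test here is the even-odd ray-casting rule. This is a swapped-in simplification. The raster section describes classifying each cell by its most common attribute, which needs cell/polygon overlap areas that this sketch does not compute. The polygon and grid are hypothetical:

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside

def rasterize(polygon, x_min, y_max, cell_size, n_rows, n_cols, value=1, background=0):
    """Cell-centre rule: a cell gets `value` if its centre lies inside the polygon.
    Row 0 is the top row (at y_max); column 0 is the left column (at x_min)."""
    grid = []
    for r in range(n_rows):
        yc = y_max - (r + 0.5) * cell_size
        row = []
        for c in range(n_cols):
            xc = x_min + (c + 0.5) * cell_size
            row.append(value if point_in_polygon(xc, yc, polygon) else background)
        grid.append(row)
    return grid

# Hypothetical triangular stand on a 4 x 4 grid of 1-unit cells covering (0, 0)-(4, 4)
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
for row in rasterize(triangle, x_min=0.0, y_max=4.0, cell_size=1.0, n_rows=4, n_cols=4):
    print(row)
```

The sloping edge of the triangle becomes a staircase of whole cells. Rerunning with a different `cell_size` and grid dimensions shows the resolution trade-off described in the raster section: coarser cells generalize the edge more.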

• Topology refers to knowledge about the relative spatial positioning of features, i.e.
knowledge about how features are connected and which features are adjacent to each
other. Topology distinguishes GIS data models from the non-topological data models
supported by many CAD, mapping and graphics systems.

• Topological relationships between geometric entities traditionally include connectivity,
orientation, adjacency (left, right) and containment (what encloses what).

• Topology allows automated error detection and elimination.

• The tolerances controlling snapping, elimination and merging must be considered
carefully, because they can move features.

• Complete topology makes map overlay feasible.

• Topology allows many GIS operations to be done without accessing the point files.

• Knowledge of the geometrical relationships and connectivity between objects

• How could objects be related to each other?
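The warning about tolerances shows up in a small snapping sketch. The sketch assumes endpoints are snapped greedily to the first existing node within a Euclidean tolerance; real GIS snapping rules are more elaborate. With a small tolerance, two nearly coincident endpoints are joined. With an overly large one, a genuinely separate endpoint is moved as well. The coordinates are hypothetical:

```python
import math

def snap_endpoints(points, tolerance):
    """Snap each point to the first already-accepted node within `tolerance`
    (Euclidean distance); otherwise the point becomes a new node.
    Returns the snapped coordinates in input order."""
    nodes = []
    snapped = []
    for p in points:
        for n in nodes:
            if math.dist(p, n) <= tolerance:
                snapped.append(n)  # move p onto the existing node
                break
        else:
            nodes.append(p)        # no node close enough: p starts a new node
            snapped.append(p)
    return snapped

# Hypothetical digitised arc endpoints (map units); the first two were meant to meet
ends = [(10.0, 10.0), (10.3, 9.8), (12.0, 10.0)]
print(snap_endpoints(ends, 0.5))  # [(10.0, 10.0), (10.0, 10.0), (12.0, 10.0)]
print(snap_endpoints(ends, 2.5))  # tolerance too large: all three collapse onto (10.0, 10.0)
```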

Data Acquisition
Two sources
A. Primary Data Capture (first-hand collection)
[Figure: Digitizing tablet]
1. Digitizing
• Heads-up digitizing
• Automatic digitizing
2. Scanning
3. Other point measurements (in text files)
4. Census data
5. GPS collections
6. Aerial photographs
7. Remote sensing data

B. Secondary Data Capture (from others)

1. Published or released data (originally primary data)
2. All primary data from others are secondary data for you and me
• DEM = Digital Elevation Model
• NED = National Elevation Dataset
• NHD = National Hydrography Dataset
• DRG = Digital Raster Graphic
• DLG = Digital Line Graph
• DOQQ = Digital Ortho Quarter Quad
• GNIS = Geographic Names Information System
• LULC = Land Use Land Cover
• NLCD = National Land Cover Data

Vector Primary Data Capture

1. Surveying
• Locations of objects are determined by angle and distance measurements from known
points
• Uses expensive field equipment and crews
• Most accurate method for large-scale mapping of small areas

2. GPS
• Collection of satellites used to fix locations on the Earth's surface
• Differential GPS is used to improve accuracy

Secondary Geographic Data Capture
1. Data collected for other purposes can be converted for use in GIS

2. Raster conversion
• Scanning of maps, aerial photographs, documents, etc.
• Important scanning parameters are spatial and spectral (bit depth) resolution

Accuracy & Precision

Data Quality
• Cos it’s in the computer, don’t mean it’s right

• “It’s not the things you don’t know that matter, it’s the things you know that aren’t so.”

- Will Rogers, Famous Okie GI specialist

• “But there are also unknown unknowns: the ones we don't know we don't know.”

- Donald Rumsfeld

• “Fast is fine, but accuracy is everything.”

- Wyatt Earp

Data Quality: How good is your data?

1. Scale
• Ratio of distance on a map to the equivalent distance on the Earth's surface
• Primarily an output issue: at what scale do I wish to display?
2. Precision or Resolution
• The exactness of measurement or description
• Determined by input; can output at lower (but not higher) resolution
3. Accuracy
• The degree of correspondence between data and the real world
• Fundamentally controlled by the quality of the input
4. Lineage
• The original sources for the data and the processing steps it has undergone
5. Currency
• The degree to which data represents the world at the present moment in time
6. Documentation or Metadata
• Data about data: recording all of the above
7. Standards
• Common or “agreed-to” ways of doing things
• Data built to standards is more valuable since it is more easily shared
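Because scale is a ratio, converting between map and ground distances is a single multiplication or division. A small worked sketch, assuming scales written as 1:denominator, map distances in centimetres and ground distances in metres (the example scales are arbitrary):

```python
CM_PER_M = 100.0

def ground_distance_m(map_cm: float, scale_denominator: float) -> float:
    """Ground distance (m) represented by map_cm centimetres on a 1:scale_denominator map."""
    return map_cm * scale_denominator / CM_PER_M

def map_distance_cm(ground_m: float, scale_denominator: float) -> float:
    """Map distance (cm) needed to draw ground_m metres at 1:scale_denominator."""
    return ground_m * CM_PER_M / scale_denominator

print(ground_distance_m(2.0, 50_000))    # 1000.0 (2 cm at 1:50,000 is 1 km)
print(map_distance_cm(1000.0, 250_000))  # 0.4 (1 km at 1:250,000 is 4 mm)
```

The second case shows why scale limits what can be displayed: at 1:250,000 a 1 km feature is only 4 mm long on paper.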

Measurement of Positional Accuracy
Usually measured by root mean square error (RMSE): the square root of the average squared errors,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}$$

where $e_i$ is the distance (horizontally or vertically) between the true location of point $i$ on the
ground and its location represented in the GIS, and $n$ is the number of points checked.

• Usually expressed as a probability that no more than P% of points will be further than S
distance from their true location.

• Loosely, we say that the RMSE tells us how far recorded points in the GIS are from their
true location on the ground, on average.

• More correctly, based on the normal distribution of errors, 68% of points will be RMSE
distance or less from their true location, and 95% will be no more than twice this distance,
provided the errors are random and not systematic (i.e. the mean of the errors is zero).
Strictly, these percentages hold for errors measured in one dimension, such as vertical error.
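The RMSE formula can be applied directly to a set of check points. A minimal sketch, assuming horizontal errors where each $e_i$ is the straight-line distance between a point's GIS position and its surveyed ground position. The four check points are hypothetical:

```python
import math

def rmse(recorded, true):
    """Root mean square of the horizontal distances e_i between recorded and true points."""
    if not recorded or len(recorded) != len(true):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [(xr - xt) ** 2 + (yr - yt) ** 2          # e_i squared
               for (xr, yr), (xt, yt) in zip(recorded, true)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points in metres: GIS positions vs surveyed ground positions
gis_pts = [(100.0, 200.0), (150.0, 250.0), (300.0, 120.0), (410.0, 330.0)]
ground_pts = [(103.0, 204.0), (150.0, 250.0), (297.0, 124.0), (410.0, 325.0)]
print(rmse(gis_pts, ground_pts))  # about 4.33 m
```

Three points are off by 5 m and one is exact. The RMSE, about 4.33 m, sits above the simple mean error of 3.75 m because squaring weights larger errors more heavily.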

Source of Spatial Data

Generally, there are three main sources of spatial data for GIS:

1. Maps
2. Satellite data
3. Airborne data

Nature of Spatial Data (Geographic Objects)

1. Spatial component
• Relative position between objects
• Coordinate system

2. Attribute component
• Describes the characteristics of spatial objects

3. Spatial relationship
• Relationships between objects

4. Time component
• Temporal element


