Elements of Geoinformatics
What is GIS?
“A system for capturing, storing, checking, integrating, manipulating, analysing and displaying
data which are spatially referenced to the Earth. This is normally considered to involve a
spatially referenced computer database and appropriate applications software.”
Historical Background
This technology has developed from:
a. Digital cartography and CAD
b. Data Base Management Systems
3. Complexity of management
Driven by the need to combine and process many data sets and to evaluate as many
potential situations as possible.
4. Intense competition
The need to use technology to support decision-making and strategy in a world of
intense competition.
1. Data visualisation
A. Tabular
B. Map
2. Location
a) Where is object X?
1. Answer: X = 1, A South
b) What can be found at a certain location?
1. Example: What can be found at 1, C East?
2. Answer: Y
4. Attribute Question
6. Trend Question
1. What are the changes in A, B and C from 1980 to 1990?
a) A: increase in size
b) B: decrease in size and change in location
c) C: change in shape
Benefits of GIS
1. Geospatial data are better maintained in a standard format.
2. Revision and updating are easier.
3. Geospatial data and information are easier to search, analyze and represent.
4. More value-added products.
5. Geospatial data can be shared and exchanged freely.
6. Staff productivity and efficiency can be improved.
7. Time and money are saved.
8. Better decisions can be made.
GIS Applications
Because GIS are designed as generic systems for handling any kind of spatial data, they
are used in a wide range of applications in both urban and rural environments.
GIS subsystems
1. Data input subsystem;
2. Data storage and retrieval subsystem;
3. Data manipulation and analysis subsystem; and
4. Data output and display subsystem.
1. Data acquisition
Defining, entering and preliminary processing of data
GIS users range from technical specialists who design and maintain the system to those
who use it to help them perform their everyday work.
The identification of GIS specialists versus end users is often critical to the proper
implementation of GIS technology.
b) Data
Perhaps the most important component of a GIS is the data.
A GIS can integrate spatial data with other existing data resources, often stored in a
corporate DBMS. The integration of spatial data (often stored in a format proprietary to the GIS
software) with tabular data stored in a DBMS is a key function of a GIS.
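The integration of spatial and tabular data usually rests on a shared key. The sketch below is a minimal illustration under stated assumptions: the table names (`parcels_geom`, `parcels_attr`), the WKT text column standing in for the GIS's own geometry storage, and the sample records are all hypothetical. It uses Python's built-in `sqlite3` module as a stand-in DBMS.

```python
import sqlite3

# Hypothetical setup: geometry in one table (standing in for the GIS's own
# spatial storage), attributes in a separate DBMS table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE parcels_geom (parcel_id INTEGER PRIMARY KEY, wkt TEXT)")
cur.execute("CREATE TABLE parcels_attr (parcel_id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT)")
cur.executemany("INSERT INTO parcels_geom VALUES (?, ?)", [
    (1, "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"),
    (2, "POLYGON((10 0, 20 0, 20 10, 10 10, 10 0))"),
])
cur.executemany("INSERT INTO parcels_attr VALUES (?, ?, ?)", [
    (1, "Smith", "residential"),
    (2, "Jones", "commercial"),
])

# The shared key (parcel_id) joins each geometry to its attribute record.
rows = cur.execute(
    """SELECT g.parcel_id, g.wkt, a.owner, a.land_use
       FROM parcels_geom AS g
       JOIN parcels_attr AS a ON g.parcel_id = a.parcel_id
       ORDER BY g.parcel_id"""
).fetchall()
for row in rows:
    print(row)
```

A production GIS would store geometry in a spatial format rather than plain text; the point here is only the key-based link between the two stores.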
c) Hardware
Hardware is the computer system on which a GIS operates. Today, GIS software runs on
a wide range of hardware types, from centralized computer servers to desktop computers
used in stand-alone or networked configurations.
d) Software
GIS software provides the functions and tools needed to store, analyse, and display
geographic information.
Desktop GIS:
* Free (GRASS GIS, Quantum GIS, ILWIS, JUMP GIS and MapWindow GIS)
* Commercial (ERDAS IMAGINE, ESRI (products include ArcGIS, ArcSDE and ArcIMS), Intergraph, SuperMap Inc., etc.)
e) Methods
A successful GIS operates according to a well-designed implementation plan and
business rules, which are the models and operating practices unique to each
organization.
As in all organizations dealing with sophisticated technology, new tools can only be
used effectively if they are properly integrated into the entire business strategy and
operation. To do this properly requires not only the necessary investments in
hardware and software, but also in the retraining and/or hiring of personnel to utilize
the new technology in the proper organizational context.
B. Non-spatial queries:
List the names of all bookstores with more than ten thousand titles.
List the names of the top ten customers, in terms of sales, in the year 2001.
C. Spatial queries:
List the names of all bookstores within ten miles of a given city.
List all customers who live in a given state and its adjoining states.
A GIS uses a spatial database management system (SDBMS) to store, search, query and share large spatial data sets.
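The difference between the two bookstore queries can be sketched in a few lines. This is a hypothetical illustration: the store names, title counts and coordinates are invented, and stores are assumed to sit on a flat local grid measured in miles, so straight-line (Euclidean) distance is used instead of a proper geodesic calculation.

```python
import math

# Hypothetical bookstores: (name, number of titles, x, y), with x, y in miles
# on a flat local grid (an assumption; real data would need a projection).
bookstores = [
    ("Corner Books", 8000, 2.0, 3.0),
    ("Page One", 15000, 15.0, 1.0),
    ("Readers Hub", 25000, 6.0, 7.0),
]
city_centre = (0.0, 0.0)  # hypothetical city location

# Non-spatial query: uses attribute values only.
large_stores = [name for name, titles, _, _ in bookstores if titles > 10_000]

# Spatial query: needs each store's location relative to the city.
def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

nearby_stores = [name for name, _, x, y in bookstores
                 if distance((x, y), city_centre) <= 10.0]

print(large_stores)   # ['Page One', 'Readers Hub']
print(nearby_stores)  # ['Corner Books', 'Readers Hub']
```

The first query could run in any ordinary DBMS; the second needs coordinates and a distance operation, which is what an SDBMS adds (typically with a spatial index so it need not test every record).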
2. Areas:
unbounded: land use, market areas, soils, rock type
bounded: city/county/state boundaries, ownership parcels, zoning
moving: air masses, animal herds, schools of fish
4. Points:
fixed: wells, street lamps, addresses
moving: cars, fish, deer
1. Spatial data: describes the absolute and relative location of geographic features.
2. Attribute data: describes characteristics of the spatial features. These characteristics can
be quantitative and/or qualitative in nature. Attribute data is often referred to as tabular
data.
The coordinate location of a forestry stand would be spatial data, while the characteristics of that
forestry stand, e.g. cover group, dominant species, crown closure, height, etc., would be attribute
data.
Other data types, in particular image and multimedia data, are becoming more prevalent with
changing technology.
Depending on the specific content of the data, image data may be considered either spatial, e.g.
photographs, animation, movies, etc., or attribute, e.g. sound, descriptions, narrations, etc.
Some attributes are often coded as numbers, e.g. a Social Security Number (SSN), even though
arithmetic on them is meaningless. Numerical attributes may be expressed as integers
[whole numbers] or floating-point numbers [decimal fractions].
Attribute data tables can contain locational information, such as addresses or a list of X, Y
coordinates. ArcView refers to these as event tables. However, these must be converted to true
spatial data (e.g. a shapefile), for example by geocoding, before they can be displayed as a map.
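For the simpler case of an event table that already holds X, Y columns, the conversion amounts to turning each row into a point feature. The sketch below assumes a hypothetical CSV layout (`well_id`, `depth_m`, `x`, `y`) and writes GeoJSON-style dictionaries rather than a shapefile. Geocoding addresses would additionally require a reference street database, which is not shown.

```python
import csv
import io
import json

# Hypothetical event table: plain attribute rows that happen to hold X, Y columns.
event_table = io.StringIO(
    "well_id,depth_m,x,y\n"
    "W1,35,512300.0,4178900.0\n"
    "W2,52,512750.5,4179210.0\n"
)

features = []
for row in csv.DictReader(event_table):
    # The X, Y columns become the point geometry; the rest stay as attributes.
    features.append({
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [float(row["x"]), float(row["y"])]},
        "properties": {"well_id": row["well_id"], "depth_m": int(row["depth_m"])},
    })

layer = {"type": "FeatureCollection", "features": features}
print(json.dumps(layer["features"][0]["geometry"]))
```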
Vector data is characterized by the use of sequential points or vertices to define a linear
segment.
Vector lines are often referred to as arcs and consist of a string of vertices terminated by a
node.
A node is defined as a vertex that starts or ends an arc segment. Point features are defined
by one coordinate pair, a vertex.
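The vertex, arc and node vocabulary maps naturally onto a small data structure. In this minimal sketch (the class and feature names are hypothetical), a point is a single coordinate pair, and an arc is an ordered list of vertices whose first and last vertices act as its nodes.

```python
import math
from dataclasses import dataclass

Vertex = tuple[float, float]  # one coordinate pair

@dataclass
class Arc:
    vertices: list[Vertex]  # ordered string of vertices

    @property
    def from_node(self) -> Vertex:
        return self.vertices[0]   # the vertex that starts the arc

    @property
    def to_node(self) -> Vertex:
        return self.vertices[-1]  # the vertex that ends the arc

    def length(self) -> float:
        # Sum of the straight segments between consecutive vertices.
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))

well = (3.0, 4.0)                                   # point feature: one vertex
stream = Arc([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])  # line feature: an arc
print(stream.from_node, stream.to_node, stream.length())  # (0.0, 0.0) (6.0, 8.0) 10.0
```

Only the end vertices are nodes; the interior vertex `(3.0, 4.0)` merely shapes the line.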
The most popular method of retaining spatial relationships among features is to explicitly
record adjacency information in what is known as the topologic data model.
Topology is a mathematical concept that has its basis in the principles of feature
adjacency and connectivity.
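Explicitly recorded adjacency can be illustrated with an arc-node table. This is a hypothetical sketch in the style of classic arc-node models: each arc stores its from/to nodes and the polygons on its left and right (`OUT` marks the area outside the map). Polygon adjacency and node connectivity can then be read from the table without any geometric computation.

```python
from collections import defaultdict

# Hypothetical arc-node topology table: two polygons, A and B, sharing arc a2.
arcs = {
    "a1": {"from": "n1", "to": "n2", "left": "A", "right": "OUT"},
    "a2": {"from": "n2", "to": "n3", "left": "A", "right": "B"},
    "a3": {"from": "n3", "to": "n1", "left": "A", "right": "OUT"},
    "a4": {"from": "n2", "to": "n4", "left": "B", "right": "OUT"},
    "a5": {"from": "n4", "to": "n3", "left": "B", "right": "OUT"},
}

# Adjacency: two polygons are neighbours if some arc has one on each side.
adjacent = defaultdict(set)
for arc in arcs.values():
    if arc["left"] != arc["right"]:
        adjacent[arc["left"]].add(arc["right"])
        adjacent[arc["right"]].add(arc["left"])

# Connectivity: arcs that meet at the same node are connected.
arcs_at_node = defaultdict(set)
for arc_id, arc in arcs.items():
    arcs_at_node[arc["from"]].add(arc_id)
    arcs_at_node[arc["to"]].add(arc_id)

print(sorted(adjacent["A"]))      # ['B', 'OUT']
print(sorted(arcs_at_node["n2"]))  # ['a1', 'a2', 'a4']
```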
While the term raster implies a regularly spaced grid, other tessellated data structures do
exist in grid-based GIS. In particular, the quad-tree data structure has found some
acceptance as an alternative raster data model.
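A region quad-tree recursively splits a square grid into four quadrants until each block holds a single value, so uniform areas are stored once instead of cell by cell. The sketch below is a minimal version that assumes a square grid whose side is a power of two; the sample grid is hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Split a square 2^k x 2^k grid into quadrants until each block is uniform.

    Returns a single value for a uniform block, otherwise a list of four
    sub-trees in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # leaf: the whole block holds one value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

def count_leaves(tree):
    return sum(count_leaves(t) for t in tree) if isinstance(tree, list) else 1

grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 3],
    [1, 1, 1, 1],
]
tree = build_quadtree(grid)
print(tree)  # [1, 2, 1, [1, 3, 1, 1]]
print(count_leaves(tree), "leaves instead of", 4 * 4, "cells")
```

Here only the south-east quadrant needs further subdivision, so 7 leaves replace 16 cells.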
There is no explicit coding of geographic coordinates required since that is implicit in the
layout of the cells.
A raster data structure is in fact a matrix where the coordinate of any cell can be quickly
calculated if the origin point and the size of the grid cells are known.
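The implicit coordinates of a raster can be computed directly. This minimal sketch assumes the origin is the upper-left corner of the grid, columns increase eastward and rows increase southward, and cells are square. The 30 m grid and its origin are hypothetical values.

```python
def cell_centre(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the centre of cell (row, col).

    Assumes an upper-left origin, columns increasing eastward (x) and
    rows increasing southward (decreasing y).
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

def cell_of(x, y, origin_x, origin_y, cell_size):
    """Inverse: the (row, col) of the cell containing map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col

# Hypothetical grid: upper-left corner at (500000, 4200000), 30 m cells.
print(cell_centre(2, 3, 500000.0, 4200000.0, 30.0))       # (500105.0, 4199925.0)
print(cell_of(500105.0, 4199925.0, 500000.0, 4200000.0, 30.0))  # (2, 3)
```

No coordinates are stored per cell; two numbers for the origin and one for the cell size are enough.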
Topology is not a relevant concept with tessellated structures since adjacency and
connectivity are implicit in the location of a particular cell in the data matrix.
Several tessellated data structures exist; however, only two are commonly used in GIS.
The most popular cell structure is the regularly spaced matrix or raster structure.
This data structure involves a division of spatial data into regularly spaced cells. Each
cell is of the same shape and size. Squares are most commonly utilized.
Since geographic data is rarely distinguished by regularly spaced shapes, cells must be
classified according to the most common attribute within each cell.
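Classifying a cell by its most common attribute can be sketched as majority aggregation. In this hypothetical example, a finer grid of land-cover labels stands in for the detailed reality, and each coarse cell takes the most frequent label among the fine cells it covers. Majority rule is one common choice; other rules, such as the value at the cell centre, also exist.

```python
from collections import Counter

def classify_cell(samples):
    """Assign the most common attribute found within a cell (majority rule)."""
    value, _count = Counter(samples).most_common(1)[0]
    return value

def aggregate(fine_grid, factor):
    """Coarser raster: each output cell covers factor x factor fine cells.

    Assumes the grid dimensions divide evenly by factor.
    """
    rows, cols = len(fine_grid), len(fine_grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [fine_grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            coarse_row.append(classify_cell(block))
        coarse.append(coarse_row)
    return coarse

# Hypothetical land cover: F = forest, W = water, U = urban.
fine = [
    ["F", "F", "W", "W"],
    ["F", "U", "W", "F"],
    ["U", "U", "F", "F"],
    ["U", "F", "F", "W"],
]
print(aggregate(fine, 2))  # [['F', 'W'], ['U', 'F']]
```

The minority labels (the single urban cell in the forest block, for example) disappear, which is the generalization the text warns about.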
The problem of determining the proper resolution for a particular data layer can be a
concern.
If one selects too coarse a cell size then data may be overly generalized.
If one selects too fine a cell size then too many cells may be created resulting in a large
data volume, slower processing times, and a more cumbersome data set. As well, one can
imply accuracy greater than that of the original data capture process and this may result
in some erroneous results during analysis.
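The data-volume side of this trade-off is simple arithmetic: halving the cell size quadruples the number of cells. The sketch assumes a hypothetical 10 km × 10 km study area and 1 byte stored per cell, ignoring compression and file headers.

```python
import math

def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular extent."""
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)

# Hypothetical 10 km x 10 km study area, 1 byte per cell, MB = 10^6 bytes.
for size in (100, 50, 10, 1):
    n = cell_count(10_000, 10_000, size)
    print(f"{size:>4} m cells: {n:>12,} cells, about {n / 1e6:,.2f} MB")
```

Going from 100 m to 1 m cells raises the count from 10,000 to 100,000,000 cells, even though the source data may not support 1 m accuracy.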
As well, since most data is captured in a vector format, e.g. by digitizing, data must be
converted to the raster data structure. This is called vector-raster conversion.
Most GIS software allows the user to define the raster grid (cell) size for vector-raster
conversion. It is imperative that the original scale, i.e. the accuracy, of the data be known
prior to conversion.
The accuracy of the data, often referred to as the resolution, should determine the cell
size of the output raster map during conversion.
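One common way to perform vector-raster conversion for a polygon is to assign a value to every cell whose centre falls inside it. The sketch below uses the ray-casting point-in-polygon test and a hypothetical triangular stand; the upper-left origin convention and the cell-centre rule are assumptions, and real software offers other rules. Converting at two cell sizes shows how a coarser grid generalizes the shape.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count the polygon edges crossed by a ray running east of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, origin_x, origin_y, cell_size, n_rows, n_cols, value=1):
    """A cell gets `value` if its centre is inside the polygon, else 0.

    Assumes an upper-left origin with rows increasing southward.
    """
    grid = []
    for row in range(n_rows):
        y = origin_y - (row + 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            x = origin_x + (col + 0.5) * cell_size
            grid_row.append(value if point_in_polygon(x, y, polygon) else 0)
        grid.append(grid_row)
    return grid

# Hypothetical triangular stand in an 8 x 8 unit extent, at two cell sizes.
triangle = [(0.0, 0.0), (7.0, 0.0), (0.0, 7.0)]
for size in (2.0, 4.0):
    n = int(8 / size)
    for line in rasterize(triangle, 0.0, 8.0, size, n, n):
        print(line)
    print()
```

The triangle's true area is 24.5 square units; the 2-unit grid captures it as 6 cells (24 units), the 4-unit grid as a single cell (16 units).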
Most raster based GIS software requires that the raster cell contain only a single discrete
value.
Accordingly, a data layer, e.g. forest inventory stands, may be broken down into a series
of raster maps, each representing an attribute type, e.g. a species map, a height map, a
density map, etc. These are often referred to as one attribute maps.
This is in contrast to most conventional vector data models that maintain data as multiple
attribute maps, e.g. forest inventory polygons linked to a database table containing all
attributes as columns.
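The contrast between one-attribute rasters and a multi-attribute table can be sketched as follows. The stand-ID raster is a hypothetical intermediate (each cell holds the ID of the stand it falls in), and the table plays the role of the linked database. Deriving each single-attribute map is a lookup per cell.

```python
# Hypothetical forest inventory: a stand-ID raster plus the multi-attribute
# table a vector model would link to each stand polygon.
stand_ids = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]
stand_table = {
    1: {"species": "pine",   "height_m": 18, "density": "high"},
    2: {"species": "spruce", "height_m": 22, "density": "medium"},
    3: {"species": "birch",  "height_m": 12, "density": "low"},
}

def one_attribute_map(ids, table, attribute):
    """Raster holding a single attribute, looked up from each cell's stand."""
    return [[table[stand][attribute] for stand in row] for row in ids]

species_map = one_attribute_map(stand_ids, stand_table, "species")
height_map = one_attribute_map(stand_ids, stand_table, "height_m")
print(species_map)
print(height_map)  # [[18, 18, 22], [18, 12, 22], [12, 12, 22]]
```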
The use of raster data structures allows for sophisticated mathematical modelling
processes, while vector-based systems are often constrained by the capabilities and
language of a relational DBMS.
These differences are the major distinguishing factor between vector and raster based
GIS software. It is also important to understand that the selection of a particular data
structure can provide advantages and disadvantages during the analysis stage.
For example, the vector data model does not handle continuous data, e.g. elevation, very
well while the raster data model is more ideally suited for this type of analysis.
Conversely, the raster structure does not handle linear data analysis, e.g. shortest path,
very well, while vector systems do.
It is important for the user to understand that there are certain advantages and
disadvantages to each data model.
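Shortest-path analysis is a typical vector strength because the network of nodes and arcs is stored explicitly. The sketch below applies Dijkstra's algorithm, a standard choice that the text itself does not name, to a hypothetical road network whose arc lengths are given in kilometres.

```python
import heapq

def shortest_path(edges, start, goal):
    """Dijkstra's algorithm on an undirected network of nodes joined by arcs.

    `edges` maps (node_a, node_b) to the arc length. Returns (distance, path).
    """
    graph = {}
    for (a, b), length in edges.items():
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))

    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical road network: arc lengths in km between junction nodes.
roads = {("A", "B"): 4.0, ("A", "C"): 2.0, ("C", "B"): 1.0,
         ("B", "D"): 5.0, ("C", "D"): 8.0}
print(shortest_path(roads, "A", "D"))  # (8.0, ['A', 'C', 'B', 'D'])
```

A raster system would need to approximate the same roads as chains of cells, which is why it handles this kind of analysis less naturally.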
Vector vs. Raster
Vector data – Advantages
1. Data can be represented at its original resolution and form without generalization.
2. Graphic output is usually more aesthetically pleasing (traditional cartographic representation).
3. Since most data, e.g. hard copy maps, is in vector form, no data conversion is required.
4. Accurate geographic location of data is maintained.
5. Allows for efficient encoding of topology, and as a result more efficient operations that require topological information, e.g. proximity, network analysis.
Vector data – Disadvantages
1. The location of each vertex needs to be stored explicitly.
2. For effective analysis, vector data must be converted into a topological structure. This is often processing intensive and usually requires extensive data cleaning.
3. Algorithms for manipulative and analysis functions are complex and may be processing intensive. Often, this inherently limits the functionality for large data sets, e.g. a large number of features.
4. Continuous data, such as elevation data, is not effectively represented in vector form. Usually substantial data generalization or interpolation is required for these data layers.
Raster data – Advantages
1. Discrete data, e.g. forestry stands, is accommodated equally as well as continuous data, e.g. elevation data, which facilitates the integration of the two data types.
2. Grid-cell systems are very compatible with raster-based output devices, e.g. electrostatic plotters, graphic terminals.
Raster data – Disadvantages
1. Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements, this may introduce data integrity concerns due to generalization and the choice of an inappropriate cell size.
2. Most output maps from grid-cell systems do not conform to high-quality cartographic needs.
It is often difficult to compare or rate GIS software that uses different data models. Some
personal computer (PC) packages utilize vector structures for data input, editing, and display but
convert to raster structures for any analysis. Other more comprehensive GIS offerings provide
both integrated raster and vector analysis techniques. They allow users to select the data structure
appropriate for the analysis requirements. Integrated raster and vector processing capabilities are
most desirable and provide the greatest flexibility for data manipulation and analysis.
Topology
Topology refers to knowledge about relative spatial positioning of features i.e.
knowledge about how features are connected and which features are adjacent to each
other. Topology distinguishes GIS data models from non-topological data models
supported by many CAD, mapping and graphics systems.
Data Acquisition
Two sources
A. Primary Data Capture (first-hand collection)
1. Digitizing
Digitizing tablet
Heads-up digitizing
Automatic digitizing
2. Scanning
3. Other point measurements (in text files)
4. Census data
5. GPS collections
6. Aerial photographs
7. Remote sensing data
2. Raster conversion
Scanning of maps, aerial photographs, documents, etc.
Important scanning parameters are spatial resolution and spectral resolution (bit depth).
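The effect of the two scanning parameters on data volume can be estimated with simple arithmetic. Spatial resolution (dots per inch) fixes the pixel count, and bit depth fixes the bits stored per pixel. The 24 × 36 inch map sheet is hypothetical, and the estimate ignores compression and file headers.

```python
def scan_size_bytes(width_in, height_in, dpi, bit_depth):
    """Uncompressed size of a scanned image in bytes.

    dpi sets the pixels captured per inch; bit_depth sets the bits per pixel.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8

# Hypothetical 24 x 36 inch map sheet; MB = 10^6 bytes.
for dpi, bits in [(300, 1), (300, 8), (300, 24), (600, 24)]:
    mb = scan_size_bytes(24, 36, dpi, bits) / 1e6
    print(f"{dpi} dpi, {bits:>2}-bit: {mb:,.1f} MB")
```

Doubling the dpi quadruples the size, while moving from 1-bit (black and white) to 24-bit colour multiplies it by 24.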
Data Quality
Just because it’s in the computer doesn’t mean it’s right.
“It’s not the things you don’t know that matter, it’s the things you know that aren’t so”.
“But there are also unknown unknowns: the ones we don't know we don't know.”
- Donald Rumsfeld
-- Wyatt Earp
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$
where e_i is the distance (horizontally or vertically) between the true location of point i on the
ground and its location as represented in the GIS, and n is the number of points tested.
Positional accuracy is usually expressed as a probability: no more than P% of points will be
farther than a distance S from their true location.
Loosely we say that the RMSE tells us how far recorded points in the GIS are from their
true location on the ground, on average.
More correctly, based on the normal distribution of errors, about 68% of points will be within the
RMSE distance of their true location and about 95% will be no more than twice this distance,
provided the errors are random and not systematic (i.e. the mean of the errors is zero).
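The RMSE statistic and the 68%/95% rule of thumb can be checked numerically. This sketch simulates hypothetical check points whose errors along one axis are random and unbiased (normal, mean zero, standard deviation 5 m), computes RMSE as the square root of the mean squared error, and counts how many points fall within one and two RMSE. Because the data are simulated, the printed percentages will only be close to 68% and 95%.

```python
import math
import random

def rmse(errors):
    """Root mean square error: square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical check points: simulated random, unbiased errors in metres.
random.seed(42)
errors = [random.gauss(0.0, 5.0) for _ in range(10_000)]

r = rmse(errors)
within_1 = sum(abs(e) <= r for e in errors) / len(errors)
within_2 = sum(abs(e) <= 2 * r for e in errors) / len(errors)
print(f"RMSE = {r:.2f} m")
print(f"within 1 x RMSE: {within_1:.1%}, within 2 x RMSE: {within_2:.1%}")
```

If a constant offset (a systematic error) were added to every simulated error, RMSE would grow but the percentages would no longer follow the 68%/95% pattern, which is why the rule requires a zero mean.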
1. Maps
2. Satellite data
3. Airborne
2. Attribute component
Describes the characteristics of spatial objects
3. Spatial relationship
Relationships between objects
4. Time component
Temporal element