Mumbai University, T.Y.B.Sc.(I.T.), Semester VI, Principles of Geographic Information System, USIT604, Discipline Specific Elective Unit 1: Introduction to GIS
2. Unit 1: Introduction to GIS
The nature of GIS
The real world and representations of it
Geographic Information and Spatial Database
Models and Representations of the real world
Geographic Phenomena
Computer Representations of Geographic Information
Organizing and Managing Spatial Data
The Temporal Dimension
3. What is GIS?
G stands for geographic, so GIS has something
to do with geography.
I stands for information, so GIS has something
to do with information, namely geographic
information.
S stands for system, so GIS is an integrated
system of geography and information tied
together.
4. What is GIS?
•A computer system for
-collecting,
-storing,
-manipulating,
-analyzing,
-displaying, and
-querying
geographically related
information.
5. Definitions of GIS
GIS is a particular form of Information System applied
to geographical data.
GIS is a computer based tool that analyzes, stores,
manipulates and visualizes geographic information on
a map.
A Geographic Information System is a system of
computer software, hardware and data, and the
personnel that make it possible to enter, manipulate,
analyze, and present information that is tied to a
location on the earth’s surface.
6. A GIS is a computer-based system that provides the
following four sets of capabilities to handle
georeferenced data:
1. Data capture and preparation
2. Data management, including storage and maintenance
3. Data manipulation and analysis
4. Data presentation
7. Data capture and preparation
Data capture and input are done using existing data or
by creating new data.
New data can be created from sensed images, GPS
devices, field surveys, user input, text files, etc.
8. Data Management
Data management refers to the storage and
maintenance of the data.
Data is usually stored in tables in row and column
format.
Data manipulation includes data verification, attribute
data management, insertion, updating, deletion and
retrieval in different forms.
9. Data Manipulation and analysis
Once the data has been collected and organized, analysis
can be done using different analysis tools.
Data Presentation
After the data is gathered and stored, it is prepared for
producing output.
The data presentation phase deals with putting it all
together into a format that communicates the result of data
analysis in the best possible way.
For effective presentation, the following points should be kept
in mind: what message we want to convey, who the
audience is, what kind of presentation medium is used and
what techniques are available for representation.
10. GI System
A GI System is a combination of functional GIS
software and hardware components, users who work with the
software, and infrastructure support.
Its specialized software facilitates the input, processing,
transformation and analysis of geospatial data.
11. GIS Application
GIS software can be applied in many different
application areas.
12. GIS Applications
Natural resource-based
wildlife habitat analysis, migration routes planning
Natural Resource Management
Land Use Planning
Natural Hazard assessment
Environmental impact analysis (EIA)
Groundwater modeling and contamination tracking
Street network-based
address matching - finding locations given street addresses
vehicle routing and scheduling
location analysis, site selection
development of evacuation plans
13. Land parcel-based
Zoning, subdivision plan review
Land record management
environmental impact statements
water quality management
maintenance of ownership
Facilities management
locating underground pipes, cables
balancing loads in electrical networks
planning facility maintenance
Others
Crime analysis
Market analysis
Location based services
In car navigation system
14. Who Uses GIS?
Planning Strategies
Police and Law Enforcement Agencies
Foresters
Industry
Environmental Engineers
Real Estate Professionals
Telecommunications Professionals
Emergency Response Organizations
Local and Federal Government
Health
Transportation
Geographers
Market Developers
15. GIS components
Computer system
Hardware
Software
Geographic data
People to carry out various management and analysis
tasks
16. Hardware – the computer on which a GIS operates.
Software – provides the functions and tools needed to input,
store, query, analyze, and display
geographic information in the form of maps or reports. All
GIS software packages rely on an underlying database
management system (DBMS) for storage and management
of the geographic and attribute data.
17. Data - Data is one of the most important, and often
most expensive, components of a GIS.
It is entered into a GIS using a technique called
digitizing.
Digitizing is done by tracing the location, path or
boundary of geographic features either on a computer
screen using a scanned map in the background, or a
paper map that is attached to a digitizing tablet.
Data is also available for free or for purchase from
data providers or from a spatial data clearinghouse.
18. The real world and representations of it
One of the main uses of GIS is as a tool to help us
make decisions.
To do so we need to restrict ourselves to ‘some part’ of
the real world simply because it cannot be represented
completely.
A computer representation of that part allows us to enter and store
data, analyze the data and transfer it to humans or to other systems.
19. MODELS
A model is a representation of whole or some part of
the real world having certain characteristics in
common with the real world.
We study and operate on the model itself
instead of the real world in order to test what happens
under various conditions.
A model helps us answer ‘what if’ questions: we change the data
or alter the parameters of the model, and investigate
the effects of the changes.
Models are of different types:
Static model - represents a single state of affairs at a
point in time.
Dynamic model - emphasizes changes over time.
20. Static model and Dynamic model
A static model represents a single state of affairs at a point in time.
Most maps and databases can be considered static models.
Usually, developments or changes in the real world are not
easily recognized in these models.
A dynamic model emphasizes changes that have taken
place, are taking place or may take place sometime in the
future.
Dynamic models are inherently more complicated than
static models, and usually require much more
computation.
Simulation models are an important class of dynamic
models that allow the simulation of real world processes.
21. Models as representations
Models—as representations—come in many different
forms.
The most familiar model is that of a map. A map is a
miniature representation of some part of the real world.
Paper maps are the most common, but digital maps also
exist.
Databases are another important class of models
Digital models have enormous advantages over paper
models (such as maps).
They are more flexible, and therefore more easily changed
for the purpose at hand.
Application models are models built for a specific
application.
22. Maps
A map is a graphic representation at a certain level of
detail, which is determined by the scale, having
physical boundaries and features.
A map is a miniature representation of some part of the
real world.
Cartography, the science and art of map making,
functions as an interpreter, translating real world
phenomena into correct, clear and understandable
representations for our use.
Maps are also a data source for other applications,
including the development of other maps.
23. Database
A database is a repository for storing large amounts of data
A database offers a number of techniques for storing and
analyzing data.
A database can be used by multiple users at the same
time—i.e. it allows concurrent use
A database allows the imposition of rules on the stored
data
A database offers an easy to use data manipulation
language
Databases can store almost any kind of data
24. Spatial databases
A spatial database, also known as a geodatabase, stores
representations of real world geographic phenomena
for use in a GIS.
It stores information about spatial reference systems, and supports
all kinds of analysis that are geographic in nature, such
as computation of distance and area.
The stored features may have point, line, area or image
characteristics.
25. Spatial Analysis
Spatial analysis is the general term for all manipulations of
spatial data carried out to improve one’s understanding of
the geographic phenomena that the data represents.
It involves questions about how the data in various layers
might relate to each other, and how it varies over space
The aim of spatial analysis is usually to gain a better
understanding of geographic phenomena through
discovering patterns that were previously unknown to us,
or to build arguments on which to take important
decisions.
Some GIS functions for spatial analysis are simple and easy to
use; others are much more sophisticated, and demand higher levels of
analytical and operating skills.
27. Models and representations of the real world
Modeling is the process of producing an abstraction of
the real world so that it can be observed and studied more easily.
It is the process of representing key aspects of the real
world in a computer system.
These representations are made up of spatial data,
stored in computer memory.
Modeling begins with the process of translating the
relevant aspects of the real world into a computer
representation.
It can be done using direct observations using sensors,
and digitizing the sensor output for computer usage or
by indirect means.
28. Representing real world phenomena inside a GIS to build models or
simulations.
29. Geographic phenomena
A geographic phenomenon is something that
❖Can be named or described,
❖Can be georeferenced, and
❖Can be assigned a time (interval) at which it is/was
present.
For instance, in water management, the objects of study might be river basins,
measurements of actual evaporation, ground water levels, irrigation levels,
water budgets and measurements of total water use.
Spatial phenomena occur in a two- or three-dimensional Euclidean space.
Euclidean space can be informally defined as a model of space in which
locations are represented by coordinates, (x, y) in 2D and (x, y, z)
in 3D, and distance and direction can be defined with geometric formulas. In the
2D case, this is known as the Euclidean plane.
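As a small illustration of the "geometric formulas" mentioned above, the sketch below computes distance and direction between two coordinate tuples. The function names and the convention of measuring direction clockwise from the positive y-axis ('north') are assumptions made for this example, not part of the slides.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as (x, y) or (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def direction_degrees(p, q):
    """Direction from p to q in the Euclidean plane, in degrees.

    Assumed convention: 0 = positive y-axis ('north'), increasing clockwise.
    """
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

print(euclidean_distance((0, 0), (3, 4)))        # 2D case
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3D case
print(direction_degrees((0, 0), (1, 0)))         # due 'east'
```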
30. Types of geographic phenomena
In order to represent a phenomenon in a GIS, we need to
state what it is, and where it is.
Some phenomena exist essentially everywhere in the
study area, while others only do so in certain localities.
Geographic Fields
A (geographic) field is a geographic phenomenon for
which, for every point in the study area, a value can be
determined. Examples: air temperature, barometric pressure and
elevation.
A field is a mathematical function f that associates a
specific value with any position in the study area.
If (x, y) is a position in the study area, then f(x, y) stands for
the value of the field at locality (x, y).
31. Fields can be discrete or continuous
In a continuous field, the underlying function is
assumed to be ‘mathematically smooth’, meaning that
the field values along any path through the study area
do not change abruptly, but only gradually.
Example - air temperature, barometric pressure,
soil salinity and elevation
Discrete fields divide the study space into mutually
exclusive, bounded parts, with all locations in one part
having the same field value.
Example - soil type, land use type, crop type or natural
vegetation type
32. Data types and values
Different kinds of data values which we can use to represent
geographic ‘phenomena’
1. Nominal data values provide a name or identifier so that
we can discriminate between different values. This kind of data
value is also called categorical data.
2. Ordinal data values are data values that can be put in some
natural sequence but that do not allow any other type of
computation. Ex - Household income ‘low’, ‘average’ or ‘high’
3. Interval data values are quantitative, in that they allow simple
forms of computation like addition and subtraction. Interval
data have no arithmetic zero value, and do not support
multiplication or division.
4. Ratio data values allow most, if not all, forms of arithmetic
computation. Ratio data have a natural zero value, and
multiplication and division of values are possible operators.
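The difference between interval and ratio data can be made concrete with temperature. The sketch below assumes Celsius as an example of an interval scale and Kelvin as an example of a ratio scale; the variable names are illustrative only.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius value to a ratio-scale Kelvin value."""
    return celsius + 273.15

morning_c, noon_c = 10.0, 20.0

# Subtraction is meaningful on an interval scale: noon is 10 degrees warmer.
difference = noon_c - morning_c

# Division can be computed but is not meaningful here, because
# 0 degrees Celsius is not a natural zero ('no temperature').
naive_ratio = noon_c / morning_c

# Kelvin has a natural zero, so this ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

print(difference, naive_ratio, round(kelvin_ratio, 4))
```

The naive ratio suggests noon is "twice as warm", while the Kelvin ratio shows an increase of only a few percent.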
33. Geographic Objects
Geographic objects populate the study area and are usually well
distinguished, discrete, and bounded entities.
Position of geographic objects is determined by a
combination of the following parameters
Location
Shape
Size
orientation
34. Boundaries
Boundaries come into play where the shape and/or size of
contiguous areas matter.
Boundaries apply to geographic objects and to the parts of a discrete
geographic field.
A crisp boundary is one that can be determined with
almost arbitrary precision, dependent only on the data
acquisition technique applied.
Fuzzy boundaries contrast with crisp boundaries in that
the boundary is not a precise line, but rather itself an area
of transition.
Crisp boundaries are more common in man-made
phenomena, whereas fuzzy boundaries are more common
with natural phenomena.
35. Computer representations of geographic information
A geographic phenomenon such as an elevation field can be represented in
computer memory in two ways:
Store as many (location, elevation) observation pairs as possible, or
Find a symbolic representation of the elevation field function, as a
formula in x and y, like (3.0678x^2 + 20.08x - 7.34y) or so, which can
be evaluated to give us the elevation at any given (x, y) location.
Both of these approaches have their drawbacks.
The first suffers from the fact that there are infinitely many locations,
so we can never store them all.
In the second approach, it is extremely difficult to derive such a function
for larger areas.
In GISs, a combination of both approaches is taken.
We store a finite, but intelligently chosen set of (sample) locations with
their elevation.
An interpolation function allows us to infer a reasonable elevation value
for locations that are not stored.
Interpolation is made possible by a principle called spatial
autocorrelation: locations that are near in space are likely to have
similar values.
Fields are usually implemented with a tessellation approach, and
objects with a (topological) vector approach.
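To illustrate the combined approach (stored samples plus an interpolation function), here is a minimal sketch using inverse distance weighting. The slides do not name a specific interpolation method, so the choice of inverse distance weighting, the function name and the sample values are all assumptions for this example.

```python
import math

def idw_interpolate(samples, x, y, power=2.0):
    """Estimate a field value at (x, y) from (sx, sy, value) samples.

    Nearby samples get larger weights, which relies on spatial
    autocorrelation: nearby locations tend to have similar values.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sample location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical elevation samples: (x, y, elevation in metres)
samples = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)]

print(idw_interpolate(samples, 0.0, 0.0))  # a stored location
print(idw_interpolate(samples, 5.0, 0.0))  # a location that is not stored
```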
36. Regular tessellations
A tessellation (or tiling) is a partitioning of space into
mutually exclusive cells that together make up the
complete study space.
With each cell, some (thematic) value is associated to
characterize that part of space.
In all regular tessellations, the cells are of the same
shape and size, and the field attribute value assigned
to a cell is associated with the entire area occupied by
the cell.
37. Raster
A raster is a set of regularly spaced (and contiguous)
cells with associated (field) values. The associated
values represent cell values, not point values.
The size of the area that a single raster cell represents
is called the raster’s resolution.
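Because one value applies to a whole cell, the represented field jumps
at cell boundaries. To improve on this continuity issue, we can:
Make the cell size smaller, so as to make the ‘continuity
gaps’ between the cells smaller
Assume that a cell value only represents elevation for
one specific location in the cell, and interpolate for other locations.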
38. Advantages and disadvantages of regular tessellations
Regular tessellations know how they partition space,
and can make computations specific to this
partitioning. This leads to fast algorithms.
An obvious disadvantage is that they are not adaptive
to the spatial phenomenon we want to represent.
The cell boundaries are both artificial and fixed: they
may or may not coincide with the boundaries of the
phenomena of interest.
Another disadvantage is that the database is large, and its size
increases as the resolution becomes finer.
39. Irregular tessellations
Irregular tessellations are partitions of space into
mutually disjoint cells, but the cells may vary in size
and shape, allowing them to adapt to the spatial
phenomena that they represent
Irregular tessellations are more complex than the
regular ones, but they are also more adaptive.
It leads to a reduction in the amount of memory used
to store the data
41. A quadtree is based on a regular tessellation of square cells, but
takes advantage of cases where neighbouring cells
have the same field value, so that they can together be
represented as one bigger cell.
The quadtree that represents a raster is constructed by
repeatedly splitting up the area into four quadrants,
which are called NW, NE, SE, SW.
This procedure stops when all the cells in a quadrant
have the same field value. The procedure produces an
upside-down, tree-like structure, known as a quadtree.
In main memory, the nodes of a quadtree are
represented as records.
Quadtrees are adaptive because they apply the spatial
autocorrelation principle, i.e. the locations that are
near in space are likely to have similar field values.
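The splitting procedure described above can be sketched as a short recursive function. This is only an illustration: it assumes a square raster whose side is a power of two, stored as a list of rows with row 0 at the north edge, and it represents internal nodes as dictionaries keyed by NW, NE, SE and SW.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return a leaf value if the quadrant is uniform, else a dict of quadrants."""
    if size is None:
        size = len(raster)
    first = raster[row][col]
    uniform = all(
        raster[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first  # splitting stops: all cells share one field value
    half = size // 2
    return {
        "NW": build_quadtree(raster, row, col, half),
        "NE": build_quadtree(raster, row, col + half, half),
        "SE": build_quadtree(raster, row + half, col + half, half),
        "SW": build_quadtree(raster, row + half, col, half),
    }

def count_leaves(node):
    """Number of leaf nodes, i.e. stored field values."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return 1

raster = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 3, 4],
    [1, 1, 4, 4],
]
tree = build_quadtree(raster)
print(tree)
print(count_leaves(tree), "leaves instead of", len(raster) ** 2, "cells")
```

Only the SE quadrant needs to be split further, so the tree stores fewer values than the raster has cells.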
42. Vector representations
Tessellations do not explicitly store georeferences of the phenomena
they represent. They provide a georeference of the lower left
corner of the raster, plus an indicator of the raster’s
resolution, which implicitly georeferences every cell.
Vector representations explicitly associate
georeferences with the geographic phenomena.
A georeference is a coordinate pair from some
geographic space, and is also known as a vector.
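The sketch below shows how the lower left corner and the resolution are enough to georeference any raster cell. The function names, the sample origin and the choice of counting rows upward from the bottom row are assumptions made for this example.

```python
def cell_center(origin_x, origin_y, resolution, row, col):
    """Coordinates of the centre of cell (row, col).

    Assumption: (origin_x, origin_y) is the lower left corner of the raster
    and row 0 is the bottom row.
    """
    x = origin_x + (col + 0.5) * resolution
    y = origin_y + (row + 0.5) * resolution
    return (x, y)

def cell_index(origin_x, origin_y, resolution, x, y):
    """Inverse lookup: the (row, col) of the cell that contains (x, y)."""
    col = int((x - origin_x) // resolution)
    row = int((y - origin_y) // resolution)
    return (row, col)

# Hypothetical raster: lower left corner at (1000, 2000), 30 m cells.
print(cell_center(1000.0, 2000.0, 30.0, 2, 3))
print(cell_index(1000.0, 2000.0, 30.0, 1105.0, 2075.0))
```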
43. TIN (Triangulated Irregular Network)
It is a representation for geographic fields that can be
considered a hybrid between tessellations and vector
representations.
A TIN is a vector representation
It is a commonly used data structure in GIS software
It is one of the standard implementation techniques
for digital terrain models, but it can be used to
represent any continuous field.
The principles behind a TIN are simple. It is built from
a set of locations for which we have a measurement,
for instance an elevation.
45. Delaunay triangulation
A Delaunay triangulation is an optimal triangulation with two properties.
The first property is that the triangles are as equilateral (‘equal-sided’)
as they can be, given the set of anchor points.
The second property is that for each triangle, the
circumcircle through its three anchor points does not
contain any other anchor point.
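The second property can be checked directly. The sketch below uses the standard determinant form of the in-circle test; the function names and the four sample points are illustrative assumptions, and the code only verifies a given triangulation rather than constructing one.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    if orientation(a, b, c) < 0:
        det = -det  # the sign rule assumes counter-clockwise vertex order
    return det > 0

def is_delaunay(triangles, points):
    """Check that no triangle's circumcircle contains another anchor point."""
    for tri in triangles:
        for p in points:
            if p not in tri and in_circumcircle(tri[0], tri[1], tri[2], p):
                return False
    return True

# Four anchor points forming a wide diamond, triangulated in two ways.
A, B, C, D = (0, 0), (4, 0), (2, 1), (2, -1)
points = [A, B, C, D]
short_diagonal = [(A, D, C), (D, B, C)]  # split along C-D
long_diagonal = [(A, B, C), (A, D, B)]   # split along A-B

print(is_delaunay(short_diagonal, points))
print(is_delaunay(long_diagonal, points))
```

Splitting the diamond along its short diagonal gives fatter triangles and passes the check; splitting along the long diagonal gives thin triangles and fails it.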
46. Point representations
Points are defined as single coordinate pairs (x, y) in
2D, or coordinate triplets (x, y, z) in 3D.
Points are used to represent objects that are best
described as shape- and size-less, zero-dimensional
features.
47. Line representations
Line data are used to represent one-dimensional objects such as
roads, railroads, canals, rivers and power lines.
The two end nodes and zero or more internal nodes or vertices
define a line.
Other terms for ‘line’ that are commonly used in some GISs are
polyline, arc or edge.
A node or vertex is like a point but it only serves to define the
line, and provide shape in order to obtain a better approximation
of the actual feature.
The straight parts of a line between two consecutive vertices or
end nodes are called line segments.
Many GISs store a line as a simple sequence of coordinates of its
end nodes and vertices, assuming that all its segments are
straight.
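A line stored as a simple coordinate sequence can be processed segment by segment. The sketch below computes the length of such a line as the sum of its straight segments; the function name and the sample coordinates are assumptions for illustration.

```python
import math

def polyline_length(coords):
    """Length of a line given as [end node, vertices..., end node]."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )

# Hypothetical road: two end nodes and one internal vertex.
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(len(road) - 1, "line segments")
print(polyline_length(road))
```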
48. Area representations
Areas are represented by their boundaries.
Each boundary is a cyclic sequence of line features.
Each line is a sequence of two end nodes, with
zero or more vertices in between.
An area feature is represented by a collection of arc/node
structures that determine a polygon as the area’s
boundary.
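Because an area is defined by its boundary, its size can be computed from the boundary coordinates alone. The sketch below uses the shoelace formula on a single closed ring of vertices; the function name is an assumption, and holes or multi-part areas are not handled.

```python
def polygon_area(ring):
    """Area enclosed by a cyclic sequence of (x, y) boundary vertices.

    The closing edge from the last vertex back to the first is implied.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(parcel))
```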
49. Topology and spatial relationships
Topology refers to the spatial relationships between
geographical elements in a data set that do not change
under a continuous transformation.
After such a transformation, area E is still inside area D, the
neighbourhood relationships between A, B, C, D, and E stay intact, and
their boundaries have the same start and end nodes.
The areas are still bounded by the same boundaries; only
the shapes and lengths of their perimeters have
changed.
50. Topological relationships
Topological relationships are built from simple
elements into more complex elements: nodes define
line segments, and line segments connect to define
lines, which in turn define polygons.
The mathematical properties of the geometric space
used for spatial data can be described as follows:
The space is a three-dimensional Euclidean space where
for every point we can determine its three-
dimensional coordinates as a triple (x, y, z) of
real numbers
51. The space is a metric space: we can always compute the
distance between two points according to a given
distance function.
The space is a topological space: for every point in the
space we can find a neighbourhood around it that fully
belongs to that space as well.
Interior and boundary are properties of spatial features
that remain invariant under topological mappings.
52. We can define within the topological space, features
that are easy to handle and that can be used as
representations of geographic objects.
These features are called simplices as they are the
simplest geometric shapes of some dimension: point
(0-simplex), line segment (1-simplex), triangle (2-
simplex), and tetrahedron (3-simplex).
When we combine various simplices into a single
feature, we obtain a simplicial complex
53. The topology of two dimensions
Topological properties of interior and boundary can be
used to define relationships between spatial features.
We can define the interior of a region R as the largest
set of points of R for which we can construct a disk-like
environment around each point that also falls completely
inside R.
The boundary of R is the set of those points belonging
to R but that do not belong to the interior of R, i.e. one
cannot construct a disk-like environment around such
points that still belongs to R completely.
54. Set theory
Consider a spatial region A. It has a boundary and an
interior, both seen as (infinite) sets of points, and
which are denoted by boundary(A) and interior(A),
respectively.
We consider all possible combinations of
intersections (∩) between the boundary and the
interior of A with those of another region B, and test
whether they are the empty set (∅) or not.
From these intersection patterns, we can derive eight
(mutually exclusive) spatial relationships between two
regions.
55. A meets B =
interior(A) ∩ interior(B) = ∅ ∧
boundary(A) ∩ boundary(B) ≠ ∅ ∧
interior(A) ∩ boundary(B) = ∅ ∧
boundary(A) ∩ interior(B) = ∅.
These relationships can be used in queries against a
spatial database, and represent the ‘building blocks’ of
more complex spatial queries.
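The intersection patterns can be turned into a small lookup. This sketch makes several assumptions: finite sets of lattice points stand in for the infinite point sets of real regions; the relationship names other than 'meets' follow common usage and are not listed in the slides; and all function names are illustrative. Two regions meet when only their boundaries intersect.

```python
# Pattern order: (interior(A) ∩ interior(B), boundary(A) ∩ boundary(B),
#                 interior(A) ∩ boundary(B), boundary(A) ∩ interior(B))
# True means the intersection is NOT empty.
RELATIONS = {
    (False, False, False, False): "disjoint",
    (False, True, False, False): "meets",
    (True, True, False, False): "equals",
    (True, False, False, True): "inside",
    (True, True, False, True): "covered by",
    (True, False, True, False): "contains",
    (True, True, True, False): "covers",
    (True, True, True, True): "overlaps",
}

def lattice_square(x0, y0, x1, y1):
    """Return (boundary, interior) point sets of a square on an integer lattice."""
    points = {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}
    interior = {(x, y) for (x, y) in points if x0 < x < x1 and y0 < y < y1}
    return points - interior, interior

def classify(region_a, region_b):
    """Name the relationship 'A <relation> B' from the four intersections."""
    boundary_a, interior_a = region_a
    boundary_b, interior_b = region_b
    pattern = (
        bool(interior_a & interior_b),
        bool(boundary_a & boundary_b),
        bool(interior_a & boundary_b),
        bool(boundary_a & interior_b),
    )
    return RELATIONS.get(pattern, "undefined")

print(classify(lattice_square(0, 0, 2, 2), lattice_square(2, 0, 4, 2)))
print(classify(lattice_square(1, 1, 3, 3), lattice_square(0, 0, 4, 4)))
```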
56. The five rules of topological consistency in
two-dimensional space
57. Scale and resolution
Map scale can be defined as the ratio between the
distance on a paper map and the distance of the same
stretch in the terrain.
A 1:50,000 scale map means that 1 cm on the map
represents 50,000 cm, i.e. 500 m, in the terrain.
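The scale arithmetic can be written as two small helper functions. The function names are assumptions; map distances are in centimetres and ground distances in metres.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Ground distance in metres for a distance measured on a 1:N map."""
    return map_distance_cm * scale_denominator / 100.0

def map_distance_cm(ground_m, scale_denominator):
    """Distance on a 1:N map, in centimetres, for a ground distance in metres."""
    return ground_m * 100.0 / scale_denominator

print(ground_distance_m(1, 50000))    # 1 cm on a 1:50,000 map
print(map_distance_cm(1250, 50000))   # 1250 m drawn on the same map
```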
When applied to spatial data, the term resolution is
commonly associated with the cell width of the
tessellation applied
58. Representations of geographic fields
A geographic field can be represented through a
tessellation, through a TIN or through a vector
representation.
Raster representation of a field
Raster represents a continuous field like elevation.
A raster can be thought of as a long list of field values
59. Vector representation of a field
This technique uses isolines of the field.
An isoline is a linear feature that connects the points
with equal field value.
When the field is elevation, we also speak of contour
lines.
Both TINs and isoline representations use vectors.
60. Representation of geographic
objects
Geographic objects are naturally represented with vectors. All objects
are identified by the parameters of location, shape, size and
orientation, and many of these parameters can be
expressed in terms of vectors.
However, tessellations are still commonly used for
representing geographic objects.
61. Organizing and managing spatial
data
The main principle of data organization applied in a GIS
is that of a spatial data layer.
A spatial data layer is either a representation of a
continuous or discrete field, or a collection of objects
62. The temporal dimension
Geographic phenomena are dynamic: they change over
time.
Some phenomena change slowly, while others
change very rapidly.
The temporal dimension is of a continuous nature.
Therefore in order to represent it in a computer, we have to
‘discretize’ the time dimension.
Spatiotemporal data models are ways of organizing
representations of space and time in a GIS.
The most common technique is a ‘snapshot’ state that
represents a single point in time of an ongoing natural or
man-made process.
We may store a series of these snapshot states to represent
change
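A series of snapshot states can be sketched as a time-ordered list that is searched for the most recent state. The class name, method names and the land use values below are assumptions for illustration, not a description of any particular GIS.

```python
from bisect import bisect_right

class SnapshotSeries:
    """A time-ordered series of snapshot states (discretized time)."""

    def __init__(self):
        self._times = []
        self._states = []

    def add(self, time, state):
        """Store a snapshot, keeping the series sorted by time."""
        i = bisect_right(self._times, time)
        self._times.insert(i, time)
        self._states.insert(i, state)

    def state_at(self, time):
        """Return the most recent snapshot at or before the given time."""
        i = bisect_right(self._times, time)
        if i == 0:
            return None  # no snapshot exists that early
        return self._states[i - 1]

land_use = SnapshotSeries()
land_use.add(2010, "farmland")
land_use.add(2000, "forest")
land_use.add(2020, "urban")
print(land_use.state_at(2015))
```

Between two snapshots the model knows nothing about intermediate change; it simply reports the last stored state.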
63. Different ‘concepts’ of time
Discrete and continuous time: Time can be
measured along a discrete or continuous scale.
Discrete time is composed of discrete elements
(seconds, minutes, hours, days, months, or years). In
continuous time, for any two different points in time,
there is always another point in between.
Valid time and transaction time: Valid time is the
time when an event really happened, or a string of
events took place. Transaction time (or database time)
is the time when the event was stored in the database
or GIS
64. Linear, branching and cyclic time: Time can be
considered to be linear, extending from the past to the
present (‘now’), and into the future. This view gives a
single time line.
Branching time, in which different time lines from a
certain point in time onwards are possible, and
cyclic time, in which repeating cycles such as seasons
or days of a week are recognized, are also useful
in some applications.
Time granularity: Granularity is the precision of a
time value in a GIS or database.
Absolute and relative time: Absolute time marks a
point on the time line where events happen.
Relative time is indicated relative to other points in
time.