Introduction To GIS
1.0 Introduction
A Geographic Information System (GIS) is a computer-based information system used to digitally represent
and analyse the geographic features present on the Earth's surface and the events (non-spatial attributes
linked to the geography under study) that take place on it. To represent features digitally means to
convert them from analog form (a smooth line) into digital form.
"Every object present on the Earth can be geo-referenced", is the fundamental key of associating any
database to GIS. Here, the term 'database' means a collection of information about things and their relationship to
each other, and 'geo-referencing' refers to the location of a layer or coverage in space as defined by the
coordinate referencing system.
Work on GIS began in the late 1950s, but the first GIS software came only in the late 1970s from the lab of ESRI.
Canada was the pioneer in the development of GIS as a result of innovations dating back to the early 1960s.
Much of the credit for the early development of GIS goes to Roger Tomlinson. The evolution of GIS has
transformed and revolutionized the ways in which planners, engineers, managers and others conduct
database management and analysis.
• Spatial Analysis
• Database
• Software
• Hardware
GIS involves complete understanding about patterns, space, and processes or methodology needed
to approach a problem. It is a tool acting as a means to attain certain objective quickly and efficiently.
Its applicability is realized when the user fully understands the overall spatial concept under which a
particular GIS is established and analyses his specific application in the light of those established
parameters.
Before a GIS is implemented, its objectives, both immediate and long term, have to
be considered. Since the effectiveness and efficiency (i.e. benefit against cost) of the GIS will depend
largely on the quality of the initial field data captured, an organizational design has to be decided upon to
maintain this data continuously. This initial data capture is the most important step.
1. 70% of the information has geographic location as its denominator, making spatial analysis an
essential tool.
2. Ability to assimilate divergent sources of data, both spatial and non-spatial (attribute data).
3. Visualization Impact
4. Analytical Capability
5. Sharing of Information
More deliberately, we see the need of GIS because of its capabilities to answer fundamental questions.
They are related to Location, Condition, Trends, patterns, Modeling, Aspatial questions, Spatial questions.
There are five types of questions that a sophisticated GIS can answer:
The first of these questions seeks to find out what exists at a particular location. A location can be
described in many ways, using, for example place name, post code, or geographic reference such as
longitude/latitude or x/y.
The second question is the converse of the first and requires spatial data to answer. Instead of identifying
what exists at a given location, one may wish to find location(s) where certain conditions are satisfied
(e.g., an unforested section of at least 2000 square meters in size, within 100 meters of a road, and with
soils suitable for supporting buildings).
The third question might involve both the first two and seeks to find the differences (e.g. in land use or
elevation) over time.
This question is more sophisticated. One might ask this question to determine whether landslides are
mostly occurring near streams. It might be just as important to know how many anomalies there are that
do not fit the pattern and where they are located.
"What if…" questions are posed to determine what happens, for example, if a new road is added to a
network or if a toxic substance seeps into the local ground water supply. Answering this type of question
requires both geographic and other information (as well as specific models). GIS permits spatial
operation.
Aspatial Questions
"What's the average number of people working with GIS in each location?" is an aspatial question - the
answer to which does not require the stored value of latitude and longitude; nor does it describe where
the places are in relation with each other.
Spatial Questions
"How many people work with GIS in the major centres of Kathmandu?", "Which centres lie within 10
km of each other?", or "What is the shortest route passing through all these centres?" are
spatial questions that can only be answered using latitude and longitude data and other information such
as the radius of the Earth. Geographic Information Systems can answer such questions.
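As a rough illustration of why such questions need latitude, longitude and the Earth's radius, the following sketch computes great-circle distances with the haversine formula and lists the centres lying within a given distance of each other. The centre names and coordinates are hypothetical, and a mean Earth radius of 6371 km is assumed; a real GIS would normally use a proper geodetic library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical centres with illustrative (latitude, longitude) values
centres = {"A": (27.70, 85.32), "B": (27.67, 85.43), "C": (28.21, 83.99)}


def pairs_within(centres, max_km):
    """Return all pairs of centre names that lie within max_km of each other."""
    names = sorted(centres)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if haversine_km(*centres[p], *centres[q]) <= max_km]
```

The aspatial question about average staff numbers needs none of this: it could be answered from the attribute table alone.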
GIS technology integrates common database operations such as query and statistical analysis with the
unique visualization and geographic analysis benefits offered by maps. These abilities distinguish GIS
from other information systems and make it valuable to a wide range of public and private enterprises for
explaining events, predicting outcomes, and planning strategies. (ESRI)
A Geographic Information System is a computer-based system used to digitally reproduce and
analyse the features present on the Earth's surface and the events that take place on it. Given
that almost 70% of data has a geographical reference as its denominator, it becomes imperative to
underline the importance of a system which can represent the given data geographically.
A typical GIS can be understood with the help of the definitions given below:
A geographic information system (GIS) is a computer-based tool for mapping and analyzing things that
exist and events that happen on Earth.
• Burrough in 1986 defined GIS as a "set of tools for collecting, storing, retrieving at will,
transforming and displaying spatial data from the real world for a particular set of purposes".
• Aronoff in 1989 defined GIS as "a computer based system that provides four sets of capabilities to
handle geo-referenced data":
  • data input
  • data management (data storage and retrieval)
  • manipulation and analysis
  • data output.
This shows that a GIS is looked upon as a tool to assist in decision-making and in the management of attributes
that need to be analyzed spatially.
Some of the important GI systems were put forward by ESRI for mapping and spatial analysis. ESRI
USA launched GIS/ARCINFO at about the same time that Intergraph’s Geomedia package came into
existence for GIS analysis. GIS systems were first used in the armed forces, where they assisted in depicting
the deployment of forces on a thematic map as an information layer.
Data capture
Data used in GIS come from many sources, are of many types and are stored in different ways. A GIS
provides tools and methods for the integration of data into a format so that data can be compared and
analyzed. Data sources are mainly manual digitization/scanning of aerial photographs, paper maps and
existing digital data. Remote-sensing satellite imagery and GPS are also data input sources.
Spatial data editing involves the following functions, which are used when editing the captured data:
• Geometric transformations help to obtain data with the correct world geometry from an original
hard-copy source through digitizing. These operators transform device coordinates
(coordinates from digitizing tablets or screen coordinates) into world coordinates (geographic
coordinates, metres, etc.).
• Map projections provide the means to map geographic coordinates onto a flat surface (for map
production), and vice versa.
• Edge matching is the process of joining two or more map sheets, for instance after they have
been digitized separately. At the map sheet edges, feature representations have to be matched
so that they can be combined.
• Graphic element editing allows digitized features to be changed so as to correct errors and to
prepare a clean data set for topology building.
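The geometric transformation from device coordinates to world coordinates can be sketched as an affine transformation fitted to control points. The example below is only one possible approach, assuming exactly three non-collinear control points whose device and world coordinates are both known; the function names and the sample coordinates are hypothetical, and production software typically uses more control points with a least-squares fit.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def _solve3(m, rhs):
    """Solve a 3 x 3 linear system with Cramer's rule."""
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(_det3(replaced) / det)
    return solution


def fit_affine(device_pts, world_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from three control points."""
    m = [[x, y, 1.0] for x, y in device_pts]
    a, b, c = _solve3(m, [wx for wx, _ in world_pts])
    d, e, f = _solve3(m, [wy for _, wy in world_pts])
    return (a, b, c, d, e, f)


def to_world(params, x, y):
    """Apply the fitted transformation to one device coordinate."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: digitizer units -> metres in some map grid
device = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
world = [(500000.0, 3000000.0), (501000.0, 3000000.0), (500000.0, 3001000.0)]
params = fit_affine(device, world)
```

With these sample points, each digitizer unit corresponds to 10 m, so `to_world(params, 50, 50)` lands in the middle of the digitized square.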
Data management
After data are collected and integrated, a GIS provides facilities to store and maintain them.
Effective data management includes the following aspects: data security, data integrity, data storage and
retrieval, and data maintenance.
Figure showing the spatial operation
Spatial analysis
Spatial analysis is the most important function of a GIS, and it is what makes a GIS distinct from other systems such as
computer aided design and drafting (CADD). Spatial analysis provides functions such as spatial
interpolation, buffering and overlay operations. The main analysis functions found in a GIS
are:
• Search functions allow the retrieval of features that fall within a given search window
(this window may be a rectangle, circle, or polygon).
• Buffer zone generation (or buffering) is one of the best known neighborhood functions. It
determines a spatial envelope (buffer) around (a) given feature(s). The created buffer may
have a fixed width, or a variable width that depends on characteristics of the area.
• Interpolation functions predict unknown values using the known values at nearby
locations. This typically occurs for continuous fields, like elevation, when the data actually
stored does not provide the direct answer for the location(s) of interest.
• Connectivity functions accumulate values as they traverse over a feature or over a set of
features.
• Contiguity functions evaluate a characteristic of a set of connected spatial units. One can think of
the search for a contiguous area of forest of certain size and shape in a satellite image.
• Network analytic functions are used to compute over connected line features that make up a
network. The network may consist of roads, public transport routes, high voltage lines or other forms
of transportation infrastructure. Analysis of such networks may entail shortest path computations (in
terms of distance or travel time) between two points in a network for routing purposes. Other forms
are to find all points reachable within a given distance or duration from a start point for allocation
purposes, or determination of the capacity of the network for transportation between an indicated
source location and sink location.
• Visibility functions also fit in this list, as they are used to compute
the points visible from a given location (viewshed modeling or viewshed mapping) using a digital
terrain model.
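To make the network analytic functions listed above more concrete, here is a minimal sketch of a shortest path computation using Dijkstra's algorithm. The algorithm choice, the small road network and its edge costs (which could stand for distance or travel time) are all assumptions for illustration; the text does not prescribe a particular method.

```python
import heapq


def build_undirected(edges):
    """Build an adjacency list from (node_a, node_b, cost) tuples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def shortest_path(graph, source, target):
    """Return (total_cost, path) from source to target, or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbour, cost in graph.get(node, []):
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                prev[neighbour] = node
                heapq.heappush(heap, (new_dist, neighbour))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical road network; costs might be kilometres or minutes
roads = build_undirected([
    ("A", "B", 4), ("A", "C", 2), ("C", "B", 1), ("B", "D", 5), ("C", "D", 8),
])
```

The same `dist` dictionary, computed without the early exit, would give all points reachable within a given distance, which is the allocation use mentioned above.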
Presenting results
One of the most exciting aspects of GIS is the variety of ways in which information can be presented
once it has been processed. Traditional methods of tabulating and graphing data can be
supplemented by maps and three-dimensional images. These capabilities have given rise to new
fields such as exploratory cartography and scientific visualization. Visual presentation is one of the
most remarkable capabilities of GIS that allows for effective communication of results.
1.5 Components of GIS
A GIS consists of five key components:
• Hardware
• Software
• Data
• People
• Method
Hardware
It consists of the computer system on which the GIS software will run. The choice of
hardware system ranges from 300 MHz personal computers to supercomputers with
teraFLOPS capability. The computer forms the backbone of the GIS hardware, which gets
its input through a scanner or a digitizer board. A scanner converts a picture into a digital
image for further processing. The output of a scanner can be stored in many formats, e.g. TIFF,
BMP, JPG etc. A digitizer board is a flat board used for vectorisation of given map objects.
Printers and plotters are the most common output devices for a GIS hardware setup.
Software
GIS software provides the functions and tools needed to store, analyze, and display
geographic information. GIS software packages in use include MapInfo, ARC/Info, AutoCAD Map, etc. The
available software can be said to be application specific. When low-cost GIS work is to be
carried out, desktop MapInfo is a suitable option. It is easy to use and supports many GIS
features. If the user intends to carry out extensive analysis on GIS, ARC/Info is the preferred
option. For people using AutoCAD and willing to step into GIS, AutoCAD Map is a good
option.
Data
Geographic data and related tabular data can be collected in-house or purchased from a
commercial data provider. The digital map forms the basic data input for GIS. Tabular data
related to the map objects can also be attached to the digital data. A GIS will integrate spatial
data with other data resources and can even use a DBMS, as used by most organizations to
maintain their data, to manage spatial data.
People
GIS users range from technical specialists who design and maintain the system to those who
use it to help them perform their everyday work. The people who use GIS can be broadly
classified into two classes. The first is the CAD/GIS operator, whose work is to vectorise the map
objects. The second is the GIS engineer/user, who is responsible for using this vectorised data
to perform query, analysis or any other work.
Method
Above all, a successful GIS operates according to a well-designed plan and business
rules, which are the models and operating practices unique to each organization. There are
various techniques used for map creation and further usage in any project. Map creation
can either use an automated raster-to-vector converter or be done manually by vectorising
scanned images. The source of these digital maps can be either maps prepared by a survey
agency or satellite imagery.
Spatial features in a GIS database are stored in either vector or raster form. GIS data structures
adhering to a vector format store the position of map features as pairs of x, y (and sometimes z)
coordinates. A point is described by a single x-y coordinate pair and by its name or label. A line is
described by a set of coordinate pairs and by its name or label. In theory, a line is described by
an infinite number of points. In practice, this is not feasible. Therefore, a line is built up of straight-
line segments. An area, also called a polygon, is described by a set of coordinate pairs and by its
name or label with the difference that the coordinate pairs at the beginning and end are the same
(Figure 3.2).
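The vector representation described above can be sketched as three small data structures. This is a minimal illustration, not the storage format of any particular GIS package; the class and method names are hypothetical. The polygon enforces the rule that the first and last coordinate pairs are the same.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]


@dataclass
class PointFeature:
    """A single x-y coordinate pair plus a label."""
    label: str
    xy: Coord


@dataclass
class LineFeature:
    """A line built up of straight segments between coordinate pairs."""
    label: str
    coords: List[Coord]

    def length(self) -> float:
        return sum(math.dist(p, q) for p, q in zip(self.coords, self.coords[1:]))


@dataclass
class PolygonFeature:
    """An area whose boundary ring starts and ends at the same coordinate pair."""
    label: str
    ring: List[Coord]

    def __post_init__(self):
        if len(self.ring) < 4 or self.ring[0] != self.ring[-1]:
            raise ValueError("polygon ring must be closed (first == last)")

    def area(self) -> float:
        # Shoelace formula over consecutive vertex pairs
        twice = sum(x1 * y2 - x2 * y1
                    for (x1, y1), (x2, y2) in zip(self.ring, self.ring[1:]))
        return abs(twice) / 2


museum = PointFeature("museum", (2.0, 1.0))
road = LineFeature("road", [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
park = PolygonFeature("park", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)])
```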
• Point representation
Points are used to represent objects that are best described as shape- and sizeless, single-
locality features. Whether this is the case really depends on the purposes of the spatial
application and also on the spatial extent of the objects compared to the scale applied in the
application. For a tourist city map, parks will not usually be considered as point features, but
perhaps museums will be, and certainly public phone booths could be represented as point
features. Besides the georeference, usually extra data is stored for each point object.
• Line representations
Line data are used to represent one-dimensional objects such as roads, railroads, canals, rivers
and power lines. Again, there is an issue of relevance for the application and the scale that the
application requires. For the example application of mapping tourist information, bus, subway and
streetcar routes are likely to be relevant line features. Some cadastral systems, on the other
hand, may consider roads to be two-dimensional features, i.e. having a width as well.
• Area representations
When area objects are stored using a vector approach, the usual technique is to apply a
boundary model. This means that each area feature is represented by some arc/node structure
that determines a polygon as the area’s boundary. Common sense dictates that area features of
the same kind are best stored in a single data layer, represented by mutually non-overlapping
polygons. In essence, what we then get is an application-determined (i.e. adaptive) partition of
space.
A commonly used data structure in GIS software is the triangulated irregular network, or TIN. It is
one of the standard implementation techniques for digital terrain models, but it can be used to
represent any continuous field. The principles behind a TIN are simple. It is built from a set of
locations for which we have a measurement, for instance an elevation. The locations can be
arbitrarily scattered in space, and are usually not on a nice regular grid. Any location together with
its elevation value can be viewed as a point in three-dimensional space. From these 3D points,
we can construct an irregular tessellation made of triangles.
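Once the triangles exist, an elevation for any location inside a triangle can be estimated from its three corner points. The sketch below assumes linear (planar) interpolation using barycentric weights, which is a common choice but is not stated in the text; the function name and the sample triangle are hypothetical, and building the triangulation itself is not shown.

```python
def interpolate_in_triangle(p, a, b, c):
    """Estimate z at p = (x, y) inside the triangle with vertices a, b, c = (x, y, z).

    Returns None when p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of p with respect to the three vertices
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle: three surveyed points with elevations in metres
tri = ((0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0))
```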
If the rows and columns are numbered, the position of each element can be specified by using
column number and row number. These can be linked to coordinate positions through the
introduction of a coordinate system. Each cell has an attribute value (a number) that represents a
geographic phenomenon or nominal data such as land-use class, rainfall or elevation. The
fineness of the grid (in other words, the size of the cells in the grid matrix) will determine the level
of detail in which map features can be represented. There are advantages to the raster format for
storing and processing some types of data in GIS. The vector-raster relationship is shown in
Figure 3.4.
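The link between row and column numbers and coordinate positions can be sketched as follows. The example assumes a grid anchored at its upper-left corner with rows numbered downward, which is a common convention but only one of several; the class name, extent and land-use codes are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RasterGrid:
    x_min: float              # world x of the left edge (assumed origin)
    y_max: float              # world y of the top edge (assumed origin)
    cell_size: float          # width and height of one square cell
    values: List[List[int]]   # values[row][col], row 0 at the top

    def cell_at(self, x, y):
        """Return (row, col) of the cell containing world position (x, y)."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("position lies outside the raster")
        return row, col

    def cell_centre(self, row, col):
        """Return the world coordinates of the centre of cell (row, col)."""
        return (self.x_min + (col + 0.5) * self.cell_size,
                self.y_max - (row + 0.5) * self.cell_size)

    def value_at(self, x, y):
        row, col = self.cell_at(x, y)
        return self.values[row][col]


# Hypothetical 3 x 4 land-use raster with 10 m cells
grid = RasterGrid(x_min=1000.0, y_max=2000.0, cell_size=10.0,
                  values=[[1, 1, 2, 2],
                          [1, 3, 3, 2],
                          [4, 4, 3, 2]])
```

Halving `cell_size` would quadruple the number of cells needed to cover the same area, which is the trade-off behind the level of detail mentioned above.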
• Regular tessellations
A tessellation (or tiling) is a partition of space into mutually exclusive cells that together make up
the complete study space. With each cell, some (thematic) value is associated to characterize
that part of space. In a regular tessellation, the cells are the same shape and size. The simplest
example is a rectangular raster of unit squares, represented in a computer in the 2D case as an
array of n × m elements.
All regular tessellations have in common that the cells are of the same shape and size, and that
the field attribute value assigned to a cell is associated with the entire area occupied by the cell.
The square cell tessellation is by far the most commonly used, mainly because georeferencing a
cell is so straightforward. These tessellations are known under various names in different GIS
packages: raster or raster map. The size of the area that a single raster cell represents is called
the raster’s resolution. Sometimes, the word
grid is also used, but strictly speaking, a grid is an equally spaced collection of points, which all
have some attribute value assigned. Grids are often used for discrete measurements that occur
at regular intervals. Grid points are often considered synonymous with raster cells.
Our finite approximation of the study space leads to some forms of interpolation that must be
dealt with. The field value of a cell can be interpreted as one for the complete tessellation cell, in
which case the field is discrete, not continuous or even differentiable. Some convention is needed
to state which value prevails on cell boundaries; with square cells, this convention often says that
lower and left boundaries belong to the cell. To improve on this continuity issue, we can do two
things:
• make the cell size smaller, so as to make the ‘continuity gaps’ between the cells smaller, and/or
• assume that a cell value only represents elevation for one specific location in the cell, and
provide a good interpolation function with the continuity characteristic for all other locations.
Usually, if one wants to use rasters for continuous field representation, one does the first but not
the second. The second technique is usually considered too computationally costly for large
rasters. The location associated with a raster cell is fixed by convention, and may be the cell
centroid (mid-point) or, for instance, its left lower corner. Values for other positions than these
must be computed through some form of interpolation function, which will use one or more nearby
field values to compute the value at the requested position. This allows us to represent
continuous, even differentiable, functions.
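One possible interpolation function of this kind is bilinear interpolation between the four nearest cell centres, sketched below. The text does not name a specific method, so this choice is an assumption, as are the function name and the local coordinate frame (origin at a grid corner, cell values located at cell centres, y increasing with the row number). Note that bilinear interpolation is continuous but not differentiable across cell-centre lines; smoother functions exist.

```python
import math


def bilinear(values, cell_size, x, y):
    """Interpolate a field value at (x, y) from values stored at cell centres.

    values[row][col] is located at ((col + 0.5) * cell_size, (row + 0.5) * cell_size).
    Positions outside the outermost centres are clamped to the nearest centre line.
    """
    n_rows, n_cols = len(values), len(values[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least a 2 x 2 raster")
    # Convert to fractional centre indices and clamp to the valid range
    gx = min(max(x / cell_size - 0.5, 0.0), n_cols - 1)
    gy = min(max(y / cell_size - 0.5, 0.0), n_rows - 1)
    c0 = min(math.floor(gx), n_cols - 2)
    r0 = min(math.floor(gy), n_rows - 2)
    tx, ty = gx - c0, gy - r0
    # Interpolate along x on the two bracketing rows, then along y
    row_a = values[r0][c0] * (1 - tx) + values[r0][c0 + 1] * tx
    row_b = values[r0 + 1][c0] * (1 - tx) + values[r0 + 1][c0 + 1] * tx
    return row_a * (1 - ty) + row_b * ty


# Hypothetical 2 x 2 elevation raster with unit cells
elevation = [[10.0, 20.0],
             [30.0, 40.0]]
```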
• Irregular tessellations
Above, we discussed that regular tessellations provide simple structures with straightforward
algorithms, which are, however, not adaptive to the phenomena they represent. This is why
substantial research effort has also been put into irregular tessellations. Again, these are
partitions of space into mutually disjoint cells, but now the cells may vary in size and shape,
allowing them to adapt to the spatial phenomena that they represent. We discuss here only one
type, namely the region quadtree, but we point out that many more structures have been
proposed in the literature, and have also been implemented. Irregular tessellations are more
complex than the regular ones, but they are also more adaptive, which typically leads to a
reduction in the amount of memory used to store the data.
A well-known data structure in this family—upon which many more variations have been based—
is the region quadtree. It is based on a regular tessellation of square cells, but takes advantage of
cases where neighboring cells have the same field value, so that they can together be
represented as one bigger cell.
A simple illustration is provided in Figure 3.4.1. It shows a small 8 × 8 raster with three possible
field values: white, green and blue. The quadtree that represents this raster is constructed by
repeatedly splitting up the area into four quadrants, which are called NW, NE, SE, SW for obvious
reasons. This procedure stops when all the cells in a quadrant have the same field value. The
procedure produces an upside-down, tree-like structure, known as a quadtree. In main memory,
the nodes of a quadtree (both circles and squares in the figure below) are represented as
records. The links between them are pointers, a programming technique to address (i.e. to point
to) other records. Quadtrees are adaptive because they apply the spatial autocorrelation
principle, i.e. locations that are near in space are likely to have similar field values.
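The splitting procedure can be sketched as a short recursive function. For brevity the example uses a hypothetical 4 × 4 raster rather than the 8 × 8 raster of the figure, assumes a square raster whose side is a power of two, and represents internal nodes as dictionaries instead of pointer-linked records.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return a leaf value if the block is uniform, else a dict of four quadrants."""
    if size is None:
        size = len(raster)
    first = raster[row][col]
    uniform = all(raster[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first  # leaf: all cells in this quadrant share one field value
    half = size // 2
    return {
        "NW": build_quadtree(raster, row, col, half),
        "NE": build_quadtree(raster, row, col + half, half),
        "SE": build_quadtree(raster, row + half, col + half, half),
        "SW": build_quadtree(raster, row + half, col, half),
    }


def count_leaves(node):
    """Count the leaves, i.e. the number of stored field values."""
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node.values())


# Hypothetical raster: W = white, G = green, B = blue
raster = [
    ["W", "W", "G", "G"],
    ["W", "W", "G", "G"],
    ["W", "W", "B", "B"],
    ["W", "W", "B", "G"],
]
tree = build_quadtree(raster)
```

Here only the south-east quadrant needs to be split further, so the tree stores 7 leaf values instead of 16 cells.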
A relational database presents data as a series of tables that are logically associated
with each other by shared attributes (Figure 3.6). Any data element in a relation can be found
by knowing the table name, the attribute (column) name and the value of the primary key. The
advantage of these systems is that they are flexible and can answer any question formulated with
logical and mathematical operators.
Figure 3.6 relational database management systems
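As a small illustration of tables associated by a shared attribute, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, columns and rows are hypothetical and are not taken from Figure 3.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owner  (owner_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, owner_id INTEGER, area_m2 REAL);
INSERT INTO owner  VALUES (1, 'Owner A'), (2, 'Owner B');
INSERT INTO parcel VALUES (101, 1, 1500.0), (102, 2, 2400.0), (103, 1, 800.0);
""")


def owner_of(parcel_id):
    """Find one data element via table name, column name and primary key value."""
    row = conn.execute(
        "SELECT o.name FROM parcel p JOIN owner o ON p.owner_id = o.owner_id "
        "WHERE p.parcel_id = ?", (parcel_id,)).fetchone()
    return row[0] if row else None


def total_area_by_owner():
    """Combine the two tables through their shared owner_id attribute."""
    return conn.execute(
        "SELECT o.name, SUM(p.area_m2) FROM parcel p "
        "JOIN owner o ON p.owner_id = o.owner_id "
        "GROUP BY o.name ORDER BY o.name").fetchall()
```

In a GIS, the `parcel_id` would typically also be the label of a polygon in the spatial data, which is how attribute tables are tied to map features.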
1.6.4 Metadata
Metadata are simply defined as data about data. They give information about the content,
source, quality, condition and other relevant characteristics of the data (Figure 3.7). For instance,
it may describe the content as road or land-use data, the source as where the data have come
from, the quality as the level of accuracy, the condition as whether the data are outdated or partial
and so on.
Figure 3.7
• Nominal data values provide a name or identifier so that we can discriminate
between different values. They are also called categorical data. (Landuse)
• Ordinal data values can be put in a natural sequence and could be assigned as
‘low’, ‘average’ or ‘high’. (Course satisfaction)
• Interval data values do allow computation, but they know no arithmetic zero value and do
not support multiplication or division. (Temperature)
• Ratio data values do allow computation, know an arithmetic zero value and do allow
multiplication or division. (Population)
The GIS technology is rapidly becoming a standard tool for management of natural resources.
The effective use of large spatial data volumes is dependent upon the existence of an efficient
geographic handling and processing system to transform this data into usable information.
• Planning of project
• Make better decisions
• Visual Analysis
• Improve Organizational Integration
Planning Of Project
The advantage of GIS is often found in the detailed planning of projects with a large spatial
component, where analysis of the problem is a prerequisite at the start of the project.
Thematic maps can be generated from one or more base maps, for example the
generation of a land use map on the basis of soil composition, vegetation and topography.
The unique combination of certain features facilitates the creation of such thematic maps.
With the various modules within a GIS it is possible to calculate surface area, length, width and
distance.
Making Decisions
The adage "better information leads to better decisions" is as true for GIS as it is for other
information systems. A GIS, however, is not an automated decision making system but a tool to
query, analyze, and map data in support of the decision making process. GIS technology has
been used to assist in tasks such as presenting information at planning inquiries, helping resolve
territorial disputes, and siting pylons in such a way as to minimize visual intrusion.
Visual Analysis
Digital Terrain Modeling (DTM) is an important utility of GIS. Using DTM/3D modeling, the landscape
can be better visualized, leading to a better understanding of certain relations in the landscape.
Many relevant calculations, such as (potential) lakes and water volumes, soil erosion volume
(example: landslides), quantities of earth to be moved (channels, dams, roads, embankments,
land leveling) and hydrological modeling, become easier.
Not only in the previously mentioned fields but also in the social sciences GIS can prove
extremely useful. Besides the process of formulating scenarios for an Environmental Impact
Assessment, GIS can be a valuable tool for sociologists to analyze administrative data such as
population distribution, market localization and other related features.
Improving Organizational Integration
Many organizations that have implemented a GIS have found that one of its main benefits is
improved management of their own organization and resources. Because GIS has the ability to
link data sets together by geography, it facilitates interdepartmental information sharing and
communication. By creating a shared database, one department can benefit from the work of
another: data can be collected once and used many times.