
Week 6 - GIS and Spatial Data Mining



Chapter 3: Geodata Data Types

● Understanding geodata data types is key to working efficiently with
geodata.
● In geodata, the choice of data type matters much more, because
converting between types is costly or loses information.
● Transforming polygons of the shapes of countries into points is not a
trivial task, as this would require defining where you’d want to put
each point.
● The natural choice is the “middle” of each polygon, which often
requires quite costly computations.
● The other way around is not possible at all. Once you have a point
dataset with the centers of countries, you can never recover the
countries’ exact boundaries.
● In the first example, you see a map with the contours of the
countries of the world, with black crosses indicating some of the
countries’ center points.
● In the second example, you see the same map, but with the polygons
deleted. You can see clearly that once the polygon information is lost,
you cannot go back to this information.
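● Example 1: [figure: world map with country contours and black
crosses at some center points]
● Example 2: [figure: the same map with the polygons deleted]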
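
The polygon-to-point conversion described above can be sketched in plain Python. This is only a minimal sketch, assuming planar (x, y) coordinates and simple, non-self-intersecting polygons; the function name polygon_centroid and the two shapes are hypothetical and not part of the course material. It uses the area-weighted centroid as one possible definition of the “middle.”

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The centroid formula divides by 6 * area, which is 3 * twice_area
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# Two hypothetical "country" outlines with different boundaries
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
diamond = [(2, 0), (4, 2), (2, 4), (0, 2)]

center_square = polygon_centroid(square)
center_diamond = polygon_centroid(diamond)
print(center_square, center_diamond)
```

Both shapes reduce to the same center point, which illustrates why the boundaries cannot be reconstructed from the points alone.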

Vector vs. Raster Data


● The big split in geodata data types is between vector data and
raster data.
● There is a fundamental difference in how those two are organized.
● Vector data
○ Vector data is data that contains objects with the specific
coordinates of those objects.
○ Those objects are points, lines, and polygons.
○ In the two images introduced earlier, you have seen vector
data.
○ The first dataset is a polygon dataset that contains the spatial
location data of those polygons in a given coordinate system.
○ The second map shows a point dataset: it contains points and
their coordinates.
● Raster Data
○ A raster dataset works differently.
○ Raster data is like a digital image. Just like images, a raster
dataset is split into a huge number of very small squares: these
are pixels. For every pixel, a value is stored.
○ The big difference between vector and raster data is that vector
data stores only objects with their coordinates.
○ The rest of the map is effectively empty and that is not a
problem in vector data. In the real world, this is how many
objects work.
○ If you think about a transportation dataset containing
motorways and railroads, you can understand that most of the
earth is not covered in them. It is much more efficient to just
state where the objects are located (vector approach) than to
state for each pixel in the world whether or not there is a road
present (raster approach).
○ Raster data must have a value in every pixel, making it
particularly useful for representing real-world phenomena
that exist everywhere rather than only at specific locations.
○ One good example is an elevation map: every location on earth
has a specific height. By cutting the world into pixels, you can
assign each pixel the height of its location relative to sea level.
This gives you a great map for relief mapping.
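
The storage difference can be made concrete with a small sketch. The 10 x 5 grid, the single straight road, and the elevation values below are hypothetical and chosen only for illustration.

```python
# Vector approach: the road is stored as the coordinates of its two end points
road = [(0, 2), (9, 2)]  # hypothetical straight road across the area

# Raster approach: every pixel stores a value, 1 where the road passes, 0 elsewhere
width, height = 10, 5
road_raster = [[1 if row == 2 else 0 for _ in range(width)] for row in range(height)]

vector_values = 2 * len(road)   # one x and one y per stored point
raster_values = width * height  # one value per pixel, road or not
road_pixels = sum(sum(row) for row in road_raster)

print(vector_values, raster_values, road_pixels)

# Elevation suits the raster approach: every pixel has a meaningful value
# (hypothetical heights in meters relative to sea level)
elevation = [
    [12, 15, 20, 31],
    [10, 14, 18, 27],
    [8, 11, 16, 22],
]
```

In the road raster most pixels only record that nothing is there, while in the elevation raster every pixel carries information.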

Dealing with Attributes in Vector and Raster


● Coordinates are generally not the only thing that you want to
know about your polygons or your raster.
● Additional information is stored in what we call the attributes of your
dataset.
● As there is a fundamental difference in working with vector and raster
data, it is useful to understand how each of them typically stores
attributes.
● In the case of vector data:
○ You will see a table that contains one row for each object
(point, line, or polygon).
○ In the columns of the table, you will see an ID and the
geographic information containing the shape and coordinates.
○ You can easily add to this any column of your choice with
additional information about this object, like a name, or any
other data that you may have about it: population size of the
country, date of creation and so on.

● For raster data:
○ The storage is generally image-like.
○ Each pixel has a value. It is therefore common to store the data
as a two-dimensional table in which each row represents a
row of pixels, and each column represents a column of pixels.
○ The values in your data table represent the values of one and
only one variable.
○ Working with raster data can be a bit harder to get into, as this
image-like data format is not very accommodating to adding
additional data.
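
A minimal sketch of the two storage layouts, using plain Python structures. The country names, populations, and grid values are invented for illustration, and storing a second variable as a second grid of the same shape is an assumption about one common workaround rather than something stated in the notes.

```python
# Vector: one row per object, with an ID, a geometry, and attribute columns
countries = [
    {"id": 1, "geometry": [(0, 0), (4, 0), (4, 4), (0, 4)],
     "name": "Country A", "population": 1000},
    {"id": 2, "geometry": [(5, 0), (8, 0), (8, 3)],
     "name": "Country B", "population": 250},
]

# Adding an attribute to vector data is just adding a column to each row
for row in countries:
    row["is_large"] = row["population"] > 500

# Raster: a two-dimensional table holding exactly one variable
elevation = [
    [12, 15, 20],
    [10, 14, 18],
]

# A second variable does not fit in the same table; here it is kept as a
# separate grid with the same number of rows and columns (assumption)
temperature = [
    [21, 20, 18],
    [22, 21, 19],
]
same_shape = len(elevation) == len(temperature) and all(
    len(a) == len(b) for a, b in zip(elevation, temperature)
)
print(countries[0]["is_large"], countries[1]["is_large"], same_shape)
```
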

Points
● The simplest data type is probably the point and it is one of the
subtypes of vector data.
● Points are part of vector data, as each point is an object on the map
that has its own coordinates and that can have any number of
attributes necessary.
● Point datasets are great for identifying locations of specific landmarks
or other types of locations.
● Points cannot store anything like the shape or the size of
landmarks, so it is important that you use points only if you do
not need such information.
● In mathematics, a point is generally said to be an exact location that
has no length, width, or thickness.
● A point consists only of one exact location, indicated by one
coordinate pair (be it x and y, or latitude and longitude)
● Another consideration is that if you have a point object, you cannot
tell anything about its size.

Lines
● Line data is the second category of vector data in the world of
geospatial data.
● Lines are the logical next step after points.
● In mathematics, we generally consider straight lines that go from one
point to a second point. Lines have no width, but they do have a
length.
● In geodata, line datasets contain not just one line, but many lines.
● Line segments are straight, and therefore they only need a from
point and a to point.
● This means that a line segment needs two sets of coordinates (one
for the first point and one for the second point).
● Lines consist of multiple line segments, and they can therefore take
different forms, each described by an ordered list of points joined
by straight line segments.
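
As a rough sketch of this structure, a line can be held as an ordered list of points, and its length is the sum of the lengths of its straight segments. The function name line_length and the route below are hypothetical, and planar (x, y) coordinates are assumed; latitude and longitude would first need to be projected.

```python
import math


def line_length(points):
    """Sum of the straight segment lengths between consecutive points."""
    return sum(
        math.hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(points, points[1:])
    )


# Hypothetical route: three points, so two straight segments
route = [(0, 0), (3, 4), (3, 10)]
n_segments = len(route) - 1
total = line_length(route)
print(n_segments, total)
```
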

Polygons
● Polygons are the next step in complexity after points and lines. They
are the third and last category of vector geodata.
● In mathematics, polygons are defined as two-dimensional shapes,
made up of lines that connect to make a closed shape. Examples are
triangles, rectangles, pentagons, etc.
● In geodata, the definition of the polygon is not much different. It is
simply a list of points that together make up a closed shape.
● Polygons are generally a much more realistic representation of the
real world.
● On a very close-up map, a landmark needs to be represented as a
polygon (its contour) to be useful.
● Roads can be represented well by lines (remember that lines have
no width) but have to be replaced by polygons once the map is
zoomed in far enough to see houses, roads, etc.
● Polygons are the data type that has the most information as they are
able to store location (just like points and lines), length (just like
lines), and also area and perimeter.
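
Area and perimeter can both be derived from the list of points alone. The following is a minimal sketch, assuming planar (x, y) coordinates and a simple polygon whose vertices are listed in order; the function names and the rectangle are hypothetical. The area uses the shoelace formula.

```python
import math


def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(ring):
    """Total length of the closed boundary, including the closing edge."""
    n = len(ring)
    return sum(
        math.hypot(
            ring[(i + 1) % n][0] - ring[i][0],
            ring[(i + 1) % n][1] - ring[i][1],
        )
        for i in range(n)
    )


# Hypothetical 4 x 3 rectangle
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
area = polygon_area(rectangle)
perimeter = polygon_perimeter(rectangle)
print(area, perimeter)
```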
