Making Use of OpenStreetMap Data With Python
Andrii V. Mishkovskyi ()
- I love Python
- I love OpenStreetMap
- I do map rendering at CloudMade using Python
- CloudMade uses OpenStreetMap data extensively
Objectives
- Understand the OpenStreetMap data structure
- Learn how to parse it
- Get a feel for how basic GIS services work
OpenStreetMap
- Founded in 2004 as a response to the Ordnance Survey pricing scheme
- >400k registered users
- >16k active mappers
- Supported by Microsoft, MapQuest (AOL), Yahoo!
- Crowd-sourcing at its best
Why OSM?
Storage type
- XML (.osm)
- Protocol buffers (.pbf, in beta status)
- Other formats through 3rd parties (Esri shapefile, Garmin GPX, etc.)
The data
- Each object has geometry, tags and changeset information
- Tags are simply a list of key/value pairs
- Geometry definition differs for different types
- Changeset is not interesting when simply using the data (as opposed to editing)
Data types
- Node: geometric point or point of interest
- Way: collection of points
- Relation: collection of objects of any type
Nodes
<node id="592637238" lat="47.1675211" lon="9.5089882" version="2"
      changeset="6628391" user="phinret" uid="135921"
      timestamp="2010-12-11T19:20:16Z">
  <tag k="amenity" v="bar"/>
  <tag k="name" v="Black Pearl"/>
</node>
Ways
<way id="4781367" version="1" changeset="10226" uid="871" user="murmel"
     timestamp="2007-06-19T06:25:57Z">
  <nd ref="30604007"/>
  <nd ref="30604015"/>
  <nd ref="30604017"/>
  <nd ref="30604019"/>
  <nd ref="30604020"/>
  <tag k="created_by" v="JOSM"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="In den Äusseren"/>
</way>
Relations
<relation id="16239" version="699" changeset="844052" uid="122406"
          user="hanskuster" timestamp="2011-06-14T18:53:49Z">
  <member type="way" ref="75393767" role="outer"/>
  <member type="way" ref="75393837" role="outer"/>
  <member type="way" ref="75393795" role="outer"/>
  ...
  <member type="way" ref="75393788" role="outer"/>
  <tag k="admin_level" v="2"/>
  <tag k="boundary" v="administrative"/>
  <tag k="currency" v="EUR"/>
  <tag k="is_in" v="Europe"/>
  <tag k="ISO3166-1" v="AT"/>
  <tag k="name" v="Österreich"/>
  ...
  <tag k="wikipedia:de" v="Österreich"/>
  <tag k="wikipedia:en" v="Austria"/>
</relation>
Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 13 / 1
Parsing data
Projection
import pyproj

# Spherical Mercator, as used by most web maps
projection = pyproj.Proj(
    '+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 '
    '+x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs')
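For intuition, the forward transform of a spherical Mercator projection can be sketched in pure Python. This is only an illustration of the formula, assuming a sphere whose radius equals the +a/+b value above; the name to_spherical_mercator is made up for this sketch and is not part of pyproj.

```python
import math

EARTH_RADIUS = 6378137.0  # metres; same value as +a and +b in the Proj string


def to_spherical_mercator(lon, lat):
    """Project WGS84 degrees to spherical Mercator metres (hypothetical helper)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Coordinates of the "Black Pearl" node from the Nodes example
x, y = to_spherical_mercator(9.5089882, 47.1675211)
```

The projection is undefined at the poles, which is why web maps usually cut off at roughly 85 degrees of latitude.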
Nodes
from shapely.geometry import Point

class Node(object):

    def __init__(self, id, lonlat, tags):
        self.id = id
        self.geometry = Point(projection(*lonlat))
        self.tags = tags
from xml import sax

class SimpleHandler(sax.handler.ContentHandler):

    def __init__(self):
        sax.handler.ContentHandler.__init__(self)
        self.id = None
        self.geometry = None
        self.tags = None
        self.nodes = {}

    def startElement(self, name, attrs):
        if name == 'node':
            self.id = attrs['id']
            self.tags = {}
            self.geometry = list(map(float, (attrs['lon'], attrs['lat'])))
        elif name == 'tag' and self.tags is not None:
            # Only collect tags while inside a node
            self.tags[attrs['k']] = attrs['v']

    def endElement(self, name):
        if name == 'node':
            self.nodes[self.id] = Node(self.id, self.geometry, self.tags)
            self.id = None
            self.geometry = None
            self.tags = None
Ways
from shapely.geometry import LineString

nodes = {...}  # dict of nodes, keyed by their ids

class Way(object):

    def __init__(self, id, refs, tags):
        self.id = id
        # Each Node keeps its projected position in .geometry (a Point)
        self.geometry = LineString(
            [(nodes[ref].geometry.x, nodes[ref].geometry.y) for ref in refs])
        self.tags = tags
class SimpleHandler(sax.handler.ContentHandler):

    def __init__(self):
        ...
        self.ways = {}

    def startElement(self, name, attrs):
        if name == 'way':
            self.id = attrs['id']
            self.tags = {}
            self.geometry = []
        elif name == 'nd':
            # Ways only reference nodes by id
            self.geometry.append(attrs['ref'])

    def reset(self):
        self.id = None
        self.geometry = None
        self.tags = None

    def endElement(self, name):
        if name == 'way':
            self.ways[self.id] = Way(self.id, self.geometry, self.tags)
            self.reset()
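Putting the node and way slides together, a self-contained variant that runs with the standard library alone might look like the sketch below. It stores plain dicts instead of shapely geometries and skips projection; the class name OsmHandler, the dict layout and the sample document (with made-up ids) are assumptions for illustration.

```python
from xml import sax


class OsmHandler(sax.handler.ContentHandler):
    """Collect nodes and ways as plain dicts (no projection, no shapely)."""

    def __init__(self):
        sax.handler.ContentHandler.__init__(self)
        self.nodes = {}  # id -> {'lonlat': (lon, lat), 'tags': {...}}
        self.ways = {}   # id -> {'refs': [node ids], 'tags': {...}}
        self.current = None

    def startElement(self, name, attrs):
        if name == 'node':
            self.current = {'id': attrs['id'], 'tags': {},
                            'lonlat': (float(attrs['lon']), float(attrs['lat']))}
        elif name == 'way':
            self.current = {'id': attrs['id'], 'tags': {}, 'refs': []}
        elif name == 'nd' and self.current is not None:
            self.current['refs'].append(attrs['ref'])
        elif name == 'tag' and self.current is not None:
            self.current['tags'][attrs['k']] = attrs['v']

    def endElement(self, name):
        if name in ('node', 'way') and self.current is not None:
            target = self.nodes if name == 'node' else self.ways
            target[self.current.pop('id')] = self.current
            self.current = None


# Tiny made-up document, just enough to exercise the handler
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="47.10" lon="9.50"><tag k="amenity" v="bar"/></node>
  <node id="2" lat="47.20" lon="9.60"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""

handler = OsmHandler()
sax.parseString(SAMPLE, handler)
```

Because SAX only keeps the element currently being read, memory use is driven by the size of the nodes and ways dicts rather than by the size of the XML file.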
Relations
from shapely.geometry import MultiPolygon, MultiLineString

ways = {...}  # dict of ways, with ids as keys

class Relation(object):

    def __init__(self, id, members, tags):
        self.id = id
        self.tags = tags
        if tags['type'] == 'multipolygon':
            outer = [ways[member['ref']] for member in members
                     if member['role'] == 'outer']
            inner = [ways[member['ref']] for member in members
                     if member['role'] == 'inner']
            # Simplified: assumes the outer ways are listed head to tail
            # and that every inner way is already a closed ring
            shell = [xy for way in outer for xy in way.geometry.coords]
            holes = [list(way.geometry.coords) for way in inner]
            self.geometry = MultiPolygon([(shell, holes)])
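In real extracts the member ways of a multipolygon rarely arrive in order, and some are stored reversed, so they have to be joined into closed rings before any polygon can be built. A minimal sketch of that step, working on lists of node ids only; the function name stitch_rings and the greedy joining strategy are assumptions, not something taken from the talk.

```python
def stitch_rings(ways):
    """Join ways (lists of node ids) end to end into closed rings.

    Returns a list of rings; each ring is a list of node ids whose first
    and last entries are equal. Raises ValueError if a ring stays open.
    """
    remaining = [list(way) for way in ways]
    rings = []
    while remaining:
        ring = remaining.pop()
        while ring[0] != ring[-1]:
            for i, way in enumerate(remaining):
                if way[0] == ring[-1]:
                    ring.extend(way[1:])              # same direction
                elif way[-1] == ring[-1]:
                    ring.extend(reversed(way[:-1]))   # stored reversed
                else:
                    continue
                del remaining[i]
                break
            else:
                raise ValueError('open ring: no way continues from %r' % ring[-1])
        rings.append(ring)
    return rings


# Three open ways forming one ring, plus one way that is already closed
rings = stitch_rings([[1, 2, 3], [5, 4, 3], [5, 6, 1], [7, 8, 9, 7]])
```

The greedy approach is quadratic in the number of member ways, which is fine for a sketch but would need an index of way endpoints for large boundaries.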
- The idea is simple
- The implementation can use ElementTree if you work with small extracts of data
- You have to stick to SAX when parsing huge extracts or the whole planet
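For a small extract, the ElementTree route can be sketched as follows. The whole document is loaded into memory, which is exactly why it does not scale to planet-sized files. The helper name load_nodes and the sample XML (with made-up ids) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny made-up extract
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="47.10" lon="9.50">
    <tag k="amenity" v="bar"/>
    <tag k="name" v="Black Pearl"/>
  </node>
  <node id="2" lat="47.20" lon="9.60"/>
</osm>"""


def load_nodes(xml_text):
    """Return {node id: ((lon, lat), tags)} for every node in the document."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for elem in root.iter('node'):
        tags = {tag.get('k'): tag.get('v') for tag in elem.findall('tag')}
        lonlat = (float(elem.get('lon')), float(elem.get('lat')))
        nodes[elem.get('id')] = (lonlat, tags)
    return nodes


small_extract = load_nodes(SAMPLE_OSM)
```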
Existing solutions
Principles
Layers
- Not exactly physical layers
- Layers of graphical representation
- Don't render text in several layers
- Split your data into layers
- Make the projection configurable
- Provide a general way to select data sources
- Think about cartographers
import mapnik

m = mapnik.Map(1000, 1000)  # image size in pixels
mapnik.load_map(m, "style.xml")
# Bounding box covering the whole world in lon/lat
bbox = mapnik.Envelope(mapnik.Coord(-180.0, -90.0),
                       mapnik.Coord(180.0, 90.0))
m.zoom_to_box(bbox)
mapnik.render_to_file(m, 'map.png', 'png')
Magic?
Mapnik's XML
<Style name="Simple">
  <Rule>
    <PolygonSymbolizer>
      <CssParameter name="fill">#f2eff9</CssParameter>
    </PolygonSymbolizer>
    <LineSymbolizer>
      <CssParameter name="stroke">red</CssParameter>
      <CssParameter name="stroke-width">0.1</CssParameter>
    </LineSymbolizer>
  </Rule>
</Style>
<Layer name="world" srs="+proj=latlong +datum=WGS84">
  <StyleName>My Style</StyleName>
  <Datasource>
    <Parameter name="type">shape</Parameter>
    <Parameter name="file">world_borders</Parameter>
  </Datasource>
</Layer>
What's that?
Why is it hard?
- Fuzzy search
- Order matters
- But not always
- One place can have many names
- One name can correspond to many places
- People don't care about this at all!
I blame Google.
Attempt at implementation
- Put restrictions
- Make the request structured
- Or at least assume order
- Assume valid input from users
def geocode(**query):
    boundary = world  # geometry covering the whole data set
    # Narrow the search area from the coarsest key to the finest
    for key in ['country', 'zip', 'city', 'street', 'housenumber']:
        try:
            value = query[key]
            boundary = find(key, value, boundary)
        except KeyError:
            continue
    return boundary

def find(key, value, boundary):
    # data: iterable of (tags, geometry) pairs
    for tags, geometry in data:
        # shapely-style containment test
        if boundary.contains(geometry) and tags.get(key) == value:
            return geometry
Soundex/Metaphone/DoubleMetaphone
- Phonetic algorithms
- Work in 90% of the cases
- If your language is English
- Don't work well for place names
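To make the idea of a phonetic key concrete, here is a small sketch of classic American Soundex, the simplest of the three. It is written from the commonly published rules rather than taken from the talk, and the letter groups are tuned to English pronunciation, which is part of why such keys struggle with place names in other languages.

```python
# Letter groups of American Soundex; vowels, H, W and Y carry no code
CODES = {}
for letters, digit in (('BFPV', '1'), ('CGJKQSXZ', '2'), ('DT', '3'),
                       ('L', '4'), ('MN', '5'), ('R', '6')):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex key of name ('' for empty input)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ''
    result = letters[0]
    previous = CODES.get(letters[0], '')
    for letter in letters[1:]:
        code = CODES.get(letter, '')
        if code and code != previous:
            result += code
        if letter not in 'HW':
            # H and W do not separate two letters with the same code
            previous = code
    return (result + '000')[:4]


key_a = soundex('Robert')
key_b = soundex('Rupert')
```

Both names map to the same key, so a lookup on the key finds either spelling.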
N-grams
- Substrings of n items from the search string
- Easier to index than edit distance
- Give fewer false positives than phonetic algorithms
- Trigrams are the most commonly used
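A trigram comparison can be sketched in a few lines. The padding scheme (two spaces in front, one behind) and the use of Jaccard similarity as the score are assumptions chosen for illustration; real search engines differ in both.

```python
def trigrams(text):
    """Return the set of 3-character substrings of the padded, lowercased text."""
    padded = '  ' + text.lower() + ' '
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


close = similarity('Vaduz', 'Vadus')  # one letter off
far = similarity('Vaduz', 'Bern')     # unrelated name
```

Since each trigram is an exact string, the sets can be stored in an ordinary inverted index, which is what makes this approach easier to index than edit distance.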
- Normalize input: remove "the", "a", ...
- Use an existing free-form search solution
- Combine ranks from different sources
The problem
When presented with a routing problem, people think:
"Build a graph, use Dijkstra, you're done!"
(And they are mostly right.)
- Not that simple
- Graph is sparse
- Graph has to be updated often
- Dijkstra's algorithm is too general
- A* is no better
- Routing is not only a technical problem
- Different people expect different results for the same input
- Routing through cities is always a bad choice (even if it's projected to be faster)
- Adjacency matrix is not space-efficient
- The graph representation has to be very compact
- networkx and igraph are both pretty good for a start
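For a sparse road graph, an adjacency list only stores the edges that exist, whereas a matrix stores every possible pair. A minimal sketch of Dijkstra's algorithm over such a list, using the standard library only; the graph layout, the function name shortest_path and the toy road network are assumptions for illustration, and a production router would need a far more compact representation.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra over graph = {node: [(neighbour, cost), ...]}.

    Returns (total cost, list of nodes), or (inf, []) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float('inf')):
            continue  # stale queue entry, a cheaper route was found already
        for neighbour, weight in graph.get(node, ()):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float('inf')):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return float('inf'), []


# Toy directed road network with travel costs
ROADS = {
    'A': [('B', 4.0), ('C', 1.0)],
    'B': [('D', 1.0)],
    'C': [('B', 2.0), ('D', 5.0)],
    'D': [],
}

cost, path = shortest_path(ROADS, 'A', 'D')
```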
- There is no silver bullet
- No matter how nice these libs are, importing even Europe will require more than 20 GB of RAM
- Splitting data into country graphs is not enough
- Our in-house C++ graph library requires 20 GB of memory for the whole world
Other solutions
- PgRouting: easier to start with, couldn't make it fast, harder to configure
- Neo4j: tried 2 years ago, proved to be lacking when presented with huge sparse graphs
- Eat your own dogfood: if doing serious business, most probably the best solution. Half-wink.
Bored already?
Highlights
- Start using OpenStreetMap data: it's easy
- Try building something simple: it's cool
- Try building something cool: it's simple
- Python is one of the best languages [for doing GIS]
Questions?