
Making Use of OpenStreetMap Data With Python

The document discusses parsing and rendering OpenStreetMap data using Python and Mapnik. It describes the node, way and relation data structures in OSM XML, and provides code examples that parse the data with SAX and build geometry objects. It also gives an overview of Mapnik's simple interface for rendering maps from an XML style specification.
Copyright
© Attribution Non-Commercial (BY-NC)

Using OpenStreetMap data with Python

Andrii V. Mishkovskyi June 22, 2011

Andrii V. Mishkovskyi ()

Using OpenStreetMap data with Python

June 22, 2011

1/1

Who is this dude anyway?

- I love Python
- I love OpenStreetMap
- I do map rendering at CloudMade using Python
- CloudMade uses OpenStreetMap data extensively


Objectives

- Understand the OpenStreetMap data structure
- Learn how to parse it
- Get a feel for how basic GIS services work


OpenStreetMap

- Founded in 2004 as a response to the Ordnance Survey pricing scheme
- >400k registered users
- >16k active mappers
- Supported by Microsoft, MapQuest (AOL), Yahoo!
- Crowd-sourcing at its best


Why OSM?

- Fairly easy
- Good quality
- Growing community
- Absolutely free


Storage type

- XML (.osm)
- Protocol buffers (.pbf, in beta status)
- Other formats through 3rd parties (Esri shapefile, Garmin GPX, etc.)


The data

- Each object has geometry, tags and changeset information
- Tags are simply a list of key/value pairs
- Geometry definition differs for different types
- Changeset is not interesting when simply using the data (as opposed to editing)


Data types

- Node: geometric point or point of interest
- Way: collection of points
- Relation: collection of objects of any type


Nodes

    <node id="592637238" lat="47.1675211" lon="9.5089882"
          version="2" changeset="6628391" user="phinret" uid="135921"
          timestamp="2010-12-11T19:20:16Z">
      <tag k="amenity" v="bar"/>
      <tag k="name" v="Black Pearl"/>
    </node>


Ways

    <way id="4781367" version="1" changeset="1 226" uid="871"
         user="murmel" timestamp="2007-06-19T06:25:57Z">
      <nd ref="3 6 4 7"/>
      <nd ref="3 6 4 15"/>
      <nd ref="3 6 4 17"/>
      <nd ref="3 6 4 19"/>
      <nd ref="3 6 4 2"/>
      <tag k="created_by" v="JOSM"/>
      <tag k="highway" v="residential"/>
      <tag k="name" v="In den Äusseren"/>
    </way>


Relations

    <relation id="16239" version="699" changeset="844 52" uid="1224 6"
              user="hanskuster" timestamp="2011-06-14T18:53:49Z">
      <member type="way" ref="75393767" role="outer"/>
      <member type="way" ref="75393837" role="outer"/>
      <member type="way" ref="75393795" role="outer"/>
      ...
      <member type="way" ref="75393788" role="outer"/>
      <tag k="admin_level" v="2"/>
      <tag k="boundary" v="administrative"/>
      <tag k="currency" v="EUR"/>
      <tag k="is_in" v="Europe"/>
      <tag k="ISO3166-1" v="AT"/>
      <tag k="name" v="Österreich"/>
      ...
      <tag k="wikipedia:de" v="Österreich"/>
      <tag k="wikipedia:en" v="Austria"/>
    </relation>

Major points when parsing OSM


- Expect faulty data
- Parse iteratively
- Cache extensively
- Order of elements is not guaranteed
  - But it's generally: nodes, ways, relations
- Ids are unique to datatype, not to the whole data set


Parsing data

- Using SAX
- Doing simple reprojection
- Creating geometries using Shapely


Parsing data
Projection

    import pyproj

    # Spherical ("web") Mercator, as a PROJ.4 definition string
    projection = pyproj.Proj(
        '+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 '
        '+x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs')
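For intuition, the forward transform behind that projection string can be sketched in plain Python. This is only an illustration of the spherical Mercator formula; the function name `spherical_mercator` and the constant `EARTH_RADIUS` are made up for this sketch and are not part of pyproj.

```python
import math

EARTH_RADIUS = 6378137.0  # metres; the +a/+b value from the projection string


def spherical_mercator(lon, lat):
    """Project WGS84 degrees to spherical Mercator metres (hypothetical helper)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The bar from the node example (lon, lat order, like projection(*lonlat))
x, y = spherical_mercator(9.5089882, 47.1675211)
```

For the same input, the result should agree with `projection(lon, lat)` up to floating-point noise, since the definition above describes a sphere with no scaling or offsets.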


Parsing data
Nodes

    from shapely.geometry import Point

    class Node(object):

        def __init__(self, id, lonlat, tags):
            self.id = id
            # projection() turns (lon, lat) degrees into projected metres
            self.geometry = Point(projection(*lonlat))
            self.tags = tags


Parsing data
Nodes

    from xml import sax

    class SimpleHandler(sax.handler.ContentHandler):

        def __init__(self):
            sax.handler.ContentHandler.__init__(self)
            self.id = None
            self.geometry = None
            self.tags = None
            self.nodes = {}

        def startElement(self, name, attrs):
            if name == 'node':
                self.id = attrs['id']
                self.tags = {}
                # (lon, lat) order, as expected by projection()
                self.geometry = tuple(map(float, (attrs['lon'], attrs['lat'])))
            elif name == 'tag' and self.tags is not None:
                # Ignore tags that belong to elements we are not collecting
                self.tags[attrs['k']] = attrs['v']

        def endElement(self, name):
            if name == 'node':
                self.nodes[self.id] = Node(self.id, self.geometry, self.tags)
                self.id = None
                self.geometry = None
                self.tags = None
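To try the SAX approach without pyproj or Shapely installed, here is a minimal standard-library sketch that stores plain dicts instead of `Node` objects. The class name `NodeCollector`, the sample document and its ids `1` and `2` are assumptions made for this example.

```python
import xml.sax

# Tiny made-up extract; the ids are invented for the example
OSM_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="47.1675211" lon="9.5089882">
    <tag k="amenity" v="bar"/>
    <tag k="name" v="Black Pearl"/>
  </node>
  <node id="2" lat="47.2" lon="9.6"/>
</osm>
"""


class NodeCollector(xml.sax.ContentHandler):
    """Collects nodes as dicts keyed by id (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.nodes = {}
        self._current = None

    def startElement(self, name, attrs):
        if name == "node":
            self._current = {
                "id": attrs["id"],
                "lonlat": (float(attrs["lon"]), float(attrs["lat"])),
                "tags": {},
            }
        elif name == "tag" and self._current is not None:
            self._current["tags"][attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node" and self._current is not None:
            self.nodes[self._current["id"]] = self._current
            self._current = None


handler = NodeCollector()
xml.sax.parseString(OSM_SAMPLE, handler)
```

For a real extract, `xml.sax.parse(filename, handler)` streams the file instead of loading it into memory first.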


Parsing data
Ways

    from shapely.geometry import LineString

    nodes = {...}  # dict of nodes, keyed by their ids

    class Way(object):

        def __init__(self, id, refs, tags):
            self.id = id
            # Each Node keeps its projected Point in .geometry
            self.geometry = LineString(
                [(nodes[ref].geometry.x, nodes[ref].geometry.y)
                 for ref in refs])
            self.tags = tags


Parsing data
Ways

    class SimpleHandler(sax.handler.ContentHandler):

        def __init__(self):
            ...
            self.ways = {}

        def startElement(self, name, attrs):
            if name == 'way':
                self.id = attrs['id']
                self.tags = {}
                self.geometry = []
            elif name == 'nd':
                # Only the node id is stored; Way resolves it later
                self.geometry.append(attrs['ref'])

        def reset(self):
            self.id = None
            self.geometry = None
            self.tags = None

        def endElement(self, name):
            if name == 'way':
                self.ways[self.id] = Way(self.id, self.geometry, self.tags)
                self.reset()


Parsing data
Relations

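    from shapely.geometry import MultiPolygon, Polygon
    from shapely.ops import polygonize

    ways = {...}  # dict of ways, with ids as keys

    class Relation(object):

        def __init__(self, id, members, tags):
            self.id = id
            self.tags = tags
            if tags.get('type') == 'multipolygon':
                outer = [ways[member['ref']].geometry for member in members
                         if member['role'] == 'outer']
                inner = [ways[member['ref']].geometry for member in members
                         if member['role'] == 'inner']
                # Member ways are open line pieces: stitch them into rings,
                # then attach each hole to the shell that contains it
                shells = list(polygonize(outer))
                holes = list(polygonize(inner))
                self.geometry = MultiPolygon(
                    [Polygon(shell.exterior,
                             [hole.exterior for hole in holes
                              if shell.contains(hole)])
                     for shell in shells])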
The importing code is left as an exercise for the reader
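One possible shape for that exercise, as a standard-library sketch: it only collects members and tags and leaves geometry building to a class like `Relation`. The name `RelationCollector`, the sample document and its ids are assumptions for this example, not the author's solution.

```python
import xml.sax

# Made-up relation; ids are invented for the example
RELATION_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <relation id="1">
    <member type="way" ref="10" role="outer"/>
    <member type="way" ref="11" role="inner"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>
"""


class RelationCollector(xml.sax.ContentHandler):
    """Collects (members, tags) per relation id (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.relations = {}
        self._id = None
        self._members = None
        self._tags = None

    def startElement(self, name, attrs):
        if name == "relation":
            self._id = attrs["id"]
            self._members = []
            self._tags = {}
        elif self._id is None:
            return  # not inside a relation, nothing to collect
        elif name == "member":
            self._members.append({
                "type": attrs["type"],
                "ref": attrs["ref"],
                "role": attrs.get("role", ""),
            })
        elif name == "tag":
            self._tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "relation":
            self.relations[self._id] = (self._members, self._tags)
            self._id = self._members = self._tags = None


collector = RelationCollector()
xml.sax.parseString(RELATION_SAMPLE, collector)
members, tags = collector.relations["1"]
```

Because ids are unique only per data type, each member keeps its `type` next to its `ref`.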


For language zealots

Excuse me for not using namedtuples.


Parsing data: homework

- The idea is simple
- The implementation can use ElementTree if you work with small extracts of data
- You have to stick to SAX when parsing huge extracts or the whole planet data
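For the ElementTree route, a minimal sketch using `iterparse` from the standard library, which hands over elements as they are completed. The generator name `iter_nodes` and the sample data are assumptions for this example.

```python
import io
import xml.etree.ElementTree as ET

# Made-up extract; the id is invented for the example
SMALL_EXTRACT = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="47.1675211" lon="9.5089882">
    <tag k="amenity" v="bar"/>
  </node>
</osm>
"""


def iter_nodes(source):
    """Yield (id, (lon, lat), tags) for every node in an OSM XML source."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            lonlat = (float(elem.get("lon")), float(elem.get("lat")))
            yield elem.get("id"), lonlat, tags
            elem.clear()  # drop children we no longer need


parsed = list(iter_nodes(io.BytesIO(SMALL_EXTRACT)))
```

`source` may also be a file name. For planet-sized files the SAX handlers remain the safer choice, as the slide says.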


Existing solutions

- Osmosis
- osm2pgsql
- osm2mongo, osm2shp, etc.


Principles

- Scale
- Projection
- Cartography
- Types of maps


Layers

- Not exactly physical layers
- Layers of graphical representation
- Don't render text in several layers


How to approach rendering

- Split your data into layers
- Make the projection configurable
- Provide a general way to select data sources
- Think about cartographers


The magic of Mapnik

    import mapnik

    m = mapnik.Map(1000, 1000)  # image size in pixels (assumed value, digits lost on the slide)
    mapnik.load_map(m, 'style.xml')
    # Whole world in lon/lat degrees
    bbox = mapnik.Envelope(mapnik.Coord(-180.0, -90.0),
                           mapnik.Coord(180.0, 90.0))
    m.zoom_to_box(bbox)
    mapnik.render_to_file(m, 'map.png', 'png')


Magic?

- Mapnik's interface is straightforward
- The implementation is not
- Complexity is hidden in XML


Mapnik's XML

    <Style name="Simple">
      <Rule>
        <PolygonSymbolizer>
          <CssParameter name="fill">#f2eff9</CssParameter>
        </PolygonSymbolizer>
        <LineSymbolizer>
          <CssParameter name="stroke">red</CssParameter>
          <CssParameter name="stroke-width">0.1</CssParameter>
        </LineSymbolizer>
      </Rule>
    </Style>


Mapnik's XML

    <Layer name="world" srs="+proj=latlong +datum=WGS84">
      <StyleName>My Style</StyleName>
      <Datasource>
        <Parameter name="type">shape</Parameter>
        <Parameter name="file">world_borders</Parameter>
      </Datasource>
    </Layer>


What's that?

- Codename: geocoding
- Similar to magnets
- Fast or correct: choose one


Why is it hard?

- Fuzzy search
- Order matters
  - But not always
- One place can have many names
- One name can correspond to many places
- People don't care about this at all!


Why is it hard?

I blame Google.


Attempt at implementation

- Put restrictions
- Make the request structured
- Or at least assume order
- Assume valid input from users


Attempt at implementation
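
    # `world` (the outermost boundary geometry) and `data` (an iterable of
    # (tags, geometry) pairs) are assumed to be defined elsewhere.

    def geocode(**query):
        boundary = world
        # Narrow the search area step by step, from country to house number
        for key in ['country', 'zip', 'city', 'street', 'housenumber']:
            try:
                value = query[key]
                boundary = find(key, value, boundary)
            except KeyError:
                continue
        return boundary

    def find(key, value, boundary):
        for tags, geometry in data:
            # Shapely-style containment test
            if boundary.contains(geometry) and tags.get(key) == value:
                return geometry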


Fixing user input

- Soundex/Metaphone/DoubleMetaphone
- Phonetic algorithms
- Works in 90% of the cases
- If your language is English
- Doesn't work well for placenames


Fixing user input


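    from itertools import groupby

    def soundex(word):
        table = {'b': 1, 'f': 1, 'p': 1, 'v': 1,
                 'c': 2, 'g': 2, 'j': 2}  # ... remaining letters elided on the slide
        yield word[0]
        codes = (table[char] for char in word[1:] if char in table)
        # groupby() yields (key, group) pairs; keep only the key so that
        # runs of identical codes collapse into one
        for code, _ in groupby(codes):
            yield code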
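The slide elides most of the letter table. As a fuller sketch, here is one common formulation of American Soundex (four characters, with `h` and `w` not breaking a run of equal codes). It returns a string rather than a generator, and the names `CODES` and `soundex_code` are chosen for this example.

```python
# Letter groups of classic Soundex; vowels, h, w and y carry no code
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex_code(word):
    """Return the 4-character Soundex code of `word` ('' for empty input)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            previous = code
    return (result + "000")[:4]


same = soundex_code("Robert") == soundex_code("Rupert")
```

Names that sound alike in English collapse to the same code, which is exactly why the approach struggles with non-English placenames.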


Fixing user input


- Edit distance
- Works for two words
- Most geocoding requests consist of several words
- Scanning the database to compute the distance for each pair isn't feasible
- Unless you have it cached already
- Check out Peter Norvig's "How to Write a Spelling Corrector" article
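For reference, a minimal sketch of the Levenshtein edit distance with the usual dynamic-programming recurrence, keeping only two rows in memory. The function name `levenshtein` is chosen for this example; in practice a C-backed library would be faster.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string `a` into string `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Each call costs time proportional to the product of the two lengths, which is why comparing a query against every name in the database does not scale.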


Fixing user input

- N-grams
- Substrings of n items from the search string
- Easier to index than edit distance
- Gives fewer false positives than phonetic algorithms
- Trigrams are most commonly used


Fixing user input


    from itertools import islice, tee

    def nwise(iterable, count=2):
        # One independent iterator per position, each advanced by its index
        iterables = enumerate(tee(iterable, count))
        return zip(*[islice(it, start, None)
                     for start, it in iterables])

    def trigrams(string):
        # Pad with spaces so that word boundaries produce trigrams too
        string = ''.join([' ', string, ' ']).lower()
        return nwise(string, 3)
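To turn trigrams into a score, one common option is the Jaccard similarity of the two trigram sets. The sketch below is self-contained; the names `trigram_set` and `similarity` are chosen for this example, and the single-space padding follows the slide rather than any particular database extension.

```python
def trigram_set(text):
    """Set of 3-character substrings of the space-padded, lower-cased text."""
    padded = " " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the trigram sets, between 0.0 and 1.0."""
    set_a, set_b = trigram_set(a), trigram_set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are identical
    return len(set_a & set_b) / len(set_a | set_b)


typo_score = similarity("Vienna", "Viena")
other_score = similarity("Vienna", "Berlin")
```

A misspelling still shares most trigrams with the intended name, while an unrelated name shares few or none. The sets can be stored in an inverted index, which is what makes this easier to index than edit distance.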


Making the search free-form

- Normalize input: remove "the", "a", ...
- Use an existing free-form search solution
- Combine ranks from different sources


Making the search free-form


    from operator import itemgetter
    from collections import defaultdict

    def freeform(string):
        ranks = defaultdict(float)
        # Each search function is assumed to yield (match, rank) pairs;
        # the coefficients weight the sources and sum to 1
        searchfuncs = [(phonetic, 0.3),
                       (levenshtein, 0.15),
                       (trigrams, 0.55)]
        for searchfunc, coef in searchfuncs:
            for match, rank in searchfunc(string):
                ranks[match] += rank * coef
        # Best (match, combined rank) pair
        return max(ranks.items(), key=itemgetter(1))


The problem

When introduced to a routing problem, people think: "Build a graph, use Dijkstra, you're done!" (And they are mostly right)


The problem

- Not that simple
- Graph is sparse
- Graph has to be updated often
- Dijkstra's algorithm is too general
- A* is no better
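As a baseline for the discussion, here is a textbook Dijkstra sketch over a plain adjacency dict, using `heapq` from the standard library. The graph layout and the function name `dijkstra` are assumptions for this example; it is not the routing code referred to in the talk.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest route, or (inf, []) if none.

    `graph` maps each node to a list of (neighbour, weight) pairs with
    non-negative weights.
    """
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, ()):
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Toy directed graph with made-up weights
toy_graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
    "d": [],
}
cost, path = dijkstra(toy_graph, "a", "d")
```

Storing the whole path in every queue entry keeps the sketch short, but it is one of the things a production router would avoid on a continent-sized graph.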


The problem

- Routing is not only a technical problem
- Different people expect different results for the same input
- Routing through cities is always a bad choice (even if it's projected to be faster)


Building the graph

- An adjacency matrix is not space-efficient
- The graph representation has to be very compact
- networkx and igraph are both pretty good for a start
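One way to picture a compact representation is the compressed sparse row (CSR) layout: three flat arrays instead of a matrix or per-node objects. This is a generic sketch using the standard `array` module, with the names `build_csr` and `neighbours` made up for the example; it is not a description of any particular library's internals.

```python
from array import array


def build_csr(node_count, edges):
    """Pack directed (source, target, weight) edges into three flat arrays.

    Nodes are assumed to be numbered 0 .. node_count - 1.
    """
    degree = [0] * node_count
    for source, _target, _weight in edges:
        degree[source] += 1
    # offsets[n] .. offsets[n + 1] is the slice of edges leaving node n
    offsets = array("l", [0] * (node_count + 1))
    for node in range(node_count):
        offsets[node + 1] = offsets[node] + degree[node]
    targets = array("l", [0] * len(edges))
    weights = array("d", [0.0] * len(edges))
    cursor = list(offsets[:-1])  # next free slot for each node
    for source, target, weight in edges:
        slot = cursor[source]
        targets[slot] = target
        weights[slot] = weight
        cursor[source] += 1
    return offsets, targets, weights


def neighbours(csr, node):
    """List of (target, weight) pairs for the edges leaving `node`."""
    offsets, targets, weights = csr
    start, end = offsets[node], offsets[node + 1]
    return list(zip(targets[start:end], weights[start:end]))


csr = build_csr(3, [(0, 1, 1.5), (0, 2, 2.0), (2, 1, 0.5)])
```

Memory grows with the number of nodes plus edges rather than with the square of the node count, which matters for a sparse road graph.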


Building the graph


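    from networkx import Graph, shortest_path
    ...

    # length() and coef() are assumed helpers: segment length and a
    # per-road-type cost multiplier derived from the way's tags
    def build_graph(ways):
        graph = Graph()
        for way, tags in ways:
            for segment in nwise(way.coords):
                weight = length(segment) * coef(tags)
                graph.add_edge(segment[0], segment[1], weight=weight)
        return graph

    # Without weight='weight' networkx would count hops, not cost
    shortest_path(graph, source, dest, weight='weight')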


Building the graph

- There is no silver bullet
- No matter how nice these libs are, importing even Europe will require more than 20 GB of RAM
- Splitting data into country graphs is not enough
- Our in-house C++ graph library requires 20 GB of memory for the whole world


Other solutions

- PgRouting: easier to start with, couldn't make it fast, harder to configure
- Neo4j: tried 2 years ago, proved to be lacking when presented with huge sparse graphs
- Eat your own dogfood: if doing serious business, most probably the best solution. Half-wink.


Bored already?


Lighten up, I'm done


Highlights

- Start using OpenStreetMap data: it's easy
- Try building something simple: it's cool
- Try building something cool: it's simple
- Python is one of the best languages [for doing GIS]


Questions?

- contact@mishkovskyi.net
- Slides: mishkovskyi.net/ep2011
