Pythonlearn-13-WebServices Python
Chapter 13
Data on the Web
•With HTTP Request/Response well understood and well supported,
it became easy to retrieve and parse documents over HTTP using
programs.
•XML: The “eXtensible Markup Language” has been in use for a very
long time and is best suited for exchanging document-style data.
[Figure: a Python dictionary is serialized into a wire format (JSON or XML), for example { "name" : "Chuck", "phone" : "303-4456" }, and de-serialized into a Java HashMap on the receiving side.]
Marking up data to send across the network...
http://en.wikipedia.org/wiki/XML
eXtensible Markup Language - XML
•XML looks very similar to HTML, but XML is more structured than
HTML.
XML Basics
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
• Start Tag - <person>
• End Tag - </person>
• Text Content - Chuck
• Attribute - type="intl"
• Self Closing Tag - <email hide="yes" />
White Space
Line ends do not matter. White space is generally discarded on text elements. We indent only to be readable.
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
<person>
<name>Chuck</name>
<phone type="intl">+1 734 303 4456</phone>
<email hide="yes" />
</person>
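A minimal sketch of how the two layouts compare in Python, assuming the xml.etree.ElementTree parser used later in this chapter. That parser hands back element text exactly as written, so the program itself calls strip() to discard the surrounding white space. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same record written two ways: indented and compact.
indented = '''<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>'''

compact = '''<person>
<name>Chuck</name>
<phone type="intl">+1 734 303 4456</phone>
<email hide="yes" />
</person>'''

phone_a = ET.fromstring(indented).find('phone').text
phone_b = ET.fromstring(compact).find('phone').text

# ElementTree keeps the line ends and indentation inside the text,
# so strip() is used here to discard them before comparing.
print(repr(phone_a))
print(phone_a.strip() == phone_b.strip())
```

Once the white space is stripped, both layouts carry the same phone number.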
XML Terminology
• Tags indicate the beginning and ending of elements
• Attributes - Keyword/value pairs on the opening tag of XML
• Serialize / De-Serialize - Convert data in one program into a
common format that can be stored and/or transmitted between
systems in a programming language-independent manner
http://en.wikipedia.org/wiki/Serialization
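Serialize / De-Serialize can be sketched as a round trip. This example assumes JSON as the common format and uses Python's built-in json module; the variable names are illustrative.

```python
import json

# A Python dictionary inside one program...
person = {"name": "Chuck", "phone": "303-4456"}

# ...serialized to a JSON string that can be stored or sent over the network...
wire_text = json.dumps(person)
print(wire_text)

# ...and de-serialized back into a dictionary by the receiving program.
received = json.loads(wire_text)
print(received == person)
```

The receiving program does not have to be written in Python; any language with a JSON parser can rebuild an equivalent structure from the same text.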
XML as a Tree
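<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
[Figure: the same XML drawn as a tree. Element a is the root with child elements b and c; b holds the text X; c has child elements d and e, which hold the text Y and Z.]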
XML Text and Attributes
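<a>
  <b w="5">X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
[Figure: the same tree with an attribute. Element b has an attribute node w with the value 5 and a text node X; c has child elements d and e, which hold the text Y and Z.]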
XML as Paths
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
/a/b X
/a/c/d Y
/a/c/e Z
[Figure: the tree for this XML, with elements a, b, c, d, e and text X, Y, Z.]
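The path view can be reproduced with a short recursive walk. This is a sketch under the assumption that xml.etree.ElementTree is the parser; the helper name walk is hypothetical.

```python
import xml.etree.ElementTree as ET

data = '<a><b>X</b><c><d>Y</d><e>Z</e></c></a>'
root = ET.fromstring(data)

def walk(element, prefix=''):
    """Yield (path, text) for every element that holds non-blank text."""
    path = prefix + '/' + element.tag
    text = (element.text or '').strip()
    if text:
        yield path, text
    # Iterating over an element visits its direct children.
    for child in element:
        yield from walk(child, path)

paths = list(walk(root))
for path, text in paths:
    print(path, text)
```

Note that ElementTree's own find() takes a path relative to the element it is called on, so the element at /a/c/d is reached with root.find('c/d').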
Parsing XML
Here is a simple application that parses some XML and extracts some data
elements from the XML:
Method 1
xml1.py
import xml.etree.ElementTree as ET
data = '''<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes"/>
</person>'''
tree = ET.fromstring(data)
print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))
This is a short tutorial for using xml.etree.ElementTree (ET in
short).
Each node can have some text, some attributes (like hide), and some “child”
nodes.
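A small sketch of those three parts of a node (text, attributes, and child nodes), using the same person record and xml.etree.ElementTree. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <phone type="intl">+1 734 303 4456</phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)

# Child nodes: iterating over an element visits its direct children.
child_tags = [child.tag for child in tree]
print('Children:', child_tags)

# Attributes: each element carries a dictionary of its attributes.
phone = tree.find('phone')
print('Phone attributes:', phone.attrib)

# Text: an empty element such as <email .../> has no text at all.
email = tree.find('email')
print('Email text:', email.text)
```

Calling get() for an attribute that is not present returns None rather than raising an error, which is convenient when some nodes omit an attribute.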
Method 2
p1.xml
<person>
<name>Chuck</name>
<phone type="intl"> +1 734 303 4456 </phone>
<email hide="yes"/>
</person>
xml2.py
import xml.etree.ElementTree as ET
tree = ET.parse('p1.xml')
print('Name:', tree.find('name').text)
print('Attr:',tree.find('email').get('hide'))
Looping through nodes
Often the XML has multiple nodes and we need to write a loop to process
all of the nodes.
import xml.etree.ElementTree as ET
# input holds the XML text to parse (not shown on this slide)
s = ET.fromstring(input)
lst = s.findall('users/user')
print('User count:', len(lst))
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get("x"))
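The loop above needs XML text to run against. Here is a self-contained sketch with hypothetical sample data: the element names follow the 'users/user' path used above, but the records themselves and the names xml_text and records are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped to match the 'users/user' path.
xml_text = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(xml_text)
# The path is relative to the root element <stuff>.
users = stuff.findall('users/user')
print('User count:', len(users))

records = []
for item in users:
    name = item.find('name').text
    user_id = item.find('id').text
    x = item.get('x')
    records.append((name, user_id, x))
    print('Name', name)
    print('Id', user_id)
    print('Attribute', x)
```

The sketch stores the text in xml_text rather than input, which avoids hiding Python's built-in input() function.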
JavaScript Object Notation
JSON represents data as nested “lists” and “dictionaries”.
json1.py
import json
data = '''{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
  },
  "email" : {
    "hide" : "yes"
  }
}'''
info = json.loads(data)
print('Name:', info["name"])
print('Hide:', info["email"]["hide"])
JSON represents data as nested “lists” and “dictionaries”.
json2.py
import json
input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  },
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''
info = json.loads(input)
print('User count:', len(info))
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])
Application Program Interface
http://en.wikipedia.org/wiki/Service-oriented_architecture
•When the functionality of our program includes access to services provided
by other programs, we call the approach a Service-Oriented Architecture or SOA.
•A SOA approach is one where our overall application makes use of the services
of other applications.
•A non-SOA approach is where the application is a single standalone application
which contains all of the code necessary to implement the application.
•With an SOA, the owners of the data can set the rules about the use of their data.
•When an application makes a set of services in its API available over the web,
we call these web services.
Google Geocoding Web service
https://developers.google.com/maps/documentation/geocoding/
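geojson.py
import urllib.request, urllib.parse, urllib.error
import json
serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'
while True:
    address = input('Enter location: ')
    if len(address) < 1: break
    # The slide omits the retrieval steps; this reconstruction follows the description below
    url = serviceurl + urllib.parse.urlencode({'address': address})
    data = urllib.request.urlopen(url).read().decode()
    js = json.loads(data)
    if 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        continue
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print('lat', lat, 'lng', lng)
    location = js['results'][0]['formatted_address']
    print(location)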
• The program takes the search string and constructs a URL with
the search string as a properly encoded parameter and then
uses urllib to retrieve the text from the Google geocoding API.
• Unlike a fixed web page, the data we get depends on the
parameters we send and the geographical data stored in
Google’s servers.
• Once we retrieve the JSON data, we parse it with the json
library and do a few checks to make sure that we received
good data, then extract the information that we are looking for.
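The "properly encoded parameter" step can be sketched on its own, without contacting the service. This assumes urllib.parse.urlencode is the encoder, with 'Ann Arbor, MI' standing in for what the user would type.

```python
import urllib.parse

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'
address = 'Ann Arbor, MI'

# urlencode() escapes the comma as %2C and turns the space into "+".
url = serviceurl + urllib.parse.urlencode({'address': address})
print(url)

# Decoding the query string recovers the original search string.
decoded = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)
print(decoded)
```

Encoding matters because characters such as spaces and commas are not safe to place directly in a URL.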
http://maps.googleapis.com/maps/api/geocode/json?address=Ann+Arbor%2C+MI
{
  "status": "OK",
  "results": [
    {
      "geometry": {
        "location_type": "APPROXIMATE",
        "location": {
          "lat": 42.2808256,
          "lng": -83.7430378
        }
      },
      "address_components": [
        {
          "long_name": "Ann Arbor",
          "types": [
            "locality",
            "political"
          ],
          "short_name": "Ann Arbor"
        }
      ],
      "formatted_address": "Ann Arbor, MI, USA",
      "types": [
        "locality",
        "political"
      ]
    }
  ]
}
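The "few checks to make sure that we received good data" might look like the sketch below. It runs against an abridged copy of the response above instead of the live service. The function name extract_location is hypothetical, and the exact checks are an assumption rather than the ones geojson.py performs.

```python
import json

# Abridged copy of the sample response shown above.
sample = '''{
  "status": "OK",
  "results": [
    {
      "geometry": {
        "location_type": "APPROXIMATE",
        "location": {"lat": 42.2808256, "lng": -83.7430378}
      },
      "formatted_address": "Ann Arbor, MI, USA"
    }
  ]
}'''

def extract_location(text):
    """Return (lat, lng, formatted_address), or None if the data is not usable."""
    try:
        js = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Reject anything that is not an OK response with at least one result.
    if not isinstance(js, dict) or js.get('status') != 'OK' or not js.get('results'):
        return None
    first = js['results'][0]
    location = first['geometry']['location']
    return location['lat'], location['lng'], first['formatted_address']

print(extract_location(sample))
print(extract_location('not json'))
```

Returning None for bad data lets the calling loop skip the request and prompt for another location instead of crashing on a missing key.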
API Security and Rate Limiting
• The compute resources to run these APIs are not “free”
• The data providers might limit the number of requests per day,
demand an API “key”, or even charge for usage
import urllib.request
import json
import twurl
# TWITTER_URL is set to the Twitter API endpoint earlier in the program (not shown on the slide)
while True:
    print('')
    acct = input('Enter Twitter Account:')
    if (len(acct) < 1): break
    url = twurl.augment(TWITTER_URL,
                        {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urllib.request.urlopen(url)
    data = connection.read().decode()
    headers = dict(connection.getheaders())
    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    print(json.dumps(js, indent=4))
    for u in js['users']:
        print(u['screen_name'])
        s = u['status']['text']
        print(' ', s[:50])
twitter2.py output (abridged)
Enter Twitter Account:drchuck
Retrieving https://api.twitter.com/1.1/friends ...
Remaining 14
{
  "users": [
    {
      "status": {
        "text": "@jazzychad I just bought one .__.",
        "created_at": "Fri Sep 20 08:36:34 +0000 2013",
      },
      "location": "San Francisco, California",
      "screen_name": "leahculver",
      "name": "Leah Culver",
    },
    {
      "status": {
        "text": "RT @WSJ: Big employers like Google ...",
        "created_at": "Sat Sep 28 19:36:37 +0000 2013",
      },
      "location": "Victoria Canada",
      "screen_name": "_valeriei",
      "name": "Valerie Irvine",
    },
  ],
}
leahculver
  @jazzychad I just bought one .__.
_valeriei
  RT @WSJ: Big employers like Google, AT&T are h
ericbollens
  RT @lukew: sneak peek: my LONG take on the good &a
halherzog
  Learning Objects is 10. We had a cake with the LO,
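A program that respects rate limits can look at the x-rate-limit-remaining header that twitter2.py prints before it sends more requests. The sketch below works on a hypothetical headers dictionary shaped like dict(connection.getheaders()). The helper name remaining_requests and the decision to stop at zero are assumptions for illustration.

```python
def remaining_requests(headers):
    """Return the remaining-request count from a headers dict, or None if absent."""
    for name, value in headers.items():
        # Header names are matched case-insensitively.
        if name.lower() == 'x-rate-limit-remaining':
            try:
                return int(value)
            except ValueError:
                return None
    return None

# Hypothetical headers, shaped like dict(connection.getheaders()).
headers = {'content-type': 'application/json', 'x-rate-limit-remaining': '14'}

remaining = remaining_requests(headers)
if remaining is not None and remaining < 1:
    print('Rate limit reached; wait before sending more requests')
else:
    print('Remaining', remaining)
```

Looking the header up through a helper avoids the KeyError that headers['x-rate-limit-remaining'] would raise if a response did not include it.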
hidden.py
def oauth():
    return {"consumer_key": "h7Lu...Ng",
            "consumer_secret": "dNKenAC3New...mmn7Q",
            "token_key": "10185562-ein2...P4GEQQOSGI",
            "token_secret": "H0ycCFemmwyf1...qoIpBo"}
twurl.py
import urllib
import oauth
import hidden
https://api.twitter.com/1.1/statuses/user_timeline.json?count=2&oauth_version=1.0&oauth_token=101...SGI&screen_name=drchuck&oauth_nonce=09239679&oauth_timestamp=1380395644&oauth_signature=rLK...BoD&oauth_consumer_key=h7Lu...GNg&oauth_signature_method=HMAC-SHA1
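To see what the signed URL carries, the query string can be taken apart with urllib.parse. This sketch uses placeholder values (TOKEN, SIGNATURE, KEY) in place of the elided secrets. It only inspects the parameters; it does not compute or verify a signature.

```python
import urllib.parse

# Placeholder values only; real keys, tokens, and signatures are much longer.
signed_url = ('https://api.twitter.com/1.1/statuses/user_timeline.json?'
              'count=2&oauth_version=1.0&oauth_token=TOKEN&screen_name=drchuck'
              '&oauth_nonce=09239679&oauth_timestamp=1380395644'
              '&oauth_signature=SIGNATURE&oauth_consumer_key=KEY'
              '&oauth_signature_method=HMAC-SHA1')

parts = urllib.parse.urlsplit(signed_url)
params = urllib.parse.parse_qs(parts.query)

# Separate the request's own parameters from the OAuth bookkeeping.
request_params = {k: v[0] for k, v in params.items() if not k.startswith('oauth_')}
oauth_params = sorted(k for k in params if k.startswith('oauth_'))

print(parts.path)
print(request_params)
print(oauth_params)
```

Only count and screen_name describe the request itself; the oauth_ parameters are added by the signing step so the service can identify the caller.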
Summary
• Service-Oriented Architecture (SOA) - allows an application to be
broken into parts and distributed across a network