
Pythonlearn-13-WebServices Python


Using Web Services

Chapter 13
Data on the Web
•With the HTTP Request/Response well understood and well
supported, it became easy to retrieve documents and parse
documents over HTTP using programs.

•We needed to come up with an agreed way to represent data going
between applications and across networks

•There are two commonly used formats: XML and JSON


•There are two common formats that we use when exchanging data
across the web.

•XML: The “eXtensible Markup Language” has been in use for a very
long time and is best suited for exchanging document-style data.

•JSON: When programs just want to exchange dictionaries, lists, or other
internal information with each other, they use JavaScript Object Notation
or JSON (see www.json.org).
XML
[Figure: a Python dictionary is serialized to XML, sent between programs, and de-serialized into a Java HashMap.]

<person>
  <name>Chuck</name>
  <phone>303 4456</phone>
</person>
JSON
[Figure: a Python dictionary is serialized to JSON, sent between programs, and de-serialized into a Java HashMap.]

{
  "name" : "Chuck",
  "phone" : "303-4456"
}
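The serialize and de-serialize steps in the two diagrams can be sketched in Python with the standard json and xml.etree.ElementTree modules. This is only an illustration; the variable names are made up for the example, and the record is the one from the JSON diagram.

```python
import json
import xml.etree.ElementTree as ET

person = {"name": "Chuck", "phone": "303-4456"}

# Serialize: dictionary -> JSON text that can cross the network.
json_text = json.dumps(person)

# De-serialize: JSON text -> dictionary on the receiving side.
restored = json.loads(json_text)

# The same record as XML: build one child element per key,
# then render the element tree as text.
root = ET.Element("person")
for key, value in person.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# De-serialize the XML text back into a dictionary.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(json_text)
print(xml_text)
```

Both formats carry the same information; the receiving program only needs a parser for the agreed format, not any knowledge of Python.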
XML
Marking up data to send across the network...

http://en.wikipedia.org/wiki/XML
eXtensible Markup Language - XML

•XML looks very similar to HTML, but XML is more structured than
HTML.

•Here is a sample of an XML document:


<person>
  <name>Chuck</name>
  <phone type="intl"> +1 734 303 4456 </phone>
  <email hide="yes"/>
</person>
•Often it is helpful to think of an XML document as a tree structure where
there is a top tag person and other tags such as phone are drawn
as children of their parent nodes.
XML “Elements” (or Nodes)
• Simple Element (for example, <name>)
• Complex Element (for example, <person>)

<people>
  <person>
    <name>Chuck</name>
    <phone>303 4456</phone>
  </person>
  <person>
    <name>Noah</name>
    <phone>622 7421</phone>
  </person>
</people>
eXtensible Markup Language
• Primary purpose is to help information systems share structured
data

• It started as a simplified subset of the Standard Generalized
Markup Language (SGML), and is designed to be relatively
human-legible

http://en.wikipedia.org/wiki/XML
XML Basics
• Start Tag: <name>
• End Tag: </name>
• Text Content: +1 734 303 4456
• Attribute: type="intl"
• Self Closing Tag: <email hide="yes" />

<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
White Space
Line ends do not matter. White space is generally discarded on text
elements. We indent only to be readable.

<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>

<person>
  <name>Chuck</name>
  <phone type="intl">+1 734 303 4456</phone>
  <email hide="yes" />
</person>
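With Python's xml.etree.ElementTree, the text of an element comes back with its surrounding white space intact, so a program that wants the two layouts above to compare equal usually strips it. A minimal sketch (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

spread_out = '''<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
</person>'''

compact = ('<person><name>Chuck</name>'
           '<phone type="intl">+1 734 303 4456</phone></person>')

# The raw text still contains the newlines and indentation.
raw = ET.fromstring(spread_out).find('phone').text

# After strip(), both layouts yield the same phone number.
phone_a = raw.strip()
phone_b = ET.fromstring(compact).find('phone').text.strip()

print(repr(raw))
print(phone_a == phone_b)
```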
XML Terminology
• Tags indicate the beginning and ending of elements
• Attributes - Keyword/value pairs on the opening tag of XML
• Serialize / De-Serialize - Convert data in one program into a
common format that can be stored and/or transmitted between
systems in a programming language-independent manner

http://en.wikipedia.org/wiki/Serialization
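XML as a Tree
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>

[Figure: the same document drawn as a tree. Element a has children b and c; b contains the text X; c has children d and e, which contain the text Y and Z.]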
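The tree view can be reproduced in code by visiting each element and then its children. The sketch below uses xml.etree.ElementTree; the helper name walk is made up for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b>X</b><c><d>Y</d><e>Z</e></c></a>')

def walk(node, depth=0):
    """Return one (depth, tag, text) tuple per element, parents first."""
    text = (node.text or '').strip()
    rows = [(depth, node.tag, text)]
    # Iterating over an element yields its direct children.
    for child in node:
        rows.extend(walk(child, depth + 1))
    return rows

rows = walk(root)
for depth, tag, text in rows:
    # Indent each tag by its depth in the tree.
    print('  ' * depth + tag, text)
```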
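XML Text and Attributes
<a>
  <b w="5">X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>

[Figure: the same tree with an attribute. Element b has an attribute node w with the value 5 in addition to its text node X; c has children d and e, which contain the text Y and Z.]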
XML as Paths
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>

/a/b    X
/a/c/d  Y
/a/c/e  Z

[Figure: the same tree of elements and text, with each path leading from the root a down to one text value.]
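The paths and the attribute from the last two slides can be looked up with xml.etree.ElementTree. One detail to keep in mind: ElementTree paths are relative to the element you search from, so with the root a in hand, /a/c/d is written as c/d. A small sketch, with illustrative variable names:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b w="5">X</b><c><d>Y</d><e>Z</e></c></a>')

# root is already <a>, so the leading /a/ is dropped from each path.
paths = ['b', 'c/d', 'c/e']
values = {'/a/' + p: root.findtext(p) for p in paths}

# Attributes are read from the element that carries them.
attr_w = root.find('b').get('w')

for path, text in values.items():
    print(path, text)
print('w =', attr_w)
```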
Parsing XML

Here is a simple application that parses some XML and extracts some data
elements from the XML:
Method 1
xml1.py

import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)
print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))
This is a short tutorial for using xml.etree.ElementTree (ET for
short).

The goal is to demonstrate some of the building blocks and
basic concepts of the module.
• XML is an inherently hierarchical data format, and the
most natural way to represent it is with a tree.

ET has two classes for this purpose: ElementTree
represents the whole XML document as a tree, and
Element represents a single node in this tree.

Interactions with the whole document (reading and writing
to/from files) are usually done on the ElementTree level.
Interactions with a single XML element and its sub-elements
are done on the Element level.
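The split between the two classes can be seen in a short sketch. To keep it self-contained, in-memory file objects from the io module stand in for real files; that substitution, and the edited values, are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

source = io.StringIO('<person><name>Chuck</name><email hide="yes"/></person>')

# Document level: parse() reads a whole document and returns an ElementTree.
tree = ET.parse(source)

# Element level: getroot() hands back the top Element, which we can edit.
root = tree.getroot()
root.find('name').text = 'Charles'
root.find('email').set('hide', 'no')

# Document level again: write() serializes the whole tree to a file object.
out = io.BytesIO()
tree.write(out)
written = out.getvalue().decode()
print(written)
```

Reading and writing go through the ElementTree object, while finding and changing individual nodes goes through Element objects.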
Calling fromstring converts the string representation of the XML into a “tree”
of XML nodes.

When the XML is in a tree, we have a series of methods we can call to
extract portions of data from the XML.
The find function searches through the XML tree and retrieves a node that
matches the specified tag.

Each node can have some text, some attributes (like hide), and some “child”
nodes.
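One practical consequence: find returns None when no child has the requested tag, and an empty element has no text, so calling .text directly can fail. A hedged sketch of a defensive lookup follows; the helper text_of and the fax tag are hypothetical and not part of ElementTree or the sample data.

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)

def text_of(node, tag, default=''):
    """Return the stripped text of the first child named tag, or default."""
    found = node.find(tag)
    if found is None or found.text is None:
        return default
    return found.text.strip()

name = text_of(tree, 'name')
fax = text_of(tree, 'fax', default='(none)')          # no such child
email_text = text_of(tree, 'email', default='(empty)')  # child exists, no text

# get() on an attribute that is absent returns None rather than raising.
hide = tree.find('email').get('hide')
missing_attr = tree.find('email').get('color')

print(name, fax, email_text, hide, missing_attr)
```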
Method 2
p1.xml

<person>
<name>Chuck</name>
<phone type="intl"> +1 734 303 4456 </phone>
<email hide="yes"/>
</person>
xml2.py

import xml.etree.ElementTree as ET
tree = ET.parse('p1.xml')
print('Name:', tree.find('name').text)
print('Attr:',tree.find('email').get('hide'))
Looping through nodes

Often the XML has multiple nodes and we need to write a loop to process
all of the nodes.

In the following program, we loop through all of the user nodes:


xml2.py

import xml.etree.ElementTree as ET

input = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

s = ET.fromstring(input)
# findall returns a list of every element matching the path
lst = s.findall('users/user')
print('User count:', len(lst))
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get("x"))
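A common next step is to collect the user nodes into ordinary Python structures instead of printing them. The sketch below is one way to do that; the dictionary layout and the int conversion of the x attribute are choices made for this example.

```python
import xml.etree.ElementTree as ET

data = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(data)

# Build one dictionary per <user> element.
users = []
for item in stuff.findall('users/user'):
    users.append({
        'id': item.find('id').text,
        'name': item.find('name').text,
        'x': int(item.get('x')),   # attributes arrive as strings
    })

# The path matters: 'user' alone looks only at direct children of
# <stuff>, and the <user> elements sit one level lower.
top_level = stuff.findall('user')

print(users)
print(len(top_level))
```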
JavaScript Object Notation
JSON represents data as nested “lists” and “dictionaries”.

json1.py

import json

data = '''{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
  },
  "email" : {
    "hide" : "yes"
  }
}'''

info = json.loads(data)
print('Name:', info["name"])
print('Hide:', info["email"]["hide"])
JSON represents data as nested “lists” and “dictionaries”.

json2.py

import json

input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  },
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''

# The outer JSON array becomes a Python list of dictionaries
info = json.loads(input)
print('User count:', len(info))
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])
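A short sketch of how the parsed JSON maps onto Python types, and of what happens when the text is not valid JSON. The malformed string at the end is invented for the example.

```python
import json

text = '''[
  { "id" : "001", "x" : "2", "name" : "Chuck" },
  { "id" : "009", "x" : "7", "name" : "Chuck" }
]'''

info = json.loads(text)

# The outer JSON array is a list; each JSON object is a dictionary.
outer = type(info).__name__
inner = type(info[0]).__name__

# Values written in quotes stay strings, so convert before arithmetic.
total_x = sum(int(item['x']) for item in info)

# dumps() turns the Python structure back into JSON text.
round_trip = json.loads(json.dumps(info)) == info

# Malformed text (here, a trailing comma) raises json.JSONDecodeError.
try:
    json.loads('[{"id": "001",}]')
    bad_parsed = True
except json.JSONDecodeError:
    bad_parsed = False

print(outer, inner, total_x, round_trip, bad_parsed)
```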
Application Program Interface

We now have the ability to exchange data between applications
using HTTP and a way to represent complex data that we are
sending back and forth between these applications using XML or
JSON.
• The next step is to begin to define and document “contracts” between
applications using these techniques.
• The general name for these application-to-application contracts is Application
Program Interfaces or APIs.
• When we use an API, generally one program makes a set of services available
for use by other applications and publishes the APIs (i.e., the “rules”) that must
be followed to access the services provided by the program.
Service Oriented Approach

http://en.wikipedia.org/wiki/Service-oriented_architecture
•When we begin to build our programs where the functionality of our program
includes access to services provided by other programs, we call the approach a
Service-Oriented Architecture or SOA.

•A SOA approach is one where our overall application makes use of the services
of other applications.
•A non-SOA approach is where the application is a single standalone application
which contains all of the code necessary to implement the application.

•A Service-Oriented Architecture has many advantages including:

•(1) we always maintain only one copy of data

•(2) the owners of the data can set the rules about the use of their data.

•When an application makes a set of services in its API available over the web,
we call these web services.
Google Geocoding Web service
https://developers.google.com/maps/documentation/geocoding/
geojson.py

import urllib.request, urllib.parse, urllib.error
import json

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'

while True:
    address = input('Enter location: ')
    if len(address) < 1: break

    # urlencode escapes spaces and punctuation in the address
    url = serviceurl + urllib.parse.urlencode({'address': address})

    print('Retrieving', url)
    uh = urllib.request.urlopen(url)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except:
        js = None

    # Skip this address if the reply is not usable JSON with status OK
    if not js or 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        print(data)
        continue

    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print('lat', lat, 'lng', lng)
    location = js['results'][0]['formatted_address']
    print(location)

Sample run:

Enter location: Ann Arbor, MI
Retrieving http://maps.googleapis.com/...
Retrieved 1669 characters
lat 42.2808256 lng -83.7430378
Ann Arbor, MI, USA
Enter location:
• The program takes the search string and constructs a URL with
the search string as a properly encoded parameter and then
uses urllib to retrieve the text from the Google geocoding API.
• Unlike a fixed web page, the data we get depends on the
parameters we send and the geographical data stored in
Google’s servers.
• Once we retrieve the JSON data, we parse it with the json
library and do a few checks to make sure that we received
good data, then extract the information that we are looking for.
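The three steps described above (build the URL, parse the reply, check it before use) can be tried without a network connection by feeding the parsing step a saved reply. In this sketch the function names build_url and extract_location are made up, and the sample text is a shortened copy of the reply shown in this chapter.

```python
import json
import urllib.parse

SERVICE_URL = 'http://maps.googleapis.com/maps/api/geocode/json?'

def build_url(address):
    """Append the address as a properly encoded query parameter."""
    return SERVICE_URL + urllib.parse.urlencode({'address': address})

def extract_location(text):
    """Return (lat, lng, formatted_address), or None if the reply is unusable."""
    try:
        js = json.loads(text)
    except ValueError:          # json.JSONDecodeError is a ValueError
        return None
    if not isinstance(js, dict) or js.get('status') != 'OK':
        return None
    if not js.get('results'):
        return None
    first = js['results'][0]
    loc = first['geometry']['location']
    return loc['lat'], loc['lng'], first['formatted_address']

sample = '''{
  "status": "OK",
  "results": [
    { "geometry": { "location": { "lat": 42.2808256, "lng": -83.7430378 } },
      "formatted_address": "Ann Arbor, MI, USA" }
  ]
}'''

print(build_url('Ann Arbor, MI'))
print(extract_location(sample))
print(extract_location('this is not JSON'))
```

Keeping the parsing in its own function makes it possible to test the checks against saved replies, separately from the code that talks to the service.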
http://maps.googleapis.com/maps/api/geocode/json?address=Ann+Arbor%2C+MI

{
    "status": "OK",
    "results": [
        {
            "geometry": {
                "location_type": "APPROXIMATE",
                "location": {
                    "lat": 42.2808256,
                    "lng": -83.7430378
                }
            },
            "address_components": [
                {
                    "long_name": "Ann Arbor",
                    "types": [
                        "locality",
                        "political"
                    ],
                    "short_name": "Ann Arbor"
                }
            ],
            "formatted_address": "Ann Arbor, MI, USA",
            "types": [
                "locality",
                "political"
            ]
        }
    ]
}
API Security and Rate Limiting
• The compute resources to run these APIs are not “free”

• The data provided by these APIs is usually valuable

• The data providers might limit the number of requests per day,
demand an API “key”, or even charge for usage

• They might change the rules as things progress...
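A program that calls a rate-limited API can inspect the response headers to see how many requests it has left. The sketch below assumes the header name x-rate-limit-remaining, which is the one read by the Twitter example in this chapter; other services may use different names. The helper functions are hypothetical.

```python
def remaining_requests(headers):
    """Return the remaining-request count as an int, or None if unavailable."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get('x-rate-limit-remaining')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None

def should_pause(headers, floor=1):
    """True when the service reports fewer than floor requests left."""
    remaining = remaining_requests(headers)
    return remaining is not None and remaining < floor

print(remaining_requests({'X-Rate-Limit-Remaining': '14'}))
print(should_pause({'x-rate-limit-remaining': '0'}))
```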


twitter2.py

import urllib.request, urllib.parse, urllib.error
import twurl
import json

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

while True:
    print('')
    acct = input('Enter Twitter Account:')
    if (len(acct) < 1): break
    # twurl.augment adds the OAuth parameters to the URL
    url = twurl.augment(TWITTER_URL,
                        {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urllib.request.urlopen(url)
    data = connection.read().decode()
    # The response headers report how many requests remain
    headers = dict(connection.getheaders())
    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    print(json.dumps(js, indent=4))

    for u in js['users']:
        print(u['screen_name'])
        s = u['status']['text']
        print('  ', s[:50])
Enter Twitter Account:drchuck
Retrieving https://api.twitter.com/1.1/friends ...
Remaining 14
{
    "users": [
        {
            "status": {
                "text": "@jazzychad I just bought one .__.",
                "created_at": "Fri Sep 20 08:36:34 +0000 2013",
            },
            "location": "San Francisco, California",
            "screen_name": "leahculver",
            "name": "Leah Culver",
        },
        {
            "status": {
                "text": "RT @WSJ: Big employers like Google ...",
                "created_at": "Sat Sep 28 19:36:37 +0000 2013",
            },
            "location": "Victoria Canada",
            "screen_name": "_valeriei",
            "name": "Valerie Irvine",
        },
    ],
}
leahculver
   @jazzychad I just bought one .__.
_valeriei
   RT @WSJ: Big employers like Google, AT&amp;T are h
Ericbollens
   RT @lukew: sneak peek: my LONG take on the good &a
halherzog
   Learning Objects is 10. We had a cake with the LO,
hidden.py

def oauth():
    return {"consumer_key": "h7Lu...Ng",
            "consumer_secret": "dNKenAC3New...mmn7Q",
            "token_key": "10185562-ein2...P4GEQQOSGI",
            "token_secret": "H0ycCFemmwyf1...qoIpBo"}
twurl.py

import urllib
import oauth
import hidden

def augment(url, parameters):
    secrets = hidden.oauth()
    consumer = oauth.OAuthConsumer(secrets['consumer_key'], secrets['consumer_secret'])
    token = oauth.OAuthToken(secrets['token_key'], secrets['token_secret'])
    oauth_request = oauth.OAuthRequest.from_consumer_and_token(consumer,
        token=token, http_method='GET', http_url=url, parameters=parameters)
    oauth_request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(), consumer, token)
    return oauth_request.to_url()

https://api.twitter.com/1.1/statuses/user_timeline.json?count=2&oauth_version=1.0&oauth_token=101...SGI&screen_name=drchuck&oauth_nonce=09239679&oauth_timestamp=1380395644&oauth_signature=rLK...BoD&oauth_consumer_key=h7Lu...GNg&oauth_signature_method=HMAC-SHA1
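To see what the signing step added, the signed URL can be taken apart with urllib.parse. The sketch below uses the URL shown above with its values still abbreviated, so it only illustrates the structure; the variable names are made up.

```python
from urllib.parse import urlsplit, parse_qs

signed = ('https://api.twitter.com/1.1/statuses/user_timeline.json?'
          'count=2&oauth_version=1.0&oauth_token=101...SGI&screen_name=drchuck'
          '&oauth_nonce=09239679&oauth_timestamp=1380395644'
          '&oauth_signature=rLK...BoD&oauth_consumer_key=h7Lu...GNg'
          '&oauth_signature_method=HMAC-SHA1')

parts = urlsplit(signed)

# parse_qs returns a list per key; each key appears once here.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Parameters the program supplied versus those added by the OAuth signing.
app_params = {k: v for k, v in params.items() if not k.startswith('oauth_')}
oauth_params = {k: v for k, v in params.items() if k.startswith('oauth_')}

print(parts.path)
print(app_params)
print(sorted(oauth_params))
```

Only count and screen_name came from the calling program; everything prefixed with oauth_ was added when the request was signed.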
Summary
• Service Oriented Architecture - allows an application to be broken
into parts and distributed across a network

• An Application Program Interface (API) is a contract for interaction

• Web Services provide infrastructure for applications cooperating
(an API) over a network - SOAP and REST are two styles of web
services

• XML and JSON are serialization formats
