Advanced Data Analytics Using Python
With Machine Learning, Deep Learning and NLP Examples

Sayan Mukhopadhyay
Kolkata, West Bengal, India
Table of Contents

Chapter 1: Introduction .................................................. 1
    Why Python? .......................................................... 1
    When to Avoid Using Python ........................................... 2
    OOP in Python ........................................................ 3
    Calling Other Languages in Python ................................... 12
    Exposing the Python Model as a Microservice ......................... 14
    High-Performance API and Concurrent Programming ..................... 17
Elasticsearch ........................................................... 31
    Connection Layer API ................................................ 33
Neo4j Python Driver ..................................................... 34
neo4j-rest-client ....................................................... 35
In-Memory Database ...................................................... 35
MongoDB (Python Edition) ................................................ 36
    Import Data into the Collection ..................................... 36
    Create a Connection Using pymongo ................................... 37
    Access Database Objects ............................................. 37
    Insert Data ......................................................... 38
    Update Data ......................................................... 38
    Remove Data ......................................................... 38
Pandas .................................................................. 38
ETL with Python (Unstructured Data) ..................................... 40
    E-mail Parsing ...................................................... 40
    Topical Crawling .................................................... 42
Index .................................................................. 181
About the Author
Sayan Mukhopadhyay has more than
13 years of industry experience and has been
associated with companies such as Credit
Suisse, PayPal, CA Technologies, CSC, and
Mphasis. He has a deep understanding of
applications for data analysis in domains such
as investment banking, online payments,
online advertisement, IT infrastructure, and
retail. His area of expertise is in applying
high-performance computing in distributed
and data-driven environments such as real-time analysis, high-frequency
trading, and so on.
He earned his engineering degree in electronics and instrumentation
from Jadavpur University and his master’s degree in research in
computational and data science from IISc in Bangalore.
About the Technical Reviewer
Sundar Rajan Raman has more than 14 years
of full stack IT experience in machine
learning, deep learning, and natural
language processing. He has six years
of big data development and architect
experience, including working with Hadoop
and its ecosystems as well as other NoSQL
technologies such as MongoDB and
Cassandra. In fact, he has been the technical
reviewer of several books on these topics.
He is also interested in strategizing using Design Thinking principles
and in coaching and mentoring people.
Acknowledgments
Thanks to Labonic Chakraborty (Ripa) and Kusumika Mukherjee.
CHAPTER 1
Introduction
In this book, I assume that you are familiar with Python programming.
In this introductory chapter, I explain why a data scientist should choose
Python as a programming language. Then I highlight some situations
where Python is not a good choice. Finally, I describe some good practices
in application development and give some coding examples that a data
scientist needs in their day-to-day job.
Why Python?
So, why should you choose Python?
OOP in Python
Before proceeding, I will explain some features of object-oriented
programming (OOP) in a Python context.
The most basic element of any modern application is an object. To
a programmer or architect, the world is a collection of objects. Objects
consist of two types of members: attributes and methods. Members can be
private, public, or protected. Classes are data types of objects. Every object
is an instance of a class. A class can be inherited in child classes. Two
classes can be associated using composition.
Python has no keywords for public, private, or protected members, so encapsulation (hiding a member from the outside world) is not enforced by the language; a leading underscore merely marks a member as private by convention. Like C++, Python supports multilevel and multiple inheritance. Unlike Java, Python has no abstract keyword; instead, the standard library's abc module supplies the ABC base class and the @abstractmethod decorator, and with them both classes and methods can be made abstract.
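The abstraction and encapsulation machinery just described can be sketched with the standard-library abc module. The class and method names below are illustrative only, not the book's:

```python
# A minimal sketch of abstract classes and convention-based encapsulation.
from abc import ABC, abstractmethod

class Crawler(ABC):
    def __init__(self, base_url):
        self._base_url = base_url  # leading underscore: "private" by convention

    @abstractmethod
    def collect(self):
        """Child classes must implement their own collection logic."""

class ReviewCrawler(Crawler):
    def collect(self):
        return "collecting from " + self._base_url

# Instantiating the abstract base class raises TypeError;
# the concrete child works normally.
try:
    Crawler("http://example.com")
except TypeError as e:
    print("abstract base refused:", type(e).__name__)

print(ReviewCrawler("http://example.com").collect())
```

Note that the base class never runs on its own; it only fixes the interface that every child must honor, which is exactly the role the crawler's root class plays below.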
The following code is an example of a generic web crawler that is
implemented as an airline’s web crawler on the Skytrax site and as a retail
crawler for the Mouthshut.com site. I’ll return to the topic of web crawling
in Chapter 2.
# Updated from the book's Python 2 / BeautifulSoup 3 listing to Python 3 idioms.
import sys
from abc import abstractmethod

import bleach
from bs4 import BeautifulSoup


class SkyThoughtCollector:

    baseURLString = "base_url"
    airlinesString = "air_lines"
    limitString = "limits"

    baseURl = ""
    airlines = []
    limit = 10

    @abstractmethod
    def collectThoughts(self):
        print("Something wrong!! You're calling an abstract method")

    @classmethod
    def getConfig(cls, configpath):
        # Parse key=value lines, skipping comment lines.
        config = {}
        with open(configpath) as conf:
            for line in conf:
                if "#" not in line:
                    words = line.strip().split('=')
                    config[words[0].strip()] = words[1].strip()
        cls.baseURl = config[cls.baseURLString]
        if cls.airlinesString in config:
            cls.airlines = config[cls.airlinesString].split(',')
        if cls.limitString in config:
            cls.limit = int(config[cls.limitString])
class AirLineReviewCollector(SkyThoughtCollector):

    def collectThoughts(self):
        for al in AirLineReviewCollector.airlines:
            count = 0
            while count < AirLineReviewCollector.limit:
                count = count + 1
                url = ''
                if count == 1:
                    url = AirLineReviewCollector.baseURl + al + ".htm"
                else:
                    url = AirLineReviewCollector.baseURl + al + "_" + str(count) + ".htm"
                soup = BeautifulSoup(
                    super(AirLineReviewCollector, self).downloadURL(url),
                    "html.parser")
                blogs = soup.findAll("p", {"class": "text2"})
                tables = soup.findAll("table", {"width": "192"})
                review_headers = soup.findAll("td", {"class": "airport"})
                for i in range(len(tables) - 1):
                    (name, surname, year, month, date, country) = \
                        self.parseSoupHeader(review_headers[i])
                    (stat, over_all, money_value, seat_comfort, staff_service,
                     catering, entertainment, recomend) = \
                        self.parseSoupTable(tables[i])
                    blog = str(blogs[i]).split(">")[1].split("<")[0]
                    args = [al, name, surname, year, month, date, country,
                            stat, over_all, money_value, seat_comfort,
                            staff_service, catering, entertainment,
                            recomend, blog]
                    super(AirLineReviewCollector, self).print_args(args)
class RetailReviewCollector(SkyThoughtCollector):

    def __init__(self, configpath):
        super(RetailReviewCollector, self).getConfig(configpath)

    def collectThoughts(self):
        soup = BeautifulSoup(
            super(RetailReviewCollector, self).downloadURL(
                RetailReviewCollector.baseURl),
            "html.parser")
        lines = soup.findAll("a", {"style": "font-size:15px;"})
        links = []
        for line in lines:
            if ("review" in str(line)) and ("target" in str(line)):
                ln = str(line)
                link = ln.split("href=")[-1].split("target=")[0].replace("\"", "").strip()
                links.append(link)

        comment = bleach.clean(
            str(soup.findAll("div", {"itemprop": "description"})[0]),
            tags=[], strip=True)
        tables = soup.findAll("table", {"class": "smallfont space0 pad2"})
        # Note: 'range' shadows the built-in within this method (kept from the original).
        parking = ambience = range = economy = product = 0
        for table in tables:
            if "Parking:" in str(table):
                rows = table.findAll("tbody")[0].findAll("tr")
                for row in rows:
                    if "Parking:" in str(row):
                        parking = str(row).count("read-barfull")
                    if "Ambience" in str(row):
                        ambience = str(row).count("read-barfull")
                    if "Store" in str(row):
                        range = str(row).count("read-barfull")

        author = bleach.clean(
            soup.findAll("span", {"itemprop": "author"})[0],
            tags=[], strip=True)
        date = soup.findAll("meta", {"itemprop": "datePublished"})[0]["content"]
        args = [date, author, str(parking), str(ambience), str(range),
                str(economy), str(product), comment]
        super(RetailReviewCollector, self).print_args(args)
if __name__ == "__main__":
    if sys.argv[1] == 'airline':
        instance = AirLineReviewCollector(sys.argv[2])
        instance.collectThoughts()
    elif sys.argv[1] == 'retail':
        instance = RetailReviewCollector(sys.argv[2])
        instance.collectThoughts()
    else:
        print("Usage is")
        print(sys.argv[0], '<airline/retail>', "<Config File Path>")
base_url = http://www.airlinequality.com/Forum/
#base_url = http://www.mouthshut.com/product-reviews/Mega-Mart-Bangalore-reviews-925103466
#base_url = http://www.mouthshut.com/product-reviews/Megamart-Chennai-reviews-925104102
air_lines = emrts,brit_awys,ual,biman,flydubai
limits = 10
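As an aside, the key=value format above can also be read with the standard library's configparser instead of the hand-rolled loop. A minimal sketch, with one assumption: configparser requires a section header, so we prepend a dummy "[crawler]" section that is not part of the book's file format:

```python
# Sketch: parsing the crawler's key=value config with stdlib configparser.
# The "[crawler]" section header is an assumption added because
# configparser refuses files without sections; '#' lines are treated
# as comments by default, matching the original format.
import configparser

raw = """\
base_url = http://www.airlinequality.com/Forum/
air_lines = emrts,brit_awys,ual,biman,flydubai
limits = 10
"""

parser = configparser.ConfigParser()
parser.read_string("[crawler]\n" + raw)

base_url = parser["crawler"]["base_url"]
airlines = parser["crawler"]["air_lines"].split(",")
limit = parser.getint("crawler", "limits")
print(base_url, airlines, limit)
```

This buys type-aware accessors like getint for free, at the cost of the section-header convention.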
I'll now discuss the previous code in brief. Its root class is abstract and contains the attributes that every child class needs, such as the base URL and the page limit. It also holds common logic in class methods: downloading a URL, printing output, and reading the configuration. Finally, it declares an abstract method, collectThoughts, which must be implemented in the child classes. The abstract method imposes a common behavior on every child class (each must collect thoughts from the Web), while the implementation of that collection is child specific.
available in R. So, you can call R code from Python using the rpy2 module,
as shown here:
import rpy2.robjects as ro
ro.r('data(input)')
ro.r('x <- HoltWinters(input)')
Sometimes you need to call Java code from Python. For example, say you are working on a named entity recognition problem in the field of natural language processing (NLP): some text is given as input, and you have to recognize the names in it. Python's NLTK package does have a named entity recognition function, but its accuracy is not good. Stanford NLP, which is written in Java, is a better choice here. You can solve this problem in two ways. The first is to invoke the Java program as an external process:

import subprocess
subprocess.call(['java', '-cp', '*',
                 'edu.stanford.nlp.sentiment.SentimentPipeline',
                 '-file', 'foo.txt'])
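When the external program writes its result to stdout, subprocess.run with capture_output lets you collect that result instead of merely firing the command. A runnable sketch; here the Python interpreter itself stands in for the 'java' binary so the example works without a JVM:

```python
# Sketch: invoking an external program and capturing its stdout.
# sys.executable stands in for 'java'; the printed string stands in
# for whatever the NER pipeline would emit.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('NER output would appear here')"],
    capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

check=True raises CalledProcessError on a nonzero exit code, which is usually what you want when the Java step is a hard dependency of the pipeline.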
The second is to run Stanford CoreNLP as an HTTP server and call it from Python; here I assume the pycorenlp client package and a CoreNLP server already listening on port 9000:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://127.0.0.1:9000')
output = nlp.annotate(sentence, properties={
    "annotators": "tokenize,ssplit,parse,sentiment",
    "outputFormat": "json",
    # Only split the sentence at end of line; we assume this
    # method only takes in one single sentence.
    "ssplit.eolonly": "true",
})
import datetime as dt
import ipaddress
import math

import pygeoip
import tensorflow as tf
from flask import Flask, g, request
from flask_cors import CORS
from pymongo import MongoClient
from sqlalchemy import create_engine, MetaData

app = Flask(__name__)
CORS(app)

@app.before_request
def before():
    db = create_engine('sqlite:///score.db')
    metadata = MetaData(db)
    client = MongoClient()
    g.db = client.frequency
    g.gi = pygeoip.GeoIP('GeoIP.dat')
    # Restore the trained TensorFlow model from its checkpoint files.
    sess = tf.Session()
    new_saver = tf.train.import_meta_graph('model.obj.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))
    all_vars = tf.get_collection('vars')
    g.dropped_features = str(sess.run(all_vars[0]))
    g.b = sess.run(all_vars[1])[0]
    return
def get_hour(timestamp):
return dt.datetime.utcfromtimestamp(timestamp / 1e3).hour
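The helper above converts a client timestamp to an hour of day; the division by 1e3 means it expects milliseconds since the epoch, not seconds. A quick usage sketch (get_hour is restated so the sketch runs standalone):

```python
# get_hour expects a timestamp in MILLISECONDS since the epoch,
# hence the division by 1e3 before conversion.
import datetime as dt

def get_hour(timestamp):
    return dt.datetime.utcfromtimestamp(timestamp / 1e3).hour

# 1600000000000 ms = 2020-09-13T12:26:40Z, i.e. hour 12 in UTC.
print(get_hour(1600000000000))
```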
@app.route('/predict', methods=['POST'])
def predict():
    # get_value(), g.session, and g.scores are defined in earlier parts
    # of the full listing.
    input_json = request.get_json(force=True)
    features = ['size', 'domain', 'client_time', 'device',
                'ad_position', 'client_size', 'ip', 'root']
    predicted = 0
    feature_value = ''
    for f in features:
        if f not in g.dropped_features:
            if f == 'ip':
                feature_value = str(ipaddress.ip_address(request.remote_addr))
            else:
                feature_value = input_json.get(f)
            if f == 'ip':
                if 'geo' not in g.dropped_features:
                    geo = g.gi.country_name_by_addr(feature_value)
                    predicted = predicted + get_value(g.session, g.scores,
                                                      'geo', geo)
                if 'frequency' not in g.dropped_features:
                    res = g.db.frequency.find_one({"ip": feature_value})
                    freq = 1
                    if res is not None:
                        freq = res['frequency']
                    predicted = predicted + get_value(g.session, g.scores,
                                                      'frequency', str(freq))
            if f == 'client_time':
                feature_value = get_hour(int(feature_value))
            predicted = predicted + get_value(g.session, g.scores,
                                              f, feature_value)
    return str(math.exp(predicted + g.b) - 1)

app.run(debug=True, host='0.0.0.0')
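The scoring in predict is a log-linear model: per-feature scores are summed and the response is exp(score + bias) - 1. The arithmetic can be sketched in isolation; the score table and bias below are made-up stand-ins for g.scores and g.b, not values from the book's model:

```python
# Sketch of the log-linear scoring used by the /predict endpoint.
# 'scores' and 'bias' are hypothetical stand-ins for the trained
# g.scores table and g.b bias restored from the checkpoint.
import math

scores = {
    ("device", "mobile"): 0.4,
    ("geo", "India"): 0.2,
    ("frequency", "3"): -0.1,
}
bias = 0.05

def predict_score(features, scores, bias):
    # Sum the score of every known (feature, value) pair, then
    # invert the log link, as the endpoint does.
    total = sum(scores.get((f, v), 0.0) for f, v in features.items())
    return math.exp(total + bias) - 1

sample = {"device": "mobile", "geo": "India", "frequency": "3"}
print(round(predict_score(sample, scores, bias), 4))
```

Unknown (feature, value) pairs contribute zero, which mirrors the endpoint skipping features listed in g.dropped_features.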