Advanced Data Analytics Using Python
With Machine Learning, Deep Learning and NLP Examples

Sayan Mukhopadhyay
Kolkata, West Bengal, India
Table of Contents

Chapter 1: Introduction .................................................. 1
    Why Python? .......................................................... 1
    When to Avoid Using Python ........................................... 2
    OOP in Python ........................................................ 3
    Calling Other Languages in Python ................................... 12
    Exposing the Python Model as a Microservice ......................... 14
    High-Performance API and Concurrent Programming ..................... 17
Elasticsearch ........................................................... 31
    Connection Layer API ................................................ 33
Neo4j Python Driver ..................................................... 34
neo4j-rest-client ....................................................... 35
In-Memory Database ...................................................... 35
MongoDB (Python Edition) ................................................ 36
    Import Data into the Collection ..................................... 36
    Create a Connection Using pymongo ................................... 37
    Access Database Objects ............................................. 37
    Insert Data ......................................................... 38
    Update Data ......................................................... 38
    Remove Data ......................................................... 38
Pandas .................................................................. 38
ETL with Python (Unstructured Data) ..................................... 40
    E-mail Parsing ...................................................... 40
    Topical Crawling .................................................... 42
Index .................................................................. 181
About the Author
Sayan Mukhopadhyay has more than
13 years of industry experience and has been
associated with companies such as Credit
Suisse, PayPal, CA Technologies, CSC, and
Mphasis. He has a deep understanding of
applications for data analysis in domains such
as investment banking, online payments,
online advertisement, IT infrastructure, and
retail. His area of expertise is in applying
high-performance computing in distributed
and data-driven environments such as real-time analysis, high-frequency
trading, and so on.
He earned his engineering degree in electronics and instrumentation
from Jadavpur University and his master’s degree in research in
computational and data science from IISc in Bangalore.
About the Technical Reviewer
Sundar Rajan Raman has more than 14 years
of full stack IT experience in machine
learning, deep learning, and natural
language processing. He has six years
of big data development and architect
experience, including working with Hadoop
and its ecosystems as well as other NoSQL
technologies such as MongoDB and
Cassandra. In fact, he has been the technical
reviewer of several books on these topics.
He is also interested in strategizing using Design Thinking principles
and in coaching and mentoring people.
Acknowledgments
Thanks to Labonic Chakraborty (Ripa) and Kusumika Mukherjee.
CHAPTER 1
Introduction
In this book, I assume that you are familiar with Python programming.
In this introductory chapter, I explain why a data scientist should choose
Python as a programming language. Then I highlight some situations
where Python is not a good choice. Finally, I describe some good practices
in application development and give some coding examples that a data
scientist needs in their day-to-day job.
Why Python?
So, why should you choose Python?
OOP in Python
Before proceeding, I will explain some features of object-oriented
programming (OOP) in a Python context.
The most basic element of any modern application is an object. To
a programmer or architect, the world is a collection of objects. Objects
consist of two types of members: attributes and methods. Members can be
private, public, or protected. Classes are data types of objects. Every object
is an instance of a class. A class can be inherited in child classes. Two
classes can be associated using composition.
Python has no keywords for public, private, or protected members, so encapsulation (hiding a member from the outside world) is not enforced by the language; a leading underscore merely marks a member as private by convention. Like C++, Python supports multilevel and multiple inheritance. Unlike Java, Python has no abstract keyword; instead, the standard library's abc module supplies the ABC base class and the @abstractmethod decorator, and with them both classes and methods can be made abstract.
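The abstraction and encapsulation machinery just described can be sketched with the standard-library abc module. The class and method names below are illustrative only, not the book's:

```python
# A minimal sketch of abstract classes and convention-based encapsulation.
from abc import ABC, abstractmethod

class Crawler(ABC):
    def __init__(self, base_url):
        self._base_url = base_url  # leading underscore: "private" by convention

    @abstractmethod
    def collect(self):
        """Child classes must implement their own collection logic."""

class ReviewCrawler(Crawler):
    def collect(self):
        return "collecting from " + self._base_url

# Instantiating the abstract base class raises TypeError;
# the concrete child works normally.
try:
    Crawler("http://example.com")
except TypeError as e:
    print("abstract base refused:", type(e).__name__)

print(ReviewCrawler("http://example.com").collect())
```

Note that the base class never runs on its own; it only fixes the interface that every child must honor, which is exactly the role the crawler's root class plays below.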
The following code is an example of a generic web crawler that is
implemented as an airline’s web crawler on the Skytrax site and as a retail
crawler for the Mouthshut.com site. I’ll return to the topic of web crawling
in Chapter 2.
# Updated from the book's Python 2 / BeautifulSoup 3 listing to Python 3 idioms.
import sys
from abc import abstractmethod

import bleach
from bs4 import BeautifulSoup


class SkyThoughtCollector:

    baseURLString = "base_url"
    airlinesString = "air_lines"
    limitString = "limits"

    baseURl = ""
    airlines = []
    limit = 10

    @abstractmethod
    def collectThoughts(self):
        print("Something wrong!! You're calling an abstract method")

    @classmethod
    def getConfig(cls, configpath):
        # Parse key=value lines, skipping comment lines.
        config = {}
        with open(configpath) as conf:
            for line in conf:
                if "#" not in line:
                    words = line.strip().split('=')
                    config[words[0].strip()] = words[1].strip()
        cls.baseURl = config[cls.baseURLString]
        if cls.airlinesString in config:
            cls.airlines = config[cls.airlinesString].split(',')
        if cls.limitString in config:
            cls.limit = int(config[cls.limitString])
class AirLineReviewCollector(SkyThoughtCollector):

    def collectThoughts(self):
        for al in AirLineReviewCollector.airlines:
            count = 0
            while count < AirLineReviewCollector.limit:
                count = count + 1
                url = ''
                if count == 1:
                    url = AirLineReviewCollector.baseURl + al + ".htm"
                else:
                    url = AirLineReviewCollector.baseURl + al + "_" + str(count) + ".htm"
                soup = BeautifulSoup(
                    super(AirLineReviewCollector, self).downloadURL(url),
                    "html.parser")
                blogs = soup.findAll("p", {"class": "text2"})
                tables = soup.findAll("table", {"width": "192"})
                review_headers = soup.findAll("td", {"class": "airport"})
                for i in range(len(tables) - 1):
                    (name, surname, year, month, date, country) = \
                        self.parseSoupHeader(review_headers[i])
                    (stat, over_all, money_value, seat_comfort, staff_service,
                     catering, entertainment, recomend) = \
                        self.parseSoupTable(tables[i])
                    blog = str(blogs[i]).split(">")[1].split("<")[0]
                    args = [al, name, surname, year, month, date, country,
                            stat, over_all, money_value, seat_comfort,
                            staff_service, catering, entertainment,
                            recomend, blog]
                    super(AirLineReviewCollector, self).print_args(args)
class RetailReviewCollector(SkyThoughtCollector):

    def __init__(self, configpath):
        super(RetailReviewCollector, self).getConfig(configpath)

    def collectThoughts(self):
        soup = BeautifulSoup(
            super(RetailReviewCollector, self).downloadURL(
                RetailReviewCollector.baseURl),
            "html.parser")
        lines = soup.findAll("a", {"style": "font-size:15px;"})
        links = []
        for line in lines:
            if ("review" in str(line)) and ("target" in str(line)):
                ln = str(line)
                link = ln.split("href=")[-1].split("target=")[0].replace("\"", "").strip()
                links.append(link)

        comment = bleach.clean(
            str(soup.findAll("div", {"itemprop": "description"})[0]),
            tags=[], strip=True)
        tables = soup.findAll("table", {"class": "smallfont space0 pad2"})
        # Note: 'range' shadows the built-in within this method (kept from the original).
        parking = ambience = range = economy = product = 0
        for table in tables:
            if "Parking:" in str(table):
                rows = table.findAll("tbody")[0].findAll("tr")
                for row in rows:
                    if "Parking:" in str(row):
                        parking = str(row).count("read-barfull")
                    if "Ambience" in str(row):
                        ambience = str(row).count("read-barfull")
                    if "Store" in str(row):
                        range = str(row).count("read-barfull")

        author = bleach.clean(
            soup.findAll("span", {"itemprop": "author"})[0],
            tags=[], strip=True)
        date = soup.findAll("meta", {"itemprop": "datePublished"})[0]["content"]
        args = [date, author, str(parking), str(ambience), str(range),
                str(economy), str(product), comment]
        super(RetailReviewCollector, self).print_args(args)
if __name__ == "__main__":
    if sys.argv[1] == 'airline':
        instance = AirLineReviewCollector(sys.argv[2])
        instance.collectThoughts()
    elif sys.argv[1] == 'retail':
        instance = RetailReviewCollector(sys.argv[2])
        instance.collectThoughts()
    else:
        print("Usage is")
        print(sys.argv[0], '<airline/retail>', "<Config File Path>")
base_url = http://www.airlinequality.com/Forum/
#base_url = http://www.mouthshut.com/product-reviews/Mega-Mart-Bangalore-reviews-925103466
#base_url = http://www.mouthshut.com/product-reviews/Megamart-Chennai-reviews-925104102
air_lines = emrts,brit_awys,ual,biman,flydubai
limits = 10
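As an aside, the key=value format above can also be read with the standard library's configparser instead of the hand-rolled loop. A minimal sketch, with one assumption: configparser requires a section header, so we prepend a dummy "[crawler]" section that is not part of the book's file format:

```python
# Sketch: parsing the crawler's key=value config with stdlib configparser.
# The "[crawler]" section header is an assumption added because
# configparser refuses files without sections; '#' lines are treated
# as comments by default, matching the original format.
import configparser

raw = """\
base_url = http://www.airlinequality.com/Forum/
air_lines = emrts,brit_awys,ual,biman,flydubai
limits = 10
"""

parser = configparser.ConfigParser()
parser.read_string("[crawler]\n" + raw)

base_url = parser["crawler"]["base_url"]
airlines = parser["crawler"]["air_lines"].split(",")
limit = parser.getint("crawler", "limits")
print(base_url, airlines, limit)
```

This buys type-aware accessors like getint for free, at the cost of the section-header convention.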
I'll now discuss the previous code in brief. Its root class is abstract and contains the attributes that every child class needs, such as the base URL and the page limit. It also holds common logic in class methods: downloading a URL, printing output, and reading the configuration. Finally, it declares an abstract method, collectThoughts, which must be implemented in the child classes. The abstract method imposes a common behavior on every child class (each must collect thoughts from the Web), while the implementation of that collection is child specific.
available in R. So, you can call R code from Python using the rpy2 module,
as shown here:
import rpy2.robjects as ro
ro.r('data(input)')
ro.r('x <- HoltWinters(input)')
Sometimes you need to call Java code from Python. For example, say you are working on a named entity recognition problem in the field of natural language processing (NLP): some text is given as input, and you have to recognize the names in it. Python's NLTK package does have a named entity recognition function, but its accuracy is not good. Stanford NLP, which is written in Java, is a better choice here. You can solve this problem in two ways. The first is to invoke the Java program as an external process:

import subprocess
subprocess.call(['java', '-cp', '*',
                 'edu.stanford.nlp.sentiment.SentimentPipeline',
                 '-file', 'foo.txt'])
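When the external program writes its result to stdout, subprocess.run with capture_output lets you collect that result instead of merely firing the command. A runnable sketch; here the Python interpreter itself stands in for the 'java' binary so the example works without a JVM:

```python
# Sketch: invoking an external program and capturing its stdout.
# sys.executable stands in for 'java'; the printed string stands in
# for whatever the NER pipeline would emit.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('NER output would appear here')"],
    capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

check=True raises CalledProcessError on a nonzero exit code, which is usually what you want when the Java step is a hard dependency of the pipeline.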
The second is to run Stanford CoreNLP as an HTTP server and call it from Python; here I assume the pycorenlp client package and a CoreNLP server already listening on port 9000:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://127.0.0.1:9000')
output = nlp.annotate(sentence, properties={
    "annotators": "tokenize,ssplit,parse,sentiment",
    "outputFormat": "json",
    # Only split the sentence at end of line; we assume this
    # method only takes in one single sentence.
    "ssplit.eolonly": "true",
})
import datetime as dt
import ipaddress
import math

import pygeoip
import tensorflow as tf
from flask import Flask, g, request
from flask_cors import CORS
from pymongo import MongoClient
from sqlalchemy import create_engine, MetaData

app = Flask(__name__)
CORS(app)

@app.before_request
def before():
    db = create_engine('sqlite:///score.db')
    metadata = MetaData(db)
    client = MongoClient()
    g.db = client.frequency
    g.gi = pygeoip.GeoIP('GeoIP.dat')
    # Restore the trained TensorFlow model from its checkpoint files.
    sess = tf.Session()
    new_saver = tf.train.import_meta_graph('model.obj.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))
    all_vars = tf.get_collection('vars')
    g.dropped_features = str(sess.run(all_vars[0]))
    g.b = sess.run(all_vars[1])[0]
    return
def get_hour(timestamp):
return dt.datetime.utcfromtimestamp(timestamp / 1e3).hour
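The helper above converts a client timestamp to an hour of day; the division by 1e3 means it expects milliseconds since the epoch, not seconds. A quick usage sketch (get_hour is restated so the sketch runs standalone):

```python
# get_hour expects a timestamp in MILLISECONDS since the epoch,
# hence the division by 1e3 before conversion.
import datetime as dt

def get_hour(timestamp):
    return dt.datetime.utcfromtimestamp(timestamp / 1e3).hour

# 1600000000000 ms = 2020-09-13T12:26:40Z, i.e. hour 12 in UTC.
print(get_hour(1600000000000))
```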
@app.route('/predict', methods=['POST'])
def predict():
    # get_value(), g.session, and g.scores are defined in earlier parts
    # of the full listing.
    input_json = request.get_json(force=True)
    features = ['size', 'domain', 'client_time', 'device',
                'ad_position', 'client_size', 'ip', 'root']
    predicted = 0
    feature_value = ''
    for f in features:
        if f not in g.dropped_features:
            if f == 'ip':
                feature_value = str(ipaddress.ip_address(request.remote_addr))
            else:
                feature_value = input_json.get(f)
            if f == 'ip':
                if 'geo' not in g.dropped_features:
                    geo = g.gi.country_name_by_addr(feature_value)
                    predicted = predicted + get_value(g.session, g.scores,
                                                      'geo', geo)
                if 'frequency' not in g.dropped_features:
                    res = g.db.frequency.find_one({"ip": feature_value})
                    freq = 1
                    if res is not None:
                        freq = res['frequency']
                    predicted = predicted + get_value(g.session, g.scores,
                                                      'frequency', str(freq))
            if f == 'client_time':
                feature_value = get_hour(int(feature_value))
            predicted = predicted + get_value(g.session, g.scores,
                                              f, feature_value)
    return str(math.exp(predicted + g.b) - 1)

app.run(debug=True, host='0.0.0.0')
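The scoring in predict is a log-linear model: per-feature scores are summed and the response is exp(score + bias) - 1. The arithmetic can be sketched in isolation; the score table and bias below are made-up stand-ins for g.scores and g.b, not values from the book's model:

```python
# Sketch of the log-linear scoring used by the /predict endpoint.
# 'scores' and 'bias' are hypothetical stand-ins for the trained
# g.scores table and g.b bias restored from the checkpoint.
import math

scores = {
    ("device", "mobile"): 0.4,
    ("geo", "India"): 0.2,
    ("frequency", "3"): -0.1,
}
bias = 0.05

def predict_score(features, scores, bias):
    # Sum the score of every known (feature, value) pair, then
    # invert the log link, as the endpoint does.
    total = sum(scores.get((f, v), 0.0) for f, v in features.items())
    return math.exp(total + bias) - 1

sample = {"device": "mobile", "geo": "India", "frequency": "3"}
print(round(predict_score(sample, scores, bias), 4))
```

Unknown (feature, value) pairs contribute zero, which mirrors the endpoint skipping features listed in g.dropped_features.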