Applied Natural Language Processing with Python
Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing
Taweh Beysolow II
San Francisco, California, USA
About the Author
Taweh Beysolow II is a data scientist and author currently based in San Francisco, California. He has a bachelor's degree in economics from St. John's University and a master's degree in applied statistics from Fordham University. His professional experience includes working at Booz Allen Hamilton as a consultant and at various startups as a data scientist, focusing specifically on machine learning. He has applied machine learning to the federal consulting, financial services, and agricultural sectors.
About the Technical Reviewer
Santanu Pattanayak currently works at GE
Digital as a staff data scientist and is the author
of the deep learning book Pro Deep Learning
with TensorFlow: A Mathematical Approach
to Advanced Artificial Intelligence in Python
(Apress, 2017). He has more than eight years of
experience in the data analytics/data science
field and a background in development and
database technologies. Prior to joining GE,
Santanu worked at companies such as RBS,
Capgemini, and IBM. He graduated with a degree in electrical engineering
from Jadavpur University, Kolkata, and is an avid math enthusiast. Santanu
is currently pursuing a master’s degree in data science from the Indian
Institute of Technology (IIT), Hyderabad. He also devotes his time to data
science hackathons and Kaggle competitions, where he ranks within the
top 500 across the globe. Santanu was born and brought up in West Bengal,
India, and currently resides in Bangalore, India, with his wife.
Acknowledgments
A special thanks to Santanu Pattanayak, Divya Modi, Celestin Suresh
John, and everyone at Apress for the wonderful experience. It has been a
pleasure to work with you all on this text. I couldn’t have asked for a better
team.
Introduction
Thank you for choosing Applied Natural Language Processing with Python
for your journey into natural language processing (NLP). Readers should
be aware that this text should not be considered a comprehensive study
of machine learning, deep learning, or computer programming. As such,
it is assumed that you are familiar with these techniques to some degree.
Regardless, a brief review of the concepts necessary to understand the
tasks that you will perform in the book is provided.
After the brief review, we begin by examining how to work with raw
text data, slowly working our way through how to present data to machine
learning and deep learning algorithms. After you are familiar with some
basic preprocessing algorithms, we will make our way into some of the
more advanced NLP tasks, such as training and working with word
embeddings, spell-checking, text generation, and question-and-answer
generation.
All of the examples utilize the Python programming language and
popular deep learning and machine learning frameworks, such as scikit-
learn, Keras, and TensorFlow. Readers can feel free to access the source
code utilized in this book on the corresponding GitHub page and/or try
their own methods for solving the various problems tackled in this book
with the datasets provided.
CHAPTER 1
What Is Natural Language Processing?
Deep learning and machine learning continue to proliferate throughout
various industries, and have revolutionized the topic that I wish to discuss
in this book: natural language processing (NLP). NLP is a subfield of
computer science that is focused on allowing computers to understand
language in a “natural” way, as humans do. Typically, this would refer to
tasks such as understanding the sentiment of text, speech recognition, and
generating responses to questions.
NLP has become a rapidly evolving field, and one whose applications
have represented a large portion of artificial intelligence (AI)
breakthroughs. Some examples of implementations using deep learning
are chatbots that handle customer service requests, auto-spellcheck on cell
phones, and AI assistants, such as Cortana and Siri, on smartphones. For
those who have experience in machine learning and deep learning, natural
language processing is one of the most exciting areas for individuals to
apply their skills. To provide context for broader discussions, however, let’s
discuss the development of natural language processing as a field.
The SLP model owes its existence in part to Alan Turing's research on
computation in the late 1930s, which inspired other scientists and researchers
to develop concepts such as formal language theory.
Moving forward to the second half of the twentieth century, NLP starts
to bifurcate into two distinct groups of thought: (1) those who support a
symbolic approach to language modelling, and (2) those who support a
stochastic approach. The former group was populated largely by linguists
who used simple algorithms to solve NLP problems, often utilizing pattern
recognition. The latter group was primarily composed of statisticians
and electrical engineers. Among the many approaches that were popular
with the second group was Bayesian statistics. As the twentieth century
progressed, NLP broadened as a field, adding natural language
understanding (NLU) to the problem space (allowing computers to react
accurately to commands). For example, if someone spoke to a chatbot and
asked it to “find food near me,” the chatbot would use NLU to translate this
sentence into tangible actions to yield a desirable outcome.
Skipping closer to the present day, we find that NLP has experienced
a surge of interest alongside machine learning's explosion in usage over
the past 20 years. Part of this is due to the fact that large repositories of
labeled data sets have become more available, in addition to an increase in
computing power. This increase in computing power is largely attributed
to the development of GPUs, and it has proven vital to AI's
development as a field. Accordingly, demand for materials to instruct
data scientists and engineers on how to utilize various AI algorithms has
increased, which is in part the reason for this book.
Now that you are aware of the history of NLP as it relates to the present
day, I will give a brief overview of what you should expect to learn. The
focus, however, is primarily to discuss how deep learning has impacted
NLP, and how to utilize deep learning and machine learning techniques to
solve NLP problems.
TensorFlow
One of the groundbreaking releases in open source software, and in
machine learning at large, has undoubtedly been Google's TensorFlow.
It is an open source library for deep learning that is a successor to Theano,
a similar machine learning library. Both utilize data flow graphs for
numerical computation.
           'output': tf.Variable(tf.random_normal([state_size, n_classes]))}

biases = {'input': tf.Variable(tf.random_normal([1, state_size])),
          'output': tf.Variable(tf.random_normal([1, n_classes]))}
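To illustrate the data flow graph idea, here is a minimal sketch (not one of the book's listings) using the TensorFlow 1.x-style API that the book's examples rely on: operations are first declared as nodes in a graph, and a session then executes the graph to produce concrete values.

import tensorflow as tf

# declare nodes in the data flow graph; nothing is computed yet
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

# the session executes the graph and returns concrete values
with tf.Session() as sess:
    print(sess.run(c))  # 6.0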
Keras
Due to the slow development process of applications in TensorFlow,
Theano, and similar deep learning frameworks, Keras was developed for
prototyping applications, but it is also utilized in production engineering
for various problems. It is a wrapper for TensorFlow, Theano, MXNet, and
DeepLearning4j. Unlike those frameworks, Keras makes defining a
computational graph relatively easy, as shown in the following demo code.
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Conv3D

def create_model():
    model = Sequential()
    model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                         input_shape=(None, 40, 40, 1),
                         padding='same', return_sequences=True))
    model.add(BatchNormalization())
    # output layer; the Conv3D choice and its filter settings here are illustrative
    model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
                     activation='sigmoid',
                     padding='same', data_format='channels_last'))
    model.compile(loss='binary_crossentropy', optimizer='adadelta')
    return model
Although Keras has the added benefits of ease of use and speed when
implementing solutions, it has relative drawbacks when compared to
TensorFlow. The broadest explanation is that Keras
users have considerably less control over their computational graph
than TensorFlow users. You work within the confines of a sandbox
when using Keras. TensorFlow is better at natively supporting more
complex operations, and providing access to the most cutting-edge
implementations of various algorithms.
Theano
Although Theano is not covered in this book, it is important to the
progression of deep learning and worth discussing. The library is similar
to TensorFlow in that it provides developers with various computational
functions (add, matrix multiplication, subtract, etc.) that operate on tensors
when building deep learning and machine learning models. For example,
the following is sample Theano code.
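As a minimal sketch, a Theano program declares symbolic tensors, composes them into an expression, and compiles that expression into a callable function. The model_predict name matches the entry point below, but its body here is purely illustrative.

import numpy as np
import theano
import theano.tensor as T

def model_predict():
    # declare symbolic tensors
    X = T.matrix('X')
    w = T.vector('w')
    # compose a symbolic expression: a linear prediction X.dot(w)
    y_hat = T.dot(X, w)
    # compile the expression into a callable function
    predict = theano.function(inputs=[X, w], outputs=y_hat)
    # evaluate on illustrative data
    data = np.random.randn(5, 3).astype(theano.config.floatX)
    weights = np.random.randn(3).astype(theano.config.floatX)
    print(predict(data, weights))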
if __name__ == '__main__':
    model_predict()
Topic Modeling
In Chapter 4, we discuss more advanced uses of deep learning, machine
learning, and NLP. We start with topic modeling and how to perform it via
latent Dirichlet allocation, as well as non-negative matrix factorization.
Topic modeling is simply the process of extracting topics from documents.
You can use these topics for exploratory purposes via data visualization or
as a preprocessing step when labeling data.
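As a minimal illustration (not one of the book's own listings), the following sketch applies scikit-learn's LatentDirichletAllocation to a handful of toy documents; the documents and hyperparameters are assumptions for demonstration only, and a recent scikit-learn version is assumed.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the stock market fell sharply amid trade concerns",
    "the team won the championship after a close game",
    "investors worry about rising interest rates",
    "the striker scored twice in the final match",
]

# build a document-term matrix of word counts
vectorizer = CountVectorizer(stop_words='english')
doc_term_matrix = vectorizer.fit_transform(documents)

# fit an LDA model with two topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)

# print the top words per topic
terms = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print("Topic {}: {}".format(topic_idx, ", ".join(top_terms)))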
Word Embeddings
Word embeddings are a collection of models/techniques for mapping
words (or phrases) to vectors in a high-dimensional space. From this, you
can determine the degree of similarity, or dissimilarity, between one word
(or phrase, or document) and another. When we project the word vectors
into that space, we can envision that they appear as something like what's
shown in Figure 1-3.
Figure 1-3. Visualization of word embeddings: related verb forms (walking/walked, swimming/swam) separated along a verb-tense direction
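As a minimal sketch (not one of the book's listings, and assuming gensim 4.x, where the dimensionality parameter is called vector_size), the following trains a tiny Word2Vec model on toy sentences and queries word similarity in the resulting embedding space.

from gensim.models import Word2Vec

# toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "cat", "walked", "across", "the", "street"],
    ["the", "dog", "walked", "across", "the", "park"],
    ["the", "fish", "swam", "across", "the", "pond"],
]

# hyperparameters here are illustrative only
model = Word2Vec(sentences, vector_size=25, window=2, min_count=1, seed=0)

print(model.wv.similarity("walked", "swam"))   # cosine similarity between two words
print(model.wv.most_similar("cat", topn=2))    # nearest neighbors in the embedding space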
Summary
The purpose of this book is to familiarize you with the field of natural
language processing and then progress to examples in which you
can apply this knowledge. This book covers machine learning where
necessary, although it is assumed that you have already used machine
learning models in a practical setting.
While this book is intended to be neither exhaustive nor overly academic,
my intention is to cover the material sufficiently well that readers can
approach more advanced texts more easily than before reading it. For those
who are more interested in the tangible applications of NLP as the field
stands today, such applications make up the vast majority of what is
discussed and shown in the examples. Without further ado, let's begin our
review of machine learning, specifically as it relates to the models used in
this book.
CHAPTER 2
Review of Deep Learning
You should be aware that we use deep learning and machine learning
methods throughout this chapter. Although the chapter does not provide
a comprehensive review of ML/DL, it is critical to discuss a few neural
network models because we will be applying them later. This chapter also
briefly familiarizes you with TensorFlow, which is one of the frameworks
utilized during the course of the book. All examples in this chapter use toy
numerical data sets, as it would be difficult to both review neural networks
and learn to work with text data at the same time.
Again, the purpose of these toy problems is to focus on learning how
to create a TensorFlow model, not to create a deployable solution. Moving
forward from this chapter, all examples focus on these models with text data.
Many of the relationships we want to model, however, are non-linear,
rendering the SLP null and void. MLPs are able to overcome this
shortcoming, specifically because MLPs have multiple layers. We'll
go over this detail and more in depth while walking through some code to
make the example more intuitive. However, let’s begin by looking at the
MLP visualization shown in Figure 2-1.
This Python function contains all the TensorFlow code that forms the
body of the neural network. In addition to defining the graph, this function
invokes the TensorFlow session that trains the network and makes
predictions. We’ll begin by walking through the function, line by line, while
tying the code back to the theory behind the model.
First, let’s address the arguments in our function: train_data is the
variable that contains our training data; in this example, it is the returns of
specific stocks over a given period of time. The following is the header of
our data set:
\theta_{t+1} = \theta_{t} - \eta \cdot \frac{1}{N}\sum_{i=1}^{N}\left(h_{\theta}\left(x^{i}\right) - y^{i}\right)^{2}    (2.1)

\theta_{t+1} = \theta_{t} - \eta \cdot \frac{1}{N}\sum_{i=1}^{N} 2\left(h_{\theta}\left(x^{i}\right) - y^{i}\right)\nabla_{\theta}\, h_{\theta}\left(x^{i}\right)
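To make the update concrete, here is a minimal numpy sketch (not one of the book's listings) of the step described by Equation 2.1, assuming a linear hypothesis h_theta(x) = x . theta and a mean squared error loss; the data and learning rate are illustrative.

import numpy as np

def gradient_descent_step(theta, X, y, eta=0.05):
    # h_theta(x) for every sample
    predictions = X.dot(theta)
    # gradient of the mean squared error with respect to theta
    gradient = (2.0 / len(y)) * X.T.dot(predictions - y)
    # theta_{t+1} = theta_t - eta * gradient
    return theta - eta * gradient

# illustrative data: recover a known coefficient vector
X = np.random.randn(100, 3)
true_theta = np.array([1.0, -2.0, 0.5])
y = X.dot(true_theta)

theta = np.zeros(3)
for _ in range(500):
    theta = gradient_descent_step(theta, X, y)
print(theta)  # converges toward true_theta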
Each unit in a neural network (with the exception of the input layer)
receives a weighted sum of the inputs, which is then summed with a bias.
Mathematically, this can be described by Equation 2.2.

y = f\left(x \cdot w^{T}\right) + b    (2.2)
In neural networks, the parameters are the weights and biases. When
referring to Figure 2-1, the weights are the lines that connect the units in
a layer to one another and are typically initialized by randomly sampling
from a normal distribution. The following is the code where this occurs:
weights = {'input': tf.Variable(tf.random_normal([train_x.shape[1], num_hidden])),
           'hidden1': tf.Variable(tf.random_normal([num_hidden, num_hidden])),
           'output': tf.Variable(tf.random_normal([num_hidden, 1]))}

biases = {'input': tf.Variable(tf.random_normal([num_hidden])),
          'hidden1': tf.Variable(tf.random_normal([num_hidden])),
          'output': tf.Variable(tf.random_normal([1]))}
Because they are part of the computational graph, weights and biases
in TensorFlow must be initialized as TensorFlow variables with tf.Variable().
TensorFlow thankfully has a function, tf.random_normal(), that generates
numbers randomly from a normal distribution; it takes an array as an
argument specifying the shape of the matrix that you are creating. For
people who are new to creating neural networks, choosing the proper
dimensions for the weight and bias units is a typical source of frustration.
The following are some quick pointers to keep in mind:
The more general problem associated with weights that are initialized
at the same location is that it makes the network susceptible to getting
stuck in local minima. Let’s imagine an error function, such as the one
shown in Figure 2-2.