
Akshay Kulkarni and Adarsha Shivananda

Natural Language Processing Recipes


Unlocking Text Data with Machine Learning and
Deep Learning Using Python
2nd ed.
Akshay Kulkarni
Bangalore, Karnataka, India

Adarsha Shivananda
Bangalore, Karnataka, India

ISBN 978-1-4842-7350-0 e-ISBN 978-1-4842-7351-7


https://doi.org/10.1007/978-1-4842-7351-7

© Akshay Kulkarni and Adarsha Shivananda 2021

Apress Standard

The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress
Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
To our families
Introduction
According to industry estimates, more than 80% of the data being
generated is in an unstructured format in the form of text, images,
audio, or video. Data is being generated as we speak, write, tweet, use
social media platforms, send messages on messaging platforms, use
ecommerce to shop, and do various other activities. The majority of this
data exists in textual form.

So, what is unstructured data? Unstructured data is information that
doesn’t reside in a traditional relational database. Examples include
documents, blogs, social media feeds, pictures, and videos.
Most of the insights are locked within different types of
unstructured data. Unlocking unstructured data plays a vital role in
every organization that wants to make better decisions.
This book unlocks the potential of textual data.
Textual data is the most common and comprises more than 50% of
unstructured data. Examples include tweets/posts on social media, chat
conversations, news, blogs, articles, product or services reviews, and
patient records in the healthcare sector. Recent examples include voice-
driven bots like Siri and Alexa.
To retrieve significant and actionable insights from textual data and
unlock its potential, we use natural language processing coupled with
machine learning and deep learning.
But what is natural language processing? Machines and algorithms
do not understand text or characters, so it is very important to convert
textual data into a machine-understandable format (like numbers or
binary) to analyze it. Natural language processing (NLP) allows
machines to understand and interpret the human language.
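As a tiny illustration of converting text to a machine-understandable format (a minimal sketch using only the Python standard library, not code from the book), a sentence can be represented as word counts, a bare-bones bag-of-words:

```python
from collections import Counter

# A toy "text to numbers" conversion: represent a sentence as
# word counts (a bare-bones bag-of-words representation).
text = "machines do not understand text so we convert text to numbers"
counts = Counter(text.split())

print(counts["text"])      # the word "text" occurs twice -> 2
print(counts.most_common(2))
```

Real NLP pipelines use far richer representations (TF-IDF, embeddings), which later chapters cover, but the underlying idea is the same: text in, numbers out.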
If you want to use the power of unstructured text, this book is the
right starting point. This book unearths the concepts and
implementation of natural language processing and its applications in
the real world. NLP offers unbounded opportunities for solving
interesting problems in artificial intelligence, making it the latest
frontier for developing intelligent, deep learning–based applications.
What Does This Book Cover?
Natural Language Processing Recipes is a handy problem/solution
reference for learning and implementing NLP solutions using Python.
The book is packed with lots of code and approaches that help you
quickly learn and implement both basic and advanced NLP techniques.
You will learn how to efficiently use a wide range of NLP packages,
implement text classification, and identify parts of speech. You also
learn about topic modeling, text summarization, text generation,
sentiment analysis, and many other NLP applications.
This new edition of Natural Language Processing Recipes focuses on
implementing end-to-end projects using Python and leveraging cutting-
edge algorithms and transfer learning.
The book begins by discussing text data collections, web scraping,
and different types of data sources. You learn how to clean and
preprocess text data and analyze it using advanced algorithms.
Throughout the book, you explore the semantic as well as syntactic
analysis of text. It covers complex NLP solutions that involve text
normalization, various advanced preprocessing methods, part-of-
speech (POS) tagging, parsing, text summarization, sentiment analysis,
topic modeling, named-entity recognition (NER), word2vec, seq2seq,
and more.
The book covers both fundamental and state-of-the-art techniques
used in machine learning applications and deep learning natural
language processing. This edition includes various advanced techniques
to convert text to features, like GloVe, ELMo, and BERT. It also explains
how transformers work, using Sentence-BERT and GPT as examples.
The book closes by discussing some of the advanced industrial
applications of NLP, with solution approaches and implementations. It
leverages the power of deep learning techniques for natural language
processing and natural language generation problems, employing
advanced RNNs, like long short-term memory (LSTM), to solve complex
text generation tasks, and explores embeddings—high-quality
representations of words in a language.
In this second edition, a few advanced state-of-the-art embeddings and
industrial applications are explained, along with end-to-end
implementations using deep learning.
Each chapter includes several code examples and illustrations.
By the end of the book, you will have a clear understanding of how to
implement natural language processing and will have worked through
multiple examples that apply NLP techniques in the real world. You will
be comfortable with various NLP techniques coupled with machine
learning and deep learning, and with their industrial applications,
making the NLP journey much more interesting and improving your
Python coding skills.
Who This Book Is For
This book explains various concepts and implementations to get more
clarity when applying NLP algorithms to chosen data. You learn about
all the ingredients you need to become successful in the NLP space.
Fundamental Python skills are assumed, as well as some knowledge of
machine learning and basic NLP. If you are an NLP or machine learning
enthusiast and an intermediate Python programmer who wants to
quickly master natural language processing, this learning path will do
you a lot of good.
All you need to know are the basics of machine learning and Python
to enjoy the book.

What You Will Learn


The core concepts of implementing NLP, its various approaches, and
using Python libraries such as NLTK, TextBlob, spaCy, and Stanford
CoreNLP
Text preprocessing and feature engineering in NLP along with
advanced methods of feature engineering
Information retrieval, text summarization, sentiment analysis, text
classification, and other advanced NLP techniques solved leveraging
machine learning and deep learning
The problems faced by industries and how to solve them using
NLP techniques
Implementing an end-to-end pipeline of NLP life cycle projects,
which includes framing the problem, finding the data, collecting,
preprocessing the data, and solving it using cutting-edge techniques
and tools

What Do You Need for This Book?


To perform all the recipes in this book successfully, you need Python 3.x
or higher running on any Windows- or Unix-based operating system
with a processor of 2.0 GHz or higher and a minimum of 4 GB RAM. You
can download Python from Anaconda and leverage a Jupyter notebook
for coding purposes. This book assumes you know Keras basics and
how to install the basic machine learning and deep learning libraries.
Please make sure you upgrade or install the latest version of all the
libraries.
Python is the most popular and widely used tool for building NLP
applications. It has many sophisticated libraries to perform NLP tasks,
from basic preprocessing to advanced techniques.
To install any library in a Python Jupyter notebook, use ! before the
pip install.
NLTK (the Natural Language Toolkit) is commonly called “the
mother of all NLP libraries.” It is one of the primary resources when it
comes to Python and NLP.

!pip install nltk

import nltk
nltk.download()

spaCy is a trending library that comes with the flavor of a deep
learning framework. Although spaCy doesn’t cover all NLP
functionalities, it does many things well.

!pip install spacy

#if the above doesn't work, try this in your terminal/command prompt
conda install spacy
#download the small English model
python -m spacy download en_core_web_sm
#then load the model via
spacy.load('en_core_web_sm')

TextBlob is one of data scientists’ favorite libraries when it comes
to implementing NLP tasks. It is based on both NLTK and Pattern.
TextBlob isn’t the fastest or most complete library, however.

!pip install textblob

CoreNLP is a Python wrapper for Stanford CoreNLP. The toolkit
provides robust, accurate, and optimized techniques for tagging,
parsing, and analyzing text in various languages.
!pip install CoreNLP
There are hundreds of other NLP libraries, but these are the widely
used and important ones.
There is an immense number of NLP industrial applications that are
leveraged to uncover insights. By the end of the book, you will have
implemented many of these use cases, from framing a business
problem to building applications and drawing business insights. The
following are some examples.
Sentiment analysis—a customer’s emotions toward products offered
by the business
Topic modeling—extracting the unique topics from a group of
documents
Complaint classification/email classification/ecommerce product
classification, and so on
Document categorization/management using different clustering
techniques
Résumé shortlisting and job description matching using similarity
methods
Advanced feature engineering techniques (word2vec and fastText) to
capture context
Information/document retrieval systems, for example, search
engines
Chatbots, Q&A, and voice-to-text applications like Siri and Alexa
Language detection and translation using neural networks
Text summarization using graph methods and advanced techniques
Text generation/predicting the next sequence of words using deep
learning algorithms
Acknowledgments
We are grateful to our families for their motivation and constant
support.
We want to express our gratitude to our mentors and friends for
their input, inspiration, and support. A special thanks to Anoosh R.
Kulkarni, a data scientist at Quantziq, for his support in writing this
book and his technical input. A big thanks to the Apress team for their
constant support and help.
Finally, we would like to thank you, the reader, for showing an
interest in this book and making your natural language processing
journey more exciting.
Note that the views and opinions expressed in this book are those of
the authors.
Table of Contents
Chapter 1:​Extracting the Data
Introduction
Client Data
Free Sources
Web Scraping
Recipe 1-1.​Collecting Data
Problem
Solution
How It Works
Recipe 1-2.​Collecting Data from PDFs
Problem
Solution
How It Works
Recipe 1-3.​Collecting Data from Word Files
Problem
Solution
How It Works
Recipe 1-4.​Collecting Data from JSON
Problem
Solution
How It Works
Recipe 1-5.​Collecting Data from HTML
Problem
Solution
How It Works
Recipe 1-6.​Parsing Text Using Regular Expressions
Problem
Solution
How It Works
Recipe 1-7.​Handling Strings
Problem
Solution
How It Works
Recipe 1-8.​Scraping Text from the Web
Problem
Solution
How It Works
Chapter 2:​Exploring and Processing Text Data
Recipe 2-1.​Converting Text Data to Lowercase
Problem
Solution
How It Works
Recipe 2-2.​Removing Punctuation
Problem
Solution
How It Works
Recipe 2-3.​Removing Stop Words
Problem
Solution
How It Works
Recipe 2-4.​Standardizing Text
Problem
Solution
How It Works
Recipe 2-5.​Correcting Spelling
Problem
Solution
How It Works
Recipe 2-6.​Tokenizing Text
Problem
Solution
How It Works
Recipe 2-7.​Stemming
Problem
Solution
How It Works
Recipe 2-8.​Lemmatizing
Problem
Solution
How It Works
Recipe 2-9.​Exploring Text Data
Problem
Solution
How It Works
Recipe 2-10.​Dealing with Emojis and Emoticons
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Recipe 2-11.​Building a Text Preprocessing Pipeline
Problem
Solution
How It Works
Chapter 3:​Converting Text to Features
Recipe 3-1.​Converting Text to Features Using One-Hot
Encoding
Problem
Solution
How It Works
Recipe 3-2.​Converting Text to Features Using a Count
Vectorizer
Problem
Solution
How It Works
Recipe 3-3.​Generating n-grams
Problem
Solution
How It Works
Recipe 3-4.​Generating a Co-occurrence Matrix
Problem
Solution
How It Works
Recipe 3-5.​Hash Vectorizing
Problem
Solution
How It Works
Recipe 3-6.​Converting Text to Features Using TF-IDF
Problem
Solution
How It Works
Recipe 3-7.​Implementing Word Embeddings
Problem
Solution
How It Works
Recipe 3-8.​Implementing fastText
Problem
Solution
How It Works
Recipe 3-9.​Converting Text to Features Using State-of-the-Art
Embeddings
Problem
Solution
ELMo
Sentence Encoders
Open-AI GPT
How It Works
Chapter 4:​Advanced Natural Language Processing
Recipe 4-1.​Extracting Noun Phrases
Problem
Solution
How It Works
Recipe 4-2.​Finding Similarity Between Texts
Solution
How It Works
Recipe 4-3.​Tagging Part of Speech
Problem
Solution
How It Works
Recipe 4-4.​Extracting Entities from Text
Problem
Solution
How It Works
Recipe 4-5.​Extracting Topics from Text
Problem
Solution
How It Works
Recipe 4-6.​Classifying Text
Problem
Solution
How It Works
Recipe 4-7.​Carrying Out Sentiment Analysis
Problem
Solution
How It Works
Recipe 4-8.​Disambiguating Text
Problem
Solution
How It Works
Recipe 4-9.​Converting Speech to Text
Problem
Solution
How It Works
Recipe 4-10.​Converting Text to Speech
Problem
Solution
How It Works
Recipe 4-11.​Translating Speech
Problem
Solution
How It Works
Chapter 5:​Implementing Industry Applications
Recipe 5-1.​Implementing Multiclass Classification
Problem
Solution
How It Works
Recipe 5-2.​Implementing Sentiment Analysis
Problem
Solution
How It Works
Recipe 5-3.​Applying Text Similarity Functions
Problem
Solution
How It Works
Recipe 5-4.​Summarizing Text Data
Problem
Solution
How It Works
Recipe 5-5.​Clustering Documents
Problem
Solution
How It Works
Recipe 5-6.​NLP in a Search Engine
Problem
Solution
How It Works
Recipe 5-7.​Detecting Fake News
Problem
Solution
How It Works
Recipe 5-8.​Movie Genre Tagging
Problem
Solution
How It Works
Chapter 6:​Deep Learning for NLP
Introduction to Deep Learning
Convolutional Neural Networks
Data
Architecture
Convolution
Nonlinearity (ReLU)
Pooling
Flatten, Fully Connected, and Softmax Layers
Backpropagation:​Training the Neural Network
Recurrent Neural Networks
Training RNN:​Backpropagation Through Time (BPTT)
Long Short-Term Memory (LSTM)
Recipe 6-1.​Retrieving Information
Problem
Solution
How It Works
Recipe 6-2.​Classifying Text with Deep Learning
Problem
Solution
How It Works
Recipe 6-3.​Next Word Prediction
Problem
Solution
How It Works
Recipe 6-4.​Stack Overflow question recommendation
Problem
Solution
How It Works
Chapter 7:​Conclusion and Next-Gen NLP
Recipe 7-1.​Recent advancements in text to features or
distributed representations
Problem
Solution
Recipe 7-2.​Advanced deep learning for NLP
Problem
Solution
Recipe 7-3.​Reinforcement learning applications in NLP
Problem
Solution
Recipe 7-4.​Transfer learning and pre-trained models
Problem
Solution
Recipe 7-5.​Meta-learning in NLP
Problem
Solution
Recipe 7-6.​Capsule networks for NLP
Problem
Solution
Index
About the Authors
Akshay Kulkarni
is a renowned AI and machine learning
evangelist and thought leader. He has
consulted several Fortune 500 and global
enterprises on driving AI and data
science–led strategic transformation.
Akshay has rich experience in building
and scaling AI and machine learning
businesses and creating significant
impact. He is currently a data science
and AI manager at Publicis Sapient,
where he is part of strategy and
transformation interventions through AI.
He manages high-priority growth
initiatives around data science and
works on various artificial intelligence engagements by applying state-
of-the-art techniques to this space.
Akshay is also a Google Developers Expert in machine learning, a
published author of books on NLP and deep learning, and a regular
speaker at major AI and data science conferences.
In 2019, Akshay was named one of the top “40 under 40 data
scientists” in India.
In his spare time, he enjoys reading, writing, coding, and mentoring
aspiring data scientists. He lives in Bangalore, India, with his family.

Adarsha Shivananda
is a lead data scientist at Indegene Inc.’s product and technology team,
where he leads a group of analysts who enable predictive analytics and
AI features to healthcare software products. These are mainly
multichannel activities for pharma products and solving the real-time
problems encountered by pharma sales reps. Adarsha aims to build a
pool of exceptional data scientists within the organization to solve
greater health care problems through
brilliant training programs. He always
wants to stay ahead of the curve.
His core expertise involves machine
learning, deep learning,
recommendation systems, and statistics.
Adarsha has worked on various data
science projects across multiple domains
using different technologies and
methodologies. Previously, he worked
for Tredence Analytics and IQVIA.
He lives in Bangalore, India, and loves
to read, ride, and teach data science.
About the Technical Reviewer
Aakash Kag
is a data scientist at AlixPartners and is a
co-founder of the Emeelan application.
He has six years of experience in big data
analytics and has a postgraduate degree
in computer science with a specialization
in big data analytics. Aakash is
passionate about developing social
platforms, machine learning, and
meetups, where he often talks.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_1

1. Extracting the Data


Akshay Kulkarni1 and Adarsha Shivananda1
(1) Bangalore, Karnataka, India

This chapter covers various sources of text data and the ways to extract it. Textual data can serve as a
source of information and insights for businesses. The following recipes are covered.
Recipe 1. Text data collection using APIs
Recipe 2. Reading a PDF file in Python
Recipe 3. Reading a Word document
Recipe 4. Reading a JSON object
Recipe 5. Reading an HTML page and HTML parsing
Recipe 6. Regular expressions
Recipe 7. String handling
Recipe 8. Web scraping

Introduction
Before getting into the details of the book, let’s look at generally available data sources. We need to identify
potential data sources that can help with solving data science use cases.

Client Data
For any problem statement, one of the sources is the data that is already present. The business decides
where it wants to store its data. Data storage depends on the type of business, the amount of data, and the
costs associated with the sources. The following are some examples.
SQL databases
HDFS
Cloud storage
Flat files

Free Sources
A large amount of data is freely available on the Internet. You just need to streamline the problem and start
exploring multiple free data sources.
Free APIs like Twitter
Wikipedia
Government data (e.g., http://data.gov)
Census data (e.g., www.census.gov/data.html)
Health care claim data (e.g., www.healthdata.gov)
Data science community websites (e.g., www.kaggle.com)
Google dataset search (e.g., https://datasetsearch.research.google.com)

Web Scraping
Extracting the content/data from websites, blogs, forums, and retail websites for reviews with permission
from the respective sources using web scraping packages in Python.
There are a lot of other sources, such as news data and economic data, that can be leveraged for analysis.

Recipe 1-1. Collecting Data


There are a lot of free APIs through which you can collect data and use it to solve problems. Let’s discuss the
Twitter API.

Problem
You want to collect text data using Twitter APIs.

Solution
Twitter has a gigantic amount of data with a lot of value in it. Social media marketers make their living from
it. There is an enormous number of tweets every day, and every tweet has some story to tell. When all of this
data is collected and analyzed, it gives a business tremendous insights about their company, product,
service, and so forth.
Let’s now look at how to pull data and then explore how to leverage it in the coming chapters.

How It Works
Step 1-1. Log in to the Twitter developer portal
Log in to the Twitter developer portal at https://developer.twitter.com.
Create your own app in the Twitter developer portal, and get the following keys. Once you have these
credentials, you can start pulling data.
consumer key: The key associated with the application (Twitter, Facebook, etc.)
consumer secret: The password used to authenticate with the authentication server (Twitter, Facebook,
etc.)
access token: The key given to the client after successful authentication of keys
access token secret: The password for the access key

Step 1-2. Execute query in Python


Once all the credentials are in place, use the following code to fetch the data.

# Install tweepy
!pip install tweepy
# Import the libraries
import numpy as np
import tweepy
import json
import pandas as pd
from tweepy import OAuthHandler
# credentials
consumer_key = "adjbiejfaaoeh"
consumer_secret = "had73haf78af"
access_token = "jnsfby5u4yuawhafjeh"
access_token_secret = "jhdfgay768476r"
# calling API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Provide the query for which you want to pull the data. For example,
# pulling data for the mobile phone ABC
query = "ABC"
# Fetching tweets
Tweets = api.search(query, count=10, lang='en',
                    exclude='retweets', tweet_mode='extended')
This query pulls the top ten tweets when product ABC is searched. The API pulls English tweets since the
language given is 'en'. It excludes retweets.
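Once the tweets are fetched, the next step is usually to pull the fields you care about into plain Python structures for analysis. The following is an illustrative sketch (not code from the book) that simulates tweet objects with types.SimpleNamespace, assuming each status exposes full_text and created_at attributes, as tweepy's extended mode does:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for tweepy Status objects; in extended mode
# the tweet body lives in the `full_text` attribute.
tweets = [
    SimpleNamespace(full_text="ABC battery life is great",
                    created_at="2021-01-01"),
    SimpleNamespace(full_text="ABC camera could be better",
                    created_at="2021-01-02"),
]

# Collect the fields of interest into a list of dicts, which can then
# be loaded into a pandas DataFrame for further analysis.
rows = [{"text": t.full_text, "created_at": t.created_at} for t in tweets]
print(rows[0]["text"])   # -> ABC battery life is great
```

Keeping the extraction step separate from the API call makes the downstream analysis code testable without network access.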

Recipe 1-2. Collecting Data from PDFs


Most of your data is stored in PDF files. You need to extract text from these files and store it for further
analysis.

Problem
You want to read a PDF file.

Solution
The simplest way to read a PDF file is by using the PyPDF2 library.

How It Works
Follow the steps in this section to extract data from PDF files.

Step 2-1. Install and import all the necessary libraries


Here are the first lines of code.

!pip install PyPDF2

import PyPDF2
from PyPDF2 import PdfFileReader

Note You can download any PDF file from the web and place it in the location where you are running
this Jupyter notebook or Python script.

Step 2-2. Extract text from a PDF file


Now let’s extract the text.

#Creating a pdf file object
pdf = open("file.pdf","rb")
#creating pdf reader object
pdf_reader = PyPDF2.PdfFileReader(pdf)
#checking number of pages in a pdf file
print(pdf_reader.numPages)
#creating a page object
page = pdf_reader.getPage(0)
#finally extracting text from the page
print(page.extractText())
#closing the pdf file
pdf.close()

Please note that the function doesn’t work for scanned PDFs.

Recipe 1-3. Collecting Data from Word Files


Next, let’s look at another small recipe that reads Word files in Python.

Problem
You want to read Word files.

Solution
The simplest way is to use the docx library.

How It Works
Follow the steps in this section to extract data from a Word file.

Step 3-1. Install and import all the necessary libraries


The following is the code to install and import the docx library.

#Install python-docx (it provides the docx module)
!pip install python-docx
#Import library
from docx import Document

Note You can download any Word file from the web and place it in the location where you are running a
Jupyter notebook or Python script.

Step 3-2. Extract text from a Word file


Now let’s get the text.

#Creating a word file object
doc = open("file.docx","rb")
#creating word reader object
document = Document(doc)
#create an empty string, then loop through each paragraph in the
#Word document and append its text
docu = ""
for para in document.paragraphs:
    docu += para.text
#to see the output, print docu
print(docu)

Recipe 1-4. Collecting Data from JSON


JSON is an open standard file format that stands for JavaScript Object Notation. It’s often used when data is
sent to a webpage from a server. This recipe explains how to read a JSON file/object.

Problem
You want to read a JSON file/object.

Solution
The simplest way is to use requests and the JSON library.

How It Works
Follow the steps in this section to extract data from JSON.

Step 4-1. Install and import all the necessary libraries


Here is the code for importing the libraries.

import requests
import json

Step 4-2. Extract text from a JSON file

Now let’s extract the text.

#extracting the text from "https://quotes.rest/qod.json"
r = requests.get("https://quotes.rest/qod.json")
res = r.json()
print(json.dumps(res, indent = 4))
#output
{
"success": {
"total": 1
},
"contents": {
"quotes": [
{
"quote": "Where there is ruin, there is hope for a
treasure.",
"length": "50",
"author": "Rumi",
"tags": [
"failure",
"inspire",
"learning-from-failure"
],
"category": "inspire",
"date": "2018-09-29",
"permalink":
"https://theysaidso.com/quote/dPKsui4sQnQqgMnXHLKtfweF/rumi-where-there-is-
ruin-there-is-hope-for-a-treasure",
"title": "Inspiring Quote of the day",
"background":
"https://theysaidso.com/img/bgs/man_on_the_mountain.jpg",
"id": "dPKsui4sQnQqgMnXHLKtfweF"
}
],
"copyright": "2017-19 theysaidso.com"
}
}
#extract contents
q = res['contents']['quotes'][0]
q
#output
{'author': 'Rumi',
'background': 'https://theysaidso.com/img/bgs/man_on_the_mountain.jpg',
'category': 'inspire',
'date': '2018-09-29',
'id': 'dPKsui4sQnQqgMnXHLKtfweF',
'length': '50',
'permalink': 'https://theysaidso.com/quote/dPKsui4sQnQqgMnXHLKtfweF/rumi-
where-there-is-ruin-there-is-hope-for-a-treasure',
'quote': 'Where there is ruin, there is hope for a treasure.',
'tags': ['failure', 'inspire', 'learning-from-failure'],
'title': 'Inspiring Quote of the day'}
#extract only quote
print(q['quote'], '\n--', q['author'])
#output
Where there is ruin, there is hope for a treasure.
-- Rumi
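The same json library also reads JSON stored on disk. The following is a minimal sketch; the filename and contents are made up for illustration, and the file is written first so the example is self-contained.

```python
import json

# A made-up JSON structure mirroring the quotes API response above.
sample = {"contents": {"quotes": [{"quote": "Stay curious.", "author": "Anon"}]}}

# Write it to disk so the read step below has something to load.
with open("sample_quotes.json", "w") as f:
    json.dump(sample, f)

# json.load parses a file object into Python dicts and lists.
with open("sample_quotes.json") as f:
    data = json.load(f)

# Drill into the nested structure, just like res['contents']['quotes'][0].
q = data["contents"]["quotes"][0]
print(q["quote"], "\n--", q["author"])
```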

Recipe 1-5. Collecting Data from HTML


HTML is short for HyperText Markup Language. It structures webpages and displays them in a browser.
There are various HTML tags that build the content. This recipe looks at reading HTML pages.

Problem
You want to parse/read HTML pages.

Solution
The simplest way is to use the bs4 library.

How It Works
Follow the steps in this section to extract data from the web.

Step 5-1. Install and import all the necessary libraries


First, install and import the libraries.

!pip install bs4


import urllib.request as urllib2
from bs4 import BeautifulSoup

Step 5-2. Fetch the HTML file


You can pick any website that you want to extract. Let’s use Wikipedia in this example.

response = urllib2.urlopen('https://en.wikipedia.org/wiki/Natural_language_processing')
html_doc = response.read()

Step 5-3. Parse the HTML file


Now let’s get the data.

#Parsing
soup = BeautifulSoup(html_doc, 'html.parser')
# Formatting the parsed HTML file
strhtm = soup.prettify()
# Print the first few lines
print (strhtm[:1000])
#output
<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>
Natural language processing - Wikipedia
</title>
<script>
document.documentElement.className = document.documentElement.className.rep
</script>
<script>
(window.RLQ=window.RLQ||[]).push(function()
{mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"
processing","wgCurRevisionId":860741853,"wgRevisionId":860741853,"wgArticleId"
["*"],"wgCategories":["Webarchive template wayback links","All accuracy disput
identifiers","Natural language processing","Computational linguistics","Speech

Step 5-4. Extract a tag value


You can extract a tag’s value from the first instance of the tag using the following code.

print(soup.title)
print(soup.title.string)
print(soup.a.string)
print(soup.b.string)
#output
<title>Natural language processing - Wikipedia</title>
Natural language processing - Wikipedia
None
Natural language processing

Step 5-5. Extract all instances of a particular tag


Here we get all the instances of the tag that we are interested in.

for x in soup.find_all('a'): print(x.string)


#sample output
None
Jump to navigation
Jump to search
Language processing in the brain
None
None
automated online assistant
customer service
[1]
computer science
artificial intelligence
natural language
speech recognition
natural language understanding
natural language generation

Step 5-6. Extract all text from a particular tag


Finally, we get the text.

for x in soup.find_all('p'): print(x.text)


#sample output
Natural language processing (NLP) is an area of computer science and
artificial intelligence concerned with the interactions between computers
and human (natural) languages, in particular how to program computers to
process and analyze large amounts of natural language data.
Challenges in natural language processing frequently involve speech
recognition, natural language understanding, and natural language
generation.
The history of natural language processing generally started in the 1950s,
although work can be found from earlier periods.
In 1950, Alan Turing published an article titled "Computing Machinery and
Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
Note that the p tag extracted most of the text on the page.
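You don't need a live website to practice these extraction steps. The same bs4 calls work on an inline HTML string; the markup below is made up for illustration.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document to exercise tag extraction offline.
html_doc = """
<html><head><title>My Page</title></head>
<body>
<p>First <b>paragraph</b>.</p>
<p>Second paragraph with a <a href="https://example.com">link</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title.string)      # text of the <title> tag
print(soup.a["href"])         # attribute of the first <a> tag
for p in soup.find_all("p"):  # text of every <p> tag
    print(p.text)
```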

Recipe 1-6. Parsing Text Using Regular Expressions


This recipe discusses how regular expressions help when dealing with text data. Regular expressions
are essential when dealing with raw data from the web, which often contains HTML tags, long runs of
text, and repeated text. You usually don't need such data during development or in your output.
You can do all sorts of basic and advanced data cleaning using regular expressions.

Problem
You want to parse text data using regular expressions.

Solution
The best way is to use the re library in Python.

How It Works
Let’s look at some of the ways we can use regular expressions for our tasks.
The basic flags are I, L, M, S, U, X.
re.I ignores case.
re.L makes certain character classes dependent on the current locale.
re.M makes ^ and $ match at the start and end of each line.
re.S makes the dot (.) match any character, including a newline.
re.U makes character classes aware of Unicode.
re.X allows regex to be written in a more readable, commented format.
The following describes regular expressions' functionality.
Find a single occurrence of characters a and b: [ab]
Find characters except for a and b: [^ab]
Find the character range of a to z: [a-z]
Find a character range except a to z: [^a-z]
Find all the characters from both a to z and A to Z: [a-zA-Z]
Find any single character (except newline): .
Find any whitespace character: \s
Find any non-whitespace character: \S
Find any digit: \d
Find any non-digit: \D
Find any non-word character: \W
Find any word character: \w
Find either a or b: (a|b)
Match zero or one occurrence of a: a? (? matches zero or one occurrence)
Match zero or more occurrences of a: a* (* matches zero or more occurrences)
Match one or more occurrences of a: a+ (+ matches one or more occurrences)
Match exactly three consecutive occurrences of a: a{3}
Match three or more consecutive occurrences of a: a{3,}
Match three to six consecutive occurrences of a: a{3,6}
Start of a string: ^
End of a string: $
Match word boundary: \b
Non-word boundary: \B
The re.match() and re.search() functions find patterns, which are then processed according to
the requirements of the application.
Let’s look at the differences between re.match() and re.search().
re.match() checks for a match only at the beginning of the string. If it finds the pattern at the
beginning of the input string, it returns the matched pattern; otherwise, it returns None.
re.search() checks for a match anywhere in the string. It returns the first occurrence of the
pattern anywhere in the given input string or data.
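The difference between the two functions is easy to see on a short example.

```python
import re

text = "learning NLP is fun"

# re.match only looks at the start of the string.
print(re.match(r"NLP", text))       # None: "NLP" is not at the beginning
print(re.match(r"learning", text))  # a match object anchored at position 0

# re.search scans the whole string for the first occurrence.
m = re.search(r"NLP", text)
print(m.group(), m.start())         # NLP 9
```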
Now let’s look at a few examples using these regular expressions.

Tokenizing
Tokenizing means splitting a sentence into words. One way to do this is to use re.split.

# Import library
import re
#run the split query
re.split(r'\s+', 'I like this book.')
['I', 'like', 'this', 'book.']

For an explanation of the regex, refer to the pattern list earlier in this recipe.
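Note that re.split leaves the period attached to 'book.'. If you want only word characters, re.findall with \w+ is a common alternative.

```python
import re

sentence = "I like this book."

# \w+ matches runs of word characters, so punctuation is dropped.
tokens = re.findall(r"\w+", sentence)
print(tokens)   # ['I', 'like', 'this', 'book']
```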

Extracting Email IDs


The simplest way to extract email IDs is to use re.findall.
1. Read/create the document or sentences.

doc = "For more details please mail us at: xyz@abc.com, pqr@mno.com"

2. Execute the re.findall function.

addresses = re.findall(r'[\w\.-]+@[\w\.-]+', doc)


for address in addresses:
    print(address)
#Output
xyz@abc.com
pqr@mno.com

Replacing Email IDs


Let’s replace email IDs in sentences or documents with other email IDs. The simplest way to do this is by
using re.sub.
1. Read/create the document or sentences.

doc = "For more details please mail us at xyz@abc.com"

2. Execute the re.sub function.

new_email_address = re.sub(r'([\w\.-]+)@([\w\.-]+)', r'pqr@mno.com', doc)


print(new_email_address)
#Output
For more details please mail us at pqr@mno.com
For an explanation of regex, please refer to Recipe 1-6.
Note that in both instances we used a very basic regex: words separated by @ capture the
email IDs. However, there are many edge cases; for example, dots in domain names, digits, and
the + (plus sign) can all be part of an email ID.
The following is an advanced regex to extract/find/replace email IDs.

([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)

There are even more complex ones to handle all the edge cases (e.g., “.co.in” email IDs). Please give it a
try.
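You can check how the advanced pattern behaves on a few trickier addresses; the sample addresses here are made up.

```python
import re

pattern = r"([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)"

doc = "Contact first.last+tag@sub.example.com or admin@example.co.in for help."

# findall returns the captured group for every match.
found = re.findall(pattern, doc)
print(found)
```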

Extracting Data from an eBook and Performing regex


Let’s solve a case study that extracts data from an ebook by using the techniques you have learned so far.
1. Extract the content from the book.

# Import library
import re
import requests
#url you want to extract
url = 'https://www.gutenberg.org/files/2638/2638-0.txt'
#function to extract
def get_book(url):
    # Sends an http request to get the text from Project Gutenberg
    raw = requests.get(url).text
    # Discards the metadata from the beginning of the book
    start = re.search(r"\*\*\* START OF THIS PROJECT GUTENBERG EBOOK .* \*\*\*", raw).end()
    # Discards the metadata from the end of the book
    stop = re.search(r"II", raw).start()
    # Keeps the relevant text
    text = raw[start:stop]
    return text
# processing
def preprocess(sentence):
    return re.sub('[^A-Za-z0-9.]+', ' ', sentence).lower()
#calling the above function
book = get_book(url)
processed_book = preprocess(book)
print(processed_book)
# Output
produced by martin adamson david widger with corrections by andrew sly
the idiot by fyodor dostoyevsky translated by eva martin part i i. towards
the end of november during a thaw at nine o clock one morning a train on
the warsaw and petersburg railway was approaching the latter city at full
speed. the morning was so damp and misty that it was only with great
difficulty that the day succeeded in breaking and it was impossible to
distinguish anything more than a few yards away from the carriage windows.
some of the passengers by this particular train were returning from abroad
but the third class carriages were the best filled chiefly with
insignificant persons of various occupations and degrees picked up at the
different stations nearer town. all of them seemed weary and most of them
had sleepy eyes and a shivering expression while their complexions
generally appeared to have taken on the colour of the fog outside. when da
2. Perform an exploratory data analysis on this data using regex.

# Count the number of times "the" appears in the book


len(re.findall(r'the', processed_book))
#Output
302
#Replace "i" with "I"
processed_book = re.sub(r'\si\s', " I ", processed_book)
print(processed_book)
#output
produced by martin adamson david widger with corrections by andrew sly
the idiot by fyodor dostoyevsky translated by eva martin part I i. towards
the end of november during a thaw at nine o clock one morning a train on
the warsaw and petersburg railway was approaching the latter city at full
speed. the morning was so damp and misty that it was only with great
difficulty that the day succeeded in breaking and it was impossible to
distinguish anything more than a few yards away from the carriage windows.
some of the passengers by this particular train were returning from abroad
but the third class carriages were the best filled chiefly with
insignificant persons of various occupations and degrees picked up at the
different stations nearer town. all of them seemed weary and most of them
had sleepy eyes and a shivering expression while their complexions
generally appeared to have taken on the colour of the fog outside. when da
#find all occurrences of text in the format "abc--xyz"
re.findall(r'[a-zA-Z0-9]*--[a-zA-Z0-9]*', book)
#output
['ironical--it',
'malicious--smile',
'fur--or',
'astrachan--overcoat',
'it--the',
'Italy--was',
'malady--a',
'money--and',
'little--to',
'No--Mr',
'is--where',
'I--I',
'I--',
'--though',
'crime--we',
'or--judge',
'gaiters--still',
'--if',
'through--well',
'say--through',
'however--and',
'Epanchin--oh',
'too--at',
'was--and',
'Andreevitch--that',
'everyone--that',
'reduce--or',
'raise--to',
'listen--and',
'history--but',
'individual--one',
'yes--I',
'but--',
't--not',
'me--then',
'perhaps--',
'Yes--those',
'me--is',
'servility--if',
'Rogojin--hereditary',
'citizen--who',
'least--goodness',
'memory--but',
'latter--since',
'Rogojin--hung',
'him--I',
'anything--she',
'old--and',
'you--scarecrow',
'certainly--certainly',
'father--I',
'Barashkoff--I',
'see--and',
'everything--Lebedeff',
'about--he',
'now--I',
'Lihachof--',
'Zaleshoff--looking',
'old--fifty',
'so--and',
'this--do',
'day--not',
'that--',
'do--by',
'know--my',
'illness--I',
'well--here',
'fellow--you']
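Regex-based exploration pairs naturally with collections.Counter for word frequencies. Here is a small sketch on a sample string standing in for processed_book.

```python
import re
from collections import Counter

# A short sample standing in for the processed book text.
sample = "the idiot by fyodor dostoyevsky the morning was damp and the fog"

# Tokenize with \w+ and count how often each word appears.
counts = Counter(re.findall(r"\w+", sample))
print(counts.most_common(1))   # the most frequent word and its count
```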

Recipe 1-7. Handling Strings


This recipe discusses how to handle strings and deal with textual data. You can do all sorts of basic text
explorations using string operations.

Problem
You want to explore handling strings.

Solution
The simplest way is to use the following string functionality.
s.find(t) is an index of the first instance of string t inside s (–1 if not found)
s.rfind(t) is an index of the last instance of string t inside s (–1 if not found)
s.index(t) is like s.find(t) except it raises ValueError if not found
s.rindex(t) is like s.rfind(t) except it raises ValueError if not found
s.join(text) combines the words of the text into a string using s as the glue
s.split(t) splits s into a list wherever a t is found (whitespace by default)
s.splitlines() splits s into a list of strings, one per line
s.lower() is a lowercase version of the string s
s.upper() is an uppercase version of the string s
s.title() is a titlecased version of the string s
s.strip() is a copy of s without leading or trailing whitespace
s.replace(t, u) replaces instances of t with u inside s

How It Works
Now let’s look at a few of the examples.

Replacing Content
Create a string and replace its content. Creating a string is easy: enclose the characters in
single or double quotes. To replace content, use the replace function.
1. Create a string.

String_v1 = "I am exploring NLP"


#To extract a particular character or a range of characters from the string
print(String_v1[0])
#output
I
#To extract the word “exploring”
print(String_v1[5:14])
#output
exploring

2. Replace "exploring" with "learning" in the preceding string.

String_v2 = String_v1.replace("exploring", "learning")


print(String_v2)
#Output
I am learning NLP

Concatenating Two Strings


The following is simple code.

s1 = "nlp"
s2 = "machine learning"
s3 = s1+s2
print(s3)
#output
nlpmachine learning

Searching for a Substring in a String


Use the find function to fetch the starting index value of the substring in the whole string.

var="I am learning NLP"


f= "learn"
var.find(f)
#output
5
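The remaining functions from the solution list behave similarly; here is a quick tour on an illustrative sample string.

```python
# Strip leading/trailing whitespace, then try a few of the listed methods.
s = "  Natural Language Processing  "
t = s.strip()

print(t.lower())                      # natural language processing
print(t.find("Language"))             # 8, index of the first occurrence
print(t.rfind("a"))                   # 13, index of the last 'a'
print(t.replace("Processing", "Pipelines"))
print("-".join(t.split()))            # Natural-Language-Processing
```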

Recipe 1-8. Scraping Text from the Web


This recipe discusses how to scrape data from the web.

Caution Before scraping any websites, blogs, or ecommerce sites, please make sure you read the site’s
terms and conditions on whether it gives permissions for data scraping. Generally, robots.txt contains the
terms and conditions (e.g., see www.alixpartners.com/robots.txt) and a site map contains a
URL’s map (e.g., see www.alixpartners.com/sitemap.xml).

Web scraping is also known as web harvesting and web data extraction. It is a technique to extract a large
amount of data from websites and save it in a database or locally. You can use this data to extract
information related to your customers, users, or products for the business’s benefit.
A basic understanding of HTML is a prerequisite.

Problem
You want to extract data from the web by scraping. Let’s use IMDB.com as an example of scraping top
movies.

Solution
The simplest way to do this is by using Python’s Beautiful Soup or Scrapy libraries. Let’s use Beautiful Soup
in this recipe.

How It Works
Follow the steps in this section to extract data from the web.

Step 8-1. Install all the necessary libraries


!pip install bs4
!pip install requests

Step 8-2. Import the libraries


from bs4 import BeautifulSoup
import requests
import pandas as pd
from pandas import Series, DataFrame
from ipywidgets import FloatProgress
from time import sleep
from IPython.display import display
import re
import pickle

Step 8-3. Identify the URL to extract the data


url = 'http://www.imdb.com/chart/top?ref_=nv_mv_250_6'

Step 8-4. Request the URL and download the content using Beautiful Soup
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c,"lxml")
Step 8-5. Understand the website’s structure to extract the required information
Go to the website and right-click the page content to inspect the site’s HTML structure.
Identify the data and fields that you want to extract. For example, you want the movie name and IMDB
rating.
Check which div or class in the HTML contains the movie names and parse the Beautiful Soup
accordingly. In this example, you can parse the soup through <table class ="chart full-width">
and <td class="titleColumn"> to extract the movie name.
Similarly, you can fetch other data; refer to the code in step 8-6.

Step 8-6. Use Beautiful Soup to extract and parse the data from HTML tags
summary = soup.find('div',{'class':'article'})
# Create empty lists to append the extracted data.
moviename = []
cast = []
description = []
rating = []
ratingoutof = []
year = []
genre = []
movielength = []
rot_audscore = []
rot_avgrating = []
rot_users = []
# Extracting the required data from the html soup.
rgx = re.compile('[%s]' % '()')
f = FloatProgress(min=0, max=250)
display(f)
rows = summary.find('table').findAll('tr')
for row, i in zip(rows, range(len(rows))):
    for sitem in row.findAll('span', {'class':'secondaryInfo'}):
        s = sitem.find(text=True)
        year.append(rgx.sub('', s))
    for ritem in row.findAll('td', {'class':'ratingColumn imdbRating'}):
        for iget in ritem.findAll('strong'):
            rating.append(iget.find(text=True))
            ratingoutof.append(iget.get('title').split(' ', 4)[3])
    for item in row.findAll('td', {'class':'titleColumn'}):
        for href in item.findAll('a', href=True):
            moviename.append(href.find(text=True))
            rurl = 'https://www.rottentomatoes.com/m/' + href.find(text=True)
            try:
                rresult = requests.get(rurl)
            except requests.exceptions.ConnectionError:
                status_code = "Connection refused"
            rc = rresult.content
            rsoup = BeautifulSoup(rc, 'html.parser')
            try:
                rot_audscore.append(rsoup.find('div', {'class':'meter-value'}).find('span', {'class':'superPageFontColor'}).text)
                rot_avgrating.append(rsoup.find('div', {'class':'audience-info superPageFontColor'}).find('div').contents[2].strip())
                rot_users.append(rsoup.find('div', {'class':'audience-info hidden-xs superPageFontColor'}).contents[3].contents[2].strip())
            except AttributeError:
                rot_audscore.append("")
                rot_avgrating.append("")
                rot_users.append("")
            cast.append(href.get('title'))
            imdb = "http://www.imdb.com" + href.get('href')
            try:
                iresult = requests.get(imdb)
                ic = iresult.content
                isoup = BeautifulSoup(ic, 'html.parser')
                description.append(isoup.find('div', {'class':'summary_text'}).find(text=True).strip())
                genre.append(isoup.find('span', {'class':'itemprop'}).find(text=True))
                movielength.append(isoup.find('time', {'itemprop':'duration'}).find(text=True).strip())
            except requests.exceptions.ConnectionError:
                description.append("")
                genre.append("")
                movielength.append("")
    sleep(.1)
    f.value = i
Note that there is a high chance that you might encounter an error while executing this script because of
the following reasons.
Your request to the URL fails. If so, try again after some time. This is common in web scraping.
The webpages are dynamic, which means the HTML tags keep changing. Study the tags and make small
changes in the code in accordance with HTML, and you should be good to go.
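Since transient request failures are so common, a small retry helper saves rerunning the whole script. The following is a sketch; the attempt count and delay are arbitrary choices, and the helper retries any callable rather than being tied to requests.

```python
import time

def retry(func, attempts=3, delay=0.1):
    """Call func(); on an exception, wait and try again, up to `attempts` times."""
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise               # out of attempts: re-raise the last error
            time.sleep(delay)

# With requests it would be used like this (illustrative, not executed here):
#   rresult = retry(lambda: requests.get(rurl), attempts=3)
```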

Step 8-7. Convert lists to a data frame and perform an analysis that meets
business requirements
# List to pandas series
moviename = Series(moviename)
cast = Series(cast)
description = Series(description)
rating = Series(rating)
ratingoutof = Series(ratingoutof)
year = Series(year)
genre = Series(genre)
movielength = Series(movielength)
rot_audscore = Series(rot_audscore)
rot_avgrating = Series(rot_avgrating)
rot_users = Series(rot_users)
# creating dataframe and doing analysis
imdb_df = pd.concat([moviename, year, description, genre, movielength, cast, rating,
ratingoutof, rot_audscore, rot_avgrating, rot_users], axis=1)
imdb_df.columns = ['moviename', 'year', 'description', 'genre', 'movielength', 'cast',
'imdb_rating', 'imdb_ratingbasedon', 'rot_audscore', 'rot_avgrating', 'rot_users']
imdb_df['rank'] = imdb_df.index + 1
imdb_df.head(1)
#output

Step 8-8. Download the data frame


# Saving the file as CSV.
imdb_df.to_csv("imdbdataexport.csv")

This chapter implemented most of the techniques to extract text data from sources. In the coming
chapters, you look at how to explore, process, and clean data. You also learn about feature engineering and
building NLP applications.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_2

2. Exploring and Processing Text Data


Akshay Kulkarni1 and Adarsha Shivananda1
(1) Bangalore, Karnataka, India

This chapter discusses various methods and techniques to preprocess textual data and
exploratory data analysis. It covers the following recipes.
Recipe 1. Lowercasing
Recipe 2. Punctuation removal
Recipe 3. Stop words removal
Recipe 4. Text standardization
Recipe 5. Spelling correction
Recipe 6. Tokenization
Recipe 7. Stemming
Recipe 8. Lemmatization
Recipe 9. Exploratory data analysis
Recipe 10. Dealing with emojis and emoticons
Recipe 11. End-to-end processing pipeline
Before directly jumping into the recipes, let’s first understand the need for preprocessing
text data. As you know, about 90% of the world’s data is unstructured and may be present
in the form of images, text, audio, and video. Text can come in various forms, from a list of
individual words to sentences to multiple paragraphs with special characters (like tweets and
other punctuation). It may also be present in the form of web pages, HTML, documents, and so on.
This data is never clean and contains a lot of noise. It needs to be cleaned, with a few
preprocessing steps applied, to make sure you have the right input data for
feature engineering and model building. If you don’t preprocess the data, any algorithm built
on top of it adds no value to a business. This reminds us of a very popular
phrase in data science: “Garbage in, garbage out.”
Preprocessing involves transforming raw text data into an understandable format. Real-
world data is often incomplete, inconsistent, and filled with a lot of noise, and is likely to
contain many errors. Preprocessing is a proven method of resolving such issues. Data
preprocessing prepares raw text data for further processing.

Recipe 2-1. Converting Text Data to Lowercase


This recipe discusses how to lowercase the text data to have all the data in a uniform format
and make sure “NLP” and “nlp” are treated as the same.

Problem
You want to lowercase the text data.
Solution
The simplest way is to use the default lower() function in Python.
The lower() method converts all uppercase characters in a string to lowercase characters
and returns them.

How It Works
Follow the steps in this section to lowercase a given text or document. Here, Python is used.

Step 1-1. Read/create the text data


Let’s create a list of strings and assign it to a variable .

text = ['This is introduction to NLP',
        'It is likely to be useful, to people ',
        'Machine learning is the new electrcity',
        'There would be less hype around AI and more action going forward',
        'python is the best tool!',
        'R is good langauage',
        'I like this book',
        'I want more books like this']
#convert list to data frame
import pandas as pd
df = pd.DataFrame({'tweet': text})
print(df)
#output
tweet
0 This is introduction to NLP
1 It is likely to be useful, to people
2 Machine learning is the new electrcity
3 There would be less hype around AI and more ac...
4 python is the best tool!
5 R is good langauage
6 I like this book
7 I want more books like this

Step 1-2. Execute the lower() function on the text data


When there is only a string, directly apply the lower() function as follows.

x = 'Testing'
x2 = x.lower()
print(x2)
#output
'testing'

When you want to perform lowercasing on a data frame, use the apply function as follows.

df['tweet'] = df['tweet'].apply(lambda x: " ".join(x.lower() for x in x.split()))
df['tweet']
#output
0 this is introduction to nlp
1 it is likely to be useful, to people
2 machine learning is the new electrcity
3 there would be less hype around ai and more ac...
4 python is the best tool!
5 r is good langauage
6 i like this book
7 i want more books like this
Name: tweet, dtype: object
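pandas also provides a vectorized shortcut: the .str accessor applies lower() to every element of a column without an explicit lambda. A minimal sketch on a two-row frame (the rows echo the example above):

```python
import pandas as pd

# A small frame standing in for the tweet data above.
df = pd.DataFrame({"tweet": ["This is introduction to NLP",
                             "python is the BEST tool!"]})

# .str.lower() lowercases every element of the column in one call.
df["tweet"] = df["tweet"].str.lower()
print(df["tweet"].tolist())
```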
one—down the passage to the outer world from which we had been
shut off for weeks.
Three times Master Lawrence drew back the filled blanket for us to
empty. The third time he said:
“The Captain had one hand up through the surface when I started
back here. He says we are beyond the fort wall, and by the time we
can come out there one after the other, he will have the opening
large enough for us to pass through. So come on.”
The order of our going had been pre-arranged. I was to be the last.
One by one I saw my comrades go down the tunnel, and then I
entered. As rapidly as I could I crept along, touching now and then
the heels of the man in front of me. Then he rose to his feet, and I
knew we had reached the outlet. I could even feel the fresh air as it
blew down upon me. How good it felt! One quick spring and I would
be free!
An exclamation from the man in front of me as he went out of the
opening—an exclamation quickly smothered as it seemed to me—
reached my ear. I wondered what it could mean, but there was no
time now for investigation, nor even for hesitation. So up I arose,
placed my hands on the firm ground, and leaped out of the hole into
the arms of two British soldiers who were waiting to capture me.
CHAPTER XXIII
THE ESCAPE
I struggled with my captors for a while, not so much because I
expected to escape from them, but in hopes that I would thereby aid
my companions in their flight. For I could neither see nor hear
anything of them, and believed I was the only one who had been
seized by the British guards. At length I ceased my efforts, and,
yielding to the inevitable, let them lead me away. They conducted
me around the prison to its front entrance and took me into the
superintendent’s office, where to my amazement I beheld all my
comrades, each like myself in the grasp of two soldiers.
There was a broad grin on the face of the prison overseer as he
gazed at us, and then addressing himself to Captain Tucker, he said:
“It was a neat little game, Captain, I admit that, and with some it
might have succeeded, but not with me. Why, sir, let me show you
that I have known of your scheme from the beginning. See here—”
and turning to the partition back of his chair, he pushed aside an old
garment that was hanging there, disclosing a small aperture, about
the size of a walnut on that side of the wall, but tapering down to a
small point on the side of our room. “With the coat hanging there to
shut out the light,” he continued, “you did not notice the tiny
opening, and did not suspect that many times each day I either had
my eye upon you, or my ear was where I could hear all you were
saying,” and he glanced at his prisoner with a complacent air which
said: “I was more than a match for you.”
Then he went on:
“Oh! I knew you were a shrewd fellow, Captain Tucker, and had
outwitted more than one of our officers before now, but I was
determined you should not outwit me. So I put you and your
subalterns in there where I could literally keep you under my eye. I
saw you the night you cut out the boards of your berth, and
immediately suspected your plan, but purposely allowed you to go
on to the end. I was outside the prison watching when your hand
first broke through the surface. Then I called my men, for I had
arranged a little plan, too, and captured each one of you as you
came out of the tunnel, and marched you in here. I assure you the
joke is on you,” and he threw back his head and laughed
immoderately.
It was no laughing matter for us, however, and we were a crestfallen
group as we stood there looking first at our captors and then
at each other, and realizing that our weeks of hard toil had availed
us nothing.
But the worst was yet to come.
“Do you know what I am going to do with you?” the jailer asked
when his laughter was over. “Of course you don’t, so I will tell you. I
am going to put you right back into that room tonight, and leave the
passage open, and you are at liberty to go out if you wish. Only
remember twelve good men are to be stationed outside with orders
to pick you off as you come out of your hole like so many
woodchucks,” and again he laughed as though he had perpetrated
another good joke.
Nor was he yet done.
“Tomorrow,” he added, “I shall have you fill up the hole you have
taken such pains to dig. It will be quite a job to put all that dirt back,
but since you thrived while digging it out, you doubtless will enjoy
putting it back. The additional exercise will be good for you,” and for
the third time he laughed heartily.
This is where the worst came in. He kept his word to the letter. Back
into the room we were marched and left to ourselves. There the
opening stared silently at us. We knew it led out into the open air,
but not one of us cared to make use of it; and the next morning
under a guard of soldiers we were forced to fill up the tunnel we had
been so long in digging.
The day after this enforced task was completed the overseer came
to our room. He looked us over quizzically, and then remarked:
“You look tired, gentlemen, and hardly as though you were in a good
condition for a long journey, and yet I am compelled to ask you to
take one. The governor seems to think you are going to be more of
a burden here than he cares to have on his hands, so he has
decided to send you down to Halifax. At sunrise tomorrow you will
start, and I wish you a pleasant journey, a safe arrival, and a long
stay in the stoutest jail we have in all the colonies,” and with mock
politeness he bowed himself out of our presence.
The sun was just peeping above the horizon the next morning when
we were taken down to the river and put on board an open boat,
already manned with an officer and ten men. The jailer himself had
accompanied us, and his directions to the lieutenant in whose care
he placed us were brief but to the point:
“Here are the prisoners, sir; and the governor says you are to deliver
them alive or dead to the governor at Halifax, and take a receipt for
them. It matters little the condition they are in—the point is to
deliver them, so you will know what to do if they attempt to make
you any trouble,” and the grin we had so often seen was again upon
his face.
Then the ropes were cast off, the sail was hoisted, and the voyage
begun—a voyage destined to have an outcome very different from
what anyone in the boat, or even the watching official on the shore,
expected.
The wind was from the north, and we soon ran out of Hillsborough
bay into Northumberland Strait, which we crossed to Cape St.
George, where we went on shore for dinner.
The officer in charge of us did not mean to give us any opportunity
either to run away from him or to overpower himself and men, for
the moment the boat touched the shore he marched us up to a large
tree not far from the beach. There he made us sit down, and placed
six men with loaded guns around us with orders to shoot us down if
we even attempted to rise, a thing we should have been glad to do
as the long hours in the boat had cramped our limbs and rendered
them stiff and uncomfortable.
Under his direction, the other four men built a fire, cooked the
dinner, and with himself partook of it. The four fed soldiers then
changed places with four of our guards, who had their rations. The
remaining two were then relieved for their repast. When they were
done a small amount of food was brought to us, but there was no
time during our halt when we were not under the guard of at least
six men, who had their muskets ready for instant use.
During the afternoon we rounded the Cape, and going down St.
George’s Bay, passed through the gut of Canso to Chedabucto Bay,
where we ran in to the Isle of Madame for the night. Within the
walls of the garrison and under a strong guard furnished by the
commander, we were kept securely until the morning, when our
journey was resumed.
So far there had been no opportunity for us to have a single word of
private conversation with one another, and if the same vigilance was
maintained by our guards, we certainly should not have one. No plan
for any concerted action towards our freedom could therefore be
arranged by us. Yet we all knew by the looks the Captain
occasionally gave us that he was watching for the moment when we
might make such an effort with some hope of success, and we were
all on the alert to assist him when such a move was made.
During the night the wind had whipped around, and now blew mildly
from the south. It took us some time, therefore, to beat out around
Cape Canso to the ocean, and when there what breeze there was
left us. For a long time we lay there, gently tossing on the ground
swell with the hot sun beating down upon our heads. The natural
effect was for us to grow drowsy, and after a while even the men
holding the guns were nodding sleepily.
When the lieutenant joined us in the morning he had the
appearance of a man who had been up a good part of the night at
his cups, and it now began to tell upon him. For a while he struggled
to keep awake, and then, handing over the tiller to one of his men
whom he sternly cautioned to keep a sharp lookout, he put himself
in as comfortable a position as possible, with his head on the
gunwale for a nap.
The heat had a similar effect on us Yankees, but we had an
inducement to keep awake the red-coats did not have. By a glance
at us Captain Tucker gave us to understand that the favorable
moment for our action was close at hand, and with the prospect of
our liberty before us we had no difficulty in keeping our eyes open.
Soon after the English officer dropped asleep, Captain Tucker
changed his own position in the boat to one near the sleeping man.
Here he assumed an easy posture as though he too would take a
nap, yet we knew he was awake and was preparing to act.
That move came, however, sooner than we looked for it and in a
way we had not expected. Catching the lieutenant suddenly by the
feet, he tumbled him overboard, and so adroitly was it done that to
all of his nodding men it had the appearance of an accidental fall
into the sea.
Captain Tucker’s next move also seemed to confirm this view.
Springing to his feet as though aroused by the splash, he called out
excitedly:
“Quick, men! Put out your sweeps! You must save him! I’ll steer!”
He took the tiller from the bewildered soldier, and again cried out for
the men to get out their oars.
In the excitement that followed—an excitement increased by the
unfortunate officer’s calls for help, for his sword and pistols were
weighing him down—the red-coats dropped their guns and put out
the oars. They were awkward about it, however, and the Captain so
managed the tiller that we were a few minutes in coming up with
the struggling man. Those few minutes were enough for us, his
comrades, to seize the discarded weapons. Dropping overboard all
but five, we so placed ourselves that, when the British officer was
drawn into the boat again, we were in command of it.
Under the stern orders of Captain Tucker, enforced by our loaded
muskets, the discomfited soldiers pulled to the shore where they
were disembarked.
“It cannot be far across the point to Canso, where you will find
friends,” the Captain announced when they were on the beach.
“Your boat and your provisions we shall need. Good-by,” and with a
bow as polite as that the British jailer had given us a day or two
before, he waved his hand for us to pull the craft out to sea.
Early in the afternoon the breeze sprang up again, and we headed
the sloop down the coast, homeward bound, for after some
discussion we decided to run the risk of a voyage in the open boat to
Boston.
In the month of August the sea is usually light and the weather
serene from Nova Scotia to Massachusetts Bay. We found it so now,
and on the seventeenth arrived in port without mishap.
Bidding good-bye to our comrades, the Captain and I repaired to
Marblehead, where we awaited the further orders of the Naval
Committee. But two months later Cornwallis surrendered at
Yorktown, and the war for the independence of the Colonies was
over.
The navy, therefore, no longer needed us, and we resigned our
commissions to go back to the foreign trade. For several years the
Captain ran a large ship to French and Spanish ports, on which I
served as first mate. Then I was given command of a brig in the
East India trade and the Captain and I did not see each other for
some years.
The War of 1812 sent us back to the navy in which he rose to the
rank of a Commodore, while I won a Captain’s commission. At its
close he retired to a farm he had purchased in Bristol, Maine, while I
again sailed for foreign ports.
It was never my good fortune to visit him in his new home but once;
but I have many times since stood by his grave and read the few
lines written on his tombstone, a just tribute to the man and his
service:
In Memory of
COMMODORE SAMUEL TUCKER
Who Died
March 10, 1833
A Patriot of the Revolution
To this I would personally add:
“And the truest friend I ever knew.”
The End.
FICTION FOR BOYS

LITTLE RHODY
By JEAN K. BAIRD
Illustrated by R. G. Vosburgh.
At The Hall, a boys’ school, there is a set of boys known
as the “Union of States,” to which admittance is gained by
excelling in some particular the boys deem worthy of their
mettle.
Rush Petriken, a hunchback boy, comes to The Hall, and
rooms with Barnes, the despair of the entire school
because of his prowess in athletics. Petriken idolizes him,
and when trouble comes to him, the poor crippled lad
gladly shoulders the blame, and is expelled. But shortly
before the end of the term he returns and is hailed as
“little Rhody,” the “capitalest State of all.”

CLOTH, 12 mo, illustrated,—$1.50

BIGELOW BOYS
By Mrs. A. F. RANSOM
Illustrated by Henry Miller
Four boys, all bubbling over with energy and love of good
times, and their mother, an authoress, make this story of
a street-car strike in one of our large cities move with
leaps and bounds. For it is due to the four boys that a
crowded theatre car is saved from being wrecked, and the
instigators of the plot captured.
Mrs. Ransom is widely known by her patriotic work among
the boys in the navy, and she now proves herself a friend
of the lads on land by writing more especially for them.

CLOTH, 12 mo, illustrated,—$1.50


Books sent postpaid on receipt of price.
THE BRADEN BOOKS

FAR PAST THE FRONTIER.


By JAMES A. BRADEN
The sub-title “Two Boy Pioneers” indicates the nature of
this story—that it has to do with the days when the Ohio
Valley and the Northwest country were sparsely settled.
Such a topic is an unfailing fund of interest to boys,
especially when involving a couple of stalwart young men
who leave the East to make their fortunes and to incur
untold dangers.
“Strong, vigorous, healthy, manly.”—Seattle Times.

CONNECTICUT BOYS IN
THE WESTERN RESERVE
By JAMES A. BRADEN
The author once more sends his heroes toward the setting
sun. “In all the glowing enthusiasm of youth, the
youngsters seek their fortunes in the great, fertile
wilderness of northern Ohio, and eventually achieve fair
success, though their progress is hindered and sometimes
halted by adventures innumerable. It is a lively,
wholesome tale, never dull, and absorbing in interest for
boys who love the fabled life of the frontier.”—Chicago
Tribune.
THE TRAIL of THE SENECA
By JAMES A. BRADEN
In which we follow the romantic careers of John Jerome
and Return Kingdom a little farther.
These two self-reliant boys are living peaceably in their
cabin on the Cuyahoga when an Indian warrior is found
dead in the woods nearby. The Seneca accuses John of
witchcraft. This means death at the stake if he is
captured. They decide that the Seneca’s charge is made to
shield himself, and set out to prove it. Mad Anthony, then
on the Ohio, comes to their aid, but all their efforts prove
futile and the lone cabin is found in ashes on their return.

CAPTIVES THREE
By JAMES A. BRADEN
A tale of frontier life, and how three children—two boys
and a girl—attempt to reach the settlements in a canoe,
but are captured by the Indians. A common enough
occurrence in the days of our great-grandfathers has been
woven into a thrilling story.

BOUND IN CLOTH, each handsomely illustrated, postpaid—$1.00
BOOKS FOR BOYS
WINFIELD SERIES:
LARRY BARLOW’S AMBITION
A YOUNG INVENTOR’S PLUCK
These two books of adventure for boys, by the popular
author of the Rover Boys’ Series, have attained an enviable
reputation, and are read by thousands and thousands of
boys everywhere.

CASTLEMON SERIES:
A STRUGGLE FOR A FORTUNE
WINGED ARROW’S MEDICINE
THE FIRST CAPTURE
Harry Castlemon ranks among the best of the writers of
juvenile fiction. His various books are in constant and
large demand by the boys who have learned to look for
his name as author as a guaranty of a good story.

BONEHILL SERIES:
THE BOY LAND BOOMER
THREE YOUNG RANCHMEN
Stories of western life that are full of adventure, which
read as if they happened day before yesterday.
RATHBORNE SERIES:
DOWN THE AMAZON
ADRIFT ON A JUNK
YOUNG VOYAGERS OF THE NILE
YOUNG CASTAWAYS
For boys who have had their fill of adventures on land, the
Rathborne books are ever welcome. They make one feel
the salt breeze, and hear the shouts of the sailor boys.

OTIS SERIES:
TEDDY
TELEGRAPH TOM
MESSENGER No. 48
DOWN THE SLOPE
James Otis writes for wide-awake American boys, and his
audience read his tales with keen appreciation.

Each of the above books bound in cloth, illustrated, 12 mo, postpaid—$1.00

The Saalfield Publishing Co.,


AKRON, OHIO
TRANSCRIBER’S NOTES:
Obvious typographical errors have been corrected.
Inconsistencies in hyphenation have been
standardized.
Archaic or variant spelling has been retained.
*** END OF THE PROJECT GUTENBERG EBOOK IN SHIP AND
PRISON ***

Updated editions will replace the previous one—the old editions
will be renamed.

Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE
THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and Redistributing Project Gutenberg™ electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and
with almost no restrictions whatsoever. You may copy it,
give it away or re-use it under the terms of the Project
Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country
where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™