
Machine Learning Notes - Lec 05 - Text Classification Using Scikit-Learn

This document outlines the steps for performing text classification using Scikit-Learn. It discusses preprocessing text data by generating n-grams from documents and labeling the data. Machine learning algorithms like Naive Bayes and logistic regression are trained on the preprocessed data. The trained models are then evaluated on test data to select the best performing model. This top model is later used to classify new unlabeled documents by extracting features from input text and making predictions. The goal is to predict the gender of authors based on Facebook comments.

Uploaded by

Zara Jamshaid
Copyright
© All Rights Reserved

Machine Learning

Lecture 05
Text Classification using Scikit-Learn

Dr. Rao Muhammad Adeel Nawab


How to Work



Power of Dua



Dua – Take Help from Allah before starting any task



Course Focus
Mainly get EXCELLENCE in two things
1. Become a great human being
2. Become a great Machine Learning Engineer

To become a great human being


Get sincere with yourself
When you are sincere with yourself, your khalwat (private life) and jalwat (public life) are the same

Introduction
Aim
The main aim of this tutorial is to predict the
gender of an author from his/her written text
using the Scikit-Learn Machine Learning toolkit.
Task
Learn Input-Output Function
Given a Facebook comment (input), predict the
gender of the author (output) who wrote it.


Introduction
Goal
The problem of gender prediction is treated as a
supervised learning problem.
We need
Labelled data
High quality data
Large amount of data



Input and Output
Input
Facebook Comment - Plain text only
Represented as n-grams
Note that in this tutorial we will use word uni-grams (or
1-grams) as features to represent a Facebook comment
(or text)
Output
Gender of the author who wrote the comment
Represented as Gender attribute (Male/Female)

N-gram Models
An n-gram is a contiguous sequence of n items from a
given sample of text
N-gram can be
Word based
Character based
N represents the length of the n-gram


Example – N-gram Generation from Input Text
Input Text
R u coming?
Word Uni-grams (N = 1)
Tokenized Text:
R
u
coming
?
Set of Word Uni-grams = {R, u, coming, ?}

Example – N-gram Generation from Input Text
Input Text
R u coming?
Word Bi-grams (N = 2)
Tokenized Text:
R
u
coming
?
Set of Word Bi-grams = {R u, u coming, coming ?}

Example – N-gram Generation from Input Text
Input Text
R u coming?
Word Tri-grams (N = 3)
Tokenized Text:
R
u
coming
?
Set of Word Tri-grams = {R u coming, u coming ?}

Example – N-gram Generation from Input Text
Input Text
R u coming?
Character Tri-grams (N = 3)
Tokenized Text:
Note that a space is also a character (shown here as ␣)
R, ␣, u, ␣, c, o, m, i, n, g, ?
Set of Character Tri-grams
{"R u", " u ", "u c", " co", "com", "omi", "min", "ing", "ng?"}
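
The four n-gram examples above can be reproduced with a short sketch. The helper names tokenize, word_ngrams and char_ngrams are hypothetical (they are not from the lecture), and the tokenizer is an assumption that simply splits words from punctuation marks, as the slides do.

```python
import re


def tokenize(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


def word_ngrams(text, n):
    # Slide a window of n tokens over the tokenized text
    tokens = tokenize(text)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    # Slide a window of n characters over the raw text (spaces count)
    return [text[i:i + n] for i in range(len(text) - n + 1)]


text = "R u coming?"
print(word_ngrams(text, 1))  # ['R', 'u', 'coming', '?']
print(word_ngrams(text, 2))  # ['R u', 'u coming', 'coming ?']
print(word_ngrams(text, 3))  # ['R u coming', 'u coming ?']
print(char_ngrams(text, 3))
```

A text with k items always yields k - n + 1 n-grams, which is why the 11-character input gives 9 character tri-grams.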
Three Phases of Machine Learning

Train: Use a subset of the data (called Training Data) to train the model (learning)

Test: Use a subset of the data (called Testing Data) to evaluate the trained model

Application: Use your learned/trained models in real-world applications


PHASE 1 & 2: TRAINING AND TESTING
Step 1: Import Libraries
Step 2: Read, Understand and Pre-process Train/Test Data
Step 2.1: Read Data
Step 2.2: Understand Data
Step 2.3: Pre-process Data
Step 3: Label Encoding for Train/Test Data
Step 4: Feature Extraction – Changing Representation of Data “from String to Vector”
Step 5: Train Machine Learning Algorithms using Training Data
Step 6: Evaluate Machine Learning Algorithms using Test Data
Step 7: Selection of Best Model

PHASE 3: APPLICATION PHASE
Step 8: Application Phase
Step 8.1: Combine Data (Train + Test)
Step 8.2: Train Best Model (see Step 7) on all data (Train + Test)
Step 8.3: Save the Trained Model as Pickle File
Step 9: Make prediction on unseen/new data
Step 9.1: Load the Trained Model (saved in Step 8.3)
Step 9.2: Take Input from User
Step 9.3: Convert User Input into Feature Vector (Same as Feature Vector of Trained Model)
Step 9.4: Apply Trained Model on Feature Vector of Unseen Data and Output Prediction (Male/Female) to User


Step 1: Import Libraries



Step 2: Read, Understand and Pre-process Train/Test Data


Step 2.2: Understand Data

Understanding Data via GRAPH is easy. Let’s Go!




Step 2.3: Pre-Process Data



Please convert data to a form that I can understand


Step 3: Label Encoding for Train/Test Data
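
A minimal sketch of label encoding, assuming the Gender attribute holds the strings Male and Female as described under Input and Output. The label lists are made-up placeholders, not the lecture's data, and LabelEncoder is one reasonable choice rather than necessarily the one used in the lecture.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the Gender column of the train/test data
train_labels = ["Male", "Female", "Female", "Male"]
test_labels = ["Female", "Male"]

encoder = LabelEncoder()
# Learn the label-to-integer mapping on the training labels only
y_train = encoder.fit_transform(train_labels)
# Reuse the same mapping for the test labels
y_test = encoder.transform(test_labels)

print(list(encoder.classes_))  # ['Female', 'Male'] (classes are sorted)
print(y_train.tolist())        # [1, 0, 0, 1]
print(y_test.tolist())         # [0, 1]
```

encoder.inverse_transform can later map a predicted integer back to Male/Female for display.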



Step 4: Feature Extraction
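from sklearn.feature_extraction.text import CountVectorizer

# Word uni-gram counts, keeping only the 10 most frequent terms
vect = CountVectorizer(
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    stop_words='english',
    ngram_range=(1, 1),
    max_features=10)

print("Parameters of CountVectorizer and its values:\n\n")
print(vect)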
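
As a sketch of the "from String to Vector" step, the vectorizer configured above can be fitted on the training comments and then reused, unchanged, on the test comments. The comments below are made-up placeholders, not the lecture's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical comments standing in for the real train/test data
train_comments = ["R u coming?", "I am coming home", "coming soon"]
test_comments = ["u coming home soon?"]

vect = CountVectorizer(
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    stop_words='english',
    ngram_range=(1, 1),
    max_features=10)

# fit_transform learns the vocabulary from the training text and vectorizes it
X_train = vect.fit_transform(train_comments)
# transform reuses that vocabulary, so test vectors have the same columns
X_test = vect.transform(test_comments)

print(sorted(vect.vocabulary_))  # the learned uni-gram features
print(X_train.toarray())         # one row of counts per training comment
print(X_test.toarray())
```

Fitting only on the training comments keeps the test data unseen until evaluation.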


Train Machine Learning Algorithms as I am doing.


Step 5: Train ML Algorithms using Train Data
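
A minimal training sketch, assuming the two algorithms named in the document summary (Naive Bayes and logistic regression). The comments and their 0/1 labels are arbitrary placeholders, and the hyperparameters are library defaults apart from max_iter; none of this is taken from the lecture.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training comments with arbitrary encoded labels
train_comments = [
    "see you at the match tonight",
    "lovely photo thanks for sharing",
    "great game well played",
    "thanks so much for the wishes",
]
y_train = [0, 1, 0, 1]

# Word uni-gram counts as features
vect = CountVectorizer(ngram_range=(1, 1))
X_train = vect.fit_transform(train_comments)

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit every candidate model on the same training features
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "trained on", X_train.shape[0], "comments")
```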



Step 6: Evaluate ML Algorithms using Test Data
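
A sketch of evaluation on held-out test data, assuming accuracy as the measure (the lecture may use other measures as well). The data are placeholders and MultinomialNB stands in for any trained model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical train/test comments with arbitrary encoded labels
train_comments = [
    "see you at the match tonight",
    "lovely photo thanks for sharing",
    "great game well played",
    "thanks so much for the wishes",
]
y_train = [0, 1, 0, 1]
test_comments = ["well played tonight", "thanks for the photo"]
y_test = [0, 1]

vect = CountVectorizer()
X_train = vect.fit_transform(train_comments)
X_test = vect.transform(test_comments)  # same vocabulary as training

model = MultinomialNB().fit(X_train, y_train)

# Predict labels for the unseen test comments and compare with the truth
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```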



Step 7: Selection of Best Model
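
Selecting the best model can be as simple as taking the highest test score. In the sketch below the scores are invented placeholder numbers used only to show the mechanics; they are not results from the lecture.

```python
# Placeholder scores (NOT real results); in practice these come from Step 6
scores = {
    "Naive Bayes": 0.78,
    "Logistic Regression": 0.81,
}

# Pick the model name whose score is highest
best_name = max(scores, key=scores.get)
print("Best model:", best_name)
```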



Step 8: Application Phase

PHASE 3: APPLICATION PHASE



Step 8.1: Combine Data (Train+Test)



Step 8.2: Train Best Model on All Data
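
A sketch covering Steps 8.1 to 8.3 under stated assumptions: the data are placeholders, MultinomialNB stands in for whichever model Step 7 selects, and the file name gender_model.pkl is hypothetical. The fitted vectorizer is saved alongside the model because Step 9.3 needs the same feature mapping.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical train/test data with arbitrary encoded labels
train_comments = ["see you at the match tonight", "lovely photo thanks for sharing"]
y_train = [0, 1]
test_comments = ["great game well played", "thanks so much for the wishes"]
y_test = [0, 1]

# Step 8.1: combine train and test data
all_comments = train_comments + test_comments
all_labels = y_train + y_test

# Step 8.2: refit the features and the chosen model on all data
vect = CountVectorizer()
X_all = vect.fit_transform(all_comments)
best_model = MultinomialNB().fit(X_all, all_labels)

# Step 8.3: save vectorizer and model together as one pickle file
model_path = os.path.join(tempfile.gettempdir(), "gender_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump({"vectorizer": vect, "model": best_model}, f)

print("Saved to", model_path)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.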



Step 9.2: Take Input from User



Step 9.3: Convert User Input into Feature Vector (Same as Feature Vector of Trained Model)


Step 9.4: Apply Trained Model on Feature Vector of Unseen
Data and Output Prediction to User
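
Steps 9.1 to 9.4 can be sketched end to end as follows. To stay self-contained, the example trains a tiny placeholder model and pickles it in memory instead of reading the file from Step 8.3; the comments and their Male/Female labels are arbitrary placeholders, and the user input is hard-coded where an interactive program might call input().

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Placeholder model so the sketch runs on its own (labels are arbitrary)
comments = [
    "see you at the match tonight",
    "lovely photo thanks for sharing",
    "great game well played",
    "thanks so much for the wishes",
]
labels = ["Male", "Female", "Male", "Female"]
vect = CountVectorizer()
model = MultinomialNB().fit(vect.fit_transform(comments), labels)
saved = pickle.dumps({"vectorizer": vect, "model": model})

# Step 9.1: load the trained model (here from bytes rather than a file)
bundle = pickle.loads(saved)

# Step 9.2: take input from the user (hard-coded for this sketch)
user_input = "well played, see you tonight"

# Step 9.3: convert the input with the SAME fitted vectorizer
features = bundle["vectorizer"].transform([user_input])

# Step 9.4: apply the model and report the prediction
prediction = bundle["model"].predict(features)[0]
print("Predicted gender:", prediction)
```

Calling transform rather than fit_transform in Step 9.3 is what keeps the input's feature vector aligned with the columns the model was trained on.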
