ML Chapter 1
For example, your spam filter is a Machine Learning program that can learn to flag spam given
examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called
“ham”) emails. The examples that the system uses to learn are called the training set. Each training
example is called a training instance (or sample). In this case, the task T is to flag spam for new
emails, the experience E is the training data, and the performance measure P needs to be defined;
for example, you can use the ratio of correctly classified emails. This particular performance
measure is called accuracy and it is often used in classification tasks. If you just download a copy
of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, it
is not Machine Learning.
A spam filter based on Machine Learning techniques automatically learns which words and
phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam
examples compared to the ham examples (Figure 1-1 ML Approach).
To summarize, Machine Learning is great for:
Problems for which existing solutions require a lot of hand-tuning or long lists of rules:
one Machine Learning algorithm can often simplify code and perform better.
Complex problems for which there is no good solution at all using a traditional approach:
the best Machine Learning techniques can find a solution.
Fluctuating environments: a Machine Learning system can adapt to new data.
Getting insights about complex problems and large amounts of data.
As a scientific endeavour, machine learning grew out of the quest for artificial intelligence.
Already in the early days of AI as an academic discipline, some researchers were interested in
having machines learn from data. They attempted to approach the problem with various symbolic
methods, as well as what were then termed "neural networks"; these were mostly perceptrons and
other models that were later found to be reinventions of the generalized linear models of statistics.
Probabilistic reasoning was also employed, especially in automated medical diagnosis.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between
AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems
of data acquisition and representation. By 1980, expert systems had
come to dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based learning
did continue within AI, leading to inductive logic programming, but the more statistical line of
research was now outside the field of AI proper, in pattern recognition and
information retrieval. Neural network research had been abandoned by AI and computer science
around the same time. This line, too, was continued outside the AI/CS field, as "connectionism",
by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success
came in the mid-1980s with the reinvention of backpropagation.
Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field
changed its goal from achieving artificial intelligence to tackling solvable problems of a
practical nature. It shifted focus away from the symbolic approaches it had inherited from AI,
and toward methods and models borrowed from statistics and probability theory. It also
benefited from the increasing availability of digitized information, and the possibility to
distribute that via the internet.
Machine learning and data mining often employ the same methods and overlap significantly.
They can be roughly distinguished as follows:
Machine learning focuses on prediction, based on known properties learned from the
training data.
Data mining focuses on the discovery of (previously) unknown properties in the data.
This is the analysis step of Knowledge Discovery in Databases.
The two areas overlap in many ways: data mining uses many machine learning methods, but
often with a slightly different goal in mind. On the other hand, machine learning also employs
data mining methods as “unsupervised learning” or as a preprocessing step to improve learner
accuracy. Much of the confusion between these two research communities (which do often
have separate conferences and separate journals, ECML PKDD being a major exception)
comes from the basic assumptions they work with: in machine learning, performance is usually
evaluated with respect to the ability to reproduce known knowledge, while in Knowledge
Discovery and Data Mining (KDD) the key task is the discovery of previously unknown
knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised)
method will easily be outperformed by supervised methods, while in a typical KDD task,
supervised methods cannot be used due to the unavailability of training data.
Machine learning also has intimate ties to optimization: many learning problems are
formulated as minimization of some loss function on a training set of examples. Loss functions
express the discrepancy between the predictions of the model being trained and the actual
problem instances (for example, in classification, one wants to assign a label to instances, and
models are trained to correctly predict the pre-assigned labels of a set of examples). The
difference between the two fields arises from the goal of generalization: while optimization
algorithms can minimize the loss on a training set, machine learning is concerned with
minimizing the loss on unseen samples.
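As a small, hedged illustration (not from the text), the mean squared error is one such loss function; the arrays below are made-up values.
import numpy as np

# Hypothetical true labels and model predictions on a training set
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Mean squared error: the average squared discrepancy between prediction and truth
mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # 0.375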
Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they
get during training. There are four major categories: supervised learning, unsupervised learning,
semisupervised learning, and Reinforcement Learning.
Supervised learning
Supervised learning algorithms and supervised learning models make predictions based on labeled
training data. Each training sample includes an input and a desired output. A supervised learning
algorithm analyzes this sample data and makes an inference – basically, an educated guess when
determining the labels for unseen data. This is the most common and popular approach to machine
learning. It’s “supervised” because these models need to be fed manually tagged sample data to
learn from. Data is labeled to tell the machine what patterns (similar words and images, data
categories, etc.) it should look for and what connections it should recognize.
In supervised learning, the training data you feed to the algorithm includes the desired solutions,
called labels (Figure 1-2).
Figure 1-2. A labeled training set for supervised learning (e.g., spam classification)
A typical supervised learning task is classification. The spam filter is a good example of this: it is
trained with many example emails along with their class (spam or ham), and it must learn how to
classify new emails.
Supervised learning problems can be further grouped into regression and classification problems.
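As a minimal sketch of the classification case (using scikit-learn, which is discussed later in this chapter; the feature values and labels are made up), a classifier is fit to labeled examples and then predicts labels for new inputs:
from sklearn.linear_model import LogisticRegression

# Made-up training set: two numeric features per email, label 1 = spam, 0 = ham
X_train = [[5, 1], [6, 0], [1, 4], [0, 5], [7, 1], [1, 6]]
y_train = [1, 1, 0, 0, 1, 0]

clf = LogisticRegression()
clf.fit(X_train, y_train)              # learn from the labeled examples
print(clf.predict([[6, 1], [0, 4]]))   # labels predicted for new, unseen inputs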
Unsupervised Learning
Unsupervised learning algorithms uncover insights and relationships in unlabeled data. In this
case, models are fed input data but the desired outcomes are unknown, so they have to make
inferences based on circumstantial evidence, without any guidance or training. The models are not
trained with the “right answer,” so they must find patterns on their own. One of the most common
types of unsupervised learning is clustering, which consists of grouping similar data. This method
is mostly used for exploratory analysis and can help you detect hidden patterns or trends.
Unsupervised learning problems can be further grouped into clustering and association problems.
Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
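As a hedged sketch of clustering (not from the text; the customer numbers are invented), k-means groups unlabeled points into clusters on its own:
from sklearn.cluster import KMeans

# Made-up customer data: [number of purchases, average basket size]
X = [[2, 10], [3, 12], [25, 80], [24, 75], [3, 11], [26, 78]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
print(kmeans.fit_predict(X))   # two groups are discovered without any labels being supplied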
Semi-Supervised Learning
In semi-supervised learning, the training data is split into two parts: a small amount of labeled data and a
larger set of unlabeled data. In this case, the model uses the labeled data as an input to make inferences
about the unlabeled data, providing more accurate results than regular supervised learning models.
This approach is gaining popularity, especially for tasks involving large datasets such as image
classification. Semi-supervised learning doesn’t require a large amount of labeled data, so it’s
faster to set up, more cost-effective than supervised learning methods, and ideal for businesses that
receive huge amounts of data.
Reinforcement Learning
Reinforcement learning (RL) is concerned with how a software agent (or computer program) ought
to act in a situation to maximize the reward. Reinforcement learning models attempt to
determine the best possible path they should take in a given situation. They do this through trial
and error. Since there is no training data, machines learn from their own mistakes and choose the
actions that lead to the best solution or maximum reward.
This machine learning method is mostly used in robotics and gaming. Video games demonstrate a
clear relationship between actions and results, and can measure success by keeping score.
Therefore, they’re a great way to improve reinforcement learning algorithms.
Before discussing the 4 math skills needed in machine learning, let’s first of all describe the
machine learning process. The machine learning process includes 4 main stages:
1. Problem Framing: This is where you decide what kind of problem you are trying to solve, e.g. a
model to classify emails as spam or not spam, model to classify tumor cells as malignant or benign,
model to improve customer experience by routing calls into different categories so that calls can
be answered by personnel with the right expertise, model to predict if a loan will charge off after
the duration of the loan, model to predict price of a house based on different features or predictors,
and so on.
2. Data Analysis: This is where you handle the data available for building the model. It includes
data visualization of features, handling missing data, handling categorical data, encoding class
labels, normalization, and standardization of features, feature engineering, dimensionality
reduction, data partitioning into training, validation and testing sets, etc.
3. Model Building: This is where you select the model that you would like to use, e.g. linear
regression, logistic regression, KNN, SVM, K-means, Monte Carlo simulation, time series
analysis, etc. The data set has to be divided into training, validation, and test sets. Hyperparameter
tuning is used to fine-tune the model in order to prevent overfitting. Cross-validation is performed
to ensure the model performs well on the validation set. After fine-tuning the model parameters, the
model is applied to the test data set. The model’s performance on the test data set is approximately
equal to what would be expected when the model is used for making predictions on unseen data
(a minimal code sketch of this stage appears after this list).
4. Application: In this stage, the final machine learning model is put into production to start
improving the customer experience or increasing productivity, or deciding if a bank should
approve credit to a borrower, etc. The model is evaluated in a production setting in order to assess
its performance. This can be done by comparing the performance of the machine learning solution
against a baseline or control solution using methods such as A/B testing. Any discrepancies between
the experimental model's results and its actual performance in production have to be analyzed.
This analysis can then be used to fine-tune the original model.
Most of the math skills you need for building a machine learning model are used in stages 2, 3,
and 4, which are Data Analysis, Model Building, and Application.
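As a hedged illustration of stage 3 (not from the text), the sketch below splits a dataset, cross-validates a model on the training portion, and then estimates performance on the held-out test set; it uses scikit-learn's built-in Iris data purely as a stand-in.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set; the rest is used for training and cross-validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)                    # model selection
print(cross_val_score(model, X_train, y_train, cv=5).mean())   # cross-validation score

model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # estimate of performance on unseen data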
The 4 Stats and Math Skills for Machine Learning
1. Mean
2. Median
3. Mode
4. Standard deviation/variance
5. Correlation coefficient and the covariance matrix
6. Probability distributions (Binomial, Poisson, Normal)
7. p-value
8. Bayes’ Theorem (Precision, Recall, Positive Predictive Value, Negative Predictive Value,
Confusion Matrix, ROC Curve)
9. A/B Testing
10. Monte Carlo Simulation
Mean: The mean, also known as the average, of all the numbers in the data set is calculated by the equation below:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Heights = [168,170,150,160,182,140,175,191,152,150]
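For the heights listed above, the mean can be computed directly (a small sketch using NumPy):
import numpy as np

heights = [168, 170, 150, 160, 182, 140, 175, 191, 152, 150]
print(sum(heights) / len(heights))   # 163.8
print(np.mean(heights))              # 163.8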
Median: The median is the middle value of the ordered data set.
Arrange the data in increasing order and then find the middle value.
If we have an even number of values in the data set, then the median is the sum of the two middle
numbers divided by 2.
If we have an odd number of values in the data set, for example 9 heights, the median will be the
5th value.
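A quick check with the same heights (a sketch; NumPy averages the two middle values when the count is even):
import numpy as np

heights = [168, 170, 150, 160, 182, 140, 175, 191, 152, 150]
print(sorted(heights))      # [140, 150, 150, 152, 160, 168, 170, 175, 182, 191]
print(np.median(heights))   # 164.0, the average of the two middle values (160 and 168)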
Mode: The mode is the number which occurs most often in the data set. Here 150 occurs twice,
so it is our mode.
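A one-line check using Python's standard library (a sketch):
from statistics import mode

heights = [168, 170, 150, 160, 182, 140, 175, 191, 152, 150]
print(mode(heights))   # 150, which appears twice (more than any other value)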
Variance: Variance is a numerical value that describes the variability of the observations from
their arithmetic mean; it is denoted by sigma-squared ($\sigma^2$).
Variance measures how far the individual values in the data set are spread out from the mean:
$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
where $\mu$ is the mean and $N$ is the number of elements in the data set (population).
Step 1: Take each element of the data set (population), subtract the mean of the data set from it, square the result, and sum all these values.
Step 2: Take the sum from Step 1 and divide it by the total number of elements.
Squaring in the above formula nullifies the effect of the negative sign (-).
Standard Deviation: It is a measure of the dispersion of observations within a dataset relative to their
mean. It is the square root of the variance and is denoted by sigma ($\sigma$).
Standard deviation is expressed in the same unit as the values in the dataset, so it measures how
much the observations of the data set differ from their mean.
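A quick sketch computing both quantities for the heights used earlier (NumPy's defaults here give the population variance and standard deviation, dividing by N):
import numpy as np

heights = [168, 170, 150, 160, 182, 140, 175, 191, 152, 150]
print(np.var(heights))   # 235.36, the mean of the squared deviations from the mean
print(np.std(heights))   # about 15.34, the square root of the variance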
Understanding Variance, Covariance, and Correlation
One of the topics that a data scientist must understand is the relationships that exist in your
dataset. Before you start the machine learning process, it is critical to prepare your data so that
only the relevant parts of your dataset are used for training. To understand the relationships in
your dataset, you need to understand the following concepts:
Variance
Covariance
Correlation
As usual, my aim is to make it easy for you to digest these topics. Let’s begin!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt   # used for the scatter plots below

df = pd.DataFrame({
    'a':[1,3,4,6,8],
    'b':[2,3,5,6,8],
    'c':[6,5,4,3,2],
    'd':[5,4,3,4,6]
})
df
Variance: Variance is the spread of values in a dataset around its mean value. It tells you how
far each number in the dataset is from its mean. The sample variance ($s^2$) is defined as follows
(this version divides by $n - 1$, which is what pandas and NumPy's cov() use by default):
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
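A quick way to see the sample variance of every column is pandas' var(); applied to the dataframe defined above it gives:
df.var()
# a    7.3
# b    5.7
# c    2.5
# d    1.3
# dtype: float64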
Covariance: Now that you have seen the variance of each column, it is time to see how the
columns relate to each other. While variance measures the spread of data around its mean value,
covariance measures the relationship between two random variables.
In statistics, covariance is the measure of the directional relationship between two random
variables.
Let’s plot a scatter plot to see how the columns in our dataframe relate to each other. We shall start
with the a and b columns first:
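The plotting snippet for this figure is not shown in the text; it would look like the later ones (matplotlib imported as plt):
plt.scatter(df['a'], df['b'])
plt.xlabel('a')
plt.ylabel('b')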
As you can see, there seems to be a trend between a and b — as a increases, so does b.
In statistics, a and b are known to have a positive covariance. A positive covariance indicates
that both random variables tend to move upward or downward at the same time.
plt.scatter(df['b'], df['c'])
plt.xlabel('b')
plt.ylabel('c')
This time round, the trend seems to go the other way — as b increases, c decreases.
In statistics, b and c are known to have a negative covariance. A negative covariance indicates
that both variables tend to move away from each other — when one moves upward the other
moves downward, and vice versa.
plt.scatter(df['c'], df['d'])
plt.xlabel('c')
plt.ylabel('d')
In statistics, c and d are known to have zero covariance (or close to zero). When two random
variables are independent, the covariance will be zero. However, the reverse is not necessarily
true: a covariance of zero does not mean that two random variables are independent (a non-linear
relationship can still exist between two random variables that have zero covariance). In the above
example, you can see that there exists some sort of non-linear v-shaped relationship.
Covariance between two random variables is calculated by taking, for each observation, the product
of the differences between each variable's value and its mean, summing all these products, and finally
dividing the sum by the number of values minus one (the sample covariance, as computed by NumPy).
As usual, let’s calculate the covariance between a and b manually using NumPy:
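The original manual calculation is not reproduced here; a sketch that follows the description above (dividing by n - 1, to match NumPy's cov()) is:
a, b = df['a'], df['b']
n = len(a)
cov_ab = ((a - a.mean()) * (b - b.mean())).sum() / (n - 1)
print(cov_ab)   # 6.35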
Like variance, NumPy has the cov() function to calculate covariance of two random variables
directly:
np.cov(df['a'],df['b'])
# array([[7.3 , 6.35],
# [6.35, 5.7 ]])
The output of the cov() function is a 2 x 2 array: the diagonal entries contain the variance of each
variable (7.3 for a and 5.7 for b), and the off-diagonal entries contain the covariance between a and b
(6.35). The same holds for b and c:
np.cov(df['b'], df['c'])
# array([[ 5.7 , -3.75],
# [-3.75, 2.5 ]])
While the covariance measures the directional relationship between 2 random variables, it does
not show the strength of the relationship between the 2 random variables. Its value is not
constrained, and can be from -infinity to +infinity.
Also, covariance is dependent on the scale of the values. For example, if you double each value
in columns a and b, you will get a different covariance:
np.cov(df['a']*2, df['b']*2)
# array([[29.2, 25.4],
# [25.4, 22.8]])
A much better way to measure the strength of the relationship between two random variables is correlation.
Correlation
The correlation between two random variables measures both the strength and direction of a
linear relationship that exists between them. There are two ways to measure correlation:
Pearson Correlation Coefficient — captures the strength and direction of the linear
association between two continuous variables
Spearman’s Rank Correlation Coefficient—determines the strength and direction of
the monotonic relationship which exists between two ordinal (categorical) or continuous
variables.
The Pearson Correlation Coefficient is defined to be the covariance of x and y divided by the
product of each random variable’s standard deviation:
$r = \frac{cov(x, y)}{s_x \, s_y}$
Substituting the formulas for covariance and standard deviation for x and y, you have:
$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Simplifying (the $\frac{1}{n-1}$ factors cancel), the formula now looks like this:
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Pandas has a corr() function that calculates the correlation of columns in a dataframe:
df[['a','b']].corr()
The diagonal values of 1 indicate the correlation of each column with itself. Obviously, the
correlation of a with itself is 1, and so is that for column b. The value of 0.984407 is the Pearson
correlation coefficient of a and b.
df[['b','c']].corr()
df[['c','d']].corr()
Like covariance, the sign of the Pearson correlation coefficient indicates the direction of the
relationship. However, the value of the Pearson correlation coefficient is constrained to be
between -1 and 1. Based on the value, you can deduce the degree of correlation (from very low
near 0 to very high near -1 or 1).
From the above results, you can see that a and b, and b and c, have high degrees of correlation, while
c and d have a very low degree of correlation.
Understanding the correlations between the various columns in your dataset is an important part
of the process of preparing your data for machine learning. You want to train your model using
the columns that have the highest correlation with the label of your dataset.
Unlike covariance, correlation is not affected by the scale of the values. As an experiment,
multiply columns a and b by 2 and compute their correlation; it is unchanged:
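A quick check (a sketch): doubling both columns leaves the correlation matrix exactly as before, unlike the covariance.
(df[['a','b']] * 2).corr()
#           a         b
# a  1.000000  0.984407
# b  0.984407  1.000000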
If your data is not linearly distributed, you should use Spearman’s Rank Correlation Coefficient
instead of the Pearson Correlation Coefficient. The Spearman’s Rank Correlation Coefficient
is designed for distributions that are monotonic.
In algebra, a monotonic function is a function whose gradient never changes sign. In other words,
it is a function which is either always increasing or always decreasing. The following first two figures
are monotonic, while the third is not (since the gradient changes sign a few times going from left
to right).
The Spearman's Rank Correlation Coefficient is calculated as
$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
where d is the difference in rank between the two random variables and n is the number of
observations. An example will make it clear.
df = pd.DataFrame({
'math' :[78,89,75,67,60,58,71],
'science':[91,92,90,80,60,56,84]
})
df
plt.scatter(df['math'], df['science'])
plt.xlabel('math')
plt.ylabel('science')
And this looks like a monotonic distribution. The next step is to rank the scores using the rank()
function in Pandas:
df['math_rank'] = df['math'].rank(ascending=False)
df['science_rank'] = df['science'].rank(ascending=False)
df
You now have two additional columns containing the ranks for each subject:
Let’s also create another two new columns to store the differences between the ranks and their
squared values:
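The snippet for this step is not shown in the text; one way to do it (the column names diff and diff_sq are chosen to match the formula code that follows) is:
df['diff'] = df['math_rank'] - df['science_rank']
df['diff_sq'] = df['diff'] ** 2
df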
You are now ready to calculate the Spearman’s Rank Correlation Coefficient using the
formula defined earlier:
n = df.shape[0]
p = 1 - ((6 * df['diff_sq'].sum()) / (n * (n**2 - 1)))
p # 1.0
And you get a perfect 1.0! Of course, to spare you all the effort in calculating the Spearman’s
Rank Correlation Coefficient manually, you can use the corr() function and specify
‘spearman’ for the method parameter:
df[['math','science']].corr(method='spearman')
Note that the formula for Spearman’s Rank Correlation Coefficient that I have just listed above
is for cases where you have distinct ranks (meaning there is no tie in either math or science scores).
In the event of tied ranks, the formula is a little more complicated.
Understanding P-values
The p-value is a number, calculated from a statistical test, that describes how likely you are to have
found a particular set of observations if the null hypothesis were true. P-values are used in
hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p-value,
the more likely you are to reject the null hypothesis.
All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no
relationship between your variables of interest or that there is no difference among groups.
The alternate hypothesis (Ha) is usually your initial hypothesis that predicts a relationship
between variables. The null hypothesis (Ho) is a prediction of no relationship between the
variables you are interested in.
You want to test whether there is a relationship between gender and height. Based on your
knowledge of human physiology, you formulate a hypothesis that men are, on average, taller than
women. To test this hypothesis, you restate it as:
Ho: Men are, on average, not taller than women.
Ha: Men are, on average, taller than women.
The p-value, or probability value, tells you how likely it is that your data could have occurred
under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is
the number calculated by a statistical test using your data.
The p-value tells you how often you would expect to see a test statistic as extreme or more extreme
than the one calculated by your statistical test if the null hypothesis of that test was true. The p-
value gets smaller as the test statistic calculated from your data gets further away from the range
of test statistics predicted by the null hypothesis.
Normal Distribution
The Normal Distribution is often called a bell curve and is broadly utilized in statistics, business
settings, and government entities such as the FDA. It is widely recognized in the grading of tests
such as the SAT and ACT in high school, or the GRE for graduate students.
Example:
Formula Values: X is the raw data value, Z is the standard score, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
Use the following formula to convert a raw data value X to a standard score Z:
$Z = \frac{X - \mu}{\sigma}$
Assume a specific population has $\mu = 4$ and $\sigma = 2$. For example, to find the probability of a
randomly selected value being greater than 6, first compute the Z score corresponding to X = 6:
$Z = \frac{6 - 4}{2} = 1$
Z = 1 means that the value X = 6 is 1 standard deviation above the mean.
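As a hedged check with SciPy (a library discussed later in this chapter), the probability of exceeding that Z score can be read off the standard normal distribution:
from scipy.stats import norm

z = (6 - 4) / 2          # standardize X = 6 with mu = 4, sigma = 2
print(1 - norm.cdf(z))   # about 0.1587, i.e. P(X > 6)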
Business Applications
It can be utilized to model risks and follow the distribution of likely outcomes for certain events,
such as the amount of next month’s revenue from a specific service.
Process variations in operations management are sometimes normally distributed, as is employee
performance in Human Resource Management.
Binomial Distribution
The Binomial Distribution describes the likelihood of a pass or fail outcome in a survey or
experiment that is replicated numerous times. There are only two potential outcomes for this type
of distribution, for example True or False, or Heads or Tails.
Formula Values:
$P(X = x) = {}_{n}C_{x}\, p^{x} (1 - p)^{n - x}$
x: Number of successes
X: Random variable
C: Combination of x successes from n trials
p: Probability of success
(n - x): Number of failures
(1 - p): Probability of failure
Assume that 15% of light changes record a car running a red light, and that the data follow a
binomial distribution.
To determine the probability that exactly 3 cars will run a red light in 20 light changes, use
p = 0.15, n = 20, x = 3.
Applying the formula and substituting these values:
$P(X = 3) = {}_{20}C_{3} \times 0.15^{3} \times 0.85^{17} \approx 0.243$
Therefore, the probability of 3 cars running a red light in 20 light changes is about 0.24,
or 24%.
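The same number can be checked with SciPy's binomial distribution (a sketch):
from scipy.stats import binom

print(binom.pmf(3, 20, 0.15))   # about 0.243: P(exactly 3 successes in 20 trials)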
Business Applications
Banks and other financial institutions use Binomial Distribution to determine the
likelihood of borrowers defaulting, and apply the number towards pricing insurance, and
figuring out how much money to keep in reserve, or how much to loan.
Poisson Distribution
The Poisson Distribution gives the probability of a number of events occurring in a fixed interval
of time. In other words, when you know how often an event has occurred on average, the Poisson
Distribution can be used to predict how often that event will occur. It provides the likelihood of a
given number of events occurring in a set period.
Formula Values:
$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where $\lambda$ is the average number of events per interval and x is the number of events of interest.
Here, with an average of $\lambda = 5$ accidents per year at a given location, the probability of exactly
x = 3 accidents is:
$P(X = 3) = \frac{5^{3} e^{-5}}{3!} \approx 0.14$
Therefore, there’s about a 14% chance that there will be exactly three accidents there this year.
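A quick check with SciPy's Poisson distribution (a sketch):
from scipy.stats import poisson

print(poisson.pmf(3, mu=5))   # about 0.1404: P(exactly 3 events when the average is 5)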
Business Applications
Predicting customer sales on particular days/times of the year.
Supply and demand estimations to help with stocking products.
Service industries can prepare for an influx of customers, hire temporary help, order
additional supplies, and make alternative plans to reroute customers if needed.
Multivariable calculus
Multivariable calculus (also known as multivariate calculus) is the extension of calculus in one
variable to calculus with functions of several variables: the differentiation and integration of
functions involving several variables, rather than just one.
Multivariable calculus may be thought of as an elementary part of advanced calculus. It deals with
functions of multiple variables, whereas single variable calculus deals with functions of one
variable. The differentiation and integration processes are similar to those of single variable calculus. In
multivariable calculus, to find a partial derivative, take the derivative with respect to the variable of
interest while holding the other variables constant. It mostly deals with three-dimensional
objects or higher dimensions. The typical operations in multivariable calculus include partial
differentiation and multiple integration.
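As a quick worked example (not part of the original text), take $f(x, y) = x^2 y$: holding $y$ constant gives $\frac{\partial f}{\partial x} = 2xy$, and holding $x$ constant gives $\frac{\partial f}{\partial y} = x^2$.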
One of the core tools of Applied Mathematics is multivariable calculus. It is used in various
fields such as Economics, Engineering, Physical Science, Computer Graphics, and so on. Some
of the applications of multivariable calculus are as follows:
In regression analysis, it helps to derive the formulas to estimate the relationship among
the set of empirical data.
In Engineering and Social Science, it helps to study and model high-dimensional
systems that exhibit deterministic behavior.
In Finance, quantitative analysts use multivariable calculus to predict future trends in
the stock market.
Linear algebra is the most important math skill in machine learning. A data set is represented as a
matrix. Linear algebra is used in data preprocessing, data transformation, and model evaluation.
Here are the topics you need to be familiar with:
1. Vectors
2. Matrices
3. Transpose of a matrix
4. The inverse of a matrix
5. The determinant of a matrix
6. Dot product
Definition: A vector is a list of numbers. There are (at least) two ways to interpret what this list
of numbers means. One way is to think of the vector as a point in a space; the list of
numbers then identifies that point in space, where each number represents the vector’s
component in that dimension. Another way is to think of a vector as a magnitude and a direction, e.g. a
quantity like velocity (“the fighter jet’s velocity is 250 mph north-by-northwest”). In this way of
thinking, a vector is a directed arrow pointing from the origin to the end point given by the list
of numbers.
The “magnitude” of a vector is the distance from the endpoint of the vector to the origin; in a
word, its length.
Definition: A unit vector is a vector of magnitude 1. Unit vectors can be used to express the
direction of a vector independent of its magnitude.
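A small NumPy sketch of magnitude and unit vector:
import numpy as np

v = np.array([3.0, 4.0])
magnitude = np.linalg.norm(v)     # length of the vector: sqrt(3^2 + 4^2) = 5.0
print(magnitude, v / magnitude)   # 5.0 [0.6 0.8]; the unit vector has magnitude 1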
Matrix
A matrix is a two-dimensional array that has a fixed number of rows and columns and contains a
number at the intersection of each row and column. A matrix is usually delimited by square
brackets. Example: $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ is a matrix having two rows and two columns.
Dimension of a matrix: If a matrix has K rows and L columns, we say that it has dimension
$K \times L$, or that it is a $K \times L$ matrix.
Example: If a matrix A has 2 rows and 3 columns, we say that A is a $2 \times 3$ matrix.
Note:
If a matrix has only one row or only one column it is called a vector.
A matrix having only one row is called a row vector.
A matrix having only one column is called a column vector.
A matrix having only one row and one column is called a scalar.
Equal matrices: Two matrices having the same dimension are said to be equal if and only if all
their corresponding elements are equal to each other.
Zero matrices: A matrix is a zero matrix if all its elements are equal to zero, and we then write
A = 0.
Square matrices: A matrix is called a square matrix if the number of its rows is the same as the
number of its columns.
Identity matrix: A square matrix is called an identity matrix if all its diagonal elements are
equal to 1 and all its off-diagonal elements are equal to 0. It is usually indicated by the letter I.
Transpose of a matrix
If A is a $K \times L$ matrix, its transpose, denoted by $A^T$, is the $L \times K$ matrix such that the (l,k)-th element
of $A^T$ is equal to the (k,l)-th element of A.
Symmetric matrices: A square matrix is said to be symmetric if it is equal to its transpose.
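A short NumPy sketch covering the matrix operations listed above (transpose, determinant, inverse, and dot product):
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

print(A.T)                # transpose: rows become columns
print(np.linalg.det(A))   # determinant: 1*4 - 2*3 = -2 (up to rounding)
print(np.linalg.inv(A))   # inverse: A @ inv(A) gives the identity matrix
print(A @ A.T)            # dot (matrix) product of A with its transpose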
In summary, we’ve discussed the essential math skills that are needed for building a machine
learning model. There are several free online courses that will teach you the necessary math
skills that you need for building a machine learning model.
5. Applications of Machine Learning
There are many different applications of machine learning, which can benefit your business in
countless ways. You’ll just need to define a strategy to help you decide the best way to implement
machine learning into your existing processes. In the meantime, here are some common machine
learning use cases and applications that might spark some ideas:
Question: Discuss how a machine learning approach could be used in the above listed
applications.
--------------------------------------------------------------------------------------------------------------------
Machine learning is the most algorithm-intensive field in computer science. Gone are the days
when people had to code all machine learning algorithms by hand, thanks to Python and its libraries,
modules, and frameworks.
Python has grown to become the most preferred language for machine learning algorithm
implementations, thanks to its machine learning libraries. Learning Python is essential to mastering data
science and machine learning. Let’s have a look at the main Python libraries used for machine learning.
1) NumPy
With NumPy, you can define arbitrary data types and easily integrate with most databases. NumPy
can also serve as an efficient multi-dimensional container for generic data of any
datatype. The key features of NumPy include a powerful N-dimensional array object, broadcasting
functions, and out-of-the-box tools to integrate C/C++ and Fortran code.
2) SciPy
With machine learning growing at supersonic speed, many Python developers were creating
Python libraries for machine learning, especially for scientific and analytical computing. In 2001, Travis
Oliphant, Eric Jones, and Pearu Peterson decided to merge most of these bits and pieces of code
and standardize them. The resulting library was named the SciPy library.
The current development of the SciPy library is supported and sponsored by an open community
of developers and distributed under the free BSD license.
The SciPy library offers modules for linear algebra, optimization, integration, interpolation,
special functions, Fast Fourier Transform, signal and image processing, Ordinary Differential
Equation (ODE) solving, and other computational tasks in science and analytics.
The underlying data structure used by SciPy is a multi-dimensional array provided by the NumPy
module. SciPy depends on NumPy for the array manipulation subroutines. The SciPy library was
built to work with NumPy arrays along with providing user-friendly and efficient numerical
functions.
One of the unique features of SciPy is that its functions are useful across maths and the other sciences.
Some of its extensively used functions are optimization functions, statistical functions, and signal
processing functions. It supports functions for finding numerical solutions to integrals, so you can solve
differential equations and optimization problems.
The following areas of SciPy’s applications make it one of the popular machine learning
libraries.
Solves Fourier transforms, and differential equations
Its optimized algorithms help you to efficiently and reliably perform linear algebra
calculations
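As a hedged taste of the library (a sketch, not from the text): numerical integration and minimization with SciPy.
from scipy import integrate, optimize

# Integrate x^2 from 0 to 1 numerically (the exact answer is 1/3)
area, _ = integrate.quad(lambda x: x ** 2, 0, 1)

# Find the minimum of (x - 3)^2 starting from x = 0
result = optimize.minimize(lambda x: (x[0] - 3) ** 2, x0=[0.0])

print(area, result.x)   # about 0.3333 and a minimizer near 3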
3) Scikit-learn
Scikit-learn is perhaps the most popular library for Machine Learning. It provides almost every
popular model – Linear Regression, Lasso and Ridge, Logistic Regression, Decision Trees, SVMs, and
a lot more.
In 2007, David Cournapeau developed the Scikit-learn library as part of a Google Summer of
Code project. INRIA got involved and made the first public release in January 2010. Scikit-learn was
built on top of two Python libraries – NumPy and SciPy – and has become the most popular Python
machine learning library for developing machine learning algorithms.
Scikit-learn has a wide range of supervised and unsupervised learning algorithms that work through a
consistent interface in Python. The library can also be used for data mining and data analysis. The
main machine learning functions that the Scikit-learn library can handle are classification,
regression, clustering, dimensionality reduction, model selection, and preprocessing.
Many ML enthusiasts and data scientists use scikit-learn in their AI journey. Essentially, it is an
all-inclusive machine learning framework. Occasionally, many people overlook it because of the
prevalence of more cutting-edge Python libraries and frameworks. However, it is still a powerful
library and efficiently solves complex Machine Learning tasks.
The following features of scikit-learn make it one of the best machine learning libraries in
Python:
4) TensorFlow
TensorFlow was developed for Google’s internal use by the Google Brain team. Its first release
came in November 2015 under Apache License 2.0. TensorFlow is a popular computational
framework for creating machine learning models. TensorFlow supports a variety of different
toolkits for constructing models at varying levels of abstraction.
TensorFlow exposes very stable Python and C++ APIs. It can expose backward-compatible APIs
for other languages too, but they might be unstable. TensorFlow has a flexible architecture with
which it can run on a variety of computational platforms: CPUs, GPUs, and TPUs. TPU stands for
Tensor Processing Unit, a hardware chip built around TensorFlow for machine learning and
artificial intelligence.
The following key features of TensorFlow make it one of the best machine learning libraries
Python:
Comprehensive control on developing a machine learning model and robust neural network
Deploy models on cloud, web, mobile, or edge devices through TFX, TensorFlow.js, and
TensorFlow Lite
Supports abundant extensions and libraries for solving complex problems
Supports different tools for integration of Responsible AI and ML solutions
5) Keras
Keras has over 200,000 users as of November 2017. Keras is an open-source library used for neural
networks and machine learning. Keras can run on top of TensorFlow, Theano, Microsoft Cognitive
Toolkit, R, or PlaidML. Keras also can run efficiently on CPU and GPU.
Keras works with neural-network building blocks like layers, objectives, activation functions, and
optimizers. Keras also has a bunch of features for working with images and text, which come in
handy when writing Deep Neural Network code.
Apart from the standard neural network, Keras supports convolutional and recurrent neural
networks.
It was released in 2015, and by now it is a cutting-edge open-source Python deep learning
framework and API. It is similar to TensorFlow in several aspects, but it is designed with a
human-centred approach to make DL and ML accessible and easy for everybody.
You can conclude that Keras is one of the most versatile Python machine learning libraries because
it includes:
6) PyTorch
PyTorch has a range of tools and libraries that support computer vision, machine learning, and
natural language processing. The PyTorch library is open-source and is based on the Torch library.
The most significant advantage of the PyTorch library is its ease of learning and use.
PyTorch can smoothly integrate with the Python data science stack, including NumPy. You will
hardly notice a difference between NumPy and PyTorch. PyTorch also allows developers to
perform computations on tensors. PyTorch has a robust framework to build computational graphs
on the go and even change them at runtime. Other advantages of PyTorch include multi-GPU
support, simplified preprocessors, and custom data loaders.
Facebook released PyTorch as a powerful competitor of TensorFlow in 2016. It has now attained
huge popularity among deep learning and machine learning researchers. Various aspects of
PyTorch suggest that it is one of the outstanding Python libraries for machine learning. Here
are some of its key capabilities.
7) Pandas
In simple terms, Pandas is the Python equivalent of Microsoft Excel. Whenever you have tabular
data, you should consider using Pandas to handle it.
Pandas has turned out to be the most popular Python library for data analysis, with
support for fast, flexible, and expressive data structures designed to work with both “relational” and
“labeled” data. Pandas today is an indispensable library for practical, real-world data analysis
in Python. Pandas is highly stable and provides highly optimized performance. The backend code is
written purely in C or Python.
Pandas provides two main data structures:
Series (1-dimensional)
DataFrame (2-dimensional)
These two put together can handle a vast majority of data requirements and use cases from most
sectors like science, statistics, social, finance, and of course, analytics and other areas of
engineering.
Pandas supports and performs well with different kinds of data, including the below:
Tabular data with columns of heterogeneous data. For instance, consider data coming
from a SQL table or an Excel spreadsheet.
Ordered and unordered time series data. The frequency of the time series need not be fixed,
unlike with other libraries and tools; Pandas is exceptionally robust in handling uneven
time-series data.
Arbitrary matrix data with a homogeneous or heterogeneous type of data in the rows and
columns.
Any other form of statistical or observational data sets. The data need not be labeled at all;
the Pandas data structure can process it even without labeling.
It was launched as an open-source Python library in 2009. Currently, it has become one of the
favourite Python libraries for machine learning among many ML enthusiasts. The reason is it
offers some robust techniques for data analysis and data manipulation. This library is extensively
used in academia. Moreover, it supports different commercial domains like business and web
analytics, economics, statistics, neuroscience, finance, advertising, etc. It also works as a
foundational library for many advanced Python libraries.
8) Matplotlib
Matplotlib is a data visualization library that is used for 2D plotting to produce publication-quality
image plots and figures in a variety of formats. The library helps to generate histograms, plots,
error charts, scatter plots, and bar charts with just a few lines of code.
It is the oldest Python machine learning library. However, it is still not obsolete. It is one of the
most innovative data visualization libraries for Python. So, the ML community admires it.
The following features of the Matplotlib library make it a famous Python machine learning library
among the ML community:
Offers embeddable visualizations with different GUI applications
Various Python frameworks and libraries extend Matplotlib
Summary
Purpose                          Libraries
Scientific Computation           NumPy, SciPy
Tabular Data                     Pandas
Data Modelling & Preprocessing   Scikit-learn
Deep Learning                    Keras, TensorFlow, PyTorch
Data Visualization               Matplotlib