Machine Learning - Python Libraries
NumPy
Pandas
SciPy
Scikit-learn
PyTorch
TensorFlow
Keras
Matplotlib
Seaborn
OpenCV
NLTK
spaCy
NumPy
NumPy is a general-purpose array and matrix processing package used for scientific
computing and for mathematical operations such as linear algebra and Fourier
transforms. It provides a high-performance multi-dimensional array object and tools for
manipulating it. It is a critical component of the Python machine learning ecosystem, as
it provides the underlying data structure and numerical operations required by many
machine learning algorithms.
https://www.tutorialspoint.com/machine_learning/machine_learning_python_libraries.htm 1/12
NumPy is often seen as an alternative to MATLAB, because it is mostly used along with
SciPy (Scientific Python) and Matplotlib (a plotting library).
If you are using the Anaconda distribution, there is no need to install NumPy separately,
as it is already included. You just need to import the package into your Python script as
follows −
import numpy as np
On the other hand, if you are using the standard Python distribution, NumPy can be
installed using the popular Python package installer, pip −
pip install numpy
Example
import numpy as np
data = np.array([1,2,3,4,5])
print(data)
print(len(data))
print(type(data))
print(data.shape)
Output
The above Python example code will produce the following result −
[1 2 3 4 5]
5
<class 'numpy.ndarray'>
(5,)
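The linear algebra and Fourier transform operations mentioned earlier can be sketched in a few lines. This is a minimal illustration, not part of the original example: the matrix, the right-hand side, and the test signal are made-up values, and the only library calls used are np.linalg.solve and np.fft.fft.

```python
import numpy as np

# Linear algebra: solve the system A x = b for x.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform: a cosine that completes 2 cycles over 8 samples
# should put its energy into frequency bin 2.
n = 8
t = np.arange(n)
signal = np.cos(2 * np.pi * 2 * t / n)
spectrum = np.fft.fft(signal)
peak_bin = int(np.argmax(np.abs(spectrum[: n // 2])))
print(peak_bin)
```

Only the first half of the spectrum is searched because the transform of a real signal is symmetric.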
Pandas
Pandas is a powerful library for data manipulation and analysis. It is not used directly
in machine learning algorithms but in the preceding step, data preparation. It is built
around two data structures: Series (one-dimensional) and DataFrame (two-dimensional).
This allows it to handle a wide range of typical use cases in sectors such as finance,
business, and health.
With the help of Pandas, in data processing, we can accomplish the following five steps −
Load
Prepare
Manipulate
Model
Analyze
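A few of these steps can be sketched on a tiny in-memory dataset. This is only an illustration under stated assumptions: the CSV text, the column names (city, units, price) and the derived revenue column are invented for the example, and the Model step is left out because it normally belongs to a modeling library rather than to Pandas.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file on disk.
csv_text = """city,units,price
Pune,10,2.5
Delhi,,3.0
Pune,4,2.5
Delhi,6,3.0
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: treat the missing unit count as zero and store it as an integer.
df["units"] = df["units"].fillna(0).astype(int)

# Manipulate: derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Analyze: total revenue per city.
summary = df.groupby("city")["revenue"].sum()
print(summary)
```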
Data in Pandas is represented with the help of the following data structures −
Series − It is a one-dimensional labeled array. For example, the following series is a
collection of integers −
1 5 10 15 24 25 28 36 40 89
DataFrame − It is the most useful data structure and is used for almost all kinds of
data representation and manipulation in Pandas. It is a two-dimensional data structure
that can contain heterogeneous data. Tabular data is generally represented using
DataFrames. For example, the following table shows the data of students, with their
names, roll numbers, ages and genders −
Name      Roll number   Age   Gender
Aarav     1             15    Male
Harshit   2             14    Male
Kanika    3             16    Female
Mayank    4             15    Male
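The student table above could be built as a DataFrame roughly as follows. The column names are chosen for illustration from the description of the table; the values are the ones shown in it.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
students = pd.DataFrame({
    "Name": ["Aarav", "Harshit", "Kanika", "Mayank"],
    "Roll number": [1, 2, 3, 4],
    "Age": [15, 14, 16, 15],
    "Gender": ["Male", "Male", "Female", "Male"],
})
print(students.shape)

# Selecting a single column returns a Series.
ages = students["Age"]
print(type(ages))

# Boolean indexing keeps only the rows that satisfy a condition.
older = students[students["Age"] >= 15]
print(older["Name"].tolist())
```

Selecting one column shows how the two structures relate: every column of a DataFrame is itself a Series.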
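The following table gives the dimension and description of the above-mentioned data
structures used in Pandas −
Data structure   Dimension   Description
Series           1-D         One-dimensional labeled array
DataFrame        2-D         Two-dimensional table that can contain heterogeneous data
These data structures can be understood as containers: a higher-dimensional data
structure is a container of lower-dimensional ones, so a DataFrame is a container of Series.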
If you are using the Anaconda distribution, there is no need to install Pandas separately,
as it is already included. You just need to import the package into your Python script as
follows −
import pandas as pd
On the other hand, if you are using the standard Python distribution, Pandas can be
installed using the popular Python package installer, pip −
pip install pandas
After installing Pandas, you can import it into your Python script as shown above.
Example
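import pandas as pd
import numpy as np
# Build a NumPy array of single-character strings
data = np.array(['g','a','u','r','a','v'])
# Wrap it in a Series; a default integer index (0..5) is assigned
s = pd.Series(data)
print(s)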
Output
0 g
1 a
2 u
3 r
4 a
5 v
dtype: object
SciPy
SciPy is an open-source library for scientific computing, including work on large
datasets. It is easy to use and executes numerical and data manipulation tasks quickly.
It consists of modules for optimization and for operations such as integration, linear
algebra, and signal processing. SciPy is built on NumPy and extends its functionality with
more advanced numerical algorithms and algebraic functions.
If you are using the Anaconda distribution, there is no need to install SciPy separately,
as it is already included. You just need to import the package into your Python script. For
example, the following line imports the linalg submodule from scipy −
from scipy import linalg
On the other hand, if you are using the standard Python distribution and already have
NumPy, SciPy can be installed using the popular Python package installer, pip −
pip install scipy
Example
import numpy as np
from scipy import linalg
# Define a 2x2 matrix and print its inverse
A = np.array([[1,2],[3,4]])
print(linalg.inv(A))
Output
The above Python example code will produce the following result −
[[-2.   1. ]
 [ 1.5 -0.5]]
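Integration and optimization, two of the operations listed earlier, can be sketched in the same way. The functions being integrated and minimized are arbitrary examples picked for illustration; integrate.quad and optimize.minimize_scalar are the SciPy routines used.

```python
from scipy import integrate, optimize

# Integration: area under x^2 between 0 and 1 (the exact value is 1/3).
# quad returns the estimate together with an error bound.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
print(area)

# Optimization: find the x that minimizes (x - 2)^2 (the exact minimum is at x = 2).
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(result.x)
```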
Scikit-learn
Scikit-learn, a popular open-source library built on NumPy and SciPy, is used to
implement machine learning models and statistical modeling. It supports supervised and
unsupervised learning. It provides various tools for implementing data pre-processing,
feature selection, model selection, model evaluation, and many other tasks.
If you are using the standard Python distribution and already have NumPy and SciPy,
Scikit-learn can be installed using the popular Python package installer, pip −
pip install scikit-learn
After installing Scikit-learn, you can import it into your Python script.
Example
The following example loads the breast cancer dataset −
Output
The above Python example code will produce the following result −
[0 1 0]
['malignant', 'benign']
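A minimal sketch of loading the breast cancer dataset and running it through pre-processing, model fitting and evaluation might look like the following. It is an illustration under stated assumptions rather than the listing that produced the output above: the choice of logistic regression, the 75/25 split and the variable names are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the dataset bundled with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target
print(data.target_names)
print(X.shape)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pre-processing (feature scaling) followed by a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation: mean accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Putting the scaler and the classifier in one pipeline means the scaling parameters are learned from the training split only.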
For a more detailed study of Scikit-learn, see
www.tutorialspoint.com/scikit_learn/index.htm.
PyTorch
PyTorch is an open-source Python library based on the Torch library, generally used for
developing deep neural networks. It has an intuitive, Pythonic interface and can
dynamically build its computation graph at run time.
For Python 3.8 or later on the CPU platform of the Windows operating system, you can use
the following command to install PyTorch (torch, torchvision and torchaudio) −
pip3 install torch torchvision torchaudio
Refer to the following link for more PyTorch installation options −
https://pytorch.org/get-started/locally/
After installing PyTorch, you can import it into your Python script as follows −
import torch
Example
import numpy as np
import torch
x = np.ones([3,4])
y = torch.from_numpy(x)
print(y)
Output
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=torch.float64)
TensorFlow
For the CPU platform on the Windows operating system, you can use the following command
to install TensorFlow using pip −
pip install tensorflow
Refer to the following link for more TensorFlow installation options −
https://www.tensorflow.org/install/pip
After installing TensorFlow, you can import it into your Python script as follows −
import tensorflow as tf
Example
import tensorflow as tf
data = tf.constant([[2,1],[4,6]])
print(data)
Output
tf.Tensor(
[[2 1]
[4 6]], shape=(2, 2), dtype=int32)
Keras
Keras is a high-level neural network library for creating deep learning models.
Historically it ran on top of TensorFlow, CNTK, or Theano; Keras 3 runs on TensorFlow,
JAX, or PyTorch. It provides a simple and intuitive API for building and training deep
learning models, making it an excellent choice for beginners and researchers. Keras is
one of the most popular libraries because it allows easy and fast prototyping.
After installing Keras, you can import it into your Python script as follows −
import keras
Example
In the example below, we are importing CIFAR-10 dataset from Keras and printing the
shape of training data and test data −
import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
Output
(50000, 32, 32, 3)
(10000, 32, 32, 3)
(50000, 1)
(10000, 1)
Matplotlib
Matplotlib is a popular plotting library used for data visualization, for example to
create graphs, plots, histograms and bar charts. It provides tools and functions for data
analysis, exploration and presentation tasks.
We can use the following line of script to install Matplotlib using pip −
pip install matplotlib
Most of the Matplotlib utilities lie under the pyplot submodule. After installing
Matplotlib, we can import pyplot using the following line of script −
import matplotlib.pyplot as plt
Example
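A minimal sketch of a line plot with pyplot is given here. The data, the labels and the output file name squares.png are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

# Sample data: the first five integers and their squares.
x = [1, 2, 3, 4, 5]
y = [value ** 2 for value in x]

# Create a figure with a single set of axes and draw a line with point markers.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="squares")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Squares of the first five integers")
ax.legend()

# Write the figure to an image file; plt.show() would open a window instead.
fig.savefig("squares.png")
```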
Seaborn
Seaborn is an open-source Python library built on Matplotlib that integrates with
Pandas. It is used for making presentable and informative statistical graphics, which
makes it well suited to business and marketing analysis. This library helps you explore
and understand your data.
We can use the following line of script to install Seaborn using pip −
pip install seaborn
After installing Seaborn, we can import it into our Python script using the following
line of script −
import seaborn as sns
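Because Seaborn works directly on Pandas DataFrames, a plot can be produced from column names alone. The sketch below reuses the student data from the Pandas section; the column names and the output file name are assumptions made for illustration.

```python
import pandas as pd
import seaborn as sns

# A small DataFrame; Seaborn reads the columns by name.
students = pd.DataFrame({
    "Name": ["Aarav", "Harshit", "Kanika", "Mayank"],
    "Age": [15, 14, 16, 15],
    "Gender": ["Male", "Male", "Female", "Male"],
})

# countplot draws one bar per category, with height equal to the number of rows.
ax = sns.countplot(data=students, x="Gender")
ax.set_title("Students per gender")

# The returned object is a Matplotlib Axes, so the figure is saved the usual way.
ax.figure.savefig("students_per_gender.png")
```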
OpenCV
Open Source Computer Vision Library, in short OpenCV, is a library with Python bindings
for computer vision and image processing tasks. It is used to identify patterns and
various features in images, and it integrates with NumPy: in Python, OpenCV images are
represented as NumPy arrays.
NLTK
Natural Language ToolKit, in short NLTK, is a Python library usually used for developing
natural language processing applications. It comprises easy-to-use interfaces to
resources such as WordNet, along with text processing libraries for classification,
tokenization, parsing and semantic reasoning.
spaCy
spaCy is a free, open-source Python library. It provides fast and efficient features for
advanced Natural Language Processing tasks. Word tokenization and POS (part-of-speech)
tagging are two tasks that the library performs effectively.
XGBoost, LightGBM, and Gensim are among the many other Python tools and frameworks
used for machine learning. Studying these libraries helps you understand the machine
learning ecosystem and build, train and deploy models.