MBA (DS) - Introduction To Python Programming
Table of Contents: Each unit has a well-defined table of contents. For example, “1.1.1.
(a)” should be read as “Module 1. Unit 1. Topic 1. (Sub-topic a)” and “1.2.3. (iii)” should
be read as “Module 1. Unit 2. Topic 3. (Sub-topic iii)”.
Aim: It refers to the overall goal that can be achieved by going through the unit.
Learning Outcomes: These describe the skills and experience you build up in sequence
while learning, and refer to what you will be able to accomplish after going
through the unit.
Did You Know?: You will learn some interesting facts about a topic that will help you
improve your knowledge. A unit can also contain Quiz, Case Study, Critical Learning
Exercises, etc., as metacognitive scaffolds for learning.
Summary: This includes brief statements or restatements of the main points of the unit
and a summing up of the knowledge chunks in the unit.
Video Links: It has links to online videos that help you understand concepts from a
variety of online resources.
Author
Dr. A. Sivaramakrishnan
Director CDOE
C. Shanath Kumar
Instructional Designer
Nabina Das
Project Manager
K. D. N. Lakshmi
Graphic Designer
J. Srinivasa Reddy
Dr. A. Sivaramakrishnan
Associate Professor
Dr. A. Sivaramakrishnan completed his Ph.D. in Computer Science at VELS University, Chennai,
Tamil Nadu, India, in April 2015. Currently, he is an Associate Professor of Computer Science
and Applications at KL (Deemed to be) University, Vijayawada, India. He graduated in Computer
Science from Bharathiar University, Coimbatore. He pursued a Master of Computer Applications
from the M.I.E.T, Bharathidasan University, Trichy and did a Master of Philosophy in Computer
Science from Madurai Kamaraj University, Madurai. With exposure to Digital Image Processing,
he has 15+ years of teaching and industry experience. Dr. Sivaramakrishnan has published in
many international research journals and he has worked abroad for more than three years.
Introduction to programming basics (what it is and how it works), binary computation,
problem-solving methods and algorithm development. Includes procedural and data abstractions,
program design, debugging, testing, and documentation. Covers data types, control structures,
functions, parameter passing, library functions, arrays, inheritance and object-oriented design.
Laboratory exercises in Python.
Need for programming, Programming languages, History of python, Python Installation, Interactive
modes, keywords, variables, Identifiers, data types –Numbers, sequences, Sets, Mappings and
None, mutable vs Immutable data types
Numpy Array, Operations on Arrays, Indexing and Slicing; Introduction to Pandas: Series and
Data frames – simple examples.
Matplotlib – Usage of pyplot, pyplot functions with examples and Seaborn with simple examples.
MODULE 1
Introduction to Python Programming
MODULE 2
Operators, Conditional and Looping in Python
Unit 1 Operators in Python
Unit 2 Functions in Python
MODULE 3
Introduction to Numpy
MODULE 4
Introduction to Data visualization
Introduction to Python
Module Description
Programming helps in speeding up the input and output processes in a machine. It is important
to automate, collect, manage, calculate, and analyze the processing of data and information
accurately. Programming helps create software and applications that help computer and mobile
users in daily life.
In this module, we look at the importance of programming, introduce the Python language,
walk through the installation of Python, and cover basic Python commands and how
to build Python programs.
Introduction to Python
Aim
Instructional Objectives
Learning Outcomes
Summary
Glossary
Bibliography
e-References
Instructional Objectives
Learning Outcomes
1. Python is one of the most widely used multi-purpose, high-level programming languages.
2. Python allows programming in Object-Oriented and Procedural paradigms.
3. Python programs are generally shorter than equivalent programs in languages like Java.
Programmers type relatively less, and the indentation requirement of the language keeps
the code readable.
4. Python is used by many large technology companies, such as Google, Amazon,
Facebook, Instagram, Dropbox, and Uber.
5. A major strength of Python is its large standard library, together with a wide range of
third-party packages, which can be used for the following:
● Machine Learning
● GUI Applications (like Kivy, Tkinter, PyQt etc. )
● Web frameworks like Django (used by YouTube, Instagram, Dropbox)
● Image processing (like OpenCV, Pillow)
● Web scraping (like Scrapy, BeautifulSoup, Selenium)
● Test frameworks
● Multimedia
● Scientific computing
Programming is using a language that a machine can understand in order to get it to perform
various tasks. Computer programming is how we communicate with machines in a way that makes
them function how we need.
What is a Program?
A program is a set of logical, mathematical, and sequential instructions grouped together to
perform a task. Each programming language focuses on different types of tasks and gives
commands to the machine in different ways.
Most programming languages in everyday use are high-level languages. Some commonly used
languages are C, C++, Java, JavaScript, PHP, and C#; libraries and frameworks such as React JS
and .NET are built on top of such languages. Mobile applications are coded in different
languages with distinct features. However, programming languages share a lot of similarities
with each other.
To advance your ability to develop real algorithms: Most languages come with many features
for programmers. Used properly, these features give the best results.
To improve customization of your current coding: By using the basic features of an existing
programming language, you can simplify your programs and write more resourceful code.
There is no compulsion to write code in one specific way. What matters is how the features
are used and how clearly the concept is understood.
A low-level language is a machine-dependent programming language. Machine code (0s and 1s)
runs directly on the processor without the need of a compiler or interpreter, so programs
written in a low-level language can run very fast.
i. Machine Language
The advantage of machine language is that programs written in it execute faster than
programs written in a high-level programming language.
Assembly language (ASM) is also a type of low-level programming language that is de-
signed for specific processors. It represents the set of instructions in a symbolic and
human-understandable form. It uses an assembler to convert the assembly language to
machine language.
The advantage of assembly language is that it requires less memory and less execution
time to execute a program.
High-level programming language (HLL) is designed for developing user-friendly software pro-
grams and websites. This programming language requires a compiler or interpreter to translate
the program into machine language (execute the program).
The main advantage of a high-level language is that it is easy to read, write, and maintain.
High-level programming language includes Python, Java, JavaScript, PHP, C#, C++, Objective
C, Cobol, Perl, Pascal, LISP, FORTRAN, and Swift programming language.
Object-Oriented Programming (OOP) language is based upon objects. In this kind of
programming language, programs are divided into small parts called objects. It is used to
model real-world entities and supports concepts like inheritance, polymorphism, and
abstraction, which make programs reusable, efficient, and easy to use.
The main advantage of object-oriented programming is that programs are easier to
maintain, modify, and debug.
Natural language refers to human languages such as English, Russian, German, and
Japanese. Natural language processing lets machines understand, manipulate, and interpret
human language. Developers use it to perform tasks such as translation, automatic
summarization, Named Entity Recognition (NER), relationship extraction, and topic segmentation.
The main advantage of natural language is that it lets users ask questions on any
subject and receive a direct response within seconds.
● Its high-level built in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as
for use as a scripting or glue language to connect existing components together.
● Python’s simple, easy to learn syntax emphasizes readability and therefore re-
duces the cost of program maintenance.
Glossary
● constant: Fixed values, either numbers, letters or strings, that do not change.
e-References
● https://www.amazon.in/Basic-Core-Python-Programming-Applications-ebook/dp/B0933F73LK
● https://slideplayer.com/slide/13549208/#.Y6AeQ1THs4I.gmail
● https://docs.python.org/3/whatsnew/3.11.html
Aim
Instructional Objectives
Learning Outcomes
Instructional Objectives
Learning Outcomes
Guido van Rossum was reading the script of a popular BBC comedy series, “Monty Python’s
Flying Circus”, which had been on air in the 1970s.
Van Rossum wanted a name that was unique, short, and a little bit mysterious, so he decided
to name the newly created programming language Python, after “Monty Python’s Flying Circus”.
The comedy series was creative and quite random, touching on every kind of topic. Its
unpredictability made it very interesting.
Python is also versatile and widely used in many technical fields, such as:
● Machine Learning
● Artificial Intelligence
● Web Development, Mobile Application
● Desktop Application, Scientific Calculation, etc.
The first step is to learn how to install or update Python on a local machine or computer. The
procedure is as follows.
Installation on Windows
Double-click the downloaded executable file; the following window will open. Select
Customize installation and proceed. Tick the Add Path check box; it will set the Python path
automatically.
Now, try to run Python from the command prompt. Type the command python --version; for
Python 3.11.1, it reports the installed version number.
Keywords are the reserved words in Python. We cannot use a keyword as a variable name,
function name or any other identifier.
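The reserved words can be listed from Python itself. The following is a minimal sketch using the standard-library keyword module; the candidate names tested are hypothetical examples, and the exact number of keywords depends on the Python version.

```python
import keyword

# keyword.kwlist holds the reserved words of the running Python version.
reserved_words = keyword.kwlist
print(len(reserved_words), "keywords, for example:", reserved_words[:5])

# keyword.iskeyword() reports whether a candidate name is reserved.
for candidate in ["for", "total_volume", "class"]:  # hypothetical names
    print(candidate, "->", keyword.iskeyword(candidate))
```

A name for which iskeyword() returns True cannot be used as an identifier.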
Python Variables
A variable is created the moment you first assign a value to it. Example
x = 56
name = "Raju"
print(x)
print(name)
Variable Names
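A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for Python variables:
● A variable name must start with a letter or the underscore character.
● A variable name cannot start with a number.
● A variable name can only contain alphanumeric characters and underscores.
● Variable names are case-sensitive (age, Age and AGE are three different variables).
Example
name = "James"
my_name = "James"
_my_name = "James"
myName = "James"
MYNAME = "James"
myname2 = "James"
print(name)
print(my_name)
print(_my_name)
print(myName)
print(MYNAME)
print(myname2)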
Output
James
James
James
James
James
James
● int
● float
● complex
Variables of numeric types are created when you assign a value to them:
Example
x=5 # int
y = 3.8 # float
z = 2j # complex
To verify the type of any object in Python, use the type() function:
Example
print(type(x))
print(type(y))
print(type(z))
Python Sets
Set is one of 4 built-in data types in Python used to store collections of data, the other 3 are
List, Tuple, and Dictionary, all with different qualities and usage.
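As a minimal sketch of how a set behaves (the values are hypothetical), duplicates are stored only once, membership can be tested with in, and the items have no fixed order, so they are sorted here for display:

```python
fruits = {"apple", "banana", "cherry", "apple"}  # hypothetical values

print(len(fruits))         # the duplicate "apple" is stored only once
print("banana" in fruits)  # membership test

fruits.add("orange")       # sets are mutable: items can be added
print(sorted(fruits))      # sorted() gives a predictable order for printing
```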
map() function returns a map object(which is an iterator) of the results after applying the given
function to each item of a given iterable (list, tuple etc.)
Syntax :
map(fun, iter)
Parameters :
fun : It is a function to which map passes each element of the given iterable.
iter : It is an iterable which is to be mapped.
Note : You can pass one or more iterables to the map() function.
Returns :
Returns a map object (an iterator) that yields the results after applying the given function to
each item of a given iterable (list, tuple etc.)
Note : The returned value from map() (map object) then can be passed to functions like list() (to
create a list), set() (to create a set) .
CODE 1
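A minimal sketch consistent with the output shown for this example, assuming a hypothetical helper named double and a hypothetical input tuple:

```python
def double(n):
    """Return n added to itself (hypothetical helper for illustration)."""
    return n + n

numbers = (1, 2, 3, 4)         # hypothetical input
result = map(double, numbers)  # a lazy map object; nothing is computed yet
doubled = list(result)         # consuming the iterator builds the list
print(doubled)
```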
Output :
[2, 4, 6, 8]
We can change the contents of a mutable data type in Python by assigning new values or by
simply adding new values. In contrast, we cannot change the contents of an immutable
data type in Python.
Before listing the differences between them, let us first get a short idea about immutable objects
in Python.
In simple words, the value of an immutable object cannot be changed once it is created.
For example, String is an immutable data type in Python. We cannot change its content in place;
attempting to do so raises a TypeError. If we assign new content to a variable that holds an
immutable object, a new object is created (instead of the original being modified).
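The difference can be observed directly. The following is a minimal sketch with hypothetical values: a list is changed in place, while item assignment on a string raises a TypeError.

```python
numbers = [1, 2, 3]        # a list is mutable
numbers_id = id(numbers)
numbers.append(4)          # modified in place
same_list = id(numbers) == numbers_id

text = "python"            # a string is immutable
try:
    text[0] = "P"          # item assignment is not allowed
except TypeError as error:
    message = str(error)

text = "Python"            # rebinding the name points it at a different object
print(numbers, same_list)
print(message)
print(text)
```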
Python handles mutable and immutable objects quite differently. Let us look into the
difference between both of these types of objects:
Python Datatypes
Immutable: Numbers, Strings, Tuples
Mutable: Lists, Dictionary, Sets
2. Which one of the following is the correct extension of the Python file?
(a) .py
(b) .python
(c) .p
(d) None of these
(a) val()
(b) print()
(c) display()
(d) None of these
(a) ^
(b) *
(c) **
(d) None of the above
(a) a+bc
(b) abc
(c) a bc
(d) a
7. What will be the output of the following Python code snippet if x=1?
x<<2
(a) 4
(b) 2
(c) 1
(d) 8
(a) Underscore and ampersand are the only two special characters allowed
(b) Unlimited length
(c) All private members must have leading and trailing underscores
(d) None of these
(a) Tuples
(b) Lists
(c) Class
(d) Dictionary
(a) Error
(b) 6
(c) 4
(d) 3
Self-Assessment Questions
1 b
2 a
3 b
4 c
5 b
6 a
7 a
8 b
9 c
10 c
Module Description
In Python programming, operators are used to perform operations on values and
variables. These are standard symbols used for logical and arithmetic operations.
In this module, we will look into the different types of Python operators, conditional statements in
Python and looping structures in Python programming.
Unit 1
Operators in Python
Aim
Instructional Objectives
Learning Outcomes
Bibliography
e-References
When students complete this unit on operators in Python, they will be able to:
● Build basic programs using operators, conditional logic, looping, and functions.
● Work with user input to create interactive programs.
Instructional Objectives
To learn more about Python programming, students can enrich their knowledge
in the following areas
Learning Outcomes
In the example below, we use the + operator to add together two values:
print(12 + 5)
● Arithmetic operators
● Assignment operators
● Comparison operators
● Logical operators
● Identity operators
● Membership operators
● Bitwise operators
Arithmetic operators are used with numeric values to perform common mathematical opera-
tions:
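As an illustration, the arithmetic operators can be applied to two hypothetical operands as follows (a minimal sketch, not an exhaustive reference):

```python
a, b = 7, 2  # hypothetical operands

results = {
    "a + b": a + b,    # addition
    "a - b": a - b,    # subtraction
    "a * b": a * b,    # multiplication
    "a / b": a / b,    # true division, always returns a float
    "a // b": a // b,  # floor division
    "a % b": a % b,    # modulus (remainder)
    "a ** b": a ** b,  # exponentiation
}
for expression, value in results.items():
    print(expression, "=", value)
```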
Identity operators are used to compare the objects, not if they are equal, but if they are
actually the same object, with the same memory location:
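A minimal sketch of the difference between equality (==) and identity (is), using hypothetical list values:

```python
first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, but a separate object
alias = first       # a second name for the same object

print(first == second)      # True: the values are equal
print(first is second)      # False: two distinct list objects
print(first is alias)       # True: both names refer to one object
print(first is not second)  # True
```

Lists are used here because two separately created lists are always distinct objects, which makes the result of is predictable.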
This section explains how the precedence and associativity of operators affect the order of
operations in Python.
The combination of values, variables, operators, and function calls is termed as an expression.
The Python interpreter can evaluate a valid expression.
For example:
>>> 5 - 7
-2
Suppose we’re constructing an if...else block whose if branch runs when lunch is either fruit or
sandwich, and only if money is more than or equal to 2.
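A minimal sketch of such a block, assuming hypothetical variable names meal and money and hypothetical values. Because and has higher precedence than or, the or test is wrapped in parentheses so that the money check applies to both meals:

```python
meal = "fruit"  # hypothetical values
money = 2

# Without the parentheses, the condition would be read as
# meal == "fruit" or (meal == "sandwich" and money >= 2).
if (meal == "fruit" or meal == "sandwich") and money >= 2:
    status = "Lunch being delivered"
else:
    status = "Can't deliver lunch"
print(status)

# The unparenthesized form ignores the money check when meal is "fruit".
unparenthesized = "fruit" == "fruit" or "fruit" == "sandwich" and 0 >= 2
print(unparenthesized)
```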
Output
Lunch being delivered
Python defines type conversion functions to directly convert one data type to another, which is
useful in day-to-day and competitive programming. This section provides information
about certain conversion functions.
Example
x = 20
print("x is of type:", type(x))
y = 10.6
print("y is of type:", type(y))
z = x + y
print(z)
print("z is of type:", type(z))
Output:
x is of type: <class 'int'>
y is of type: <class 'float'>
30.6
z is of type: <class 'float'>
As we can see, the data type of ‘z’ was automatically changed to the “float” type, because the
variable x is of integer type while the variable y is of float type. The float value is not
converted into an integer; instead, type promotion converts the operands to the wider data type,
so that the operation can be performed without losing the fractional part. This is a simple
case of implicit type conversion in Python.
In Explicit Type Conversion in Python, the data type is manually changed by the user as per their
requirement. With explicit type conversion, there is a risk of data loss since we are forcing an
expression to be changed in some specific data type. Various forms of explicit type conversion
are explained below
1. int(a, base): This function converts a value to an integer. ‘base’ specifies the base in which
the number is written when a is a string.
2. float(): This function is used to convert a value to a floating-point number.
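A minimal sketch of these two functions, using hypothetical input values. It also shows the data loss mentioned above: int() discards the fractional part of a float.

```python
binary_text = "10010"         # hypothetical string holding a base-2 number
as_int = int(binary_text, 2)  # interpret the string in base 2
as_float = float("3.5")       # string to floating-point number
truncated = int(9.99)         # the fractional part is discarded, not rounded

print(as_int, as_float, truncated)
```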
Python Functions
Creating a Function
Calling a Function
def my_function():
    print("Hello from a function")

my_function()
Unit 2
Functions in Python
Aim
Instructional Objectives
Learning Outcomes
When students complete this unit, they will be able to:
● Build basic programs using operators, conditional logic, looping, and functions.
● Work with user input to create interactive programs.
Instructional Objectives
To learn more about Python programming, students can enrich their knowledge
in the following areas
Learning Outcomes
As mentioned earlier, a function can also have arguments. An argument is a value that is
accepted by a function. For example,
If we create a function with arguments, we need to pass the corresponding values while calling
them.
For example,
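A minimal sketch of a function with two arguments, assuming the name add_numbers and the parameter names num1 and num2 that the text refers to; the function body is an assumption chosen to match the output shown:

```python
def add_numbers(num1, num2):
    """Print and return the sum of the two arguments."""
    total = num1 + num2
    print("Sum:", total)
    return total

positional_result = add_numbers(5, 4)         # values matched by position
keyword_result = add_numbers(num1=5, num2=4)  # values matched by name
```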
Here, add_numbers(5, 4) specifies that arguments num1 and num2 will get values 5 and 4
respectively.
We can also call the function by mentioning the argument name as:
add_numbers(num1 = 5, num2 = 4)
In Python, we call it Keyword Argument (or named argument). The code above is equivalent to
add_numbers(5, 4)
Output
Sum: 9
In Python, standard library functions are the built-in functions that can be used directly in our
program.
For example,
These library functions are defined inside modules, and to use them we must import the
module into our program. For example, sqrt() is defined inside the math module.
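A minimal sketch consistent with the output shown, using math.sqrt() from the math module and the built-in pow(), which needs no import:

```python
import math

square_root = math.sqrt(4)  # sqrt() is defined inside the math module
power = pow(2, 3)           # pow() is a built-in function

print("Square Root of 4 is", square_root)
print("2 to the power 3 is", power)
```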
Output
Square Root of 4 is 2.0
2 to the power 3 is 8
import math
Since sqrt() is defined inside the math module, we need to include it in our program.
1. Code Reusable - We can use the same function multiple times in our program, which makes
our code reusable. For example,
# function definition
def get_square(num):
    return num * num

for i in [1, 2, 3]:
    # function call
    result = get_square(i)
    print('Square of', i, '=', result)
In the above example, we have created the function named get_square() to calculate the
square of a number. Here, the function is used to calculate the square of numbers from 1 to 3.
Hence, the same function is used again and again.
2. Code Readability : Functions help us break our code into chunks to make our program
readable and easy to understand.
● Equals: a == b
● Not Equals: a != b
● Less than: a < b
● Less than or equal to: a <= b
● Greater than: a > b
● Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly in “if statements” and loops.
An “if statement” is written by using the if keyword.
Example
If statement:
a = 22
b = 300
if b > a:
    print("b is greater than a")
The else keyword catches anything which isn’t caught by the preceding conditions.
x = 100
y = 23
if x > y:
    print("x is greater than y")
else:
    print("y is greater than x")
Elif
The elif keyword is Python’s way of saying “if the previous conditions were not true, then try this
condition”.
Example
a = 44
b = 44
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
And
The and keyword is a logical operator, and is used to combine conditional statements:
Example
a = 100
b = 23
c = 500
if a > b and c > a:
    print("Both conditions are True")
Or
The or keyword is a logical operator, and is used to combine conditional statements:
Example
a = 100
b = 23
c = 400
if a > b or a > c:
    print("At least one of the conditions is True")
Nested If
You can have if statements inside if statements; these are called nested if statements.
Example
x = 51
if x > 10:
    print("Above ten,")
    if x > 20:
        print("and also above 20!")
    else:
        print("but not above 20.")
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or
a string).
This is less like the for keyword in other programming languages, and works more like an
iterator method as found in other object-oriented programming languages.
With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.
Example
for x in "banana":
    print(x)
Output
b
a
n
a
n
a
1. What will be the datatype of the sample in the below code snippet?
sample = 25
print(type(sample))
sample = "Welcome"
print(type(sample))
(a) Error
(b) 80
(c) 100
(d) 117
(a) if a>=2 :
(b) if (a >= 2)
(c) if (a => 22)
(d) if a >= 22
(a) else if
(b) elseif
(c) elif
(d) None of the above
(a) aBCD
(b) abcd
(c) ABCD
(d) error
(a) 7
(b) 1
(c) 0
(d) 5
(a) 27
(b) 9
(c) 3
(d) 1
Self-Assessment Questions
1 d
2 b
3 a
4 b
5 a
6 c
7 b
8 b
9 b
10 d
Introduction to Numpy
This module will help you get acquainted with NumPy, the widely used array-processing library in
Python. What is NumPy? NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object, and tools for working with these arrays. It is the
fundamental package for scientific computing with Python. It is open-source software. It contains
various features including these important ones:
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined using NumPy which allows NumPy
to seamlessly and speedily integrate with a wide variety of databases.
Installation:
● Mac and Linux users can install NumPy via the pip command: pip install numpy
● Windows users can install NumPy with the same pip command from the command prompt, or
download a pre-built distribution that matches the system configuration and Python
version and install the packages manually.
Unit 1
Numpy Array
Aim
Instructional Objectives
Learning Outcomes
Bibliography
e-References
Instructional Objectives
Learning Outcomes
NumPy is a Python package. It stands for 'Numerical Python'. It is a library consisting of
multidimensional array objects and a collection of routines for processing arrays.
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package Numarray
was also developed, having some additional functionalities. In 2005, Travis Oliphant created
NumPy package by incorporating the features of Numarray into Numeric package. There are
many contributors to this open source project.
NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (plotting
library). This combination is widely used as a replacement for MATLAB, a popular platform for
technical computing. However, the Python alternative to MATLAB is now seen as a more modern
and complete programming language.
The best way to enable NumPy is to use an installable binary package specific to your
operating system. These binaries contain the full SciPy stack (inclusive of NumPy, SciPy,
matplotlib, IPython, SymPy and nose packages along with core Python).
Windows
Anaconda (from https://www.continuum.io) is a free Python distribution for SciPy stack. It is also
available for Linux and Mac.
Linux
Package managers of respective Linux distributions are used to install one or more packages in
SciPy stack.
Ubuntu
Fedora
Core Python (2.6.x, 2.7.x and 3.2.x onwards) must be installed with distutils and zlib module
should be enabled.
GNU gcc (4.2 and above) C compiler must be available.
To install NumPy, run the following command.
To test whether NumPy module is properly installed, try to import it from Python prompt.
import numpy
The most important object defined in NumPy is an N-dimensional array type called ndarray. It
describes the collection of items of the same type. Items in the collection can be accessed using
a zero-based index.
Every item in an ndarray takes the same size of block in memory. Each element in an ndarray is
described by a data-type object (called dtype).
Any item extracted from ndarray object (by slicing) is represented by a Python object of one of
array scalar types. The following diagram shows a relationship between ndarray, data type object
(dtype) and array scalar type −
[Figure: relationship between ndarray, data type object (dtype) and array scalar type]
An instance of ndarray class can be constructed by different array creation routines described
later in the tutorial. The basic ndarray is created using an array function in NumPy as follows −
numpy.array
It creates an ndarray from any object exposing array interface, or from any method that returns
an array.
numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)
The above constructor takes the following parameters −
1. object − Any object exposing the array interface, an object whose __array__ method returns
an array, or any (nested) sequence.
2. dtype − Desired data type of array, optional
3. copy − Optional. By default (true), the object is copied
4. order − C (row major) or F (column major) or A (any) (default)
5. subok − By default, returned array forced to be a base class array. If true, sub-classes
passed through
6. ndmin − Specifies minimum dimensions of resultant array
Example 1
import numpy as np
a = np.array([1, 2, 3])
print(a)
The output is as follows −
[1 2 3]
Example 2
# minimum dimensions
import numpy as np
a = np.array([1, 2, 3, 4, 5], ndmin = 2)
print(a)
Example 4
# dtype parameter
import numpy as np
a = np.array([1, 2, 3], dtype = complex)
print(a)
The ndarray object consists of a contiguous one-dimensional segment of computer memory,
combined with an indexing scheme that maps each item to a location in the memory block.
NumPy supports a much greater variety of numerical types than Python does. The following
table shows different scalar data types defined in NumPy.
18. complex64 − Complex number, represented by two 32-bit floats (real and imaginary components)
19. complex128 − Complex number, represented by two 64-bit floats (real and imaginary components)
NumPy numerical types are instances of dtype (data-type) objects, each having unique
characteristics. The dtypes are available as np.bool_, np.float32, etc.
A data type object describes interpretation of fixed block of memory corresponding to an array,
depending on the following aspects −
The byte order is decided by prefixing '<' or '>' to the data type. '<' means that the encoding is
little-endian (the least significant byte is stored in the smallest address). '>' means that the
encoding is big-endian (the most significant byte is stored in the smallest address).
A dtype object is constructed using the following syntax −
numpy.dtype(object, align, copy)
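A minimal sketch of constructing dtype objects, assuming NumPy is installed. It builds the same 32-bit integer type from a NumPy scalar type and from a string code, and adds an explicit byte-order prefix:

```python
import numpy as np

dt = np.dtype(np.int32)       # built from a NumPy scalar type
dt_short = np.dtype("i4")     # 'i4' means a 4-byte signed integer
big_endian = np.dtype(">i4")  # the same type with explicit big-endian order

print(dt, dt_short, dt == dt_short)
print(big_endian.itemsize)    # size of one element in bytes
```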
Example
output
int32
# int8, int16, int32, int64 can be replaced by the equivalent strings 'i1', 'i2', 'i4', etc.
import numpy as np
dt = np.dtype('i4')
print(dt)
Output
int32
Each built-in data type has a character code that uniquely identifies it.
● 'b' − boolean
● 'i' − (signed) integer
● 'u' − unsigned integer
● 'f' − floating-point
● 'c' − complex-floating point
● 'm' − timedelta
● 'M' − datetime
● 'O' − (Python) objects
● 'S', 'a' − (byte-)string
● 'U' − Unicode
● 'V' − raw data (void)
Ndarray.shape
This array attribute returns a tuple consisting of array dimensions. It can also be used to resize
the array.
Example
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.shape)
Output
(2, 3)
Example
# this resizes the ndarray
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
a.shape = (3,2)
print(a)
Output
[[1 2]
 [3 4]
 [5 6]]
NumPy also provides a reshape() function to resize an array.
Example
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
b = a.reshape(3,2)
print(b)
Output
[[1 2]
 [3 4]
 [5 6]]
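ndarray.ndim
This array attribute returns the number of array dimensions.
Example
import numpy as np
a = np.arange(24)   # a one-dimensional array, so a.ndim is 1
# now reshape it
b = a.reshape(2,4,3)   # b has three dimensions, so b.ndim is 3
print(b)
Output
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]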
numpy.empty
It creates an uninitialized array of specified shape and dtype. It uses the following constructor −
numpy.empty(shape, dtype = float, order = 'C')
Example
import numpy as np
x = np.empty([3,2], dtype = int)
print(x)
Output
[[22649312 1701344351]
 [1818321759 1885959276]
 [16779776 156368896]]
Note − The elements in the array show arbitrary values as they are not initialized, so the
output differs from run to run.
numpy.zeros
2. dtype − Desired output data type. Optional
3. order − 'C' for C-style row-major array, 'F' for FORTRAN style column-major array
Example
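A minimal sketch of the simplest call, assuming NumPy is installed: an array of five zeros with the default float dtype.

```python
import numpy as np

x = np.zeros(5)  # shape 5, default dtype is float
print(x)
```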
Output
[0. 0. 0. 0. 0.]
Example
import numpy as np
x = np.zeros((5,), dtype = int)
print(x)
Output
[0 0 0 0 0]
Example
# custom type
import numpy as np
x = np.zeros((2,2), dtype = [('x', 'i4'), ('y', 'i4')])
print(x)
Output
[[(0, 0) (0, 0)]
 [(0, 0) (0, 0)]]
Contents of ndarray object can be accessed and modified by indexing or slicing, just like Python’s
in-built container objects.
As mentioned earlier, items in an ndarray object follow a zero-based index. Three types of indexing
methods are available − field access, basic slicing and advanced indexing.
Basic slicing is an extension of Python’s basic concept of slicing to n dimensions. A Python slice
object is constructed by giving start, stop, and step parameters to the built-in slice function. This
slice object is passed to the array to extract a part of array.
Example
import numpy as np
a = np.arange(10)
s = slice(2,7,2)
print(a[s])
Output
[2 4 6]
In the above example, an ndarray object is prepared by the arange() function. Then a slice object is
defined with start, stop, and step values 2, 7, and 2 respectively. When this slice object is passed
to the ndarray, the part of it from index 2 up to (but not including) index 7, with a step of 2, is sliced.
The same result can also be obtained by giving the slicing parameters separated by a colon :
(start:stop:step) directly to the ndarray object.
Example
import numpy as np
a = np.arange(10)
b = a[2:7:2]
print(b)
Output
[2 4 6]
Unit 2
Pandas
Aim
Instructional Objectives
Learning Outcomes
Bibliography
e-References
Instructional Objectives
Learning Outcomes
Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets and make them readable and relevant.
Relevant data is very important in data science.
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty
or NULL values. This is called cleaning the data.
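As a minimal sketch of such cleaning, the example below removes rows with empty values using dropna(). The small workout table and its column names are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical workout log with one missing value (None is stored as NaN)
df = pd.DataFrame({
    "duration": [60, 45, 30],
    "calories": [409.1, None, 195.1],
})

# dropna() returns a new DataFrame without the rows that contain empty values
clean = df.dropna()
print(clean)
```

The original DataFrame is left unchanged; only the returned copy has the incomplete row removed.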
The source code for Pandas is located in this GitHub repository: https://github.com/pandas-dev/pandas
Installation of Pandas
If you have Python and PIP already installed on a system, then installation of Pandas is very
easy. Install it using this command:
pip install pandas
If this command fails, then use a Python distribution that already has Pandas installed, such as
Anaconda, Spyder etc.
Once Pandas is installed, import it in your applications by adding the import keyword:
import pandas
Now Pandas is imported and ready to use.
Example
import pandas
mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
What is a Series?
Example
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output
0    1
1    7
2    2
dtype: int64
If nothing else is specified, the values are labeled with their index number. First value has index
0, second value has index 1 etc.
This label can be used to access a specified value.
Example
print(myvar[0])
Output
1
Create Labels
With the index argument, you can name your own labels.
Example
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
Output
x    1
y    7
z    2
dtype: int64
When you have created labels, you can access an item by referring to the label.
Example
print(myvar["y"])
Output
7
What is a DataFrame?
Example
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Output
   calories  duration
0       420        50
1       380        40
2       390        45
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified row(s).
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
Output
calories 420
duration 50
Name: 0, dtype: int64
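Returning more than one row can be sketched in the same way. The example reuses the calories/duration data shown above and passes a list of row labels to loc.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# A list of labels inside loc[] returns several rows as a DataFrame
rows = df.loc[[0, 1]]
print(rows)
```

With a single label the result is a Series; with a list of labels the result is a DataFrame.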
Pandas
Python provides a library called pandas that is popular with data scientists and analysts. Pandas
enables users to manipulate and analyze data using sophisticated data analysis tools.
Pandas provides two data structures that shape data into a readable form:
● Series
● DataFrame
3.2.5 Series
A pandas series is a one-dimensional data structure that consists of key-value pairs, where the
keys/labels are the indices and the values are the data stored at those indices. It is similar to a
Python dictionary, except it provides more freedom to manipulate and edit the data.
Initializing a series
A series is initialized by providing a list to the pandas.Series() method. Every element in the
series has a label/index. By default, the indices are similar to array indices: they start at 0 and
end at N − 1, where N is the number of elements in the list.
However, we can provide our own indices by using the index parameter of the pandas.Series()
method.
Moreover, you can name your series by passing a string to the name argument in the
pandas.Series() method.
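The three ways of initializing a series can be sketched as follows. The values, labels and the name "scores" are illustrative assumptions; the data of the original figures is not reproduced here.

```python
import pandas as pd

# Illustrative values only
values = [10, 20, 30]

s1 = pd.Series(values)                                        # default labels 0 .. N-1
s2 = pd.Series(values, index=["a", "b", "c"])                 # custom labels
s3 = pd.Series(values, index=["a", "b", "c"], name="scores")  # named series

print(list(s1.index))
print(s2["b"])
print(s3.name)
```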
Initializing a DataFrame
Two lists with fruits as values are used to make a Python dictionary, which is then
passed to the pandas.DataFrame() method to make a DataFrame.
For the second DataFrame, we pass a list of indexes using the index argument of the
pandas.DataFrame() method to use our custom indices.
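A minimal sketch of both DataFrames is given below. The fruit lists, the column names basket_1 and basket_2, and the row labels are hypothetical, since the original example data is not shown in the text.

```python
import pandas as pd

# Hypothetical fruit lists used only for illustration
fruits_a = ["apple", "banana", "cherry"]
fruits_b = ["mango", "orange", "grape"]

# First DataFrame: default integer index 0 .. N-1
df1 = pd.DataFrame({"basket_1": fruits_a, "basket_2": fruits_b})

# Second DataFrame: custom row labels passed through the index argument
df2 = pd.DataFrame({"basket_1": fruits_a, "basket_2": fruits_b},
                   index=["x", "y", "z"])

print(df1)
print(df2.loc["y"])
```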
(a) zeros()
(b) ones()
(c) arange()
(d) eye()
(a) 0
(b) 1
(c) 2
(d) 3
4. Which of the following is used to find the maximum element in a NumPy array?
(a) max()
(b) maximum()
(c) amax()
(d) All of the above
5. Which of the following is used to find the sum of the elements in a NumPy array?
(a) cumsum()
(b) sum()
(c) All of the above
(d) None of the above
(a) Two-dimensional
(b) One-dimensional
(c) Multi-dimensional
(d) None of the above
(a) True
(b) False
Self-Assessment Questions
1 A
2 D
3 C
4 D
5 B
6 C
7 A
8 A
9 A
10 A
Data visualization is the discipline of trying to understand data by placing it in a visual context
so that patterns, trends, and correlations that might not otherwise be detected can be exposed.
Python offers multiple great graphing libraries packed with lots of different features. Whether you
want to create interactive or highly customized plots, Python has an excellent library for you.
In this module, we will learn how to create basic plots using Matplotlib, Pandas visualization, and
Seaborn as well as how to use some specific features of each library. This module will focus on
the syntax and not on interpreting the graphs.
Aim
Instructional Objectives
Learning Outcomes
Bibliography
e-References
Instructional Objectives
Learning Outcomes
It may sometimes seem easier to go through a set of data points and build insights from it, but
this process usually does not yield good results. A lot of things could be left undiscovered
as a result. Additionally, most of the data sets used in real life are too big to analyze
manually. This is essentially where data visualization steps in.
Data visualization is an easier way of presenting data, however complex it is, so that trends and
relationships amongst variables can be analyzed with the help of pictorial representation.
While building a visualization, it is always good practice to keep the following
points in mind:
● Ensure appropriate usage of shapes, colors, and size while building the visualization
● Plots/graphs using a co-ordinate system are more pronounced
● Knowledge of the suitable plot for each data type brings more clarity to the information
● Usage of labels, titles, legends and pointers passes information seamlessly to the wider audience
There are a lot of Python libraries which could be used to build visualizations, like matplotlib, vispy,
bokeh, seaborn, pygal, folium, plotly, cufflinks, and networkx. Of the many, matplotlib and seaborn
seem to be very widely used for basic to intermediate levels of visualization.
4.1.2 Matplotlib
Instructional Objectives
Learning Outcomes
Syntax :
Output
Parameters: The plot function accepts parameters that enable us to set axes scales and format
the graphs. Commonly used calls are mentioned below:
● plot(x, y): plots x and y using the default line style and color.
● axis([xmin, xmax, ymin, ymax]): scales the x-axis and y-axis from minimum to maximum
values
● plot(x, y, label = 'Sample line'): the plotted line is displayed in the legend as Sample line
For the sake of example we will use Electricity Power Consumption datasets of India and
Bangladesh. Here, we are using Google Public Data as a data source.
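The calls listed above can be combined as in the following sketch. The yearly values are made-up placeholders and not the actual electricity consumption figures, which are not reproduced in the text.

```python
import matplotlib.pyplot as plt

# Made-up yearly values, only for illustration (not real consumption data)
x = [2010, 2011, 2012, 2013, 2014]
y = [640, 700, 720, 760, 800]

plt.plot(x, y, label="Sample line")         # default line style and color
limits = plt.axis([2009, 2015, 600, 850])   # [xmin, xmax, ymin, ymax]
legend = plt.legend()                       # shows the label as a legend entry
plt.show()
```

Without the call to legend(), the label is stored on the line but not drawn.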
Conceptualized and built originally at Stanford University, this library sits on top of matplotlib.
In a sense, it has some flavors of matplotlib, while from the visualization point of view it offers
more polished output and has added features as well. Below are its advantages
Nature of Visualization
Depending on the number of variables used for plotting the visualization and the type of variables,
there could be different types of charts which we could use to understand the relationship.
Based on the count of variables, we could have univariate or bivariate plots.
A univariate plot for a continuous variable shows the spread and distribution of the variable,
while for a discrete variable it shows the counts. Similarly, a bivariate plot for two continuous
variables could display an essential statistic like correlation, while a plot of a continuous versus
a discrete variable could lead us to very important conclusions, like understanding the data
distribution across different levels of a categorical variable. A bivariate plot between two
discrete variables could also be developed.
A boxplot is also known as a box and whisker plot; the box and the whiskers are clearly displayed
in the below image. It is a very good visual representation when it comes to measuring the data
distribution: it clearly plots the median, the quartiles and the outliers. Understanding the data
distribution is another important factor which leads to better model building. If the data has
outliers, a box plot is a recommended way to identify them and take necessary actions.
Returns: It returns the Axes object with the plot drawn onto it.
The box and whiskers chart shows how data is spread out. Five pieces of information are generally
included in the chart:
1. The minimum is shown at the far left of the chart, at the end of the left ‘whisker’
2. The first quartile, Q1, is the far left of the box (where the left whisker starts)
3. The median is shown as a line inside the box
4. The third quartile, Q3, is the far right of the box (where the right whisker starts)
5. The maximum is shown at the far right of the chart, at the end of the right ‘whisker’
As can be seen in the below representations and charts, a box plot can be plotted for one or
more variables, providing very good insights into our data.
Representation of box plot.
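The five values can also be computed directly, as in the sketch below. The sample is hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention (it is also Matplotlib's default for the whisker length), assumed here rather than taken from the text.

```python
import numpy as np

# Hypothetical sample with one unusually large value
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])

# Quartiles and median via percentiles (NumPy's default linear interpolation)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Common convention: points beyond 1.5 * IQR from the box count as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(data.min(), q1, median, q3, data.max())
print(outliers)
```

When outliers are drawn as separate points, the whiskers end at the most extreme values that are not outliers rather than at the overall minimum and maximum.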
Scatter Plot
A scatter plot (or scatter graph) is a bivariate plot that closely resembles a line graph in the
way it is built. A line graph uses a line on an X-Y axis to plot a continuous function, while a
scatter plot relies on dots to represent individual pieces of data. These plots are very useful to see
if two variables are correlated. A scatter plot can be 2-dimensional or 3-dimensional.
Parameters:
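# import modules
import matplotlib.pyplot as plt
import numpy as np
# illustrative random points; the original data set is not shown
x, y, z = np.random.default_rng(0).random((3, 50))
# create a 3D axes and draw the scatter plot
ax = plt.figure().add_subplot(projection='3d')
ax.scatter(x, y, z)
# assign labels
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
# display illustration
plt.show()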
4.2.5 Histogram
Histograms display counts of data and are hence similar to a bar chart. A histogram plot can also
tell us how close a data distribution is to a normal curve. Many statistical methods assume
that the data is normally distributed, or close to it, so this check is important. However,
histograms are univariate in nature, while bar charts are bivariate.
A bar graph charts actual counts against categories, e.g. the height of the bar indicates the number
of items in that category, whereas a histogram groups the values of a continuous variable into
bins and shows the count in each bin.
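The difference between the two kinds of counts can be sketched without drawing anything. Both data sets below are hypothetical: value_counts() produces the numbers a bar chart would show, and np.histogram() produces the numbers a histogram would show.

```python
import numpy as np
import pandas as pd

# Bar chart input: counts per category (hypothetical categorical data)
colours = pd.Series(["red", "blue", "red", "green", "red", "blue"])
category_counts = colours.value_counts()
print(category_counts)

# Histogram input: a continuous variable grouped into equal-width bins
heights = np.array([150, 152, 158, 161, 165, 167, 171, 174, 180, 188])
counts, edges = np.histogram(heights, bins=4, range=(150, 190))
print(counts)  # number of values that fall into each bin
print(edges)   # the five bin boundaries
```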
Importing Datasets
In this module, we will use two freely available datasets: the Iris and Wine Reviews datasets, both
of which we can load into memory using the pandas read_csv method.
import pandas as pd
iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
'class'])
print(iris.head())
Table: First five rows (index 0 to 4) of the Wine Reviews dataset: one wine from Italy, one from
Portugal and three from the US, each with its description, points, price, region, taster and wine name.
To create a scatter plot in Matplotlib, we can use the scatter method. We will also create a figure
and an axis using plt.subplots to give our plot a title and labels.
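A minimal sketch of these steps is shown below. To keep it self-contained, a small stand-in DataFrame with illustrative values replaces the iris.csv file; the title text is also an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the Iris data; values are illustrative only
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
})

# create a figure and an axis
fig, ax = plt.subplots()
# scatter sepal_length against sepal_width
ax.scatter(iris["sepal_length"], iris["sepal_width"])
# set a title and axis labels
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
plt.show()
```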
In Matplotlib, we can create a line chart by calling the plot method. We can also plot multiple
columns in one graph by looping through the columns we want and plotting each column on the
same axis.
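The loop over columns could look like the sketch below. The small DataFrame is a stand-in for the Iris data with illustrative values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the Iris data; values are illustrative only
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.5, 4.9, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.5, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# every column except the text column 'class'
columns = iris.columns.drop("class")
x_data = range(len(iris))

fig, ax = plt.subplots()
# plot each column as its own line on the same axis
for column in columns:
    ax.plot(x_data, iris[column], label=column)
ax.set_title("Iris Dataset")
ax.legend()
plt.show()
```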
4.2.8 Histogram
In Matplotlib, we can create a Histogram using the hist method. If we pass categorical data like
the points column from the wine-review dataset, it will automatically calculate how often each
class occurs.
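A sketch of the hist method follows. The ten scores are hypothetical values standing in for the points column of the wine reviews, and the labels are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical review scores standing in for wine_reviews['points']
points = pd.Series([87, 88, 87, 90, 88, 87, 91, 90, 88, 87])

fig, ax = plt.subplots()
# hist() groups the values into bins and returns the count per bin
counts, bin_edges, patches = ax.hist(points, bins=5)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
plt.show()
```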
A bar chart can be created using the bar method. The bar chart does not automatically calculate the
frequency of a category, so we will use the pandas value_counts method to do this. The bar chart is
useful for categorical data that doesn’t have a lot of different categories (fewer than 30), because
otherwise it can get quite messy.
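The counting step and the bar method can be sketched together as follows, again with hypothetical scores in place of the wine review data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical review scores standing in for wine_reviews['points']
points = pd.Series([87, 88, 87, 90, 88, 87, 91, 90, 88, 87])

# count how often each score occurs, ordered by score
data = points.value_counts().sort_index()

fig, ax = plt.subplots()
# one bar per distinct score, with the frequency as the bar height
ax.bar(data.index, data.values)
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
plt.show()
```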
To create a line chart in Pandas we can call <dataframe>.plot.line(). While in Matplotlib we needed
to loop through each column we wanted to plot, in Pandas we don’t need to do this because it
automatically plots all available numeric columns (unless we specify particular columns).
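As a short sketch, the call below draws one line per numeric column and leaves out the text column. The DataFrame is a stand-in for the Iris data with illustrative values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the Iris data; values are illustrative only
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.5, 4.9, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.5, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# plot.line() returns the Matplotlib axis it drew on
ax = iris.plot.line(title="Iris Dataset")
plt.show()
```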
Histogram
In Pandas, we can create a Histogram with the plot.hist method. There aren’t any required arguments,
but we can optionally pass some, like the number of bins.
wine_reviews['points'].plot.hist()
Figure : Histogram
The subplots argument specifies that we want a separate plot for each feature, and the layout
specifies the number of plots per row and column.
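A sketch of a call with these two arguments is given below. The DataFrame is a stand-in for the numeric Iris columns, and the figsize and bins values are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for the numeric Iris columns; values are illustrative only
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.5, 4.9, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.5, 1.5, 2.5, 1.9],
})

# one histogram per column, arranged in a grid of 2 rows and 2 columns
axes = iris.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
plt.show()
```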
Bar Chart
To plot a bar chart, we can use the plot.bar() method, but before calling this, we need to get our
data. We will first count the occurrences using the value_counts() method and then sort the
occurrences by their index, from the smallest value to the largest, using the sort_index() method.
wine_reviews['points'].value_counts().sort_index().plot.bar()
It’s also really simple to make a horizontal bar chart using the plot.barh() method.
wine_reviews['points'].value_counts().sort_index().plot.barh()
Pandas plotting also combines well with grouping. For instance, we can group the data by country,
take the mean of the wine prices, order it, and plot the five countries with the highest average
wine price.
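The grouping steps can be sketched as follows. The small table of countries and prices is made up for illustration and does not come from the Wine Reviews dataset.

```python
import pandas as pd

# Made-up prices standing in for the Wine Reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US", "France",
                "France", "Spain", "Chile", "Portugal"],
    "price": [20.0, 30.0, 40.0, 60.0, 80.0, 70.0, 15.0, 12.0, 18.0],
})

# group by country, average the price, order it, and keep the five highest
top5 = (wine_reviews.groupby("country")["price"]
        .mean()
        .sort_values(ascending=False)
        .head(5))
print(top5)
```

Calling top5.plot.bar() on the result draws the five averages as a bar chart.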
Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level
interface for creating attractive graphs.
Seaborn has a lot to offer. For example, you can create graphs in one line that would take
tens of lines in Matplotlib. Its default styles look polished, and it also has a convenient
interface for working with Pandas dataframes.
We can also highlight the points by class using the hue argument, which is a lot easier than in
Matplotlib.
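A minimal sketch of a scatter plot with the hue argument is shown below. The DataFrame is a stand-in for the Iris data with illustrative values, and the call assumes a Seaborn version that accepts the x, y and data keyword arguments.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small stand-in for the Iris data; values are illustrative only
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# hue colors the points by the values of the 'class' column
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="class", data=iris)
plt.show()
```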
Line chart
To create a line chart, the sns.lineplot method can be used. The only required argument is the
data, which in our case are the four numeric columns from the Iris dataset. We could also use the
sns.kdeplot method, which smooths the edges of the curves and therefore is cleaner if you have
a lot of outliers in your dataset.
sns.lineplot(data=iris.drop(['class'], axis=1))
Histogram
To create a histogram in Seaborn, we use the sns.histplot method (older Seaborn releases used
sns.distplot, which is now deprecated). We need to pass it the column we want to plot, and it will
calculate the occurrences itself. We can also pass it the number of bins and whether we want to
plot a Gaussian kernel density estimate inside the graph.
sns.histplot(wine_reviews['points'], bins=10, kde=True)
Figure : Histogram
In Seaborn, a bar chart can be created using the sns.countplot method and passing it the data.
sns.countplot(x=wine_reviews['points'])
Figure : Bar-Chart
Other graphs
Now that you have a basic understanding of the Matplotlib, Pandas Visualization, and Seaborn
syntax, we look at a few other graph types that are useful for extracting insights.
For most of them, Seaborn is the go-to library because of its high-level interface that allows for
the creation of attractive graphs in just a few lines of code.
Box plots
A Box Plot is a graphical method of displaying the five-number summary. We can create box plots
using seaborn's sns.boxplot method and passing it the data as well as the x and y column names.
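Such a call could look like the sketch below. The points and price values are made up, standing in for the Wine Reviews data, and the column names are passed as keyword arguments.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores and prices standing in for the Wine Reviews data
wine = pd.DataFrame({
    "points": [87, 87, 88, 88, 90, 90],
    "price": [15, 20, 25, 35, 60, 80],
})

# one box per distinct 'points' value, showing the spread of 'price'
ax = sns.boxplot(x="points", y="price", data=wine)
plt.show()
```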
Figure: Boxplot
4.2.12 Heatmap
A Heatmap is a graphical representation of data where the individual values contained in a matrix
are represented as colors. Heatmaps are well suited to exploring the correlation of features in a
dataset. To get the correlation of the features inside a dataset, we can call <dataset>.corr(), which
is a Pandas dataframe method. This will give us the correlation matrix. In recent pandas versions,
pass numeric_only=True when the dataframe also contains text columns such as class.
We can now use either Matplotlib or Seaborn to create the heatmap.
Matplotlib:
import numpy as np
import matplotlib.pyplot as plt
# correlation matrix of the numeric columns
corr = iris.corr(numeric_only=True)
fig, ax = plt.subplots()
# draw the matrix as a colored image
im = ax.imshow(corr.values)
# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)
Seaborn:
import seaborn as sns
sns.heatmap(iris.corr(numeric_only=True), annot=True)
4.2.13 Faceting
Faceting is the act of breaking data variables up across multiple subplots and combining those
subplots into a single figure.
To use one kind of faceting in Seaborn, we can use the FacetGrid. First of all, we need to define
the FacetGrid and pass it our data as well as a row or column, which will be used to split the data.
Then we need to call the map function on our FacetGrid object and define the plot type we want
to use and the column we want to graph.
g = sns.FacetGrid(iris, col='class')
g = g.map(sns.kdeplot, 'sepal_length')
4.2.14 Pairplot
Lastly, we look at Seaborn’s pairplot and Pandas’ scatter_matrix, which enable you to plot a
grid of pairwise relationships in a dataset.
sns.pairplot(iris)
Figure: Pairplot
from pandas.plotting import scatter_matrix
fig, ax = plt.subplots(figsize=(12,12))
scatter_matrix(iris, alpha=1, ax=ax)
As you can see in the images above, these techniques always plot two features against each
other. The diagonal of the graph is filled with histograms, and the other plots are scatter plots.
(a) Charts
(b) Maps
(c) shapes
(d) Graphs
(a) Histogram
(b) Boxplot
(c) Pie
(d) All the above
(a) Marker
(b) Lineheight
(c) Linestyle
(d) Color
(a) Line
(b) Bar
(c) Pie
(d) Scatter
6. Which of the following chart elements is used to identify data series by their color patterns?
8. Which function can be used to export a generated graph in matplotlib to png?
(a) savefigure
(b) savefig
(c) save
(d) export
9. Which method can be used to get the shortest path in the networkx library?
(a) shortest_path
(b) short_path
(c) shortestPath
(d) sortPath
(a) bar
(b) histogram
(c) scatterplots
(d) basemap
Self-Assessment Questions
1 C
2 D
3 D
4 B
5 B
6 B
7 A
8 B
9 A
10 C