Course Pack - Programming For Data Science
Study Material
Bachelor in
Data Science
Subject
Programming for
Data Science
Faculty
Nitish Patil
Page 1
School of Data Science Programming for Data Science
A. COURSE DESCRIPTION
B. LEARNING OBJECTIVES
● Understand the basic syntax and structure of the Python programming language.
● Learn fundamental programming concepts such as variables, data types, control flow, and
functions.
thinking.
● Acquire a solid foundation in Python programming that can be built upon for more
advanced topics.
C. LEARNING OUTCOMES
● Recall and explain Python syntax, built-in functions, and standard library modules.
● Practice Python programming techniques to solve real-world problems and automate tasks.
● Interpret and debug Python code, identify errors, and propose appropriate solutions.
● Evaluate the efficiency and effectiveness of Python programs, identify areas for
improvement, and suggest optimizations.
● Design and develop Python programs and applications that meet specific requirements,
demonstrating creativity and problem-solving skills.
Projects
● Writing a Python program to calculate and display the Fibonacci sequence.
● Creating a command-line tool that performs file manipulation tasks, such as renaming and
organizing files.
Online References:
● "Real Python" - Online tutorials, articles, and resources for Python programming.
Available at: realpython.com
Suggested Readings:
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.
It is used for:
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional way.
Variables
Creating Variables
Example
x = 5
y = "John"
print(x)
print(y)
Variables do not need to be declared with any particular type, and can even change type after they
have been set.
Example
x = 4 # x is of type int
x = "Sally" # x is now of type str
print(x)
Casting
If you want to specify the data type of a variable, this can be done with casting.
Example
x = str(3) # x will be '3'
y = int(3) # y will be 3
z = float(3) # z will be 3.0
You can get the data type of a variable with the type() function.
Example
x = 5
y = "John"
print(type(x))
print(type(y))
Example
x = "John"
# is the same as
x = 'John'
Case-Sensitive
Example
a = 4
A = "Sally"
# A will not overwrite a
Python Operators
Operators are used to perform operations on variables and values.
In the example below, we use the + operator to add together two values:
Example
print(10 + 5)
Arithmetic operators
Assignment operators
Comparison operators
Logical operators
Identity operators
Membership operators
Bitwise operators
Arithmetic operators are used with numeric values to perform common mathematical operations:
Operator Name
+ Addition
- Subtraction
* Multiplication
/ Division
% Modulus
** Exponentiation
// Floor division
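As a quick illustration of the table above, the following sketch applies each arithmetic operator to two sample values. The variable names and the values 7 and 2 are chosen here only for illustration.

```python
a = 7
b = 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division always returns a float
remainder = a % b    # modulus: what is left after dividing a by b
power = a ** b       # exponentiation: a raised to the power b
floored = a // b     # floor division: the quotient rounded down

print(total, difference, product, quotient, remainder, power, floored)
```

Note that / and // give different results for the same operands: the first keeps the fractional part, the second discards it.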
Operator Example
= x = 5
+= x += 3
-= x -= 3
*= x *= 3
/= x /= 3
%= x %= 3
//= x //= 3
**= x **= 3
&= x &= 3
|= x |= 3
^= x ^= 3
>>= x >>= 3
<<= x <<= 3
Operator Name
== Equal
!= Not equal
Operator Description
Identity operators are used to compare the objects, not if they are equal, but if they are actually the
same object, with the same memory location:
Operator Description
is not Returns True if both variables are not the same object
Operator Description
not in Returns True if a sequence with the specified value is not present in the object
<< Zero fill left shift Shift left by pushing zeros in from the right and let the leftmost bits
fall off
>> Signed right shift Shift right by pushing copies of the leftmost bit in from the left, and let the
rightmost bits fall off
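The identity, membership, and shift operators described above can be tried out with a short sketch. The list contents and variable names below are assumptions made for illustration only.

```python
x = ["apple", "banana"]
y = ["apple", "banana"]
z = x  # z refers to the same list object as x

same_content = x == y     # True: the two lists hold equal items
same_object = x is y      # False: they are two separate objects in memory
alias_is_same = x is z    # True: z and x are the same object
not_same = x is not y     # True

has_banana = "banana" in x          # True: membership test
lacks_cherry = "cherry" not in x    # True

shifted_left = 5 << 1     # binary 101 becomes 1010, which is 10
shifted_right = 5 >> 1    # binary 101 becomes 10, which is 2

print(same_content, same_object, alias_is_same, not_same)
print(has_banana, lacks_cherry, shifted_left, shifted_right)
```

The difference between == and is shows up in the first two results: equal content does not imply the same object.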
Operator Precedence
Example
Parentheses have the highest precedence, meaning that expressions inside parentheses must be
evaluated first:
print((6 + 3) - (6 + 3))
Example
Multiplication * has higher precedence than addition +, and therefore multiplications are evaluated
before additions:
print(100 + 5 * 3)
The precedence order is described in the table below, starting with the highest precedence at the
top:
Operator Description
() Parentheses
** Exponentiation
^ Bitwise XOR
| Bitwise OR
== != > >= < <= is is not in not in Comparisons, identity, and membership operators
and AND
or OR
Variables can store data of different types, and different types can do different things.
Python has the following data types built-in by default, in these categories:
Python Numbers
int
float
complex
Variables of numeric types are created when you assign a value to them:
Example
x = 1 # int
y = 2.8 # float
z = 1j # complex
Python Lists
List
Lists are one of 4 built-in data types in Python used to store collections of data, the other 3
are Tuple, Set, and Dictionary, all with different qualities and usage.
Example
Create a List:
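Lists are created using square brackets. A minimal sketch, with item values chosen here purely for illustration:

```python
thislist = ["apple", "banana", "cherry"]
print(thislist)

first_item = thislist[0]    # the first item has index 0
second_item = thislist[1]   # the second item has index 1
print(first_item, second_item)
```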
List Items
List items are indexed, the first item has index [0], the second item has index [1] etc.
Ordered
When we say that lists are ordered, it means that the items have a defined order, and that order will
not change.
If you add new items to a list, the new items will be placed at the end of the list.
Changeable
The list is changeable, meaning that we can change, add, and remove items in a list after it has been
created.
Allow Duplicates
Since lists are indexed, lists can have items with the same value:
Example
List Length
To determine how many items a list has, use the len() function:
Example
Example
Example
type()
From Python's perspective, lists are defined as objects with the data type 'list':
<class 'list'>
Example
It is also possible to use the list() constructor when creating a new list.
Example
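The list properties described in this section (duplicates, length, changeability, mixed item types, and the list() constructor) can be sketched together as follows. The item values and variable names are assumptions made for illustration.

```python
fruits = ["apple", "banana", "cherry", "apple", "cherry"]  # duplicates are allowed
count = len(fruits)            # number of items in the list

fruits.append("orange")        # new items are placed at the end
fruits[1] = "kiwi"             # lists are changeable

mixed = ["abc", 34, True, 40.5]   # items can have different data types
list_type = type(mixed)

# The list() constructor builds a list from another iterable (note the double parentheses)
from_tuple = list(("apple", "banana", "cherry"))

print(count, fruits, list_type, from_tuple)
```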
There are four collection data types in the Python programming language:
When choosing a collection type, it is useful to understand the properties of that type. Choosing the
right type for a particular data set could mean retention of meaning, and, it could mean an increase
in efficiency or security.
Python Tuples
Tuple
Tuple is one of 4 built-in data types in Python used to store collections of data, the other 3
are List, Set, and Dictionary, all with different qualities and usage.
Example
Create a Tuple:
Tuple Items
Tuple items are indexed, the first item has index [0], the second item has index [1] etc.
Ordered
When we say that tuples are ordered, it means that the items have a defined order, and that order
will not change.
Unchangeable
Tuples are unchangeable, meaning that we cannot change, add or remove items after the tuple has
been created.
Allow Duplicates
Since tuples are indexed, they can have items with the same value:
Example
Tuple Length
To determine how many items a tuple has, use the len() function:
Example
To create a tuple with only one item, you have to add a comma after the item, otherwise Python will
not recognize it as a tuple.
Example
thistuple = ("apple",)
print(type(thistuple))  # <class 'tuple'>

# NOT a tuple: without the trailing comma this is just a string
thistuple = ("apple")
print(type(thistuple))  # <class 'str'>
Example
Example
type()
From Python's perspective, tuples are defined as objects with the data type 'tuple':
<class 'tuple'>
Example
Example
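The tuple properties described in this section can be sketched in one place. Tuples are written with round brackets; the item values and variable names below are assumptions for illustration.

```python
thistuple = ("apple", "banana", "cherry", "apple")  # duplicates are allowed
length = len(thistuple)
first = thistuple[0]          # items are indexed, starting at 0

# Tuples are unchangeable: assigning to an item raises a TypeError
try:
    thistuple[0] = "kiwi"
except TypeError as error:
    message = str(error)

# The tuple() constructor builds a tuple from another iterable
from_list = tuple(["apple", "banana"])
tuple_type = type(from_list)

print(length, first, message, from_list, tuple_type)
```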
Python Dictionaries
Dictionary
Dictionaries are written with curly brackets, and have keys and values:
Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict)
Dictionary Items
Dictionary items are ordered, changeable, and do not allow duplicate keys.
Dictionary items are presented in key:value pairs, and can be referred to by using the key name.
Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict["brand"])
Ordered or Unordered?
As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries
are unordered.
When we say that dictionaries are ordered, it means that the items have a defined order, and that
order will not change.
Unordered means that the items do not have a defined order, so you cannot refer to an item by using
an index.
Changeable
Dictionaries are changeable, meaning that we can change, add or remove items after the dictionary
has been created.
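Duplicates Not Allowed
Dictionaries cannot have two items with the same key. A duplicate key overwrites the existing value:
Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964,
"year": 2020
}
print(thisdict)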
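Returning to the point that dictionaries are changeable, the sketch below changes, adds, and removes items after the dictionary has been created. The "color" key and its value are assumptions added for illustration.

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

thisdict["year"] = 2020            # change the value of an existing key
thisdict["color"] = "red"          # add a new key:value pair
removed = thisdict.pop("model")    # remove a key and get its value back

print(thisdict)
print(removed)
```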
Dictionary Length
To determine how many items a dictionary has, use the len() function:
Example
print(len(thisdict))
Example
thisdict = {
"brand": "Ford",
"electric": False,
"year": 1964,
"colors": ["red", "white", "blue"]
}
type()
From Python's perspective, dictionaries are defined as objects with the data type 'dict':
<class 'dict'>
Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(type(thisdict))
Example
List comprehension offers a shorter syntax when you want to create a new list based on the values
of an existing list.
Example:
Based on a list of fruits, you want a new list, containing only the fruits with the letter "a" in the
name.
Without list comprehension you will have to write a for statement with a conditional test inside:
Example
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    if "a" in x:
        newlist.append(x)
print(newlist)
With list comprehension you can do all that with only one line of code:
Example
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x for x in fruits if "a" in x]
print(newlist)
The Syntax
newlist = [expression for item in iterable if condition == True]
The return value is a new list, leaving the old list unchanged.
Condition
The condition is like a filter that only accepts the items that evaluate to True.
Example
The condition if x != "apple" will return True for all elements other than "apple", making the new
list contain all fruits except "apple".
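The filtering behaviour described above can be sketched as follows, reusing the fruits list from the earlier examples. The names without_apple and all_fruits are chosen here for illustration.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Only accept items for which the condition is True
without_apple = [x for x in fruits if x != "apple"]

# The condition is optional: with no if clause, every item is accepted
all_fruits = [x for x in fruits]

print(without_apple)
print(all_fruits)
```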
Example
With no if statement:
Iterable
The iterable can be any iterable object, like a list, tuple, set etc.
Example
Example
Expression
The expression is the current item in the iteration, but it is also the outcome, which you can
manipulate before it ends up like a list item in the new list:
Example
Example
The expression can also contain conditions, not like a filter, but as a way to manipulate the
outcome:
Example
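A short sketch of how the expression part can shape the outcome. The replacement value "orange" and the variable names are assumptions for illustration.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Manipulate each item before it ends up in the new list
upper = [x.upper() for x in fruits]

# The expression does not have to use the item at all
hello = ["hello" for x in fruits]

# A conditional inside the expression changes the outcome instead of filtering:
# every item is kept, but "banana" is replaced by "orange"
swapped = [x if x != "banana" else "orange" for x in fruits]

print(upper)
print(hello)
print(swapped)
```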
A nested list is a list within a list. Python handles nested lists gracefully and provides
common functions to manipulate them. This section shows how to use list
comprehension to create and use nested lists in Python.
Creating a Matrix
Creating a matrix involves creating a series of rows and columns. We can use for loops to create
the matrix rows and columns by putting one Python list with a for loop inside another Python list
with a for loop.
Example
matrix = [[m for m in range(4)] for n in range(3)]
print(matrix)
Running the above code gives us the following result:
[[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
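Beyond creating a matrix, nested comprehensions can also be used to work with one. The sketch below is an illustration only: it assumes a matrix with distinct cell values (so the results are easier to follow), then flattens and transposes it.

```python
# Build a 3 x 4 matrix whose cells hold distinct values
matrix = [[row * 4 + col for col in range(4)] for row in range(3)]

# Flatten: the outer for comes first, then the inner for
flat = [value for row in matrix for value in row]

# Transpose: column number col of the matrix becomes row number col of the result
transposed = [[row[col] for row in matrix] for col in range(4)]

print(matrix)
print(flat)
print(transposed)
```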
Python Strings
Strings
Strings in python are surrounded by either single quotation marks, or double quotation marks.
Assigning a string to a variable is done with the variable name followed by an equal sign and the
string:
Example
a = "Hello"
print(a)
Multiline Strings
Example
Example
a = '''Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.'''
print(a)
Strings in Python are sequences of Unicode characters, and individual characters can be accessed
by index using square brackets.
However, Python does not have a character data type, a single character is simply a string with a
length of 1.
Example
Get the character at position 1 (remember that the first character has the position 0):
a = "Hello, World!"
print(a[1])
Since strings are sequences, we can loop through the characters in a string with a for loop.
Example
for x in "banana":
    print(x)
String Length
Example
a = "Hello, World!"
print(len(a))
Check String
To check if a certain phrase or character is present in a string, we can use the keyword in.
Example
Use it in an if statement:
Example
Check if NOT
To check if a certain phrase or character is NOT present in a string, we can use the keyword not in.
Example
Use it in an if statement:
Example
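The in and not in checks described above can be sketched as follows. The sample sentence and variable names are assumptions for illustration.

```python
txt = "The best things in life are free!"

has_free = "free" in txt                  # True: the phrase is present
lacks_expensive = "expensive" not in txt  # True: the phrase is absent
print(has_free, lacks_expensive)

# Both keywords can be used directly in an if statement
if "free" in txt:
    print("Yes, 'free' is present.")

if "expensive" not in txt:
    print("No, 'expensive' is NOT present.")
```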
Slicing
Specify the start index and the end index, separated by a colon, to return a part of the string.
Example
b = "Hello, World!"
print(b[2:5])
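A few more slicing forms, sketched on the same string. Leaving out the start or end index and using negative indexes are standard Python behaviour; the variable names are chosen here for illustration.

```python
b = "Hello, World!"

middle = b[2:5]        # from index 2 up to, but not including, index 5
from_start = b[:5]     # leaving out the start index begins at the first character
to_end = b[7:]         # leaving out the end index runs to the end of the string
from_back = b[-5:-2]   # negative indexes count from the end of the string

print(middle, from_start, to_end, from_back)
```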
String Concatenation
a = "Hello"
b = "World"
c = a + b
print(c)
Creating a Function
Example
def my_function():
    print("Hello from a function")
Calling a Function
Example
def my_function():
    print("Hello from a function")

my_function()
Arguments
Arguments are specified after the function name, inside the parentheses. You can add as many
arguments as you want, just separate them with a comma.
The following example has a function with one argument (fname). When the function is called, we
pass along a first name, which is used inside the function to print the full name:
Example
def my_function(fname):
    print(fname + " Refsnes")

my_function("Emil")
my_function("Tobias")
my_function("Linus")
Parameters or Arguments?
The terms parameter and argument can be used for the same thing: information that is passed into
a function.
A parameter is the variable listed inside the parentheses in the function definition.
An argument is the value that is sent to the function when it is called.
Number of Arguments
By default, a function must be called with the correct number of arguments. Meaning that if your
function expects 2 arguments, you have to call the function with 2 arguments, not more, and not
less.
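Example
This function expects 2 arguments, and gets 2 arguments:
def my_function(fname, lname):
    print(fname + " " + lname)

my_function("Emil", "Refsnes")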
If you try to call the function with 1 or 3 arguments, you will get an error:
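Example
This function expects 2 arguments, but gets only 1:
def my_function(fname, lname):
    print(fname + " " + lname)

my_function("Emil")  # raises TypeError: one required argument is missing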
If you do not know how many arguments that will be passed into your function, add a * before the
parameter name in the function definition.
This way the function will receive a tuple of arguments, and can access the items accordingly:
Example
def my_function(*kids):
    print("The youngest child is " + kids[2])

my_function("Emil", "Tobias", "Linus")
Keyword Arguments
You can also send arguments with the key = value syntax.
Example
def my_function(child3, child2, child1):
    print("The youngest child is " + child3)

my_function(child1="Emil", child2="Tobias", child3="Linus")
If you do not know how many keyword arguments that will be passed into your function, add two
asterisk: ** before the parameter name in the function definition.
This way the function will receive a dictionary of arguments, and can access the items accordingly:
Example
If the number of keyword arguments is unknown, add a double ** before the parameter name:
def my_function(**kid):
    print("His last name is " + kid["lname"])

my_function(fname="Tobias", lname="Refsnes")
Example
def my_function(country = "Norway"):
    print("I am from " + country)

my_function("Sweden")
my_function("India")
my_function()
my_function("Brazil")
You can send any data types of argument to a function (string, number, list, dictionary etc.), and it
will be treated as the same data type inside the function.
E.g. if you send a List as an argument, it will still be a List when it reaches the function:
Example
def my_function(food):
    for x in food:
        print(x)

fruits = ["apple", "banana", "cherry"]
my_function(fruits)
Return Values
Example
def my_function(x):
    return 5 * x

print(my_function(3))
print(my_function(5))
print(my_function(9))
Function definitions cannot be empty, but if you for some reason have a function definition with no
content, put in the pass statement to avoid getting an error.
Example
def myfunction():
    pass
Recursion
Python also accepts function recursion, which means a defined function can call itself.
Recursion is a common mathematical and programming concept. It means that a function calls
itself. This has the benefit of meaning that you can loop through data to reach a result.
The developer should be very careful with recursion as it can be quite easy to slip into writing a
function which never terminates, or one that uses excess amounts of memory or processor power.
However, when written correctly recursion can be a very efficient and mathematically-elegant
approach to programming.
In this example, tri_recursion() is a function that we have defined to call itself ("recurse"). We use
the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when
the condition is not greater than 0 (i.e. when it is 0).
For a new developer it can take some time to work out exactly how this works; the best way to find out
is by testing and modifying it.
Example
Recursion Example
def tri_recursion(k):
    if k > 0:
        result = k + tri_recursion(k - 1)
        print(result)
    else:
        result = 0
    return result

tri_recursion(6)  # 6 is a sample starting value
Python Lambda
A lambda function can take any number of arguments, but can only have one expression.
Syntax
lambda arguments : expression
Example
Add 10 to argument a, and return the result:
x = lambda a : a + 10
print(x(5))
Example
Multiply argument a with argument b and return the result:
x = lambda a, b : a * b
print(x(5, 6))
Example
Sum arguments a, b, and c and return the result:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))
Say you have a function definition that takes one argument, and that argument will be multiplied with
an unknown number:
def myfunc(n):
    return lambda a : a * n
Use that function definition to make a function that always doubles the number you send in:
Example
def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
print(mydoubler(11))
Equals: a == b
Not Equals: a != b
Less than: a < b
Less than or equal to: a <= b
Greater than: a > b
Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly in "if statements" and loops.
Example
If statement:
a = 33
b = 200
if b > a:
    print("b is greater than a")
In this example we use two variables, a and b, which are used as part of the if statement to test
whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33, and so we
print to screen that "b is greater than a".
Indentation
Python relies on indentation (whitespace at the beginning of a line) to define scope in the code.
Other programming languages often use curly-brackets for this purpose.
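Example
The body of the if statement must be indented; without the indentation, Python raises an IndentationError:
a = 33
b = 200
if b > a:
    print("b is greater than a")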
Elif
The elif keyword is Python's way of saying "if the previous conditions were not true, then try this
condition".
Example
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
In this example a is equal to b, so the first condition is not true, but the elif condition is true, so we
print to screen that "a and b are equal".
Else
The else keyword catches anything which isn't caught by the preceding conditions.
Example
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
In this example a is greater than b, so the first condition is not true, also the elif condition is not
true, so we go to the else condition and print to screen that "a is greater than b".
Example
a = 200
b = 33
if b > a:
    print("b is greater than a")
else:
    print("b is not greater than a")
Short Hand If
If you have only one statement to execute, you can put it on the same line as the if statement.
Example
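A minimal sketch of the one-line form described above. The values of a and b and the message variable are assumptions for illustration.

```python
a = 200
b = 33

# The single statement follows the colon on the same line as the if
if a > b: message = "a is greater than b"

print(message)
```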
If you have only one statement to execute, one for if, and one for else, you can put it all on the same
line:
Example
a = 2
b = 330
print("A") if a > b else print("B")
You can also have multiple else statements on the same line:
Example
a = 330
b = 330
print("A") if a > b else print("=") if a == b else print("B")
And
The and keyword is a logical operator, and is used to combine conditional statements:
Example
a = 200
b = 33
c = 500
if a > b and c > a:
    print("Both conditions are True")
Or
Example
a = 200
b = 33
c = 500
if a > b or a > c:
    print("At least one of the conditions is True")
Not
The not keyword is a logical operator, and is used to reverse the result of the conditional statement:
Example
a = 33
b = 200
if not a > b:
    print("a is NOT greater than b")
Nested If
You can have if statements inside if statements, this is called nested if statements.
Example
x = 41
if x > 10:
    print("Above ten,")
    if x > 20:
        print("and also above 20!")
    else:
        print("but not above 20.")
if statements cannot be empty, but if you for some reason have an if statement with no content, put
in the pass statement to avoid getting an error.
Example
a = 33
b = 200
if b > a:
    pass
while loops
for loops
With the while loop we can execute a set of statements as long as a condition is true.
Example
i = 1
while i < 6:
    print(i)
    i += 1
The while loop requires relevant variables to be ready, in this example we need to define an
indexing variable, i, which we set to 1.
With the break statement we can stop the loop even if the while condition is true:
Example
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
With the continue statement we can stop the current iteration, and continue with the next:
Example
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
With the else statement we can run a block of code once when the condition no longer is true:
Example
i = 1
while i < 6:
    print(i)
    i += 1
else:
    print("i is no longer less than 6")
This is less like the for keyword in other programming languages, and works more like an iterator
method as found in other object-oriented programming languages.
With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.
Example
Print each fruit in a fruit list:
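A minimal sketch of such a loop, assuming a sample list of three fruits:

```python
fruits = ["apple", "banana", "cherry"]

# The loop variable x takes the value of each item in turn
for x in fruits:
    print(x)
```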
The for loop does not require an indexing variable to be set beforehand.
Example
Loop through the letters in the word "banana":
for x in "banana":
    print(x)
Example
Exit the loop when x is "banana":
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x == "banana":
        break
Example
Exit the loop when x is "banana", but this time the break comes before the print:
Example
Do not print banana:
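The two variations described above can be sketched as follows, assuming a sample list of three fruits. The lists before_break and kept are added here only to record which items were printed.

```python
fruits = ["apple", "banana", "cherry"]

# break placed before the print: "banana" itself is never printed
before_break = []
for x in fruits:
    if x == "banana":
        break
    before_break.append(x)
    print(x)

# continue skips the rest of the current iteration and moves on to the next item
kept = []
for x in fruits:
    if x == "banana":
        continue
    kept.append(x)
    print(x)
```

With break the loop stops completely at "banana", while with continue only that one item is skipped.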
The range() function returns a sequence of numbers, starting from 0 by default, and increments by 1 (by
default), and ends at a specified number.
Example
Using the range() function:
for x in range(6):
    print(x)
The range() function defaults to 0 as a starting value, however it is possible to specify the starting value
by adding a parameter: range(2, 6), which means values from 2 to 6 (but not including 6):
Example
Using the start parameter:
The range() function defaults to increment the sequence by 1, however it is possible to specify the
increment value by adding a third parameter: range(2, 30, 3):
Example
Increment the sequence with 3 (default is 1):
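The start and increment parameters of range() can be sketched as follows. Converting the ranges to lists is done here only to make the generated values visible.

```python
# Start at 2 and stop before 6
from_two = list(range(2, 6))

# Start at 2, stop before 30, and increment by 3
by_three = list(range(2, 30, 3))

print(from_two)
print(by_three)

# The same range used directly in a for loop
for x in range(2, 30, 3):
    print(x)
```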
Example
Print all numbers from 0 to 5, and print a message when the loop has ended:
for x in range(6):
    print(x)
else:
    print("Finally finished!")
Example
Break the loop when x is 3, and see what happens with the else block:
for x in range(6):
    if x == 3: break
    print(x)
else:
    print("Finally finished!")
Nested Loops
A nested loop is a loop inside a loop.
The "inner loop" will be executed one time for each iteration of the "outer loop":
Example
Print each adjective for every fruit:
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
    for y in fruits:
        print(x, y)
Example
for x in [0, 1, 2]:
    pass
NumPy, which stands for 'Numerical Python', is a Python package. It is a library consisting of
multidimensional array objects and a collection of routines for processing those arrays. Using
NumPy, mathematical and logical operations on arrays can be performed.
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package Numarray was
also developed, having some additional functionalities. In 2005, Travis Oliphant created NumPy
package by incorporating the features of Numarray into Numeric package. There are many
contributors to this open source project.
Operations using NumPy
Using NumPy, a developer can perform the following operations −
Mathematical and logical operations on arrays.
Fourier transforms and routines for shape manipulation.
Operations related to linear algebra. NumPy has in-built functions for linear algebra and
random number generation
Contents of an ndarray object can be accessed and modified by indexing or slicing, just like Python's
in-built container objects.
Items in an ndarray object follow a zero-based index. Three types of indexing
methods are available − field access, basic slicing and advanced indexing.
Basic slicing is an extension of Python's basic concept of slicing to n dimensions. A Python slice
object is constructed by giving start, stop, and step parameters to the built-in slice function. This
slice object is passed to the array to extract a part of array.
Example 1
import numpy as np
a = np.arange(10)
s = slice(2,7,2)
print(a[s])
Its output is as follows −
[2 4 6]
In the above example, an ndarray object is prepared by arange() function. Then a slice object is
defined with start, stop, and step values 2, 7, and 2 respectively. When this slice object is passed to
the ndarray, a part of it starting with index 2 up to 7 with a step of 2 is sliced.
The same result can also be obtained by giving the slicing parameters separated by a colon :
(start:stop:step) directly to the ndarray object.
Example 2
import numpy as np
a = np.arange(10)
b = a[2:7:2]
print(b)
Here, we will get the same output −
[2 4 6]
If only one parameter is given, a single item corresponding to the index will be returned. If a : is
placed after it, all items from that index onwards will be extracted. If two parameters (with :
between them) are used, items between the two indexes (not including the stop index) with default
step one are sliced.
Example 3
# slice single item
import numpy as np
a = np.arange(10)
b = a[5]
print(b)
Its output is as follows −
5
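The other two slicing forms described above (an index followed by a colon, and two indexes separated by a colon) can be sketched in the same way. This assumes NumPy is installed; the variable names are chosen here for illustration.

```python
import numpy as np

a = np.arange(10)

from_index = a[2:]    # all items from index 2 onwards
between = a[2:5]      # items from index 2 up to, but not including, index 5

print(from_index)
print(between)
```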
Pandas
The name Pandas is derived from the term "panel data", an econometrics term for
multidimensional data. It was created in 2008 by Wes McKinney and is used for data
analysis in Python.
Processing, such as restructuring, cleaning, merging, etc., is necessary for data analysis. NumPy,
SciPy, Cython, and Pandas are just a few of the fast data processing tools available. We prefer
Pandas because working with it is quick, simple, and more expressive than other
tools.
Since Pandas is built on top of the NumPy package, NumPy is required to work with
Pandas.
Before Pandas, Python was capable of data preparation, but it offered only limited support for
data analysis. Pandas filled this gap and enhanced Python's data analysis
capabilities. Regardless of the source of the data, it can carry out the five crucial steps that are
necessary for processing and analyzing it: load, manipulate, prepare, model, and analyze.
o It has a DataFrame object that is fast and efficient, with both default and custom
indexing.
o It is used for reshaping and pivoting data sets.
o It can group data for aggregations and transformations.
o It is used to align data and handle missing data.
o It provides time series functionality.
o It processes a variety of data sets in various formats, such as matrix data, heterogeneous tabular
data, and time series.
o It handles multiple operations on data sets, including subsetting, slicing, filtering, groupBy,
reordering, and reshaping.
o It integrates with other libraries such as SciPy and scikit-learn.
o It performs quickly, and Cython can be used to accelerate it even further.
Benefits of Pandas
Representation of Data: Through its DataFrame and Series, it presents the data in a form that is
appropriate for data analysis.
Clear code: Pandas' clear API lets you concentrate on the most important part of the code. In this
way, it helps the user write clear and concise code.
DataFrame and Series are the two data structures that Pandas provides for processing data. These
data structures are discussed below:
1) Series
A Series is defined as a one-dimensional array capable of storing a variety of data types. The term
"index" refers to the row labels of a Series. We can easily convert a list,
tuple, or dictionary into a Series using the Series() constructor. A Series cannot contain multiple
columns. Its main parameter is the data to store, which can be a list, dictionary, or scalar value.
Before creating a Series, we first import the numpy module and then use its array() function
in the program.
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
Explanation: In this code, we first import the pandas and numpy libraries under the aliases pd
and np. Then we create a variable named "info" that holds an array of values. We pass info to the
Series() constructor and store the result in the variable "a". Finally, the Series is printed by
calling print(a).
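A Series can also be built directly from a list, a tuple, or a dictionary. The sketch below uses made-up values and variable names; note that the dictionary keys become the index labels.

```python
import pandas as pd

from_list = pd.Series([10, 20, 30])              # default index 0, 1, 2
from_tuple = pd.Series(("x", "y", "z"))          # tuples work the same way
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})  # keys become the index

print(from_list)
print(from_tuple)
print(from_dict)
print(from_dict["b"])  # look up a value by its index label
```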
2) DataFrame
A DataFrame is a widely used data structure of pandas that works with a two-dimensional array with
labeled axes (rows and columns). As a standard way of storing data, a DataFrame has two
distinct indexes: a row index and a column index. It has the following characteristics:
It can be thought of as a dictionary of Series structures in which both the rows and the columns
are indexed. The row labels are referred to as the "index" and the column labels as "columns".
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']

# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)
Output
0
0 Python
1 Pandas
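To make the two indexes of a DataFrame more concrete, the following sketch builds one from a dictionary. The names, scores, and row labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column.
data = {
    "name": ["Asha", "Ravi", "Meera"],
    "score": [82, 91, 77],
}
df = pd.DataFrame(data, index=["s1", "s2", "s3"])

print(df)
print(df.index)      # row labels
print(df.columns)    # column labels
print(df["score"])   # a single column is a Series
print(df.loc["s2"])  # a single row, selected by its label
```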
Matplotlib is a plotting library for Python. It is used along with NumPy to provide an
environment that is an effective open-source alternative to MATLAB. It can also be used
with graphics toolkits like PyQt and wxPython.
Matplotlib was first written by John D. Hunter. Michael Droettboom became the principal
developer in 2012. At the time this material was written, Matplotlib 1.5.1 was the stable
version; newer releases are available. The package is available as a binary distribution as
well as in source code form on www.matplotlib.org.
Conventionally, the package is imported into the Python script by adding the following
statement −
from matplotlib import pyplot as plt
Here pyplot is the most important module in the matplotlib library; its plot() function is
used to plot 2D data. The following script plots the equation y = 2x + 5.
Example
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
An ndarray object x is created by the np.arange() function and holds the values on the x axis.
The corresponding values on the y axis are stored in another ndarray object y. These values
are plotted using the plot() function of the pyplot submodule of the matplotlib package.
The graphical representation is displayed by the show() function.
The above code produces a straight-line graph of y = 2x + 5 for x from 1 to 10.
Instead of a continuous line, the values can be displayed as discrete points by adding a format
string to the plot() function. A format string combines formatting characters for the marker,
the line style, and the color.
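As a minimal sketch of a format string, the script below redraws y = 2x + 5 with blue circles instead of a line. The "Agg" backend and the output file name are assumptions made so that the script can run without a display; in an interactive session plt.show() can be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import numpy as np
from matplotlib import pyplot as plt

x = np.arange(1, 11)
y = 2 * x + 5

plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")

# "ob" is a format string: "o" selects circle markers and "b" the color blue.
# Because only a marker is given, the points are not joined by a line.
lines = plt.plot(x, y, "ob")
plt.savefig("matplotlib_demo.png")  # hypothetical output file name
```

Other format strings follow the same pattern, for example "r--" for a red dashed line.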
Plotly is an interactive, open-source plotting library for Python. It can be a very
helpful tool for visualizing data and understanding it simply and easily.
Plotly graph objects provide an easy-to-use interface to Plotly. The library
can plot various types of graphs and charts, such as scatter plots, line charts, bar
charts, box plots, histograms, and pie charts.
Why choose Plotly over other visualization tools or libraries?
o Plotly has hover tool capabilities that help us detect outliers or
anomalies in a large number of data points.
o Its plots are visually attractive and suit a wide range of audiences.
o It allows extensive customization of graphs, which makes plots
more meaningful and understandable for others.
For a brief introduction to the ideas behind the seaborn library, you can read the introductory notes or
the paper. Visit the installation page to see how you can download the package and get
started with it. You can browse the example gallery to see some of the things that you can do
with seaborn, and then check out the tutorials or API reference to find out how.
To see the code or report a bug, please visit the GitHub repository. General support questions
are most at home on stackoverflow, which has a dedicated channel for seaborn.
Python has several functions for creating, reading, updating, and deleting files.
File Handling
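The key function for working with files in Python is the open() function.
The open() function is usually called with two arguments: the file name and the mode.
There are four different modes for opening a file: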
"r" - Read - Default value. Opens a file for reading, error if the file does not exist
"a" - Append - Opens a file for appending, creates the file if it does not exist
"w" - Write - Opens a file for writing, creates the file if it does not exist
"x" - Create - Creates the specified file, returns an error if the file exists
In addition, you can specify whether the file should be handled in text or binary mode:
"t" - Text - Default value. Text mode
"b" - Binary - Binary mode (e.g. images)
Syntax
To open a file for reading it is enough to specify the name of the file:
f = open("demofile.txt")
The code above is the same as:
f = open("demofile.txt", "rt")
Because "r" for read and "t" for text are the default values, you do not need to specify them.
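The modes can be tried out safely in a temporary folder. The sketch below is only an illustration: the file name is hypothetical, and the with statement is used so that each file is closed automatically.

```python
import os
import tempfile

# Work in a temporary folder so that no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demofile.txt")  # hypothetical file name

    # "x" creates the file and fails if it already exists.
    with open(path, "x") as f:
        f.write("Hello")

    try:
        open(path, "x")
    except FileExistsError:
        second_create = "file already exists"

    # "rt" (the default) reads text and returns a str.
    with open(path, "rt") as f:
        text = f.read()

    # "rb" reads the same file as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(second_create)
print(text)
print(raw)
```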
Assume we have the following file, located in the same folder as the Python script:
demofile.txt
The open() function returns a file object, which has a read() method for reading the content of the
file:
Example
f = open("demofile.txt", "r")
print(f.read())
If the file is located in a different location, you will have to specify the file path, like this:
Example
f = open("D:\\myfiles\\welcome.txt", "r")
print(f.read())
By default the read() method returns the whole text, but you can also specify how many characters
you want to return:
Example
f = open("demofile.txt", "r")
print(f.read(5))
Read Lines
You can return one line by using the readline() method:
Example
f = open("demofile.txt", "r")
print(f.readline())
By calling readline() two times, you can read the first two lines:
Example
f = open("demofile.txt", "r")
print(f.readline())
print(f.readline())
By looping through the lines of the file, you can read the whole file, line by line:
Example
f = open("demofile.txt", "r")
for x in f:
    print(x)
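The reading methods above can be combined in one short script. This sketch first writes a small sample file into a temporary folder (the file name and contents are made up), then reads it back. Each line read from a file still ends with a newline character, so rstrip("\n") is used here to remove it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demofile.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("first line\nsecond line\nthird line\n")

    with open(path, "r") as f:
        first_five = f.read(5)       # the first 5 characters
        rest_of_line = f.readline()  # continues from the current position

    lines = []
    with open(path, "r") as f:
        for line in f:
            lines.append(line.rstrip("\n"))  # drop the trailing newline

print(first_five)
print(repr(rest_of_line))
print(lines)
```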
Close Files
It is a good practice to always close the file when you are done with it.
Example
f = open("demofile.txt", "r")
print(f.readline())
f.close()
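A common alternative to calling close() by hand is the with statement, which closes the file automatically when its block ends. The sketch below compares both approaches on a temporary sample file; the file name and its content are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demofile.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("Hello! Welcome to demofile.txt\n")

    # Closing by hand, as in the example above.
    f = open(path, "r")
    first = f.readline()
    f.close()
    closed_manually = f.closed

    # Closing automatically: the file is closed when the block ends.
    with open(path, "r") as g:
        same = g.readline()
    closed_by_with = g.closed

print(closed_manually, closed_by_with)
```

The closed attribute of a file object reports whether the file has been closed, so both values printed here are True.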
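To write to an existing file, you must add a parameter to the open() function:
"a" - Append - will append to the end of the file
"w" - Write - will overwrite any existing content
Example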
f = open("demofile2.txt", "a")
f.write("Now the file has more content!")
f.close()
Example
f = open("demofile3.txt", "w")
f.write("Woops! I have deleted the content!")
f.close()
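The difference between "a" and "w" is easiest to see by reading the file back after each step. The following sketch does this in a temporary folder; the file name and the initial text are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demofile2.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("Hello!")

    # "a" keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write(" Now the file has more content!")
    with open(path) as f:
        after_append = f.read()

    # "w" discards the existing content before writing.
    with open(path, "w") as f:
        f.write("Woops! I have deleted the content!")
    with open(path) as f:
        after_write = f.read()

print(after_append)
print(after_write)
```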
To create a new file in Python, use the open() function with one of the following parameters:
"x" - Create - will create a file, returns an error if the file exists
"a" - Append - will create a file if the specified file does not exist
"w" - Write - will create a file if the specified file does not exist
Example
Create a file called "myfile.txt":
f = open("myfile.txt", "x")
Example
Create a new file if it does not exist:
f = open("myfile.txt", "w")
Delete a File
To delete a file, you must import the os module and run its os.remove() function:
import os
os.remove("demofile.txt")
To avoid getting an error, you might want to check if the file exists before you try to delete it:
Example
import os
if os.path.exists("demofile.txt"):
    os.remove("demofile.txt")
else:
    print("The file does not exist")
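Another way to avoid the error is to attempt the deletion and catch the FileNotFoundError that os.remove() raises when the file is missing. The helper name delete_file below is hypothetical and only serves this sketch.

```python
import os
import tempfile


def delete_file(path):
    """Delete path; return True if it was removed, False if it did not exist."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demofile.txt")  # hypothetical file name
    open(path, "w").close()       # create an empty file to delete

    first = delete_file(path)     # the file exists, so it is removed
    second = delete_file(path)    # the file is already gone

print(first, second)
```

This version does not depend on the file still being there between the check and the deletion.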
Delete Folder
To delete an entire folder, use the os.rmdir() function. It can only remove empty folders:
Example
import os
os.rmdir("myfolder")
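Because os.rmdir() only works on empty folders, a folder that still contains files needs a different tool. A minimal sketch, assuming the standard library function shutil.rmtree() is acceptable for the task (it deletes the folder together with everything inside it, so it should be used with care):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "myfolder")
    os.mkdir(folder)
    with open(os.path.join(folder, "notes.txt"), "w") as f:  # hypothetical file
        f.write("some content")

    # os.rmdir() refuses to delete a folder that is not empty.
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree() removes the folder and all of its contents.
    shutil.rmtree(folder)
    still_exists = os.path.exists(folder)

print(rmdir_failed, still_exists)
```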
Exception Handling
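When an error occurs, or an exception as we call it, Python will normally stop and generate an
error message. These exceptions can be handled using the try statement:
Example
The try block will generate an exception, because x is not defined: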
try:
    print(x)
except:
    print("An exception occurred")
Since the try block raises an error, the except block will be executed.
Without the try block, the program will crash and raise an error:
Example
print(x)
Many Exceptions
You can define as many exception blocks as you want, e.g. if you want to execute a special block of
code for a special kind of error:
Example
Print one message if the try block raises a NameError and another for other errors:
try:
    print(x)
except NameError:
    print("Variable x is not defined")
except:
    print("Something else went wrong")
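The same pattern works for other built-in error types. The sketch below uses a hypothetical function safe_divide that handles two specific exceptions; "as err" stores the exception object so that its message can be included in the result.

```python
def safe_divide(a, b):
    """Divide a by b, returning a message instead of crashing on bad input."""
    try:
        return a / b
    except ZeroDivisionError:
        return "cannot divide by zero"
    except TypeError as err:
        return f"wrong type: {err}"


print(safe_divide(10, 2))
print(safe_divide(10, 0))
print(safe_divide(10, "2"))
```

Only the first except block that matches the raised error is executed.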
Else
You can use the else keyword to define a block of code to be executed if no errors were raised:
Example
In this example, the try block does not generate any error:
try:
    print("Hello")
except:
    print("Something went wrong")
else:
    print("Nothing went wrong")
Finally
The finally block, if specified, will be executed regardless of whether the try block raises an error or not.
Example
try:
    print(x)
except:
    print("Something went wrong")
finally:
    print("The 'try except' is finished")
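All four blocks can appear in one statement. The following sketch uses a hypothetical function parse_number that records which blocks ran, once for valid input and once for invalid input.

```python
def parse_number(text):
    """Convert text to an int and record which blocks were executed."""
    steps = []
    try:
        value = int(text)
    except ValueError:
        steps.append("except")   # runs only if int() failed
        value = None
    else:
        steps.append("else")     # runs only if no error was raised
    finally:
        steps.append("finally")  # runs in both cases
    return value, steps


print(parse_number("42"))
print(parse_number("abc"))
```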
This can be useful to close objects and clean up resources.
Example
Try to open and write to a file that is opened for reading only:
try:
    f = open("demofile.txt")
    try:
        f.write("Lorum Ipsum")
    except:
        print("Something went wrong when writing to the file")
    finally:
        f.close()
except:
    print("Something went wrong when opening the file")
The program can continue, without leaving the file object open.
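Raise an exception
To throw (or raise) an exception when a condition occurs, use the raise keyword.
Example
Raise an error and stop the program if x is lower than 0:
x = -1
if x < 0:
    raise Exception("Sorry, no numbers below zero")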
You can define what kind of error to raise, and the text to print to the user.
Example
x = "hello"
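A minimal sketch of raising a specific kind of error is given below. The function name require_int, the integer check, and the message texts are assumptions made for illustration; TypeError and ValueError are built-in exception types.

```python
def require_int(x):
    """Return x if it is a non-negative integer, otherwise raise an error."""
    if not isinstance(x, int):
        raise TypeError("Only integers are allowed")
    if x < 0:
        raise ValueError("Sorry, no numbers below zero")
    return x


for candidate in ["hello", -1, 5]:
    try:
        print(require_int(candidate))
    except (TypeError, ValueError) as err:
        # type(err).__name__ is the name of the exception class
        print(type(err).__name__, "-", err)
```

Choosing a specific exception type lets the calling code decide which errors it wants to handle.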