Python Notes
INTRODUCTION
An interpreter processes the program statements one by one, first translating and then
executing. This process is continued until an error is encountered or the whole program is
executed successfully. In both the cases, program execution will stop.
A compiler translates the entire source code, as a whole, into the object code. After scanning
the whole program, it generates error messages, if any.
Python was created by Guido van Rossum and was released in February 1991.
Features of Python
Python Keywords
Keywords are reserved words. Each keyword has a specific meaning to the Python
interpreter, and we can use a keyword in our program only for the purpose for which it has
been defined.
as elif if or yield
Identifiers
Identifiers are names used to identify a variable, function, or other entities in a program. The
rules for naming an identifier in Python are:
• The name should begin with an uppercase or a lowercase alphabet or an underscore sign
(_). Thus, an identifier cannot start with a digit.
DATA TYPES
We can determine the data type of a variable using built-in function type()
1) Number
int- integers
float- real or floating point numbers
complex- complex numbers; in Python, iota is denoted by j
Boolean data type (bool) is a subtype of integer. It is a unique data type, consisting of two
constants, True and False. Boolean True value is non-zero, non-null and non-empty.
Boolean False is the value zero.
2) Sequence
3) Set
4) None
None is a special data type with a single value. It is used to signify the absence of value in a
situation. None supports no special operations, and it is neither False nor 0 (zero), nor empty
string.
5) Mapping
Mapping is a data type that stores key-value pairs. Currently, there is only one standard mapping
data type in Python called dictionary. Since Python 3.7, a dictionary preserves the order in which
its items were inserted.
Variables whose values can be changed after they are created and assigned are called
mutable. Variables whose values cannot be changed after they are created and assigned
are called immutable. When an attempt is made to update the value of an immutable
variable, the old object is not modified; a new object is created in memory and the same
name is bound to it.
Immutable data types: Integers, Float, Boolean, Complex, Strings, Tuples
Mutable data types: Lists, Sets, Dictionaries
The statement num1 = 300 will create an object with value 300 and the object is referenced by the
identifier num1
num2 = num1 will make num2 refer to the value 300, also being referred by num1, and
stored at memory location number, say a. So, num1 shares the referenced location with
num2.
In this manner Python makes the assignment effective by copying only the reference, and
not the data.
num1 = num2 + 100 links the variable num1 to a new object stored at memory location
number say b having a value 400. As num1 is an integer, which is an immutable type, it is
rebuilt.
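The assignments described above can be traced with the is operator, which checks whether two names refer to the same object. This is a minimal sketch using the names num1 and num2 and the value 300 from the notes; the variables same_before and same_after are only for illustration.

```python
num1 = 300
num2 = num1                   # copies the reference, not the data
same_before = num1 is num2    # both names refer to one object

num1 = num2 + 100             # num1 is bound to a new int object (400)
same_after = num1 is num2     # num2 still refers to the object 300

print(same_before, same_after, num1, num2)
```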
OPERATORS
Arithmetic Operators
Floor or Integer Division (//): returns the quotient rounded down to the nearest integer, e.g. 7 // 2 is 3 and -7 // 2 is -4
Relational Operators- used for comparison and determining the relationship between
operands
Equals to (==)
Assignment Operators- assigns or changes the value of the variable on its left
(+=) adds the value of right-side operand to the left-side operand and assigns the result to
the left-side operand
Logical Operators
-not, and, or
-all values are True except None, False, 0 (zero), empty collections '', (), [], {} etc.
Identity Operators
-can also be used to determine whether two variables are referring to the same object or not
is: Evaluates True if the variables on either side of the operator point towards the same
memory location and False otherwise
is not: opposite of is
Membership Operators- used to check if a value is a member of the given sequence or not
in- Returns True if the variable/value is found in the specified sequence and False otherwise
Precedence of Operators
Binary operators are operators with two operands. The unary operators need only one
operand. The unary minus (-) and plus (+) have a higher precedence than the binary
arithmetic operators, except the exponent operator (**). The minus (-) as
well as + (plus) operators can act as both unary and binary operators, but not is a unary
logical operator.
1 ** Exponent
-Parenthesis can be used to override the precedence of operators. The expression within ()
is evaluated first.
-For operators with equal precedence, the expression is evaluated from left to right, except for the exponent operator (**), which is evaluated from right to left.
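A few expressions make the precedence rules concrete. This is only an illustrative sketch; the variable names a, b, c and d are arbitrary.

```python
a = 2 ** 3 ** 2    # exponent groups right to left: 2 ** (3 ** 2)
b = -2 ** 2        # exponent binds tighter than unary minus: -(2 ** 2)
c = (2 + 3) * 4    # parentheses are evaluated first
d = 10 - 4 - 3     # equal precedence, left to right: (10 - 4) - 3

print(a, b, c, d)
```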
The else clause of a while or for loop is executed only if the loop terminates normally and not
through break.
If the break statement is inside a nested loop, it will terminate the innermost loop.
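The behaviour of the else clause of a loop can be sketched as follows. The function find_first_even is a hypothetical helper written only for this illustration.

```python
def find_first_even(numbers):
    """Return the first even number, or None if the loop ends normally."""
    for n in numbers:
        if n % 2 == 0:
            result = n
            break            # leaving through break skips the else clause
    else:
        result = None        # runs only when the loop was not broken
    return result

print(find_first_even([1, 3, 4, 5]))
print(find_first_even([1, 3, 5]))
```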
STRINGS
String is a sequence made up of one or more UNICODE characters. Here the character can be a letter,
digit, whitespace or any other symbol. A string can be created by enclosing one or more characters in
single, double or triple quote.
Each individual character in a string can be accessed using a technique called indexing. The index
specifies the character to be accessed in the string and is written in square brackets ([ ]). The index of
the first character (from left) in the string is 0 and the last character is n-1 where n is the length of
the string. If we give index value out of this range then we get an IndexError. The index must be an
integer (positive, zero or negative).
The index can also be an expression including variables and operators but the expression must
evaluate to an integer.
Negative indices are used when we want to access the characters of the string from right to left.
Starting from right hand side, the first character has the index as -1 and the last character has the
index -n where n is the length of the string.
An inbuilt function len() in Python returns the length of the string.
STRING OPERATIONS
Concatenation
To concatenate means to join. Python allows us to join two strings using concatenation operator plus
which is denoted by symbol +.
Repetition- to repeat the given string using repetition operator, denoted by symbol *.
Note: string still remains the same after the use of repetition operator.
SLICING
- to access some part of a string by specifying an index range. Given a string str1, the slice operation
str1[n:m] returns the part of the string str1 starting from index n (inclusive) and ending at index m-1.
When both indices lie within the string and n is not greater than m, the number of characters in the
substring is equal to the difference of the two indices m and n, i.e., (m-n).
Index that is too big is truncated down to the end of the string
If the first index is not mentioned, the slice starts from index 0
If the second index is not mentioned, the slicing is done till the length of the string.
The slice operation can also take a third index that specifies the ‘step size’. For example, str1[n:m:k],
means every kth character has to be extracted from the string str1 starting from n and ending at m-1.
By default, the step size is one.
to print the string with first and last characters removed- string[1:n-1] where n is length of string
If we ignore both the indexes and give step size as -1, str1[::-1], we obtain the string in reverse order
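The slicing rules above can be checked on a sample string. The string "Hello World" is an assumed example, not taken from the notes.

```python
str1 = "Hello World"
n = len(str1)

part = str1[1:5]          # index 1 up to index 4
from_start = str1[:5]     # first index omitted: starts at 0
to_end = str1[6:]         # second index omitted: runs to the end
stepped = str1[0:10:2]    # every 2nd character from index 0 to 9
trimmed = str1[1:n-1]     # first and last characters removed
reverse = str1[::-1]      # step -1 reverses the string
clipped = str1[3:50]      # an index that is too big is truncated

print(part, from_start, to_end, stepped, trimmed, reverse, clipped, sep=" | ")
```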
Traversing a String
# using a for loop
for ch in str1:
    print(ch, end='')
# using a while loop
index = 0
while index < len(str1):
    print(str1[index], end='')
    index += 1
title() Returns the string with first letter of every word in the string in uppercase and
rest in lowercase
capitalize() Returns the string with first letter of the string in uppercase and the rest in
lowercase
lower() Returns the string with all uppercase letters converted to lowercase
upper() Returns the string with all lowercase letters converted to uppercase
count(str, start, end) Returns number of times substring str occurs in the given string.
If we do not give start index and end index then searching starts from
index 0 and ends at length of the string
find(str, start, end) Returns the index of the first occurrence of substring str in the
given string.
If we do not give start and end then searching starts from index 0 and
ends at length of the string.
If the substring is not present in the given string, then the function
returns -1
index(str, start, end) Same as find() but raises an exception (ValueError) if the substring is not present in the
given string
endswith(substr) Returns True if the given string ends with the supplied substring otherwise
returns False
startswith(substr) Returns True if the given string starts with the supplied substring otherwise
returns False
isalnum() Returns True if characters of the given string are either alphabets or
numeric.
If whitespace or special symbols are part of the given string or the
string is empty it returns False
islower() Returns True if the string is non-empty and has all lowercase alphabets, or has
at least one character as lowercase alphabet and rest are non-alphabet
characters
isupper() Returns True if the string is non-empty and has all uppercase alphabets, or has
at least one character as uppercase character and rest are non-alphabet
characters
isalpha() Returns True if all the characters in the string are alphabets, otherwise False
isdigit() Returns True if all the characters in the string are digits, otherwise False
isspace() Returns True if the string is non-empty and all characters are white spaces
(blank, tab \t, newline \n, carriage return \r)
istitle() Returns True if the string is non-empty and title case, i.e., the first letter of
every word in the string in uppercase and rest in lowercase
lstrip() Returns the string after removing the whitespace (spaces, tabs, newlines) only on the left of the string
rstrip() Returns the string after removing the whitespace only on the right of the string
strip() Returns the string after removing the whitespace both on the left and the right of
the string
replace(oldstr, newstr) Replaces all occurrences of old string with the new string
join() Returns a string in which the characters in the string have been joined
by a separator
syntax: sep.join(string)
partition(sep) Partitions the given string at the first occurrence of the substring
(separator) and returns the string partitioned into three parts:
Substring before, Separator and Substring after
If the separator is not found in the string, it returns the whole string
itself and two empty strings
always returns a tuple of 3 strings
it is necessary to pass one argument in partition()
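A short sketch of some of the string methods listed above. The sample strings are assumptions chosen for illustration.

```python
s = "hello world hello"

first = s.find("hello")            # index of the first occurrence
second = s.find("hello", 1)        # search starts from index 1
missing = s.find("xyz")            # -1 when the substring is absent
times = s.count("hello")
titled = s.title()

try:
    s.index("xyz")                 # like find(), but raises ValueError
    index_error = False
except ValueError:
    index_error = True

joined = "-".join("abc")           # separator placed between characters
parts = "a=b=c".partition("=")     # split at the first separator only
no_sep = "abc".partition("=")      # separator not found

print(first, second, missing, times, titled, joined, parts, no_sep)
```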
Python compares strings lexicographically, using the Unicode code point of each character (the same as the
ASCII value for the characters listed below). If the first
character of both the strings are same, the second character is compared, and so on.
ASCII value:
0-9: 48-57
A-Z: 65-90
a-z: 97-122
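The comparison rule can be verified with ord(), which returns the code point of a character. The strings compared here are assumed examples.

```python
codes = (ord("0"), ord("A"), ord("a"))

upper_first = "Apple" < "apple"    # 'A' (65) comes before 'a' (97)
later_char = "abc" < "abd"         # first two characters tie, 'c' < 'd'
digits = "10" < "9"                # compared as text: '1' (49) < '9' (57)

print(codes, upper_first, later_char, digits)
```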
LISTS
List is an ordered sequence made up of one or more elements. Unlike a string which consists of only
characters, a list can have elements of different data types.
A list is a mutable data type, which means it can be modified. However, if an element of a list
is immutable (e.g. string), that element cannot be changed in place; it can only be replaced by a new value.
Elements of a list are enclosed in square brackets and are separated by comma.
LIST OPERATIONS
Concatenation
-to join two or more lists using concatenation operator depicted by the symbol +.
-The concatenation operator '+' requires that the operands should be of list type only. If we try to
concatenate a list with elements of some other data type, TypeError occurs
LIST METHODS
append() Appends a single element passed as an argument at the end of the list
extend() Appends each element of the list passed as argument to the end of the given
list
count() Returns the number of times a given element appears in the list
index() Returns index of the first occurrence of the element in the list.
If the element is not present, ValueError is generated
pop([index]) Returns the element whose index is passed as parameter to this function and
also removes it from the list. If no parameter is given, then it returns and
removes the last element of the list.
sorted(list) It takes a list as parameter and creates a new list consisting of the same
elements arranged in sorted order, e.g. list1=sorted(list2)
Nested Lists
To access the element of the nested list of list1, we have to specify two indices list1[i][j]. The first
index i will take us to the desired nested list and second index j will take us to the desired element in
that nested list.
Copying Lists
The statement list2 = list1 does not create a new list. Rather, it just makes list1 and list2 refer to the
same list object. Here list2 actually becomes an alias of list1. Therefore, any changes made to either
of them will be reflected in the other list.
We can also create a copy or clone of the list as a distinct object by three methods:
Method 1
We can slice our original list and store it into a new variable: newList = oldList[:]
Method 2
Method 3
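The difference between an alias and a copy can be sketched as below. Slicing is Method 1 from the notes; the list() constructor and copy.copy() are two other common ways to clone a list and are shown here as assumptions, since the notes do not name Methods 2 and 3.

```python
import copy

old_list = [1, 2, 3]
alias = old_list                  # same object, not a copy
by_slice = old_list[:]            # Method 1: slicing
by_list = list(old_list)          # assumed alternative: list() constructor
by_copy = copy.copy(old_list)     # assumed alternative: copy module

alias.append(4)                   # changes old_list as well

print(old_list, by_slice, by_list, by_copy)
```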
(A) Elements of the original list may be changed, i.e. changes made to the list in the function are
reflected back in the calling function.
(B) If the list is assigned a new value inside the function then a new list object is created and it
becomes the local copy of the function. Any changes made inside the local copy of the
function are not reflected back to the calling function.
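Cases (A) and (B) can be sketched with two small functions. The names modify_in_place, reassign and data are hypothetical and used only for illustration.

```python
def modify_in_place(lst):
    lst.append(99)         # case (A): the caller's list is changed

def reassign(lst):
    lst = [0, 0, 0]        # case (B): lst now names a new local list
    lst.append(1)          # affects only the local list

data = [1, 2]
modify_in_place(data)
after_a = list(data)       # snapshot after case (A)
reassign(data)
after_b = list(data)       # snapshot after case (B)

print(after_a, after_b)
```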
TUPLES
A tuple is an ordered sequence and can contain elements of different data types. Elements of a tuple
are enclosed in parenthesis (round brackets) and are separated by commas.
If there is only a single element in a tuple then the element should be followed by a comma. A
sequence (comma separated values) without parenthesis is treated as tuple by default.
We generally use list to store elements of the same data types whereas we use tuples to store
elements of different data types.
Tuple is an immutable data type. However an element of a tuple may be of mutable type.
List is mutable but tuple is immutable. So iterating through a tuple is faster as compared to a list.
If we have data that does not change then storing this data in a tuple will make sure that it is not
changed accidentally.
Concatenation operator can also be used for extending an existing tuple. When we extend a tuple
using concatenation a new tuple is created.
sorted() Takes elements in the tuple and returns a new sorted list.
sorted() does not make any change to the original tuple
Tuple Assignment
allows a tuple of variables on the left side of the assignment operator to be assigned respective
values from a tuple on the right side. The number of variables on the left should be same as the
number of elements in the tuple.
If there is an expression on the right side then first that expression is evaluated and finally the result
is assigned to the tuple.
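A minimal sketch of tuple assignment, including the single-element rule mentioned earlier. All values and names here are assumed examples.

```python
num1, num2 = 10, 20
num1, num2 = num2, num1                       # right side is evaluated first: a swap
total, product = num1 + num2, num1 * num2     # expressions evaluated, then assigned

single = (5,)          # trailing comma makes a one-element tuple
not_a_tuple = (5)      # just the integer 5
packed = 1, 2, 3       # comma separated values form a tuple

try:
    a, b = 1, 2, 3     # number of variables does not match
    mismatch = None
except ValueError as err:
    mismatch = type(err).__name__

print(num1, num2, total, product, single, not_a_tuple, packed, mismatch)
```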
DICTIONARIES
The key-value pair is called an item. A key is separated from its value by a colon(:) and consecutive
items are separated by commas. Items in dictionaries were unordered in older versions of Python, so the
data might not come back in the order in which it was entered; since Python 3.7, a dictionary preserves
the order in which items were inserted.
A dictionary is enclosed in curly braces. The keys in the dictionary must be unique and should be of
any immutable data type. The values can be repeated and can be of any data type.
dict1 = {}
dict1=dict()
The items of a dictionary are accessed via the keys. Each key serves as the index and maps to a value.
The order of items does not matter. If the key is not present in the dictionary we get KeyError
The existing dictionary can be modified by just overwriting the key-value pair.
Method 1
for key in dict1:
    print(key,':',dict1[key])
Method 2
for key,value in dict1.items():
    print(key,':',value)
Dictionary Methods
get() Returns the value corresponding to the key passed as the argument
If the key is not present in the dictionary it will return None
you can choose what message to display if the key is not
present
update(new_dict) adds the key-value pairs of the dictionary passed as the argument to the
given dictionary; if a key already exists, its value is overwritten
max(), min() work with keys so the keys must be of the same data type
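The dictionary methods above can be tried on a small example. The dictionary marks and its contents are hypothetical.

```python
marks = {"Amit": 82, "Bina": 91}

found = marks.get("Bina")                    # value for an existing key
absent = marks.get("Chirag")                 # None when the key is missing
custom = marks.get("Chirag", "not found")    # chosen message instead of None

marks.update({"Bina": 95, "Dev": 70})        # overwrites Bina, adds Dev

largest_key = max(marks)                     # max() and min() compare the keys
smallest_key = min(marks)

for key, value in marks.items():
    print(key, ':', value)
```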
SETS
A set is a mutable collection of distinct immutable values that are unordered. It is written in
curly braces {}
You can only add a value that is immutable (like a string or a tuple) to a set. For example,
you would get a TypeError if you try to add a list to a set.
1) set_name.remove(element)
If you try to remove an element which is not present in the set, you will get a
KeyError.
2) set_name.discard(element)
No error if the element is not present in the set.
3) Using pop()- returns and removes an arbitrary element
Raises a KeyError if the set is empty
4) clear() removes all elements from the set and we obtain an empty set
update()- takes one or more iterables as arguments, each of which can be a set, list, tuple, or dictionary
(for a dictionary, the keys are added); automatically converts other data types to a set and adds them to the set.
set1.update(set2)
The sorted function can be used to get the values of set in an ordered fashion
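A short sketch of the set methods described above. The set colours and its elements are assumed examples.

```python
colours = {"red", "green"}

colours.discard("blue")            # no error for a missing element
try:
    colours.remove("blue")         # remove() raises KeyError instead
    removed = True
except KeyError:
    removed = False

colours.update(["green", "blue"], ("white",))   # duplicates are ignored
ordered = sorted(colours)                        # sorted() returns a list

try:
    colours.add(["black"])         # a list is mutable, so it cannot be added
    added_list = True
except TypeError:
    added_list = False

print(ordered, removed, added_list)
```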
Set Operations
• One of the main advantages of using sets in Python is that they are highly optimized
for membership tests.
• For example, sets do membership tests a lot more efficiently than lists.
• average case time complexity of membership tests in sets are O(1) vs O(n) for
lists.
Subsets
While nested lists and tuples are possible, nested sets are not possible since a set cannot
contain a mutable element.
Frozen Set
Importance of Functions
Functions provide a systematic way of problem solving by dividing the given problem into
several subproblems, finding their individual solutions and integrating the solutions of
individual problems to solve the original problem. This approach of problem solving is called
Stepwise Refinement/Modular approach/Divide & conquer approach.
Advantages of Functions:
1. Functions increase readability- the program is better organized and easier to understand
Types of Functions:
1) Built-in functions: pre-defined functions that are already built in/available in the Python
library and are frequently used in programs
2) Functions in a module
-To use these modules in a program, the programmer needs to import the module.
-From statement: used to import only the required functions from the module into the current
namespace instead of the module name itself (the whole module is still loaded)
from module_name import * (to import all the functions of a particular module,
equivalent to importing the entire module)
Here, dot notation is not required, and we can directly use function_name()
-The items enclosed in "[ ]" are called parameters and they are optional.
-Function names should be unique. Rules for naming identifiers also apply for function
naming.
-The statements outside the function indentation are not considered as part of the function.
Parameter- value provided in the parenthesis when we write function header, it is required
by the function to work, also known as formal parameter or formal argument
Argument- value passed to the function when it is called, it is provided in function call/invoke
statement, also known as actual argument or actual parameter
An argument is a value passed to the function during the function call which is received in
corresponding parameter defined in function header.
1) Default parameters
-A default value is a value that is pre-decided and assigned to the parameter when the
function call does not have its corresponding argument.
-If an argument is passed for a default parameter, then the value of the parameter gets over-
written to that of the argument passed in the function call.
-The default parameters must be the trailing parameters in the function header, i.e., if any
parameter has a default value, then all the other parameters to its right must also have
default values. Default parameters cannot be followed by positional parameters.
2) Positional parameters
When positional parameters are defined in the function header, the no. of (required)
arguments must be equal to the no. of parameters, or else it would lead to an error. The
arguments get assigned in the same order as the defined parameters.
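Default and positional parameters can be illustrated with one function. The function simple_interest and its parameter names are hypothetical.

```python
def simple_interest(principal, time, rate=5):
    """rate is a default parameter, so it must come last in the header."""
    return principal * time * rate / 100

with_default = simple_interest(1000, 2)       # rate takes its default value 5
overridden = simple_interest(1000, 2, 10)     # argument overwrites the default

try:
    simple_interest(1000)                     # required argument 'time' is missing
    too_few = None
except TypeError as err:
    too_few = type(err).__name__

print(with_default, overridden, too_few)
```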
A function may or may not return a value when called. The return statement returns the
values from the function. Functions which do not return any value are called void functions.
We can use a return statement to send value(s) from the function to its calling function.
Flow of Execution
-The order in which the statements in a program are executed is called flow of execution.
The Python interpreter starts executing the instructions in a program from the first statement.
The statements are executed one by one, in the order of appearance from top to bottom.
-When the interpreter encounters a function definition, the statements inside the function are
not executed until the function is called.
-When the interpreter encounters a function call, the control jumps to the called function and
executes the statement of that function.
-The execution of statements inside the function stops at the last statement, or at the return
statement if it comes first. Anything written after the return statement inside the function will
NOT be executed.
-After that, the control comes back to the point of function call so that the remaining
statements in the program can be executed.
Scope of a Variable
A variable defined inside a function cannot be accessed outside it. Every variable has a well-
defined accessibility. The part of the program where a variable is accessible is defined as
the scope of that variable.
- a name declared in the top-level segment (__main__) of a program has global scope and can
be used in the entire program.
-Any change made to the global variable is permanent and affects all the functions in the
program where that variable can be accessed.
- If a variable with the same name as the global variable is defined inside a function, then it
is considered local to that function and hides the global variable.
- If a global variable is to be modified inside a function so that the modified value can be used
outside the function, the variable must be declared with the keyword global inside the function
(e.g. global x) before it is assigned.
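The scope rules above can be sketched as follows, assuming the code runs as a top-level script. The names counter, increment and shadow are hypothetical.

```python
counter = 0                # global variable

def increment():
    global counter         # refer to the global name, do not create a local one
    counter += 1

def shadow():
    counter = 100          # a local variable that hides the global one
    return counter

increment()
increment()
shadow_value = shadow()

print(counter, shadow_value)
```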
Lifetime of a variable
-for global variables, the lifetime is the entire program run, i.e., as long as the program is
executing
-for local variables, the lifetime is their function’s run, i.e., as long as the function is
executing.
For every name reference, Python (interpreter) follows the steps below:
1) It checks within the LOCAL environment/namespace, whether there is a variable with the
same name. If yes, Python uses its value, otherwise, it moves to step 2.
2) It checks the ENCLOSING environment for a variable of the same name; if found, Python
uses its value. If the variable is not found in the current environment, Python repeats this
step in higher-level enclosing environments, if any. Otherwise, it moves to Step 3.
3) Next, it checks the GLOBAL environment for a variable of the same name; if found,
Python uses its value otherwise, it moves to step 4.
4) Next, it checks the BUILT-IN environment for a variable of the same name; if found,
Python uses its value. Otherwise, Python reports a NameError (the name is not defined).
-Python passes the reference of the object to the function. An in-place change to a mutable data type
passed in the function does not change the memory address it is referring to, so the change is made to
the original object (behaves like pass by reference).
-Any change in the value of an immutable data type passed in the function binds the parameter to a new
object at a different memory address; the original object is not changed (behaves like pass by value).
-Changes in an immutable data type done within the function are never reflected in the
function call.
-Changes in a mutable data type done within the function are reflected in the function call,
unless it (the parameter) is assigned a different value or a different data type; or another
variable with a different value is assigned to it.
bin()- converts and returns the binary equivalent string of a given integer
bool([n])- converts a value to Boolean; It's not mandatory to pass a value to bool(). If you do not pass
a value, bool() returns False. In general use, bool() takes a single parameter value.
chr()- returns a character (a string) from an integer (represents Unicode code point of the character),
e.g. chr(65) returns A
ord()- returns an integer representing the Unicode character, e.g. ord("A") returns 65
complex()- returns a complex number when real and imaginary parts are provided, or it converts a
string to a complex number; e.g. complex(a,b) returns a+bj, complex("a+bj") returns a+bj, complex(a)
returns a+0j
divmod()- takes two numbers and returns a pair of numbers (a tuple) consisting of their quotient and
remainder
eval()- parses the expression passed to this method and runs python expression (code) within the
program
exec()- executes the dynamically created program, which is either a string or a code object
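The built-in functions listed above can be tried directly. The argument values are assumed examples.

```python
binary = bin(10)                    # binary equivalent as a string
empty_bool = bool()                 # no argument
some_bool = bool(5)
letter = chr(65)
code = ord("A")
from_parts = complex(2, 3)
from_text = complex("2+3j")         # the string must not contain spaces
quotient_remainder = divmod(17, 5)
evaluated = eval("2 + 3 * 4")       # evaluates the expression in the string

print(binary, empty_bool, some_bool, letter, code,
      from_parts, from_text, quotient_remainder, evaluated)
```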
The & operator performs a bitwise AND operation. When you perform a bitwise AND
operation with 1, it effectively checks whether the number is odd or even. If a number is odd,
the least significant bit is always 1 in binary representation, so the result of x & 1 will be 1. If
the number is even, the least significant bit is 0, so the result will be 0.
The bitwise AND operation with 1 checks the least significant bit (rightmost bit) of each
number in the list.
Odd numbers (1, 3, 5) have a 1 in the least significant bit, so the & operation with 1
results in 1.
Even numbers (2, 4) have a 0 in the least significant bit, so the & operation with 1
results in 0.
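The explanation above refers to a list of numbers; a minimal sketch, assuming the list is [1, 2, 3, 4, 5], could look like this.

```python
numbers = [1, 2, 3, 4, 5]                   # assumed sample list

lsb = [x & 1 for x in numbers]              # least significant bit of each number
odds = [x for x in numbers if x & 1]        # x & 1 is 1 for odd numbers
evens = [x for x in numbers if not x & 1]   # x & 1 is 0 for even numbers

print(lsb, odds, evens)
```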
Lambda function
#small anonymous function, does not have any name
#used for one-line functions
#use lambda instead of def
#syntax:- lambda argument(s):expression
x=lambda a:a*2
print(x(4))
b=6
w=lambda a,c : a+b+c
print(w(5,2))
def fn(n):
    a=lambda :n
    print(a())
fn(2)
Recursion
A function is said to be recursive if it calls itself.
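Factorial is a common first example of recursion. The function below is a sketch for illustration and assumes n is a non-negative integer.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n <= 1:                        # base case stops the recursion
        return 1
    return n * factorial(n - 1)       # the function calls itself

print(factorial(5))
```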
LIST COMPREHENSION
A list comprehension consists of an expression followed by the for statement inside square
brackets.
numbers = [number*number for number in range(1, 6)]
Every list comprehension can be rewritten in for loop, but every for loop can’t be rewritten in
the form of list comprehension.
Flattening a list
• In Python, a list of lists (or cascaded lists) resembles a two-dimensional array
• Hence, flattening such a list of lists means getting elements of sublists into a one-
dimensional array-like list.
• flatlist=[element for sublist in nestedlist for element in sublist]
• E.g., [[1,2,3],[4,5,6],[7,8,9]] is flattened to [1,2,3,4,5,6,7,8,9]
list1=[[1,2,3],[4,5,6],[7,8,9]]
list2=[j for i in list1 for j in i]
print(list2)
• The zip() function takes in iterables (an object capable of returning its members one at a
time) as arguments and returns an iterator.
• This iterator generates a series of tuples containing elements from each iterable (list,
set, tuples, file etc.)
• The zip() function returns a zip object, which is an iterator of tuples where the first
item in each passed iterator is paired together, and then the second item in each
passed iterator are paired together etc.
Syntax: zip(iterator1, iterator2, iterator3 ...)
Note: If the passed iterators have different lengths, the iterator with the least items
decides the length of the new iterator.
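A minimal sketch of zip() with iterables of different lengths. The lists names and scores are hypothetical.

```python
names = ["Amit", "Bina", "Chirag"]
scores = [82, 91]                    # shorter iterable decides the length

pairs = list(zip(names, scores))     # the zip object is consumed into a list
lookup = dict(zip(names, scores))    # pairs can also build a dictionary

print(pairs, lookup)
```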
Mapping
• Map in Python is a function that works as an iterator to return a result after applying a
function to every item of an iterable (tuple, lists, etc.).
• It is used when you want to apply a single transformation function to all the iterable
elements.
• The iterable and function are passed as arguments to the map in Python.
• The syntax of the Python map() function is: map(function, iterable)
• It loops over each item of an iterable and applies the transformation function to it.
• Then, it returns a map object that stores the value of the transformed item.
• The input function can be any callable function, including built-in functions, lambda
functions, user-defined functions, classes, and methods.
We can also pass more than one iterable, e.g. the function pow requires 2 inputs. The final
iterable is only as long as the shortest iterable.
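The points above can be sketched with a lambda, a built-in function, and pow with two iterables. The input lists are assumed examples.

```python
squares = list(map(lambda x: x * x, [1, 2, 3]))   # lambda as the function
lengths = list(map(len, ["a", "bcd", "ef"]))      # built-in as the function
powers = list(map(pow, [2, 3, 4], [3, 2]))        # two iterables; stops at the shorter

print(squares, lengths, powers)
```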
os.listdir()
function of the os module used to get the list of all the files and/or directories in the specified
directory; if no argument is passed, returns the list of files and/or directories in the current working directory
Suggested Read (Important): Python pandas tutorial: The ultimate guide for beginners |
DataCamp
-df.set_index() method is used to assign a list, series, or another data frame as the index of a given
data frame
•Pandas’ functionality includes data transformations, like sorting rows and taking subsets, to
calculating summary statistics such as the mean, reshaping DataFrames, and joining
DataFrames together etc.
[PIP is a package management system that installs and manages software packages written
in Python. It stands for "Preferred Installer Program" (or "Pip Installs Packages") ]
Use of Pandas
•Import datasets from databases, spreadsheets, comma-separated values (CSV) files, and
more.
•Tidy datasets by reshaping their structure into a suitable format for analysis.
•Aggregate data by calculating summary statistics such as the mean of columns, correlation
between them, and more.
•pandas also contains functionality for time series analysis and analyzing text data.
• Series
• DataFrame
• Panel (removed from recent versions of pandas)
A Series is essentially a column, and a DataFrame is a two-dimensional table made up of
a collection of Series.
Series
•Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, python objects, etc.).
•Pandas Series can be created by loading the datasets from existing storage; storage can be
SQL Database, CSV file, and Excel file.
•Pandas Series can be created from lists, dictionary, and from a scalar value etc.
Creating a series
import pandas
import numpy
ser = pandas.Series()
print(ser)
# simple array
ser = pandas.Series(data)
print(ser)
-import pandas
DataFrame
• Pandas DataFrame consists of three principal components, the data, rows, and columns.
Creating DataFrames
purchases = pandas.DataFrame(data)
purchases
Note: Each (key, value) item in the data corresponds to a column in the resulting Data Frame.
•Use read_csv() with the path to the CSV file to read a comma-separated values file
•Example: df=pandas.read_csv("diabetes.csv")
Note: CSVs don't have indexes like our DataFrames, so all we need to do is just designate the
index_col when reading:
df=pd.read_csv("diabetes.csv",index_col=0)
df=pd.read_csv(r'C:\Users\vivek\Documents\SRCC\Python/'+'diabetes.csv')
[To be done if the file is not present in the current working directory]
More information on read_csv() function (Important): pandas read_csv() Tutorial: Importing Data
| DataCamp
• The separator argument refers to the symbol used to separate the values (columns) within each row of the file.
• Comma (sep = ","), whitespace (sep = r"\s+"), tab (sep = "\t"), and colon (sep = ":") are the commonly
used separators.
• df = pd.read_csv("diabetes.txt", sep=r"\s+")
•df = pd.read_excel('diabetes.xlsx')
•df = pd.read_excel('diabetes_multi.xlsx', sheet_name=1) #to import data from the second sheet in the
workbook (sheet positions start at 0)
Outputting a dataframe
•df.to_csv("diabetes_out.csv", index=False)
•The arguments include the filename with path and index – where index = True implies writing the
DataFrame’s index.
•Similarly, json files can be imported and exported as well as excel files.
The first few and last few rows of a dataframe can be read using the head() and tail() function
•df.tail(n=**)
describe() method
•returns the summary statistics of all numeric columns, such as count, mean, standard
deviation, minimum, maximum, and quartiles
•df.describe()
•For example, if we want the 30th, 50th, and 70th percentiles of the numeric columns in DataFrame:
df.describe(percentiles=[0.3, 0.5, 0.7])
info() method
•The .info() method is a quick way to look at the data types, missing values, and data size of a
DataFrame.
•Here, we’re setting the show_counts argument to True, which shows the number of
non-missing values in each column.
•We’re also setting memory_usage to True, which shows the total memory usage of the DataFrame
elements.
•When verbose is set to True, it prints the full summary from .info().
•df.columns
•list(df.columns)
df2=df.copy()
This is done so as not to affect the original dataframe and perform operations on the copy of the
dataframe created.
df[['Pregnancies', 'Outcome']]
• df[df.index==1]
•A single row can be fetched by passing in a boolean series with one True value.
•In the example above, the second row with index = 1 is returned. Here, .index returns the row
labels of the DataFrame, and the comparison turns that into a Boolean one-dimensional array.
•The row or column range can also be ':'. When ':' is given as the row or column range,
all entries are included for the corresponding rows or columns.
•Note: .loc[] uses a label to point to a row, column or cell, whereas .iloc[] uses the numeric position.
df2.loc[1]
df2.iloc[1]
The 1 in .loc[] is the row index label, whereas the 1 in .iloc[] is the row position (the second row, since positions start at 0).
The 'loc' indexer is mainly used when we want to select rows and columns based on their labels;
a label range includes its last element.
iloc is position-based (must pass integers); a position range does not include its last element.
Fetching multiple rows: df2.loc[100:110] will return rows labelled 100 to 110
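The difference between label-based and position-based selection is easiest to see when the row labels do not start at 0. The sketch below uses a made-up DataFrame with labels 10 to 13:

```python
import pandas as pd

# Hypothetical frame whose row labels (10..13) differ from positions (0..3).
df_demo = pd.DataFrame({"Age": [50, 31, 32, 21]}, index=[10, 11, 12, 13])

by_label = df_demo.loc[11, "Age"]      # row whose label is 11
by_position = df_demo.iloc[1]["Age"]   # second row (position 1)

label_slice = df_demo.loc[10:12]       # labels 10, 11 and 12: end label included
position_slice = df_demo.iloc[0:2]     # positions 0 and 1: end position excluded
print(len(label_slice), len(position_slice))
```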
Conditional Slicing
df[df.BloodPressure == 122]
df[df.Outcome == 1]
This code fetches Pregnancies, Glucose, and BloodPressure for all records with BloodPressure
greater than 100.
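The statement this description refers to is not shown in these notes. One way to write such a query, assuming the column names mentioned above and a small made-up DataFrame, is:

```python
import pandas as pd

# Hypothetical rows using the column names from the notes.
df_demo = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 0],
    "Glucose": [148, 85, 183, 137],
    "BloodPressure": [72, 110, 64, 122],
    "Outcome": [1, 0, 1, 1],
})

# The boolean mask picks the rows; the list picks the columns.
high_bp = df_demo.loc[df_demo["BloodPressure"] > 100,
                      ["Pregnancies", "Glucose", "BloodPressure"]]
print(high_bp)
```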
df2.loc[df['Age']==81, ['Age']] = 80
This statement updates values of “Age” column in df2 (copy of df) to 80 at the location of all the
rows in df where “Age” is 81
1. df4.dropna(inplace=True, axis=1)
2. df4.dropna(inplace=True, how="all")
#statement 1 drops every column that contains a missing value; statement 2 drops only the rows in which all values are missing
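The effect of the axis and how arguments of dropna() can be checked on a small made-up DataFrame. This sketch returns new DataFrames instead of using inplace=True so that the three results can be compared side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row is entirely missing.
df_demo = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [4.0, 5.0, np.nan],
    "C": [7.0, 8.0, np.nan],
})

rows_any = df_demo.dropna()            # drop rows with at least one NaN
rows_all = df_demo.dropna(how="all")   # drop rows where every value is NaN
cols_any = df_demo.dropna(axis=1)      # drop columns with at least one NaN
print(len(rows_any), len(rows_all), cols_any.shape)
```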
df1=df.copy()
df1['Glucose_Insulin_Ratio'] = df['Glucose']/df['Insulin']
df1.head()
• Here, for example, we are counting the number of observations where Outcome is diabetic (1) and
the number of observations where the Outcome is non-diabetic (0).
df['Outcome'].value_counts()
• Pandas lets you aggregate values by grouping them by specific column values.
•You can do that by combining the .groupby() method with a summary method of your choice
Syntax: df.groupby(<column_name>)
•The below code displays the mean of each of the numeric columns grouped by Outcome.
df.groupby('Outcome').mean()
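A small sketch of counting and grouping, using made-up Glucose values so the group means are easy to verify by hand:

```python
import pandas as pd

# Hypothetical data: two non-diabetic (0) and two diabetic (1) records.
df_demo = pd.DataFrame({
    "Glucose": [100, 120, 150, 170],
    "Outcome": [0, 0, 1, 1],
})

counts = df_demo["Outcome"].value_counts()   # number of rows per Outcome value
means = df_demo.groupby("Outcome").mean()    # per-group mean of the other columns
print(counts)
print(means)
```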
• It can be used to check if a string contains the specified search pattern or not.
• Python provides a built in module re which can be used to work with regular expression.
• match=re.method_name(pattern,string)
• If the search is successful, search() returns a match object or None object otherwise.
To implement regular expressions, the Python's re package can be used. Import the Python's re
package with the following command: import re
Raw strings
A normal string, when prefixed with 'r' or 'R' becomes a raw string.
The difference between a normal string and a raw string is that escape sequences (such as \n, \t etc.) in a normal
string literal are translated into the characters they stand for, while in a raw string the backslashes are kept as they are.
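A short illustration of the difference, comparing the lengths of a normal and a raw string written with the same characters:

```python
normal = "a\tb"    # \t becomes a single tab character
raw = r"a\tb"      # the backslash and 't' stay as two separate characters

print(len(normal), len(raw))
print(normal)
print(raw)
```

Raw strings are convenient for regular expressions because patterns such as \d or \w can be written without doubling the backslashes.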
Meta Characters
Some characters carry a special meaning when they appear as part of a pattern-matching string.
Python's re module uses the following characters as meta characters: . ^ $ * + ? { } [ ] \ | ( )
When a set of alpha-numeric characters are placed inside square brackets [], the target string is
matched with these characters. A range of characters or individual characters can be listed in the
square bracket.
'\'is an escaping metacharacter followed by various characters to signal various special sequences. If
you need to match a [ or \, you can precede them with a backslash to remove their special
meaning: \[ or \\.
You can also specify a range of characters using - inside square brackets.
• [0-9] is the same as [0123456789]. You can complement (invert) the character set by using the caret ^
symbol at the start of a square bracket.
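The character-set rules above can be tried out with re.findall(). The sample text in this sketch is made up for illustration:

```python
import re

text = "Room 42, Floor 7"

digits = re.findall(r"[0-9]", text)        # any single digit
non_digits = re.findall(r"[^0-9 ]", text)  # anything except digits and spaces
brackets = re.findall(r"\[", "a[1]")       # escaped [ matches a literal bracket
print(digits, non_digits, brackets)
```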
There are some of the Special sequences that make commonly used patterns easier to write. Below
is a list of such special sequences:
re.match()- This function in re module tries to find if the specified pattern is present at the beginning
of the given string.
Syntax: re.match(pattern,string)
This function returns None if no match can be found. If they’re successful, a match object instance is
returned, containing information about the match: where it starts and ends, the substring it
matched, etc.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.match("Simple",string)
>>> obj
<re.Match object; span=(0, 6), match='Simple'>
>>> obj.start()
0
>>> obj.end()
6
The match object's start() method returns the starting position of pattern in the string, and end()
returns the endpoint. If the pattern is not found, the match object is None.
re.search():
This function searches for first occurrence of RE pattern within string from any position of the string
but it only returns the first occurrence of the search pattern.
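>>> import re
>>> string="Simple is better than complex."
>>> obj=re.search("better",string)  # "better" is an illustrative pattern
>>> obj.start()
>>> obj.end()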
re.findall():
It helps to get a list of all matching patterns. The return object is the list of all matches.
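>>> import re
>>> string="Simple is better than complex."
>>> obj=re.findall("ple",string)
>>> obj
['ple', 'ple']
>>> obj=re.findall(r"\w",string)
>>> obj
['S', 'i', 'm', 'p', 'l', 'e', 'i', 's', 'b', 'e', 't', 't', 'e', 'r', 't', 'h', 'a', 'n', 'c', 'o', 'm', 'p', 'l', 'e', 'x']
>>> obj=re.findall(r"\w*",string)
>>> obj
['Simple', '', 'is', '', 'better', '', 'than', '', 'complex', '', '']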
re.split():
This function helps to split string by the occurrences of given pattern. The returned object is the list
of slices of strings.
>>> import re
>>> string="Simple is better than complex."
>>> re.split(' ',string)
['Simple', 'is', 'better', 'than', 'complex.']
The string is split at each occurrence of a white space ' ' returning list of slices, each corresponding to
a word. Note that output is similar to split() function of built-in str object.
re.sub():
This function returns a string by replacing a certain pattern by its substitute string.
In the example below, the word 'is' gets substituted by 'was' everywhere in the target string.
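>>> string="Simple is better than complex. Complex is better than complicated."
>>> obj=re.sub("is","was",string)
>>> obj
'Simple was better than complex. Complex was better than complicated.'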
A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character.
• ? The question mark indicates zero or one occurrences of the preceding element. For example,
colou?r matches both "color" and "colour".
• * The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c
matches "ac", "abc", "abbc", "abbbc", and so on.
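The two quantifier examples above can be checked with re.fullmatch(), which succeeds only when the whole string matches the pattern. The extra test words ("colouur", "adc") are added here as counter-examples:

```python
import re

# ? : zero or one 'u'
colour_hits = [w for w in ["color", "colour", "colouur"]
               if re.fullmatch(r"colou?r", w)]

# * : zero or more 'b'
abc_hits = [w for w in ["ac", "abc", "abbbc", "adc"]
            if re.fullmatch(r"ab*c", w)]

print(colour_hits, abc_hits)
```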
MATPLOTLIB
• Matplotlib is a powerful plotting library in Python used to create static, animated, and
interactive visualizations.
• Matplotlib is popular due to its ease of use, extensive documentation, and wide range
of plotting capabilities.
• Many other packages build on Matplotlib for data visualization, including pandas and
seaborn.
In Matplotlib, a figure is the top-level container that holds all the elements of a plot.
• Marker
• Lines to Figures
• Matplotlib Title
• Axis labels
• Legend
• Gridlines
• The package is imported into the Python script by adding the following statement: import matplotlib.pyplot as plt
Pyplot in Matplotlib
• Each pyplot function makes some changes to a figure: e.g., creates a figure, creates
a plotting area in a figure, plots some lines in a plotting area, decorates the plot with
labels, etc.
• The various plots we can utilize using Pyplot are Line Plot, Histogram, Scatter, 3D
Plot, Image, Contour, and Polar
• Use plot() to plot the graph. This function is used to draw the graph. It takes x value,
y value, format string(line style and color) as an argument.
• Use show() to show the graph window. This function is used to display the graph. It
does not take any argument.
• Use title() to give title to graph. It takes string to be displayed as title as argument.
• Use xlabel() to give label to x-axis. It takes string to be displayed as label of x-axis as
argument.
• Use ylabel() to give label to y-axis. It takes string to be displayed as label of y-axis as
argument.
• The subplot() function allows you to plot different things in the same figure. Its first
argument specifies the number of rows in the grid, the second specifies the number of columns, and the third
specifies the index of the active subplot (starting at 1).
• Use the bar() function to draw a bar graph in place of a line graph.
E.g. plt.bar(x, y, color = 'g', align = 'center')
• Use the hist() function for a graphical representation of the frequency distribution of data.
Each class interval, called a bin, is drawn as a rectangle whose width is the bin width and whose
height corresponds to the frequency. It takes the input array and bins as two
parameters. When bins is given as an array, its successive elements act as the boundaries of each bin.
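The pyplot functions listed above can be combined as in the following sketch. The data values are made up, and the non-interactive "Agg" backend with an in-memory PNG is an assumption made here so the example runs without a display; in an interactive session plt.show() would be used instead.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.subplot(1, 3, 1)                      # 1 row, 3 columns, first subplot active
plt.plot(x, y, "r--")                     # format string: red dashed line
plt.title("Line plot")
plt.xlabel("x")
plt.ylabel("x squared")

plt.subplot(1, 3, 2)                      # second subplot active
plt.bar(x, y, color="g", align="center")
plt.title("Bar chart")

plt.subplot(1, 3, 3)                      # third subplot active
plt.hist(y, bins=[0, 5, 10, 20])          # bin edges: 0-5, 5-10, 10-20
plt.title("Histogram")

fig = plt.gcf()                           # the figure that holds all three plots
png = io.BytesIO()
fig.savefig(png, format="png")            # render the figure to PNG bytes
```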
The seaborn library in Python
• Seaborn is a library mostly used for statistical plotting in Python.
• It is built on top of Matplotlib and provides beautiful default styles and color palettes
to make statistical plots more attractive.
Heatmap
• Heatmap is defined as a graphical representation of data using colours to visualize
the value of the matrix.
• Typically, more common values or higher activity are shown in brighter, often reddish, colours, while
less common values or lower activity are shown in darker colours. The exact mapping depends on the
chosen colormap.
seaborn.heatmap()
Syntax: seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, annot=None, fmt='.2g',
annot_kws=None, linewidths=0, linecolor='white', cbar=True, **kwargs)
Important Parameters:
• data: 2D dataset that can be coerced into an ndarray.
• vmin, vmax: Values to anchor the colormap, otherwise they are inferred from the
data and other keyword arguments.
• cmap: The mapping from data values to color space.
• center: The value at which to center the colormap when plotting divergent data.
• annot: If True, write the data value in each cell.
• fmt: String formatting code to use when adding annotations.
• linewidths: Width of the lines that will divide each cell.
• linecolor: Color of the lines that will divide each cell.
• cbar: Whether to draw a colorbar.
All the parameters except data are optional.
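A minimal sketch of a heatmap call using several of the parameters above. The 3x3 matrix, the "viridis" colormap, and the "Agg" backend are choices made here for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import seaborn as sns

# Hypothetical 3x3 matrix of integer values.
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

ax = sns.heatmap(
    data,
    vmin=0, vmax=10,       # anchor the colour scale
    cmap="viridis",        # colormap name
    annot=True, fmt="d",   # write each integer value inside its cell
    linewidths=0.5, linecolor="white",
    cbar=True,
)
```

heatmap() returns the Matplotlib Axes it drew on, so titles and labels can be added with the usual Axes methods.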
Suggested Reads
• Neural Data Science in Python — Neural Data Science in Python
• Python Plotting With Matplotlib (Guide) – Real Python
• Getting Started with Python Matplotlib – An Overview – GeeksforGeeks
• Python Seaborn Tutorial – GeeksforGeeks
• Subplots in Python (Matplotlib Subplots - How to create multiple plots in same figure
in Python? - Machine Learning Plus)
NUMPY
Introduction
• NumPy stands for Numerical Python which is a Python package developed by Travis Oliphant in
2005.
In Python, lists are commonly used in place of arrays, but they are slow to process. NumPy offers the following
advantages for data analysis.
• It is capable of performing Fourier Transform and reshaping the data stored in multidimensional
arrays.
• NumPy provides the in-built functions for linear algebra and random number generation.
Arrays in NumPy
An array in NumPy is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
In NumPy, the number of dimensions of the array is called rank of the array.
A tuple of integers giving the size of the array along each dimension is known as shape of the array.
The array class in NumPy is called ndarray.
Elements in NumPy arrays are accessed by using square brackets and can be initialized by using
nested Python Lists.
Data type objects (dtype): It is an instance of numpy.dtype class. It describes how the bytes in the
fixed-size block of memory corresponding to an array item should be interpreted.
1. numpy.array(): The Numpy array object in Numpy is called ndarray. We can create ndarray
using numpy.array() function. Syntax: numpy.array(parameter)
2. numpy.fromiter(): The fromiter() function create a new one-dimensional array from an
iterable object. Syntax: numpy.fromiter(iterable, dtype, count=-1)
3. numpy.arange(): This is an inbuilt NumPy function that returns evenly spaced values within a
given range. Syntax: numpy.arange([start, ]stop, [step, ]dtype=None)
4. numpy.linspace(): This function returns evenly spaced numbers over a specified interval between
two limits. Syntax: numpy.linspace(start, stop, num=50) -> start : start of interval
range (required) -> stop : end of interval range, included by default -> num : [int, optional] No. of
samples to generate. By default num = 50
5. numpy.empty(): This function creates a new array of given shape and type, without initializing
its values. Syntax: numpy.empty(shape, dtype=float, order='C')
The shape is given as an int or a sequence such as [x,y], where x is the number of rows and y is the number of columns.
order: {'C', 'F'}(optional)
This parameter defines the order in which the multi-dimensional array is going to be stored
either in row-major or column-major. By default, the order parameter is set to 'C'.
6. numpy.ones(): This function is used to get a new array of given shape and type, filled with
ones(1). Syntax: numpy.ones(shape, dtype=None, order='C')
7. numpy.zeros(): This function is used to get a new array of given shape and type, filled with
zeros(0). Syntax: numpy.zeros(shape, dtype=float, order='C')
• logspace()
• asarray()
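The creation functions listed above can be compared side by side. The input values in this sketch are arbitrary:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])                    # from nested lists
b = np.fromiter((i * i for i in range(5)), dtype=int)   # from an iterable
c = np.arange(0, 10, 2)                                 # start, stop (excluded), step
d = np.linspace(0.0, 1.0, num=5)                        # 5 samples, stop included
z = np.zeros((2, 3))                                    # floats by default
o = np.ones((2, 3), dtype=int)                          # explicit integer type

print(a.ndim, a.shape)   # rank and shape of the 2-D array
print(b, c, d)
```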
• In a numpy array, indexing or accessing the array index can be done in multiple ways.
• To access a range of elements, slicing is used. Slicing an array selects a range of elements from the
original array.
• Since a sliced array is a view of the original array's data, modifying content through the
sliced array modifies the original array content.
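The following sketch shows a slice writing through to the array it came from, and how copy() avoids that:

```python
import numpy as np

original = np.arange(10)

view = original[2:5]      # basic slicing returns a view, not a copy
view[0] = 99              # writes through to the original array

independent = original[2:5].copy()
independent[1] = -1       # changes only the copy

print(original)
```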
In numpy, arrays allow a wide range of operations which can be performed on a particular array or a
combination of Arrays. These operation include some basic Mathematical operation as well as Unary
and Binary operations.
Math Operations on DataType array
In Numpy arrays, basic mathematical operations are performed element-wise on the array. These
operations are applied both as operator overloads and as functions. Many useful functions are
provided in Numpy for performing computations on Arrays such as sum: for addition of Array
elements, T: for Transpose of elements, etc.
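A short sketch of element-wise arithmetic, written both with operators and with the equivalent functions, followed by sum and T. The arrays are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b             # operator form, element by element
total_fn = np.add(a, b)   # function form of the same operation
product = a * b           # element-wise product, not a matrix product

print(total, product, a.sum(), a.T, sep="\n")
```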
Trigonometric Functions- NumPy has standard trigonometric functions which return trigonometric
ratios for a given angle in radians.
arcsin, arccos, and arctan functions return the trigonometric inverse of sin, cos, and tan of the given
value, as an angle in radians. The result of these functions can be verified with the numpy.degrees() function,
which converts radians to degrees.
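As a quick check of the round trip between angles and ratios, this sketch converts degrees to radians, takes the sine, and recovers the angles with arcsin and numpy.degrees():

```python
import numpy as np

angles_deg = np.array([0.0, 30.0, 90.0])

sines = np.sin(np.radians(angles_deg))      # sin expects radians
recovered = np.degrees(np.arcsin(sines))    # arcsin returns radians

print(sines)
print(recovered)
```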
Functions for Rounding
1. numpy.around()
This is a function that returns the value rounded to the desired precision. The function takes
the following parameters. numpy.around(a,decimals)
2. numpy.floor()
This function returns the largest integer not greater than the input parameter. The floor of
the scalar x is the largest integer i, such that i <= x. Note that flooring always rounds towards
negative infinity, so negative values are rounded away from 0 (for example, the floor of -1.7 is -2).
3. numpy.ceil()
The ceil() function returns the ceiling of an input value, i.e. the ceil of the scalar x is the
smallest integer i, such that i >= x.
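The three rounding functions behave differently on negative numbers, which the following sketch makes visible. The sample values are arbitrary:

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.7])

floors = np.floor(values)   # largest integers <= each value
ceils = np.ceil(values)     # smallest integers >= each value
rounded = np.around(np.array([1.234, 5.678]), decimals=1)

print(floors, ceils, rounded)
```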
Numpy-Array- Attributes
• ndarray.shape: This array attribute returns a tuple consisting of array dimensions. It can also be
used to resize the array.
• ndarray.itemsize: This array attribute returns the length of each element of array in bytes.
• Contents of ndarray object can be accessed and modified by indexing or slicing, just like Python's
in-built container objects.
• A Python slice object is constructed by giving start, stop, and step parameters to the built-in slice
function.
• This slice object is passed to the array to extract a part of array.
• The same result can also be obtained by giving the slicing parameters separated by a colon :
(start:stop:step) directly to the ndarray object, i.e.
import numpy as np
a = np.arange(10)   # sample array, assumed here for illustration
s = a[2:5:1]
print(s)
• If only one parameter is put, a single item corresponding to the index will be returned.
• Slicing can also include ellipsis (…) to make a selection tuple of the same length as the dimension of
an array. If ellipsis is used at the row position, it will return an ndarray comprising of items in rows.
Numpy Array Reshape
Reshaping means changing the shape of an array. The shape of an array is the number of elements in
each dimension. By reshaping we can add or remove dimensions or change number of elements in
each dimension.
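A small sketch of reshape() on a 12-element array. Passing -1 for one dimension asks NumPy to infer it from the total number of elements:

```python
import numpy as np

a = np.arange(12)            # 1-D array with 12 elements

m = a.reshape(3, 4)          # 2-D: 3 rows, 4 columns
t = a.reshape(2, -1, 2)      # 3-D: the middle dimension is inferred as 3
flat = m.reshape(-1)         # back to 1-D

print(m.shape, t.shape, flat.shape)
```

The total number of elements must stay the same, so a.reshape(5, 3) would raise a ValueError here.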
Iterating means going through elements one by one. As we deal with multi-dimensional arrays in
numpy, we can do this using basic for loop of python. If we iterate on a 1-D array it will go through
each element one by one.
Numpy Array Join
Joining means putting contents of two or more arrays in a single array. In NumPy we join arrays by
axes. We pass a sequence of arrays that we want to join to the concatenate() function, along with the
axis. If axis is not explicitly passed, it is taken as 0 (row-wise).
Splitting is reverse operation of Joining. Joining merges multiple arrays into one and Splitting breaks
one array into multiple. We use array_split() for splitting arrays, we pass it the array we want to split
and the number of splits.
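Joining and splitting can be sketched as follows, using small made-up arrays. array_split() also accepts a number of splits that does not divide the array evenly:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

rows = np.concatenate((a, b))             # axis=0 by default: stack as extra rows
cols = np.concatenate((a, b.T), axis=1)   # axis=1: add b as an extra column

parts = np.array_split(np.arange(7), 3)   # 7 elements into 3 pieces
print(rows.shape, cols.shape, [p.tolist() for p in parts])
```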
You can search an array for a certain value, and return the indexes that get a match. To search an
array, use the where() method.
Sorting means putting elements in an ordered sequence. Ordered sequence is any sequence that has
an order corresponding to elements, like numeric or alphabetical, ascending or descending. The
NumPy ndarray object has a function called sort(), that will sort a specified array.
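Searching and sorting can be illustrated on one small array. Note that numpy.sort() returns a sorted copy, while the ndarray.sort() method sorts the array in place:

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5])

idx = np.where(arr == 1)[0]   # indexes where the value 1 occurs
sorted_copy = np.sort(arr)    # sorted copy; arr itself is unchanged

in_place = arr.copy()
in_place.sort()               # method form: sorts this array in place

print(idx, sorted_copy, arr)
```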
Numpy-Random Number
• Numpy has sub module called random that is equipped with the rand() function. Using this we can
generate the random numbers between 0 and 1.0. random.rand()
• We can create a 1D array of random numbers by passing the size of array to the rand() function as:
a=random.rand(n)
• We can create a 2D array of random numbers by passing the size of array to the rand() function as:
a=random.rand(m,n)
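A minimal sketch of the three rand() calls, written with the np.random prefix. The sizes 3 and (2, 4) are arbitrary, and the values differ on every run because no seed is set:

```python
import numpy as np

scalar = np.random.rand()     # one float in the interval [0, 1)
vec = np.random.rand(3)       # 1-D array with 3 random floats
mat = np.random.rand(2, 4)    # 2-D array with 2 rows and 4 columns

print(scalar, vec.shape, mat.shape)
```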
Numpy-I/O
• The numpy.save() function stores the input array in a disk file with the .npy extension.
import numpy as np
a=np.arange(8).reshape(4,2)
print(a)
np.save('outfile',a)        # writes outfile.npy
b=np.load('outfile.npy')
print(b)
• The storage and retrieval of array data in simple text file format is done with savetxt() and loadtxt()
functions.
import numpy as np
a=np.arange(8).reshape(4,2)
np.savetxt('out.txt',a)
b=np.loadtxt('out.txt')     # values are read back as floats
print(b)
Numpy-Linear Algebra
• The NumPy package contains the numpy.linalg module, which provides the functionality required for linear
algebra. Some of the important functions in this module are as follows:
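The list of functions is not reproduced in these notes. As an illustration, three commonly used numpy.linalg functions (det, inv, and solve) are applied below to a small made-up system of equations:

```python
import numpy as np

# System: 2x + y = 3 and x + 3y = 5
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

det = np.linalg.det(A)        # determinant of A
inv = np.linalg.inv(A)        # inverse of A
x = np.linalg.solve(A, b)     # solves A @ x = b

print(det, x)
```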
Introduction
Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed. It is a subfield of artificial intelligence, which is broadly defined as the
capability of a machine to imitate intelligent human behaviour.
The model or algorithm is presented with example inputs and their desired outputs and then finds
patterns and connections between the input and the output. The goal is to learn a general rule that
maps inputs to outputs. The training process continues until the model achieves the desired level of
accuracy on the training data. Some real-life examples are:
Image Classification: You train with images/labels. Then in the future, you give a new
image expecting that the computer will recognize the new object.
Market Prediction/Regression: You train the computer with historical market data
and ask the computer to predict the new price in the future.
In unsupervised machine learning, a program looks for patterns in unlabelled data. Unsupervised
machine learning can find patterns or trends that people aren’t explicitly looking for. For example, an
unsupervised machine learning program could look through online sales data and identify different
types of clients making purchases.
No labels are given to the learning algorithm, leaving it on its own to find structure in its input. It is
used for clustering populations in different groups. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data).
Clustering: You ask the computer to separate similar data into clusters, this is essential
in research and science.
High-Dimension Visualization: Use the computer to help us visualize high-dimension
data.
Generative Models: After a model captures the probability distribution of your input
data, it will be able to generate more data. This can be very useful to make your
classifier more robust.
REGRESSION
Linear regression
Logistic regression
Logistic regression is used for binary classification. It applies the sigmoid function to a weighted sum
of the independent variables and produces a probability value between 0 and 1.
For example, we have two classes Class 0 and Class 1 if the value of the logistic function for an
input is greater than 0.5 (threshold value) then it belongs to Class 1 otherwise it belongs to Class 0 .
Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
It can be either Yes or No, 0 or 1, True or False, etc. but instead of giving the exact value as
0 and 1, it gives the probabilistic values which lie between 0 and 1.
In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, whose output is bounded by the two limiting values 0 and 1.
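The sigmoid and the 0.5 threshold can be sketched with NumPy as follows. The weights, bias, and input rows are hypothetical values chosen for illustration; in practice they would be learned from training data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model parameters and two input records.
weights = np.array([0.8, -0.4])
bias = -0.1
X = np.array([[2.0, 1.0], [0.0, 3.0]])

probabilities = sigmoid(X @ weights + bias)      # one probability per record
classes = (probabilities > 0.5).astype(int)      # threshold at 0.5

print(probabilities, classes)
```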
Over-fitting
Overfitting occurs when the model fits the training data too closely, capturing noise or random
fluctuations that do not represent the true underlying relationship between variables. This can lead
to poor generalization performance on new, unseen data.
A statistical model is said to be overfitted when it performs well on training data but does not make accurate
predictions on testing data. When a model is too flexible or is trained for too long on the same data, it starts
learning from the noise and inaccurate data entries in our data set. Testing with test data then reveals high variance.
Then the model does not categorize the data correctly, because of too many details and noise. The
causes of overfitting are the non-parametric and non-linear methods because these types of machine
learning algorithms have more freedom in building the model based on the dataset and therefore
they can really build unrealistic models. A solution to avoid overfitting is using a linear algorithm if
we have linear data or using the parameters like the maximal depth if we are using decision trees.
In a nutshell, Overfitting is a problem where the evaluation of machine learning algorithms on
training data is different from unseen data.
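The gap between training and test performance can be sketched with polynomial fitting in NumPy. This is only an illustration under stated assumptions: the data are synthetic (a straight line plus random noise), and the degree-7 polynomial stands in for an overly flexible model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the true relationship is linear, y = 2x, plus noise.
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.2, size=8)
x_test = np.linspace(0.05, 0.95, 8)
y_test = 2 * x_test + rng.normal(0, 0.2, size=8)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)     # matches the true linear form
flexible = np.polyfit(x_train, y_train, deg=7)   # enough freedom to fit the noise

train_simple, train_flexible = mse(simple, x_train, y_train), mse(flexible, x_train, y_train)
test_simple, test_flexible = mse(simple, x_test, y_test), mse(flexible, x_test, y_test)

print("train:", train_simple, train_flexible)
print("test: ", test_simple, test_flexible)
```

The flexible model always has the lower training error, since it can pass through every training point. Its test error is typically higher than that of the simple model, which is the signature of overfitting.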