
Python Notes


[Websites for reference: Real Python, DataCamp, W3Schools]

INTRODUCTION

Program- An ordered set of instructions to be executed by a computer to carry out a specific
task. The language used to specify this set of instructions to the computer is called a
programming language.

An interpreter processes the program statements one by one, first translating and then
executing. This process is continued until an error is encountered or the whole program is
executed successfully. In both cases, program execution will stop.

A compiler translates the entire source code, as a whole, into the object code. After scanning
the whole program, it generates error messages, if any.

Python was created by Guido van Rossum and was released in February 1991.

Features of Python

• high level language, free and open source
• an interpreted language, i.e., Python programs are executed by an interpreter
• easy to understand, clearly defined syntax, relatively simple structure
• case-sensitive
• portable and platform independent; can run on various operating systems and
hardware platforms
• has a rich library of predefined functions
• can be used for web development, building web services and applications
• uses indentation for blocks and nested blocks
• can be used for both procedural and object-oriented programming
• Dynamic Typing: a variable pointing to a value of certain type can be made to point to
value/object of different type

-The interpreter, when used interactively, is also called the Python shell

Python Keywords

Keywords are reserved words. Each keyword has a specific meaning to the Python
interpreter, and we can use a keyword in our program only for the purpose for which it has
been defined.

-Total 35 in number (Python 3.7 onwards): 33 older keywords, plus async and await, which became reserved keywords in Python 3.7

False     class      finally    is         return
None      continue   for        lambda     try
True      def        from       nonlocal   while
and       del        global     not        with
as        elif       if         or         yield
assert    else       import     pass       raise
break     except     in         async      await
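
The keyword list can also be checked from inside Python. The sketch below uses the standard keyword module; the variable names (all_keywords, candidate, usable_as_name) are only illustrative, and the exact number of keywords depends on the Python version in use.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
all_keywords = keyword.kwlist
print(len(all_keywords))            # the count depends on the Python version

print(keyword.iskeyword("while"))   # True: reserved word
print(keyword.iskeyword("total"))   # False: ordinary name

# A keyword looks like a valid name but cannot be used as an identifier.
candidate = "class"
usable_as_name = candidate.isidentifier() and not keyword.iskeyword(candidate)
print(usable_as_name)               # False
```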

Identifiers

Identifiers are names used to identify a variable, function, or other entities in a program. The
rules for naming an identifier in Python are:

• The name should begin with an uppercase or a lowercase letter or an underscore sign
(_). Thus, an identifier cannot start with a digit.

• It can be of any length. (However, it is preferred to keep it short and meaningful.)

• It should not be a keyword or reserved word.

• We cannot use special symbols like !, @, #, $, %, etc. or space in identifiers.

DATA TYPES

We can determine the data type of a variable using built-in function type()

1) Number

-stores numerical values only.

-further classified into:

• int- integers
• float- real or floating point numbers
• complex- complex numbers; in Python, iota is denoted by j

Boolean data type (bool) is a subtype of integer. It is a unique data type, consisting of two
constants, True and False, which behave like the integers 1 and 0. When any other value is
tested for truth, a non-zero, non-null and non-empty value counts as true, while zero, None
and empty values count as false.
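
A short sketch of how bool relates to int, and how type() reports the data type. The variable names are illustrative only.

```python
print(type(True))                 # <class 'bool'>
print(type(3.5))                  # <class 'float'>

is_subtype = issubclass(bool, int)   # True: bool is a subtype of int
true_is_one = (True == 1)            # True
false_is_zero = (False == 0)         # True
total = True + True                  # 2, because True behaves like 1

# bool() shows how other values are treated when tested for truth.
truth_of_empty = bool("")            # False: empty string
truth_of_text = bool("abc")          # True: non-empty string
truth_of_zero = bool(0)              # False
```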

2) Sequence

-A sequence is an ordered collection of items, where each item is indexed by an integer.


-The three types of sequence data types available in Python are: Strings, Lists and Tuples.

3) Set

4) None

None is a special value with its own data type (NoneType), which has only this single value. It is
used to signify the absence of value in a situation. None supports no special operations, and it is
neither False nor 0 (zero), nor empty string.

5) Mapping

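A mapping is a data type whose items are accessed by key rather than by position. (In older
versions of Python a dictionary was unordered; from Python 3.7 onwards it keeps its items in
insertion order.) Currently, there is only one standard mapping data type in Python called dictionary.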

Mutable and Immutable Data Types

Objects whose values can be changed in place after they are created are called mutable.
Objects whose values cannot be changed after they are created are called immutable.
When an attempt is made to update a variable that refers to an immutable value, a new
object is created and the variable name is made to refer to it. The old object is left
unchanged, and is discarded once nothing refers to it.

Immutable data types: Integers, Float, Boolean, Complex, Strings, Tuples, Frozen sets

Mutable Data types: List, Dictionary, Set

>>> num1 = 300

This statement will create an object with value 300 and the object is referenced by the
identifier num1

num2 = num1 will make num2 refer to the value 300, also being referred by num1, and
stored at memory location number, say a. So, num1 shares the referenced location with
num2.

In this manner Python makes the assignment effective by copying only the reference, and
not the data.

num1 = num2 + 100 links the variable num1 to a new object stored at memory location
number say b having a value 400. As num1 is an integer, which is an immutable type, it is
rebuilt.
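
The num1/num2 steps above can be traced with the is operator, which tells whether two names refer to the same object. This is only a sketch; the helper names shared and rebound are not from the notes.

```python
num1 = 300
num2 = num1                 # copies the reference, not the data
shared = num1 is num2       # True: both names refer to one object

num1 = num2 + 100           # a new object (400) is created and num1 is bound to it
rebound = num1 is num2      # False: num1 now refers to a different object

print(num1, num2)           # 400 300
```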
OPERATORS

Arithmetic Operators

Addition (+), Subtraction (-), Multiplication (*), Exponent (**)

Division (/): always returns floating point value

Floor or Integer Division (//): returns the quotient rounded down to a whole number (for
positive operands this is the same as removing the decimal part)

Modulus (%): returns remainder

a%b where 0 <= a < b will return remainder a
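
A few worked values for the division-related operators above. The variable names are illustrative.

```python
quotient = 7 / 2          # 3.5
even_division = 8 / 2     # 4.0  (still a float)
floor_pos = 7 // 2        # 3
floor_neg = -7 // 2       # -4   (rounded down, not towards zero)
remainder = 7 % 2         # 1
small_mod = 3 % 5         # 3    (a % b with a < b gives a)
power = 2 ** 3            # 8

print(quotient, even_division, floor_pos, floor_neg, remainder, small_mod, power)
```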

Relational Operators- used for comparison and determining the relationship between
operands

Equals to (==)

Not equal to (!=)

>, <, >=, <=

Assignment Operators- assigns or changes the value of the variable on its left

(=) Assigns value from right-side operand to left-side operand

(+=) adds the value of right-side operand to the left-side operand and assigns the result to
the left-side operand

x+=y is the same as x=x+y

Logical Operators

-not, and, or

-Evaluates to either True or False

-all values are treated as True except None, False, 0 (zero), empty collections '', (), [], {} etc.

Identity Operators

-used to determine whether the value of a variable is of a certain type or not

-can also be used to determine whether two variables are referring to the same object or not

is: Evaluates True if the variables on either side of the operator point towards the same
memory location and False otherwise

is not: opposite of is
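
The difference between identity (is) and equality (==) is easiest to see with lists. A minimal sketch with made-up names:

```python
list1 = [1, 2, 3]
list2 = list1            # second name for the same object
list3 = [1, 2, 3]        # equal contents, but a separate object

same_object = list1 is list2        # True
equal_value = list1 == list3        # True: same contents
distinct = list1 is not list3       # True: different objects
type_check = type(list1) is list    # True: identity test on the type
```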

Membership Operators- used to check if a value is a member of the given sequence or not

in- Returns True if the variable/value is found in the specified sequence and False otherwise

not in- opposite of in

Precedence of Operators

Binary operators are operators with two operands. The unary operators need only one
operand, and they have a higher precedence than the binary operators. The minus (-) as
well as + (plus) operators can act as both unary and binary operators, but not is a unary
logical operator.

Order of      Operators                          Description
Precedence

1             **                                 Exponent
2             ~, +, -                            Bitwise complement, unary plus, unary minus
3             *, /, %, //                        Multiply, divide, modulo, floor division
4             +, -                               Addition and subtraction
5             <=, <, >, >=, ==, !=,              Relational, equality, identity and membership
              is, is not, in, not in             operators (all share the same precedence)
6             not                                Logical NOT
7             and                                Logical AND
8             or                                 Logical OR
9             =, +=, -=, /=, //=, %=, *=, **=    Assignment (performed last, after the expression
                                                 on the right has been evaluated)

-Parenthesis can be used to override the precedence of operators. The expression within ()
is evaluated first.

-For operators with equal precedence, the expression is evaluated from left to right.

Exception: Exponent, i.e. a**b**c is evaluated as a**(b**c)
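
Some small expressions that show these precedence rules at work. The names are illustrative; the comments show how each expression is grouped.

```python
right_assoc = 2 ** 3 ** 2        # 2 ** (3 ** 2) = 2 ** 9 = 512
left_assoc = 100 / 10 / 5        # (100 / 10) / 5 = 2.0
unary_vs_power = -2 ** 2         # -(2 ** 2) = -4, since ** binds tighter than unary minus
mul_first = 2 + 3 * 4            # 2 + 12 = 14
forced = (2 + 3) * 4             # parentheses first: 20
logical = not False and False or True   # ((not False) and False) or True = True
```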


LOOPS

The else clause of a while or for loop is executed only if the loop terminates normally and not
through break.

If the break statement is inside a nested loop, it will terminate the innermost loop.
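
A sketch of both points: the else clause of a loop, and break inside a nested loop. The function find_first_even and the list pairs are hypothetical names used only for this example.

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = f"found {n}"
            break                      # else clause is skipped
    else:
        result = "no even number"      # runs only if the loop was not broken
    return result

print(find_first_even([1, 3, 4, 5]))   # found 4
print(find_first_even([1, 3, 5]))      # no even number

# break leaves only the innermost loop; the outer loop keeps running.
pairs = []
for i in range(3):
    for j in range(3):
        if j == 2:
            break
        pairs.append((i, j))
```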

STRINGS

String is a sequence made up of Unicode characters (an empty string has no characters). Here the character can be a letter,
digit, whitespace or any other symbol. A string can be created by enclosing the characters in
single, double or triple quotes.

Values can be extended to multiple lines using triple quotes.

Each individual character in a string can be accessed using a technique called indexing. The index
specifies the character to be accessed in the string and is written in square brackets ([ ]). The index of
the first character (from left) in the string is 0 and the last character is n-1 where n is the length of
the string. If we give index value out of this range then we get an IndexError. The index must be an
integer (positive, zero or negative).

The index can also be an expression including variables and operators but the expression must
evaluate to an integer. Otherwise we get the error:

TypeError: string indices must be integers

Negative indices are used when we want to access the characters of the string from right to left.
Starting from right hand side, the first character has the index as -1 and the last character has the
index -n where n is the length of the string.

An inbuilt function len() in Python returns the length of the string.
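
The indexing rules above, tried on a sample string. The string and the variable names are chosen only for illustration.

```python
str1 = "Hello World!"
n = len(str1)                  # 12

first = str1[0]                # 'H'
last = str1[n - 1]             # '!'
last_negative = str1[-1]       # '!'
first_negative = str1[-n]      # 'H'
computed = str1[2 + 4]         # index 6 -> 'W'

# An index outside the valid range raises IndexError.
try:
    str1[n]
    out_of_range = False
except IndexError:
    out_of_range = True

# An index that is not an integer raises TypeError.
bad_index = 1.5
try:
    str1[bad_index]
    wrong_type = False
except TypeError:
    wrong_type = True
```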

A string is an immutable data type.

STRING OPERATIONS

Concatenation

To concatenate means to join. Python allows us to join two strings using concatenation operator plus
which is denoted by symbol +.

Repetition- to repeat the given string using repetition operator, denoted by symbol *.

Note: string still remains the same after the use of repetition operator.

SLICING

- to access some part of a string by specifying an index range. Given a string str1, the slice operation
str1[n:m] returns the part of the string str1 starting from index n (inclusive) and ending at index m-1.

The number of characters in the substring is equal to the difference of the two indices m and n,
i.e., (m-n), provided both indices lie within the string and n is not greater than m.

Index that is too big is truncated down to the end of the string

first index > second index results in an empty string ''

If the first index is not mentioned, the slice starts from index 0

If the second index is not mentioned, the slicing is done till the length of the string.

The slice operation can also take a third index that specifies the ‘step size’. For example, str1[n:m:k],
means every kth character has to be extracted from the string str1 starting from n and ending at m-1.
By default, the step size is one.

to print the first n characters- string[:n]

to print the last n characters- string[-n:]

to print the string with first and last characters removed- string[1:n-1] where n is length of string

If we ignore both the indexes and give step size as -1, str1[::-1], we obtain the string in reverse order
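
The slicing rules above, applied to one sample string. The string and names are illustrative only.

```python
str1 = "Hello World!"
n = len(str1)

part = str1[1:5]            # 'ello'   (indices 1 to 4, i.e. 5-1 = 4 characters)
clipped = str1[7:50]        # 'orld!'  (too-big index is cut to the end)
empty = str1[7:2]           # ''       (first index > second index)
head = str1[:5]             # 'Hello'  (first n characters, n = 5)
tail = str1[6:]             # 'World!'
stepped = str1[0:10:2]      # 'HloWr'  (every 2nd character)
last_three = str1[-3:]      # 'ld!'
trimmed = str1[1:n-1]       # 'ello World'
reversed_str = str1[::-1]   # '!dlroW olleH'
```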
Traversing a String

Using for Loop

for ch in str1:
    print(ch, end='')

Using while Loop

index = 0
while index < len(str1):
    print(str1[index], end='')
    index += 1

STRING METHODS/BUILT-IN FUNCTIONS

Usual syntax: str.<method>()

title() Returns the string with first letter of every word in the string in uppercase and
rest in lowercase

capitalize() Returns the string with first letter of the string in uppercase and the rest in
lowercase

lower() Returns the string with all uppercase letters converted to lowercase

upper() Returns the string with all lowercase letters converted to uppercase

count(str, start, end) • Returns number of times substring str occurs in the given string.
• If we do not give start index and end index then searching starts from
index 0 and ends at length of the string

find(str, start, end) • Returns the index of the first occurrence of substring str in the
given string.
• If we do not give start and end then searching starts from index 0 and
ends at length of the string.
• If the substring is not present in the given string, then the function
returns -1

index(str, start, end) Same as find() but raises an exception (ValueError) if the substring is not present in the
given string

endswith(substr) Returns True if the given string ends with the supplied substring otherwise
returns False

startswith(substr) Returns True if the given string starts with the supplied substring otherwise
returns False

isalnum() • Returns True if all characters of the given string are either alphabets or
numeric.
• If whitespace or special symbols are part of the given string or the
string is empty it returns False

islower() Returns True if the string is non-empty and has all lowercase alphabets, or has
at least one character as lowercase alphabet and rest are non-alphabet
characters

isupper() Returns True if the string is non-empty and has all uppercase alphabets, or has
at least one character as uppercase character and rest are non-alphabet
characters

isalpha() Returns True if all the characters in the string are alphabets, otherwise False

isdigit() Returns True if all the characters in the string are digits, otherwise False

isspace() Returns True if the string is non-empty and all characters are white spaces
(blank, tab \t, newline \n, carriage return \r)

istitle() Returns True if the string is non-empty and title case, i.e., the first letter of
every word in the string in uppercase and rest in lowercase

lstrip() Returns the string after removing the whitespace (spaces, tabs, newlines) only on the left of the string

rstrip() Returns the string after removing the whitespace only on the right of the string

strip() Returns the string after removing the whitespace both on the left and the right of
the string

replace(oldstr, newstr) Returns a new string in which all occurrences of old string are replaced with the new string

join() • Returns a string in which the items of a sequence of strings (or the characters of a
single string) have been joined by a separator
• syntax: sep.join(sequence)

partition(sep) • Partitions the given string at the first occurrence of the substring
(separator) and returns the string partitioned into three parts:
Substring before, Separator and Substring after
• If the separator is not found in the string, it returns the whole string
itself and two empty strings
• always returns a tuple of 3 strings
• it is necessary to pass one argument in partition()

split() Returns a list of words delimited by the specified substring. If no delimiter is
given then words are separated by whitespace.
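
A short tour of several of the methods listed above. The sample strings and variable names are made up for illustration; none of the calls change the original string, since strings are immutable.

```python
text = "hello world, hello python"

titled = text.title()                 # 'Hello World, Hello Python'
capitalized = text.capitalize()       # 'Hello world, hello python'
occurrences = text.count("hello")     # 2
first_at = text.find("hello")         # 0
second_at = text.find("hello", 1)     # 13 (search starts from index 1)
missing = text.find("java")           # -1

# index() raises ValueError where find() returns -1.
try:
    text.index("java")
    index_failed = False
except ValueError:
    index_failed = True

stripped = "  padded \n".strip()       # 'padded'
replaced = text.replace("hello", "bye")   # 'bye world, bye python'
joined = "-".join(["a", "b", "c"])     # 'a-b-c'
parts = "key=value=x".partition("=")   # ('key', '=', 'value=x')
no_sep = "keyvalue".partition("=")     # ('keyvalue', '', '')
words = "one two  three".split()       # ['one', 'two', 'three']
fields = "a,b,,c".split(",")           # ['a', 'b', '', 'c']
```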
Comparison of Strings

Python compares strings lexicographically, using the Unicode code point of the characters (the same
as the ASCII value for ASCII characters). If the first characters of both strings are the same, the
second characters are compared, and so on.

ASCII value:

0-9: 48-57

A-Z: 65-90

a-z: 97-122

(digits < uppercase letters < lowercase letters)
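
These comparisons can be checked directly with ord() and the relational operators. The variable names are illustrative.

```python
codes = [ord("0"), ord("A"), ord("a")]      # [48, 65, 97]

digit_before_upper = "9" < "A"              # True: 57 < 65
upper_before_lower = "Z" < "a"              # True: 90 < 97
second_char_decides = "cat" < "cow"         # True: 'c' == 'c', then 'a' < 'o'
prefix_is_smaller = "app" < "apple"         # True: a prefix sorts before the longer string
```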

LISTS

List is an ordered sequence made up of zero or more elements. Unlike a string which consists of only
characters, a list can have elements of different data types.

A list is a mutable data type, which means it can be modified. However, if an element of a list
is immutable (e.g. string), that element cannot be changed in place; it can only be replaced by
assigning a new value to its index.

Elements of a list are enclosed in square brackets and are separated by comma.

LIST OPERATIONS

Concatenation

-to join two or more lists using concatenation operator depicted by the symbol +.

-the original lists are not changed

-The concatenation operator '+' requires that the operands should be of list type only. If we try to
concatenate a list with a value of some other data type, TypeError occurs

LIST METHODS

del statement can also be used with lists

list() • Creates an empty list if no argument is passed
• Creates a list if a sequence is passed as an argument
• list(range(1,6)) will create the list [1,2,3,4,5]

append() Appends a single element passed as an argument at the end of the list

extend() Appends each element of the list passed as argument to the end of the given
list

insert(index, element) Inserts an element at a particular index in the list

count() Returns the number of times a given element appears in the list

index() • Returns index of the first occurrence of the element in the list.
• If the element is not present, ValueError is generated

remove() • Removes the given element from the list.
• If the element is present multiple times, only the first occurrence is
removed.
• If the element is not present, ValueError is generated

pop([index]) Returns the element whose index is passed as parameter to this function and
also removes it from the list. If no parameter is given, then it returns and
removes the last element of the list.

reverse() Reverses the order of elements in the given list

sort() • Sorts the elements of the given list in place (the list itself is changed)
• by default, in ascending order
• list1.sort(reverse=True) for descending order

sorted(list) It takes a list as parameter and creates a new list consisting of the same
elements arranged in sorted order, e.g. list1=sorted(list2). The original list is not changed.

min()/max()/sum() returns smallest element/largest element/sum of elements
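
The list methods above, applied one after another to a single list. The name list1 follows the notes; the other names are illustrative. The comments show the list after each step.

```python
list1 = list(range(1, 6))       # [1, 2, 3, 4, 5]
list1.append(6)                 # [1, 2, 3, 4, 5, 6]
list1.extend([7, 8])            # [1, 2, 3, 4, 5, 6, 7, 8]
list1.insert(0, 8)              # [8, 1, 2, 3, 4, 5, 6, 7, 8]

eights = list1.count(8)         # 2
first_eight = list1.index(8)    # 0

list1.remove(8)                 # only the first 8 goes: [1, 2, 3, 4, 5, 6, 7, 8]
popped_last = list1.pop()       # 8 -> [1, 2, 3, 4, 5, 6, 7]
popped_first = list1.pop(0)     # 1 -> [2, 3, 4, 5, 6, 7]

list1.reverse()                 # [7, 6, 5, 4, 3, 2]
ascending_copy = sorted(list1)  # new list [2, 3, 4, 5, 6, 7]; list1 unchanged
list1.sort(reverse=True)        # list1 itself is sorted: [7, 6, 5, 4, 3, 2]
summary = (min(list1), max(list1), sum(list1))   # (2, 7, 27)

# remove() and index() raise ValueError for a missing element.
try:
    list1.remove(100)
    remove_failed = False
except ValueError:
    remove_failed = True
```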

Nested Lists

When a list appears as an element of another list

To access the element of the nested list of list1, we have to specify two indices list1[i][j]. The first
index i will take us to the desired nested list and second index j will take us to the desired element in
that nested list.

Copying Lists

The statement list2 = list1 does not create a new list. Rather, it just makes list1 and list2 refer to the
same list object. Here list2 actually becomes an alias of list1. Therefore, any changes made to either
of them will be reflected in the other list.

We can also create a copy or clone of the list as a distinct object by three methods:

Method 1

We can slice our original list and store it into a new variable: newList = oldList[:]

Method 2

We can use the built-in function list(): newList = list(oldList)

Method 3

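We can use the copy() function of the copy module.

import copy #import the library copy
newList = copy.copy(oldList) #use copy() function of library copy

All three methods make a shallow copy: a nested list inside the list is still shared between the
original and the copy.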
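
The sketch below compares an alias with the three copying methods. The names old_list, alias, by_slice, by_list, by_copy and deep are illustrative; copy.deepcopy() is included only to contrast with the shallow copies, it is not one of the three methods in the notes.

```python
import copy

old_list = [1, 2, [3, 4]]

alias = old_list                  # same object, just another name
by_slice = old_list[:]            # Method 1
by_list = list(old_list)          # Method 2
by_copy = copy.copy(old_list)     # Method 3
deep = copy.deepcopy(old_list)    # also copies the nested list

old_list.append(5)                # affects old_list (and alias) only
old_list[2].append(99)            # nested list is shared by the shallow copies

print(alias)       # [1, 2, [3, 4, 99], 5]
print(by_slice)    # [1, 2, [3, 4, 99]]
print(deep)        # [1, 2, [3, 4]]
```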

Whenever a list is passed as an argument to a function, we have to consider two scenarios:

(A) Elements of the original list may be changed, i.e. changes made to the list in the function are
reflected back in the calling function.
(B) If the list is assigned a new value inside the function then a new list object is created and it
becomes the local copy of the function. Any changes made inside the local copy of the
function are not reflected back to the calling function.

TUPLES

A tuple is an ordered sequence and can contain elements of different data types. Elements of a tuple
are enclosed in parenthesis (round brackets) and are separated by commas.

If there is only a single element in a tuple then the element should be followed by a comma. A
sequence (comma separated values) without parenthesis is treated as tuple by default.

We generally use list to store elements of the same data types whereas we use tuples to store
elements of different data types.

Tuple is an immutable data type. However an element of a tuple may be of mutable type.

List is mutable but tuple is immutable. Iterating through a tuple is generally at least as fast as iterating through a list.

If we have data that does not change then storing this data in a tuple will make sure that it is not
changed accidentally.

Concatenation operator can also be used for extending an existing tuple. When we extend a tuple
using concatenation a new tuple is created.

#single element is appended to tuple6

>>> tuple6 = tuple6 + (6,)


TUPLE METHODS

tuple() • Creates an empty tuple if no argument is passed
• Creates a tuple if a sequence is passed as argument

sorted() • Takes elements in the tuple and returns a new sorted list.
• sorted() does not make any change to the original tuple

Tuple Assignment

allows a tuple of variables on the left side of the assignment operator to be assigned respective
values from a tuple on the right side. The number of variables on the left should be same as the
number of elements in the tuple.

If there is an expression on the right side then first that expression is evaluated and finally the result
is assigned to the tuple.
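
A sketch of tuple assignment, including the common swap idiom and the error raised when the counts do not match. All names are illustrative.

```python
num1, num2 = (10, 20)          # num1 = 10, num2 = 20

# The right side is evaluated first, so this swaps the two values.
num1, num2 = num2, num1        # num1 = 20, num2 = 10

# Expressions on the right are evaluated before assignment.
total, product = 15 + 5, 15 * 5    # total = 20, product = 75

# The number of variables must match the number of elements.
try:
    x, y = (1, 2, 3)
    mismatch = False
except ValueError:
    mismatch = True
```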

DICTIONARIES

Dictionaries permit faster access to data

Dictionary is a mapping (non-scalar) data type. It is mutable.

A dictionary is a mapping between a set of keys and a set of values.

The key-value pair is called an item. A key is separated from its value by a colon(:) and consecutive
items are separated by commas. In older versions of Python, items in dictionaries are unordered, so we
may not get back the data in the same order in which we had entered it; from Python 3.7 onwards a
dictionary keeps its items in insertion order.

A dictionary is enclosed in curly braces. The keys in the dictionary must be unique and must be of
an immutable data type. The values can be repeated and can be of any data type.

To create an empty dictionary:

• dict1 = {}
• dict1 = dict()

The items of a dictionary are accessed via the keys. Each key serves as the index and maps to a value.
The order of items does not matter. If the key is not present in the dictionary we get KeyError

The existing dictionary can be modified by just overwriting the key-value pair.

The membership operator in checks if the key is present in the dictionary.

To search for values, use 'a' in dict.values()


Traversing a dictionary

Method 1

for key in dict1:
    print(key,':',dict1[key])

Method 2

for key,value in dict1.items():
    print(key,':',value)

Dictionary Methods

Dictionaries do not have count() function or attribute

print(list(dict)) will print a list of the keys of the dictionary

dict() • Creates a dictionary from a list of tuples of key-value pairs
• d3=dict(zip((1,2,3), ('one', 'two', 'three'))) [2 tuples - keys and values]

keys() returns the keys (as a view object; use list() to get a list)

values() returns the values (as a view object; use list() to get a list)

items() returns the key-value pairs as tuples (as a view object; use list() to get a list)

get() • Returns the value corresponding to the key passed as the argument
• If the key is not present in the dictionary it will return None
• a second argument can be passed to choose the value returned (e.g. a message) if the key is not
present

update(new_dict) adds the key-value pairs of the dictionary passed as the argument to the
given dictionary; if a key is already present, its value is overwritten

del • Deletes the item with the given key: del dict_name[key]
• To delete the dictionary from the memory we write: del dict_name
• After using del dict_name, the dictionary no longer exists
• del is a statement, so it should not be used within the print function, since it results in
an error

clear() • Deletes or clears all the items of the dictionary
• makes the dictionary an empty one

max(), min() work with keys so the keys must be of the same data type
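
The dictionary functions above, tried on the d3 example from the notes. The other variable names are illustrative.

```python
d3 = dict(zip((1, 2, 3), ("one", "two", "three")))   # {1: 'one', 2: 'two', 3: 'three'}
from_pairs = dict([(1, "one"), (2, "two")])           # list of key-value tuples

keys = list(d3.keys())              # [1, 2, 3]
values = list(d3.values())          # ['one', 'two', 'three']
items = list(d3.items())            # [(1, 'one'), (2, 'two'), (3, 'three')]

found = d3.get(2)                   # 'two'
absent = d3.get(9)                  # None
fallback = d3.get(9, "not found")   # 'not found'

d3.update({3: "THREE", 4: "four"})  # overwrites key 3, adds key 4
has_key = 4 in d3                   # True (in checks keys)
has_value = "four" in d3.values()   # True

del d3[1]                           # removes the item with key 1
largest_key = max(d3)               # 4 (max works on keys)

# Accessing a missing key with [] raises KeyError.
try:
    d3[100]
    key_error = False
except KeyError:
    key_error = True

remaining = dict(d3)                # copy taken before clearing
d3.clear()                          # d3 is now {}
```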
SETS

A set itself is mutable but cannot contain mutable elements!

Once created, elements of a set cannot be changed.

A set is a mutable collection of distinct immutable values that are unordered. It is written in
curly braces {}

To create an empty set: emptySet = set()

dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS']) is equivalent to
{'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}

You can only add a value that is immutable (like a string or a tuple) to a set. For example,
you would get a TypeError if you try to add a list to a set.

To remove an element from a set:

1) set_name.remove(element)
If you try to remove an element which is not present in the set, you will get a
KeyError.
2) set_name.discard(element)
No error if the element is not present in the set.
3) Using pop()- returns and removes an arbitrary element (we cannot choose which one)
Raises a KeyError if the set is empty
4) clear() removes all elements from the set and we obtain an empty set

To add elements to a set:

update()- takes one or more arguments, each of which can be a set, list, tuple, or dictionary (for a
dictionary, its keys are added)

automatically converts other data types to a set and adds them to the set.

set1 = set([7, 10, 11, 13])

set2 = set([11, 8, 9, 12, 14, 15])

set1.update(set2)

Now, set1= {7,8,9,10,11,12,13,14,15}

Transforming a set into ordered values

The sorted function can be used to get the values of set in an ordered fashion

dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}

print(sorted(dataScientist)) #alphabetical order

print(sorted(dataScientist, reverse=True)) #reverse alphabetical order

print(type(sorted(dataScientist))) #<class 'list'>

Set Operations

1) Union: set1.union(set2) or set1|set2

2) Intersection: set1.intersection(set2) or set1&set2

set1.isdisjoint(set2) returns True if the sets are disjoint

3) Difference: set1.difference(set2) or set1-set2 [order matters!]

4) Symmetric difference: set1.symmetric_difference(set2) or set1^set2

Symmetric difference is equivalent to (union) - (intersection)
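
The four set operations on two small example sets. The names set1 and set2 follow the notes; the values are chosen only for illustration.

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5}

union = set1 | set2                     # {1, 2, 3, 4, 5}
intersection = set1 & set2              # {3, 4}
difference = set1 - set2                # {1, 2}
reverse_difference = set2 - set1        # {5}  (order matters)
symmetric = set1 ^ set2                 # {1, 2, 5}

# Symmetric difference equals union minus intersection.
same = symmetric == (union - intersection)     # True

disjoint = set1.isdisjoint({7, 8})      # True: no common elements
not_disjoint = set1.isdisjoint(set2)    # False: 3 and 4 are shared
```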

• One of the main advantages of using sets in Python is that they are highly optimized
for membership tests.

• For example, sets do membership tests a lot more efficiently than lists.
• average case time complexity of membership tests in sets is O(1) vs O(n) for
lists.

Subsets

set1.issubset(set2) returns True if set1 is a subset of set2

While nested lists and tuples are possible, nested sets are not possible since a set cannot
contain a mutable element.

Frozen Set

-frozenset() function returns an immutable set

-this can then be added as an element to another set

FUNCTIONS: a named group of instructions or a subprogram that accomplishes a specific
task when it is invoked is called a function.

Importance of Functions

Functions provide a systematic way of problem solving by dividing the given problem into
several subproblems, finding their individual solutions and integrating the solutions of
individual problems to solve the original problem. This approach of problem solving is called
Stepwise Refinement/Modular approach/Divide & conquer approach.

Advantages of Functions:

1. Functions increase readability- the program is better organized and easier to understand

2. Reduces code length and makes debugging faster

3. Increases reusability- a function can be called multiple times in a program

4. Work can be divided among team members resulting in faster completion

Types of Functions:

1) Built-in functions: pre-defined functions that are already built in/available in the Python
library and are frequently used in programs

2) Functions in a module

-A module is a file containing Python functions and statements

-The standard library of Python is made available to the programmer as module(s)

-Definitions from the module can be used within the code of a program

-To use these modules in a program, the programmer needs to import the module.

-Import statement: import module_name1 [, module_name2…]

To use the function, write: module_name.function_name() [dot notation]

-From statement: used to import only the required functions from the module by name, instead of
referring to them through the module name

from module_name import function_name1, function_name2

from module_name import * (to import all the public names of a particular module)

Here, dot notation is not required, and we can directly use function_name()
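
Both import styles, shown with the standard math module. The variable names are illustrative.

```python
# Import statement: functions are used with dot notation.
import math
root_a = math.sqrt(16)         # 4.0

# From statement: the imported names are used directly.
from math import sqrt, floor
root_b = sqrt(25)              # 5.0
floored = floor(3.7)           # 3
```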

3) User-defined functions: functions defined to achieve a task as per the programmer's
requirement

Creating user-defined functions

-A function definition begins with def (short for define)

-In the general form def function_name([parameter1, parameter2, ...]): the items enclosed in "[ ]" are called parameters and they are optional.

-Function header always ends with a colon (:).

-Function names should be unique. Rules for naming identifiers also apply for function
naming.

-The statements outside the function indentation are not considered as part of the function.

Arguments and Parameters

Parameter- value provided in the parenthesis when we write function header, it is required
by the function to work, also known as formal parameter or formal argument

Argument- value passed to the function when it is called, it is provided in function call/invoke
statement, also known as actual argument or actual parameter

An argument is a value passed to the function during the function call which is received in
corresponding parameter defined in function header.

1) Default parameters

-Python allows assigning a default value to the parameter.

-A default value is a value that is pre-decided and assigned to the parameter when the
function call does not have its corresponding argument.

-If an argument is passed for a default parameter, then the value of the parameter gets
overwritten to that of the argument passed in the function call.

-The default parameters must be the trailing parameters in the function header, i.e., if any
parameter has a default value, then all the other parameters to its right must also have
default values. Default parameters cannot be followed by positional parameters.

2) Positional parameters

When positional parameters are defined in the function header, the no. of (required)
arguments must be equal to the no. of parameters, or else it would lead to an error. The
arguments get assigned in the same order as the defined parameters.
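
A sketch of default and positional parameters. The function simple_interest and its parameter names are hypothetical, made up for this example; compile() is used only to show that an invalid function header is rejected without stopping the program.

```python
def simple_interest(principal, years, rate=5.0):
    """rate has a default value, so it comes after the positional parameters."""
    return principal * years * rate / 100

with_default = simple_interest(1000, 2)       # rate falls back to 5.0 -> 100.0
overridden = simple_interest(1000, 2, 10)     # argument overwrites the default -> 200.0

# Too few arguments for the positional parameters raises TypeError.
try:
    simple_interest(1000)
    missing_argument = False
except TypeError:
    missing_argument = True

# A default parameter followed by a non-default one is a syntax error.
try:
    compile("def bad(a=1, b): pass", "<example>", "exec")
    header_rejected = False
except SyntaxError:
    header_rejected = True
```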

Function returning value(s)

A function may or may not return a value when called. The return statement returns the
values from the function. Functions which do not return any value are called void functions.
We can use a return statement to send value(s) from the function to its calling function.

The return statement does the following:

• returns the control to the calling function.

• return value(s) or None.
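
A sketch of the three cases: returning several values, returning nothing (None), and returning early. The function names are hypothetical.

```python
def area_and_perimeter(length, breadth):
    # Several values are returned together as one tuple.
    return length * breadth, 2 * (length + breadth)

def greet(name):
    # No return statement: the function returns None (a void function).
    print("Hello", name)

def sign(n):
    if n < 0:
        return "negative"       # control goes back to the caller here
    return "non-negative"

result = area_and_perimeter(3, 4)      # (12, 14)
area, perimeter = result
nothing = greet("Asha")                # prints Hello Asha; nothing is None
```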

Flow of Execution

-The order in which the statements in a program are executed is called flow of execution.
The Python interpreter starts executing the instructions in a program from the first statement.
The statements are executed one by one, in the order of appearance from top to bottom.

-When the interpreter encounters a function definition, the statements inside the function are
not executed until the function is called.

-When the interpreter encounters a function call, the control jumps to the called function and
executes the statement of that function.

-The execution of statements inside the function stops at the last statement, or at the return
statement if it comes first. Anything written after the return statement inside the function will
NOT be executed.

-After that, the control comes back to the point of function call so that the remaining
statements in the program can be executed.

-A function must be defined before its call within a program.

Scope of a Variable
A variable defined inside a function cannot be accessed outside it. Every variable has a well-
defined accessibility. The part of the program where a variable is accessible is defined as
the scope of that variable.

1) Global variable (has global scope)

- a name declared in the top-level segment (__main__) of a program has global scope and can
be used in the entire program.

-a variable that is defined outside any function or any block

-It can be accessed in any functions defined onwards.

-Any change made to the global variable is permanent and affects all the functions in the
program where that variable can be accessed.

2) Local variable (has local scope)

-A variable that is defined inside any function or a block

-It can be accessed only in the function or a block where it is defined.

-It exists only till the function executes.

-Formal parameters/arguments are local variables

- If a variable with the same name as the global variable is defined inside a function, then it
is considered local to that function and hides the global variable.

- If the modified value of a global variable is to be used outside the function, then the
keyword global should be prefixed to the variable name in the function.
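
A sketch of global and local scope. The variable counter and the three functions are hypothetical names used only for this example.

```python
counter = 0                # global variable

def show():
    return counter         # reads the global variable

def shadow():
    counter = 100          # new local variable; hides the global one
    return counter

def increment():
    global counter         # refer to the global variable itself
    counter += 1

local_value = shadow()     # 100
after_shadow = counter     # still 0: the global was not touched
increment()
increment()
after_increment = show()   # 2: the change is visible everywhere
```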

Lifetime of a variable

- the time for which the variable/name remains in the memory

-for global variables, the lifetime is the entire program run, i.e., as long as the program is
executing

-for local variables, the lifetime is their function’s run, i.e., as long as the function is
executing.

Name Resolution (LEGB Rule)

For every name reference, Python (interpreter) follows the steps below:

1) It checks within the LOCAL environment/namespace, whether there is a variable with the
same name. If yes, Python uses its value, otherwise, it moves to step 2.

2) It checks the ENCLOSING environment for a variable of the same name; if found, Python
uses its value. If the variable is not found in the current environment, Python repeats this
step in higher-level enclosing environments, if any. Otherwise, it moves to Step 3.

3) Next, it checks the GLOBAL environment for a variable of the same name; if found,
Python uses its value otherwise, it moves to step 4.

4) Next, it checks the BUILT-IN environment for a variable of the same name; if found,
Python uses its value. Otherwise, Python reports the error:

NameError: name '<variable>' is not defined
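
The four steps can be followed in a small example. All function and variable names here are made up for illustration.

```python
x = "global x"

def outer():
    x = "enclosing x"

    def inner_local():
        x = "local x"
        return x               # step 1: found in LOCAL

    def inner_enclosing():
        return x               # step 2: found in ENCLOSING (outer)

    return inner_local(), inner_enclosing()

def use_global():
    return x                   # step 3: found in GLOBAL

def use_builtin():
    return len("abc")          # step 4: len is found in BUILT-IN

def use_missing():
    return no_such_name        # found nowhere: NameError when called

try:
    use_missing()
    name_error = False
except NameError:
    name_error = True
```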

Mutable/Immutable properties of passed data objects

-A mutable object passed to a function can be changed in place inside the function. Its memory
address stays the same, and the change is visible to the caller (this behaves like pass by
reference- change made to original value).

-An immutable object passed to a function cannot be changed in place. Any change inside the
function creates a new object at a different memory address, and the caller's value is left as it
was (this behaves like pass by value- no change made to original value).

Mutable/Immutable properties of arguments/parameters

When passing values through arguments and parameters in a function:

-Changes in an immutable data type done within the function are never reflected in the
function call.

-Changes in a mutable data type done within the function are reflected in the function call,
unless it (the parameter) is assigned a different value or a different data type; or another
variable with a different value is assigned to it.
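
A short sketch of these rules, assuming a list as the mutable argument and an int as the immutable one (function names are illustrative):

```python
def append_item(items):
    items.append(99)         # in-place change: the caller's list is modified

def rebind(items):
    items = [0, 0, 0]        # parameter is assigned a different value: caller's list is untouched
    return items

def add_one(number):
    number += 1              # int is immutable: a new object is bound to the local name
    return number

data = [1, 2, 3]
append_item(data)
rebind(data)

value = 10
returned = add_one(value)
print(data, value, returned)
```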

Built-in Functions in Python

abs()- returns the absolute value

bin()- converts and returns the binary equivalent string of a given integer

bool([n])- converts a value to Boolean; It's not mandatory to pass a value to bool(). If you do not pass
a value, bool() returns False. In general use, bool() takes a single parameter value.

chr()- returns a character (a string) from an integer (represents Unicode code point of the character),
e.g. chr(65) returns A

ord()- returns an integer representing the Unicode character, e.g. ord("A") returns 65

complex()- returns a complex number when real and imaginary parts are provided, or it converts a
string to a complex number; e.g. complex(a,b) returns a+bj, complex("a+bj") returns a+bj, complex(a)
returns a+0j

divmod()- takes two numbers and returns a pair of numbers (a tuple) consisting of their quotient and
remainder

eval()- parses the expression passed to it (a string), evaluates it as a Python expression within the
program and returns the result

exec()- executes the dynamically created program, which is either a string or a code object

float()- returns a floating point number from a number or a string


format()- returns a formatted representation of the given value controlled by the format specifier

help()- calls the built-in Python help system. If a string is passed as an argument, it is looked up as the
name of a module, function, class, method, keyword, or documentation topic, and a help page is printed

hex()- converts an integer number to the corresponding hexadecimal string

input()- reads a line from input, converts into a string and returns it

oct()- takes an integer number and returns its octal representation (as a string)

round()- returns a number rounded to the specified number of decimals; if the number of decimals is
omitted, it returns the nearest integer

ADVANCED CONCEPTS

The & operator performs a bitwise AND operation. When you perform a bitwise AND
operation with 1, it effectively checks whether the number is odd or even. If a number is odd,
the least significant bit is always 1 in binary representation, so the result of x & 1 will be 1. If
the number is even, the least significant bit is 0, so the result will be 0.

The bitwise AND operation with 1 checks the least significant bit (rightmost bit) of each
number in the list.

• Odd numbers (1, 3, 5) have a 1 in the least significant bit, so the & operation with 1
results in 1.
• Even numbers (2, 4) have a 0 in the least significant bit, so the & operation with 1
results in 0.

Thus, x&1 returns 1 if x is odd and 0 if x is even.
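
A small sketch of the check on a sample list (the list values are an assumption chosen to match the numbers mentioned above):

```python
numbers = [1, 2, 3, 4, 5]

# x & 1 keeps only the least significant bit of x
lsb = [x & 1 for x in numbers]

# the result is truthy (1) for odd numbers, so it can be used as a filter condition
odds = [x for x in numbers if x & 1]
evens = [x for x in numbers if not x & 1]
print(lsb, odds, evens)
```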

Lambda function
#small anonymous function, does not have any name
#used for one-line functions
#use lambda instead of def
#syntax:- lambda argument(s):expression
x=lambda a:a*2
print(x(4))

y=lambda : "without arguments"


print(y())

z=lambda a,b: (a+b)*3


print(z(5,6))

b=6
w=lambda a,c : a+b+c
print(w(5,2))

a=lambda i: i.split(" ")[-1]


L1=["simran kaur", "mann kumar", "raman singh", "taran shah"]
L2=[]
for i in L1:
    print(a(i))
    L2.append(a(i))
print(L2)

def fn(n):
    a=lambda :n
    print(a())
fn(2)

Recursion
A function is said to be recursive if it calls itself.
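
A common illustration is the factorial function (a sketch; the notes do not prescribe this example). Every recursive function needs a base case that stops the chain of calls:

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n <= 1:                       # base case: stops the recursion
        return 1
    return n * factorial(n - 1)      # recursive case: the function calls itself

print(factorial(5))
```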
LIST COMPREHENSION
A list comprehension consists of an expression followed by the for statement inside square
brackets.
numbers = [number*number for number in range(1, 6)]
Every list comprehension can be rewritten as a for loop, but not every for loop can be rewritten in
the form of a list comprehension.

• Conditionals in List Comprehension


number_list = [ x for x in range(21) if x % 2 == 0]

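• Nested if with list comprehension (the two conditions can be joined with and, as below, or written as
consecutive ifs: [y for y in range(100) if y % 2 == 0 if y % 5 == 0])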


num_list = [y for y in range(100) if (y % 2 == 0 and y % 5 == 0)]

• if else with List Comprehension


obj = ["Even" if i%2==0 else "Odd" for i in range(10)]

Flattening a list
• In Python, a list of lists (or cascaded lists) resembles a two-dimensional array
• Hence, flattening such a list of lists means getting elements of sublists into a one-
dimensional array-like list.
• flatlist=[element for sublist in nestedlist for element in sublist]
• E.g., [[1,2,3],[4,5,6],[7,8,9]] is flattened to [1,2,3,4,5,6,7,8,9]
list1=[[1,2,3],[4,5,6],[7,8,9]]
list2=[j for i in list1 for j in i]
print(list2)

Splitting list into chunks


To split up a list into parts of the same size, zip() function can be used with iter() function

• The zip() function takes in iterables (objects capable of returning their members one at a
time) as arguments and returns an iterator.

• This iterator generates a series of tuples containing elements from each iterable (list,
set, tuple, file etc.)
• The zip() function returns a zip object, which is an iterator of tuples where the first
item of each passed iterable is paired together, then the second item of each
passed iterable is paired together, and so on.
Syntax: zip(iterable1, iterable2, iterable3 ...)
Note: If the passed iterables have different lengths, the iterable with the fewest items
decides the length of the result.
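
A sketch of the zip() with iter() idiom (the helper name chunks is hypothetical). The same iterator object is passed to zip() several times, so each tuple consumes consecutive items; leftover items that do not fill a whole chunk are dropped:

```python
def chunks(items, size):
    """Split items into tuples of length size; an incomplete last chunk is discarded."""
    iterator = iter(items)
    # [iterator] * size repeats the SAME iterator, so zip() pulls consecutive items from it
    return list(zip(*[iterator] * size))

pairs = chunks([1, 2, 3, 4, 5, 6], 2)
triples = chunks([1, 2, 3, 4, 5, 6, 7], 3)    # 7 is left over and dropped

# ordinary use of zip(): the shortest iterable decides the length
paired = list(zip([1, 2, 3], "ab"))
print(pairs, triples, paired)
```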

Mapping
• Map in Python is a function that works as an iterator to return a result after applying a
function to every item of an iterable (tuple, lists, etc.).
• It is used when you want to apply a single transformation function to all the iterable
elements.
• The iterable and function are passed as arguments to the map in Python.
• The syntax of the Python map() function is: map(function, iterable)
• It loops over each item of an iterable and applies the transformation function to it.
• Then, it returns a map object that stores the value of the transformed item.
• The input function can be any callable function, including built-in functions, lambda
functions, user-defined functions, classes, and methods.

We can also pass more than one iterable, e.g. the function pow requires 2 inputs. The final
iterable is only as long as the shortest iterable.
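
A few illustrative calls (the input lists are made up). The map object is converted to a list to see the results:

```python
# lambda function as the transformation
doubled = list(map(lambda n: n * 2, [1, 2, 3]))

# built-in function as the transformation
lengths = list(map(len, ["a", "bb", "ccc"]))

# two iterables: pow(2, 3) and pow(3, 2); the 4 has no partner and is ignored
powers = list(map(pow, [2, 3, 4], [3, 2]))
print(doubled, lengths, powers)
```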

filter() is a built-in function that takes two positional arguments: function and iterable


• filter() yields the items of the input iterable for which function returns True.
• If you pass None to function, then filter() uses the identity function.
• This means that filter() will check the truth value of each item in iterable and filter out
all of the items that are false.
• Python’s reduce() has to be imported from the module functools
• reduce() is another core functional tool in Python that is useful when you need to
apply a function to all the items in an iterable and compute a single cumulative value.
• This kind of operation is commonly known as reduction or folding. reduce() takes two
required arguments: function and iterable
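
A minimal sketch of both tools (the sample values are assumptions for illustration):

```python
from functools import reduce

values = [0, 1, 2, 3, 4, 5, 6]

# keep the items for which the function returns True
evens = list(filter(lambda n: n % 2 == 0, values))

# None as the function: keep only the items that are truthy
truthy = list(filter(None, [0, 1, "", "a", None, [], [2]]))

# fold the whole list into one cumulative value (here, the sum)
total = reduce(lambda acc, n: acc + n, values)
print(evens, truthy, total)
```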

OS (OPERATING SYSTEM) MODULE


-provides functions for interacting with the operating system
-we must import the os module
-The rename() method takes two arguments, the current filename and the new filename.
os.rename("current_file_name", "new_file_name")
-to remove a file: os.remove("file_name")
-to change the current working directory: os.chdir("new_dir")
-to display the current working directory: os.getcwd()
-to create a new folder/directory: os.mkdir("dir_name")
-to remove the directory: os.rmdir("dir_name") - remove all the files in the directory first!

listdir()
used to get the list of all the files and/or directories in the specified directory; if no argument
passed, returns the list of files and/or directories in the current working directory

File Information (Metadata)


By giving the path of the file, we can get more information about the file.
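• os.path.getsize("filename") returns the size of the file in bytes
• os.path.getmtime("filename") returns the time of last modification as a timestamp (seconds since the
epoch)
• os.path.getctime("filename") returns the creation time on Windows and the time of the last metadata
change on Unix, as a timestamp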
• os.stat() returns all the information you need in a concise way, used to get status of the
specified path
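
The os functions listed in this section can be tried safely inside a temporary directory. This is a sketch; the file and folder names are hypothetical, and tempfile is used only so that no real files are touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "demo_dir")
    os.mkdir(folder)                              # create a new directory

    old_path = os.path.join(folder, "old.txt")
    new_path = os.path.join(folder, "new.txt")
    with open(old_path, "w") as handle:
        handle.write("hello")

    os.rename(old_path, new_path)                 # rename the file
    listing = os.listdir(folder)                  # names inside the directory

    size = os.path.getsize(new_path)              # size in bytes
    modified = os.path.getmtime(new_path)         # timestamp of last modification
    info = os.stat(new_path)                      # full status record (st_size, st_mtime, ...)

    os.remove(new_path)                           # the directory must be empty before rmdir
    os.rmdir(folder)
    folder_gone = not os.path.exists(folder)

print(listing, size, folder_gone)
```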

PANDAS- the name is derived from "panel data" (and is also read as "Python data analysis")

Suggested Read (Important): Python pandas tutorial: The ultimate guide for beginners |
DataCamp
-df.set_index() method is used to set one or more existing columns (or an array/series of the same
length as the data frame) as the index of a given data frame

-pandas.concat([df1,df2]) is used to concatenate two data frames.

•Data manipulation package in Python for tabular data

•Pandas' functionality ranges from data transformations, like sorting rows and taking subsets, to
calculating summary statistics such as the mean, reshaping DataFrames, and joining
DataFrames together.

•Open source Python library

•To install in Python: pip install pandas

[PIP is a package management system that installs and manages software packages written
in Python. It stands for "Preferred Installer Program" (or "Pip Installs Packages") ]

Use of Pandas

•Import datasets from databases, spreadsheets, comma-separated values (CSV) files, and
more.

•Clean datasets, for example, by dealing with missing values.

•Tidy datasets by reshaping their structure into a suitable format for analysis.

•Aggregate data by calculating summary statistics such as the mean of columns, correlation
between them, and more.

•Visualize datasets and uncover insights.

•pandas also contains functionality for time series analysis and analyzing text data.

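Pandas deals with the following data structures −

• Series

• DataFrame

• Panel (three-dimensional; deprecated and later removed, so current pandas versions provide only
Series and DataFrame)

A Series is essentially a column, and a DataFrame is a two-dimensional table made up of
a collection of Series.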

Series

•Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, python objects, etc.).

•Axis labels are collectively called index

•A Pandas Series can be created by loading datasets from existing storage; the storage can be
a SQL database, a CSV file, or an Excel file.

•Pandas Series can be created from lists, dictionary, and from a scalar value etc.

Creating a series

import pandas

import numpy

ser = pandas.Series()

print(ser)

# simple array

data = numpy.array(['g', 'e', 'e', 'k', 's'])

ser = pandas.Series(data)
print(ser)

-import pandas

-pandas.Series([data]) used to create a series

DataFrame

• A two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled
axes (rows and columns)

• Pandas DataFrame consists of three principal components, the data, rows, and columns.

Creating DataFrames

#creating dataframe from a dictionary

data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}

purchases = pandas.DataFrame(data)

purchases

Note: Each (key, value) item in the data corresponds to a column in the resulting Data Frame.

Providing customised index

•Index, if not specified, by default starts from 0

•In case we want to provide index then:

purchases = pandas.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

Importing data in Pandas

Importing csv files

•Use read_csv() with the path to the CSV file to read a comma-separated values file

•Example: df=pandas.read_csv("diabetes.csv")

Note: CSVs don't have indexes like our DataFrames, so all we need to do is just designate the
index_col when reading (pd is the usual alias, from import pandas as pd):

df=pd.read_csv("diabetes.csv",index_col=0)

df=pd.read_csv(r'C:\Users\vivek\Documents\SRCC\Python/'+'diabetes.csv')

[To be done if the file is not present in the current working directory]
More information on read_csv() function (Important): pandas read_csv() Tutorial: Importing Data
| DataCamp

Importing text files

• The separator argument (sep) refers to the symbol used to separate the values (columns) within each
row of the file.

• Comma (sep = ","), whitespace (sep = r"\s+"), tab (sep = "\t"), and colon (sep = ":") are the commonly
used separators.

• df = pd.read_csv("diabetes.txt", sep=r"\s+")

Importing excel files

•df = pd.read_excel('diabetes.xlsx')

•df = pd.read_excel('diabetes_multi.xlsx', sheet_name=1) #to import data from the second sheet in the
workbook (sheet positions are zero-indexed; sheet_name=0 is the first sheet)

Outputting a dataframe

•df.to_csv("diabetes_out.csv", index=False)

•The arguments include the filename with path and index – where index = True implies writing the
DataFrame’s index.
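
A sketch of a CSV round trip. To keep it self-contained, an in-memory io.StringIO buffer stands in for a file on disk (an assumption for illustration; a path such as "diabetes_out.csv" works the same way):

```python
import io
import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

buffer = io.StringIO()
purchases.to_csv(buffer)                        # index=True by default, so the names are written
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)     # first column becomes the index again

# whitespace-separated text is read with a regular-expression separator
spaced = pd.read_csv(io.StringIO("a b\n1 2\n3 4\n"), sep=r"\s+")
print(restored)
print(spaced)
```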

•Similarly, json files can be imported and exported as well as excel files.

Viewing and analysing dataframes

The first few and last few rows of a dataframe can be read using the head() and tail() function

•df.head() #returns the first 5 rows

•df.tail() #returns the last 5 rows

•df.head(n=**) or df.head(n) # returns the first n rows

•df.tail(n=**)

[**Specify value of n according to choice]

describe() method

•returns the summary statistics of all numeric columns, such as count, mean, standard
deviation, minimum, maximum, and quartiles

•df.describe()

-Modifying what we want to read in the describe() function:


•can also modify the quartiles using the percentiles argument.

•For example, if we want the 30th, 50th, and 70th percentiles of the numeric columns in DataFrame:
df.describe(percentiles=[0.3, 0.5, 0.7])

•df.describe(include=[int])- Summarizing columns with integer data types only

•df.describe(exclude=[int])- Excluding columns having int types, summary stats of non-integer
columns only

Transpose of the DataFrame: df.T

Can also find transpose of df.describe()

info() method

•The .info() method is a quick way to look at the data types, missing values, and data size of a
DataFrame.

•Here, we're setting the show_counts argument to True, which shows the number of non-
missing values in each column.

•We’re also setting memory_usage to True, which shows the total memory usage of the DataFrame
elements.

•When verbose is set to True, it prints the full summary from .info().

df.info(show_counts=True, memory_usage=True, verbose=True)

Getting the structure of the DataFrame

•df.shape # Get the number of rows and columns

Output is a tuple- (row, column)

•df.shape[0] # Get the number of rows only

•df.shape[1] # Get the number of columns only

Fetching the columns and column names

•df.columns

•list(df.columns)

*To get columns with their data types: df.dtypes


Creating a copy of a dataframe

df2=df.copy()

This is done so as not to affect the original dataframe and perform operations on the copy of the
dataframe created.

Slicing and Extracting data in Pandas

•isolating a single column using a square bracket [ ] with a column name in it, e.g. df['Outcome']

df[['Pregnancies', 'Outcome']]

#Isolating more than 1 column from a dataframe (pass a list of column names)

Extracting rows from dataframe

• df[df.index==1]

•A single row can be fetched by passing in a boolean series with one True value.

•In the example above, the second row with index = 1 is returned. Here, .index returns the row
labels of the DataFrame, and the comparison turns that into a Boolean one-dimensional array.

Extracting more than one row: df[df.index.isin(range(2,10))]

Using .loc and .iloc to fetch rows and columns in dataframe

dataFrame.loc[<ROWS RANGE> , <COLUMNS RANGE>]

•ROWS OR COLUMN RANGE can also be ':'; if it is given as the rows or columns range parameter, then
all entries will be included for the corresponding rows or columns.

•Note: .loc[] uses a label to point to a row, column or cell, whereas .iloc[] uses the numeric position.

df2.loc[1]

df2.iloc[1]

The 1 represents the row index (label) in .loc[], whereas the 1 in .iloc[] is the row position (the second
row, since positions start at 0).

the 'loc' indexer is mainly used when we want to select rows and columns based on their labels-
includes last element of the range

iloc is integer-position-based (must pass an integer), does not include the last element of the range

Fetching multiple rows: df2.loc[100:110] will return rows labelled 100 to 110

Whereas df2.iloc[100:110] will return rows at positions 100 to 109


Getting a subset of rows: df2.loc[[100, 200, 300]]
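
The difference is easiest to see when the labels do not coincide with the positions. A small sketch with a made-up frame whose index starts at 1:

```python
import pandas as pd

frame = pd.DataFrame({"Age": [25, 81, 40, 81]}, index=[1, 2, 3, 4])

by_label = frame.loc[1, "Age"]        # row LABELLED 1 (the first row here)
by_position = frame.iloc[1, 0]        # row at POSITION 1 (the second row)

label_slice = frame.loc[1:3]          # labels 1, 2 and 3: the end label is included
position_slice = frame.iloc[1:3]      # positions 1 and 2: the end position is excluded

subset = frame.loc[[1, 4]]            # a list of labels picks specific rows
print(by_label, by_position, len(label_slice), len(position_slice))
```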

Conditional Slicing

df[df.BloodPressure == 122]

df[df.Outcome == 1]

df.loc[df['BloodPressure'] > 100, ['Pregnancies', 'Glucose', 'BloodPressure']]

This code fetches Pregnancies, Glucose, and BloodPressure for all records with BloodPressure
greater than 100.

Updating value of a column

df2.loc[df['Age']==81, ['Age']] = 80

This statement updates values of “Age” column in df2 (copy of df) to 80 at the location of all the
rows in df where “Age” is 81

Isolating rows based on a condition: df.loc[df['BloodPressure'] > 100, ['Pregnancies', 'Glucose',
'BloodPressure']]

Cleaning data using Pandas

Checking for missing/null values:

df.isnull().sum() #getting the number of null values in each column

df.isnull().sum().sum() #getting the total number of null values

Dropping missing values: df4=df4.dropna()

Another way of dropping missing values

1. df4.dropna(inplace=True, axis=1)

#axis =0 for rows, axis=1 for columns

2. df4.dropna(inplace=True, how="all")

#how='all' drops a row (or a column, with axis=1) only if all of its values are missing; the default
how='any' drops it if any value is missing

To detect duplicate rows: df.duplicated()

Dropping duplicates: df.drop_duplicates()
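
A sketch of these cleaning steps on a small made-up frame (the column names follow the diabetes example; the values are invented):

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Glucose": [148.0, np.nan, 183.0, np.nan, 148.0],
    "Insulin": [0.0, 94.0, np.nan, np.nan, 0.0],
})

nulls_per_column = raw.isnull().sum()         # missing values in each column
total_nulls = raw.isnull().sum().sum()        # missing values in the whole frame

any_dropped = raw.dropna()                    # default how="any": drop rows with any missing value
all_dropped = raw.dropna(how="all")           # drop only rows where every value is missing

duplicate_flags = raw.duplicated()            # True for a row that repeats an earlier row
deduplicated = any_dropped.drop_duplicates()
print(total_nulls, len(any_dropped), len(all_dropped), len(deduplicated))
```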


Data Analysis in Pandas

df.mean(), df.median(), df.mode()

Creating new columns based on existing columns

Create a copy of the dataframe df

df1=df.copy()

df1['Glucose_Insulin_Ratio'] = df['Glucose']/df['Insulin']

df1.head()

Working with categorical values

• Category values can be counted using the .value_counts() methods.

• Here, for example, we are counting the number of observations where Outcome is diabetic (1) and
the number of observations where the Outcome is non-diabetic (0).

df['Outcome'].value_counts()

-Applying .value_counts() on a subset of columns: df.value_counts(subset=['Pregnancies',
'Outcome'])

Aggregating data with .groupby() in pandas

• Pandas lets you aggregate values by grouping them by specific column values.

•You can do that by combining the .groupby() method with a summary method of your choice

Syntax: df.groupby(<column_name>)

•The below code displays the mean of each of the numeric columns grouped by Outcome.

df.groupby('Outcome').mean()

•Another example: df.groupby(['Pregnancies', 'Outcome']).mean()

Sorting values in a dataframe

sort_values(by=<column_name>) method sorts the DataFrame by the specified label.
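
Counting, grouping and sorting can be sketched together on a tiny made-up frame (the values are invented; only the column names follow the diabetes example):

```python
import pandas as pd

records = pd.DataFrame({
    "Outcome": [0, 1, 0, 1, 1],
    "Glucose": [85, 148, 89, 183, 137],
})

counts = records["Outcome"].value_counts()      # number of rows per category
means = records.groupby("Outcome").mean()       # mean of the numeric columns per group
ordered = records.sort_values(by="Glucose")     # ascending by default
print(counts)
print(means)
print(ordered)
```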

-You can select a column 'A' in a df using df['A'] or df.A

-df.astype() to change the data type of a column in a DataFrame

-To rename a column of a dataframe: df.rename(columns={'old_name':'new_name'}) (returns a new
dataframe; assign the result back to keep the change)

REGULAR EXPRESSION (regex)

A regular expression is a sequence of characters that forms a search pattern.

• It can be used to check if a string contains the specified search pattern or not.

• Python provides a built-in module re which can be used to work with regular expressions.

• match=re.method_name(pattern,string)

• If the search is successful, re.search() returns a match object; otherwise, it returns None.

To implement regular expressions, the Python's re package can be used. Import the Python's re
package with the following command: import re

Raw strings

A normal string, when prefixed with 'r' or 'R' becomes a raw string.

The difference between a normal string and a raw string is that the normal string in print() function
translates escape characters (such as \n, \t etc.) if any, while those in a raw string are not.
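
A quick sketch of the difference (the sample strings are made up):

```python
normal = "a\tb"      # \t is translated into a single tab character
raw = r"a\tb"        # the backslash and the t are kept as two separate characters

lengths = (len(normal), len(raw))
print(normal)
print(raw)
print(lengths)
```

Raw strings are the usual choice for regular-expression patterns, because the backslashes reach the re module unchanged.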

Meta Characters

Some characters carry a special meaning when they appear as part of a pattern-matching string.
Python's re module uses the following characters as meta characters: . ^ $ * + ? { } [ ] \ | ( )

When a set of alpha-numeric characters are placed inside square brackets [], the target string is
matched with these characters. A range of characters or individual characters can be listed in the
square bracket.

'\' is an escaping metacharacter; followed by various characters, it signals various special sequences. If
you need to match a [ or \, you can precede them with a backslash to remove their special
meaning: \[ or \\.

You can also specify a range of characters using - inside square brackets.

• [a-e] is the same as [abcde].

• [1-4] is the same as [1234].

• [0-9] is the same as [0123456789].

You can complement (invert) the character set by using the caret ^ symbol at the start of a
square bracket.

• [^abc] means any character except a or b or c.

• [^0-9] means any non-digit character.
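
The character sets above can be tried with re.findall() (a sketch; the sample strings are made up):

```python
import re

text = "badge [42]"

in_range = re.findall(r"[a-e]", text)          # any one of a, b, c, d, e
digits = re.findall(r"[0-9]", text)            # any digit
not_digits = re.findall(r"[^0-9]", "a1b2")     # complemented set: anything except a digit
brackets = re.findall(r"\[", text)             # escaped [ matches a literal bracket
backslashes = re.findall(r"\\", r"C:\temp")    # escaped \ matches a literal backslash
print(in_range, digits, not_digits, brackets, backslashes)
```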


Other Special Sequences

Some special sequences make commonly used patterns easier to write. Below
is a list of such special sequences:

re.match()- This function in re module tries to find if the specified pattern is present at the beginning
of the given string.

Syntax: re.match(pattern,string)

This function returns None if no match can be found. If the match is successful, a match object instance is
returned, containing information about the match: where it starts and ends, the substring it
matched, etc.

>>> import re

>>> string="Simple is better than complex."

>>> obj=re.match("Simple",string)

>>> obj

>>> obj.start()

>>> obj.end()

The match object's start() method returns the starting position of pattern in the string, and end()
returns the endpoint. If the pattern is not found, the match object is None.

re.search():
This function scans the whole string, starting from any position, and returns a match object for the
first occurrence of the RE pattern only (or None if the pattern is not found).

>>> import re

>>> string="Simple is better than complex."

>>> obj=re.search("is", string)

>>> obj.start()

>>> obj.end()
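
The two sessions above can be compared side by side in a script (a sketch using the same sample string; the variable names are illustrative):

```python
import re

string = "Simple is better than complex."

at_start = re.match("Simple", string)     # the pattern is at the beginning -> match object
not_at_start = re.match("is", string)     # "is" is not at the beginning -> None
anywhere = re.search("is", string)        # first "is" anywhere in the string

match_span = (at_start.start(), at_start.end())
search_span = (anywhere.start(), anywhere.end())
print(match_span, not_at_start, search_span)
```

Because a failed match returns None, it is safer to test the result (for example with if obj:) before calling start() or end() on it.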

re.findall():

It helps to get a list of all matching patterns. The return object is the list of all matches.

>>> import re

>>> string="Simple is better than complex."

>>> obj=re.findall("ple", string)

>>> obj

['ple', 'ple']

To obtain list of all word characters (letters, digits and underscore) from the string:

>>> obj=re.findall(r"\w", string)

>>> obj

['S', 'i', 'm', 'p', 'l', 'e', 'i', 's', 'b', 'e', 't', 't', 'e', 'r', 't', 'h', 'a', 'n', 'c', 'o', 'm', 'p', 'l', 'e', 'x']

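To obtain list of words (the pattern \w* can also match an empty string, so empty strings appear in the
result; use \w+ to get only the words):

>>> obj=re.findall(r"\w*", string)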

>>> obj

['Simple', '', 'is', '', 'better', '', 'than', '', 'complex', '', '']

re.split():

This function helps to split string by the occurrences of given pattern. The returned object is the list
of slices of strings.

>>> import re
>>> string="Simple is better than complex."

>>> obj=re.split(' ',string)

>>> obj

['Simple', 'is', 'better', 'than', 'complex.']

The string is split at each occurrence of a white space ' ' returning list of slices, each corresponding to
a word. Note that output is similar to split() function of built-in str object.

>>> string.split(' ')

['Simple', 'is', 'better', 'than', 'complex.']

re.sub():

This function returns a string by replacing a certain pattern by its substitute string.

Syntax: re.sub(pattern, replacement, string)

In the example below, the word 'is' gets substituted by 'was' everywhere in the target string.

>>> string="Simple is better than complex. Complex is better than complicated."

>>> obj=re.sub('is', 'was', string)

>>> obj

'Simple was better than complex. Complex was better than complicated.'

^ (Caret): Matches pattern only at the start of the string.

$ (Dollar): Matches pattern at the end of the string

+ (Plus): Match 1 or more repetitions of the regex.

. (Dot): Matches any character except a newline

{n} (Braces): Matches exactly the specified number (n) of occurrences of the preceding element

The \w special sequence is used to find a word character.

A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character.

• ? The question mark indicates zero or one occurrences of the preceding element. For example,
colou?r matches both "color" and "colour".

• * The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c
matches "ac", "abc", "abbc", "abbbc", and so on.
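
A sketch that exercises each of the metacharacters listed above (the test strings are made up for illustration):

```python
import re

string = "Simple is better than complex."

starts = bool(re.search(r"^Simple", string))       # ^ anchors the pattern at the start
ends = bool(re.search(r"complex\.$", string))      # $ anchors the pattern at the end

optional = [bool(re.fullmatch(r"colou?r", word))   # ? : zero or one "u"
            for word in ["color", "colour", "colouur"]]

star = re.findall(r"ab*c", "ac abc abbc adc")      # * : zero or more "b"
plus = re.findall(r"ab+c", "ac abc abbc adc")      # + : one or more "b"
exact = re.findall(r"\b\d{3}\b", "12 345 6789")    # {3} : exactly three digits
dot = re.findall(r"c.m", "cam c\nm com")           # . : any character except a newline
words = re.findall(r"\w+", string)                 # \w+ : runs of word characters
print(starts, ends, optional, star, plus, exact, dot, words)
```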
MATPLOTLIB

Introduction to Data Visualisation in Python

• Matplotlib is a powerful plotting library in Python used to create static, animated, and
interactive visualizations.

• It was originally designed to emulate plotting abilities of Matlab but in Python

• Matplotlib is popular due to its ease of use, extensive documentation, and wide range
of plotting capabilities.

• Many other packages use Matplotlib for data visualization, including pandas, NumPy,
and SciPy.

• Other libraries include seaborn, Altair, ggpy, Bokeh, plotly

• While some are built on top of Matplotlib, others are independent

In Matplotlib, a figure is the top-level container that holds all the elements of a plot.

It represents the entire window or page where the plot is drawn.

The parts of a Matplotlib figure include:

• Figures (the canvas)

• Axes (The co-ordinate system)

• Axis (X-Y Axis)

• Marker

• Lines to Figures

• Matplotlib Title

• Axis labels

• Ticks and tick labels

• Legend

• Gridlines

• Spines (Borders of the plot area)

• The package is imported into the Python script by adding the following statement:

from matplotlib import pyplot [as plt]


• Here pyplot is the most commonly used module of the matplotlib library; its functions are used to
plot 2D data.

Pyplot in Matplotlib

• Pyplot is a Matplotlib module that provides a MATLAB-like interface.

• Each pyplot function makes some changes to a figure: e.g., creates a figure, creates
a plotting area in a figure, plots some lines in a plotting area, decorates the plot with
labels, etc.

• The various plots we can utilize using Pyplot are Line Plot, Histogram, Scatter, 3D
Plot, Image, Contour, and Polar

Basic Functions of matplotlib.pyplot for Chart Creation

• Use plot() to plot the graph. This function is used to draw the graph. It takes x value,
y value, format string(line style and color) as an argument.

• Use show() to show the graph window. This function is used to display the graph. It
can be called without any argument.

• Use title() to give title to graph. It takes string to be displayed as title as argument.

• Use xlabel() to give label to x-axis. It takes string to be displayed as label of x-axis as
argument.

• Use ylabel() to give label to y-axis. It takes string to be displayed as label of y-axis as
argument.

• Use savefig() to save the result in a file.

• Use annotate() function to highlight some specific locations in the chart.

• Use legend() to apply legend in the chart.

• The subplot() function allows you to plot different things in the same figure. Its first
argument specifies the number of rows in the grid of subplots, the second specifies the number of
columns, and the third specifies the index of the active subplot (counted from 1).

• Use bar() function if we want to draw a bar graph in place of a line graph.
E.g. plt.bar(x, y, color = 'g', align = 'center')

• For horizontal bar graph, use barh(y, x)

• Use scatter(x,y) for a scatter plot

• Use hist() function for graphical representation of the frequency distribution of data.
It draws rectangles of equal horizontal size, each corresponding to a class interval called a bin, with
variable height corresponding to the frequency. It takes the input array and bins as two
parameters. The successive elements in the bins array act as the boundaries of each bin.
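
A minimal sketch that puts several of these functions together. The data is made up, the non-interactive "Agg" backend is selected so that no window is needed, and the figure is saved to an in-memory buffer instead of a file (both are assumptions made to keep the example self-contained; in a normal script plt.show() would display the window):

```python
import io

import matplotlib
matplotlib.use("Agg")                       # non-interactive backend: draw without a window
from matplotlib import pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig = plt.figure()

plt.subplot(1, 2, 1)                        # grid of 1 row and 2 columns, first subplot active
plt.plot(x, y, "r--", label="squares")      # format string: red dashed line
plt.title("Line plot")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

plt.subplot(1, 2, 2)                        # second subplot active
plt.bar(x, y, color="g", align="center")
plt.title("Bar chart")

buffer = io.BytesIO()
plt.savefig(buffer, format="png")           # a filename such as "chart.png" also works
image_size = buffer.getbuffer().nbytes
axes_count = len(fig.axes)
plt.close(fig)
print(axes_count, image_size)
```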
The seaborn library in Python
• Seaborn is a library mostly used for statistical plotting in Python.
• It is built on top of Matplotlib and provides beautiful default styles and color palettes
to make statistical plots more attractive.
Heatmap
• Heatmap is defined as a graphical representation of data using colours to visualize
the value of the matrix.
• In this, brighter colors (basically reddish colors) are used to represent more common values or
higher activity, and darker colors are preferred to represent less common values or lower
activity.

seaborn.heatmap()
Syntax: seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None,
annot_kws=None, linewidths=0, linecolor='white', cbar=True, **kwargs)
Important Parameters:
• data: 2D dataset that can be coerced into an ndarray.
• vmin, vmax: Values to anchor the colormap, otherwise they are inferred from the
data and other keyword arguments.
• cmap: The mapping from data values to color space.
• center: The value at which to center the colormap when plotting divergent data.
• annot: If True, write the data value in each cell.
• fmt: String formatting code to use when adding annotations.
• linewidths: Width of the lines that will divide each cell.
• linecolor: Color of the lines that will divide each cell.
• cbar: Whether to draw a colorbar.
All the parameters except data are optional.
Suggested Reads
• Neural Data Science in Python — Neural Data Science in Python
• Python Plotting With Matplotlib (Guide) – Real Python
• Getting Started with Python Matplotlib – An Overview – GeeksforGeeks
• Python Seaborn Tutorial – GeeksforGeeks
• Subplots in Python (Matplotlib Subplots - How to create multiple plots in same figure
in Python? - Machine Learning Plus)
NUMPY

Introduction

• NumPy stands for Numerical Python which is a Python package developed by Travis Oliphant in
2005.

• It is a library consisting of multidimensional array objects and a collection of routines for
performing mathematical and logical operations on those arrays

In Python, we use the list for purpose of the array but it’s slow to process. There are the following
advantages of using NumPy for data analysis.

• NumPy performs array-oriented computing.

• It efficiently implements the multidimensional arrays.

• It performs scientific computations.

• It is capable of performing Fourier Transform and reshaping the data stored in multidimensional
arrays.

• NumPy provides the in-built functions for linear algebra and random number generation.

We must import numpy before using any of its objects or routines.

Arrays in NumPy

Array in Numpy is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.

In NumPy, the number of dimensions of the array is called rank of the array.

A tuple of integers giving the size of the array along each dimension is known as shape of the array.
An array class in NumPy is called as ndarray.

Elements in NumPy arrays are accessed by using square brackets and can be initialized by using
nested Python Lists.

One-dimensional array: sample_array = numpy.array(list1)

Multi-dimensional array: sample_array = numpy.array([list1,list2,list3])


THE SIZE OF ALL LISTS MUST BE THE SAME!

Note: use [ ] operators inside numpy.array() for multi-dimensional


sample_array.shape
-gives the number of elements along each dimension
-output for a two-dimensional array: (a,b) where a is the number of rows (inner lists) and b is the
number of columns (elements in each row)

Data type objects (dtype): It is an instance of numpy.dtype class. It describes how the bytes in the
fixed-size block of memory corresponding to an array item should be interpreted.

Different ways of creating an array using NumPy

1. numpy.array(): The Numpy array object in Numpy is called ndarray. We can create ndarray
using numpy.array() function. Syntax: numpy.array(parameter)
2. numpy.fromiter(): The fromiter() function create a new one-dimensional array from an
iterable object. Syntax: numpy.fromiter(iterable, dtype, count=-1)
3. numpy.arange(): This is an inbuilt NumPy function that returns evenly spaced values within a
given range. Syntax: numpy.arange([start, ]stop, [step, ]dtype=None)
4. numpy.linspace(): This function returns evenly spaced numbers over a specified interval between
two limits. Syntax: numpy.linspace(start, stop, num=50) -> start : start of interval
range (required) -> stop : end of interval range (included by default) -> num : [int, optional] No. of
samples to generate. By default num = 50

5. numpy.empty(): This function create a new array of given shape and type, without initializing
value. Syntax: numpy.empty(shape, dtype=float, order='C')

input shape in the form [x,y] where x is the no. of rows and y is the no. of columns (elements in each row).
order: {'C', 'F'}(optional)
This parameter defines the order in which the multi-dimensional array is going to be stored
either in row-major or column-major. By default, the order parameter is set to 'C'.

6. numpy.ones(): This function is used to get a new array of given shape and type, filled with
ones(1). Syntax: numpy.ones(shape, dtype=None, order='C')
7. numpy.zeros(): This function is used to get a new array of given shape and type, filled with
zeros(0). Syntax: numpy.zeros(shape, dtype=float, order='C')

• logspace()

• asarray()
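
The creation functions listed here can be tried side by side. The following is a minimal sketch; the shapes, ranges, and variable names are arbitrary choices made for illustration.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])                   # 2-D array from nested lists
from_iter = np.fromiter((i * i for i in range(5)), dtype=int)  # 1-D array from a generator
stepped = np.arange(0, 10, 2)              # start, stop (excluded), step
spaced = np.linspace(0.0, 1.0, num=5)      # 5 samples, both end points included
uninit = np.empty((2, 3))                  # contents are arbitrary until assigned
ones = np.ones((2, 3), dtype=int)
zeros = np.zeros((2, 3))
powers = np.logspace(0, 3, num=4)          # 10**0, 10**1, 10**2, 10**3
same = np.asarray(from_list)               # an existing ndarray is returned without copying

print(from_list.shape)   # (2, 3)
print(from_iter)         # [ 0  1  4  9 16]
print(stepped)           # [0 2 4 6 8]
print(spaced)
print(powers)
```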

Accessing the array Index

• In a NumPy array, elements can be accessed by index in multiple ways.

• To select a range of elements, slicing is used. A slice defines a range of positions and returns
a new array object that refers to those elements of the original array.

• Since a sliced array is a view onto the elements of the original array, modifying the contents of
the slice also modifies the original array.
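
A small sketch of this behaviour, with made-up values: writing through a slice changes the original array, while an explicit copy() does not.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

view = original[1:4]          # basic slicing returns a view, not a copy
view[0] = 99                  # the write goes through to the original array
print(original)               # [10 99 30 40 50]

independent = original[1:4].copy()   # copy() breaks the link
independent[0] = -1
print(original)               # [10 99 30 40 50]
```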

Basic Array Operations

In NumPy, arrays allow a wide range of operations which can be performed on a particular array or a
combination of arrays. These operations include some basic mathematical operations as well as unary
and binary operations.
Math Operations on DataType array

In Numpy arrays, basic mathematical operations are performed element-wise on the array. These
operations are applied both as operator overloads and as functions. Many useful functions are
provided in Numpy for performing computations on Arrays such as sum: for addition of Array
elements, T: for Transpose of elements, etc.
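
As an illustration with two small made-up matrices, the operator form and the function form give the same element-wise result:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

print(a + b)           # operator form, element-wise
print(np.add(a, b))    # function form, same result
print(a * 2)           # a scalar is applied to every element
print(a.sum())         # 10
print(a.sum(axis=0))   # column sums: [4 6]
print(a.T)             # transpose
```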

Trigonometric Functions- NumPy has standard trigonometric functions which return trigonometric
ratios for a given angle in radians.

The arcsin, arccos, and arctan functions return the inverse sine, cosine, and tangent of the given
value, expressed in radians. The result of these functions can be checked by converting radians to
degrees with the numpy.degrees() function.
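
A short round trip shows the idea; the angles are arbitrary sample values.

```python
import numpy as np

angles_deg = np.array([0, 30, 45, 60, 90])
angles_rad = np.radians(angles_deg)       # trigonometric functions expect radians

sines = np.sin(angles_rad)
recovered_rad = np.arcsin(sines)          # inverse sine, result in radians
recovered_deg = np.degrees(recovered_rad)

print(np.round(sines, 3))
print(np.round(recovered_deg, 3))         # matches angles_deg up to floating-point error
```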
Functions for Rounding

1. numpy.around()
This is a function that returns the value rounded to the desired precision. The function takes
the following parameters. numpy.around(a,decimals)

2. numpy.floor()
This function returns the largest integer not greater than the input parameter. The floor of
the scalar x is the largest integer i, such that i <= x. Note that flooring always rounds toward
negative infinity, so a negative value moves away from 0 (the floor of -1.7 is -2).

3. numpy.ceil()
The ceil() function returns the ceiling of an input value, i.e. the ceil of the scalar x is the
smallest integer i, such that i >= x.
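
The three functions can be compared on the same sample values (chosen here to include negatives and exact halves):

```python
import numpy as np

values = np.array([-2.5, -1.7, -0.2, 0.2, 1.7, 2.5])

print(np.around(values))        # exact halves go to the nearest even value: -2.5 -> -2.0
print(np.around(3.14159, 2))    # 3.14
print(np.floor(values))         # [-3. -2. -1.  0.  1.  2.]
print(np.ceil(values))          # [-2. -1. -0.  1.  2.  3.]
```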

Numpy-Array- Attributes

• ndarray.shape: This array attribute returns a tuple consisting of array dimensions. Assigning a
new tuple to it reshapes the array in place, although reshape() is the preferred way to do this.

• ndarray.ndim: This array attribute returns the number of array dimensions.

• ndarray.itemsize: This array attribute returns the length of each element of array in bytes.
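
A quick look at these attributes on a small sample array (the int32 type is chosen only so that the item size is predictable):

```python
import numpy as np

a = np.arange(6, dtype=np.int32)
print(a.shape)       # (6,)
print(a.ndim)        # 1
print(a.itemsize)    # 4 bytes per int32 element
print(a.dtype)       # int32

b = a.reshape(2, 3)  # same data, new shape
print(b.shape)       # (2, 3)
print(b.ndim)        # 2
```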

Numpy-Array- Indexing & Slicing

• Contents of ndarray object can be accessed and modified by indexing or slicing, just like Python's
in-built container objects.

• Items in an ndarray object follow zero-based indexing.

• A Python slice object is constructed by giving start, stop, and step parameters to the built-in
slice() function.
• This slice object is passed to the array to extract a part of the array.

• The same result can also be obtained by giving the slicing parameters separated by a colon :
(start:stop:step) directly to the ndarray object, i.e.

import numpy as np

a = np.arange(10)   # sample array: 0, 1, ..., 9

s = a[2:5:1]        # elements at indices 2, 3 and 4

print(s)

• If only one index is given, the single item at that index is returned.

• Slicing can also include an ellipsis (…) to make a selection tuple of the same length as the
number of dimensions of the array. For a 2-D array a, a[1, ...] returns the items of the second row
and a[..., 1] returns the items of the second column.
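
The different ways of selecting items can be sketched as follows; the arrays are small made-up examples.

```python
import numpy as np

a = np.arange(10)
s = slice(2, 7, 2)            # start, stop, step
print(a[s])                   # [2 4 6]
print(a[2:7:2])               # same selection with colon notation
print(a[5])                   # 5

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid[..., 1])           # second column: [2 5 8]
print(grid[1, ...])           # second row:    [4 5 6]
print(grid[..., 1:])          # all rows, columns from index 1 onwards
```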
Numpy Array Reshape

Reshaping means changing the shape of an array. The shape of an array is the number of elements in
each dimension. By reshaping we can add or remove dimensions or change number of elements in
each dimension.
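
For example, twelve elements can be arranged in several shapes as long as the total count stays the same (a minimal sketch with arbitrary shapes):

```python
import numpy as np

flat = np.arange(12)
matrix = flat.reshape(3, 4)     # 1-D -> 2-D: 3 rows of 4 elements
cube = flat.reshape(2, 3, 2)    # 1-D -> 3-D
auto = flat.reshape(4, -1)      # -1 lets NumPy infer the missing dimension (3 here)
back = matrix.reshape(-1)       # flatten back to 1-D

print(matrix.shape, cube.shape, auto.shape, back.shape)
```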

NumPy Array Iterating

Iterating means going through elements one by one. Even for multi-dimensional arrays in
NumPy, we can do this using the basic for loop of Python. If we iterate on a 1-D array it will go
through each element one by one; iterating on a 2-D array goes through its rows one by one.
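
A sketch of the loops described here, plus numpy.nditer() as one way to visit every element of a multi-dimensional array without nesting loops:

```python
import numpy as np

vector = np.array([1, 2, 3])
for item in vector:             # 1-D: yields each element
    print(item)

matrix = np.array([[1, 2, 3], [4, 5, 6]])
for row in matrix:              # 2-D: yields each row as a 1-D array
    print(row)

for row in matrix:              # nested loops reach the individual elements
    for item in row:
        print(item)

scalars = [int(x) for x in np.nditer(matrix)]   # nditer visits every element
print(scalars)                  # [1, 2, 3, 4, 5, 6]
```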
Numpy Array Join

Joining means putting contents of two or more arrays in a single array. In NumPy we join arrays by
axes. We pass a sequence of arrays that we want to join to the concatenate() function, along with the
axis. If axis is not explicitly passed, it is taken as 0 (row-wise).
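
With two small made-up 2-D arrays, the effect of the axis argument looks like this:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b))            # axis=0 by default: b is placed under a
cols = np.concatenate((a, b), axis=1)    # axis=1: b is placed next to a

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
```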

Numpy array split

Splitting is the reverse operation of joining. Joining merges multiple arrays into one and splitting
breaks one array into multiple. We use array_split() for splitting arrays; we pass it the array we
want to split and the number of splits.
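
A short sketch with a seven-element array shows that array_split() also accepts a number of splits that does not divide the array evenly:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
parts = np.array_split(a, 3)    # returns a list of arrays with sizes 3, 2, 2

for part in parts:
    print(part)
```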

Numpy Array search

You can search an array for a certain value, and return the indexes that get a match. To search an
array, use the numpy.where() function.
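
For instance, with a made-up array that contains the value 4 three times:

```python
import numpy as np

a = np.array([1, 4, 2, 4, 5, 4])

matches = np.where(a == 4)          # a tuple with one index array per dimension
print(matches[0])                   # [1 3 5]

even_idx = np.where(a % 2 == 0)[0]  # indexes of all even values
print(even_idx)                     # [1 2 3 5]
```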

Numpy Array Sort

Sorting means putting elements in an ordered sequence. Ordered sequence is any sequence that has
an order corresponding to elements, like numeric or alphabetical, ascending or descending. NumPy
provides the function numpy.sort(), which returns a sorted copy of the specified array, and the
method ndarray.sort(), which sorts an array in place.
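
A minimal sketch of both ways of sorting, using small sample arrays:

```python
import numpy as np

a = np.array([3, 1, 2])
sorted_copy = np.sort(a)        # returns a sorted copy, a itself is unchanged
print(sorted_copy, a)           # [1 2 3] [3 1 2]

a.sort()                        # ndarray.sort() sorts in place and returns None
print(a)                        # [1 2 3]

words = np.array(["banana", "cherry", "apple"])
print(np.sort(words))           # ['apple' 'banana' 'cherry']

descending = np.sort(np.array([3, 1, 2]))[::-1]   # reverse the ascending result
print(descending)               # [3 2 1]
```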

Numpy Array Filter


Getting some elements out of an existing array and creating a new array out of them is called
filtering. In NumPy, you filter an array using a boolean index list. A boolean index list is a list of
booleans corresponding to indexes in the array. If the value at an index is True, that element is
included in the filtered array; if the value at that index is False, that element is excluded from
the filtered array.
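
The boolean list can be written by hand or, more commonly, produced by a comparison. A sketch with made-up values:

```python
import numpy as np

a = np.array([41, 42, 43, 44])

mask = np.array([True, False, True, False])
print(a[mask])            # [41 43]

condition = a > 42        # a comparison builds the boolean array for us
print(condition)          # [False False  True  True]
print(a[condition])       # [43 44]
```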

Numpy-Random Number

• NumPy has a submodule called random that provides the rand() function. Using this we can
generate random numbers that are at least 0 and less than 1.0: random.rand()

• We can create a 1D array of random numbers by passing the size of array to the rand() function as:
a=random.rand(n)

• We can create a 2D array of random numbers by passing the size of array to the rand() function as:
a=random.rand(m,n)
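
The three calls can be tried together. In this sketch the seed value and the sizes are arbitrary; fixing the seed only makes repeated runs produce the same numbers.

```python
import numpy as np

np.random.seed(0)            # fixed seed so that runs are repeatable

x = np.random.rand()         # a single float in [0, 1)
v = np.random.rand(5)        # 1-D array with 5 values
m = np.random.rand(2, 3)     # 2-D array with 2 rows and 3 columns

print(x)
print(v.shape, m.shape)      # (5,) (2, 3)
```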

Numpy-I/O

• The numpy.save() function stores the input array in a disk file with the .npy extension.

from numpy import *

a=arange(8).reshape(4,2)

print(a)

save('outfile',a)

• To reconstruct array from outfile.npy, use load() function.

from numpy import *

b=load('outfile.npy')

print(b)
• The storage and retrieval of array data in simple text file format is done with savetxt() and loadtxt()
functions.

from numpy import *

a=arange(8).reshape(4,2)

savetxt('out.txt',a)

b=loadtxt('out.txt')

print(b)

Numpy-Linear Algebra

• The NumPy package provides the functionality required for linear algebra, partly as top-level
functions and partly in the numpy.linalg module. Some of the important functions are as follows:

– numpy.dot(): To find the dot product of two arrays.

– numpy.vdot(): To find the dot product of two vectors.

– numpy.inner(): To find the inner product of two arrays.

– numpy.matmul(): To find the matrix product of two arrays.

– numpy.linalg.det(): To find the determinant of a square matrix.

– numpy.linalg.inv(): To find the multiplicative inverse of a matrix.

– numpy.linalg.solve(): To solve a linear matrix equation.
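
A compact sketch of these operations on small made-up matrices. Note that the products are called from the top-level numpy namespace, while the determinant (numpy.linalg.det), the inverse, and the solver are called from numpy.linalg.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [4.0, 1.0]])
u = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])

print(np.dot(u, w))           # 11.0
print(np.vdot(u, w))          # 11.0
print(np.inner(u, w))         # 11.0
print(np.matmul(A, B))        # matrix product, same as A @ B

det_A = np.linalg.det(A)      # approximately 5.0
A_inv = np.linalg.inv(A)
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(A, rhs)   # solves A x = rhs
print(det_A, x)
```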


MACHINE LEARNING

Introduction

Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed. It is a subfield of artificial intelligence, which is broadly defined as the
capability of a machine to imitate intelligent human behaviour.

• Machine learning is a data-driven technology.
• A machine can learn by itself from past data and improve automatically.
• From the given dataset, it detects various patterns in the data.
• It is similar to data mining because it also deals with huge amounts of data.

Supervised Machine Learning


Supervised machine learning models are trained with labeled data sets, which allow the models to
learn and grow more accurate over time. For example, an algorithm would be trained with pictures of
dogs and other things, all labeled by humans, and the machine would learn ways to identify pictures
of dogs on its own. Supervised machine learning is the most common type used today.

The model or algorithm is presented with example inputs and their desired outputs and then finds
patterns and connections between the input and the output. The goal is to learn a general rule that
maps inputs to outputs. The training process continues until the model achieves the desired level of
accuracy on the training data. Some real-life examples are:
• Image Classification: You train with images/labels. Then in the future, you give a new
image expecting that the computer will recognize the new object.
• Market Prediction/Regression: You train the computer with historical market data
and ask the computer to predict the new price in the future.

Unsupervised Machine Learning

In unsupervised machine learning, a program looks for patterns in unlabelled data. Unsupervised
machine learning can find patterns or trends that people aren’t explicitly looking for. For example, an
unsupervised machine learning program could look through online sales data and identify different
types of clients making purchases.

No labels are given to the learning algorithm, leaving it on its own to find structure in its input. It is
used for clustering populations in different groups. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data).
• Clustering: You ask the computer to separate similar data into clusters. This is essential
in research and science.
• High-Dimension Visualization: Use the computer to help us visualize high-dimension
data.
• Generative Models: After a model captures the probability distribution of your input
data, it will be able to generate more data. This can be very useful to make your
classifier more robust.

Python libraries for machine learning: Scikit-learn


Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built on top of
two basic Python libraries, viz., NumPy and SciPy. Scikit-learn supports most of the supervised and
unsupervised learning algorithms. Scikit-learn can also be used for data mining and data analysis,
which makes it a great tool for anyone who is starting out with ML.

REGRESSION

Linear regression

Logistic regression

Logistic regression is a supervised machine learning algorithm used for classification
tasks where the goal is to predict the probability that an instance belongs to a given class.
It is a statistical algorithm which analyzes the relationship between the independent variables and
a categorical dependent variable.

Logistic regression is used for binary classification, where we use the sigmoid function, which
takes a weighted combination of the independent variables as input and produces a probability value
between 0 and 1.
For example, given two classes, Class 0 and Class 1, if the value of the logistic function for an
input is greater than 0.5 (the threshold value), the input is assigned to Class 1; otherwise it is
assigned to Class 0.

• Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
• It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values
0 and 1, it gives probabilistic values which lie between 0 and 1.
• In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic
function, whose output is bounded by the two limiting values 0 and 1.
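
The sigmoid function and the 0.5 threshold can be sketched in a few lines. The weight and bias here are hypothetical values picked for illustration, not parameters fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model parameters for a single input feature.
weight, bias = 1.5, -3.0

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
probabilities = sigmoid(weight * x + bias)            # estimated P(class = 1) for each x
predicted_class = (probabilities > 0.5).astype(int)   # apply the 0.5 threshold

print(np.round(probabilities, 3))
print(predicted_class)                                # [0 0 0 1 1]
```

The input x = 2.0 gives a probability of exactly 0.5, which is not greater than the threshold, so it is assigned to Class 0.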

Types of Logistic Regression


On the basis of the categories, logistic regression can be classified into three types:
1. Binomial: In binomial logistic regression, the dependent variable can take only two
possible values, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, the dependent variable can take 3 or more
possible unordered values, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, the dependent variable can take 3 or more possible
ordered values, such as “Low”, “Medium”, or “High”.

Over-fitting
Overfitting occurs when the model fits the training data too closely, capturing noise or random
fluctuations that do not represent the true underlying relationship between variables. This can lead
to poor generalization performance on new, unseen data.

A statistical model is said to be overfitted when it fits the training data well but does not make
accurate predictions on testing data. When a model is too flexible for the amount of data available,
or is trained for too long, it starts learning from the noise and inaccurate data entries in our data
set. Testing with test data then shows high variance, and the model does not categorize the data
correctly because it has absorbed too many details and too much noise. Non-parametric and non-linear
methods are especially prone to overfitting because these types of machine learning algorithms have
more freedom in building the model based on the dataset and can therefore build unrealistic models.
Ways to avoid overfitting include using a linear algorithm if we have linear data, or limiting
parameters like the maximal depth if we are using decision trees.
In a nutshell, overfitting is a problem where the performance of a machine learning algorithm on
training data is noticeably better than on unseen data.
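
The gap between training and test performance can be made visible with a small experiment. This is only a sketch under stated assumptions: the data come from a made-up noisy straight line, and polynomial degrees 1 and 8 stand in for a simple and an overly complex model.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily

def noisy_line(n):
    """Sample n points from y = 2x + 1 plus Gaussian noise (a made-up data source)."""
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = noisy_line(12)    # small training set
x_test, y_test = noisy_line(200)     # unseen data from the same source

errors = {}
for degree in (1, 8):
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, errors[degree])    # (training error, test error)
```

The degree-8 polynomial always reaches the lower training error, because a straight line is a special case of it. Its test error is typically worse than its training error by a wide margin, which is the symptom of overfitting.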

Reasons for Overfitting:


1. High variance and low bias.
2. The model is too complex.
3. The training data set is too small.

Techniques to Reduce Overfitting


1. Improving the quality of training data reduces overfitting by focusing on meaningful
patterns and mitigating the risk of fitting noise or irrelevant features.
2. Increasing the amount of training data can improve the model’s ability to generalize to
unseen data and reduce the likelihood of overfitting.
3. Reduce model complexity.
4. Early stopping during the training phase (keep an eye on the validation loss over the
training period and stop training as soon as it begins to increase).
5. Ridge regularization and Lasso regularization.
6. Use dropout for neural networks to tackle overfitting.

Regularization Techniques for Linear Models


Lasso Regression (L1 Regularization)
Lasso Regression is a technique used for regularizing a linear regression model. It adds a penalty
term to the linear regression objective function to prevent overfitting.
The objective function after applying lasso regression is:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n} |\theta_j|$$
• the first term is the least squares loss, representing the squared difference between
predicted and actual values.
• the second term is the L1 regularization term, which penalizes the sum of the absolute values
of the regression coefficients θj.
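
The two terms of the lasso objective can be computed directly. The function name, the tiny data set, and the value of λ (lam) in this sketch are all illustrative assumptions; for simplicity every coefficient is penalized and there is no separate intercept.

```python
import numpy as np

def lasso_objective(theta, X, y, lam):
    """Least squares loss (1/2m) * sum of squared residuals, plus lam * sum(|theta_j|)."""
    m = len(y)
    residual = X @ theta - y
    least_squares = (residual @ residual) / (2 * m)
    l1_penalty = lam * np.sum(np.abs(theta))
    return least_squares + l1_penalty

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

perfect_fit = np.array([1.0, 2.0])     # reproduces y exactly, so only the penalty remains
print(lasso_objective(perfect_fit, X, y, lam=0.1))   # about 0.3, i.e. 0.1 * (|1| + |2|)
print(lasso_objective(np.zeros(2), X, y, lam=0.1))   # no penalty, loss = 14 / 6
```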

Ridge Regression (L2 Regularization)


Ridge regression is a linear regression technique that adds a regularization term to the standard
linear objective. Again, the goal is to prevent overfitting by penalizing large coefficients in the
linear regression equation. It is useful when the dataset has multicollinearity, where predictor
variables are highly correlated.
The objective function after applying ridge regression is:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n} \theta_j^2$$
• the first term is the least squares loss, representing the squared difference between
predicted and actual values.
• the second term is the L2 regularization term, which penalizes the sum of the squared values
of the regression coefficients θj.
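
For ridge regression the minimum of this objective can be written in closed form: setting the gradient to zero gives (XᵀX + 2mλI)θ = Xᵀy. The sketch below assumes that form, penalizes every coefficient, and uses no intercept; the function name and the synthetic, nearly collinear data are illustrative only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise (1/2m) * ||X theta - y||^2 + lam * ||theta||^2 in closed form."""
    m, n = X.shape
    return np.linalg.solve(X.T @ X + 2 * m * lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(1)
base = rng.normal(size=50)
# Two nearly identical columns imitate multicollinearity.
X = np.column_stack([base, base + rng.normal(scale=0.01, size=50)])
y = 3.0 * base + rng.normal(scale=0.1, size=50)

theta_ols = ridge_fit(X, y, 0.0)      # lam = 0 reduces to ordinary least squares
theta_ridge = ridge_fit(X, y, 0.1)    # lam > 0 shrinks the coefficients

print(theta_ols, np.linalg.norm(theta_ols))
print(theta_ridge, np.linalg.norm(theta_ridge))
```

The ridge solution has the smaller coefficient norm, which is the shrinkage effect the penalty is meant to produce.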
Elastic Net Regression
Elastic Net Regression is a hybrid regularization technique that combines the power of both L1 and
L2 regularization in the linear regression objective.
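
One common way to write the combined objective, using the same symbols as the lasso and ridge objectives and assuming two separate penalty weights λ1 and λ2 (names introduced here for illustration), is:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2
          + \lambda_1 \sum_{j=1}^{n} |\theta_j|
          + \lambda_2 \sum_{j=1}^{n} \theta_j^2
```

Setting λ2 = 0 recovers the lasso objective, and setting λ1 = 0 recovers the ridge objective.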
