Python For Machine Learning
Variable
In [5]: name ="VaibhaV"
In [6]: print(name)
VaibhaV
Data type
int
float
Boolean
String
In [6]: num1= 1
print(type(num1))
num2= 2.303
print(type(num2))
num3= True
print(type(num3))
num4= "VaibhaV"
print(type(num4))
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'str'>
Operators
Arithmetic operators: +, -, *, /
In [7]: num1, num2 = 36, 7
file:///Users/vaibhavarde/Downloads/python_notebook.html 1/95
11/08/2024, 17:41 python_notebook
43
Out[8]:
29
Out[9]:
252
Out[10]:
5.142857142857143
Out[11]:
5
Out[12]:
True
Out[13]:
False
Out[14]:
False
Out[15]:
True
Out[16]:
False
Out[18]:
True
Out[19]:
True
Out[20]:
Python Tokens
Smallest meaningful component in a program
Keywords : Special reserved words such as for, if, yield
Identifiers: Names used for variables, functions or objects
Literals : Constants in Python
num = "Test" : here "num" is an identifier and "Test" is a literal
Operators
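As a small illustration of the token categories above, the standard-library keyword module can tell reserved words apart from ordinary identifiers. This is only a sketch; the sample words are chosen for this example and do not come from the notebook.

```python
import keyword

# keyword.iskeyword() reports whether a string is a reserved word.
# Keywords are case-sensitive: "for" is reserved, "For" is not.
for word in ["for", "if", "yield", "For", "num"]:
    print(word, keyword.iskeyword(word))

# str.isidentifier() checks whether a string is a syntactically valid name.
print("num".isidentifier())    # a valid identifier
print("2num".isidentifier())   # invalid: starts with a digit
```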
String:
In [21]: str1 = "I love Pizza"
str1.find("Piz")
7
Out[21]:
List
In Python, a list is a built-in data structure that allows you to store an ordered collection
of items. These items can be of any data type, including integers, strings, floats, or even
other lists. Lists are mutable, meaning you can change their content without changing
their identity.
Here is a basic definition and example of a list in Python:
Definition: A list is defined by placing a comma-separated sequence of items within
square brackets [].
Key Characteristics:
Ordered: The items have a defined order, and that order will not change unless you
explicitly modify the list.
Mutable: You can change, add, or remove items after the list has been created.
Heterogeneous: A list can contain items of different data types.
Common Operations:
Accessing Elements: Use indexing to access individual elements.
Slicing: Use slicing to access a range of elements.
Appending: Use append() to add an element to the end of the list.
Inserting: Use insert() to add an element at a specific position.
Removing: Use remove() to remove a specific element.
Popping: Use pop() to remove an element at a specific position and return it.
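The operations listed above can be tried on a small list. This is a minimal sketch; the list name and values are made up for the example and are not taken from the notebook cells.

```python
# Illustrative list; the values are chosen only for this example.
items = [10, "b", True, 2.5]

print(items[0])        # accessing an element by index
print(items[1:3])      # slicing a range of elements
items.append("new")    # add an element to the end
items.insert(1, 99)    # insert an element at position 1
items.remove("b")      # remove the first matching value
last = items.pop()     # remove and return the last element
print(last)
print(items)
```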
In [25]: l1[0]=100
l1
In [26]: l1.append("VaibhaV")
l1
In [27]: l1.pop()
'VaibhaV'
Out[27]:
In [28]: l1
In [29]: l1.reverse()
l1
In [31]: l1.sort()
l1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 1
----> 1 l1.sort()
2 l1
[1, 'b', True, 1, 'b', True, 1, 'b', True]
Out[34]:
Tuple
Tuples are ordered collections of elements enclosed within ()
Tuples are immutable: once created, the values of a tuple cannot be updated or changed
In [35]: tup1 = (1, "b", True)
tup1
In [36]: type(tup1)
tuple
Out[36]:
In [38]: tup2[1:6]
In [39]: tup2[::2]
In [41]: tup2[2]="Test"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[41], line 1
----> 1 tup2[2]="Test"
Dictionary
A dictionary is a collection of key-value pairs enclosed in {} (insertion order is preserved since Python 3.7)
A dictionary is mutable
In [43]: fruit = {"Mango": 10, 'Banana': 20, 'Apple':30}
In [44]: fruit.keys()
In [45]: fruit.values()
In [46]: fruit["Guava"]=40
fruit
In [47]: fruit
In [49]: f1.update(f2)
f1
In [50]: f1.pop("Guava")
40
Out[50]:
In [51]: f1
Set
A set is an unordered and unindexed collection of elements enclosed in {}
Duplicates are not allowed in a set
In [52]: # Try duplicates with set
s1 = {"c", "a", "b", 1, 2, 3, "a", "b", 1, 2}
s1
In [53]: s1.add("Test")
s1
In [54]: s1.remove("b")
s1
In [56]: s1
In [57]: s2
In [58]: s1.intersection(s2)
{1, 2, 'a'}
Out[58]:
In [60]: if a>b:
    print("A is greater than B")
else:
    print("B is greater than A")
B is greater than A
In [61]: result= "A is greater than B" if a>b else "B is greater than A"
result
z is not in tup
Looping Statement
Used to repeat a task multiple times
In [64]: fruit = ['Mango', 'Banana', 'Apple']
for frt in fruit:
    print(frt)
Mango
Banana
Apple
In [65]: i= 1
while i<=10:
    print(i)
    i+=1
1
2
3
4
5
6
7
8
9
10
In [67]: p1 = Phone()
In [68]: p1.make_call()
In [69]: p1.play_game()
Playing game
In [70]: p1.set_color("Red")
In [71]: p1.set_cost(1000)
In [72]: p1.show_color()
'Red'
Out[72]:
In [73]: p1.show_cost()
1000
Out[73]:
Constructor
In [74]: class Phone:
    def __init__(self, color, cost):
        self.color = color
        self.cost = cost

    def make_call(self):
        print("Making phone call")

    def play_game(self):
        print("Playing game")
In [76]: p1.color
'Red'
Out[76]:
Inheritance
With inheritance, one class can derive the attributes and methods of another class
In [77]: class Vehicle:
    def __init__(self, mileage, cost):
        self.mileage = mileage
        self.cost = cost

    def vehicle_details(self):
        print(f"Vehicle has {self.mileage} mileage and it cost ${self.cost}")
In [81]: c1.show_car()
I am Car
In [82]: c1.vehicle_details()
In [85]: c1.show_car()
In [86]: c1.vehicle_details()
Vehicle has 100 mileage and it cost $15000
Multiple Inheritance
A child class inherits from more than one parent class
In [93]: class Parent1:
    def assign_string_1(self, name):
        self.name1 = name

    def show_string_1(self):
        return self.name1

class Parent2:
    def assign_string_2(self, name):
        self.name2 = name

    def show_string_2(self):
        return self.name2
In [94]: c1 = Child()
c1.assign_string_1("One")
c1.assign_string_2("Two")
c1.assign_string_3("Three")
In [95]: print(c1.show_string_1())
print(c1.show_string_2())
print(c1.show_string_3())
One
Two
Three
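The definition of Child used in the cells above is not shown. A minimal sketch of what it might look like follows, assuming Child simply lists both parents as base classes and adds a third pair of methods (assign_string_3 and show_string_3 are taken from the calls above; their bodies are an assumption). The last line prints the method resolution order Python uses to look up inherited methods.

```python
class Parent1:
    def assign_string_1(self, name):
        self.name1 = name

    def show_string_1(self):
        return self.name1


class Parent2:
    def assign_string_2(self, name):
        self.name2 = name

    def show_string_2(self):
        return self.name2


# Assumed definition: Child inherits from both parents and adds its own pair.
class Child(Parent1, Parent2):
    def assign_string_3(self, name):
        self.name3 = name

    def show_string_3(self):
        return self.name3


c1 = Child()
c1.assign_string_1("One")
c1.assign_string_2("Two")
c1.assign_string_3("Three")
print(c1.show_string_1(), c1.show_string_2(), c1.show_string_3())

# Method resolution order: Child first, then the parents left to right.
print([cls.__name__ for cls in Child.__mro__])
```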
class Child(Parent1):
    def assign_age(self, age):
        self.age = age

    def show_age(self):
        return self.age

class GrandChild(Child):
    def assign_gender(self, gender):
        self.gender = gender

    def show_gender(self):
        return self.gender
In [97]: gc = GrandChild()
In [98]: gc.assign_name("Test")
In [99]: gc.assign_age(21)
In [100… gc.assign_gender("Male")
File Handling
Open mode : Open a text file for reading, writing or other operations
Read mode : Read the text that is already stored in the file
Write mode : Write text to the file (existing content is overwritten)
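The modes above can be sketched with the with statement, which closes the file automatically even when an error occurs. This is an illustrative example only: a temporary directory stands in for the notebook's 'processed_data/' folder, and the file contents are made up.

```python
import os
import tempfile

# A temporary directory stands in for 'processed_data/' (an assumption of this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w") as f:
        f.write("line 1")

    # 'a' appends to the end instead of overwriting.
    with open(path, "a") as f:
        f.write("\nline 2")

    # 'r' opens the file for reading; readline() returns one line per call.
    with open(path, "r") as f:
        first = f.readline()
        second = f.readline()
        third = f.readline()   # empty string once the end of the file is reached

print(repr(first), repr(second), repr(third))
```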
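In [102… file = open('processed_data/test.txt', 'w')
try:
    file.write('Chai aur code line 1')
finally:
    file.close()
# Reopen in append mode; writing to the closed handle would raise ValueError
file = open('processed_data/test.txt', 'a')
try:
    file.write('\nChai aur code line 2')
finally:
    file.close()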
41
Readline functions
In [105… f = open('processed_data/test.txt', 'w')
f.write("I am learning file handling")
f.write("\n Topic is file handling, read and write")
40
Out[105]:
In [107… print(f.readline())
In [108… print(f.readline())
In [109… print(f.readline())
Try Except
In [110… a =input("Enter the number 1 : ")
b =input("Enter the number 2 : ")
try:
    c= int(a) + b
    print(c)
except Exception as e:
    print(e)
In [111… try:
    c= int(a) + b
    print(c)
except:
    print("Error in try block")
try:
    c= int(a) + int(b)
    print(c)
except Exception as e:
    print(e)
else:
    print("All inputs are good, Else clause got executed!!")
3
All inputs are good, Else clause got executed!!
try:
    c= int(a) + b
    print(c)
except:
    print("Error in try block")
finally:
    print("Finally runs post try except")
try:
    c= int(a) + int(b)
    print(c)
except Exception as e:
    print(e)
else:
    print("All inputs are good, Else clause got executed!!")
finally:
    print("Finally runs post try except")
3
All inputs are good, Else clause got executed!!
Finally runs post try except
Arrays:
Linear data structure
Contiguous memory locations
Elements can be accessed randomly (by index)
Homogeneous elements, i.e. elements of the same type
Applications
Storing information in a linear fashion
Suitable for applications that require frequent searching
1-Dimensional Array
A 1D array can be thought of as a row
Elements are stored one after another
Only one subscript or index is used
Declaration and Initialization
Array declaration (C-style notation):
Datatype varname [size]
Declaration and initialization can also be done at once:
Datatype varname [] = {ele1, ele2, ele3, ele4};
2-Dimensional Array
A 2D array can be thought of as a table or matrix
Elements are stored one after another, i.e. as one 1D array inside another.
Two subscripts or indices are used, one for the row and one for the column.
The number of dimensions depends on the number of subscripts used.
In [116… # 1D array
print("How many elements to store inside the array", end="")
num = input()
arr = []
print("\nEnter", num, "Element:", end="")
num = int(num)
for i in range(num):
    element = input()
    arr.append(element)
print("\nThe array elements are")
for i in range(num):
    print(arr[i], end=" ")
In [117… # 2D Array
r_num = int(input("Input number of rows: "))
c_num = int(input("Input number of columns: "))
twoD_arr = [[0 for col in range(c_num)] for row in range(r_num)]
# print("Enter the elements of the matrix: ")
# for i in range(r_num):
#     for j in range(c_num):
#         twoD_arr[i][j] = int(input())
print("The matrix is: ")
for i in range(r_num):
    for j in range(c_num):
        print(twoD_arr[i][j], i*j)
        twoD_arr[i][j]= i*j
print(twoD_arr)
Advantages of an array
Random access to elements
Easy sorting and iteration
Replaces multiple individual variables
Disadvantages of an array
Size is fixed
Insertion and deletion are difficult
If capacity is much larger than occupancy, most of the array is wasted
Needs contiguous memory
Stack
Linear data structure
It follows Last In First Out (LIFO) order
Insertion and removal of elements are done at one end
Push is used for inserting an element into a stack
Pop is used for removing an element from a stack
Functions
push(x) - inserts the element 'x' at the top (end) of a stack.
pop() - removes the topmost/last element of a stack.
size() - gives the size/length of a stack.
top() - gives a reference to the last element present in the stack
empty() - returns true for an empty stack
Implementation of Stack
Several ways to implement a stack in Python:
list
collections.deque
queue.LifoQueue
Implementation of stack using list
A list in Python can be used as a stack
append() - inserts an element
pop() - removes the last element
Logic-
stack = []
stack.append("abc")
print(stack.pop())
In [122… # Implementation of stack using list
stack= []
stack.append("Welcome")
stack.append("to")
stack.append("great learning")
print(stack)
print(stack.pop())
print(stack)
stack.append("abc")
print(stack.pop())
In [123… ### Implementation of stack using deque
from collections import deque
stack = deque()
stack.append("Welcome")
stack.append("to")
stack.append("great learning")
print(stack)
print(stack.pop())
print(stack)
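The third option mentioned earlier, queue.LifoQueue, is not shown in the cells above. A minimal sketch follows, reusing the same sample strings; put() plays the role of push and get() the role of pop. LifoQueue is designed for use across threads, so for single-threaded code a list or deque is usually the simpler choice.

```python
from queue import LifoQueue

stack = LifoQueue()          # unbounded by default (maxsize=0)
stack.put("Welcome")         # push
stack.put("to")
stack.put("great learning")

print(stack.qsize())         # number of stored elements
top = stack.get()            # pop: returns the most recently added element
print(top)
print(stack.empty())         # whether the stack currently holds no elements
```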
What is Queue
Linear data structure
Follows FIFO: First In First Out
Insertion takes place at the rear end
Deletion takes place at the front end
Examples: a queue at a ticket counter or bus station
4 major operations
enqueue(ele) - inserts an element at the rear
dequeue() - removes the front element from the queue
peekfirst() - gets the first element of the queue
peeklast() - gets the last element of the queue
In a suitable implementation, all operations work in constant time, i.e. O(1)
Applications of Queue
Scheduling
Maintaining a playlist
Interrupt handling
Queue Implementation
Enqueue
Dequeue
Display
In [125… # Implementation of queue using list
class Queue:
    def __init__(self):
        self.queue = []

    def enqueue(self, data):
        self.queue.append(data)

    def dequeue(self):
        if len(self.queue)<1:
            return None
        return self.queue.pop(0)

    def display(self):
        print(self.queue)

    def size(self):
        return len(self.queue)
In [126… q = Queue()
q.enqueue(1)
q.enqueue(2)
q.enqueue(3)
q.enqueue(4)
q.display()
q.dequeue()
q.display()
[1, 2, 3, 4]
[2, 3, 4]
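One caveat about the list-based version: list.pop(0) shifts every remaining element, so dequeue takes O(n) time rather than O(1). A possible alternative, sketched below, uses collections.deque, whose append() and popleft() are both O(1). The class name DequeQueue is hypothetical; the peekfirst and peeklast methods follow the operation names listed earlier.

```python
from collections import deque


class DequeQueue:
    """FIFO queue backed by collections.deque (hypothetical name for this sketch)."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, data):
        self._items.append(data)        # add at the rear, O(1)

    def dequeue(self):
        if not self._items:
            return None                 # mirrors the list-based Queue above
        return self._items.popleft()    # remove from the front, O(1)

    def peekfirst(self):
        return self._items[0] if self._items else None

    def peeklast(self):
        return self._items[-1] if self._items else None

    def size(self):
        return len(self._items)


q = DequeQueue()
for value in (1, 2, 3, 4):
    q.enqueue(value)
removed = q.dequeue()
print(removed, q.peekfirst(), q.peeklast(), q.size())
```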
Queue Implementation
A program implementation for circular queue:
Enqueue
Dequeue
In [127… # circular queue
class MyCircularQueue():
    def __init__(self, k):
        self.k = k
        self.queue = [None] * k
        self.head = self.tail = -1

    def dequeue(self):
        if(self.head == -1):
            print("Queue is empty")
        elif(self.head == self.tail):
            temp = self.queue[self.head]
            self.head = -1
            self.tail = -1
            return temp
        else:
            temp = self.queue[self.head]
            self.head = (self.head + 1) % self.k
            return temp

    def printCQueue(self):
        if(self.head == -1):
            print("No element in circular queue is found")
        elif(self.tail >= self.head):
            for i in range(self.head, self.tail + 1):
                print(self.queue[i], end = " ")
            print()
        else:
            for i in range(self.head, self.k):
                print(self.queue[i], end = " ")
            for i in range(0, self.tail + 1):
                print(self.queue[i], end = " ")
            print()
obj.dequeue()
print("After removing an element from the queue")
obj.printCQueue()
Initial queue values
12 22 31 44 57
After removing an element from the queue
22 31 44 57
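The class shown above does not include an enqueue method or the code that filled the queue. The sketch below is an assumed reconstruction, not the notebook's original: it keeps the same head/tail convention (-1 means empty), adds an enqueue that wraps around with the modulo operator, and uses a hypothetical to_list() helper instead of printing. The capacity of 5 and the values are inferred from the output above.

```python
class CircularQueue:
    """Fixed-capacity circular queue; the enqueue logic is an assumed reconstruction."""

    def __init__(self, k):
        self.k = k
        self.queue = [None] * k
        self.head = self.tail = -1

    def enqueue(self, data):
        if (self.tail + 1) % self.k == self.head:
            return False                      # queue is full
        if self.head == -1:
            self.head = 0                     # first element being stored
        self.tail = (self.tail + 1) % self.k  # advance and wrap around
        self.queue[self.tail] = data
        return True

    def dequeue(self):
        if self.head == -1:
            return None                       # queue is empty
        temp = self.queue[self.head]
        if self.head == self.tail:
            self.head = self.tail = -1        # last element removed
        else:
            self.head = (self.head + 1) % self.k
        return temp

    def to_list(self):
        """Return the elements from front to rear."""
        if self.head == -1:
            return []
        count = (self.tail - self.head) % self.k + 1
        return [self.queue[(self.head + i) % self.k] for i in range(count)]


obj = CircularQueue(5)
for value in (12, 22, 31, 44, 57):
    obj.enqueue(value)
print(obj.to_list())
obj.dequeue()
print(obj.to_list())
obj.enqueue(60)          # reuses the slot freed at index 0
print(obj.to_list())
```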
Advantages of queue
Maintains data in FIFO manner
Insertion at the rear and deletion from the front take O(1) time
Disadvantages of queue
Manipulation is restricted to the front and rear
Not very flexible
Linked List
It is a collection or group of nodes
Each node contains data and a reference (pointer) which holds the address of the next
node.
It is a linear data structure
Nodes need not be stored at contiguous memory locations
Why Linked List
Insertion and deletion can be more efficient than in a list, because no elements have
to be shifted
Nodes can be stored anywhere in memory, whereas a list uses contiguous memory
Accessing elements in a linked list is slower than in a list
Memory is allocated per node as needed, at the cost of an extra reference per node
Singly Linked List
It is traversed only in one direction
Operations of Singly Linked List
Insertion
Deletion
Traversal
Pseudo Code
Creating a node in Singly Linked List
class Node:
    def __init__(self, data):
        self.data = data
        self.reference = None
node1 = Node(7)
print(node1.data)
print(node1.reference)
Creating a class of singly linked list
class LinkedList:
    def __init__(self):
        self.head = None
n1 = Node(7)
print(n1.data)
print(n1.next)
7
None
sll = SinglyLinkedList()
print(sll.head)
None
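The three operations listed earlier (insertion, deletion, traversal) are not implemented in the cells above. A minimal sketch follows, using the names Node, next and SinglyLinkedList from the executed cells; the method names insert_at_end, delete and traverse are hypothetical and chosen for this example.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def insert_at_end(self, data):
        """Insertion: walk to the last node and link a new one after it."""
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def delete(self, data):
        """Deletion: unlink the first node holding `data`; return True if found."""
        previous, current = None, self.head
        while current is not None and current.data != data:
            previous, current = current, current.next
        if current is None:
            return False
        if previous is None:
            self.head = current.next        # the head node is removed
        else:
            previous.next = current.next    # bypass the removed node
        return True

    def traverse(self):
        """Traversal: collect node values from head to tail."""
        values = []
        current = self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values


sll = SinglyLinkedList()
for value in (7, 3, 9):
    sll.insert_at_end(value)
print(sll.traverse())
sll.delete(3)
print(sll.traverse())
```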
Searching Algorithms
Linear Search Algorithm
What is Linear Search
It helps you to search for an element in a linear data structure.
It checks each and every element for the element to be searched.
Since this is done in linear fashion, it is termed as linear search.
In [134… # linearSearch(arr, item)
# for each element in array
#     if element == item
#         return index
# return -1
def linearSearch(array, n, x):
    for i in range(0, n):
        if (array[i] == x):
            return i
    return -1
array =[2, 4, 0, 1, 9]
n = len(array)
x = 1
result = linearSearch(array, n, x)
if(result == -1):
    print("Element not found")
else:
    print("Element found at index: ", result)
# return binarysearch(arr, mid + 1, end, item)
# else
# return binarysearch(arr, beg, mid - 1, item)
# else
# return -1
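The pseudocode fragment above appears to be the tail of a recursive binary search. A minimal runnable sketch under that assumption is shown below; the parameter names beg, end and item follow the fragment, while the function name binary_search and the sample data are chosen for this example. Binary search requires the input to be sorted.

```python
def binary_search(arr, beg, end, item):
    """Recursive binary search on a sorted list; returns the index or -1."""
    if beg > end:
        return -1                      # search range is empty
    mid = (beg + end) // 2
    if arr[mid] == item:
        return mid
    if arr[mid] < item:
        return binary_search(arr, mid + 1, end, item)   # search the right half
    return binary_search(arr, beg, mid - 1, item)       # search the left half


data = [0, 1, 2, 4, 9]                 # input must already be sorted
print(binary_search(data, 0, len(data) - 1, 4))
print(binary_search(data, 0, len(data) - 1, 7))
```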
It works just like playing cards i.e. picking one card and sorting it with the cards that
we have in our hand already which in turn are sorted
With every iteration, one item from unsorted is moved to the sorted part
First element is picked and considered as sorted
Then we pick elements from the 2nd element onwards and compare each one with the
elements in the sorted part.
We shift elements of the sorted part one position to the right until the appropriate
location for the picked element is found
This continues till all the elements get exhausted.
In [138… # Insertion sort using Python
def insersion_sort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1
        array[j + 1] = key
    return array
Quick sort
It is one of the most widely used sorting algorithms
It follows the divide-and-conquer approach
Recursion is used in the quicksort implementation
In each recursive call, a pivot is chosen and the array is partitioned so that all
elements less than the pivot lie to its left and all elements greater than the pivot
lie to its right
After every call the chosen pivot occupies its final position in the sorted array
With each step the problem is split into two smaller subproblems, which is what makes
the sort fast on average
The pivot can be the last element of the current array, the first element, or any
random element
In [139… ## Quick sort using python
def partition(array, low, high):
    pivot = array[high]
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i = i + 1
            (array[i], array[j]) = (array[j], array[i])
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    return i + 1
In [140… d = [9,8,7,2,10,20,1]
print("Unsorted data: ", d)
size = len(d)
d = quick_sort(d, 0, size - 1)
print("Sorted data: ", d)
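The quick_sort function called above is not shown in the cells. A minimal sketch of how it might look is given below, assuming it sorts in place using the partition function and returns the array (which matches the call d = quick_sort(d, 0, size - 1)). The partition function is repeated so that the block runs on its own.

```python
def partition(array, low, high):
    pivot = array[high]                 # last element is used as the pivot
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1                        # final index of the pivot


def quick_sort(array, low, high):
    """Sort array[low..high] in place and return the array (assumed signature)."""
    if low < high:
        pi = partition(array, low, high)
        quick_sort(array, low, pi - 1)   # elements left of the pivot
        quick_sort(array, pi + 1, high)  # elements right of the pivot
    return array


d = [9, 8, 7, 2, 10, 20, 1]
print(quick_sort(d, 0, len(d) - 1))
```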
Merge - Algorithm
Create 2 subarrays, left and right
Create 3 iterators i, j, k
Insert elements into left and right (i & j iterate over them)
k - position at which values are written back into the original array
Pick the smaller of the current elements from left and right and place it at position k
When either left or right has no elements left, copy the remaining elements of the
other subarray into the original array
In [142… def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left_arr = arr[:mid]
        right_arr = arr[mid:]
def printList(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()
Sorted array is: [2, 9, 11, 18, 22, 33, 34, 88]
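The merge_sort cell above is cut off after the split step. A complete sketch following the merge steps described earlier is shown below; the recursive calls and the merge loop are an assumed completion, and the sample input is chosen so that the result matches the sorted output above.

```python
def merge_sort(arr):
    """Sort arr in place using merge sort."""
    if len(arr) > 1:
        mid = len(arr) // 2
        left_arr = arr[:mid]
        right_arr = arr[mid:]

        merge_sort(left_arr)
        merge_sort(right_arr)

        i = j = k = 0
        # Merge: repeatedly copy the smaller front element back into arr.
        while i < len(left_arr) and j < len(right_arr):
            if left_arr[i] <= right_arr[j]:
                arr[k] = left_arr[i]
                i += 1
            else:
                arr[k] = right_arr[j]
                j += 1
            k += 1
        # Copy whatever remains in either half.
        while i < len(left_arr):
            arr[k] = left_arr[i]
            i += 1
            k += 1
        while j < len(right_arr):
            arr[k] = right_arr[j]
            j += 1
            k += 1


data = [22, 9, 88, 2, 34, 11, 33, 18]   # sample input chosen for this sketch
merge_sort(data)
print("Sorted array is:", data)
```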
In [152… type(arr)
numpy.ndarray
Out[152]:
In [145… # Multi-dimensional
n1 = np.array([[1, 2, 3], [4, 5, 6]])
n1
array([[1, 2, 3],
       [4, 5, 6]])
Out[145]:
In [147… n3 = np.zeros((1,2))
n3
array([[0., 0.]])
Out[147]:
array([[1., 1.],
       [1., 1.],
       [1., 1.]])
Out[148]:
array([[10, 10],
       [10, 10]])
Out[149]:
array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
Out[150]:
In [151… n6 = np.arange(10,50,5)
n6
In [152… n7 = np.arange(50,10,-5)
n7
(2, 3)
Out[155]:
array([[1, 2],
       [3, 4],
       [5, 6]])
Out[156]:
In [157… # vstack()
n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])
np.vstack((n1, n2))
array([[1, 2, 3],
       [4, 5, 6]])
Out[157]:
In [158… # vstack()
n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])
n3 = np.array([1, 2, 3])
np.vstack((n1, n2, n3))
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3]])
Out[158]:
In [159… # hstack()
n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])
np.hstack((n1, n2))
array([1, 2, 3, 4, 5, 6])
Out[159]:
In [160… # hstack()
n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])
n3 = np.array([4])
np.hstack((n1, n2, n3))
array([1, 2, 3, 4, 5, 6, 4])
Out[160]:
In [161… # column_stack()
n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])
np.column_stack((n1, n2))
array([[1, 4],
       [2, 5],
       [3, 6]])
Out[161]:
array([50, 60])
Out[162]:
array([10, 20, 30, 40])
Out[163]:
21
Out[165]:
array([5, 7, 9])
Out[166]:
array([ 6, 15])
Out[167]:
array([2, 3, 4])
Out[168]:
In [169… # Subtraction
n1 = np.array([1, 2, 3])
n1-1
array([0, 1, 2])
Out[169]:
In [170… # Multiplication
n1 = np.array([1, 2, 3])
print(n1 * 2)
print(n1)
[2 4 6]
[1 2 3]
In [171… # Division
n1 = np.array([1, 2, 3])
n1/2
array([0.5, 1. , 1.5])
Out[171]:
print(f"Mean is : {np.mean(n1)}")
print(f"median is : {np.median(n1)}")
print(f"Standard deviation is : {np.std(n1)}")
Mean is : 48.875
median is : 50.0
Standard deviation is : 24.17352632530058
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
Out[173]:
In [174… # rows
print(n1[0])
print(n1[1])
print(n1[2])
[1 2 3]
[4 5 6]
[7 8 9]
In [175… # columns
print(n1[:, 0])
print(n1[:, 1])
print(n1[:, 2])
[1 4 7]
[2 5 8]
[3 6 9]
np.dot(n1, n2) : [[ 30 23 19]
[ 84 65 58]
[138 107 97]]
np.dot(n2, n1) : [[ 30 23 19]
[ 84 65 58]
[138 107 97]]
n1.dot(n2) : [[ 30 23 19]
[ 84 65 58]
[138 107 97]]
n2.dot(n1) : [[ 93 117 141]
[ 54 69 84]
[ 18 24 30]]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
Out[178]:
Pandas
The name pandas is derived from "panel data"; it is the core library for data manipulation
and data analysis
It consists of single- and multi-dimensional data structures for data manipulation
Series object: One-dimensional labeled array
DataFrame: Two-dimensional labeled data structure
In [3]: import pandas as pd
0    1
1    2
2    3
3    4
4    5
dtype: int64
Out[180]:
In [181… type(s1)
pandas.core.series.Series
Out[181]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
Out[182]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
Out[183]:
b    2
e    5
c    3
a    1
d    4
dtype: int64
Out[184]:
4
Out[185]:
0    1
1    2
2    3
3    4
dtype: int64
Out[186]:
3    4
4    5
5    6
6    7
dtype: int64
Out[187]:
4    5
5    6
6    7
dtype: int64
Out[189]:
0     6
1     7
2     8
3     9
4    10
5    11
6    12
dtype: int64
Out[190]:
0     1
1     3
2     5
3     7
4     9
5    11
6    13
dtype: int64
Out[191]:
Pandas Dataframe
A DataFrame is a 2-dimensional labelled data structure
A DataFrame is a collection of Series (it comprises rows and columns)
In [192… df = pd.DataFrame({"Name": ["Meera", "Mirukali", "Ganu"], "Marks": [7, 5, 6]})
df
In [193… type(df)
pandas.core.frame.DataFrame
Out[193]:
In [195… iris.tail()
In [196… iris.count()
sepal_length    150
sepal_width     150
petal_length    150
petal_width     150
species         150
dtype: int64
Out[196]:
In [197… iris.shape
(150, 5)
Out[197]:
In [198… iris.describe()
sepal_length       5.1
sepal_width        3.5
petal_length       1.4
petal_width        0.2
species         setosa
Name: 0, dtype: object
Out[199]:
In [204… iris.head()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[208], line 2
1 # print(f"Mean of Iris data is : {iris.mean()}")
----> 2 iris.mean()
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11693, in DataFrame.mean(self, axis, skipna, numeric_only, **kwa
rgs)
11685 @doc(make_doc("mean", ndim=2))
11686 def mean(
11687 self,
(...)
11691 **kwargs,
11692 ):
> 11693 result = super().mean(axis, skipna, numeric_only, **kwargs)
11694 if isinstance(result, Series):
11695 result = result.__finalize__(self, method="mean")
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/generic.py:12420, in NDFrame.mean(self, axis, skipna, numeric_only, **kwa
rgs)
12413 def mean(
12414 self,
12415 axis: Axis | None = 0,
(...)
12418 **kwargs,
12419 ) -> Series | float:
> 12420 return self._stat_function(
12421 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwarg
s
12422 )
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/generic.py:12377, in NDFrame._stat_function(self, name, func, axis, skipn
a, numeric_only, **kwargs)
12373 nv.validate_func(name, (), kwargs)
12375 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 12377 return self._reduce(
12378 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric
_only
12379 )
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11562, in DataFrame._reduce(self, op, name, axis, skipna, numeri
c_only, filter_type, **kwds)
11558 df = df.T
11560 # After possibly _get_data and transposing, we are now in the
11561 # simple case where we can use BlockManager.reduce
> 11562 res = df._mgr.reduce(blk_func)
11563 out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]
11564 if out_dtype is not None and out.dtype != "boolean":
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/internals/managers.py:1500, in BlockManager.reduce(self, func)
1498 res_blocks: list[Block] = []
1499 for blk in self.blocks:
-> 1500 nbs = blk.reduce(func)
1501 res_blocks.extend(nbs)
1503 index = Index([None]) # placeholder
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/internals/blocks.py:404, in Block.reduce(self, func)
398 @final
399 def reduce(self, func) -> list[Block]:
400 # We will apply the function and reshape the result into a sing
le-row
401 # Block with the same mgr_locs; squeezing will be done at a hi
gher level
402 assert self.ndim == 2
--> 404 result = func(self.values)
406 if self.values.ndim == 1:
407 res_values = result
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11481, in DataFrame._reduce.<locals>.blk_func(values, axis)
11479 return np.array([result])
11480 else:
> 11481 return op(values, axis=axis, skipna=skipna, **kwds)
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, ski
pna, **kwds)
145 result = alt(values, axis=axis, skipna=skipna, **kwds)
146 else:
--> 147 result = alt(values, axis=axis, skipna=skipna, **kwds)
149 return result
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:404, in _datetimelike_compat.<locals>.new_func(values, axis, sk
ipna, mask, **kwargs)
401 if datetimelike and mask is None:
402 mask = isna(values)
--> 404 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwarg
s)
406 if datetimelike:
407 result = _wrap_results(result, orig_values.dtype, fill_value=iN
aT)
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:720, in nanmean(values, axis, skipna, mask)
718 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
719 the_sum = values.sum(axis, dtype=dtype_sum)
--> 720 the_sum = _ensure_numeric(the_sum)
722 if axis is not None and getattr(the_sum, "ndim", False):
723 count = cast(np.ndarray, count)
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:1686, in _ensure_numeric(x)
1683 inferred = lib.infer_dtype(x)
1684 if inferred in ["string", "mixed"]:
1685 # GH#44008, GH#36703 avoid casting e.g. strings to numeric
-> 1686 raise TypeError(f"Could not convert {x} to numeric")
1687 try:
1688 x = x.astype(np.complex128)
icavirginicavirginicavirginicavirginicavirginicavirginicavirginicavirginica
virginicavirginicavirginicavirginicavirginicavirginicavirginicavirginicavir
ginicavirginicavirginicavirginicavirginicavirginicavirginicavirginicavirgin
icavirginicavirginicavirginicavirginicavirginicavirginicavirginicavirginica
virginicavirginicavirginicavirginicavirginicavirginicavirginicavirginicavir
ginica'] to numeric
In [209… iris.median()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[209], line 1
----> 1 iris.median()
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11706, in DataFrame.median(self, axis, skipna, numeric_only, **k
wargs)
11698 @doc(make_doc("median", ndim=2))
11699 def median(
11700 self,
(...)
11704 **kwargs,
11705 ):
> 11706 result = super().median(axis, skipna, numeric_only, **kwargs)
11707 if isinstance(result, Series):
11708 result = result.__finalize__(self, method="median")
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/generic.py:12431, in NDFrame.median(self, axis, skipna, numeric_only, **k
wargs)
12424 def median(
12425 self,
12426 axis: Axis | None = 0,
(...)
12429 **kwargs,
12430 ) -> Series | float:
> 12431 return self._stat_function(
12432 "median", nanops.nanmedian, axis, skipna, numeric_only, **k
wargs
12433 )
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/generic.py:12377, in NDFrame._stat_function(self, name, func, axis, skipn
a, numeric_only, **kwargs)
12373 nv.validate_func(name, (), kwargs)
12375 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 12377 return self._reduce(
12378 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric
_only
12379 )
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11562, in DataFrame._reduce(self, op, name, axis, skipna, numeri
c_only, filter_type, **kwds)
11558 df = df.T
11560 # After possibly _get_data and transposing, we are now in the
11561 # simple case where we can use BlockManager.reduce
> 11562 res = df._mgr.reduce(blk_func)
11563 out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]
11564 if out_dtype is not None and out.dtype != "boolean":
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/internals/managers.py:1500, in BlockManager.reduce(self, func)
1498 res_blocks: list[Block] = []
1499 for blk in self.blocks:
-> 1500 nbs = blk.reduce(func)
1501 res_blocks.extend(nbs)
1503 index = Index([None]) # placeholder
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/internals/blocks.py:404, in Block.reduce(self, func)
398 @final
399 def reduce(self, func) -> list[Block]:
400 # We will apply the function and reshape the result into a sing
le-row
401 # Block with the same mgr_locs; squeezing will be done at a hi
gher level
402 assert self.ndim == 2
--> 404 result = func(self.values)
406 if self.values.ndim == 1:
407 res_values = result
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/frame.py:11481, in DataFrame._reduce.<locals>.blk_func(values, axis)
11479 return np.array([result])
11480 else:
> 11481 return op(values, axis=axis, skipna=skipna, **kwds)
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, ski
pna, **kwds)
145 result = alt(values, axis=axis, skipna=skipna, **kwds)
146 else:
--> 147 result = alt(values, axis=axis, skipna=skipna, **kwds)
149 return result
File /opt/anaconda3/envs/genAI_env/lib/python3.11/site-packages/pandas/cor
e/nanops.py:787, in nanmedian(values, axis, skipna, mask)
785 inferred = lib.infer_dtype(values)
786 if inferred in ["string", "mixed"]:
--> 787 raise TypeError(f"Cannot convert {values} to numeric")
788 try:
789 values = values.astype("f8")
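The traceback above ends with `nanmedian` raising a `TypeError` because the DataFrame contains a string column. One possible way around it, sketched here on a small stand-in DataFrame (the rows below are hypothetical sample values, not the full iris data), is to restrict the reduction to numeric columns:

```python
import pandas as pd

# Small stand-in for the iris DataFrame (hypothetical rows for illustration)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "species": ["setosa", "setosa", "virginica"],
})

# Restrict the reduction to numeric columns so the string column is skipped
numeric_median = df.median(numeric_only=True)
print(numeric_median)

# Equivalent alternative: select the numeric columns explicitly first
same_result = df.select_dtypes(include="number").median()
print(same_result)
```

Unlike the median, `min()` and `max()` are defined for strings as well, which is why those two calls also return a value for the species column.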
In [210]: iris.min()
Out[210]:
sepal_length 4.3
sepal_width 2.0
petal_length 1.0
petal_width 0.1
species setosa
dtype: object
In [211]: iris.max()
Out[211]:
sepal_length 7.9
sepal_width 4.4
petal_length 6.9
petal_width 2.5
species virginica
dtype: object
Matplotlib
Matplotlib is a Python library used for data visualisation.
In [6]: import numpy as np
from matplotlib import pyplot as plt
In [213]: x = np.arange(1,11)
x
Out[213]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In [214]: y = 2 * x
y
Line Graph:
A line graph is a visual representation of data that shows how values change over time or
in relation to another variable. It consists of points connected by straight lines.
Key components of a line graph:
X-axis: Represents the independent variable (often time).
Y-axis: Represents the dependent variable (the value being measured).
Data points: Represent specific values at different points on the x-axis.
Line segments: Connect the data points to show trends and patterns.
When to use a line graph:
To show trends over time: For example, temperature changes, stock prices, or
population growth.
To compare multiple data sets: For example, sales of different products over time.
To identify patterns and correlations: For example, the relationship between hours
studied and exam scores.
In [215]: plt.plot(x,y)
plt.title("Line Graph")
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.show()
In [216]: x = np.arange(1,11)
y = 2 * x
y2 = 3 * x
plt.plot(x,y, color='r', linestyle=':', linewidth=3)
plt.plot(x,y2, color='b', linestyle=':', linewidth=3)
plt.title("Line Graph")
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.grid(True)
plt.show()
In [217]: x = np.arange(1,11)
y = 2 * x
y2 = 3 * x
# plt.title("Line Graph")
# plt.xlabel("x-label")
# plt.ylabel("y-label")
# plt.grid(True)
plt.subplot(1,2,1)
plt.plot(x,y, color='r', linestyle=':', linewidth=3)
plt.subplot(1,2,2)
plt.plot(x,y2, color='b', linestyle=':', linewidth=3)
plt.show()
Bar Plot
A bar plot is a type of chart or graph that represents categorical data with rectangular
bars. The length of each bar is proportional to the value it represents.
Key components of a bar plot:
X-axis: Represents the categories or groups of data.
Y-axis: Represents the values or frequency of each category.
Bars: Rectangular shapes whose height or length corresponds to the data value.
Types of bar plots:
Vertical bar plot: Bars are oriented vertically.
Horizontal bar plot: Bars are oriented horizontally.
Stacked bar plot: Multiple bars are stacked on top of each other to show the
composition of a category.
Grouped bar plot: Bars are grouped together to compare multiple categories.
When to use a bar plot:
To compare values across different categories.
To show the distribution of categorical data.
To visualize changes in values over time (using grouped or stacked bar plots).
In [218]: x = np.arange(1,11)
y = 2 * x
plt.bar(x,y)
plt.title("Bar Plot")
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.show()
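The cell above draws a vertical bar plot. The other types listed earlier (horizontal, stacked and grouped) can be sketched as follows. The category names and values are hypothetical and chosen only for illustration:

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical data: two series measured across three categories
categories = ["A", "B", "C"]
sales_2023 = np.array([10, 15, 7])
sales_2024 = np.array([12, 9, 14])
positions = np.arange(len(categories))
width = 0.4

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Horizontal bar plot: barh swaps the roles of the two axes
axes[0].barh(categories, sales_2023)
axes[0].set_title("Horizontal")

# Stacked bar plot: the second series starts where the first one ends
axes[1].bar(categories, sales_2023, label="2023")
axes[1].bar(categories, sales_2024, bottom=sales_2023, label="2024")
axes[1].set_title("Stacked")
axes[1].legend()

# Grouped bar plot: shift each series by half a bar width
axes[2].bar(positions - width / 2, sales_2023, width, label="2023")
axes[2].bar(positions + width / 2, sales_2024, width, label="2024")
axes[2].set_xticks(positions)
axes[2].set_xticklabels(categories)
axes[2].set_title("Grouped")
axes[2].legend()

plt.tight_layout()
plt.show()
```

The `bottom` argument is what makes the second series stack on the first; without it both series would be drawn from zero and overlap.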
Scatter Plot
A scatter plot is a type of graph that displays the relationship between two numerical
variables. Each data point is represented by a dot on the graph, with the position of the
dot determined by its values for the two variables.
Key components of a scatter plot:
X-axis: Represents one numerical variable.
Y-axis: Represents the other numerical variable.
Data points: Represent individual observations, with their position determined by
their values on the x and y axes.
When to use a scatter plot:
To explore the relationship between two numerical variables.
To identify patterns or trends in the data.
To detect outliers or unusual data points.
Examples of scatter plots:
Relationship between height and weight.
Correlation between study hours and exam scores.
Distribution of house prices based on square footage.
Advantages of scatter plots:
Easy to visualize the relationship between two variables.
Can reveal patterns, trends, and outliers.
Useful for exploratory data analysis.
In [222]: # Scatter plot
x = np.random.randint(1,20,10)
print(f"x: {x}")
y = np.random.randint(1,20,10)
print(f"y: {y}")
plt.scatter(x,y)
plt.show()
x: [13 5 14 9 6 3 1 19 15 4]
y: [ 2 5 9 13 12 19 6 17 11 8]
In [223]: x = np.random.randint(1,20,10)
print(f"x: {x}")
y = np.random.randint(1,20,10)
print(f"y: {y}")
plt.scatter(x,y, color='r', marker='*', s=100)
plt.show()
x: [16 9 2 18 12 5 14 1 9 10]
y: [12 16 7 17 15 6 3 11 19 11]
In [7]: x = np.random.randint(1,20,10)
print(f"x: {x}")
y = np.random.randint(1,20,10)
print(f"y: {y}")
y2 = np.random.randint(1,20,10)
print(f"y2: {y2}")
plt.scatter(x,y, color='r', marker='*', s=100)
plt.scatter(x,y2, color='g', marker='.', s=100)
plt.show()
x: [12 9 17 7 12 17 8 17 3 14]
y: [13 6 2 17 17 7 8 6 3 16]
y2: [ 5 19 14 9 13 10 8 19 11 2]
In [8]: plt.subplot(1,2,1)
plt.scatter(x,y, color='r', marker='*', s=100)
plt.subplot(1,2,2)
plt.scatter(x,y2, color='g', marker='.', s=100)
plt.show()
In [9]: plt.subplot(2,1,1)
plt.scatter(x,y, color='r', marker='*', s=100)
plt.subplot(2,1,2)
plt.scatter(x,y2, color='g', marker='.', s=100)
plt.show()
Histogram
A histogram is a graphical representation of the distribution of numerical data. It's
similar to a bar chart, but there are key differences:
Bars are adjacent: Unlike bar charts, the bars in a histogram touch each other to
indicate continuous data.
Horizontal axis represents intervals: The x-axis shows ranges of values, called bins
or class intervals.
Vertical axis represents frequency: The y-axis shows the number of data points that
fall within each bin.
Key uses of a histogram:
Understanding data distribution: It helps visualize the shape, center, and spread of
data.
Identifying outliers: Unusual data points can be easily spotted.
Comparing distributions: Histograms can be used to compare different datasets.
In [10]: # Histogram
x = np.random.randint(1,11,20)
print(f"x: {x}")
plt.hist(x)
plt.show()
x: [ 2 4 9 8 9 2 6 2 5 6 7 5 6 1 5 2 10 3 5 8]
In [11]: x = np.random.randint(1,100,20)
print(f"x: {x}")
plt.hist(x, color='r', bins=10)
plt.show()
x: [65 53 93 6 82 14 63 1 53 87 66 50 76 15 18 33 99 80 69 27]
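To make the idea of bins concrete, the counts behind a histogram can be computed directly with `np.histogram`, the NumPy function that `plt.hist` relies on for binning. This sketch reuses the x values printed above; the bin edges are an assumption chosen here for readability (five bins of width 20), not the ten automatic bins used in the cell above:

```python
import numpy as np

# Values printed by the histogram cell above
x = np.array([65, 53, 93, 6, 82, 14, 63, 1, 53, 87,
              66, 50, 76, 15, 18, 33, 99, 80, 69, 27])

# Explicit bin edges (chosen for illustration): five bins of width 20
edges = [0, 20, 40, 60, 80, 100]
counts, bin_edges = np.histogram(x, bins=edges)

# Each count is the height of one histogram bar
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{int(left):>3} to {int(right):>3}: {int(count)}")
```

Passing the same edges to `plt.hist(x, bins=edges)` should draw bars with these heights.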
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Box Plot
A box plot (also known as a box-and-whisker plot) is a graphical representation of the
distribution of a dataset. It provides a visual summary of five key statistical measures:
Minimum: The smallest value in the dataset (excluding outliers).
First quartile (Q1): The value below which 25% of the data lies.
Median (Q2): The middle value of the dataset.
Third quartile (Q3): The value below which 75% of the data lies.
Maximum: The largest value in the dataset (excluding outliers).
How to interpret a box plot:
The box: Represents the interquartile range (IQR), which contains the middle 50% of
the data.
The line within the box: Indicates the median.
The whiskers: Extend from the box to the minimum and maximum values (excluding
outliers).
Outliers: Data points that fall outside the whiskers are often represented as
individual points.
Why use a box plot?
Quick overview of data: Provides a summary of the distribution at a glance.
Comparison of groups: Multiple box plots can be used to compare different
datasets.
Identification of outliers: Unusual data points can be easily spotted.
x= np.random.randint(20,50,20)
y= np.random.randint(20,50,20)
z= np.random.randint(10,70,30)
print(f"x: {x}")
print(f"y: {y}")
print(f"z: {z}")
data = [x, y, z]
plt.boxplot(data)
plt.show()
x: [43 42 43 23 43 48 26 31 47 21 47 24 34 34 34 39 38 24 43 35]
y: [25 38 25 20 42 36 37 41 46 32 36 45 39 27 43 31 37 28 23 30]
z: [14 24 43 32 23 55 20 48 43 40 26 52 30 62 43 59 43 35 66 49 28 10 24 47
30 28 43 46 36 35]
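The five measures described above can also be computed by hand, which helps when reading a box plot. The sketch below assumes the common 1.5 × IQR rule for the whiskers (also Matplotlib's default, `whis=1.5`); the helper name `box_plot_summary` is hypothetical, and the x values are the ones printed above:

```python
import numpy as np

def box_plot_summary(values, whis=1.5):
    """Return the statistics a box plot draws, using the whis * IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_limit) & (values <= high_limit)]
    outliers = values[(values < low_limit) | (values > high_limit)]
    return {
        "whisker_low": inside.min(),   # "minimum" excluding outliers
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),  # "maximum" excluding outliers
        "outliers": outliers.tolist(),
    }

# The x values printed by the box plot cell above
x = [43, 42, 43, 23, 43, 48, 26, 31, 47, 21,
     47, 24, 34, 34, 34, 39, 38, 24, 43, 35]
summary = box_plot_summary(x)
print(summary)

# Appending one hypothetical extreme value shows how an outlier is reported
with_outlier = box_plot_summary(x + [90])
print(with_outlier["outliers"])
```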
Violin Plot
A violin plot is a statistical graphic for comparing probability distributions. It combines
elements of a box plot and a kernel density plot.
Key components:
Density plot: The shape of the violin represents the probability density of the data.
Box plot: Typically overlaid on the violin, showing the median, quartiles, and
sometimes outliers.
Advantages of violin plots:
Shows distribution: Provides a detailed view of the data distribution, including
peaks, valleys, and skewness.
Comparison: Effective for comparing distributions across different groups.
Outlier detection: Can help identify potential outliers.
When to use a violin plot:
When you want to visualize the distribution of numerical data.
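A minimal sketch of a violin plot using Matplotlib's `violinplot`. The three groups are hypothetical random samples, seeded so the figure is repeatable. Note that Matplotlib draws median and extrema lines rather than a full overlaid box:

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical data: three groups with different centers and spreads
rng = np.random.default_rng(0)
data = [rng.normal(loc=35, scale=5, size=100),
        rng.normal(loc=35, scale=10, size=100),
        rng.normal(loc=50, scale=7, size=100)]

fig, ax = plt.subplots()
# showmedians adds a line at each median; showextrema marks min and max
parts = ax.violinplot(data, showmedians=True, showextrema=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["group 1", "group 2", "group 3"])
ax.set_title("Violin Plot")
ax.set_ylabel("value")
plt.show()
```

The wider a violin is at a given height, the more observations fall near that value.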
Pie Chart
A pie chart is a circular statistical graphic which is divided into slices to illustrate
numerical proportions. In a pie chart, the arc length of each slice is proportional to the
quantity it represents. All sectors will sum up to 100%.
Best used for:
Showing the composition of a whole.
Comparing the relative sizes of different categories.
Donut Chart
A donut chart is similar to a pie chart, but with a hole in the center. This allows for
additional information to be displayed in the center, such as totals or averages.
Best used for:
The same purposes as a pie chart, with the added benefit of displaying additional
information in the center.
Important note: While pie and donut charts are visually appealing, they can be difficult to
interpret when there are too many categories. In such cases, other chart types like bar
charts are often a better choice.
In [17]: # Pie-Chart
#Dictionary of fruits and their quantity
fruits = {'apple': 10, 'banana': 15, 'orange': 20, 'grape': 25, 'pineapple':
# Dictionary of colors for the above fruits
colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange', 'grape':
In [18]: # Donut-Chart
#Dictionary of fruits and their quantity
fruits = {'apple': 10, 'banana': 15, 'orange': 20, 'grape': 25, 'pineapple':
# Dictionary of colors for the above fruits
colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange', 'grape':
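The two cells above are cut off in this copy, so here is a self-contained sketch of both chart types. It keeps the four quantities that are still visible above; the color for grape and the text placed in the donut's center are assumptions made for illustration:

```python
from matplotlib import pyplot as plt

# Quantities visible in the cells above; the grape color is an assumption
fruits = {"apple": 10, "banana": 15, "orange": 20, "grape": 25}
colors = {"apple": "red", "banana": "yellow", "orange": "orange", "grape": "purple"}

labels = list(fruits.keys())
sizes = list(fruits.values())
slice_colors = [colors[name] for name in labels]

fig, (ax_pie, ax_donut) = plt.subplots(1, 2, figsize=(9, 4))

# Pie chart: autopct prints each slice's share of the total
ax_pie.pie(sizes, labels=labels, colors=slice_colors, autopct="%1.1f%%")
ax_pie.set_title("Pie Chart")

# Donut chart: same call, but each wedge covers only part of the radius
ax_donut.pie(sizes, labels=labels, colors=slice_colors,
             wedgeprops={"width": 0.4})
# The hole leaves room for extra information, here the total quantity
ax_donut.text(0, 0, f"total\n{sum(sizes)}", ha="center", va="center")
ax_donut.set_title("Donut Chart")

plt.show()
```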
Seaborn
Seaborn is built on top of Matplotlib, so we typically import both: seaborn to draw the
plots and matplotlib.pyplot to customize and display them.
Seaborn Lineplot
Seaborn's lineplot function is used to visualize relationships between variables where
one of the variables is continuous. It's particularly useful for visualizing trends over time
or other continuous variables.
Key Features and Parameters:
x and y: Specify the variables for the x and y axes.
data: The dataset to be used.
hue: Separates observations into multiple lines based on a categorical variable.
style: Uses different line styles to distinguish between categories.
size: Adjusts the thickness of lines based on a numerical variable.
markers: Adds markers to the line plot.
ci: Controls the confidence interval around the line (deprecated since seaborn 0.12 in favor of errorbar).
Additional Customization:
Color palettes: Use palette parameter to set custom colors.
Line styles: Control line styles using style parameter.
Marker styles: Customize marker appearance with markers parameter.
Axes labels and title: Add labels and titles using plt.xlabel, plt.ylabel, and plt.title.
By effectively utilizing Seaborn's lineplot function and its parameters, you can
create informative and visually appealing line plots to explore your data.
In [20]: # Line Plot
fmri = sns.load_dataset("fmri")
print(fmri.head())
sns.lineplot(x="timepoint", y="signal", data=fmri)
plt.show()
Key Parameters:
x: The name of the categorical variable to be used on the x-axis.
y: The name of the numerical variable to be aggregated and visualized.
data: The dataset to be used.
hue: Optional categorical variable for creating multiple bars within each category.
estimator: Function to aggregate the y-value (default is np.mean).
ci: Confidence interval to display (default is 95%); deprecated since seaborn 0.12 in favor of errorbar.
palette: Color palette for the bars.
errorbar: Type of error bar to display (default is ('ci', 95)).
Additional Customization:
orient: To create horizontal bar plots.
saturation: Adjust the color saturation.
ax: To plot on a specific matplotlib axes.
Other aesthetic parameters from Matplotlib can be used for further customization.
Seaborn's bar plot is a versatile tool for visualizing categorical data and understanding
relationships between categorical and numerical variables. By exploring the different
parameters and customization options, you can create informative and visually
appealing bar plots.
In [24]: sns.set_theme(style="whitegrid")
pokemon = pd.read_csv('data/pokemon.csv')
print(pokemon.head())
sns.barplot(x="legendary", y="speed", data=pokemon, palette="vlag")
plt.show()
Seaborn Scatterplot
Seaborn's scatterplot function is used to visualize the relationship between two
numerical variables. Each data point is represented by a marker, and the position of the
marker on the x and y axes corresponds to the values of the two variables.
Key Parameters:
x: The name of the numerical variable to be used on the x-axis.
y: The name of the numerical variable to be used on the y-axis.
data: The dataset to be used.
hue: Optional categorical variable for grouping data points by color.
style: Optional categorical variable for grouping data points by marker style.
size: Optional numerical variable for controlling the size of the markers.
palette: Color palette for the scatter plot.
alpha: Transparency of the markers.
Additional Customization:
markers: Customize the marker style.
edgecolor: Set the color of the marker edges.
linewidth: Adjust the width of the marker edges.
s: Manually set the marker size.
Seaborn's scatterplot is a versatile tool for exploring relationships between numerical
variables. By using the various parameters, you can create informative and visually
appealing scatter plots to understand your data.
In [28]: sns.scatterplot(x="sepal_length", y="petal_length", style="species", data=i
plt.show()
Seaborn Histogram
Seaborn's histplot function is used to visualize the distribution of a numerical variable. It
counts the number of observations that fall within discrete bins and represents this
count with rectangular bars.
Key Parameters:
data: The dataset to be used.
x: The name of the numerical variable to be plotted.
y: Optional variable for bivariate histograms.
hue: Optional categorical variable for creating multiple histograms.
bins: Number of bins for the histogram.
kde: Whether to plot a kernel density estimate (KDE) curve.
stat: The statistic to compute in each bin (e.g., 'count', 'density', 'probability').
common_norm: Whether to normalize histograms when using hue.
element: The visual representation of the data (e.g., 'bars', 'step', 'poly').
Additional Customization:
Color palettes: Use the palette parameter to set custom colors.
Binning: Control the number of bins with the bins parameter.
Density estimation: Adjust the KDE bandwidth with kde_kws.
Axes labels and title: Add labels and titles using plt.xlabel, plt.ylabel, and plt.title.
Seaborn's histplot is a versatile tool for understanding the distribution of your data. By
combining it with other Seaborn functions and Matplotlib customizations, you can create
informative and visually appealing histograms.
In [29]: # SNS histogram or Distribution plot
diamond = sns.load_dataset("diamonds")
diamond.head()
In [30]: sns.distplot(diamond['price'])
plt.show()
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/4255550652.py:1: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(diamond['price'])
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/418129790.py:1: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(diamond['price'], hist=False)
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/975780176.py:1: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
In [36]: ##JointPlot
Seaborn Boxplot
Seaborn's boxplot function is used to visualize the distribution of numerical data across
different categories. It provides a compact representation of the data, showing the
median, quartiles, and potential outliers.
Key Parameters:
x: The name of the categorical variable to be used on the x-axis.
y: The name of the numerical variable to be visualized.
data: The dataset to be used.
hue: Optional categorical variable for creating multiple boxplots within each
category.
width: Width of the boxes.
palette: Color palette for the boxes.
whis: Determines the length of the whiskers.
showfliers: Whether to show outliers.
Additional Customization:
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/2094913177.py:1: FutureWarning:
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/384559801.py:1: FutureWarning:
/var/folders/8h/zprf7hjs319_78816p34b90c0000gn/T/ipykernel_41370/184676387.py:1: FutureWarning:
GenAI
Generative AI, or Gen AI, functions by employing a neural network to analyse data
patterns and generate new content based on those patterns.
Discriminative AI: Classifies data, much like a judge.
Generative AI: Creates, transforms, or generates its own content, much like an artist.
load_dotenv()
# Define the route for the chat endpoint, which accepts POST requests
@app.route('/chat', methods=["POST"])
def chat():
    # Get the user's message from the JSON payload of the request
    user_input = request.json.get('message')
    print(f"Received message: {user_input}") # Debugging statement to print
if response.status_code == 200:
    try:
        # Return the content of the first choice in the response
        return response.json()['choices'][0]['message']['content']
    except (KeyError, IndexError) as e:
        print(f"Error parsing response JSON: {e}") # Print the erro
        print(f"Response JSON: {response.json()}") # Print the full
        return "An error occurred while processing the response from
elif response.status_code == 429:
    # Handle rate limiting errors
    print(f"Request to OpenAI failed with status code: {response.sta
    print(f"Response: {response.text}")
    if attempt < retry_count - 1:
        print("Retrying...")
        time.sleep(2 ** attempt) # Exponential backoff before retry
    else:
        return "You have exceeded your current quota. Please check y
else:
    # Handle other errors
    print(f"Request to OpenAI failed with status code: {response.sta
    print(f"Response: {response.text}")
    return "An error occurred while communicating with OpenAI."
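The fragment above retries on status code 429 with exponential backoff, but several of its lines are cut off. The same idea can be sketched as a standalone function. Everything here is illustrative: `call_with_backoff`, `send_request`, and the returned messages are hypothetical names, and no real API is called:

```python
import time

def call_with_backoff(send_request, retry_count=3, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until it succeeds or the retries run out.

    send_request is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(retry_count):
        status_code, body = send_request()
        if status_code == 200:
            return body
        if status_code == 429 and attempt < retry_count - 1:
            # Wait 1, 2, 4, ... times base_delay before the next attempt
            sleep(base_delay * 2 ** attempt)
            continue
        # Out of retries, or an error that retrying will not fix
        return f"Request failed with status code {status_code}"
    return "No attempts were made"  # only reached when retry_count is 0

# Usage with a fake request that is rate limited twice, then succeeds
responses = iter([(429, ""), (429, ""), (200, "hello")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is a design choice that lets the usage example record the delays instead of actually waiting.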
Components:
Flask for the web framework.
OpenAI
HTML/CSS for the front-end interface.
Prerequisites:
Python installed on your system.
Required libraries: flask, openai.
API Key from OpenAI
Let's explore more with a demo!
Step 6: Create the main Flask application file (app.py)
@app.route("/")
def home():
    # Render the 'text_to_image.html' template when the home page is accessed
    return render_template('text_to_image.html')
# Define the route for the generate_image endpoint, which accepts POST requests
@app.route('/generate_image', methods=['POST'])
def generate_image():
    # Get the 'prompt' from the JSON payload of the POST request
    prompt = request.json.get('prompt')
Imports:
Flask, request, jsonify, render_template from the Flask framework for handling web
requests and rendering HTML templates.
openai for interacting with the OpenAI API.
App Initialization:
app = Flask(__name__) # initializes the Flask application.
openai.api_key = 'your_openai_api_key' # sets the OpenAI API key.
Routes
# The home route renders the index.html template when accessed. This is the main page of the application.
@app.route('/')
def home():
    return render_template('index.html')
return image_url
'Content-Type': 'application/json'
},
body: JSON.stringify({ prompt: prompt })
});
const data = await response.json();
const img = document.createElement('img');
img.src = data.image_url;
responseDiv.innerHTML = '';
responseDiv.appendChild(img);
}
</script>
</body>
</html>
LangChain Apps
Overview:
LangChain streamlines the development process for applications that utilize LLMs
by offering a modular and extensible architecture.
It supports a wide range of use cases, from chatbots and personal assistants to
complex NLP tasks and data analysis.
The framework is built to be highly customizable, allowing developers to tailor it to
their specific needs and integrate it with various external data sources and APIs.
Let's understand more with the help of a LangChain case study.
Personalized Story Generator:
This project will take inputs like character names, settings, and themes from the user
and generate a unique story using a text generation model like GPT-3.5.
Steps to Create the Project:
1. Set Up Environment
2. Collect User Inputs
3. Generate Story Using AI Model
4. Display the Generated Story
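The four steps above can be sketched in plain Python. This is only an outline under stated assumptions: `build_story_prompt`, `generate_story` and `fake_model` are hypothetical names, and `fake_model` stands in for a real text generation model so the sketch runs without an API key or any LangChain imports:

```python
def build_story_prompt(character, setting, theme):
    """Step 2: turn the collected user inputs into a single prompt string."""
    return (
        f"Write a short story about a character named {character}. "
        f"The story takes place in {setting} and explores the theme of {theme}."
    )

def generate_story(prompt, model):
    """Step 3: pass the prompt to any callable that maps text to text."""
    return model(prompt)

def fake_model(prompt):
    # Hypothetical stand-in for a real text generation model such as GPT-3.5;
    # it only echoes the prompt so the sketch runs without an API key
    return f"[generated story for prompt: {prompt}]"

# Steps 2 to 4 with hard-coded inputs in place of interactive input()
prompt = build_story_prompt("Mira", "a lighthouse", "courage")
story = generate_story(prompt, fake_model)
print(story)
```

In a real application, `fake_model` would be replaced by a call to the chosen model, and the inputs would come from the user.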
model = TextModel(model_name="gpt-3.5-turbo",
api_key="your_openai_api_key")
Manual Testing:
Manual testing means the (web) application is tested manually by QA testers. Tests need
to be performed manually in every environment using different data sets, and the
success/failure rate of every transaction should be recorded.
Manual testing is mandatory for every newly developed software before automated
testing. It requires considerable effort and time, but it builds confidence that the
software behaves as intended.
Automation Testing:
As the name suggests, automation testing takes software testing activities and executes
them via an automation toolset or framework. In simple words, it is a type of testing in
which a tool executes a set of tasks in a defined pattern automatically.
This automation testing method uses scripted sequences that are executed by testing
tools. Automated testing tools execute examinations of the software, report outcomes,
and compare results with earlier test runs.
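As a small illustration of scripted checks that a tool runs and reports on, here is a sketch using Python's built-in `unittest` module (not Selenium). The function under test, `cart_total`, is hypothetical:

```python
import unittest

def cart_total(prices, discount=0.0):
    """Hypothetical function under test: total price after a fractional discount."""
    return round(sum(prices) * (1 - discount), 2)

class CartTotalTest(unittest.TestCase):
    def test_no_discount(self):
        self.assertEqual(cart_total([10.0, 5.5]), 15.5)

    def test_with_discount(self):
        self.assertEqual(cart_total([100.0], discount=0.2), 80.0)

# Load and run the scripted checks, then report the outcome
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CartTotalTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("all passed:", result.wasSuccessful())
```

Once written, the same checks can be re-run after every change, which is the main advantage over repeating them by hand.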
What is Selenium?
Selenium was introduced in 2004 by Jason Huggins, an engineer at ThoughtWorks, who was
working on a web application that needed frequent testing.
Testing done using Selenium is often referred to as Selenium Testing.
Selenium is an open-source tool and portable framework used for automating tests run in
web browsers. It is used only for testing web applications, such as shopping carts and
email programs like Gmail and Yahoo.