NumPy and Pandas
Pandas is a very popular library for working with data (its stated goal is to be the
most powerful and flexible open-source data-analysis tool, and in our opinion it
has reached that goal). DataFrames are at the center of pandas. A DataFrame is
structured like a table or spreadsheet: both the rows and the columns have
indexes, and you can perform operations on rows or columns separately.
NumPy
NumPy is an open-source Python library that enables efficient numerical
operations on large quantities of data. Several NumPy functions are commonly
used on pandas DataFrames. Most importantly for us, pandas is built on top of
NumPy, so NumPy is a dependency of pandas.
import numpy as np
import pandas as pd

# a 1-D NumPy array from a Python list
list1 = [1, 2, 3, 4]
array1 = np.array(list1)
print(array1)

# a 2-D NumPy array from a nested list
list2 = [[1, 2, 3], [4, 5, 6]]
array2 = np.array(list2)
print(array2)
toyPrices = [5, 8, 3, 6]
# print(toyPrices - 2) -- Not possible: subtracting a number from a plain list raises a TypeError
for i in range(len(toyPrices)):
    toyPrices[i] -= 2
print(toyPrices)  # [3, 6, 1, 4]
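With a NumPy array, the same elementwise subtraction works directly, no loop required:

```python
import numpy as np

# the subtraction is broadcast to every element of the array
toyPrices = np.array([5, 8, 3, 6])
print(toyPrices - 2)  # [3 6 1 4]
```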
# Create a Series using a NumPy array of ages with the default numerical indices
ages = np.array([13,25,19])
series1 = pd.Series(ages)
print(series1)
# Create a Series using a NumPy array of ages, but customize the indices to be
# the names that correspond to each age
ages = np.array([13,25,19])
series1 = pd.Series(ages,index=['Emma', 'Swetha', 'Serajh'])
print(series1)
dataf = pd.DataFrame([
    ['John Smith', '123 Main St', 34],
    ['Jane Doe', '456 Maple Ave', 28],
    ['Joe Schmo', '789 Broadway', 51]
], columns=['name', 'address', 'age'])
print(dataf)
Standard Deviation
import numpy

speed = [86, 87, 88, 86, 87, 85, 86]
x = numpy.std(speed)  # population standard deviation (ddof=0 by default)
print(x)
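As a sketch of what numpy.std computes: the standard deviation is the square root of the mean of the squared deviations from the mean.

```python
import numpy as np

speed = [86, 87, 88, 86, 87, 85, 86]
mean = sum(speed) / len(speed)
# population variance: mean squared deviation from the mean
variance = sum((v - mean) ** 2 for v in speed) / len(speed)
std = variance ** 0.5
print(std)  # matches np.std(speed)
```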
# create an empty list, read one integer from the user, and append it
lst = []
ele = int(input())
lst.append(ele)
print(lst)
Given an array containing the marks of students, the task is to calculate the percentile of each student. The
percentile is calculated according to the following rule:
The percentile of a student is the percentage of students having marks less than his/her marks.
Example:
import numpy as np

def percentile(StudentMarks, n):
    for i in range(n):
        # count how many students scored less than student i
        count = 0
        for j in range(n):
            if StudentMarks[j] < StudentMarks[i]:
                count += 1
        # percentile of student i, as a percentage of the other students
        percent = (count * 100) / (n - 1)
        print(StudentMarks[i], "->", percent)

# Driver Code
StudentMarks = [12, 60, 80, 71, 30]
n = len(StudentMarks)
percentile(StudentMarks, n)

# np.percentile answers the inverse question: the mark below which
# p percent of the scores fall
p = 50
x = np.percentile(StudentMarks, p)
print(x)
Histogram
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

N = 10000
x = stats.norm.rvs(size=N)  # N random samples from a standard normal

num_bins = 20
ax = plt.axes()
ax.hist(x, num_bins, density=True)  # normalized histogram of the samples

# overlay the theoretical probability density function
y = np.linspace(-4, 4, 1000)
ax.plot(y, stats.norm.pdf(y))
plt.show()
The pdf() method, found inside scipy.stats.norm, returns the probability density
function; it is used here to draw the smooth curve over the histogram.
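As a small sketch, stats.norm.pdf can also be evaluated directly at a point; at x = 0 the standard normal density equals 1/sqrt(2*pi):

```python
import numpy as np
from scipy import stats

# density of the standard normal at 0: 1 / sqrt(2 * pi) ≈ 0.3989
print(stats.norm.pdf(0))
print(1 / np.sqrt(2 * np.pi))  # same value, from the Gaussian formula
```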
The most important step of the analysis is performed even before the analysis begins. In textbook problems we assume that we
already know the variables between which we have to find correlation. In real life, however, there are many variables and therefore
many possible correlations. It is important to select variables between which a material relationship exists, one that, if understood,
will benefit the process.
Once the variables have been selected, relevant data needs to be collected to draw meaningful conclusions. This can be done by
applying the relevant design of experiments and coming up with measurements that will be used as inputs into the system. Like
every other process, this one follows the principle of GIGO (Garbage In, Garbage Out), and hence due care must be taken regarding
the input data.
Once the data has been collected, it must be mapped onto the X and Y axes of the Cartesian coordinate system. This gives the
viewer an idea of where the majority of the points are centred, where the outliers are, and why. Nowadays this does not have to be
done manually: software is available that will automatically fetch incoming data in real time and map it onto a scatter plot.
The next step is to statistically compute the line of best fit for the scattered data points. This means that a line is worked out
mathematically that passes through most of the points and is closest to the rest of them. This line has an equation that can be used
to predict the nature of the relationship between the variables. This step, too, formerly required complex calculations prone to
human error; now software can do it seamlessly and in no time.
The next step is to compute a correlation coefficient. This number, as stated earlier, is the best metric for understanding correlation
and lies between -1 and +1. The software will work it out and give you a correlation coefficient. Expensive software is not required;
something as simple as an Excel sheet can be used.
The last step is to interpret the number. Anything above 0.5 in absolute value suggests a strong correlation, 0 represents no
correlation, and -1 or +1 represents perfect correlation. Perfect correlation may be an indicator of causation, but it does not imply
causation all by itself.
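The steps above can be sketched with NumPy, using hypothetical x and y data (the values below are assumptions for illustration): np.corrcoef returns the correlation matrix, and np.polyfit with degree 1 gives the line of best fit.

```python
import numpy as np

# hypothetical paired measurements
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# correlation coefficient: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to +1: a strong positive correlation

# line of best fit (slope and intercept)
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```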
Scatter Plot Uses and Examples
Scatter plots convey a large volume of data at a glance. They are especially useful for spotting relationships, clusters, and outliers
in paired data.
import pandas as pd

# path to a local CSV file of insurance data (adjust as needed)
insurance_filepath = "D:/smoke.csv"
insurance_data = pd.read_csv(insurance_filepath)
print(insurance_data.head())  # first five rows
print(insurance_data)
Polynomial Regression
import numpy as nm
import pandas as pd
import matplotlib.pyplot as mtp
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# importing the dataset
data_set = pd.read_csv('D:/reg.csv')
x = data_set.iloc[:, 1:2].values  # position levels (assumed to be column 1)
y = data_set.iloc[:, 2].values    # salaries

# transform x into polynomial features, then fit a linear model on them
poly_regs = PolynomialFeatures(degree=2)
x_poly = poly_regs.fit_transform(x)
lin_regs = LinearRegression()
lin_regs.fit(x_poly, y)

mtp.scatter(x, y, color="blue")
mtp.plot(x, lin_regs.predict(x_poly), color="red")  # fitted curve
mtp.title("Polynomial Regression")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()
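The same idea can be sketched without scikit-learn: NumPy's polyfit fits polynomial coefficients directly. The data below is synthetic (an assumption for illustration), generated from a known quadratic so the recovered coefficients can be checked:

```python
import numpy as np

# synthetic data following y = 2x^2 + 3x + 1
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = 2 * x**2 + 3 * x + 1

coeffs = np.polyfit(x, y, 2)  # highest-degree coefficient first
print(coeffs)                 # ≈ [2. 3. 1.]

# evaluate the fitted polynomial at a new point
print(np.polyval(coeffs, 6))  # ≈ 2*36 + 3*6 + 1 = 91
```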
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data
y = iris.target

clf = DecisionTreeClassifier(random_state=1234)
model = clf.fit(X, y)

# text summary of the fitted tree
text_representation = tree.export_text(clf)
print(text_representation)

# graphical plot of the fitted tree
fig = plt.figure(figsize=(25, 20))
tee = tree.plot_tree(clf,
                     feature_names=iris.feature_names,
                     class_names=list(iris.target_names),
                     filled=True)
plt.show()
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="stud"
)
mycursor = mydb.cursor()

# mycursor.execute("SHOW TABLES")
# for x in mycursor:
#     print(x)

# parameterized INSERT: %s placeholders keep the query safe from SQL injection
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)
mydb.commit()
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="stud"
)
mycursor = mydb.cursor()

# read values from the user and insert them with a parameterized query
s_name = input('Student Name:')
s_add = input('Address:')
mycursor.execute("INSERT INTO customers (name, address) VALUES (%s, %s)",
                 (s_name, s_add))
mydb.commit()
print('Data entered successfully.')
mydb.close()