Advanced Python Programming Data Science: The University of Sheffield
Edgar Iyasele
The University of Sheffield
All content following this page was uploaded by Edgar Iyasele on 21 February 2020.
- Capture Data
- Manage and Clean Data
- Data Analysis
- Report data
Requirements:
- CI6010a
- CI6010b
- CI6011a
Anaconda Python should be installed on your desktop; please start Spyder.
Unstructured:
• Data without an inherent structure.
Quasi-Structured:
• Textual data with an erratic format that can be structured with effort.
Semi-Structured:
• Textual data with an apparent pattern (including errors).
Structured:
• A defined data model (errors are less likely).
[Figure: the four categories placed on a spectrum with axes labelled Complexity and Flexibility]
Advantages:
Pandas presents data in a form well suited to data analysis.
The package contains multiple methods for convenient data filtering.
Pandas has a variety of utilities for performing input/output operations
in a seamless manner.
Constructing a DataFrame
import pandas as pd

d_1 = [1, 2, 3]
d_2 = {'header_1': [1, 2], 'header_2': [3, 4]}

df_1 = pd.DataFrame(d_1)  # DataFrame from a list
df_2 = pd.DataFrame(d_2)  # DataFrame from a dict of columns

print(df_1)
print(df_2)

# Output of print(df_1):
#    0   # header
# 0  1   # first row
# 1  2   # second row
# 2  3   # ...
df = pd.read_table('global_temp.txt', sep=' ')
print(df)
MANAGE DATA
Unwanted Observations
Remove Outliers
Irrelevant data
Duplicates
• Removing identical rows.
• Parameter keep: can be 'first', 'last', or False (with False, all rows with the
same values are treated as duplicates and removed).
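A small illustration of the keep parameter (the 'UID' column matches the later example; the rows are made up):

```python
import pandas as pd

# Hypothetical data in which UID 1 appears twice
df = pd.DataFrame({"UID": [1, 1, 2], "Sales": [10, 20, 30]})

first = df.drop_duplicates(subset="UID", keep="first")  # keeps rows 0 and 2
last = df.drop_duplicates(subset="UID", keep="last")    # keeps rows 1 and 2
none = df.drop_duplicates(subset="UID", keep=False)     # keeps row 2 only

print(len(first), len(last), len(none))  # 2 2 1
```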
• Dropping irrelevant columns
df = df.drop(["Sales"], axis=1)
• Dropping irrelevant rows
df = df.drop(["Johnson", "Smith"])
Finding outliers with the method .describe()

import pandas as pd
import matplotlib.pyplot as plt

d = {'Age': [23, 25, 24, 230, 22], 'IQ': [101, 98, 105, 99, 102]}  # illustrative data; 230 is an outlier
df = pd.DataFrame(d)
print(df.describe())

df.hist(['Age', 'IQ'])
plt.show()  # may be necessary after importing matplotlib
Outliers show unexpected behaviour: values far from the general population, nonsense values, a wrong distribution shape, etc.
Removing Outliers from the data
Remove outliers by dropping the affected rows, replacing their values one by
one, or introducing a threshold.
• Dropping column or row can be done by the method .drop() as
discussed before.
# Drop duplicates
df = df.drop_duplicates(subset='UID', keep='first')
import numpy as np
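A sketch of the threshold approach described above; the 'Age' column, the rows, and the cut-off of 100 are all illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [23.0, 25.0, 24.0, 230.0, 22.0]})  # 230 is an obvious outlier

# Option 1: drop rows beyond the threshold
df_drop = df[df["Age"] < 100]

# Option 2: replace outlying values with NaN for later treatment
df_nan = df.copy()
df_nan.loc[df_nan["Age"] >= 100, "Age"] = np.nan

print(df_drop["Age"].tolist())     # [23.0, 25.0, 24.0, 22.0]
print(df_nan["Age"].isna().sum())  # 1
```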
Sorting Data
• Sorting by some dimension alphabetically
or numerically, e.g. sorting by time or date.
• Ascending or Descending.
• Use the method .sort_values().
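A minimal sketch of .sort_values() (column names and rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Smith", "Johnson", "Lee"],
                   "Age": [34, 28, 41]})

# Sort alphabetically by name (ascending) or numerically by age (descending)
by_name = df.sort_values("Name")
by_age = df.sort_values("Age", ascending=False)

print(by_age["Name"].tolist())  # ['Lee', 'Smith', 'Johnson']
```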
Filtering Data by Using .iloc

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

• Select the first two rows of column C:
print(df.iloc[0:2, 2:3])
# Output:
#    C
# 0  7
# 1  8

• Select a column of the DataFrame:
print(df.iloc[:, 1])  # Output: 4, 5, 6

• Select a row:
print(df.iloc[2, :])  # Output: 3, 6, 9

• First 2 rows:
print(df.iloc[0:2, :])
print(df.iloc[:2, :])

• And so on…
CLEAN DATA
• Normalisation typically means rescaling the values into the range [0, 1].
• In most cases, normalising data eliminates the units of measurement,
enabling you to compare data from different places more easily.
x = [1, 43, 65, 23, 4, 57, 87, 45, 45, 23]

x_new = (x − x_min) / (x_max − x_min),  where x_min = 1 and x_max = 87

x_new = [0, 0.48, 0.74, 0.25, 0.03, 0.65, 1, 0.51, 0.51, 0.25]
Normalising a NumPy array, or normalising a column of a Pandas
DataFrame (normalise the column named "score" in DataFrame "df"):

import numpy as np
import pandas as pd

raw_data = [1, 43, 65, 23, 4, 57, 87, 45, 45, 23]
x = np.array(raw_data)
x_new = (x - x.min()) / (x.max() - x.min())

df = pd.DataFrame({'score': raw_data})
df['score'] = (df['score'] - df['score'].min()) / \
              (df['score'].max() - df['score'].min())
Data normalisation example
• Data Standardisation:
Standardisation typically means rescaling data to have a mean of 0 and a
standard deviation of 1 (unit variance):

x_new = (x − μ) / σ
import numpy as np
import pandas as pd

raw_data = [1, 43, 65, 23, 4, 57, 87, 45, 45, 23]
x = np.array(raw_data)
x_new = (x - x.mean()) / x.std()

df = pd.DataFrame({'sc': raw_data})
df['sc'] = (df['sc'] - df['sc'].mean()) / df['sc'].std()
# Note: NumPy's .std() divides by n (ddof=0), while Pandas' .std()
# divides by n-1 (ddof=1) by default, so the results differ slightly.
EXPLORATORY DATA ANALYSIS
Aim:
Objectives:
Tools:
EDA typically relies heavily on visualising the data to assess patterns and
identify data characteristics that the analyst would not otherwise know to look
for.
Example database: Airline safety
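The snippets below assume df already holds the airline safety table (e.g. loaded with pd.read_csv from the downloaded file). A minimal stand-in, with column names taken from the slides but invented values, keeps them reproducible:

```python
import pandas as pd

# Stand-in for the airline-safety table; the rows below are illustrative,
# not the real figures.
df = pd.DataFrame({
    "airline": ["Aeroflot", "Air France", "Delta"],
    "avail_seat_km_per_week": [1.2e9, 3.0e9, 6.5e9],
    "incidents_85_99": [76, 14, 24],
    "incidents_00_14": [6, 6, 24],
})
print(df.shape)  # (3, 4)
```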
df.hist('incidents_85_99')
Univariate visualisation
df.hist()
The investigation starts.
Insights: somebody is flying a lot… one airline? An outlier? A connection?
Questions: Is my data reliable? Is it safer to fly today than before? And so on…
Download and load the airline safety database.
Standardise the column "avail_seat" and find the airline
that had more than 70 incidents between 1985 and
1999.
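One way to approach the exercise; the full column name and the sample rows below are assumptions, since the real data comes from the downloaded CSV:

```python
import pandas as pd

# Assumed stand-in rows for the airline-safety data
df = pd.DataFrame({
    "airline": ["Aeroflot", "Delta", "KLM"],
    "avail_seat_km_per_week": [1.2e9, 6.5e9, 1.9e9],
    "incidents_85_99": [76, 24, 7],
})

# Standardise the seat-kilometre column (mean 0, std 1)
col = "avail_seat_km_per_week"
df[col] = (df[col] - df[col].mean()) / df[col].std()

# Airlines with more than 70 incidents between 1985 and 1999
risky = df.loc[df["incidents_85_99"] > 70, "airline"]
print(risky.tolist())  # ['Aeroflot']
```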
import pandas as pd
df.plot.scatter('avail_seat_km_per_week', 'incidents_85_99')
• Use the .corr() method to find the correlation among the columns of the
DataFrame, using the 'pearson' method.
• Correlations are never lower than −1. A correlation of −1 indicates that
the data points in a scatter plot lie exactly on a straight descending line.
• A correlation of 0 means that two variables have no linear relation
whatsoever; however, some non-linear relation may still exist between them.
• Correlation coefficients are never higher than 1. A correlation
coefficient of 1 means that two variables are perfectly positively linearly related.
avail_seat_km_per_week incidents_85_99
avail_seat_km_per_week 1.000000 0.279538
incidents_85_99 0.279538 1.000000
fatal_accidents_85_99 0.468300 0.856991
incidents_00_14 0.725917 0.403009
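A small illustration of .corr() on made-up columns, where x and y are perfectly positively related and x and z perfectly negatively related:

```python
import pandas as pd

# Illustrative data: y = 2x (ascending line), z = 5 - x (descending line)
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})

corr = df.corr(method="pearson")
print(corr.loc["x", "y"])  # 1.0  (perfect positive linear relation)
print(corr.loc["x", "z"])  # -1.0 (perfect negative linear relation)
```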
df.plot.scatter('incidents_85_99',
'incidents_00_14')
There seems to be a relationship, but is it significant?
Significant improvement.
DATA ANALYSIS
Turn insight and ideas into scientifically valid results.
# Keep rows with incidents_85_99 <= 10
df_l = df.mask(df["incidents_85_99"] > 10).dropna()
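The same filter can also be written with plain boolean indexing (the sample rows below are illustrative); unlike mask(...).dropna(), it does not also discard rows that happen to contain NaN in other columns:

```python
import pandas as pd

df = pd.DataFrame({"airline": ["A", "B", "C"],
                   "incidents_85_99": [76, 8, 10]})

# Keep rows with at most 10 incidents; same result as mask(...).dropna()
df_l = df[df["incidents_85_99"] <= 10]
print(df_l["airline"].tolist())  # ['B', 'C']
```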