Python Datasci Slides

Uploaded by

mira.jeni1love

Teaching Data Science using Python II

The Python Data Science Ecosystem

Teaching Data Science using Python

Sandbox option: Berkeley's Data 8 course

• Uses the datascience package

Real-world option: uses a set of Python packages:
• Standard Python libraries
• NumPy
• pandas
• Matplotlib
• Also: seaborn, statsmodels, scikit-learn

Python Data Science packages

We will give a basic overview of some of the main Python data science packages

We will redo the avocado analyses using some of these packages

NumPy is a library that adds support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level mathematical functions
to operate on these arrays.
• i.e., it is similar to MATLAB

The core data structure of NumPy is its "ndarray".

Ndarrays are similar to Python lists, except that all elements in an ndarray
must be of the same type
• E.g., all elements are numbers, or all elements are strings, etc.
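
A minimal sketch of the "same type" rule, assuming only that NumPy is installed. The variable names are illustrative and not from the slides. It contrasts list and ndarray arithmetic and shows how NumPy converts mixed inputs to one common type.

```python
import numpy as np

values = [1, 2, 3]
x = np.array(values)

# Multiplying a list repeats it; multiplying an ndarray scales each element.
doubled_list = 2 * values
doubled_array = 2 * x

# Mixing ints and floats: NumPy upcasts every element to a single float type.
mixed = np.array([1, 2.5, 3])

# Mixing numbers and strings: every element is stored as a string.
as_text = np.array([1, "a"])

print(doubled_list)
print(doubled_array)
print(mixed.dtype, as_text.dtype)
```

The silent conversion in the last two arrays is worth pointing out to students, since a stray string in a numeric column changes the type of the whole array.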
import numpy as np

x = np.array([1, 2, 3])
2 * x

# the numbers 0 to 9
x = np.arange(10)

# 3 x 3 matrix
M = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])

SciPy contains modules for optimization, linear algebra, integration, interpolation, FFT, signal and image processing, etc.
• Uses ndarrays as main data structure

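
As a rough illustration of SciPy working directly on ndarrays, the sketch below reuses the 3 x 3 matrix M from the slide. The right-hand side b and the two toy functions are assumptions made up for this example.

```python
import numpy as np
from scipy import integrate, linalg, optimize

M = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([1.0, 2.0, 3.0])  # hypothetical right-hand side

# Linear algebra: solve M @ solution = b; the result is again an ndarray.
solution = linalg.solve(M, b)

# Integration: integrate t**2 from 0 to 3 (the exact answer is 9).
area, abs_error = integrate.quad(lambda t: t**2, 0, 3)

# Optimization: find the minimum of (t - 2)**2, which lies at t = 2.
result = optimize.minimize_scalar(lambda t: (t - 2) ** 2)

print(solution)
print(area)
print(result.x)
```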
pandas is a library for data manipulation and analysis that has two main
data structures:

1. Series: One-dimensional ndarray with an index for each value
• Similar to a named vector in R

2. DataFrame: Two-dimensional, size-mutable, potentially heterogeneous tabular data
• Similar to an R data frame
• (or multiple Series of the same length with the same index)

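
The two structures can be sketched with a few hand-made values. The index labels and numbers below are hypothetical and only borrow the column names AveragePrice and type from the avocado example.

```python
import pandas as pd

labels = ["a", "b", "c"]

# Two Series that share the same index.
prices = pd.Series([1.33, 1.35, 0.93], index=labels, name="AveragePrice")
kinds = pd.Series(["conventional", "organic", "conventional"],
                  index=labels, name="type")

# A DataFrame built from Series of the same length with the same index.
# Its columns can hold different types (heterogeneous).
df = pd.DataFrame({"AveragePrice": prices, "type": kinds})

# Size-mutable: a new column can be added in place.
df["is_organic"] = df["type"] == "organic"

print(prices["b"])  # look up a value by its index label
print(df.dtypes)
```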
import pandas as pd

avocado = pd.read_csv("avocado.csv")
avocado.head(3)  # show the first 3 rows

avocado["AveragePrice"]  # returns a Series

# Get the average value of all numerical columns,
# separately for each type of avocado
# (numeric_only=True skips text columns, which recent pandas versions require)
avocado.groupby("type").mean(numeric_only=True).reset_index()

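
Because avocado.csv is not included here, the sketch below runs the same group-by on a tiny hand-made table. All values, and the region and TotalVolume columns, are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the avocado data.
avocado_toy = pd.DataFrame({
    "type": ["conventional", "organic", "conventional", "organic"],
    "region": ["Albany", "Albany", "Boston", "Boston"],
    "AveragePrice": [1.0, 2.0, 1.5, 2.5],
    "TotalVolume": [100, 50, 200, 30],
})

# Average every numeric column within each avocado type.
# The text column "region" is left out because of numeric_only=True.
by_type = avocado_toy.groupby("type").mean(numeric_only=True).reset_index()

print(by_type)
```

The call to reset_index() turns the group labels back into an ordinary "type" column, which keeps the result looking like a regular table.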
Matplotlib is a plotting library. Each plot has a figure and one or more
subplots (axes).
• somewhat similar to base R graphics

It has two interfaces for plotting:

1. A "pyplot" (pylab-style) procedural interface based on a state machine that closely resembles MATLAB
• Updates are made to the current axes (the one most recently plotted on)

2. An object-oriented API
• Updates are made to the axes that is selected

The object-oriented interface is preferred (not a big difference)

import matplotlib.pyplot as plt

# pyplot interface (MATLAB-like)
plt.plot([1, 3, 10]);

# object-oriented interface
fig, ax = plt.subplots()
ax.plot([1, 3, 10]);

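
The difference between the two interfaces is easier to see with two subplots. This is a sketch under assumptions: the axes names and label texts are illustrative, and the Agg backend is chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Object-oriented interface: name the axes you want to change.
ax_left.plot([1, 3, 10])
ax_right.plot([10, 3, 1])
ax_right.set_xlabel("set via ax_right")

# pyplot interface: calls act on the "current" axes.
# plt.sca makes ax_left current, so plt.xlabel labels the left subplot.
plt.sca(ax_left)
plt.xlabel("set via pyplot")

print(ax_left.get_xlabel())
print(ax_right.get_xlabel())
```

With a single subplot the two styles behave the same, which is why the choice matters mostly once a figure has several axes.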
seaborn is a visualization library built on Matplotlib that provides a
higher-level interface that works with pandas DataFrames
• somewhat similar to ggplot

Figure level plots

There are "axes-level" functions that plot on a single axes and "figure-level" functions that plot across multiple axes

Figure-level plots are grouped based on the types of variables being plotted
• E.g., a single quantitative variable, two quantitative variables, etc.
import seaborn as sns

penguins = sns.load_dataset("penguins")

# figure-level plot
sns.displot(data=penguins,
            x="flipper_length_mm",
            hue="species",
            multiple="stack",
            kind="kde");

Translation between Tables and DataFrames

Translation between datascience Tables and pandas DataFrames

Translation between datascience Tables and babypandas DataFrames

Let’s try it ourselves!
