pandas_workshop - Jupyter Notebook
A DataFrame is the main object in pandas. It is a data structure that represents tabular data in rows and columns, much like an Excel spreadsheet.
Creating a DataFrame:
In [1]:
import pandas as pd
df = pd.read_csv("weather_data.csv")  # read the weather_data.csv file
df
Out[1]:
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
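Because weather_data.csv itself is not included here, the following is a minimal sketch that reads the same six rows from an in-memory string through `io.StringIO`. The header line in `csv_text` is an assumption, borrowed from the column names used when the DataFrame is built from tuples in the next cell.

```python
import io

import pandas as pd

# Stand-in for weather_data.csv; the header row is assumed, not taken from the real file.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny
"""

# read_csv accepts a file-like object as well as a path on disk.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Swapping `io.StringIO(csv_text)` for the path `"weather_data.csv"` gives the call used in the cell above.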
In [2]:
# list of tuples, one tuple per row
weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])
df
Out[2]:
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
In [3]:
Out[3]:
(6, 4)
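The input of cell 3 did not survive the export. The output (6, 4) matches the dimensions of the DataFrame (6 rows, 4 columns), so the cell was most likely `df.shape`. That is an inference, not something the source confirms. A minimal sketch under that assumption:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
rows, cols = df.shape
print(df.shape)  # (6, 4)
```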
In [4]:
# head() shows the first rows of the DataFrame (5 by default)
df.head()
Out[4]:
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
In [5]:
# tail() shows the last rows of the DataFrame (5 by default)
df.tail()
Out[5]:
        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
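Both `head` and `tail` take an optional row count `n` in place of the default 5. A small sketch on the same data (the variable names `first_two` and `last_three` are only illustrative):

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

first_two = df.head(2)   # rows with index 0 and 1
last_three = df.tail(3)  # rows with index 3, 4 and 5
print(first_two)
print(last_three)
```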
In [6]:
# slicing: rows at positions 2, 3 and 4 (the end position 5 is excluded)
df[2:5]
Out[6]:
        day  temperature  windspeed  event
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
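Slicing a DataFrame with `df[start:stop]` selects rows by position and leaves out the row at `stop`, just like a Python list slice. The sketch below compares it with `iloc`, which the notebook itself does not use and which appears here only to make the positional behaviour explicit:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

sliced = df[2:5]         # the form used in the notebook
explicit = df.iloc[2:5]  # explicit positional indexing, shown for comparison only

print(sliced.equals(explicit))  # True
```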
In [7]:
Out[7]:
0 1/1/2017
1 1/2/2017
2 1/3/2017
3 1/4/2017
4 1/5/2017
5 1/6/2017
Name: day, dtype: object
In [0]:
Out[24]:
0 1/1/2017
1 1/2/2017
2 1/3/2017
3 1/4/2017
4 1/5/2017
5 1/6/2017
Name: day, dtype: object
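The inputs of the two cells above are missing from the export. Both outputs are the `day` column returned as a Series, which is what attribute access (`df.day`) and bracket access (`df['day']`) each produce. Which cell used which form is a guess; the sketch only shows that the two are equivalent:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

by_attribute = df.day  # works only when the column name is a valid Python identifier
by_key = df['day']     # works for any column name
print(by_key)
```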
In [0]:
Out[26]:
        day  event
0  1/1/2017   Rain
1  1/2/2017  Sunny
2  1/3/2017   Snow
3  1/4/2017   Snow
4  1/5/2017   Rain
5  1/6/2017  Sunny
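The input that produced this two-column output is missing. Selecting several columns with a list of names gives exactly this shape of result, so the following is a reconstruction under that assumption (the name `subset` is illustrative):

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

# Note the double brackets: the inner list names the columns to keep,
# and the result is a DataFrame rather than a Series.
subset = df[['day', 'event']]
print(subset)
```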
In [0]:
Out[28]:
0 32
1 35
2 28
3 24
4 32
5 31
Name: temperature, dtype: int64
In [0]:
Out[29]:
35
In [0]:
Out[30]:
24
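The three outputs above (the temperature column, then 35, then 24) are consistent with selecting the column and calling `max()` and `min()` on it. The original inputs are missing, so this is a sketch under that assumption:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

temperature = df['temperature']  # a single column, returned as a Series
highest = temperature.max()
lowest = temperature.min()
print(highest, lowest)  # 35 24
```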
In [0]:
Out[31]:
count 6.000000
mean 30.333333
std 3.829708
min 24.000000
25% 28.750000
50% 31.500000
75% 32.000000
max 35.000000
Name: temperature, dtype: float64
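The summary above has the layout that `describe()` produces for a numeric column: count, mean, standard deviation, minimum, quartiles, and maximum. The input cell is missing, so the call below is an assumption about what was run:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

# describe() returns a Series indexed by the statistic's name.
stats = df['temperature'].describe()
print(stats)

# Individual statistics can be looked up by label.
print(stats['mean'], stats['50%'])
```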
In [10]:
Out[10]:
        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
In [11]:
Out[11]:
        day  temperature  windspeed  event
1  1/2/2017           35          7  Sunny
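Outputs 10 and 11 both show the single row with the highest temperature. Their inputs are missing; one common way to get this result, assumed here, is to filter the rows with a boolean mask (the names `mask` and `hottest` are illustrative):

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

# The comparison yields one True/False value per row.
mask = df['temperature'] == df['temperature'].max()

# Indexing with the mask keeps only the rows where it is True.
hottest = df[mask]
print(hottest)
```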
In [0]:
Out[33]:
1 1/2/2017
Name: day, dtype: object
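This last output is only the `day` value of the hottest row. The input is missing; a plausible reconstruction, offered as an assumption, applies the same boolean mask to the `day` column alone. `df.loc[mask, 'day']` is shown as an alternative spelling that gives the same Series:

```python
import pandas as pd

weather_data = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
    ('1/4/2017', 24, 7, 'Snow'),
    ('1/5/2017', 32, 4, 'Rain'),
    ('1/6/2017', 31, 2, 'Sunny'),
]
df = pd.DataFrame(weather_data, columns=['day', 'temperature', 'windspeed', 'event'])

mask = df['temperature'] == df['temperature'].max()

hottest_day = df['day'][mask]         # filter the day column with the mask
hottest_day_loc = df.loc[mask, 'day']  # same selection written with loc
print(hottest_day)
```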
In [0]: