
5 - Pandas and Groupby


In [1]: import pandas as pd
df = pd.read_csv('/Users/jagadeeshreddy/Downloads/Ele_Store.csv')

In [2]: type(df)

Out[2]: pandas.core.frame.DataFrame

In [3]: df.head(5)

Out[3]:
   OrderID   OrderDate    UnitCost   Price  OrderQty  CostofSales      Sales       Profit Channel
0     7077  13-09-2017   76.094968  304.00         9   684.854710  2714.7200  2029.865290   Store
1      117  14-09-2017    7.491753   12.99         4    29.967011    50.1414    20.174389   Store
2     7018  15-09-2017   10.122338  159.99         9    91.101039  1395.1128  1304.011761   Store
3      140  16-09-2017    0.576153   25.69        18    10.370759   462.4200   452.049241   Store
4      491  17-09-2017  108.508777  304.00         9   976.578991  2614.4000  1637.821009  Online

                    PromotionName                                   ProductName       Manufacturer
0       European Spring Promotion                  Contoso SLR Camera M143 Grey       Contoso, Ltd
1       European Spring Promotion              Contoso 512MB MP3 Player E51 Blue      Contoso, Ltd
2       European Spring Promotion  Contoso DVD 9-Inch Player Portable M300 Silver     Contoso, Ltd
3  North America Spring Promotion       NT Bluetooth Stereo Headphones E52 Pink  Northwind Traders
4          Asian Spring Promotion                  Contoso SLR Camera M143 Grey       Contoso, Ltd

     ProductSubCategory                 ProductCategory         Region      City        Country
0   Digital SLR Cameras          Cameras and camcorders         Europe    Moscow         Russia
1               MP4&MP3                           Audio         Europe    Moscow         Russia
2             Movie DVD  Music, Movies and Audio Books         Europe    Moscow         Russia
3  Bluetooth Headphones                           Audio  North America  Bellevue  United States
4   Digital SLR Cameras          Cameras and camcorders           Asia   Beijing          China
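The OrderDate values shown above look like day-month-year strings (13-09-2017, 14-09-2017, and so on), so they are probably loaded as plain text. A minimal sketch of converting such strings to datetimes follows. The `orders` frame is a toy stand-in, not the real file, and the `%d-%m-%Y` format is an assumption inferred from the five rows displayed.

```python
import pandas as pd

# Toy stand-in for the real data: a few OrderDate strings in the
# day-month-year layout seen in the head() output (assumed format).
orders = pd.DataFrame({
    "OrderID": [7077, 117, 7018],
    "OrderDate": ["13-09-2017", "14-09-2017", "15-09-2017"],
})

# An explicit format avoids any day/month ambiguity during parsing.
orders["OrderDate"] = pd.to_datetime(orders["OrderDate"], format="%d-%m-%Y")

print(orders.dtypes)
print(orders["OrderDate"].dt.month.tolist())
```

Once the column is a datetime, accessors such as `.dt.month` or `.dt.year` can serve as extra grouping keys.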

In [4]: df.isna().sum()

Out[4]:
OrderID               0
OrderDate             0
UnitCost              0
Price                 0
OrderQty              0
CostofSales           0
Sales                 0
Profit                0
Channel               0
PromotionName         0
ProductName           0
Manufacturer          0
ProductSubCategory    0
ProductCategory       0
Region                0
City                  0
Country               0
dtype: int64
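Every column above reports zero missing values, so the output gives no feel for what a non-zero result looks like. The sketch below runs the same check on a small invented frame (`toy` and its values are hypothetical) and also shows the share of missing values per column, which is often easier to read than raw counts.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
toy = pd.DataFrame({
    "Sales":  [10.0, np.nan, 30.0, 40.0],
    "Region": ["Europe", "Asia", None, "Asia"],
})

# isna() marks missing cells as True; sum() counts them per column.
missing_count = toy.isna().sum()

# mean() of the boolean mask gives the fraction missing per column.
missing_share = toy.isna().mean()

print(missing_count)
print(missing_share)
```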

In [7]: df.shape

Out[7]: (15000, 17)

In [8]: list(df.columns)

Out[8]:
['OrderID',
 'OrderDate',
 'UnitCost',
 'Price',
 'OrderQty',
 'CostofSales',
 'Sales',
 'Profit',
 'Channel',
 'PromotionName',
 'ProductName',
 'Manufacturer',
 'ProductSubCategory',
 'ProductCategory',
 'Region',
 'City',
 'Country']

In [10]: df1=df[['Region','City','Country','Sales','Profit','Channel','OrderQty']]
df1.shape

Out[10]: (15000, 7)

In [11]: df1.head(4)

Out[11]:
          Region      City        Country      Sales       Profit Channel  OrderQty
0         Europe    Moscow         Russia  2714.7200  2029.865290   Store         9
1         Europe    Moscow         Russia    50.1414    20.174389   Store         4
2         Europe    Moscow         Russia  1395.1128  1304.011761   Store         9
3  North America  Bellevue  United States   462.4200   452.049241   Store        18

In [12]: df1.groupby(['Region','Country','City'])[['Profit']].sum()

Out[12]:
                                               Profit
Region        Country       City
Asia          Armenia       Yerevan      1.119702e+05
              Australia     Canberra     9.384047e+04
                            Sydney       2.503940e+05
              Bhutan        Thimphu      8.817516e+04
              China         Beijing      3.810007e+06
...           ...           ...                   ...
North America United States Waukesha     9.556164e+04
                            Wheat Ridge  5.794578e+04
                            Winchester   5.261520e+04
                            Worcester    1.087016e+05
                            Yakima       7.079848e+04

263 rows × 1 columns
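With 263 groups, the full result is hard to scan. As a rough sketch, the same multi-key grouping is shown below on a small invented frame (`sales` and its numbers are hypothetical, not taken from Ele_Store.csv), followed by two common follow-ups: sorting the totals to see the largest groups, and `as_index=False` to keep the keys as ordinary columns.

```python
import pandas as pd

# Hypothetical data with the same key columns as df1.
sales = pd.DataFrame({
    "Region":  ["Europe", "Europe", "Asia", "Asia", "Asia"],
    "Country": ["Russia", "Russia", "China", "China", "Armenia"],
    "City":    ["Moscow", "Moscow", "Beijing", "Beijing", "Yerevan"],
    "Profit":  [100.0, 50.0, 200.0, 25.0, 10.0],
})

# Total profit per (Region, Country, City); the keys become a MultiIndex.
totals = sales.groupby(["Region", "Country", "City"])[["Profit"]].sum()

# Largest totals first, limited to the top two groups.
top = totals.sort_values("Profit", ascending=False).head(2)

# as_index=False returns the keys as regular columns instead of an index.
flat = sales.groupby(["Region", "Country", "City"], as_index=False)["Profit"].sum()

print(totals)
print(top)
print(flat)
```

The flat form is usually more convenient when the result feeds a merge or a plot; the MultiIndex form suits hierarchical lookups such as `totals.loc[("Asia", "China")]`.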

In [13]: df1.groupby(['Region','Country','City'])[['Profit','Sales']].sum()

Out[13]:
                                               Profit         Sales
Region        Country       City
Asia          Armenia       Yerevan      1.119702e+05  1.729347e+05
              Australia     Canberra     9.384047e+04  2.112396e+05
                            Sydney       2.503940e+05  4.214029e+05
              Bhutan        Thimphu      8.817516e+04  1.612916e+05
              China         Beijing      3.810007e+06  6.596953e+06
...           ...           ...                   ...           ...
North America United States Waukesha     9.556164e+04  1.953965e+05
                            Wheat Ridge  5.794578e+04  1.295698e+05
                            Winchester   5.261520e+04  9.426428e+04
                            Worcester    1.087016e+05  1.839504e+05
                            Yakima       7.079848e+04  1.008435e+05

263 rows × 2 columns

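In [18]: # Per-country totals (not displayed: a cell shows only its last expression)
df1.groupby(['Country'])[['Profit','Sales','OrderQty']].sum()
# Per-country means, first five countries
df1.groupby(['Country']).agg({'Profit':'mean','Sales':'mean','OrderQty':'mean'}).head(5)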

Out[18]:
                Profit        Sales   OrderQty
Country
Armenia    2544.776604  3930.333023  14.318182
Australia  2310.298224  4245.922889  14.946309
Bhutan     1836.982502  3360.241792  11.854167
Canada     1863.152960  3198.653065  14.181058
China      2625.444989  4517.617332  21.163778
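The dictionary passed to `agg` above applies the same function to every column, but each column can take a different one. The sketch below uses pandas named aggregation on a small invented frame. The `sales` data and the output names `total_profit`, `mean_sales`, and `orders` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data with the same value columns as df1.
sales = pd.DataFrame({
    "Country":  ["China", "China", "Russia", "Russia"],
    "Profit":   [200.0, 100.0, 50.0, 150.0],
    "Sales":    [400.0, 300.0, 120.0, 280.0],
    "OrderQty": [9, 18, 4, 9],
})

# Named aggregation: output_name=(source_column, function).
summary = sales.groupby("Country").agg(
    total_profit=("Profit", "sum"),
    mean_sales=("Sales", "mean"),
    orders=("OrderQty", "count"),
)

print(summary)
```

Named aggregation also gives the result columns descriptive labels, so a total and a mean of the same source column can sit side by side without clashing names.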
