5 - Pandas and Groupby
In [1]: import pandas as pd
df = pd.read_csv('/Users/jagadeeshreddy/Downloads/Ele_Store.csv')
In [2]: type(df)
Out[2]: pandas.core.frame.DataFrame
In [3]: df.head(5)
Out[3]: (rows 2 and 3, transposed)
                     2                                               3
OrderID              7018                                            140
OrderDate            15-09-2017                                      16-09-2017
UnitCost             10.122338                                       0.576153
Price                159.99                                          25.69
OrderQty             9                                               18
CostofSales          91.101039                                       10.370759
Sales                1395.1128                                       462.4200
Profit               1304.011761                                     452.049241
Channel              Store                                           Store
PromotionName        European Spring Promotion                       North America Spring Promotion
ProductName          Contoso DVD 9-Inch Player Portable M300 Silver  NT Bluetooth Stereo Headphones E52 Pink
Manufacturer         Contoso, Ltd                                    Northwind Traders
ProductSubCategory   Movie DVD                                       Bluetooth Headphones
ProductCategory      Music, Movies and Audio Books                   Audio
Region               Europe                                          North America
City                 Moscow                                          Bellevue
Country              Russia                                          United States
In [4]: df.isna().sum()
Out[4]:
OrderID 0
OrderDate 0
UnitCost 0
Price 0
OrderQty 0
CostofSales 0
Sales 0
Profit 0
Channel 0
PromotionName 0
ProductName 0
Manufacturer 0
ProductSubCategory 0
ProductCategory 0
Region 0
City 0
Country 0
dtype: int64
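Every count above is zero, so it may help to see what `isna().sum()` reports when values are actually missing. The following is a minimal sketch on a hypothetical toy frame (the `toy` data is invented for illustration and is not taken from Ele_Store.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with deliberate gaps (not from Ele_Store.csv).
toy = pd.DataFrame({
    "Sales": [10.0, np.nan, 30.0],
    "City": ["Moscow", None, None],
})

# isna() marks each missing cell as True; sum() then counts the True values per column.
missing = toy.isna().sum()
print(missing)
```

Here `Sales` has one missing value and `City` has two, so the result is a Series of per-column counts in the same layout as the output above.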
In [7]: df.shape
Out[7]: (15000, 17)
In [8]: list(df.columns)
Out[8]:
['OrderID',
'OrderDate',
'UnitCost',
'Price',
'OrderQty',
'CostofSales',
'Sales',
'Profit',
'Channel',
'PromotionName',
'ProductName',
'Manufacturer',
'ProductSubCategory',
'ProductCategory',
'Region',
'City',
'Country']
In [10]: df1 = df[['Region','City','Country','Sales','Profit','Channel','OrderQty']]
df1.shape
Out[10]: (15000, 7)
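The double brackets in the column selection matter: a list of names returns a DataFrame, while a single name returns a Series. This is a small sketch on a hypothetical two-row frame (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data (not from Ele_Store.csv).
toy = pd.DataFrame({
    "City": ["Moscow", "Bellevue"],
    "Sales": [100.0, 40.0],
    "Profit": [90.0, 30.0],
})

# A list of column names keeps the result two-dimensional (DataFrame).
subset = toy[["City", "Sales"]]

# A single column name returns a one-dimensional Series.
column = toy["Sales"]

print(type(subset).__name__, subset.shape)
print(type(column).__name__, column.shape)
```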
In [11]: df1.head(4)
In [12]: # Total profit for each (Region, Country, City) combination
df1.groupby(['Region','Country','City'])[['Profit']].sum()
Out[12]:
City          Profit
Sydney        2.503940e+05
Winchester    5.261520e+04
Worcester     1.087016e+05
Yakima        7.079848e+04
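Grouping by several keys returns a result indexed by a MultiIndex, one level per key. The sketch below uses a hypothetical toy frame (invented values, not the store data) to show that result and how `reset_index()` flattens it back into ordinary columns:

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only, not taken from Ele_Store.csv.
toy = pd.DataFrame({
    "Region": ["Europe", "Europe", "Europe", "Asia"],
    "Country": ["Russia", "Russia", "France", "Japan"],
    "City": ["Moscow", "Moscow", "Paris", "Tokyo"],
    "Profit": [100.0, 50.0, 30.0, 20.0],
})

# One row per (Region, Country, City) combination; the keys form a MultiIndex.
by_city = toy.groupby(["Region", "Country", "City"])[["Profit"]].sum()

# reset_index() turns the three index levels back into ordinary columns.
flat = by_city.reset_index()
print(flat)
```

The flat form is usually easier to sort, filter, or export than the MultiIndex form.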
In [13]: df1.groupby(['Region','Country','City'])[['Profit','Sales']].sum()
In [18]: # Totals per country
df1.groupby(['Country'])[['Profit','Sales','OrderQty']].sum()
# Per-country means; only this last expression is displayed by the cell
df1.groupby(['Country']).agg({'Profit':'mean','Sales':'mean','OrderQty':'mean'}).head(5)
Country
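The `sum()` and dict-based `agg()` calls can be compared side by side on a small frame. This is a sketch under stated assumptions: the `toy` data is hypothetical, and the named-aggregation form at the end (with the invented output names `total_profit` and `mean_sales`) is an alternative pandas spelling, not something the notebook itself uses.

```python
import pandas as pd

# Hypothetical toy data (not from Ele_Store.csv).
toy = pd.DataFrame({
    "Country": ["Russia", "Russia", "Japan", "Japan"],
    "Profit": [100.0, 50.0, 30.0, 10.0],
    "Sales": [200.0, 100.0, 60.0, 40.0],
    "OrderQty": [2, 4, 1, 3],
})

# Same aggregation (sum) applied to every selected column.
totals = toy.groupby("Country")[["Profit", "Sales", "OrderQty"]].sum()

# Dict form: choose an aggregation per column (here all are means).
means = toy.groupby("Country").agg(
    {"Profit": "mean", "Sales": "mean", "OrderQty": "mean"}
)

# Named aggregation: output_name=(input_column, function).
summary = toy.groupby("Country").agg(
    total_profit=("Profit", "sum"),
    mean_sales=("Sales", "mean"),
)

print(totals)
print(means)
print(summary)
```

Named aggregation is convenient when different columns need different functions and the result columns should carry descriptive names.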