Python Data Science Assignment

The document provides instructions for analyzing salary and ecommerce purchase datasets using Pandas in Python. For the salary dataset, it asks to find the average and highest salaries, identify an individual's job title and pay, and perform aggregate analysis by job title, year, and other fields. For the ecommerce data, it asks similar questions about purchase prices, customer demographics like language and job, and relationships between fields. It provides the code to import Pandas and load the datasets, and asks the reader to write Python code to answer over 20 questions by analyzing and filtering the data.

Uploaded by

Mo Shah
Copyright
© All Rights Reserved

Assignment -01

Salary Exercise
import pandas as pd
df = pd.read_csv('salary.csv')
Show the first five rows.
print(df.head())
Use the info() method to show information about the entries.
df.info()
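
The two inspection calls above can be tried without the real file. The following is a minimal sketch, assuming a hypothetical three-row stand-in for salary.csv (the rows, the name `sample`, and the values are invented for illustration). It also shows why info() is called on its own: it prints its summary itself and returns None.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for salary.csv; every value is invented.
csv_text = """EmployeeName,JobTitle,BasePay,Year
A PERSON,CLERK,100.0,2011
B PERSON,CHIEF OF POLICE,300.0,2012
C PERSON,CLERK,,2013
"""

sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (the default is 5).
first_rows = sample.head(2)
print(first_rows)

# info() writes its summary straight to stdout and returns None,
# so wrapping it in print() would only add a stray "None" line.
result = sample.info()

# shape and isna() give the same facts as values you can reuse.
n_rows, n_cols = sample.shape
missing_base_pay = int(sample['BasePay'].isna().sum())
```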

• What is the average BasePay?
• ans:- df['BasePay'].mean()
• What is the highest amount of OvertimePay in the dataset?
• ans:- df['OvertimePay'].max()
• What is the job title of JOSEPH DRISCOLL? Note: Use all caps, otherwise you may get an answer that doesn't match up (there is also a lowercase Joseph Driscoll).
• ans:- df[df['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle']
• How much does JOSEPH DRISCOLL make (including benefits)?
• ans:- df[df['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits']
• What is the name of the highest paid person (including benefits)?
• ans:- df[df['TotalPayBenefits'] == df['TotalPayBenefits'].max()]['EmployeeName']
• What is the name of the lowest paid person (including benefits)? Do you notice something strange about how much he or she is paid?
• ans:- df[df['TotalPayBenefits'] == df['TotalPayBenefits'].min()][['EmployeeName', 'TotalPayBenefits']]
• What was the average (mean) BasePay of all employees per year (2011-2014)?
• ans:- df.groupby('Year')['BasePay'].mean()
• How many unique job titles are there?
• ans:- df['JobTitle'].nunique()
• What are the top 5 most common jobs?
• ans:- df['JobTitle'].value_counts().head(5)
• How many Job Titles were represented by only one person in 2013? (e.g. Job Titles with only one occurrence in 2013?)
• ans:- sum(df[df['Year'] == 2013]['JobTitle'].value_counts() == 1)
• How many people have the word Chief in their job title? (This is pretty tricky)
• ans:-
def chief_string(title):
    # True when "chief" appears in the title, in any letter case
    return 'chief' in title.lower()

sum(df['JobTitle'].apply(chief_string))
• Bonus: Is there a correlation between the length of the Job Title string and Salary?
df['title_len'] = df['JobTitle'].apply(len)
df[['title_len','TotalPayBenefits']].corr()
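
The trickier salary questions above (mean per year, highest paid person, titles held by one person in 2013, titles containing Chief, and the title-length correlation) can be checked on a small table where the answers are easy to verify by hand. This is only a sketch: the five rows are invented, and `str.contains`, `str.len` and `idxmax` are used here as alternatives to the `apply`-based answers, not as the assignment's required approach.

```python
import pandas as pd

# Hypothetical miniature of the salary data; every value is invented.
df = pd.DataFrame({
    'EmployeeName': ['A', 'B', 'C', 'D', 'E'],
    'JobTitle': ['Clerk', 'Clerk', 'Chief of Police', 'Fire Chief', 'Nurse'],
    'BasePay': [100.0, 120.0, 300.0, 280.0, 150.0],
    'TotalPayBenefits': [130.0, 150.0, 400.0, 390.0, 200.0],
    'Year': [2013, 2013, 2013, 2012, 2013],
})

# Mean BasePay per year: pick the column before aggregating so that
# text columns such as JobTitle are never averaged.
mean_base_by_year = df.groupby('Year')['BasePay'].mean()

# Highest paid person: idxmax() returns the row label of the maximum.
top_name = df.loc[df['TotalPayBenefits'].idxmax(), 'EmployeeName']

# Job titles that occur exactly once in 2013.
counts_2013 = df[df['Year'] == 2013]['JobTitle'].value_counts()
single_2013 = int((counts_2013 == 1).sum())

# Titles containing "chief" in any letter case.
chief_count = int(df['JobTitle'].str.contains('chief', case=False).sum())

# Correlation between title length and total pay (a 2x2 matrix).
df['title_len'] = df['JobTitle'].str.len()
corr = df[['title_len', 'TotalPayBenefits']].corr()

print(mean_base_by_year, top_name, single_2013, chief_count, corr, sep='\n')
```

On a table this small the correlation value itself means little; the point is only that each expression returns the kind of result the question asks for.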

Assignment -02
Ecommerce Purchases Exercise
In this exercise you will be given some fake data about purchases made through Amazon. Follow the directions and try your best to answer the questions and complete the tasks.
Import pandas and read in the Ecommerce Purchases csv file and set it to a DataFrame called ecom.
import pandas as pd
ecom = pd.read_csv('Ecommerce Purchases')
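
Several column labels in this data contain spaces, for example 'Purchase Price' and 'AM or PM', and a lookup only works when the label matches exactly. The sketch below assumes a hypothetical two-row stand-in for the file (the rows and the name `ecom_sample` are invented) and shows one way to check the labels and the table size right after loading.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the Ecommerce Purchases file; values invented.
csv_text = """Lot,AM or PM,Purchase Price,Language
11 aa,PM,10.50,en
22 bb,AM,99.00,fr
"""

ecom_sample = pd.read_csv(io.StringIO(csv_text))

# List the labels exactly as pandas read them.
columns = list(ecom_sample.columns)

# shape is a (rows, columns) tuple.
n_rows, n_cols = ecom_sample.shape

# Labels must match character for character, including the space.
has_spaced_name = 'Purchase Price' in ecom_sample.columns
has_joined_name = 'PurchasePrice' in ecom_sample.columns

print(columns, n_rows, n_cols, has_spaced_name, has_joined_name)
```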

• Check the head of the DataFrame.
• ans:- print(ecom.head())
• How many rows and columns are there?
• ans:- ecom.info()
• What is the average Purchase Price?
• ans:- print(ecom['Purchase Price'].mean())
• What were the highest and lowest purchase prices?
• ans:- print(ecom['Purchase Price'].max())
print(ecom['Purchase Price'].min())
• How many people have English 'en' as their Language of choice on the website?
• ans:- print(ecom[ecom['Language'] == 'en'].count())
• How many people have the job title of "Lawyer"?
• ans:- ecom[ecom['Job'] == 'Lawyer'].info()
• How many people made the purchase during the AM and how many people made the purchase during the PM?
• ans:- print(ecom['AM or PM'].value_counts())
• What are the 5 most common Job Titles?
• ans:- print(ecom['Job'].value_counts().head(5))
• Someone made a purchase that came from Lot: "90 WT". What was the Purchase Price for this transaction?
• ans:- print(ecom[ecom['Lot'] == '90 WT']['Purchase Price'])
• What is the email of the person with the following Credit Card Number: 4926535242672853?
• ans:- print(ecom[ecom['Credit Card'] == 4926535242672853]['Email'])
• How many people have American Express as their Credit Card Provider *and* made a purchase above $95?
• ans:- print(ecom[(ecom['CC Provider'] == 'American Express') & (ecom['Purchase Price'] > 95)].count())
• How many people have a credit card that expires in 2025?
• ans:- print(sum(ecom['CC Exp Date'].apply(lambda x: x[3:]) == '25'))
• What are the top 5 most popular email providers/hosts (e.g. gmail.com, yahoo.com, etc.)?
• ans:- ecom['Email'].apply(lambda x: x.split('@')[1]).value_counts().head(5)
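
The filtering questions above can be tried on a small table where the expected counts are obvious. This is a sketch under stated assumptions: the three rows are invented, the expiry dates are assumed to be 'MM/YY' strings, and the card numbers are assumed to be stored as integers (as pandas would read an all-digit column), which is why the comparison uses a number rather than a quoted string. The `.str` accessors stand in for the `apply`/`lambda` versions.

```python
import pandas as pd

# Hypothetical miniature of the ecommerce data; every value is invented.
ecom = pd.DataFrame({
    'Credit Card': [4000000000000001, 4000000000000002, 4000000000000003],
    'CC Exp Date': ['02/25', '11/24', '07/25'],
    'CC Provider': ['American Express', 'VISA 16 digit', 'American Express'],
    'Email': ['a@example.com', 'b@example.org', 'c@example.com'],
    'Purchase Price': [96.5, 99.0, 12.0],
})

# Two conditions: join them with & and wrap each one in parentheses.
amex_over_95 = ecom[(ecom['CC Provider'] == 'American Express')
                    & (ecom['Purchase Price'] > 95)]
n_amex_over_95 = len(amex_over_95)

# An integer column must be compared with an integer, not a string.
email_for_card = ecom[ecom['Credit Card'] == 4000000000000002]['Email'].iloc[0]

# Last two characters of 'MM/YY' give the expiry year.
expiring_2025 = int((ecom['CC Exp Date'].str[-2:] == '25').sum())

# Everything after '@' is the email host.
top_hosts = ecom['Email'].str.split('@').str[1].value_counts().head(5)

print(n_amex_over_95, email_for_card, expiring_2025)
print(top_hosts)
```

Using len() on the filtered frame returns a single row count, whereas count() returns one count per column.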
