Exercise 7 - Pandas
2. Consider the following Python dictionary data and Python list labels:
import numpy as np

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
a) Create a DataFrame df from the dictionary data, using labels as its index.
b) Display a summary of the basic information about this DataFrame and its data.
c) Return the first 3 rows of the DataFrame df.
d) Select just the 'animal' and 'age' columns from the DataFrame df.
e) Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].
f) Select only the rows where the number of visits is greater than 3.
g) Select the rows where the age is missing, i.e. is NaN.
h) Select the rows where the age is between 2 and 4 (inclusive).
i) Calculate the sum of all visits (the total number of visits).
j) Append a new row 'k' to df with your choice of values for each column. Then
delete that row to return the original DataFrame.
k) The 'priority' column contains the values 'yes' and 'no'. Replace this column
with a column of boolean values: 'yes' should be True and 'no' should be False.
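One possible way to work through parts a) to k) is sketched below. The labels list is not shown in this sheet, so the sketch assumes it holds the letters 'a' to 'j'; the values used for the new row 'k' are likewise arbitrary choices. Row enlargement uses df.loc rather than DataFrame.append, which was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

# ASSUMPTION: the labels list is missing from the sheet; 'a'..'j' is a guess.
labels = list('abcdefghij')

df = pd.DataFrame(data, index=labels)             # a) build df with the index labels
df.info()                                         # b) print a summary of columns and dtypes
first_three = df.head(3)                          # c) first 3 rows
animal_age = df[['animal', 'age']]                # d) two columns only
subset = df.iloc[[3, 4, 8]][['animal', 'age']]    # e) rows by position, columns by name
many_visits = df[df['visits'] > 3]                # f) empty for this data (max visits is 3)
missing_age = df[df['age'].isna()]                # g) rows where age is NaN
age_2_to_4 = df[df['age'].between(2, 4)]          # h) between() is inclusive by default
total_visits = df['visits'].sum()                 # i) total number of visits

# j) add a row labelled 'k' (arbitrary values), then drop it again
df.loc['k'] = ['dog', 5.5, 2, 'no']
df = df.drop('k')

# k) map the 'yes'/'no' strings to booleans
df['priority'] = df['priority'].map({'yes': True, 'no': False})
```

Note that part f) returns an empty DataFrame here, because no row has more than 3 visits.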
4. a) Create a DatetimeIndex that contains each business day of 2015 and use it to
index a Series of random numbers. Let's call this Series s.
b) Find the sum of the values in s for every Wednesday.
c) For each calendar month in s, find the mean of values.
d) Create a DatetimeIndex consisting of the third Thursday in each month for the
years 2015 and 2016.
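A minimal sketch of one approach to parts a) to d) follows. The seeded random generator and the variable names other than s are illustrative choices, not part of the exercise. Grouping by calendar period is used for part c) instead of resample, since the month-end resample alias has changed between pandas versions.

```python
import numpy as np
import pandas as pd

# a) every business day (Mon-Fri) of 2015; freq='B' does not skip public holidays
dti = pd.date_range(start='2015-01-01', end='2015-12-31', freq='B')
rng = np.random.default_rng(0)   # seed chosen only to make the sketch reproducible
s = pd.Series(rng.random(len(dti)), index=dti)

# b) weekday numbering starts at Monday=0, so Wednesday is 2
wednesday_sum = s[s.index.weekday == 2].sum()

# c) mean per calendar month, grouping on the monthly period of each date
monthly_mean = s.groupby(s.index.to_period('M')).mean()

# d) 'WOM-3THU' is the week-of-month alias for the third Thursday
third_thursdays = pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
```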
6. Given the lists letters = ['A', 'B', 'C'] and numbers = list(range(10)),
construct a MultiIndex object from the product of the two lists. Use it to index a
Series of random numbers. Call this Series s.
a) Select the labels 1, 3 and 6 from the second level of the MultiIndexed
Series.
b) Slice the Series s; slice up to label 'B' for the first level and from label
5 onwards for the second level.
c) Sum the values in s for each label in the first level (you should have a Series
giving you a total for labels A, B and C).
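The following sketch shows one way to approach this question. The names mi, picked, sliced and totals are illustrative, and the random seed is arbitrary. Part c) uses groupby(level=0) because the level argument of Series.sum was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

letters = ['A', 'B', 'C']
numbers = list(range(10))

# Cartesian product of the two lists: ('A', 0), ('A', 1), ..., ('C', 9)
mi = pd.MultiIndex.from_product([letters, numbers])
rng = np.random.default_rng(0)   # seed chosen only for reproducibility
s = pd.Series(rng.random(len(mi)), index=mi)

# a) all first-level labels, second-level labels 1, 3 and 6
picked = s.loc[:, [1, 3, 6]]

# b) label slices are inclusive: first level up to 'B', second level from 5 on
sliced = s.loc[pd.IndexSlice[:'B', 5:]]

# c) one total per first-level label
totals = s.groupby(level=0).sum()
```

Label-based slicing as in part b) relies on the MultiIndex being sorted, which holds here because both input lists are already in order.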
10. How do you get the items not common to both Series A and Series B?
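As a sketch, the items not common to both Series can be found by keeping the values of each Series that do not appear in the other. The sheet gives no sample data for this question, so the two Series below are made-up examples.

```python
import pandas as pd

# Hypothetical sample data; the exercise does not specify A and B.
A = pd.Series([1, 2, 3, 4, 5])
B = pd.Series([4, 5, 6, 7, 8])

only_in_A = A[~A.isin(B)]   # values of A that do not occur in B
only_in_B = B[~B.isin(A)]   # values of B that do not occur in A

# Items that occur in exactly one of the two Series
not_common = pd.concat([only_in_A, only_in_B], ignore_index=True)
print(not_common.tolist())  # [1, 2, 3, 6, 7, 8]
```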
14. How do you convert the first character of each element in a Series to uppercase?
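One possible approach uses the vectorised string accessor. The sample Series is hypothetical, since the sheet provides none. Series.str.capitalize() is a shorter alternative, but it also lowercases the remaining characters, so the sketch uppercases only the first character and leaves the rest untouched.

```python
import pandas as pd

# Hypothetical sample data; the exercise does not specify the Series.
ser = pd.Series(['how', 'to', 'use', 'pandas'])

# Uppercase the first character and re-attach the unchanged remainder
capitalized = ser.str[:1].str.upper() + ser.str[1:]
print(capitalized.tolist())  # ['How', 'To', 'Use', 'Pandas']
```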
17. Create three DataFrames (data1, data2 and data3) from the following raw data:
raw_data_1 = {
'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {
'subject_id': ['4', '5', '6', '7', '8'],
'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {
'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
a) Join data1 and data2 along rows and assign the result to all_data.
b) Join data1 and data2 along columns and assign the result to all_data_col.
c) Merge all_data and data3 on the subject_id value.
d) Merge only the rows that have the same 'subject_id' in both data1 and data2.
e) Merge all values in data1 and data2, with matching records from both sides
where available.
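A minimal sketch of parts a) to e) is given below, assuming the three DataFrames are named data1, data2 and data3 as in the task text. The result names merged_all, inner and outer are illustrative. Part d) is read as an inner join and part e) as an outer join.

```python
import pandas as pd

raw_data_1 = {
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

data1 = pd.DataFrame(raw_data_1)
data2 = pd.DataFrame(raw_data_2)
data3 = pd.DataFrame(raw_data_3)

# a) stack the rows of data2 under data1 (original row indices are kept)
all_data = pd.concat([data1, data2])

# b) place data2 beside data1, aligning on the row index
all_data_col = pd.concat([data1, data2], axis=1)

# c) default merge is an inner join on the shared key
merged_all = pd.merge(all_data, data3, on='subject_id')

# d) inner join: only subject_ids present in both frames
inner = pd.merge(data1, data2, on='subject_id', how='inner')

# e) outer join: every subject_id from either frame, NaN where no match exists
outer = pd.merge(data1, data2, on='subject_id', how='outer')
```

In parts d) and e) the overlapping first_name and last_name columns receive the default _x and _y suffixes to distinguish the two sources.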