HW 1
HOMEWORK 1
Tauseef Shah
Due Date: Nov. 9, 2022
Submission requirements:
1. Suppose that a data warehouse consists of four dimensions, date, spectator, location,
and game, and two measures, count and charge, where charge is the fare that a
spectator pays when watching a game on a given date. Spectators may be students,
adults, or seniors, with each category having its own charge rate.
(a) Draw a star schema diagram for the data warehouse.
Star Schema
Fact Table: Date_ID, Spectator_ID, Location_ID, Game_ID, Charge, Count
Date: Date_ID, Day, Day of the Week, Month, Quarter, Year
Spectator: Spectator_ID, Spectator_name, patient_id, Phone, Address, Status, Charge_rate
Game: Game_ID, Game_Name, Description, Producer
Location: Location_ID, Phone #, City, Street, Province, Country
PAGE 1 11/18/22
(b) Starting with the base cuboid [date, spectator, location, game], what specific
OLAP operations should one perform in order to list the total charge paid by
student spectators in Los Angeles?
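One way to read question (b) is as a roll-up followed by a dice: roll date and game up to "all", roll location up to city and spectator up to status, then keep only the student / Los Angeles cell. The MATLAB sketch below illustrates that sequence on a handful of made-up fact rows. The table T, its column names, and all values are hypothetical and are not part of the assignment data.

```matlab
% Hypothetical fact rows, already joined with the spectator and location
% dimensions (illustrative values only, not assignment data).
status = categorical({'student'; 'adult'; 'student'; 'senior'; 'student'});
city   = categorical({'Los Angeles'; 'Los Angeles'; 'Chicago'; 'Los Angeles'; 'Los Angeles'});
charge = [10; 25; 10; 15; 12];
T = table(status, city, charge);

% Roll-up: aggregate away date and game, keeping only status and city.
G = groupsummary(T, {'status', 'city'}, 'sum', 'charge');

% Dice: select the cell with status = student and city = Los Angeles.
cell_rows    = G.status == 'student' & G.city == 'Los Angeles';
total_charge = G.sum_charge(cell_rows);
```

In a real warehouse the same steps would run against the fact table and its dimension tables; the sketch only shows the order of the operations.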
(c) Bitmap indexing is a very useful optimization technique. Please present the pros
and cons of using bitmap indexing in this given data warehouse.
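As a rough illustration for question (c), a bitmap index keeps one bit vector per distinct value of a dimension attribute, so a selection becomes a bitwise AND. The sketch below uses hypothetical rows and MATLAB logical vectors in place of real bitmaps. It hints at the trade-off: an attribute such as Status has only three values and needs three short vectors, while a high-cardinality attribute such as Spectator_ID would need one vector per spectator.

```matlab
% Hypothetical dimension values for five fact rows (illustration only).
status = categorical({'student'; 'adult'; 'student'; 'senior'; 'student'});
city   = categorical({'Los Angeles'; 'Los Angeles'; 'Chicago'; 'Los Angeles'; 'Los Angeles'});

% One bit vector per attribute value of interest.
bm_student = (status == 'student');
bm_la      = (city == 'Los Angeles');

% The selection "student AND Los Angeles" is a bitwise AND of two bitmaps.
hits   = bm_student & bm_la;
n_hits = nnz(hits);   % number of matching fact rows

% Number of bit vectors a full index on status would need.
n_bitmaps_status = numel(categories(status));
```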
2. Suppose a hospital tested the age and body fat data for 18 randomly selected adults with
the following result:
age   23   23   27   27   39   41   47   49   50   52   54   54   56   57   58   58   60   61
%fat  9.5  26.5 7.8  17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
I used MATLAB to solve this question. The MATLAB code is given at the end of this assignment.
(a) Calculate the mean, median, and standard deviation of age and %fat.
(b) Draw the boxplots for age and %fat.
(c) Draw a scatter plot based on these two variables.
Z-score normalized age values (output of normalize(age) in the MATLAB code):
-1.77359187028033   -1.77359187028033   -1.47098851800501
-1.47098851800501   -0.563178461179061  -0.411876785041403
 0.0420282433715718  0.193329919509230   0.268980757578059
 0.420282433715717   0.571584109853375   0.571584109853375
 0.722885785991034   0.798536624059863   0.874187462128692
 0.874187462128692   1.02548913826635    1.10113997633518
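The values above are z-scores: each age minus the mean age, divided by the standard deviation. As a small check, the sketch below compares that formula against MATLAB's normalize, assuming its default 'zscore' method and the default sample (n-1) standard deviation of std. The variable names z_manual, z_builtin, and max_diff are illustrative.

```matlab
age = [23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61];

% Explicit z-score: subtract the mean, divide by the sample standard deviation.
z_manual = (age - mean(age)) / std(age);

% Built-in normalization (z-score is the default method).
z_builtin = normalize(age);

% Largest absolute difference between the two results.
max_diff = max(abs(z_manual - z_builtin));
```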
Age values
∑ = 836
Mean = 46.444
∑(age – Mage)^2 = SSage = 2970.444
%Fat values
∑ = 518.1
Mean = 28.783
∑(fat – Mfat)^2 = SSfat = 1455.945
Correlation calculation
correlation = ∑((age – Mage)(fat – Mfat)) / √((SSage)(SSfat))
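The correlation can be evaluated numerically by pairing each deviation with its own variable's mean (age with the age mean, fat with the fat mean). The sketch below does this in MATLAB and cross-checks the result against the built-in corrcoef. The variable names d_age, d_fat, SS_age, SS_fat, r, and R are illustrative.

```matlab
age = [23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61];
fat = [9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 ...
       28.8 33.4 30.2 34.1 32.9 41.2 35.7];

% Deviations from the respective means.
d_age = age - mean(age);
d_fat = fat - mean(fat);

% Sums of squares, matching SSage and SSfat above.
SS_age = sum(d_age.^2);
SS_fat = sum(d_fat.^2);

% Pearson correlation coefficient.
r = sum(d_age .* d_fat) / sqrt(SS_age * SS_fat);

% Cross-check: R(1,2) is the same coefficient from the built-in function.
R = corrcoef(age, fat);
```

With the data above, r comes out at roughly 0.82, a fairly strong positive correlation between age and %fat.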
(f) Smooth the fat data by bin means, using a bin depth of 6.
Bin1: [19.1167 19.1167 19.1167 19.1167 19.1167 19.1167]
Bin2: [30.3167 30.3167 30.3167 30.3167 30.3167 30.3167]
Bin3: [36.9167 36.9167 36.9167 36.9167 36.9167 36.9167]
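The bin means above can be reproduced by sorting the fat values, cutting them into consecutive groups of 6, and replacing every value by the mean of its group. The sketch below is one way to do that in MATLAB; the names depth, bins, bin_means, and smoothed are illustrative.

```matlab
fat = [9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 ...
       28.8 33.4 30.2 34.1 32.9 41.2 35.7];
depth = 6;

% Sort, then reshape so that each column holds one equal-depth bin.
bins = reshape(sort(fat), depth, []);

% One mean per bin (mean taken down each column).
bin_means = mean(bins, 1);

% Replace every value in a bin by that bin's mean.
smoothed = repmat(bin_means, depth, 1);
```

Each row of smoothed.' then corresponds to one of Bin1 to Bin3.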
(g) Smooth the fat data by bin boundaries, using a bin depth of 6.
Bin1: [7.8000 7.8000 27.2000 27.2000 27.2000 27.2000]
Bin2: [27.4000 27.4000 32.9000 32.9000 32.9000 32.9000]
Bin3: [33.4000 33.4000 33.4000 33.4000 42.5000 42.5000]
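Smoothing by bin boundaries replaces each value by whichever of its bin's minimum or maximum is closer. The MATLAB sketch below applies that rule to the sorted fat data with a bin depth of 6. Sending ties to the lower boundary is an assumption of this sketch (no ties occur in this data), and the variable names are illustrative.

```matlab
fat = [9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 ...
       28.8 33.4 30.2 34.1 32.9 41.2 35.7];
depth = 6;

% Sort, then reshape so that each column holds one equal-depth bin.
bins = reshape(sort(fat), depth, []);

% Lower and upper boundary of every bin, repeated for each bin member.
lo_mat = repmat(min(bins, [], 1), depth, 1);
hi_mat = repmat(max(bins, [], 1), depth, 1);

% A value moves to the upper boundary only if it is strictly closer to it.
use_hi = (hi_mat - bins) < (bins - lo_mat);

% Start from the lower boundary, then overwrite where the upper one is closer.
smoothed = lo_mat;
smoothed(use_hi) = hi_mat(use_hi);
```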
3. Assume that a great number of emails are stored in a database. Now you would like
to build a data warehouse for these emails to facilitate data analysis or data mining
later on. Please give a design for the email data warehouse by using a schema
diagram. In addition, please provide your design for the fact table(s) and dimension
tables.
Our design uses a three-tier architecture for the data warehouse and a star schema
relating the fact table to the dimension tables. The conceptual design, the schema,
and the tables are given below.
Star Schema
Conversation Table
Email Table
Email_ID
Email_Created
Email_edited
Email_deleted
MATLAB code for Question 2, parts (a), (b), (c), (d), and (f)
% Age and %fat data for the 18 adults
age = [23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61];
fat = [9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 ...
       28.8 33.4 30.2 34.1 32.9 41.2 35.7];

% z-score normalization of age
n_age = normalize(age);

% Mean, median, and standard deviation (dimension 2 runs along the row)
mean_age = mean(age, 2);
median_age = median(age, 2);
std_age = std(age);
mean_fat = mean(fat, 2);
median_fat = median(fat, 2);
std_fat = std(fat);

% Boxplot of age
figure(1)
boxchart(age)
ylabel('Age (years)')
title('Age');

% Boxplot of %fat
figure(2)
boxchart(fat)
ylabel('Body fat (%)')
title('Fats');

% Scatter plot of %fat against age
figure(3)
scatter(age, fat)
xlabel('age')
ylabel('fats')
title('Scatter Plot');