Assignment 1 Solved
Introduction to Data Analytics
Prof. Nandan Sudarsanam & Prof. B. Ravindran
(d) (a) and (b)
(e) (b) and (c)
Sol. (a)
4. To test the linear relationship between y (dependent) and x (independent) continuous variables,
the best plot is:
(a) bar chart
(b) scatter plot
(c) histogram
(d) pie chart
(e) none of the above
Sol. (b)
A scatter plot is a good way to check for a linear relationship between two continuous
variables. It displays the relationship between two quantitative variables, so we can see
how one variable changes with respect to the other.
5. The algebraic sum of deviations from the mean is
(a) mean
(b) zero
(c) maximum
(d) minimum
(e) undefined
Sol. (b)
Deviations of values above the mean are positive and deviations of values below the mean are
negative. Their totals are equal in magnitude, so they cancel each other out and the sum is zero.
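The cancellation can be checked numerically. The following is a minimal sketch; the sample values and the helper name `sum_of_deviations` are assumptions chosen only for illustration and do not come from the assignment.

```python
from statistics import mean


def sum_of_deviations(values):
    """Return the algebraic sum of deviations of the values from their mean."""
    m = mean(values)
    return sum(x - m for x in values)


# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 10]

# Deviations from the mean (5) are -3, -1, -1, 0, 5, which cancel out.
total = sum_of_deviations(sample)
print(total)
```

With non-integer data the result may differ from zero by a tiny floating-point rounding error, so a comparison against a small tolerance is safer than testing for exact equality.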
6. In an agriculture research center, the scientists collected the past 20 years data of rainfall along
with the crop yield. If they want to perform regression analysis on this data, which variable
should they consider to be the independent variable and which one should they consider being
the dependent variable?
(a) Independent variable: yield, Dependent variable: rainfall
(b) Independent variable: rainfall, Dependent variable: yield
Sol. (b) We expect the amount of rainfall to have an impact on the crop yield and not the
other way around.
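As a rough illustration of this choice of variables, the sketch below fits a least-squares line with rainfall as the independent variable (x) and yield as the dependent variable (y). The data values, the units, and the helper name `fit_line` are hypothetical; the assignment does not provide the actual 20 years of records.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: rainfall in mm (independent), yield in tonnes/ha (dependent).
rainfall = [500.0, 600.0, 700.0, 800.0, 900.0]
crop_yield = [2.0, 2.5, 3.0, 3.5, 4.0]

slope, intercept = fit_line(rainfall, crop_yield)
print(f"yield = {intercept:.3f} + {slope:.4f} * rainfall")
```

Swapping the two arguments would instead model rainfall as a function of yield, which does not match the expected direction of influence.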
7. In a glass production house, John recorded the temperature values in degrees Celsius. After
working through the data, he found that the mean is 28.6 °C and the variance is 4.0 (°C)². If the
data values were converted to degrees Fahrenheit (°F), what would the mean and variance be?
We use the following formula to convert a temperature value from degrees Celsius (C) to
Fahrenheit (F)
$F = \frac{9}{5} C + 32$
(a) mean = 28.6 °F and variance = 4.0 (°F)²
(b) mean = 57.2 °F and variance = 8.0 (°F)²
(c) mean = 87.22 °F and variance = 16.38 (°F)²
(d) mean = 83.48 °F and variance = 12.96 (°F)²
Sol. (d)
Mean = $\frac{9}{5}(28.6) + 32 = 83.48$ °F
Variance = $\left(\frac{9}{5}\right)^2 (4.0) = 12.96$ (°F)²
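The arithmetic above follows from how a linear transformation F = aC + b affects the summary statistics: the mean becomes a·mean + b, and the variance is scaled by a² while the shift b drops out. A minimal sketch that checks this is given below; the individual readings in `readings_c` are hypothetical, and the use of the population variance is an assumption (the sample variance scales in the same way).

```python
from statistics import mean, pvariance


def c_to_f(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return 9 / 5 * c + 32


# Summary statistics given in the question.
mean_c = 28.6
var_c = 4.0

mean_f = c_to_f(mean_c)         # the mean is scaled and shifted
var_f = (9 / 5) ** 2 * var_c    # the variance is scaled by (9/5)^2 only

print(round(mean_f, 2), round(var_f, 2))

# Cross-check on hypothetical individual readings (not from the assignment).
readings_c = [25.0, 27.5, 30.0, 31.9]
readings_f = [c_to_f(c) for c in readings_c]

mean_gap = mean(readings_f) - c_to_f(mean(readings_c))
var_gap = pvariance(readings_f) - (9 / 5) ** 2 * pvariance(readings_c)
```

Both `mean_gap` and `var_gap` should be zero up to floating-point rounding, whatever readings are used.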
8. If a data set has an even number of observations, then the median of the data set:
(a) mean, because it covers information from all 75 years
(b) IQR, because it is unaffected by the outliers
(c) median, because the distribution is skewed to the right
(d) standard deviation, because it is unaffected by outliers and the distribution is skewed
Sol. (c)
IQR and standard deviation are measures of spread, not of central tendency. Since the distribution
is right skewed, the mean is pulled toward the extreme values in the long right tail. The median
is a resistant measure and should be used here.
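The effect of a right tail on the two measures can be seen on a small example. The values below are hypothetical and are not the data from the question; they only show how a single large observation moves the mean while leaving the median almost unchanged.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
values = [20, 22, 25, 27, 30, 32, 35, 400]

avg = mean(values)    # pulled upward by the extreme value 400
mid = median(values)  # stays near the bulk of the data

print(avg, mid)

# Dropping the extreme value changes the mean a lot and the median only slightly.
trimmed = values[:-1]
avg_trimmed = mean(trimmed)
mid_trimmed = median(trimmed)
```

Here the mean lies above seven of the eight observations, which is why it is a poor summary of the centre for data like this.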