


STATS FOR DATA SCIENCE ASSIGNMENT-2

NAME: Rakesh Choudhary


ROLL NO.-167
BATCH-Big Data B3

1. According to the 2003 Annual Consumer Spending Survey, the average monthly Bank of America Visa credit card charge was $1838 (U.S. Airways Attaché Magazine, December 2003). A sample of monthly credit card charges provides the following data.

236   1710   1351    825   7450
316   4135   1333   1584    387
991   3396    170   1428   1688

a. Compute the mean and median.

Ans. Mean:
Here the number of terms is n = 15.
Hence,
Mean = (236+1710+1351+825+7450+316+4135+1333+1584+387+991+3396+170+1428+1688)/15
Mean = 27000/15
Mean = 1800

Ans: Mean = 1800
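Median:
Since the number of observations (n = 15) is odd, the median is the middle (8th) value of the data sorted in ascending order:
170, 236, 316, 387, 825, 991, 1333, 1351, 1428, 1584, 1688, 1710, 3396, 4135, 7450
Median = 1351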
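As a cross-check of part a, the sketch below uses Python's standard-library `statistics` module. The variable names are illustrative and not part of the assignment.

```python
import statistics

# Monthly credit card charges from Question 1 (US dollars)
charges = [236, 1710, 1351, 825, 7450,
           316, 4135, 1333, 1584, 387,
           991, 3396, 170, 1428, 1688]

mean = statistics.mean(charges)      # sum / n = 27000 / 15
median = statistics.median(charges)  # n is odd, so this is the middle sorted value

print(mean, median)  # mean = 1800, median = 1351
```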
b. Compute the first and third quartiles.
Ans. The first quartile is the median of the data values below the median (about 25% of the data lies below it): 170, 236, 316, 387, 825, 991, 1333
Q1 = 387

The third quartile is the median of the data values above the median (about 75% of the data lies below it): 1428, 1584, 1688, 1710, 3396, 4135, 7450
Q3 = 1710
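The median-of-halves rule used above can be written as a small function. `quartiles_by_halves` is a hypothetical helper written for this example. For an odd number of values it leaves the overall median out of both halves, matching the hand calculation. Other conventions, such as interpolating between order statistics as `statistics.quantiles` does, can give slightly different quartiles on other data sets.

```python
import statistics

charges = [236, 1710, 1351, 825, 7450,
           316, 4135, 1333, 1584, 387,
           991, 3396, 170, 1428, 1688]

def quartiles_by_halves(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median (median excluded if n is odd)
    return statistics.median(lower), statistics.median(upper)

q1, q3 = quartiles_by_halves(charges)
print(q1, q3)  # 387 1710
```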

c. Compute the range and interquartile range.
Ans. Range = Highest data value - Lowest data value
Therefore, Range = 7450 - 170 = 7280
Range = 7280
Interquartile Range = Third Quartile - First Quartile
Interquartile Range = Q3 - Q1 = 1710 - 387 = 1323
Interquartile Range = 1323
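Both measures in part c are simple differences. In this minimal sketch, the quartiles are hard-coded from part b rather than recomputed.

```python
charges = [236, 1710, 1351, 825, 7450,
           316, 4135, 1333, 1584, 387,
           991, 3396, 170, 1428, 1688]
q1, q3 = 387, 1710  # quartiles from part b

data_range = max(charges) - min(charges)  # 7450 - 170
iqr = q3 - q1                             # 1710 - 387

print(data_range, iqr)  # 7280 1323
```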

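d. Compute the variance and standard deviation.

Ans. The sample variance is the sum of the squared deviations from the mean, divided by n - 1:
$$
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \\
    &= \big[(236-1800)^2 + (1710-1800)^2 + (1351-1800)^2 + (825-1800)^2 + (7450-1800)^2 \\
    &\qquad + (316-1800)^2 + (4135-1800)^2 + (1333-1800)^2 + (1584-1800)^2 + (387-1800)^2 \\
    &\qquad + (991-1800)^2 + (3396-1800)^2 + (170-1800)^2 + (1428-1800)^2 + (1688-1800)^2\big] / (15 - 1) \\
    &= 51454242 / 14
\end{aligned}
$$
Variance = 3675303
Standard Deviation = variance^(1/2)
Standard Deviation = 1917.108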
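The sample variance formula translates directly into code. This sketch computes it by hand (dividing by n - 1) and then checks the result against the standard-library `statistics.variance` and `statistics.stdev`, which also use the n - 1 divisor.

```python
import math
import statistics

charges = [236, 1710, 1351, 825, 7450,
           316, 4135, 1333, 1584, 387,
           991, 3396, 170, 1428, 1688]
n = len(charges)
mean = sum(charges) / n  # 1800.0

# Sum of squared deviations from the mean, divided by n - 1
ss = sum((x - mean) ** 2 for x in charges)
variance = ss / (n - 1)
std_dev = math.sqrt(variance)

# The standard library's sample statistics agree
assert math.isclose(variance, statistics.variance(charges))
assert math.isclose(std_dev, statistics.stdev(charges))

print(variance, round(std_dev, 3))  # 3675303.0 1917.108
```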
e. The skewness measure for these data is 2.12. Comment on the shape of
this distribution. Is it the shape you would expect? Why or why not?
Ans. The skewness measure of 2.12 is positive, which indicates that the data are positively skewed (skewed to the right). This is the shape we would expect: most monthly charges are relatively small, but a few are much larger, which creates a long right tail.
f. Do the data contain outliers?
Ans. Outliers are observations that lie more than 1.5 times the interquartile range above Q3 or below Q1.
Upper limit: Q3 + 1.5(IQR) = 1710 + 1.5(1323) = 3694.5
Lower limit: Q1 - 1.5(IQR) = 387 - 1.5(1323) = -1597.5
The values 7450 and 4135 lie above the upper limit of 3694.5, and no value lies below the lower limit. Therefore, 7450 and 4135 are the outliers.
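The 1.5 × IQR rule from part f can be applied mechanically. In this minimal sketch, the quartiles are taken from part b.

```python
charges = [236, 1710, 1351, 825, 7450,
           316, 4135, 1333, 1584, 387,
           991, 3396, 170, 1428, 1688]
q1, q3 = 387, 1710  # quartiles from part b
iqr = q3 - q1       # 1323

lower_limit = q1 - 1.5 * iqr  # -1597.5
upper_limit = q3 + 1.5 * iqr  #  3694.5

# Keep every charge that falls outside the two limits
outliers = [x for x in charges if x < lower_limit or x > upper_limit]
print(outliers)  # [7450, 4135]
```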
2. A bowler’s scores for 3 games were 182, 190, and 168. Compute the range, variance, standard deviation, and the coefficient of variation.
Ans. Range = 190 - 168 = 22
Mean = (182+190+168)/3 = 540/3 = 180
Variance = [(182-180)^2 + (190-180)^2 + (168-180)^2]/(3-1)
Variance = 248/2 = 124

Standard Deviation = 124^(1/2) = 11.136
Coefficient of Variation = (Standard Deviation/Mean)*100 = (11.136/180)*100 = 6.19%
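All four measures for the bowler's scores can be checked with the standard-library `statistics` module in a short sketch. The `variance` and `stdev` functions use the n - 1 divisor, as in the hand calculation.

```python
import statistics

scores = [182, 190, 168]

score_range = max(scores) - min(scores)  # 190 - 168 = 22
mean = statistics.mean(scores)           # 540 / 3 = 180
variance = statistics.variance(scores)   # sample variance: 248 / (3 - 1) = 124
std_dev = statistics.stdev(scores)       # sqrt(124), about 11.136
cv = std_dev / mean * 100                # coefficient of variation in percent, about 6.19

print(score_range, mean, variance, round(std_dev, 3), round(cv, 2))
```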

3. Dividend yield is the annual dividend per share a company pays divided by the current market price per share, expressed as a percentage. A sample of 10 large companies provided the following dividend yield data (The Wall Street Journal, January 16, 2004).

Company              Yield (%)   Company              Yield (%)
Altria Group            5.0      General Motors          3.7
American Express        0.8      JPMorgan Chase          3.5
Caterpillar             1.8      McDonald’s              1.6
Eastman Kodak           1.9      United Technology       1.5
ExxonMobil              2.5      Wal-Mart Stores         0.7

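a. What are the mean and median dividend yields?

Ans. Mean = (5+0.8+1.8+1.9+2.5+3.7+3.5+1.6+1.5+0.7)/10 = 23/10 = 2.3
Mean = 2.3
Sorted yields: 0.7, 0.8, 1.5, 1.6, 1.8, 1.9, 2.5, 3.5, 3.7, 5.0. Since n = 10 is even, the median is the average of the 5th and 6th values:
Median = (1.8+1.9)/2 = 1.85
Median = 1.85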
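The mean and median of the ten yields can be cross-checked in a short sketch. Because n is even, `statistics.median` averages the two middle values, exactly as in the hand calculation.

```python
import statistics

# Dividend yields (%) for the 10 companies in the table
yields = [5.0, 0.8, 1.8, 1.9, 2.5, 3.7, 3.5, 1.6, 1.5, 0.7]

mean = statistics.mean(yields)      # 23.0 / 10
median = statistics.median(yields)  # n = 10 is even: mean of the 5th and 6th sorted values
print(round(mean, 2), round(median, 2))  # 2.3 1.85
```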
b. What are the variance and standard deviation?
Ans. Variance = [(5-2.3)^2+(0.8-2.3)^2+(1.8-2.3)^2+(1.9-2.3)^2+(2.5-2.3)^2+(3.7-2.3)^2+(3.5-2.3)^2+(1.6-2.3)^2+(1.5-2.3)^2+(0.7-2.3)^2]/(10-1)
Variance = 17.08/9 = 1.8978 (approximately 1.9)
Standard Deviation = (Variance)^(1/2)
= (1.8978)^(1/2)
= 1.3776
Standard Deviation = 1.3776
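The same sample-variance computation applies here, with squared deviations summed and divided by n - 1. The sketch below shows where the value 1.3776 comes from: the unrounded variance 17.08/9, not 1.9.

```python
import math

yields = [5.0, 0.8, 1.8, 1.9, 2.5, 3.7, 3.5, 1.6, 1.5, 0.7]
n = len(yields)
mean = sum(yields) / n  # 2.3

variance = sum((x - mean) ** 2 for x in yields) / (n - 1)  # 17.08 / 9
std_dev = math.sqrt(variance)
print(round(variance, 4), round(std_dev, 4))  # 1.8978 1.3776
```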
c. Which company provides the highest dividend yield?
Ans. Altria Group provides the highest dividend yield, 5.0%.
d. What is the z-score for McDonald’s? Interpret this z-score.
Ans. The standardized score is:
z-score = (x - mean)/standard deviation
z-score = (1.6 - 2.3)/1.3776 = -0.51
McDonald’s dividend yield is 0.51 standard deviations below the mean dividend yield.
e. What is the z-score for General Motors? Interpret this z-score.

Ans. The standardized score is:
z-score = (x - mean)/standard deviation
z-score = (3.7 - 2.3)/1.3776 = 1.02
General Motors’ dividend yield is 1.02 standard deviations above the mean dividend yield.

f. Based on z-scores, do the data contain any outliers?
Ans. The standardized score is:
z-score = (x - mean)/standard deviation
Largest value (Altria Group): z = (5.0 - 2.3)/1.3776 = 1.96
Smallest value (Wal-Mart Stores): z = (0.7 - 2.3)/1.3776 = -1.16
A value is considered an outlier if its z-score is less than -2 or greater than 2. Both the maximum and the minimum have z-scores between -2 and 2, so every other value does too. Thus the data contain no outliers.
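Parts d, e and f all use z = (x - mean)/s. A minimal sketch that computes the z-score for every company and applies the |z| > 2 screening rule used in part f (the rule is the one stated above, not a library default):

```python
import statistics

# Dividend yields (%) from the table in Question 3
yields = {
    "Altria Group": 5.0, "American Express": 0.8, "Caterpillar": 1.8,
    "Eastman Kodak": 1.9, "ExxonMobil": 2.5, "General Motors": 3.7,
    "JPMorgan Chase": 3.5, "McDonald's": 1.6, "United Technology": 1.5,
    "Wal-Mart Stores": 0.7,
}
values = list(yields.values())
mean = statistics.mean(values)      # 2.3
std_dev = statistics.stdev(values)  # sample standard deviation, about 1.3776

# z = (x - mean) / s for every company
z_scores = {name: (y - mean) / std_dev for name, y in yields.items()}
for name, z in z_scores.items():
    print(f"{name:18s} z = {z:+.2f}")

# Screening rule from part f: flag |z| > 2 as an outlier
outliers = [name for name, z in z_scores.items() if abs(z) > 2]
print("outliers:", outliers)  # outliers: []
```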

4. Let A and B be two events such that P(A or B) = .60, P(A and B) = .10, P(A|B) = .25, P(B|A) = .333, and P(A) = .70. What is the probability of event A not occurring?

Ans. Probability of event A not occurring = 1 - P(A) = 1 - 0.70 = 0.30

5. The probability distribution for the random variable x follows.


x f(x)
20 .20
25 .15
30 .25
35 .40
a. Is this probability distribution valid? Explain.
b. What is the probability that x = 30?
c. What is the probability that x is less than or equal to 25?
d. What is the probability that x is greater than 30?

Ans. a. All probabilities are between 0 and 1 inclusive. Next, we check whether the probabilities sum to 1:
0.20 + 0.15 + 0.25 + 0.40 = 1
Since the probabilities sum to 1 and each lies between 0 and 1 inclusive, the probability distribution is valid.
b. The probability is given directly in the table:
P(x = 30) = 0.25
c. Add the corresponding probabilities:
P(x <= 25) = P(x = 20) + P(x = 25) = 0.20 + 0.15 = 0.35
d. Add the corresponding probabilities:
P(x > 30) = P(x = 35) = 0.40
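The validity check and the three probabilities can be expressed by storing the distribution as a mapping from x to f(x). This is a minimal sketch; the dictionary `f` is just an illustrative representation of the table. `math.isclose` is used for the sum because floating-point addition of 0.20, 0.15, 0.25 and 0.40 need not give exactly 1.0.

```python
import math

# Probability distribution of x from Question 5
f = {20: 0.20, 25: 0.15, 30: 0.25, 35: 0.40}

# a. Valid if every f(x) lies in [0, 1] and the probabilities sum to 1
is_valid = all(0 <= p <= 1 for p in f.values()) and math.isclose(sum(f.values()), 1.0)

p_eq_30 = f[30]                                    # b. P(x = 30)
p_le_25 = sum(p for x, p in f.items() if x <= 25)  # c. P(x <= 25)
p_gt_30 = sum(p for x, p in f.items() if x > 30)   # d. P(x > 30)
print(is_valid, p_eq_30, round(p_le_25, 2), p_gt_30)  # True 0.25 0.35 0.4
```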

6. Military radar and missile detection systems are designed to warn a country of an enemy attack. A reliability question is whether a detection system will be able to identify an attack and issue a warning. Assume that a particular detection system has a .90 probability of detecting a missile attack. Use the binomial probability distribution to answer the following questions.

a. What is the probability that a single detection system will detect an attack?
b. If two detection systems are installed in the same area and operate independently, what is the probability that at least one of the systems will detect the attack?
c. If three systems are installed, what is the probability that at least one of the systems will detect the attack?
d. Would you recommend that multiple detection systems be used? Explain.
Ans. d. Using multiple detection systems is recommended. With three independent systems, the probability that all of them miss the attack is only (0.10)^3 = 0.001, so an undetected attack is nearly impossible.
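Parts a, b and c follow from the binomial distribution with p = 0.90, assuming the systems operate independently as stated in part b. "At least one detects" is the complement of "none detect", so P(X >= 1) = 1 - P(X = 0) = 1 - (0.10)^n. `prob_at_least_one` is a hypothetical helper written for this sketch.

```python
from math import comb

p = 0.90  # probability that a single system detects the attack

def prob_at_least_one(n: int, p: float) -> float:
    """P(X >= 1) for X ~ Binomial(n, p), computed as 1 - P(X = 0) (hypothetical helper)."""
    p_none = comb(n, 0) * p**0 * (1 - p) ** n  # binomial pmf at x = 0
    return 1 - p_none

for n in (1, 2, 3):
    print(n, round(prob_at_least_one(n, p), 3))
# 1 0.9
# 2 0.99
# 3 0.999
```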

7. A simple random sample of 40 items resulted in a sample mean of 25. The population standard deviation is σ = 5.

a. What is the standard error of the mean?
b. At 95% confidence, what is the margin of error?
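Both parts of Question 7 follow from the standard error σ/√n and the 95% critical value z_{0.025} ≈ 1.96. This minimal sketch takes the critical value from the standard library's `NormalDist` rather than a printed table. It gives a standard error of about 0.7906 and a margin of error of about 1.55.

```python
import math
from statistics import NormalDist

n, x_bar, sigma = 40, 25, 5

std_error = sigma / math.sqrt(n)        # a. sigma / sqrt(n)
z = NormalDist().inv_cdf(0.975)         # z_{0.025}, about 1.96 for 95% confidence
margin_of_error = z * std_error         # b. z_{0.025} * sigma / sqrt(n)
print(round(std_error, 4), round(margin_of_error, 2))  # 0.7906 1.55
```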
8. A simple random sample of 60 items resulted in a sample mean of 80. The population standard deviation is σ = 15.

a. Compute the 95% confidence interval for the population mean.
b. Assume that the same sample mean was obtained from a sample of 120 items. Provide a 95% confidence interval for the population mean.
Ans a

Ans b
n = 120
For confidence level 1 - α = 0.95, determine z(α/2) = z(0.025) from the standard normal table:
z(α/2) = 1.96
The confidence interval limits are x̄ ± z(α/2)*(σ/√n):
80 - 1.96*(15/(120)^(1/2)) = 77.3162
and
80 + 1.96*(15/(120)^(1/2)) = 82.6838
c. What is the effect of a larger sample size on the interval estimate?
Ans. A larger sample size decreases the width of the interval, because the standard error σ/√n shrinks as n increases, so the point estimate is more precise.
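The interval x̄ ± z(α/2)·σ/√n from part b also applies to part a, with n = 60. `z_interval` is a hypothetical helper for this sketch, and z = 1.96 is the 95% value used above. Printing the widths illustrates part c: doubling n from 60 to 120 shrinks the width by a factor of √2.

```python
import math

def z_interval(x_bar, sigma, n, z=1.96):
    """Confidence interval for a mean with known sigma (hypothetical helper; z = 1.96 gives 95%)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for n in (60, 120):
    low, high = z_interval(80, 15, n)
    print(n, round(low, 2), round(high, 2), "width:", round(high - low, 2))
# 60 76.2 83.8 width: 7.59
# 120 77.32 82.68 width: 5.37
```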
