LP Stats
FMTH0301/Rev.5.3
Course Plan
i. Identify and use statistical methods and models appropriate for specific types of study designs, data
ii. Apply the fundamental concepts of probability and probability distributions to solve problems related
engineering field.
iii. Derive sampling distribution, obtain the estimates of the parameters and test claims about the
iv. Apply regression and correlation techniques to fit empirical models to data and assess model
adequacy.
Eg: 1.2.3: Represents program outcome ‘1’, competency ‘2’ and performance indicator ‘3’.
Course Content
Content Hrs
Unit – 1
R-tutorial: Linear Regression with ANOVA approach, Multiple Regression with ANOVA approach (4 hours)
Text Books
1. J. Susan Milton, Jesse C. Arnold, Introduction to Probability and Statistics: Principles and
Applications for Engineering and the Computing Sciences, 4th ed., Tata McGraw-Hill, 2007.
2. Kishor S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science
Applications, 1st ed., PHI, 2000.
Reference Books
1. Gupta S. C. and Kapoor V. K., Fundamentals of Mathematical Statistics, 1st ed., Sultan Chand &
Sons, New Delhi, 2000.
2. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers, 2005.
3. Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists
Evaluation Scheme
ISA Scheme
Assessment Weightage in Marks
ISA - 1 20
ISA - 2 20
Total 50
Unit-1
2. Probability 6 1.50 -- 2
Unit-2
Unit-3
6. Statistical Inference II 5 -- -- -- 1
Note
1. Each question carries 20 marks and may consist of sub-questions.
2. Mixing of sub-questions from different chapters within a unit (only for Unit I and Unit II) is allowed in
ISA-I, ISA-II and ESA.
3. Answer 5 full questions of 20 marks each (two full questions each from Unit I and Unit II, and one full
question from Unit III) out of 8 questions in ESA.
Chapter-wise Plan
Learning Outcomes:
At the end of the topic the student should be able to:
TLOs COs BL CA Code
8. Compare two or more data sets by analyzing their characteristics CO1 L2 1.1
Lesson Schedule
Class No. - Portion covered per hour
Review Questions
Sr. No. Questions TLO BL PI Code
1 List and describe different types of variables used in collecting data. TLO1 L2 1.1.3
72 84 61 76 104 76 86 92 80 88
98 76 97 82 84 67 70 81 82 89
74 73 86 81 85 78 82 80 91 83
(a) Construct a relative frequency histogram for the data, using eight classes.
(b) If you put $9000 in the ATM each day, on what percentage of the days in
a month should you expect to run out of cash? Explain your reasoning.
(c) If you are willing to run out of cash on 10% of the days, how much
cash, in hundreds of dollars, should you put in the ATM each day?
Explain your reasoning. TLO5 L3 1.1.3
14 The following scores represent the final examination grades for an
elementary statistics course:
23 60 79 32 57 74 52 70 82 36 80 77 81 95 41
65 92 85 55 76 52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43 60 78 89 76 84
48 84 90 15 79 34 67 17 82 69 74 63 80 85 61
a) Find the median, mode, and lower and upper quartiles of these data.
b) What fraction of the class received scores less than 65?
c) Make a frequency table, starting the first class interval at a lower
class boundary of 9.5. Use Sturges’ Rule.
d) Draw a frequency histogram, a relative frequency histogram and a
cumulative frequency diagram.
e) Show a box plot of these data. TLO6 L3 1.1.3
15 The nicotine contents, in milligrams, of 40 cigarettes of a certain brand were
recorded as follows:
1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97, 0.85, 1.24, 1.58, 2.03,
1.70, 2.17, 2.55, 2.11, 1.86, 1.90, 1.68, 1.51, 1.64, 0.72, 1.69, 1.85,
1.82, 1.79, 2.46, 1.88, 2.08, 1.67, 1.37, 1.93, 1.40, 1.64, 2.09, 1.75,
1.63, 2.37, 1.75, 1.69
Construct a box plot and analyze it. TLO6 L3 1.1.3
16 Two catalysts are being analyzed to determine how they affect the
average performance of a chemical process. Here are the data.
Catalyst 1: 91.5 92.18 95.39 91.79 89.07 94.72 89.21
Catalyst 2: 89.18 90.95 93.21 97.19 97.04 91.07 92.75
Analyze the data by drawing box plots. TLO8 L3 1.1.3
17 Consider the following data. They represent measured concentrations
of arsenic in drinking water in 10 communities around Phoenix and 10
rural communities in Arizona. Discuss the variances of the concentration
of arsenic in both populations by drawing box plots. TLO8 L3 1.1.3
Phoenix 3 7 25 10 15 6 1 25 15 7
Arizona 48 44 40 38 33 21 20 12 1 18
98, 84, 97, 93, 88, 57, 100, 63, 83, 97, 93, 52, 74, 83, 63, 88,
84,100, 84, 78, 83, 68, 84, 47, 86, 81, 54, 99, 91, 49, 80, 81, 89,
93, 90, 57, 94, 83 78, 29, 64, 74, , 72, 89, 67, 89, 70
Write a summary paragraph describing and comparing the distribution
of final exam scores for the two groups of students.
Learning Outcomes:
At the end of the topic the student should be able to:
1. Identify and distinguish different types of events. CO2 L1 1.1
2. Explain different approaches to probability, such as the mathematical (classical) and
axiomatic approaches. CO2 L2 1.1
Lesson Schedule
2. Addition rule (for two and three events, without proof). Multiplication rule.
5. Numerical illustrations.
Sr. No. Questions TLO BL PI Code
2 Explain the basic terminologies associated with probability theory. TLO2 L2 1.1.3
3 List different types of events associated with probability theory. TLO2 L1 1.1.3
4 Explain mutually exclusive and independent events with suitable
examples. TLO2 L2 1.1.3
5 An electronic control panel has 3 toggle switches labeled I, II and III,
each of which can be either ON (O) or OFF (F).
(i) Construct a tree to represent the possible configurations for these
three switches.
(ii) List the elements of the sample space generated by the tree.
(iii) List the sample points that constitute the events:
A: at least one switch is ON; B: switch I is ON;
C: no switch is ON; D: four switches are ON.
(iv) Are the events A and B mutually exclusive? Are events A and C
mutually exclusive? Are events A and D mutually exclusive?
(v) What is the name given to an event such as D?
(vi) If at any given time each switch is just as likely to be ON as OFF,
what is the probability that no switch is ON? TLO3 L3 1.1.3
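The enumeration asked for in the toggle-switch question can be checked with a short script. This is only a sketch: the string encoding of a sample point (one character per switch, in the order I, II, III) and all variable names are illustrative choices, not part of the course material.

```python
from itertools import product

# Each switch is ON ("O") or OFF ("F"); a sample point lists switches I, II, III.
sample_space = ["".join(p) for p in product("OF", repeat=3)]

event_a = {s for s in sample_space if "O" in s}            # at least one switch ON
event_b = {s for s in sample_space if s[0] == "O"}         # switch I is ON
event_c = {s for s in sample_space if "O" not in s}        # no switch is ON
event_d = {s for s in sample_space if s.count("O") == 4}   # four switches ON

def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive when they share no sample point."""
    return not (e1 & e2)

# With equally likely sample points, P(C) = |C| / |S|.
p_no_switch_on = len(event_c) / len(sample_space)
print(sample_space)
print(p_no_switch_on)
```

Event D comes out as the empty set, which is what part (v) is pointing at.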
Learning Outcomes:
At the end of the topic the student should be able to:
TLOs COs BL CA Code
1. List different types of random variables and classify them as discrete or continuous. CO2 L1 1.1
2. Explain and illustrate the concept of a random variable and its probability
distribution and cdf, along with graphical representation. CO2 L2 1.1
3. Find the mean and variance of discrete and continuous probability
distributions. CO2 L2 1.1
4. Apply appropriate probability distributions to calculate probabilities in CO2 L3 1.1
specific applications.
5. Calculate marginal and conditional probability distribution from joint CO2 L3 1.1
probability distributions.
Lesson Schedule
Review Questions
Sr. No. Questions TLO BL PI Code
1 List examples of random variables and classify them into discrete and
continuous random variables. TLO1 L1 1.1.3
2 Five defective μP-chips are accidentally mixed with twenty good ones. It
is not possible to look at a chip and tell whether or not it is defective. Find
the probability distribution of the number of defective μP-chips if four μP-chips
are drawn at random from this lot. Graphically represent the probability
function and cumulative distribution function. TLO2 L2 1.1.3
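For the chip question, one way to tabulate the distribution is the hypergeometric formula, assuming the four chips are drawn without replacement. The sketch below uses exact fractions; the constant and function names are illustrative.

```python
from fractions import Fraction
from math import comb

DEFECTIVE, GOOD, DRAWN = 5, 20, 4
TOTAL = DEFECTIVE + GOOD

def pmf(x):
    """P(X = x): x defective chips among DRAWN chips drawn without replacement."""
    return Fraction(comb(DEFECTIVE, x) * comb(GOOD, DRAWN - x), comb(TOTAL, DRAWN))

# Print x, P(X = x) and the running total F(x) for x = 0..4.
cumulative = Fraction(0)
for x in range(DRAWN + 1):
    cumulative += pmf(x)
    print(x, float(pmf(x)), float(cumulative))
```

The printed columns are the values needed for the two graphs the question asks for.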
3 Consider the random variable that represents the number of heads
obtained on tossing five fair coins. The probability of obtaining heads on
any one coin is 1/2. The probability function and cumulative distribution are
given by the Binomial distribution. Tabulate the probability distribution and
cumulative distribution. Graph the results. TLO2 L3 1.1.3
4 A random variable X has the following probability distribution:
x :    -2   -1   0    1    2    3
P(x):  0.1  k    0.2  2k   0.3  k
Find (i) the value of k, (ii) the mean and variance, (iii) P(-1 < x ≤ 2). (iv) Express
the probability function and c.d.f. graphically. TLO3 L3 1.1.3
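A possible check for this question is sketched below. It assumes that the upper-case K in the table is the same constant as k, and it relies only on the fact that the probabilities must sum to 1. The list names are illustrative.

```python
from fractions import Fraction

xs = [-2, -1, 0, 1, 2, 3]
constants = [Fraction(1, 10), 0, Fraction(2, 10), 0, Fraction(3, 10), 0]
k_multiples = [0, 1, 0, 2, 0, 1]   # coefficient of k in each P(x)

# Probabilities sum to 1: sum(constants) + k * sum(k_multiples) = 1.
k = (1 - sum(constants)) / sum(k_multiples)
probs = [c + m * k for c, m in zip(constants, k_multiples)]

mean = sum(x * p for x, p in zip(xs, probs))
variance = sum(x * x * p for x, p in zip(xs, probs)) - mean ** 2
p_interval = sum(p for x, p in zip(xs, probs) if -1 < x <= 2)
print(k, mean, variance, p_interval)
```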
5 A company is considering drilling four oil wells. The probability of success
for each well is 0.40, independent of the results for any other well. The cost
of each well is $200,000. Each well that is successful will be worth
$600,000.
a) What is the probability that one or more wells will be successful?
b) What is the expected number of successes?
c) What is the expected gain?
d) What will be the gain if only one well is successful?
e) Considering all possible results, what is the probability of a loss
rather than a gain?
f) What is the standard deviation of the number of successes? TLO4 L3 1.1.3
6 In a precision bombing attack there is a 50% chance that any one bomb
will strike the target. Two direct hits are required to destroy the target TLO4 L3 1.1.3
E(Y) and E(XY) (iii) Determine the coefficient of correlation between X and
Y and interpret your answer. (iv) Are the two random variables independent?
Learning Outcomes:
At the end of the topic the student should be able to:
TLOs COs BL CA Code
6. Test hypotheses for proportions or means (single and difference). CO3 L3 1.1
Lesson Schedule
Class No. – Portion covered per hour
4. Numerical exercises.
7. Test hypotheses on the mean of a normal distribution using either a Z-test or a t-test procedure
Review Questions
Sr. No. Questions TLO BL PI Code
3 Identify the situations in which you would use cluster sampling and stratified
sampling. Explain with examples. TLO3 L2 1.1.3
4 A population contains 3 units: 5, 4 and 7. Obtain the sampling distribution of the
sample mean when a sample of size 2 is drawn (i) with replacement (ii)
without replacement. TLO2 L2 1.1.3
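The two sampling distributions in this question can be enumerated directly. The sketch below treats samples as ordered draws (an assumption; unordered samples give the same distribution of the mean) and uses illustrative function and variable names.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

population = [5, 4, 7]

def sampling_distribution(samples):
    """Map each possible sample mean to its probability (equally likely samples)."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

with_replacement = sampling_distribution(product(population, repeat=2))
without_replacement = sampling_distribution(permutations(population, 2))

for name, dist in [("with", with_replacement), ("without", without_replacement)]:
    print(name, {float(m): str(p) for m, p in dist.items()})
```

In both cases the mean of the sampling distribution equals the population mean 16/3, which is a useful sanity check on the table.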
5 Explain hypothesis, null and alternative hypothesis, two types of errors, TLO4 L2 1.1.3
level of significance, confidence interval, critical region
6 A survey is proposed to be conducted to know the annual earnings of the
old Engineering graduates of Delhi University. How large should the sample
be in order to estimate the mean annual earnings to within plus or
minus Rs. 10,000 at the 95% confidence level? The standard deviation of the
annual earnings of the entire population is known to be Rs. 30,000. TLO5 L3 1.1.3
7 An astronomer wants to measure the distance from her observatory to a
distant star. However, due to atmospheric disturbances, any measurement
will not yield the exact distance 𝑑. As a result, the astronomer has decided
to make a series of measurements and then use their average value as an
estimate of the actual distance. If the astronomer believes that the values
of the successive measurements are independent random variables with a
mean of 𝑑 light years and a standard deviation of 2 light years, how many
measurements need she make to be at least 95% certain that her estimate
is accurate to within ± 0.5 light years? TLO5 L3 1.1.3
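Questions 6 and 7 can both be approached with the usual sample-size rule n ≥ (zσ/E)², which assumes the sample mean is approximately normal. The sketch below is one way to evaluate it; `sample_size` and its parameter names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Question 6: sigma = Rs. 30,000, margin = Rs. 10,000, 95% confidence.
n_earnings = sample_size(sigma=30_000, margin=10_000, confidence=0.95)
# Question 7: sigma = 2 light years, margin = 0.5 light years, 95% confidence.
n_star = sample_size(sigma=2, margin=0.5, confidence=0.95)
print(n_earnings, n_star)
```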
8 The point estimate for the proportion of 16-kbit dynamic RAMs that function
correctly for at least 1000 hours, based on a sample of size 100, is 0.91.
Construct 95% and 99% confidence intervals for the population proportion. TLO5 L3 1.1.3
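A large-sample (normal approximation) interval for a proportion can be sketched as follows. The helper `proportion_ci` is an illustrative name, and the approach assumes n is large enough for the approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

ci_95 = proportion_ci(0.91, 100, 0.95)
ci_99 = proportion_ci(0.91, 100, 0.99)
print(ci_95, ci_99)
```

The 99% interval is wider than the 95% interval, as expected when more confidence is demanded from the same sample.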
A random sample of 30 ball bearings produced by a company has a mean
diameter of 0.5060 cm with s.d. 0.004 cm. Find the maximum error of estimate
E and a 98% confidence interval for the actual mean diameter of the ball
bearings produced by this company, assuming sampling from a normal
population. TLO5 L3 1.1.3
9 A random sample of 500 apples was taken from a large consignment and
60 were found to be bad. Obtain 96% and 99% confidence limits for the
percentage of bad apples in the consignment. TLO5 L3 1.1.3
10 The mean and standard deviation of marks scored by a sample of 100
students are 67.45 and 2.92. Find 95% and 97% confidence intervals for
estimating the mean marks of the student population. 1.1.3
11 Ten specimens of copper wires drawn from a large lot have the following
breaking strength (in Kg. weight) 578, 572, 570, 568, 572, 571, 570, 572, TLO5 L3 1.1.3
596, 548. Find 99% confidence limits for the mean.
12 Independent random samples of size 375 are selected from the population
of Canadian businesses and from the population of businesses in the U.S. The point
estimate for the difference between the proportion of businesses in Canada and
the proportion of businesses in the U.S. with on-site mainframe computers is
𝑝1 − 𝑝2 = 0.589 − 0.619 = −0.03. Construct a 95% C.I. for this difference. TLO5 L3 1.1.3
13 A random sample of 500 workers engaged in R&D last year is selected. Of
these 178 earn over $72,000 per year. Of the 450 workers in R&D studied TLO5 L3 1.1.3
during the current year 200 earn in excess of $72,000 per year. (i) Let
𝑝1 and 𝑝2 denote the proportion of workers engaged in R&D who earned over
$72,000 per year last year and this year, respectively. Find point estimates
for 𝑝1 , 𝑝2 and 𝑝1 − 𝑝2 . (ii) Construct 95% C.I for 𝑝1 − 𝑝2 .
15 It is generally assumed that men are taller than women, but we would like to
test this at the 0.01 L.O.S., so we conduct a survey of 8000 individuals. A
summary of the heights (in inches) of the males and females who participated in
the survey is given below. TLO5 L3 1.1.3
                    Male   Female
Sample size         1600   6400
Mean                172    170
Standard deviation  6.3    6.4
19 A study of TV viewers was conducted to find their opinion about the mega
serial ‘Ramayana’. If 56% of a sample of 300 viewers from the south and 48% of
200 viewers from the north preferred the serial, test, using the p-value, whether
(a) there is a difference of opinion between south and north;
(b) Ramayana is preferred in the south. TLO5 L3 1.1.3
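One common way to handle this question is the pooled two-proportion z-test, sketched below. The pooled standard error and the function name `two_proportion_z` are choices made for this illustration; part (a) uses the two-sided p-value and part (b) the one-sided p-value.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion under H0."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(0.56, 300, 0.48, 200)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # (a) difference of opinion
p_one_sided = 1 - NormalDist().cdf(z)              # (b) preferred in the south
print(round(z, 3), round(p_two_sided, 4), round(p_one_sided, 4))
```

Each p-value is then compared with the chosen level of significance, which the question leaves to the student.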
20 Ten specimens of copper wires drawn from a large lot have the following
breaking strengths (in kg weight): 578, 572, 570, 568, 572, 571, 570, 572,
596, 548. Test whether the mean breaking strength of the lot may be taken
to be 578 kg weight. TLO6 L3 1.1.3
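A one-sample t-test for the copper wire data might look like the sketch below. The question does not state a level of significance, so the 5% two-sided level used here is an assumption, and the critical value 2.262 (t distribution, 9 degrees of freedom) is taken from standard t tables rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

strengths = [578, 572, 570, 568, 572, 571, 570, 572, 596, 548]
MU_0 = 578            # hypothesised mean breaking strength
T_CRITICAL = 2.262    # assumed: two-sided 5% point of t with 9 df, from tables

n = len(strengths)
# stdev() is the sample standard deviation (divisor n - 1).
t_stat = (mean(strengths) - MU_0) / (stdev(strengths) / sqrt(n))
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), reject_null)
```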
21 In 1950 in India the mean life expectancy was 50 years. If the life
expectancies from a random sample of 11 persons are 58.2, 56.6, 54.2,
50.4, 44.2, 61.9, 57.5, 53.4, 49.7, 55.4, 57, do the data confirm the expected view
at the 5% LOS? TLO6 L3 1.1.3
22 A builder claims that heat pumps are installed in 70% of all homes being
constructed today in the city of Richmond. Would you agree with this claim
if a random survey of new homes in this city shows that 8 out of 15 had heat
pumps installed? Use a 0.01 level of significance. TLO6 L3 1.1.3
same examination was found to be 300. Find out whether the proportion of
failures in the university teaching department is significantly greater than the
proportion of failures in the affiliated colleges.
24 Many consumers think that automobiles built on Mondays are more likely to
have serious defects than those built on any other day of the week. To
support this theory a random sample of 100 cars built on Monday is selected
and inspected. Of these, 8 are found to have serious defects. A random
sample of 200 cars produced on other days reveals 12 with serious defects.
Do these data support the stated connection? TLO6 L3 1.1.3
Learning Outcomes:
5. Determine the relationship between two and more than two variables CO4 L3 2.3
using regression technique.
6. Interpret the relationship between two variables using angle between CO4 L2 1.1
the two regression lines.
Lesson Schedule
Review Questions
Sr. No. Questions TLO BL PI Code
1 Discuss different types of relationship between two variables. TLO1 L2 1.1.3
2 What do you mean by correlation and regression? Are they similar? Discuss. TLO1 L2 1.1.3
3 How do you measure association between two variables? Illustrate with
suitable examples. TLO2 L2 1.1.3
4 A survey was conducted to study the relationship between sales and
advertising expenditure. Estimate (i) the sales for an advertising expenditure of
Rs. 90 lakh, (ii) the advertising expenditure for a sales target of Rs. 25 crore,
(iii) their correlation. TLO3 L3 2.3.1
Sales (Rs crore):     10 11 13 15 16 19 14
Adv. exp. (Rs lakh):  60 62 65 70 73 75 71
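The three parts of this question correspond to the two regression lines and the correlation coefficient. The sketch below computes them from sums of squares; the helper name and variable names are illustrative, and it assumes that part (i) uses the regression of sales on advertising and part (ii) the regression of advertising on sales.

```python
from math import sqrt
from statistics import mean

sales = [10, 11, 13, 15, 16, 19, 14]   # Rs crore
adv = [60, 62, 65, 70, 73, 75, 71]     # Rs lakh

def sums_of_squares(x, y):
    """Return means and corrected sums of squares and products for x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

m_adv, m_sales, s_aa, s_ss, s_as = sums_of_squares(adv, sales)

# (i) regression of sales on advertising, evaluated at Rs 90 lakh
sales_at_90 = m_sales + (s_as / s_aa) * (90 - m_adv)
# (ii) regression of advertising on sales, evaluated at Rs 25 crore
adv_at_25 = m_adv + (s_as / s_ss) * (25 - m_sales)
# (iii) correlation coefficient
r = s_as / sqrt(s_aa * s_ss)
print(round(sales_at_90, 3), round(adv_at_25, 3), round(r, 4))
```

Note that Rs 90 lakh lies outside the observed range of advertising expenditure, so estimate (i) is an extrapolation.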
6 Find the coefficient of correlation between industrial production and exports
using the following data and comment on the result. TLO3 L3 1.1.3
Production 55 56 58 59 60 60 62
Exports    35 38 38 39 44 43 45
7 Use $\tan\theta = \left(\frac{1 - r^2}{r}\right)\left(\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$ to interpret the relation between the two variables
when r = 0, r = 1 and r = -1. TLO2 L2 1.1.3
8 Of two personnel evaluation techniques available, the first requires a two-hour
test interview while the second can be completed in less than an hour.
The scores for each of the 15 individuals who took both tests are given in
the table below:
Applicant  1   2   3   4   5   6    7   8   9   10  11  12
Test1      75  89  60  71  92  105  55  87  73  77  84  91
Test2      38  56  35  45  59  70   31  52  48  41  51  58
(i) Construct a scatter plot for the data. Does the assumption of linearity
appear to be reasonable? (ii) Use the regression line to predict the score
on the second test for an applicant who scored 85 on Test1. TLO4 L3 2.3.1
9 The following data relate to radio advertising expenditure, newspaper
advertising expenditure and sales. Fit a regression 𝑦 = 𝑎 + 𝑏1 𝑥1 + 𝑏2 𝑥2 .
Calculate the coefficient of multiple determination. Does the model explain
the variation in y? Compare the actual and predicted values of y. TLO4 L3 2.3.1
Radio ad. exp. (‘000 Rs) (x1)      4  7  9  12
Newspaper ad. exp. (‘000 Rs) (x2)  1  2  5  8
Sales (Rs lakhs) (y)               7  12 17 20
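The fit asked for in question 9 can be obtained from the normal equations (XᵀX)β = Xᵀy. The sketch below solves them with a small Gauss-Jordan routine written for this illustration; the course R-tutorial would normally use a statistics package instead, and all names here are illustrative.

```python
x1 = [4, 7, 9, 12]     # radio advertising
x2 = [1, 2, 5, 8]      # newspaper advertising
y = [7, 12, 17, 20]    # sales

def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

rows = [[1.0, u, v] for u, v in zip(x1, x2)]   # design matrix with intercept
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
intercept, b1, b2 = solve(xtx, xty)

fitted = [intercept + b1 * u + b2 * v for u, v in zip(x1, x2)]
y_bar = sum(y) / len(y)
sse = sum((t - f) ** 2 for t, f in zip(y, fitted))
sst = sum((t - y_bar) ** 2 for t in y)
r_squared = 1 - sse / sst   # coefficient of multiple determination
print(intercept, b1, b2, r_squared)
print(list(zip(y, fitted)))
```

The last line prints the actual and predicted values side by side for the comparison the question asks for.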
10 In order to study the relationship of advertising and capital investment with
corporate profits, the following data, recorded in units of $100,000, were
collected for 10 medium-sized firms in the same year. The variable y
represents profit for the year, x1 represents capital investment, and x2
represents advertising expenditure.
y   15 16 2  3  12 1  16 18 13 2
x1  25 1  6  30 29 20 12 15 6  16
x2  4  5  3  1  2  0  4  5  4  2
Using the model 𝑦 = 𝑎0 + 𝑎1 𝑥1 + 𝑎2 𝑥2 , find the least squares prediction
equation for the data. Calculate the coefficient of determination. What
percentage of the overall variation is explained by the model? TLO4 L3 2.3.1
1. Use Chi-square test for independence of attributes and goodness of fit. CO5 L3 1.1
Lesson Schedule
Class No. - Portion covered per hour
3. Numerical illustrations.
5. Numerical illustrations.
Review Questions
Sr. No. Questions TLO BL PI Code
1 The number of job-related injuries in an aircraft of the
Government of India was observed. The values for the last 100 months were as follows:
3 In 250 digits from the lottery numbers, the frequencies of the digits 0, 1, 2, …, 9
were 23, 25, 20, 23, 23, 22, 29, 25, 33 and 27. Test the hypothesis that they
were randomly drawn. TLO1 L3 1.1.3
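A chi-square goodness-of-fit check for the lottery digits is sketched below. The question does not give a level of significance, so 5% is assumed here; the critical value 16.919 is the df = 9, right-tail 0.05 entry of the chi-square table in this course plan.

```python
observed = [23, 25, 20, 23, 23, 22, 29, 25, 33, 27]   # digits 0 to 9
expected = sum(observed) / len(observed)               # 25 per digit if uniform

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1          # no parameters are estimated from the data
CRITICAL_5_PERCENT = 16.919     # assumed level: right-tail area 0.05, df = 9
reject_uniformity = chi_square > CRITICAL_5_PERCENT
print(chi_square, df, reject_uniformity)
```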
4 A computer system has six I/O channels and the system personnel are TLO1 L3 1.1.3
reasonably certain that the load on the channels is balanced. If X is the
Frequency 39 23 12 1
(a) Based on these 75 observations, is a binomial distribution an
appropriate model? Perform a goodness-of-fit with α = 0.05.
Q.No Questions Marks CO BL PI Code
1a Out of 20 engineers working on a project, five are postgraduates. If three
of them are selected at random, what is the probability that (i) they are
all graduates? (ii) at least one is a postgraduate? 06 2 L3 1.1.3
1b Does this data set come from a normal distribution? Discuss.
25, 25, 27.7, 25.9, 25.9, 21.7, 22.8, 28.9, 26.4, 22.4 07 1 L3 1.1.3
1c Sample grand total scores for eight female and eight male candidates are
listed.
Female scores 1226 965 841 1053 1056 1393 1312 1222
Male scores   1059 1328 1175 1123 923 1017 1214 1042
Using an appropriate measure, determine which gender of candidates
has the more consistent level of scores. Justify your answer. 07 1 L3 1.1.3
Explain the meaning of skewness using sketches of frequency curves.
2a State the different measures of skewness that are commonly used. How 04 1 L3 1.1.3
does skewness differ from dispersion?
2b Suppose that a covid test has an accuracy of 95% and that 40% of
the people are covid positive. If a patient tests positive, what is the
probability that he actually has the disease? If a patient tests negative,
what is the probability that he does not have the disease?
Draw a tree diagram. 08 2 L3 1.1.3
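The covid-test question is an application of Bayes' theorem. The sketch below interprets "accuracy of 95%" as both sensitivity and specificity being 0.95; that interpretation is an assumption, since the question does not separate the two, and the variable names are illustrative.

```python
prevalence = 0.40
sensitivity = 0.95   # assumed: P(test positive | disease)
specificity = 0.95   # assumed: P(test negative | no disease)

# Total probability of a positive test (sum over the two branches of the tree).
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

p_negative = 1 - p_positive
p_healthy_given_negative = specificity * (1 - prevalence) / p_negative
print(p_disease_given_positive, p_healthy_given_negative)
```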
2c For the given data
22.5, 23.8, 23.2, 22.8, 10.1, 23.5, 24.0, 23.2, 24.2, 24.3, 23.3, 23.4,
23.0, 23.5, 22.8
(i) Construct a boxplot and identify outliers, if any.
(ii) Discuss symmetry numerically and graphically. 08 1 L3 1.1.3
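The quantities behind the boxplot can be computed as sketched below. Quartile conventions differ between textbooks; this sketch uses the default "exclusive" method of Python's `statistics.quantiles` and the common 1.5 × IQR fence rule, both of which are assumptions rather than the course's prescribed method.

```python
from statistics import quantiles

data = [22.5, 23.8, 23.2, 22.8, 10.1, 23.5, 24.0, 23.2,
        24.2, 24.3, 23.3, 23.4, 23.0, 23.5, 22.8]

q1, median, q3 = quantiles(data, n=4)   # default "exclusive" method
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, median, q3, outliers)
```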
3a Data set: amount (in dollars) spent on books for a semester
91 472 279 249 530 376 188 341 266 199
142 273 189 130 489 266 248 101 375 486
190 398 188 269 43 30 127 354 84
(i) Construct a frequency histogram for the data.
(ii) What percentage of books cost more than Rs.450? Explain your
reasoning.
(iii) Find the number of books whose cost is between Rs.200 and
Rs.500 with the help of ogives. 08 1 L3 1.1.3
3b Construct a decision tree by induction for the database of triangles and
squares. 12 2 L3 1.1.3
Sl. no  Color   Outline  Dot  Shape
1       Green   Dashed   No   Triangle
2       Green   Dashed   Yes  Triangle
3       Yellow  Dashed   No   Square
4       Red     Dashed   No   Square
Q.No Questions Marks CO BL PI Code
1a An urn contains 3 red and 5 white balls. Three balls are drawn at random with
replacement. Obtain a bivariate distribution of X and Y, where X denotes 06 2 L2 1.1.3
number of red balls and Y denotes number of white balls.
b A game consists of tossing darts onto a large flat mat that has been divided into
450 blocks of 6 square inches each. In one session, 370 darts were thrown. We 07 2 L3 1.1.3
want the probability that one block was hit exactly twice or exactly 4 times.
c A study of the electromechanical protection devices used in electrical power
systems showed that of 193 devices that failed when tested, 75 were due to
mechanical parts failures. (i)Find 96% confidence interval for the proportion of 07 3 L3 1.1.3
failures that are due to mechanical parts failures.(ii) How large a sample is
required to estimate proportion to within 0.03 with 96% confidence?
2.a The average test mark in a particular class is 79. The standard deviation is 5. If
the marks are normally distributed, how many students in a class of 200 did not
receive marks between 75 and 82? 06 2 L3 1.1.3
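For the normal-distribution question, the expected count outside the interval can be sketched with the standard library's `NormalDist`. Treating marks as a continuous variable (no continuity correction) is an assumption of this sketch.

```python
from statistics import NormalDist

marks = NormalDist(mu=79, sigma=5)

# P(75 < X < 82) from the cumulative distribution function.
p_between = marks.cdf(82) - marks.cdf(75)
expected_outside = 200 * (1 - p_between)
print(round(p_between, 4), round(expected_outside, 1))
```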
b Ten packets are chosen at random from a godown and their weights in kilograms
are found to be
15.75 15.75 16.0 16.25 16.5 17.25 17.25 17.5 17.5 17.75
Discuss the suggestion that the mean weight in the universe is 16.25 kg. Use a 5%
level of significance. 07 3 L3 1.1.3
Note :Answer Five Questions: Any two full questions from each Unit I & Unit II and one full question from Unit III
Q.No Unit-I Marks CO BL PI Code
1a After applying a filter to e-mails, messages were classified as spam and
non-spam. The word ‘offer’ occurs in 70% of the spam messages and in only
5% of the non-spam messages. Also, 10% of the messages are spam. Find
the following probabilities:
(i) both messages contain the word ‘offer’
(ii) neither message contains the word ‘offer’
(iii) a message is spam given that it contains the word ‘offer’
(iv) a message is not spam given that it does not contain the word ‘offer’. 06 1 L2 1.1.3
b Does this data set come from a normal distribution? Discuss.
3.89 4.75 6.33 4.75 7.21 5.78 5.80 5.20 6.64 07 1 L3 1.1.3
b Consider the ISA and ESA for an Applied Statistics class. Suppose 23% of
students obtained an A grade in the ISA. Of those students who earned an A
grade in the ISA, 37% received an A grade in the ESA, and 12% of the
students who obtained lower than an A in the ISA received an A in the ESA.
You randomly pick up a final exam and notice the student received an A.
What is the probability that this student obtained an A grade in the ISA? 08 1 L3 1.1.3
c A semi-commercial test plant produced the following daily outputs in
tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 4.0 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
(i) Construct a boxplot and identify outliers, if any.
(ii) Discuss symmetry numerically and graphically. 08 1 L3 1.1.3
3.a What would you do? You work in the admissions department of a college
and are asked to recommend the minimum SAT score that the college will
accept for a position as a full-time student. Here are the SAT scores for a
sample of 50 applicants.
1325 1072 982 996 872 849 785 706 669 1049
885 1367 935 980 1188 869 1006 1127 979 1034
1052 1165 1359 667 1264 727 808 955 544 1202
1051 1173 410 1148 1195 1141 1193 768 812 887
1211 1266 830 672 917 988 791 1035 688 700
(a) Construct a relative frequency histogram for the data using 10 classes.
(b) If you set the minimum score at 986, what percentage of the applicants will
you be accepting? Explain your reasoning.
(c) If you want to accept the top 88% of the applicants, what should the
minimum score be? Explain your reasoning. 12 2 L3 1.1.3
b Referring to the given data set, predict whether a person buys a computer
or not. The person’s age is less than 30, income is medium, he is a student
and has a fair credit rating. 08 1 L3 1.1.3
Example  Age     Income  Student  Credit rating  Buys computer
1        ≤ 30    high    no       fair           no
2        ≤ 30    high    no       excellent      no
3        31-40   high    no       fair           yes
4        > 40    medium  no       fair           yes
5        > 40    low     yes      fair           yes
6        > 40    low     yes      excellent      no
7        31-40   low     yes      excellent      yes
8        ≤ 30    medium  no       fair           no
9        ≤ 30    low     yes      fair           yes
10       > 40    medium  yes      fair           yes
11       ≤ 30    medium  yes      excellent      yes
12       31-40   medium  no       excellent      yes
13       31-40   high    yes      fair           yes
B.P 147 125 160 118 149 128 150 145 115
Estimate the B.P when age is 45.
c The following data relate to radio advertising expenditure, newspaper
advertising expenditure and sales. Fit a regression 𝑦 = 𝑎 + 𝑎1 𝑥1 + 𝑎2 𝑥2 .
Radio adv. expenditure      4  7  9  12
Newspaper adv. expenditure  1  2  5  8
Sales                       7  12 17 20
Calculate the coefficient of determination. What percentage of the overall
variation is explained by the model? 08 4 L3 1.1.3
8.a Genetic theory states that children having one parent of blood type M and
the other of blood type N will always be one of the three types M, MN, N,
and that the proportions of these types will on average be 1:2:1. A
report states that out of 300 children having one M parent and one N
parent, 30% were found to be of type M, 45% of type MN and the
remainder of type N. Test the theory by the chi-square test. 05 3 L3 1.1.3
b The following data are collected on two characters.
            Cinegoers  Non-cinegoers
Literate    83         57
Illiterate  45         68
Based on this, can you conclude that there is no relation between the habit
of cinema going and literacy? 05 3 L3 1.1.3
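The chi-square test of independence for this 2 × 2 table can be sketched as follows. No continuity (Yates) correction is applied, and the 5% level is an assumption because the question does not state one; the critical value 3.841 is the df = 1, right-tail 0.05 entry of the chi-square table in this course plan.

```python
table = [[83, 57],    # literate: cinegoers, non-cinegoers
         [45, 68]]    # illiterate: cinegoers, non-cinegoers

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / N.
        exp = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)
CRITICAL_5_PERCENT = 3.841   # assumed level: right-tail area 0.05, df = 1
reject_independence = chi_square > CRITICAL_5_PERCENT
print(round(chi_square, 3), df, reject_independence)
```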
c The number of job-related injuries in an aircraft of the
Government of India was observed. The values for the last 100 months were as follows:
Injuries per month       0   1   2   3  4  5  6
Frequency of occurrence  35  40  13  6  4  1  1
Apply the chi-square test to these data to test the hypothesis that the
underlying distribution is Poisson. Use a 5% level of significance. 10 3 L3 1.1.3
Table: Significant values of Chi-square distribution (Right tail areas) for given
df 0.99 0.95 0.90 0.10 0.05 0.025 0.01
1 --- 0.004 0.016 2.706 3.841 5.024 6.635
2 0.020 0.103 0.211 4.605 5.991 7.378 9.210
3 0.115 0.352 0.584 6.251 7.815 9.348 11.345
4 0.297 0.711 1.064 7.779 9.488 11.143 13.277
5 0.554 1.145 1.610 9.236 11.070 12.833 15.086
6 0.872 1.635 2.204 10.645 12.592 14.449 16.812
7 1.239 2.167 2.833 12.017 14.067 16.013 18.475
8 1.646 2.733 3.490 13.362 15.507 17.535 20.090
9 2.088 3.325 4.168 14.684 16.919 19.023 21.666
10 2.558 3.940 4.865 15.987 18.307 20.483 23.209
11 3.053 4.575 5.578 17.275 19.675 21.920 24.725
12 3.571 5.226 6.304 18.549 21.026 23.337 26.217
13 4.107 5.892 7.042 19.812 22.362 24.736 27.688
14 4.660 6.571 7.790 21.064 23.685 26.119 29.141
15 5.229 7.261 8.547 22.307 24.996 27.488 30.578
16 5.812 7.962 9.312 23.542 26.296 28.845 32.000
17 6.408 8.672 10.085 24.769 27.587 30.191 33.409
18 7.015 9.390 10.865 25.989 28.869 31.526 34.805
19 7.633 10.117 11.651 27.204 30.144 32.852 36.191
20 8.260 10.851 12.443 28.412 31.410 34.170 37.566
21 8.897 11.591 13.240 29.615 32.671 35.479 38.932
22 9.542 12.338 14.041 30.813 33.924 36.781 40.289
23 10.196 13.091 14.848 32.007 35.172 38.076 41.638
24 10.856 13.848 15.659 33.196 36.415 39.364 42.980
25 11.524 14.611 16.473 34.382 37.652 40.646 44.314
26 12.198 15.379 17.292 35.563 38.885 41.923 45.642
27 12.879 16.151 18.114 36.741 40.113 43.195 46.963
28 13.565 16.928 18.939 37.916 41.337 44.461 48.278
29 14.256 17.708 19.768 39.087 42.557 45.722 49.588
30 14.953 18.493 20.599 40.256 43.773 46.979 50.892
40 22.164 26.509 29.051 51.805 55.758 59.342 63.691
50 29.707 34.764 37.689 63.167 67.505 71.420 76.154
60 37.485 43.188 46.459 74.397 79.082 83.298 88.379
70 45.442 51.739 55.329 85.527 90.531 95.023 100.425
80 53.540 60.391 64.278 96.578 101.879 106.629 112.329
90 61.754 69.126 73.291 107.565 113.145 118.136 124.116
100 70.065 77.929 82.358 118.498 124.342 129.561 135.807