DEPARTMENT OF MATHEMATICS

FMTH0301/Rev.5.3
Course Plan

Semester: 4(CSE) Year: 2023-24


Course Title: Applied Statistics with R Course Code: 20EMAB209
Total Contact Hours: 60 Duration of ESA Hours:3
ESA Marks: 50 ISA Marks: 50
Lesson Plan Author: D. A. Patil, Sumedha .S.S, Jyoti. Matcha Date: 05-02-2024
Checked By: Dr. G. B. Marali Date: 05-02-2024

Prerequisites: This course requires basic knowledge of statistics and probability.

Course Outcomes (COs):


At the end of the course the student should be able to:

i. Identify and use statistical methods and models appropriate for specific types of study designs, data and research objectives.

ii. Apply the fundamental concepts of probability and probability distributions to solve problems related to the engineering field.

iii. Derive sampling distributions, obtain estimates of the parameters and test claims about the population parameters using results from sampling.

iv. Apply regression and correlation techniques to build empirical models from data and assess model adequacy.

Powered by www.ioncudos.com Page 1 of 34.



Course Articulation Matrix: Mapping of Course Outcomes (COs) with Program Outcomes (POs)

Course Title: Applied Statistics with R Semester: 4
Course Code: 20EMAB209 Year: 2023-24

Course Outcomes (COs) / Program Outcomes (POs) 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1. Identify and use statistical methods and models appropriate for specific types of study designs, data and research objectives. H
2. Apply the fundamental concepts of probability and probability distributions to solve problems related to the engineering field. H
3. Derive sampling distributions, obtain estimates of the parameters and test claims about the population parameters using results from sampling. H
4. Apply regression and correlation techniques to build empirical models from data and assess model adequacy. H

Degree of compliance L: Low M: Medium H: High

Competency addressed in the Course and corresponding Performance Indicators

Competency: 1.1 - Demonstrate competence in mathematics
Performance Indicator: 1.1.3 Apply numerical analysis, linear algebra, probability & queuing theory and statistics to solve problems.

Competency: 2.3 - Demonstrate an ability to formulate and interpret a model
Performance Indicator: 2.3.1 Apply computer engineering principles to formulate mathematical models that are appropriate in terms of applicability and required accuracy.

Eg: 1.2.3: Represents program outcome ‘1’, competency ‘2’ and performance indicator ‘3’.


Course Content

Course Code: 20EMAB209 Course Title: Applied Statistics with R


L-T-P-SS: 3-1-0-0 Credits: 4 Contact Hrs: 60
ISA Marks: 50 ESA Marks: 50 Total Marks: 100
Teaching Hrs: 40 Exam Duration: 3 hrs

Content Hrs

Unit – 1

Chapter 1: Description of data 8 hours
Introduction: Data, Type of Variables, mean, weighted mean, median, mode, Quartiles, Variance, Coefficient of variation, skewness, Histogram, Box plots, Normal Quantile-Quantile plots.

Chapter 2: Probability 6 hours
Introduction: Definition, Interpretation of probability value, addition rule, multiplication rule, Bayes' rule, Applications: Data Classification Methods - Decision Tree Induction, Bayesian Classification.

R-tutorial: Introduction to Data handling, Description of data graphically, Histogram, Skewness, Boxplot, QQ-norm, Decision tree. 8 hours

Unit – 2

Chapter 3: Random variables and Probability Distribution 8 hours
Random variables, simple examples, Discrete and continuous random variables. Theoretical distributions: Binomial, Poisson, Normal. Introduction to bivariate distribution, joint probability distribution, marginal distribution, covariance.

Chapter 4: Statistical Inference I 8 hours
Introduction: Sampling, SRSWR, SRSWOR, Cluster Sampling, Stratified Sampling, Basic terminologies of testing hypothesis, Confidence interval, Sample size determination, Hypothesis test for proportions and means (single and differences) using the P-value approach.

R-tutorial: Probability distribution, Testing of Hypothesis for proportions and means (single and differences). 8 hours

Unit – 3

Chapter 5: Correlation and Regression 5 hours
Meaning of correlation and regression, coefficient of correlation, Linear regression (ANOVA approach), Multiple linear regression, Logistic Regression.

Chapter 6: Statistical Inference II 5 hours
Test for independence of attributes (m x n contingency table), Inference based on choice of suitable test procedure (Goodness of fit).

R-tutorial: Linear Regression with ANOVA approach, Multiple Regression with ANOVA approach. 4 hours


Text Books
1. J. Susan Milton, Jesse C. Arnold, Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 4th ed., Tata McGraw-Hill, 2007.

2. Kishor S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 1st ed., PHI, 2000.

Reference Books:

1. Gupta S C and Kapoor V K, Fundamentals of Mathematical Statistics, 1st ed., Sultan Chand & Sons, New Delhi, 2000.
2. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2005.
3. Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists.

Evaluation Scheme

ISA Scheme
Assessment Weightage in Marks
ISA - 1 20

ISA - 2 20

Tutorial activity using R-program 10

Total 50


Course Unitization for ISA and ESA

Columns: teaching hours, then number of questions in ISA-1, ISA-2, Tutorial and ESA.

Topics / Chapters                                   Hours  ISA-1  ISA-2  Tutorial  ESA

Unit-1
1. Description of data                              8      1.50   --     2         3
2. Probability                                      6      1.50   --     2

Unit-2
3. Random variables and Probability Distribution    8      --     1.50   2         3
4. Statistical Inference I                          8      --     1.50   2

Unit-3
5. Correlation and Regression                       5      --     --     2         1
6. Statistical Inference II                         5      --     --     --        1

Note
1. Each question carries 20 marks and may consist of sub-questions.
2. Mixing of sub-questions from different chapters within a unit (only for Unit I and Unit II) is allowed in ISA-I, ISA-II and ESA.
3. Answer 5 full questions of 20 marks each (two full questions each from Unit I and Unit II, and one full question from Unit III) out of 8 questions in ESA.


Course Assessment Plan

Course Title: Applied Statistics with R Code: 20EMAB209

Course outcomes (COs) Weightage in assessment Assessment Methods: ISA-1 ISA-2 R-tutorial-Test ESA

1. Identify and use statistical methods and models appropriate for specific types of study designs, data and research objectives. 20% ✓ ✓ ✓
2. Apply the fundamental concepts of probability and probability distributions to solve problems related to the engineering field. 35% ✓ ✓ ✓
3. Derive sampling distributions, obtain estimates of the parameters and test claims about the population parameters using results from sampling. 35% ✓ ✓ ✓
4. Apply regression and correlation techniques to build empirical models from data and assess model adequacy. 10% ✓ ✓
Weightage 20% 20% 10% 50%

Date: 05-02-2024 Head of Department


Chapter-wise Plan

Course Code and Title: 20EMAB209 / Applied Statistics with R


Chapter Number and Title: 1. Description of data Planned Hours: 8hrs

Learning Outcomes:
At the end of the topic the student should be able to:
TLO's CO's BL CA Code

1. List Measures of Central tendency, Measures of Dispersion. CO1 L1 1.1

2. Recall Formulae for computing various statistical measures. CO1 L1 1.1

3. Tabulate a given data. CO1 L2 1.1

4. Distinguish between central tendency and dispersion CO1 L2 1.1

5. Describe dispersion and significance of measuring it CO1 L2 1.1

6. Choose suitable measure to describe the data CO1 L1 1.1

7. Compute Various types of averages, measures of dispersion CO1 L3 1.1

8. Compare two or more data sets by analyzing their characteristics CO1 L2 1.1

Lesson Schedule
Class No. - Portion covered per hour

1. Introduction to data, type of variables and scales of measurement.
2. Tabulation of data.
3. Construction of histogram, frequency curve.
4. Measures of central tendency: Arithmetic mean, median, mode.
5. Examples on central tendency.
6. Quartiles, Variance, Coefficient of variation, skewness.
7. Construction of Box plots.
8. Construction of normal Quantile-Quantile plots.
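
The measures covered in classes 4 to 6 can be checked numerically. The following is a minimal sketch, not part of the original plan: the course tutorials use R, and Python's standard `statistics` module is used here only for illustration. The data are the Catalyst 1 yields from the review questions of this chapter, the variable names are illustrative, and the quartiles follow the module's default "exclusive" method, which may differ slightly from the convention used in class.

```python
import statistics

# Catalyst 1 yields, taken from the review questions of this chapter
catalyst_1 = [91.5, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]

mean = statistics.mean(catalyst_1)
median = statistics.median(catalyst_1)

# Sample variance and standard deviation (divisor n - 1)
variance = statistics.variance(catalyst_1)
std_dev = statistics.stdev(catalyst_1)

# Coefficient of variation, expressed as a percentage of the mean
cv_percent = 100 * std_dev / mean

# Quartiles Q1, Q2, Q3 using the default "exclusive" method
q1, q2, q3 = statistics.quantiles(catalyst_1, n=4)
iqr = q3 - q1

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
print(f"coefficient of variation = {cv_percent:.2f}%")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```

The quartiles and the interquartile range are the quantities needed to draw the box plot discussed in class 7.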

Review Questions

Sr.No Questions TLO BL PI Code
1 List and describe different types of variables used in collecting data. TLO1 L2 1.1.3


2 List the different ways of representing data. TLO1 L1 1.1.3
3 Why do we need measures of central tendency and dispersion? Discuss with suitable examples. TLO2 L3 1.1.3
4 Draft a blank table to show the population of a town according to
1) Sex: men, women
2) Religion: Hindu, Muslim, Christian
3) Wages: below Rs.5000, Rs.5000-10000, Rs.10000 & above
TLO3 L3 1.1.3
5 In a sample study regarding smoking habits in a town, the following data were obtained: Men population = 58%; Smokers = 22%; Men smokers = 18%. Tabulate the above data. TLO3 L3 1.1.3
6 According to the 1972 census, the population of Punjab was 37508 thousands, of which 19942 thousands were male; during the same census the population of Baluchistan was 2405 thousands, of which 1272 thousands were male. During the 1961 census the population of Punjab was 25581 thousands, of which 13643 thousands were male. During the same census the population of Baluchistan was 1161 thousands, of which 640 were male. Arrange the above information in a tabular form. TLO3 L3 1.1.3
7 The total number of accidents in Southern Railway in 1960 was 3,500, and it decreased by 300 in 1961 and by 700 in 1962. The total number of accidents in the metre gauge section showed a progressive increase from 1960 to 1962. It was 245 in 1960, 346 in 1961 and 428 in 1962. In the metre gauge section, "not compensated" cases were 49 in 1960, 77 in 1961 and 108 in 1962. "Compensated" cases in the broad gauge section were 2,867, 2,587 and 2,152 in these three years respectively. From the above report, you are required to prepare a neat table as per the rules of tabulation. TLO3 L3 1.1.3
8 Identify the situations where we use mean, median and mode. TLO4 L2 1.1.3
9 Describe how skewness is used to characterize data. TLO4 L2 1.1.3
10 The following are the grades of 50 students in a Statistics class.
75 89 66 52 90 68 83 94 77 60
38 47 87 65 97 49 65 70 73 81
85 77 83 56 63 79 69 82 84 70
62 75 29 88 74 37 81 76 74 63
69 73 91 87 76 58 63 60 71 82
(a) For the above data construct (i) Frequency distribution (ii) Relative frequency distribution (iii) Cumulative frequency distribution (iv) Histogram (v) Frequency curve.
(b) Describe this data relative to symmetry and skewness.
TLO5 L3 1.1.3
11 The following table shows the age distribution of cases of a certain disease reported during a year in a particular state.
Age            5-14   15-24   25-34   35-44   45-54   55-64
No. of Cases   5      10      120     22      13      5
For the above data
(i) Construct histogram, ogive curves
(ii) Compute mean, median, mode, variance, Q.D
(iii) Is the distribution symmetric? Discuss.
TLO7 L3 1.1.3
12 The following sample data set lists the number of minutes 50 Internet subscribers spent on the Internet during their most recent session. Construct a frequency distribution that has seven classes.
50 40 41 17 11 7 22 44 28 21 19 23 37 51 54 42 88
41 78 56 72 56 17 7 69 30 80 56 29 33 46 31 39 20
18 29 34 59 73 77 36 39 30 62 54 67 39 31 53 44
Draw an ogive for the frequency distribution. Estimate how many subscribers spent 60 minutes or less online during their last session. Also, use the graph to estimate when the greatest increase in usage occurs.
TLO5 L3 1.1.3
13 What Would You Do? You work at a bank and are asked to recommend the amount of cash to put in an ATM each day. You don't want to put in too much (security) or too little (customer irritation). Here are the daily withdrawals (in 100s of dollars) for a period of 30 days.
72 84 61 76 104 76 86 92 80 88
98 76 97 82 84 67 70 81 82 89
74 73 86 81 85 78 82 80 91 83
(a) Construct a relative frequency histogram for the data, using eight classes.
(b) If you put $9000 in the ATM each day, what percent of the days in a month should you expect to run out of cash? Explain your reasoning.
(c) If you are willing to run out of cash for 10% of the days, how much cash, in hundreds of dollars, should you put in the ATM each day? Explain your reasoning.
TLO5 L3 1.1.3
14 The following scores represent the final examination grades for an elementary statistics course:
23 60 79 32 57 74 52 70 82 36 80 77 81 95 41
65 92 85 55 76 52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43 60 78 89 76 84
48 84 90 15 79 34 67 17 82 69 74 63 80 85 61
a) Find the median, mode, lower and upper quartile of this data.
b) What fraction of the class received scores which were less than 65?
c) Make a frequency table, starting the first class interval at a lower class boundary of 9.5. Use Sturges' Rule.
d) Draw a frequency histogram, a relative frequency histogram and a cumulative frequency diagram.
e) Show a box plot of these data.
TLO6 L3 1.1.3
15 Nicotine content in milligrams, from 40 cigarettes of a certain brand, was recorded as follows:
1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97, 0.85, 1.24, 1.58, 2.03,
1.70, 2.17, 2.55, 2.11, 1.86, 1.90, 1.68, 1.51, 1.64, 0.72, 1.69, 1.85,
1.82, 1.79, 2.46, 1.88, 2.08, 1.67, 1.37, 1.93, 1.40, 1.64, 2.09, 1.75,
1.63, 2.37, 1.75, 1.69
Construct a box plot and analyze it.
TLO6 L3 1.1.3


16 Two catalysts are being analyzed to determine how they affect the average performance of a chemical process. Here are the data.
Catalyst 1: 91.5 92.18 95.39 91.79 89.07 94.72 89.21
Catalyst 2: 89.18 90.95 93.21 97.19 97.04 91.07 92.75
Analyze the data by drawing box plots.
TLO8 L3 1.1.3
17 Consider the following data. They represent measured concentrations of arsenic in drinking water in 10 communities around Phoenix and 10 rural communities in Arizona. Discuss the variances of the concentration of arsenic in both populations by drawing box plots.
Phoenix 3 7 25 10 15 6 1 25 15 7
Arizona 48 44 40 38 33 21 20 12 1 18
TLO8 L3 1.1.3
18 Does the following sample come from a normally distributed population?
3.89 4.75 6.33 4.75 7.21 5.78 5.80 5.20 6.64
TLO8 L3 1.1.3
19 Do the following values come from a normal distribution?
7.19 6.31 5.89 4.5 3.77 4.25 5.19 5.79 6.79
TLO8 L3 1.1.3
20 An informal experiment was conducted at McNair Academic High School in Jersey City, New Jersey to investigate the use of laptop computers as a learning tool in the study of algebra. A freshman class of 20 students was given laptops to use at school and at home, while another freshman class of 27 students was not given laptops; however, many of these students were able to use computers at home. The final exam scores for the two classes are as follows:
TLO6 L3 1.1.3
Laptops No Laptops

98, 84, 97, 93, 88, 57, 100, 63, 83, 97, 93, 52, 74, 83, 63, 88,
84,100, 84, 78, 83, 68, 84, 47, 86, 81, 54, 99, 91, 49, 80, 81, 89,
93, 90, 57, 94, 83 78, 29, 64, 74, , 72, 89, 67, 89, 70
Write a summary paragraph describing and comparing the distribution
of final exam scores for the two groups of students.
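
Several of the review questions above ask for a frequency distribution, and question 14(c) names Sturges' rule. The following is a minimal Python sketch of that construction, offered as an illustration rather than as the course's prescribed method (the tutorials themselves use R). It assumes the form k = 1 + log2(n) of Sturges' rule, rounds the class count and class width up to whole numbers, and uses illustrative variable names.

```python
import math

# Final examination grades from review question 14
scores = [
    23, 60, 79, 32, 57, 74, 52, 70, 82, 36, 80, 77, 81, 95, 41,
    65, 92, 85, 55, 76, 52, 10, 64, 75, 78, 25, 80, 98, 81, 67,
    41, 71, 83, 54, 64, 72, 88, 62, 74, 43, 60, 78, 89, 76, 84,
    48, 84, 90, 15, 79, 34, 67, 17, 82, 69, 74, 63, 80, 85, 61,
]

n = len(scores)
# Sturges' rule: k = 1 + log2(n), rounded up to a whole number of classes
num_classes = math.ceil(1 + math.log2(n))
lower_boundary = 9.5  # starting class boundary stated in the question
# Class width rounded up so that the classes cover the largest score
width = math.ceil((max(scores) - lower_boundary) / num_classes)

# Count how many scores fall in each class
frequencies = [0] * num_classes
for s in scores:
    index = int((s - lower_boundary) // width)
    frequencies[index] += 1

# Print class boundaries with frequency, relative and cumulative frequency
cumulative = 0
for i, f in enumerate(frequencies):
    low = lower_boundary + i * width
    cumulative += f
    print(f"{low:5.1f} - {low + width:5.1f}  freq = {f:2d}  rel = {f / n:.3f}  cum = {cumulative}")
```

The relative and cumulative columns are the inputs for the relative frequency histogram and the cumulative frequency diagram asked for in part (d).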

Course Code and Title: 20EMAB209 / Applied statistics with R

Chapter Number and Title: 2. Probability Planned Hours: 6 hours

Learning Outcomes:
At the end of the topic the student should be able to:

TLO's CO's BL CA Code

1. Identify and distinguish different types of events. CO2 L1 1.1

2. Explain different approaches to probability, such as the mathematical (classical) and axiomatic approaches. CO2 L2 1.1

3. Solve problems in diverse situations using addition, multiplication and other important results on probability. CO2 L3 1.1

4. Apply Bayes' rule to solve problems involving prior and posterior probability. CO2 L3 1.1

5. Apply the concept of probability to classify data using Decision Tree Induction and Bayesian classification. CO2 L3 1.1

Lesson Schedule

Class No. - Portion covered per hour

1. Introduction, Basic terminology, Classical and axiomatic approach of probability

2. Addition rule (for two and three events, without proof) and multiplication rule.

3. Bayes' rule (without proof) with applications

4. Methods of classification of the data.

5. Numerical illustrations.

6. Examples on Bayesian classification
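
The total probability rule and Bayes' rule from class 3 can be illustrated with a short calculation. The sketch below is in Python for illustration only (the course tutorials use R) and takes its numbers from the factory problem in the review questions of this chapter (machines A, B and C); the dictionary names are illustrative.

```python
# Prior probabilities: share of articles produced by each machine
prior = {"A": 0.20, "B": 0.30, "C": 0.50}
# Likelihoods: probability that an article is satisfactory, given the machine
satisfactory_given = {"A": 0.95, "B": 0.85, "C": 0.90}

# Total probability rule: P(S) = sum of P(machine) * P(S | machine)
p_satisfactory = sum(prior[m] * satisfactory_given[m] for m in prior)

# Bayes' rule: P(machine | S) = P(machine) * P(S | machine) / P(S)
posterior = {m: prior[m] * satisfactory_given[m] / p_satisfactory for m in prior}

print(f"P(satisfactory) = {p_satisfactory:.3f}")
for machine, p in posterior.items():
    print(f"P({machine} | satisfactory) = {p:.4f}")
```

The posterior probabilities sum to 1, which is a quick check on the arithmetic.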


Review Questions

Sr No Questions TLO BL PI Code

1 List deterministic and stochastic experiments. TLO1 L1 1.1.3

2 Explain the basic terminologies associated with probability theory. TLO2 L2 1.1.3

3 List different types of events associated with probability theory. TLO2 L1 1.1.3

4 Explain mutually exclusive and independent events with suitable examples. TLO2 L2 1.1.3
5 An electronic control panel has 3 toggle switches labeled I, II and III, each of which can be either ON (O) or OFF (F).
(i) Construct a tree to represent the possible configurations for these three switches.
(ii) List the elements of the sample space generated by the tree.
(iii) List the sample points that constitute the events:
A: at least one switch is ON; B: switch I is ON;
C: no switch is ON; D: four switches are ON
(iv) Are the events A and B mutually exclusive? Are events A and C mutually exclusive? Are events A and D mutually exclusive?
(v) What is the name given to an event such as D?
(vi) If at any given time each switch is just as likely to be ON as OFF, what is the probability that no switch is ON?
TLO3 L3 1.1.3


6 If one card is drawn from a well-shuffled bridge deck of 52 playing cards (13 of each suit), what is the probability that the card is a queen or a heart? Notice that a card can be both a queen and a heart: the queen of hearts (queen ∩ heart) belongs to both categories. TLO3 L3 1.1.3
7 A fair six-sided die is tossed twice. What is the probability that a five will occur at least once? TLO3 L3 1.1.3
8 A box contains 20 DVDs, 4 of which are defective. If two DVDs are selected at random (without replacement) from this box, what is the probability that both are defective? TLO3 L3 1.1.3
9 A computer system uses passwords that consist of five letters followed by a single digit.
a) How many passwords are possible?
b) How many passwords consist of three A's and two B's, and end in an even digit?
c) If you forget your password but remember that it has the characteristics described in part (b), what is the probability that you will guess the password correctly on the first attempt?
TLO3 L3 1.1.3
10 The probability of conducting an examination on time is 95% if there is no delay in admissions and 60% if there is a delay. If the probability that there will be a delay in admissions is 20%, find the probability of holding the examination on time. TLO3 L3 1.1.3
11 A binary communication channel carries data as one of two types of signals denoted by 0 and 1. Owing to noise, a transmitted 0 is sometimes received as a 1 and a transmitted 1 is sometimes received as a 0. For a given channel, assume a probability of 0.94 that a transmitted 0 is correctly received as a 0 and a probability of 0.91 that a transmitted 1 is received as a 1. Further assume a probability of 0.45 of transmitting a 0. If a signal is sent, determine:
- Probability that a 1 is received.
- Probability that a 0 is received.
- Probability that a 1 was transmitted, given that a 1 was received.
- Probability that a 0 was transmitted, given that a 0 was received.
- Probability of an error.
TLO4 L3 1.1.3
12 It is known that of the articles produced by a factory, 20% come from Machine A, 30% from Machine B, and 50% from Machine C. The percentages of satisfactory articles among those produced are 95% for A, 85% for B and 90% for C. An article is chosen at random.
a) What is the probability that it is satisfactory?
b) Assuming that the article is satisfactory, what is the probability that it was produced by Machine A?
TLO4 L3 1.1.3

13 Three road construction firms, X, Y and Z, bid for a certain contract. From past experience, it is estimated that the probability that X will be awarded the contract is 0.40, while for Y and Z the probabilities are 0.35 and 0.25. If X does receive the contract, the probability that the work will be satisfactorily completed on time is 0.75. For Y and Z these probabilities are 0.80 and 0.70.
a) What is the probability that Y will be awarded the contract and complete the work satisfactorily?
b) What is the probability that the work will be completed satisfactorily?
c) It turns out that the work was done satisfactorily. What is the probability that Y was awarded the contract?
TLO4 L3 1.1.3
14 A certain transistor is manufactured at three factories located at places X, Y and Z. It is known that the X factory produces twice as many transistors as the Y factory, which produces the same number as the Z factory (during the period). Experience also shows that 0.2% of the transistors produced at X and Y are faulty and so are 0.4% of those produced at Z. A service engineer, while maintaining electronic equipment, finds a defective transistor. What is the probability that factory Y is to blame? TLO4 L3 1.1.3
15 Construct a decision tree for the following example.
#    Outlook   Company   Sailboat   Sail? (class)
1    sunny     Big       Small      Yes
2    sunny     Med       Small      Yes
3    sunny     Med       Big        Yes
4    sunny     No        Small      Yes
5    sunny     Big       Big        Yes
6    rainy     No        Small      No
7    rainy     Med       Small      Yes
8    rainy     Big       Big        Yes
9    rainy     No        Big        No
10   rainy     Med       Big        No
Predict the class label of the tuple X = (outlook = "rainy", company = "big", sailboat = "small") using naïve Bayesian classification.
TLO5, 6 L3 1.1.3

16 Construct a decision tree for the following example.
#    Color    Outline   Dot   Shape (class)
1    Green    dashed    No    triangle
2    Green    dashed    Yes   triangle
3    Yellow   dashed    No    square
4    Red      dashed    No    square
5    Red      solid     No    square
6    Red      solid     Yes   triangle
7    Green    solid     No    square
8    Green    dashed    No    triangle
9    Yellow   solid     Yes   square
10   Red      solid     No    square
Predict the class label of the tuple X = (color = "red", outline = "dashed", dot = "yes") using naïve Bayesian classification.
TLO5, 6 L3 1.1.3
17 Construct a decision tree for the following example.
#    Refund   Marital Status   Taxable income   Cheat (class)
1    Yes      Single           >80K             No
2    No       Married          >80K             No
3    No       Single           <80K             No
4    Yes      Married          >80K             No
5    No       Divorced         >80K             Yes
6    No       Married          <80K             No
7    Yes      Divorced         >80K             No
8    No       Single           >80K             Yes
9    No       Married          <80K             No
10   No       Single           >80K             Yes
TLO5, 6 L3 1.1.3
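
The naïve Bayesian prediction requested in the classification questions above can be sketched as follows. This is an illustrative Python sketch, not the course's R tutorial code: it uses the sailing data of question 15, estimates each probability as a plain relative frequency with no Laplace smoothing (an assumption, since the plan does not say which estimate is intended), and the function name `naive_bayes_scores` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Training tuples from question 15: (outlook, company, sailboat, sail?)
rows = [
    ("sunny", "big", "small", "yes"),
    ("sunny", "med", "small", "yes"),
    ("sunny", "med", "big", "yes"),
    ("sunny", "no", "small", "yes"),
    ("sunny", "big", "big", "yes"),
    ("rainy", "no", "small", "no"),
    ("rainy", "med", "small", "yes"),
    ("rainy", "big", "big", "yes"),
    ("rainy", "no", "big", "no"),
    ("rainy", "med", "big", "no"),
]


def naive_bayes_scores(rows, query):
    """Return P(class) times the product of P(attribute value | class) per class."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, len(rows))  # prior P(class)
        for position, value in enumerate(query):
            matches = sum(
                1 for row in rows if row[-1] == label and row[position] == value
            )
            score *= Fraction(matches, count)  # conditional P(value | class)
        scores[label] = score
    return scores


query = ("rainy", "big", "small")
scores = naive_bayes_scores(rows, query)
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(f"score({label}) = {score} = {float(score):.4f}")
print("predicted class:", prediction)
```

Because no "big company" tuple has class "no", that class receives a score of zero here; a smoothed estimate would avoid the zero but is outside what the question states.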

Course Code and Title: 20EMAB209 / Applied statistics with R

Chapter Number and Title: 3. Random variables and Probability Distribution Planned Hours: 8 hrs

Learning Outcomes:
At the end of the topic the student should be able to:
TLO's CO's BL CA Code
1. List different types of random variables and classify them as discrete or continuous. CO2 L1 1.1
2. Explain and illustrate the concept of a random variable and its probability distribution and cdf, along with their graphical representation. CO2 L2 1.1
3. Find the mean and variance of discrete and continuous probability distributions. CO2 L2 1.1
4. Apply appropriate probability distributions to calculate probabilities in specific applications. CO2 L3 1.1
5. Calculate marginal and conditional probability distributions from joint probability distributions. CO2 L3 1.1

Lesson Schedule


Class No. - Portion covered per hour

1. Definition of random variable (Discrete & Continuous), mean & variance.


2. Numerical illustrations.
3. Binomial distribution, Poisson distribution and their properties.
4. Numerical illustrations
5. Normal distribution, properties of standard normal curve
6. Numerical illustrations
7. Bivariate distribution, Joint Probability distribution.
8. Marginal distribution, Independence, mean, variance, covariance.

Review Questions

Sr No Questions TLO BL PI Code
1 List examples of random variables and classify them into discrete and continuous random variables. TLO1 L1 1.1.3
2 Five defective μP-chips are accidentally mixed with twenty good ones. It is not possible to look at a chip and tell whether or not it is defective. Find the probability distribution of the number of defective μP-chips, if four μP-chips are drawn at random from this lot. Graphically represent the probability function and the cumulative distribution function. TLO2 L2 1.1.3
3 Consider the random variable that represents the number of heads obtained on tossing five fair coins. The probability of obtaining heads on any one coin is 1/2. The probability function and cumulative distribution are given by the Binomial distribution. Tabulate the probability distribution and the cumulative distribution. Graph the results. TLO2 L3 1.1.3
4 A random variable X has the following probability distribution:
x:      -2    -1    0     1     2     3
P(x):   0.1   k     0.2   2k    0.3   k
Find (i) the value of k (ii) the mean and variance (iii) P(-1 < x ≤ 2) (iv) Represent the probability function and c.d.f. graphically.
TLO3 L3 1.1.3
5 A company is considering drilling four oil wells. The probability of success for each well is 0.40, independent of the results for any other well. The cost of each well is $200,000. Each well that is successful will be worth $600,000.
a) What is the probability that one or more wells will be successful?
b) What is the expected number of successes?
c) What is the expected gain?
d) What will be the gain if only one well is successful?
e) Considering all possible results, what is the probability of a loss rather than a gain?
f) What is the standard deviation of the number of successes?
TLO4 L3 1.1.3
6 In a precision bombing attack there is a 50% chance that any one bomb will strike the target. Two direct hits are required to destroy the target completely. How many bombs must be dropped to give a 99% chance or better of completely destroying the target? TLO4 L3 1.1.3

7 Consider the random variable that represents the number of boys in a family with 5 children. Construct the p.d.f. and c.d.f. with graphs. Also find, out of 500 such families, how many would be expected to have (i) one boy (ii) at most two girls. TLO4 L3 1.1.3
8 A set of 8 symmetrical coins was tossed 256 times and the frequencies of throws observed were as follows. Fit a Binomial distribution.
No. of heads          0   1   2    3    4    5    6    7    8
Frequency of throws   2   6   24   63   64   50   36   10   1
TLO4 L3 1.1.3
9 It is possible for a computer to pick up an erroneous signal that does not show up as an error on the screen. The error is called a silent paging error. A particular terminal is defective, and when using the system word processor, it introduces a silent paging error with probability 0.1. The word processor is used 20 times during a given week.
a. Find the probability that no silent paging errors occur.
b. Find the probability that at least one such error occurs.
c. Would it be unusual for more than four such errors to occur? Explain, based on the probability involved.
TLO4 L3 1.1.3
10 In an automatic telephone exchange the probability that any one call is wrongly connected is 0.001. What is the minimum number of independent calls required to ensure a probability of 0.9 that at least one call is wrongly connected? TLO4 L3 1.1.3
11 Customers arrive at a checkout counter at an average rate of 1.5 per minute. What distribution will apply if reasonable assumptions are made? List those assumptions. Find the probabilities that a) exactly two will arrive in any given minute; b) at least three will arrive during an interval of two minutes; c) at most 8 will arrive during an interval of six minutes. TLO4 L3 1.1.3
12 The spontaneous flipping of a bit stored in a computer memory is called a "soft fail." Let X denote the time in millions of hours before the first soft fail is observed. Suppose that the density for X is given by $f(x) = k e^{-x}$, $x > 0$. (a) Find the average time that one must wait to observe the first soft fail. (b) Also find the variance in waiting time. TLO3 L3 1.1.3
13 Suppose that the life in hours of a certain part of a radio tube is a continuous random variable X with p.d.f. given by:
$$f(x) = \begin{cases} \dfrac{100}{x^2}, & x \ge 100 \\ 0, & \text{otherwise} \end{cases}$$
(a) What is the probability that all of three such tubes in a given radio set will have to be replaced during the first 150 hours of operation?
(b) What is the probability that none of the three original tubes will have to be replaced during the first 150 hours of operation?
TLO3 L3 1.1.3
14 Let X denote the time in hours needed to locate and correct a problem in the software that governs the timing of traffic lights in the downtown area of a large city. Assume that X is normally distributed with mean 10 hours and variance 9.
(i) Find the probability that the next problem will require at most 15 hours to find and correct. (ii) The fastest 5% of repairs take at most how many hours to complete?
TLO4 L3 1.1.3
15 For a certain type of fluorescent light in a large building, the cost per bulb of replacing bulbs all at once is much less than if they are replaced individually as they burn out. It is known that the lifetime of these bulbs is normally distributed, and that 60% last longer than 2500 hours, while 30% last longer than 3000 hours. What are the approximate mean and standard deviation of the lifetimes of the bulbs? TLO4 L3 1.1.3
16 The average number of acres burned by forest and range fires in a large New Mexico county is 4,300 acres per year, with a standard deviation of 750 acres. The distribution of the number of acres burned is normal. What is the probability that between 2,500 and 4,200 acres will be burned in any given year? TLO4 L3 1.1.3
17 In a normal distribution, 31% of the items are under 45 and 10% are over 64. Find the mean and standard deviation of the distribution. TLO4 L3 1.1.3
18 An urn contains 3 red and 5 white balls. Three balls are drawn at random with replacement. Obtain a bivariate distribution of X and Y, where X denotes the number of red balls and Y denotes the number of white balls. TLO5 L3 1.1.3
19 Consider an experiment that consists of two throws of a fair die. Let X be the number of 4s and Y the number of 5s obtained in the 2 throws. Find the joint probability distribution of X and Y. TLO5 L3 1.1.3
20 The joint probability distribution of two discrete random variables X and Y is given by $f(x, y) = k(x + y)$, where x and y are integers such that $0 \le x \le 2$, $0 \le y \le 3$. (i) Find k (ii) Obtain the marginal distributions of X and Y (iii) Compute the coefficient of correlation. TLO5 L3 1.1.3
21 The joint probability distribution of two random variables X and Y is as follows. Find (i) P(X=1, Y=-1) (ii) P(X<2, Y>0)
         Y = -2   Y = -1   Y = 4   Y = 5
X = 1    0.1      0.2      0       0.3
X = 2    0.2      0.1      0.1     0
TLO5 L3 1.1.3
22 The joint probability distribution of two random variables X and Y is as follows. (i) Find the marginal distributions of X and Y (ii) Find E(X), E(Y) and E(XY) (iii) Determine the coefficient of correlation between X and Y and interpret your answer (iv) Are the two random variables independent?
         Y = 3   Y = 4   Y = 5
X = 2    1/6     1/6     1/6
X = 5    1/12    1/12    1/12
X = 7    1/12    1/12    1/12
TLO5 L3 1.1.3
23 Two marbles are selected at random without replacement from a box
containing 3 blue, 2 red and 3 green marbles. Obtain the bivariate
distribution of X and Y, where X denotes the number of red marbles and Y denotes
the number of green marbles. Also (i) find the marginal distributions of X and Y, (ii) find E(X),
E(Y) and E(XY), (iii) determine the coefficient of correlation between X and
Y and interpret your answer, (iv) state whether the two random variables are independent. TLO5 L3 1.1.3
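
Questions 15 and 17 above both give two tail probabilities of a normal distribution and ask for its mean and standard deviation. A minimal sketch of that calculation follows, using only Python's standard library; the function name `normal_from_two_tails` is a hypothetical helper introduced for illustration, not part of the course material. It converts each tail probability to a standard normal quantile and solves the two resulting linear equations.

```python
from statistics import NormalDist

def normal_from_two_tails(x1, upper_tail1, x2, upper_tail2):
    """Solve for (mean, sd) of a normal distribution given
    P(X > x1) = upper_tail1 and P(X > x2) = upper_tail2."""
    std = NormalDist()  # standard normal, mean 0 and sd 1
    z1 = std.inv_cdf(1 - upper_tail1)  # quantile matching x1
    z2 = std.inv_cdf(1 - upper_tail2)  # quantile matching x2
    # x1 = mean + z1 * sd and x2 = mean + z2 * sd, so subtract to get sd.
    sd = (x2 - x1) / (z2 - z1)
    mean = x1 - z1 * sd
    return mean, sd

# Question 15: 60% last longer than 2500 h, 30% last longer than 3000 h.
mean, sd = normal_from_two_tails(2500, 0.60, 3000, 0.30)
print(round(mean, 1), round(sd, 1))
```

The same helper could be applied to question 17 by passing the upper-tail probabilities 0.69 (for 45) and 0.10 (for 64).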

Course Code and Title: 20EMAB209 / Applied statistics with R


Chapter Number and Title: 4. Statistical Inference I Planned Hours: 8 hrs

Learning Outcomes:
At the end of the topic the student should be able to:
TLO's CO's BL CA Code

1. List the advantages of sampling. CO3 L1 1.1

2. Explain SRSWR and SRSWOR. CO3 L2 1.1

3. Distinguish between cluster sampling and stratified sampling. CO3 L2 1.1

4. Define the basic terminology of hypothesis testing. CO3 L1 1.1

5. Construct confidence intervals and determine sample size. CO3 L2 1.1

6. Test hypotheses for proportions or means (single and difference). CO3 L3 1.1

Lesson Schedule
Class No. – Portion covered per hour

1. Need for sampling, Sampling with and without replacement

2. Stratified sampling, Cluster sampling

3. Confidence interval, Sample size determination.

4. Numerical exercises.

5. Basic terminologies of testing hypothesis.

6. Test hypotheses on a population proportion

7. Test hypotheses on the mean of a normal distribution using either a Z-test or a t-test procedure

8. Test hypotheses on the difference of means of a normal distribution
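
The sample size determination listed for class 3 can be sketched as follows. This is an illustrative sketch only: it assumes the usual z-based rule n ≥ (z·σ/E)² with a known population standard deviation and a normal approximation for the sample mean, and the helper name `sample_size_for_mean` is hypothetical. The figures are those of the astronomer review question in this chapter (σ = 2 light years, margin 0.5 light years, 95% confidence).

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin (two-sided, sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)

n = sample_size_for_mean(sigma=2, margin=0.5, confidence=0.95)
print(n)  # 62
```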

Review Questions

Sr. No  Questions  TLO  BL  PI Code

1 What is the need for sampling? Explain. TLO1 L2 1.1.3
2 Distinguish between SRSWR and SRSWOR with suitable examples. TLO2 L2 1.1.3


3 Identify the situations in which you would use cluster sampling and stratified sampling.
Explain with examples. TLO3 L2 1.1.3
4 A population contains 3 units: 5, 4 and 7. Obtain the sampling distribution of the
sample mean when a sample of size 2 is drawn (i) with replacement (ii)
without replacement. TLO2 L2 1.1.3
5 Explain the terms: hypothesis, null and alternative hypotheses, the two types of error,
level of significance, confidence interval and critical region. TLO4 L2 1.1.3
6 A survey is proposed to be conducted to know the annual earnings of the
old Engineering graduates of Delhi University. How large a sample should
be taken in order to estimate the mean annual earnings within plus and
minus Rs. 10,000 at the 95% confidence level? The standard deviation of the
annual earnings of the entire population is known to be Rs. 30,000. TLO5 L3 1.1.3
7 An astronomer wants to measure the distance from her observatory to a
distant star. However, due to atmospheric disturbances, any measurement
will not yield the exact distance 𝑑. As a result, the astronomer has decided
to make a series of measurements and then use their average value as an
estimate of the actual distance. If the astronomer believes that the values TLO5 L3 1.1.3
of the successive measurements are independent random variables with a
mean of 𝑑 light year and a standard deviation of 2 light years, how many
measurements need she make to be at least 95% certain that her estimate
is accurate to within ± 0.5 light years?
8 The point estimate for the proportion of 16-kbit dynamic RAMs that function
correctly for at least 1000 hours, based on a sample of size 100, is 0.91.
Construct 95% and 99% confidence intervals for the population proportion. TLO5 L3 1.1.3
A random sample of 30 ball bearings produced by a company has a mean
diameter of 0.5060 cm with s.d. 0.004 cm. Find the maximum error of estimate
E and a 98% confidence interval for the actual mean diameter of the ball
bearings produced by this company, assuming sampling from a normal
population. TLO5 L3 1.1.3
9 A random sample of 500 apples was taken from a large consignment and
60 were found to be bad. Obtain 96% and 99% confidence limits for the
percentage of bad apples in the consignment. TLO5 L3 1.1.3
10 The mean and standard deviation of marks scored by a sample of 100
students are 67.45 and 2.92. Find 95% and 97% confidence intervals for
estimating the mean marks of the student population. 1.1.3
11 Ten specimens of copper wire drawn from a large lot have the following
breaking strengths (in kg weight): 578, 572, 570, 568, 572, 571, 570, 572,
596, 548. Find 99% confidence limits for the mean. TLO5 L3 1.1.3
12 Independent random samples of size 375 are selected from the population
of Canadian businesses and from the population of businesses in the U.S. The point
estimate for the difference between the proportion of businesses in Canada and
the proportion of businesses in the U.S. with on-site mainframe computers is
p₁ − p₂ = 0.589 − 0.619 = −0.03. Construct a 95% C.I. for this difference. TLO5 L3 1.1.3
13 A random sample of 500 workers engaged in R&D last year is selected. Of
these, 178 earn over $72,000 per year. Of the 450 workers in R&D studied
during the current year, 200 earn in excess of $72,000 per year. (i) Let
p₁ and p₂ denote the proportions of workers engaged in R&D who earned over
$72,000 per year last year and this year, respectively. Find point estimates
for p₁, p₂ and p₁ − p₂. (ii) Construct a 95% C.I. for p₁ − p₂. TLO5 L3 1.1.3

14 In a certain factory there are two independent processes manufacturing the
same item. The average weight in a sample of 250 items produced by
one process is found to be 120 oz with an s.d. of 12 oz, while the
corresponding figures in a sample of 400 items from the other process are
124 and 14. Obtain the standard error of the difference between the two
sample means. Find 99% confidence limits for the difference in the average
weights of the items produced by the two processes. TLO5 L3 1.1.3

15 It is generally assumed that men are taller than women, but we would like to
test this at the 0.01 level of significance, so we conduct a survey of 8000 individuals. A
summary of the heights of the males and females who participated in the
survey (in inches) is given below. TLO5 L3 1.1.3
                     Male    Female
Sample size          1600    6400
Mean                 172     170
Standard deviation   6.3     6.4

16 The mean lifetime of 100 electric bulbs produced by a manufacturing
company is estimated to be 1570 hours with a standard deviation of 120
hours. Test the hypothesis that the mean lifetime of bulbs produced by the
company is 1600 hours. TLO6 L3 1.1.3
17 An examination was given to two classes A and B consisting of 40 and 50
students respectively. In class A, the mean mark was 74 with a standard
deviation of 8, while in class B the mean mark was 78 with a standard
deviation of 7. Is there a significant difference between the performances of
the two classes at the 0.05 level of significance? What about the situation
at the 0.01 level of significance? TLO6 L3 1.1.3

18 It is generally assumed that men are taller than women, but we would like to
test this at the 0.01 level of significance, so we conduct a survey of 8000 individuals. A
summary of the heights of the males and females who participated in the
survey (in inches) is given below. TLO5 L3 1.1.3
                     Male    Female
Sample size          1600    6400
Mean                 172     170
Standard deviation   6.3     6.4
19 A study of TV viewers was conducted to find their opinion about the mega
serial 'Ramayana'. If 56% of a sample of 300 viewers from the south and 48% of
200 viewers from the north preferred the serial, test, using the p-value, whether
(a) there is a difference of opinion between south and north
(b) Ramayana is preferred in the south. TLO5 L3 1.1.3
20 Ten specimens of copper wire drawn from a large lot have the following
breaking strengths (in kg weight): 578, 572, 570, 568, 572, 571, 570, 572,
596, 548. Test whether the mean breaking strength of the lot may be taken
to be 578 kg weight. TLO6 L3 1.1.3
21 In 1950 in India the mean life expectancy was 50 years. If the life
expectancies from a random sample of 11 persons are 58.2, 56.6, 54.2,
50.4, 44.2, 61.9, 57.5, 53.4, 49.7, 55.4, 57, do the data confirm the expected view
at the 5% level of significance? TLO6 L3 1.1.3
22 A builder claims that heat pumps are installed in 70% of all homes being
constructed today in the city of Richmond. Would you agree with this claim
if a random survey of new homes in this city shows that 8 out of 15 had heat
pumps installed? Use a 0.01 level of significance. TLO6 L3 1.1.3

23 In a random sample of 400 students of the University teaching department, it
was found that 300 students failed in the examination. In another random
sample of 500 students of affiliated colleges, the number of failures in the
same examination was found to be 300. Find out whether the proportion of
failures in the university teaching department is significantly greater than the
proportion of failures in the affiliated colleges. TLO6 L3 1.1.3

24 Many consumers think that automobiles built on Mondays are more likely to
have serious defects than those built on any other day of the week. To
support this theory a random sample of 100 cars built on Monday is selected
and inspected. Of these, 8 are found to have serious defects. A random
sample of 200 cars produced on other days reveals 12 with serious defects.
Do these data support the stated connection? TLO6 L3 1.1.3
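
Several of the questions above (19, 23 and 24) compare two proportions. The sketch below illustrates one common approach, a one-sided pooled two-proportion z-test, applied to the counts of question 24. The pooled standard error and the one-sided alternative p₁ > p₂ are assumptions made for this illustration, and `two_proportion_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2 against H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area
    return z, p_value

# Question 24: 8 defective of 100 Monday cars, 12 defective of 200 other cars.
z, p_value = two_proportion_z(8, 100, 12, 200)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.3f}")
```

Under these assumptions the statistic is about 0.65, which is below the usual one-sided critical value of 1.645 at the 5% level, so the sample would not support the stated connection at that level.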

Course Code and Title: 20EMAB209 / Applied Statistics with R


Chapter Number and Title: 5. Correlation and Regression Planned Hours: 5 hrs

Learning Outcomes:

At the end of the topic the student should be able to:

TLO's CO's BL CA Code


1. Recall the formula for computing the coefficient of correlation. CO4 L1 1.1

2. Describe the linear relationship between two variables. CO4 L2 1.1

3. Explain the concept and meaning of correlation coefficients. CO4 L2 1.1

4. Distinguish between correlation and regression. CO4 L2 1.1

5. Determine the relationship between two or more variables
using regression techniques. CO4 L3 2.3

6. Interpret the relationship between two variables using the angle between
the two regression lines. CO4 L2 1.1

7. Calculate the coefficient of correlation and the regression coefficients. CO4 L3 1.1

Lesson Schedule

Class No. - Portion covered per hour

1. Correlation: meaning, methods of finding the relationship between variables, computing the
coefficient of correlation.
2. Regression: meaning and types of regression, differences between them.
3. Linear regression: properties of regression coefficients and examples.
4. Multiple linear regression and examples.
5. Logistic regression.


Review Questions
Sr. No  Questions  TLO  BL  PI Code
1 Discuss the different types of relationship between two variables. TLO1 L2 1.1.3
2 What do you mean by correlation and regression? Are they similar? Discuss. TLO1 L2 1.1.3
3 How do you measure the association between two variables? Illustrate with
suitable examples. TLO2 L2 1.1.3
4 A survey was conducted to study the relationship between sales and
advertising expenditure. Estimate (i) the sales for an advertising expenditure of
Rs. 90 lakhs, (ii) the advertising expenditure for a sales target of Rs. 25 crore and
(iii) their correlation. TLO3 L3 2.3.1
Sales (Rs. crore):     10 11 13 15 16 19 14
Adv. exp. (Rs. lakh):  60 62 65 70 73 75 71
6 Find the coefficient of correlation between industrial production and exports
using the following data and comment on the result. TLO3 L3 1.1.3
Production  55 56 58 59 60 60 62
Exports     35 38 38 39 44 43 45
7 Use $\tan\theta = \left(\frac{1-r^2}{r}\right)\left(\frac{\sigma_x \sigma_y}{\sigma_x^2+\sigma_y^2}\right)$ to interpret the relation between the two variables
when r = 0, r = 1 and r = -1. TLO2 L2 1.1.3
8 Of two personnel evaluation techniques available, the first requires a two-hour
test interview while the second can be completed in less than an hour.
The scores for each of the 15 individuals who took both tests are given in
the table below: TLO4 L3 2.3.1
Applicant  1   2   3   4   5   6    7   8   9   10  11  12
Test1      75  89  60  71  92  105  55  87  73  77  84  91
Test2      38  56  35  45  59  70   31  52  48  41  51  58
(i) Construct a scatter plot for the data. Does the assumption of linearity
appear to be reasonable? (ii) Use the regression line to predict the score
on the second test for an applicant who scored 85 on test 1.
9 The following data relate to radio advertising expenditure, newspaper
advertising expenditure and sales. Fit a regression y = a + b₁x₁ + b₂x₂.
Calculate the coefficient of multiple determination. Does the model explain
the variation in y? Compare the actual and predicted values of y. TLO4 L3 2.3.1
Radio ad. exp. ('000 Rs) (x1)      4  7  9   12
Newspaper ad. exp. ('000 Rs) (x2)  1  2  5   8
Sales (Rs. lakhs) (y)              7  12 17  20
10 To study the relationship of advertising and capital investment with
corporate profits, the following data, recorded in units of $100,000, were
collected for 10 medium-sized firms in the same year. The variable y
represents profit for the year, x1 represents capital investment, and x2
represents advertising expenditure. TLO4 L3 2.3.1
y   15 16 2  3  12 1  16 18 13 2
x1  25 1  6  30 29 20 12 15 6  16
x2  4  5  3  1  2  0  4  5  4  2
Using the model y = a₀ + a₁x₁ + a₂x₂, find the least squares prediction
equation for the data. Calculate the coefficient of determination. What
percentage of the overall variation is explained by the model?
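
As an illustration of the correlation and simple regression calculations asked for above, the sketch below works through the production and exports data of question 6 from first principles. It is a minimal sketch using the standard sums-of-squares formulas; the variable names are chosen for this example, and fitting exports on production is an assumption made here since the question asks only for the correlation.

```python
import math

production = [55, 56, 58, 59, 60, 60, 62]
exports = [35, 38, 38, 39, 44, 43, 45]

n = len(production)
mean_x = sum(production) / n
mean_y = sum(exports) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - mean_x) ** 2 for x in production)
syy = sum((y - mean_y) ** 2 for y in exports)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(production, exports))

r = sxy / math.sqrt(sxx * syy)       # Pearson coefficient of correlation
slope = sxy / sxx                    # regression coefficient of y on x
intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)

print(f"r = {r:.4f}, exports = {intercept:.2f} + {slope:.3f} * production")
```

The coefficient comes out close to 0.93, which indicates a strong positive linear association between the two series.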


Course Code and Title: 20EMAB209 / Applied statistics with R


Chapter Number and Title: 6. Statistical Inference II Planned Hours: 5 hrs
Learning Outcomes:
At the end of the topic the student should be able to:
TLO's CO's BL CA Code

1. Use Chi-square test for independence of attributes and goodness of fit. CO5 L3 1.1

Lesson Schedule
Class No. - Portion covered per hour

1. Introduction to categorical data, contingency table.

2. Chi-square test for independence of attributes.

3. Numerical illustrations.

4. Chi-square test for goodness of fit.

5. Numerical illustrations.

Review Questions

Sr. No  Questions  TLO  BL  PI Code
1 The number of job-related injuries at an aircraft unit of the
Government of India was observed. The values for the last 100 months were as follows: TLO1 L3 1.1.3
Injuries per month       0   1   2   3  4  5  6
Frequency of occurrence  35  40  13  6  4  1  1
Apply the Chi-square test to these data to test the hypothesis that the
underlying distribution is Poisson. Use a 5% level of significance.

2 Two researchers adopted different sampling techniques while investigating
the same group of students to find the number of students falling in different
intelligence levels. The results are as follows: TLO1 L3 1.1.3
             No. of students in each level
Researcher   Below average   Average   Above average   Genius
X            86              60        44               10
Y            40              33        25               2
Would you say that the sampling techniques adopted by the two researchers
are significantly different?

3 In 250 digits from the lottery numbers, the frequencies of the digits 0, 1, 2, ..., 9
were 23, 25, 20, 23, 23, 22, 29, 25, 33 and 27. Test the hypothesis that they
were randomly drawn. TLO1 L3 1.1.3

4 A computer system has six I/O channels and the system personnel are
reasonably certain that the load on the channels is balanced. If X is the
random variable denoting the index of the channel to which a given I/O
operation is directed, then its p.m.f. is assumed to be $P_X(i) = p_i = \frac{1}{6}$, $i = 0, 1, \ldots, 5$.
Out of 150 I/O operations observed, the numbers of operations
directed to the various channels were: 𝑛0 = 22, 𝑛1 = 23, 𝑛2 = 29, 𝑛3 = 31, 𝑛4 =
26, 𝑛5 = 19. Test the hypothesis that the load on the channels is balanced. TLO1 L3 1.1.3

5 A random sample of n = 60 printed circuit boards has been collected, and
the following numbers of defects observed. Apply the Chi-square test to these
data to test the hypothesis that the number of defects in printed circuit
boards follows a Poisson distribution. TLO1 L3 1.1.3
No. of defects   Observed frequency
0                32
1                15
2                9
3                4
6 Define X as the number of underfilled bottles from a filling operation in
a carton of 24 bottles. Of 75 cartons inspected, the following observations
on X are recorded: TLO1 L3 1.1.3
Values     0   1   2   3
Frequency  39  23  12  1
(a) Based on these 75 observations, is a binomial distribution an
appropriate model? Perform a goodness-of-fit test with α = 0.05.

7 A university has chosen to conduct classes during COVID-19. Management
wants to know whether the preferred mode of class delivery is
independent of the student's locality. The opinions of a random sample of 500
students are shown. Draw a conclusion based on this scenario. TLO1 L3 1.1.3
                          Locality of students
Mode of class delivery    Local   Hostel   PG
On-line classes           160     140      40
Off-line classes          40      60       60
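
The goodness-of-fit questions above all reduce to comparing observed and expected frequencies. A minimal sketch for question 4 (the six I/O channels) follows; it assumes a uniform expected load of 150/6 = 25 operations per channel, as the question's p.m.f. states, and takes the 5% critical value for 5 degrees of freedom (11.070) from the Chi-square table reproduced at the end of this plan. The helper name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [22, 23, 29, 31, 26, 19]       # operations per channel
total = sum(observed)                     # 150 operations in all
expected = [total / len(observed)] * len(observed)  # 25 per channel if balanced

chi2 = chi_square_statistic(observed, expected)
critical_5pct_df5 = 11.070  # from the Chi-square table, df = 6 - 1 = 5

print(f"chi-square = {chi2:.2f}, reject H0: {chi2 > critical_5pct_df5}")
```

Here the statistic is 4.08, below the tabulated value, so the data would not contradict the balanced-load hypothesis at the 5% level.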


Model Question Paper for In Semester Assessment (ISA-1)

Course Code: 20EMAB209 Course Title: Applied Statistics With R


Duration: 75 min
Max. Marks: 40

Note: Solve any TWO full questions.

Q.No  Questions  Marks  CO  BL  PI Code
1a Out of 20 engineers working on a project, five are postgraduates. If three
of them are selected at random, what is the probability that (i) they are
all graduates? (ii) at least one is a postgraduate? 06 2 L3 1.1.3
1b Does this data set come from a normal distribution? Discuss. 07 1 L3 1.1.3
25, 25, 27.7, 25.9, 25.9, 21.7, 22.8, 28.9, 26.4, 22.4
1c Sample grand total scores for eight female and eight male candidates are
listed.
Female scores  1226 965  841  1053 1056 1393 1312 1222
Male scores    1059 1328 1175 1123 923  1017 1214 1042
Using an appropriate measure, determine which gender of candidates
has the most consistent level of scores. Justify your answer. 07 1 L3 1.1.3
2a Explain the meaning of skewness using sketches of frequency curves.
State the different measures of skewness that are commonly used. How
does skewness differ from dispersion? 04 1 L3 1.1.3
2b Suppose that COVID test results have an accuracy of 95% and that 40% of
the people are COVID positive. If a patient tests positive, what is the
probability that he actually has the disease? If a patient tests negative,
what is the probability that he does not have the disease?
Draw a tree diagram. 08 2 L3 1.1.3
2c For the given data 08 1 L3 1.1.3
22.5, 23.8, 23.2, 22.8, 10.1, 23.5, 24.0, 23.2, 24.2, 24.3, 23.3, 23.4,
23.0, 23.5, 22.8
(i) Construct a boxplot and identify outliers, if any.
(ii) Discuss symmetry numerically and graphically.
3a Data set: Amount (in dollars) spent on books for a semester 08 1 L3 1.1.3
91 472 279 249 530 376 188 341 266 199
142 273 189 130 489 266 248 101 375 486
190 398 188 269 43 30 127 354 84
(i) Construct a frequency histogram for the data.
(ii) What percentage of books cost more than Rs.450? Explain your
reasoning.
(iii) State the number of books whose cost is between Rs.200 and
Rs.500 with the help of ogives.


3b Construct a decision tree by induction for the database of triangles and
squares. The attributes are Color, Outline and Dot; the class is Shape. 12 2 L3 1.1.3
Sl.no   Color    Outline   Dot   Shape
1       Green    Dashed    No    Triangle
2       Green    Dashed    Yes   Triangle
3       Yellow   Dashed    No    Square
4       Red      Dashed    No    Square
5       Red      Solid     No    Square
6       Red      Solid     Yes   Triangle
7       Green    Solid     No    Square
8       Green    Dashed    No    Triangle
9       Yellow   Solid     Yes   Square
10      Red      Solid     No    Square
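
Question 2b of this paper is a direct application of Bayes' theorem. The sketch below illustrates the two posterior probabilities under one explicit assumption that the question leaves open: "accuracy of 95%" is read as both the sensitivity and the specificity of the test being 0.95. The function name `posteriors` is hypothetical.

```python
def posteriors(prevalence, sensitivity, specificity):
    """Return P(disease | positive) and P(no disease | negative) via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    p_disease_given_pos = true_pos / (true_pos + false_pos)
    p_healthy_given_neg = true_neg / (true_neg + false_neg)
    return p_disease_given_pos, p_healthy_given_neg

# Assumption: 95% accuracy means sensitivity = specificity = 0.95.
ppv, npv = posteriors(prevalence=0.40, sensitivity=0.95, specificity=0.95)
print(f"P(disease | +) = {ppv:.4f}, P(no disease | -) = {npv:.4f}")
```

The four products computed inside the function are exactly the four leaf probabilities of the tree diagram the question asks for.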

Model Question Paper for In Semester Assessment (ISA-2)


Course Code: 20EMAB209 Course Title: Applied Statistics With R
Duration: 75 min
Max. Marks: 40

Note: Solve Any TWO full Questions

Q.No  Questions  Marks  CO  BL  PI Code
1a An urn contains 3 red and 5 white balls. Three balls are drawn at random with
replacement. Obtain a bivariate distribution of X and Y, where X denotes 06 2 L2 1.1.3
number of red balls and Y denotes number of white balls.
b A game consists of tossing darts onto a large flat mat that has been divided into
450 blocks of 6 square inches each. In one session, 370 darts were thrown. We 07 2 L3 1.1.3
want the probability that one block was hit exactly twice or exactly 4 times.
c A study of the electromechanical protection devices used in electrical power
systems showed that of 193 devices that failed when tested, 75 were due to
mechanical parts failures. (i)Find 96% confidence interval for the proportion of 07 3 L3 1.1.3
failures that are due to mechanical parts failures.(ii) How large a sample is
required to estimate proportion to within 0.03 with 96% confidence?
2.a The average test marks in a particular class is 79. The standard deviation is 5. If
the marks are normally distributed, how many students in class of 200 did not 06 2 L3 1.1.3
receive marks between 75 and 82.
b Ten packets are chosen at random from a godown and their weights in kilograms
are found to be
15.75 15.75 16.0 16.25 16.5 17.25 17.25 17.5 17.5 17.75
Discuss the suggestion that the mean weight in the universe is 16.25 kg. Use a 5%
level of significance. 07 3 L3 1.1.3


c A population contains 3 units: 7, 11 and 15. Obtain the sampling distribution of the
sample mean and S.D. when a sample of size 2 is drawn (i) with replacement
(ii) without replacement. 07 3 L3 1.1.3
3.a The mean breaking strength of the cables supplied by a manufacturer is 1800
with a standard deviation of 100. By introducing a new technique in the
manufacturing process, it is claimed that the breaking strength of the cables
has increased. In order to test this claim, a sample of 50 cables is tested. It
was found that the mean breaking strength is 1850. Can we support the claim at
the 1% level of significance? 06 3 L2 1.1.3
b The probability density function is given by
$f(x) = \begin{cases} 0 & \text{for } x < 1 \\ \dfrac{b}{x^2} & \text{for } 1 < x < 5 \\ 0 & \text{for } x > 5 \end{cases}$
a) What is the value of b?
b) Obtain the probability that X is between 2 and 4.
c) What is the probability that X is exactly 2? 07 2 L3 1.1.3
c A random sample of 500 workers engaged in research and development last
year is selected. Of these, 178 earn over $72,000 per year. Of the 450 workers
in R & D studied during the current year, 220 earn in excess of $72,000 per
year. Can we support the claim at the 4% level of significance? 07 3 L3 1.1.3
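
Question 3b of this paper, which involves a density proportional to 1/x² on the interval from 1 to 5, can be checked with exact arithmetic. The sketch below is illustrative only: it uses the antiderivative −1/x of 1/x², and the helper name `integral_inv_sq` is hypothetical.

```python
from fractions import Fraction

def integral_inv_sq(lower, upper):
    """Exact integral of 1/x^2 from lower to upper, using the antiderivative -1/x."""
    return Fraction(1, lower) - Fraction(1, upper)

# The density must integrate to 1 over (1, 5): b * integral = 1.
b = 1 / integral_inv_sq(1, 5)

# P(2 < X < 4) is b times the integral of 1/x^2 over (2, 4).
prob_2_to_4 = b * integral_inv_sq(2, 4)

# For a continuous random variable, any single point has probability zero.
prob_exactly_2 = Fraction(0)

print(b, prob_2_to_4, prob_exactly_2)  # 5/4 5/16 0
```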

Model Question Paper for End Semester Assessment (ESA)


Course :Applied Statistics With R Course Code : 20EMAB209
Total Duration (H:M):3hr : 00 Maximum Marks :100

Note :Answer Five Questions: Any two full questions from each Unit I & Unit II and one full question from Unit III

Q.No  Unit-I  Marks  CO  BL  PI Code
1a After applying a filter to the e-mails, messages were classified as spam and
non-spam. The word ‘offer’ occurs in 70% of the spam messages and only
5% of the non-spam messages. Also, 10% of the messages are spam. Find
the following probabilities: 06 1 L2 1.1.3
(i) both messages contain the word ‘offer’
(ii) neither message contains the word ‘offer’
(iii) a message is spam given that it contains the word ‘offer’
(iv) a message is not spam given that it does not contain the word ‘offer’.
b Does this data set come from a normal distribution? Discuss. 07 1 L3 1.1.3
3.89 4.75 6.33 4.75 7.21 5.78 5.80 5.20 6.64

c Salaries (in thousands of rupees) of teachers from the government and private
sectors are listed below.
Private teacher     38.6 38.1 38.7 36.8 34.8 35.9 39.9 36.2
Government teacher  21.8 18.4 20.3 17.6 19.7 18.0 19.9 20.0
Using an appropriate measure, determine which sector of teachers has the
most consistent level of salaries. Justify your answer. 07 2 L3 2.3.1
2.a Explain the meaning of skewness using sketches of frequency curves.
State the different measures of skewness that are commonly used. How
does skewness differ from dispersion? 04 2 L2 1.1.3


b Consider the ISA and ESA for an Applied Statistics class. Suppose 23% of
students obtained an A grade in the ISA. Of those students who earned an A
grade in the ISA, 37% received an A grade in the ESA, and 12% of the
students who obtained lower than an A in the ISA received an A in the ESA.
You randomly pick up a final exam and notice the student received an A.
What is the probability that this student obtained an A grade in the ISA? 08 1 L3 1.1.3
c A semi-commercial test plant produced the following daily outputs in
tonnes/day: 08 1 L3 1.1.3
1.3 2.5 1.8 1.4 3.2 1.9 1.3 4.0 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
(i) Construct a boxplot and identify outliers, if any.
(ii) Discuss symmetry numerically and graphically.
3.a What would you do? You work in the admissions department of a college
and are asked to recommend the minimum SAT score that the college will
accept for a position as a full-time student. Here are the SAT scores for a
sample of 50 applicants. 12 2 L3 1.1.3
1325 1072 982 996 872 849 785 706 669 1049
885 1367 935 980 1188 869 1006 1127 979 1034
1052 1165 1359 667 1264 727 808 955 544 1202
1051 1173 410 1148 1195 1141 1193 768 812 887
1211 1266 830 672 917 988 791 1035 688 700

(a) Construct a relative frequency histogram for the data using 10 classes.
(b) If you set the minimum score at 986, what percentage of the applicants will
you be accepting? Explain your reasoning.
(c) If you want to accept the top 88% of the applicants, what should the
minimum score be? Explain your reasoning.
b Referring to the given data set, predict whether a person buys a computer or not.
The person's age is less than 30, income is medium, he is a student and has a
fair credit rating. 08 1 L3 1.1.3
Example   Age      Income   Student   Credit rating   Buys computer
1         ≤ 30     high     no        fair            no
2         ≤ 30     high     no        excellent       no
3         31-40    high     no        fair            yes
4         >40      medium   no        fair            yes
5         >40      low      yes       fair            yes
6         >40      low      yes       excellent       no
7         31-40    low      yes       excellent       yes
8         ≤ 30     medium   no        fair            no
9         ≤ 30     low      yes       fair            yes
10        >40      medium   yes       fair            yes
11        ≤ 30     medium   yes       excellent       yes
12        31-40    medium   no        excellent       yes
13        31-40    high     yes       fair            yes
14        >40      medium   no        excellent       no


Unit-II
4.a Identify the situations in which you would use cluster sampling and stratified sampling.
Explain with examples. 05 3 L2 1.1.3
b For a certain type of fluorescent light in a large building, the cost per bulb of
replacing bulbs all at once is much less than if they are replaced individually
as they burn out. It is known that the lifetime of these bulbs is normally
distributed, and that 60% last longer than 2500 hours, while 30% last longer
than 3000 hours. What are the approximate mean and standard deviation
of the lifetimes of the bulbs? 07 2 L3 1.1.3
c A population contains 3 units: 5, 4 and 7. Obtain the sampling distribution of the
sample mean when a sample of size 2 is drawn (i) with replacement (ii)
without replacement. 08 3 L3 1.1.3
5.a The joint probability distribution of two random variables X and Y is as
follows. (i) Find the marginal distributions of X and Y. (ii) Are the two random variables
independent? 06 2 L3 1.1.3
X \ Y    3       4       5
2        1/6     1/6     1/6
5        1/12    1/12    1/12
7        1/12    1/12    1/12
b Five defective μP-chips are accidentally mixed with twenty good ones. It is
not possible to look at a chip and tell whether or not it is defective. Find the
probability distribution of the number of defective μP-chips if four μP-chips are
drawn at random from this lot. Graphically represent the probability
function and the cumulative distribution function. 07 2 L3 1.1.3
c A company is engaged in the packaging of a superior quality tea in jars of
500 gm each. The company is of the view that as long as the jars contain
500 gm of tea, the process is in control. The standard deviation is 50 gm. A
sample of 225 jars is taken at random and the sample average is found to be
510 gm. Has the process gone out of control? Construct a 99% C.I. 07 3 L3 1.1.3
6.a (i) Define confidence interval and standard error. (ii) A survey is proposed to
be conducted to know the annual earnings of the old Engineering graduates
of Delhi University. How large a sample should be taken in order to
estimate the mean annual earnings within plus and minus Rs. 10,000 at the
95% confidence level? The standard deviation of the annual earnings of
the entire population is known to be Rs. 30,000. 06 3 L3 1.1.3
The spontaneous flipping of a bit stored in a computer memory is called a
“soft fail.” Let X denotes the time in millions of hours before the first soft fail
is observed. Suppose that the density for X is given by
b 07 2 L3 1.1.3
. (a) Find the average time that one must wait to
observe the first soft fail. (b) Also find the variance in waiting time.
c In a random sample of 400 students of the University teaching department, it
was found that 300 students failed in the examination. In another random
sample of 500 students of affiliated colleges, the number of failures in the
same examination was found to be 300. Find out whether the proportion of
failures in the university teaching department is significantly greater than
the proportion of failures in the affiliated colleges. Use the p-value. 07 3 L3 1.1.3
Unit-III
7.a Write the expression for the angle between the two regression lines and interpret the
relation between the two variables when r = 0, r = 1 and r = -1. 06 4 L2 1.1.3
b The following data give the age and blood pressure of individuals. 06 4 L3 1.1.3
Age   56  42  72  36  63  47  55  49  38
B.P   147 125 160 118 149 128 150 145 115
Estimate the B.P when age is 45.
c The following data relate to radio advertising expenditure, newspaper
advertising expenditure and sales. Fit a regression y = a + a₁x₁ + a₂x₂. 08 4 L3 1.1.3
Radio adv expenditure      4  7  9   12
Newspaper adv expenditure  1  2  5   8
Sales                      7  12 17  20
Calculate the coefficient of determination. What percentage of the overall
variation is explained by the model?
8.a Genetic theory states that children having one parent of blood type M and
the other of blood type N will always be one of the three types M, MN, N
and that the proportions of these types will on average be 1:2:1. A
report states that out of 300 children having one M parent and one N
parent, 30% were found to be of type M, 45% of type MN and the
remainder of type N. Test the theory by the chi-square test. 05 3 L3 1.1.3
b The following data are collected on two characters. 05 3 L3 1.1.3
            Cinegoers   Non-cinegoers
Literate    83          57
Illiterate  45          68
Based on this, can you conclude that there is no relation between the habit
of cinema going and literacy?
c The number of job-related injuries at an aircraft unit of the
Government of India was observed. The values for the last 100 months were as follows: 10 3 L3 1.1.3
Injuries per month       0   1   2   3  4  5  6
Frequency of occurrence  35  40  13  6  4  1  1
Apply the Chi-square test to these data to test the hypothesis that the
underlying distribution is Poisson. Use a 5% level of significance.
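
The test of independence in question 8b can be sketched in a few lines. This is an illustrative sketch only: the question does not state a significance level, so the 5% level (critical value 3.841 for 1 degree of freedom, taken from the Chi-square table in this plan) is an assumption, no continuity correction is applied, and the helper name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Rows: literate, illiterate. Columns: cinegoers, non-cinegoers.
table = [[83, 57], [45, 68]]
chi2 = chi_square_independence(table)
critical_5pct_df1 = 3.841  # assumed 5% level, df = (2 - 1) * (2 - 1) = 1

print(f"chi-square = {chi2:.3f}, reject independence: {chi2 > critical_5pct_df1}")
```

With these counts the statistic is roughly 9.48, which exceeds the tabulated value, so under the assumed 5% level the hypothesis of no relation would be rejected.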


Percentage Points of the t-Distribution


Tail Probabilities
One Tail 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
Two Tails 0.20 0.10 0.05 0.02 0.01 0.002 0.001
-------+---------------------------------------------------------+-----
D 1 | 3.078 6.314 12.71 31.82 63.66 318.3 637 | 1
E 2 | 1.886 2.920 4.303 6.965 9.925 22.330 31.6 | 2
G 3 | 1.638 2.353 3.182 4.541 5.841 10.210 12.92 | 3
R 4 | 1.533 2.132 2.776 3.747 4.604 7.173 8.610 | 4
E 5 | 1.476 2.015 2.571 3.365 4.032 5.893 6.869 | 5
E 6 | 1.440 1.943 2.447 3.143 3.707 5.208 5.959 | 6
S 7 | 1.415 1.895 2.365 2.998 3.499 4.785 5.408 | 7
8 | 1.397 1.860 2.306 2.896 3.355 4.501 5.041 | 8
O 9 | 1.383 1.833 2.262 2.821 3.250 4.297 4.781 | 9
F 10 | 1.372 1.812 2.228 2.764 3.169 4.144 4.587 | 10
11 | 1.363 1.796 2.201 2.718 3.106 4.025 4.437 | 11
F 12 | 1.356 1.782 2.179 2.681 3.055 3.930 4.318 | 12
R 13 | 1.350 1.771 2.160 2.650 3.012 3.852 4.221 | 13
E 14 | 1.345 1.761 2.145 2.624 2.977 3.787 4.140 | 14
E 15 | 1.341 1.753 2.131 2.602 2.947 3.733 4.073 | 15
D 16 | 1.337 1.746 2.120 2.583 2.921 3.686 4.015 | 16
O 17 | 1.333 1.740 2.110 2.567 2.898 3.646 3.965 | 17
M 18 | 1.330 1.734 2.101 2.552 2.878 3.610 3.922 | 18
19 | 1.328 1.729 2.093 2.539 2.861 3.579 3.883 | 19
20 | 1.325 1.725 2.086 2.528 2.845 3.552 3.850 | 20
21 | 1.323 1.721 2.080 2.518 2.831 3.527 3.819 | 21
22 | 1.321 1.717 2.074 2.508 2.819 3.505 3.792 | 22
23 | 1.319 1.714 2.069 2.500 2.807 3.485 3.768 | 23
24 | 1.318 1.711 2.064 2.492 2.797 3.467 3.745 | 24
25 | 1.316 1.708 2.060 2.485 2.787 3.450 3.725 | 25
26 | 1.315 1.706 2.056 2.479 2.779 3.435 3.707 | 26
27 | 1.314 1.703 2.052 2.473 2.771 3.421 3.690 | 27
28 | 1.313 1.701 2.048 2.467 2.763 3.408 3.674 | 28
29 | 1.311 1.699 2.045 2.462 2.756 3.396 3.659 | 29
30 | 1.310 1.697 2.042 2.457 2.750 3.385 3.646 | 30
-------+---------------------------------------------------------+-----
Two Tails 0.20 0.10 0.05 0.02 0.01 0.002 0.001
One Tail 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
Tail Probabilities


Table: Significant values of the Chi-square distribution (right-tail areas) for given degrees of freedom (df)


df 0.99 0.95 0.90 0.10 0.05 0.025 0.01
1 --- 0.004 0.016 2.706 3.841 5.024 6.635
2 0.020 0.103 0.211 4.605 5.991 7.378 9.210
3 0.115 0.352 0.584 6.251 7.815 9.348 11.345
4 0.297 0.711 1.064 7.779 9.488 11.143 13.277
5 0.554 1.145 1.610 9.236 11.070 12.833 15.086
6 0.872 1.635 2.204 10.645 12.592 14.449 16.812
7 1.239 2.167 2.833 12.017 14.067 16.013 18.475
8 1.646 2.733 3.490 13.362 15.507 17.535 20.090
9 2.088 3.325 4.168 14.684 16.919 19.023 21.666
10 2.558 3.940 4.865 15.987 18.307 20.483 23.209
11 3.053 4.575 5.578 17.275 19.675 21.920 24.725
12 3.571 5.226 6.304 18.549 21.026 23.337 26.217
13 4.107 5.892 7.042 19.812 22.362 24.736 27.688
14 4.660 6.571 7.790 21.064 23.685 26.119 29.141
15 5.229 7.261 8.547 22.307 24.996 27.488 30.578
16 5.812 7.962 9.312 23.542 26.296 28.845 32.000
17 6.408 8.672 10.085 24.769 27.587 30.191 33.409
18 7.015 9.390 10.865 25.989 28.869 31.526 34.805
19 7.633 10.117 11.651 27.204 30.144 32.852 36.191
20 8.260 10.851 12.443 28.412 31.410 34.170 37.566
21 8.897 11.591 13.240 29.615 32.671 35.479 38.932
22 9.542 12.338 14.041 30.813 33.924 36.781 40.289
23 10.196 13.091 14.848 32.007 35.172 38.076 41.638
24 10.856 13.848 15.659 33.196 36.415 39.364 42.980
25 11.524 14.611 16.473 34.382 37.652 40.646 44.314
26 12.198 15.379 17.292 35.563 38.885 41.923 45.642
27 12.879 16.151 18.114 36.741 40.113 43.195 46.963
28 13.565 16.928 18.939 37.916 41.337 44.461 48.278
29 14.256 17.708 19.768 39.087 42.557 45.722 49.588
30 14.953 18.493 20.599 40.256 43.773 46.979 50.892
40 22.164 26.509 29.051 51.805 55.758 59.342 63.691
50 29.707 34.764 37.689 63.167 67.505 71.420 76.154
60 37.485 43.188 46.459 74.397 79.082 83.298 88.379
70 45.442 51.739 55.329 85.527 90.531 95.023 100.425
80 53.540 60.391 64.278 96.578 101.879 106.629 112.329
90 61.754 69.126 73.291 107.565 113.145 118.136 124.116
100 70.065 77.929 82.358 118.498 124.342 129.561 135.807
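As a quick check on how the table is read: each entry is the value that is exceeded with the right-tail area given in the column heading. For 2 degrees of freedom the right-tail area of the chi-square distribution has the closed form exp(-x/2), so the tabulated value for area alpha is -2 ln(alpha). The sketch below compares that formula with the df = 2 row; the function and variable names are illustrative only.

```python
import math


def chi2_critical_df2(alpha):
    """Chi-square value with right-tail area alpha, valid for df = 2 only."""
    return -2.0 * math.log(alpha)


# df = 2 row of the table: right-tail area -> tabulated value
table_row_df2 = {
    0.99: 0.020,
    0.95: 0.103,
    0.90: 0.211,
    0.10: 4.605,
    0.05: 5.991,
    0.025: 7.378,
    0.01: 9.210,
}

for alpha, tabulated in table_row_df2.items():
    computed = chi2_critical_df2(alpha)
    print(f"alpha = {alpha:<5}  tabulated = {tabulated:<6}  computed = {computed:.4f}")
```

The computed values agree with the tabulated row to three decimal places. Other degrees of freedom have no such simple closed form and need the table or statistical software.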
