Basic Statistics For Economics
University of Delhi
Department of Distance and Continuing Education
B.A. (Programme)
Semester-I/Semester-II
DSC-4, Course Credits-4 (Major)
BASIC STATISTICS FOR ECONOMICS
(Department of Economics)
As per the UGCF-2022 and National Education Policy 2020
Basic Statistics for Economics
Editorial Board
Prof. J. Khuntia, V.A.Rama Raju, Vajala Ravi,
Bhavna Rajput, Anupama, Devender
Content Writers
Suramya Sharma, Dr. Pooja Sharma
Taramati, Sugandh Kumar Choudhary,
Tasha Agarwal, Ashish Kumar Garg
Academic Coordinator
Deekshant Awasthi
Published by:
Department of Distance and Continuing Education under
the aegis of Campus of Open Learning/School of Open Learning,
University of Delhi, Delhi-110 007
Printed by:
School of Open Learning, University of Delhi
TABLE OF CONTENTS
Lesson | Name of Lesson | Content Writer | Page No.
LESSON 1 | Introduction of Population and Sample | Suramya Sharma | 1-19
LESSON 2 | Pictorial Methods in Descriptive Statistics | Suramya Sharma | 20-41
LESSON 3 | Measures of Location and Variability | Suramya Sharma | 42-76
LESSON 4 | Sample Space, Events, and Probability | Pooja Sharma | 77-92
LESSON 5 | Conditional Probability | Pooja Sharma | 93-107
LESSON 6 | Random Variables and Probability Distributions | Pooja Sharma | 108-120
LESSON 7 | Cumulative Distribution Function, Density Function, Expected Value, and Variance | Pooja Sharma | 121-140
LESSON 8 | Discrete Distribution | Taramati | 141-150
LESSON 9 | Continuous Distribution | Taramati | 151-166
LESSON 10 | Joint Probability Distribution and Mathematical Expectations | Sugandh Kumar Choudhary | 167-184
LESSON 11 | Correlation and Covariance | Tasha Agarwal | 185-202
LESSON 12 | Characteristics of Estimators | Ashish Kumar Garg | 203-242
LESSON 13 | Statistical Hypothesis | Ashish Kumar Garg | 243-271
LESSON 14 | Error in Hypothesis Testing and Power of Test | Ashish Kumar Garg | 273-305
About Contributors
Contributor's Name | Designation
Suramya Sharma | Guest Faculty, NCWEB, Hansraj College, University of Delhi, Delhi
Dr. Pooja Sharma | Associate Professor, Daulat Ram College, University of Delhi
Taramati | Guest Faculty, Kirori Mal College, University of Delhi
Sugandh Kumar Choudhary | Assistant Professor, Department of Economics, S.S. Khanna Girls' Degree College, University of Allahabad
Tasha Agarwal | Ph.D. Scholar, Ambedkar University, Delhi
Ashish Kumar Garg | Assistant Professor, Ramjas College, University of Delhi
LESSON 1
STRUCTURE
3. To learn about various scales of measurement, viz. ratio, interval, ordinal and nominal scales
4. To identify and comprehend the differences between a statistical population and a sample
5. To distinguish between various sampling techniques
6. To demonstrate knowledge of the branches of statistics, namely descriptive and inferential statistics
1.2 INTRODUCTION
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read
and write." Samuel S. Wilks (1906 - 1964).
Over the past several decades, statistics has become an indispensable part of our lives. We come across statistics in one form or another almost every day. If we open a newspaper or turn on the television, we are likely to find surveys that establish a relationship between particular issues, say, eating fast food and the risk of health problems; or we may find graphs depicting growth rates or changes in some variables over time, say the growth rate of GDP or the inflation level in a country. But the scope of statistics is not limited to data collection and presentation; it also provides a basis for decision making and problem solving. We can build models that not only describe past trends but can also be extended to study the uncertain future. Statistics helps us make informed decisions in a world of uncertainty and variability. Indeed, we would not need statistics at all if there were no uncertainty or variability around us.
The word Statistics is derived from a Latin word, “Status,” which means “a group of numbers
or figures that represent some information of human interest.” Statistics may be formally
defined as “the study of collecting, organizing, analyzing and interpreting information in the
form of data.”
Statistics provides valuable insights not only in economics but also in finance, engineering, medical research and many other science and social science disciplines. In the financial sector, statistical analysis may be used at both the micro and the macro level. At the micro level, it helps us understand the performance of a company or business, for example its revenue-generating capacity or the relationship between its advertising and sales. At the macro level, statistical analysis allows a country to assess its financial condition and measure its economic growth. In engineering, statistics is indispensable for robust analysis, probabilistic risk assessment, measurement of error, etc. Statistics allows clinical researchers to compare various medical treatments, evaluate the benefits of alternative therapies, establish optimal treatment combinations, etc. Since the scope of statistics has broadened and it is now used in a number of practical fields, it is also referred to as applied statistics.
© Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
Statistics is considered both a science and an art. It is a science because statistical techniques are systematic and have broad application. In several instances, statistics is used to study cause-and-effect relationships, and the results can be generalized in the same way as those of any scientific experiment or law. Statistics is also an art, which refers to the "skill of handling facts so as to achieve a given objective." Managing, presenting and drawing relevant conclusions from data is considered an art.
We’ve learnt that statistics is the study of collecting, organizing, analyzing and interpreting
data. But you may wonder, what exactly is data? Data is nothing but pieces of raw information,
facts and figures that are used for analysis. Broadly, data can be of two kinds:
1.3.1 Quantitative data
1.3.2 Qualitative data
By contrast, any information that is not directly measurable is known as qualitative data. Qualitative data represent the qualities or characteristics of the subject being studied. Information such as political ideology, physical attributes of a person and problems faced by workers are examples of qualitative data.
Scales of measurement:
i. Nominal scale data – Data that cannot be sorted or put in any meaningful order are known as nominal data. Each value is simply a label, so changing the order of the values does not change their meaning. For example, the occupation of respondents may take values such as teacher, farmer, shopkeeper, unemployed, etc. These data cannot be measured, nor can they be ranked in any way. Another example of nominal data is the marital status of respondents, which may take values like married, unmarried, divorced, widowed, etc. Changing the order of these responses makes no difference to our understanding of the sample.
ii. Ordinal scale data – In contrast, ordinal data follow a natural order. Although these too cannot be measured explicitly, we can sort or rank them to make basic comparisons between values. For example, the education level of respondents can take values from nursery to post-graduation, and these values have a specific order. Similarly, respondents' opinion on the relevance of CCTV cameras in workplaces could take values like strongly agree, somewhat agree, somewhat disagree and strongly disagree, which can also be arranged in a particular order.
iii. Interval scale – This scale pertains to numerical data in which differences between values represent real differences in the variable. With such variables, we know not only that one value is greater than another but also that the distances between points on the scale are equal. Examples include temperature in Fahrenheit or Celsius, year of birth, etc. Here, a temperature of 92°F is greater than 90°F, and the difference between 92°F and 90°F is the same as the difference between 90°F and 88°F. However, an interval scale has no true zero (0°C does not mean an absence of temperature), so ratios are not meaningful: 20°C is not "twice as hot" as 10°C.
iv. Ratio scale – Ratio scale data are quantitative measurements that are ordered and have evenly spaced intervals between values, just like interval scale data. In addition, these scales have a true absolute zero representing the total absence of the variable being measured. Hence, ratio scale variables have all the properties of interval scale variables along with a "true zero." Examples include the weight of a commodity, the height of a person, etc. Here, a weight of zero indicates that the commodity is weightless, and ratios are meaningful: 80 kg is twice as heavy as 40 kg.
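The difference between the interval and ratio scales can be checked numerically. The short Python sketch below is an illustration only: the temperatures and weights are made-up values, and the helper names `celsius_to_fahrenheit` and `kg_to_pounds` are hypothetical. It converts readings between units and shows that ratios survive a change of units only on a ratio scale, while differences behave consistently on both.

```python
# Minimal sketch: ratios are meaningful on a ratio scale but not on an interval scale.
# All values are illustrative, not data from the text.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert Celsius to Fahrenheit (both are interval scales with arbitrary zeros)."""
    return c * 9 / 5 + 32

def kg_to_pounds(kg: float) -> float:
    """Convert kilograms to pounds (both are ratio scales with a true zero)."""
    return kg * 2.20462

# Interval scale: the ratio depends on the unit chosen, so it carries no real meaning.
t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = celsius_to_fahrenheit(t1_c), celsius_to_fahrenheit(t2_c)
print(t2_c / t1_c)  # 2.0
print(t2_f / t1_f)  # 1.36, so "twice as hot" does not survive the change of units
# Differences stay consistent: a 10 degree Celsius gap is always an 18 degree Fahrenheit gap.
print(t2_f - t1_f)  # 18.0

# Ratio scale: zero means "no weight", so the ratio is the same in any unit.
w1_kg, w2_kg = 40.0, 80.0
print(w2_kg / w1_kg)                              # 2.0
print(kg_to_pounds(w2_kg) / kg_to_pounds(w1_kg))  # 2.0, up to floating-point rounding
```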
IN-TEXT QUESTIONS
We can also categorize data into univariate, bivariate and multivariate data. Uni means one
and variate refers to variable. Hence univariate data consists of only a single variable. It is the
simplest form of data. An example is the number of cold drinks sold by a street vendor on each weekday. The data may look as follows:
1 2 3 4 5
In the above example, we can only describe the data; no relationship or comparison between variables can be drawn. On the other hand, bivariate and multivariate data allow a researcher to establish relationships and correlations between variables. Here, bi means two, and hence bivariate data involve two distinct variables. A researcher may use such data to establish a relationship between the two variables. For instance, the data could be the number of cold drinks sold by a street vendor and the daily temperature of the city the vendor lives in. They could look like:
1 2 3 4 5
Temperature (in °C) 34 37 36 37 38
A researcher may conclude that sales and temperature have a positive relationship, since as the temperature in the city rises, so does the number of cold drinks sold by the street vendor.
Finally, multivariate data consist of more than two variables. Suppose a researcher wishes to analyze the determinants of cold drink sales in a city. She gathers data on cold drink sales, the temperature of the city and the price of cold drinks.
1 2 3 4 5
Temperature (in °C) 34 37 36 37 38
However, in statistics, the definition of population is slightly different. Here, population refers to all the individuals or entities that possess similar characteristics and belong to the particular group under study. The population can vary from study to study. For instance, a researcher might be interested in analyzing the results of the Common Admission Test (CAT) examination in India. Here, the population would consist of all the candidates who appear for CAT in a particular year. Say that in a given year about 2.3 lakh candidates appear for the exam. To study this population, the researcher would have to gather data on all 2.3 lakh candidates; no one from the population can be left out. The best example of a population study is a census. In India, a census is the process of collecting and analyzing demographic, economic and social information on Indian residents. Since a census is a study of the whole population, data are collected from each and every Indian resident. The scale of data collection is so large that the census is conducted only once every ten years, and collecting, cleaning and publishing the entire data takes several years.
In statistics, the population depends on the topic and scope of the research. Other examples of a population could be all the people working under the NREGS, the voter population of India, all students enrolled in private schools in a state, all accidents that have taken place in India, etc.
Studying a population helps a researcher gain useful insights into the characteristics of all the elements under study. Since each and every unit is considered, the results are reliable and representative of the population. This also enables a researcher to study more than one aspect of the population and carry out an intensive study. Such data can also form the basis of further investigations. However, studying a whole population is practical mainly when the scope of the study is small.
Descriptive measures computed from the whole population are termed 'population parameters' or simply 'parameters'. So, a parameter describes a characteristic of a population. Parameters are usually denoted by Greek letters, such as µ (mu) for the mean and σ (sigma) for the standard deviation. If, based on data collected from the entire population of 2.3 lakh candidates, we find that the average marks scored by a candidate in the CAT examination are 85, we denote it as: 𝜇 = 85.
1.4.2 Sample
As you may have identified, the major challenge of analyzing a population is that the research requires data to be collected from each and every member of the population, which is undeniably a tedious task. The possibility of errors in a population study is also significant, since there may be missing data due to non-response by respondents, or measurement difficulties due to the sheer amount of data. Gathering data from a whole population not only requires extra effort but also consumes a lot of time and money. When resources are scarce, these constraints can make a population survey infeasible.
To overcome these drawbacks, a subset of the population, referred to as a sample, may be used instead. A sample is a subset of the statistical population that, ideally, is unbiased and representative of the entire population. In practice, a researcher randomly draws and analyzes some observations from the population to make inferences about the whole population. Figure 1 depicts the relationship between population and sample:
[Figure 1: Relationship between population and sample]
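To make the distinction between a parameter and a statistic concrete, the following Python sketch builds a hypothetical population of 2,000 exam scores (randomly generated, not real CAT data), computes the parameters μ and σ from every unit, and then computes the statistics x̄ and s from a simple random sample of 100. Note that Python's `statistics.pstdev` divides by N while `statistics.stdev` divides by n − 1; the sketch simply follows that common convention for the sample standard deviation.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: marks of 2,000 candidates (invented for illustration).
population = [rng.gauss(85, 15) for _ in range(2000)]

# Parameters: computed from every unit of the population (Greek letters).
mu = statistics.mean(population)       # population mean, μ
sigma = statistics.pstdev(population)  # population standard deviation, σ (divides by N)

# Statistics: computed from a random sample (Roman letters).
sample = rng.sample(population, 100)   # 100 units drawn without replacement
x_bar = statistics.mean(sample)        # sample mean, x̄
s = statistics.stdev(sample)           # sample standard deviation, s (divides by n - 1)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Running it with different seeds shows that μ and σ stay fixed for a given population, while x̄ and s change from sample to sample.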
IN-TEXT QUESTIONS
A. 𝑠 = 5.7
B. 𝜇 = 120
C. 𝜎 = 32
D. 𝑥̅ = 18
7. Select the correct option:
I. A sample is used when:
A) Data collection is inexpensive B) Research is time sensitive
C) Population is small D) Population is unknown
II. A mean is called a statistic if it is calculated from the:
A) Sample B) Population
C) Parameter D) Standard deviation
To ensure that the samples drawn are representative of the population, it is crucial to understand
the different ways in which we can select a sample. The two broad methods of sampling are:
1.5.1 Probability sampling techniques – These are among the most commonly used sampling techniques. Every unit of the population has a known, non-zero chance (or probability) of being selected in the sample; in the simplest designs, this chance is the same for every unit. Because selection is random, the samples are unbiased and hence tend to be representative of the population. These techniques are also known as random sampling techniques. The main techniques under probability sampling are:
a. Simple random sampling – As the name suggests, this is the most basic form of random sampling. Here, each unit of the population has an equal and independent chance of being chosen. For example, in the lottery method, the name of each unit of the population is written on a chit, and after thorough shuffling the researcher picks the chits one by one and notes down the names. Another way to draw a simple random sample is through random number generation: all the units of the population are assigned numbers in sequential order, random numbers are generated using software, and the units with those numbers are selected.
b. Systematic random sampling – Under systematic random sampling, the sample is selected from the population at a fixed interval. This technique is often easier to carry out than simple random sampling. To draw a systematic random sample, we first arrange the units of the population in an order and assign each unit a number from 1 to $N$. Next, the sampling interval is calculated using the formula $K = \frac{N}{n}$, where $K$ is the interval, $N$ is the population size and $n$ is the sample size. Finally, we select one unit at random from the first $K$ units and then select every $K$-th unit after it. For example, suppose we have to collect information from residents of a city where the houses are numbered from 1 to 100,000. If the size of the population is 100,000 and we need a sample of 1,000 houses, then the interval should be 100,000/1,000 = 100. The researcher will choose one house at random from houses 1 to 100 and then select every 100th house thereafter to obtain the sample.
c. Stratified random sampling – The two techniques above work best when the population is homogeneous. When the population is heterogeneous, we use stratified random sampling. Under this technique, we divide the population into sub-groups, called strata, whose members share a characteristic such as gender, age group or income level, and then select a random sample from each sub-group or stratum. For instance, if a company has 700 male and 300 female employees, a simple or systematic random sample may, by chance, over- or under-represent one of the groups. To avoid this bias, we can use stratified random sampling: we create two groups based on gender and then select random samples from both groups. There are two further approaches to selecting a stratified sample: proportionate stratified sampling, in which each stratum's sample size is proportional to its share of the population (for a sample of 100 employees in the example above, 70 men and 30 women), and disproportionate stratified sampling, in which the strata are sampled in different proportions.
d. Multi-stage sampling – At times the population of interest is very large and geographically spread out. In such cases, a single sampling technique may not be practical, and multi-stage sampling is more suitable. Here, a sample is selected in stages, combining the different sampling techniques described above. For instance, to study the issues faced by primary school children, a researcher may first divide the population into states and use simple random sampling to select a sample of states. Next, the researcher could again use simple random sampling to select a few districts within the chosen states, and finally use systematic random sampling to identify a few schools within each district. Multi-stage sampling is frequently used to scale down large data sets to workable sizes. Although there is no restriction on the number of stages used to select a sample, it is important to note that all the sampling techniques used must be probability sampling methods.
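The probability sampling techniques described above can be sketched in a few lines of Python using the standard `random` module. This is a minimal illustration under stated assumptions: units are identified by the numbers 1 to N (or by labels), the employee lists and the state–district–school frame are hypothetical, and the function names are our own. Stratum sample sizes are rounded, so in general they may not add up exactly to n.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch gives the same draws every run

def simple_random_sample(N, n):
    """(a) Choose n of the units numbered 1..N; every unit is equally likely."""
    return sorted(rng.sample(range(1, N + 1), n))  # drawn without replacement

def systematic_sample(N, n):
    """(b) Random start among the first K units, then every K-th unit (K = N // n)."""
    K = N // n
    start = rng.randint(1, K)  # randint includes both end points
    return list(range(start, N + 1, K))[:n]

def proportionate_stratified_sample(strata, n):
    """(c) Sample each stratum in proportion to its share of the population."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)  # rounded, so totals may differ slightly from n
        sample[name] = rng.sample(units, n_h)
    return sample

def multi_stage_sample(frame, n_states, n_districts, n_schools):
    """(d) States and districts by simple random sampling, schools systematically."""
    chosen = []
    for state in rng.sample(sorted(frame), n_states):
        for district in rng.sample(sorted(frame[state]), n_districts):
            schools = frame[state][district]
            K = len(schools) // n_schools
            start = rng.randint(0, K - 1)  # 0-based position of the first school
            chosen.extend(schools[start::K][:n_schools])
    return chosen

# Usage with hypothetical data
print(simple_random_sample(100, 5))

houses = systematic_sample(100_000, 1_000)
print(len(houses), houses[:3])  # 1000 houses, 100 apart

employees = {"male": [f"M{i}" for i in range(700)],
             "female": [f"F{i}" for i in range(300)]}
sizes = {k: len(v) for k, v in proportionate_stratified_sample(employees, 100).items()}
print(sizes)  # {'male': 70, 'female': 30}

# Hypothetical frame: 3 states, each with 4 districts of 20 schools
frame = {f"S{a}": {f"S{a}-D{b}": [f"S{a}-D{b}-School{c}" for c in range(20)]
                   for b in range(4)}
         for a in range(3)}
print(multi_stage_sample(frame, n_states=2, n_districts=2, n_schools=4))
```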
1.5.2 Non-probability sampling techniques – Under non-probability (or non-random) sampling techniques, units are not selected by a random mechanism, so each unit of the population does not have an equal (or even a known) chance of being selected. The samples are therefore not random; they may be biased and may not represent the population accurately. Hence, non-probability sampling techniques may not produce results that can be generalized. Yet there are many occasions when non-probability sampling methods are preferred over probability sampling methods, as discussed in the following passages. The most commonly used non-probability sampling techniques are:
a. Convenience sampling – As is clear from the term itself, convenience sampling refers
to the sampling technique in which a researcher collects the data from convenient
sources. For instance, as part of the undergraduate course, a student undertakes a
research project in which she tries to understand the consumer sentiments related to
rooftop solar panels. Considering the constraints of time, cost and effort, the student may
choose to collect data from the locations most convenient to her, such as the city or district she lives in. It is worth noting that such research may not
be representative of the population and generalization of the results may not be
appropriate. That said, such a technique is useful when a researcher carries out a pilot
survey to test the questionnaire.
b. Judgement sampling – Under this technique, the researcher selects a sample based on
her own judgement about the characteristics of the individuals. In other words, the
researcher uses her expertise to identify the best fit for her sample. Take the example
of a researcher who is interested in understanding the challenges faced by disabled
employees in a company. In such a case, the researcher can easily identify her sample
by using her sense of judgement about the characteristics of disabled persons.
c. Snowball sampling – At times it is difficult to locate the appropriate target group for a study. In such cases, snowball sampling may be used; it is common in social science research. Under snowball sampling, existing respondents are asked to identify or suggest other individuals who are well suited to the research. Based on these references, the sample keeps growing, just like a snowball. For example, a researcher may want to understand the plight of immigrants in a city. Since there may be no official record of immigrants readily available, the researcher could identify a few immigrants and then ask them to help locate others for the study. Such a technique comes in handy when we are researching hard-to-find groups. However, the risk of biased results is quite high, since the initial respondents may refer their friends and family, who share common characteristics and beliefs.
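Snowball sampling can be pictured as following referrals outward from a few initial respondents. The Python sketch below assumes a hypothetical referral network stored as a dictionary (who would refer whom) and adds people in the order they are referred, stopping once the target sample size is reached. It also makes the bias visible: only people connected to the initial respondents can ever enter the sample.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Grow a sample from initial respondents by following their referrals.

    referrals maps each person to the people they would refer (hypothetical data).
    People are added in the order they are referred (breadth-first) until
    target_size respondents are reached or no new referrals remain.
    """
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # do not interview the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network among immigrants in a city
referrals = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["E"],
    "D": ["F"],
    "E": [],
    "F": ["G"],
}
print(snowball_sample(referrals, ["A"], 5))  # ['A', 'B', 'C', 'D', 'E']
```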
The following figure summarizes all the sampling techniques we have discussed:
IN-TEXT QUESTIONS
A researcher may apply statistics simply to summarize and describe the characteristics of data, or to draw conclusions or inferences from the data. Accordingly, most research applies two types of statistical analysis:
1. Descriptive statistics
2. Inferential statistics
can only report the findings. It cannot be concluded, by merely looking at the descriptive
statistics, that generally, in India, households with highly educated heads save more.
So, we see that descriptive statistics cannot be used to make general estimates or predictions. Yet descriptive statistics are extremely useful, as they provide a snapshot of the whole data in meaningful ways. They help simplify large data sets and present them in visually attractive ways. We will learn about the various components of descriptive statistics in later chapters.
1.6.2 Inferential Statistics
On the other hand, inferential statistics is used to predict, estimate and make other generalizations based on the data. It usually relies on sample data to draw conclusions and generalize the results to the larger population. Hypothesis testing and regression analysis are two examples of inferential statistics.
Inferential statistics is very popular among researchers since it allows them to gather limited
data and extrapolate the results to a larger population. This saves them a lot of cost as well as
time.
If we consider the above example again, the research institute now gathers data from 10
randomly selected villages in India with total observations equal to roughly 1000 households.
Now the institute may generalize its results to state or national level through inferential
statistics.
Figure 3 summarizes the branches of statistics we discussed:
[Figure 3: Branches of statistical analysis. Descriptive statistics: measures of central tendency, measures of variability, graphs. Inferential statistics: hypothesis testing, regression analysis.]
IN-TEXT QUESTIONS
11. Fill in the blanks:
A. We can make predictions and estimations using __________ statistics.
B. Histogram is an example of __________ statistics.
C. Standard deviation is calculated as a part of ___________ statistics.
12. Select the correct option:
Inferential statistics can be used to
A) Estimate B) Generalize
C) Only A) D) Both A) and B)
1.7 SUMMARY
1.8 GLOSSARY
b. On the occasion of National Milk Day, the government wants to estimate the
number of cows a dairy farmer owns in a particular state. Using the census
approach, the government finds that about 40 lakh households are engaged in dairy
farming in the state with the average number of cows owned equal to 22. When a
government official took a random sample of 100 households in a district engaged
in dairy farming, she found that the average number of cows owned was
37.
Q.4 In a college, parking on the premises has become a major problem. To deal with the
problem, the college administration wishes to compute the average parking time of the
students who park their vehicles in the college parking lot. One of the college officials
quietly follows 150 students and records the duration of time the students keep their
vehicle in the parking lot.
a. Identify the population of interest to the college administration.
b. What is the sample size that the college administration is examining?
Q.5 Briefly describe any two methods of drawing non-probability samples.
Q.6 Why are descriptive statistics used? Can we use descriptive statistics to make
generalized predictions based on the data? If not, then how can we do so?
1.11 REFERENCES
• Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences.
Cengage learning.
• Larsen, R. J., & Marx, M. L. (2012). An introduction to mathematical statistics and its
applications. Prentice Hall.
• McClave, J. T., Benson, P. G., & Sincich, T. (2018). Statistics for business and
economics. Pearson Education.
LESSON 2
STRUCTURE
We mentioned in the previous lesson that descriptive statistics involves computing basic statistics of the data, such as the mean, median and standard deviation. These give a basic idea of the distribution of the data set. Visual representation of the data is also an integral part of descriptive statistics. In this chapter, we will take a closer look at the most common graphical methods of presenting data.
A stem-and-leaf plot is a convenient way to visualize numerical data. The plot can easily be constructed by hand and gives an overview of the distribution of the observations at first glance. To create the plot, the data are arranged in order and grouped into equal intervals, and each value is split into two parts: a stem and a leaf. We then create a table that presents the whole data set in two columns. The first column, referred to as the stem, contains the leading digit(s) of each value (for example, the tens digit), and the second column, known as the leaf, contains the remaining digit, usually the final one. The concept will become clear with the following example.
Say a researcher has collected data on the weight of 10 college students, chosen at random. The
weights, in kg, are as follows:
Stem Leaf
4 3
5 29
6 167
7 14
8 8
9 2
Here, the first row, with a stem of 4 and a leaf of 3, denotes a weight of 43 kg, and the last row, with a stem of 9 and a leaf of 2, denotes a weight of 92 kg. Simply by looking at the table, we can work out that, on average, the college students weigh in the sixties. We can also observe that the display gradually rises, peaks at the stem 6 and then steadily declines. Such a shape is called bell-shaped, and it is roughly symmetric. We will learn about other features of a symmetrical distribution later in the unit.
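The construction can also be automated. In this Python sketch, the ten weights are read back from the plot above, and a hypothetical helper `stem_and_leaf` splits each whole number into a tens-digit stem and a units-digit leaf. For simplicity, it lists only stems that have at least one leaf, whereas a hand-drawn plot would normally also show empty stems.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens-digit stem to the sorted list of its units-digit leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 43 -> stem 4, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Weights (kg) of the 10 students, read back from the plot above
weights = [43, 52, 59, 61, 66, 67, 71, 74, 88, 92]

for stem, leaves in stem_and_leaf(weights).items():
    print(stem, "".join(str(leaf) for leaf in leaves))

print(sum(weights) / len(weights))  # 67.3, an average weight in the sixties
```

The printed rows (4 3, 5 29, 6 167, 7 14, 8 8, 9 2) reproduce the table above.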
Let us consider another example, in which we have a data set on the number of hours of daily sleep that 100 college students get. The stem-and-leaf display looks like this:
Stem Leaf
5 4455567889
6 00223446667899
7 000112233566789999
8 00000011222234445577778889
9 0001333355777779
10 01344446899
11 12247
Note that here, the first row, with a stem of 5 and a leaf of 4, denotes 5.4 hours of sleep, and the last row, with a stem of 11 and a leaf of 7, denotes 11.7 hours of sleep. We can clearly observe that the largest group of students (26) sleep between 8.0 and 8.9 hours. If we ask you how many students sleep for 10 hours or more a day, you simply have to add up the number of leaves written in front of stems 10 and 11. So the answer is 11 + 5 = 16 students. (One of these students sleeps exactly 10.0 hours, so 15 students sleep for strictly more than 10 hours.)
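The counting in this example can be verified by rebuilding the 100 observations from the plot. In the Python sketch below, each stem is read as whole hours and each leaf as tenths of an hour, matching the reading given in the text; the variable names are our own.

```python
# Stem-and-leaf rows for the 100 students' daily sleep (stem = hours, leaf = tenths)
rows = {
    5: "4455567889",
    6: "00223446667899",
    7: "000112233566789999",
    8: "00000011222234445577778889",
    9: "0001333355777779",
    10: "01344446899",
    11: "12247",
}

# Rebuild the individual observations, e.g. stem 5 and leaf 4 -> 5.4 hours
hours = [stem + int(leaf) / 10 for stem, leaves in rows.items() for leaf in leaves]

print(len(hours))                        # 100
print(sum(1 for h in hours if h >= 10))  # 16 (10 hours or more)
print(sum(1 for h in hours if h > 10))   # 15 (strictly more than 10 hours)
```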
A stem-and-leaf plot is helpful for getting a basic understanding of a data set; however, it becomes difficult to construct when the number of observations is large.
IN-TEXT QUESTIONS
1. The correct list of data for the following stem and leaf plot is:
Stem Leaf
0 3
3 27
5 119
7 0
A. 03, 32, 37, 11, 51, 59,70 B. 03, 27, 37, 51, 51, 59,70
C. 03, 32, 37, 51, 51, 59,70 D. 03, 32, 37, 51, 59,70
2. The following stem-and-leaf plot depicts the number of cakes that a home-baker sells
each week. If 1|7 represents 17 cakes, then,
Stem Leaf
0 689
1 024479
2 01136889
3 122
A. For how many weeks did the home-baker sell cakes?
B. How many weeks did they sell more than 25 cakes?
A dot plot is another simple way to visualize data, in which each unit of observation is represented by a dot. The dots are stacked on top of one another, so the height of each stack represents the frequency of that value in the data set. For example, suppose a researcher would like to know the number of vaccinated children in a city. To make the analysis simpler, she divides the city into 5 localities and collects the number of vaccinated children in each locality. The data collected are tabulated as follows:
Locality Number of vaccinated children
1 6
2 1
3 3
4 11
5 8
The dot plot for these data would look like Figure 1, where the height of each column denotes the frequency of the observation.
If the data are continuous and every value is unique, a dot plot would simply show one dot for each state. To make dot plots more informative in such cases, we group the data into class intervals.
Class interval Frequency
70-75 1
75-80 2
80-85 1
85-90 3
90-95 2
95-100 1
Now we can easily create the dot plot using the above table. Try making the dot plot on your
own using the above table. Your dot plot should look something like figure 2:
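If graphing software is not at hand, a rough dot plot can be printed as text. The Python sketch below uses the class-interval frequencies from the table above and prints one row per interval with one `o` per observation, so the stacks run sideways instead of upwards; the function name `text_dot_plot` is hypothetical.

```python
# Class-interval frequencies from the table above
freq = {"70-75": 1, "75-80": 2, "80-85": 1, "85-90": 3, "90-95": 2, "95-100": 1}

def text_dot_plot(table, symbol="o"):
    """One row per class interval, with one symbol per observation."""
    width = max(len(label) for label in table)  # align the interval labels
    return [f"{label:>{width}} | {symbol * count}" for label, count in table.items()]

for line in text_dot_plot(freq):
    print(line)
```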
IN-TEXT QUESTIONS
3. The following dot plot illustrates the number of hours a student spends talking on the phone
per week:
A. How many students report that they did not talk on the phone at all during the week?
B. How many students spend at least 2 hours on the phone?
4. True or False: We cannot create dot plots for continuous data.
One of the most popular types of graph is the bar chart, which uses horizontal or vertical bars to depict the observations in a data set. Besides the simple bar chart, there are two further types: stacked bar charts and grouped bar charts. Let us understand these kinds of bar charts using an example. Suppose we have the following data on two-wheeler sales in a city over the years:
Year Two-wheeler sales
2010 12,500
2015 17,000
2020 24,000
To create a vertical bar graph, we plot the years on the X-axis and the sale numbers on Y-axis.
The height of the vertical rectangular bar represents the value of the data, as presented below:
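A rough, text-only version of this bar graph can be produced with a short Python sketch. Here each `#` stands for 500 units, an arbitrary scale chosen so that the three sales figures from the table above give whole-number bar lengths (25, 34 and 48); the function name `text_bar_chart` is hypothetical. Because the bars are printed as rows, this corresponds to a horizontal rather than a vertical bar graph.

```python
# Two-wheeler sales from the table above
sales = {2010: 12_500, 2015: 17_000, 2020: 24_000}

def text_bar_chart(data, units_per_symbol=500):
    """One row per year; the bar length is proportional to the value."""
    return [f"{year} | {'#' * round(value / units_per_symbol)} {value:,}"
            for year, value in data.items()]

for row in text_bar_chart(sales):
    print(row)
```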
Stacked bar charts present two or more categories of the same data on a single bar, which allows the reader to compare the values of several categories simultaneously. Let us take the above example one step further. Suppose that we have the following data on two-wheeler, three-wheeler and four-wheeler sales in a city:
To create a stacked bar chart, we draw the bars for each category on top of one another for a particular year. The final chart will look something like this:
Figure 4: Stacked bar chart of sale of two-wheelers in 2010, 2015 and 2020
Here we can compare the sales of each category of vehicle in each year. We can also create a 100% stacked bar chart, which presents the same data in relative terms. A 100% stacked bar chart shows each category's percentage share of the total. All the bars have the same height, one hundred percent, and we observe the relative changes in the values from the sizes of the sub-bars:
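Behind a 100% stacked bar chart is a simple conversion of counts into percentage shares. The sketch below shows that conversion; the sales figures in it are hypothetical placeholders chosen for easy arithmetic, not the data from the lesson's table, and `percentage_shares` is an illustrative helper name.

```python
def percentage_shares(counts):
    """Each category's share of the bar total, in percent."""
    total = sum(counts.values())
    return {category: 100 * value / total for category, value in counts.items()}

# HYPOTHETICAL figures for illustration only; not the lesson's sales data.
hypothetical_sales = {
    2010: {"two-wheeler": 600, "three-wheeler": 150, "four-wheeler": 250},
    2020: {"two-wheeler": 900, "three-wheeler": 300, "four-wheeler": 800},
}

for year, counts in hypothetical_sales.items():
    shares = percentage_shares(counts)
    # Every 100% bar has the same total height; only the sub-bars differ.
    print(year, {k: f"{v:.1f}%" for k, v in shares.items()})
```

In this made-up example the number of three-wheelers doubles between the two years, yet its share stays at 15%, which is exactly the kind of relative (rather than absolute) change a 100% stacked bar chart highlights.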
Figure 5: 100% Stacked bar chart of sale of two-wheelers in 2010, 2015 and 2020
Grouped bar charts are a convenient way to compare different categories of data. In a grouped bar chart, the bars for each category are placed adjacent to each other instead of on top of each other. This makes it easy to observe the absolute changes in the data of each category. When interpreting stacked and grouped bar charts, it is crucial to pay attention to the legend, which is usually displayed at the bottom of the chart.
Remember that a stacked or grouped bar chart can become hard to read if it presents too many categories of data.
Figure 6: Grouped bar chart of sale of two-wheelers in 2010, 2015 and 2020
IN-TEXT QUESTIONS
5. The following bar graph displays the favorite color of 200 kindergarten students in a
school:
A. Which is the most preferred and least preferred color among the students?
B. How many students like orange?
6. The following grouped bar chart shows the daily number of customers visiting a shopping center in the morning (M) and evening (E).
A. On which day/days does the shopping center receive an equal number of customers in the morning and in the evening?
B. On which day/days do fewer customers shop in the evening than in the morning?
7. A travel agent organizes trips to destinations either in South India or North India. The
following 100% stacked bar chart presents the share of tourists visiting both the regions
between 2010 and 2014.
A. Looking at the bar graph can you say that over the years the popularity of North Indian
destinations is increasing? Why or why not?
B. In which year was the share of tourists visiting South Indian destinations largest relative to North Indian destinations?
2.6 HISTOGRAMS
At first, a histogram may look similar to a bar chart, but the two are significantly different. A histogram is also drawn as bars, but the bars are placed adjacent to each other and each bar represents the frequency with which observations occur within a class interval of the dataset. The frequency of any particular value is simply the number of times that value occurs in the data set. Hence, a histogram is also said to represent the ‘frequency distribution’ of a variable. On the horizontal axis we usually take the class intervals, and on the vertical axis we place the frequency. We construct class intervals in such a way that each observation is contained in exactly one interval.
For instance, the following are the economics marks of 20 college students:
86, 57, 69, 64, 67, 59, 81, 34, 47, 46, 38, 51, 66, 91, 42, 73, 62, 70, 77, 55
To construct a histogram, we divide the dataset into class intervals, each 10 marks wide. Next, we count the number of observations that fall within each interval:
Marks Frequency
30-40 2
40-50 3
50-60 4
60-70 5
70-80 3
80-90 2
90-100 1
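The counting step can be sketched in Python as follows. It assumes, as the table does, that each class includes its lower limit and excludes its upper limit (so a mark of 70 is counted in 70-80); the helper name `frequency_table` is illustrative.

```python
economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]

def frequency_table(data, start, width, n_classes):
    """Count observations in [start + k*width, start + (k+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for x in data:
        k = (x - start) // width          # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return {f"{start + k * width}-{start + (k + 1) * width}": c
            for k, c in enumerate(counts)}

table = frequency_table(economics_marks, start=30, width=10, n_classes=7)
for interval, freq in table.items():
    print(interval, freq)
```

Because every mark falls in exactly one class, the frequencies add up to 20, the number of students.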
The following dot plot displays data with a strong concentration of values in the center and only a few values on either side.
When a small number of equal-width class intervals is used, almost all observations may fall into just one or two of the classes; when a large number of equal-width intervals is used, many classes may have zero frequency. A wise choice is to use a few wider intervals near the extreme observations and narrower intervals where the observations are concentrated. To construct a histogram with unequal class widths, first determine the frequencies.
Then relative frequencies may be calculated by the following formula:
$$\text{Relative frequency} = \frac{\text{Number of times the value occurs}}{\text{Number of observations in the data set}}$$
This signifies the proportion of times the value occurs in the data.
The height of each bar can then be computed using the formula given by:
$$\text{Bar height} = \frac{\text{Relative frequency of class}}{\text{Class width}}$$
The resulting rectangle or bar heights are typically called densities.
Such a histogram has an interesting property. If we multiply the bar height by the class width, we get
$$\text{Bar height} \times \text{Class width} = \text{Relative frequency of class} = \text{Area of rectangle}$$
This means that the area of each rectangle or bar represents the relative frequency of the corresponding class interval. Moreover, the relative frequencies must sum to one, and hence the total area of all the rectangles in such a density histogram is one.
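The density calculation can be sketched in Python with the economics marks. The unequal class boundaries below (wide classes in the tails, narrow ones in the middle) are a hypothetical choice made only for illustration, and `Fraction` is used so the areas add up to exactly one without rounding error.

```python
from fractions import Fraction

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]

# Hypothetical unequal classes: [30, 50), [50, 60), [60, 70), [70, 100).
edges = [30, 50, 60, 70, 100]

def density_histogram(data, edges):
    """Return (low, high, frequency, relative frequency, density) per class."""
    n = len(data)
    rows = []
    for low, high in zip(edges, edges[1:]):
        freq = sum(1 for x in data if low <= x < high)
        rel_freq = Fraction(freq, n)           # proportion of observations
        density = rel_freq / (high - low)      # bar height = rel. freq / width
        rows.append((low, high, freq, rel_freq, density))
    return rows

rows = density_histogram(economics_marks, edges)
for low, high, freq, rel_freq, density in rows:
    print(f"{low}-{high}: f={freq}, rel={float(rel_freq):.2f}, height={float(density):.4f}")

# Area of each bar = height * width = relative frequency, so the areas sum to 1.
total_area = sum(density * (high - low) for low, high, _, _, density in rows)
print("total area:", total_area)
```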
SHAPES OF HISTOGRAMS
The histogram in the first example is fairly symmetric, or bell-shaped. This means that if we place a mirror at the exact center of the distribution, the left and right sides of the distribution will have the same shape. A symmetric, bell-shaped distribution like this one resembles the ‘normal distribution,’ although not every symmetric distribution is normal.
However, this is not always the case. Asymmetric distributions are called skewed. There are two types of skewed distributions: right-skewed and left-skewed. When the dataset has a greater number of observations on the left side of the mean, with a longer tail stretching to the right, it is called a right or positively skewed distribution. Conversely, when the dataset has a greater number of observations on the right side of the mean, with a longer tail stretching to the left, it is called a left or negatively skewed distribution.
Now consider the mathematics marks of the same 20 students:
45, 58, 51, 54, 49, 59, 88, 44, 49, 41, 48, 61, 66, 93, 40, 64, 69, 77, 72, 51
Follow the same steps to create intervals:
Marks Frequency
40-50 7
50-60 5
60-70 4
70-80 2
80-90 1
90-100 1
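The frequencies, and the claim that this data set has more observations to the left of the mean, can be checked with a short Python sketch using the standard `statistics` module. Each class is again assumed to include its lower limit.

```python
from statistics import mean, median

math_marks = [45, 58, 51, 54, 49, 59, 88, 44, 49, 41,
              48, 61, 66, 93, 40, 64, 69, 77, 72, 51]

# Frequencies of the 10-mark classes; each class includes its lower limit.
freq = {f"{low}-{low + 10}": sum(1 for x in math_marks if low <= x < low + 10)
        for low in range(40, 100, 10)}

m = mean(math_marks)
below = sum(1 for x in math_marks if x < m)   # observations left of the mean
above = sum(1 for x in math_marks if x > m)   # observations right of the mean

print(freq)
print("mean:", m, "median:", median(math_marks))
print("below mean:", below, "above mean:", above)
```

Here the mean (58.95) exceeds the median (56), and 11 of the 20 marks lie below the mean, which is consistent with the right (positive) skew described above.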
The histogram looks like this:
Marks Frequency
40-50 1
50-60 2
60-70 3
70-80 3
80-90 5
90-100 6
A bimodal histogram arises when the data set consists of observations on two very dissimilar kinds of individuals or objects. Let us understand this with an example. A restaurant experiences high footfall at lunchtime and at dinner time. Hence, if a researcher collects data on the number of customers entering the restaurant at different times of the day, the histogram will display two peaks. A bimodal histogram with hypothetical numbers would look something like this:
Figure 10: Bimodal histogram of the number of customers entering a restaurant during a day
Similarly, multimodal histograms have more than two peaks. Such a graph suggests that the data may come from several groups with different patterns. Suppose a researcher has data on the heights of plants belonging to 3 different species: one species has tall plants, another has medium-height plants and the third has short plants. She creates a histogram to find the pattern in her data. The following figure illustrates the histogram she created:
This is clearly a multimodal histogram, where each peak represents the most common height of one species of plant.
Although histograms are a useful way to present data, one major drawback is that the reader cannot identify the individual data values simply by looking at the graph.
IN-TEXT QUESTIONS
11. Choose the correct option from the bracket and fill in the blank:
When a dataset has a greater number of observations on the left side of the mean, it is called a _______ (right/ left) skewed distribution.
2.7 SUMMARY
In this lesson, some of the most popular ways to display data, as a part of descriptive statistics, were discussed. In a stem and leaf plot, the data are divided into two columns: stem and leaf. The plot gives a visual understanding of the distribution of the observations. Dot plots use a dot to represent each unit of observation, and the dots are stacked one over another to show the frequency of each value in the dataset. Bar charts use horizontal or vertical bars to depict the observations in the dataset, and can further be of two types: stacked bar charts and grouped bar charts. Histograms are among the most widely used graphs in statistics; they use adjacent vertical bars to depict the frequency of observations in each class interval. When the dataset has a greater number of observations on the left side of the mean, it is called a right or positively skewed distribution, whereas when the dataset has a greater number of observations on the right side of the mean, it is called a left or negatively skewed distribution. A histogram with a single peak is known as a unimodal histogram, a histogram with two peaks is referred to as a bimodal histogram, and histograms with more than two peaks are referred to as multimodal histograms.
2.8 GLOSSARY
• Bar Charts : A graph that displays data through vertical or horizontal rectangular bars, with the height (or length) of each bar representing the value of the observation
• Dot Plots : A plot illustrating the frequency of observations through dots on a simple scale
• Histograms : A graph that shows the frequency of data using adjacent rectangular bars, with the height of each bar representing the frequency of observations lying in that range
• Negative Skewness : When the dataset has a greater number of observations on the right side of the mean
• Positive Skewness : When the dataset has a greater number of observations on the left side of the mean
• Stem and Leaf Plot : A plot where each observation in the data is split into two columns: stem (the leading digit or digits) and leaf (the last digit).
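A stem and leaf display of this kind can be sketched in Python as below, applied to the economics marks from the histogram example. It assumes non-negative whole numbers and takes the units digit as the leaf; the helper name `stem_and_leaf` is illustrative. The same function handles three-digit values such as 128 (stem 12, leaf 8).

```python
def stem_and_leaf(data):
    """Map each stem (value // 10) to its sorted leaves (units digit).

    Assumes non-negative whole numbers.
    """
    stems = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        stems.setdefault(stem, []).append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]
plot = stem_and_leaf(economics_marks)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```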
Q.1 The following are the weights of individuals who have taken membership of two local gyms in a city:
Gym 1: 94 90 95 93 128 95 125 91 104 116 162 102 90 110 92 113 116 90 97 103 95 120 109
91 138
Gym 2: 123 116 90 158 122 119 125 90 96 94 137 102 105 106 95 125 122 103 96 111 81 113
128 93 92
Create stem and leaf plots for both gyms and interpret the plots.
Q.2 The following table summarizes the data collected by a researcher on the time taken by 40 respondents to eat breakfast:
Minutes: 0 1 2 3 4 5 6 7 8 9 10 11 12
People: 6 2 3 5 2 5 0 0 2 3 7 4 1
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 87 93 96 99 97 92
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 82 97 88 71 79 95
In which subject/subjects was the difference in the marks between Honey and Hazel highest?
Q.4 Mr. Kapoor owns a garden with 30 cherry trees. The heights of the trees, in inches, are:
61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 67.5, 73.5, 74, 74.5, 76, 76.2, 76.5, 77, 77.5,
78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87
A. Create a histogram of the above data by creating intervals of 5 inches. Comment on the
shape of the graph.
B. Recently Mrs. Kapoor bought 10 more cherry plants to propagate them in their garden.
Their heights in inches are: 57.5, 66, 40.5, 59, 46, 69.5, 67, 51.5, 52, 62. Incorporate
this additional information and create a new histogram for all 40 cherry trees.
Comment on and compare the shapes of both the histograms.
Frequency 35 25 45 15 20 40
Number of particles   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Frequency             1   2   3  12  11  15  18  10  12   4   5   3   1   2   1
a. What proportion of the sampled wafers had at least one particle? At least five
particles?
b. What proportion of the sampled wafers had between five and ten particles,
inclusive? Strictly between five and ten particles?
c. Draw a histogram using relative frequency on the vertical axis. How would you
describe the shape of the histogram?
LESSON 3
STRUCTURE
Continuing with our discussion of descriptive statistics, we now move on to its core elements, also known as summary statistics. Recall that under descriptive statistics we quantitatively present an overview of the data. Descriptive statistics can be divided into two sub-groups:
a. Measures of central tendency – Measures of central tendency are measurements that give us a typical or central value of the data, around which the data are generally clustered. These measures are also referred to as averages. In this course, we will concentrate on the three major measures of central tendency: mean, median and mode. We will also briefly study quartiles, percentiles, deciles and the trimmed mean.
b. Measures of variability – Measures of variability represent the spread of the data around the averages, that is, how much the data vary. The most widely used measures of variability are the range, standard deviation and variance.
Let us now look at the two in detail.
3.3.1 Mean
The most common measure of central tendency is the arithmetic mean, or simply the mean, of the data set. You have probably studied the mean at some point in your mathematics course. The mean is the average value of the dataset, calculated by adding the values of all the observations and dividing the sum by the total number of observations, i.e.,
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\mu$ denotes the mean of $N$ observations and $\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \cdots + x_N$.
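The formula translates directly into code: add all the observations and divide by how many there are. A minimal Python sketch, applied to the economics marks from the previous lesson (the function name `arithmetic_mean` is illustrative); the same calculation gives $\mu$ for population data or $\bar{x}$ for sample data, only the interpretation differs.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]
print(arithmetic_mean(economics_marks))  # 1235 / 20 = 61.75
```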
Recall from the previous lesson that the population mean (a parameter) is denoted by $\mu$ and the sample mean (a statistic) is denoted by $\bar{x}$.
Let us calculate the mean of the following 5 numbers to understand the formula better:
Note that here we denote the mean by $\bar{x}$, since it is a sample mean, calculated from sample data.
The mean is a useful measure of central tendency which gives the reader an approximate idea of the typical value in a dataset. Its usefulness is clearest when we have very large datasets. Say a researcher gathers the prices of about 1,000 houses in a locality, ranging from Rs. 45 lakh to Rs. 1.9 crore. If the researcher calculates the average price of the houses as Rs. 1.2 crore, then we can interpret that a typical house in that locality is worth about Rs. 1.2 crore. So, the mean is a convenient way to locate the average value of the dataset.
Before we move on to the median, we should note some pros and cons of using the mean. The arithmetic mean is the simplest measure of central tendency and is rigidly defined. The mean is not affected by the order of the data, i.e., the data need not be arranged in ascending or descending order. Every observation in the dataset is used to calculate the mean, which ensures that there is no loss of information. Finally, the arithmetic mean is capable of further mathematical treatment: if we know the sizes and means of two groups of data, we can easily get their combined mean. However, the mean also suffers from some limitations. First, the mean is affected by extreme observations: a few extreme values can pull the mean so far that it is no longer an accurate representation of the data. Second, the mean cannot be calculated if even one observation is missing. Third, we cannot determine the mean by merely glancing at the dataset; it needs to be calculated each time using the formula. Finally, the mean cannot be calculated for distributions with open-ended class intervals.
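The "further mathematical treatment" point can be made concrete. For two groups of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$, the combined mean is $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$. The Python sketch below checks this on two small hypothetical groups; `combined_mean` is an illustrative name.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups pooled together, from their sizes and means only."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Hypothetical groups, used only to check the formula.
group_1 = [4, 5, 6]     # mean 5
group_2 = [10, 20]      # mean 15
m1 = sum(group_1) / len(group_1)
m2 = sum(group_2) / len(group_2)

pooled = combined_mean(len(group_1), m1, len(group_2), m2)
direct = sum(group_1 + group_2) / len(group_1 + group_2)
print(pooled, direct)   # both 9.0
```

Note that simply averaging the two means, (5 + 15) / 2 = 10, would be wrong here because the groups have different sizes.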
Sometimes, when collecting data, we may come across extremely large or extremely small values that were either incorrectly stated by the respondent or incorrectly recorded by the researcher. In such cases, the mean gets pulled towards these extreme values. This is one of the major limitations of using the mean as a central value.
The following example makes the argument clear. Suppose that the salaries of 10 employees in a firm, in lakhs per annum, are as given below:
When the data contain extreme values or outliers, we prefer the median to the mean. We will see why and how in the subsequent section.
IN-TEXT QUESTIONS
Let us consider the same example of salaries of 10 employees in a firm, in lakhs per annum:
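$$\tilde{\mu} = \text{average of the } \left(\tfrac{10}{2}\right)^{\text{th}} \text{ and } \left(\tfrac{10}{2} + 1\right)^{\text{th}} \text{ ordered values}$$
$$= \text{average of the 5th and 6th ordered values} = \text{average of } 3.7 \text{ and } 4.2 = \frac{3.7 + 4.2}{2} = 3.95$$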
So, we can conclude that the median salary in the firm is Rs. 3.95 lakh per annum.
Note here that if we had salaries of only 9 employees, then, since 9 is odd, the median would simply be the middle observation, i.e., the $\left(\frac{N+1}{2}\right)^{\text{th}}$ observation. This means that the median would have been the 5th observation, i.e., 3.7. We would then have concluded that the median salary in the firm is Rs. 3.7 lakh per annum.
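The two position rules (odd and even number of observations) can be written as one small Python function. This is a sketch of the rules in the text, applied to the restaurant A delivery times and the economics marks used elsewhere in these lessons; `median_by_rule` is an illustrative name, and Python's own `statistics.median` follows the same rules.

```python
def median_by_rule(values):
    """Median using the ordered-position rules described in the text."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th ordered value; list indices start at 0.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and (n/2 + 1)th ordered values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

restaurant_a = [42, 50, 47, 43, 52, 55, 40]           # 7 values, odd
economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]   # 20 values, even
print(median_by_rule(restaurant_a))      # the 4th ordered value
print(median_by_rule(economics_marks))   # average of the 10th and 11th
```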
It is worth noting that even if we increase the values of the extreme observations from 15 and 16.5 to, say, 40 and 50, we still get the same median. Because the median is not affected by extreme values in a dataset, it remains representative of a typical observation in the sample. Hence, when the data contain extreme observations, the median is a better measure of central tendency than the mean.
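The same robustness can be demonstrated in Python on the economics marks. The "recording error" below, where the top mark 91 is entered as 191, is hypothetical and serves only to show how differently the two measures react.

```python
from statistics import mean, median

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]
# Hypothetical recording error: the top mark 91 entered as 191.
with_error = [191 if x == 91 else x for x in economics_marks]

print("mean:  ", mean(economics_marks), "->", mean(with_error))
print("median:", median(economics_marks), "->", median(with_error))
```

The single wrong entry shifts the mean by 5 marks (from 61.75 to 66.75), while the median stays at 63 because the middle two ordered values are unchanged.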
The median is quite useful as a measure of central tendency since it is easy to understand and calculate and, in some cases, it can be located simply by looking at the data. In this respect it is better than the mean, since it is not affected by extreme values in the dataset. It can also be calculated for open-ended distributions. However, a limitation of the median is that the data must first be arranged in either ascending or descending order; if we have a very large dataset, arranging the data may be time-consuming. The median is not based on all the observations, so it may not be representative of the whole dataset. Finally, it is not capable of further mathematical treatment.
IN-TEXT QUESTIONS
5. The imports of electronic products, in million dollars, of a country over eight years were recorded as 27.4, 16.6, 1.7, 14.1, 32.9, 18.7, 3.8, 22.5. The median import of the country is ____.
6. The following numbers are arranged in ascending order:
15 , 𝑥 , 22 , 𝑥 + 7, 32 , 56 , 88
If the median of the data is 25, then the value of x will be:
A. 17 B. 25
C. 18 D. 19
7. The runs scored in a cricket match by 11 players are as follows:
7, 16, 167, 41, 110, 57, 1, 16, 9, 0, 16
3.3.3 Mode
In simple terms, the mode is the value in the dataset that appears most frequently. Like the median, the mode is not affected by extreme observations. The mode is easy to understand and calculate and, in some cases, its value can be located simply by looking at the data. The mode can also be calculated for open-ended distributions.
For example, the following are a person’s daily expenditures, in Rs., on lunch in a week:
First take the following 16 observations and try to calculate the mean, median and mode by
yourself:
4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10
Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{4 + 5 + 6 + 6 + 6 + 7 + 7 + 7 + 7 + 7 + 7 + 8 + 8 + 8 + 9 + 10}{16} = \frac{112}{16} = 7$$
We have mean equal to 7.
Since we have an even number (16) of observations, for the median we use the following formula:
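$$\tilde{x} = \text{average of the } \left(\tfrac{n}{2}\right)^{\text{th}} \text{ and } \left(\tfrac{n}{2} + 1\right)^{\text{th}} \text{ ordered values}$$
$$= \text{average of the } \left(\tfrac{16}{2}\right)^{\text{th}} \text{ and } \left(\tfrac{16}{2} + 1\right)^{\text{th}} \text{ ordered values}$$
$$= \text{average of the 8th and 9th ordered values} = \frac{7 + 7}{2} = 7$$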
Hence, we get median equal to 7.
Finally, we can look at the data and can infer that the mode is 7 since it occurs the maximum
number of times in the dataset.
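All three measures for these 16 observations can be verified with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). This is simply a check of the hand calculation above.

```python
from statistics import mean, median, mode, multimode

observations = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]

print("mean:", mean(observations))
print("median:", median(observations))
print("mode:", mode(observations))
# multimode() lists every most-frequent value, which matters when there are ties.
print("all modes:", multimode(observations))
```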
In this example, it is evident that Mean = Median = Mode. We can look at the central tendencies
using a histogram as well:
Figure 1: Histogram of symmetric data
However, when the data are asymmetrical, or skewed, the three measures are generally not equal. We will learn about skewness in detail later in the lesson.
When the data are moderately skewed, the relationship between the mean, median and mode is approximately described by the following empirical formula:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
IN-TEXT QUESTIONS
8. A researcher records the age of each participant in her study. The ages are:
21, 59, 62, 21, 66, 28, 66, 48, 79, 59, 28, 62, 63, 63, 48, 66, 59, 66, 48, 79, 19, 79
The mode of the above data is:
A. 66 B. 59
C. 79 D. 62
9. A researcher has computed the mean of her data as 22.5 and median as 20. Calculate
the value of mode using these values. Is the distribution symmetrical?
10. A researcher calculates the following values of the median and mode of a distribution.
Median = 17.5
Mode = 20.5
A. Calculate Mean.
B. Do these values represent a symmetrical distribution?
3.3.5 Other measures of central tendency: Quartiles, percentiles, deciles and trimmed mean
Mean, median and mode are not the only measures of central tendency. There are several others as well. We will briefly introduce the concepts of quartiles, percentiles, deciles and the trimmed mean.
Just as the median divides the dataset into two equal halves, quartiles divide the data set into 4 equal parts. There are 3 quartiles, which split the data into four parts, each containing 25% of the observations. The first quartile, Q1, below which the first 25% of observations lie, is known as the lower quartile. The second quartile, Q2, is the median, which divides the dataset into two halves. The third quartile, Q3, is known as the upper quartile: 75% of observations lie below it and 25% of the observations are greater than it.
To calculate quartiles, we first arrange the observations in ascending or descending order. We
can then find the value of each quartile by using the following formulae:
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation}$$
$$Q_2 = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} = \text{Median}$$
$$Q_3 = \left(3 \times \frac{n+1}{4}\right)^{\text{th}} \text{ observation}$$
Consider the following dataset to understand the quartiles better:
0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77.
Since there is an odd number (11) of observations, you can easily identify the median in the above dataset: it is the 6th observation, 10. This is our second quartile, Q2. Now, as per the formula for the first quartile,
$$Q_1 = \left(\frac{11+1}{4}\right)^{\text{th}} \text{ observation} = \left(\frac{12}{4}\right)^{\text{th}} \text{ observation} = 3^{\text{rd}} \text{ observation} = 5$$
Similarly, for the third quartile,
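$$Q_3 = \left(3 \times \frac{n+1}{4}\right)^{\text{th}} \text{ observation} = \left(3 \times \frac{11+1}{4}\right)^{\text{th}} \text{ observation} = \left(3 \times \frac{12}{4}\right)^{\text{th}} \text{ observation} = 9^{\text{th}} \text{ observation} = 35$$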
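The position formulas can be sketched in Python as follows. When $k(n+1)/4$ is not a whole number, this sketch interpolates linearly between the two neighbouring ordered values; that is an assumption, since the worked example only uses whole-number positions. Python's `statistics.quantiles`, with its default `method='exclusive'`, uses the same (n + 1)-based positions, so it is printed alongside as a cross-check.

```python
from statistics import quantiles

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) from the (k(n + 1)/4)th ordered position.

    Non-integer positions are interpolated linearly between neighbours
    (an assumption; the worked example uses whole-number positions only).
    """
    x = sorted(values)
    n = len(x)
    pos = k * (n + 1) / 4            # 1-based position in the ordered data
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    lower = int(pos)                 # whole part of the position
    frac = pos - lower               # fractional part
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

data = [0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77]
print([quartile(data, k) for k in (1, 2, 3)])   # Q1, Q2, Q3
print(quantiles(data, n=4))                      # stdlib cross-check
```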
For grouped data, a quartile can be calculated using the following formula:
$$Q_j = L + \frac{\frac{jN}{4} - Pcf}{f} \times i \quad \text{for } j = 1, 2, 3.$$
Here, L = lower limit of the quartile class, N = total frequency, Pcf = cumulative frequency preceding the quartile class, f = frequency of the quartile class and i = size (width) of the quartile class.
We can easily use the above formula to obtain each quartile in the following manner:
$$Q_1 = L + \frac{\frac{N}{4} - Pcf}{f} \times i$$
$$Q_2 = L + \frac{\frac{N}{2} - Pcf}{f} \times i$$
and,
$$Q_3 = L + \frac{\frac{3N}{4} - Pcf}{f} \times i$$
Percentiles, on the other hand, denote the observation below which a particular percentage of observations fall. For instance, the 90th percentile is the observation in the dataset below which 90% of the observations fall. The percentile of an observation ‘x’, which lies on a scale from 0 to 100, can be calculated by the following formula:
$$\text{Percentile} = \frac{\text{Number of values below } x}{\text{Total number of values}} \times 100$$
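The percentile formula counts how many values lie strictly below x. A minimal Python sketch on the economics marks (the name `percentile_of` is illustrative):

```python
def percentile_of(values, x):
    """Percentage of observations strictly below x, per the formula above."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]
print(percentile_of(economics_marks, 70))   # 14 of 20 marks are below 70 -> 70.0
```

A mark of 70 therefore sits at the 70th percentile of this class: 70% of the students scored less.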
For grouped data, a percentile can be calculated by using the following formula:
$$P_j = L + \frac{\frac{jN}{100} - Pcf}{f} \times i \quad \text{for } j = 1, 2, 3, \ldots, 99.$$
Similarly, deciles divide a dataset into 10 equal parts, whereas percentiles divide a dataset into 100 parts. Following the same approach, we can derive a similar formula for computing a decile:
$$D_j = L + \frac{\frac{jN}{10} - Pcf}{f} \times i \quad \text{for } j = 1, 2, 3, \ldots, 9.$$
Here the symbols have the same meaning as in the quartile formula. Note that there are ninety-nine percentiles ($P_1, P_2, \ldots, P_{99}$) and nine deciles ($D_1, D_2, \ldots, D_9$). For both percentiles and deciles, the middle values, $P_{50}$ and $D_5$, represent the median.
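All three grouped-data formulas share one pattern, $L + \frac{pN - Pcf}{f} \times i$ with $p = j/4$, $j/10$ or $j/100$. The Python sketch below implements that pattern and applies it to the economics-marks frequency table from the previous lesson. It assumes equal class widths and takes the quantile class as the first class whose cumulative frequency reaches $pN$; like the formula itself, it treats observations as spread evenly within each class. The name `grouped_quantile` is illustrative.

```python
from fractions import Fraction

def grouped_quantile(lower_limits, width, freqs, fraction):
    """L + ((fraction * N - Pcf) / f) * i for equal-width classes.

    fraction is j/4 for Q_j, j/10 for D_j and j/100 for P_j.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    target = fraction * sum(freqs)            # position fraction * N
    pcf = 0                                   # preceding cumulative frequency
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and pcf + f >= target:       # first class reaching the target
            return lower + (target - pcf) / f * width
        pcf += f
    raise ValueError("no class reaches the target cumulative frequency")

lower_limits = [30, 40, 50, 60, 70, 80, 90]   # classes 30-40, ..., 90-100
freqs = [2, 3, 4, 5, 3, 2, 1]                  # economics marks table

q1 = grouped_quantile(lower_limits, 10, freqs, Fraction(1, 4))
q2 = grouped_quantile(lower_limits, 10, freqs, Fraction(2, 4))
q3 = grouped_quantile(lower_limits, 10, freqs, Fraction(3, 4))
d5 = grouped_quantile(lower_limits, 10, freqs, Fraction(5, 10))
p90 = grouped_quantile(lower_limits, 10, freqs, Fraction(90, 100))
print(q1, q2, float(q3), d5, p90)
```

As expected, $Q_2$ and $D_5$ coincide (both 62), since both are the median.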
Since the mean is sensitive to extreme observations, we can use the trimmed mean, which eliminates the extreme observations from the calculation. Such a measure is less affected by outliers than the ordinary mean. For instance, to compute a 5% trimmed mean, we eliminate the smallest 5% and the largest 5% of the sample observations and then calculate the mean of the remaining observations in the regular way.
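A trimmed mean can be sketched in Python as below, using the economics marks. The sketch removes floor(n × percent / 100) observations from each end; that rounding rule is an assumption for cases where the count is not a whole number (some texts interpolate instead). The name `trimmed_mean` is illustrative.

```python
def trimmed_mean(values, percent):
    """Mean after dropping the smallest and largest `percent`% of observations.

    The number trimmed from each end is floor(n * percent / 100)
    (an assumption for cases where this is not a whole number).
    """
    x = sorted(values)
    k = len(x) * percent // 100          # observations removed from each end
    kept = x[k:len(x) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]
print(trimmed_mean(economics_marks, 0))    # ordinary mean
print(trimmed_mean(economics_marks, 10))   # drops 34, 38 and 86, 91
```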
IN-TEXT QUESTIONS
11. For the following data set, the value of upper quartile is _______.
18, 30, 32, 39, 54, 57, 61, 62, 81, 88, 90
12. 2nd Quartile = 5th Decile = 50th Percentile =
A. Mode B. Median
C. Mean D. Trimmed mean
Reporting the central values of a dataset provides only partial information about the entire data. Two datasets may have similar measures of central tendency but differ in the spread of their values. The logic will become clear through the following example.
Consider two restaurants selling pizzas that are located in the same town. A researcher collected
data on the delivery time of both the restaurants, in minutes, for a week, as given below:
For restaurant A,
$$\bar{x}_A = \frac{42 + 50 + 47 + 43 + 52 + 55 + 40}{7} = \frac{329}{7} = 47$$
For restaurant B,
$$\bar{x}_B = \frac{47 + 32 + 70 + 55 + 65 + 35 + 25}{7} = \frac{329}{7} = 47$$
Check for yourself whether the median for both restaurants is also equal to 47. Moving on, we can conclude that the average delivery time of both restaurants is 47 minutes. You may also try to create a dot plot of the two datasets. It will look something like this:
Now, since both the datasets have the same central value, can we claim that both the datasets
convey the same information? No.
If you observe the values in each dataset carefully, you will notice that the delivery time of restaurant A ranges between 40 and 55 minutes, whereas the delivery time of restaurant B ranges between 25 and 70 minutes. In the dot plots too, you may observe that the data points of restaurant A are clustered together, whereas the data points of restaurant B are spread out.
What can you infer from this extra piece of information? Restaurant A is more consistent, delivering pizzas within 40 to 55 minutes, that is, within a narrower time window. The time taken by restaurant B to deliver a pizza is subject to more variation, that is, its delivery window is wider. Knowledge about such variation is important when we have to make decisions. In this example, say you are starving, but you have just entered an Economics class that will finish in exactly 45 minutes. If you have to decide now to order a pizza, which restaurant would you prefer? You should prefer restaurant A: it delivers in 40 to 55 minutes, close to the end of the class, whereas restaurant B may deliver in as little as 25 minutes, 20 minutes before the class ends, and you are sure that the professor will not finish the class, nor will you be able to leave, that early.
$$\text{Range} = H - L$$
where H is the highest value and L is the lowest value of a dataset. In the above example, the range for restaurant A = 55 − 40 = 15, whereas the range for restaurant B = 70 − 25 = 45. This clearly indicates that there is more variability in the data points of restaurant B than in those of restaurant A.
The concept of range is extensively used in statistical quality control. Range is helpful in studying the variation in the prices of shares, debentures and other commodities that are very sensitive to price changes from one period to another. Range is also a useful indicator for the meteorological department in weather forecasting.
The relative measure corresponding to range, called coefficient of range, is obtained by the
formula:
$$\text{Coefficient of Range} = \frac{H - L}{H + L}$$
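Both the range and the coefficient of range for the two restaurants can be computed with a short Python sketch. The function names are illustrative; `value_range` is used to avoid shadowing Python's built-in `range`.

```python
restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]

def value_range(values):
    """Range = H - L."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """(H - L) / (H + L), a unit-free version of the range."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)

for name, data in (("A", restaurant_a), ("B", restaurant_b)):
    print(name, value_range(data), round(coefficient_of_range(data), 4))
```

Because the coefficient of range has no units, it can be used to compare variability across datasets measured in different units or on different scales.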
Although range is easy to understand and compute, its usefulness is limited, since it takes into consideration only the two extreme data points and simply ignores all the other observations. So, range is strongly affected by extreme values. Moreover, as the size of the dataset increases, range loses its relevance as a measure of variability.
A slightly better measure of variability than range is the inter-quartile range. Here, instead of
taking the difference between the extreme observations in the dataset, we take the difference
between the upper and lower quartile. It is calculated as Q3 – Q1. This is a better measure than
range since it takes into account only the middle 50% of the observations and the extreme
observations do not affect the measure.
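The inter-quartile range for the two restaurants can be sketched with Python's `statistics.quantiles`, whose default `method='exclusive'` places the quartiles at the same (n + 1)-based positions used earlier in this lesson. The function name `interquartile_range` is illustrative.

```python
from statistics import quantiles

restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]

def interquartile_range(values):
    """Q3 - Q1, with quartiles at the (n + 1)-based positions."""
    q1, _, q3 = quantiles(values, n=4)   # default method='exclusive'
    return q3 - q1

print(interquartile_range(restaurant_a), interquartile_range(restaurant_b))
```

With 7 observations, Q1 and Q3 are the 2nd and 6th ordered values, giving 52 − 42 = 10 for restaurant A and 65 − 32 = 33 for restaurant B; the ranking of the two restaurants agrees with the range.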
3.4.2 Standard deviation and Variance
Standard deviation and variance are considered significantly better measures of variability, as they are based on all the observations in the dataset and hence reflect the variation of every data point, unlike the range and inter-quartile range. Since the standard deviation is simply the square root of the variance, we will begin our discussion with variance.
As the name suggests, variance illustrates the variation in a dataset, that is, how far each data point is from the average. Formally, variance is calculated by dividing the sum of the squared deviations from the mean by (n − 1), where n denotes the sample size. We symbolize sample variance by s², and the formula can be written as:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
Here, the term $\sum_{i=1}^{n}(x_i - \bar{x})^2$ represents the sum of the squared deviations of each data point from the sample mean $\bar{x}$. To make this simpler, we break its computation into three steps:
Step 1: Subtract the mean from each observation in the dataset
Step 2: Square each difference obtained in Step 1
Step 3: Add all the values you get from step 2, i.e., the squares of differences
Note that we add up the squares of the deviations (differences) instead of the deviations themselves, because the deviations from the mean always add up to zero. To understand this better, consider again the example of the two restaurants selling pizzas. For the delivery times of restaurant A (in minutes), the deviations from the mean are:
$$\sum_{i}(x_i - \bar{x}) = -5 + 3 + 0 - 4 + 5 + 8 - 7 = 0$$
Since the positive and negative deviations from the mean cancel each other out, we consider
the sum of squared deviations from the mean. In this way we can eliminate all the negative
values. So, in our example,
$$
\begin{aligned}
(x_1 - \bar{x})^2 &= (-5)^2 = 25\\
(x_2 - \bar{x})^2 &= 3^2 = 9\\
(x_3 - \bar{x})^2 &= 0^2 = 0\\
(x_4 - \bar{x})^2 &= (-4)^2 = 16\\
(x_5 - \bar{x})^2 &= 5^2 = 25\\
(x_6 - \bar{x})^2 &= 8^2 = 64\\
(x_7 - \bar{x})^2 &= (-7)^2 = 49\\
\sum_{i}(x_i - \bar{x})^2 &= 25 + 9 + 0 + 16 + 25 + 64 + 49 = 188
\end{aligned}
$$
Since there are n = 7 observations, the sample variance is
$$s^2 = \frac{188}{7 - 1} \approx 31.33$$
and the standard deviation is its square root:
$$s = \sqrt{s^2} \approx 5.60 \text{ minutes}$$
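The three steps, followed by the division by (n − 1), can be traced in a short Python sketch. It works directly from the seven deviations listed above, so it does not need the raw delivery times.

```python
import math

# Deviations (x_i - mean) of restaurant A's delivery times, as listed in the text
deviations = [-5, 3, 0, -4, 5, 8, -7]
n = len(deviations)

print(sum(deviations))                  # raw deviations cancel out: 0

squared = [d ** 2 for d in deviations]  # Step 2: square each deviation
ss = sum(squared)                       # Step 3: sum of squared deviations
s2 = ss / (n - 1)                       # sample variance divides by n - 1
s = math.sqrt(s2)                       # standard deviation

print(ss, round(s2, 2), round(s, 2))    # 188 31.33 5.6
```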
The population variance denotes the variability in the population, and the population standard deviation denotes the typical deviation of a population value from its population mean μ.
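The formal representation of the population variance and standard deviation is modified accordingly. In terms of the population parameters, for a population of N values, the population variance and standard deviation can be written as:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
$$\sigma = \sqrt{\sigma^2}$$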
Just as we use $\bar{x}$ to make inferences about μ, we use s² to make inferences about σ².
Note that for the population variance we divide the sum of squared deviations from the mean by N and not by N − 1, whereas the sample variance s² divides by n − 1 because it is based on n − 1 degrees of freedom. Degrees of freedom refer to the maximum number of independent values in a sample that are free to vary. In other words, if we fix $\bar{x}$, then we need to determine only (n − 1) of the elements in the sample in order to know the nth element of the sample.
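Python's standard `statistics` module provides both versions: `statistics.variance` divides by n − 1 (sample variance) and `statistics.pvariance` divides by N (population variance). The sketch below applies them to the five observations 3, 9, 12, 18, 23 used in Section 3.5.1 and then illustrates the degrees-of-freedom idea: once the mean is fixed, the last observation follows from the other n − 1.

```python
import statistics

data = [3, 9, 12, 18, 23]
n = len(data)
mean = statistics.mean(data)

sample_var = statistics.variance(data)       # sum of squared deviations / (n - 1)
population_var = statistics.pvariance(data)  # sum of squared deviations / N
print(mean, sample_var, population_var)      # 13 60.5 48.4

# Degrees of freedom: with the mean fixed, the first n - 1 values determine the last.
first_values = data[:-1]
last_value = n * mean - sum(first_values)
print(last_value)                            # 23
```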
Coefficient of Variation (C.V.)
A very popular and frequently used relative measure of variation is the coefficient of variation, denoted by C.V. It is simply the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:
$$\text{Coefficient of Variation} = \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$
The smaller the C.V., the less variable (or more consistent) the data are said to be.
Consider four regions whose mean daily sales are 82, 44, 70 and 60 (Regions 1 to 4), with standard deviations of 10.41, 5.85, 9.52 and 11.22, respectively.
To determine which region is most consistent in terms of daily sales, we can calculate the coefficient of variation for each region:
$$CV_1 = \frac{10.41}{82} \times 100 \approx 12.70$$
$$CV_2 = \frac{5.85}{44} \times 100 \approx 13.30$$
$$CV_3 = \frac{9.52}{70} \times 100 = 13.60$$
$$CV_4 = \frac{11.22}{60} \times 100 = 18.70$$
Since Region 1 has the smallest coefficient of variation (≈ 12.70), it is the most consistent region in terms of sales.
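The comparison can be automated with a few lines of Python. This is a minimal sketch using the four means and standard deviations from the calculation above; the function name `coefficient_of_variation` is our own.

```python
# Mean daily sales and standard deviation for each region, as used in the text
regions = {1: (82, 10.41), 2: (44, 5.85), 3: (70, 9.52), 4: (60, 11.22)}


def coefficient_of_variation(mean, sd):
    """C.V. = (standard deviation / mean) x 100, in per cent."""
    return sd / mean * 100


cv = {region: coefficient_of_variation(m, s) for region, (m, s) in regions.items()}
for region, value in cv.items():
    print(f"Region {region}: C.V. = {value:.2f}%")

# The most consistent region is the one with the smallest C.V.
most_consistent = min(cv, key=cv.get)
print("Most consistent region:", most_consistent)   # 1
```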
IN-TEXT QUESTIONS
13. If the standard deviation of a dataset is 0.012, the variance will be:
A) 0.144 B) 0.00144
C) 0.000144 D) 0.0000144
14. The variance of the first 10 whole numbers is _______.
15. In a class of 100 students, the mean mark on a particular exam was 75, and the standard deviation was 0. This implies that:
A) All students scored 75 marks B) Variance is 0.75
C) Standard deviation cannot be zero D) None of the above
16. If the mean of certain observations is given as 60 and the standard deviation is 12, then
the coefficient of variation is 20%. (True/False)
Now that we are clear about how to calculate the measures of central tendency and variation, we will examine the effects of a change in origin and a change in scale on the mean and the standard deviation/variance. It is important to learn about these effects because a researcher may find that the values in a dataset were recorded incorrectly, and recalculating the mean and standard deviation from scratch can become a lengthy task. To avoid such inefficiency, we will learn how the mean and variance respond when every value in the dataset is changed in the same way.
3.5.1 Change in origin
Change in origin can also be understood in simpler terms as shifting of data. It refers to a situation in which we add a constant to, or subtract a constant from, all the observations in our dataset. We will now see how this affects the mean and standard deviation. Let us understand this concept through a simple example.
Suppose we have the following 5 observations in our sample:
3, 9, 12, 18, 23
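For convenience, we will call these observations the ‘original dataset’. At this point, you can easily compute the mean and standard deviation of the original dataset:
$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3 + 9 + 12 + 18 + 23}{5} = \frac{65}{5} = 13$$
$$\text{Variance} = s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{100 + 16 + 1 + 25 + 100}{4} = \frac{242}{4} = 60.5$$
so the standard deviation is $s = \sqrt{60.5} \approx 7.78$.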
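Now add 5 to each observation, giving the new dataset 8, 14, 17, 23, 28. Its mean is (8 + 14 + 17 + 23 + 28)/5 = 90/5 = 18. Each deviation from the new mean is the same as before, so
$$\text{Variance} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1} = \frac{100 + 16 + 1 + 25 + 100}{4} = \frac{242}{4} = 60.5$$
and the standard deviation, √60.5 ≈ 7.78, is unchanged.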
Did you notice the pattern in the new estimates? Compared with the original dataset, the mean has increased by 5, whereas the standard deviation and variance remain unchanged. This pattern holds whatever constant we add to or subtract from our observations. Try repeating the exercise by subtracting 5 from each observation in the original dataset and calculating the mean and standard deviation of the new values. You will see that the mean falls by 5 and the standard deviation again remains the same. So, in general, we can conclude that:
If $y_i = x_i + a$ for every observation, then
$$\text{Mean}: \bar{y} = \bar{x} + a, \qquad \text{Standard deviation}: s_y = s_x$$
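A quick numerical check of this rule, as a minimal sketch using Python's standard `statistics` module on the original dataset 3, 9, 12, 18, 23 with a = 5:

```python
import statistics

original = [3, 9, 12, 18, 23]
a = 5
shifted_up = [x + a for x in original]    # y_i = x_i + a
shifted_down = [x - a for x in original]  # y_i = x_i - a

for label, values in [("original", original), ("+5", shifted_up), ("-5", shifted_down)]:
    # The mean moves by a; the standard deviation (7.78) does not change.
    print(label, statistics.mean(values), round(statistics.stdev(values), 2))
```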
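Now consider a change in scale, in which every observation is multiplied by a constant. Multiplying each value in the original dataset by 3 gives 9, 27, 36, 54, 69, with mean 195/5 = 39. Its variance is
$$\text{Variance} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1} = \frac{900 + 144 + 9 + 225 + 900}{4} = \frac{2178}{4} = 544.5$$
which is 3² × 60.5, so the standard deviation is √544.5 ≈ 23.33, three times the original standard deviation.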
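In general, if $y_i = b x_i$ for every observation, then
$$\text{Mean}: \bar{y} = b\bar{x}, \qquad \text{Standard deviation}: s_y = |b|\, s_x, \qquad \text{Variance}: s_y^2 = b^2 s_x^2$$
In conclusion, the mean is affected by both a change in origin and a change in scale, whereas the standard deviation is affected only by a change in scale.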
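The scale rule can be checked the same way. In this minimal sketch, b = 3 is an illustrative constant applied to the original dataset 3, 9, 12, 18, 23.

```python
import statistics

original = [3, 9, 12, 18, 23]
b = 3
scaled = [b * x for x in original]  # y_i = b * x_i  ->  9, 27, 36, 54, 69

print(statistics.mean(scaled), statistics.variance(scaled))                   # 39 544.5
print(b * statistics.mean(original), b ** 2 * statistics.variance(original))  # 39 544.5

# The standard deviation scales by |b|; the variance by b squared.
print(round(statistics.stdev(scaled), 2), round(abs(b) * statistics.stdev(original), 2))
```

Using `abs(b)` keeps the rule correct for negative constants as well, since a standard deviation cannot be negative.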
IN-TEXT QUESTIONS
17. Suppose the standard deviation of a dataset is 6. If each observation is divided by 3 then
the standard deviation of the new dataset will be:
A) 3 B) 2
C) 18 D) 9
18. A researcher measures the weight of 10 students. The mean weight she calculates is 57 kg. Later, she realizes that the weighing scale was under-reporting each weight and that she has to add 3 kg to the weight of each student. The new mean weight of the students will be _____.
19. If the standard deviation of 11, 21, 31…,71, 81, 91 is ‘K’, then the standard deviation
of 15, 25, 35…,75, 85, 95 will be:
A) K - 4 B) K +4
C) K D) 4K
3.6 SKEWNESS
The measures of central tendency and variation discussed above do not reveal all the characteristics of a given set of data. For example, two distributions may have the same mean, variance and standard deviation but differ widely in their shape and peakedness. A distribution is either symmetrical or not, and it may be flat, normal or peaked.
If the distribution of data is not symmetrical, it is called asymmetrical or skewed. Thus, skewness refers to the lack of symmetry in a distribution.
A simple method of detecting the direction of skewness is to look at the tails of the distribution. The rules are:
1. Data are symmetrical when there are no extreme values in a particular direction, so that low and high values balance each other. In this case, mean = median = mode, or $\bar{x}$ = Q2 = Mode.
2. If the longer tail is towards the lower values, or left-hand side, the skewness is negative. Negative skewness arises when the mean is pulled down by some very low values. Then we have mean < median < mode.
3. If the longer tail of the distribution is towards the higher values, or right-hand side, the skewness is positive. Positive skewness occurs when the mean is pulled up by some very high observations. In this case, mean > median > mode.
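Karl Pearson's coefficient of skewness is given by
$$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{x} - \text{Mode}}{\sigma}$$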
If the mode is not given, we can use the approximate relationship studied earlier in the lesson, i.e., Mode = 3 Median − 2 Mean. Substituting this into the formula above, the coefficient of skewness can be written as:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
Now, if
• SK = 0, the distribution is symmetrical.
• SK > 0, the distribution is positively skewed.
• SK < 0, the distribution is negatively skewed.
In practice, the value of SK normally lies between −1 and +1, i.e., −1 ≤ SK ≤ +1.
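Both forms of Pearson's coefficient are one-line formulas. The sketch below uses hypothetical summary statistics (mean 50, median 48, mode 44, standard deviation 10), chosen so that Mode = 3 Median − 2 Mean holds exactly and the two forms therefore agree. The function names are our own.

```python
def pearson_sk(mean, mode, sd):
    """Karl Pearson's coefficient of skewness: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_sk_median(mean, median, sd):
    """Median form, using Mode = 3 Median - 2 Mean: 3 (mean - median) / sd."""
    return 3 * (mean - median) / sd


def skew_direction(sk, tol=1e-9):
    """Apply the sign rules listed above."""
    if abs(sk) < tol:
        return "symmetrical"
    return "positively skewed" if sk > 0 else "negatively skewed"


# Hypothetical summary statistics; mode = 3 * 48 - 2 * 50 = 44
mean, median, mode, sd = 50, 48, 44, 10

sk_mode = pearson_sk(mean, mode, sd)
sk_median = pearson_sk_median(mean, median, sd)
print(sk_mode, sk_median, skew_direction(sk_mode))  # 0.6 0.6 positively skewed
```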
For open-ended distributions, or data with extreme values, where positional measures such as the median and quartiles are more suitable, Bowley's coefficient of skewness is used:
$$SK = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$
• SK = 0, the distribution is symmetrical.
• SK > 0, the distribution is positively skewed.
• SK < 0, the distribution is negatively skewed.
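Bowley's coefficient needs only the three quartiles. In this minimal sketch, the first set of quartiles is hypothetical; the second (29, 32, 35) is taken from the boxplot example in Section 3.7.

```python
def bowley_sk(q1, q2, q3):
    """Bowley's (quartile) coefficient of skewness: (Q1 + Q3 - 2 Q2) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 equals Q1, so the coefficient is undefined")
    return (q1 + q3 - 2 * q2) / (q3 - q1)


print(bowley_sk(20, 25, 40))   # 0.5: median closer to Q1, positively skewed
print(bowley_sk(29, 32, 35))   # 0.0: median midway between the quartiles, symmetrical
```

Because Q1 ≤ Q2 ≤ Q3, the numerator can never exceed the denominator in absolute value, so Bowley's coefficient always lies between −1 and +1.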
IN-TEXT QUESTIONS
20. For the frequency distribution of a variable x, mean = 32, median = 30 and mode = 26.
The distribution is:
A. Positively Skewed B. Negatively skewed
C. Symmetric D. None of the above
21. A researcher gathers data on the number of years of experience professors in a
university have. The mean, median, mode and standard deviation are 25, 24, 26 and 5,
respectively. Karl Pearson’s coefficient of skewness is _______ (0.20 / - 0.20).
3.7 BOXPLOTS
We will conclude this lesson by discussing boxplots. Boxplots are yet another type of graphical representation that is extremely informative. Stem-and-leaf plots, bar graphs and histograms each depict a particular aspect of the data, while measures of central tendency and variability also focus on separate features of the data. Is there a way to visualize the data and also trace its centre and variability at the same time? There is, and the answer is the boxplot. A boxplot is a comprehensive graphical representation of data that illustrates not only the central value and the variability in the data but also the extreme values (outliers) and the shape of the distribution. The following figure displays a boxplot and its features:
Figure 7: Boxplot
The left side of the box represents the first quartile of the dataset, whereas the right side denotes the third quartile. The difference between the two, i.e., the interquartile range or fourth spread (fs), is the length of the box. The median, or second quartile, is marked inside the box, dividing it into two parts that each contain a quarter of the observations. The whiskers (the two lines extending out of the box on the left and right) extend to the smallest and the largest values in the dataset that are not outliers. The small dots beyond the whiskers represent outliers.
Let us create a boxplot together for better understanding. Suppose we have the following
dataset containing 10 numbers:
34, 29, 25, 35, 28, 37, 30, 35, 29, 38
To create a boxplot, we first need the 5-number summary of the data, i.e., the smallest number,
Q1, Median (Q2), Q3 and the largest number. To get these, it is advisable to arrange the data in
ascending or descending order. So, we get:
25, 28, 29, 29, 30, 34, 35, 35, 37, 38
You should now try and identify the 5-number summary from the above data.
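We get:
Smallest number = 25
Q1 = 29 (the median of the lower half: 25, 28, 29, 29, 30)
Median (Q2) = (30 + 34)/2 = 32
Q3 = 35 (the median of the upper half: 34, 35, 35, 37, 38)
Largest number = 38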
Interquartile range = 35 – 29 = 6
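The 5-number summary can be computed in Python under the same convention as the worked example: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the median placed in both halves when n is odd (an assumption for odd n, since the example has n = 10). Library routines may follow other rules: for instance, `statistics.quantiles` (default `method='exclusive'`) interpolates between observations and returns 28.75 and 35.5 as the quartiles of these ten numbers.

```python
def median(sorted_vals):
    """Median of an already sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """(smallest, Q1, median, Q3, largest), with Q1/Q3 as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: (n + 1) // 2]  # the median is placed in both halves when n is odd
    upper = xs[n // 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


data = [34, 29, 25, 35, 28, 37, 30, 35, 29, 38]
smallest, q1, q2, q3, largest = five_number_summary(data)
print(smallest, q1, q2, q3, largest)   # 25 29 32.0 35 38
print("IQR =", q3 - q1)                # IQR = 6
```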
Now start by drawing an X-axis with appropriate labels and then draw a box from the first quartile to the third quartile. Mark the median inside the box. The length of the box must equal the interquartile range. Next, draw two whiskers from the ends of the box, extending to the smallest value on the left and the largest value on the right. You should get a graph that looks like this:
Figure 8: Boxplot
The boxplots can also be created in a vertical manner.
In a larger dataset, we can distinguish between the highest and lowest values of the dataset and the extreme values, also known as outliers. We use the following formulas to calculate the lower and upper limits of the data; any value below the lower limit or above the upper limit is termed an outlier:
Lower limit: Q1 − 1.5 IQR
Upper limit: Q3 + 1.5 IQR
where IQR is the interquartile range. Values beyond these limits are outliers, and we denote them in the boxplot as dots. Formally, any observation farther than 1.5fs from the closest quartile is termed an outlier. An outlier is extreme if it is more than 3fs from the nearest quartile, and it is mild otherwise. Knowing about outliers is important because, as we have seen in earlier sections, extreme values can affect our measures of central tendency and variability. An extreme value may also be the result of an error at some step of the research; by identifying such values, we can proceed more cautiously with the rest of the study.
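The fence rules translate directly into code. This is a minimal sketch: the data are the ten numbers from the boxplot example with two hypothetical large values (55 and 70) appended, and quartiles use the same medians-of-halves convention as the worked example. The function name `classify_outliers` is our own.

```python
def median(sorted_vals):
    """Median of an already sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def classify_outliers(data):
    """Return the 1.5 fs fences plus mild and extreme (beyond 3 fs) outliers."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: (n + 1) // 2])
    q3 = median(xs[n // 2:])
    fs = q3 - q1                          # fourth spread (IQR)
    mild, extreme = [], []
    for x in xs:
        distance = max(q1 - x, x - q3)    # how far x lies beyond the nearest quartile
        if distance > 3 * fs:
            extreme.append(x)
        elif distance > 1.5 * fs:
            mild.append(x)
    return (q1 - 1.5 * fs, q3 + 1.5 * fs), mild, extreme


# Boxplot example data with two hypothetical large values (55 and 70) appended
data = [34, 29, 25, 35, 28, 37, 30, 35, 29, 38, 55, 70]
fences, mild, extreme = classify_outliers(data)
print(fences, mild, extreme)   # (16.25, 50.25) [55] [70]
```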
Boxplots are also useful in indicating the shape of the distribution of the data, i.e., whether the data are symmetrical or skewed. When the median sits exactly in the centre of the box and the whiskers on both sides are of equal length, we can say that the distribution is symmetrical. However, when the median lies towards the right end of the box and the right whisker is shorter than the left one, the distribution is said to be left skewed, and vice versa. The following figure represents the relationship between the shape of a distribution and its boxplot:
Figure 10: Comparative boxplot of duration of Indian classical songs and rap songs
Try to interpret both boxplots yourself. What differences do you observe between the two plots? What do these differences signify?
Let us begin by interpreting the median. We can clearly see that the median length of classical songs is noticeably higher than that of rap songs: the median length of classical songs is 4.8 minutes, whereas that of rap songs is 4 minutes. Next, interpret the length of the box, i.e., the interquartile range, which contains the middle 50% of the data. So, we can say that the middle half of the classical songs are 4.40 to 5 minutes long, whereas the middle half of the rap songs are 3.60 to 4.40 minutes long. We can also interpret the range of the data, i.e., the difference between the maximum and minimum values. For Indian classical songs, the range is 5.20 − 3.80 = 1.4 minutes, whereas for rap songs it is 4.80 − 3.20 = 1.6 minutes. So, we can say that the length of rap songs is more variable than that of classical songs. Finally, we can observe the shape of the distributions by studying the location of the median inside the box. The median length of Indian classical songs is towards the right end of the box, indicating that the distribution is left skewed. This means that most of the observations lie to the right of the mean; in other words, most of the Indian classical songs in the data are relatively long. In contrast, the median length of rap songs lies right in the middle of the box, which means that its distribution is fairly symmetric.
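The visual reading of skewness can be cross-checked with Bowley's coefficient from Section 3.6, which uses only the quartiles. The sketch below plugs in the values read off Figure 10 as stated in the text (classical: 3.8, 4.4, 4.8, 5.0, 5.2 minutes; rap: 3.2, 3.6, 4.0, 4.4, 4.8 minutes). Values read off a plot are approximate, so the output is indicative only.

```python
def bowley_sk(q1, q2, q3):
    """Bowley's coefficient of skewness: (Q1 + Q3 - 2 Q2) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * q2) / (q3 - q1)


# (min, Q1, median, Q3, max) in minutes, as read from Figure 10 in the text
summaries = {
    "Indian classical": (3.8, 4.4, 4.8, 5.0, 5.2),
    "Rap": (3.2, 3.6, 4.0, 4.4, 4.8),
}

for genre, (low, q1, q2, q3, high) in summaries.items():
    sk = bowley_sk(q1, q2, q3)
    # A small tolerance absorbs floating-point rounding around zero
    shape = "symmetric" if abs(sk) < 1e-9 else ("left skewed" if sk < 0 else "right skewed")
    print(f"{genre}: range = {high - low:.1f}, IQR = {q3 - q1:.1f}, "
          f"Bowley SK = {sk:.2f} ({shape})")
```

The classical songs give a negative coefficient (about −0.33, left skewed) and the rap songs give zero (symmetric), matching the interpretation above.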
IN-TEXT QUESTIONS
22. The following boxplots illustrate data on the ages of actors and actresses who have won the National Film Award since 1967.
Mark all the statements that are supported by the boxplots:
A. The first quartile age of Best Actor winners is less than the third quartile age of Best Actress winners
B. The minimum age of Best Actor winners is equal to the minimum age of Best Actress winners
C. The range of age of Best Actor winners is higher than the range of age of Best Actress winners
D. Both the distributions are left skewed
23. Inspired by his statistics class, a student started keeping a record of the number of minutes he was late to class every day. He recorded the time for 15 days, and the following list displays the data (in minutes):
19, 12, 9, 7, 17, 10, 6, 18, 9, 14, 19, 8, 5, 17, 9
Which of the given boxplots accurately depicts the data?
A. B.
C. D.
3.8 SUMMARY
This lesson focused on the quantitative features of data in terms of measures of central tendency
and variability and their applications. Measures of central tendency refer to a typical or central
value of the data around which the data is generally clustered. Arithmetic mean is the average
value of the dataset which can be calculated by adding the value of all the observations and
dividing the sum by the total number of observations. Since the mean is affected by extreme values, the median is often preferred when such values are present. The median is the middle observation when the data are arranged in ascending or descending order of magnitude.
When the distribution is symmetric, the three measures of central tendency (mean, median and mode) coincide. Quartiles divide the dataset into 4 equal parts. Percentiles denote the
observation below which a particular percentage of observations fall. Deciles divide a dataset
into 10 equal parts. Since arithmetic mean is affected by extreme values, trimmed mean is used
to eliminate the extreme observations from analysis. Under measures of variability, range is
the simplest measure. It is the difference between the largest and the smallest value in the
dataset. Inter-quartile range is calculated by taking the difference between the upper and lower
quartile. Variance is calculated by dividing the sum of the squared deviations from the mean
by (n – 1). Standard deviation is simply the square root of variance. Degree of freedom refers
to the maximum number of independent values, that have the freedom to vary, in a sample.
Coefficient of variation is a relative measure of variation. The mean is affected by both a change in origin and a change in scale, whereas the standard deviation is affected only by a change in scale. Skewness refers to the lack of symmetry in a distribution. In case of a right or
positively skewed distribution, the value of mean is the largest, followed by the median and
mode. The opposite is true in the case of a left or negatively skewed distribution, i.e., the value
of mode is the largest and mean has the smallest value. Boxplots are capable of illustrating the
central values, variability in the data, the extreme values (outliers) as well as the shape of the
distribution. The boxplots are extremely useful to compare two datasets and identify outliers.
3.9 GLOSSARY
• Boxplot: Graphical representation of measure of central tendency, variability and
skewness of numerical data using quartiles
If the average monthly spending of 21 women in a kitty group was Rs. 5240, what is the new average spending if another member, whose average monthly spending is Rs. 5540, is added? Use the formula above to answer.
Q.7 The sum of deviations of a certain number of observations from 12 is 166 while the
sum of deviations of these observations from 16 is (-54). Find the number of
observations and their mean.
Q.8 Consider the following data that depicts the daily income of two ice cream sellers in
two different regions:
Q.9 If the coefficient of skewness of a distribution is 0.32, the standard deviation is 6.5 and
the mean is 29.6 then find the mode of the distribution.
Q.10 There are 40 students in a class preparing for a statistics test. There are two strategies
that these students can adopt to ace the test. After the test, the professor interviewed the
students and noted down their strategies. The professor observed that 20 students
followed strategy A and the other 20 followed strategy B. Given below are the marks
of the students based on the strategies they adopted:
Strategy A: 78, 78, 79, 80, 80, 82, 82, 83, 83, 86, 86, 86, 86, 87, 87, 87, 88, 88, 88, 91
Strategy B: 66, 66, 66, 67, 68, 70, 72, 75, 75, 78, 82, 83, 86, 88, 89, 90, 93, 94, 95, 98
Write down the 5-number summary for both strategies. Create a boxplot for each strategy using its 5-number summary and comment on the shapes of the boxplots. In your view, which strategy is more likely to fetch good marks in the test, and why?
3.12 REFERENCES
• Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2012). An Introduction to Mathematical Statistics and Its Applications. Prentice Hall.
• McClave, J. T., Benson, P. G., & Sincich, T. (2018). Statistics for Business and Economics. Pearson Education.
LESSON 4
STRUCTURE
4.2 INTRODUCTION
This unit introduces the concept of ‘probability’ to the students. Probability is associated with randomness and the presence of some element of uncertainty. Whenever we face a situation in which more than one outcome can occur, probability provides a technique for quantifying the chance, or likelihood, of each possible outcome. Many situations involve chance, and the notion of probability applies to all of them. For example, in political elections, exit polls make it plausible to predict that a certain political party could come to power. Likewise, using records of previous days and parameters such as temperature, humidity and pressure, meteorologists use specific tools and techniques to forecast the weather and may conclude, for instance, that there are 60 chances out of 100 that it will rain today.
Another example from day-to-day life is the statement ‘since it is supposed to rain tomorrow, it is very likely I will use my raincoat when I go to work’. Similarly, when a fair coin is flipped, the probability of getting a head (or a tail) is 0.5, and when a fair die is rolled, there is one chance in six that a particular number will come up. Thus, the concept of probability can be applied to many interesting events.
Probability is a mathematical concept, and it has been studied as a branch of mathematics for over 300 years. This chapter enables the students to understand and estimate the likelihood of
Figure: Population and sample, linked by probability (deductive reasoning) and by inferential statistics (inductive reasoning)
When a sample is drawn from a given (known) population, the concept of probability is used to draw conclusions about the sample. This method of reasoning, from population to sample, is called deductive reasoning. However, when the sample is used to infer characteristics of the population, inferential statistics is deployed; this technique is referred to as ‘inductive reasoning’. Thus, the role of probability is explicit and well-defined: it allows us to reason about samples drawn from a population, and it is crucial in the deductive method of statistical inference or research.
Having understood the difference between a sample and a population, the relationship between them, and their role in statistical inference, we now turn to the kinds of experiments through which data are collected.
4.3.1 Statistical or Random Experiments
Any activity or process whose outcome is subject to uncertainty is considered an experiment. The word experiment generally suggests carefully controlled testing of a situation or planned testing in a laboratory. However, in the discipline of statistics, experiments cover a wider scope of trials, such as tossing a coin once or several times, selecting a card from a deck, or observing the blood type of individuals from a group.
Any process of observation or measurement that has more than one possible outcome, and for which there is uncertainty about which outcome will actually materialize, is referred to as a ‘random experiment’. Examples include tossing a coin, throwing a pair of dice and drawing a card from a deck of cards.
4.3.2 Sample Point, Event
Each member or outcome of a sample space (or population) is called a sample point, or an element of the sample space. Let us consider the example of the toss of a coin, for which the sample space is S = {H, T}. The number of elements in the sample space is n(S) = 2, and each element of the sample space, H and T, is a sample point. In general, n(S) denotes the number of sample points in the sample space. An event is a collection (subset) of sample points.
Consider an event B defined as ‘Tail appears’: B = {T}. The number of elements in event B is 1, denoted by n(B) = 1.
In the same random experiment, suppose A denotes the event that a Head appears: A = {H}. The number of elements in event A is 1, denoted by n(A) = 1. Together, the two events make up the whole sample space:
A ∪ B = {H} ∪ {T} = {H, T} = S
Let us consider another example of tossing two fair coins. The sample space for this experiment is S = {HH, HT, TH, TT}.
events. However, suppose a person throws a dart once, practises, and then throws it a second time. The events of getting a perfect score on the two throws are not independent.
in a single toss of a fair coin. For such an experiment the total number of outcomes is two,
therefore the sample space is denoted by
The sample space: S = {H, T},
The number of elements in the sample space or population is n(S) = 2
Let us consider another example of tossing two fair coins. Either both tosses result in a Head, or both result in a Tail, or the first toss results in a Head and the second in a Tail, or the first toss results in a Tail and the second in a Head. The sample space for this experiment is therefore
Sample Space: {HH, HT, TH, TT}
The number of elements in the sample space is 4, n(S) = 4.
Consider another example, rolling a die:
Sample space S = {1, 2, 3, 4, 5, 6},
The number of elements in the sample space is n(S) = 6.
If the same die is rolled twice, the sample space has n(S) = 6² = 36 outcomes.
If it is rolled three times, the sample space has n(S) = 6³ = 216 outcomes.
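The counts n(S) = 2, 4, 6, 36 and 216 can be reproduced by listing the sample spaces explicitly. This minimal sketch uses `itertools.product` from Python's standard library, which generates every ordered outcome of k repetitions of an experiment.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# product(..., repeat=k) lists every ordered outcome of k repetitions
two_coins = ["".join(outcome) for outcome in product(coin, repeat=2)]
two_dice = list(product(die, repeat=2))
three_dice = list(product(die, repeat=3))

print(two_coins)                                                            # ['HH', 'HT', 'TH', 'TT']
print(len(coin), len(two_coins), len(die), len(two_dice), len(three_dice))  # 2 4 6 36 216
```

Each repetition multiplies the number of outcomes by the size of the single-trial sample space, which is why rolling a die k times gives 6^k outcomes.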
A CASE STUDY
Consider again the example of rolling a die. The sample space for this random experiment is S = {1, 2, 3, 4, 5, 6},
and the number of elements in the sample space is 6, denoted by n(S) = 6.
Let E be the event that an even number appears on the die:
E = {2, 4, 6},
The number of elements in event E is 3, represented by n(E) = 3.
There are several varieties of events, as described in the next section.
IN-TEXT QUESTIONS
1. Events are said to be _____________ if the occurrence of one event prevents the occurrence of another event at the same time.
2. If event A is the event that at least one head appears and event B is the event that only tails appear, then events A and B are equally likely. (True/False)
3. In a single toss of a coin, the events {Head} and {Tail} are disjoint. The two events are called:
a) Mutually exhaustive b) Equally likely c) Both
4. In an experiment consisting of tossing two coins, if event A represents an Event that at
least a Head occurs and event B represents that at least a Tail occurs, then
a) Events A and B are equally likely (True/False)
b) Events A and B are mutually exclusive (True/False)
c) Events A and B together form an exhaustive set (True/False)
4.3.4 Events, Set theory and Venn Diagrams
An event can be considered a set, therefore the relationships and results from elementary set
theory can be used to study events of any random experiment. Some of the fundamental
operations of set theory can therefore be applied to events such as.
1. The complement of an event A is denoted by A'. A complement represented as A' is
the set of all outcomes in the sample space S that are not contained in set A.
2. The union of the two events A and B is denoted by A ∪ B. A union B can also be read
as “A or B” or in both events. In other words, the union of two events includes outcomes
for which both A and B occur as well as outcomes for which exactly one occurs. It
means all outcomes in at least one of the events.
3. The intersection of the two events, A and B, denoted by A ∩ B is read as “A and B”.
The intersection of two events indicates an event consisting of all outcomes that are in
both A and B.
4. A null event is an event consisting of no outcomes whatsoever and is denoted by ∅.
If two events A and B satisfy A ∩ B = ∅, then A and B are said to be mutually exclusive
or disjoint events.
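Events are sets, so these operations can be tried out directly with Python's built-in `set` type. The sketch below is illustrative only. It assumes a single roll of a six-sided die. The events `A` (an even number), `B` (a number greater than 3) and `C` (a 1 or a 3) are our own choices and are not part of the text.

```python
# Sample space for a single roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}   # illustrative event: an even number
B = {4, 5, 6}   # illustrative event: a number greater than 3
C = {1, 3}      # illustrative event: a 1 or a 3

complement_A = S - A      # A': outcomes of S not contained in A
union_AB = A | B          # A ∪ B: outcomes in at least one of A and B
intersection_AB = A & B   # A ∩ B: outcomes in both A and B

print(sorted(complement_A))      # [1, 3, 5]
print(sorted(union_AB))          # [2, 4, 5, 6]
print(sorted(intersection_AB))   # [4, 6]

# A ∩ C = ∅, so A and C are mutually exclusive; A and B are not
print(A.isdisjoint(C), A.isdisjoint(B))   # True False
```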
4.3.5 De Morgan’s laws
a. The complement of the union of events A and B is equal to the intersection of the
complement of A and the complement of B.
(A ∪ B)' = A' ∩ B'
b. The complement of the intersection of events A and B is equal to the union of the
complement of A and the complement of B.
(A ∩ B)' = A' ∪ B'
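Both laws can be checked by brute force on a small sample space. The sketch below assumes the sample space of a single die roll. It tests De Morgan's laws for every pair of events, which means all 64 × 64 pairs of subsets. This is a numerical check of the identities on one sample space, not a proof.

```python
from itertools import chain, combinations

S = frozenset(range(1, 7))   # sample space for a single roll of a die

def complement(event):
    """A': all outcomes in S that are not in the event."""
    return S - event

# Every subset of S is an event: 2**6 = 64 events in total
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, r) for r in range(len(S) + 1))]

# (a) (A ∪ B)' = A' ∩ B'   and   (b) (A ∩ B)' = A' ∪ B'
law_a = all(complement(A | B) == complement(A) & complement(B)
            for A in events for B in events)
law_b = all(complement(A & B) == complement(A) | complement(B)
            for A in events for B in events)

print(len(events), law_a, law_b)   # 64 True True
```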
The events can be represented using Venn diagrams, as shown below.
[Figure: Venn diagrams of events A and B, including their overlap A ∩ B]
IN-TEXT QUESTIONS
1. Consider an experiment in which each of the three vehicles taking a particular freeway
exit turns left (L) or right (R) at the end of the exit ramp. Outline the sample space and
events.
2. In a single roll of a die, let E1 be the event consisting of numbers less than 3 and E2
the event consisting of numbers greater than 4. The two events E1 and E2 are mutually
exclusive. (True / False)
3. If the two events have some common elements, the two events are not ____________.
4.4 PROBABILITY
In the context of random experiments, the objective is to assign to each event A a number
P(A), called the probability of event A, which gives a precise measure of the chance that the
event will occur.
In other words, probability is the chance that an event happens. Examples are statements such as
“it might rain today”, “team X will probably win today” or “I may win the lottery”. Broadly,
probability is a measure of uncertainty.
4.4.1 Classical Definition of Probability
It is also called the a priori or mathematical definition of probability, because the probabilities
are derived by purely deductive reasoning. For example, one does not need to toss a coin to
state that the probability of obtaining a head (or a tail) is ½. However, in many cases the
possible outcomes cannot be regarded as equally likely. Examples are the probability of a
recession next year or the probability of a particular value of GDP next year. Similarly, the
possible outcomes of whether it will rain, or of an election, are not equally likely.
Suppose an experiment results in n mutually exclusive and equally likely outcomes, where n is
the total number of outcomes in the sample space. If m of these outcomes are favourable to
event A, then
$$P(A) = \frac{m}{n} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes}}$$
For example, in a single throw of a die the sample space has n = 6 outcomes, all of which
are mutually exclusive and equally likely.
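The classical formula amounts to counting. The minimal sketch below assumes a finite sample space whose outcomes are equally likely, which is the condition the definition requires. The events "an even number" and "a 5 or a 6" are illustrative choices. `Fraction` keeps the answers exact, so 3/6 is reported as 1/2.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = m / n, valid only when all outcomes are equally likely."""
    favourable = [s for s in sample_space if s in event]   # the m favourable outcomes
    return Fraction(len(favourable), len(sample_space))    # m / n

die = [1, 2, 3, 4, 5, 6]                            # n = 6 equally likely outcomes
p_even = classical_probability({2, 4, 6}, die)      # m = 3
p_five_or_six = classical_probability({5, 6}, die)  # m = 2

print(p_even, p_five_or_six)   # 1/2 1/3
```

The same function gives wrong answers if it is applied to outcomes that are not equally likely, such as the recession or election examples above. That restriction is exactly the limitation of the classical definition.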
IN-TEXT QUESTIONS
6. Find the probability P(E) of getting exactly 2 heads when two coins are tossed
simultaneously. (Hint: P(E) = number of favourable outcomes / total number of outcomes.)
7. Find the probability of getting exactly two heads when 3 coins are tossed simultaneously.
8. What is the probability of getting at least 1 head when two coins are tossed
simultaneously?
9. Find the probability of getting at most 2 tails when three coins are tossed simultaneously.
10. Find the probability of getting at least 2 heads when three coins are tossed simultaneously.
11. Find the probability of getting more tails than heads when three coins are tossed
simultaneously.
4.4.3 Axiomatic Definition of Probability
The axiomatic approach to probability was provided by the Russian mathematician A. N.
Kolmogorov and includes both of the above definitions. The values P(A) assigned to events in
the sample space S must be consistent with the intuitive notion of probability. To ensure this,
every assignment of probabilities must satisfy the following properties, or axioms.
1. For any event A, the probability P(A) is non-negative, that is, P(A) ≥ 0. In other words,
the probability that event A will occur is either zero or some positive number. The
probability of event A can never be negative.
Axiom 1 reflects the intuitive notion that the chance of A occurring should be non-negative.
It is known as the axiom of non-negativity.
2. The probability of the entire sample space is 1, that is, P(S) = 1. In other words, the
probability that some outcome in the sample space will occur is 100 percent, which means
it will surely occur. This is known as the axiom of certainty.
The sample space by definition is the event that must occur when the experiment is
performed. The sample space S contains all possible outcomes, therefore the maximum
possible probability is assigned to sample space S.
3. If A1, A2, A3, … is an infinite collection of disjoint events, then
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i)$$
This means that the probability of the union of disjoint events in the sample space equals
the sum of the probabilities of the individual events.
The third axiom formalizes the following idea. Suppose we want the probability that at least one
of a number of events will occur, and no two of the events can occur simultaneously. Then the
chance that at least one occurs is the sum of the chances of the individual events. This is known
as the axiom of countable additivity.
4. The probability of an event always lies between 0 and 1:
0 ≤ P(A) ≤ 1
P(A) = 0 means that event A will not occur.
P(A) = 1 means that event A will certainly occur.
5. Let ∅ be the null event, that is, the event containing no outcomes whatsoever. This property
follows from Axiom 3 applied to the collection ∅, ∅, ∅, … of disjoint events.
Therefore P(∅) = 0: the probability of the null event is zero.
6. If A, B, C, … are mutually exclusive events, the probability that any one of them will
occur is equal to the sum of their individual probabilities:
P(A + B + C + …) = P(A ∪ B ∪ C ∪ …) = P(A) + P(B) + P(C) + …
7. If A, B, C, … form a mutually exclusive and collectively exhaustive set of events, the
sum of the probabilities of their individual occurrences is 1. By contrast, any events A, B,
C, … are said to be statistically independent if the probability of their occurring together
is equal to the product of their individual probabilities. For three events this means
P(A ∩ B ∩ C) = P(A) P(B) P(C), and the product rule must also hold for each pair of events.
Here P(A ∩ B ∩ C) is the probability of events A, B, and C occurring together (jointly or
simultaneously), also referred to as the joint probability.
P(A), P(B), and P(C) are called unconditional, marginal, or individual probabilities.
8. If events A and B are not mutually exclusive, then
P(A + B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
where P(AB), also written P(A ∩ B), is the joint probability that the two events occur
simultaneously. If A and B are mutually exclusive, then
P(A ∩ B) = P(∅) = 0, and the formula reduces to P(A ∪ B) = P(A) + P(B).
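Properties 5, 6 and 8 can be checked numerically on a die. The sketch assumes a fair die, so the classical definition applies, and it uses illustrative events of our own. It confirms that P(A ∪ B) = P(A) + P(B) − P(A ∩ B) when A and B overlap. It also confirms plain additivity when the events are disjoint.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # single roll of a fair die: equally likely outcomes

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # illustrative: an even number
B = {4, 5, 6}   # illustrative: a number greater than 3
C = {1, 3}      # illustrative: a 1 or a 3 (disjoint from A)

# Property 8: A and B are not mutually exclusive
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)   # 2/3 2/3

# Mutually exclusive case: P(A ∩ C) = P(∅) = 0, so P(A ∪ C) = P(A) + P(C)
print(P(A & C), P(A | C) == P(A) + P(C))   # 0 True
```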
IN-TEXT QUESTIONS
4.5 SUMMARY
This lesson familiarized students with the basic concepts of the sample space and population,
along with their significance. The notion of probability was introduced with the help of random
experiments, and various real-life applications of probability were presented. Important
concepts related to probability, such as the sample space, events, sample points, and random
experiments, were described. The differences between the sample, population, sample points,
and events were emphasized. Types of events such as disjoint, mutually exhaustive, and
mutually exclusive events were explained, and Venn diagrams were presented. Probability was
defined using the classical and relative frequency definitions. Finally, the properties of
probability were discussed.
4.6 GLOSSARY
2. False
3. Both
3. The sample space S = {LLL, RLL, LRL, LLR, LRR, RLR, RRL, RRR}
The event that exactly one of the three vehicles turns right: A
The elements in event A: {RLL, LRL, LLR}
The event that at most one of the vehicles turns right: B
The elements in the event B: {LLL, RLL, LRL, LLR}
The event that all three vehicles turn in the same direction: C
The elements in the event C: {LLL, RRR}
4. E1 = {1,2}, E2 = {5,6}. The two events are mutually exclusive. True
5. Disjoint
6. ¼
7. 3/8
8. ¾
9. 7/8
10. ½
11. ½
12. Total number of possible outcomes n(S) = 6
(i) a multiple of 3
Number of favorable outcomes = 2 {3 and 6}
Hence P (getting multiple of 3) = 2/6 = 1/3
ii) a number less than 5
Number of favorable outcomes = 4 {1, 2, 3, 4}
Hence, P (getting number less than 5) = 4/6 = 2/3
iii) an even prime number
Number of favorable outcomes = 1 {2}
Hence, P (getting an even prime number) = 1/6
iv) a prime number
Number of favorable outcomes = 3 {2,3,5}
Hence the P (getting a prime number) = 3/6 = 1/2
v) a factor of 6
Number of favorable outcomes = 4 {1, 2, 3, 6}
Hence, P (getting a factor of 6) = 4/6 = 2/3
13. a) 1/2 b) 1/3 c) 1/3
14. a) 1/4 b) 7/36 c) 5/6 d) 1/6 e) 1/3 f) 11/36 g) ½
1. Two six-faced dice are rolled together (or one die is rolled twice). The total number of
possible outcomes is 36.
2. (i) Prove that the probability of the null event is zero, P(∅) = 0.
(ii) Prove that for any two events A and B,
P(A ∪ B) = P(A) + P(B) − P(AB)
LESSON 5
CONDITIONAL PROBABILITY
STRUCTURE
1. To understand the concept of conditional probability and its significance in real life.
2. To comprehend the significance of the initial assignment of probability that may be
followed by partial information relevant to the outcome.
3. To recognize that partial information may change the assignment of probabilities, which
leads to the concept of conditional probability and Bayes’ Theorem.
4. To comprehend the concept of Bayes’ Theorem and its applications.
5. To learn how to compute a posterior probability from given prior probabilities and
conditional probabilities, a computation that plays a critical role in Bayes’ Theorem.
6. To practice several cases and examples for the application of Bayes’ Theorem.
5.2 INTRODUCTION
In the previous chapter, we introduced the topic of probability. In this chapter, we introduce
students to deeper concepts and situations related to probability. The probabilities assigned
to various events depend on the experimental situation. An initial assignment of probabilities
may be followed by partial information relevant to the outcome, and this partial information
may change the assignment. This leads us to the concept of conditional probability and
Bayes’ Theorem.
Conditional probability is a measure of the likelihood of an event occurring, given that another
event or outcome has already occurred. For instance, suppose a student applying for admission
hopes to receive an academic scholarship. For every 1000 applications, the college accepts 100,
and it awards an academic scholarship to 10 of every 500 accepted students. Of the scholarship
recipients, 50% also receive university stipends for books, meals, housing, etc. As a result,
the chance of a student being accepted and then receiving a scholarship is 0.1 × 0.02 = 0.002,
or 0.2%. The chance of being accepted, receiving the scholarship, and then also receiving the
stipend is 0.1 × 0.02 × 0.5 = 0.001, or 0.1%.
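The scholarship arithmetic is a chain of multiplications. Each factor is a probability conditional on the previous stage having happened. The small sketch below assumes, as the multiplication 0.1 × 0.02 implies, that the scholarship rate of 10 in 500 applies to accepted students. It also assumes the 50% stipend rate applies to scholarship recipients.

```python
p_accepted = 100 / 1000                  # 100 of every 1000 applicants are accepted
p_scholarship_given_accepted = 10 / 500  # 10 of every 500 accepted students get a scholarship
p_stipend_given_scholarship = 0.5        # half of the scholarship recipients get a stipend

# Multiply each stage by the probability of the next stage given the previous ones
p_accepted_and_scholarship = p_accepted * p_scholarship_given_accepted
p_all_three = p_accepted_and_scholarship * p_stipend_given_scholarship

print(f"{p_accepted_and_scholarship:.1%}")  # 0.2%
print(f"{p_all_three:.1%}")                 # 0.1%
```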
Bayes’ Theorem, also called Bayes’ Rule or Bayes’ Law, is a mathematical formula used to
calculate conditional probabilities. The computation of a posterior probability from given
prior probabilities and conditional probabilities plays a critical role in Bayes’ Theorem.
5.3 CONDITIONAL PROBABILITY
In the definition P(A | B) = P(AB)/P(B), P(AB) is the joint probability of events A and B
occurring together. It is also written P(A ∩ B) and is read as the probability of A and B, or of
A intersection B. The computation of the conditional probability P(A | B) requires that P(B)
be positive, that is, the probability of event B cannot be zero.
Similarly, the conditional probability of B given A is denoted by P(B | A):
P(B | A) = P(AB)/P(A), P(A) > 0
so that P(AB) = P(B | A)·P(A), P(A) > 0
The conditional probability of A given B is denoted by P(A | B) and defined as
P(A | B) = P(AB)/P(B), P(B) > 0
The conditional probability is expressed as a ratio of the unconditional probabilities. The
numerator is simply the probability of the intersection of the two events, whereas the
denominator is the probability of the conditioning event B. The conditional probability can be
represented by the following Venn diagram.
[Figure: Venn diagram of events A and B, with the overlap A ∩ B inside the conditioning event B]
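The defining ratio P(A | B) = P(AB)/P(B) can be coded directly for a finite sample space of equally likely outcomes. This sketch uses illustrative events on a fair die: A is a six, and B is a number greater than 3. It shows that P(A | B) and P(B | A) are generally different. It also refuses to condition on an event of probability zero, matching the requirement P(B) > 0.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # single roll of a fair die: equally likely outcomes

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event & S), len(S))

def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined unless P(B) > 0."""
    if P(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return P(a & b) / P(b)

A = {6}         # illustrative: a six
B = {4, 5, 6}   # illustrative: a number greater than 3

print(P(A))               # 1/6
print(conditional(A, B))  # 1/3
print(conditional(B, A))  # 1
```

Knowing that B occurred raises the probability of a six from 1/6 to 1/3. Knowing that a six occurred makes B certain.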
Points to remember
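(a) Two mutually exclusive (disjoint) events, each with positive probability, are always
dependent.
Proof: Let A and B be disjoint events, i.e. A ∩ B = Ø.
So, P(A ∩ B) = 0.
We know that P(A ∩ B) = P(A)·P(B | A), P(A) ≠ 0
or P(A ∩ B) = P(B)·P(A | B), P(B) ≠ 0.
Hence P(B | A) = 0 ≠ P(B) and P(A | B) = 0 ≠ P(A), so A and B are dependent.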
CASE STUDY
Suppose that, of all the individuals who buy a particular smartphone, 60% include an optional
memory card in their purchase, 40% include an extra battery, and 30% include both a card and
a battery.
Solution:
Let us consider a randomly selected buyer. Let A be the event that a memory card is purchased
and B the event that a battery is purchased. Thus P(A) = 0.6 and P(B) = 0.4. The probability
that both a memory card and a battery are purchased is P(A ∩ B) = P(AB) = 0.3.
Given that the selected individual purchased an extra battery, the probability that an optional
card was also purchased is P(memory card | battery), or P(A | B).
P (memory card/battery) = P(A|B) = P(A∩B)/ P(B), P(B) > 0
= 0.3/0.4 = 0.75
This implies that, of all those purchasing an extra battery, 75% also purchased an optional
memory card.
Similarly, given that a memory card was purchased, the probability of buying a battery is given
by P(battery | memory card), or P(B | A).
P (battery|memory card) = P(B|A) = P(A∩B)/ P(A), P(A) > 0
= 0.3/0.6 = 0.5
It can be observed that P(A|B) ≠ P(A) and P(B|A) ≠ P(B).
This means the conditional probabilities are not equal to the unconditional probabilities.
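The case-study arithmetic can be written as a short sketch. It uses the probabilities given above; the variable names are our own.

```python
p_card = 0.6      # P(A): memory card purchased
p_battery = 0.4   # P(B): extra battery purchased
p_both = 0.3      # P(A ∩ B): both purchased

p_card_given_battery = p_both / p_battery   # P(A | B) = P(A ∩ B) / P(B)
p_battery_given_card = p_both / p_card      # P(B | A) = P(A ∩ B) / P(A)

print(round(p_card_given_battery, 2))   # 0.75
print(round(p_battery_given_card, 2))   # 0.5

# The conditional probabilities differ from the unconditional ones
print(p_card_given_battery != p_card, p_battery_given_card != p_battery)  # True True
```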
IN-TEXT QUESTIONS
1. A card is drawn from a deck of cards. What is the probability that it will be either a
heart or a queen?
2. The numerator is the union of two events in the computation of conditional probability.
(True/false)
3. In a class of 500 students, 300 are males and 200 are females. Of these, 100 males and
60 females plan to major in accounting. A student is selected at random from this class and
is found to be planning an accounting major. What is the probability that the student is a
male?
4. If there are two events, A and B, then the probability that event A occurs knowing that
event B has already occurred is referred to as ______________.
5. If we randomly pick two TV tubes in succession from a shipment of 240 TV tubes, of which
15 are defective, what is the probability that both will be defective?
5.4 BAYES’ THEOREM
1
https://www.investopedia.com/terms/b/bayes-theorem.asp
A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bk ∩ A), where all the events (Bi ∩ A) are mutually
exclusive. This “partitioning of A” is illustrated in figure 2 below.
$$P(B_r \mid A) = \frac{P(B_r)\,P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)}, \quad r = 1, 2, 3, \ldots, k$$
where
$$\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3) + \cdots + P(B_k)\,P(A \mid B_k)$$
and
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)}$$
In figure 2, given that event A has occurred, the probability that A occurred through
partition B4 is given by
$$P(B_4 \mid A) = \frac{P(B_4)\,P(A \mid B_4)}{\sum_{i=1}^{4} P(B_i)\,P(A \mid B_i)}$$
where
$$\sum_{i=1}^{4} P(B_i)\,P(A \mid B_i) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3) + P(B_4)\,P(A \mid B_4)$$
P(B4/A): Probability that partition B4 occurs given that event A has occurred.
Example: A computer programme's spam filter screens each incoming message. We want, for
instance, the probability that a message is spam given that the filter has flagged it. This and
related probabilities can be calculated using Bayes’ Theorem. Using the same notation, we define
two mutually exclusive and collectively exhaustive events A and B as follows.
A : The incoming mail is a spam message.
B : The incoming mail is not a spam message.
The other events defined in the context of the same experiment are:
C : Filter test confirms spam
D : Filter test did not confirm spam.
The data given to us are:
P(A) : Probability of finding spam = 0.6
P(B) : Probability of not finding spam = 0.4
P(C|A) : Probability that the test predicts correctly (confirms spam) when spam is actually present = 0.9
P(D|A) : Probability that the test predicts incorrectly when spam is actually present = 0.1
P(D|B) : Probability that the test predicts correctly when spam is actually absent = 0.7
P(C|B) : Probability that the test predicts incorrectly when spam is actually absent = 0.3
We are interested in finding:
P(C) : Probability that the test says spam is there.
P(D) : Probability that the test says no spam is there.
P(A|C) : Probability of finding spam, given positive test results.
P(A|D) : Probability of finding spam, given negative test results.
P(B|C) : Probability of not finding spam, given positive test results.
P(B|D) : Probability of not finding spam, given negative test results.
Applying Bayes’ Theorem
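$$P(A \mid C) = \frac{P(C \mid A)\,P(A)}{P(C \mid A)\,P(A) + P(C \mid B)\,P(B)} = \frac{0.9 \times 0.6}{0.9 \times 0.6 + 0.3 \times 0.4} = 0.818$$
$$P(B \mid C) = \frac{P(C \mid B)\,P(B)}{P(C \mid B)\,P(B) + P(C \mid A)\,P(A)} = \frac{0.3 \times 0.4}{0.3 \times 0.4 + 0.9 \times 0.6} = 0.182$$
$$P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D \mid A)\,P(A) + P(D \mid B)\,P(B)} = \frac{0.1 \times 0.6}{0.1 \times 0.6 + 0.7 \times 0.4} = 0.176$$
$$P(B \mid D) = \frac{P(D \mid B)\,P(B)}{P(D \mid B)\,P(B) + P(D \mid A)\,P(A)} = \frac{0.7 \times 0.4}{0.7 \times 0.4 + 0.1 \times 0.6} = 0.824$$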
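The four posterior probabilities can be computed in a few lines. So can P(C) and P(D), which are listed above as quantities of interest but not worked out. This sketch uses the values from the example: P(A) = 0.6, P(C | A) = 0.9 and P(C | B) = 0.3. It derives P(D | A) and P(D | B) as complements of the corresponding C probabilities. The variable names are illustrative.

```python
p_spam = 0.6                 # P(A)
p_not_spam = 0.4             # P(B)
p_flag_given_spam = 0.9      # P(C | A)
p_flag_given_not_spam = 0.3  # P(C | B)

# The test either confirms spam (C) or does not (D)
p_pass_given_spam = 1 - p_flag_given_spam          # P(D | A)
p_pass_given_not_spam = 1 - p_flag_given_not_spam  # P(D | B)

# Total probability: the common denominators of Bayes' Theorem
p_flag = p_flag_given_spam * p_spam + p_flag_given_not_spam * p_not_spam   # P(C)
p_pass = p_pass_given_spam * p_spam + p_pass_given_not_spam * p_not_spam   # P(D)

# Bayes' Theorem: prior times likelihood, divided by the total probability
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag              # P(A | C)
p_not_spam_given_flag = p_flag_given_not_spam * p_not_spam / p_flag  # P(B | C)
p_spam_given_pass = p_pass_given_spam * p_spam / p_pass              # P(A | D)
p_not_spam_given_pass = p_pass_given_not_spam * p_not_spam / p_pass  # P(B | D)

for name, value in [("P(C)", p_flag), ("P(D)", p_pass),
                    ("P(A|C)", p_spam_given_flag), ("P(B|C)", p_not_spam_given_flag),
                    ("P(A|D)", p_spam_given_pass), ("P(B|D)", p_not_spam_given_pass)]:
    print(name, round(value, 3))
```

Rounded to three decimals, the script reports P(C) = 0.66 and P(D) = 0.34. It also reproduces the posteriors 0.818, 0.182, 0.176 and 0.824 obtained above.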
CASE STUDY
Suppose a consulting firm rents motorbikes from three rental agencies: 60 percent from agency
1, 30 percent from agency 2, and 10 percent from agency 3. Suppose 9 percent of the bikes from
agency 1, 20 percent of the bikes from agency 2, and 6 percent of the bikes from agency 3 need
a tune-up. What is the probability that a rental bike delivered to the firm will need a tune-up?
Let A be the event that the bike needs a tune-up, and let B1, B2, and B3 be the events that the
bike comes from rental agency 1, 2, or 3. Then P(B1) = 0.60, P(B2) = 0.30, P(B3) = 0.10,
P(A|B1) = 0.09, P(A|B2) = 0.20, and P(A|B3) = 0.06.
By the rule of total probability (the denominator of Bayes’ Theorem),
P(A) = (0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06) = 0.12
If a rental bike delivered to the consulting firm needs a tune-up, then what is the probability
that it came from rental agency 2?
P(B2 | A) = (0.30)(0.20) / [(0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06)]
= 0.060 / 0.120 = 0.5
It is observed that although only 30 percent of the bikes delivered to the firm come from agency
2, 50 percent of those that require a tune-up come from that agency.
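A general version of this calculation, written for any partition B1, …, Bk, is sketched below. The function names `total_probability` and `bayes_posteriors` are hypothetical helpers introduced here for illustration. The inputs are the priors P(Bi) and the likelihoods P(A | Bi) from the case study.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(B_i) * P(A | B_i) for a partition B_1, ..., B_k."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(B_r | A) = P(B_r) P(A | B_r) / sum_i P(B_i) P(A | B_i), for r = 1, ..., k."""
    p_a = total_probability(priors, likelihoods)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

priors = [0.60, 0.30, 0.10]        # P(B1), P(B2), P(B3): share of bikes from each agency
likelihoods = [0.09, 0.20, 0.06]   # P(A | Bi): chance a bike from agency i needs a tune-up

p_tune_up = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)

print(round(p_tune_up, 3))                 # 0.12
print([round(p, 3) for p in posteriors])   # [0.45, 0.5, 0.05]
```

The posteriors for agencies 1, 2 and 3 sum to 1. This is because a bike that needs a tune-up must have come from one of the three agencies.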
IN-TEXT QUESTIONS
6. A balanced die is tossed twice. If A is the event that an even number comes up on the
first toss, B is the event that an even number comes up on the second toss, and C is the
event that both tosses result in the same number, are the events A, B and C
a. Pairwise independent
b. Independent?
7. The probability of the simultaneous occurrence of two events can never exceed the sum
of the probabilities of these events. T
8. The conditional probability of an event given another event can never be less than the
probability of the joint occurrence of these events. T
5.5 INDEPENDENCE OF EVENTS
The concept of conditional probability suggests that the probability of an event A, P(A) must
be modified in context of another event B has occurred whose outcome affects the occurrence
of event A. The new probability now assigned to A can be expressed as P(A|B). This is
considered as conditional probability of event A occurring given that event B has already
occurred. Therefore, the conditional probability of A such that B has occurred, given by P(A|B)
103 | P a g e
© Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
B.A.(Programme)
differs from unconditional probability P(A). This mainly indicates that the information that B
has occurred results in change in the chance of A occurring.
It may happen, however, that the occurrence of A is not affected by the fact that B has
occurred, so that P(A|B) = P(A). In other words, the occurrence or non-occurrence of one event
has no effect on the chances that the other will occur. Such events are referred to as
independent events.
The two events A and B are independent if P(A|B) = P(A) and dependent if P(A|B) ≠ P(A).
There is thus a strong connection between the concept of independence and conditional
probability.
The conditional probability formulas for P(A|B) and P(B|A) are given below.
P(A|B) = P(AB)/P(B), P(B) > 0 eq (1)
For P(B|A),
P(B|A) = P(AB)/P(A), P(A) > 0 eq (2)
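Substituting P(A|B) = P(A) into eq (1) gives the equivalent product form P(AB) = P(A)·P(B), which makes independence easy to check by counting. The sketch below assumes two tosses of a balanced die, giving 36 equally likely ordered pairs. It uses illustrative events of our own: A is an even number on the first toss, B is a total of 7, and D is a total of 8.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a balanced die: 36 equally likely ordered pairs
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] % 2 == 0}      # illustrative: even number on the first toss
B = {s for s in S if s[0] + s[1] == 7}   # illustrative: the two tosses total 7
D = {s for s in S if s[0] + s[1] == 8}   # illustrative: the two tosses total 8

def independent(e1, e2):
    """Events are independent when P(E1 ∩ E2) = P(E1) P(E2)."""
    return P(e1 & e2) == P(e1) * P(e2)

print(independent(A, B))          # True:  1/12 == 1/2 * 1/6
print(independent(A, D))          # False: 1/12 != 1/2 * 5/36
print(P(A & B) / P(B) == P(A))    # True:  P(A | B) = P(A)
```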
3. Bayes’ Theorem: It gives the conditional probability of an event given that another event
has occurred. This equals the likelihood of the second event given the first, multiplied by the
probability of the first event, divided by the probability of the second event:
P(A|B) = P(B|A)·P(A)/P(B).
4. Independence of Events: The occurrence or non-occurrence of one event has no effect
on the chances that the other will occur. Such events are referred to as independent events.
In other words, two events are independent if and only if the conditional probability is equal
to the unconditional probability.
5.8 ANSWERS TO THE QUESTIONS
1. 4/13
Hint: n(S) = 52
A: a card drawn from the 52 cards
H: a heart appears
Q: a queen appears
P(H ∪ Q) = P(H) + P(Q) − P(HQ)
P(H) = n(H)/n(S) = 13/52
P(Q) = n(Q)/n(S) = 4/52
P(HQ) = n(HQ)/n(S) = 1/52
P(H ∪ Q) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13
2. False
3. 5/8
Hint: A: the student plans an accounting major, n(A) = 160
n(S) = 500
M: male, n(M) = 300, n(MA) = 100, P(MA) = n(MA)/n(S) = 100/500
F: female, n(F) = 200
P(A) = n(A)/n(S) = 160/500
P(M | A) = P(MA)/P(A) = (100/500)/(160/500) = 100/160 = 5/8
4. Conditional Probability
5. 7/1912
Hint: Let A be the event that the first randomly picked TV tube of the two is defective
(A: 1st TV tube defective). The number of elements in the sample space is n(S) = 240.
LESSON 6
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
STRUCTURE
6.1 Learning Objectives
6.2 Introduction
6.3 Random Variables
6.3.1 Types of Random Variables
6.4 Probability Mass Function
6.5 Probability Density Function
6.5.1 Properties of Probability Density Function
6.6 Summary
6.7 Glossary
6.8 Answers to In-Text Questions
6.9 Self-Assessment Questions
6.10 References
6.11 Suggested Readings
6.1 LEARNING OBJECTIVES
After reading this lesson, students will be able:
1. To understand the concept of random variables and their significance in statistical
analysis.
2. To distinguish between the two fundamental types of random variables, namely discrete
random variables and continuous random variables.
3. To become familiar with some commonly used discrete and continuous distributions of
random variables.
4. To understand the concept of the probability distribution function, or probability mass
function, and
5. To comprehend the derivation of the probability distribution function.
6.2 INTRODUCTION
We have seen in earlier units how the concept of probability enables us to compute the extent
of uncertainty associated with random experiments. An experiment may yield both qualitative
and quantitative outcomes. Statistical analysis focuses on the numerical aspect of the data or
experiment. Thus, the term random variable is introduced to represent any event or outcome
that can take different values. Such a variable takes values that are the plausible outcomes
of an event or experiment.
Since the values are random and associated with a random experiment, such variables are termed
random variables. The concept of a random variable allows us to pass from the experimental
outcomes themselves to a numerical function of the outcomes.
There are two types of random variables: discrete random variables and continuous random
variables. This chapter defines the concept of a random variable along with these two
fundamental types. It also explains the derivation of the probability distribution function for
both discrete and continuous random variables.
6.3 RANDOM VARIABLE
Each outcome of an experiment can be associated with a number by specifying a rule of
association. An example is the total weight of baggage for a sample of 25 airline passengers.
Such a rule of association is called a random variable.
A random variable is a variable because its observed value depends on which of the possible
experimental outcomes results.
[Figure: a random variable maps outcomes x1, x2 in the sample space to numbers f(x1), f(x2)]
For example, in the toss of two coins, let event A be defined as the number of heads in the toss,
A: {no. of heads in the toss}. Then a variable X can be assigned as a random variable to A. The
values that X takes are random and depend on the outcome of the experiment, i.e. the toss of
two coins.
Similarly, let event B be defined as the number of tails in the toss of two coins,
B: {no. of tails in the toss}. Then a variable Y can be assigned as a random variable to B. The
values that Y takes are random and depend on the outcome of the experiment, i.e. the toss of
two coins.
6.3.1 Types of random variables
A random variable takes numerical values associated with the outcomes of a random experiment.
Random variables are classified by the values they take, and in particular by whether those
values can be counted. Thus a random variable can be either a discrete rv or a continuous rv.
a) A discrete random variable: A discrete random variable is an rv whose possible values either
constitute a finite set or can be listed in an infinite sequence in which there is a first
element, a second element, and so on. That is, the set of possible values is countable.
b) A continuous random variable: The possible values of a continuous random variable consist
either of all numbers in a single interval on the number line (possibly infinite, e.g. from
−∞ to ∞) or of all numbers in a disjoint union of such intervals. No possible value of the
variable has positive probability: P(X = c) = 0 for any possible value c.
On the basis of the values taken by the random variable, we define functions that yield the
probabilities associated with its possible values. Such functions are called probability
distributions. They give the probabilities of occurrence of the different possible outcomes of an
experiment. If the random variable takes discrete values, the function is referred to as the
probability mass function (pmf). If it takes continuous values, the function is referred to as
the probability density function (pdf).
6.4 PROBABILITY MASS FUNCTION
For a discrete random variable X that can take at most a countably infinite number of values
x1, x2, …, we associate a probability
p(xi) = P[X = xi] = P[all s ∈ S such that X(s) = xi]
that must satisfy the following conditions:
1. p(x) ≥ 0 for all x. This means the probability distribution is non-negative at every value
taken by the random variable X.
2. ∑x p(x) = 1. This means the values taken by the random variable exhaust the sample space,
so the probabilities of all the values of the random variable sum to 1.
Example 1:
Let us consider a lab in the Department of Economics in which six computers are reserved for
Economics majors. Let the random variable X denote the number of these computers that are in
use at a particular time of day. Suppose the probability distribution of X is as given below.
x      0     1     2     3     4     5     6
p(x)   0.05  0.10  0.15  0.25  0.20  0.15  0.10
It is easy to verify that the above values satisfy both the properties of a Probability Mass
Function as each p(x) is positive and all of them sum to unity.
Using the above probabilities, various other probabilities can be computed. For example, the
probability that at most 2 computers are in use is P(X ≤ 2), the probability that the random
variable takes a value of at most 2.
P (X ≤ 2) = P (X =0 or 1 or 2) = p (0) + p (1) + p(2) = 0.05 + 0.10 +0.15 = 0.30
The probability of the event that at least 3 computers are in use is P(X ≥ 3). This event is the
complement of the event that at most 2 computers are in use, so the probability can be
computed as follows.
P (X ≥ 3) = 1 - P (X ≤ 2)
= 1 - 0.30
= 0.70
Another way to compute the probability of the event that at least 3 computers are in use is by
adding values of probability when X takes the values 3, 4, 5 and 6.
The probability that between 2 and 5 computers (inclusive) are in use is given by
P (2 ≤ X ≤ 5) = P (X = 2, 3, 4 or 5) = 0.15 + 0.25 + 0.20 + 0.15 = 0.75
The probability that the number of computers in use is strictly between 2 and 5 is
P (2 < X < 5) = P (X = 3 or 4) = 0.25 + 0.20 = 0.45
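The calculations above can be checked with a short Python sketch. It stores the pmf as exact fractions (p(6) = 0.10 is the value implied by the probabilities summing to one) and adds p(x) over the values belonging to each event; the helper name `prob` is our own illustration, not notation from the text.

```python
from fractions import Fraction

# pmf of X = number of the six reserved computers in use.
# p(6) = 0.10 is the value that makes the probabilities sum to one.
pmf = {x: Fraction(p) for x, p in
       zip(range(7), ["0.05", "0.10", "0.15", "0.25", "0.20", "0.15", "0.10"])}

def prob(event):
    """Sum p(x) over all values x for which event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))

assert all(p >= 0 for p in pmf.values())   # property 1 of a pmf
assert sum(pmf.values()) == 1              # property 2 of a pmf

at_most_2 = prob(lambda x: x <= 2)          # P(X <= 2)
at_least_3 = 1 - at_most_2                  # complement rule
between_2_5 = prob(lambda x: 2 <= x <= 5)   # P(2 <= X <= 5)
strictly_2_5 = prob(lambda x: 2 < x < 5)    # P(2 < X < 5)

print(float(at_most_2), float(at_least_3), float(between_2_5), float(strictly_2_5))
# prints: 0.3 0.7 0.75 0.45
```

Exact fractions are used so that comparisons such as "sums to 1" are not disturbed by floating-point rounding.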
Returning to the example of the number of computers in use in the computer lab of the Department
of Economics, let us verify that the probability distribution satisfies both properties.
First, every probability p(x) is non-negative, so the first property is satisfied. Second, the
probabilities sum to 1, so the second property is also satisfied. Hence the given function is a
valid probability mass function.
Let us now construct the probability distribution for the toss of two fair coins. In this random
experiment, let the random variable X be the number of heads that appear.
Sample space S    X (no. of heads)    P(outcome)
TT                0                   ½ × ½ = ¼
HT                1                   ½ × ½ = ¼
TH                1                   ½ × ½ = ¼
HH                2                   ½ × ½ = ¼
Next, we define the probability distribution function for the specific values x taken by the
random variable X, the number of heads that appear in the toss of two fair coins. X can take the
values 0, 1 and 2, because two tosses of a fair coin can yield no heads, exactly one head or two
heads. Since the outcomes HT and TH both give X = 1, P(X = 1) = ¼ + ¼ = ½.
The probability distribution of this discrete random variable is given by the function
$$p(x) = \begin{cases} 1/4, & x = 0 \\ 1/2, & x = 1 \\ 1/4, & x = 2 \end{cases}$$
1. The value of the probability distribution function is never negative, that is, p(x) ≥ 0.
2. The values of the probability distribution function over all values of x sum to one:
$$\sum_{x=0}^{2} p(x) = \frac{1}{4} + \frac{1}{2} + \frac{1}{4} = 1$$
The function satisfies both conditions, so it is a valid probability mass function (pmf), which
can be depicted as a graph as below.
[Figure: graph of the pmf p(x) of the number of heads in two tosses of a fair coin (vertical axis from 0 to 0.5)]
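As a sketch, the pmf of the number of heads can be built by enumerating the four equally likely outcomes and counting the heads in each; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each of the 4 outcomes of two fair coins (HH, HT, TH, TT) has probability 1/4.
outcomes = list(product("HT", repeat=2))
heads = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads.items())}
print(pmf)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```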
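Sums of the numbers shown when two fair dice are rolled (rows: first die; columns: second die):
       1    2    3    4    5    6
1      2    3    4    5    6    7
2      3    4    5    6    7    8
3      4    5    6    7    8    9
4      5    6    7    8    9    10
5      6    7    8    9    10   11
6      7    8    9    10   11   12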
The probability distribution or pmf is therefore given by:
x p(x) = P(X=x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36
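A similar enumeration reproduces this table, assuming all 36 ordered outcomes of two fair dice are equally likely. Note that `Fraction` reduces automatically (2/36 is stored as 1/18), so the sketch rescales to a count out of 36 when printing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(n, 36) for s, n in sorted(sums.items())}

for s, p in pmf.items():
    # p * 36 is the number of outcomes giving the sum s
    print(s, f"{int(p * 36)}/36")
```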
IN-TEXT QUESTIONS
1. A random variable that can assume any possible value between two points is called a
________________ (discrete/continuous) random variable.
2. If c is a constant and X is a continuous random variable, then P(X = c) is always equal to
zero. This statement is True/False.
3. A listing of all the outcomes of an experiment and the probability associated with each
outcome is called:
a) Probability density function
b) Cumulative distribution function
c) Probability distribution
d) Probability tabulation
CASE STUDY
Check whether the function given by
$$p(x) = \frac{x + 2}{25}, \quad x = 1, 2, 3, 4, 5$$
can serve as the probability distribution of a discrete random variable.
Solution: For each value of x, p(x) is computed as follows:
x = 1: p(1) = 3/25
x = 2: p(2) = 4/25
x = 3: p(3) = 5/25
x = 4: p(4) = 6/25
x = 5: p(5) = 7/25
Every p(x) is positive, and p(1) + p(2) + p(3) + p(4) + p(5) = (3 + 4 + 5 + 6 + 7)/25 = 25/25 = 1.
Thus both properties of a probability distribution function are satisfied.
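The two conditions can be checked mechanically. The helper `is_valid_pmf` below is a hypothetical illustration; it uses exact fractions so that the sum-to-one test is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_valid_pmf(p, support):
    """Check the two pmf conditions: p(x) >= 0 for every x, and the p(x) sum to 1."""
    values = [p(x) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1

def p(x):
    """Case study: p(x) = (x + 2) / 25 for x = 1, ..., 5."""
    return Fraction(x + 2, 25)

print(is_valid_pmf(p, range(1, 6)))   # True
```

For contrast, a function such as x/10 on x = 1, ..., 5 fails, because its values sum to 15/10.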
IN-TEXT QUESTIONS
4. Find the probability distribution of the total number of heads obtained in four tosses of
a balanced coin.
5. The probability distribution of a random variable is defined as
x -1 -2 0 1 2
p(x) c 2c 3c 4c 6c
Then, c is equal to
a) 0
b) ¼
c) 1
d) 1/16
6. The suitable graph for the probability distribution of a discrete random variable is
a) Probability Histogram
b) Stepwise Function
c) Both (a) and (b)
Figure 3: The density curve between a and b given by P (a ≤ X ≤ b)
The probability that X lies between a and b is obtained by integrating the density function over
the interval from a to b, as depicted in figure 3:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
6.5.1 Properties of Probability Density Function
Every probability density function satisfies certain properties. The first and foremost are the
two conditions that any function of a continuous random variable must satisfy to qualify as a
probability density function.
1. A function can serve as a probability density of a continuous random variable X if its
values, f(x), satisfy the following two conditions:
(a) f(x) ≥ 0 for all x, -∞ < x < ∞
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
2. If X is a continuous random variable and a and b are real constants with a ≤ b, then
P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b),
because the probability that a continuous random variable takes any single value is zero.
CASE STUDY
For example, suppose the random variable X has the probability density
$$f(x) = \begin{cases} k e^{-3x}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$$
Find k and P(0.5 ≤ X ≤ 1).
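For the given function to be a probability density function, it must satisfy the two conditions:
(a) f(x) ≥ 0 for -∞ < x < ∞, which holds whenever k ≥ 0;
(b) since f(x) = 0 for x ≤ 0,
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} k e^{-3x}\,dx = \frac{k}{3} = 1,$$
which gives k = 3. With k = 3,
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = e^{-1.5} - e^{-3} \approx 0.173$$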
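As a numerical cross-check (a sketch, not the method used in the text), both integrals can be approximated with the midpoint rule. The upper limit 20 standing in for infinity is an assumption, justified only because the neglected tail e^{-60} is negligible.

```python
import math

def f(x, k=3.0):
    """Density k * e^(-3x) for x > 0 and 0 elsewhere (k = 3 from the normalisation)."""
    return k * math.exp(-3 * x) if x > 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = integrate(f, 0, 20)              # should be (almost exactly) 1
prob = integrate(f, 0.5, 1)              # P(0.5 <= X <= 1)
exact = math.exp(-1.5) - math.exp(-3)    # closed form of the same integral
print(round(total, 4), round(prob, 3), round(exact, 3))   # 1.0 0.173 0.173
```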
IN-TEXT QUESTIONS
The pdf of a continuous random variable X is given by:
$$f(x) = \begin{cases} 0.075x + 0.2, & 3 \le x \le 5 \\ 0, & \text{otherwise} \end{cases}$$
7. The total area under the density curve of X is
(a) 1
(b) 0
(c) ½
(d) ¼
8. Find P(X≤ 4).
9. P(X≤ 4) will be the same as P(X<4). This statement is True/False.
6.6 SUMMARY
The lesson describes the concept of a random variable, which assigns numerical values to the
outcomes of random experiments. The two types of random variables, discrete and continuous, have
been described. A discrete random variable is one whose possible values form either a finite set
or a countably infinite set, while the possible values of a continuous random variable consist of
an interval on the number line or a disjoint union of intervals. The probability distribution
functions, also known as probability mass functions, which give the probability of each value of
a discrete random variable, have been derived, and the corresponding graph of the probability mass
function has been plotted. Similarly, the probability density function of a continuous random
variable, whose integral over an interval gives the probability that the variable falls in that
interval, has been derived. Finally, the properties of probability mass functions and probability
density functions have been presented.
6.7 GLOSSARY
1. Random Variable: A random variable is a rule that assigns a numerical value to each
outcome in a sample space. It is a variable because the observed value depends on which
of the possible experimental outcomes results.
2. Discrete random variable: A discrete random variable is an rv whose possible values
either constitute a finite set or can be listed in an infinite sequence in which there
is a first element, a second element, and so on.
3. Continuous random variable: A continuous random variable is an rv whose possible values
consist either of all the numbers in a single interval on the number line (possibly
infinite in extent, e.g. from -∞ to ∞) or of all the numbers in a disjoint union of such
intervals.
4. Probability distribution function: The probabilities assigned to the various outcomes in
S in turn determine the probabilities associated with the values of any particular rv X.
The probability distribution (probability mass function) of a discrete rv gives the
probability of each possible value of X.
5. Probability density function: A function f(x) of a continuous random variable X such that
the probability that X falls in any interval [a, b] equals the area under f(x) between a
and b. It describes the relative likelihood of the values of a continuous random variable.
6.8 ANSWERS TO IN-TEXT QUESTIONS
1. Continuous
2. True
3. (c) Probability Distribution
4. p(x) = 4Cx /16, for x = 0, 1,2, 3, 4
5. (d) 1/16
6. (a) Probability Histogram
7. (a) 1
8. 0.4625
9. True
6.9 SELF-ASSESSMENT QUESTIONS
1. What is the difference between discrete and continuous random variables?
2. An event management company has overbooked the tickets for an upcoming music
concert. The available seating capacity is 40 while the company has sold 45 tickets.
Suppose X denotes the number of ticketed people who actually show up for the concert.
The probability mass function of X is given by:
x 35 36 37 38 39 40 41 42 43 44 45
p(x) 0.05 0.10 0.13 0.13 0.25 0.18 0.05 0.05 0.03 0.02 0.01
What is the probability that the event management company will be able to accommodate
all ticketed people who show up?
3. Prove that the following function, defined by
$$f(x) = \begin{cases} 1/5, & 2 < x < 7 \\ 0, & \text{otherwise} \end{cases}$$
can serve as a valid pdf for a random variable X.
4. The pdf of a continuous random variable Y is given by
$$f(y) = \begin{cases} k/\sqrt{y}, & 0 < y < 4 \\ 0, & \text{otherwise} \end{cases}$$
Find the value of k. Also, find P(Y ≥ 1).
6.10 REFERENCES
• Devore, J. L. (2015). Probability and Statistics for Engineering and the Sciences.
Cengage Learning.
• Freund, J. E., Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics:
With Applications. Pearson Education India.
• McClave, J. T., Benson, P. G., & Sincich, T. (2008). Statistics for business and
economics. Pearson Education.
6.11 SUGGESTED READINGS
• Rice, J. A. (2006). Mathematical statistics and data analysis. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2005). An introduction to mathematical statistics.
Prentice Hall.
LESSON 7
CUMULATIVE DISTRIBUTION FUNCTION, DENSITY FUNCTION, EXPECTED
VALUE, AND VARIANCE
STRUCTURE
7.1 Learning Objectives
7.2 Introduction
7.3 Cumulative Distribution Function (CDF)
7.3.1 Properties of Cumulative Distribution Function
7.4 Cumulative Density Function
7.4.1 Properties of Cumulative Density Function
7.5 Expected Value
7.5.1 Properties of Expected Value
7.6 Variance V(X)
7.6.1 Properties of Variance
7.7 Summary
7.8 Glossary
7.9 Answers to In-Text Questions
7.10 Self-Assessment Questions
7.11 References
7.12 Suggested Readings
7.1 LEARNING OBJECTIVES
The objectives of this lesson are:
1. To understand the need to evaluate descriptive statistics of a probability distribution.
2. To learn the formula and method for deriving the expected value of a probability
distribution.
3. To compute and apply the expected value in different random experiments by deriving
the probability distribution functions.
4. To learn the formula and method for deriving the variance of a probability distribution.
5. To compute and apply the variance in different random experiments by deriving the
probability distribution functions.
6. To give students exposure to various useful properties of expected value and variance,
and their applications.
7.2 INTRODUCTION
In lesson 6, we discussed random variables, their types, and probability distributions. Once we
have the probability distribution of a random variable, it is essential to assess its descriptive
statistics, such as the expected value (mean) and the variance. The population mean of a random
variable is a measure of the centre of its distribution. The expected value is computed by
summing the product of each value of the random variable and its associated probability over all
the values of the random variable. It can be interpreted as a long-run average: as more and more
values of the random variable are collected through repeated trials of a random experiment, the
sample mean comes closer and closer to the expected value.
Another important descriptive statistic is the variance, a measure of dispersion. It measures the
spread of the distribution of a random variable, reflecting the degree to which its values vary
around the expected value. The variance of a probability distribution is obtained by multiplying
the squared difference between each value of the random variable and the expected value by the
associated probability, and summing these products over all the values of the random variable.
7.3 CUMULATIVE DISTRIBUTION FUNCTIONS (CDF)
The cumulative distribution function (cdf) F(x) of a discrete random variable X with pmf f(x) is
defined for every number x by
$$F(x) = P(X \le x) = \sum_{y \le x} f(y)$$
For any number x, F(x) is the probability that the observed value of X will be at most x.
Let us compute the cumulative distribution function for the toss of two coins.
Sample space S    X (no. of heads)    f(x): pmf    F(x): cdf
TT                0                   ¼            ¼
HT, TH            1                   ½            ¾ = ¼ + ½
HH                2                   ¼            1 = ¾ + ¼
The first column lists the elements of the sample space. The second column gives the values of
the random variable X, defined as the number of heads in the toss of two coins; the values that X
takes are denoted by x and are 0, 1 and 2. The third column gives the probability associated with
each value of X. The fourth column is obtained by adding, or cumulating, the probabilities of all
values up to and including each value of X.
The cumulative distribution function for the toss of two coins can be written as
$$F(x) = \begin{cases} 0, & x < 0 \\ 1/4, & 0 \le x < 1 \\ 3/4, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$$
It satisfies the following properties.
1. f(x) ≥ 0
The probability mass function is non-negative for each value of the random variable X.
2. $\sum_{x \in D} f(x) = 1$, where D is the set of possible values of X.
Hence the value of the cumulative distribution function at the largest value of the random
variable is equal to 1. By the time the probabilities of all values up to the last value of X are
cumulated, all possible outcomes of the sample space have been exhausted, and the probability of
the whole sample space is 1.
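A minimal Python sketch of this cdf adds the pmf over all values not exceeding x; evaluating it at points between the support values shows the flat steps.

```python
from fractions import Fraction

# pmf of the number of heads in two tosses of a fair coin
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): add f(y) over all support points y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

for x in (-1, 0, 0.5, 1, 1.99, 2, 10):
    print(x, cdf(x))
```

F stays at 1/4 on [0, 1) and at 3/4 on [1, 2), which is exactly the piecewise form above.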
The graph of the above cumulative distribution function can be presented in figure 1 below.
The graph of the cumulative distribution function is a step function. For all values of x less
than zero, the cdf is zero, since the random variable takes no values below zero. The cdf jumps at
each value taken by the random variable: each step is closed at its left end and open at its right
end, so the jump at a point belongs to the interval that starts at that point, not to the one that
ends there. At each jump the value of the probability function is added, or cumulated, so the
graph rises like a ladder or a flight of steps. At the last value of the random variable all the
probabilities have been added, the cdf attains its maximum value of one, and it remains at one for
all larger values of x.
Consider whether the next person buying a computer at a university bookstore buys a laptop or a
desktop model. Define
$$X = \begin{cases} 1, & \text{if the customer purchases a laptop computer} \\ 0, & \text{if the customer purchases a desktop computer} \end{cases}$$
Suppose 20% of all purchasers during a week select a laptop computer. Then
p(0) = P(X = 0) = P(next customer purchases a desktop model) = 0.8
p(1) = P(X = 1) = P(next customer purchases a laptop model) = 0.2
p(x) = P(X = x) = 0 for x ≠ 0, 1
The cumulative distribution function for this example is
$$F(x) = \begin{cases} 0, & x < 0 \\ 0.8, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$$
The cdf remains zero for all negative values of x, where the probability mass function is zero;
it takes the value 0.8 for 0 ≤ x < 1, and finally the value 1 for all x ≥ 1.
The cdf is a non-decreasing function of x: at any larger value of x, the value of the cumulative
distribution function is greater than or equal to its value at a smaller x.
If the probability distribution of a discrete random variable is given, the corresponding
distribution function can be derived.
The distribution function of the total number of heads obtained in four tosses of a balanced coin
can be obtained as follows. For x = 0, 1, 2, 3, 4, f(x) = 4Cx/16, so
f(0) = 1/16, f(1) = 4/16, f(2) = 6/16, f(3) = 4/16 and f(4) = 1/16
It follows that
F (0) = f (0) = 1/16
F (1) = f (0) + f(1) = 5/16
F (2) = f (0) + f(1) + f(2) = 11/16
F (3) = f(0) + f(1) + f(2) + f(3) = 15/16
F (4) = f (0) + f(1) + f(2) + f(3) + f (4) = 1
The above CDF satisfies the properties of a distribution function:
1. F(-∞) = 0 and F(∞) = 1
2. If a < b, then F(a) ≤ F(b) for any real numbers a and b.
Therefore, the distribution function is given by
$$F(x) = \begin{cases} 0, & x < 0 \\ 1/16, & 0 \le x < 1 \\ 5/16, & 1 \le x < 2 \\ 11/16, & 2 \le x < 3 \\ 15/16, & 3 \le x < 4 \\ 1, & x \ge 4 \end{cases}$$
The distribution function is defined not only for the values taken by the given random variable
but for all real numbers. For example, F(1.7) = 5/16 and F(100) = 1, although the probabilities of
getting "at most 1.7 heads" or "at most 100 heads" in four tosses of a balanced coin may not be of
any real significance.
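The step-function behaviour, including values such as F(1.7) and F(100), can be sketched with `itertools.accumulate` and `bisect`. The last lines go in the reverse direction, recovering f from successive differences of F, a standard result for a variable whose values are ordered x1 < x2 < ... < xn. The names `F_at_xs` and `recovered` are our own.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate
from math import comb

# pmf of the number of heads in four tosses of a balanced coin: f(x) = C(4, x) / 16
xs = [0, 1, 2, 3, 4]
f = [Fraction(comb(4, x), 16) for x in xs]
F_at_xs = list(accumulate(f))        # F(0), ..., F(4) = 1/16, 5/16, 11/16, 15/16, 1

def F(x):
    """Step-function cdf defined for every real x, not only at 0, 1, ..., 4."""
    i = bisect_right(xs, x)          # number of support points <= x
    return F_at_xs[i - 1] if i > 0 else Fraction(0)

print(F(1.7), F(100), F(-0.5))       # 5/16 1 0

# Reverse direction: f(x1) = F(x1) and f(xi) = F(xi) - F(x(i-1)) for i >= 2
recovered = [F_at_xs[0]] + [b - a for a, b in zip(F_at_xs, F_at_xs[1:])]
assert recovered == f
```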
If the range of a random variable X consists of the values x1 < x2 < x3 < ... < xn, then
f(x1) = F(x1) and f(xi) = F(xi) - F(xi-1) for i = 2, 3, ..., n.
$$f(x) = \begin{cases} 0.25, & x = 0 \\ 0.10, & x = 1 \\ 0.30, & x = 2 \\ 0.35, & x = 3 \end{cases}$$
(A) Calculate the probability that at most 1 bulb is defective.
(B) Find the CDF.
2. Let X be the number of groups that attended the college festival.
x      0     1     2     3     4     5
P(x)   0.20  0.10  0.05  0.15  0.30  0.20
(a) Find the probability that at most two groups attended the festival.
(b) Find the probability that at least four groups attended the festival.
3. The cumulative distribution function of a random variable Y at a point y is the probability
that Y takes a value _____
(a) equal to y
(b) greater than y
(c) less than or equal to y
(d) zero
Figure 2(a): Probability density function    Figure 2(b): Cumulative density function
Figure 2 (a) depicts the probability density function while figure 2(b) depicts the cumulative
density function.
7.4.1 Properties of Cumulative Density Function
There are certain properties of cumulative density function:
$$f(x) = \frac{dF(x)}{dx}$$
where the derivative exists.
The third property indicates that the probability density function can be obtained from the
distribution (cumulative density) function by differentiating it.
CASE STUDY
Recall the case study in lesson 6, section 6.5.1, in which the random variable X has the
probability density
$$f(x) = \begin{cases} k e^{-3x}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$$
with k = 3. For x > 0,
$$F(x) = \int_{0}^{x} 3e^{-3t}\,dt = 1 - e^{-3x}$$
Since F(x) = 0 for x ≤ 0,
$$F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-3x}, & x > 0 \end{cases}$$
Now, the probability P(0.5 ≤ X ≤ 1) can be determined using the cumulative density function F(x):
$$P(0.5 \le X \le 1) = F(1) - F(0.5) = (1 - e^{-3}) - (1 - e^{-1.5}) = e^{-1.5} - e^{-3} \approx 0.173$$
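The same answer follows from the closed-form cdf. The sketch below also checks the property f(x) = dF(x)/dx with a central difference quotient; the step size h = 1e-6 is an arbitrary choice.

```python
import math

def F(x):
    """cdf of the density 3e^(-3x), x > 0: F(x) = 1 - e^(-3x) for x > 0, else 0."""
    return 1 - math.exp(-3 * x) if x > 0 else 0.0

def f(x):
    """The density itself, for comparison with the derivative of F."""
    return 3 * math.exp(-3 * x) if x > 0 else 0.0

print(round(F(1) - F(0.5), 3))       # P(0.5 <= X <= 1): 0.173

# f(x) = dF(x)/dx: the central difference of F should be close to f
h = 1e-6
for x in (0.2, 0.5, 1.0):
    slope = (F(x + h) - F(x - h)) / (2 * h)
    print(x, round(slope, 4), round(f(x), 4))
```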
IN-TEXT QUESTIONS
4. Given the following probability density function pdf
If the r.v. X has a set of possible values D and pmf f(x), then the expected value of any
function h(X), denoted by E[h(X)] or μh(X), is computed by
$$E[h(X)] = \mu_{h(X)} = \sum_{x \in D} h(x) \cdot f(x)$$
That is, the expected value of h(X) is the sum, over the set D of possible values of X, of the
products of h(x) and the corresponding probability f(x).
Example 1: Suppose there is a collection of 12 audio sets, 2 of which have white cords. If three
of the sets are chosen at random for shipment to a hotel, how many sets with white cords can the
shipper expect to send to the hotel?
First, we construct the probability distribution of X, the number of sets with white cords
shipped to the hotel:
$$f(x) = \frac{\binom{2}{x}\binom{10}{3-x}}{\binom{12}{3}}, \quad x = 0, 1, 2$$
x      0      1      2
f(x)   6/11   9/22   1/22
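The expectation can then be evaluated directly from the pmf. This sketch uses `math.comb` for the binomial coefficients and a generic helper `expectation(pmf, h)` (our own name) implementing E[h(X)] = Σ h(x) f(x).

```python
from fractions import Fraction
from math import comb

def f(x):
    """P(x of the 3 chosen sets have white cords), 2 white-cord sets among 12."""
    return Fraction(comb(2, x) * comb(10, 3 - x), comb(12, 3))

def expectation(pmf, h=lambda x: x):
    """E[h(X)] = sum over the support of h(x) * f(x)."""
    return sum(h(x) * p for x, p in pmf.items())

pmf = {x: f(x) for x in range(3)}       # {0: 6/11, 1: 9/22, 2: 1/22}
expected = expectation(pmf)             # E(X)
print(expected)                          # 1/2
print(expectation(pmf, lambda x: x * x))  # E(X^2) = 13/22
```

So the shipper can expect to send half a set with white cords per shipment, in the long-run-average sense.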
1. If b is a constant, then E(b) = b
2. Expectation of the sum of the two random variables X and Y is equal to the sum of the
expectations of those random variables.
E (X+Y) = E (X) + E (Y)
3. The expected value of the ratio of two random variables is, in general, not equal to the ratio
of the expected values of those random variables.
E(X/Y) ≠ E(X)/E(Y)
4. The expected value of the product of two dependent random variables is, in general, not equal
to the product of their expectations.
E(XY) ≠ E(X) · E(Y)
However, if X and Y are independent random variables, then
E(XY) = E(X) · E(Y)
Hint: For independent random variables, the joint probability mass function equals the product of
the individual probability mass functions, f(x, y) = fX(x) · fY(y), for all values of x and y.
5. The expected value of the square of X is, in general, not equal to the square of the expected
value of X:
E(X²) ≠ [E(X)]²
6. If a is a constant, then
E (aX) = a E (X)
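These properties can be checked exactly on a small example. In this sketch, X is the number of heads in two fair tosses and Y is the laptop indicator from the bookstore example (P(Y = 1) = 0.2); treating X and Y as independent is an assumption made for illustration.

```python
from fractions import Fraction

px = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}   # heads in two fair tosses
py = {0: Fraction(4, 5), 1: Fraction(1, 5)}                      # laptop indicator

# Independence assumed: the joint pmf is the product of the marginal pmfs.
joint = {(x, y): px[x] * py[y] for x in px for y in py}

def E(g, pmf):
    """Expected value of g(outcome) under a pmf given as {outcome: probability}."""
    return sum(g(k) * p for k, p in pmf.items())

EX = E(lambda x: x, px)                                   # 1
EY = E(lambda y: y, py)                                   # 1/5
assert E(lambda xy: xy[0] + xy[1], joint) == EX + EY      # E(X + Y) = E(X) + E(Y)
assert E(lambda xy: xy[0] * xy[1], joint) == EX * EY      # holds because X, Y independent
assert E(lambda xy: 5 * xy[0], joint) == 5 * EX           # E(aX) = a E(X)

# Dependent case: take Y = X, so E(XY) = E(X^2) = 3/2, while E(X) * E(X) = 1.
EX2 = E(lambda x: x * x, px)
print(EX2, EX * EX)                                       # 3/2 1
```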
Let X have pmf f(x) and expected value E(X) = μX. Then the variance of X, denoted by V(X) or σX², is
$$V(X) = \sigma_X^2 = \sum_{x \in D} (x - \mu_X)^2 \cdot f(x) = E[(X - \mu_X)^2]$$
The standard deviation (sd) of X is $\sigma_X = \sqrt{\sigma_X^2}$.
Let us compute the variance for the toss of two coins.
X (no. of heads)   f(x): pmf   x·f(x)          x²·f(x)
0                  ¼           0 × ¼ = 0       0 × ¼ = 0
1                  ½           1 × ½ = ½       1 × ½ = ½
2                  ¼           2 × ¼ = ½       4 × ¼ = 1
$$E(X) = \sum_{x \in D} x \cdot f(x) = 1, \qquad E(X^2) = \sum_{x \in D} x^2 \cdot f(x) = 1.5$$
Hence V(X) = E(X²) − [E(X)]² = 1.5 − 1 = 0.5, and the standard deviation is σX = √0.5 ≈ 0.707.
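Both the definition and the shortcut formula V(X) = E(X²) − [E(X)]² (property 8 of the variance) can be sketched in a few lines of Python, using exact fractions so that the two routes can be compared for equality.

```python
import math
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}   # heads in two fair tosses

mu = sum(x * p for x, p in pmf.items())                   # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sum of (x - mu)^2 f(x)
ex2 = sum(x ** 2 * p for x, p in pmf.items())             # E(X^2)
var_short = ex2 - mu ** 2                                 # E(X^2) - [E(X)]^2

print(mu, ex2, var_def, var_short, round(math.sqrt(var_def), 3))   # 1 3/2 1/2 1/2 0.707
```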
A CASE STUDY
Calculate the variance of a random variable X that represents the number of points rolled with a
balanced die.
x      1     2     3     4     5     6
f(x)   1/6   1/6   1/6   1/6   1/6   1/6
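One way to carry out this case study is sketched below, assuming the die shows 1 to 6 with probability 1/6 each and using the shortcut formula V(X) = E(X²) − [E(X)]².

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # balanced die: f(x) = 1/6, x = 1, ..., 6

mu = sum(x * p for x, p in pmf.items())          # E(X)
ex2 = sum(x * x * p for x, p in pmf.items())     # E(X^2)
variance = ex2 - mu ** 2                         # 91/6 - 49/4
print(mu, ex2, variance, float(variance))        # 7/2 91/6 35/12, then about 2.92
```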
If X and Y are independent random variables, then
V(X + Y) = V(X) + V(Y)
V(X - Y) = V(X) + V(Y)
4. If b is a constant, then
V(X + b) = V(X) + V(b)
= V(X) + 0
= V(X)
5. If a is a constant, then
V(aX) = a² V(X)
For example, V(5X) = 25 V(X).
6. If a and b are constants, then
V(aX + b) = a² V(X) + 0 = a² V(X)
For example, V(5X + 9) = 25 V(X).
7. If X and Y are independent random variables and a and b are constants, then
V(aX + bY) = a² V(X) + b² V(Y)
For example, V(3X + 5Y) = 9 V(X) + 25 V(Y).
8. The variance can be computed as
V(X) = E(X²) − [E(X)]²
where
$$E(X^2) = \sum_{x \in D} x^2 \cdot f(x), \qquad E(X) = \sum_{x \in D} x \cdot f(x)$$
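The rules above can be verified numerically. This sketch builds the exact pmf of a transformed variable, or of a combination of two independent variables (joint pmf = product of marginals), taking a balanced die and the two-coin variable as arbitrary examples. It also confirms that, for independent X and Y, V(X − Y) = V(X) + V(Y). The helper names are our own.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}
coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Shortcut formula V(X) = E(X^2) - [E(X)]^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

def transform(pmf, g):
    """pmf of g(X): probabilities of x values with the same g(x) are added."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

def combine(px, py, g):
    """pmf of g(X, Y) for independent X and Y (joint pmf = product of marginals)."""
    out = {}
    for x, p in px.items():
        for y, q in py.items():
            z = g(x, y)
            out[z] = out.get(z, 0) + p * q
    return out

vx, vy = var(die), var(coins)                                # 35/12 and 1/2
assert var(transform(die, lambda x: x + 9)) == vx            # V(X + b) = V(X)
assert var(transform(die, lambda x: 5 * x + 9)) == 25 * vx   # V(aX + b) = a^2 V(X)
assert var(combine(die, coins, lambda x, y: x - y)) == vx + vy              # V(X - Y)
assert var(combine(die, coins, lambda x, y: 3 * x + 5 * y)) == 9 * vx + 25 * vy
print(vx, vy)
```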
IN-TEXT QUESTION
10. V(X) = 9, Find V(2X).
(A) 18
(B) 9
(C) 36
(D) 72
11. If the standard deviation of a set of observations is 5 and each observation is divided
by 5, then the new standard deviation is:
(A) 1
(B) 2
(C) 4
(D) 5
7.7 SUMMARY
The lesson presents the cumulative distribution function for both discrete and continuous random
variables and discusses its properties in each case. It explains how to derive the probability
density function from the cumulative density function by differentiation. Further, descriptive
statistics such as the expected value and the variance are computed for probability
distributions, and several useful properties of the expected value and the variance are
explained.
7.8 GLOSSARY
1. Cumulative Distribution Function: The cumulative distribution function is another way of
describing the distribution of a discrete random variable. The cumulative distribution
function (cdf) F(x) of a discrete r.v. X with pmf p(x) is defined for every number x by
F(x) = P(X ≤ x).
2. Cumulative Density Function: The cumulative density function, also referred to as the
distribution function, is the cumulative function of the probability density of a
continuous random variable: F(x) = P(X ≤ x), the integral of the density from -∞ to x.
3. Expected value of Distribution Function: Let X be a discrete rv with set of possible
values D and pmf p(x). The expected value or mean value of X, denoted by E(X) or μX, is
the sum of the products of each value of the random variable and its associated
probability.
4. Variance of Distribution Function: The variance is the mean (expected value) of the
squared deviations of the values of the random variable from its expected value, the
population mean. The variance of the random variable X is denoted by σ² or V(X). The
standard deviation σ is the square root of the variance.
7.9 ANSWERS TO IN-TEXT QUESTIONS
1. (A) 0.35
(B)
$$F(x) = \begin{cases} 0, & x < 0 \\ 0.25, & 0 \le x < 1 \\ 0.35, & 1 \le x < 2 \\ 0.65, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}$$
2. (A) 0.35
(B) 0.50
3. (c) less than or equal to y
4. (A) 1/9
(B) 19/27
5. (A) 0.15
(B) 0.35
6. (B) $\int_{-\infty}^{\infty} y \cdot f(y)\,dy$
7. (b) independent
8. (C) 3
9. (b) 2
10. (C) 36
11. (A) 1
7.10 SELF-ASSESSMENT QUESTIONS
Q. 1) If the distribution function of X is given by
$$F(x) = \begin{cases} 0, & x < 2 \\ 2/10, & 2 \le x < 4 \\ 6/10, & 4 \le x < 6 \\ 7/10, & 6 \le x < 8 \\ 1, & x \ge 8 \end{cases}$$
Q.4 A coin is tossed three times. Let X denote the number of tails. Find its expectation and
variance.
Q.5 The probability density function of a random variable Y is given as below:
• Freund, J. E., Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics:
With Applications. Pearson Education India.
• McClave, J. T., Benson, P. G., & Sincich, T. (2008). Statistics for business and
economics. Pearson Education.
7.12 SUGGESTED READINGS
• Rice, J. A. (2006). Mathematical statistics and data analysis. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2005). An introduction to mathematical statistics.
Prentice Hall.
LESSON-8
DISCRETE DISTRIBUTION
STRUCTURE
8.1 Learning Objectives
8.2 Introduction
8.3 Probability Distribution for Discrete Random Variable
8.3.1 Uniform Distribution
8.3.2 Bernoulli Distribution
8.3.3 Binomial Distribution
8.3.4 Poisson Distribution
8.3.5 Limiting Case of Binomial Distribution
8.3.6 Hypergeometric Distribution
8.4 Summary
8.5 Answer to In-Text Questions
8.6 Self-Assessment Questions
8.7 References
8.1 LEARNING OBJECTIVES
After reading this lesson, students will be able to learn about:
1. Different kinds of discrete distributions.
2. Uniform distributions, Bernoulli distributions and Bernoulli trials.
3. Binomial and Poisson distributions, with some important numerical examples.
4. Waiting-time distributions, i.e., geometric and negative binomial distributions, as well as
hypergeometric distributions.
8.2 INTRODUCTION
In this unit we will study different types of discrete distributions. We studied discrete random
variables in unit 3. The possible values of a discrete random variable, together with their
probabilities, form its discrete probability distribution. The discrete distributions discussed
in this lesson are:
1. Uniform Distribution
2. Bernoulli Distribution
3. Binomial Distribution
4. Poisson Distribution
5. Limiting case of Binomial distribution
6. Hypergeometric distribution
0 ; otherwise
where λ represents the average number of successes.
E(X) = V(X) = λ, i.e., the mean and the variance are both equal to λ.
For example, the Poisson distribution can be used to model:
the number of deaths in an area during a month;
the number of cars arriving at a parking lot during a given period of time;
the number of errors made by a typist per page;
the number of defective bulbs in a manufacturing unit;
the number of customers arriving per hour at a shop;
visits to a particular website, or e-mail messages sent to a particular address;
accidents in an industrial facility;
cosmic ray showers observed by astronomers at a particular observatory.
For example, suppose customers arrive at a shop at an average rate of 4 per hour. Find the
probability that during an hour (i) no customer arrives, (ii) 2 or more customers arrive at the
shop.
Let X be the number of customers arriving at the shop in an hour. Then X follows a Poisson
distribution with λ = 4, since λ represents the average number of successes: every hour, on
average, 4 customers arrive at the shop. So,
$$f(x) = P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
(i)
$$P(X = 0) = \frac{e^{-4}\,4^{0}}{0!} = e^{-4} \approx 0.01832$$
(ii)
$$\begin{aligned} P(X \ge 2) &= 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)] \\ &= 1 - \left[0.01832 + \frac{e^{-4}\,4^{1}}{1!}\right] \\ &= 1 - [0.01832 + 0.07326] \\ &= 0.90842 \end{aligned}$$
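The same numbers can be reproduced with a short sketch of the Poisson pmf. The last lines check numerically that the mean and the variance both equal λ; truncating the sums at 60 terms is an assumption that is harmless here because the remaining tail is negligible for λ = 4.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4                                                          # average customers per hour
p0 = poisson_pmf(0, lam)                                         # (i) no customer arrives
p_two_or_more = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))  # (ii) P(X >= 2)
print(round(p0, 5), round(p_two_or_more, 5))                     # 0.01832 0.90842

# Mean and variance of the Poisson distribution are both lambda
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(60))
```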
8.3.5 Limiting Case of Binomial Distribution
If the probability of success p in a binomial distribution is very small and the number of trials
n is large, the binomial distribution can be approximated by a Poisson distribution. In this
approximation, the average number of successes λ is the mean of the binomial distribution, np,
i.e.
λ = np
Mathematically, if n → ∞ and p → 0 with np = λ held fixed, the binomial distribution approaches the
Poisson distribution.
For example, the probability that a COVID vaccine is ineffective for an individual is 0.002. Out of 1000 vaccinated individuals, determine the probability that:
(i) exactly 2 will suffer a COVID infection after being vaccinated;
(ii) more than 2 will suffer an infection.
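Answer. Since p = 0.002 is very small (p → 0) and n = 1000 is large, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$:
$\lambda = np = 0.002 \times 1000 = 2$
(i) $f(X = 2) = \frac{e^{-2}2^{2}}{2!} = 0.2707$
(ii) $P(X > 2) = 1 - P(X \le 2)$
$= 1 - [f(X = 0) + f(X = 1) + f(X = 2)]$
$= 1 - \left[\frac{e^{-2}2^{0}}{0!} + \frac{e^{-2}2^{1}}{1!} + \frac{e^{-2}2^{2}}{2!}\right]$
$= 1 - 5e^{-2} = 1 - 0.6767 = 0.3233$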
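The quality of the Poisson approximation can be checked by computing the exact binomial probabilities alongside it. In this sketch `binom_pmf` and `poisson_pmf` are illustrative helpers of our own; `math.comb` requires Python 3.8 or later.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 1000, 0.002
lam = n * p  # lambda = np = 2

exact_two = binom_pmf(2, n, p)
approx_two = poisson_pmf(2, lam)
exact_more_than_two = 1 - sum(binom_pmf(k, n, p) for k in range(3))
approx_more_than_two = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(f"P(X = 2): binomial {exact_two:.4f}, Poisson {approx_two:.4f}")
print(f"P(X > 2): binomial {exact_more_than_two:.4f}, Poisson {approx_more_than_two:.4f}")
```

With n = 1000 and p = 0.002 the exact and approximate values agree to about three decimal places, which is why the approximation is considered adequate here.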
X 0 1 2 3 4
F 25 68 45 12 5
with N balls, of which K are red and the remaining N − K are white. Let us draw a random sample of n balls without replacement. Then the probability of getting x red balls out of the n drawn is given by
$f(x) = \frac{{}^{K}C_{x}\;{}^{N-K}C_{n-x}}{{}^{N}C_{n}}$; $x = 0, 1, 2, \dots, n$
$f(x) = 0$; otherwise
For example: Let there be 5 economics and 10 commerce graduates. An organization requires 5 analysts, to be chosen at random from these economics and commerce graduates. Find the probability of selecting exactly 3 economics graduates for this job.
K = 5 = number of economics graduates
N − K = 10 = number of commerce graduates
N = 15 = total number of graduates
x = number of economics graduates selected as analysts = 3
n = total number of graduates selected as analysts = 5
$f(x) = \frac{{}^{K}C_{x}\;{}^{N-K}C_{n-x}}{{}^{N}C_{n}}$
$f(X = 3) = \frac{{}^{5}C_{3}\;{}^{10}C_{2}}{{}^{15}C_{5}} = \frac{10 \times 45}{3003} = \frac{450}{3003}$
$= 0.1499$
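A short Python sketch of the hypergeometric PMF reproduces this value. The function name `hypergeom_pmf` and its argument order are our own choices for illustration; `math.comb(a, b)` returns 0 when b > a, so impossible outcomes get probability 0 automatically.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(x successes in n draws without replacement from N items, K of which are successes)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

p = hypergeom_pmf(3, N=15, K=5, n=5)   # equals 450/3003
print(f"P(X = 3) = {p:.4f}")
```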
8.4 SUMMARY
A thorough knowledge of the discrete distributions helps us assess the level of uncertainty and plan accordingly. The distributions, along with their probability mass functions, means and variances, are shown in the following table.
Distribution; Probability Mass Function (PMF); Mean; Variance
Uniform: $f(x) = \frac{1}{n}$, $x = 1, 2, 3, \dots, n$; mean $\frac{n+1}{2}$; variance $\frac{n^{2} - 1}{12}$
Binomial: $f(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$, $r = 0, 1, \dots, n$; mean $np$; variance $np(1-p)$
Poisson: $f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$, $x = 0, 1, 2, \dots$; mean $\lambda$; variance $\lambda$
$\lambda = 0.00006 \times 100000 = 6$
$f(X = 1) = \frac{e^{-6}6^{1}}{1!} = 0.0149$
So, the answer is 0.0149.
2. Bernoulli
3. small; large
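$4p = \frac{214}{155}$
$p = \frac{107}{310}$; $q = \frac{203}{310}$
$P(X = 1) = {}^{4}C_{1}\left(\frac{107}{310}\right)^{1}\left(\frac{203}{310}\right)^{3} = 0.3876$; expected frequency 60
$P(X = 2) = {}^{4}C_{2}\left(\frac{107}{310}\right)^{2}\left(\frac{203}{310}\right)^{2} = 0.3065$; expected frequency 47.5148
$P(X = 3) = {}^{4}C_{3}\left(\frac{107}{310}\right)^{3}\left(\frac{203}{310}\right)^{1} = 0.1077$; expected frequency 16.6917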
X 0 1 2 3 4
F 28 60 48 17 2
LESSON 9
CONTINUOUS DISTRIBUTION
STRUCTURE
9.1 Learning Objective
9.2 Introduction
9.2.1 Uniform Distribution
9.2.2 Exponential Distribution
9.2.3 Normal Distribution
9.2.4 Standard Normal Distribution
9.2.5 Central Limit Theorem
9.3 Summary
9.4 Answer to In-Text Questions
9.5 Self-Assessment Questions
9.6 References
9.1 LEARNING OBJECTIVES
After reading this lesson, students will be able to understand:
1. Uniform and exponential distributions.
2. The normal distribution and its properties.
3. The standard normal distribution and its applications, and
4. The central limit theorem and its applications.
9.2 INTRODUCTION
In previous lessons, you have learned about continuous random variables and their associated functions. In this lesson, you will read about different kinds of continuous distributions. The normal distribution is used extensively in economics and statistics, and several of its important applications are discussed with relatable examples. The central limit theorem is one of the most celebrated theorems in statistics and is used extensively.
$= \frac{140}{90} - \frac{70}{90}$
$= \frac{70}{90} = 0.78$
$P(50 \le Y \le 130) = \int_{50}^{130} \frac{1}{90}\,dy$
$= \frac{130}{90} - \frac{50}{90} = \frac{80}{90} = 0.89$
9.2.2 Exponential Distribution
A continuous non-negative random variable X is said to have an exponential distribution with parameter $\lambda > 0$ if its pdf is given by
$f(x) = \lambda e^{-\lambda x}$; $x \ge 0$
$f(x) = 0$; otherwise
The exponential distribution is also known as a waiting-time distribution: $\lambda$ is the rate at which events occur, so $1/\lambda$ is the average waiting time.
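$E(X) = \frac{1}{\lambda}$ and $V(X) = \frac{1}{\lambda^{2}}$
Cumulative distribution function:
$P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_{0}^{x} = 1 - e^{-\lambda x}$
So,
$P(X > x) = 1 - P(X \le x) = 1 - [1 - e^{-\lambda x}] = e^{-\lambda x}$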
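Example: A call center receives 4 calls per hour on average. What is the probability that the next call arrives after more than half an hour?
Solution: Here $\lambda = 4$.
$P\left(X > \frac{1}{2}\right) = \int_{1/2}^{\infty} 4e^{-4x}\,dx$
$= \left[\frac{4e^{-4x}}{-4}\right]_{1/2}^{\infty}$
$= -e^{-\infty} + e^{-4/2}$
$= 0 + e^{-2}$
$= 0.1353$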
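The closed form $P(X > x) = e^{-\lambda x}$ can be checked against a direct numerical integration of the pdf. This is a sketch only: the helpers `exp_cdf` and `integrate`, the midpoint rule, and the cut-off of 10 hours standing in for infinity are our own choices, not part of the text.

```python
import math

lam = 4.0   # calls per hour (rate parameter)
t = 0.5     # half an hour

def exp_cdf(x: float, lam: float) -> float:
    """P(X <= x) = 1 - e^(-lam x) for x >= 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# Closed form: P(X > t) = 1 - F(t) = e^(-lam t)
p_closed_form = 1 - exp_cdf(t, lam)

def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Numerical check: integrate the pdf from t to 10 h (the tail beyond is negligible)
p_numeric = integrate(lambda x: lam * math.exp(-lam * x), t, 10.0)

print(f"closed form: {p_closed_form:.5f}, numerical: {p_numeric:.5f}")
```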
9.2.3 Normal Distribution
This is the most important distribution; it was developed in 1733 by the French mathematician De Moivre. The normal distribution is also called the Gaussian distribution, as the German mathematician Carl Friedrich Gauss (1777-1855) derived its equation. Its graph is a symmetric, bell-shaped curve.
A random variable X is called a normal random variable if its pdf is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$
where $\pi \approx 3.14$ and $e \approx 2.718$ are constants, $\mu$ and $\sigma$ are the two parameters of the distribution (its mean and standard deviation), and x is a real number denoting the value of the continuous random variable of interest.
Properties
• It is symmetric about the mean.
• The curve has points of inflexion at a distance σ on either side of the mean (at μ ± σ), which gives the normal curve its bell shape.
• The right and left tails of the curve extend infinitely without touching the horizontal x-axis.
• In a normal distribution, Mean = Median = Mode.
• The area under the curve between two points a and b, represented by the shaded region, gives the probability that X lies between a and b.
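Further properties
(1) $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$, since the probability that a continuous random variable takes any single value is 0.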
(2) Under the standard normal curve, the area to the right of 0 and the area to the left of 0 are each 0.5.
Since the normal curve is symmetric, the area between 0 and a particular point c is the same as the area between 0 and the point −c:
$P(-c \le Z \le 0) = P(0 \le Z \le c)$
For example, the area between 0 and 1.45 is equal to the area between 0 and −1.45.
1. P( X 15)
2. P( 10< X<25)
3. P(12 X 36)
(II)
$P(10 < X < 25) = P\left(\frac{10 - 30}{4} < \frac{X - \mu}{\sigma} < \frac{25 - 30}{4}\right)$
$= P(-5 < Z < -1.25)$
$= P(-5 < Z < 0) - P(-1.25 < Z < 0)$
$= 0.49999 - 0.3944$
$= 0.1056$
(III)
$P(12 \le X \le 36) = P\left(\frac{12 - 30}{4} \le \frac{X - \mu}{\sigma} \le \frac{36 - 30}{4}\right)$
$= P(-4.5 \le Z \le 1.5)$
$= P(-4.5 \le Z \le 0) + P(0 \le Z \le 1.5)$
$= 0.4999 + 0.4332$
$= 0.9331$
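Normal-table look-ups can be reproduced with the standard library's error function, using $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch assumes μ = 30 and σ = 4, the values used in the standardisation above; `std_normal_cdf` and `normal_prob` are illustrative names of our own. Small differences from the table-based answers (in the fourth decimal place) come from rounding in the table.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma^2), after standardising Z = (X - mu) / sigma."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

mu, sigma = 30, 4
p2 = normal_prob(10, 25, mu, sigma)   # part (II)
p3 = normal_prob(12, 36, mu, sigma)   # part (III)
print(f"P(10 < X < 25) = {p2:.4f}")
print(f"P(12 < X < 36) = {p3:.4f}")
```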
IN-TEXT QUESTIONS
1. In a normal distribution, 34% of the items are below 50 and 5% of the items are above 70. Find the mean and variance of the distribution.
2. The standard normal distribution has mean ………... and variance …………
SKEWNESS
When a distribution departs from symmetry, it is said to be skewed. There are two types of skewed distributions: positively skewed and negatively skewed. A positively skewed distribution is skewed to the right, i.e., it has a longer tail on the right side.
Similarly, a negatively skewed distribution is skewed to the left, i.e., it has a longer tail on the left side.
KURTOSIS
The degree of peakedness of a distribution is measured by its kurtosis. A distribution with a high peak is called leptokurtic.
A distribution with a low (flat) peak is called platykurtic.
The normal distribution is mesokurtic: neither too peaked nor too flat.
So, the normal distribution is symmetric, i.e., neither positively nor negatively skewed, and it is mesokurtic, i.e., neither too highly peaked nor too flat.
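Rule: $E(X + Y) = E(X) + E(Y)$
Let $X_1, X_2, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$, and let
$\sum X_i = X_1 + X_2 + \dots + X_n$
$E\left(\sum X_i\right) = E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n)$
$E\left(\sum X_i\right) = \mu + \mu + \dots + \mu = n\mu$   ...(1)
To find the variance of $\sum X_i$:
$V\left(\sum X_i\right) = V(X_1 + X_2 + X_3 + \dots + X_n)$
Since $X_1, X_2, \dots, X_n \sim N(\mu, \sigma^2)$ are independent, with $V(X_1) = \sigma^2, V(X_2) = \sigma^2, \dots, V(X_n) = \sigma^2$,
$V\left(\sum X_i\right) = V(X_1) + V(X_2) + V(X_3) + \dots + V(X_n)$
$V\left(\sum X_i\right) = \sigma^2 + \sigma^2 + \dots + \sigma^2 = n\sigma^2$   ...(2)
The sample mean is
$\bar{X} = \frac{\sum X_i}{n}$
$E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right) = \frac{1}{n}E\left(\sum X_i\right) = \frac{1}{n} \cdot n\mu = \mu$   (from (1))
Rule: $V(aX) = a^2 V(X)$
$V(\bar{X}) = V\left(\frac{\sum X_i}{n}\right) = \frac{V\left(\sum X_i\right)}{n^2} = \frac{n\sigma^2}{n^2}$   (from (2))
$V(\bar{X}) = \frac{\sigma^2}{n}$
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$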
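The derivation above shows that $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2/n$. A small simulation illustrates this, together with the central limit theorem, for a population that is deliberately not normal. Assumptions of this sketch, none of which come from the text: the population is exponential with mean 1 and variance 1, n = 30, 20,000 repeated samples, and a fixed random seed.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is reproducible
n = 30           # size of each sample
reps = 20_000    # number of samples drawn

# Population: exponential with mean 1 and variance 1 (deliberately non-normal).
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)      # should be close to mu = 1
var_of_means = statistics.pvariance(sample_means)   # should be close to sigma^2 / n

print(f"mean of sample means     : {mean_of_means:.4f}")
print(f"variance of sample means : {var_of_means:.5f}")
print(f"sigma^2 / n              : {1.0 / n:.5f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the exponential population itself is strongly right-skewed.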
Ques. In a large population, the distribution of a variable has a mean of 165 and a standard deviation of 25 units. If a random sample of size 35 is chosen, find the approximate probability that the sample mean lies between 162 and 170.
Solution: The population mean is $\mu = 165$ and the population standard deviation is $\sigma = 25$, so $\sigma^2 = 25^2$.
The sample size is n = 35.
We have to find the distribution of the sample mean. Since n is large, by the central limit theorem
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, i.e.,
$\bar{X} \sim N\left(165, \frac{25^2}{35}\right)$
Note that $\frac{\sigma^2}{n} = \frac{25^2}{35}$ is the variance, so the standard deviation of $\bar{X}$ is $\sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{35}} = 4.226$.
$P(162 < \bar{X} < 170) = P\left(\frac{162 - 165}{4.226} < Z < \frac{170 - 165}{4.226}\right)$
$= P(-0.71 < Z < 1.18)$
$= P(0 < Z < 1.18) + P(0 < Z < 0.71)$, as $P(-0.71 < Z < 0) = P(0 < Z < 0.71)$
$= 0.3810 + 0.2611$
$= 0.6421$
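The same probability can be computed without tables. The sketch below (variable and helper names are our own) uses the error function for $\Phi(z)$; because it does not round the z-values to two decimals, its answer can differ slightly from a table-based calculation.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 165, 25, 35
se = sigma / math.sqrt(n)            # standard deviation of the sample mean
z_low = (162 - mu) / se
z_high = (170 - mu) / se
prob = std_normal_cdf(z_high) - std_normal_cdf(z_low)

print(f"SE = {se:.4f}, z from {z_low:.4f} to {z_high:.4f}, P = {prob:.4f}")
```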
As the sample size increases, the distribution of $\bar{X}$ tends to the normal distribution, whatever the shape of the population (provided the population variance is finite). As a rule of thumb, a sample size of at least 30, i.e., $n \ge 30$, is usually taken as large enough for this normal approximation. This holds even when the population itself is discrete.
IN-TEXT QUESTIONS
3. Consider a random sample of size 30 taken from a normal distribution with mean 60 and variance 25. Let the sample mean be denoted by $\bar{X}$. Calculate the probability that $\bar{X}$ assumes a value greater than 62.
2
;
n
LESSON 10
JOINT PROBABILITY DISTRIBUTION AND MATHEMATICAL EXPECTATIONS
STRUCTURE
10.1 Learning Objectives
10.2 Introduction
10.3 Joint Probability Mass Function
10.3.1 Conditional Probability Distributions
10.3.2 Independence of Random Variables
10.3.3 Marginal Probability Mass Functions
10.3.4 Expectations of Probability Mass Functions
10.4 Continuous Random Variables
10.4.1 Marginal Probability Density Functions
10.4.2 Expected Value of a Probability Density Function
10.4.3 Conditional Probability Distributions
10.5 Summary
10.6 Glossary
10.7 Answers to In-Text Questions
10.8 Self-Assessment Questions
10.9 References
10.10 Suggested Reading
(c) $P[(X, Y) \in A] = \sum_{(x, y) \in A} P(x, y)$
It must be noted that conditions (a) and (b) are required for P ( x, y ) to be a valid joint PMF.
Example-1
Consider two random variables X and Y with joint PMF as shown in the table below:
Y=0 Y=1 Y=2
X=0 1/6 1/4 1/8
X=1 1/8 1/6 1/6
(i) P ( X = 0, Y 1)
(ii) P (Y = 0, X 1)
Solution:
Example-2
A function f is given by $f(x, y) = cxy$ for $x = 1, 2, 3$; $y = 1, 2, 3$.
Determine the value of c for which the above function $f(x, y)$ is a valid joint p.m.f.
Solution:
From the question,
$f(x, y) = cxy$,
where $x = 1, 2, 3$ and $y = 1, 2, 3$.
There are 9 possible pairs of X and Y, namely (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2) and (3,3). The probabilities associated with the pairs are:
$f(1,1) = c(1)(1) = c$
$f(1,2) = 2c, f(1,3) = 3c, f(2,1) = 2c$
$f(2,2) = 4c, f(2,3) = 6c, f(3,1) = 3c$
$f(3,2) = 6c, f(3,3) = 9c$
For $f(x, y)$ to be a valid joint p.m.f.,
$\sum_{x}\sum_{y} f(x, y) = 1$
Hence,
$\sum_{x=1}^{3}\sum_{y=1}^{3} f(x, y) = c + 2c + 3c + 2c + 4c + 6c + 3c + 6c + 9c = 1$
$36c = 1$
$c = \frac{1}{36}$
Thus, for $c = \frac{1}{36}$, the given function is a valid probability mass function.
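Exact rational arithmetic makes the normalising-constant argument easy to check. In this sketch (standard library `fractions`; variable names are our own) c is chosen so that the nine probabilities sum to 1.

```python
from fractions import Fraction

support = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]
total_xy = sum(x * y for x, y in support)   # sum of xy over the 9 pairs
c = Fraction(1, total_xy)                   # c * total_xy must equal 1

pmf = {(x, y): c * x * y for x, y in support}
assert sum(pmf.values()) == 1               # the probabilities now sum to exactly 1
print("sum of xy =", total_xy, "  c =", c)
```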
where $C, D \subseteq \mathbb{R}$.
For discrete random variables X and Y, the conditional PMFs of X given Y and of Y given X are, respectively,
$P_{X|Y}(x_i \mid y_j) = \frac{P_{XY}(x_i, y_j)}{P_Y(y_j)}$
$P_{Y|X}(y_j \mid x_i) = \frac{P_{XY}(x_i, y_j)}{P_X(x_i)}$
for any $x_i \in R_X$ and $y_j \in R_Y$ (with $P_Y(y_j) > 0$ and $P_X(x_i) > 0$ respectively).
Example-3
Consider two random variables X and Y with joint PMF as shown in the table below:
Y=2 Y=4 Y=5
X=1 1/12 1/24 1/24
X=2 1/6 1/12 1/8
X=3 1/4 1/8 1/12
Find:
(a) $P(X \le 2, Y \le 4)$
(b) $P(Y = 2 \mid X = 1)$
Solution:
(a) $P(X \le 2, Y \le 4) = P_{XY}(1,2) + P_{XY}(1,4) + P_{XY}(2,2) + P_{XY}(2,4)$
$= \frac{1}{12} + \frac{1}{24} + \frac{1}{6} + \frac{1}{12} = \frac{3}{8}$
(b) $P(Y = 2 \mid X = 1) = \frac{P(X = 1, Y = 2)}{P(X = 1)} = \frac{P_{XY}(1,2)}{P_X(1)} = \frac{1/12}{1/6} = \frac{1}{2}$ Ans.
where $P_X(1) = \frac{1}{12} + \frac{1}{24} + \frac{1}{24} = \frac{1}{6}$ is the sum of the X = 1 row.
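for all $x_i \in R_X$ and for all $y_j \in R_Y$.
For the joint PMF of Example-3,
$P(X = 2, Y = 2) = \frac{1}{6}$
$P(X = 2)\,P(Y = 2) = \frac{3}{8} \times \frac{1}{2} = \frac{3}{16}$
Since $\frac{1}{6} \ne \frac{3}{16}$, X and Y are not independent.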
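The calculations for Example-3 (the joint probability in (a), the conditional probability in (b), and the independence check) can be reproduced with exact fractions. The dictionary layout and the helper names `marginal_x` and `marginal_y` are illustrative choices of our own.

```python
from fractions import Fraction as F

# Joint PMF of Example-3, stored as {(x, y): probability}
joint = {
    (1, 2): F(1, 12), (1, 4): F(1, 24), (1, 5): F(1, 24),
    (2, 2): F(1, 6),  (2, 4): F(1, 12), (2, 5): F(1, 8),
    (3, 2): F(1, 4),  (3, 4): F(1, 8),  (3, 5): F(1, 12),
}
assert sum(joint.values()) == 1  # a valid joint PMF sums to 1

def marginal_x(x):
    """P_X(x): add the joint probabilities in row x."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """P_Y(y): add the joint probabilities in column y."""
    return sum(p for (_, yj), p in joint.items() if yj == y)

# (a) P(X <= 2, Y <= 4)
p_a = sum(p for (x, y), p in joint.items() if x <= 2 and y <= 4)
# (b) P(Y = 2 | X = 1) = P(X = 1, Y = 2) / P_X(1)
p_b = joint[(1, 2)] / marginal_x(1)
# Independence requires P(x, y) = P_X(x) P_Y(y) for every pair
independent = all(p == marginal_x(x) * marginal_y(y) for (x, y), p in joint.items())

print("(a)", p_a, " (b)", p_b, " independent:", independent)
```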
If (X, Y) are discrete random variables, then the marginal probability is the probability of an event involving a single variable, irrespective of the values taken by the other variable. More generally, for a discrete random vector $X = (X_1, \dots, X_k)$,
$P_{X_i}(x) = \sum_{x_1, \dots, x_k:\; x_i = x} P_X(x_1, x_2, \dots, x_k)$
In words, the marginal PMF of $X_i$ at the point x is obtained by summing the joint PMF $P_X$ over all the vectors in $R_X$ whose i-th component is equal to x.
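Example-5
Carrying forward from Example-3, find the marginal PMFs of X and Y.
Solution
$R_X = \{1, 2, 3\}$, $R_Y = \{2, 4, 5\}$
The marginal PMFs are given by
$P_X(x) = \begin{cases} \frac{1}{6}, & x = 1 \\ \frac{3}{8}, & x = 2 \\ \frac{11}{24}, & x = 3 \\ 0, & \text{otherwise} \end{cases}$
$P_Y(y) = \begin{cases} \frac{1}{2}, & y = 2 \\ \frac{1}{4}, & y = 4 \\ \frac{1}{4}, & y = 5 \\ 0, & \text{otherwise} \end{cases}$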
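Let X and Y be jointly distributed discrete random variables with joint probability mass function P(x, y). Then the expected value of a function g(X, Y) is given by
$E[g(X, Y)] = \sum_{x}\sum_{y} g(x, y)\,P(x, y)$
Example-6
Find E(XY) for the data given in Example-3.
Solution:
$E(XY) = \left(1 \times 2 \times \frac{1}{12}\right) + \left(1 \times 4 \times \frac{1}{24}\right) + \left(1 \times 5 \times \frac{1}{24}\right) + \left(2 \times 2 \times \frac{1}{6}\right) + \left(2 \times 4 \times \frac{1}{12}\right)$
$+ \left(2 \times 5 \times \frac{1}{8}\right) + \left(3 \times 2 \times \frac{1}{4}\right) + \left(3 \times 4 \times \frac{1}{8}\right) + \left(3 \times 5 \times \frac{1}{12}\right)$
$= \frac{177}{24} = 7.38$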
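The expectation formula $E[g(X, Y)] = \sum_x \sum_y g(x, y) P(x, y)$ translates directly into code. In this sketch the generic helper `expectation` is our own illustrative name; it is applied with g(x, y) = xy to the Example-3 table.

```python
from fractions import Fraction as F

# Joint PMF of Example-3: {(x, y): P(X = x, Y = y)}
joint = {
    (1, 2): F(1, 12), (1, 4): F(1, 24), (1, 5): F(1, 24),
    (2, 2): F(1, 6),  (2, 4): F(1, 12), (2, 5): F(1, 8),
    (3, 2): F(1, 4),  (3, 4): F(1, 8),  (3, 5): F(1, 12),
}

def expectation(g, pmf):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * P(x, y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_xy = expectation(lambda x, y: x * y, joint)
print(e_xy, "=", float(e_xy))
```

`Fraction` reduces 177/24 automatically, so the printed fraction is 59/8, i.e., 7.375.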
IN–TEXT QUESTIONS
Answer the following MCQs
(a) pq + (1 − p )(1 − q )
(b) pq
(c) p (1 − q )
(d) 1 − pq
2. If a variable can take certain integer values between two given points, then it is called–
(a) Continuous random variable
(b) Discrete random variable
(c) Irregular random variable
(d) Uncertain random variable
3. If E (U ) = 2 and E (V ) = 4 then E (U − V ) = ?
(a) 2
(b) 6
(c) 0
(d) Insufficient data
4. Height is a discrete variable (T / F)
5. If X and Y are two events associated with the same sample space of a random experiment, then $P(X \mid Y)$ is given by
(a) $P(X \cap Y)/P(Y)$, provided $P(Y) \ne 0$
(c) P ( X Y ) / P (Y )
(d) P( X Y ) / P( X )
A joint density function is a piecewise continuous function of two variables, f(x, y), such that for any "reasonable" two-dimensional set A,
$P[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy$.
Definition: Let X and Y be continuous random variables. A joint density function f(x, y) for these two variables is a function satisfying
(a) $f(x, y) \ge 0$
and
(b) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
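Example-7
The joint PDF of (X, Y) is given by
$f(x, y) = \frac{6}{5}(x + y^2)$, $0 \le x \le 1,\ 0 \le y \le 1$
$f(x, y) = 0$; otherwise
(a) Verify that f(x, y) is a valid joint PDF, i.e., that (i) $f(x, y) \ge 0$ and (ii) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
The first condition is fulfilled, as $f(x, y) \ge 0$. For the verification of the second condition:
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} \frac{6}{5}(x + y^2)\,dx\,dy$
$= \int_{0}^{1}\int_{0}^{1} \frac{6}{5}x\,dx\,dy + \int_{0}^{1}\int_{0}^{1} \frac{6}{5}y^2\,dx\,dy$
$= \frac{6}{5}\int_{0}^{1}\int_{0}^{1} x\,dx\,dy + \frac{6}{5}\int_{0}^{1}\int_{0}^{1} y^2\,dx\,dy$
$= \frac{6}{5} \cdot \frac{1}{2} + \frac{6}{5} \cdot \frac{1}{3} = \frac{3}{5} + \frac{2}{5} = 1$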
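(b) $P\left(0 \le X \le \frac{1}{4},\ 0 \le Y \le \frac{1}{4}\right) = \int_{0}^{1/4}\int_{0}^{1/4} \frac{6}{5}(x + y^2)\,dx\,dy$
$= \frac{6}{5}\int_{0}^{1/4}\int_{0}^{1/4} x\,dx\,dy + \frac{6}{5}\int_{0}^{1/4}\int_{0}^{1/4} y^2\,dx\,dy$
$= \frac{6}{20}\left[\frac{x^2}{2}\right]_{x=0}^{x=1/4} + \frac{6}{20}\left[\frac{y^3}{3}\right]_{y=0}^{y=1/4}$
$= \frac{7}{640}$ Ans.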
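A crude numerical check of Example-7: the midpoint-rule helper `double_integral` below is an illustrative choice of ours (not from the text), using 400 × 400 cells. It approximates both the total probability, which should be 1, and the probability in part (b), which should be 7/640 ≈ 0.0109.

```python
def f(x: float, y: float) -> float:
    """Joint PDF of Example-7: (6/5)(x + y^2) on the unit square, zero elsewhere."""
    return 1.2 * (x + y * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def double_integral(g, ax, bx, ay, by, steps=400):
    """Midpoint-rule approximation of the integral of g over [ax, bx] x [ay, by]."""
    hx = (bx - ax) / steps
    hy = (by - ay) / steps
    total = 0.0
    for i in range(steps):
        x = ax + (i + 0.5) * hx
        for j in range(steps):
            total += g(x, ay + (j + 0.5) * hy)
    return total * hx * hy

total_mass = double_integral(f, 0, 1, 0, 1)     # condition (ii): should be about 1
p_b = double_integral(f, 0, 0.25, 0, 0.25)      # part (b): should be about 7/640

print(f"total probability = {total_mass:.6f}")
print(f"P(0 <= X <= 1/4, 0 <= Y <= 1/4) = {p_b:.6f} (7/640 = {7/640:.6f})")
```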
Example-8
Consider two continuous random variables X and Y with joint p.d.f.
$f(x, y) = \frac{2}{81}x^2 y$, $0 < x < K,\ 0 < y < K$
$f(x, y) = 0$; otherwise
(a) Find K. (b) Find P(X > 3Y).
Solution:
a) For f(x, y) to be a valid p.d.f., the conditions for a continuous p.d.f. must be satisfied; therefore,
$\int_{0}^{K}\int_{0}^{K} \frac{2}{81}x^2 y\,dx\,dy = \frac{2}{81} \cdot \frac{K^3}{3} \cdot \frac{K^2}{2} = \frac{K^5}{243} = 1 \Rightarrow K = 3$
b) $P(X > 3Y) = \int_{0}^{3}\left(\int_{0}^{x/3} \frac{2}{81}x^2 y\,dy\right)dx$
$= \int_{0}^{3} \frac{1}{729}x^4\,dx$
$= \frac{1}{15}$
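Both parts of Example-8 can be checked numerically. In this sketch the function names are our own; K is taken as the fifth root of 243 (from $K^5/243 = 1$), and $P(X > 3Y)$ is approximated with a midpoint rule in which, for each x, y runs from 0 to x/3.

```python
# (a) The double integral equals K^5 / 243, so K is the fifth root of 243.
K = 243 ** (1 / 5)

def f(x: float, y: float) -> float:
    """Joint PDF of Example-8, zero outside 0 < x, y < K."""
    return (2 / 81) * x * x * y if 0 < x < K and 0 < y < K else 0.0

def prob_x_greater_than_3y(steps: int = 400) -> float:
    """Midpoint-rule estimate of P(X > 3Y): the region is 0 < x < K, 0 < y < x/3."""
    hx = K / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * hx
        hy = (x / 3) / steps
        inner = sum(f(x, (j + 0.5) * hy) for j in range(steps)) * hy
        total += inner * hx
    return total

print(f"K = {K:.6f}")
print(f"P(X > 3Y) = {prob_x_greater_than_3y():.6f} (1/15 = {1/15:.6f})")
```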
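$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{1} \frac{6}{5}(x + y^2)\,dy = \frac{6x}{5} + \frac{2}{5}$
$f_X(x) = \frac{6}{5}x + \frac{2}{5}$, $0 \le x \le 1$; 0 otherwise
$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{1} \frac{6}{5}(x + y^2)\,dx = \frac{6}{5}y^2 + \frac{3}{5}$
$f_Y(y) = \frac{6}{5}y^2 + \frac{3}{5}$ for $0 \le y \le 1$; 0 otherwise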
10.4.2 Expected value of a PDF
Let X and Y be continuous random variables with joint PDF f(x, y), and let g be some function. Then
$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$
Example-10
A thread has a length of 1 mm, and two points are chosen uniformly and independently along the thread. Find the expected distance between these two points.
Solution
Let U and V be the two points that are chosen. The joint PDF of U and V is
$f(u, v) = 1$, $0 \le u, v \le 1$; 0 otherwise
$E|U - V| = \int_{0}^{1}\int_{0}^{1} |u - v|\,du\,dv$
$= \int_{0}^{1}\int_{v}^{1} (u - v)\,du\,dv + \int_{0}^{1}\int_{0}^{v} (v - u)\,du\,dv$
$= \frac{1}{6} + \frac{1}{6}$
$E|U - V| = \frac{1}{3}$
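Because U and V have joint density 1 on the unit square, $E|U - V|$ is simply the average of $|u - v|$ over the square. The grid average below is our own midpoint-rule sketch with 400 points per axis (the function name is illustrative); it approaches 1/3 as the grid is refined.

```python
def expected_abs_difference(steps: int = 400) -> float:
    """Midpoint-rule estimate of E|U - V| for independent U, V ~ Uniform(0, 1).

    The joint density is 1 on the unit square, so the expectation is the
    average of |u - v| over a fine grid of cell midpoints.
    """
    h = 1.0 / steps
    midpoints = [(i + 0.5) * h for i in range(steps)]
    total = sum(abs(u - v) for u in midpoints for v in midpoints)
    return total * h * h

estimate = expected_abs_difference()
print(f"E|U - V| is approximately {estimate:.6f}; exact value 1/3 = {1/3:.6f}")
```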
Example 11
The joint PDF of X and Y is given by
$f(x, y) = \frac{3}{7}(x + y)$, $0 \le x \le 1$; 0 otherwise.
Find the expected value of $X/Y^2$.
Solution
$E\left[\frac{X}{Y^2}\right] = \int_{1}^{2}\int_{0}^{1} \frac{3x(x + y)}{7y^2}\,dx\,dy$
$E\left[\frac{X}{Y^2}\right] = \frac{3}{28}$ Ans.
$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid y)\,dx$
Similarly, one can define the conditional PDF and the expected value of Y given X = x by interchanging the roles of X and Y.
Properties of Conditional PDFs
(1) The conditional PDF of X, given Y = y, is a valid PDF, since it satisfies two conditions:
(a) $f_{X|Y}(x \mid y) \ge 0$
(b) $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1$
(2) In general, the conditional distribution of X given Y is not the same as the conditional distribution of Y given X, i.e.,
$f_{X|Y}(x \mid y) \ne f_{Y|X}(y \mid x)$
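Example 12
If the joint PDF of U and V is given by
$f(u, v) = \frac{2}{3}(u + 2v)$, $0 \le u \le 1,\ 0 \le v \le 1$; 0 otherwise,
find $E\left[U \mid V = \frac{1}{2}\right]$.
The marginal PDF of V is $f_V(v) = \int_{0}^{1} \frac{2}{3}(u + 2v)\,du = \frac{2}{3}\left(\frac{1}{2} + 2v\right)$, so $f_V\left(\frac{1}{2}\right) = 1$ and
$f\left(u \mid V = \frac{1}{2}\right) = \frac{f(u, 1/2)}{f_V(1/2)} = \frac{2}{3}(u + 1)$, $0 \le u \le 1$; 0 otherwise
Then
$E\left[U \mid V = \frac{1}{2}\right] = \int_{0}^{1} u \cdot \frac{2}{3}(u + 1)\,du = \frac{2}{3}\left(\frac{1}{3} + \frac{1}{2}\right)$
$E\left[U \mid V = \frac{1}{2}\right] = \frac{5}{9}$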
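Conditional expectations can be checked numerically in the same way. This sketch assumes the joint PDF $f(u, v) = \frac{2}{3}(u + 2v)$ on the unit square, the form consistent with the conditional density $\frac{2}{3}(u + 1)$ at v = 1/2; the helper names are our own.

```python
def f_joint(u: float, v: float) -> float:
    """Assumed joint PDF (2/3)(u + 2v) on the unit square, zero elsewhere."""
    return (2 / 3) * (u + 2 * v) if 0 <= u <= 1 and 0 <= v <= 1 else 0.0

def integrate(g, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

v0 = 0.5
f_v = integrate(lambda u: f_joint(u, v0), 0, 1)            # marginal f_V(1/2)

def cond_pdf(u: float) -> float:
    """Conditional density f(u | V = 1/2) = f(u, 1/2) / f_V(1/2)."""
    return f_joint(u, v0) / f_v

e_u_given_v = integrate(lambda u: u * cond_pdf(u), 0, 1)   # E[U | V = 1/2]

print(f"f_V(1/2) = {f_v:.6f}, E[U | V = 1/2] = {e_u_given_v:.6f} (5/9 = {5/9:.6f})")
```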
IN TEXT QUESTIONS
(b) a = 1, b = 4
(c) a = 1, b = −1
(d) a = 0, b = 0
10. Two random variables X and Y are distributed according to
$f_{X,Y}(x, y) = x + y$, $0 \le x \le 1,\ 0 \le y \le 1$; 0 otherwise.
The probability $P(X + Y \le 1)$ is
(a) 0.66
(b) 0.33
(c) 0.5
(d) 0.1
11. What are the two important conditions that must be satisfied for f(x, y) to be a legitimate PDF?
12. When does the conditional density function reduce to the marginal density function?
(a) Only if the random variables exhibit statistical dependence.
(b) Only if the random variables exhibit statistical independence.
(c) Only if the random variables exhibit deviation from their mean values.
(d) None of the above.
13. Let U and V be jointly distributed continuous variables, with joint PDF given as
(d) $0 \le P(x, y) \le 1$
(e) $\sum_{x}\sum_{y} P(x, y) = 1$
The respective counterparts of these conditions hold in the case of continuous random variables. Conditional probability is the probability of one event happening given that another event has already occurred.
X and Y are called independent if the joint p.d.f. is the product of the individual (marginal) p.d.f.'s,
i.e., if $f(x, y) = f_X(x)\,f_Y(y)$ for all x, y.
10.6 GLOSSARY
Conditional Probability: a measure of the probability of an event occurring given that another event has already occurred.
Marginal Probability Density Function: obtained by integrating the joint PDF over the other variable, e.g., $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$.
Expected Value of a PDF: $E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$
1. d
2. b
3. a
4. False
5. a
6. c
7. a
8. a
9. a
10. b
11. Refer to example 5
12. b
13. a) Yes b) Yes c) Yes
(a) P( X + Y 1)
2 2
(b) P(2 X − Y 0)
3. Let X and Y be two jointly distributed continuous random variables with joint PDF
6 xy 0 x 1, 0 y x
f X ,Y ( x, y ) =
0 otherwise
10.9 REFERENCES
• Devore, J. L. (2012). Probability and Statistics for Engineering and the Sciences (8th ed.; First Indian reprint 2012). Brooks/Cole, Cengage Learning.
• Rice, J. A. (2007). Mathematical Statistics and Data Analysis (3rd ed.). Thomson/Brooks/Cole.
• Johnson, R. A. (2017). Miller & Freund's Probability and Statistics for Engineers (9th Global ed.). Pearson Education.
• Miller, I., & Miller, M. (2017). John E. Freund's Mathematical Statistics with Applications (8th ed.). Pearson.
• Hogg, R. V., Tanis, E. A., & Zimmerman, D. L. (2021). Probability and Statistical Inference (10th ed.). Pearson.
• McClave, J., Benson, P. G., & Sincich, T. (2017). Statistics for Business and Economics. Pearson.
10.10 SUGGESTED READING
LESSON 11
CORRELATION AND COVARIANCE
STRUCTURE
11.1 Learning Objectives
11.2 Introduction
11.3 Covariance
11.4 Correlation
11.5 Types of Correlations
11.5.1 Positive and Negative Correlation
11.5.2 Linear and Non-Linear Correlation
11.5.3 Simple and Multiple Correlation
11.6 Difference between Correlation and Covariance
11.7 Methods of Calculating Correlation
11.7.1 Scatter diagram
11.7.2 Karl Pearson’s Coefficient of Correlations
11.7.3 Spearman’s Coefficient of Rank Correlation
11.8 Glossary
11.9 Summary
11.10 Answers to the In-Text Questions
11.11 Self-Assessment Questions
11.12 References
11.13 Suggested Readings
11.1 LEARNING OBJECTIVES
After reading this lesson, students will be able to understand:
1. The difference between covariance and correlation
2. The method of calculating covariance
3. The types of correlation, and
4. The methods of calculating correlation
11.2 INTRODUCTION
In the previous units, you must have come across problems dealing with a single variable, such as the marks, weights and heights of students in a classroom, in which you used statistical measures such as the mean, median, mode and standard deviation. All these measures focused on understanding a data set one variable at a time. However, in the real world, we often have to analyse several variables at the same time. In such a situation, the basic question that comes to mind is whether or not there is any relationship between two or more variables. If there is a relationship, what kind of relationship is it? How can we find out whether such a relationship exists? What is the strength of the relationship? The objective of this unit is to answer such questions, which arise when we deal with two or more variables simultaneously.
11.3. COVARIANCE
Covariance is one of the statistical measures of the relationship between two variables. In other words, it shows how two variables change together. Suppose that the students in your classroom have different heights and weights, and you want to know whether there is any relationship between the height and the weight of students, i.e., whether the weight of students varies together with their height. In such a case, we can use covariance to understand the relationship between the two variables, height and weight.
The formula for covariance is given by:
$\mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$
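X  Y  $(X - \bar{X})$  $(Y - \bar{Y})$  $(X - \bar{X})(Y - \bar{Y})$
65 73 5 13 65
60 82 0 22 0
70 50 10 -10 -100
55 40 -5 -20 100
50 55 -10 -5 50
ΣX = 300  ΣY = 300  $\sum (X - \bar{X})(Y - \bar{Y}) = 115$
$\bar{X} = \frac{\sum X}{N} = \frac{300}{5} = 60$ and $\bar{Y} = \frac{\sum Y}{N} = \frac{300}{5} = 60$
Using $\mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$,
we have $\mathrm{Cov}(X, Y) = \frac{115}{5} = 23$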
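The covariance calculation in the table can be reproduced in a few lines of Python. This sketch (the function name is our own) divides by N, as the formula in the text does; note that Python's `statistics.covariance` (Python 3.10+) divides by N − 1 instead and would return 115/4 = 28.75 for the same data.

```python
def covariance(xs, ys):
    """Cov(X, Y) = sum((X - mean_X) * (Y - mean_Y)) / N, as in the text."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

X = [65, 60, 70, 55, 50]
Y = [73, 82, 50, 40, 55]
print(covariance(X, Y))
```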
11.4. CORRELATION
In the earlier example, where we tried to find out the relationship between the weight and height of students, we used covariance and found that there is a positive covariance between the two variables. A positive covariance suggests that the variables move in the same direction, i.e., when the height of a student increases, the weight also rises. However, in this example, covariance could not tell us how closely the weight moves with the height.
Therefore, covariance tells us about the direction of the relationship between two variables, but it fails to measure the strength of such a relationship, because its value depends on the units in which the variables are measured. In other words, covariance does not tell us how closely two variables are related to each other. Thus, in order to determine the strength of the relationship between two variables, we use another measure known as correlation.
Correlation is defined as the degree of association between two variables. In simple words, it explains how closely two variables are related to each other. In fact, the coefficient of correlation is the covariance between the two series standardized by their standard deviations, $r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.
Correlation is an important statistical measure which helps in determining how one variable changes vis-à-vis another variable. For example, according to the law of demand, quantity demanded is inversely related to the price of a commodity, other things remaining constant. Similarly, according to Keynes's psychological law of consumption, an increase in income leads to an increase in consumption, but by less than the increase in income. If we need to find out how closely consumption moves with income, we can use the correlation coefficient to measure this relationship between income and consumption.
However, it is important to distinguish correlation from causation. Correlation simply informs us about how closely one variable varies with changes in another variable; it does not necessarily imply causation. Correlation does not tell us anything about the cause-and-effect relationship between two variables; it only gives an understanding of the strength of the relationship between them. For example, in the following table, we have information regarding the demand and price of a commodity.
In this case, there is a perfect negative correlation between demand and price. However, this does not by itself imply that a decrease in price causes demand to rise; the correlation only describes the inverse relationship between price and quantity demanded. In order to study cause-and-effect relationships, we need more advanced statistical methods, such as regression analysis.
[Figure: scatter plot of Marks in History (vertical axis) against Marks in English (horizontal axis)]
On the other hand, when the relationship between two variables is non-linear, it is referred to as non-linear correlation. In this case, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. Thus, when we plot the two variables on a graph, we get a curve rather than a straight line.
[Figure: Marks (vertical axis) plotted against No. of Hours of study (horizontal axis)]
Correlation vs. covariance:
1. The value of correlation lies between −1 and +1; the value of covariance lies between −∞ and +∞.
2. Correlation measures the direction as well as the strength of the relationship between the two variables; covariance only indicates the direction of the relationship.
3. Correlation is free from units of measurement; covariance depends on the units of measurement.
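To see why correlation is unit-free while covariance is not, one can rescale a variable and recompute both. The data below are those of the covariance example; the factor 100 (standing in for a change of measurement units) and the helper names `cov` and `corr` are illustrative assumptions of our own.

```python
import math

def cov(xs, ys):
    """Covariance with divisor N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """Correlation r = Cov(X, Y) / (sigma_X * sigma_Y)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

X = [65, 60, 70, 55, 50]
Y = [73, 82, 50, 40, 55]
X_rescaled = [100 * x for x in X]  # same data expressed in units 100 times smaller

print("covariance :", cov(X, Y), "->", cov(X_rescaled, Y))
print("correlation:", round(corr(X, Y), 4), "->", round(corr(X_rescaled, Y), 4))
```

Multiplying X by 100 multiplies the covariance by 100 but leaves the correlation unchanged.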
As the name suggests, in this method we simply plot the data on a graph in the form of a scatter diagram to find out the correlation between the two variables. If the plotted points are closely packed around a line, the correlation between the two variables is high. However, if the plotted points are widely spread, the correlation between the two variables is low. This is one of the simplest methods of ascertaining the relationship between two variables, as it only requires plotting the data and inspecting the graph.
The above graph plots GDP at current prices and public expenditure on education by the Government of India from 1999-2000 to 2019-20. In this case, the points lie close to an upward-sloping straight line from left to right; thus, the correlation between GDP and public expenditure on education is highly positive.
The above graph plots GDP and expenditure on education as a percentage of GDP in India during 1999-2000 to 2019-20. In this case, the points are widely scattered in the graph, which indicates a weak correlation between GDP and expenditure on education as a percentage of GDP.
11.7.2. Karl Pearson's Coefficient of Correlation
It is a mathematical method of calculating the coefficient of correlation. Karl Pearson's coefficient of correlation is denoted by r. This method is best suited when both variables in a data set are normally distributed. However, extreme values can have a large impact on this coefficient, which makes it undesirable when one or both of the variables are not normally distributed, because such values could exaggerate or weaken the apparent strength of the association.
Assumptions of Karl Pearson's coefficient of correlation:
1) There exists a linear relationship between the two variables.
2) The two variables are normally distributed.
3) There is a cause-and-effect relationship between the factors which affect the distribution of the two variables.
$r = \frac{\sum xy}{N\sigma_x\sigma_y}$
where $x = (X - \bar{X})$
$y = (Y - \bar{Y})$
N = number of items
$\sigma_x$ = standard deviation of X
$\sigma_y$ = standard deviation of Y
In simpler form, $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
where $x = (X - \bar{X})$
$y = (Y - \bar{Y})$
$x^2 = (X - \bar{X})^2$
$y^2 = (Y - \bar{Y})^2$
Example 1: Find Karl Pearson's coefficient of correlation between the marks in economics and mathematics of the following students.
Economics 70 50 55 70 85 90
Mathematics 35 45 28 33 25 14
Solution:
X (Economics)  Y (Mathematics)  $x = X - \bar{X}$  $y = Y - \bar{Y}$  $x^2$  $y^2$  $xy$
70 35 0 5 0 25 0
50 45 -20 15 400 225 -300
55 28 -15 -2 225 4 30
70 33 0 3 0 9 0
85 25 15 -5 225 25 -75
90 14 20 -16 400 256 -320
ΣX = 420  ΣY = 180  Σx² = 1250  Σy² = 544  Σxy = -665
$\bar{X} = \frac{\sum X}{N} = \frac{420}{6} = 70$
$\bar{Y} = \frac{\sum Y}{N} = \frac{180}{6} = 30$
Using $r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$,
we have
$r = \frac{-665}{\sqrt{1250 \times 544}}$
$= \frac{-665}{\sqrt{680000}}$
$= \frac{-665}{824.62}$
$= -0.8064$
$= -0.81$ (approx.)
Therefore, the marks of students in economics and mathematics are inversely correlated, as Karl Pearson's coefficient of correlation is -0.81.
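The deviation form of Karl Pearson's formula used in Example 1 can be sketched in Python as follows (standard library only; the function name `pearson_r` is our own).

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [value - mean_x for value in xs]
    dev_y = [value - mean_y for value in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

economics = [70, 50, 55, 70, 85, 90]
mathematics = [35, 45, 28, 33, 25, 14]
r = pearson_r(economics, mathematics)
print(f"r = {r:.4f}")
```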
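Example 2: Find the coefficient of correlation for the following data using the direct method.
A (X)  1   6  9  3  4  5  8   2   1
B (Y)  12  9  5  6  3  7  15  11  9
Solution:
X       Y       X²       Y²       XY
1       12      1        144      12
6       9       36       81       54
9       5       81       25       45
3       6       9        36       18
4       3       16       9        12
5       7       25       49       35
8       15      64       225      120
2       11      4        121      22
1       9       1        81       9
ΣX = 39  ΣY = 77  ΣX² = 237  ΣY² = 771  ΣXY = 327
Using the direct formula with N = 9,
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
$$r = \frac{9 \times 327 - (39 \times 77)}{\sqrt{9 \times 237 - (39)^2}\,\sqrt{9 \times 771 - (77)^2}} = \frac{2943 - 3003}{\sqrt{2133 - 1521}\,\sqrt{6939 - 5929}}$$
$$r = \frac{-60}{\sqrt{612}\,\sqrt{1010}} = \frac{-60}{24.74 \times 31.78} = \frac{-60}{786.24} = -0.076 \approx -0.08$$
Since r is close to zero, there is only a very weak negative correlation between A and B.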
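As a quick check of the direct method, the sketch below works from the raw sums ΣX, ΣY, ΣX², ΣY² and ΣXY exactly as in the formula above. The function name `pearson_r_direct` is illustrative, not part of the text.

```python
import math


def pearson_r_direct(xs, ys):
    """Direct-method Pearson r from raw sums (no deviations needed)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


a = [1, 6, 9, 3, 4, 5, 8, 2, 1]
b = [12, 9, 5, 6, 3, 7, 15, 11, 9]
r = pearson_r_direct(a, b)
print(round(r, 3))  # -0.076
```

Because r must always lie between −1 and +1, a computed value outside that range signals an arithmetic slip (for example, using ΣXY instead of NΣXY in the numerator).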
11.7.3. Spearman's Coefficient of Rank Correlation
Karl Pearson's method, discussed above, assumes that the variables under study are normally distributed; otherwise it may not yield appropriate results. In practice, however, variables are often skewed rather than normally distributed. In such situations we need a method that makes no such unrealistic assumption about the distribution of the variables in question. One such method is Spearman's rank correlation, which requires no assumption about the form of the distributions of the two variables.
In Spearman's rank correlation, the observations of each variable are ranked, and the coefficient of correlation is calculated from the ranks rather than from the original observations.
The formula for Spearman's rank correlation is
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where D = R1 − R2 is the difference between the two ranks of an item and N = number of observations.
Solution:
Student  Rank in Section A (R1)  Rank in Section B (R2)  D = R1 − R2  D²
1        4                       1                       3            9
2        5                       3                       2            4
3        7                       2                       5            25
4        1                       7                       −6           36
5        3                       5                       −2           4
6        2                       6                       −4           16
7        6                       4                       2            4
                                                                      ΣD² = 98
Using
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
with ΣD² = 98 and N = 7,
$$R = 1 - \frac{6 \times 98}{7(7^2 - 1)} = 1 - \frac{588}{336} = 1 - 1.75 = -0.75$$
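When ranks are already given and there are no ties, the formula can be applied directly. A minimal sketch (the function name is hypothetical):

```python
def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), for untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


section_a = [4, 5, 7, 1, 3, 2, 6]   # R1 for students 1..7
section_b = [1, 3, 2, 7, 5, 6, 4]   # R2 for students 1..7
print(spearman_from_ranks(section_a, section_b))  # -0.75
```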
Example 4: Calculate the coefficient of rank correlation between the scores given to the participants by two judges in a dance competition.
Participant       1   2   3   4   5
Score of Judge 1  55  70  80  70  75
Score of Judge 2  70  80  90  50  60
Solution: Since ranks are not given, we rank the participants on the basis of each judge's scores (the highest score receives rank 1). Participants 2 and 4 both receive 70 from Judge 1, so they share the average of ranks 3 and 4, i.e., 3.5.
Participant  Score of Judge 1  R1   Score of Judge 2  R2  D = R1 − R2  D² = (R1 − R2)²
1            55                5    70                3   2            4
2            70                3.5  80                2   1.5          2.25
3            80                1    90                1   0            0
4            70                3.5  50                5   −1.5         2.25
5            75                2    60                4   −2           4
                                                                       ΣD² = 12.50
Because of the tied ranks, a correction factor (m³ − m)/12 is added to ΣD² for each group of m tied observations (here one group with m = 2):
$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m)\right]}{N(N^2 - 1)} = 1 - \frac{6\left[12.5 + \frac{1}{12}(2^3 - 2)\right]}{5(5^2 - 1)} = 1 - \frac{6 \times 13}{120} = 1 - 0.65 = 0.35$$
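The ranking step with ties and the correction factor can be sketched as follows. This is an illustrative implementation under two assumptions: the highest score gets rank 1, and tied scores share the mean of the ranks they occupy, as in the solution above.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores in descending order (highest = 1); ties share the mean rank."""
    order = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = order.index(s) + 1      # first rank position held by this score
        count = order.count(s)          # size of the tie group
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_correction(scores):
    """Sum of (m^3 - m) / 12 over every group of m tied scores."""
    return sum((m ** 3 - m) / 12 for m in Counter(scores).values() if m > 1)


def spearman_with_ties(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    corrected = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * corrected / (n * (n * n - 1))


judge1 = [55, 70, 80, 70, 75]
judge2 = [70, 80, 90, 50, 60]
print(average_ranks(judge1))                       # [5.0, 3.5, 1.0, 3.5, 2.0]
print(round(spearman_with_ties(judge1, judge2), 2))  # 0.35
```

The same function can be used to check in-text question 11, where two individuals share a marginal utility of 60.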
IN-TEXT QUESTIONS
4. The scatter plot is the simplest method of determining correlation. True/False
5. Karl Pearson's coefficient is one of the methods of measuring correlation. True/False
6. The scatter plot method involves plotting the variables on a graph. True/False
7. Spearman's rank correlation cannot be used in the case of tied (common) ranks. True/False
8. Which of the following is a method of measuring correlation?
A) Spearman's rank method B) Standard deviation
C) Covariance D) Mode
9. Find the correlation between X and Y using Karl Pearson's coefficient of correlation.
X  10   20   30   40   50   60   70   80
Y  100  150  100  250  300  200  200  300
10. The following table gives the ranking of 10 participants by two judges in a drawing competition. Calculate the rank correlation coefficient.
Judge 1  2  5   10  1  9  8  4  3  7  6
Judge 2  2  10  7   4  9  5  9  1  8  3
11. Calculate the rank correlation coefficient between the marginal utilities of two goods received by 10 individuals.
Individual                  A   B   C   D   E   F   G   H   I   J
Marginal utility of good X  70  50  60  60  77  80  90  15  25  45
Marginal utility of good Y  60  90  55  40  59  65  70  85  73  50
11.8 GLOSSARY
1. Covariance
2. -135.67
3. −600
4. True
5. True
6. True
7. False
8. Spearman’s rank method
9. 0.73
10. 0.45
11. -0.22
11.11 SELF-ASSESSMENT QUESTIONS
Q.1. What is the difference between covariance and correlation?
Q.2. Briefly discuss the differences between the methods of calculating the coefficient of correlation.
Q.3. Plot the following data on a scatter plot and comment on the correlation between the two variables.
X  10  5  1   3  2  7
Y  3   4  10  9  5  2
Q.4. Using a scatter plot, find out whether there is any correlation between the number of hours individuals exercise and their weight.
Hours of exercise  1   2   3   4   5   6
Weight             80  65  55  60  50  60
11.12 REFERENCES
• Devore, J. L. (2012). Probability and Statistics for Engineering and the Sciences (8th ed.). Cengage Learning.
• Rice, J. A. (2007). Mathematical Statistics and Data Analysis (3rd ed.). Thomson Brooks/Cole.
• Miller, I., & Miller, M. (2017). John E. Freund's Mathematical Statistics with Applications (8th ed.). Pearson.
• Larsen, R. J., & Marx, M. L. (2011). An Introduction to Mathematical Statistics and Its Applications (10th ed.). Pearson.
11.13 SUGGESTED READINGS
LESSON 12
CHARACTERISTICS OF ESTIMATORS
STRUCTURE
12.1 Learning Objectives
12.2 Introduction
12.3 Characteristics of Estimators
12.3.1 Unbiasedness
12.3.2 Consistency
12.3.3 Efficiency
12.3.4 Sufficiency
12.4 In-Text Questions
12.5 Summary
12.6 Glossary
12.7 Answers to In-Text Questions
12.8 References
12.9 Suggested Readings
12.1 LEARNING OBJECTIVES
One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The two important problems in statistical inference are
(i) estimation
(ii) testing of hypotheses.
The theory of estimation was founded by Prof. R. A. Fisher in a series of fundamental papers around 1930.
12.2 INTRODUCTION
Let us consider a random variable X with p.d.f. f(x, θ). In most common applications, though not always, the functional form of the population distribution is assumed to be known except for the value of some unknown parameter(s) θ, which may take any value in a set Θ. This is expressed by writing the p.d.f. in the form f(x, θ), θ ∈ Θ. The set Θ of all possible values of θ is called the parameter space. Such a situation gives rise not to one probability distribution but to a family of probability distributions, which we write as {f(x, θ), θ ∈ Θ}. For example, if X ∼ N(μ, σ²), then the parameter space is
$$\Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}$$
In particular, for σ² = 1, the family of probability distributions is given by
$$\{N(\mu, 1) : \mu \in \Theta\}, \quad \text{where } \Theta = \{\mu : -\infty < \mu < \infty\}$$
In the following discussion we shall consider a general family of distributions
$$\{f(x; \theta_1, \theta_2, \ldots, \theta_k) : \theta_i \in \Theta,\ i = 1, 2, \ldots, k\}.$$
Let us consider a random sample x1, x2, …, xn of size n from a population with probability function f(x; θ1, θ2, …, θk), where θ1, θ2, …, θk are the unknown population parameters. There will then always be an infinite number of functions of the sample values, called statistics, which may be proposed as estimates of one or more of the parameters.
Evidently, the best estimate would be one that falls nearest to the true value of the parameter to be estimated. In other words, the statistic whose distribution concentrates as closely as possible near the true value of the parameter may be regarded as the best estimate. Hence the basic problem of estimation in the above case can be formulated as follows:
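$$E(\bar{x}) = \mu \quad \text{and} \quad E(s^2) \ne \sigma^2 \ \text{but} \ E(S^2) = \sigma^2,$$
where
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \left(\text{in fact } E(s^2) = \frac{n-1}{n}\sigma^2\right).$$
Hence there is reason to prefer
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
to the sample variance s².
Note: If E(Tn) > γ(θ), Tn is said to be positively biased, and if E(Tn) < γ(θ), it is said to be negatively biased, the amount of bias b(θ) being given by
$$b(\theta) = E(T_n) - \gamma(\theta), \quad \theta \in \Theta$$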
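The bias of s² and the unbiasedness of S² can be seen in a small simulation. This is a sketch under illustrative assumptions (normal population with mean 0 and σ = 2, samples of size n = 5); the function name is hypothetical.

```python
import random


def average_variances(n, sigma, reps, seed=1):
    """Monte Carlo averages of s^2 (divisor n) and S^2 (divisor n - 1)
    over `reps` samples of size n drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_S2 = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
        total_s2 += ss / n
        total_S2 += ss / (n - 1)
    return total_s2 / reps, total_S2 / reps


avg_s2, avg_S2 = average_variances(n=5, sigma=2.0, reps=100_000)
print(round(avg_s2, 2), round(avg_S2, 2))  # near (n-1)/n * 4 = 3.2 and 4.0
```

With σ² = 4 the average of s² settles near 3.2, showing the negative bias b = −σ²/n = −0.8, while the average of S² settles near 4.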
Example 1. Let x1, x2, …, xn be a random sample from a normal population N(μ, 1). Show that
$$t = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$
is an unbiased estimator of μ² + 1.
Solution. We are given E(xi) = μ and V(xi) = 1 for all i = 1, 2, …, n. Now
$$E(x_i^2) = V(x_i) + \{E(x_i)\}^2 = 1 + \mu^2.$$
$$\therefore E(t) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) = \frac{1}{n}\sum_{i=1}^{n}(1 + \mu^2) = 1 + \mu^2.$$
Hence t is an unbiased estimator of μ² + 1.
Hence the sample mean X̄n is a consistent estimator of the population mean μ.
Note 2. Obviously, consistency is a property concerning the behaviour of an estimator for indefinitely large values of the sample size n, i.e., as n → ∞. Nothing is implied about its behaviour for finite n. Moreover, if there exists a consistent estimator, say Tn, of γ(θ), then infinitely many such estimators can be constructed, e.g.,
$$T_n' = \left(\frac{n-a}{n-b}\right) T_n = \left[\frac{1 - (a/n)}{1 - (b/n)}\right] T_n \xrightarrow{p} \gamma(\theta), \quad \text{as } n \to \infty,$$
and hence, for different values of a and b, T′n is also consistent for γ(θ).
Invariance Property of Consistent Estimators.
Theorem: If Tn is a consistent estimator of γ(θ) and ψ(γ(θ)) is a continuous function of γ(θ), then ψ(Tn) is a consistent estimator of ψ(γ(θ)).
Proof. Since Tn is a consistent estimator of γ(θ), $T_n \xrightarrow{p} \gamma(\theta)$ as n → ∞, i.e., for every ε > 0 and η > 0 there exists a positive integer m = m(ε, η) such that
$$P\{|T_n - \gamma(\theta)| < \varepsilon\} > 1 - \eta, \quad \forall\, n \ge m.$$
Since ψ(·) is a continuous function, for every ε1 > 0, however small, there exists a positive number ε such that |ψ(Tn) − ψ(γ(θ))| < ε1 whenever |Tn − γ(θ)| < ε, i.e.,
$$|T_n - \gamma(\theta)| < \varepsilon \Rightarrow |\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1.$$
For two events A and B, if A ⇒ B, then
$$A \subseteq B \Rightarrow P(A) \le P(B) \ \text{or} \ P(B) \ge P(A).$$
So
$$P[|\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1] \ge P[|T_n - \gamma(\theta)| < \varepsilon]$$
$$\Rightarrow P[|\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1] \ge 1 - \eta, \quad \forall\, n \ge m,$$
i.e., $\psi(T_n) \xrightarrow{p} \psi(\gamma(\theta))$ as n → ∞, so ψ(Tn) is a consistent estimator of ψ(γ(θ)).
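The theorem can be illustrated numerically. The sketch below assumes, purely for illustration, samples from N(μ = 1, 1), Tn = X̄n (consistent for μ) and the continuous function ψ = exp; it estimates P(|ψ(X̄n) − ψ(μ)| < ε), which should rise towards 1 as n grows.

```python
import math
import random


def coverage(n, mu=1.0, eps=0.1, reps=1000, seed=7):
    """Estimate P(|exp(X_bar_n) - exp(mu)| < eps) for samples of size n from N(mu, 1)."""
    rng = random.Random(seed)
    target = math.exp(mu)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if abs(math.exp(x_bar) - target) < eps:
            hits += 1
    return hits / reps


for n in (10, 100, 1000):
    print(n, coverage(n))
```

The choice of ε = 0.1 is arbitrary; any fixed ε > 0 gives the same qualitative picture, which is exactly what convergence in probability requires.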
$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{T}{n}\right) = \frac{1}{n^2}\operatorname{Var}(T) = \frac{pq}{n} \to 0 \ \text{as } n \to \infty \quad (\because \operatorname{Var}(T) = npq).$$
median is also an unbiased and consistent estimator of μ. Thus we need a further criterion that will enable us to choose among estimators sharing the property of consistency. Such a criterion, based on the variances of the sampling distributions of the estimators, is known as efficiency. If, of two consistent estimators T1, T2 of a certain parameter θ, we have
$$V(T_1) < V(T_2) \quad \text{for all } n,$$
then T1 is more efficient than T2 for all sample sizes.
For normal samples, V(x̄) = σ²/n while, for large n, V(Md) ≈ πσ²/(2n). Since V(x̄) < V(Md), we conclude that for the normal distribution the sample mean is a more efficient estimator of μ than the sample median, at least for large samples.
Most Efficient Estimator: If, in a class of consistent estimators of a parameter, there exists one whose sampling variance is less than that of any other such estimator, it is called the most efficient estimator. Whenever such an estimator exists, it provides a criterion for measuring the efficiency of the other estimators.
Efficiency: If T1 is the most efficient estimator, with variance V1, and T2 is any other estimator, with variance V2, then the efficiency E of T2 is defined as
$$E = \frac{V_1}{V_2}.$$
Obviously, E cannot exceed unity.
If T, T1, T2, …, Tn are all estimators of γ(θ) and Var(T) is minimum, then the efficiency Ei of Ti (i = 1, 2, …, n) is defined as
$$E_i = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T_i)}; \quad i = 1, 2, \ldots, n.$$
Obviously Ei ≤ 1; i = 1, 2, …, n. For example, in normal samples, since the sample mean x̄ is the most efficient estimator of μ, the efficiency E of Md for such samples (for large n) is
$$E = \frac{V(\bar{x})}{V(M_d)} = \frac{\sigma^2/n}{\pi\sigma^2/(2n)} = \frac{2}{\pi} = 0.637.$$
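The large-sample efficiency 2/π ≈ 0.637 of the median can be checked by simulation. This sketch assumes standard normal samples of odd size n = 101 (so the median is a single observation); the result is a Monte Carlo estimate, so it only approximates 2/π.

```python
import math
import random
import statistics


def efficiency_of_median(n=101, reps=5000, seed=3):
    """Monte Carlo estimate of V(mean) / V(median) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.variance(means) / statistics.variance(medians)


estimate = efficiency_of_median()
print(round(estimate, 3), round(2 / math.pi, 3))  # estimate should be close to 0.637
```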
Example 1. A random sample (X1, X2, X3, X4, X5) of size 5 is drawn from a normal population with unknown mean μ. Consider the following estimators of μ:
(i) $t_1 = \dfrac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
(ii) $t_2 = \dfrac{X_1 + X_2}{2} + X_3$,
(iii) $t_3 = \dfrac{2X_1 + X_2 + \lambda X_3}{3}$, where λ is such that t3 is an unbiased estimator of μ.
Find λ. Are t1 and t2 unbiased? Stating reasons, identify the best estimator among t1, t2 and t3.
Solution. We are given
$$E(X_i) = \mu, \quad \operatorname{Var}(X_i) = \sigma^2 \ (\text{say}), \quad \operatorname{Cov}(X_i, X_j) = 0 \quad (i \ne j = 1, 2, \ldots, 5).$$
(i) $E(t_1) = \frac{1}{5}\sum_{i=1}^{5} E(X_i) = \frac{1}{5}\sum_{i=1}^{5}\mu = \frac{1}{5}\cdot 5\mu = \mu$ ⇒ t1 is an unbiased estimator of μ.
(ii) $E(t_2) = \frac{1}{2}E(X_1 + X_2) + E(X_3) = \frac{1}{2}(\mu + \mu) + \mu = 2\mu$ ⇒ t2 is not an unbiased estimator of μ.
(iii) $E(t_3) = \mu \Rightarrow \frac{1}{3}(2\mu + \mu + \lambda\mu) = \mu \Rightarrow 3 + \lambda = 3 \Rightarrow \lambda = 0$.
Now
$$V(t_1) = \frac{1}{25}\{V(X_1) + V(X_2) + V(X_3) + V(X_4) + V(X_5)\} = \frac{1}{5}\sigma^2$$
$$V(t_2) = \frac{1}{4}\{V(X_1) + V(X_2)\} + V(X_3) = \frac{1}{2}\sigma^2 + \sigma^2 = \frac{3}{2}\sigma^2$$
$$V(t_3) = \frac{1}{9}\{4V(X_1) + V(X_2)\} = \frac{1}{9}(4\sigma^2 + \sigma^2) = \frac{5}{9}\sigma^2 \quad (\because \lambda = 0)$$
Since V(t1) is the least, t1 is the best estimator of μ (in the sense of least variance).
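Every estimator in this example is a linear combination T = Σ aᵢXᵢ of uncorrelated variables, so E(T) = μ Σ aᵢ and Var(T) = σ² Σ aᵢ². A small sketch using exact fractions (the helper name is hypothetical) reproduces the results:

```python
from fractions import Fraction as F


def linear_estimator_moments(coeffs):
    """For T = sum(a_i * X_i) with uncorrelated X_i, E(X_i) = mu, Var(X_i) = sigma^2,
    return (E(T) / mu, Var(T) / sigma^2) = (sum of a_i, sum of a_i^2)."""
    return sum(coeffs), sum(a * a for a in coeffs)


t1 = [F(1, 5)] * 5                  # (X1 + ... + X5) / 5
t2 = [F(1, 2), F(1, 2), F(1)]       # (X1 + X2) / 2 + X3
lam = F(0)                          # value of lambda found in (iii)
t3 = [F(2, 3), F(1, 3), lam / 3]    # (2X1 + X2 + lambda*X3) / 3

for name, c in [("t1", t1), ("t2", t2), ("t3", t3)]:
    mean_factor, var_factor = linear_estimator_moments(c)
    print(name, mean_factor, var_factor)
# t1 1 1/5
# t2 2 3/2
# t3 1 5/9
```

A mean factor of 1 means the estimator is unbiased; the variance factor is the multiple of σ² compared in the solution.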
Example 2. X1, X2 and X3 is a random sample of size 3 from a population with mean μ and variance σ². T1, T2, T3 are estimators used to estimate the mean μ, where T1 = X1 + X2 − X3, T2 = 2X1 + 3X3 − 4X2, and
$$T_3 = \frac{\lambda X_1 + X_2 + X_3}{3}.$$
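(ii) We are given $E(T_3) = \mu \Rightarrow \frac{1}{3}\{\lambda E(X_1) + E(X_2) + E(X_3)\} = \mu \Rightarrow \frac{1}{3}(\lambda\mu + \mu + \mu) = \mu \Rightarrow \lambda + 2 = 3 \Rightarrow \lambda = 1$.
(iii) With λ = 1, $T_3 = \frac{1}{3}(X_1 + X_2 + X_3) = \bar{X}$. Since the sample mean is a consistent estimator of the population mean μ by the Weak Law of Large Numbers, T3 is a consistent estimator of μ.
(iv) We have
$$\operatorname{Var}(T_1) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \operatorname{Var}(X_3) = 3\sigma^2$$
$$\operatorname{Var}(T_2) = 4\operatorname{Var}(X_1) + 9\operatorname{Var}(X_3) + 16\operatorname{Var}(X_2) = 29\sigma^2$$
$$\operatorname{Var}(T_3) = \frac{1}{9}[\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \operatorname{Var}(X_3)] = \frac{1}{3}\sigma^2 \quad (\because \lambda = 1)$$
Since Var(T3) is the minimum, T3 is the best estimator of μ in the sense of minimum variance.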
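A Monte Carlo sketch can confirm these variances. The example fixes only the mean and variance of the population, so the normal population and the values μ = 10, σ = 2 below are illustrative assumptions; with σ² = 4 the theory predicts variances 12, 116 and 4/3.

```python
import random


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate_estimators(mu=10.0, sigma=2.0, reps=100_000, seed=11):
    """Simulate T1, T2 and T3 (with lambda = 1) over many samples (X1, X2, X3)
    drawn from an assumed N(mu, sigma^2) population."""
    rng = random.Random(seed)
    draws = {"T1": [], "T2": [], "T3": []}
    for _ in range(reps):
        x1, x2, x3 = (rng.gauss(mu, sigma) for _ in range(3))
        draws["T1"].append(x1 + x2 - x3)
        draws["T2"].append(2 * x1 + 3 * x3 - 4 * x2)
        draws["T3"].append((x1 + x2 + x3) / 3)
    return {name: mean_and_variance(v) for name, v in draws.items()}


results = simulate_estimators()
for name, (m, v) in results.items():
    print(name, round(m, 2), round(v, 2))
```

All three averages sit near μ (the coefficients of each estimator add up to 1), so the choice between them rests on the variances, where T3 is far smaller.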
Minimum Variance Unbiased (M.V.U.) Estimators:
If a statistic 𝑇 = 𝑇(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), based on a sample of size 𝑛, is such that
(i) 𝑇 is unbiased for 𝛾(𝜃), for all 𝜃 ∈ Θ and
(ii) It has the smallest variance among the class of all unbiased estimators of 𝛾(𝜃), then 𝑇
is called the minimum variance unbiased estimator (𝑀𝑉𝑈𝐸) of 𝛾(𝜃).
More precisely, 𝑇 is MVUE of 𝛾(𝜃) if
𝐸𝜃 (𝑇) = 𝛾(𝜃) for all 𝜃 ∈ Θ and Var𝜃 (𝑇) ≤ Var𝜃 (𝑇 ′ ) for all 𝜃 ∈ Θ
where 𝑇 ′ is any other unbiased estimator of 𝛾(𝜃).
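$$E(T) = \frac{1}{2}\{E(T_1) + E(T_2)\} = \gamma(\theta)$$
$$\operatorname{Var}(T) = \operatorname{Var}\left\{\frac{1}{2}(T_1 + T_2)\right\} = \frac{1}{4}\operatorname{Var}(T_1 + T_2) \quad [\because \operatorname{Var}(CX) = C^2\operatorname{Var}(X)]$$
$$= \frac{1}{4}\{\operatorname{Var}(T_1) + \operatorname{Var}(T_2) + 2\operatorname{Cov}(T_1, T_2)\}$$
$$= \frac{1}{4}\left\{\operatorname{Var}(T_1) + \operatorname{Var}(T_2) + 2\rho\sqrt{\operatorname{Var}(T_1)\operatorname{Var}(T_2)}\right\}$$
$$= \frac{1}{2}\operatorname{Var}(T_1)(1 + \rho),$$
where ρ is Karl Pearson's coefficient of correlation between T1 and T2, and the last step uses Var(T1) = Var(T2), since T1 and T2 are both MVU estimators.
Since T1 is the MVU estimator, Var(T) ≥ Var(T1)
$$\Rightarrow \frac{1}{2}\operatorname{Var}(T_1)(1 + \rho) \ge \operatorname{Var}(T_1) \Rightarrow \frac{1}{2}(1 + \rho) \ge 1 \Rightarrow \rho \ge 1.$$
Since |ρ| ≤ 1, we must have ρ = 1, i.e., T1 and T2 must have a linear relation of the form
$$T_1 = \alpha + \beta T_2,$$
where α and β are constants independent of x1, x2, …, xn but may depend on θ, i.e., we may have α = α(θ) and β = β(θ).
Taking expectations of both sides, we get
$$\gamma(\theta) = \alpha + \beta\gamma(\theta).$$
Also, Var(T1) = Var(α + βT2) = β² Var(T2)
$$\Rightarrow 1 = \beta^2 \Rightarrow \beta = \pm 1.$$
But since ρ(T1, T2) = +1, the coefficient of regression of T1 on T2 must be positive.
$$\therefore \beta = 1 \Rightarrow \alpha = 0,$$
so we get T1 = T2, as desired.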
Theorem 2. Let T1 and T2 be unbiased estimators of γ(θ) with efficiencies e1 and e2 respectively, and let ρ = ρθ be the correlation coefficient between them. Then
$$\sqrt{e_1 e_2} - \sqrt{(1 - e_1)(1 - e_2)} \le \rho \le \sqrt{e_1 e_2} + \sqrt{(1 - e_1)(1 - e_2)}.$$
Proof. Let T be the minimum variance unbiased estimator of γ(θ). Then we are given
$$E_\theta(T_1) = \gamma(\theta) = E_\theta(T_2) \quad \forall\, \theta \in \Theta$$
$$e_1 = \frac{V_\theta(T)}{V_\theta(T_1)} = \frac{V}{V_1} \ (\text{say}) \Rightarrow V_1 = \frac{V}{e_1}$$
$$e_2 = \frac{V_\theta(T)}{V_\theta(T_2)} = \frac{V}{V_2} \ (\text{say}) \Rightarrow V_2 = \frac{V}{e_2}$$
Let us consider another estimator T3 = λT1 + μT2,
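$$\left(\frac{\rho}{\sqrt{e_1 e_2}} - 1\right)^2 - \left(\frac{1}{e_1} - 1\right)\left(\frac{1}{e_2} - 1\right) \le 0 \Rightarrow \left(\rho - \sqrt{e_1 e_2}\right)^2 - (1 - e_1)(1 - e_2) \le 0$$
$$\therefore \rho^2 - 2\sqrt{e_1 e_2}\,\rho + (e_1 + e_2 - 1) \le 0$$
This implies that ρ lies between the roots of the equation
$$\rho^2 - 2\sqrt{e_1 e_2}\,\rho + (e_1 + e_2 - 1) = 0,$$
which are given by
$$\frac{1}{2}\left\{2\sqrt{e_1 e_2} \pm 2\sqrt{e_1 e_2 - (e_1 + e_2 - 1)}\right\} = \sqrt{e_1 e_2} \pm \sqrt{(1 - e_1)(1 - e_2)},$$
which proves the result.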
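The bounds in Theorem 2 are easy to evaluate numerically. A minimal sketch (the function name `rho_bounds` is hypothetical):

```python
import math


def rho_bounds(e1, e2):
    """Bounds on corr(T1, T2) from Theorem 2, given efficiencies 0 < e1, e2 <= 1."""
    if not (0 < e1 <= 1 and 0 < e2 <= 1):
        raise ValueError("efficiencies must lie in (0, 1]")
    centre = math.sqrt(e1 * e2)
    half_width = math.sqrt((1 - e1) * (1 - e2))
    return centre - half_width, centre + half_width


print(rho_bounds(0.5, 0.5))           # (0.0, 1.0)
print(rho_bounds(1.0, 2 / math.pi))   # both bounds equal sqrt(2/pi)
```

When one estimator is itself the MVU estimator (e1 = 1), the interval collapses to the single value √e2. For normal samples, with e2 = 2/π for the median, this gives a large-sample correlation of about 0.798 between the sample mean and the sample median.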
$= 0$, if $\sum_{i=1}^{n} x_i \ne k$.
Since this does not depend on p, $T = \sum_{i=1}^{n} x_i$ is sufficient for p.
FACTORIZATION THEOREM (Neyman).
The necessary and sufficient condition for a distribution to admit a sufficient statistic is provided by the factorization theorem due to Neyman.
Statement: T = t(x) is sufficient for θ if and only if the joint density function L (say) of the sample values can be expressed in the form
$$L = g_\theta[t(\mathbf{x})] \cdot h(\mathbf{x}),$$
where (as indicated) gθ[t(x)] depends on θ, and on x only through the value of t(x), and h(x) is independent of θ.
Remarks 1. It should be clearly understood that by 'a function independent of θ' we mean not only that it does not involve θ but also that its domain does not contain θ. For example, the function
$$f(x) = \frac{1}{2a}, \quad \theta - a < x < \theta + a; \ -\infty < \theta < \infty$$
depends on θ.
2. It should be noted that the original sample X = (X1, X2, …, Xn) is always a sufficient statistic.
3. The most general form of the distributions admitting a sufficient statistic is Koopman's form, given by
$$L = L(\mathbf{x}, \theta) = g(\mathbf{x}) \cdot h(\theta) \cdot \exp\{a(\theta)\psi(\mathbf{x})\},$$
where h(θ) and a(θ) are functions of the parameter θ only, and g(x) and ψ(x) are functions of the sample observations only.
4. Invariance Property of Sufficient Estimator: If T is a sufficient estimator for the parameter θ and ψ(T) is a one-to-one function of T, then ψ(T) is sufficient for ψ(θ).
5. Fisher-Neyman Criterion. A statistic t1 = t(x1, x2, …, xn) is a sufficient estimator of the parameter θ if and only if the likelihood function (joint p.d.f. of the sample) can be expressed as
$$L = \prod_{i=1}^{n} f(x_i, \theta) = g_1(t_1, \theta) \cdot k(x_1, x_2, \ldots, x_n),$$
where g1(t1, θ) is the p.d.f. of the statistic t1 and k(x1, x2, …, xn) is a function of the sample observations only, independent of θ.
Note that this method requires working out the p.d.f. (p.m.f.) of the statistic t1 = t(x1, x2, …, xn), which is not always easy.
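Example 1. Let x1, x2, …, xn be a random sample from a uniform population on [0, θ]. Find a sufficient estimator for θ.
Solution. We are given
$$f_\theta(x_i) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x_i \le \theta \\ 0, & \text{otherwise} \end{cases}$$
Let
$$k(a, b) = \begin{cases} 1, & \text{if } a \le b \\ 0, & \text{if } a > b \end{cases}$$
Then
$$f_\theta(x_i) = \frac{k(0, x_i)\,k(x_i, \theta)}{\theta},$$
$$L = \prod_{i=1}^{n} f_\theta(x_i) = \prod_{i=1}^{n}\left[\frac{k(0, x_i)\,k(x_i, \theta)}{\theta}\right] = \frac{k\left(0, \min_{1 \le i \le n} x_i\right) \cdot k\left(\max_{1 \le i \le n} x_i, \theta\right)}{\theta^n} = g_\theta[t(\mathbf{x})] \cdot h(\mathbf{x}),$$
where
$$g_\theta[t(\mathbf{x})] = \frac{k\{t(\mathbf{x}), \theta\}}{\theta^n}, \quad t(\mathbf{x}) = \max_{1 \le i \le n} x_i \quad \text{and} \quad h(\mathbf{x}) = k\left(0, \min_{1 \le i \le n} x_i\right).$$
Hence, by the factorization theorem, $t(\mathbf{x}) = \max_{1 \le i \le n} x_i$, the largest sample observation, is a sufficient statistic for θ.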
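A concrete way to see the factorization for the uniform [0, θ] case: two samples of the same size with the same maximum have identical likelihood functions, whatever their other values. The samples below are hypothetical data chosen only for illustration.

```python
def uniform_likelihood(sample, theta):
    """Likelihood of a U[0, theta] sample: theta**(-n) if all 0 <= x_i <= theta, else 0."""
    if theta <= 0 or min(sample) < 0 or max(sample) > theta:
        return 0.0
    return theta ** (-len(sample))


# Two hypothetical samples of the same size sharing the maximum 0.9.
sample_a = [0.10, 0.45, 0.90, 0.30]
sample_b = [0.85, 0.20, 0.05, 0.90]
grid = [0.5, 0.8, 0.9, 1.0, 1.5, 2.0]
la = [uniform_likelihood(sample_a, t) for t in grid]
lb = [uniform_likelihood(sample_b, t) for t in grid]
print(la == lb)  # True: the likelihood sees the data only through max(x_i)
```

The likelihood is zero for θ below the sample maximum and decreases as θ⁻ⁿ above it, so once max xᵢ is known the remaining observations carry no further information about θ.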
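Aliter. We have
$$L = \prod_{i=1}^{n} f(x_i, \theta) = \frac{1}{\theta^n}; \quad 0 < x_i < \theta$$
where
$$g_\theta[t(\mathbf{x})] = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\left\{t_2(\mathbf{x}) - 2\mu\, t_1(\mathbf{x}) + n\mu^2\right\}\right]$$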
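$$L = \prod_{i=1}^{n} f(x_i, \theta) = \prod_{i=1}^{n} e^{-(x_i - \theta)} = \exp\left(-\sum_{i=1}^{n} x_i\right) \times \exp(n\theta)$$
Let Y1, Y2, …, Yn denote the order statistics of the random sample such that Y1 < Y2 < ⋯ < Yn. The p.d.f. of the smallest observation Y1 is given by
$$g_1(y_1, \theta) = n[1 - F(y_1)]^{n-1} f(y_1, \theta),$$
where F(·) is the distribution function corresponding to the p.d.f. f(·). Here F(y) = 1 − e^{−(y−θ)} for y > θ, so that
$$g_1(y_1, \theta) = n\,e^{-(n-1)(y_1 - \theta)}\,e^{-(y_1 - \theta)} = n\exp\{-n(y_1 - \theta)\}, \quad y_1 > \theta.$$
Thus the likelihood function L of X1, X2, …, Xn may be expressed as
$$L = e^{n\theta}\exp\left(-\sum_{i=1}^{n} x_i\right) = n\exp\{-n(y_1 - \theta)\}\left\{\frac{\exp\left(-\sum_{i=1}^{n} x_i\right)}{n\exp(-n y_1)}\right\} = g_1\left(\min_{1 \le i \le n} x_i, \theta\right)\left\{\frac{\exp\left(-\sum_{i=1}^{n} x_i\right)}{n\exp(-n y_1)}\right\}$$
Hence, by the Fisher-Neyman criterion, the first order statistic Y1 = min(X1, X2, …, Xn) is a sufficient statistic for θ.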
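The Fisher-Neyman factorization above can be verified numerically: for any θ below the smallest observation, L should equal g1(y1, θ) times the θ-free factor. The data values in this sketch are hypothetical, chosen so that every observation exceeds each θ tried (the support condition assumed by the formulas).

```python
import math


def likelihood(xs, theta):
    # L = prod exp(-(x_i - theta)); valid when every x_i > theta
    return math.prod(math.exp(-(x - theta)) for x in xs)


def g1(y1, theta, n):
    # p.d.f. of Y1 = min(X_i): n * exp(-n (y1 - theta)) for y1 > theta
    return n * math.exp(-n * (y1 - theta))


def h(xs):
    # factor free of theta: exp(-sum x_i) / (n * exp(-n * y1))
    n, y1 = len(xs), min(xs)
    return math.exp(-sum(xs)) / (n * math.exp(-n * y1))


xs = [2.3, 1.7, 3.1, 2.0]   # hypothetical data; all exceed the theta values tried
for theta in (0.5, 1.0, 1.5):
    lhs = likelihood(xs, theta)
    rhs = g1(min(xs), theta, len(xs)) * h(xs)
    print(theta, math.isclose(lhs, rhs))   # True for each theta
```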
12.4 IN-TEXT QUESTIONS
Question: 1
Let X1, X2, …, XN be independent and identically distributed random variables with mean 2 and variance 1. Let N be a random variable that follows a Poisson distribution with mean 2 and is independent of the Xi's. Let SN = X1 + X2 + ⋯ + XN. Then Var(SN) equals
A. 4
B. 10
C. 2
D. 1
Question: 2
Let A and B be independent random variables, each having the uniform distribution on [0, 1]. Let U = min{A, B} and V = max{A, B}. Then Cov(U, V) equals
A. −1/36
B. 1/36
C. 1
D. 0
Question: 3
Let X1, X2, X3 be a random sample from Uniform(0, θ²), θ > 1. Then the maximum likelihood estimator (MLE) of θ is
A. $X_{(1)}^2$
B. $\sqrt{X_{(3)}}$
C. $\sqrt{X_{(1)}}$
D. $\alpha X_{(1)} + (1 - \alpha) X_{(3)}$; 0 < α < 1
Question: 4
For the discrete variate with density
$$f(x) = \frac{1}{8} I_{(-1)}(x) + \frac{6}{8} I_{(0)}(x) + \frac{1}{8} I_{(1)}(x).$$
$$W = \frac{\sqrt{2}(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}$$
A. 1/3
B. 1/6
C. 1/2
D. 5/6
Question: 7
Let X1, X2, …, Xn be a random sample from the U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. Let X(1) = min{X1, X2, …, Xn} and X(n) = max{X1, X2, …, Xn}, and define the following estimators of θ:
$$T_1 = \frac{1}{2}(X_{(1)} + X_{(n)}), \quad T_2 = \frac{1}{4}(3X_{(1)} + X_{(n)} + 1) \quad \text{and} \quad T_3 = \frac{1}{2}(3X_{(n)} - X_{(1)} - 2).$$
Which of the following is/are TRUE?
A. T1 and T2 are MLEs for θ but T3 is not an MLE for θ
A. $\dfrac{S_n - 3n}{\sqrt{3n}} \sim N(0, 1)$ for all n ≥ 1
B. For all ε > 0, $P\left(\left|\dfrac{S_n}{n} - 3\right| > \varepsilon\right) \to 0$ as n → ∞
C. $\dfrac{S_n}{n} \to 1$ with probability 1
D. Both A and B
Question: 10
Let X, Y be i.i.d. Binomial(n, p) random variables. Which of the following are true?
A. X + Y ∼ Bin(2n, p)
A. P(X > 0) = 1/2
B. P(X > 0 ∣ Y < 0) = 1/2
C. P(X > 0, Y < 0) = 1/4
D. All of the above
Question: 12
Let X and Y be random variables with E[X] = E[Y]. Which of the following is NOT TRUE?
A. E{E[X ∣ Y]} = E[Y]
B. V(X − Y) = E(X − Y)²
C. E[V(X ∣ Y)] + V[E(X ∣ Y)] = V(X)
D. X and Y have the same distribution
Question: 13
Let X1, X2, …, Xn be a random sample from the Exp(θ) distribution, where θ ∈ (0, ∞). If $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then a 95% confidence interval for θ is
A. $\left(0, \dfrac{\chi^2_{2n,0.95}}{n\bar{X}}\right]$
B. $\left[\dfrac{\chi^2_{2n,0.95}}{n\bar{X}}, \infty\right)$
C. $\left(0, \dfrac{\chi^2_{2n,0.95}}{2n\bar{X}}\right]$
D. $\left[\dfrac{\chi^2_{2n,0.95}}{2n\bar{X}}, \infty\right)$
Question: 14
Let Xi, i = 1, 2, …, be independent random variables, all distributed according to the PDF fX(x) = 1, 0 ≤ x ≤ 1. Define Yn = X1X2X3 ⋯ Xn for some integer n. Then Var(Yn) is equal to
A. n/12
B. $\dfrac{1}{3^n} - \dfrac{1}{2^{2n}}$
C. $\dfrac{1}{12n}$
D. 1/12
Question: 15
Let X1, X2, X3, X4 be i.i.d. random variables having a continuous distribution. Then P(X3 < X2 < max(X1, X4)) equals
A. 1/2
B. 1/3
C. 1/4
D. 1/6
Question: 16
Let X1, X2, …, Xn be a random sample from the U(θ − 1/2, θ + 1/2) distribution, where θ ∈ ℝ. Let X(1) = min{X1, X2, …, Xn} and X(n) = max{X1, X2, …, Xn}.
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
Question: 17
Let Fn be a sequence of DFs defined by
$$F_n(x) = \begin{cases} 0, & x < 0 \\ 1 - \dfrac{1}{n}, & 0 \le x < n \\ 1, & x \ge n \end{cases}$$
and let $\lim_{n \to \infty} F_n(x) = F(x)$. Which of the following is/are TRUE?
A. $F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$
Question: 18
Let Xi (i = 1, 2, …, n) be a random sample drawn from the uniform distribution on [θ, θ + 1]. Which of the following statements is/are correct?
A. $\bar{X} - \frac{1}{2}$ is an unbiased estimate of θ
B. $\dfrac{X_{(1)} + X_{(n)}}{2} - \dfrac{1}{2}$ is an unbiased estimate of θ
Question: 19
Let {Xn, n ≥ 1} be i.i.d. uniform(−1, 2) random variables and let $S_n = \sum_{k=1}^{n} X_k$. Then, as n → ∞,
A. $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability
B. $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in distribution
C. P(Sn ≤ n) → 1 as n → ∞
Define $T_n = \dfrac{X_{(n)} + X_{(1)}}{2}$.
A. Tn is consistent for θ
B. Tn is an MLE for θ
C. Tn is unbiased and consistent for θ
D. All options are correct
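Question: 21
The cumulative distribution function of a random variable X is given by
$$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ \dfrac{4}{9}, & \text{if } 0 \le x < 1 \\ \dfrac{8}{9}, & \text{if } 1 \le x < 2 \\ 1, & \text{if } x \ge 2 \end{cases}$$
Which of the following statements is (are) TRUE?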
A. The random variable 𝑋 takes positive probability only at least two points
B. P(1 ≤ X ≤ 2) = 5/9
21
C. E(X) = 3
D. P(0 < X < 1) = 4/9
Question: 22
Let A and B be events in a sample space S such that P(A) = 1/2 = P(B) and P(Aᶜ ∩ Bᶜ) = 1/3. Which of the following is correct?
A. P(A ∪ Bᶜ) = 5/6
B. P(A ∪ Bᶜ) ≤ 5/6
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x), \quad -\infty < x < \infty$$
Which of the following is NOT TRUE?
A. $f(x, y) = \dfrac{1}{\pi^2}\cdot\dfrac{1}{(1 + x^2)(1 + y^2)}$; −∞ < x, y < ∞
B. $f(x) = \dfrac{1}{\pi}\cdot\dfrac{1}{1 + x^2}$; −∞ < x < ∞
Question: 24
Suppose r1.23 and r1.234 are the sample multiple correlation coefficients of X1 on X2, X3 and of X1 on X2, X3, X4, respectively. Which of the following is possible?
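A. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i}, \dfrac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i}\right]$
B. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i}, \dfrac{\chi^2_{1-\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
C. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}, \dfrac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
D. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i}, \dfrac{\chi^2_{1-\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i}\right]$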
Question: 26
Suppose that r balls are drawn one at a time without replacement from a bag containing n white and m black balls. Let Sr be the number of black balls drawn. Then Var(Sr) is equal to
A. $\dfrac{mnr}{(m+n)^2(m+n+1)}(m + n - r)$
B. $\dfrac{mnr}{(m+n)^2(m+n)}(m + n - r)$
C. $\dfrac{mnr}{(m+n)^2(m+n-1)}(m + n - r)$
D. $\dfrac{mnr}{(m+n)^2(m-n)}(m + n - r)$
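Question: 27
Let Fn be a sequence of DFs defined by
$$F_n(x) = \begin{cases} 0, & x < 0 \\ 1 - \dfrac{1}{n}, & 0 \le x < n \\ 1, & x \ge n \end{cases}$$
and let $\lim_{n \to \infty} F_n(x) = F(x)$.
A. $F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$
C. Xn converges in probability to 0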
Question: 28
The cumulative distribution function of a random variable X is given by
$$F(x) = \begin{cases} 0, & x < 2 \\ \dfrac{1}{10}\left(x^2 - \dfrac{7}{3}\right), & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}$$
$$f(z) = \begin{cases} \dfrac{\sqrt{1/2}}{\sqrt{\pi}}\, e^{-z/2} z^{-1/2}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$$
2 −𝑧 2/2
𝑒 if 𝑧 > 0
𝑓(𝑧) = { √𝜋
0, otherwise
$$f(z) = \begin{cases} e^{-z}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$$
2 1
⋅ if 𝑧 > 0
𝑓(𝑧) = {√𝜋 (1 + 𝑧 2 )
0, otherwise
Question: 30
If the joint moment generating function of the random variables X and Y is
$$M(s, t) = e^{(s + 3t + 2s^2 + 18t^2 + 12st)},$$
which of the following is/are correct?
A. E(X) < E(Y)
B. Corr(X, Y) > 0
C. Cov(X, Y) = 12
D. All of the above
12.5 SUMMARY
The main points covered in this lesson are: what an estimator is; the properties of unbiasedness, consistency, efficiency and sufficiency of an estimator; and how to obtain the best estimator.
12.6 GLOSSARY
Motivation: These problems are very useful in real life, and the ideas can be applied in data science, economics and the social sciences.
Attention: Think about how best estimators are useful in real-world problems.
12.7 ANSWERS TO IN-TEXT QUESTIONS
Answer 1 : B
Explanation:
Let 𝑋1 , 𝑋2 , …
Answer 2 : B
Explanation:
If A and B are independent random variables, each having the uniform distribution on [0, 1], and U = min{A, B}, V = max{A, B}, then UV = AB, so
$$E(UV) = E(A)E(B) = \frac{1}{4}, \quad E(U) = \frac{1}{3}, \quad E(V) = \frac{2}{3},$$
$$\operatorname{Cov}(U, V) = E(UV) - E(U)E(V) = \frac{1}{4} - \frac{1}{3}\cdot\frac{2}{3} = \frac{1}{36}.$$
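$$X_{(3)} \le \theta^2 \Rightarrow \hat{\theta} \in \left[\sqrt{X_{(3)}}, \infty\right)$$
$$L(X, \theta) = \prod_{i=1}^{3} f(x_i, \theta) = \frac{1}{\theta^6}$$
$$\Rightarrow \frac{\partial L}{\partial \theta} < 0,$$
therefore L is a decreasing function of θ, and $\hat{\theta} = \sqrt{X_{(3)}}$.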
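Answer 4 : C
Explanation:
X       −1   0    1
f(x)    1/8  6/8  1/8
$$E(X) = -1 \times \frac{1}{8} + 0 \times \frac{6}{8} + 1 \times \frac{1}{8} = 0$$
$$E(X^2) = 1 \times \frac{1}{8} + 0 \times \frac{6}{8} + 1 \times \frac{1}{8} = \frac{1}{4}$$
$$V(X) = E(X^2) - \{E(X)\}^2 = \frac{1}{4} \Rightarrow \sigma_X = \frac{1}{2}$$
$$P\{|X - \mu_X| \ge 2\sigma_X\} \le \frac{1}{4} \quad [\text{by Chebyshev's inequality}]$$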
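For this three-point distribution the exact tail probability can be computed alongside Chebyshev's bound. The sketch uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction as F

pmf = {-1: F(1, 8), 0: F(6, 8), 1: F(1, 8)}
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = F(1, 2)                      # var = 1/4 here, so the standard deviation is exactly 1/2
assert sd * sd == var

k = 2
exact = sum(p for x, p in pmf.items() if abs(x - mean) >= k * sd)
bound = F(1, k * k)               # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
print(mean, var, exact, bound)    # 0 1/4 1/4 1/4
```

Here the exact probability equals the bound, so this distribution shows that Chebyshev's inequality cannot be improved in general.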
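Answer 5 : B
Explanation:
Let Xi, Yi (i = 1, 2) be i.i.d. random samples of size 2 from a standard normal distribution. Then X1 + X2 ∼ N(0, 2), while (X2 − X1)/√2 and (Y2 − Y1)/√2 are independent N(0, 1) variables that are also independent of X1 + X2 (for normal variables, X1 + X2 and X2 − X1 are uncorrelated and hence independent). Hence
$$W = \frac{\sqrt{2}(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}} = \frac{(X_1 + X_2)/\sqrt{2}}{\sqrt{\left[\frac{(X_2 - X_1)^2}{2} + \frac{(Y_2 - Y_1)^2}{2}\right]\Big/2}} \sim t(2).$$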
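Answer 7 : A
Explanation:
X1, X2, …, Xn is a random sample from U(θ − 1/2, θ + 1/2), so
$$f(x) = 1; \quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}.$$
The likelihood equals 1 whenever
$$\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right]$$
and 0 otherwise, so every point of this interval is an MLE; in particular
$$\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right); \quad 0 \le \lambda \le 1.$$
Taking λ = 1/2, 1/4 and 3/4, we obtain the MLEs of θ
$$\frac{1}{2}(X_{(1)} + X_{(n)}), \quad \frac{1}{4}(3X_{(1)} + X_{(n)} + 1), \quad \frac{1}{4}(X_{(1)} + 3X_{(n)} - 1)$$
respectively, so T1 and T2 are MLEs. For T3, since X(n) − X(1) < 1, we have T3 − (X(n) − 1/2) = (X(n) − X(1) − 1)/2 < 0, so T3 lies outside the interval above and is not an MLE.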
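Answer 9 : B
Explanation:
Clearly, X1, X2, …, Xn are i.i.d. G(3, 1) random variables. Then E(Xi) = 3 and Var(Xi) = 3, i = 1, 2, …
Let Sn = X1 + X2 + ⋯ + Xn; then E(Sn) = 3n and Var(Sn) = 3n.
By the CLT, $\dfrac{S_n - 3n}{\sqrt{3n}} \to N(0, 1)$ in distribution as n → ∞. This is only a limiting result: Sn ∼ G(3n, 1) exactly, so the standardized sum is not N(0, 1) for every n ≥ 1, and option A is false.
$$\lim_{n \to \infty} E\left(\frac{S_n}{n}\right) = \lim_{n \to \infty}\frac{3n}{n} = 3; \qquad \lim_{n \to \infty} V\left(\frac{S_n}{n}\right) = \lim_{n \to \infty}\frac{3n}{n^2} = 0$$
Hence, for all ε > 0, $P\left(\left|\frac{S_n}{n} - 3\right| > \varepsilon\right) \to 0$ as n → ∞, so B is true; C is false, since Sn/n → 3 (not 1) with probability 1.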
(B) (X, Y) does not follow Multinomial(2n; p, p): X and Y are independent, whereas the components of a multinomial vector are negatively correlated.
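Answer 11 : D
Explanation:
The joint pdf of X and Y is
$$f(x, y) = \frac{1}{2\pi}e^{-\frac{1}{2}(x^2 + y^2)} = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} \times \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}y^2}; \quad (x, y) \in \mathbb{R}^2.$$
It is easy to see that X and Y are i.i.d. N(0, 1) random variables, and therefore
$$P(X > 0) = \frac{1}{2}, \qquad P(X > 0 \mid Y < 0) = P(X > 0) = \frac{1}{2},$$
$$P(X > 0, Y < 0) = P(X > 0)P(Y < 0) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}.$$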
Answer 14 : B
Explanation:
Since X1, X2, …, Xn are independent, E(Yn) = E(X1) × ⋯ × E(Xn). Similarly, E(Yn²) = E(X1²) × ⋯ × E(Xn²). Since E(Xi) = 1/2 and E(Xi²) = 1/3 for i = 1, 2, …, n, it follows that
$$\operatorname{Var}(Y_n) = E(Y_n^2) - [E(Y_n)]^2 = \frac{1}{3^n} - \frac{1}{2^{2n}}.$$
Hence option (B) is correct.
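The closed form can be compared with a simulation of the product of n independent U(0, 1) variables. A minimal sketch; the function names, seed and number of replications are illustrative choices.

```python
import random
from fractions import Fraction


def var_product_exact(n):
    """Var(Y_n) = (1/3)^n - (1/2)^(2n) for Y_n = X_1 ... X_n with X_i ~ U(0, 1)."""
    return Fraction(1, 3) ** n - Fraction(1, 2) ** (2 * n)


def var_product_mc(n, reps=100_000, seed=5):
    """Monte Carlo estimate of Var(Y_n) for the same product."""
    rng = random.Random(seed)
    ys = []
    for _ in range(reps):
        y = 1.0
        for _ in range(n):
            y *= rng.random()  # uniform on [0, 1)
        ys.append(y)
    m = sum(ys) / reps
    return sum((y - m) ** 2 for y in ys) / (reps - 1)


for n in (1, 2, 3):
    print(n, var_product_exact(n), round(var_product_mc(n), 4))
```

For n = 1 the formula reduces to 1/3 − 1/4 = 1/12, the familiar variance of a single U(0, 1) variable.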
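Answer 15 : C
Explanation:
Since the Xi are i.i.d. and continuous, ties occur with probability 0 and, by symmetry, all 4! = 24 orderings of X1, X2, X3, X4 are equally likely. (For example, P(X1 < X2) + P(X2 < X1) + P(X1 = X2) = 1, where P(X1 < X2) = P(X2 < X1) by symmetry and P(X1 = X2) = 0 because the random variables are continuous; therefore P(X1 < X2) = 1/2.) The event {X3 < X2 < max(X1, X4)} occurs when X3 is below X2 and at least one of X1, X4 is above it: either X2 has rank 2 with X3 the smallest (2 orderings), or X2 has rank 3 with X3 and one of X1, X4 below it (2 × 2 = 4 orderings). Hence
$$P(X_3 < X_2 < \max(X_1, X_4)) = \frac{6}{24} = \frac{1}{4}.$$
Hence option C is correct.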
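The counting argument can be confirmed by enumerating all 24 equally likely orderings. In this sketch the ranks 1 to 4 stand in for the values of X1 to X4, which is valid because only the relative order matters for continuous i.i.d. variables.

```python
from fractions import Fraction
from itertools import permutations

favourable = 0
total = 0
for x1, x2, x3, x4 in permutations(range(1, 5)):
    total += 1
    if x3 < x2 < max(x1, x4):
        favourable += 1

print(favourable, total, Fraction(favourable, total))  # 6 24 1/4
```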
Answer 16 : A
Explanation:
X1, X2, …, Xn is a random sample from U(θ − 1/2, θ + 1/2), so
$$f(x) = 1; \quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}.$$
The likelihood is maximized (equal to 1) for
$$\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right],$$
so any
$$\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right); \quad 0 \le \lambda \le 1$$
is an MLE. Taking λ = 1/2 and 1/4, we get
$$T_1 = \frac{1}{2}(X_{(1)} + X_{(n)}),$$
which is an MLE as well as consistent for θ, and
$$T_2 = \frac{1}{4}(3X_{(1)} + X_{(n)} + 1),$$
which is an MLE as well as consistent for θ but not unbiased…
Answer 17 : D
Explanation:
$$\lim_{n \to \infty} F_n(x) = F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$
Note that Fn is the DF of the RV Xn with PMF
$$P(X_n = 0) = 1 - \frac{1}{n}, \quad P(X_n = n) = \frac{1}{n},$$
and F is the distribution function of the RV X degenerate at X = 0. We have $E(X_n^k) = n^k \cdot \frac{1}{n} = n^{k-1}$, where k is a positive integer, while E(X^k) = 0. Hence E(Xn^k) does not converge to E(X^k) for any k ≥ 1 (it equals 1 for k = 1 and diverges for k ≥ 2).
On the other hand, Xn converges in probability to 0, because for every ε > 0, P(|Xn| > ε) ≤ P(Xn = n) = 1/n → 0.
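Answer 18 : D
Explanation:
Given that
$$X \sim U[\theta, \theta + 1] \Rightarrow f(x) = 1; \quad \theta \le x_i \le \theta + 1,$$
we have E(Xi) = (2θ + 1)/2, so
$$E\left(\bar{X} - \frac{1}{2}\right) = \frac{2\theta + 1}{2} - \frac{1}{2} = \theta,$$
i.e., $\bar{X} - \frac{1}{2}$ is an unbiased estimate of θ.
For option (B): E(X(1)) = θ + 1/(n + 1) and E(X(n)) = θ + n/(n + 1), so E(X(1) + X(n)) = 2θ + 1 and
$$E\left(\frac{X_{(1)} + X_{(n)}}{2} - \frac{1}{2}\right) = \frac{2\theta + 1}{2} - \frac{1}{2} = \theta,$$
i.e., $\frac{X_{(1)} + X_{(n)}}{2} - \frac{1}{2}$ is an unbiased estimate of θ.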
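this implies $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability.
If $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability, then $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in distribution.
Using the CLT (each Xk has mean 1/2 and variance 3/4):
$$\frac{S_n - \frac{n}{2}}{\sqrt{\frac{3n}{4}}} \sim N(0, 1) \ \text{as } n \to \infty$$
$$P\left(\frac{S_n - E(S_n)}{\sqrt{\operatorname{Var}(S_n)}} \le \frac{n - E(S_n)}{\sqrt{\operatorname{Var}(S_n)}}\right) \to 1 \ \text{as } n \to \infty,$$
since $\dfrac{n - E(S_n)}{\sqrt{\operatorname{Var}(S_n)}} = \dfrac{n/2}{\sqrt{3n/4}} \to \infty$.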
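Answer 20 : D
Explanation:
Here
$$L = L(\theta; X_1, X_2, \ldots, X_n) = \begin{cases} 1, & \theta - \frac{1}{2} \le x_i \le \theta + \frac{1}{2} \ \text{for all } i \\ 0, & \text{otherwise} \end{cases}$$
If X(1), X(2), …, X(n) is the ordered sample, then θ − 1/2 ≤ x(1) ≤ x(2) ≤ ⋯ ≤ x(n) ≤ θ + 1/2. Thus L attains its maximum if
$$\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right].$$
Therefore any convex linear combination of the end points is also an MLE for θ:
$$\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right), \quad 0 \le \lambda \le 1.$$
Taking λ = 1/2 gives $\hat{\theta} = T_n = \dfrac{X_{(n)} + X_{(1)}}{2}$, and also
$$E\left(\frac{X_{(n)} + X_{(1)}}{2}\right) = \theta.$$
Moreover, Var(Tn) = 1/(2(n + 1)(n + 2)) → 0 as n → ∞, so the unbiased estimator Tn is consistent for θ.
From the above, Tn is unbiased, consistent and also an MLE for θ.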
Answer 21 : B
Explanation:
Since F is continuous on ℝ ∖ {0, 1, 2}, it follows that P(X = x) = 0 ∀ x ∈ ℝ ∖ {0, 1, 2}. Further,
$$P(X = 0) = F(0) - F(0-) = \frac{4}{9}, \quad P(X = 1) = F(1) - F(1-) = \frac{8}{9} - \frac{4}{9} = \frac{4}{9},$$
$$P(X = 2) = F(2) - F(2-) = 1 - \frac{8}{9} = \frac{1}{9}.$$
Therefore X is a discrete random variable with positive probabilities at the three points 0, 1, 2, and
$$P(1 \le X \le 2) = P(X = 1) + P(X = 2) = \frac{4}{9} + \frac{1}{9} = \frac{5}{9},$$
$$E(X) = \sum_{x=0}^{2} x\,P(X = x) = 0 \times \frac{4}{9} + 1 \times \frac{4}{9} + 2 \times \frac{1}{9} = \frac{2}{3},$$
and P(0 < X < 1) = 0.
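Reading probabilities off the jumps of a step CDF is mechanical enough to script. In this sketch the piecewise CDF of Question 21 is encoded by hand, and the left limit F(x−) is approximated by F(x − h) for a tiny h, which is exact here because F is constant between its jump points.

```python
from fractions import Fraction as F


def cdf(x):
    """CDF of Question 21 (piecewise constant)."""
    if x < 0:
        return F(0)
    if x < 1:
        return F(4, 9)
    if x < 2:
        return F(8, 9)
    return F(1)


def jump(x, h=F(1, 10**9)):
    # P(X = x) = F(x) - F(x-), with F(x-) taken as F(x - h)
    return cdf(x) - cdf(x - h)


pmf = {x: jump(x) for x in (0, 1, 2)}
print(pmf)                                  # jumps 4/9, 4/9, 1/9
print(pmf[1] + pmf[2])                      # P(1 <= X <= 2) = 5/9
print(sum(x * p for x, p in pmf.items()))   # E(X) = 2/3
```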
Answer 24 : C
Explanation:
Since a sample multiple correlation coefficient lies between 0 and 1,
$$0 \le r_{1.23\ldots n} \le 1,$$
and only option C satisfies this condition.
Answer 26 : A
Explanation:
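Let us define $X_k = \begin{cases} 1, & \text{if the } k\text{th ball drawn is black} \\ 0, & \text{if the } k\text{th ball drawn is white} \end{cases}, \quad k = 1, 2, \ldots, r$
Then $S_r = X_1 + X_2 + \cdots + X_r$.
Also, $P(X_k = 1) = \frac{m}{m+n}$ and $P(X_k = 0) = \frac{n}{m+n}$.
Thus $E(X_k) = \frac{m}{m+n}$ and $V(X_k) = \frac{mn}{(m+n)^2}$.
For $j \ne k$, $X_j X_k = 1$ if the $j$th and $k$th balls drawn are both black, and $= 0$ otherwise.
Thus $E(X_j X_k) = P(X_j = 1, X_k = 1) = \frac{m}{m+n} \cdot \frac{m-1}{m+n-1}$
and $\operatorname{Cov}(X_j, X_k) = E(X_j X_k) - E(X_j)E(X_k) = -\frac{mn}{(m+n)^2(m+n-1)}$.
Thus $E(S_r) = \sum_{k=1}^{r} E(X_k) = \frac{mr}{m+n}$ and
$V(S_r) = \sum_{k=1}^{r} V(X_k) + \sum_{j \ne k} \operatorname{Cov}(X_j, X_k) = \frac{mnr}{(m+n)^2} - \frac{r(r-1)\,mn}{(m+n)^2(m+n-1)} = \frac{mnr\,(m+n-r)}{(m+n)^2(m+n-1)}$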
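The closed forms for 𝐸(𝑆𝑟) and 𝑉(𝑆𝑟) can be sanity-checked by brute force. The Python sketch below enumerates every equally likely ordered draw of 𝑟 balls from a small urn and compares the exact moments with the formulas derived above. The urn sizes used are arbitrary illustrative values and are not taken from the question.

```python
from fractions import Fraction
from itertools import permutations

def exact_moments(m, n, r):
    """Exact E(S_r) and V(S_r) by enumerating all ordered draws of r balls
    (without replacement) from an urn with m black and n white balls."""
    balls = ["B"] * m + ["W"] * n
    draws = list(permutations(range(m + n), r))  # balls are labelled, so draws are equally likely
    counts = [sum(balls[i] == "B" for i in d) for d in draws]  # S_r for each draw
    total = len(draws)
    mean = Fraction(sum(counts), total)
    second = Fraction(sum(c * c for c in counts), total)
    return mean, second - mean ** 2

def formula_moments(m, n, r):
    """E(S_r) = mr/(m+n) and V(S_r) = mnr(m+n-r)/((m+n)^2 (m+n-1))."""
    N = m + n
    return Fraction(m * r, N), Fraction(m * n * r * (N - r), N ** 2 * (N - 1))

print(exact_moments(4, 3, 3), formula_moments(4, 3, 3))
```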
Answer 27 : C
Explanation:
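$\lim_{n\to\infty} F_n(x) = F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$
Note that $F_n$ is the DF of the RV $X_n$ with PMF
$P(X_n = 0) = 1 - \frac{1}{n}, \quad P(X_n = n) = \frac{1}{n}$
and $F$ is the distribution function of the RV $X$ degenerate at $X = 0$.
We have $E(X_n^k) = n^k \cdot \frac{1}{n} = n^{k-1}$, where $k$ is a positive integer. Also $E(X^k) = 0$,
so that $E(X_n^k) \not\to E(X^k)$ for any $k \ge 1$, even though $X_n$ converges in distribution to $X$.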
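The failure of the moments to converge can be tabulated directly. The Python sketch below uses exact fractions and the PMF of 𝑋𝑛 given above. As 𝑛 grows, 𝑃(𝑋𝑛 ≠ 0) shrinks to 0, while the first moment stays at 1 and the second moment grows like 𝑛.

```python
from fractions import Fraction

def moment(n, k):
    """E(X_n^k) when P(X_n = 0) = 1 - 1/n and P(X_n = n) = 1/n (k >= 1)."""
    return (1 - Fraction(1, n)) * 0 ** k + Fraction(1, n) * n ** k

def prob_nonzero(n):
    """P(X_n != 0); it tends to 0, which is why X_n -> 0 in distribution."""
    return Fraction(1, n)

for n in (10, 100, 1000):
    print(n, prob_nonzero(n), moment(n, 1), moment(n, 2))
```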
$f(z) = \begin{cases} \frac{2}{\pi} \cdot \frac{1}{1 + z^2}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$
Answer 30 : D
Explanation:
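$M(s, t) = e^{\,s + 3t + 2s^2 + 18t^2 + 12st}$
$E(Y) = \left[\frac{\partial M}{\partial t}\right]_{(0,0)} = 3; \quad E(X) = \left[\frac{\partial M}{\partial s}\right]_{(0,0)} = 1$
$E(XY) = \left[\frac{\partial^2 M}{\partial s\,\partial t}\right]_{(0,0)} = 15$
$E(X) < E(Y)$
$\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 15 - 1 \times 3 = 12$
$\operatorname{Cov}(X, Y) > 0$, which implies $\operatorname{Corr}(X, Y) > 0$.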
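The three derivatives of 𝑀 at the origin can be cross-checked numerically. The Python sketch below uses central finite differences. The step size ℎ is a choice made here for illustration and is not part of the original solution.

```python
import math

def M(s, t):
    """Joint MGF from the question: exp(s + 3t + 2s^2 + 18t^2 + 12st)."""
    return math.exp(s + 3 * t + 2 * s ** 2 + 18 * t ** 2 + 12 * s * t)

h = 1e-4  # finite-difference step (illustrative choice)

EX = (M(h, 0) - M(-h, 0)) / (2 * h)   # dM/ds at (0, 0)
EY = (M(0, h) - M(0, -h)) / (2 * h)   # dM/dt at (0, 0)
# Mixed partial d^2 M / (ds dt) at (0, 0) by the four-point central formula
EXY = (M(h, h) - M(h, -h) - M(-h, h) + M(-h, -h)) / (4 * h * h)
cov = EXY - EX * EY

print(round(EX, 4), round(EY, 4), round(EXY, 4), round(cov, 4))
```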
12.8 REFERENCES
• S. C. Gupta, V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
12.9 SUGGESTED READINGS
• S. C. Gupta, V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
• B. L. Agarwal, Programmed Statistics, New Age International Publishers, 2nd Edition.
LESSON 13
STATISTICAL HYPOTHESIS
STRUCTURE
13.1 Learning Objectives
13.2 Introduction
13.3 Statistical Hypothesis
13.3.1 Simple and Composite Hypothesis
13.3.2 Critical Region
13.3.3 Type I and Type II Error
13.3.4 Most Powerful Test
13.3.5 Neyman-Pearson Lemma
13.4 In-Text Questions
13.5 Summary
13.6 Glossary
13.7 Answer to In-Text Questions
13.8 References
13.9 Suggested Readings
13.1 LEARNING OBJECTIVES
The main objective of this lesson is to discuss the testing of statistical hypotheses and how it can be used in real-world analysis.
13.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain situations
where there is lack of certainty on the basis of a sample whose size is fixed in advance while
in Wald's sequential theory the sample size is not fixed but is regarded as a random variable.
manufactured under some standard manufacturing process have an average life of 𝜇 hours and
it is proposed to test a new procedure for manufacturing light bulbs. Thus, we have two
populations of bulbs, those manufactured by standard process and those manufactured by the
new process. In this problem the following three hypotheses may be set up:
(i) New process is better than standard process.
(ii) New process is inferior to standard process.
(iii) There is no difference between the two processes.
The first two statements appear to be biased since they reflect a preferential attitude to one or
the other of the two processes. Hence the best course is to adopt the hypothesis of no difference,
as stated in (iii). This suggests that the statistician should take up a neutral or null attitude regarding the outcome of the test. His attitude should be on the null or zero line, in which the experimental data have due importance and the complete say in the matter. This neutral or non-committal attitude of the statistician or decision-maker before the sample observations are taken is the keynote of the null hypothesis.
Thus, in the above example of light bulbs, if 𝜇0 is the mean life (in hours) of the bulbs manufactured by the new process, then the null hypothesis, which is usually denoted by H0, can be stated as follows: 𝐻0: 𝜇 = 𝜇0.
As another example, let us suppose that two different concerns manufacture drugs for inducing sleep: drug 𝐴 is manufactured by the first concern and drug 𝐵 by the second concern. Each company claims that its drug is superior to that of the other, and it is desired to test which drug, 𝐴 or 𝐵, is superior. To formulate the statistical hypothesis, let 𝑋 be a random variable which denotes the additional hours of sleep gained by an individual when drug 𝐴 is given, and let the random variable 𝑌 denote the additional hours of sleep gained when drug 𝐵 is used. Let us suppose that 𝑋 and 𝑌 follow probability distributions with means 𝜇𝑋 and 𝜇𝑌 respectively. Here our null hypothesis would be that there is no difference between the effects of the two drugs. Symbolically, 𝐻0: 𝜇𝑋 = 𝜇𝑌.
Alternative Hypothesis.
It is desirable to state what is called an alternative hypothesis in respect of every statistical hypothesis being tested. The acceptance or rejection of the null hypothesis is meaningful only when it is tested against a rival hypothesis, which should be stated explicitly. The alternative hypothesis is usually denoted by 𝐻1. For example, in the example of light bulbs, the alternative hypothesis could be 𝐻1: 𝜇 > 𝜇0 or 𝜇 < 𝜇0 or 𝜇 ≠ 𝜇0. In the example of drugs, the alternative hypothesis could be 𝐻1: 𝜇𝑋 > 𝜇𝑌 or 𝜇𝑋 < 𝜇𝑌 or 𝜇𝑋 ≠ 𝜇𝑌.
In both cases, the first two alternative hypotheses give rise to what are called 'one-tailed' tests, and the third alternative hypothesis results in 'two-tailed' tests.
Important Remarks
1. In formulating a testing problem and devising a 'test of hypothesis', the roles of 𝐻0 and 𝐻1 are not at all symmetric. In order to decide which of the two hypotheses should be taken as the null hypothesis 𝐻0 and which as the alternative hypothesis 𝐻1, the intrinsic difference between the roles and the implications of these two hypotheses should be clearly understood.
2. If a particular problem cannot be stated as a test between two simple hypotheses, i.e., simple
null hypothesis against a simple alternative hypothesis, then the next best alternative is to
formulate the problem as the test of a simple null hypothesis against a composite alternative
hypothesis. In other words, one should try to structure the problem so that null hypothesis
is simple rather than composite.
3. Keeping in mind the potential losses due to wrong decisions (which may or may not be measured in terms of money), the decision maker is somewhat conservative. He holds the null hypothesis as true unless there is strong evidence from the experimental sample observations that it is false. To him, the consequences of wrongly rejecting a null hypothesis seem more severe than those of wrongly accepting it.
In most cases, the statistical hypothesis takes the form of a claim that a particular product or production process is superior to some existing standard. The null hypothesis H0 in this case is that there is no difference between the new product or production process and the existing standard. In other words, the null hypothesis nullifies this claim.
Wrongly rejecting the null hypothesis amounts to wrongly accepting the claim, and it involves heavy out-of-pocket expenses towards a substantive overhaul of the existing set-up. The resulting loss is regarded as more serious than the opportunity loss in wrongly accepting H0. Wrongly accepting H0 amounts to wrongly rejecting the claim, i.e., sticking to the less efficient existing standard.
In the light-bulbs problem discussed earlier, suppose the research division of the concern, on the basis of limited experimentation, claims that its brand is more effective than that manufactured by the standard process. If, in fact, the brand fails to be more effective, the concern will incur quite serious losses, through immediate obsolescence of the product, decline of the concern's image, and so on. On the other hand, the failure to bring out a superior brand in the market is an opportunity loss and is not considered as serious as the other loss.
13.3.2 Critical Region
Let 𝑥1, 𝑥2, …, 𝑥𝑛 be the sample observations, denoted by O. The set of all possible values of O constitutes a space, called the sample space, which is denoted by 𝑆. Since the sample values 𝑥1, 𝑥2, …, 𝑥𝑛 can be taken as a point in 𝑛-dimensional space, we specify some region of the 𝑛-dimensional space and see whether this point lies within this region or outside it. We divide the whole sample space 𝑆 into two disjoint parts, 𝑊 and 𝑆 − 𝑊 (also written 𝑊̄ or 𝑊′). The null hypothesis H0 is rejected if the observed sample point falls in 𝑊. If it falls in 𝑊′, we reject 𝐻1 and accept H0. The region 𝑊 of the outcome set such that H0 is rejected whenever the sample point falls in it is called the critical region (or rejection region). Evidently, the size of the critical region is 𝛼, the probability that the sample point falls in 𝑊 when H0 is true, i.e., the probability of committing a type I error (discussed below).
Suppose the test is based on a sample of size 2. Then the outcome set, or sample space, is the first quadrant of a two-dimensional space, and a test criterion will enable us to separate our outcome set into two complementary subsets, 𝑊 and 𝑊̄. If the sample point falls in the subset 𝑊, 𝐻0 is rejected; otherwise 𝐻0 is accepted. This is shown in the adjoining diagram:
From the above table it is obvious that in any testing problem we are liable to commit two types
of errors.
Errors of Type I and Type II. The error of rejecting 𝐻0 (accepting 𝐻1) when 𝐻0 is true is called a Type I error. The error of accepting 𝐻0 when 𝐻0 is false (𝐻1 is true) is called a Type II error. The probabilities of type I and type II errors are denoted by 𝛼 and 𝛽 respectively.
Thus
𝛼 = Probability of type I error = Probability of rejecting 𝐻0 when 𝐻0 is true,
𝛽 = Probability of type II error = Probability of accepting 𝐻0 when 𝐻0 is false (𝐻1 is true).
An ideal test would be one which properly keeps both types of error under control. Since the commission of an error of either type is a random event, an ideal test should minimise the probabilities of both types of error, viz., 𝛼 and 𝛽. Unfortunately, for a fixed sample size 𝑛, 𝛼 and 𝛽 are related in such a way (like producer's and consumer's risk in sampling inspection plans) that a reduction in one results in an increase in the other. Consequently, it is not possible to minimise both errors simultaneously. A type I error is deemed more serious than a type II error (cf. Remark 3 above). The usual practice is therefore to control 𝛼 at a predetermined low level and, subject to this constraint, choose a test which minimises 𝛽 or, equivalently, maximises the power 1 − 𝛽. Generally, we choose 𝛼 = 0.05 or 0.01.
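The trade-off between 𝛼 and 𝛽 for a fixed sample size can be seen numerically. The Python sketch below rests on assumptions made purely for illustration:

• a normal population with 𝜎 = 1,
• the hypotheses 𝐻0: 𝜇 = 0 against 𝐻1: 𝜇 = 1,
• the test "reject 𝐻0 when 𝑥̄ ≥ 𝑐".

Moving the cut-off 𝑐 lowers one error probability only by raising the other.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def alpha_beta(c, n, mu0=0.0, mu1=1.0, sigma=1.0):
    """Error probabilities of the test 'reject H0: mu = mu0 if xbar >= c'
    against H1: mu = mu1, for a normal population with known sigma."""
    se = sigma / math.sqrt(n)            # standard error of xbar
    alpha = 1 - Z.cdf((c - mu0) / se)    # P(xbar >= c | H0): type I error
    beta = Z.cdf((c - mu1) / se)         # P(xbar <  c | H1): type II error
    return alpha, beta

cutoffs = [0.2, 0.4, 0.6, 0.8]
table = [alpha_beta(c, n=9) for c in cutoffs]
for c, (a, b) in zip(cutoffs, table):
    print(f"c = {c:.1f}: alpha = {a:.4f}, beta = {b:.4f}")
```

Increasing the sample size (for example, from 9 to 25 at 𝑐 = 0.5) reduces both errors at once. This is why the trade-off is stated for a fixed sample size.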
STEPS IN SOLVING TESTING OF HYPOTHESIS PROBLEM
The major steps involved in the solution of a 'testing of hypothesis' problem may be outlined
as follows:
1) Explicit knowledge of the nature of the population distribution and the parameter(s) of interest, i.e., the parameter(s) about which the hypotheses are set up.
2) Setting up the null hypothesis 𝐻0 and the alternative hypothesis 𝐻1 in terms of the range of parameter values each one embodies.
3) The choice of a suitable statistic $t = t(x_1, x_2, \ldots, x_n)$, called the test statistic, which will best reflect upon the probability of 𝐻0 and 𝐻1.
4) Partitioning the set of possible values of the test statistic 𝑡 into two disjoint sets, 𝑊 (called the rejection region or critical region) and 𝑊̄ (called the acceptance region), and framing the following test:
(i) Reject 𝐻0 (i.e., accept 𝐻1) if the value of 𝑡 falls in 𝑊.
(ii) Accept 𝐻0 if the value of 𝑡 falls in 𝑊̄.
5) After framing the above test, obtain experimental sample observations, compute the
appropriate test statistic and take action accordingly.
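The five steps can be sketched for one concrete case. The Python sketch below assumes a normal population with known 𝜎 and tests 𝐻0: 𝜇 = 𝜇0 against 𝐻1: 𝜇 > 𝜇0. The function name `z_test_upper` and the sample values are hypothetical, chosen only to illustrate the steps.

```python
import math
from statistics import NormalDist

def z_test_upper(sample, mu0, sigma, alpha=0.05):
    """Steps 1-5 for H0: mu = mu0 against H1: mu > mu0 (normal population, sigma known)."""
    n = len(sample)
    xbar = sum(sample) / n
    # Step 3: test statistic t = (xbar - mu0) / (sigma / sqrt(n)), which is N(0, 1) under H0
    t = (xbar - mu0) / (sigma / math.sqrt(n))
    # Step 4: critical region W = {t >= z_alpha}, where P(Z >= z_alpha) = alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Step 5: reject H0 if t falls in W, otherwise accept H0
    reject = t >= z_alpha
    return t, z_alpha, reject

sample = [10.4, 9.8, 10.9, 10.6, 10.2, 11.0, 10.7, 10.1, 10.5]  # hypothetical data
print(z_test_upper(sample, mu0=10.0, sigma=0.6))
```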
OPTIMUM TEST UNDER DIFFERENT SITUATIONS
The discussion enables us to obtain the so-called best test under different situations. In any testing problem, the first two steps should be obvious from the description of the problem: the form of the population distribution, the parameter(s) of interest, and the framing of H0 and 𝐻1. The most crucial step is the choice of the 'best test', i.e., the best statistic 𝑡 and the critical region 𝑊. By the best test we mean one which controls 𝛼 at any desired low level and, among all tests having this 𝛼, has the minimum type II error 𝛽 or maximum power 1 − 𝛽. This leads to the following definition.
13.3.4 Most Powerful Test
Most Powerful Test (MP Test). Let us consider the problem of testing a simple hypothesis 𝐻0: 𝜃 = 𝜃0 against a simple alternative hypothesis 𝐻1: 𝜃 = 𝜃1.
Definition. The critical region 𝑊 is the most powerful (MP) critical region of size 𝛼 for testing 𝐻0: 𝜃 = 𝜃0 against 𝐻1: 𝜃 = 𝜃1 (and the corresponding test is a most powerful test of level 𝛼) if
(i) $P(x \in W \mid H_0) = \int_W L_0\,dx = \alpha$, and
(ii) $P(x \in W \mid H_1) \ge P(x \in W_1 \mid H_1)$,
whatever the region $W_1$ satisfying (i) may be.
13.3.5 Neyman-Pearson Lemma
This lemma provides the most powerful test of a simple hypothesis against a simple alternative hypothesis. The theorem, known as the Neyman-Pearson Lemma, will be proved for the density function 𝑓(𝑥, 𝜃) of a single continuous variate and a single parameter. However, by regarding 𝑥 and 𝜃 as vectors, the proof can easily be generalised to any number of random variables 𝑥1, 𝑥2, …, 𝑥𝑛 and any number of parameters 𝜃1, 𝜃2, …, 𝜃𝑘. The variables 𝑥1, 𝑥2, …, 𝑥𝑛 occurring in this theorem are understood to represent a random sample of size 𝑛 from the population whose density function is 𝑓(𝑥, 𝜃). The lemma is concerned with a simple hypothesis 𝐻0: 𝜃 = 𝜃0 and a simple alternative 𝐻1: 𝜃 = 𝜃1.
Neyman-Pearson Lemma
Let 𝑘 > 0 be a constant and 𝑊 be a critical region of size 𝛼 such that
$W = \left\{x \in S : \frac{f(x, \theta_1)}{f(x, \theta_0)} > k\right\} \;\Rightarrow\; W = \left\{x \in S : \frac{L_1}{L_0} > k\right\}$
and
$\bar{W} = \left\{x \in S : \frac{L_1}{L_0} \le k\right\}$
where 𝐿0 and 𝐿1 are the likelihood functions of the sample observations 𝑥 = (𝑥1, 𝑥2, …, 𝑥𝑛) under 𝐻0 and 𝐻1 respectively. Then 𝑊 is the most powerful critical region for testing the hypothesis 𝐻0: 𝜃 = 𝜃0 against the alternative 𝐻1: 𝜃 = 𝜃1.
Proof. We are given
$P(x \in W \mid H_0) = \int_W L_0\,dx = \alpha$
$P(x \in W \mid H_1) = \int_W L_1\,dx = 1 - \beta \text{ (say)}.$
In order to establish the lemma, we have to prove that there exists no other critical region of size less than or equal to 𝛼 which is more powerful than 𝑊.
Let 𝑊1 be another critical region of size 𝛼1 ≤ 𝛼 and power 1 − 𝛽1, so that we have
$P(x \in W_1 \mid H_0) = \int_{W_1} L_0\,dx = \alpha_1$
and
$P(x \in W_1 \mid H_1) = \int_{W_1} L_1\,dx = 1 - \beta_1$
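Write $W = A \cup C$ and $W_1 = B \cup C$, where $C = W \cap W_1$, $A = W \cap \bar{W}_1$ and $B = \bar{W} \cap W_1$ are disjoint regions.
If 𝛼1 ≤ 𝛼, we have
$\int_{W_1} L_0\,dx \le \int_W L_0\,dx$
$\Rightarrow \int_{B \cup C} L_0\,dx \le \int_{A \cup C} L_0\,dx$
$\Rightarrow \int_B L_0\,dx \le \int_A L_0\,dx$
$\Rightarrow \int_A L_0\,dx \ge \int_B L_0\,dx$
Since $A \subset W$, the definition of $W$ gives $L_1 > k L_0$ on $A$, so
$\int_A L_1\,dx \ge k \int_A L_0\,dx \ge k \int_B L_0\,dx$
Also, the definition of $\bar{W}$ implies
$\frac{L_1}{L_0} \le k \quad \forall x \in \bar{W} \;\Rightarrow\; \int_{\bar{W}} L_1\,dx \le k \int_{\bar{W}} L_0\,dx$
This result also holds for any subset of $\bar{W}$, say $\bar{W} \cap W_1 = B$. Hence
$\int_B L_1\,dx \le k \int_B L_0\,dx \le \int_A L_1\,dx$
Adding $\int_C L_1\,dx$ to both sides, we get
$\int_{W_1} L_1\,dx \le \int_W L_1\,dx \;\Rightarrow\; 1 - \beta \ge 1 - \beta_1$
Hence no critical region of size at most 𝛼 is more powerful than 𝑊, which proves the lemma.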
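The lemma can also be checked by brute force on a small discrete example, where integrals become sums. In the Python sketch below, the two probability vectors are hypothetical and are not taken from the text. The region {𝐿1/𝐿0 > 𝑘} with 𝑘 = 1 is compared against every other region whose size does not exceed its own.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical discrete model (not from the text): outcomes 0..5 with
# probabilities L0 under H0 and L1 under H1, written in hundredths.
p0 = [Fraction(v, 100) for v in (30, 25, 20, 10, 10, 5)]
p1 = [Fraction(v, 100) for v in (5, 10, 15, 20, 20, 30)]
outcomes = range(len(p0))

def size(region):
    """P(x in region | H0): the discrete analogue of the integral of L0."""
    return sum((p0[x] for x in region), Fraction(0))

def power(region):
    """P(x in region | H1): the discrete analogue of the integral of L1."""
    return sum((p1[x] for x in region), Fraction(0))

# Neyman-Pearson region W = {x : L1/L0 > k}, here with k = 1.
k = 1
W = [x for x in outcomes if p1[x] / p0[x] > k]
alpha = size(W)

# Largest power achieved by ANY region W1 whose size is at most alpha.
best = max(
    power(R)
    for r in range(len(p0) + 1)
    for R in combinations(outcomes, r)
    if size(R) <= alpha
)
print(W, alpha, power(W), best)  # [3, 4, 5] 1/4 7/10 7/10
```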
Remark. Let 𝑊, defined in the above lemma, be the most powerful critical region of size 𝛼 for testing 𝐻0: 𝜃 = 𝜃0 against 𝐻1: 𝜃 = 𝜃1. Let it be independent of 𝜃1 ∈ Θ1 = Θ − Θ0, where Θ0 is the parameter space under 𝐻0. Then we say that the C.R. 𝑊 is the UMP C.R. of size 𝛼 for testing 𝐻0: 𝜃 = 𝜃0 against 𝐻1: 𝜃 ∈ Θ1.
Example 1. Given the frequency function
$f(x, \theta) = \begin{cases} \frac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{elsewhere} \end{cases}$
suppose you are testing the null hypothesis 𝐻0: 𝜃 = 1 against 𝐻1: 𝜃 = 2 by means of a single observed value of 𝑥. What would be the sizes of the type I and type II errors if you choose the interval (i) 0.5 ≤ 𝑥, (ii) 1 ≤ 𝑥 ≤ 1.5 as the critical region? Also obtain the power function of the test.
Solution. Here we want to test 𝐻0: 𝜃 = 1 against 𝐻1: 𝜃 = 2.
(i) Here
$W = \{x : 0.5 \le x\} = \{x : x \ge 0.5\}$ and $\bar{W} = \{x : x < 0.5\}$
$\alpha = P\{x \in W \mid H_0\} = P\{x \ge 0.5 \mid \theta = 1\} = P\{0.5 \le x \le \theta \mid \theta = 1\}$
$= P\{0.5 \le x \le 1 \mid \theta = 1\} = \int_{0.5}^{1} [f(x, \theta)]_{\theta=1}\,dx = \int_{0.5}^{1} 1\,dx = 0.5$
Similarly,
$\beta = P\{x \in \bar{W} \mid H_1\} = P\{x < 0.5 \mid \theta = 2\} = \int_0^{0.5} [f(x, \theta)]_{\theta=2}\,dx = \int_0^{0.5} \frac{1}{2}\,dx = 0.25$
Thus the sizes of the type I and type II errors are respectively 𝛼 = 0.5 and 𝛽 = 0.25,
and the power of the test is 1 − 𝛽 = 0.75.
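Both critical regions of Example 1 can be evaluated with the same definitions, 𝛼 = 𝑃(𝑥 ∈ 𝑊 ∣ 𝐻0) and 𝛽 = 𝑃(𝑥 ∉ 𝑊 ∣ 𝐻1). The Python sketch below assumes 𝑥 ∼ 𝑈(0, 𝜃), as in the example. Region (ii), whose working does not appear in this section, is computed by the same rule.

```python
def prob_interval(a, b, theta):
    """P(a <= X <= b) when X ~ U(0, theta)."""
    lo, hi = max(a, 0.0), min(b, theta)
    return max(hi - lo, 0.0) / theta

def errors(a, b, theta0=1.0, theta1=2.0):
    """(alpha, beta) for the critical region W = [a, b]."""
    alpha = prob_interval(a, b, theta0)     # P(x in W | H0)
    beta = 1 - prob_interval(a, b, theta1)  # P(x not in W | H1)
    return alpha, beta

print(errors(0.5, float("inf")))  # region (i), x >= 0.5: (0.5, 0.25)
print(errors(1.0, 1.5))           # region (ii), 1 <= x <= 1.5: (0.0, 0.75)
```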
The size 𝛼 of the critical region is
$\alpha = P(x \in W \mid H_0) = P(U \ge 0 \mid H_0) \qquad (**)$
Under $H_0: \mu = -1$, $U \sim N(-55, 1540)$ [from (*)] $\Rightarrow Z = \frac{U - E(U)}{\sigma_U} = \frac{U + 55}{\sqrt{1540}}$
$\therefore$ Under $H_0$, when $U = 0$, $Z = \frac{55}{\sqrt{1540}} = \frac{55}{39.2428} = 1.4015$
Hence, from (**) and the normal probability tables,
$\alpha = P(Z \ge 1.4015) \approx P(Z \ge 1.40) = 0.5 - 0.4192 = 0.0808$
Alternatively, $\alpha = 1 - P(Z \le 1.4015) = 1 - \Phi(1.4015)$,
where $\Phi(\cdot)$ is the distribution function of a standard normal variate.
The power of the test is $1 - \beta = P(x \in W \mid H_1) = P(U \ge 0 \mid H_1)$.
Under $H_1: \mu = 1$, $U \sim N(55, 1540)$
$\Rightarrow Z = \frac{U - E(U)}{\sigma_U} = \frac{-55}{\sqrt{1540}} = -1.40$ (when $U = 0$)
$\therefore 1 - \beta = P(Z \ge -1.40) = P(-1.4 \le Z \le 0) + 0.5$
$= P(0 \le Z \le 1.4) + 0.5$ (by symmetry)
$= 0.4192 + 0.5 = 0.9192$
Alternatively, $1 - \beta = 1 - P(Z \le -1.40) = 1 - \Phi(-1.40)$,
where $\Phi(\cdot)$ is the distribution function of a standard normal variate.
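The two tail probabilities can be evaluated without tables. The Python sketch below uses the standard-library `statistics.NormalDist`. The distributions of 𝑈 under 𝐻0 and 𝐻1 are taken from the solution above; 1540 is the variance, so the standard deviation is √1540. Because the sketch uses 𝑧 = 1.4015 rather than the rounded table value 1.40, its results differ from the tabulated figures in the fourth decimal place.

```python
import math
from statistics import NormalDist

sd_u = math.sqrt(1540)                 # standard deviation of U (variance 1540)
U_H0 = NormalDist(mu=-55, sigma=sd_u)  # distribution of U under H0: mu = -1
U_H1 = NormalDist(mu=55, sigma=sd_u)   # distribution of U under H1: mu = 1

alpha = 1 - U_H0.cdf(0)  # size: P(U >= 0 | H0)
power = 1 - U_H1.cdf(0)  # power: P(U >= 0 | H1)
print(round(55 / sd_u, 4), round(alpha, 4), round(power, 4))
```

The two means are symmetric about 0, so the size and the power add up to exactly 1 here.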
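Example 6. Use the Neyman-Pearson Lemma to obtain the region for testing 𝜃 = 𝜃0 against 𝜃 = 𝜃1 > 𝜃0 and against 𝜃 = 𝜃1 < 𝜃0, in the case of a normal population 𝑁(𝜃, 𝜎²), where 𝜎² is known. Hence find the power of the test.
Solution.
$L = \prod_{i=1}^{n} f(x_i, \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta)^2\right\}$
Using the Neyman-Pearson Lemma, the best critical region (B.C.R.) is given by (for 𝑘 > 0)
$\frac{L_1}{L_0} = \frac{\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta_1)^2\right\}}{\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta_0)^2\right\}} \ge k$
$\Rightarrow \exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{n}(x_i - \theta_1)^2 - \sum_{i=1}^{n}(x_i - \theta_0)^2\right\}\right] \ge k$
$\Rightarrow \exp\left[-\frac{n}{2\sigma^2}(\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2}(\theta_1 - \theta_0)\sum_{i=1}^{n} x_i\right] \ge k$
$\Rightarrow -\frac{n}{2\sigma^2}(\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2}(\theta_1 - \theta_0)\sum_{i=1}^{n} x_i \ge \log k$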
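$\Rightarrow P\left(Z \ge \frac{\lambda_2 - \theta_0}{\sigma/\sqrt{n}}\right) = 1 - \alpha \;\Rightarrow\; \frac{\lambda_2 - \theta_0}{\sigma/\sqrt{n}} = z_{1-\alpha}$
$\Rightarrow \lambda_2 = \theta_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}}$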
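Power of the test. By definition, the power of the test in case (i) is
$1 - \beta = P[x \in W \mid H_1] = P[\bar{x} \ge \lambda_1 \mid H_1]$
$= P\left(Z \ge \frac{\lambda_1 - \theta_1}{\sigma/\sqrt{n}}\right) \quad \left[\because \text{Under } H_1,\ Z = \frac{\bar{x} - \theta_1}{\sigma/\sqrt{n}} \sim N(0, 1)\right]$
$= P\left(Z \ge \frac{\theta_0 + z_\alpha\frac{\sigma}{\sqrt{n}} - \theta_1}{\sigma/\sqrt{n}}\right)$
$= P\left(Z \ge z_\alpha - \frac{\theta_1 - \theta_0}{\sigma/\sqrt{n}}\right)$
$= 1 - P(Z \le \lambda_3) = 1 - \Phi(\lambda_3),$
where $\lambda_3 = z_\alpha - \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}$ and $\Phi(\cdot)$ is the distribution function of a standard normal variate.
Similarly, in case (ii) ($\theta_1 < \theta_0$), the power of the test is
$1 - \beta = P(\bar{x} < \lambda_2 \mid H_1) = P\left(Z < \frac{\lambda_2 - \theta_1}{\sigma/\sqrt{n}}\right)$
$= P\left(Z < \frac{\theta_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}} - \theta_1}{\sigma/\sqrt{n}}\right)$
$= P\left(Z < z_{1-\alpha} + \frac{\theta_0 - \theta_1}{\sigma/\sqrt{n}}\right) = \Phi(\lambda_4), \quad (\because \theta_0 > \theta_1)$
where $\lambda_4 = z_{1-\alpha} + \frac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} = \frac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} - z_\alpha$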
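The two power formulas can be evaluated directly. The Python sketch below is an illustration under assumed values that are not from the text: 𝜃0 = 0, 𝜃1 = ±0.5, 𝜎 = 1, 𝑛 = 25 and 𝛼 = 0.05. Here 𝑧𝛼 is the upper 𝛼 point, so 𝑧1−𝛼 = −𝑧𝛼, as used in case (ii).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_upper(theta0, theta1, sigma, n, alpha):
    """Case (i), theta1 > theta0: power = 1 - Phi(lambda_3)."""
    z_alpha = Z.inv_cdf(1 - alpha)  # upper alpha point
    lam3 = z_alpha - math.sqrt(n) * (theta1 - theta0) / sigma
    return 1 - Z.cdf(lam3)

def power_lower(theta0, theta1, sigma, n, alpha):
    """Case (ii), theta1 < theta0: power = Phi(lambda_4)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    lam4 = math.sqrt(n) * (theta0 - theta1) / sigma - z_alpha
    return Z.cdf(lam4)

print(power_upper(0, 0.5, 1, 25, 0.05), power_lower(0, -0.5, 1, 25, 0.05))
```

When 𝜃1 = 𝜃0, each power function reduces to 𝛼. The two cases are mirror images of each other, so alternatives at equal distance on either side of 𝜃0 have the same power.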
Question 4.
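If 𝑋1, 𝑋2, …, 𝑋𝑛 is a random sample from a population with density
$f(x, \theta) = \begin{cases} \frac{1}{\theta} e^{-x/\theta}, & \text{if } 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$
where 𝜃 > 0 is an unknown parameter, what is a 100(1 − 𝛼)% confidence interval for 𝜃?
A. $\left[\frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)},\ \frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)}\right]$
B. $\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)},\ \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$
C. $\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)},\ \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$
D. $\left[\frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)},\ \frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$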
Question 5.
Suppose that 𝑋 has a uniform distribution on the interval [0, 100]. Let 𝑌 denote the greatest integer smaller than or equal to 𝑋. Which of the following is true?
A. $P(Y \le 25) = \frac{1}{4}$
B. $P(Y \le 25) = \frac{26}{100}$
C. $E(Y) = 50$
D. $E(Y) = \frac{101}{2}$
Question 6.
Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3.5, 𝑥4 = 2.5 be the observed values of a random sample from the probability density function
$f(x \mid \theta) = \frac{1}{3}\left[\frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x}\right], \quad x > 0,\ \theta \in \{1, 2, 3, 4\}$
Question 7.
Let the random variables 𝑋 and 𝑌 have the joint probability mass function
$P(X = x, Y = y) = e^{-2}\binom{x}{y}\left(\frac{3}{4}\right)^y\left(\frac{1}{4}\right)^{x-y}\frac{2^x}{x!}, \quad y = 0, 1, 2, \ldots, x;\ x = 0, 1, 2, \ldots$
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
$P(X = m, Y = n) = \begin{cases} \frac{e^{-1}}{(n - m)!\, m!\, 2^n}, & m = 0, 1, 2, \ldots, n;\ n = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$
Which of the following statements is (are) TRUE?
A. The marginal distribution of 𝑋 is Poisson with mean 1/2
B. The random variables 𝑋 and 𝑌 are independent
C. The conditional distribution of 𝑋 given 𝑌 = 5 is Bin(6, 1/2)
D. 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 2) for 𝑛 = 0, 1, 2, …
Question 9.
Consider the trinomial distribution with the probability mass function
$P(X = x, Y = y) = \frac{2!}{x!\, y!\, (2 - x - y)!}\left(\frac{1}{6}\right)^x\left(\frac{2}{6}\right)^y\left(\frac{3}{6}\right)^{2-x-y}, \quad x \ge 0,\ y \ge 0,\ 0 \le x + y \le 2.$
Then Corr(𝑋, 𝑌) is equal to …
(correct up to two decimal places)
A) -0.31
B) 0.31
C) 0.35
D) 0.78
Question 10.
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2 be the observed values of a random sample of size four from a distribution with the probability density function
$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta,\ \theta \in (-\infty, \infty) \\ 0, & \text{otherwise} \end{cases}$
Then the maximum likelihood estimate of 𝜃² + 𝜃 + 1 is equal to (up to two decimal places)
A) 1.75
B) 1.89
C) 1.74
D) 0.87
Question 11.
Let $U \sim F_{5,8}$ and $V \sim F_{8,5}$. If 𝑃[𝑈 > 3.69] = 0.05, then the value of 𝑐 such that 𝑃[𝑉 > 𝑐] = 0.95 equals … (rounded off to two decimal places)
A) 0.27
B) 1.27
C) 2.27
D) 2.29
Question 12.
Let P be a probability function that assigns the same weight to each of the points of the
sample space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then
which of the following statement(s) is (are) TRUE?
1. E and F are independent
261 | P a g e
B. 2.5
C. 3.5
D. 4.5
Question 16.
Let 𝑋 and 𝑌 be random variables with the joint probability mass function
$P(X = n, Y = k) = \left(\frac{1}{2}\right)^{n + 2k + 1}, \quad n = -k, -k + 1, \ldots;\ k = 1, 2, \ldots$
Then E(Y) equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let 𝑋 be a random variable with the cumulative distribution function
$F(x) = \begin{cases} 0, & x < 0 \\ \frac{1 + x^2}{10}, & 0 \le x < 1 \\ \frac{3 + x^2}{10}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$
Which of the following statements is (are) TRUE?
A. $P(1 < X < 2) = \frac{3}{10}$
B. $P(1 < X \le 2) = \frac{3}{5}$
C. $P(1 \le X < 2) = \frac{1}{2}$
D. $P(1 \le X \le 2) = \frac{4}{5}$
Question 18.
Let the random variables 𝑋1 and 𝑋2 have the joint probability density function
$f(x_1, x_2) = \begin{cases} \frac{x_1 e^{-x_1 x_2}}{2}, & 1 < x_1 < 3,\ x_2 > 0 \\ 0, & \text{otherwise} \end{cases}$
What is the value of Var(𝑋2 ∣ 𝑋1 = 2) … (up to two decimal places)?
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19
Let 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0, 𝑥6 = 1 be the data on a random sample of size 6 from the Bin(1, 𝜃) distribution, where 𝜃 ∈ (0, 1). Then the uniformly minimum variance unbiased estimate of 𝜃(1 + 𝜃) is equal to
13.5 SUMMARY
The main points covered in this lesson are null and alternative hypotheses, the critical region, type I and type II errors, most powerful tests and the Neyman-Pearson lemma.
13.6 GLOSSARY
• Motivation: These problems are very useful in real life, and we can use them in data science, economics and the social sciences.
• Attention: Think about how the best estimators are useful in real-world problems.
13.7 ANSWER TO IN-TEXT QUESTIONS
Answer 1: A
Explanation :
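$f(x, \theta) = \begin{cases} \frac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise} \end{cases}$
$E(X_{(n)}) = \frac{n\theta}{n+1}$
Let $Y = \frac{1}{\theta}\left(X_{(n)} + \frac{\theta}{n+1}\right)$,
so $E(Y) = E\left(\frac{X_{(n)}}{\theta} + \frac{1}{n+1}\right) = 1$
$\lim_{n\to\infty} E\left[\frac{1}{\theta}\left(X_{(n)} + \frac{1}{n+1}\right)\right] = 1;$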
= 10
Hence option C is correct.
Answer 4: B
Explanation :
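We use the random variable $Q = \frac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2_{(2n)}$. Here $\chi^2_p(2n)$ denotes the value for which $P\left(\chi^2_{(2n)} \le \chi^2_p(2n)\right) = p$.
$1 - \alpha = P\left(\chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n)\right) = P\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)} \le \theta \le \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$
Thus the 100(1 − 𝛼)% confidence interval for 𝜃 is given by $\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)},\ \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$.
Hence option B is correct.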
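The interval in option B can be computed with only the standard library. For even degrees of freedom 2𝑛, the 𝜒² distribution function has the closed form $1 - e^{-x/2}\sum_{j=0}^{n-1}(x/2)^j/j!$. The Python sketch below inverts this by bisection. The helper names and the sample values are illustrative assumptions, and $\chi^2_p(2n)$ denotes the lower 𝑝-quantile, as in the solution.

```python
import math

def chi2_cdf_even(x, n):
    """P(chi^2 with 2n d.f. <= x), via the closed form for even degrees of freedom."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of sum_{j=0}^{n-1} (x/2)^j / j!
    for j in range(1, n):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even(p, n, tol=1e-10):
    """Lower p-quantile of chi^2(2n), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exp_mean_ci(sample, alpha=0.05):
    """100(1 - alpha)% CI for theta when X is exponential with mean theta (option B)."""
    n = len(sample)
    s = 2.0 * sum(sample)
    return s / chi2_quantile_even(1 - alpha / 2, n), s / chi2_quantile_even(alpha / 2, n)

sample = [2.1, 0.7, 3.4, 1.5, 2.8]  # hypothetical observations
print(exp_mean_ci(sample))
```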
Answer 5: B
Explanation :
Let 𝑌 = [𝑋], where 𝑌 denotes the greatest integer smaller than or equal to 𝑋.
$P(Y \le 25) = P([X] \le 25) = P(X \in [0, 26)) = \int_0^{26}\frac{1}{100}\,dx = \frac{26}{100}$
Hence B is the correct option.
Answer 6: C
Explanation :
$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$
$E(X) = \frac{1}{3}\left[\theta + \theta^2 + 1\right] = 3.25$
$\Rightarrow \theta^2 + \theta - 8.75 = 0$, so $\theta = 2.5$ or $-3.5$
Since $\theta \in \{1, 2, 3, 4\}$, we take $\theta = 3$.
Hence option C is correct.
Answer 7: D
Explanation :
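The marginal pmf of 𝑌 is given by
$P(Y = y) = \sum_{x=y}^{\infty} P(X = x, Y = y) = \sum_{x=y}^{\infty} e^{-2}\binom{x}{y}\left(\frac{3}{4}\right)^y\left(\frac{1}{4}\right)^{x-y}\frac{2^x}{x!}$
$= e^{-2}\left(\frac{3}{4}\right)^y\sum_{u=0}^{\infty}\binom{y+u}{y}\left(\frac{1}{4}\right)^u\frac{2^{y+u}}{(y+u)!} \quad (\text{putting } u = x - y)$
$= e^{-2}\left(\frac{3}{4}\right)^y\sum_{u=0}^{\infty}\frac{(y+u)!}{y!\,u!}\left(\frac{1}{4}\right)^u\frac{2^{y+u}}{(y+u)!} = e^{-2}\left(\frac{3}{4}\right)^y\frac{2^y}{y!}\sum_{u=0}^{\infty}\frac{1}{u!}\left(\frac{1}{2}\right)^u$
$= e^{-2}\left(\frac{3}{4}\right)^y\frac{2^y}{y!}e^{1/2} = \frac{e^{-3/2}\left(\frac{3}{2}\right)^y}{y!}, \quad y = 0, 1, \ldots$
This is the pmf of a Poisson random variable with parameter 3/2, so 𝐸(𝑌) = 3/2 and 𝑉(𝑌) = 3/2.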
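The closed form can be confirmed numerically by summing the joint pmf over 𝑥. The Python sketch below truncates the sums at 𝑥 ≤ 80 and 𝑦 ≤ 40. These cut-offs are choices made here, justified only because the neglected Poisson tails are negligibly small.

```python
import math

def joint(x, y):
    """P(X = x, Y = y) from Question 7 (zero unless 0 <= y <= x)."""
    if not 0 <= y <= x:
        return 0.0
    return (math.exp(-2) * math.comb(x, y) * 0.75 ** y * 0.25 ** (x - y)
            * 2 ** x / math.factorial(x))

def marginal_y(y, x_max=80):
    """P(Y = y), summing over x = y, ..., x_max; the omitted tail is negligible."""
    return sum(joint(x, y) for x in range(y, x_max + 1))

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

ys = range(0, 41)
mean_y = sum(y * marginal_y(y) for y in ys)
var_y = sum(y * y * marginal_y(y) for y in ys) - mean_y ** 2
print(round(mean_y, 6), round(var_y, 6))
```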
Answer 8: A
Explanation :
The marginal probability mass function of 𝑋 is given by
$P(X = m) = \sum_{n=m}^{\infty} P(X = m, Y = n) \quad (\text{for } m = 0, 1, 2, \ldots)$
$= \frac{e^{-1/2}\left(\frac{1}{2}\right)^m}{m!}, \quad m = 0, 1, 2, \ldots$
Thus the marginal distribution of 𝑋 is Poisson with mean 1/2.
The marginal probability mass function of 𝑌 is given by
$P(Y = n) = \sum_{m=0}^{n} P(X = m, Y = n) \quad (\text{for } n = 0, 1, 2, \ldots)$
$= \frac{e^{-1}}{n!}, \quad n = 0, 1, 2, \ldots$
Thus the marginal distribution of 𝑌 is Poisson with mean 1.
$P(X = m, Y = n) \ne P(X = m)P(Y = n)$
Therefore 𝑋 and 𝑌 are not independent.
$P(X = m \mid Y = 5) = \frac{P(X = m, Y = 5)}{P(Y = 5)} = \frac{5!}{m!\,(5 - m)!}\left(\frac{1}{2}\right)^5, \quad m = 0, 1, 2, \ldots, 5$
Thus the conditional distribution of 𝑋 given 𝑌 = 5 is Bin(5, 1/2).
Also, $\frac{P(Y = n)}{P(Y = n + 1)} = n + 1$ for $n = 0, 1, 2, \ldots$
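Answer 9: A
Explanation :
The trinomial distribution of two r.v.'s 𝑋 and 𝑌 is given by
$f_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\,p_1^x\,p_2^y\,(1 - p_1 - p_2)^{n-x-y}$
for 𝑥, 𝑦 = 0, 1, 2, …, 𝑛 and 𝑥 + 𝑦 ≤ 𝑛, where 𝑝1 + 𝑝2 ≤ 1.
Here 𝑛 = 2, 𝑝1 = 1/6 and 𝑝2 = 2/6.
$\operatorname{Var}(X) = np_1(1 - p_1) = 2 \times \frac{1}{6}\left(1 - \frac{1}{6}\right) = \frac{10}{36}; \quad \operatorname{Var}(Y) = np_2(1 - p_2) = 2 \times \frac{2}{6}\left(1 - \frac{2}{6}\right) = \frac{16}{36}$
$\operatorname{Cov}(X, Y) = -np_1p_2 = -2 \times \frac{1}{6} \times \frac{2}{6} = -\frac{4}{36}$
$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}} = -\frac{4}{4\sqrt{10}} = -\frac{1}{\sqrt{10}} \approx -0.316$
Hence −0.31 (option A) is the correct answer.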
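The moment formulas used above can be cross-checked by enumerating the six support points of the trinomial pmf. The Python sketch below uses exact fractions and the parameters of Question 9.

```python
from fractions import Fraction
from math import factorial, sqrt

n, p1, p2 = 2, Fraction(1, 6), Fraction(2, 6)
p3 = 1 - p1 - p2

# Joint pmf on the support x, y >= 0, x + y <= n
pmf = {}
for x in range(n + 1):
    for y in range(n + 1 - x):
        coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
        pmf[(x, y)] = coef * p1 ** x * p2 ** y * p3 ** (n - x - y)

def E(g):
    """Expectation of g(X, Y) under the trinomial pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

var_x = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_y = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
corr = float(cov) / sqrt(float(var_x * var_y))
print(var_x, var_y, cov, round(corr, 4))
```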
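Answer 10: A
Explanation :
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2 and
$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta,\ \theta \in (-\infty, \infty) \\ 0, & \text{otherwise} \end{cases}$
The likelihood $L(\theta) = \prod_{i=1}^{4} e^{\theta - x_i} = e^{4\theta - \sum x_i}$ is positive only for $\theta \in (-\infty, x_{(1)}]$.
Since $\frac{d}{d\theta}L(\theta) > 0\ \forall \theta \in (-\infty, x_{(1)}]$, $L$ is maximised at $\hat{\theta} = x_{(1)} = 0.5$. By the invariance property of MLE, the MLE of $\theta^2 + \theta + 1$ is $0.5^2 + 0.5 + 1 = 1.75$.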
$= \frac{1}{2}\left(\frac{1}{2}\right)^{k-1}, \quad k = 1, 2, \ldots$
which is the pmf of a geometric distribution with parameter 1/2.
$E(Y) = \sum_{k=1}^{\infty} k\,\frac{1}{2}\left(\frac{1}{2}\right)^{k-1} = 2$
Hence option B is correct.
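Answer 17: A, B, C, D
Explanation :
$P(1 < X < 2) = F(2) - F(1) - P(X = 2) = \frac{3}{10}$
$P(1 < X \le 2) = F(2) - F(1) = \frac{3}{5}$
$P(1 \le X < 2) = F(2) - F(1) - P(X = 2) + P(X = 1) = \frac{1}{2}$
$P(1 \le X \le 2) = F(2) - F(1) + P(X = 1) = \frac{4}{5}$
Here $F(1) = \frac{4}{10}$, $F(2) = 1$, $P(X = 1) = F(1) - F(1-) = \frac{2}{10}$ and $P(X = 2) = F(2) - F(2-) = \frac{3}{10}$. All four values agree with the options, so every statement is true.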
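The four probabilities can be recomputed from 𝐹 alone, using 𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎) and the left limits 𝐹(𝑥−) at the endpoints. In the Python sketch below, `F_left` relies on each piece of 𝐹 being a polynomial: the left limit at a breakpoint is then simply the left-hand piece evaluated there. The helper names are illustrative.

```python
from fractions import Fraction

def F(x):
    """The CDF of Question 17."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return (1 + x * x) / 10
    if x < 2:
        return (3 + x * x) / 10
    return Fraction(1)

def F_left(x):
    """Left limit F(x-): the piece valid just to the left of x, evaluated at x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return (1 + x * x) / 10
    if x <= 2:
        return (3 + x * x) / 10
    return Fraction(1)

p_open = F_left(2) - F(1)             # P(1 < X < 2)
p_left_open = F(2) - F(1)             # P(1 < X <= 2)
p_right_open = F_left(2) - F_left(1)  # P(1 <= X < 2)
p_closed = F(2) - F_left(1)           # P(1 <= X <= 2)
print(p_open, p_left_open, p_right_open, p_closed)  # 3/10 3/5 1/2 4/5
```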
Answer 18: C
Explanation :
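The marginal pdf of $X_1$ is $g(x_1) = \int_0^{\infty}\frac{x_1 e^{-x_1 x_2}}{2}\,dx_2 = \frac{1}{2}, \quad 1 < x_1 < 3$
$h(x_2 \mid x_1) = \frac{f(x_1, x_2)}{g(x_1)} = x_1 e^{-x_1 x_2}, \quad x_2 > 0$
So $X_2 \mid X_1 = x_1 \sim \operatorname{Exp}(x_1)$ with mean $\frac{1}{x_1}$ and variance $\frac{1}{x_1^2}$.
Therefore $\operatorname{Var}(X_2 \mid X_1 = 2) = \frac{1}{4} = 0.25$
Hence 0.25 is the correct answer.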
Answer 19: 0.7
Explanation :
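$\frac{T}{n} + \frac{T(T-1)}{n(n-1)}$ is the UMVUE of $\theta(1 + \theta)$, since
$E\left(\frac{T}{n} + \frac{T(T-1)}{n(n-1)}\right) = \theta(1 + \theta)$, where $T = \sum_{i=1}^{n} X_i$ is complete and sufficient.
Therefore, $\frac{T}{n} + \frac{T(T-1)}{n(n-1)} = \frac{3}{6} + \frac{3(3-1)}{6(6-1)} = \frac{21}{30} = 0.70$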
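Unbiasedness of this estimator can be verified exactly for any particular 𝜃, since 𝑇 = Σ𝑋𝑖 ∼ Bin(𝑛, 𝜃). The Python sketch below uses exact fractions; the test values of 𝜃 are arbitrary.

```python
from fractions import Fraction
from math import comb

def umvue(t, n):
    """T/n + T(T-1)/(n(n-1)), the estimator of theta(1 + theta) used above."""
    return Fraction(t, n) + Fraction(t * (t - 1), n * (n - 1))

def expected_umvue(theta, n):
    """Exact expectation of the estimator when T ~ Bin(n, theta)."""
    return sum(comb(n, t) * theta ** t * (1 - theta) ** (n - t) * umvue(t, n)
               for t in range(n + 1))

data = [1, 0, 0, 1, 0, 1]
print(umvue(sum(data), len(data)))  # 7/10
```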
13.8 REFERENCES
• S. C. Gupta, V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
13.9 SUGGESTED READINGS
• S. C. Gupta, V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
• B. L. Agarwal, Programmed Statistics, New Age International Publishers, 2nd Edition.
LESSON 14
ERROR IN HYPOTHESIS TESTING AND
POWER OF TEST
STRUCTURE
14.1 Learning Objectives
14.2 Introduction
14.3 Error in Hypothesis Testing and Power of Test
14.3.1 Type I and Type II Error
14.3.2 Unbiased Test and Unbiased Critical Region
14.3.3 UMP (Uniformly Most Powerful) Critical Region
14.3.4 Likelihood Ratio Test
14.4 In-Text Questions
14.5 Summary
14.6 Glossary
14.7 Answer to In-Text Questions
14.8 References
14.9 Suggested Readings
14.1 LEARNING OBJECTIVES
The main objective of this lesson is to discuss the testing of statistical hypotheses and how it can be used in real-world analysis.
14.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain situations
where there is lack of certainty on the basis of a sample whose size is fixed in advance while
in Wald's sequential theory the sample size is not fixed but is regarded as a random variable.
14.3 ERROR IN HYPOTHESIS TESTING AND POWER OF TEST
We will discuss these concepts in detail.
(i) Type I and Type II Error
Hence, if 𝑇 = 𝑡(𝐱) is a sufficient statistic for 𝜃, then the MPCR for the test may be defined in terms of the marginal distribution of 𝑇 = 𝑡(𝐱), rather than the joint distribution of 𝑥1, 𝑥2, …, 𝑥𝑛.
14.3.3 UMP (Uniformly Most Powerful) Critical Region
Consider a normal population 𝑁(𝜃, 𝜎²) with 𝜎² known. The region $\bar{x} \ge \theta_0 + z_\alpha\sigma/\sqrt{n}$ provides the best critical region for testing 𝐻0: 𝜃 = 𝜃0 against 𝐻1: 𝜃 = 𝜃1, provided 𝜃1 > 𝜃0. The region $\bar{x} \le \theta_0 - z_\alpha\sigma/\sqrt{n}$ defines the best critical region for testing 𝐻0: 𝜃 = 𝜃0 against 𝐻1: 𝜃 = 𝜃1, provided 𝜃1 < 𝜃0. Thus, the best critical region for testing the simple hypothesis 𝐻0: 𝜃 = 𝜃0 against the simple hypothesis 𝐻1: 𝜃 = 𝜃0 + 𝑐, 𝑐 > 0, will not serve as the best critical region for testing the same 𝐻0 against the simple alternative hypothesis 𝐻1: 𝜃 = 𝜃0 − 𝑐, 𝑐 > 0. Hence, in this problem, no uniformly most powerful test exists for testing the simple hypothesis 𝐻0: 𝜃 = 𝜃0 against the composite alternative hypothesis 𝐻1: 𝜃 ≠ 𝜃0.
However, for each alternative hypothesis, 𝐻1: 𝜃 = 𝜃1 > 𝜃0 or 𝐻1: 𝜃 = 𝜃1 < 𝜃0, a UMP test exists.
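Remark. In particular, if we take $n = 2$, the B.C.R. for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1\ (> \theta_0)$ is given by
$$W = \{\mathbf{x} : (x_1 + x_2)/2 \ge \theta_0 + \sigma z_\alpha/\sqrt{2}\} = \{\mathbf{x} : x_1 + x_2 \ge 2\theta_0 + \sqrt{2}\,\sigma z_\alpha\} = \{\mathbf{x} : x_1 + x_2 \ge C\}\ (\text{say}),$$
where $C = 2\theta_0 + \sqrt{2}\,\sigma z_\alpha = 2\theta_0 + 1.645\sqrt{2}\,\sigma$ if $\alpha = 0.05$.
Similarly, the B.C.R. for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1\ (< \theta_0)$ with $n = 2$ and $\alpha = 0.05$ is given by
$$W_1 = \{\mathbf{x} : (x_1 + x_2)/2 \le \theta_0 - \sigma z_\alpha/\sqrt{2}\} = \{\mathbf{x} : x_1 + x_2 \le 2\theta_0 - 1.645\sqrt{2}\,\sigma\} = \{\mathbf{x} : x_1 + x_2 \le C_1\}\ (\text{say}),$$
where $C_1 = 2\theta_0 - \sqrt{2}\,\sigma z_\alpha = 2\theta_0 - 1.645\sqrt{2}\,\sigma$ if $\alpha = 0.05$.
The critical region for testing $H_0: \theta = \theta_0$ against the two-tailed alternative $H_1: \theta \ne \theta_0$ is given by
$$W_2 = \{\mathbf{x} : x_1 + x_2 \ge C\} \cup \{\mathbf{x} : x_1 + x_2 \le C_1\}.$$
(If $C$ and $C_1$ are both computed with $z_\alpha$, $W_2$ has size $2\alpha$; for an overall size $\alpha$, $z_{\alpha/2}$ is used in place of $z_\alpha$.)
The regions are given by the shaded portions in the following figures (i), (ii) and (iii) respectively.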
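The constants $C$ and $C_1$ in the remark can be checked numerically. The sketch below is only an illustration: it assumes, as in the remark, a sample of size $n = 2$ from $N(\theta, \sigma^2)$ with $\sigma$ known, and the values $\theta_0 = 10$, $\sigma = 2$ are hypothetical. It uses Python's standard `statistics.NormalDist` to obtain $z_\alpha$ and to confirm that $P(X_1 + X_2 \ge C \mid H_0) = \alpha$, since $X_1 + X_2 \sim N(2\theta_0, 2\sigma^2)$ under $H_0$.

```python
from math import sqrt
from statistics import NormalDist


def bcr_constants(theta0, sigma, alpha=0.05):
    """Cut-offs C and C1 for x1 + x2 when n = 2 and X ~ N(theta, sigma^2), sigma known."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper alpha point (about 1.645 for alpha = 0.05)
    c_upper = 2 * theta0 + sqrt(2) * sigma * z_alpha
    c_lower = 2 * theta0 - sqrt(2) * sigma * z_alpha
    return c_upper, c_lower


def size_of_upper_region(theta0, sigma, c_upper):
    """P(X1 + X2 >= C | H0), using X1 + X2 ~ N(2*theta0, 2*sigma^2) under H0."""
    total = NormalDist(mu=2 * theta0, sigma=sqrt(2) * sigma)
    return 1 - total.cdf(c_upper)


C, C1 = bcr_constants(theta0=10.0, sigma=2.0)      # hypothetical theta0 and sigma
print(round(C, 3), round(C1, 3), round(size_of_upper_region(10.0, 2.0, C), 4))
```

The lower region $W_1$ has the same size by symmetry, since $C$ and $C_1$ are placed symmetrically about $2\theta_0$.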
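Example 1. Show that for the normal distribution with zero mean and variance $\sigma^2$, the best critical region for $H_0: \sigma = \sigma_0$ against the alternative $H_1: \sigma = \sigma_1$ is of the form
$$\sum_{i=1}^{n} x_i^2 \ge b_\alpha \ \text{ if } \sigma_1 > \sigma_0, \qquad \sum_{i=1}^{n} x_i^2 \le a_\alpha \ \text{ if } \sigma_1 < \sigma_0.$$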
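$$\Rightarrow \left(\frac{\beta_1}{\beta_0}\right)^{n} \exp\left\{-\beta_1 \sum_{i=1}^{n}(x_i - \gamma_1) + \beta_0 \sum_{i=1}^{n}(x_i - \gamma_0)\right\} \ge k$$
$$\Rightarrow \left(\frac{\beta_1}{\beta_0}\right)^{n} \exp\left[-\beta_1 n(\bar{x} - \gamma_1) + \beta_0 n(\bar{x} - \gamma_0)\right] \ge k$$
$$\Rightarrow n \log\left(\frac{\beta_1}{\beta_0}\right) - n\bar{x}(\beta_1 - \beta_0) + n\beta_1\gamma_1 - n\beta_0\gamma_0 \ge \log k$$
(since $\log x$ is an increasing function of $x$).
$$\Rightarrow \bar{x}(\beta_1 - \beta_0) \le \gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\left(\frac{\beta_1}{\beta_0}\right)$$
$$\therefore\ \bar{x} \le \frac{1}{\beta_1 - \beta_0}\left\{\gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\left(\frac{\beta_1}{\beta_0}\right)\right\}, \quad \text{provided } \beta_1 > \beta_0.$$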
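The last step can be checked numerically. The sketch below evaluates the likelihood ratio $(\beta_1/\beta_0)^n \exp[-\beta_1 n(\bar{x} - \gamma_1) + \beta_0 n(\bar{x} - \gamma_0)]$ as a function of $\bar{x}$ and confirms that it is at least $k$ exactly when $\bar{x}$ does not exceed the derived threshold. The parameter values $n = 5$, $\beta_0 = 1$, $\beta_1 = 2$, $\gamma_0 = \gamma_1 = 0$, $k = 1$ are hypothetical and chosen only for illustration.

```python
from math import exp, log


def likelihood_ratio(xbar, n, beta0, gamma0, beta1, gamma1):
    """(beta1/beta0)^n * exp[-beta1*n*(xbar - gamma1) + beta0*n*(xbar - gamma0)]."""
    return (beta1 / beta0) ** n * exp(-beta1 * n * (xbar - gamma1) + beta0 * n * (xbar - gamma0))


def xbar_threshold(n, k, beta0, gamma0, beta1, gamma1):
    """Right-hand side of the final inequality; valid only when beta1 > beta0."""
    if beta1 <= beta0:
        raise ValueError("this form of the region requires beta1 > beta0")
    return (gamma1 * beta1 - gamma0 * beta0 - log(k) / n + log(beta1 / beta0)) / (beta1 - beta0)


params = dict(n=5, beta0=1.0, gamma0=0.0, beta1=2.0, gamma1=0.0)   # hypothetical values
k = 1.0
t = xbar_threshold(k=k, **params)
for xbar in (t - 0.2, t + 0.2):
    # The two booleans should agree: xbar <= t exactly when the ratio is >= k
    print(round(xbar, 3), xbar <= t, likelihood_ratio(xbar, **params) >= k)
```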
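Example 3. Examine whether a best critical region exists for testing the null hypothesis $H_0: \theta = \theta_0$ against the alternative hypothesis $H_1: \theta > \theta_0$ for the parameter $\theta$ of the distribution
$$f(x, \theta) = \frac{1 + \theta}{(x + \theta)^2}, \quad 1 \le x < \infty.$$
Solution.
$$\prod_{i=1}^{n} f(x_i, \theta) = (1 + \theta)^n \prod_{i=1}^{n} \frac{1}{(x_i + \theta)^2}.$$
By the Neyman-Pearson lemma (with $\theta_1 > \theta_0$), the test criterion is
$$\sum_{i=1}^{n} \log\left(\frac{x_i + \theta_0}{x_i + \theta_1}\right) \ge \text{constant},$$
which cannot be put in the form of a function of the sample observations alone that does not depend on the alternative value $\theta_1$. Hence no B.C.R. exists in this case for the composite alternative $\theta > \theta_0$.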
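To see why no single region works for every $\theta_1 > \theta_0$, one can compare how the criterion ranks two samples for different values of $\theta_1$. In the sketch below, $\theta_0 = 0$ and the two samples of size $n = 2$ are hypothetical. The ranking of the samples reverses between $\theta_1 = 0.01$ and $\theta_1 = 1000$, so the set of samples with the largest criterion values (the best critical region for that $\theta_1$) changes with $\theta_1$.

```python
from math import log


def np_criterion(sample, theta0, theta1):
    """Sum of log((x + theta0) / (x + theta1)) for f(x, theta) = (1 + theta)/(x + theta)^2."""
    return sum(log((x + theta0) / (x + theta1)) for x in sample)


theta0 = 0.0
sample_a = [1.0, 100.0]    # hypothetical sample of size n = 2
sample_b = [3.0, 3.0]      # hypothetical sample of size n = 2

for theta1 in (0.01, 1000.0):
    a = np_criterion(sample_a, theta0, theta1)
    b = np_criterion(sample_b, theta0, theta1)
    print(theta1, "sample A ranks higher" if a > b else "sample B ranks higher")
```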
$$L = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_k)$$
According to the principle of maximum likelihood, the likelihood equation for estimating any parameter $\theta_i$ is given by
$$\frac{\partial L}{\partial \theta_i} = 0, \quad (i = 1, 2, \ldots, k).$$
Using these likelihood equations, we can obtain the maximum likelihood estimates of the parameters $(\theta_1, \theta_2, \ldots, \theta_k)$ as they are allowed to vary over the parameter space $\Theta$ and over the subspace $\Theta_0$. Substituting these estimates in the likelihood function $L$, we obtain the maximum values of the likelihood function for variation of the parameters in $\Theta$ and $\Theta_0$ respectively. The criterion for the likelihood ratio test is then defined as the quotient of these two maxima and is given by
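$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{\sup_{\theta \in \Theta_0} L(\mathbf{x}, \theta)}{\sup_{\theta \in \Theta} L(\mathbf{x}, \theta)},$$
where $L(\hat{\Theta}_0)$ and $L(\hat{\Theta})$ are the maxima of the likelihood function with respect to the parameters in the regions $\Theta_0$ and $\Theta$ respectively.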
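The quantity $\lambda$ is a function of the sample observations only and does not involve the parameters. Thus $\lambda$, being a function of the random variables, is itself a random variable. Obviously $\lambda > 0$. Further, $\Theta_0 \subset \Theta \Rightarrow L(\hat{\Theta}_0) \le L(\hat{\Theta}) \Rightarrow \lambda \le 1$.
Hence we get $0 < \lambda \le 1$.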
The critical region for testing $H_0$ (against $H_1$) is an interval
$$0 < \lambda < \lambda_0,$$
where $\lambda_0$ is some number ($< 1$) determined by the distribution of $\lambda$ and the desired probability of type I error, i.e., $\lambda_0$ is given by the equation
$$P(\lambda < \lambda_0 \mid H_0) = \alpha.$$
For example, if $g(\cdot)$ is the p.d.f. of $\lambda$, then $\lambda_0$ is determined from the equation
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda = \alpha.$$
A test whose critical region is defined in this way is called a likelihood ratio test for testing $H_0$.
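As an illustration (this example is an assumption, not part of the lesson), consider a sample from $N(\mu, \sigma^2)$ with $\sigma$ known and test $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$. The supremum over $\Theta$ is attained at $\mu = \bar{x}$, giving $\lambda = \exp\{-n(\bar{x} - \mu_0)^2/(2\sigma^2)\}$, and $-2\log\lambda = z^2$ with $z = \sqrt{n}(\bar{x} - \mu_0)/\sigma \sim N(0, 1)$ under $H_0$. Hence $P(\lambda < \lambda_0 \mid H_0) = \alpha$ gives $\lambda_0 = \exp(-z_{\alpha/2}^2/2)$. The data values and $\sigma = 1$ below are hypothetical.

```python
from math import exp
from statistics import NormalDist, fmean


def normal_loglik(sample, mu, sigma):
    """Normal log-likelihood without the constant -n*log(sigma*sqrt(2*pi)), which cancels in lambda."""
    return -sum((x - mu) ** 2 for x in sample) / (2 * sigma ** 2)


def lr_normal_mean(sample, mu0, sigma):
    """lambda = sup over Theta_0 = {mu0} divided by sup over Theta (attained at mu = xbar)."""
    xbar = fmean(sample)
    return exp(normal_loglik(sample, mu0, sigma) - normal_loglik(sample, xbar, sigma))


def lambda0_for(alpha):
    """Cut-off with P(lambda < lambda0 | H0) = alpha, i.e. lambda0 = exp(-z_{alpha/2}^2 / 2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return exp(-z ** 2 / 2)


sample = [10.8, 11.5, 9.9, 12.1, 11.0]    # hypothetical data, sigma assumed known = 1
lam = lr_normal_mean(sample, mu0=10.0, sigma=1.0)
lam0 = lambda0_for(0.05)
print(round(lam, 4), round(lam0, 4), "reject H0" if lam < lam0 else "do not reject H0")
```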
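Remarks. 1. The equations $P(\lambda < \lambda_0 \mid H_0) = \alpha$ and $\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda = \alpha$ define the critical region for testing the hypothesis $H_0$ by the likelihood ratio test.
Suppose that the distribution of $\lambda$ is not known but the distribution of some function of $\lambda$ is known; then this knowledge can be utilized as given in the following theorem.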
Theorem. If $\lambda$ is the likelihood ratio for testing a simple hypothesis $H_0$ and if $U = \phi(\lambda)$ is a monotonic increasing (decreasing) function of $\lambda$, then the test based on $U$ is equivalent to the likelihood ratio test. The critical region for the test based on $U$ is
$$\phi(0) < U < \phi(\lambda_0) \qquad [\text{respectively } \phi(\lambda_0) < U < \phi(0)].$$
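Proof. The critical region for the likelihood ratio test is given by $0 < \lambda < \lambda_0$, where $\lambda_0$ is determined by
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda = \alpha.$$
If $U = \phi(\lambda)$ is a monotonic increasing function of $\lambda$, this equation becomes
$$\int_{\phi(0)}^{\phi(\lambda_0)} h(u \mid H_0)\, du = \alpha,$$
where $h(u \mid H_0)$ is the p.d.f. of $U$ when $H_0$ is true. Here the critical region $0 < \lambda < \lambda_0$ transforms to $\phi(0) < U < \phi(\lambda_0)$. However, if $U = \phi(\lambda)$ is a monotonic decreasing function of $\lambda$, then the inequalities are reversed and we get the critical region as $\phi(\lambda_0) < U < \phi(0)$.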
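The theorem can be checked on the transform $U = -2\log\lambda$, which is a monotonic decreasing function of $\lambda$ with $\phi(0) = +\infty$, so the region $0 < \lambda < \lambda_0$ should coincide with $\phi(\lambda_0) < U < \infty$. The sketch below, with a hypothetical cut-off $\lambda_0 = 0.1465$, compares the two regions over a grid of $\lambda$ values in $(0, 1]$.

```python
from math import log


def phi(lam):
    """U = -2 log(lambda): a monotonic decreasing function of lambda on (0, 1]."""
    return -2 * log(lam)


lam0 = 0.1465                                       # hypothetical cut-off for lambda
grid = [i / 1000 for i in range(1, 1001)]           # lambda values in (0, 1]
lr_region = [lam < lam0 for lam in grid]            # 0 < lambda < lambda0
u_region = [phi(lam) > phi(lam0) for lam in grid]   # phi(lambda0) < U < phi(0) = +infinity
print(lr_region == u_region, sum(lr_region))
```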
2. If we are testing a simple null hypothesis $H_0$, then there is a unique distribution determined for $\lambda$. But if $H_0$ is composite, the distribution of $\lambda$ may or may not be unique. In such a case the distribution of $\lambda$ may be different for different parameter points in $\Theta_0$, and then $\lambda_0$ is to be chosen such that
$$\int_0^{\lambda_0} g(\lambda \mid H_0)\, d\lambda \le \alpha$$
for every parameter point in $\Theta_0$.
D. standard deviation
Question 3.
Consider the simple linear regression model $y_i = \alpha + \beta x_i + \epsilon_i$, $i = 1, 2, \ldots, n$, where the $\epsilon_i$'s are i.i.d. random variables with mean 0 and variance $\sigma^2 \in (0, \infty)$. Suppose that we have a data set $(x_1, y_1), \ldots, (x_n, y_n)$ with $n = 10$, $\sum_{i=1}^{n} x_i = 50$, $\sum_{i=1}^{n} y_i = 40$, $\sum_{i=1}^{n} x_i^2 = 500$, $\sum_{i=1}^{n} y_i^2 = 400$ and $\sum_{i=1}^{n} x_i y_i = 400$. An unbiased estimate of $\sigma^2$ is:
A. 5
B. 1/5
C. 10
D. 1/10
Question 4.
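If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density
$$f(x, \theta) = \begin{cases} \frac{1}{\theta} e^{-x/\theta}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$$
where $\theta > 0$ is an unknown parameter, what is a $100(1 - \alpha)\%$ confidence interval for $\theta$? Here $\chi^2_p(2n)$ denotes the value below which a proportion $p$ of the $\chi^2(2n)$ distribution lies.
A. $\left[\frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)}, \frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)}\right]$
B. $\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)}, \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right]$
C. $\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}, \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$
D. $\left[\frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{\alpha/2}(2n)}, \frac{2\sum_{i=1}^{n} \ln X_i}{\chi^2_{1-\alpha/2}(2n)}\right]$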
Question 5.
Suppose that $X$ has the uniform distribution on the interval $[0, 100]$. Let $Y$ denote the greatest integer smaller than or equal to $X$. Which of the following is true?
A. $P(Y \le 25) = \frac{1}{4}$
B. $P(Y \le 25) = \frac{26}{100}$
C. $E(Y) = 50$
D. $E(Y) = \frac{101}{2}$
Question 6.
Let $x_1 = 3$, $x_2 = 4$, $x_3 = 3.5$, $x_4 = 2.5$ be the observed values of a random sample from the probability density function
$$f(x \mid \theta) = \frac{1}{3}\left[\frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x}\right], \quad x > 0,\ \theta \in \{1, 2, 3, 4\}.$$
Then the method of moments estimate (MME) of $\theta$ is
A. 1
B. 2
C. 3
D. 4
Question 7.
Let the random variables $X$ and $Y$ have the joint probability mass function
$$P(X = x, Y = y) = e^{-2} \binom{x}{y} \left(\frac{3}{4}\right)^{y} \left(\frac{1}{4}\right)^{x-y} \frac{2^x}{x!}, \quad y = 0, 1, 2, \ldots, x;\ x = 0, 1, 2, \ldots$$
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables $X$ and $Y$ have the joint probability mass function
$$P(X = m, Y = n) = \begin{cases} \dfrac{e^{-1}}{(n - m)!\, m!\, 2^n}, & m = 0, 1, 2, \ldots, n;\ n = 0, 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases}$$
Which of the following statements is (are) TRUE?
A. The marginal distribution of $X$ is Poisson with mean 1/2
B. The random variables $X$ and $Y$ are independent
C. The conditional distribution of $X$ given $Y = 5$ is Bin$(6, \frac{1}{2})$
Question 12.
Let P be a probability function that assigns the same weight to each of the points of the
sample space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then
which of the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
Select the correct answer using the code given below:
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 13.
Let $X_1, X_2, \ldots, X_4$ and $Y_1, Y_2, \ldots, Y_5$ be two random samples of size 4 and 5 respectively, from a standard normal population. Define the statistic
$$T = \left(\frac{5}{4}\right) \frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2};$$
then which of the following is TRUE?
A. Expectation of $T$ is 0.6
B. Variance of $T$ is 8.97
C. $T$ has an F-distribution with degrees of freedom 5 and 4
D. $T$ has an F-distribution with degrees of freedom 4 and 5
Question 14.
Let $X$, $Y$ and $Z$ be independent random variables with respective moment generating functions $M_X(t) = \frac{1}{1-t}$, $t < 1$, and $M_Y(t) = M_Z(t) = e^{t^2/2}$, $t \in \mathbb{R}$. Let $W = 2X + Y^2 + Z^2$; then $P(W > 2)$ is equal to
A. $2e^{-1}$
B. $2e^{-2}$
C. $e^{-1}$
D. $e^{-2}$
Question 15.
Let $x_1 = 3$, $x_2 = 4$, $x_3 = 3.5$, $x_4 = 2.5$ be the observed values of a random sample from the probability density function
$$f(x \mid \theta) = \frac{1}{3}\left[\frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x}\right], \quad x > 0,\ \theta \in (0, \infty).$$
Then the method of moments estimate (MME) of $\theta$ is
A. 1.5
B. 2.5
C. 3.5
D. 4.5
Question 16.
Let $X$ and $Y$ be random variables with the joint probability mass function
$$P(X = n, Y = k) = \left(\frac{1}{2}\right)^{n + 2k + 1}, \quad n = -k, -k + 1, \ldots;\ k = 1, 2, \ldots$$
Then $E(Y)$ equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let $X$ be a random variable with the cumulative distribution function
$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1 + x^2}{10}, & 0 \le x < 1 \\ \dfrac{3 + x^2}{10}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$$
Which of the following statements is (are) TRUE?
A. $P(1 < X < 2) = \frac{3}{10}$
B. $P(1 < X \le 2) = \frac{3}{5}$
C. $P(1 \le X < 2) = \frac{1}{2}$
D. $P(1 \le X \le 2) = \frac{4}{5}$
Question 18.
Let the random variables $X_1$ and $X_2$ have the joint probability density function
$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 e^{-x_1 x_2}}{2}, & 1 < x_1 < 3,\ x_2 > 0 \\ 0, & \text{otherwise.} \end{cases}$$
What is the value of $\mathrm{Var}(X_2 \mid X_1 = 2)$ (up to two decimal places)?
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19.
Let $x_1 = 1$, $x_2 = 0$, $x_3 = 0$, $x_4 = 1$, $x_5 = 0$, $x_6 = 1$ be the data on a random sample of size 6 from the Bin$(1, \theta)$ distribution, where $\theta \in (0, 1)$. Then the uniformly minimum variance unbiased estimate of $\theta(1 + \theta)$ is equal to:
Question: 20
Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean 2 and variance 1. Let $N$ be a random variable that follows a Poisson distribution with mean 2 and is independent of the $X_i$'s. Let $S_N = X_1 + X_2 + \cdots + X_N$; then $\mathrm{Var}(S_N)$ is equal to:
A. 4
B. 10
C. 2
D. 1
Question: 21
Let $A$ and $B$ be independent random variables, each having the uniform distribution on $[0, 1]$. Let $U = \min\{A, B\}$ and $V = \max\{A, B\}$; then $\mathrm{Cov}(U, V)$ is equal to
A. −1/36
B. 1/36
C. 1
D. 0
Question 22.
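Let $X_1, X_2, X_3$ be a random sample from the uniform $(0, \theta^2)$ distribution, $\theta > 1$. Then the maximum likelihood estimate (MLE) of $\theta$ is
A. $X_{(1)}^2$
B. $\sqrt{X_{(3)}}$
C. $\sqrt{X_{(1)}}$
D. $\alpha X_{(1)} + (1 - \alpha) X_{(3)}$; $0 < \alpha < 1$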
Question 23.
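For the discrete variate with density
$$f(x) = \frac{1}{8} I_{\{-1\}}(x) + \frac{6}{8} I_{\{0\}}(x) + \frac{1}{8} I_{\{1\}}(x),$$
which of the following is TRUE?
A. $E(X) = \frac{1}{2}$
B. $V(X) = \frac{1}{2}$
C. $P\{|X - \mu_X| \ge 2\sigma_X\} \le \frac{1}{4}$
D. $P\{|X - \mu_X| \ge 2\sigma_X\} \ge \frac{1}{4}$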
Question: 24
Let $X_i, Y_i$ $(i = 1, 2)$ be i.i.d. random samples of size 2 from a standard normal distribution. What is the distribution of $W$ given by
$$W = \frac{\sqrt{2}(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}?$$
A. t-distribution with 1 d.f.
B. t-distribution with 2 d.f.
C. Chi-square distribution with 2 d.f.
D. Cannot be determined
Question: 25
The moment generating function of a random variable $X$ is given by
$$M_X(t) = \frac{1}{6} + \frac{1}{3} e^{t} + \frac{1}{3} e^{2t} + \frac{1}{6} e^{3t}, \quad -\infty < t < \infty;$$
then $P(X \le 2)$ equals
A. 1/3
B. 1/6
C. 1/2
D. 5/6
Question: 26
Let $X_1, X_2, \ldots, X_n$ be a random sample from the $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$ distribution, where $\theta \in \mathbb{R}$, and let $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$.
Define $T_1 = \frac{1}{2}(X_{(1)} + X_{(n)})$, $T_2 = \frac{1}{4}(3X_{(1)} + X_{(n)} + 1)$ and $T_3 = \frac{1}{2}(3X_{(n)} - X_{(1)} - 2)$ as estimators of $\theta$. Which of the following is/are TRUE?
A. 𝑇1 and 𝑇2 are MLE for 𝜃 but 𝑇3 is not MLE for 𝜃
B. 𝑇1 is MLE for 𝜃 but 𝑇2 and 𝑇3 are not MLE for 𝜃
C. 𝑇1 , 𝑇2 and 𝑇3 are MLE for 𝜃
D. 𝑇1 , 𝑇2 and 𝑇3 are not MLE for 𝜃
Question: 27
Let $X$ and $Y$ be random variables having the joint probability density function
$$f(x, y) = \frac{k}{(1 + x^2)(1 + y^2)}, \quad -\infty < x, y < \infty,$$
where $k$ is a constant. Which of the following is/are TRUE?
A. $k = \frac{1}{\pi^2}$
B. $f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}$, $-\infty < x < \infty$
C. $P(X = Y) = 0$
D. All of the above
Question: 28
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables with the probability density function
$$f(x) = \begin{cases} \frac{1}{2} x^2 e^{-x}, & x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
and let $S_n = X_1 + X_2 + \cdots + X_n$. Which of the following statements is/are TRUE?
A. $\dfrac{S_n - 3n}{\sqrt{3n}} \sim N(0, 1)$ for all $n \ge 1$
B. For all $\varepsilon > 0$, $P\left(\left|\frac{S_n}{n} - 3\right| > \varepsilon\right) \to 0$ as $n \to \infty$
C. $\dfrac{S_n}{n} \to 1$ with probability 1
D. Both A and B
Question: 29
Let $X$, $Y$ be i.i.d. Binomial$(n, p)$ random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
Question: 30
Let $X$ and $Y$ be continuous random variables with the joint probability density function
$$f(x, y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}, \quad (x, y) \in \mathbb{R}^2.$$
Which of the following statements is/are TRUE?
A. $P(X > 0) = \frac{1}{2}$
B. $P(X > 0 \mid Y < 0) = \frac{1}{2}$
C. $P(X > 0, Y < 0) = \frac{1}{4}$
D. All of the above
Question: 31
Let $X$ and $Y$ be random variables with $E[X] = E[Y]$. Which of the following is NOT TRUE?
A. $E\{E[X \mid Y]\} = E[Y]$
B. $V(X - Y) = E(X - Y)^2$
C. $E[V(X \mid Y)] + V[E(X \mid Y)] = V(X)$
D. $X$ and $Y$ have the same distribution
Question: 32
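Let $X_1, X_2, \ldots, X_n$ be a random sample from the Exp$(\theta)$ distribution, where $\theta \in (0, \infty)$. If $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then a 95% confidence interval for $\theta$ is
A. $\left(0, \dfrac{\chi^2_{2n, 0.95}}{n\bar{X}}\right]$
B. $\left[\dfrac{\chi^2_{2n, 0.95}}{n\bar{X}}, \infty\right)$
C. $\left(0, \dfrac{\chi^2_{2n, 0.95}}{2n\bar{X}}\right]$
D. $\left[\dfrac{\chi^2_{2n, 0.95}}{2n\bar{X}}, \infty\right)$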
Question: 33
$X_i$, $i = 1, 2, \ldots$, are independent random variables, all distributed according to the PDF $f_X(x) = 1$, $0 \le x \le 1$. Define $Y_n = X_1 X_2 X_3 \cdots X_n$ for some integer $n$. Then $\mathrm{Var}(Y_n)$ is equal to
A. $\frac{n}{12}$
B. $\frac{1}{3^n} - \frac{1}{2^{2n}}$
C. $\frac{1}{12^n}$
D. $\frac{1}{12}$
Question: 34
Let $X_1, X_2, \ldots, X_4$ be i.i.d. random variables having a continuous distribution. Then $P(X_3 < X_2 < \max(X_1, X_4))$ equals
A. 1/2
B. 1/3
C. 1/4
D. 1/6
Question: 35
Let $X_1, X_2, \ldots, X_n$ be a random sample from the $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$ distribution, where $\theta \in \mathbb{R}$, and let $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$.
Consider the following statements:
1. $T_1 = \frac{1}{2}(X_{(1)} + X_{(n)})$ is consistent for $\theta$.
2. $T_2 = \frac{1}{4}(3X_{(1)} + X_{(n)} + 1)$ is unbiased and consistent for $\theta$.
Select the correct answer using the code given below:
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
14.5 SUMMARY
The main points covered in this lesson are what an estimator is, the consistency, efficiency and sufficiency of an estimator, and how to obtain the best estimator.
14.6 GLOSSARY
• Motivation: These problems are very useful in real life, and they can be applied in data science, economics and the social sciences.
• Attention: Think about how best estimators are useful in real-world problems.
14.7 ANSWER TO IN-TEXT QUESTIONS
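Answer 1: A
Explanation:
$$f(x, \theta) = \begin{cases} \frac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise} \end{cases}$$
$$E(X_{(n)}) = \frac{n\theta}{n + 1}.$$
Let $Y = \frac{1}{\theta}\left(X_{(n)} + \frac{\theta}{n + 1}\right)$, so
$$E(Y) = E\left(\frac{X_{(n)}}{\theta} + \frac{1}{n + 1}\right) = \frac{n}{n + 1} + \frac{1}{n + 1} = 1.$$
$$\lim_{n \to \infty} E\left[\frac{1}{\theta}\left(X_{(n)} + \frac{1}{n + 1}\right)\right] = 1.$$
Hence option A is correct.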
Answer 2: B
Explanation:
Kurtosis measures the peakedness (convexity) of the frequency curve.
Answer 3: C
Explanation :
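$$y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{400 - 10 \times 5 \times 4}{500 - 10 \times 5^2} = \frac{4}{5}; \qquad \hat{\alpha} = \bar{y} - \bar{x}\hat{\beta} = 4 - 5 \times \frac{4}{5} = 0.$$
An unbiased estimate of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n - 2}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2 = \frac{1}{10 - 2}\sum_{i=1}^{n}\left(y_i - \frac{4}{5}x_i\right)^2$$
$$= \frac{1}{8}\left(\sum_{i=1}^{n} y_i^2 - 2 \times \frac{4}{5}\sum_{i=1}^{n} x_i y_i + \left(\frac{4}{5}\right)^2 \sum_{i=1}^{n} x_i^2\right) = \frac{1}{8}\left(400 - \frac{8}{5} \times 400 + \frac{16}{25} \times 500\right) = \frac{80}{8} = 10.$$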
Hence option C is correct.
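The arithmetic above can be reproduced from the summary sums alone. This is a small sketch using exact fractions; the function name is hypothetical, and it uses the identity $\text{SSE} = S_{yy} - \hat{\beta} S_{xy}$, where $S_{xx}$, $S_{xy}$, $S_{yy}$ denote corrected sums of squares and products.

```python
from fractions import Fraction


def sigma2_hat_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Least-squares fit of y = alpha + beta*x from summary sums; returns (alpha, beta, SSE/(n-2))."""
    n = Fraction(n)
    xbar, ybar = sum_x / n, sum_y / n
    sxx = sum_xx - n * xbar ** 2      # corrected sum of squares of x
    sxy = sum_xy - n * xbar * ybar    # corrected sum of cross-products
    syy = sum_yy - n * ybar ** 2      # corrected sum of squares of y
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = syy - beta * sxy            # residual sum of squares
    return alpha, beta, sse / (n - 2)


alpha, beta, s2 = sigma2_hat_from_sums(10, 50, 40, 500, 400, 400)
print(alpha, beta, s2)   # 0 4/5 10
```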
Answer 4: B
Explanation :
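We use the random variable $Q = \frac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2(2n)$, so that
$$1 - \alpha = P\left(\chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n)\right) = P\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)} \le \theta \le \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right].$$
Thus a $100(1 - \alpha)\%$ confidence interval for $\theta$ is given by
$$\left[\frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)}, \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)}\right].$$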
Hence option B is correct.
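To obtain numerical limits one needs the quantiles $\chi^2_p(2n)$. Because the degrees of freedom $2n$ are even, the $\chi^2(2n)$ c.d.f. has the closed (Erlang) form $1 - e^{-x/2}\sum_{j=0}^{n-1}(x/2)^j/j!$, which the sketch below inverts by bisection using only the Python standard library. The function names and data are hypothetical; in practice a statistical library's chi-square quantile routine would normally be used instead.

```python
from math import exp


def chi2_cdf_even(x, df):
    """CDF of chi-square with an even number of degrees of freedom df = 2m (Erlang form)."""
    m = df // 2
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= (x / 2) / j            # builds (x/2)^j / j!
        total += term
    return 1.0 - exp(-x / 2) * total


def chi2_ppf_even(p, df, tol=1e-12):
    """Lower p-quantile of chi-square(df), df even, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:   # expand the bracket until it contains the quantile
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exp_mean_ci(sample, alpha=0.05):
    """[2*sum/chi2_{1-alpha/2}(2n), 2*sum/chi2_{alpha/2}(2n)] for the exponential mean theta."""
    n, s = len(sample), sum(sample)
    upper_q = chi2_ppf_even(1 - alpha / 2, 2 * n)
    lower_q = chi2_ppf_even(alpha / 2, 2 * n)
    return 2 * s / upper_q, 2 * s / lower_q


data = [0.8, 2.3, 1.1, 0.4, 3.0]   # hypothetical observations
print(exp_mean_ci(data))
```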
Answer 5: B
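Explanation:
Let $Y = [X]$, where $Y$ denotes the greatest integer smaller than or equal to $X$. Then
$$P(Y \le 25) = P([X] \le 25) = P(0 \le X < 26) = \int_0^{26} \frac{1}{100}\, dx = \frac{26}{100}.$$
Also $E(Y) = \sum_{k=0}^{99} \frac{k}{100} = 49.5$, so options C and D are false.
Hence B is the correct option.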
Answer 6: C
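Explanation:
$$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$$
$$E(X) = \frac{1}{3}[\theta + \theta^2 + 1]\,\Gamma(2) = \frac{1}{3}[\theta + \theta^2 + 1] = 3.25$$
$$\Rightarrow \theta^2 + \theta - 8.75 = 0 \Rightarrow \theta = 2.5 \text{ or } -3.5.$$
Since $\theta \in \{1, 2, 3, 4\}$, we take $\theta = 3$.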
Hence option C is correct.
Answer 7: D
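Explanation:
The marginal pmf of $Y$ is given by
$$P(Y = y) = \sum_{x=y}^{\infty} P(X = x, Y = y) = \sum_{x=y}^{\infty} e^{-2}\binom{x}{y}\left(\frac{3}{4}\right)^{y}\left(\frac{1}{4}\right)^{x-y}\frac{2^x}{x!}$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\sum_{u=0}^{\infty}\binom{y+u}{y}\left(\frac{1}{4}\right)^{u}\frac{2^{y+u}}{(y+u)!} \quad (\text{put } u = x - y)$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\sum_{u=0}^{\infty}\frac{(y+u)!}{y!\,u!}\left(\frac{1}{4}\right)^{u}\frac{2^{y+u}}{(y+u)!} = e^{-2}\left(\frac{3}{4}\right)^{y}\frac{2^{y}}{y!}\sum_{u=0}^{\infty}\frac{1}{u!}\left(\frac{1}{2}\right)^{u}$$
$$= e^{-2}\left(\frac{3}{4}\right)^{y}\frac{2^{y}}{y!}\,e^{1/2} = \frac{e^{-3/2}(3/2)^{y}}{y!}, \quad y = 0, 1, 2, \ldots,$$
which is the pmf of a Poisson random variable with parameter $3/2$, so $E(Y) = 3/2$ and $V(Y) = 3/2$.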
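A numerical check of this result, as a sketch: summing the joint pmf over $x$ (truncated at a hypothetical cut-off of 60, beyond which the remaining Poisson(2) tail mass is negligible) should reproduce $E(Y) = V(Y) = 3/2$.

```python
from math import comb, exp, factorial


def joint_pmf(x, y):
    """P(X = x, Y = y) = e^{-2} C(x, y) (3/4)^y (1/4)^(x-y) 2^x / x!."""
    return exp(-2) * comb(x, y) * 0.75 ** y * 0.25 ** (x - y) * 2 ** x / factorial(x)


X_MAX = 60   # truncation point for the sums over x
marginal_y = [sum(joint_pmf(x, y) for x in range(y, X_MAX + 1)) for y in range(X_MAX + 1)]
mean_y = sum(y * p for y, p in enumerate(marginal_y))
var_y = sum(y * y * p for y, p in enumerate(marginal_y)) - mean_y ** 2
print(round(mean_y, 6), round(var_y, 6))   # 1.5 1.5
```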
Answer 8: A
Explanation:
The marginal probability mass function of $X$ is given by
$$P(X = m) = \sum_{n=m}^{\infty} P(X = m, Y = n) = \frac{e^{-1/2}(1/2)^m}{m!}, \quad m = 0, 1, 2, \ldots$$
Thus the marginal distribution of $X$ is Poisson with mean 1/2.
The marginal probability mass function of $Y$ is given by
$$P(Y = n) = \sum_{m=0}^{n} P(X = m, Y = n) = \frac{e^{-1}}{n!}, \quad n = 0, 1, 2, \ldots$$
Thus the marginal distribution of $Y$ is Poisson with mean 1.
Since $P(X = m, Y = n) \ne P(X = m)P(Y = n)$, $X$ and $Y$ are not independent.
$$P(X = m \mid Y = 5) = \frac{P(X = m, Y = 5)}{P(Y = 5)} = \frac{5!}{m!\,(5 - m)!}\left(\frac{1}{2}\right)^{5}, \quad m = 0, 1, 2, \ldots, 5.$$
Thus the conditional distribution of $X$ given $Y = 5$ is Bin$(5, \frac{1}{2})$, not Bin$(6, \frac{1}{2})$.
Also, $P(Y = n)/P(Y = n + 1) = n + 1$ for $n = 0, 1, 2, \ldots$
Answer 9: A
Explanation:
The trinomial distribution of two r.v.'s $X$ and $Y$ is given by
$$f_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\,p_1^{x} p_2^{y} (1 - p_1 - p_2)^{n - x - y}$$
for $x, y = 0, 1, 2, \ldots, n$ and $x + y \le n$, where $p_1 + p_2 \le 1$. Here $n = 2$, $p_1 = 1/6$ and $p_2 = 2/6$.
$$\mathrm{Var}(X) = np_1(1 - p_1) = 2 \times \frac{1}{6}\left(1 - \frac{1}{6}\right) = \frac{10}{36}; \qquad \mathrm{Var}(Y) = np_2(1 - p_2) = 2 \times \frac{2}{6}\left(1 - \frac{2}{6}\right) = \frac{16}{36}$$
$$\mathrm{Cov}(X, Y) = -np_1p_2 = -2 \times \frac{1}{6} \times \frac{2}{6} = -\frac{4}{36}$$
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = -\frac{4}{4\sqrt{10}} = -\frac{1}{\sqrt{10}} \approx -0.316$$
Hence −0.31 is the correct answer.
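The moments used above can be confirmed by enumerating the trinomial pmf exactly over its support $x + y \le n$. This is an illustrative sketch using exact fractions, with $n = 2$, $p_1 = 1/6$ and $p_2 = 2/6$ taken from the explanation.

```python
from fractions import Fraction
from math import factorial, sqrt

n, p1, p2 = 2, Fraction(1, 6), Fraction(2, 6)
support = [(x, y) for x in range(n + 1) for y in range(n + 1 - x)]   # all (x, y) with x + y <= n


def trinomial_pmf(x, y):
    """n! / (x! y! (n-x-y)!) * p1^x * p2^y * (1 - p1 - p2)^(n-x-y)."""
    coef = Fraction(factorial(n), factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * p1 ** x * p2 ** y * (1 - p1 - p2) ** (n - x - y)


def expect(g):
    """Exact expectation of g(X, Y) under the trinomial pmf."""
    return sum(g(x, y) * trinomial_pmf(x, y) for x, y in support)


mx, my = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mx ** 2
var_y = expect(lambda x, y: y * y) - my ** 2
cov = expect(lambda x, y: x * y) - mx * my
corr = float(cov) / sqrt(float(var_x * var_y))
print(var_x, var_y, cov, round(corr, 4))
```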
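Answer 10: A
Explanation:
Let $x_1 = 1.1$, $x_2 = 0.5$, $x_3 = 1.4$, $x_4 = 1.2$ and
$$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & x \ge \theta,\ \theta \in (-\infty, \infty) \\ 0, & \text{otherwise.} \end{cases}$$
The likelihood is positive only for $\theta \in (-\infty, X_{(1)}]$.
Since $\frac{d}{d\theta} f(x \mid \theta) > 0$ for all $\theta \in (-\infty, X_{(1)}]$, then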
Answer 13: D
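Explanation:
$$T = \left(\frac{5}{4}\right)\frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2} \sim F(4, 5); \qquad E(T) = \frac{n}{n - 2} = \frac{5}{3},$$
where $n = 5$ is the denominator degrees of freedom, and
$$\mathrm{Var}(T) = \frac{2(5)^2(7)}{4(3)^2(1)} = \frac{350}{36} = 9.72.$$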
Hence option D is correct.
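The mean and variance of $F(d_1, d_2)$ used above follow the standard formulas $E = d_2/(d_2 - 2)$ for $d_2 > 2$ and $\mathrm{Var} = 2d_2^2(d_1 + d_2 - 2)/\{d_1(d_2 - 2)^2(d_2 - 4)\}$ for $d_2 > 4$. A minimal sketch evaluating them exactly for $d_1 = 4$, $d_2 = 5$:

```python
from fractions import Fraction


def f_mean_var(d1, d2):
    """Mean (needs d2 > 2) and variance (needs d2 > 4) of the F(d1, d2) distribution."""
    mean = Fraction(d2, d2 - 2)
    var = Fraction(2 * d2 ** 2 * (d1 + d2 - 2), d1 * (d2 - 2) ** 2 * (d2 - 4))
    return mean, var


mean, var = f_mean_var(4, 5)
print(mean, var, round(float(var), 2))
```

Neither option A (mean 0.6) nor option B (variance 8.97) matches these values.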
Answer 14: A
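Explanation:
Since $M_{2X}(t) = (1 - 2t)^{-1}$, $2X \sim \chi^2(2)$; also $Y^2$ and $Z^2$ are independent $\chi^2(1)$ variables. Hence $W = 2X + Y^2 + Z^2 \sim \chi^2(4)$, with
$$f_W(w) = \begin{cases} \frac{1}{4} w e^{-w/2}, & w > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$P(W > 2) = \int_2^{\infty} \frac{1}{4} w e^{-w/2}\, dw = 2e^{-1}.$$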
Hence option A is correct.
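A Monte Carlo sketch can confirm this. It simulates $W = 2X + Y^2 + Z^2$ directly from the distributions identified by the moment generating functions, $X \sim \mathrm{Exp}(1)$ and $Y, Z \sim N(0, 1)$, and compares the estimated $P(W > 2)$ with $2e^{-1} \approx 0.736$. The seed and number of draws are arbitrary choices.

```python
import random
from math import exp


def estimate_p_w_gt_2(n_draws=200_000, seed=1):
    """Fraction of simulated W = 2X + Y^2 + Z^2 values exceeding 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.expovariate(1.0)                   # M_X(t) = 1/(1 - t): exponential with mean 1
        y, z = rng.gauss(0, 1), rng.gauss(0, 1)    # M(t) = exp(t^2/2): standard normal
        if 2 * x + y * y + z * z > 2:
            hits += 1
    return hits / n_draws


print(round(estimate_p_w_gt_2(), 3), round(2 * exp(-1), 3))
```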
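Answer 15: B
Explanation:
$$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$$
$$E(X) = \frac{1}{3}[\theta + \theta^2 + 1]\,\Gamma(2) = \frac{1}{3}[\theta + \theta^2 + 1] = 3.25$$
$$\Rightarrow \theta^2 + \theta - 8.75 = 0 \Rightarrow \theta = 2.5 \text{ or } -3.5.$$
Since $\theta \in (0, \infty)$, $\theta = 2.5$.
Hence option B is correct.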
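The method-of-moments step amounts to taking the positive root of $\theta^2 + \theta + 1 - 3\bar{x} = 0$. A minimal sketch (the function name is hypothetical):

```python
from math import sqrt
from statistics import fmean


def mme_theta(sample):
    """Positive root of theta^2 + theta + 1 - 3*xbar = 0, from E(X) = (theta + theta^2 + 1)/3."""
    xbar = fmean(sample)
    if 3 * xbar <= 1:
        raise ValueError("a positive root needs a sample mean above 1/3")
    c = 1 - 3 * xbar                       # constant term of the quadratic
    return (-1 + sqrt(1 - 4 * c)) / 2      # the other root is negative


print(mme_theta([3, 4, 3.5, 2.5]))   # 2.5
```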
Answer 16: B
Explanation :
$$P(Y = k) = \sum_{n=-k}^{\infty} P(X = n, Y = k) = \sum_{m=0}^{\infty}\left(\frac{1}{2}\right)^{m+k+1} \quad (\text{put } m = n + k)$$
$$= \frac{1}{2}\left(\frac{1}{2}\right)^{k-1}, \quad k = 1, 2, \ldots,$$
which is the pmf of the geometric distribution with parameter 1/2. Hence
$$E(Y) = \sum_{k=1}^{\infty} k \cdot \frac{1}{2}\left(\frac{1}{2}\right)^{k-1} = 2.$$
Hence option B is correct.
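Answer 17: A, B, C and D
Explanation:
$F$ has jumps $P(X = 1) = F(1) - F(1^-) = \frac{4}{10} - \frac{2}{10} = \frac{1}{5}$ and $P(X = 2) = F(2) - F(2^-) = 1 - \frac{7}{10} = \frac{3}{10}$. Hence
$$P(1 < X < 2) = F(2) - F(1) - P(X = 2) = \frac{3}{10}$$
$$P(1 < X \le 2) = F(2) - F(1) = \frac{3}{5}$$
$$P(1 \le X < 2) = F(2) - F(1) - P(X = 2) + P(X = 1) = \frac{1}{2}$$
$$P(1 \le X \le 2) = F(2) - F(1) + P(X = 1) = \frac{4}{5}$$
All four statements are therefore true.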
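These four probabilities can be recomputed exactly from $F$ and its left-hand limits $F(x^-) = P(X < x)$, using $P(a < X < b) = F(b^-) - F(a)$, $P(a < X \le b) = F(b) - F(a)$, $P(a \le X < b) = F(b^-) - F(a^-)$ and $P(a \le X \le b) = F(b) - F(a^-)$. A sketch with exact fractions (the helper names `F` and `F_left` are illustrative):

```python
from fractions import Fraction


def F(x):
    """Right-continuous CDF from Question 17."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return (1 + x ** 2) / 10
    if x < 2:
        return (3 + x ** 2) / 10
    return Fraction(1)


def F_left(x):
    """Left-hand limit F(x-) = P(X < x): use the piece that applies just below x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return (1 + x ** 2) / 10
    if x <= 2:
        return (3 + x ** 2) / 10
    return Fraction(1)


probs = {
    "P(1 < X < 2)": F_left(2) - F(1),
    "P(1 < X <= 2)": F(2) - F(1),
    "P(1 <= X < 2)": F_left(2) - F_left(1),
    "P(1 <= X <= 2)": F(2) - F_left(1),
}
for label, value in probs.items():
    print(label, value)
```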
Answer 18: C
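Explanation:
The marginal pdf of $X_1$ is
$$g(x_1) = \int_0^{\infty} \frac{x_1 e^{-x_1 x_2}}{2}\, dx_2 = \frac{1}{2}, \quad 1 < x_1 < 3.$$
$$h(x_2 \mid x_1) = \frac{f(x_1, x_2)}{g(x_1)} = x_1 e^{-x_1 x_2}, \quad x_2 > 0,$$
so $X_2 \mid X_1 = x_1 \sim \mathrm{Exp}(x_1)$ with mean $1/x_1$ and variance $1/x_1^2$.
Therefore $\mathrm{Var}(X_2 \mid X_1 = 2) = \frac{1}{4} = 0.25$.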
Hence 0.25 is correct answer.
Answer 19: 0.7
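Explanation:
With $T = \sum_{i=1}^{n} X_i$, we have $E(T) = n\theta$ and $E[T(T - 1)] = n(n - 1)\theta^2$, so
$$E\left(\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)}\right) = \theta(1 + \theta),$$
and since $T$ is complete and sufficient, $\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)}$ is the UMVUE of $\theta(1 + \theta)$. Here $n = 6$ and $T = 3$, so
$$\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)} = \frac{3}{6} + \frac{3(3 - 1)}{6(6 - 1)} = \frac{21}{30} = 0.70.$$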
Answer 20: B
Explanation:
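Let $X_1, X_2, \ldots$ be i.i.d. random variables and let $N$ be a random variable independent of the $X_i$'s. Define $S_N = X_1 + X_2 + \cdots + X_N$. Then
$$E(S_N) = E(X_i) \cdot E(N) = 2 \times 2 = 4,$$
$$V(S_N) = E(N)\,\mathrm{Var}(X_i) + [E(X_i)]^2\,\mathrm{Var}(N) = 2 \times 1 + 2^2 \times 2 = 10.$$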
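The formula for $V(S_N)$ is the law of total variance, $V(S_N) = E[V(S_N \mid N)] + V[E(S_N \mid N)]$. The sketch below evaluates both terms by summing over the Poisson pmf of $N$ (truncated at a hypothetical $k_{\max} = 100$) rather than using the closed form:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


def compound_var(mu_x, var_x, lam, k_max=100):
    """Var(S_N) via the law of total variance, with N ~ Poisson(lam) truncated at k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    e_cond_var = sum(p * k * var_x for k, p in enumerate(probs))          # E[Var(S_N | N)] = E[N] var_x
    e_cond_mean = sum(p * k * mu_x for k, p in enumerate(probs))          # E[E(S_N | N)] = E[N] mu_x
    e_cond_mean_sq = sum(p * (k * mu_x) ** 2 for k, p in enumerate(probs))
    var_cond_mean = e_cond_mean_sq - e_cond_mean ** 2                     # Var[E(S_N | N)] = mu_x^2 Var(N)
    return e_cond_var + var_cond_mean


print(round(compound_var(mu_x=2, var_x=1, lam=2), 6))   # 10.0
```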
Answer 21: B
Explanation:
If $A$ and $B$ are independent random variables, each having the uniform distribution on $[0, 1]$, and $U = \min\{A, B\}$, $V = \max\{A, B\}$, then
$$E(U) = \frac{1}{3}, \quad E(V) = \frac{2}{3}, \quad UV = AB \quad \text{and} \quad U + V = A + B.$$
Thus
$$\mathrm{Cov}(U, V) = E(UV) - E(U)E(V) = E(AB) - E(U)E(V) = E(A)E(B) - E(U)E(V) = \frac{1}{4} - \frac{2}{9} = \frac{1}{36}.$$
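A numerical sketch: averaging over a midpoint grid on $[0, 1]^2$ (the grid size 400 is an arbitrary choice) approximates $E(U)$, $E(V)$ and $E(UV)$, and the resulting covariance should be close to $1/36 \approx 0.0278$.

```python
def cov_min_max(grid_size=400):
    """Approximate Cov(min(A, B), max(A, B)) for A, B iid U[0, 1] by a midpoint grid over [0, 1]^2."""
    pts = [(i + 0.5) / grid_size for i in range(grid_size)]
    n = grid_size * grid_size
    e_u = e_v = e_uv = 0.0
    for a in pts:
        for b in pts:
            u, v = min(a, b), max(a, b)
            e_u += u
            e_v += v
            e_uv += u * v
    e_u, e_v, e_uv = e_u / n, e_v / n, e_uv / n
    return e_uv - e_u * e_v


print(round(cov_min_max(), 4), round(1 / 36, 4))
```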
Answer 22: B
Explanation:
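$$X_i \sim U(0, \theta^2), \quad f(x) = \frac{1}{\theta^2}, \quad 0 < x_i < \theta^2.$$
Since $X_{(3)} \le \theta^2$, we need $\theta \in [\sqrt{X_{(3)}}, \infty)$. On this set
$$L(\mathbf{x}, \theta) = \prod_{i=1}^{3} f(x_i, \theta) = \frac{1}{\theta^6},$$
and $\frac{\partial L}{\partial \theta} = -\frac{6}{\theta^7} < 0$, so the likelihood is decreasing in $\theta$; therefore $\hat{\theta} = \sqrt{X_{(3)}}$.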
Answer 23: C
Explanation:
The distribution of $X$ is $P(X = -1) = \frac{1}{8}$, $P(X = 0) = \frac{6}{8}$, $P(X = 1) = \frac{1}{8}$, so $E(X) = 0$ and
$$E(X^2) = 1 \times \frac{1}{8} + 0 \times \frac{6}{8} + 1 \times \frac{1}{8} = \frac{1}{4}$$
$$V(X) = E(X^2) - \{E(X)\}^2 = \frac{1}{4} \Rightarrow \sigma_X = \frac{1}{2}$$
$$P\{|X - \mu_X| \ge 2\sigma_X\} = P\{|X| \ge 1\} = 1 - P(|X| < 1) = 1 - P(-1 < X < 1) = 1 - P(X = 0) = \frac{1}{4}$$
$$P\{|X - \mu_X| \ge 2\sigma_X\} \le \frac{1}{4} \quad [\text{by Chebyshev's inequality}]$$
Here Chebyshev's bound $1/k^2 = 1/4$ (with $k = 2$) is attained with equality.
Answer 24: B
Explanation:
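Let $X_i, Y_i$ $(i = 1, 2)$ be i.i.d. random samples of size 2 from a standard normal distribution. Then $(X_1 + X_2)/\sqrt{2} \sim N(0, 1)$, independently of $\{(X_2 - X_1)^2 + (Y_2 - Y_1)^2\}/2 \sim \chi^2(2)$, so
$$W = \frac{\sqrt{2}(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}} = \frac{(X_1 + X_2)/\sqrt{2}}{\sqrt{\{(X_2 - X_1)^2 + (Y_2 - Y_1)^2\}/(2 \cdot 2)}} \sim t(2).$$
Hence option (B) is correct.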
Answer 25: D
Explanation:
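Let $X$ be a random variable with $M_X(t) = E(e^{tX}) = \sum_x e^{tx} P(X = x)$. Comparing coefficients,
$$P(X = x) = \begin{cases} 1/6, & x = 0 \\ 1/3, & x = 1 \\ 1/3, & x = 2 \\ 1/6, & x = 3 \end{cases}$$
$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{1}{6} + \frac{1}{3} + \frac{1}{3} = \frac{5}{6}.$$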
Answer 26: A
Explanation:
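$X_1, X_2, \ldots, X_n$ is a random sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, so
$$f(x) = 1, \quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}.$$
The likelihood equals 1 for every $\theta$ with $X_{(n)} - \frac{1}{2} \le \theta \le X_{(1)} + \frac{1}{2}$ and 0 otherwise, so any
$$\hat{\theta} \in \left[X_{(n)} - \frac{1}{2}, X_{(1)} + \frac{1}{2}\right]$$
is an MLE. In particular,
$$\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right), \quad 0 < \lambda < 1,$$
is an MLE. Taking $\lambda = \frac{1}{2}$ and $\frac{1}{4}$ gives $\frac{1}{2}(X_{(1)} + X_{(n)}) = T_1$ and $\frac{1}{4}(3X_{(1)} + X_{(n)} + 1) = T_2$ respectively. On the other hand, $T_3 - (X_{(n)} - \frac{1}{2}) = \frac{1}{2}(X_{(n)} - X_{(1)} - 1) < 0$, since $X_{(n)} - X_{(1)} < 1$ with probability 1; so $T_3$ lies outside this interval and is not an MLE.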
Explanation:
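$X_1, X_2, \ldots, X_n$ is a random sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, with
$$f(x) = 1, \quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}.$$
Any $\hat{\theta} \in [X_{(n)} - \frac{1}{2}, X_{(1)} + \frac{1}{2}]$ maximizes the likelihood, which does not otherwise depend on $\theta$, so
$$\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right), \quad 0 < \lambda < 1.$$
Taking $\lambda = \frac{1}{2}$ and $\frac{1}{4}$, we get:
$T_1 = \frac{1}{2}(X_{(1)} + X_{(n)})$ is an MLE as well as consistent for $\theta$;
$T_2 = \frac{1}{4}(3X_{(1)} + X_{(n)} + 1)$ is an MLE and consistent for $\theta$, but not unbiased, since $E(T_2) = \theta + \frac{1}{2(n + 1)}$.
Hence option (A) is correct.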
14.8 REFERENCES
• S. C. Gupta and V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
14.9 SUGGESTED READINGS
• S. C. Gupta and V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand Publication, 11th Edition.
• B. L. Agarwal, Programmed Statistics, New Age International Publishers, 2nd Edition.
9 788119 169580