A Hand Note for Department of Finance
Time series, Sampling and Test of Hypothesis
Department of Finance
Jagannath University
Prepared By
Md. Mazharul Islam (Jony)
ID no:091541, 3rd Batch.
Department of Finance.
Jagannath University.
Email: jony007ex@gmail.com
Time Series
Time Series:
A time series is a set of observations taken at specific times, usually at equal intervals. In other words, a time series is a collection of data recorded over a period of time (weekly, monthly, quarterly), an analysis of history that management can use to make current decisions and plans based on long-term forecasting. It usually assumes that past patterns will continue into the future.
Mathematically, a time series is defined by the values 𝑦1 , 𝑦2 , … of a variable y (temperature, closing price of a share, etc.) at times 𝑡1 , 𝑡2 , …. Thus y is a function of t; this is symbolized 𝑦 = 𝑓(𝑡). Here t is the independent variable and y is the dependent variable. Examples include the total annual production of rice in Bangladesh over a period of years, the daily closing price of a share on the stock exchange, and hourly temperatures.
Components of Time Series:
There are four basic types of variation which account for the change in the series over a period of time. These four types of patterns, variations, or movements are known as the components or elements of a time series. They are:
Secular Trend:
The general or smooth tendency of the data to grow or decline over a long period of time is technically called secular trend, or simply trend. By trend we mean the smooth, regular, long-term movement of the data; sudden and erratic movements, whether upward or downward, have nothing to do with the trend. For example, population, prices, and production typically show an upward trend.
Trend can be divided under two heads:
1. Linear or straight-line trend: A linear trend equation is used when the data are increasing or decreasing by equal amounts. The equation is 𝑦 = 𝑎 + 𝑏𝑡.
2. Nonlinear trend: A nonlinear trend equation is used when the data are increasing or decreasing by increasing amounts over time. The equation is log 𝑦 = log 𝑎 + 𝑡 log 𝑏 (the logarithmic form of 𝑦 = 𝑎𝑏^𝑡).
Seasonal Variation:
Seasonal variation is a pattern of change in a time series that takes place within a period of 12 months and tends to repeat each year as a result of changes in climate, weather conditions, festivals, etc. For example, during winter there is a greater demand for woolen clothes, and on the occasion of Eid there is a big demand for sweets and bank withdrawals go up for shopping.
Uses of seasonal Variation:
Seasonal fluctuations help in planning for sufficient goods and materials on hand to meet varying seasonal demand.
Seasonal variations are fluctuations that coincide with certain seasons and are repeated year after year.
Analysis of seasonal fluctuations over a period of years helps in evaluating current sales.
Cyclical Variation:
The term cycle refers to the recurrent variation in a time series that usually lasts longer than a year and is not regular, in either amplitude or length. A business cycle consists of the recurrence of rising and falling movements of business activity around some sort of statistical trend or normal (a statistical average). There are four well-defined periods or phases in the business cycle, namely: i) Prosperity, ii) Decline, iii) Depression, iv) Improvement.
For example, in the prosperity period share business is booming, prices are high and profits are made. After a period of time share prices decline, business activity declines with them, and then the depression comes; as a result factories close and businesses fail. After this rigid economy, business activity increases with rising prices in the improvement or recovery period.
Irregular Variation:
Irregular Variation refers to such variations in business activities which do not repeat in a definite
pattern. In fact, the category labeled irregular variation is really intended to include all types of
variation other than those accounting for the trend, seasonal and cyclical movements. For example,
Irregular variations are caused by such isolated special occurrences as flood, earthquakes, strikes,
war, rapid technological progress etc.
Irregular Variation can be classified into two types.
Episodic: Episodic fluctuations are unpredictable, but they can be identified. The initial
impact on the economy of a major strike or a war can be identified, but strike or war cannot
be predicted.
Residual: After the episodic fluctuations have been removed, the remaining variation is
called residual variation. The residual fluctuations, often called chance fluctuations, are
unpredictable and they cannot be identified.
Note: Neither episodic nor residual variation can be projected into the future.
Mathematical Model or Relationship for Time Series:
In traditional time series analysis, the mathematical relationship can be described as 𝑦𝑡 = 𝑓(𝑇, 𝑆, 𝐶, 𝐼).
Where,
T = Secular trend, S = Seasonal variation, C = Cyclical variation, I = Irregular variation,
y = Total value of the time series at time t.
There are two relationships in the time series.
Multiplicative Relationship: The multiplicative relationship between the four components assumes that any particular value in the series is the product of factors that can be attributed to the various components. Symbolically,
𝑦 = 𝑇 × 𝑆 × 𝐶 × 𝐼
In this model, the seasonal, cyclical and irregular components (all but the trend) are viewed not as absolute amounts but as relative influences.
For example, A seasonal index of 110 percent would mean that the actual value is 10 percent
higher than it otherwise would be because of seasonal influences.
Additive Relationship: The additive relationship treats each observation of a time series as the sum of the four components. Symbolically,
𝑦 = 𝑇 + 𝑆 + 𝐶 + 𝐼.
When this relationship is assumed, the major aim of time series analysis is to isolate those parts of the overall variation of the series which are traceable to each of the four components and to measure each part independently.
The multiplicative model, by contrast, is not only considered the standard or traditional assumption for time series analysis; it is also employed in practice more often than all other possible models combined.
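As a numerical illustration of the two relationships (all component values below are hypothetical):

```python
# Multiplicative model: seasonal, cyclical and irregular components are
# relative indices applied to the trend (1.10 = a seasonal index of 110%).
T = 500.0
S, C, I = 1.10, 0.95, 1.02
y_mult = T * S * C * I
print(round(y_mult, 2))   # 532.95

# Additive model: the components are absolute amounts added to the trend.
S_amt, C_amt, I_amt = 50.0, -25.0, 10.0
y_add = T + S_amt + C_amt + I_amt
print(y_add)              # 535.0
```

Note how the multiplicative indices scale the trend, while the additive amounts simply shift it.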
Moving-Average Method:
The moving average method is not only useful in smoothing a time series to see its trend; it is the
basic method used in measuring the seasonal fluctuation. When a trend is to be determined by the
method of moving average, the average value for a number of years ( or months, or weeks) is
secured and this average is taken as the normal or trend value for the unit of time falling at the
middle of the period covered in the calculation of the average.
The effect of averaging is to give a smoother curve, lessening the influence of the fluctuations that pull the annual figures away from the general trend.
While applying this method, it is necessary to select a period for moving average such as 3 yearly
moving averages, 6 yearly moving averages, 8 yearly moving averages etc. The period of moving
average is to be decided in the light of the length of the cycle.
The formula for a 3-year moving average will be:
(𝑎 + 𝑏 + 𝑐)/3, (𝑏 + 𝑐 + 𝑑)/3, (𝑐 + 𝑑 + 𝑒)/3, ………
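A minimal sketch of this calculation (the data values are hypothetical annual figures):

```python
def moving_average(values, period=3):
    """Moving averages of the given period, as in the 3-yearly
    formula above: (a+b+c)/3, (b+c+d)/3, (c+d+e)/3, ..."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

data = [10, 12, 14, 16, 18]
print(moving_average(data))   # [12.0, 14.0, 16.0]
```

Each average is taken as the trend value for the middle year of its window, which is why a 3-year series of five values yields three smoothed points.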
Sampling
Sampling:
Sampling refers to drawing a sample (a subset) from a population (the full set). In other words, Sampling is
that part of statistical practice concerned with the selection of an unbiased or random subset of individual
observations within a population of individuals intended to yield some knowledge about the population of
concern, especially for the purposes of making predictions based on statistical inference. Sampling is an
important aspect of data collection.
Census:
A census is the process of obtaining information about every member of a population (not necessarily a
human population). It can be contrasted with sampling in which information is only obtained from a subset
of a population. As such it is a method used for accumulating statistical data.
Sample and Population:
A sample in a research study is a relatively small number of individuals about whom information is
obtained. The larger group to whom the information is then generalized is the population.
Reasons for sampling instead of census / Need for sampling:
There are six reasons for sampling. They are described below.
1. Economy: The unit cost of collecting data in the case of a census is significantly less than in the case of sampling (for example, Tk. 200 per unit in a census against Tk. 1,000 per unit in sampling), but due to the larger number of items the total cost involved in a census is significantly higher than in a sample survey.
We can find the total cost of collecting information by multiplying the total population (N) by the unit cost in the case of a census, and by multiplying the sample size (n) by the unit cost in the case of sampling:
Census: 10,00,000 × 200 = 20,00,00,000
Sampling: 5,000 × 1,000 = 50,00,000
2. Timeliness: The unit time involved in the case of sampling is higher than in the case of a census, but due to the larger size of the population the total time involved in a census is significantly higher than in a sample survey.
3. Large size of many populations: In some cases the size of the population is extremely large, and not all of its members are reachable, owing to travel, disease, death, mental abnormality, imprisonment, etc. In that situation the only way to conduct the research is to collect data through a sample survey.
4. Inaccessibility of the entire population: In some cases the entire population may not be accessible; in that case sampling is necessary. For example, the entire population of interest may be inaccessible after an aircraft crash.
5. Destructive nature of many populations: Due to the destructive nature of many populations, researchers are compelled to collect information on only a part of the population.
For example:
Blood test for a patient.
Life hours of a tube light.
6. Reliability: By using a scientific sampling technique one can minimize the sampling error, and as qualified investigators are included, the non-sampling error committed in the case of a sample survey is also at a minimum.
The amount of non-sampling error in the case of a census is much higher than the total amount of sampling and non-sampling error committed in the case of a sample survey (as less qualified investigators are involved in a census, and the supervision, monitoring and quality-control mechanisms are weaker there).
The degree of error has a relationship with reliability: if error decreases, reliability increases. Sampling decreases both the sampling and the non-sampling error, so it enhances the reliability of the information.
Sampling Errors:
There are two types of errors:
1. Sampling error: The gap between the sample mean and the population mean constitutes the sampling error. The gaps between various sample means are known as sampling fluctuation.
2. Non-sampling error: Any error other than sampling error that may affect a sample estimate is known as non-sampling error. Non-sampling errors are of two types:
i. Systematic: Errors arising out of the systematic tendency of respondents to conceal facts and to over-estimate or under-estimate values.
ii. Unsystematic: Errors arising in the process of collecting, recording, processing and analyzing the data due to carelessness or other mistakes are known as unsystematic errors.
Sampling errors in Sampling and Census:
1. In the case of a census there is no sampling error but there are non-sampling errors; in the case of sampling there are both sampling error and non-sampling error.
2. By using a scientific sampling technique one can minimize the sampling error, and as qualified investigators are included, the non-sampling error committed in a sample survey is also at a minimum. The amount of non-sampling error in a census is much higher than the total amount of sampling and non-sampling error committed in a sample survey, as less qualified investigators are involved in a census and its supervision, monitoring and quality-control mechanisms are weaker.
Sample Design
There are three components of sample design:
1. Sample Size
2. Sampling Frame
3. Sampling Technique
SAMPLE SIZE
In general, the larger the sample size (selected with the use of probability techniques) the better. The more
heterogeneous a population is on a variety of characteristics (e.g. race, age, sexual orientation, religion)
then a larger sample is needed to reflect that diversity.
Determination of sample size through the approach based on confidence level and precision rate:
If population size is unknown
𝑛 = (𝑧² × 𝑠²) / 𝑒²    or    𝑛 = (𝑧² × 𝑝 × 𝑞) / 𝑒²
where,
n = sample size
s = standard deviation
p = proportion of success for the indicator
q =1-p
z = standard normal variate at a given level of significance (z = 1.96 at 95% level of confidence)
e = Precision rate or amount of admissible error in the estimate
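These formulas can be applied directly; a minimal sketch in Python (rounding the result up to the next whole unit is a convention assumed here, and the inputs mirror Problems 1 and 2 below):

```python
import math

def sample_size_mean(z, s, e):
    """n = z^2 * s^2 / e^2 (estimating a mean, population size unknown)."""
    return math.ceil((z ** 2 * s ** 2) / e ** 2)

def sample_size_proportion(z, p, e):
    """n = z^2 * p * q / e^2 with q = 1 - p (estimating a proportion)."""
    return math.ceil((z ** 2 * p * (1 - p)) / e ** 2)

# 95% confidence (z = 1.96), s = 1.5 kg, admissible error e = 0.15 kg:
print(sample_size_mean(1.96, 1.5, 0.15))         # 385
# 95% confidence, p = 0.55, precision e = 0.02:
print(sample_size_proportion(1.96, 0.55, 0.02))  # 2377
```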
If population size is known:
𝑧2 × 𝑠2. 𝑁
𝑛= 2
𝑧 × 𝑠 2 + (𝑁 − 1)𝑒 2
𝑧 2 × 𝑝 × 𝑞. 𝑁
𝑛= 2
𝑧 × 𝑝 × 𝑞 + (𝑁 − 1)𝑒 2
𝑜𝑟
where,
n = sample size
p = proportion of success for the indicator
q=1-p
z = standard normal variate at a given level of significance. (z = 1.96 at 95% levels of confidence)
N= Population size
e = Precision rate or amount of admissible error in the estimate
The sample size will be adjusted further (if n > 5% of N) using the formula given below:
𝑛𝑎 = 𝑛 / (1 + 𝑛/𝑁)
Where, 𝑛𝑎 = adjusted sample size
Note:
* For a large population, the sampling fraction is typically 1% to 2%.
* For a medium population, 5% to 10%.
* For a small population, 20% to 50%.
* If there is a budget and time constraint, a sample size of 30 items from each group, irrespective of the total number of items in the different groups, may help objectively assess the characteristics of all the items belonging to the different groups.
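The known-population formula and the adjustment above can be sketched the same way; rounding up to the next whole unit and taking z ≈ 2.58 at the 99% confidence level are conventions assumed here (the first call mirrors Problem 3 below, the second is a made-up adjustment case):

```python
import math

def sample_size_known_N(z, p, e, N):
    """n = z^2*p*q*N / (z^2*p*q + (N - 1)*e^2), with q = 1 - p."""
    pq = p * (1 - p)
    return math.ceil((z ** 2 * pq * N) / (z ** 2 * pq + (N - 1) * e ** 2))

def adjusted_sample_size(n, N):
    """n_a = n / (1 + n/N), applied when n exceeds 5% of N."""
    return math.ceil(n / (1 + n / N))

# 99% confidence (z = 2.58), 6% defectives, 2% precision, lot of 25000:
print(sample_size_known_N(2.58, 0.06, 0.02, 25000))  # 905

# Adjustment example: n = 385 from a population of N = 2000 (n > 5% of N):
print(adjusted_sample_size(385, 2000))               # 323
```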
PROBLEMS:
1. Determine the sample size for estimating the true weight of containers based on the following
information:
i. Estimate must be made at 95% confidence level and within 0.15 kg of the true weight.
ii. The standard deviation of weight = 1.5 kg.
2. In an election for the post of President of the FBCCI there are 2 candidates: X and Y. You are
interested in estimating the proportion of voters favoring Candidate X at 95% confidence interval and with
an error not to exceed 2%. A previous poll showed 55% of the registered voters favoring candidate X.
What sample size should be used for the purpose?
3. Beximco Pharma wishes to estimate the true proportion of defectiveness of its products at 99%
confidence level and with 2% precision rate. Past record indicates that 6% of the products are defective.
What would be the required sample size for estimating the true proportion of defectiveness in a production
lot of 25000 units?
4. ACI wishes to estimate the true proportion of defectiveness of its products at 99% confidence level
and with 3% precision rate. Past record indicates that 13% of the products are defective. What would be the
required sample size for estimating the true proportion of defectiveness in a production lot of 8000 units?
5. The government wishes to be 95% certain of estimating the mean income of garments workers within
an error of Tk. 20. Past studies indicate that the standard deviation of income is Tk. 200. What size of
sample of garments workers should be used to obtain a reliable estimate of the mean income?
6. Monno Ceramic wishes to estimate the true proportion of defectiveness of its products at 99%
confidence level and with 3% precision rate. Past record indicates that 9% of the products are defective.
What would be the required sample size for estimating the true proportion of defectiveness in a production
lot of 10000 units?
Sampling Frame
A sampling frame is a list that includes every member of the population from which a sample is to be taken. Without some form of sampling frame, a random sample of a population, other than an extremely small population, is impossible.
In short, the list containing the particulars of all the items of a population is known as the sampling frame. It is prepared with a view to facilitating the researcher in selecting the required samples.
Sampling Methods or Techniques:
Based on the method of drawing samples, sampling techniques are broadly divided into two types:
Random Sampling:
1. Simple Random Sampling
2. Stratified Random Sampling
3. Cluster Sampling
4. Systematic Sampling
5. Multiphase Sampling
6. Multistage Sampling
Non-Random Sampling:
1. Persuasive/Judgmental Sampling
2. Convenience Sampling
3. Quota Sampling
Random sampling:
When all the items of the population have an equal chance to be included in the sample, the technique is
known as Random sampling. In random sampling, each item or element of the population has an equal
chance of being chosen at each draw.
A sample is random if the method for obtaining the sample meets the criterion of randomness (each
element having an equal chance at each draw). The actual composition of the sample itself does not
determine whether or not it was a random sample.
There are 6 types of Random Sampling, they are, Simple Random Sampling, Stratified Sampling,
Systematic Sampling, Cluster Sampling, Multiphase Sampling and Multistage Sampling.
Simple Random Sampling
Simple random sample (SRS) is a special case of a random sample. A sample is called simple random
sample if each unit of the population has an equal chance of being selected for the sample. Whenever a unit
is selected for the sample, the units of the population are equally likely to be selected. It is an equal
probability sampling method (which is abbreviated by EPSEM). EPSEM means "everyone in the sampling
frame has an equal chance of being in the final sample."
Difference between Random Sample and Simple Random Sample:
If each unit of the population has known (equal or un-equal) probability of selection in the sample, the
sample is called a random sample. If each unit of the population has equal probability of being selected for
the sample, the sample obtained is called simple random sample.
Selection of Simple Random Sample:
A simple random sample is usually selected without replacement. The following methods are used for the selection of a simple random sample:
Lottery Method: This is an old classical method but it is a powerful technique and modern
methods of selection are very close to this method. All the units of the population are numbered
from 1 𝑡𝑜 𝑁. This is called sampling frame. These numbers are written on the small slips of paper
or the small round metallic balls. The paper slips or the metallic balls should be of the same size
otherwise the selected sample will not be truly random. The slips or the balls are thoroughly mixed
and a slip or ball is picked up. Again the population of slips is mixed and the next unit is selected. In
this manner, the number of slips equal to the sample size ′𝑛′ is selected. The units of the population
which appear on the selected slips make the simple random sample. This method of selection is
commonly used when the size of the population is small. For a large population there would be a big heap of paper slips and it is difficult to mix them properly.
Using a Random Number Table: All the units of the population are numbered from 1 to 𝑁 or from
0 to 𝑁 − 1. We consult the random number table to take a simple random sample. Suppose the size
of the population is 80 and we have to select a random sample of 8 units. The units of the
population are numbered from 01 to 80. We read two-digit numbers from the table of random
numbers. We can start from any column or row of the table. Let us consult the random number table given in this content. Two-digit numbers are taken from the table. Any number above 80 will be ignored, and if any number is repeated we shall not record it, since sampling is done without replacement. Let us read the first two columns of the table. The random numbers from the table are 10, 37, 08, 12, 66, 31, 63 and 73. The two numbers 99 and 85 have not been recorded because the
population does not contain these numbers. The units of the population whose numbers have been
selected constitute the simple random sample. Let us suppose that the size of the population is 100.
If the units are numbered from 001 to 100, we shall have to read 3-digit random numbers. From the
first 3 columns of the random number table, the random numbers are 100, 375, 084, 990 and 128
and so on. We find that most of the numbers are above 100 and we are wasting our time while
reading the table. We can avoid it by numbering the units of the population from 00 to 99. In this
way, we shall read 2-digit numbers from the table. Thus if N is 100, 1000 or 10000, the numbering
is done from 00 to 99, 000 to 999 or 0000 to 9999.
Using the Computer: The facility of selecting a simple random sample is available on the
computers. The computer is used for selecting a sample of prize-bond winners, a sample of Hajj
applicants, and a sample of applicants for residential plots and for various other purposes.
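The computer-based selection above can be sketched with Python's standard library; the numbering of units from 1 to 80 and the sample size of 8 mirror the random-number-table example (the fixed seed is only to make the draw repeatable):

```python
import random

population = list(range(1, 81))   # units numbered 1 to 80 (the frame)

random.seed(42)                   # fixed seed for a repeatable draw
# Simple random sample without replacement: every unit equally likely.
sample = random.sample(population, 8)
print(sorted(sample))
```

As with the table method, no unit can appear twice because the draw is without replacement.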
Stratified sampling:
Stratified sampling is probably the most commonly used probability method. In a stratified sample the
sampling frame is divided into non-overlapping groups or strata, e.g. geographical areas, age-groups,
genders. A sample is taken from each stratum, and when this sample is a simple random sample it is
referred to as stratified random sampling.
Stratification is the process of grouping members of the population into relatively homogeneous
subgroups before sampling. The strata should be mutually exclusive: every element in the population must
be assigned to only one stratum. The strata should also be collectively exhaustive: no population element
can be excluded. Then random or systematic sampling is applied within each stratum. This often improves
the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has
less variability than the arithmetic mean of a simple random sample of the population.
Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar,
where certain homogeneous, or similar, sub-populations can be isolated (strata).
Types of stratified random sampling:
There are actually two different types of stratified sampling.
Proportional stratified sampling: When the total sample size is distributed among the different strata according to the size of the population of each stratum, the sampling is known as proportional stratified random sampling. In proportional stratified sampling, the sample proportions are made the same as the population proportions on the stratification variable(s). Proportional stratified sampling is an equal probability sampling method.
The proportional weight of each stratum is maintained through the use of the following formula:
𝑛𝑖 = (𝑁𝑖 / 𝑁) × 𝑛
Here, 𝑛𝑖 = sample size of the ith stratum, 𝑁𝑖 = population size of the ith stratum, N = total population size, and n = total sample size.
Example: In Uttara sector 12 there live 10,000 poor, 5,000 middle-class and 20,000 rich people. We take 200 people from here as a sample to conduct a survey. Allocate the total sample size to the different strata following the technique of proportional stratified random sampling.
Solution:
Here, N = 10000 + 5000 + 20000 = 35000, n = 200.
Then, sample size:
For poor: 𝑛1 = (10000/35000) × 200 ≈ 57
For middle class: 𝑛2 = (5000/35000) × 200 ≈ 29
For rich: 𝑛3 = (20000/35000) × 200 ≈ 114
Example: There are 90,000 traders in the city of Dhaka. Of these, 50% are retailers, and the proportions of Arothdars, wholesalers and hawkers are 10%, 15% and 25% respectively. Determine the size of the sample to estimate the contribution of trade services to the national economy (GDP) at a 95% confidence level and 2% precision. Allocate the total sample size to the different strata following the technique of proportional stratified random sampling.
Solution:
Here, N = 90000, z = 1.96 (at 95% confidence), p = 0.50, q = 1 − 0.50 = 0.50, e = 0.02.
Then n = z²pqN / [z²pq + (N − 1)e²]
= [(1.96)² × 0.50 × 0.50 × 90000] / [(1.96)² × 0.50 × 0.50 + (90000 − 1)(0.02)²] ≈ 2339
∴ Sample size:
For retailers [90000 × 50% = 45000]: 𝑛1 = (45000/90000) × 2339 ≈ 1169
For Arothdars [90000 × 10% = 9000]: 𝑛2 = (9000/90000) × 2339 ≈ 234
For wholesalers [90000 × 15% = 13500]: 𝑛3 = (13500/90000) × 2339 ≈ 351
For hawkers [90000 × 25% = 22500]: 𝑛4 = (22500/90000) × 2339 ≈ 585
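The proportional-allocation rule 𝑛𝑖 = (𝑁𝑖/𝑁) × 𝑛 used in both examples can be sketched as follows; the stratum names and the use of nearest-integer rounding are illustrative assumptions:

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample size n across strata in proportion
    to each stratum's population size: n_i = (N_i / N) * n."""
    N = sum(strata_sizes.values())
    return {name: round(N_i / N * n) for name, N_i in strata_sizes.items()}

# Uttara sector 12 example: strata of 10000, 5000 and 20000 people,
# total sample size n = 200.
strata = {"poor": 10000, "middle": 5000, "rich": 20000}
print(proportional_allocation(strata, 200))
# {'poor': 57, 'middle': 29, 'rich': 114}
```

Different rounding conventions (nearest integer versus truncation) can shift an allocation by one unit when a stratum's share falls exactly on .5.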
Disproportional stratified sampling: In disproportional stratified sampling, the sample proportions are
made to be different from the proportions on the stratification variable(s). In other words, the subsamples
are not proportional to their sizes in the population.
Here is an example showing the difference between proportional and disproportional stratified sampling:
For example if gender is your stratification variable and the population is composed of 75% females
and you want a sample of 100 people, then you would randomly select 75 females and 25 males. In
disproportional stratified sampling you might instead select 50 males and 50 females from this same
population. In the first case the percentages are proportional; in the second case they are not
proportional.
In both types, the sampling frame is first divided into subpopulations.
Difference between proportional and Disproportional stratified random sampling
The main difference between the two sampling techniques is the proportion given to each stratum with respect to the other strata. In proportional sampling, each stratum has the same sampling fraction, while in disproportional sampling the sampling fraction of each stratum varies.
Proportionate Versus Disproportionate Stratification
All stratified sampling designs fall into one of two categories, each of which has strengths and weaknesses
as described below.
Proportionate stratification. With proportionate stratification, the sample size of each stratum is
proportionate to the population size of the stratum. This means that each stratum has the
same sampling fraction.
Proportionate stratification provides equal or better precision than a simple random sample
of the same size.
Gains in precision are greatest when values within strata are homogeneous.
Gains in precision accrue to all survey measures.
Disproportionate stratification. With disproportionate stratification, the sampling fraction may
vary from one stratum to the next.
The precision of the design may be very good or very poor, depending on how sample points are allocated to strata.
If variances differ across strata, disproportionate stratification can provide better precision
than proportionate stratification, when sample points are correctly allocated to strata.
With disproportionate stratification, the researcher can maximize precision for a single
important survey measure. However, gains in precision may not accrue to other survey
measures.
Cluster Sampling:
A cluster is a group of individuals or objects having different characteristics. In cluster sampling, clusters are formed so that there is maximum heterogeneity within each cluster and homogeneity from cluster to cluster. Cluster sampling involves selecting the sample units in groups.
In short, Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical
population.
For example, a sample of telephone calls may be collected by first taking a collection of telephone lines and
collecting all the calls on the sampled lines. The analysis of cluster samples must take into account the
intra-cluster correlation which reflects the fact that units in the same cluster are likely to be more similar
than two units picked at random.
Systematic Sampling
Systematic sampling is a random sampling technique which is frequently chosen by researchers for its
simplicity and its periodic quality. Systematic sampling relies on arranging the target population according
to some ordering scheme and then selecting elements at regular intervals through that ordered list.
A common way of selecting members for a sample using systematic sampling is simply to divide the total number of units in the general population by the desired number of units for the sample. The result of the division serves as the marker for selecting sample units from within the general population.
For example, if anyone wanted to select a random group of 1,000 people from a population of 50,000 using
systematic sampling, he would simply select every 50th person, since 50,000/1,000 = 50.
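The every-kth selection above can be sketched as follows; drawing the starting point at random from within the first interval is a common refinement assumed here:

```python
import random

def systematic_sample(N, n):
    """Pick every k-th unit (k = N // n) from a random start in [0, k)."""
    k = N // n                   # sampling interval, e.g. 50000 // 1000 = 50
    start = random.randrange(k)  # random starting unit within first interval
    return list(range(start, N, k))[:n]

random.seed(7)
sample = systematic_sample(50_000, 1_000)
print(len(sample), sample[:3])
```

Once the random start is fixed, the whole sample is determined, which is what gives the method its simplicity and periodic quality.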
Multiphase Sampling
Multiphase sampling is a sampling method in which certain items of information are collected from all the units of a sample and certain other items of information are taken from a subsample.
For example, if the collection of information concerning variate, y, is relatively expensive, and there exists
some other variate, x, correlated with it, which is relatively cheap to investigate, it may be profitable to
carry out sampling in two phases. At the first phase, x is investigated, and the information thus obtained is
used either,
(a) to stratify the population at the second phase, when y is investigated, or
(b) as supplementary information at the second phase, a ratio or regression estimate being used.
Two-phase sampling is sometimes called “double sampling”.
Multistage sampling
Multistage sampling is a complex form of cluster sampling in which two or more levels of units are
embedded one in the other. Multistage sampling is a sampling method in which the population is divided
into a number of groups or primary stages from which samples are drawn; these are then divided into
groups or secondary stages from which samples are drawn, and so on.
For instance, Geographic areas (primary units), Factories (secondary units), and Employees (tertiary
units). At each stage, a sample of the corresponding units is selected. At first, a sample of primary units is
selected, then, in each of those selected, a sample of secondary units is selected, and so on. All ultimate
units (individuals, for instance) selected at the last step of this procedure are then surveyed.
Multistage sampling is frequently used when a complete list of all members of the population does not exist or is inappropriate. Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.
Non-Random Sampling:
Under non-random sampling, samples are chosen on the basis of the experience, judgment, liking and disliking of the researcher. Here all the items do not have an equal chance of being selected. There are three types of non-random sampling. They are discussed below.
Persuasive/ Judgmental Sampling:
Judgment sampling is a common nonprobability method. The researcher selects the sample based on
judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to
draw the entire sample from one "representative" city, even though the population includes all cities. When
using this method, the researcher must be confident that the chosen sample is truly representative of the
entire population. In judgment sampling, the researcher or some other "expert" uses his/her judgment in
selecting the units from the population for study based on the population’s parameters.
In short, Judgment sampling involves the choice of subjects who are most advantageously placed or in the
best position to provide the information required.
Convenience sampling:
When data are collected from a group or chunk of respondents at the convenience of the researcher, this kind of sampling is known as convenience sampling. This type of sampling is conducted only to assess the attitudinal dimensions of the respondents.
Convenience sampling is a non-probability method. This means that subjects are chosen in a nonrandom
manner, and some members of the population have no chance of being included.
Convenience sampling is also known as Opportunity Sampling, Accidental Sampling or Haphazard
Sampling.
Example: A group of students in a high school do a study about teacher attitudes. They interview teachers at their own school, a couple of teachers in their families, and a few others who are known to their parents.
Quota Sampling:
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. This means the researcher can specify in advance who is to be sampled (targeting).
Quota sampling is useful when time is limited, a sampling frame is not available, the research budget is very tight, or when detailed accuracy is not important. You can also choose how many of each category are selected.
There are similarities with stratified sampling, but in quota sampling the selection of the sample is non-random.
Example: Interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.
Random versus Non-random Samples
In statistics, a sample is a subset of a population. Usually, the population is very large, making a complete
enumeration of all the values in the population impractical or impossible. The sample represents a subset of
manageable size; the sample size is the number of units in the sample. Samples are collected and statistics
are calculated from the samples so that one can make inferences or extrapolations from the sample to the
population. This process of collecting information from a sample is referred to as sampling.
Samples are selected in such a way as to avoid presenting a biased view of the population. The sample will
be unrepresentative of the population if certain members of the population are excluded from any possible
sample. For example, if a researcher is interested in the drug-usage patterns among teenagers, but collects
the sample from schools, the sample is biased because it excludes teenagers not in school for a variety of
reasons, such as lack of funds to attend or schooled at home. Biases may also occur if some members of the
population are more likely or less likely to be included in the sample than other members of the population
for a reason other than the sample design. So the sample collected from schools is also biased because
students who miss a lot of school days because of a chronic illness will be less likely to be selected than
students who attend regularly.
The best way to avoid a biased or unrepresentative sample, and thus to obtain a representative sample of
the population, is to select a random sample, also known as a probability sample. A random sample is
defined as a sample in which every individual member of the population has a non-zero probability of
being selected as part of the sample. In a simple random sample, every individual member of the population
has the same probability of being selected as every other individual member. Other types of random
samples fall under the category of complex sample design.
A sample that is not random is called a non-random sample or a non-probability sample. Some examples of
non-random samples are convenience samples, judgment samples, purposive samples, quota samples, and
snowball samples.
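In a simple random sample, as defined above, every member of the population has the same inclusion probability. Python's standard library draws one directly (the frame below is hypothetical):

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical sampling frame: every population member is listed and numbered.
population = [f"person{i}" for i in range(1, 501)]

# Simple random sample of n = 25, drawn without replacement: each member has
# the same inclusion probability, 25/500 = 0.05.
sample = random.sample(population, 25)
```

By contrast, a convenience sample would be something like `population[:25]`, which gives members later in the frame no chance of selection.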
Shortcuts
Here are some important terms used in sampling:
A sample is a set of elements taken from a larger population.
The sample is a subset of the population, which is the full set of elements, people, or whatever you are sampling.
A statistic is a numerical characteristic of a sample, but a parameter is a numerical characteristic of a population.
Sampling error refers to the difference between the value of a sample statistic, such as the sample mean, and the true value of the population parameter, such as the population mean. Note: some error is always present in sampling. With random sampling methods, the error is random rather than systematic.
The response rate is the percentage of people in the sample selected for
the study who actually participate in the study.
A sampling frame is just a list of all the people that are in the population. Here is an example of a
sampling frame (a list of all the names in my population, and they are numbered). Note that the
following sampling frame also has information on age and gender included in case you want to draw
some samples and do some calculations.
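Sampling error, as defined above, can be made concrete with a small simulation; the population parameters below are assumed for illustration:

```python
import random
import statistics

random.seed(7)  # reproducible illustration

# Hypothetical population with a knowable mean, so sampling error is visible.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)        # the population parameter

sample = random.sample(population, 30)  # a random sample of n = 30
x_bar = statistics.mean(sample)         # the sample statistic

# Sampling error: statistic minus parameter. It varies from sample to sample,
# and with random sampling it is random rather than systematic.
sampling_error = x_bar - mu
```

Redrawing the sample gives a different `x_bar` and hence a different error, which is exactly the point of the definition.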
Test of Hypothesis
A hypothesis is a theory that is testable. A hypothesis test is a statistical method that uses sample data to
evaluate a hypothesis about a population or populations. Every hypothesis test requires the analyst to state a
null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be false; and vice versa.
Null Hypothesis (𝑯𝒐 )
The null hypothesis represents a theory that has been put forward, either because it is believed to be true or
because it is to be used as a basis for argument, but has not been proved.
The null hypothesis (H0) states that in the general population there is no change, no difference, or no
relationship … (=, ≤, ≥). In the context of an experiment, the null hypothesis predicts that the experiment
has no effect on the dependent variable for the population.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on
average, than the current drug. We would write
𝑯𝒐 : There is no difference between the two drugs on average.
We give special consideration to the null hypothesis. This is due to the fact that the null hypothesis relates
to the statement being tested, whereas the alternative hypothesis relates to the statement to be accepted if /
when the null is rejected.
Alternative Hypothesis (𝑯𝟏 )
The alternative hypothesis, 𝐻1 , is a statement of what a statistical hypothesis test is set up to establish. The
alternative hypothesis (H1) states that there is a change, a difference, or a relationship for the general
population … (≠, <, >). In the context of an experiment, the alternative hypothesis predicts that the
independent variable does have an effect on the dependent variable. For example, in a clinical trial of a new
drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to
that of the current drug. We would write
𝐻1 : the two drugs have different effects, on average.
The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In
this case we would write
𝐻1 : the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We
either "Reject 𝐻𝑜 in favor of 𝐻1 " or "Do not reject 𝐻𝑜 "; we never conclude "Reject 𝐻1 ", or even "Accept
𝐻1 ".
If we conclude "Do not reject 𝐻𝑜 ", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against 𝐻𝑜 in favor of H1. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
Test Statistic
A test statistic is a quantity calculated from our sample of data. Its value is used to decide whether or not
the null hypothesis should be rejected in our hypothesis test.
The choice of a test statistic will depend on the assumed probability model and the hypotheses under
question.
Critical Value(s)
The critical value(s) for a hypothesis test is a threshold to which the value of the test statistic in a sample is
compared to determine whether or not the null hypothesis is rejected.
The critical value for any hypothesis test depends on the significance level at which the test is carried out,
and whether the test is one-sided or two-sided.
See also critical region.
Critical Region
The critical region CR, or rejection region RR, is a set of values of the test statistic for which the null
hypothesis is rejected in a hypothesis test. That is, the sample space for the test statistic is partitioned into
two regions; one region (the critical region) will lead us to reject the null hypothesis H0, the other will not.
So, if the observed value of the test statistic is a member of the critical region, we conclude "Reject H0"; if
it is not a member of the critical region then we conclude "Do not reject H0".
Power
The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is
actually false - that is, to make a correct decision.
In other words, the power of a hypothesis test is the probability of not committing a type II error. It is
calculated by subtracting the probability of a type II error from 1, usually expressed as:
Power = 1 - P(type II error) = 1 − 𝛽
The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to have high power,
close to 1.
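For a one-sided z-test the power can be computed in closed form; the means, σ, n, and α below are assumed purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Assumed scenario: H0: mu = 100 vs H1: mu > 100 (one-sided z-test),
# sigma = 15, n = 36, alpha = 0.05, true mean 106 under the alternative.
mu0, mu_true, sigma, n, alpha = 100, 106, 15, 36, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)    # one-sided critical value, ~1.645
shift = (mu_true - mu0) / (sigma / sqrt(n))  # true shift in standard errors

beta = NormalDist().cdf(z_alpha - shift)     # P(type II error)
power = 1 - beta                             # P(reject H0 | H0 false)
```

With these numbers the power comes out near 0.77; increasing n or the true shift raises it toward 1.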
Choosing a Test Statistic
The test statistic is the random variable whose value is examined to arrive at a decision. The Central Limit Theorem states that for large sample sizes (n > 30) drawn randomly from a population, the distribution of the sample means will approximate normality, even when the data in the parent population are not normally distributed. A z statistic is usually used for large sample sizes (n > 30), but large samples are often not easy to obtain, in which case the t-distribution can be used. The population standard deviation is then estimated by the sample standard deviation, s. The t curves are bell-shaped and centered around t = 0; the exact shape of a given t-curve depends on the degrees of freedom. When performing multiple comparisons by one-way ANOVA, the F-statistic is normally used. It is defined as the ratio of the mean square due to the variability between groups to the mean square due to the variability within groups. The critical value of F is read from tables of the F-distribution, knowing the Type I error rate and the degrees of freedom between and within the groups.
Confidence and Precision
The confidence level of a confidence interval is an assessment of how confident we are that the true
population mean is within the interval.
The precision of the interval is given by its width (the difference between the upper and lower endpoint).
Wide intervals do not provide us with very precise information about the location of the true population
mean. Short intervals provide us with very precise information about the location of the population mean.
If the sample size n remains the same:
Increasing the confidence level of an interval decreases precision
Decreasing the confidence level of an interval increases its precision
Generally confidence levels are chosen to be between about 90% and 99%. These confidence levels usually
provide reasonable precision and confidence.
Decision Errors in Test of Hypothesis
In statistical hypothesis testing, there are two types of errors that can be made or incorrect conclusions that
can be drawn. If a null hypothesis is incorrectly rejected when it is in fact true, this is called a Type I error
(also known as a false positive). A Type II error (also known as a false negative) occurs when a null
hypothesis is not rejected despite being false. The Greek letter 𝛼 is used to denote the probability of type I
error, and the letter 𝛽 is used to denote the probability of type II error.
Type I error
Type I error, also known as an "error of the first kind", an 𝛼 error, or a "false positive": the error of
rejecting a null hypothesis when it is actually true. Plainly speaking, it occurs when we are observing a
difference when in truth there is none, thus indicating a test of poor specificity. An example of this would
be if a test shows that a woman is pregnant when in reality she is not, or telling a patient he is sick when in
fact he is not. Type I error can be viewed as the error of excessive credulity. In other words, a Type I error
indicates "A Positive Assumption is False"
Types of error

                               Accept (Fail to Reject) H0        Reject H0                         Sum
H0 true                        Correct decision (1 - α)          Wrong decision: Type I error (α)  1.00
(Null hypothesis is true)      = Confidence level                = False positive
H0 false                       Wrong decision: Type II error (β) Correct decision (1 - β)          1.00
(Alternative hypothesis true)  = False negative                  = Power of the test
A type I error is often considered to be more serious, and therefore more important to avoid, than a type II
error. The hypothesis test procedure is therefore adjusted so that there is a guaranteed 'low' probability of
rejecting the null hypothesis wrongly; this probability is never 0. This probability of a type I error can be
precisely computed as,
P(type I error) = significance level = 𝛼.
Type II error
Type II error, also known as an "error of the second kind", a 𝛽 error, or a "false negative": the error of
failing to reject a null hypothesis when in fact we should have rejected it. In other words, this is the error of
failing to observe a difference when in truth there is one, thus indicating a test of poor sensitivity. An
example of this would be if a test shows that a woman is not pregnant, when in reality, she is. Type II error
can be viewed as the error of excessive skepticism. In other words, a Type II error indicates "A Negative
assumption is False".
The probability of a type II error is generally unknown, but is symbolized by 𝛽 and written,
P(type II error) = 𝛽
False positive rate
The false positive rate is the proportion of absent events that yield positive test outcomes, i.e., the
conditional probability of a positive test result given an absent event.
The false positive rate is equal to the significance level. The specificity of the test is equal to 1
minus the false positive rate.
In statistical hypothesis testing, this fraction is given the Greek letter 𝛼, and 1 − 𝛼 is defined as the specificity of the test. Increasing the specificity of the test lowers the probability of type I errors, but raises the probability of type II errors (false negatives: failing to reject the null hypothesis when the alternative is true).
False negative rate
The false negative rate is the proportion of present events that yield negative test outcomes, i.e., the
conditional probability of a negative test result given present event.
In statistical hypothesis testing, this fraction is given the letter 𝛽. The power (or the sensitivity) of
the test is equal to 1 minus 𝛽.
Bayes' theorem
The probability that an observed positive result is a false positive (as contrasted with an observed
positive result being a true positive) may be calculated using Bayes' theorem.
The key concept of Bayes' theorem is that the true rates of false positives and false negatives are
not a function of the accuracy of the test alone, but also the actual rate or frequency of occurrence
within the test population; and, often, the more powerful issue is the actual rates of the condition
within the sample being tested.
In summary:
Rejecting a null-hypothesis when it should not have been rejected creates a type I error.
Failing to reject a null-hypothesis when it should have been rejected creates a type II error.
(In either case, a wrong decision or error in judgment has occurred.)
Decision rules (or tests of hypotheses), in order to be good, must be designed to minimize errors of
decision.
Minimizing errors of decision is not a simple issue; for any given sample size, the effort to reduce one type of error generally results in an increase in the other type of error.
Based on the real-life application of the error, one type may be more serious than the other.
(In such cases, a compromise should be reached in favor of limiting the more serious type of error.)
The only way to minimize both types of error is to increase the sample size, and this may or may
not be feasible.
Significance Level
The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the
null hypothesis H0, if it is in fact true.
It is the probability of a type I error and is set by the investigator in relation to the consequences of
such an error. That is, we want to make the significance level as small as possible in order to
protect the null hypothesis and to prevent, as far as possible, the investigator from inadvertently
making false claims.
The significance level is usually denoted by 𝛼.
Significance Level = P(type I error) = 𝛼
Usually, the significance level is chosen to be 0.05 (or equivalently, 5%).
One-Tailed and Two-Tailed Tests
A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling
distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the
mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than
10. The region of rejection would consist of a range of numbers located on the right side of
sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling
distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the
mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater
than 10. The region of rejection would consist of a range of numbers located on both sides of
sampling distribution; that is, the region of rejection would consist partly of numbers that were less
than 10 and partly of numbers that were greater than 10.
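The difference between the two rejection regions shows up in the critical values: a one-tailed test puts all of α in one tail, while a two-tailed test splits it between the tails. A quick sketch with Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05
nd = NormalDist()

# One-tailed (right-tail) test: all of alpha sits in the upper tail.
z_one = nd.inv_cdf(1 - alpha)      # ~1.645; reject if z > z_one

# Two-tailed test: alpha is split between the two tails.
z_two = nd.inv_cdf(1 - alpha / 2)  # ~1.960; reject if |z| > z_two
```

A statistic of z = 1.8 would therefore be rejected by the one-tailed test but not by the two-tailed test at the same α.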
Steps in Hypothesis Testing
All hypothesis tests are conducted the same way. The researcher states a hypothesis to be tested, formulates an analysis plan, analyzes sample data according to the plan, and accepts or rejects the null hypothesis based on the results of the analysis.
1. Stating the Management Question: The first step is to state the management problem in terms of a question that identifies the population(s) of interest to the researcher, the parameter(s) of the variable under investigation, and the hypothesized value of the parameter(s). This step makes the researcher define not only what is to be tested but also what variable will be used in sample data collection.
2. State the Hypotheses: Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false, and vice versa.
3. Level of Significance: Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10, but any value between 0 and 1 can be used.
4. Choosing the Test Method: Typically, the test method involves a test statistic and a sampling distribution. Computed from sample data, the test statistic might be a mean, a proportion, a difference between means, a difference between proportions, a z-test, t-test, chi-square test, etc. Given a test statistic and its sampling distribution, a researcher can assess probabilities associated with the test statistic. If the test statistic probability is less than the significance level, the null hypothesis is rejected.
5. Calculate the Test Statistic: The fifth step is to calculate a statistic analogous to the parameter specified by the null hypothesis. If the null hypothesis is defined by the parameter µ, then the statistics computed on our data set would be the mean (x̄) and the standard deviation (s). Mathematically,

   Test Statistic = (Sample Statistic − Hypothesized Population Parameter) / (Standard Error of the Sample Statistic)

   Standard error between x̄ and µ: σx̄ = SD/√n = σ/√n

6. Comparison between the Calculated Value and the Theoretical (Table) Value: Compare the observed value of the statistic to the critical value.
7. Make a Decision: If the test statistic falls in the critical region, reject H0 in favor of H1. If the test statistic does not fall in the critical region, conclude that there is not enough evidence to reject H0.
8. Provide an Answer to the Management Question: The final step is to describe the results and state correct statistical conclusions in a way that is understandable to the management. The conclusion consists of two statements: one describing the result for the null hypothesis and the other describing the result for the alternative hypothesis. The first statement should state whether we accepted or rejected the null hypothesis and for what value of alpha or p-value for our test statistic. The second statement should answer the research question proposed in step 1, stating the sample statistic collected which estimated the parameter we hypothesized.
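The steps above can be walked through end to end; every number in this sketch (the hypothesized mean, the sample summary, and α) is assumed for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-3: hypothetical question "Is the mean different from 0.1?",
# so H0: mu = 0.1 vs H1: mu != 0.1 (two-tailed), at alpha = 0.05.
mu0, alpha = 0.1, 0.05

# Step 4: a z-test applies (assumed n > 30 with sigma treated as known).
n, x_bar, sigma = 49, 0.16, 0.21

# Step 5: test statistic = (sample statistic - hypothesized parameter) / SE.
se = sigma / sqrt(n)       # standard error of the sample mean, 0.03
z = (x_bar - mu0) / se     # z = 0.06 / 0.03 = 2.0

# Steps 6-7: compare with the critical value and decide.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
decision = "Reject H0" if abs(z) > z_crit else "Do not reject H0"

# Step 8: report the decision and the sample estimate (x_bar) to management.
```

Here 2.0 > 1.96, so the sketch lands in the critical region and H0 is rejected at the 5% level.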
Terminology of Hypothesis Test
All hypothesis tests share the same basic terminology and structure.
A null hypothesis is an assertion about a population that you would like to test. It is "null" in the
sense that it often represents a status quo belief, such as the absence of a characteristic or the lack of
an effect. It may be formalized by asserting that a population parameter, or a combination of
population parameters, has a certain value. In the example given in the Introduction, the null
hypothesis would be that the average price of gas across the state was $1.15. This is written
H0: µ = 1.15.
An alternative hypothesis is a contrasting assertion about the population that can be tested against
the null hypothesis. In the example given in the Introduction, possible alternative hypotheses are:
H1: µ ≠ 1.15 — State average was different from $1.15 (two-tailed test)
H1: µ > 1.15 — State average was greater than $1.15 (right-tail test)
H1: µ < 1.15 — State average was less than $1.15 (left-tail test)
To conduct a hypothesis test, a random sample from the population is collected and a relevant test
statistic is computed to summarize the sample. This statistic varies with the type of test, but its
distribution under the null hypothesis must be known (or assumed).
The p value of a test is the probability, under the null hypothesis, of obtaining a value of the test
statistic as extreme or more extreme than the value computed from the sample.
The significance level of a test is a threshold of probability α agreed to before the test is conducted.
A typical value of α is 0.05. If the p value of a test is less than α, the test rejects the null hypothesis.
If the p value is greater than α, there is insufficient evidence to reject the null hypothesis. Note that
lack of evidence for rejecting the null hypothesis is not evidence for accepting the null hypothesis.
Also note that substantive "significance" of an alternative cannot be inferred from the statistical
significance of a test.
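The p value rule described above, for a two-tailed z-test with an assumed observed statistic, looks like this:

```python
from statistics import NormalDist

# Assumed observed test statistic for a two-tailed z-test.
z_obs = 2.0

# p value: probability, under H0, of a statistic at least as extreme as the
# one observed; both tails count as "extreme" in a two-tailed test.
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))   # ~0.0455

alpha = 0.05
reject = p_value < alpha   # 0.0455 < 0.05, so the test rejects H0
```

Note the asymmetry stressed in the text: `reject` being `False` would not be evidence *for* H0, only a lack of evidence against it.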
The significance level α can be interpreted as the probability of rejecting the null hypothesis when it
is actually true—a type I error. The distribution of the test statistic under the null hypothesis
determines the probability α of a type I error. Even if the null hypothesis is not rejected, it may still
be false—a type II error. The distribution of the test statistic under the alternative hypothesis
determines the probability β of a type II error. Type II errors are often due to small sample sizes.
The power of a test, 1 – β, is the probability of correctly rejecting a false null hypothesis.
Results of hypothesis tests are often communicated with a confidence interval. A confidence
interval is an estimated range of values with a specified probability of containing the true population
value of a parameter. Upper and lower bounds for confidence intervals are computed from the
sample estimate of the parameter and the known (or assumed) sampling distribution of the
estimator. A typical assumption is that estimates will be normally distributed with repeated
sampling (as dictated by the Central Limit Theorem). Wider confidence intervals correspond to
poor estimates (smaller samples); narrow intervals correspond to better estimates (larger samples).
If the null hypothesis asserts the value of a population parameter, the test rejects the null hypothesis
when the hypothesized value lies outside the computed confidence interval for the parameter.
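Using the gas-price null hypothesis H0: µ = 1.15 from the example above, with assumed sample numbers, the confidence-interval version of the test can be sketched as:

```python
from math import sqrt
from statistics import NormalDist

# Assumed sample summary for the gas-price example (H0: mu = 1.15).
x_bar, sigma, n = 1.22, 0.25, 100

z = NormalDist().inv_cdf(0.975)                 # ~1.96 for 95% confidence
half_width = z * sigma / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)   # ~ (1.171, 1.269)

# The test rejects H0 exactly when the hypothesized value lies outside the CI.
mu0 = 1.15
reject = not (ci[0] <= mu0 <= ci[1])            # 1.15 < 1.171, so rejected
```

A larger n would shrink `half_width`, giving the narrower, more precise interval described in the text.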
Chi-square test
A chi-square test (also chi squared test or 𝜒 2 test) is any statistical hypothesis test in which the
sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is
true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null
hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by
making the sample size large enough. The chi-square test is also known as Pearson's chi-square test.
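Pearson's chi-square statistic can be computed by hand; the observed counts below are hypothetical, and 5.991 is the standard table value for df = 2 at α = 0.05:

```python
# Sketch: chi-square goodness-of-fit for hypothetical observed counts against
# equal expected counts (H0: the three categories are equally likely).
observed = [48, 35, 37]
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # 40 per category

# Pearson's statistic: sum of (O - E)^2 / E over the categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value from a chi-square table: df = 3 - 1 = 2, alpha = 0.05.
reject = chi2 > 5.991
```

Here the statistic is 2.45, below the critical value, so the equal-proportions hypothesis is not rejected.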
Definitions of other symbols:
α = probability of a Type I error (rejecting a null hypothesis when it is in fact true)
n = sample size; n1 = sample 1 size; n2 = sample 2 size
x̄ = sample mean
µ0 = hypothesized population mean; µ1 = population 1 mean; µ2 = population 2 mean
σ = population standard deviation; σ² = population variance
s = sample standard deviation; s² = sample variance
s1 = sample 1 standard deviation; s2 = sample 2 standard deviation
t = t statistic; df = degrees of freedom
d̄ = sample mean of differences; d0 = hypothesized population mean difference
sd = standard deviation of differences
p̂ = x/n = sample proportion, unless specified otherwise
p0 = hypothesized population proportion; p1 = proportion 1; p2 = proportion 2
dp = hypothesized difference in proportion
min{n1, n2} = minimum of n1 and n2
x1 = n1·p1; x2 = n2·p2
χ² = chi-squared statistic; F = F statistic

Name: One-sample z-test
Formula: z = (x̄ − µ0) / (σ/√n)
Assumptions or notes: (Normal population or n > 30) and σ known. (z is the distance from the mean in relation to the standard deviation of the mean.) For non-normal distributions it is possible to calculate a minimum proportion of a population that falls within k standard deviations for any k (see: Chebyshev's inequality).

Name: Two-sample z-test
Formula: z = (x̄1 − x̄2 − d0) / √(σ1²/n1 + σ2²/n2)
Assumptions or notes: Normal population, independent observations, and σ1 and σ2 are known.

Name: Two-sample pooled t-test, equal variances
Formula: t = (x̄1 − x̄2 − d0) / (Sp·√(1/n1 + 1/n2)),
  where Sp = √[((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)] and df = n1 + n2 − 2.
Assumptions or notes: (Normal populations or n1 + n2 > 40), independent observations, σ1 = σ2, and σ1 and σ2 unknown.
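The pooled two-sample t formula above translates directly into code; the two samples below are made-up numbers used only to exercise the formula:

```python
from math import sqrt

def pooled_t(x1, x2, d0=0.0):
    """Two-sample pooled t statistic (equal variances) and its df."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    # Unbiased sample variances S1^2 and S2^2.
    s1sq = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    s2sq = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    # Pooled standard deviation Sp, as in the formula above.
    sp = sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2))
    t = (m1 - m2 - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical data: two independent samples of size 8 each.
t, df = pooled_t([5, 7, 5, 3, 5, 3, 3, 9], [8, 1, 4, 6, 6, 4, 1, 2])
```

The resulting t would then be compared against the t-table critical value for `df` degrees of freedom at the chosen α.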
Website: http://jagannath.academia.edu/jony007ex ; Email: jony007ex@gmail.com ; Phone:+88 01198150195