FIN 640 - Lecture Notes 4 - Sampling and Estimation

This document discusses the topics of Week 4 in a statistics course: 1. It introduces different sampling methods like simple random sampling, systematic sampling, and stratified random sampling. 2. It covers the distribution of the sample mean and the central limit theorem. It also discusses point estimates, interval estimates, and how to calculate confidence intervals for the population mean. 3. It discusses potential sources of sampling bias like data-mining bias and sample selection bias. Maintaining a representative sample is important for drawing accurate statistical conclusions about a population.


Week 4: Sampling and Estimation

PROF. MICHAEL DONG


CALIFORNIA STATE UNIVERSITY LONG BEACH

FALL 2020
Road Map

1. Sampling

2. Distribution of the Sample Mean

3. Point and Interval Estimates

4. Sampling Bias
1. Sampling

Simple Random Sampling
 When an analyst chooses to sample, he must formulate a sampling
plan. A sampling plan is the set of rules used to select a sample. The
basic type of sample from which we can draw statistically sound
conclusions about a population is the simple random sample
(random sample, for short).
 A simple random sample is a subset of a larger population created
in such a way that each element of the population has an equal
probability of being selected to the subset.
 The procedure of drawing a sample to satisfy the definition of a
simple random sample is called simple random sampling.
Systematic sampling
 With systematic sampling, we select every kth member until we
have a sample of the desired size. The sample that results from this
procedure should be approximately random; real sampling
situations may require that we take an approximately random
sample.
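
 A minimal sketch of systematic sampling (the population and sample size here are hypothetical):

```python
import random

def systematic_sample(population, n):
    """Pick a random starting point, then select every kth member."""
    k = len(population) // n      # sampling interval
    start = random.randrange(k)   # random start within the first interval
    return [population[start + i * k] for i in range(n)]

population = list(range(50))      # toy population of 50 members
print(systematic_sample(population, 5))
```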
Sampling Error
 Sampling error is the difference between the observed value of a
statistic and the quantity it is intended to estimate.
Sampling distribution
 The sampling distribution of a statistic is the distribution of all the
distinct possible values that the statistic can assume when
computed from samples of the same size randomly drawn from the
same population.

 For a population of size 10, from which we pick a sample of 4, how
many possible samples are there?
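
 The answer is the number of combinations of 4 items chosen from 10, C(10, 4) = 210. A one-line check with the standard library:

```python
from math import comb

# Distinct samples of size 4 from a population of 10, ignoring order
print(comb(10, 4))  # 210
```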
Stratified Random Sampling
 In stratified random sampling, the population is divided into
subpopulations (strata) based on one or more classification criteria.
Simple random samples are then drawn from each stratum in sizes
proportional to the relative size of each stratum in the population.
These samples are then pooled to form a stratified random
sample.
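
 A minimal sketch of proportional stratified sampling, assuming the strata are given as a dict of member lists (the names and sizes are made up):

```python
import random

def stratified_sample(strata, n):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population, then pool the samples."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n_stratum = round(n * len(members) / total)  # proportional allocation
        sample.extend(random.sample(members, n_stratum))
    return sample

strata = {"government": list(range(60)), "corporate": list(range(60, 100))}
print(stratified_sample(strata, 10))  # about 6 government + 4 corporate members
```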
Stratified Random Sampling – Bond
Indexing
Time-Series and Cross-Sectional Data
 Cross-sectional data are data on some characteristic of individuals,
groups, geographical regions, or companies at a single point in
time.
 Time-series data are data on some characteristic of an individual,
group, geographical region, or company over time. These are also
called longitudinal data.
 If both dimensions exist, we call it Panel Data. Panel data consist of
observations through time on a single characteristic of multiple
observational units.
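
 As an illustration (a minimal sketch with made-up tickers and returns, using pandas): slicing a panel at one month gives a cross-section, and slicing it at one firm gives a time series:

```python
import pandas as pd

# Panel data: one characteristic (return) for several firms over several months
panel = pd.DataFrame(
    {"ret": [0.01, 0.02, -0.01, 0.03, 0.00, 0.01]},
    index=pd.MultiIndex.from_product(
        [["IBM", "AAPL", "MSFT"], ["2020-01", "2020-02"]],
        names=["firm", "month"],
    ),
)
print(panel.xs("2020-01", level="month"))  # cross-section: all firms, one month
print(panel.xs("IBM", level="firm"))       # time series: one firm, all months
```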
2. Distribution of the Sample Mean

The Central Limit Theorem
 Given a population with any probability distribution having mean μ
and finite variance σ2, the sampling distribution of the sample mean
X computed from samples of size n will be approximately normal,
with mean μ and variance σ2/n, as the sample size n becomes large.
 Let's recall the example we used in estimating mean stock returns
for IBM.
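
 A minimal simulation sketch of the theorem (the skewed population and sample size are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 10_000

# Skewed exponential population: mean = 1, variance = 1
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())  # close to the population mean, 1
print(sample_means.var())   # close to sigma^2/n = 1/50 = 0.02
```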
Standard Error of the Sample Mean
 For sample mean X calculated from a sample generated by a
population with standard deviation σ, the standard error of the
sample mean is given by one of two expressions.
 When we know σ, the population standard deviation:

  Standard error of X = σ/√n   (Equation 1)

 When we do not know the population standard deviation and need
to use the sample standard deviation, s, to estimate it:

  Standard error of X = s/√n   (Equation 2)
What we do in practice in estimating standard
errors
 In practice, we almost always need to use Equation 2. The estimate
of s is given by the square root of the sample variance, s2,
calculated as follows:

  s2 = Σ(Xi − X)2/(n − 1), summing over the n observations   (Equation 3)
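
 A minimal sketch of Equations 2 and 3 on a made-up return sample:

```python
import math

returns = [0.04, -0.02, 0.01, 0.03, -0.01, 0.02]  # hypothetical sample
n = len(returns)
xbar = sum(returns) / n

s2 = sum((x - xbar) ** 2 for x in returns) / (n - 1)  # sample variance (Equation 3)
se = math.sqrt(s2) / math.sqrt(n)                     # standard error (Equation 2)
print(xbar, se)
```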

 We will soon see how we can use the sample mean and its
standard error to make probability statements about the
population mean by using the technique of confidence intervals.
3. Point and Interval Estimates

We care most about the population mean
 So we use estimators calculated from the sample to estimate the
population mean.
 The formulas that we use to compute the sample mean and all the
other sample statistics are examples of estimation formulas or
estimators.
 The particular value that we calculate from sample observations
using an estimator is called an estimate.
Point Estimate
 To take the example of the mean, the calculated value of the
sample mean in a given sample, used as an estimate of the
population mean, is called a point estimate of the population
mean.
 In many applications, we have a choice among a number of
possible estimators for estimating a given parameter. How do we
make our choice? We often select estimators because they have
one or more desirable statistical properties. Following is a brief
description of three desirable properties of estimators:
unbiasedness (lack of bias), efficiency, and consistency.
Unbiasedness
 An unbiased estimator is one whose expected value (the mean of its
sampling distribution) equals the parameter it is intended to estimate.
 For example, the expected value of the sample mean, X , equals μ, the
population mean, so we say that the sample mean is an unbiased estimator
(of the population mean).
 The sample variance, s2, which is calculated using a divisor of n − 1 (Equation
3), is an unbiased estimator of the population variance, σ2. If we were to
calculate the sample variance using a divisor of n, the estimator would be
biased: its expected value would be smaller than the population variance. We
would say that sample variance calculated with a divisor of n is a biased
estimator of the population variance.
Unbiasedness
 Sample mean and sample variance are both unbiased estimators of
the population mean and variance.
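
 A quick simulation sketch (arbitrary standard normal population, samples of size 5) contrasting the two divisors:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(loc=0.0, scale=1.0, size=(100_000, 5))  # true sigma^2 = 1

print(samples.var(axis=1, ddof=1).mean())  # n-1 divisor: close to 1.0 (unbiased)
print(samples.var(axis=1, ddof=0).mean())  # n divisor: close to 0.8 (biased low)
```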
Consistency
 A consistent estimator is one for which the probability of estimates
close to the value of the population parameter increases as sample
size increases.
Consistency
 Law of Large Numbers (LLN)
 The weak law of large numbers (also called Khinchin's law) states
that the sample average converges in probability towards the
expected value.

 The strong law of large numbers states that the sample average
converges almost surely to the expected value.
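
 A minimal sketch of the LLN with an arbitrary fair-coin population (expected value 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
flips = rng.integers(0, 2, size=1_000_000)  # fair coin: 0 or 1, expected value 0.5

for n in (10, 1_000, 1_000_000):
    print(n, flips[:n].mean())  # the running sample average settles near 0.5
```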
Efficiency
 An unbiased estimator is efficient if no other unbiased estimator of
the same parameter has a sampling distribution with smaller
variance.
 Sample mean X is an efficient estimator of the population mean;
sample variance s2 is an efficient estimator of σ2.
Interval Estimate
 Confidence Interval:
 A confidence interval is a range for which one can assert with a given
probability 1 − α, called the degree of confidence, that it will contain the
parameter it is intended to estimate. This interval is often referred to as
the 100(1 − α)% confidence interval for the parameter.
 The endpoints of a confidence interval are referred to as the lower and
upper confidence limits.
Confidence interval estimate
 A 100(1 − α)% confidence interval for a parameter has the
following structure:

  Point estimate ± Reliability factor × Standard error
Confidence interval for the Population Mean
(with Known Pop. Variance)
 A 100(1 − α)% confidence interval for population mean μ when we are sampling from
a normal distribution with known variance σ2 is given by

  X ± zα/2 × σ/√n

 zα is the value that makes P(Z > zα) = α. We use the following reliability factors
when we construct confidence intervals based on the standard normal distribution:
 90 percent confidence intervals: use z0.05 = 1.65
 95 percent confidence intervals: use z0.025 = 1.96
 99 percent confidence intervals: use z0.005 = 2.58
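
 A sketch with made-up numbers: a 95 percent confidence interval when σ is known:

```python
import math

xbar, sigma, n = 0.05, 0.20, 64  # hypothetical sample mean, known sigma, sample size
z = 1.96                         # reliability factor for 95 percent confidence

half_width = z * sigma / math.sqrt(n)
print(xbar - half_width, xbar + half_width)  # (0.001, 0.099)
```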

Confidence interval for the Population Mean
(with Unknown Pop. Variance)
 If we are sampling from a population with unknown variance, then a 100(1 −
α)% confidence interval for the population mean μ is given by

  X ± tα/2 × s/√n

 tα is the value that makes P(t > tα) = α

 For a sample of size n, the t distribution will have n − 1 degrees of freedom,
denoted t(n − 1)
Example
 Suppose an investment analyst takes a random sample of US equity
mutual funds and calculates the average Sharpe ratio. The sample
size is 100, and the average Sharpe ratio is 0.45. The sample has a
standard deviation of 0.30.
 Calculate and interpret the 90 percent confidence interval for the
population mean of all US equity mutual funds.
 Recognizing that the population variance of the distribution of
Sharpe ratios is unknown, the analyst decides to calculate the
confidence interval using the theoretically correct t-statistic.
Example
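
 A worked sketch of the computation (scipy used here for the t quantile):

```python
import math
from scipy import stats

n, xbar, s = 100, 0.45, 0.30
se = s / math.sqrt(n)            # standard error = 0.30/10 = 0.03
t = stats.t.ppf(0.95, df=n - 1)  # t_{0.05}(99), roughly 1.66

print(xbar - t * se, xbar + t * se)  # roughly (0.400, 0.500)
```

 Interpretation: we can be 90 percent confident that the interval from about
0.40 to 0.50 contains the population mean Sharpe ratio of US equity mutual
funds.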
Selection of Sample Size

 Because the standard error shrinks as n grows, the larger the sample size,
the greater the precision with which we can estimate the population
parameter.

 This might explain why we sometimes fail to observe statistically
significant results: the sample may simply be too small.
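
 A sketch of how the width of a 95 percent confidence interval falls as 1/√n (the numbers are hypothetical):

```python
import math

s, z = 0.30, 1.96  # hypothetical sample standard deviation, 95 percent factor
for n in (25, 100, 400, 1600):
    print(n, 2 * z * s / math.sqrt(n))  # quadrupling n halves the width
```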
4. Sampling Bias

Data-Mining Bias
 Data-mining is the practice of determining a model by extensive
searching through a dataset for statistically significant patterns
(that is, repeatedly “drilling” in the same data until finding
something that appears to work).
 One safeguard is an out-of-sample test: check whether a pattern
found in one dataset also holds in data that were not used to
discover it.
 If we were to just report the significant variables, without also
reporting the total number of variables that we tested that were
unsuccessful as predictors, we would be presenting a very
misleading picture of our findings. – Asset Pricing Tests
Sample Selection Bias
 When data availability leads to certain assets being excluded from
the analysis, we call the resulting problem sample selection bias.
 Many databases cover only funds or companies that currently exist;
those that are no longer in business do not appear there. So, a study
that uses these types of databases suffers from a type of sample
selection bias known as survivorship bias.
Look-ahead Bias
 A test design is subject to look-ahead bias if it uses information that
was not available on the test date.
 For example, tests of trading rules that use stock market returns
and accounting balance sheet data must account for look-ahead
bias. In such tests, a company's book value per share is commonly
used to construct the P/B variable. Although the market price of a
stock is available to all market participants at the same point in
time, fiscal year-end book equity per share might not become
publicly available until sometime in the following quarter.
Time-Period Bias
 A test design is subject to time-period bias if it is based on a time
period that may make the results time-period specific.
Q&A
