THEORY OF
            ESTIMATION
    Estimation Of
      Point,
      Interval and
      Sample Size.



1                        9/3/2012
INTRODUCTION:
    • Estimation theory is a procedure for “guessing” properties of
      the population from which data are collected.

    • That is, the objective of estimation is to determine the
      approximate value of a population parameter on the basis of a
      sample statistic.

    • An estimator is a rule, usually a formula, that tells you how to
      calculate the estimate based on the sample.

PROPERTIES OF GOOD ESTIMATORS
    • Unbiased: the expected value (long-run average) of the estimator
      equals the parameter to be estimated.
    • Minimum variance: of all the unbiased estimators, the best
      estimator is the one whose sampling distribution has the
      smallest standard error.

TOPICS TO BE DISCUSSED:
    • Point Estimate: a point estimate is a one-number summary of the
      data.
    • Interval Estimate: two numbers are calculated to create an
      interval within which the parameter is expected to lie.
    • For example, suppose we want to estimate the mean summer income
      of a class of business students.
    • Point Estimate:
        • For n = 25 students, the sample mean x̄ is calculated to be
          400 $/week.
    • Interval Estimate:
        • An alternative statement is: the mean income is between 380
          and 420 $/week.

Sample Size
    • "Sample size" is the number of units drawn from a population
      that will be evaluated as representing the entire population,
      and from which statistics will be derived.
    • The sample size is an important feature of any empirical study
      in which the goal is to make inferences about a population from
      a sample.
    • In practice, the sample size used in a study is determined based
      on the expense of data collection and the need to have
      sufficient statistical power.
    • The larger the sample, the closer the sample statistics tend to
      be to the corresponding population values.
    • Too large a sample is unethical, because it is wasteful.
    • Too small a sample is unethical, because the outcome will be
      indecisive.
    • If you get significance and you’re wrong, it’s a false positive,
      or Type I statistical error.
    • If you get non-significance and you’re wrong, it’s a false
      negative, or Type II statistical error.

Factors That Influence Sample Size

    • The "right" sample size for a particular
        application depends on many factors, including
        the following:
    •   Cost considerations (e.g., maximum budget,
        desire to minimize cost).
    •   Administrative concerns (e.g., complexity of
        the design, research deadlines).
    •   Minimum acceptable level of precision.
    •   Confidence level.
    •   Variability within the population or
        subpopulation (e.g., stratum, cluster) of
        interest.
    •   Sampling method.
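
As a rough illustration of how precision, confidence level and variability interact, the sketch below uses the standard large-sample formula n = (z_c σ / E)² for estimating a mean to within a margin E. The function name, the planning value σ = 50 and the margins are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def sample_size_for_mean(sigma, margin, z_c=1.96):
    """Smallest n for which z_c * sigma / sqrt(n) is at most `margin`.

    sigma  : assumed population standard deviation (a planning value)
    margin : desired half-width E of the confidence interval
    z_c    : critical value for the chosen confidence level (1.96 for 95%)
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return math.ceil((z_c * sigma / margin) ** 2)


# Hypothetical planning values: sigma = 50 $/week, margins of 20 and 10 $/week.
print(sample_size_for_mean(sigma=50, margin=20))
print(sample_size_for_mean(sigma=50, margin=10))
```

With these assumed inputs the two calls give 25 and 97, so halving the margin roughly quadruples the required sample size.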
Ex:
    • In a survey that uses stratified sampling, there would be a
      different sample size for each sub-population (stratum). In a
      census, data are collected on the entire population, hence the
      sample size is equal to the population size.
Stratified sample size
    • With more complicated sampling techniques, such as stratified
      sampling, the sample can often be split up into sub-samples.
    • Typically, if there are k such sub-samples (from k different
      strata), then each of them will have a sample size n_i,
      i = 1, 2, ..., k. These n_i must conform to the rule that
      $n_1 + n_2 + \dots + n_k = n$ (i.e. the sub-sample sizes add up
      to the total sample size n).

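
One way to obtain sub-sample sizes that satisfy this rule is proportional allocation, sketched below. The slides do not say how the n_i are chosen, so the allocation rule, the function name and the stratum sizes are all assumptions made for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Fractional shares are rounded down, and the leftover units go to the
    strata with the largest fractional parts, so the result always sums to n.
    """
    total = sum(stratum_sizes)
    exact = [n * size / total for size in stratum_sizes]
    alloc = [int(share) for share in exact]
    leftover = n - sum(alloc)
    by_fraction = sorted(range(len(exact)),
                         key=lambda i: exact[i] - alloc[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical strata of 120, 80 and 50 units, total sample size n = 30.
sizes = proportional_allocation([120, 80, 50], 30)
print(sizes, sum(sizes))
```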
POINT ESTIMATION:
    • A single number is calculated to estimate the parameter.
    • A point estimate is obtained by selecting a suitable statistic
      and computing its value from the given sample data. The selected
      statistic is called the point estimator of θ.
    • A point estimate of an unknown parameter θ is a statistic θ̂ that
      represents a “guess” at the value of θ.
    • Parameters
        • In statistical inference, the term parameter is used to
          denote a quantity, say θ, that is a property of an unknown
          probability distribution.
        • Parameters are unknown, and one of the goals of statistical
          inference is to estimate them.
    • Example (Machine breakdowns)
        • Estimating P(machine breakdown due to operator misuse).
    • Some general concepts of point estimation:
        • Unbiasedness.
        • Principle of minimum variance.
    • Methods of point estimation:
        • Maximum likelihood estimation.
        • The method of moments.

Point Estimator Of Population Mean
    • A point estimate of the population mean μ is the sample mean

          $\bar{x} = \frac{\sum x_i}{n}$

    • A sample of weights of 34 male freshman students was obtained:

          185  161  174  175  202  178  202  139  177
          170  151  176  197  214  283  184  189  168
          188  170  207  180  167  177  166  231  176
          184  179  155  148  180  194  176

    • If one wanted to estimate the true mean weight of all male
      freshman students, one might use the sample mean as a point
      estimate of the true mean:

          sample mean = x̄ = 182.44

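
The sample mean quoted for the 34 weights can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Weights of the 34 male freshman students, as listed on the slide.
weights = [
    185, 161, 174, 175, 202, 178, 202, 139, 177,
    170, 151, 176, 197, 214, 283, 184, 189, 168,
    188, 170, 207, 180, 167, 177, 166, 231, 176,
    184, 179, 155, 148, 180, 194, 176,
]

n = len(weights)           # sample size
x_bar = sum(weights) / n   # point estimate of the population mean

print(n, round(x_bar, 2))
```

The weights sum to 6203, and 6203 / 34 rounds to 182.44, which agrees with the slide.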
BIASED & UNBIASED
    • A point estimate θ̂ for a parameter θ is said to be unbiased if

          $E(\hat{\theta}) = \theta$

    • If this equality does not hold, θ̂ is said to be a biased
      estimator of θ, with

          $\text{bias} = E(\hat{\theta}) - \theta$

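
To make the definition of bias concrete, the sketch below enumerates every equally likely sample of size 2 drawn with replacement from a toy population and compares two variance estimators, one dividing by n − 1 and one dividing by n. The toy population and the choice of estimators are assumptions for illustration and are not taken from the slides.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # toy population (an assumption for illustration)
n = 2                    # sample size


def mean(values):
    values = list(values)
    return sum(values) / Fraction(len(values))


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / Fraction(divisor)


# Parameter being estimated: the population variance (divisor = population size).
true_variance = variance(population, len(population))

# All 9 samples of size 2 drawn with replacement are equally likely, so the
# expected value of an estimator is its plain average over these samples.
samples = list(product(population, repeat=n))
expected_n_minus_1 = mean(variance(s, n - 1) for s in samples)
expected_n = mean(variance(s, n) for s in samples)

print("population variance:  ", true_variance)
print("bias with divisor n-1:", expected_n_minus_1 - true_variance)
print("bias with divisor n:  ", expected_n - true_variance)
```

Exact fractions are used so that the zero bias of the n − 1 version is not blurred by floating-point rounding.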
Variance of a Point Estimator
    [Figure: the sampling distributions of two unbiased estimators.]
    • Of all the unbiased estimators, we prefer the estimator whose
      sampling distribution has the smallest spread or variability.

INTERVAL ESTIMATES
    • An estimate of a population parameter given by two numbers
      between which the parameter may be considered to lie is called
      an interval estimate of the parameter.
    • E.g., if we say that a distance is 5.28 feet, we are giving a
      point estimate. If, on the other hand, we say that the distance
      is 5.28 ± 0.03 feet, i.e., the distance lies between 5.25 and
      5.31 feet, we are giving an interval estimate.
    • A statement of the error or precision of an estimate is often
      called its reliability.

CONFIDENCE INTERVAL ESTIMATES OF POPULATION PARAMETERS
    • Let μ_S and σ_S be the mean and standard deviation of the
      sampling distribution of a statistic S.
    • Then, if the sampling distribution of S is approximately normal,
      we can expect to find S lying in the intervals μ_S − σ_S to
      μ_S + σ_S, μ_S − 2σ_S to μ_S + 2σ_S, or μ_S − 3σ_S to μ_S + 3σ_S
      about 68.27%, 95.45%, and 99.73% of the time, respectively.
    • Equivalently, we can be confident of finding μ_S in the
      intervals S − σ_S to S + σ_S, S − 2σ_S to S + 2σ_S, or S − 3σ_S
      to S + 3σ_S about 68.27%, 95.45%, and 99.73% of the time,
      respectively.

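
The three percentages are areas under the standard normal curve within 1, 2 and 3 standard deviations of the mean. A minimal sketch using only the Python standard library reproduces them; the function name is illustrative.

```python
import math


def normal_coverage(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * normal_coverage(k):.2f}%")
```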
CONFIDENCE LIMITS:
    • The end numbers of these intervals (S ± σ_S, S ± 2σ_S,
      S ± 3σ_S) are then called the 68.27%, 95.45%, and 99.73%
      confidence limits.
CONFIDENCE LEVEL:
    • S ± 1.96σ_S and S ± 2.58σ_S are the 95% and 99% (or 0.95 and
      0.99) confidence limits for μ_S. The percentage confidence is
      often called the confidence level.
CRITICAL VALUE:
    • The numbers 1.96, 2.58, etc., in the confidence limits are
      called critical values and are denoted by z_c. From confidence
      levels we can find critical values.

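
Going from a confidence level to its critical value can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). For a two-sided interval, z_c is the point that leaves (1 − CL)/2 of the area in each tail. The function name is an illustrative choice.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Two-sided critical value z_c for a confidence level given as a fraction."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    # Area to the left of z_c is CL plus one tail: CL + (1 - CL) / 2.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for cl in (0.99, 0.95, 0.90):
    print(f"{cl:.0%}: z_c = {critical_value(cl):.3f}")
```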
Eg:
    • The table below gives values of z_c corresponding to various
      confidence levels used in practice. For confidence levels not
      presented in the table, the values of z_c can be found from a
      table of areas under the standard normal curve from 0 to z.

        CL    99.73%  99%   98%   96%   95.45%  95%   90%    80%   68.27%
        z_c   3.00    2.58  2.33  2.05  2.00    1.96  1.645  1.28  1.00

    • In cases where a statistic has a sampling distribution that is
      different from the normal distribution, appropriate
      modifications have to be made to obtain confidence intervals.
CONFIDENCE INTERVALS:
    • Confidence Intervals for Means
    • Confidence Intervals for Proportions
    • Confidence Intervals for Differences and Sums

Confidence Intervals for Means:
    • We shall see how to create confidence intervals for the mean of
      a population in two different cases.
    • The first case is when we have a large sample (n ≥ 30).
    • The second case is when we have a small sample (n < 30) and the
      underlying population is normal.

Large Samples (n ≥ 30):
    • If the statistic S is the sample mean X̄, then the 95% and 99%
      confidence limits for estimation of the population mean μ are
      given by X̄ ± 1.96 σ_X̄ and X̄ ± 2.58 σ_X̄, respectively.
    • More generally, the confidence limits are given by X̄ ± z_c σ_X̄,
      where z_c depends on the particular level of confidence desired.

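    • In terms of the population standard deviation σ and the sample
      size n, the confidence limits X̄ ± z_c σ_X̄ are given by

          $\bar{X} \pm z_c \frac{\sigma}{\sqrt{n}}$

      in case sampling is from an infinite population or if sampling
      is done with replacement from a finite population, and by

          $\bar{X} \pm z_c \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

      if sampling is done without replacement from a population of
      finite size N.
    • The population standard deviation σ is usually unknown, so that
      to obtain the above confidence limits we use the sample
      estimate Ŝ or S.
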
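
A minimal sketch of the large-sample interval for a mean, with the finite population correction applied only when a population size is supplied. The function name, the parameter names and the numbers in the example are hypothetical.

```python
import math


def mean_confidence_limits(x_bar, sigma, n, z_c=1.96, population_size=None):
    """Large-sample confidence limits for a population mean.

    sigma is the population standard deviation, or a sample estimate of it
    when n is large. If population_size is given, sampling is treated as
    being without replacement from a finite population of that size.
    """
    standard_error = sigma / math.sqrt(n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    margin = z_c * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical numbers: sample mean 100, sigma 15, n = 36, 95% confidence.
print(mean_confidence_limits(100, 15, 36))
print(mean_confidence_limits(100, 15, 36, population_size=100))
```

With these inputs the correction factor is below 1, so the interval for the finite population is narrower than the first one.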
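Small Samples (n < 30) and Population Normal:
    • We use the t distribution to obtain confidence levels. For
      example, if −t_{0.975} and t_{0.975} are the values of T for
      which 2.5% of the area lies in each tail of the t distribution,
      then a 95% confidence interval for T is given by

          $-t_{0.975} < \frac{(\bar{X} - \mu)\sqrt{n}}{\hat{S}} < t_{0.975}$

      from which we can see that μ can be estimated to lie in the
      interval

          $\bar{X} - t_{0.975} \frac{\hat{S}}{\sqrt{n}} < \mu < \bar{X} + t_{0.975} \frac{\hat{S}}{\sqrt{n}}$

      with 95% confidence. Here Ŝ is the sample standard deviation
      computed with divisor n − 1.
    • In general the confidence limits for population means are given
      by

          $\bar{X} \pm t_c \frac{\hat{S}}{\sqrt{n}}$

      where the t_c values are taken from the t distribution with
      n − 1 degrees of freedom.
    • Sample size is very important! We construct different confidence
      intervals based on sample size, so make sure we know which
      procedure to use.
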
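
The small-sample interval can be sketched as follows. The Python standard library has no t quantile function, so the critical value is passed in by hand; 2.064 is the tabulated t_{0.975} for 24 degrees of freedom. The sample mean of 400 and n = 25 echo the income example earlier in the deck, while the standard deviation of 50 is an assumption made purely for illustration.

```python
import math


def t_confidence_limits(x_bar, s_hat, n, t_c):
    """Confidence limits x_bar ± t_c * s_hat / sqrt(n) for a population mean.

    s_hat is the sample standard deviation computed with divisor n - 1, and
    t_c is the critical value of the t distribution with n - 1 degrees of
    freedom, read from a t-table.
    """
    margin = t_c * s_hat / math.sqrt(n)
    return x_bar - margin, x_bar + margin


T_975_DF_24 = 2.064   # t_{0.975} with 24 degrees of freedom, from a t-table

# x_bar = 400 and n = 25 as in the income example; s_hat = 50 is assumed.
lower, upper = t_confidence_limits(400, 50, 25, T_975_DF_24)
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the limits come out at about 379.36 and 420.64, close to the "between 380 and 420 $/week" statement in the earlier example.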
Confidence Intervals for Proportions:
    • The statistic S is the proportion of “successes” in a sample of
      size n ≥ 30 drawn from a binomial population in which p is the
      proportion of successes.
    • Then the confidence limits for p are given by P ± z_c σ_P, where
      P denotes the proportion of successes in the sample of size n.
      Using the values of σ_P obtained, we see that the confidence
      limits for the population proportion are given by

          $P \pm z_c \sqrt{\frac{p(1 - p)}{n}}$

      in case sampling is from an infinite population or if sampling
      is with replacement from a finite population. Similarly, the
      confidence limits are

          $P \pm z_c \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$

      if sampling is without replacement from a population of finite
      size N.

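
A short sketch of the interval for a proportion. Because the population proportion p is unknown, the code substitutes the sample proportion P for p inside the square root; that substitution, the function name and the example numbers are assumptions made for illustration.

```python
import math


def proportion_confidence_limits(p_hat, n, z_c=1.96, population_size=None):
    """Large-sample confidence limits for a population proportion.

    p_hat is the sample proportion of successes, used here in place of the
    unknown p when estimating the standard error. If population_size is
    given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    margin = z_c * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 55 successes in a sample of 100, 95% confidence.
lower, upper = proportion_confidence_limits(0.55, 100)
print(round(lower, 4), round(upper, 4))
```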
Confidence Intervals for Differences and Sums:
    • If S_1 and S_2 are two sample statistics with approximately
      normal sampling distributions, confidence limits for the
      difference of the population parameters corresponding to S_1
      and S_2 are given by

          $S_1 - S_2 \pm z_c \sigma_{S_1 - S_2} = S_1 - S_2 \pm z_c \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}$

      while confidence limits for the sum of the population
      parameters are given by

          $S_1 + S_2 \pm z_c \sigma_{S_1 + S_2} = S_1 + S_2 \pm z_c \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}$

      provided that the samples are independent.
    • Confidence limits for the difference of two population means,
      in the case where the populations are infinite and have known
      standard deviations σ_1, σ_2, are given by

          $\bar{X}_1 - \bar{X}_2 \pm z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

      where X̄_1, n_1 and X̄_2, n_2 are the respective means and sizes
      of the two samples drawn from the populations.
    • Confidence limits for the difference of two population
      proportions, where the populations are infinite, are given by

          $P_1 - P_2 \pm z_c \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$

      where P_1 and P_2 are the two sample proportions and n_1 and
      n_2 are the sizes of the two samples drawn from the
      populations.
VARIANCE:
    • For independent samples, the variance for the difference of
      means is the same as the variance for the sum of means: in both
      cases the variances add, giving σ_1²/n_1 + σ_2²/n_2.
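
The interval for a difference of two means can be sketched as below, assuming independent samples from infinite populations with known standard deviations. The function name and the example values are hypothetical.

```python
import math


def difference_of_means_limits(x_bar1, sigma1, n1, x_bar2, sigma2, n2, z_c=1.96):
    """Confidence limits for mu_1 - mu_2 from two independent samples."""
    # The variances of the two sample means add, for a sum or a difference.
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    difference = x_bar1 - x_bar2
    margin = z_c * standard_error
    return difference - margin, difference + margin


# Hypothetical samples: mean 75, sigma 8, n = 64 and mean 70, sigma 6, n = 36.
lower, upper = difference_of_means_limits(75, 8, 64, 70, 6, 36)
print(round(lower, 3), round(upper, 3))
```

In this example each sample mean has variance 1, so the standard error of the difference is √2, and the 95% limits are roughly 2.228 and 7.772.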
