                  Why Statistics?
 Populations, Samples, and Census
         Some Sampling Concepts

             Lecture 1
Chapter 1: Basic Statistical Concepts

                      M. George Akritas

Why Statistics?

Populations, Samples, and Census

Some Sampling Concepts
   Representative Samples
   Simple Random and Stratified Sampling
   Sampling With and Without Replacement
   Non-representative Sampling

Example (Examples of Engineering/Scientific Studies)
    Comparing the compressive strength of two or more cement mixtures.
    Comparing the effectiveness of three cleaning products in
    removing four different types of stains.
    Predicting failure time on the basis of stress applied.
    Assessing the effectiveness of a new traffic regulatory measure
    in reducing the weekly rate of accidents.
    Testing a manufacturer’s claim regarding a product’s quality.
    Studying the relation between salary increases and employee
    productivity in a large corporation.

What makes these studies challenging (and hence in need of
Statistics) is inherent, or intrinsic, variability:

     The compressive strength of different preparations of the same
     cement mixture will differ. The figure at
     http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf
     shows 32 compressive strength measurements, in MPa
     (megapascals), of test cylinders 6 in. in diameter by 12
     in. high, made with a water/cement ratio of 0.4 and measured
     on the 28th day after they were made.
     Under the same stress, two beams will fail at different times.
     The proportion of defective items of a certain product will
     differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as
stated, ambiguous.

The objectives of the case studies can be made precise if stated in
terms of averages or means.

    Comparing the average hardness of two different cement mixtures.
    Predicting the average failure time on the basis of the stress applied.
    Estimation of the average coefficient of thermal expansion.
    Estimation of the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean"
have a technical meaning that can be made clear through the
concepts of population and sample.

Definition: A population is a well-defined collection of objects or
subjects, relevant to a particular study, that are exposed to the
same treatment or method. Population members are called units.

Example (Examples of populations:)

    All water samples that can be taken from a lake.
    All items of a certain manufactured product.
    All students enrolled in Big Ten universities during the
    2007-08 academic year.
    Two types of cleaning products. (Each type corresponds to a population.)

The objective of a study is to investigate certain characteristic(s)
of the units of the population(s) of interest.

Example (Examples of characteristics:)

    All water samples taken from a lake. Characteristics: Mercury
    concentration; Concentration of other pollutants.
    All items of a certain manufactured product (that have been, or
    will be, produced). Characteristic: Proportion of defective items.
    All students enrolled in Big Ten universities during the
    2007-08 academic year. Characteristics: Favorite type of
    music; Political affiliation.
    Two types of cleaning products. Characteristic: cleaning effectiveness.

In the example where different beams of the same type
are exposed to different stress levels:
    the characteristic of interest is time to failure of a beam under
    each stress level, and
    each stress level used in the study corresponds to a separate
    population which consists of all beams that will be exposed to
    that stress level.
This emphasizes that populations are defined not only by the
units they consist of, but also by the method or treatment
applied to these units.

Full (i.e. population-level) understanding of a characteristic
requires the examination of all population units, i.e. a census.

    For example, full understanding of the relation between salary
    and productivity of a corporation’s employees requires
    obtaining these two characteristics from all employees.
    However, taking a census can be time consuming and expensive:
    the 2000 U.S. Census cost $6.5 billion, while the 2010 Census
    cost $13 billion.
    Moreover, a census is not feasible if the population is
    hypothetical or conceptual, i.e. not all members are
    available for examination.
Because of the above, we typically settle for examining all
units in a sample, which is a subset of the population.

Because of intrinsic variability, the properties of the characteristic
of interest in a sample will differ from those in the population.
For example:

     The average mercury concentration in 25 water samples will
     differ from the overall mercury concentration in the lake.
     The proportion in a sample of 100 PSU students who favor
     the use of solar energy will differ from the corresponding
     proportion of all PSU students.
     The relation between bears' chest girth and weight in a
     sample of 10 bears will differ from the corresponding relation
     in the entire population of 50 bears in a forested region.

The GOOD NEWS is that, if the sample is suitably drawn, then
sample properties approximate the population properties.

Figure: Population and sample relationships between chest girth and weight (horizontal axis: Chest Girth).

Sampling Variability

       Sample properties of the characteristic of interest also differ
       from sample to sample. For example:
        1. The number of US citizens, in a sample of size 20, who favor
           expanding solar energy will (most likely) differ from the
           corresponding number in a different sample of 20 US citizens.
        2. The average mercury concentrations in two sets of 25 water
           samples drawn from a lake will differ.
       The term sampling variability describes such differences in
       the characteristic of interest from sample to sample.
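A minimal sketch of sampling variability, assuming a hypothetical stand-in for the lake: 10,000 synthetic mercury concentrations drawn from a lognormal distribution (arbitrary units; simulated, not measurements). Two separate sets of 25 are drawn with `random.sample`, and their averages generally differ from each other and from the lake-wide average.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical stand-in for "all water samples that can be taken from the lake":
# 10,000 synthetic mercury concentrations (arbitrary units), simulated only.
lake = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]


def sample_average(population, n, rng):
    """Average of a simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(population, n))


# Two separate sets of 25 water samples from the same lake.
first = sample_average(lake, 25, rng)
second = sample_average(lake, 25, rng)

print(f"first set of 25:   {first:.3f}")
print(f"second set of 25:  {second:.3f}")
print(f"lake-wide average: {statistics.mean(lake):.3f}")
```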

         Figure: Illustration of sampling variability (horizontal axis: Chest Girth).

Population level properties/attributes of characteristic(s) of
interest are called (population) parameters.
     Examples of parameters include averages, proportions,
     percentiles, and correlation coefficients.
The corresponding sample properties/attributes of
characteristics are called statistics. The term sports statistics
comes from this terminology.
Sample statistics approximate the corresponding population
parameters but are, in general, not equal to them.
Statistical inference deals with the uncertainty that
arises in approximating parameters by statistics.
The tools of statistical inference include point and interval
estimation, hypothesis testing, and prediction.
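The parameter/statistic distinction can be sketched for a proportion. Everything here is hypothetical: a synthetic population of N = 1,000 students coded 1 (favors expanding solar energy) or 0, generated with about 60% ones. The same function computes a parameter when applied to the whole population and a statistic when applied to a sample; sampling all N units is a census, which recovers the parameter exactly.

```python
import random

rng = random.Random(3)

# Hypothetical population of N = 1,000 students, coded 1 if the student favors
# expanding solar energy and 0 otherwise (about 60% ones; simulated only).
N = 1_000
population = [1 if rng.random() < 0.6 else 0 for _ in range(N)]


def proportion(units):
    """Proportion of units that have the characteristic (coded as 1)."""
    return sum(units) / len(units)


parameter = proportion(population)                  # population level: a parameter
statistic = proportion(rng.sample(population, 100))  # sample of 100: a statistic

# Drawing all N units without replacement is a census,
# so the "statistic" then coincides with the parameter.
census = proportion(rng.sample(population, N))

print(f"parameter:             {parameter:.3f}")
print(f"statistic (n = 100):   {statistic:.3f}")
print(f"census (n = N = {N}): {census:.3f}")
```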

Example (Examples of Estimation, Hypothesis Testing and Prediction)

    Estimation (point and interval) would be used in the task of
    estimating the coefficient of thermal expansion of a metal, or
    the air pollution level.
    Hypothesis testing would be used for deciding whether to take
    corrective action to bring the air pollution level down, or
    whether a manufacturer’s claim regarding the quality of a
    product is false.
    Prediction arises in cases where we would like to predict the
    failure time on the basis of the stress applied, or the age of a
    tree on the basis of its trunk diameter.

For valid statistical inference, the sample must be
representative of the population. For example, a sample of
PSU basketball players is not representative of PSU students
if the characteristic of interest is height.
Typically it is hard to tell whether a sample is representative
of the population. So we define a sample to be representative
if . . . (circular definition!!)

           it allows for valid statistical inference.

The only guarantee of this comes from the method used to
select the sample (the sampling method).
The good news is that there are several sampling methods that
guarantee representativeness.

A sample of size n is a simple random sample (s.r.s.) if the selection
process ensures that every sample of size n has an equal chance of
being selected.
    To select a s.r.s. of size 10 from a population of 100 units, each
    of the 100!/(10! 90!) possible samples of size 10 must be equally likely.
    In simple random sampling every member of the population
    has the same chance of being included in the sample. The
    converse, however, is not true: equal inclusion chances for
    every unit do not by themselves make a sampling method a s.r.s.
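The count 100!/(10! 90!) is the binomial coefficient "100 choose 10". A quick check with Python's `math.comb`, compared against the factorial formula:

```python
import math

# Number of distinct samples of size 10 from a population of 100 units:
# 100! / (10! * 90!), i.e. "100 choose 10".
n_samples = math.comb(100, 10)
assert n_samples == math.factorial(100) // (math.factorial(10) * math.factorial(90))

# Under simple random sampling, each of these samples has probability 1 / n_samples.
print(n_samples)  # 17310309456440
```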

To select a sample of 2 students from a population of 20 male and
20 female students, one selects at random one male and one
female student. Is this a s.r.s.? (Does every student have the
same chance of being included in the sample?)
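One way to work through the question above is to enumerate every possible sample under each scheme. In this sketch the student labels M0..M19 and F0..F19 are hypothetical, and probabilities are kept exact with `fractions.Fraction`. It compares each student's inclusion probability under the one-male/one-female scheme with that under a s.r.s. of size 2, and checks whether an all-male sample is possible.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: 20 male students M0..M19 and 20 female students F0..F19.
males = [f"M{i}" for i in range(20)]
females = [f"F{i}" for i in range(20)]
students = males + females

# Scheme from the text: one male and one female, each chosen at random,
# so every (male, female) pair is equally likely.
scheme_samples = {frozenset((m, f)) for m in males for f in females}
p_scheme = Fraction(1, len(scheme_samples))

# Simple random sample of size 2: every 2-subset of the 40 students equally likely.
srs_samples = {frozenset(pair) for pair in combinations(students, 2)}
p_srs = Fraction(1, len(srs_samples))


def inclusion_probability(student, samples, p_each):
    """Chance that `student` appears in the sample, summed over equally likely samples."""
    return sum(p_each for s in samples if student in s)


print(inclusion_probability("M0", scheme_samples, p_scheme))  # 1/20
print(inclusion_probability("M0", srs_samples, p_srs))        # 1/20
print(frozenset(("M0", "M1")) in scheme_samples)              # False
```

Both inclusion probabilities equal 1/20, yet the scheme can never produce two students of the same sex, while a s.r.s. can. So the scheme is not a s.r.s.; it is a stratified sample with sex as the strata.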

Another sampling method for obtaining a representative sample is
called stratified sampling.

A stratified sample consists of simple random samples from each
of a number of groups (which are non-overlapping and make up
the entire population) called strata.

    Examples of strata include: ethnic groups, age groups, and
    production facilities.
    If the units in the different strata differ in terms of the
    characteristic under study, stratified sampling is preferable to
    s.r.s. For example, if different production facilities differ in
    terms of the proportion of defective products, a stratified
    sample is preferable.
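A minimal sketch of stratified sampling: a separate simple random sample is drawn from each stratum with `random.sample`. The three production facilities, their item labels, and the choice of proportional allocation (10% of each stratum) are all hypothetical assumptions; the text does not prescribe how sample sizes are split across strata.

```python
import random


def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of the requested size from each stratum.

    strata: dict mapping stratum name -> list of units (non-overlapping groups)
    sizes:  dict mapping stratum name -> sample size for that stratum
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# Hypothetical example: items from three production facilities (labels made up).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}
# Assumed proportional allocation: 10% of each facility's items.
sizes = {name: len(units) // 10 for name, units in strata.items()}

sample = stratified_sample(strata, sizes, random.Random(42))
for name, units in sample.items():
    print(name, len(units), units[:3])
```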

How do we select a s.r.s. of size n from a population of N units?
    STEP 1: Assign to each unit a number from 1 to N.
    STEP 2: Write each number on a slip of paper, place the N
    slips of paper in an urn, and shuffle them.
    STEP 3: Select n slips of paper at random, one at a time.
Alternatively, the entire process can be performed in software like
R. We will see this in the next lab session.
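The lecture refers to R for this; purely as an illustration, here is a Python sketch that mirrors the three urn steps. The function name `simple_random_sample` is hypothetical. Shuffling the slips uniformly and then drawing n of them without putting any back makes every subset of size n equally likely.

```python
import random


def simple_random_sample(N, n, rng=random):
    """Select a s.r.s. of size n from units numbered 1..N, mimicking the urn steps."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    # STEP 1: assign each unit a number from 1 to N (one slip per unit).
    slips = list(range(1, N + 1))
    # STEP 2: shuffle the slips in the urn.
    rng.shuffle(slips)
    # STEP 3: draw n slips one at a time, without putting any back.
    return [slips.pop() for _ in range(n)]


print(simple_random_sample(100, 10, random.Random(0)))
```

Python's standard library does the same in one call with `random.sample(range(1, N + 1), n)`.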

Sampling without replacement simply means that a
population unit can be included in a sample at most once. For
example, a simple random sample is obtained by sampling
without replacement: Once a unit’s slip of paper is drawn, it
is not placed back into the urn.
Sampling with replacement means that after a unit’s slip of
paper is chosen, it is put back in the urn. Thus a population
unit could be included in the sample anywhere between 0 and
n times. Rolling a die can be thought of as sampling with
replacement from the numbers 1, 2, . . . , 6.
Though conceptually undesirable, sampling with replacement
is easier to work with from a mathematical point of view.
When the population is very large relative to the sample size,
sampling with and without replacement are practically equivalent.
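The difference between the two schemes, and why it fades for large populations, can be sketched in Python: `random.sample` draws without replacement, `random.choices` draws with replacement (the die example), and the hypothetical helper `prob_no_repeat` computes the chance that n draws with replacement from N units happen to contain no repeats, i.e. look exactly like a sample drawn without replacement.

```python
import random

rng = random.Random(11)
faces = [1, 2, 3, 4, 5, 6]

# Without replacement: each unit can appear at most once.
without_repl = rng.sample(faces, 4)

# With replacement: like rolling a die repeatedly; repeats are allowed.
with_repl = rng.choices(faces, k=10)

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)


def prob_no_repeat(N, n):
    """Chance that n draws WITH replacement from N units contain no repeated unit."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p


# As N grows relative to n, repeats become rare and the two schemes nearly coincide.
for N in (10, 1_000, 1_000_000):
    print(N, round(prob_no_repeat(N, 10), 4))
```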

Non-representative samples arise whenever the sampling plan
is such that a part, or parts, of the population of interest are
either excluded from, or systematically under-represented in,
the sample. This is called selection bias.
Two examples of non-representative samples are self-selected
and convenience samples.
A self-selected sample often occurs when people are asked to
send in their opinions in surveys or questionnaires. For
example, in a political survey, often those who feel that things
are running smoothly or who support an incumbent will
(apathetically) not respond, whereas those activists who
strongly desire change will voice their opinions.

    A convenience sample is a sample made up from units that
    are most easily reached. For example, randomly selecting
    students from your classes will not result in a sample that is
    representative of all PSU students because your classes are
    made up mostly of students with the same major as you.
    A famous example of selection bias is the following.

Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the
results in US presidential elections, but in 1936 it predicted a
3-to-2 victory for Republican Alf Landon over the Democratic
incumbent Franklin Delano Roosevelt. Notably, this
prediction was based on 2.3 million responses (out of 10 million
questionnaires sent). By contrast, Gallup correctly predicted
the outcome of that election by surveying only 50,000 people.
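A toy calculation shows how self-selection distorts a survey. The population shares and response rates below are made up for illustration and are not data from any real poll. Each group's share among respondents is its population share times its response rate, renormalized.

```python
def respondent_share(pop_share, response_rates):
    """Share of each group among respondents, given population shares and
    group-specific response rates (shares weighted by response rate, renormalized)."""
    responding = {g: pop_share[g] * response_rates[g] for g in pop_share}
    total = sum(responding.values())
    return {g: r / total for g, r in responding.items()}


# Hypothetical numbers, for illustration only: half the population supports
# the incumbent, but supporters reply less often than those who want change.
pop_share = {"incumbent": 0.5, "challenger": 0.5}
response_rates = {"incumbent": 0.10, "challenger": 0.40}

print(respondent_share(pop_share, response_rates))
```

With these made-up rates, supporters of the incumbent are half the population but only about 20% of respondents. Doubling every response rate leaves those shares unchanged, which illustrates why a large number of responses does not, by itself, remove selection bias.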

Go to next lesson http://www.stat.psu.edu/~mga/401/
Go to the Stat 401 home page

                    M. George Akritas    Lecture 1 Chapter 1: Basic Statistical Concepts

More Related Content

Similar to B.lect1

Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
Erik D. Davenport
Berman pcori challenge document
Berman pcori challenge documentBerman pcori challenge document
Berman pcori challenge document
Lew Berman
Samples Types and Methods
Samples Types and Methods Samples Types and Methods
Samples Types and Methods
Tarek Tawfik Amin
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Good Science Essay Topics. Essay on Science and Technology  Science and Techn...Good Science Essay Topics. Essay on Science and Technology  Science and Techn...
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Kimberly Pulley
Sqqs1013 ch1-a122
Sqqs1013 ch1-a122Sqqs1013 ch1-a122
Sqqs1013 ch1-a122
kim rae KI
Ch 4 SAMPLE..doc
Ch 4 SAMPLE..docCh 4 SAMPLE..doc
Ch 4 SAMPLE..doc
Lecture 1 basic concepts2009
Lecture 1 basic concepts2009Lecture 1 basic concepts2009
Lecture 1 basic concepts2009
barath r baskaran
Acadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptxAcadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptx
Ch 1 and 2 test review
Ch 1 and 2 test reviewCh 1 and 2 test review
Ch 1 and 2 test review
Esther Herrera
Lecture 1
Lecture 1Lecture 1
Lecture 1
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disaster
Mahendra Poudel
Federalists Essays.pdf
Federalists Essays.pdfFederalists Essays.pdf
Federalists Essays.pdf
Melanie Mendoza
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture Final
Applications of Computer Science in Environmental Models
Applications of Computer Science in Environmental ModelsApplications of Computer Science in Environmental Models
Applications of Computer Science in Environmental Models
Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...
Practice Test 1 solutions
Practice Test 1 solutions  Practice Test 1 solutions
Practice Test 1 solutions
Long Beach City College
Quotes For College Essays. This is How You Write a College Essay College app...
Quotes For College Essays. This is How You Write a College Essay  College app...Quotes For College Essays. This is How You Write a College Essay  College app...
Quotes For College Essays. This is How You Write a College Essay College app...
Mimi Williams
Cases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male patiCases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male pati

Similar to B.lect1 (20)

Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
Berman pcori challenge document
Berman pcori challenge documentBerman pcori challenge document
Berman pcori challenge document
Samples Types and Methods
Samples Types and Methods Samples Types and Methods
Samples Types and Methods
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Good Science Essay Topics. Essay on Science and Technology  Science and Techn...Good Science Essay Topics. Essay on Science and Technology  Science and Techn...
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Sqqs1013 ch1-a122
Sqqs1013 ch1-a122Sqqs1013 ch1-a122
Sqqs1013 ch1-a122
Ch 4 SAMPLE..doc
Ch 4 SAMPLE..docCh 4 SAMPLE..doc
Ch 4 SAMPLE..doc
Lecture 1 basic concepts2009
Lecture 1 basic concepts2009Lecture 1 basic concepts2009
Lecture 1 basic concepts2009
Acadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptxAcadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptx
Ch 1 and 2 test review
Ch 1 and 2 test reviewCh 1 and 2 test review
Ch 1 and 2 test review
Lecture 1
Lecture 1Lecture 1
Lecture 1
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disaster
Federalists Essays.pdf
Federalists Essays.pdfFederalists Essays.pdf
Federalists Essays.pdf
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture Final
Applications of Computer Science in Environmental Models
Applications of Computer Science in Environmental ModelsApplications of Computer Science in Environmental Models
Applications of Computer Science in Environmental Models
Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...
Practice Test 1 solutions
Practice Test 1 solutions  Practice Test 1 solutions
Practice Test 1 solutions
Quotes For College Essays. This is How You Write a College Essay College app...
Quotes For College Essays. This is How You Write a College Essay  College app...Quotes For College Essays. This is How You Write a College Essay  College app...
Quotes For College Essays. This is How You Write a College Essay College app...
Cases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male patiCases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male pati

More from Ankit Katiyar

Transportation and assignment_problem
Transportation and assignment_problemTransportation and assignment_problem
Transportation and assignment_problem
Ankit Katiyar
Time and space complexity
Time and space complexityTime and space complexity
Time and space complexity
Ankit Katiyar
The oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plansThe oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plans
Ankit Katiyar
Stat methchapter
Stat methchapterStat methchapter
Stat methchapter
Ankit Katiyar
Simple queuingmodelspdf
Simple queuingmodelspdfSimple queuingmodelspdf
Simple queuingmodelspdf
Ankit Katiyar
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssion
Ankit Katiyar
Queueing 3
Queueing 3Queueing 3
Queueing 3
Ankit Katiyar
Queueing 2
Queueing 2Queueing 2
Queueing 2
Ankit Katiyar
Ankit Katiyar
Probability mass functions and probability density functions
Probability mass functions and probability density functionsProbability mass functions and probability density functions
Probability mass functions and probability density functions
Ankit Katiyar
Ankit Katiyar
Lect 02
Lect 02Lect 02
Lect 02
Ankit Katiyar
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
Ankit Katiyar
Conceptual foundations statistics and probability
Conceptual foundations   statistics and probabilityConceptual foundations   statistics and probability
Conceptual foundations statistics and probability
Ankit Katiyar
Applied statistics and probability for engineers solution montgomery && runger
Applied statistics and probability for engineers solution   montgomery && rungerApplied statistics and probability for engineers solution   montgomery && runger
Applied statistics and probability for engineers solution montgomery && runger
Ankit Katiyar
A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004
Ankit Katiyar
Ankit Katiyar

More from Ankit Katiyar (20)

Transportation and assignment_problem
Transportation and assignment_problemTransportation and assignment_problem
Transportation and assignment_problem
Time and space complexity
Time and space complexityTime and space complexity
Time and space complexity
The oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plansThe oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plans
Stat methchapter
Stat methchapterStat methchapter
Stat methchapter
Simple queuingmodelspdf
Simple queuingmodelspdfSimple queuingmodelspdf
Simple queuingmodelspdf
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssion
Queueing 3
Queueing 3Queueing 3
Queueing 3
Queueing 2
Queueing 2Queueing 2
Queueing 2
Probability mass functions and probability density functions
Probability mass functions and probability density functionsProbability mass functions and probability density functions
Probability mass functions and probability density functions
Lect 02
Lect 02Lect 02
Lect 02
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
Conceptual foundations statistics and probability
Conceptual foundations   statistics and probabilityConceptual foundations   statistics and probability
Conceptual foundations statistics and probability
Applied statistics and probability for engineers solution montgomery && runger
Applied statistics and probability for engineers solution   montgomery && rungerApplied statistics and probability for engineers solution   montgomery && runger
Applied statistics and probability for engineers solution montgomery && runger
A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004


  • 1. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Lecture 1 Chapter 1: Basic Statistical Concepts M. George Akritas M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 2. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Why Statistics? Populations, Samples, and Census Some Sampling Concepts Representative Samples Simple Random and Stratified Sampling Sampling With and Without Replacement Non-representative Sampling M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 3. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Example (Examples of Engineering/Scientific Studies) Comparing the compressive strength of two or more cement mixtures. Comparing the effectiveness of three cleaning products in removing four different types of stains. Predicting failure time on the basis of stress applied. Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents. Testing a manufacturer’s claim regarding a product’s quality. Studying the relation between salary increases and employee productivity in a large corporation. What makes these studies challenging (and thus to require Statistics) is the inherent or intrinsic variability: M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 4. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The compressive strength of different preparations of the same cement mixture will differ. The figure in http://sites. stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (MegaPascal units), of test cylinders 6 in. in diameter by 12 in. high, using water/cement ratio of 0.4, measured on the 28th day after they are made. Under the same stress, two beams will fail at different times. The proportion of defective items of a certain product will differ from batch to batch. Intrinsic variability renders the objectives of the case studies, as stated, ambiguous. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 5. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The objectives of the case studies can be made precise if stated in terms of averages or means. Comparing the average hardness of two different cement mixtures. Predicting the average failure time on the basis of stress applied. Estimation of the average coefficient of thermal expansion. Estimation of the average proportion of defective items. Moreover, because of variability, the words ”average” and ”mean” have a technical meaning which can be made clear through the concepts of population and sample. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 6. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Definition Population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units. Example (Examples of populations:) All water samples that can be taken from a lake. All items of a certain manufactured product. All students enrolled in Big Ten universities during the 2007-08 academic year. Two types of cleaning products. (Each type corresponds to a population.) M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 7. The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest. Examples of characteristics: for all water samples taken from a lake, the mercury concentration and the concentration of other pollutants; for all items of a certain manufactured product (that have been, or will be, produced), the proportion of defective items; for all students enrolled in Big Ten universities during the 2007-08 academic year, their favorite type of music and political affiliation; for two types of cleaning products, cleaning effectiveness.
  • 8. In the example where different beams of the same type are exposed to different stress levels, the characteristic of interest is the time to failure of a beam under each stress level. Each stress level used in the study corresponds to a separate population, consisting of all beams that will be exposed to that stress level. This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units.
  • 9. Full (i.e., population-level) understanding of a characteristic requires examining all population units, i.e., a census. For example, full understanding of the relation between salary and productivity of a corporation's employees requires obtaining these two characteristics from all employees. However, taking a census can be time-consuming and expensive: the 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion. Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e., if not all members are available for examination. For these reasons, we typically settle for examining all units in a sample, which is a subset of the population.
  • 10. Due to intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example: the average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake; the proportion of PSU students favoring the use of solar energy in a sample of 100 will differ from the corresponding proportion among all PSU students; the relation between bears' chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.
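The bear example can be mimicked with a small simulation. This is a sketch under stated assumptions: the 50 "bears" below are computer-generated (chest girths drawn uniformly between 20 and 55, weights from the made-up line 8 * girth - 50 plus random noise, roughly matching the axis ranges of the figure that follows), not real measurements. The relation is summarized by the least-squares slope, computed once for the whole population and once for a random sample of 10 bears.

```python
import random

def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(50)
# Hypothetical population of 50 bears (simulated, illustrative only).
girth = [rng.uniform(20, 55) for _ in range(50)]
weight = [8 * g - 50 + rng.gauss(0, 20) for g in girth]

# A sample of 10 bears, drawn without replacement.
idx = rng.sample(range(50), 10)
sample_girth = [girth[i] for i in idx]
sample_weight = [weight[i] for i in idx]

pop_slope = ls_slope(girth, weight)
sample_slope = ls_slope(sample_girth, sample_weight)
print(f"population slope: {pop_slope:.3f}")
print(f"sample slope:     {sample_slope:.3f}")
```

The two slopes are close but not equal, which is exactly the point of the slide.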
  • 11. The GOOD NEWS is that, if the sample is suitably drawn, sample properties approximate the population properties. [Figure: Population and sample relationships between chest girth and weight (Weight vs. Chest Girth).]
  • 12. Sampling Variability. Sample properties of the characteristic of interest also differ from sample to sample. For example: 1. The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) differ from the corresponding number in a different sample of 20 US citizens. 2. The average mercury concentrations in two sets of 25 water samples drawn from a lake will differ. The term sampling variability describes such differences in the characteristic of interest from sample to sample.
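Sampling variability becomes visible when the draw is repeated many times. The sketch below assumes a hypothetical population of 100,000 citizens, 60% of whom favor expanding solar energy (an illustrative figure, not survey data). It draws 1,000 samples of size 20 and records how many people in each sample are in favor.

```python
import random
from collections import Counter

# Hypothetical population: 1 = favors expanding solar energy, 0 = does not.
# The 60% share is an assumption chosen for illustration.
population = [1] * 60_000 + [0] * 40_000

rng = random.Random(2024)

# Repeatedly draw a sample of 20 citizens (without replacement) and count supporters.
counts = [sum(rng.sample(population, 20)) for _ in range(1_000)]

print("first five counts:", counts[:5])
print("distinct counts seen:", sorted(set(counts)))
print("most common counts:", Counter(counts).most_common(3))
```

Each sample gives its own count; the spread of these counts is the sampling variability described above.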
  • 13. [Figure: Illustration of Sampling Variability (Weight vs. Chest Girth).]
  • 14. Population-level properties/attributes of the characteristic(s) of interest are called (population) parameters. Examples of parameters include averages, proportions, percentiles, and correlation coefficients. The corresponding sample properties/attributes of characteristics are called statistics. The term sports statistics comes from this terminology. Sample statistics approximate the corresponding population parameters but are not equal to them. Statistical inference deals with the uncertainty that arises in approximating parameters by statistics. The tools of statistical inference include point and interval estimation, hypothesis testing, and prediction.
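A few lines of Python make the parameter/statistic distinction concrete. Everything here is an assumption for illustration: the population is 5,000 simulated values, the threshold of 110 used for the proportion is arbitrary, and the median stands in for percentiles. Each summary is computed once on the whole population (a parameter) and once on a sample of 50 units (a statistic).

```python
import random
import statistics

def summaries(values, threshold):
    """Average, proportion of values above `threshold`, and median."""
    return {
        "average": statistics.mean(values),
        "proportion_above": sum(v > threshold for v in values) / len(values),
        "median": statistics.median(values),
    }

rng = random.Random(14)
# Hypothetical population of 5,000 simulated measurements (not real data).
population = [rng.gauss(100, 15) for _ in range(5_000)]
sample = rng.sample(population, 50)

parameters = summaries(population, threshold=110)  # population parameters
sample_stats = summaries(sample, threshold=110)    # sample statistics

for name in parameters:
    print(f"{name:<17} parameter = {parameters[name]:8.3f}   statistic = {sample_stats[name]:8.3f}")
```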
  • 15. Example (Examples of Estimation, Hypothesis Testing and Prediction): Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level. Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer's claim regarding the quality of a product is false. Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.
  • 16. For valid statistical inference, the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height. Typically, it is hard to tell whether a sample is representative of the population. So we define a sample to be representative if it allows for valid statistical inference (a circular definition!). The only guarantee of that comes from the method used to select the sample (the sampling method). The good news is that there are several sampling methods that guarantee representativeness.
  • 17. Definition: A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected. For example, to select a s.r.s. of size 10 from a population of 100 units, each of the 100!/(10! 90!) possible samples of size 10 must be equally likely. In simple random sampling, every member of the population has the same chance of being included in the sample. The converse, however, is not true: a selection process that gives every member the same chance of inclusion need not produce a s.r.s. Example: To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this a s.r.s.? (Does every student have the same chance of being included in the sample?)
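The number of possible samples, and the question posed in the example, can both be checked by direct enumeration. The student labels M0, ..., M19 and F0, ..., F19 are hypothetical. The sketch compares the one-male-one-female scheme with a s.r.s. of size 2 in two ways: each student's inclusion probability, and the probability of a sample made of two males.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!).
n_samples = comb(100, 10)
print(n_samples)  # 17310309456440

# Hypothetical labels: 20 male (M0..M19) and 20 female (F0..F19) students.
males = [f"M{i}" for i in range(20)]
females = [f"F{i}" for i in range(20)]
students = males + females

# One-male-one-female scheme: each of the 20 * 20 (male, female) pairs has
# probability 1/400; a pair of two males or two females has probability 0.
scheme = {frozenset((m, f)): Fraction(1, 20 * 20) for m in males for f in females}

# s.r.s. of size 2: each of the C(40, 2) = 780 pairs has probability 1/780.
all_pairs = [frozenset(p) for p in combinations(students, 2)]
srs = {pair: Fraction(1, len(all_pairs)) for pair in all_pairs}

def inclusion_prob(probs, unit):
    """P(unit is in the sample) = sum of the probabilities of samples containing it."""
    return sum(p for pair, p in probs.items() if unit in pair)

two_males = frozenset(("M0", "M1"))
print(inclusion_prob(scheme, "M0"), inclusion_prob(srs, "M0"))  # 1/20 1/20
print(scheme.get(two_males, Fraction(0)), srs[two_males])       # 0 1/780
```

Both schemes give every student an inclusion probability of 1/20, yet the one-male-one-female scheme can never select two males (or two females), so not all samples of size 2 are equally likely.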
  • 18. Another sampling method for obtaining a representative sample is called stratified sampling. Definition: A stratified sample consists of simple random samples from each of a number of groups, called strata, which do not overlap and together make up the entire population. Examples of strata include ethnic groups, age groups, and production facilities. If the units in different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
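Following the definition, a stratified sample is just one s.r.s. per stratum. The sketch below assumes three hypothetical production facilities with made-up item counts, and it uses proportional allocation (10% of each stratum). That allocation rule is an assumption made here for illustration; the slide does not specify one.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of sizes[name] units from each stratum.

    strata: dict mapping stratum name -> list of units (non-overlapping groups
            that together make up the whole population).
    sizes:  dict mapping stratum name -> number of units to draw from it.
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata: items from three production facilities (names and counts made up).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}
# Proportional allocation (assumed): 10% of each stratum.
sizes = {name: len(units) // 10 for name, units in strata.items()}

rng = random.Random(18)
sample = stratified_sample(strata, sizes, rng)
for name, units in sample.items():
    print(name, len(units), units[:3])
```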
  • 19. How do we select a s.r.s. of size n from a population of N units? STEP 1: Assign to each unit a number from 1 to N. STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them. STEP 3: Select n slips of paper at random, one at a time. Alternatively, the entire process can be performed in software such as R. We will see this in the next lab session.
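The lab carries out this step in R. As a stand-in, here is a minimal Python sketch of the same urn procedure (the function name simple_random_sample is hypothetical). Shuffling the numbered slips and then drawing n of them without putting any back makes every set of n units equally likely. Python's standard random.sample performs an equivalent selection in one call.

```python
import random

def simple_random_sample(N, n, rng=random):
    """Select a s.r.s. of size n from units numbered 1..N using the urn procedure."""
    # STEP 1: number the units 1..N (one slip per unit).
    slips = list(range(1, N + 1))
    # STEP 2: shuffle the slips in the "urn".
    rng.shuffle(slips)
    # STEP 3: draw n slips one at a time, never putting a slip back.
    return [slips.pop() for _ in range(n)]

print(simple_random_sample(100, 10, random.Random(1)))
# The standard library gives an equally valid (generally different) s.r.s. in one call:
print(random.Random(1).sample(range(1, 101), 10))
```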
  • 20. Sampling without replacement means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit's slip of paper is drawn, it is not placed back into the urn. Sampling with replacement means that after a unit's slip of paper is drawn, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, ..., 6. Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view. When the population is very large compared with the sample size, sampling with and without replacement are practically equivalent.
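Python's standard random module offers both schemes: Random.choices samples with replacement and Random.sample samples without replacement. The helper prob_all_distinct is a name introduced here for illustration. It computes the probability that a with-replacement sample of size n from N units contains no repeated unit, namely (N/N)((N-1)/N)...((N-n+1)/N). When this probability is close to 1, the two schemes almost always produce the same kind of sample, which is why they are practically equivalent for large populations.

```python
import random
from math import prod

rng = random.Random(20)
# Rolling a die 10 times = sampling with replacement from 1..6 (values can repeat).
rolls = rng.choices(range(1, 7), k=10)
# Sampling without replacement: each unit at most once, so n cannot exceed N.
no_repeat = rng.sample(range(1, 7), k=6)
print("with replacement:   ", rolls)
print("without replacement:", no_repeat)

def prob_all_distinct(N, n):
    """P(a with-replacement sample of size n from N units has no repeated unit)."""
    return prod((N - i) / N for i in range(n))

print(prob_all_distinct(6, 6))           # about 0.0154 (= 720 / 6**6)
print(prob_all_distinct(1_000_000, 10))  # about 0.99995
```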
  • 21. Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias. Two examples of non-representative samples are self-selected and convenience samples. A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, those who feel that things are running smoothly or who support an incumbent often (apathetically) do not respond, whereas activists who strongly desire change will voice their opinions.
  • 22. A convenience sample is a sample made up of the units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, because your classes mostly consist of students with the same major as you. A famous example of selection bias is the following. Example (The Literary Digest poll of 1936): The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). In contrast, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.