This document provides an overview of basic statistical concepts including populations, samples, parameters, statistics, and sampling methods. It defines key terms like population, sample, parameter, statistic, and discusses sampling methods like simple random sampling and stratified sampling. It also covers sampling variability, estimation, hypothesis testing, prediction, and issues around representative vs non-representative samples.
Here are the responses to the questions:
1. A statistical population is the entire set of individuals or objects of interest. A sample is a subset of the population selected to represent it. The sample is used to infer the characteristics, attributes, and properties of the entire population.
2. Variance measures the average squared deviation from the mean. For a sample, it is calculated as the sum of the squared deviations from the sample mean divided by the number of values in the data set minus 1. Standard deviation is the square root of the variance. It measures how far data values spread out from the mean.
3. No data was provided to create graphs. Additional data on the number of fish in each age group would be needed.
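The variance and standard deviation described in response 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original responses: the function names and the data values are made up for the example, and the result is cross-checked against the standard library's `statistics.variance`.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Made-up data, used only to exercise the functions.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sample_variance(data))      # hand-rolled formula
print(statistics.variance(data))  # standard library, same n - 1 divisor
print(sample_std(data))
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance (`statistics.pvariance`).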
This document proposes a conceptual model for automatically matching individuals with health researchers for research studies using electronic medical record data. The model involves selecting relevant medical measurements for a "candidate" research participant, filtering individuals based on rules, reducing the data dimensions using principal component analysis, and calculating similarity between individuals' medical data using similarity coefficients. A simulation applies the model to a medical data set and demonstrates that it can significantly reduce the data needed to automatically match individuals for health research.
This document discusses different types of sampling methods used in research. It begins by defining key terms like target population, sample, and sampling frame. It then covers different sampling techniques including probability sampling methods like simple random sampling, stratified random sampling, and cluster sampling as well as non-probability sampling methods. For each method, it provides examples and discusses their advantages and disadvantages for representing populations. The document aims to help medical students understand how to select appropriate sampling methods based on their research questions.
The document provides an overview of populations, samples, and key concepts in descriptive statistics. It discusses how samples are used to make inferences about populations. Key points include:
- Samples are subsets of populations used for study due to constraints on time and resources.
- Descriptive statistics like means, medians, and histograms are calculated from samples to learn about characteristics of interest in populations.
- Categorical data can be summarized using frequency distributions and sample proportions.
- Different measures of center like the mean, median, and trimmed mean are used to summarize data, with the choice dependent on factors like outliers and distribution shape.
Good Science Essay Topics. Essay on Science and Technology Science and Techn... (Kimberly Pulley)
This essay discusses the rediscovery of penicillin by Alexander Fleming in 1928 after initially discovering it by accident in 1922. It describes how Fleming noticed a mold growing in one of his culture dishes that was preventing the growth of the staphylococci bacteria. While he documented his observation, he did not pursue developing penicillin at the time. The essay then outlines how Howard Florey and Ernst Chain helped revive interest in penicillin in the late 1930s and worked to extract and concentrate the active ingredients from the mold. Their work culminated in the first patient being treated with penicillin in 1941.
In 1922, Scottish scientist
Statistics is the collection, organization, analysis, interpretation and presentation of data. It deals with both descriptive statistics, which summarize and describe data, and inferential statistics, which are used to draw conclusions about populations based on sample data. The key aspects of statistics discussed in the document are:
- Populations and samples
- Parameters and statistics
- Quantitative and qualitative variables
- Levels of measurement including nominal, ordinal, interval and ratio scales
- Types of data including primary and secondary data
The document discusses sampling methods and statistical inference. It defines key terms like population, sample, sampling frame. It describes different sampling techniques including random sampling methods like simple random sampling and systematic sampling. It also covers non-random sampling techniques like quota sampling and convenience sampling. The minimum sample size is calculated using a standard formula. Statistical inference is defined as using a sample to make conclusions about the larger population. The key difference between a sample and population is also highlighted.
This document provides an overview of basic statistical concepts for medical research. It discusses different types of data including categorical and numerical data, and how to describe both types of data through summary statistics and graphical displays. Specific topics covered include determining data types, summarizing categorical variables with tables and percentages, summarizing numerical variables with measures of center like mean and median and measures of spread like standard deviation and interquartile range, and using histograms and boxplots for graphical summaries. The goal is to assist researchers in interpreting statistics and communicating with biostatisticians.
The document discusses the structure and components of scientific articles. It states that the typical structure includes IMRAD sections - Introduction, Methods, Results, and Discussion. It may also include Conclusions, References, and Acknowledgements. The structure is essentially always the same with the title, summary and keywords preceding the main text. The document also provides examples of formatting for titles, keywords, tables, and figures. It emphasizes writing each section of a scientific article in the appropriate tense to clearly convey the information.
The document provides a review of key concepts from chapters 1 and 2 including:
1) Definitions of scientific terms such as mass vs weight, qualitative vs quantitative data, and scientific law vs theory.
2) Identification of the six steps of the scientific method and an explanation of significant figures and scientific notation.
3) Practice problems involving conversions between standard and scientific notation, significant figures, densities, percent errors, and graph types.
This document provides an introduction to biostatistics. It defines biostatistics and explains its importance in biomedical research. Some key points covered include:
- Biostatistics is the application of statistics to medicine and health sciences. It involves the collection, organization, and analysis of numerical data.
- Understanding biostatistics is important for medical research, updating medical knowledge, and managing data and treatment.
- The document outlines the basic concepts of biostatistics like population and sample, and the different types of data. It also describes the typical steps involved in a research project and how biostatistics can be applied.
This document summarizes John List's introduction to field experiments. It discusses:
1) Different approaches to measurement, including naturally-occurring data and controlled experiments.
2) Types of experiments like lab experiments, field experiments, and natural experiments.
3) Some design insights for field experiments, including sample size considerations for binary treatments, unequal variances, treatment intensity, and clustering.
Applications of Computer Science in Environmental Models (IJLT EMAS)
Computation is now regarded as an equal and indispensable partner, along with theory and experiment, in the advance of scientific knowledge and engineering practice. Numerical simulation enables the study of complex systems and natural phenomena that would be too expensive or dangerous, or even impossible, to study by direct experimentation. The quest for ever higher levels of detail and realism in such simulations requires enormous computational capacity, and has provided the impetus for dramatic breakthroughs in computer algorithms and architectures. Due to these advances, computational scientists and engineers can now solve large-scale problems that were once thought intractable. Computational science and engineering (CSE) is a rapidly growing multidisciplinary area with connections to the sciences, engineering, mathematics, and computer science. CSE focuses on the development of problem-solving methodologies and robust tools for the solution of scientific and engineering problems. We believe that CSE will play an important, if not dominant, role in the future of the scientific discovery process and engineering design. Computational science is now widely used for environmental engineering calculations. With its help, the behavior of environmental engineering systems and processes can be studied, and both a better understanding of environmental engineering problems and better solutions to them can be obtained.
Frictional resistance in self ligating orthodontic brackets and conventionall... (VARADARAJU MAGESH)
This document summarizes a systematic review that compares the frictional resistance between self-ligating brackets and conventionally ligated brackets based on in vitro studies. A total of 19 studies met the selection criteria for inclusion in the review. The review found that self-ligating brackets produce lower friction than conventional brackets when used with small round archwires in the absence of tipping or torque in an ideally aligned arch. However, the evidence was insufficient to claim that self-ligating brackets produce lower friction than conventional brackets when used with large rectangular wires that involve tipping and/or torque or in arches with malocclusion. The variability in experimental methods among the selected studies may explain the inconsistent results.
Elementary Statistics Practice Test 1
Module 1: Chapters 1-3
Chapter 1: Introduction to Statistics.
Chapter 2: Exploring Data with Tables and Graphs.
Chapter 3: Describing, Exploring, and Comparing Data.
Cases studies 3 & 4 – primary care a 47 year-old male patisodhi3
Patient A, a 47-year-old male, presented with painful defecation and a visible anal fissure. Laboratory tests were positive for Treponema and non-Treponema. Patient B, the 45-year-old husband of Patient A, had a painless genital ulcer and tested positive for Treponema. Both patients denied sexual encounters outside their relationship. Their symptoms and laboratory results indicate they have sexually transmitted infections.
The document discusses transportation problems and assignment problems in operations research. It provides:
1) An overview of transportation problems, including the mathematical formulation to minimize transportation costs while meeting supply and demand constraints.
2) Methods for obtaining initial basic feasible solutions to transportation problems, such as the North-West Corner Rule and Vogel's Approximation Method.
3) Techniques for moving towards an optimal solution, including determining net evaluations and selecting entering variables.
4) The formulation and algorithm for solving assignment problems to minimize assignment costs while ensuring each job is assigned to exactly one machine.
The document discusses time and space complexity analysis of algorithms. Time complexity measures the number of steps to solve a problem based on input size, with common orders being O(log n), O(n), O(n log n), O(n^2). Space complexity measures memory usage, which can be reused unlike time. Big O notation describes asymptotic growth rates to compare algorithm efficiencies, with constant O(1) being best and exponential O(c^n) being worst.
The OC curve of attribute acceptance plans (Ankit Katiyar)
The document discusses operating characteristic (OC) curves, which describe the probability of accepting a lot based on the lot's quality level. The typical OC curve has an S-shape, with the probability of acceptance decreasing as the percent of nonconforming items increases. Sampling plans can approach the ideal step-function OC curve as the sample size and acceptance number increase. Specific points on the OC curve correspond to acceptance quality limits and rejection quality limits.
This document discusses conceptual problems in statistics, testing, and experimentation in cognitive psychology. It identifies three main sources of variability in psychological data: (1) participant interest and motivation, (2) individual differences, and (3) potentially stochastic cognitive mechanisms. Addressing this variability poses challenges for developing normative and descriptive models of cognition and for making inferences from group-level data to individuals. The document also discusses approaches like individual differences research and modeling heterogeneous groups to help address these challenges.
The document summarizes key concepts about queuing systems and simple queuing models. It discusses:
1) Components of a queuing system including the arrival process, service mechanism, and queue discipline.
2) Performance measures for queuing systems such as average delay, waiting time, and number of customers.
3) The M/M/1 queuing model where arrivals and service times follow exponential distributions with a single server. Expressions are given for performance measures in this model.
4) How limiting the queue length to a finite number affects performance measures compared to an infinite queue system.
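The M/M/1 performance measures mentioned in point 3 can be illustrated with the standard steady-state formulas. The sketch below is not taken from the summarized document: the function name, dictionary keys, and the example rates are assumptions, and the formulas apply only to the infinite-queue case with arrival rate below service rate.

```python
def mm1_measures(arrival_rate, service_rate):
    """Steady-state measures for an M/M/1 queue with unlimited waiting room."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate  # server utilization
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),
        "mean_in_queue": rho ** 2 / (1 - rho),
        "mean_time_in_system": 1 / (service_rate - arrival_rate),
        "mean_wait_in_queue": rho / (service_rate - arrival_rate),
    }

# Hypothetical rates: 2 arrivals and 5 service completions per unit time.
measures = mm1_measures(arrival_rate=2.0, service_rate=5.0)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The mean number in the system equals the arrival rate times the mean time in the system, which is Little's law and a convenient consistency check on the output.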
Scatter diagrams and correlation and simple linear regression (Ankit Katiyar)
The document discusses scatter diagrams, correlation, and linear regression. It defines key terms like predictor and response variables, positively and negatively associated variables, and the correlation coefficient. It also describes how to calculate the linear correlation coefficient and interpret it. The document shows an example of using least squares regression to fit a line to productivity and experience data. It provides formulas to calculate the slope and intercept of the regression line and how to make predictions with the line. However, predictions should stay within the scope of the observed data used to fit the model.
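As a rough illustration of the slope and intercept formulas and of the caution about predicting outside the observed data, here is a minimal least-squares sketch. The function names and the experience/productivity values are hypothetical and are not the data from the summarized document.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x_new, slope, intercept, x_observed):
    """Predict only within the range of the x values used to fit the line."""
    if not min(x_observed) <= x_new <= max(x_observed):
        raise ValueError("x_new is outside the observed range")
    return intercept + slope * x_new

# Hypothetical data: years of experience and a productivity score.
experience = [1, 2, 3, 4, 5]
productivity = [3, 5, 7, 9, 11]

slope, intercept = fit_line(experience, productivity)
print(slope, intercept)
print(predict(3.5, slope, intercept, experience))
```

Raising an error for out-of-range inputs is one design choice; a warning would also respect the point that extrapolation is unreliable.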
This document provides an introduction to queueing theory, covering basic concepts from probability theory used in queueing models like random variables, generating functions, and common probability distributions. It then discusses fundamental queueing models and relations, including Kendall's notation for describing queueing systems and Little's Law relating average queue length and waiting time. Specific queueing models are analyzed like the M/M/1, M/M/c, M/Er/1, M/G/1, and G/M/1 queues.
This document provides an introduction to queueing theory. It discusses key concepts such as random variables, probability distributions, performance measures, Little's law and the PASTA property. It then examines several common queueing models including the M/M/1, M/M/c, M/Er/1, M/G/1 and G/M/1 queues. For each model it derives the equilibrium distribution and discusses measures like mean queue length and waiting time. The goal is to provide the fundamental mathematical techniques for analyzing queueing systems.
This document provides an introduction to queueing theory. It discusses key concepts such as random variables, probability distributions, performance measures, Little's law and the PASTA property. It then examines several common queueing models including the M/M/1, M/M/c, M/Er/1, M/G/1 and G/M/1 queues. For each model it derives the equilibrium distribution and discusses measures like mean queue length and waiting time. The goal is to give an overview of basic queueing theory concepts and common single-server and multi-server queues.
Probability mass functions and probability density functions (Ankit Katiyar)
This document discusses probability mass functions (pmf) and probability density functions (pdf) for discrete and continuous random variables. A pmf fX(x) gives the probability of a discrete random variable X taking on the value x. A pdf fX(x) defines the probability that a continuous random variable X falls within an interval via its cumulative distribution function FX(x). The pdf must be non-negative and have an area/sum of 1 under the curve/over all x values.
This document discusses histograms and stem-and-leaf plots for analyzing and visualizing the distribution of a single set of numerical data. It provides examples using yearly precipitation data from New York City to demonstrate how to create histograms and stem-and-leaf plots in R. Histograms partition data into bins to show the frequency or relative frequency of observations in each bin, while stem-and-leaf plots list the "stems" and "leaves" of values to show their distribution.
This document discusses inventory management for multiple items and locations. It introduces the concepts of:
1) Setting aggregate inventory policies to meet system-wide objectives when dealing with multiple items and locations.
2) Using exchange curves to analyze the tradeoffs between total inventory levels and other factors like number of replenishments and service levels. These curves allow setting parameters like order costs and carrying costs.
3) Determining optimal reorder quantities, cycle stock, and safety stock levels across an inventory system using techniques like exchange curves. This helps allocate limited inventory budgets across items to maximize performance.
The document summarizes the economic production quantity (EPQ) model and its extensions. It discusses:
1) The EPQ model balances fixed ordering costs and inventory holding costs to determine optimal production/order quantities and intervals.
2) The economic order quantity (EOQ) model is a special case where production rate is infinite and demand is met through ordering.
3) Sensitivity analysis shows how the optimal solutions change with different parameters like production rate and setup costs.
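The relationship between the EPQ and EOQ models in points 1 and 2 can be sketched with the textbook square-root formulas. The symbols (demand rate D, fixed cost K, holding cost h, production rate P), the function names, and the numbers below are assumptions chosen for illustration, not values from the summarized document.

```python
import math

def epq(demand_rate, fixed_cost, holding_cost, production_rate):
    """Economic production quantity: sqrt(2KD / (h * (1 - D/P)))."""
    if production_rate <= demand_rate:
        raise ValueError("production rate must exceed demand rate")
    build_up = 1 - demand_rate / production_rate  # share of output that accumulates
    return math.sqrt(2 * fixed_cost * demand_rate / (holding_cost * build_up))

def eoq(demand_rate, fixed_cost, holding_cost):
    """Economic order quantity: the EPQ limit as the production rate grows without bound."""
    return epq(demand_rate, fixed_cost, holding_cost, math.inf)

# Hypothetical inputs: 1200 units/year demand, 100 per setup, 6 per unit-year held.
q_eoq = eoq(1200, 100, 6)
q_epq = epq(1200, 100, 6, production_rate=4800)
print(q_eoq, q_epq)
```

A finite production rate gives a larger optimal lot than the EOQ, because inventory builds up more slowly while production is running.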
The Kano Model classifies customer needs into three categories - threshold, performance, and excitement - based on their effect on customer satisfaction. Threshold attributes are basic needs whose absence causes dissatisfaction. Performance attributes directly improve satisfaction as implementation increases. Excitement attributes unexpectedly delight customers when implemented. The model is useful for identifying needs, setting requirements, concept development, and analyzing competitors to maximize performance attributes while including excitement attributes.
This document provides an overview of basic probability and statistics concepts. It covers variables, descriptive statistics like mean and standard deviation, frequency distributions through histograms, the normal distribution, linear regression, and includes a practice test in the appendices. Key topics are qualitative and quantitative data, parameters versus statistics, measures of central tendency and dispersion, and generating frequency tables and histograms from data sets.
Conceptual foundations statistics and probability (Ankit Katiyar)
This document provides guidance for a 6th grade statistics and probability unit of study. It outlines key concepts students should understand, including developing questions that anticipate variability, understanding data distributions in terms of center, spread and shape, and summarizing and describing distributions using various graphs such as dot plots, histograms and box plots. Students learn to analyze subgroups within data sets and how to match statistical questions to the appropriate graph. The document emphasizes interpreting and constructing dot plots, histograms and box plots to display and analyze numerical data.
The document outlines 3 axioms of probability:
1) Probabilities are non-negative
2) Probabilities of mutually exclusive events add
3) The probability of the sample space is 1
It then proves 5 theorems about probability:
1) The probability of an event equals 1 minus the probability of its complement
2) The probability of the impossible event (the empty set) is 0
3) The probability of a subset is less than or equal to the probability of the larger set it is contained within
4) A probability is between 0 and 1
5) The addition law - for two events the probability of their union equals the sum of their probabilities minus the probability of their intersection
Applied statistics and probability for engineers solution montgomery && runger (Ankit Katiyar)
This document is the copyright page and preface for the book "Applied Statistics and Probability for Engineers" by Douglas C. Montgomery and George C. Runger. The copyright is held by John Wiley & Sons, Inc. in 2003. This book was edited, designed, and produced by various teams at John Wiley & Sons and printed by Donnelley/Willard. The preface states that the purpose of the included Student Solutions Manual is to provide additional help for students in understanding the problem-solving processes presented in the main text.
A hand kano-model-boston_upa_may-12-2004 (Ankit Katiyar)
This document introduces the Kano Model, a framework used to classify product features based on their impact on customer satisfaction. It explains that some features are "basic" and expected, while others provide linear satisfaction proportional to quality or performance. Some "excitement" features unexpectedly delight customers. The document outlines a process to apply the Kano Model to user experience design including researching customer needs, analyzing data, plotting features on the Kano diagram, and strategizing priorities with clients. It provides an example workshop applying the model to a fictional business and discusses extending the model with personas and use cases.
This document is Leigh Slauson's dissertation on students' conceptual understanding of variability. It investigates how students understand two measures of variability - standard deviation and standard error. Two introductory statistics classes were taught, one with traditional lecture and one with hands-on active learning labs. Both classes took a pre-test and post-test to assess understanding. The analysis found that students in the active class improved their understanding of standard deviation concepts, but not standard error concepts. Interviews suggested understanding connections between data distributions and measures of variability is important for standard error. Further research is needed on students' prior knowledge of sampling distributions and the role of probability concepts.
1. Outline
Why Statistics?
Populations, Samples, and Census
Some Sampling Concepts
Lecture 1
Chapter 1: Basic Statistical Concepts
M. George Akritas
M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
Why Statistics?
Populations, Samples, and Census
Some Sampling Concepts
Representative Samples
Simple Random and Stratified Sampling
Sampling With and Without Replacement
Non-representative Sampling
Example (Examples of Engineering/Scientific Studies)
Comparing the compressive strength of two or more cement
mixtures.
Comparing the effectiveness of three cleaning products in
removing four different types of stains.
Predicting failure time on the basis of stress applied.
Assessing the effectiveness of a new traffic regulatory measure
in reducing the weekly rate of accidents.
Testing a manufacturer’s claim regarding a product’s quality.
Studying the relation between salary increases and employee
productivity in a large corporation.
What makes these studies challenging (and thus in need of
Statistics) is their inherent, or intrinsic, variability:
The compressive strength of different preparations of the same
cement mixture will differ. The figure at
http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf
shows 32 compressive strength measurements, in MPa
(megapascals), of test cylinders 6 in. in diameter by 12
in. high, using a water/cement ratio of 0.4, measured on the
28th day after they were made.
Under the same stress, two beams will fail at different times.
The proportion of defective items of a certain product will
differ from batch to batch.
Intrinsic variability renders the objectives of the case studies, as
stated, ambiguous.
The objectives of the case studies can be made precise if stated in
terms of averages or means.
Comparing the average hardness of two different cement
mixtures.
Predicting the average failure time on the basis of stress
applied.
Estimation of the average coefficient of thermal expansion.
Estimation of the average proportion of defective items.
Moreover, because of variability, the words "average" and "mean"
have a technical meaning which can be made clear through the
concepts of population and sample.
Definition
A population is a well-defined collection of objects or subjects, of
relevance to a particular study, which are exposed to the same
treatment or method. Population members are called units.
Example (Examples of populations:)
All water samples that can be taken from a lake.
All items of a certain manufactured product.
All students enrolled in Big Ten universities during the
2007-08 academic year.
Two types of cleaning products. (Each type corresponds to a
population.)
The objective of a study is to investigate certain characteristic(s)
of the units of the population(s) of interest.
Example (Examples of characteristics:)
All water samples taken from a lake. Characteristics: Mercury
concentration; Concentration of other pollutants.
All items of a certain manufactured product (that have been, or
will be, produced). Characteristic: Proportion of defective items.
All students enrolled in Big Ten universities during the
2007-08 academic year. Characteristics: Favorite type of
music; Political affiliation.
Two types of cleaning products. Characteristic: cleaning
effectiveness.
In the example where different (but of the same type) beams
are exposed to different stress levels:
the characteristic of interest is time to failure of a beam under
each stress level, and
each stress level used in the study corresponds to a separate
population which consists of all beams that will be exposed to
that stress level.
This emphasizes that populations are defined not only by the
units they consist of, but also by the method or treatment
applied to these units.
Full (i.e. population-level) understanding of a characteristic
requires the examination of all population units, i.e. a census.
For example, full understanding of the relation between salary
and productivity of a corporation’s employees requires
obtaining these two characteristics from all employees.
However,
taking a census can be time-consuming and expensive: the
2000 U.S. Census cost $6.5 billion, while the 2010 Census
cost $13 billion.
Moreover, a census is not feasible if the population is
hypothetical or conceptual, i.e. not all members are
available for examination.
Because of the above, we typically settle for examining all
units in a sample, which is a subset of the population.
Due to the intrinsic variability, the sample properties/attributes of
the characteristic of interest will differ from those of the
population. For example
The average mercury concentration in 25 water samples will
differ from the overall mercury concentration in the lake.
The proportion in a sample of 100 PSU students who favor
the use of solar energy will differ from the corresponding
proportion of all PSU students.
The relation between a bear's chest girth and weight in a
sample of 10 bears will differ from the corresponding relation
in the entire population of 50 bears in a forested region.
The GOOD NEWS is that, if the sample is suitably drawn, then
sample properties approximate the population properties.
Figure: Population and sample relationships between chest girth and weight (axes: Chest Girth, 20 to 55; Weight, 100 to 400).
Sampling Variability
Sample properties of the characteristic of interest also differ
from sample to sample. For example:
1. The number of US citizens, in a sample of size 20, who favor
expanding solar energy, will (most likely) be different from the
corresponding number in a different sample of 20 US citizens.
2. The average mercury concentration in two sets of 25 water
samples drawn from a lake will differ.
The term sampling variability is used to describe such
differences in the characteristic of interest from sample to
sample.
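Sampling variability can be seen directly in a small simulation. The sketch below is illustrative only: the population of 10,000 mercury concentrations is simulated, the lognormal shape and its parameters are assumptions, and the function name is made up. It draws two simple random samples of 25 and compares their averages with the population average.

```python
import random
import statistics

def sample_mean(population, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(population, n))

rng = random.Random(401)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated mercury concentrations.
# The lognormal shape and its parameters are assumptions, not lake data.
population = [rng.lognormvariate(-1.0, 0.3) for _ in range(10_000)]

population_mean = statistics.mean(population)  # population-level average
mean_a = sample_mean(population, 25, rng)      # average in the first sample
mean_b = sample_mean(population, 25, rng)      # average in the second sample

print(f"population mean: {population_mean:.3f}")
print(f"sample A mean:   {mean_a:.3f}")
print(f"sample B mean:   {mean_b:.3f}")
```

The two sample averages differ from each other and from the population average, yet both land close to it, which is the behavior expected of suitably drawn samples.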
Figure: Illustration of Sampling Variability (axes: Chest Girth, 20 to 55; Weight, 100 to 400).
Population level properties/attributes of characteristic(s) of
interest are called (population) parameters.
Examples of parameters include averages, proportions,
percentiles, and correlation coefficients.
The corresponding sample properties/attributes of
characteristics are called statistics. The term sports statistics
comes from this terminology.
Sample statistics approximate the corresponding population
parameters but are not equal to them.
Statistical inference deals with the uncertainty issues which
arise in approximating parameters by statistics.
The tools of statistical inference include point and interval
estimation, hypothesis testing and prediction.
Example (Examples of Estimation, Hypothesis Testing and
Prediction)
Estimation (point and interval) would be used in the task of
estimating the coefficient of thermal expansion of a metal, or
the air pollution level.
Hypothesis testing would be used for deciding whether to take
corrective action to bring the air pollution level down, or
whether a manufacturer’s claim regarding the quality of a
product is false.
Prediction arises in cases where we would like to predict the
failure time on the basis of the stress applied, or the age of a
tree on the basis of its trunk diameter.
16. Outline Representative Samples
Why Statistics? Simple Random and Stratified Sampling
Populations, Samples, and Census Sampling With and Without Replacement
Some Sampling Concepts Non-representative Sampling
For valid statistical inference the sample must be
representative of the population. For example, a sample of
PSU basketball players is not representative of PSU students,
if the characteristic of interest is height.
Typically it is hard to tell whether a sample is representative
of the population. So, we define a sample to be representative
if it allows for valid statistical inference (a circular definition!).
The only guarantee for that comes from the method used to
select the sample (sampling method).
The good news is that there are several sampling methods that
guarantee representativeness.
Definition
A sample of size n is a simple random sample if the selection
process ensures that every sample of size n has an equal chance of
being selected.
To select a s.r.s. of size 10 from a population of 100 units, any
of the 100!/(10!90!) samples of size 10 must be equally likely.
In simple random sampling every member of the population
has the same chance of being included in the sample. The
converse, however, is not true: a method can give every member
the same chance of inclusion without being simple random sampling.
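To get a feel for the count 100!/(10!90!) quoted above, it can be evaluated exactly. This is a small sketch using Python's standard library; the variable names are illustrative.

```python
import math

# Number of distinct samples of size 10 from a population of 100 units:
# 100! / (10! * 90!), i.e. the binomial coefficient "100 choose 10".
n_samples = math.comb(100, 10)

# Under simple random sampling each of these samples is equally likely.
prob_each = 1 / n_samples

print(n_samples)
print(prob_each)
```

The count is on the order of 10^13, so any one particular sample of size 10 is extremely unlikely, yet all of them are equally likely.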
Example
To select a sample of 2 students from a population of 20 male and
20 female students, one selects at random one male and one
female student. Is this a s.r.s.? (Does every student have the
same chance of being included in the sample?)
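One way to explore the question in this example is to enumerate every sample the scheme can produce. The sketch below assumes the male and the female student are each chosen uniformly at random; the labels M1..M20 and F1..F20 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

males = [f"M{i}" for i in range(1, 21)]
females = [f"F{i}" for i in range(1, 21)]
students = males + females

# Samples the scheme can produce: one male paired with one female.
# Each of these 20 * 20 pairs is equally likely under the scheme.
scheme_samples = {frozenset(pair) for pair in product(males, females)}

# All samples of size 2 that a simple random sample could produce.
all_pairs = [frozenset(pair) for pair in combinations(students, 2)]

# Chance that a given student is included under the scheme.
inclusion = {
    s: Fraction(sum(s in smp for smp in scheme_samples), len(scheme_samples))
    for s in students
}

# Pairs the scheme can never produce (two males or two females).
impossible = [p for p in all_pairs if p not in scheme_samples]

print(set(inclusion.values()))          # every student has the same chance
print(len(all_pairs), len(impossible))  # but many pairs have chance zero
```

Every student is included with the same probability, yet same-sex pairs can never be selected, so not every sample of size 2 is equally likely and the scheme is not a s.r.s.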
Another sampling method for obtaining a representative sample is
called stratified sampling.
Definition
A stratified sample consists of simple random samples from each
of a number of groups (which are non-overlapping and make up
the entire population) called strata.
Examples of strata include: ethnic groups, age groups, and
production facilities.
If the units in the different strata differ in terms of the
characteristic under study, stratified sampling is preferable to
s.r.s. For example, if different production facilities differ in
terms of the proportion of defective products, a stratified
sample is preferable.
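A stratified sample as defined above can be sketched as one simple random sample per stratum. The facility names, stratum sizes, and the proportional allocation below are assumptions made for illustration; the lecture does not prescribe how many units to take from each stratum.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample (without replacement) from each stratum.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical production facilities with 500, 300 and 200 products.
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

# Proportional allocation (an assumption): 10% of each stratum.
sizes = {"facility_A": 50, "facility_B": 30, "facility_C": 20}

sample = stratified_sample(strata, sizes, seed=0)
print({name: len(units) for name, units in sample.items()})
```

Because every facility contributes a fixed number of units, no facility can be missed or over-represented by chance, which is the advantage over a s.r.s. when facilities differ.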
How do we select a s.r.s. of size n from a population of N units?
STEP 1: Assign to each unit a number from 1 to N.
STEP 2: Write each number on a slip of paper, place the N
slips of paper in an urn, and shuffle them.
STEP 3: Select n slips of paper at random, one at a time.
Alternatively, the entire process can be performed in software like
R. We will see this in the next lab session.
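The three steps above can also be mimicked in a few lines of code. The course uses R for this; the sketch below uses Python purely as an illustration, and the function name simple_random_sample is hypothetical.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Select a s.r.s. of size n from units numbered 1..N."""
    rng = random.Random(seed)
    slips = list(range(1, N + 1))  # STEP 1: number the units 1..N
    rng.shuffle(slips)             # STEP 2: shuffle the slips in the urn
    return slips[:n]               # STEP 3: draw n slips, one at a time

chosen = simple_random_sample(N=100, n=10, seed=0)
print(sorted(chosen))
```

Taking the first n slips of a uniformly shuffled list is equivalent to drawing n slips one at a time without putting any back; `random.sample(range(1, N + 1), n)` does the same in a single call.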
Sampling without replacement simply means that a
population unit can be included in a sample at most once. For
example, a simple random sample is obtained by sampling
without replacement: Once a unit’s slip of paper is drawn, it
is not placed back into the urn.
Sampling with replacement means that after a unit’s slip of
paper is chosen, it is put back in the urn. Thus a population
unit could be included in the sample anywhere between 0 and
n times. Rolling a die can be thought of as sampling with
replacement from the numbers 1, 2, . . . , 6.
Though conceptually undesirable, sampling with replacement
is easier to work with from a mathematical point of view.
When a population is very large, sampling with and without
replacement are practically equivalent.
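The difference between the two schemes, and why it fades for large populations, can be sketched as follows. The helper prob_no_repeats is a hypothetical name; it computes the chance that a sample of size n drawn with replacement from N units contains no unit twice, namely the product of (N - i)/N for i = 0, ..., n - 1.

```python
import random
from fractions import Fraction

rng = random.Random(0)
units = [1, 2, 3, 4, 5, 6]  # the faces of a die

without_replacement = rng.sample(units, 4)  # each unit appears at most once
with_replacement = rng.choices(units, k=4)  # repeats are possible

def prob_no_repeats(N, n):
    """Chance that n draws with replacement from N units are all distinct."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(N - i, N)
    return p

print(without_replacement, with_replacement)
print(float(prob_no_repeats(6, 4)))          # small population: repeats are common
print(float(prob_no_repeats(1_000_000, 10)))  # large population: repeats are rare
```

When the population is very large relative to the sample, a with-replacement sample almost never contains a repeat, so it looks just like a without-replacement sample.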
Non-representative samples arise whenever the sampling plan
is such that a part, or parts, of the population of interest are
either excluded from, or systematically under-represented in,
the sample. This is called selection bias.
Two examples of non-representative samples are self-selected
and convenience samples.
A self-selected sample often occurs when people are asked to
send in their opinions in surveys or questionnaires. For
example, in a political survey, often those who feel that things
are running smoothly or who support an incumbent will
(apathetically) not respond, whereas those activists who
strongly desire change will voice their opinions.
A convenience sample is a sample made up of units that
are most easily reached. For example, randomly selecting
students from your classes will not result in a sample that is
representative of all PSU students because your classes are
made up mostly of students with the same major as you.
A famous example of selection bias is the following.
Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the
results in US presidential elections, but in 1936 it predicted a
3-to-2 victory for Republican Alf Landon over the Democratic
incumbent Franklin Delano Roosevelt, who went on to win in a
landslide. Notably, this prediction was based on 2.3 million
responses (out of 10 million questionnaires sent). Gallup, on the
other hand, correctly predicted the outcome of that election by
surveying only 50,000 people.
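The lesson of this example, that a huge biased sample can be worse than a small representative one, can be illustrated with a toy simulation. All numbers below are invented for illustration and are not the 1936 data: a hypothetical electorate of 100,000 voters, 60% favoring candidate A, with B supporters assumed three times as likely to mail back a questionnaire.

```python
import random

rng = random.Random(42)

# Hypothetical electorate (NOT the 1936 data): 60% favor A, 40% favor B.
population = ["A"] * 60_000 + ["B"] * 40_000
true_share_a = population.count("A") / len(population)

# Self-selected mail-in poll: assumed response rates differ by preference.
response_prob = {"A": 0.10, "B": 0.30}
mail_in = [v for v in population if rng.random() < response_prob[v]]
mail_in_share_a = mail_in.count("A") / len(mail_in)

# A much smaller simple random sample from the same population.
srs = rng.sample(population, 1_000)
srs_share_a = srs.count("A") / len(srs)

print("true share for A:   ", true_share_a)
print("mail-in poll:       ", len(mail_in), round(mail_in_share_a, 3))
print("simple random sample:", len(srs), round(srs_share_a, 3))
```

Under these assumptions the mail-in poll has far more respondents but badly understates support for A, while the small simple random sample lands close to the true share. Sample size does not cure selection bias.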
Go to next lesson: http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf
Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/