Module 4
Data are pieces of information about individuals organized into variables. By an individual, we mean a
particular person or object. By a variable, we mean a particular characteristic of the individual.
A dataset is a set of data identified with particular circumstances. Datasets are typically displayed in
tables, in which rows represent individuals and columns represent variables.
Categorical variables take category or label values and place an individual into one of several groups.
Each observation can be placed in only one category, and the categories are mutually exclusive.
Quantitative variables take numerical values and represent some kind of measurement.
It would make sense to average a quantitative variable but not a categorical variable.
In order to summarize the distribution of a categorical variable, we first create a table of the different
values (categories) the variable takes, how many times each value occurs (count) and, more importantly,
how often each value occurs (by converting the counts to percentages); this table is called a frequency
distribution.
The pie chart emphasizes how the different categories relate to the whole, and the bar chart emphasizes
how the different categories compare with each other.
The center of the distribution is its midpoint—the value that divides the distribution so that approximately
half the observations take smaller values, and approximately half the observations take larger values.
The spread (also called variability) of the distribution can be described by the approximate range covered
by the data.
The stemplot is a simple but useful visual display of quantitative data.
Note that when n is odd, the median is not included in either the bottom or top half of the data.
An observation is a suspected outlier if it falls below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) (the 1.5 x IQR criterion).
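The 1.5 x IQR criterion can be sketched in a few lines of Python (the small dataset is made up, and note that different software packages compute quartiles slightly differently):

```python
import statistics

def iqr_outliers(data):
    """Flag suspected outliers using the 1.5 * IQR criterion."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

print(iqr_outliers([2, 3, 4, 4, 5, 5, 6, 7, 40]))   # -> [40]
```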
Even though it is an extreme value, if an outlier can be understood to have been produced by essentially
the same sort of physical or biological process as the rest of the data, and if such extreme values are
expected to eventually occur again, then such an outlier indicates something important and interesting
about the process you're investigating, and it should be kept in the data.
If an outlier can be explained to have been produced under fundamentally different conditions from the
rest of the data (or by a fundamentally different process), such an outlier can be removed from the data if
your goal is to investigate only the process that produced the rest of the data.
An outlier might indicate a mistake in the data (like a typo, or a measuring error), in which case it should
be corrected if possible or else removed from the data before calculating summary statistics or making
inferences from the data (and the reason for the mistake should be investigated).
The standard deviation gives the average (or typical) distance between a data point and the mean.
Approximately 68% of the observations fall within 1 standard deviation of the mean.
Approximately 95% of the observations fall within 2 standard deviations of the mean.
Approximately 99.7% (or virtually all) of the observations fall within 3 standard deviations of the
mean.
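The Standard Deviation Rule can be checked by simulation. The sketch below draws observations from a normal distribution with a made-up mean and SD, then counts the fraction within 1, 2, and 3 standard deviations:

```python
import random
import statistics

# Draw many observations from a normal distribution (made-up mean and SD)
random.seed(0)
xs = [random.gauss(100, 15) for _ in range(100_000)]
mean, sd = statistics.fmean(xs), statistics.stdev(xs)

# Count the fraction of observations within k standard deviations of the mean
for k in (1, 2, 3):
    within = sum(mean - k * sd <= x <= mean + k * sd for x in xs) / len(xs)
    print(f"within {k} SD: {within:.3f}")   # roughly 0.68, 0.95, 0.997
```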
A positive (or increasing) relationship means that an increase in one of the variables is associated with
an increase in the other.
A negative (or decreasing) relationship means that an increase in one of the variables is associated with
a decrease in the other.
Properties of the correlation coefficient:
The correlation does not change when the units of measurement of either one of the variables change. In
other words, if we change the units of measurement of the explanatory variable and/or the response
variable, the change has no effect on the correlation (r).
The correlation measures only the strength of a linear relationship between two variables. It ignores any
other type of relationship, no matter how strong it is.
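The unit-invariance property can be illustrated directly. This sketch computes r from z-scores for a made-up height/weight dataset, converts the units (inches to centimeters, pounds to kilograms), and shows r is unchanged:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: average product of the paired z-scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: heights in inches, weights in pounds
heights_in = [60, 62, 65, 68, 71, 74]
weights_lb = [115, 120, 135, 155, 170, 190]

r1 = pearson_r(heights_in, weights_lb)
# Change the units (inches -> cm, pounds -> kg): r does not change
r2 = pearson_r([h * 2.54 for h in heights_in],
               [w * 0.4536 for w in weights_lb])
print(round(r1, 6) == round(r2, 6))   # -> True
```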
Types of Samples:
A sampling frame is the list of individuals from which the sample is actually drawn. If the frame does not cover the entire population of interest, the resulting sample may be biased.
In an observational study, variables of interest are recorded as they naturally occur. There is no interference by the researchers who conduct the study.
A sample survey is a particular type of observational study in which individuals report variables' values themselves, frequently by giving their opinions.
Perform an experiment. Instead of assessing the values of the variables as they naturally occur, the
researchers interfere, and they are the ones who assign the values of the explanatory variable to the
individuals. The researchers "take control" of the values of the explanatory variable because they want to
see how changes in the value of the explanatory variable affect the response variable. (Note: By nature,
any experiment involves at least two variables.)
In general, we control for the effects of a lurking variable by separately studying groups that are similar
with respect to this variable.
If neither the subjects nor the researchers know who was assigned what treatment, then the experiment is
called double-blind
The most reliable way to determine whether the explanatory variable is actually causing changes in the
response variable is to carry out a randomized controlled double-blind experiment.
Some of the inherent difficulties that may be encountered in experimentation are the Hawthorne effect,
lack of realism, noncompliance, and treatments that are unethical, impossible, or impractical to impose.
This phenomenon, whereby people in an experiment behave differently from how they would normally
behave, is called the Hawthorne effect.
Probability
One method for determining whether two events are independent is to compare P(B | A) and P(B).
If the two are equal (i.e., knowing or not knowing whether A has occurred has no effect on the probability
of B occurring) then the two events are independent. Otherwise, if the probability changes depending
on whether we know that A has occurred or not, then the two events are not independent. Similarly,
using the same reasoning, we can compare P(A | B) and P(A).
P(B | A) = P(B)
P(A | B) = P(A)
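A minimal sketch of this comparison, using a hypothetical two-way table of counts:

```python
# Hypothetical two-way table of counts (made-up numbers)
counts = {("A", "B"): 20, ("A", "not B"): 30,
          ("not A", "B"): 40, ("not A", "not B"): 60}
total = sum(counts.values())                                   # 150

# P(B): all outcomes where B occurred, out of everything
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / total    # 60/150 = 0.4
# P(B | A): restrict attention to the outcomes where A occurred
p_b_given_a = counts[("A", "B")] / (counts[("A", "B")] + counts[("A", "not B")])  # 20/50 = 0.4

print(p_b == p_b_given_a)   # -> True: in this table, A and B are independent
```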
A random variable assigns a unique numerical value to the outcome of a random experiment.
When describing a scatterplot, we consider the direction, form, and strength of the relationship, and then look for outliers.
Page 56 is an important page discussing the features of the correlation coefficient.
a lurking variable, by definition, is a variable that was not included in the study, but could have a
substantial effect on our understanding of the relationship between the two studied variables.
Binomial experiments are random experiments that consist of a fixed number of repeated trials, like
tossing a coin 10 times, randomly choosing 10 people, rolling a die 5 times, etc. These trials, however,
need to be independent in the sense that the outcome in one trial has no effect on the outcome in other
trials. In each of these repeated trials there is one outcome that is of interest to us (we call this outcome
"success"), and each of the trials is identical in the sense that the probability that the trial will end in a
"success" is the same in each of the trials.
The random variable X that represents the number of successes in those n trials is called binomial.
The number (X) of successes in a sample of size n taken without replacement from a population with
proportion (p) of successes is approximately binomial with n and p as long as the sample size (n) is at
most 10% of the population size (N).
Consider a random experiment that consists of n trials, each one ending up in either success or failure.
The number of possible outcomes in the sample space that have exactly k successes out of n is:
C(n, k) = n!/(k!(n−k)!)
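This count is available in Python's standard library as math.comb, and combining it with the success probability gives the binomial probability formula P(X = k) = C(n, k) p^k (1−p)^(n−k):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for a binomial random variable: n trials, success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(comb(10, 4))                        # ways to get 4 successes in 10 trials: 210
print(round(binom_pmf(10, 4, 0.5), 4))    # P(exactly 4 heads in 10 fair tosses) -> 0.2051
```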
For a normal distribution, the first quartile is about 0.67 standard deviations below the mean, so 25% of the data falls below mean − 0.67(SD).
We can generalize what we learned in the last example and say that when two individuals are selected at
random from a large population (like in the example, the entire U.S.) any event associated with one
individual is independent of any event associated with the other individual. The fact that the two are
chosen from a large population is key to the independence.
If A and B are two independent events, then P(A and B) = P(A) * P(B).
A parameter is a number that describes the population; a statistic is a number that is computed from the
sample.
The standard deviation of sample proportions is sqrt(p(1 − p)/n).
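The formula sqrt(p(1 − p)/n) for the standard deviation of sample proportions can be checked against a simulation of many samples (p and n here are made-up values):

```python
import math
import random
import statistics

p, n = 0.6, 100   # made-up population proportion and sample size

# Formula for the standard deviation of sample proportions
sd_formula = math.sqrt(p * (1 - p) / n)

# Simulation: draw many samples of size n, record each sample proportion
random.seed(1)
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(20_000)]
sd_sim = statistics.stdev(phats)

print(round(sd_formula, 3), round(sd_sim, 3))   # both close to 0.049
```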
The null hypothesis suggests nothing special is going on; in other words, there is no change from the status quo, no difference from the traditional state of affairs, no relationship.
The significance level (α) of the test is the cutoff that determines how surprising the data must be, assuming the null hypothesis is true, for the result to count as statistically significant.
If the p-value < α (usually .05), then the data we got are considered to be "rare (or surprising) enough" when Ho is true, and we say that the data provide significant evidence against Ho, so we reject Ho and accept Ha.
If the p-value > α (usually .05), then our data are not considered to be "surprising enough" when Ho is true, and we say that our data do not provide enough evidence to reject Ho (or, equivalently, that the data do not provide enough evidence to accept Ha).
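This decision rule can be sketched with a one-proportion z-test (a large-sample method; the survey numbers below are made up):

```python
import math

def one_proportion_z_test(phat, p0, n, alpha=0.05):
    """Two-sided z-test for Ho: p = p0 (sketch; assumes a large sample)."""
    se = math.sqrt(p0 * (1 - p0) / n)                    # SD of sample proportions under Ho
    z = (phat - p0) / se
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))    # standard normal CDF
    p_value = 2 * (1 - cdf)                              # two-sided
    return z, p_value, p_value < alpha                   # True -> reject Ho

# Made-up survey: 64 successes out of n = 100, testing Ho: p = 0.5
z, p_value, reject = one_proportion_z_test(phat=0.64, p0=0.5, n=100)
print(round(z, 2), round(p_value, 4), reject)   # p-value < .05, so we reject Ho
```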