Class Test 1 Revision Notes

This document provides a summary of key concepts in descriptive statistics, including: 1) Types of variables (categorical, ordinal, quantitative) and charts (histogram, bar chart) used to visualize data. 2) Pros and cons of different visual displays (box plots, stem-and-leaf plots, dot plots, histograms) for analyzing quantitative data. 3) Measures of central tendency (mean, median, mode), variability (standard deviation, range, interquartile range), and shape (skewness, kurtosis).

Uploaded by

Harry Kwong


Class Test 1 Revision Note

Chapter 1 Descriptive Statistics

Types of Variables
- Categorical variables
- Ordinal variables
- Quantitative variables

Charts
- Histogram


- Histogram vs. Bar Chart

 One implication of this distinction: it is always appropriate to talk about the
skewness of a histogram, that is, the tendency of observations to fall more
on the low end or the high end of the x-axis.
 With bar charts, however, the x-axis does not have a low end or a high end,
because the labels on the x-axis are categorical, not quantitative. As a
result, it is less appropriate to comment on the skewness of a bar chart.

- Pros and Cons of the Four Visual Displays for Quantitative Variables
 Box plots, stem-and-leaf plots, dot plots, and histograms organize
quantitative data in ways that let us begin to find the information in a data
set.
 As to the question of which type of display is the best, there is no unique
answer.
 The answer depends on what feature of the data may be of interest and, to a
certain degree, on the sample size.
 Box plot
 Strength:
 Gives a direct look at central location and spread, as it displays
the five-number summary.
 Can identify outliers.
 Side-by-side box plots are an excellent tool for comparing two or
more groups.
 Weakness:
 Not entirely useful for judging shape.
 Cannot distinguish between bell-shaped and bimodal distributions.
 Stem-and-Leaf plot
 Strength:
 Excellent for sorting data.
 With a sufficient sample size, it can be used to judge shape.
 Weakness:
 With a large sample size, a stem-and-leaf plot may be too
cluttered because the display shows all individual data values.
 More restricted in the choices for “intervals” when compared to
histograms.
 Dot plot
 Strength:
 Can present all individual data values.
 Easy to create.
 Weakness:
 With a large sample size, a dot plot may be too cluttered.
 Histogram
 Strength:
 Excellent for judging the shape of a data set with moderate or
large sample sizes.
 Flexible in choosing the number as well as the width of the intervals
for the display.
 Between 6 and 15 intervals usually gives a good picture of the
shape.
 Weakness:
 With a small sample size, a histogram may not “fill in”
sufficiently well to show the shape of the data.
 With either too few intervals or too many, we may not see the true
shape of the data.

- Misleading Graphs
 Statistics can be misleading if not presented appropriately.
 The same data can appear very different depending on how they are graphed.
 E.g. a break in the vertical axis.
 The frequency scale on the vertical axis should run continuously from zero.
When we put a break in the axis, we lose the proportional relationship
among the class interval frequencies.

- Shape of Frequency Distributions

 J-shaped
 Positively skewed
 Negatively skewed
 Rectangular
 Bimodal
 Bell-shaped

Numerical Summaries
- Measures of Central Location: Mean, Mode, Median
 Mean as the Balance Point of a Distribution:
 Unlike the median and the mode, the mean is responsive to the exact
position of each score in the distribution. It is the balance point of a
distribution.
 Median in the Case with Outliers:
 The median is less sensitive than the mean to the presence of a
few extreme scores (outliers)
 Is it permissible to calculate the mean for tests in the behavioral
sciences? First of all, we have to ask ourselves a question: “Is the
measurement on this scale interval or ordinal?” Sometimes it may be
neither interval nor ordinal.
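The different sensitivity of the mean and the median to extreme scores can be checked numerically. The snippet below is only an illustrative sketch: the scores are made up, and the single extreme value of 200 is an assumption chosen to make the effect visible.

```python
from statistics import mean, median

# Hypothetical test scores (not from the notes), symmetric around 60.
scores = [52, 55, 58, 60, 62, 65, 68]
# The same scores plus one extreme value (an outlier).
with_outlier = scores + [200]

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean moves from 60 to 77.5, while the median only moves from 60 to 61.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean responds to the exact position of the new score; the median only responds to the fact that one more score lies above the middle.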
- Measures of Variability: Standard Deviation, Range, Interquartile Range
 The standard deviation, like the mean, is responsive to the exact
position of every score in the distribution because it is calculated by
taking deviations from the mean. If a score is shifted to a position more
deviant from the mean, the standard deviation increases; if the shift
is to a position closer to the mean, the standard deviation decreases.
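A small numerical check of this behaviour, as a sketch with made-up scores. It uses the population standard deviation (`statistics.pstdev`); that choice is an assumption, and the sample version behaves the same way here.

```python
from statistics import pstdev

# Hypothetical scores with mean 8.
base = [4, 6, 8, 10, 12]
# Shift the largest score farther from the centre (12 -> 16).
farther = [4, 6, 8, 10, 16]
# Shift the largest score closer to the centre (12 -> 9).
closer = [4, 6, 8, 10, 9]

sd_base = pstdev(base)
sd_farther = pstdev(farther)
sd_closer = pstdev(closer)

print(sd_closer, sd_base, sd_farther)  # increasing from left to right
```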
- Measures of Shape: Skewness, Kurtosis
 Skewness is a measure of a data set’s deviation from symmetry.

$$\text{Skewness} = \frac{m_3}{m_2^{3/2}}, \qquad m_2 = \frac{\sum (x - \bar{x})^2}{n}, \qquad m_3 = \frac{\sum (x - \bar{x})^3}{n}$$

The value of this measure generally lies between -3 and +3. The
closer the value lies to -3, the more the distribution is skewed left, and
vice versa. A value close to 0 indicates a symmetric distribution. A
normal distribution is symmetric and has skewness of 0.
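The moment-based skewness above can be computed directly from its definition. The following is a minimal sketch; the helper names `central_moment` and `moment_skewness` and the two small data sets are illustrative assumptions, not part of the notes.

```python
def central_moment(data, k):
    """k-th central moment with divisor n: sum((x - mean)^k) / n."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** k for x in data) / n


def moment_skewness(data):
    """Skewness = m3 / m2^(3/2)."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data with a long right tail
left_skewed = [-x for x in right_skewed]

print(moment_skewness(symmetric))      # zero for symmetric data
print(moment_skewness(right_skewed))   # positive
print(moment_skewness(left_skewed))    # negative
```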
 There are other measures of skewness:
 1. Pearson mode skewness, or first skewness coefficient:

$$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{s.d.}}$$

Mean < (>) mode ⇒ the distribution is negatively (positively) skewed.

 2. Pearson median skewness, or second skewness coefficient:

$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{s.d.}}$$

Mean < (>) median ⇒ the distribution is negatively (positively) skewed.

 3. Bowley skewness, or quartile skewness coefficient:

$$\text{skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$

Distribution          Coefficient of Skewness   Measures of Central Location
Symmetrical           0                         Mean = Median = Mode
Skewed to the right   > 0                       Mean > Median > Mode
Skewed to the left    < 0                       Mean < Median < Mode
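The Pearson median coefficient and the Bowley coefficient can be sketched as follows. Two assumptions are made that the notes do not fix: the population standard deviation is used for "s.d.", and quartiles follow the default ("exclusive") convention of `statistics.quantiles`. Other conventions give slightly different numbers. The data set is made up.

```python
from statistics import mean, median, pstdev, quantiles


def pearson_median_skewness(data):
    """3 * (mean - median) / s.d. (population s.d. assumed here)."""
    return 3 * (mean(data) - median(data)) / pstdev(data)


def bowley_skewness(data):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1), using statistics.quantiles defaults."""
    q1, q2, q3 = quantiles(data, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)


data = [2, 3, 3, 4, 5, 7, 12]   # hypothetical right-skewed data

pearson_value = pearson_median_skewness(data)
bowley_value = bowley_skewness(data)

# Both coefficients are positive, and mean > median, as in the
# "skewed to the right" row of the table.
print(pearson_value, bowley_value)
```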
 Kurtosis is a measure of the peakedness of a distribution.

$$\text{kurtosis} = \frac{m_4}{m_2^{2}}, \qquad m_4 = \frac{\sum (x - \bar{x})^4}{n}$$

 Excess kurtosis is defined as the kurtosis minus 3, i.e.
excess kurtosis = kurtosis - 3
A normal distribution has a kurtosis of 3 and therefore an excess kurtosis of 0.
Generally, if a distribution has a greater excess kurtosis, it has a
higher peak and thicker tails, compared to another distribution of
the same kind.
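A minimal sketch of the kurtosis calculation from central moments. The function names and both data sets are assumptions made for illustration.

```python
def central_moment(data, k):
    """k-th central moment with divisor n."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** k for x in data) / n


def kurtosis(data):
    """kurtosis = m4 / m2^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def excess_kurtosis(data):
    """excess kurtosis = kurtosis - 3."""
    return kurtosis(data) - 3


flat = [1, 2, 3, 4, 5]                       # hypothetical evenly spread data
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]        # hypothetical: tall centre, two far values

print(kurtosis(flat), excess_kurtosis(flat))      # kurtosis below 3, negative excess
print(kurtosis(peaked), excess_kurtosis(peaked))  # kurtosis above 3, positive excess
```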
 An outlier is a data point that is not consistent with the bulk of the data.
If an observation is outside the range [Q1 - 1.5 IQR, Q3 + 1.5 IQR],
where IQR = Q3 - Q1, then it is regarded as an outlier.
 Possible reasons for outliers and what to do about them:
 The outlier is a legitimate data value and represents natural
variability for the group and variable(s) measured. Such values
should not be discarded. They provide important information
about location and spread.
 A mistake was made while taking the measurement or entering it into
the computer. If verified, the value should be discarded or corrected.
 The individual in question belongs to a different group from the
bulk of individuals measured. Such values may be discarded if
a summary is desired and reported for the majority group only.
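The 1.5 IQR rule can be sketched in a few lines. The quartile convention is an assumption (the default of `statistics.quantiles`); textbooks differ slightly on how Q1 and Q3 are computed, so borderline points may be classified differently under another convention. The data are made up.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]


values = [2, 3, 3, 4, 5, 7, 30]   # hypothetical data with one extreme value

# Here Q1 = 3 and Q3 = 7, so the fences are -3 and 13 and only 30 is flagged.
print(iqr_outliers(values))
```

Flagging a point this way only identifies it; whether to keep, correct, or discard it depends on the reasons listed above.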
- Coefficient of Variation
 The standard deviation measures the variation in a set of data. For
decision makers, the standard deviation indicates how spread out a
distribution is.
 For distributions having the same mean, the distribution with the
largest standard deviation has the greatest relative spread.
 When two or more distributions have different means, the relative
spread cannot be determined by merely comparing the standard
deviations.
 The coefficient of variation (CV) is used to measure the relative variation
for distributions with different means.
 Sample coefficient of variation: $CV = \dfrac{s}{\bar{x}} \times 100\%$
 When the coefficients of variation for two or more distributions are
compared, the distribution with the largest CV is said to have the
greatest relative spread.
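A short sketch of why the CV is needed when the means differ. The two price series are hypothetical; `coefficient_of_variation` is an illustrative helper using the sample standard deviation.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """Sample CV in percent: (s / x-bar) * 100."""
    return stdev(sample) / mean(sample) * 100


stock_a = [10, 12, 14]       # hypothetical prices, mean 12, s = 2
stock_b = [100, 105, 110]    # hypothetical prices, mean 105, s = 5

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

# B has the larger standard deviation, but A has the larger relative spread.
print(stdev(stock_a), stdev(stock_b))
print(round(cv_a, 2), round(cv_b, 2))
```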

Normal Distribution

Percentile

- The k-th percentile is a number that has k% of the data values at or below it and
(100-k)% of the data values at or above it. The lower quartile, median, and upper quartile
are special cases of percentiles: lower quartile = 25th percentile, median = 50th
percentile, upper quartile = 75th percentile.

Value-at-Risk (VaR)

- One important application of percentile in risk management is VaR.

- VaR is defined as the worst loss over a target horizon that will not be exceeded
with a certain confidence level. For instance, the VaR at the 95% confidence level
gives a loss value that will not be exceeded with a probability of at least 95%.
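As a rough sketch, an empirical VaR can be read off as a percentile of observed losses. The helper `percentile_at_or_below` is a hypothetical name and uses one simple convention (the smallest observed value with at least k% of the data at or below it); the 20 loss figures are made up, and practical VaR estimation involves modelling choices not covered here.

```python
import math


def percentile_at_or_below(data, k):
    """Smallest data value with at least k% of the values at or below it."""
    ordered = sorted(data)
    # 1-based position of the percentile in the ordered data.
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical losses over 20 periods, in arbitrary currency units.
losses = list(range(1, 21))

var_95 = percentile_at_or_below(losses, 95)
median_loss = percentile_at_or_below(losses, 50)

# 19 of the 20 observed losses (95%) are at or below var_95.
print(var_95, median_loss)
```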

Z-score

- $\mu \pm 1\sigma$ contains about 68% of the scores

- $\mu \pm 2\sigma$ contains about 95% of the scores

- $\mu \pm 3\sigma$ contains about 99.7% of the scores
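These three percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% respectively.
    print(k, round(share_within(k) * 100, 1))
```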

Chapter 2 Correlation and Regression

Scatterplot

- Positive/negative association, linear relationship/nonlinear (curvilinear) relationship

Correlation Coefficient r

- Strength

 It is determined by the closeness of the points to a straight line.

- Direction

 It is determined by whether one variable generally increases or generally decreases when the other variable increases.

- Linear

 When the pattern is nonlinear, the correlation coefficient is not an appropriate way to measure the strength of the relationship.

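- The measure is also called the Pearson product-moment correlation coefficient.

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

where

$$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$S_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - n\bar{y}^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - n\bar{x}\bar{y} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$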

- r always lies between -1 and +1.

- Magnitude indicates the strength of the linear relationship.

- Sign indicates the direction of the association.
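The computational form of r (using the sums of x, y, x², y² and xy) translates directly into code. This is a minimal sketch; `pearson_r` is an illustrative name and the paired data are made up.

```python
import math


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [1, 2, 3, 4, 5]    # hypothetical x values
marks = [2, 4, 5, 4, 5]    # hypothetical y values

# Here Sxx = 10, Syy = 6 and Sxy = 6, so r = 6 / sqrt(60), about 0.775.
r = pearson_r(hours, marks)
print(round(r, 4))
```

The positive sign shows that marks tend to rise with hours, and the magnitude shows that the points lie fairly close to a straight line.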

Rank Correlation Coefficient rs

- Rankings are numerical, but they are qualitative rather than quantitative data,
so the sample correlation coefficient r cannot be used.
- Instead, we use the nonparametric counterpart of r, the rank correlation
coefficient rs, to perform correlation analysis on a form of qualitative data:
bivariate rankings.

- If we wish to assess the strength of the relation between the two sets of ranks, we
can compute the sample rank correlation coefficient rs.

- The Spearman correlation coefficient rs is defined as the Pearson correlation coefficient between the ranks of the data:

$$r_s = \frac{\sum (R_x - \bar{R}_x)(R_y - \bar{R}_y)}{\sqrt{\sum (R_x - \bar{R}_x)^2 \, \sum (R_y - \bar{R}_y)^2}}$$

where $R_x$ and $R_y$ are the ranks of the two variables of interest.

If there are no tied ranks in the data, then the following shortcut formula also works:

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i = \operatorname{Rank}(x_i) - \operatorname{Rank}(y_i) = R_{x_i} - R_{y_i}$ (the difference between a pair of ranks) and

n = the number of pairs of ranks
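The sketch below checks, on a small made-up data set without ties, that the shortcut formula agrees with the Pearson correlation computed on the ranks. The helper names are illustrative, and the simple `ranks` function assumes there are no tied values.

```python
import math


def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson_r(xs, ys):
    """Pearson correlation via Sxx, Syy and Sxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_shortcut(xs, ys):
    """rs = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [86, 97, 99, 100, 101]   # hypothetical scores
y = [2, 20, 28, 27, 50]      # hypothetical scores

rs_shortcut = spearman_shortcut(x, y)
rs_from_ranks = pearson_r(ranks(x), ranks(y))

# Ranks differ only in positions 3 and 4, so sum(d^2) = 2 and rs = 0.9.
print(rs_shortcut, rs_from_ranks)
```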

- When to use rs instead of r?

 Situation 1: Data are given in the form of ranks.

 Situation 2: Data are given in the form of scores, but what matters is that
one score is higher than another and how much higher is not really
important. Then, translating scores to ranks will be suitable.

- Cautions in the use of correlation

 Bear in mind the following five cautions in the use of correlation.

 Correlation does not prove causation

 If variation in X causes variation in Y, that causal connection will appear in some degree of correlation between X and Y.

 However, we cannot reason backward from a correlation to a causal relationship.

 We must always remember “correlation does not imply causation”.

 There are at least four possible explanations for an observed correlation. Denote X as the explanatory variable and Y as the response variable.

(a) Causation: X is a cause of Y.

(b) Reverse causation: Y is a cause of X.

(c) A third variable influences both X and Y.

(d) A complex of interrelated variables influences X and Y.

Note: Two or more of these situations may occur simultaneously. For example, X and Y may influence each other (a + b).

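 r and rs are not suitable for every kind of relationship

 r measures only linear association, and rs (the Pearson correlation of the
ranks) measures only monotonic association. When the relationship between the
variables is not of that form, other measures of association are better.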

 Effect of variability

 The correlation coefficient is sensitive to the variability characterizing the measurements of the two variables.

 For example, consider the relationship between total SAT scores and a second
variable at two universities: one has only minimal entrance requirements, while
the other is a more selective private university that admits only students with
SAT scores of 1200 or higher. The correlation will be weaker in the latter case.

 Therefore, restricting the range, whether in X, in Y, or in both, results in a lower correlation coefficient (in magnitude).
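The effect of restricting the range can be illustrated with simulated data. Everything in this sketch is an assumption made for illustration: the data are synthetic (standardised scores with a built-in correlation of about 0.7), the seed is arbitrary, and the cutoff of 1.0 stands in for a selective admission threshold.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation via Sxx, Syy and Sxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(0)   # fixed seed so the run is repeatable
rho = 0.7                # correlation built into the synthetic data

# Synthetic standardised scores: y is correlated with x by construction.
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

# Keep only cases with x above the cutoff (restricted range in X).
selected = [(x, y) for x, y in zip(xs, ys) if x > 1.0]

r_all = pearson_r(xs, ys)
r_restricted = pearson_r([x for x, _ in selected], [y for _, y in selected])

print(round(r_all, 2), round(r_restricted, 2))   # the restricted value is smaller
```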

 Effect of discontinuity

 The correlation tends to be an overestimate in discontinuous distributions.

 Usually, discontinuity, whether in X, in Y, or in both, results in a higher correlation coefficient.

 Correlation for combined data

 When groups are combined, the correlation coefficient may increase or decrease, depending on the data.

- Examples of deceptive relationships

 Outliers can substantially inflate or deflate correlations.
 An outlier that is consistent with the trend of the rest of the data will
inflate the correlation.

 An outlier that is not consistent with the rest of the data can
substantially decrease the correlation.

 Groups combined inappropriately may mask relationships.

 The missing link is a third variable.

 Simpson’s Paradox

 Two or more groups are involved.

 The variables may be strongly correlated within each group.

 When the groups are combined into one, there may be very little correlation between the two variables.
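A small constructed example of this effect: two made-up groups, each perfectly correlated on its own, show almost no correlation once pooled. The numbers are chosen purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation via Sxx, Syy and Sxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical group A: low x values, higher y values.
group_a = [(1, 11), (2, 12), (3, 13), (4, 14), (5, 15)]
# Hypothetical group B: high x values, lower y values.
group_b = [(6, 9), (7, 10), (8, 11), (9, 12), (10, 13)]


def r_of(pairs):
    """Correlation for a list of (x, y) pairs."""
    return pearson_r([x for x, _ in pairs], [y for _, y in pairs])


r_a = r_of(group_a)
r_b = r_of(group_b)
r_combined = r_of(group_a + group_b)

# Each group gives r = 1, but the pooled data give r close to -0.1.
print(round(r_a, 3), round(r_b, 3), round(r_combined, 3))
```

The group a point belongs to acts as the third variable: ignoring it hides the strong within-group relationship.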

Simple Linear Regression
