
Chapter 1 Stat


Chapter 1

Introduction to statistics, organizing
and visualizing data
BB113 Statistics and its applications

Textbook: Chapter 1 - 2
Copyright © 2017, 2014, 2011 Pearson Education, Inc. Chapter 1 - 1
Objectives
In this chapter you learn:

◼ The basic vocabulary of statistics
◼ The types of variables used
◼ The types of sampling methods
◼ Organizing categorical & numerical variables
◼ Visualizing categorical & numerical variables


What is statistics

◼ Statistics represents scientific procedures and methods
for collecting, organizing, summarizing,
presenting and analyzing data, drawing valid
conclusions and making reasonable decisions.

◼ The figures that result from statistical analysis
are also called ‘statistics’.


Population vs. Sample
• The entire group of individuals
to be studied is called the
population.

• An individual is a person or
object that is a member of the
population being studied.

• A sample is a subset of the
population that is being studied.


Statistics basics
Descriptive Statistics

Descriptive statistics consists of methods for organizing
and summarizing information.

Descriptive statistics includes
the construction of graphs,
charts, and tables and the
calculation of various
descriptive measures such as
averages, measures of
variation, and percentiles.


Statistics basics
Inferential Statistics

Inferential statistics consists of methods for drawing and
measuring the reliability of conclusions about a population
based on information obtained from a sample of the
population.


Classifying Variables By Type
DCOVA
Categorical or qualitative variables allow for
classification of individuals based on some
attribute or characteristic.

What is your
favorite food
group?



Classifying Variables By Type
DCOVA
▪ Numerical or quantitative variables provide
numerical measures of individuals.
▪ The variables have values that represent a
counted or measured quantity.
▪ Discrete variables arise from a counting process.
▪ Continuous variables arise from a measuring process.



Types of Variables
DCOVA
Variables
  Categorical
    Nominal (defined categories)
      Examples: Marital Status, Political Party, Eye Color
    Ordinal (ordered categories)
      Examples: ratings such as Good, Better, Best; Low, Med, High
  Numerical
    Discrete (counted items)
      Examples: Number of Children, Defects per hour
    Continuous (measured characteristics)
      Examples: Weight, Voltage


Measurement Scales
DCOVA
A nominal scale classifies data into distinct
categories in which no ranking is implied.

Categorical Variables              Categories
Do you have a Facebook profile?    Yes, No
Type of investment                 Growth, Value, Other
Cellular Provider                  TM Berhad, Celcom, Maxis, Digi, U Mobile


Measurement Scales (cont.)
DCOVA
An ordinal scale classifies data into distinct
categories in which ranking is implied.

Categorical Variable              Ordered Categories
Student class designation         1st year, 2nd year, 3rd year, 4th year
Product satisfaction              Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F


Measurement Scales (cont.)
DCOVA
▪ An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true
zero point.
▪ Example:
▪ Temperature (0°C does not mean no heat at all)
▪ IQ scores (0 does not imply no intelligence)



Measurement Scales (cont.)
DCOVA
▪ A ratio scale is an ordered scale in which the
difference between the measurements is a
meaningful quantity and the measurements have a
true zero point.
▪ Example:
▪ Height
▪ Weight
▪ Area
▪ Number of phone calls received
▪ Salary



Data collection
◼ Data is collected from either a population or a
sample.
◼ Collecting data via sampling is used when
doing so is
a) Less time consuming than selecting every
item in the population.
b) Less costly than selecting every item in the
population.
c) Less cumbersome and more practical than
analyzing the entire population.



Sources of Data
DCOVA
▪ Primary Sources: The data collector is the one
using the data for analysis:
▪ Data from a political survey.
▪ Data collected from an experiment.
▪ Observed data.
▪ Secondary Sources: The person performing data
analysis is not the data collector:
▪ Analyzing census data.
▪ Examining data from print journals or data published on
the internet.



Parameter or Statistic? DCOVA

◼ A population parameter summarizes the value
of a specific variable for a population.
   ◼ $\mu$, $\sigma^2$

◼ A sample statistic summarizes the value of a
specific variable for sample data.
   ◼ $\bar{X}$, $s^2$


Parameter or Statistic?
[Diagram: Random selection draws a sample from the population. The population and its parameter are what we want to know about; the sample and its statistic are what we have to work with.]


Types of Samples DCOVA

Samples
  Nonprobability Samples
    Judgment
    Convenience
  Probability Samples
    Simple Random
    Systematic
    Stratified
    Cluster


Types of Samples:
Nonprobability Sample DCOVA
◼ In a nonprobability sample, items included are
chosen without regard to their probability of
occurrence.
◼ In convenience sampling, items are selected
based only on the fact that they are easy,
inexpensive, or convenient to sample.



Types of Samples:
Nonprobability Sample DCOVA

◼ In a judgment sample, you get the opinions
of pre-selected experts in the subject matter.
◼ Use your own judgment to select what
seems like an appropriate sample.


Types of Samples:
Probability Sample DCOVA

◼ In a probability sample, items in the
sample are chosen on the basis of known
probabilities.

Probability Samples: Simple Random, Systematic, Stratified, Cluster


Probability Sample:
Simple Random Sample DCOVA

◼ Every individual or item from the frame has an
equal chance of being selected.

◼ Selection may be with replacement or without
replacement.

◼ Samples are obtained from a table of random
numbers or from computer random number
generators.


Simple Random Sample

◼ Sampling with replacement
   ◼ Each time we select an element from the
     population, we put it back in the population
     before we select the next element.
   ◼ Thus, the population contains the same
     number of items each time a selection is
     made.
   ◼ As a result, we may select the same item
     more than once in such a sample.


Simple Random Sample

◼ Sampling without replacement
   ◼ Occurs when the selected element is not
     replaced in the population.
   ◼ Each time we select an item, the size of the
     population is reduced by one element.
   ◼ Thus, we cannot select the same item more
     than once in this type of sampling.
◼ Most of the time, samples taken in statistics are
without replacement.
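
The difference between the two selection schemes can be sketched in a few lines of Python. This is only an illustration: the frame of ten item numbers and the fixed seed are assumptions made for the example, not part of the slides.

```python
import random

# Hypothetical frame of 10 item numbers (1..10), used only for illustration.
frame = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# With replacement: every draw is made from the full frame,
# so the same item may be selected more than once.
with_replacement = rng.choices(frame, k=5)

# Without replacement: a selected item is not returned to the frame,
# so no item can appear twice.
without_replacement = rng.sample(frame, k=5)

print(with_replacement)
print(without_replacement)
```

Running it several times with different seeds shows repeats appearing only in the first list.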



Selecting a Simple Random Sample
Using A Random Number Table DCOVA

Sampling Frame For Population With 850 Items

Item Name    Item #
Bev R.       001
Ulan X.      002
.            .
.            .
.            .
Joann P.     849
Paul F.      850

Portion Of A Random Number Table

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The First 5 Items in a simple random sample

Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002
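
The reading of the random number table can be reproduced with a short sketch. It assumes the digits of the first table row are read left to right in groups of three, and that repeated item numbers are skipped (sampling without replacement); the variable names are illustrative.

```python
# Digits of the first row of the random number table shown above.
row = "49280 88924 35779 00283 81163 07275"
N = 850  # items in the frame are numbered 001..850

digits = row.replace(" ", "")
sample = []
# Read consecutive three-digit groups; ignore numbers that are not in
# the frame (above 850 or equal to 000) and numbers already selected.
for start in range(0, len(digits) - 2, 3):
    item = int(digits[start:start + 3])
    if 1 <= item <= N and item not in sample:
        sample.append(item)
    if len(sample) == 5:
        break

print(sample)  # [492, 808, 435, 779, 2]
```

Item 892 is skipped because the frame has only 850 items, which matches the list on the slide.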



Probability Sample:
Systematic Sample DCOVA

◼ Decide on sample size: n
◼ Divide frame of N individuals into groups of k
individuals: k = N/n
◼ Randomly select one individual from the 1st
group
◼ Select every kth individual thereafter

[Figure: N = 40, n = 4, k = 10, with the first group marked]
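
The four steps above can be sketched as a small function. The helper name `systematic_sample` and the numbering of individuals from 1 to 40 are assumptions for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random start in the first group of k, then every k-th item."""
    k = len(frame) // n        # group size, k = N / n
    start = rng.randrange(k)   # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))     # N = 40 individuals numbered 1..40
chosen = systematic_sample(frame, n=4, rng=random.Random(0))
print(chosen)
```

Whatever the random start, the selected individuals are always exactly k = 10 positions apart.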



Probability Sample:
Stratified Sample DCOVA

◼ Divide population into two or more subgroups (called strata) according
to some common characteristic.
◼ A simple random sample is selected from each subgroup, with sample
sizes proportional to strata sizes.
◼ Samples from subgroups are combined into one.
◼ This is a common technique when sampling a population of voters,
stratifying across racial or socio-economic lines.

[Figure: Population divided into 4 strata]
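
A minimal sketch of proportional stratified sampling follows. The four strata, their sizes, and the helper name `stratified_sample` are hypothetical; rounding each stratum's share can make the combined size differ slightly from n when the proportions are not whole numbers.

```python
import random

def stratified_sample(strata, n, rng):
    """Simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        size = round(n * len(members) / total)   # proportional allocation
        sample.extend(rng.sample(members, size))  # SRS within the stratum
    return sample

# Hypothetical population of 100 individuals split into 4 strata.
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(1))
print(sample)
```

With these sizes the sample contains 4, 3, 2 and 1 individuals from strata A to D.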





Probability Sample
Cluster Sample DCOVA

◼ Population is divided into several “clusters,” each representative of
the population.

◼ A simple random sample of clusters is selected.

◼ All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling technique.

◼ A common application of cluster sampling involves election exit polls,
where certain election districts are selected and sampled.

[Figure: Population divided into 16 clusters, with randomly selected clusters forming the sample]

Probability Sample:
Comparing Sampling Methods
DCOVA
◼ Simple random sample and Systematic sample:
◼ Simple to use.
◼ May not be a good representation of the
population’s underlying characteristics.
◼ Stratified sample:
◼ Ensures representation of individuals across the
entire population.
◼ Cluster sample:
◼ More cost effective.
◼ Less efficient (need larger sample to acquire the
same level of precision).



Organizing and visualizing variables



Organizing Data Creates Both
Tabular And Visual Summaries
DCOVA
◼ Summaries both guide further exploration and
sometimes facilitate decision making.

◼ Visual summaries enable rapid review of larger
amounts of data & show possible significant
patterns.

◼ Often, the Organize and Visualize steps in
DCOVA occur concurrently.


Example 1
Weights (kg)
48 70 48 49 49
63 56 56 70 67
53 45 90 65 60
55 45 73 73 70

Weights                  Frequency
40 but less than 48
48 but less than 56
56 but less than 64
64 but less than 72
72 but less than 80
80 but less than 88
88 but less than 96

[Figure: Frequency histogram for students' weight. Vertical axis: Frequency (0 to 7); horizontal axis: Weight (kg), class midpoints 44, 52, 60, 68, 76, 84, 92]
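
The frequency column for Example 1 can be filled in with a short sketch. It assumes the seven classes shown on the slide (first lower cutpoint 40, width 8); the variable names are illustrative.

```python
weights = [48, 70, 48, 49, 49, 63, 56, 56, 70, 67,
           53, 45, 90, 65, 60, 55, 45, 73, 73, 70]

lower, width, n_classes = 40, 8, 7   # classes: 40-<48, 48-<56, ..., 88-<96

freq = [0] * n_classes
for w in weights:
    freq[(w - lower) // width] += 1   # index of the class that contains w

for i, f in enumerate(freq):
    lo = lower + i * width
    print(f"{lo} but less than {lo + width}: {f}")
```

The counts are the bar heights of the histogram, and they add up to the 20 observations.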



Categorical Data Are Organized By
Utilizing Tables
DCOVA
Categorical Data: Tallying Data
  One Categorical Variable
    Summary Table
  Two Categorical Variables
    Contingency Table


Frequency distribution of
categorical data

◼ A frequency distribution of categorical data is a
listing of the distinct values and their
frequencies.

◼ A frequency distribution provides a table of the
values of the observations and how often they
occur.


Example 2: Determine a frequency
distribution of these data
Table: Political party affiliations of the students in
Statistics and its Applications
D R O R R R R R
D O R D O O R D
D R O D R R O R
D O D D D R O D
O R D R R R R D

D = Democratic, R = Republican, O = Other

Party         Tally    Frequency
Democratic
Republican
Other
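
One way to check a hand tally for Example 2 is to count the codes programmatically. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# The 40 party codes from the table, row by row.
rows = [
    "D R O R R R R R",
    "D O R D O O R D",
    "D R O D R R O R",
    "D O D D D R O D",
    "O R D R R R R D",
]
affiliations = " ".join(rows).split()

freq = Counter(affiliations)   # distinct values and how often each occurs
names = {"D": "Democratic", "R": "Republican", "O": "Other"}
for code in "DRO":
    print(f"{names[code]:<11} {freq[code]}")
```

The three frequencies should add up to the 40 students in the table.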



Relative frequency distribution
of categorical data
◼ A relative-frequency distribution of categorical
data is a listing of the distinct values and their
relative frequencies.

◼ A relative-frequency distribution provides a
table of the values of the observations and
(relatively) how often they occur.

◼ $\text{relative frequency} = \dfrac{\text{frequency}}{\text{number of observations}}$


Example 3: Construct a relative-frequency
distribution of data in Example 2

Party Frequency Relative Frequency


Democratic
Republican
Other
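
As a sketch of the calculation for Example 3, the code below divides each frequency by the number of observations. It assumes the counts obtained by tallying the 40 values listed in Example 2 (13 Democratic, 18 Republican, 9 Other).

```python
# Frequencies counted from the 40 party codes listed in Example 2.
freq = {"Democratic": 13, "Republican": 18, "Other": 9}

n = sum(freq.values())   # number of observations
rel_freq = {party: count / n for party, count in freq.items()}

for party, rf in rel_freq.items():
    print(f"{party:<11} {freq[party]:>3}   {rf:.3f}")
```

The relative frequencies of a complete distribution always sum to 1, which is a quick check on the arithmetic.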



Organizing Categorical Data:
Summary Table
DCOVA
A summary table tallies the frequencies or percentages of
items in a set of categories so that you can see differences
between categories.
Main Reason Young Adults Shop Online
Reason For Shopping Online? Percent
Better Prices 37%
Avoiding holiday crowds or hassles 29%
Convenience 18%
Better selection 13%
Ships directly 3%
Source: Data extracted and adapted from “Main Reason Young Adults Shop Online?”
USA Today, December 5, 2012, p. 1A.



A Contingency Table Helps Organize
Two or More Categorical Variables
DCOVA
◼ Used to study patterns that may exist between
the responses of two or more categorical
variables.

◼ Cross tabulates or tallies jointly the responses
of the categorical variables.

◼ For two variables the tallies for one variable are
located in the rows and the tallies for the
second variable are located in the columns.

Contingency Table - Example
DCOVA
◼ A random sample of 400 invoices is drawn.
◼ Each invoice is categorized as a small, medium, or large amount.
◼ Each invoice is also examined to identify if there are any errors.
◼ These data are then organized in the contingency table below.

Contingency Table Showing Frequency of Invoices Categorized
By Size and The Presence Of Errors

                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400


Contingency Table Based On
Percentage Of Overall Total
DCOVA
Frequencies:

                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400

Percentages of the overall total:

                 No Errors   Errors    Total
Small Amount       42.50%     5.00%    47.50%
Medium Amount      25.00%    10.00%    35.00%
Large Amount       16.25%     1.25%    17.50%
Total              83.75%    16.25%   100.0%

42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400

83.75% of sampled invoices have no errors and 47.50%
of sampled invoices are for small amounts.

Contingency Table Based On
Percentage of Row Totals
DCOVA
Frequencies:

                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400

Percentages of row totals:

                 No Errors   Errors    Total
Small Amount       89.47%    10.53%   100.0%
Medium Amount      71.43%    28.57%   100.0%
Large Amount       92.86%     7.14%   100.0%
Total              83.75%    16.25%   100.0%

89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70

Medium invoices have a larger chance (28.57%) of having
errors than small (10.53%) or large (7.14%) invoices.

Contingency Table Based On
Percentage Of Column Totals
DCOVA
Frequencies:

                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400

Percentages of column totals:

                 No Errors   Errors    Total
Small Amount       50.75%    30.77%    47.50%
Medium Amount      29.85%    61.54%    35.00%
Large Amount       19.40%     7.69%    17.50%
Total             100.0%    100.0%    100.0%

50.75% = 170 / 335
30.77% = 20 / 65

There is a 61.54% chance that invoices with errors are
of medium size.
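
The three percentage views of the invoice table differ only in the denominator. A minimal sketch, using the frequencies from the contingency table example and illustrative names such as `pct`:

```python
counts = {
    "Small":  {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large":  {"No Errors": 65,  "Errors": 5},
}
columns = ("No Errors", "Errors")

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {size: sum(row.values()) for size, row in counts.items()}
col_totals = {c: sum(row[c] for row in counts.values()) for c in columns}

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)

# Same cell counts, three different denominators.
overall = {s: {c: pct(v, grand_total) for c, v in row.items()} for s, row in counts.items()}
by_row = {s: {c: pct(v, row_totals[s]) for c, v in row.items()} for s, row in counts.items()}
by_col = {s: {c: pct(v, col_totals[c]) for c, v in row.items()} for s, row in counts.items()}

print(overall["Small"]["No Errors"])   # 42.5
print(by_row["Medium"]["Errors"])      # 28.57
print(by_col["Medium"]["Errors"])      # 61.54
```

Row percentages answer "what share of medium invoices have errors?", while column percentages answer "what share of invoices with errors are medium?".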



Tables Used For Organizing
Numerical Data
DCOVA
Numerical Data
  Ordered Array
  Frequency Distributions
  Cumulative Distributions


Organizing Numerical Data:
Ordered Array DCOVA
▪ An ordered array is a sequence of data, in rank order, from
the smallest value to the largest value.
▪ Shows range (minimum value to maximum value).
▪ May help identify outliers (unusual observations).
Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Organizing Numerical Data:
Frequency Distribution DCOVA

◼ We first group the observations into classes
(also known as categories or bins) and then
treat the classes as the distinct values of
categorical data.

◼ Most common methods: single-value grouping,
limit grouping, and cutpoint grouping.


Organizing Numerical Data:
Frequency Distribution DCOVA

Frequency distribution for numerical data

A frequency distribution for numerical data lists
all the classes and the number of values that
belong to each class.
Data presented in the form of a frequency
distribution are called grouped data.


Computing the relative frequency
◼ Frequency in each class divided by the total
observations:

◼ $\text{Relative frequency} = \dfrac{\text{frequency in each class}}{\text{total observations}}$

◼ Relative frequency is also known as proportion.


Single-value grouping
DCOVA
Table: Number of TV sets in each of 50 randomly selected households
1 1 1 2 6 3 3 4 2 4 3 2 1 5 2 1 3 6 2 2 3 1 1 4 3
2 2 2 2 3 0 3 1 2 1 2 3 1 1 3 3 2 1 2 1 1 3 1 5 1

Table: Frequency and relative-frequency distributions, using
single-value grouping, for the number-of-TVs data

Number of TVs   Tally                Frequency   Relative frequency
0               |
1               |||| |||| |||| |
2               |||| |||| ||||
3               |||| |||| ||
4               |||
5               ||
6               ||


Limit grouping DCOVA

◼ Group numerical data using class limits.

◼ Each class consists of a range of values.

◼ Particularly useful when the data are expressed
as whole numbers and there are too many
distinct values to employ single-value grouping.


Limit grouping DCOVA

Terms Used in Limit Grouping


Lower class limit: The smallest value that could go in a class.
Upper class limit: The largest value that could go in a class.
Class width: The difference between the lower limit of a class
and the lower limit of the next-higher class.
Class mark: The average of the two class limits of a class.



Limit grouping – example DCOVA
Table: Days to maturity for 40 short-term investments
70 64 99 55 64 89 87 65 62 38 67 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 81 80 98 51 36 63 66 85 79 83 70

Use limit grouping, with grouping by 10s, to organize these
data into frequency and relative-frequency distributions.

Min = 36 → 1st class: 30 – 39

Max = 99 → grouping by 10s will result in 7 classes


Limit grouping – example
Table: Frequency and relative-frequency distributions,
using limit grouping, for the days-to-maturity data

Days to maturity   Tally        Frequency   Relative frequency
30 – 39            |||          3           0.075
40 – 49            |            1           0.025
50 – 59            |||| |||     8           0.200
60 – 69            |||| ||||    10          0.250
70 – 79            |||| ||      7           0.175
80 – 89            |||| ||      7           0.175
90 – 99            ||||         4           0.100
Total                           40          1.00

Annotations: "Days to maturity" is the variable; 50 – 59 is the 3rd class; 90 and 99 are the lower and upper limits of the 7th class.

Limit grouping – example
◼ Find the class width.
◼ Find the class mark for class 70 – 79.

Class width = 40 – 30 = 10
Class mark for class 70 – 79 = $\dfrac{70 + 79}{2} = 74.5$
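
The limit-grouping table, class width, and class marks can be reproduced with a short sketch. It assumes classes of ten whole-number values starting at 30, as in the example; the variable names are illustrative.

```python
days = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
        99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

class_width = 10   # lower limit of a class to lower limit of the next class
first_lower = 30   # lower limit of the 1st class

freq = {}
for d in days:
    lower = first_lower + ((d - first_lower) // class_width) * class_width
    upper = lower + class_width - 1          # upper class limit, e.g. 39
    freq[(lower, upper)] = freq.get((lower, upper), 0) + 1

for lower, upper in sorted(freq):
    f = freq[(lower, upper)]
    mark = (lower + upper) / 2               # class mark: average of the two limits
    print(f"{lower} - {upper}  frequency={f}  relative={f / len(days):.3f}  mark={mark}")
```

Note that the class width is measured between consecutive lower limits (40 - 30), not between the two limits of one class (39 - 30).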



Cutpoint grouping DCOVA

◼ Group numerical data using class cutpoints.

◼ Each class consists of a range of values.

◼ Particularly useful when the data are continuous
and are expressed with decimals.


Cutpoint grouping DCOVA

Terms Used in Cutpoint Grouping


Lower class cutpoint: The smallest value that could go in a
class.
Upper class cutpoint: The smallest value that could go in the
next-higher class (equivalent to the lower cutpoint of the next-
higher class).
Class width: The difference between the cutpoints of a class.
Class midpoint: The average of the two cutpoints of a class.



Cutpoint grouping – example DCOVA
Table: Weights, in pounds, of 37 males aged 18-24 years
129.2 185.3 218.1 182.5 142.8 155.2 170.0 151.3 187.5 145.6
167.3 161.0 178.7 165.0 172.5 191.1 150.7 187.0 173.7 178.2
161.7 170.1 165.8 214.6 136.7 278.8 175.6 188.7 132.1 158.5
146.4 209.1 175.4 182.0 173.6 149.9 158.6

Use cutpoint grouping, with class width of 20 and a first cutpoint of
120, to organize these data into frequency and relative-frequency
distributions.

1st cutpoint of 120; class width of 20 → 1st class: 120 – under 140

Max = 278.8 lb → last class is 260 – under 280


Cutpoint grouping – example
Table: Frequency and relative-frequency distributions,
using cutpoint grouping, for the weight data

Weight (lb)        Tally            Frequency   Relative frequency
120 – under 140    |||              3           0.081
140 – under 160    |||| ||||        9           0.243
160 – under 180    |||| |||| |||    14          0.378
180 – under 200    |||| ||          7           0.189
200 – under 220    |||              3           0.081
220 – under 240                     0           0.000
240 – under 260                     0           0.000
260 – under 280    |                1           0.027
Total                               37          0.999

The relative frequencies sum to 0.999 rather than 1 because each value is rounded to three decimals.
Annotations: 260 and 280 are the lower and upper cutpoints of the 8th class.

Organizing Numerical Data:
Guidelines for grouping data DCOVA

1. The number of classes should be small
enough to provide an effective summary but
large enough to display the relevant
characteristics of the data. Rule of thumb:
between 5 and 20.
2. Each observation must belong to one, and
only one, class.
3. Whenever feasible, all classes should have
the same width.


Organizing Numerical Data:
Frequency Distribution DCOVA

◼ To determine the width of a class, you divide the
range (highest value – lowest value) of the data by
the number of class groupings desired.

$\text{class width} = \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$


Organizing Numerical Data:
Frequency Distribution Example
DCOVA

A manufacturer of insulation randomly selects 20
winter days and records the daily high temperature.

24 35 17 21 24 37 26 46 58 30
32 13 12 38 41 43 44 27 53 27

Construct frequency, relative frequency, and
percentage distributions with 5 classes using cutpoint
grouping.


Organizing Numerical Data:
Frequency Distribution Example
DCOVA
▪ Sort raw data in ascending order:
  12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58.
▪ Find range: 58 - 12 = 46.
▪ Number of classes: 5 (usually between 5 and 20).
▪ Compute class width: 10 (46/5 then round up).
▪ Determine class cutpoints:
   ▪ Class 1: 10 – under 20
   ▪ Class 2: 20 – under 30
   ▪ Class 3: 30 – under 40
   ▪ Class 4: 40 – under 50
   ▪ Class 5: 50 – under 60
  Alternative ways of writing a class: "10 but less than 20" or $10 \le x < 20$.
▪ Compute class midpoints: 15, 25, 35, 45, 55.
▪ Count observations & assign to classes.

Organizing Numerical Data: Frequency
Distribution Example
DCOVA
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class Midpoints Frequency

Total
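
The empty table above can be filled in with a sketch that follows the listed steps. Rounding the raw width 46/5 = 9.2 up to the next multiple of 10, and starting the first class at a multiple of the width, are assumptions chosen to match the cutpoints on the slides.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
n_classes = 5

data_range = max(temps) - min(temps)                  # 58 - 12 = 46
# Assumption: round the raw width (9.2) up to the next multiple of 10.
width = math.ceil(data_range / n_classes / 10) * 10
first_cutpoint = (min(temps) // width) * width        # 10

table = []
for i in range(n_classes):
    lo = first_cutpoint + i * width
    hi = lo + width
    frequency = sum(lo <= t < hi for t in temps)      # lower cutpoint included, upper excluded
    table.append((lo, hi, (lo + hi) / 2, frequency))

for lo, hi, midpoint, frequency in table:
    print(f"{lo} - under {hi}  midpoint={midpoint}  frequency={frequency}")
print("Total", sum(row[3] for row in table))
```

Dividing each frequency by the total of 20 gives the relative frequencies, and multiplying by 100 gives the percentages.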



Organizing Numerical Data: Relative &
Percent Frequency Distribution Example
DCOVA
Class   Midpoints   Frequency   Relative frequency   Percentage

Total

$\text{Relative frequency} = \dfrac{\text{frequency}}{\text{total}}$

$\text{Percentage} = \text{relative frequency} \times 100 = \dfrac{\text{frequency}}{\text{total}} \times 100$


The cumulative distribution

◼ It provides a way of presenting information
about the percentage of values that are less
than a specific amount.


Organizing Numerical Data: Cumulative
Percentage Distribution Example
DCOVA
Class            Percentage   Cumulative percentage (percentage of temperatures
                              that are less than the class lower cutpoint)
10 – under 20    15           0
20 – under 30    30           15% = 0 + 15
30 – under 40    25           45% = 15 + 30
40 – under 50    20           70% = 15 + 30 + 25
50 – under 60    10           90% = 15 + 30 + 25 + 20
60 – under 70    0            100% = 15 + 30 + 25 + 20 + 10

Organizing Numerical Data: Cumulative
Percentage Distribution Example
DCOVA

Cumulative percentage distribution of the temperature

Temperature   Percentage of temperatures that are less than the indicated value
10            0
20            15
30            45
40            70
50            90
60            100
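
The cumulative percentages are a running total of the class percentages. A minimal sketch, assuming the five temperature classes from 10 – under 20 to 50 – under 60 and their percentages (15, 30, 25, 20, 10):

```python
from itertools import accumulate

cutpoints = [10, 20, 30, 40, 50, 60]
percentages = [15, 30, 25, 20, 10]   # one value per class, in class order

# Nothing lies below the first cutpoint; after that, each cutpoint
# accumulates the percentages of all classes that end at or before it.
cumulative = [0] + list(accumulate(percentages))

for cutpoint, pct in zip(cutpoints, cumulative):
    print(f"less than {cutpoint}: {pct}%")
```

The last value is always 100, since every observation lies below the final upper cutpoint.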



Why Use a Frequency Distribution?
DCOVA
◼ It condenses the raw data into a more
useful form.
◼ It allows for a quick visual interpretation of
the data.
◼ It enables the determination of the major
characteristics of the data set including
where the data are concentrated /
clustered.



Going From Classes To Excel Bins
DCOVA
◼ Microsoft Excel creates distribution tables using bins
(named by their upper limit) rather than classes.

Class                  Excel Bin Name
(extra bin)            9.99
10 but less than 20    19.99
20 but less than 30    29.99
30 but less than 40    39.99
40 but less than 50    49.99
50 but less than 60    59.99

An extra bin (9.99), slightly less than the smallest observation, is added in Excel.


Visualizing Categorical Data
Through Graphical Displays
DCOVA
Categorical Data: Visualizing Data
  Summary Table For One Variable
    Bar Chart
    Pareto Chart
    Pie or Doughnut Chart
  Contingency Table For Two Variables
    Side By Side Bar Chart
    Doughnut Chart


Visualizing Categorical Data:
The Bar Chart
DCOVA
▪ The bar chart visualizes a categorical variable as a series of bars. The
length of each bar represents either the frequency or percentage of
values for each category. Each bar is separated by a space called a gap.

Reason For Shopping Online?           Percent
Better Prices                         37%
Avoiding holiday crowds or hassles    29%
Convenience                           18%
Better selection                      13%
Ships directly                        3%


Visualizing Categorical Data:
The Pie Chart
DCOVA
▪ The pie chart is a circle broken up into slices that represent categories.
The size of each slice of the pie varies according to the percentage in
each category.

Reason For Shopping Online?           Percent
Better Prices                         37%
Avoiding holiday crowds or hassles    29%
Convenience                           18%
Better selection                      13%
Ships directly                        3%


Visualizing Categorical Data:
Side By Side Bar Charts DCOVA
▪ The side by side bar chart represents the data from a contingency table.

                 No Errors   Errors    Total
Small Amount       50.75%    30.77%    47.50%
Medium Amount      29.85%    61.54%    35.00%
Large Amount       19.40%     7.69%    17.50%
Total             100.0%    100.0%    100.0%

[Figure: Side by side bar chart "Invoice Size Split Out By Errors & No Errors". Bars for Large, Medium and Small invoices are grouped under Errors and No Errors; the horizontal axis runs from 0.0% to 70.0%]

Invoices with errors are much more likely to be of
medium size (61.5% vs 30.8% & 7.7%).


Visualizing Numerical Data
By Using Graphical Displays
DCOVA
Numerical Data
  Ordered Array
    Stem-and-Leaf Display
  Frequency Distributions and Cumulative Distributions
    Histogram
    Polygon
    Ogive


Stem-and-Leaf Display
DCOVA

◼ A simple way to see how the data are distributed
and where concentrations of data exist.

METHOD: Separate the sorted data series
into leading digits (the stems) and
the trailing digits (the leaves).


Organizing Numerical Data:
Stem and Leaf Display
DCOVA
A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves) branch
out to the right on each row.
Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Age of College Students

Day Students           Night Students
Stem  Leaf             Stem  Leaf
1     67788899         1     8899
2     0012257          2     0138
3     28               3     23
4     2                4     15
Key: 2|5 = 25          Key: 1|8 = 18
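
A stem-and-leaf display for two-digit values can be sketched by splitting each value into its tens digit (stem) and units digit (leaf). The helper name `stem_and_leaf` is illustrative, and this simple version omits stems that have no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    stems = defaultdict(str)
    for v in sorted(values):
        stems[v // 10] += str(v % 10)
    return dict(stems)

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22,
       22, 25, 27, 32, 38, 42]

for stem, leaves in stem_and_leaf(day).items():
    print(f"{stem} | {leaves}")
```

Applied to the day students' ages, this reproduces the four rows of the display above.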



Visualizing Numerical Data:
The Histogram
DCOVA

▪ A vertical bar chart of the data in a frequency distribution is
called a histogram.

▪ In a histogram there are no gaps between adjacent bars.

▪ The class cutpoints (or class midpoints) are shown on the
horizontal axis.

▪ The vertical axis is either frequency, relative frequency, or
percentage.

▪ The height of each bar represents the frequency, relative
frequency, or percentage.

Visualizing Numerical Data:
The Histogram
DCOVA
Class          Midpoints   Frequency   Relative frequency   Percentage
10 ≤ x < 20    15          3           0.15                 15
20 ≤ x < 30    25          6           0.30                 30
30 ≤ x < 40    35          5           0.25                 25
40 ≤ x < 50    45          4           0.20                 20
50 ≤ x < 60    55          2           0.10                 10
Total                      20          1.00                 100

[Figure: Histogram: Temperature. Vertical axis: Frequency (0 to 8); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More]

(In a percentage histogram the vertical axis would be defined to
show the percentage of observations per class.)


Visualizing Numerical Data:
The Polygon
DCOVA

▪ A percentage polygon is formed by having the midpoint of
each class represent the data in that class and then connecting
the sequence of midpoints at their respective class
percentages.

▪ The cumulative percentage polygon, or ogive, displays the
variable of interest along the X axis, and the cumulative
percentages along the Y axis.

▪ Useful when there are two or more groups to compare.


Visualizing Numerical Data:
The Frequency Polygon DCOVA
Useful When Comparing Two or More Groups
Meal cost ($)    Midpoint   Center city frequency   Metro area frequency
10 ≤ 𝑥 < 20      15         0                       1
20 ≤ 𝑥 < 30      25         6                       14
30 ≤ 𝑥 < 40      35         5                       12
40 ≤ 𝑥 < 50      45         9                       10
50 ≤ 𝑥 < 60      55         9                       10
60 ≤ 𝑥 < 70      65         10                      3
70 ≤ 𝑥 < 80      75         6                       0
80 ≤ 𝑥 < 90      85         2                       0
90 ≤ 𝑥 < 100     95         3                       0
Total                       50                      50
Visualizing Numerical Data:
The Frequency Polygon DCOVA
Useful When Comparing Two or More Groups

[Figure: Frequency Polygons for Meal Cost at Center City and Metro Area Restaurants. Vertical axis: Frequency; horizontal axis: Meal cost ($); series: Center city frequency, Metro area frequency.]



Visualizing Numerical Data:
The Percentage Polygon DCOVA
Useful When Comparing Two or More Groups

                 Center city                        Metro area
Meal cost ($)    Relative frequency   Percentage    Relative frequency   Percentage
10 ≤ 𝑥 < 20      0.00                 0             0.02                 2
20 ≤ 𝑥 < 30      0.12                 12            0.28                 28
30 ≤ 𝑥 < 40      0.10                 10            0.24                 24
40 ≤ 𝑥 < 50      0.18                 18            0.20                 20
50 ≤ 𝑥 < 60      0.18                 18            0.20                 20
60 ≤ 𝑥 < 70      0.20                 20            0.06                 6
70 ≤ 𝑥 < 80      0.12                 12            0.00                 0
80 ≤ 𝑥 < 90      0.04                 4             0.00                 0
90 ≤ 𝑥 < 100     0.06                 6             0.00                 0
Total            1.00                 100           1.00                 100






Visualizing Numerical Data: The
Cumulative Percentage Polygon (Ogive)
DCOVA

Percentage of restaurant meals that cost less than the indicated amount

Meal cost ($)   Center City Restaurants   Metro Area Restaurants
10              0                         0
20              0                         2
30              12                        30
40              22                        54
50              40                        74
60              58                        94
70              78                        100
80              90                        100
90              94                        100
100             100                       100
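
The cumulative percentages above are running totals of the class frequencies, expressed as a share of all observations. The following Python sketch reproduces them from the frequency table for the meal-cost data, assuming 50 restaurants per group as given there; the function name `cumulative_percentages` is illustrative.

```python
from itertools import accumulate

# Class frequencies for meal costs in 10 <= x < 20, 20 <= x < 30, ..., 90 <= x < 100.
center_city = [0, 6, 5, 9, 9, 10, 6, 2, 3]
metro_area = [1, 14, 12, 10, 10, 3, 0, 0, 0]

# Cutpoints at which "percentage less than" is reported.
cutpoints = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def cumulative_percentages(frequencies):
    """Percentage of observations below each cutpoint, starting at the first lower cutpoint."""
    total = sum(frequencies)
    # No observation lies below the first lower cutpoint, so the list starts at 0.
    return [0.0] + [100 * running / total for running in accumulate(frequencies)]


for cut, cc, ma in zip(cutpoints,
                       cumulative_percentages(center_city),
                       cumulative_percentages(metro_area)):
    print(f"{cut:>3}  {cc:>5.0f}  {ma:>5.0f}")
```

Plotting these values against the cutpoints and connecting the points gives the ogive for each group.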



Visualizing Numerical Data: The
Cumulative Percentage Polygon (Ogive)
DCOVA

Y axis: cumulative percentages

X axis: lower cutpoints of the class intervals (10, 20, …), approximated by the upper cutpoints of the previous bins (9.99, 19.99, …)
Visualizing Two Numerical Variables
By Using Graphical Displays
DCOVA

Two Numerical Variables
▪ Scatter Plot
▪ Time-Series Plot



Visualizing Two Numerical
Variables: The Scatter Plot
DCOVA
▪ Scatter plots are used for numerical data
consisting of paired observations taken from
two numerical variables.
▪ One variable is measured on the vertical axis
and the other variable is measured on the
horizontal axis.
▪ Scatter plots are used to examine possible
relationships between two numerical variables.



Scatter Plot Example
DCOVA

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: Cost per day vs. Production Volume. Scatter plot with Volume per day on the horizontal axis and Cost per day on the vertical axis.]
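
Each point on the scatter plot is one paired observation (volume per day, cost per day). As a rough numerical counterpart to reading the plot, the Python sketch below pairs the values from the table and checks whether cost rises every time volume rises. This check is only an illustration of the kind of relationship a scatter plot reveals; it is not a method given in the slides, and the variable names are illustrative.

```python
# Paired observations from the table: one (volume, cost) pair per day.
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]

# Sort the pairs by volume so they run left to right, as on the plot.
pairs = sorted(zip(volume, cost))

# Compare each point with the next one: does cost go up as volume goes up?
rises = [next_cost > this_cost
         for (_, this_cost), (_, next_cost) in zip(pairs, pairs[1:])]
always_increasing = all(rises)

for v, c in pairs:
    print(f"volume={v:>2}  cost={c}")
print("Cost increases with every increase in volume:", always_increasing)
```

If a plotting library such as matplotlib is available, `plt.scatter(volume, cost)` draws the same pairs as points, with volume on the horizontal axis and cost on the vertical axis.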



In Excel It Is Easy To
Inadvertently Create Distortions

◼ Excel will often create a graph where the vertical axis does not start at 0.

◼ Excel offers the opportunity to turn simple charts into 3-D charts, and in the process can create a distorted image.

◼ Unusual charts offered as choices by Excel will most often create distorted images.



Best Practices for Constructing
Visualizations DCOVA

▪ Use the simplest possible visualization.


▪ Include a title & label all axes.
▪ Include a scale for each axis if the chart contains axes.
▪ Begin the scale for a vertical axis at zero & use a
constant scale.
▪ Avoid 3D or “exploded” effects & the use of chart junk.
▪ Use consistent colorings in charts meant to be
compared.
▪ Avoid using uncommon chart types including radar,
surface, bubble, cone, and pyramid charts.



Exercise
Political view Frequency
Liberal 160
Moderate 246
Conservative 94

a) Obtain a relative-frequency distribution. Round your answers to 3 decimal places.
b) Construct a relative-frequency bar chart on graph paper.
