
1 - Chapter 1 - Frequency Distribution and Graphs

The document provides information about probability and statistics concepts including: - Statistics is the science of collecting, organizing, analyzing, and drawing conclusions from data. - Probability and statistics have applications in many fields including computer science, engineering, business, medicine, and more. - Key concepts discussed include population, sample, descriptive statistics, inferential statistics, variables, data, and frequency distributions. - Procedures for constructing different types of frequency distributions such as categorical, ungrouped, and grouped distributions are outlined.

Uploaded by

Dmddldldldl

COLLEGE OF COMPUTING AND INFORMATION SCIENCES

DEPARTMENT OF INFORMATION TECHNOLOGY


MATHEMATICS SECTION

SEMESTER-1 (AY 2022-2023)


PROBABILITY AND STATISTICS (MATH311 / STAT3101)

PPT - 1 OF CHAPTER 1
Frequency distribution and Graphs

11/19/22
Introduction

Statistics deals with methods for the collection, classification and analysis of
numerical data in order to draw valid conclusions and make reasonable
decisions.

More precisely, statistics is the science of conducting studies to collect, organize,
summarize, analyze and draw conclusions from data.
Applications of Probability and Statistics

Computer Science:
▶ Machine Learning
▶ Data Mining
▶ Simulation
▶ Image Processing
▶ Computer Vision
▶ Computer Graphics
▶ Visualization
▶ Software Testing
▶ Algorithms

Electrical Engineering:
▶ Signal Processing
▶ Telecommunications
▶ Information Theory
▶ Control Theory
▶ Instrumentation, Sensors
▶ Hardware/Electronics Testing

General:
▶ Gambling (not recommended)
▶ Stock Market Analysis
▶ Politics
▶ Sports
▶ Demographics
▶ Medicine
▶ Economics
▶ All Sciences!
Definitions :
Statistics: The science of conducting studies to collect, organize, summarize, analyze and draw conclusions
from data.
Population: It consists of all subjects that are being studied.
Sample: It is a group of subjects selected from a population. (A population consists of all elements: individuals,
items, or objects, whose characteristics are being studied. The population that is being studied is also called
the target population. The collection of a few elements selected from a population is called a sample.)
Descriptive statistics: It consists of the collection, organization, summarization and presentation of data.

Inferential Statistics: It consists of generalizing from samples to populations, determining relationships
among variables, performing estimations and hypothesis tests, and making predictions.

Variable: A variable is a characteristic that can assume different values. Qualitative variables are variables that
can be placed into distinct categories. Quantitative variables are numerical and can be ordered or ranked.
Quantitative variables can be further classified into two groups, namely discrete and continuous. Discrete
variables are countable and can assume values such as 0, 1, 2, …. The number of students in a class and the
number of members in a family are discrete variables. Continuous variables can assume all values between any
two specific values. Examples: height, weight and temperature are continuous in nature.

Data: Data are the values that the variable can assume. A collection of data values forms a data set. Each
value in the data set is called a data value or a datum. When the data are collected in their original form, they
are called raw data.

The frequency of a value in a data set is the number of times it occurs in the data set. If the data are grouped
into classes, the frequency of a class or a group is the number of values in that class.

There are three basic types of frequency distributions, and there are specific procedures for constructing
each type. The three types are categorical, ungrouped, and grouped frequency distributions.
1.1 Organize The Data
The most convenient method of organizing data is to construct a frequency distribution. A frequency distribution
is the organization of raw data in table form, using classes and frequencies.

Categorical Frequency Distributions


A categorical frequency distribution lists the number of occurrences for each category of data. It is used for
data that can be placed in specific categories, for example religion, political party or nationality.

Example: Twenty-five army inductees were given a blood test to determine their blood type. The data set is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Solution : Since the data are categorical, discrete classes can be used.
There are four blood types: A, B, O, and AB.
These types will be used as the classes for the distribution. The procedure for constructing a frequency
distribution for categorical data is given next.

Step 1: Make a table with the columns Blood Type, Tally Marks and Frequency.

Step 2: Tally the data and place the results in the second column.

Step 3: Count the tallies and place the results in the third column. The completed table is

Blood Type   Tally Marks   Frequency
A            IIII          5
B            IIII II       7
O            IIII IIII     9
AB           IIII          4
Total                      25
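The tallying in Steps 2 and 3 can also be checked by machine. The following is a minimal sketch in Python, assuming the 25 blood types are typed in exactly as listed in the example; the variable names `blood_types` and `frequency` are illustrative only.

```python
from collections import Counter

# Blood types of the 25 inductees, copied row by row from the example.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

# Counter does the tallying: one count per distinct category.
frequency = Counter(blood_types)

# Print the classes in the same order as the table.
for blood_type in ["A", "B", "O", "AB"]:
    print(f"{blood_type:<6}{frequency[blood_type]:>3}")
print(f"{'Total':<6}{sum(frequency.values()):>3}")
```

The counts agree with the frequency column of the table (5, 7, 9 and 4, with a total of 25).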
Ungrouped frequency distributions : When the variable is discrete and the range of the data values is
relatively small, a frequency distribution can be constructed by finding the frequency of the distinct values.
Such a distribution is known as an ungrouped frequency distribution.

Example : A survey taken in a restaurant shows the following number of cups of coffee consumed with each
meal. Construct an ungrouped frequency distribution.
0 2 2 1 1 2 3 5 3 2
2 2 1 0 1 2 4 2 0 1
0 1 4 4 2 2 0 1 1 5
Solution :
Step 1 : Find the range of the data. The range, R, is defined as
Range = Highest value - Lowest value.
For this data set, Range = 5 - 0 = 5, so the range of the data set is small.
Step 2 : Make a table as shown next.
Step 3 : Tally the data.
Step 4 : Complete the frequency column.

The ungrouped frequency distribution is

No. of cups of coffee   Tally Marks   Frequency
0                       IIII          5
1                       IIII III      8
2                       IIII IIII     10
3                       II            2
4                       III           3
5                       II            2
Total                                 30
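As a cross-check of the four steps, here is a short Python sketch that computes the range and the frequency of each distinct value. It assumes the 30 survey values are entered as listed in the example; the names `cups`, `data_range` and `ungrouped` are illustrative.

```python
from collections import Counter

# Cups of coffee consumed with each meal, copied from the example.
cups = [
    0, 2, 2, 1, 1, 2, 3, 5, 3, 2,
    2, 2, 1, 0, 1, 2, 4, 2, 0, 1,
    0, 1, 4, 4, 2, 2, 0, 1, 1, 5,
]

# Step 1: range = highest value - lowest value.
data_range = max(cups) - min(cups)

# Steps 2 to 4: one class per whole number from the lowest to the highest
# value, so that a value with frequency 0 would still get a row.
counts = Counter(cups)
ungrouped = {value: counts.get(value, 0) for value in range(min(cups), max(cups) + 1)}

print("Range =", data_range)
for value, frequency in ungrouped.items():
    print(f"{value:>3}{frequency:>5}")
print("Total", sum(ungrouped.values()))
```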
Grouped frequency distributions:
When the range of the data is large, the data must be grouped into classes that are more than one unit in
width. For continuous data, the classes can be written using either class limits or class boundaries. The class
limits should have the same decimal place value as the data, but the class boundaries should have one
additional place value and end in a 5.
For example, if 24 – 30 is a class, where 24 is the lower limit and 30 is the upper limit, the same class can be
represented by 23.5 – 30.5, where 23.5 is the lower boundary and 30.5 is the upper boundary. If a class in
class limits is 10.2 – 18.4, the same class in class boundaries is 10.15 – 18.45.
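The rule "one additional place value, ending in a 5" amounts to moving each limit outward by half a unit of the last decimal place. The sketch below illustrates this in Python. The function name `class_boundaries` is hypothetical, and the limits are passed as strings and handled with `decimal.Decimal` (a choice made here, not part of the slides) so that values such as 10.15 come out exactly.

```python
from decimal import Decimal

def class_boundaries(lower_limit: str, upper_limit: str):
    """Convert class limits (given as strings) to class boundaries."""
    lower = Decimal(lower_limit)
    upper = Decimal(upper_limit)
    # Exponent of the last decimal place: 0 for "24", -1 for "10.2".
    exponent = min(lower.as_tuple().exponent, upper.as_tuple().exponent)
    # Half a unit of that place: 0.5 for whole numbers, 0.05 for tenths.
    half_unit = Decimal(5) * Decimal(10) ** (exponent - 1)
    return lower - half_unit, upper + half_unit

print(class_boundaries("24", "30"))
print(class_boundaries("10.2", "18.4"))
```

The two calls reproduce the examples in the text: 23.5 – 30.5 and 10.15 – 18.45.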

Procedures in Constructing a Frequency Distribution:

1. To get the class width, divide the range by the number of classes and round up the answer.
2. Tally the data and find the numerical frequencies.
3. Find the cumulative frequencies. The cumulative frequency of a class is the sum of the frequencies up to
the upper boundary of that class.
4. The midpoint of a class = (lower class limit + upper class limit) ÷ 2
(or)
The midpoint of a class = (lower class boundary + upper class boundary) ÷ 2
Rules in Constructing a Frequency Distribution:
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive.
3. The classes must be continuous.
4. The classes must be exhaustive.
5. The classes must be equal in width
Example : These data represent the record high temperatures for each of the 50 states. Construct a grouped
frequency distribution for the data using 7 classes.
112 100 127 120 134 118 105 110 109 112 110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110 116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Solution : Step 1 : Determine the classes.
Find the highest value and the lowest value: H = 134 and L = 100.
Find the range: R = highest value - lowest value, so R = 134 – 100 = 34.
Find the class width by dividing the range by the number of classes:
Width = 34 ÷ 7 = 4.8571, which is rounded up to 5.
This means that each class will contain five values (for example 100, 101, 102, 103 and 104).
Select a starting point for the lowest class limit. This can be the smallest data value; in this case, 100 is used.
Add the width to the starting point to get the lower limit of the next class. Keep adding until there are
7 classes: 100, 105, 110, etc.
Subtract one unit from the lower limit of the second class to get the upper limit of the first class:
105 – 1 = 104. Then add the width to each upper limit to get all the upper limits.

The first class is 100–104; the second class is 105–109, etc.

Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit:
99.5 – 104.5, 104.5 – 109.5, etc.
Step 2 : Tally the data.
Step 3 : Find the numerical frequencies from the tallies.
Step 4 : Find the cumulative frequencies.
Class limits   Class boundaries   Tally                 Frequency   Cumulative frequency
100 – 104      99.5 – 104.5       II                    2           2
105 – 109      104.5 – 109.5      IIII III              8           10
110 – 114      109.5 – 114.5      IIII IIII IIII III    18          28
115 – 119      114.5 – 119.5      IIII IIII III         13          41
120 – 124      119.5 – 124.5      IIII II               7           48
125 – 129      124.5 – 129.5      I                     1           49
130 – 134      129.5 – 134.5      I                     1           50
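The whole procedure (range, class width, class limits, class boundaries, frequencies and cumulative frequencies) can be sketched in a few lines of Python. This is only an illustration for whole-number data: the function name `grouped_frequency` is hypothetical, and the width is computed so that it is also increased when the division has no remainder, which is an assumption added here so that the highest value always falls in the last class.

```python
def grouped_frequency(data, num_classes):
    """Return (class width, rows) for whole-number data.

    Each row is (lower limit, upper limit, lower boundary, upper boundary,
    frequency, cumulative frequency).
    """
    lowest, highest = min(data), max(data)
    # Round range / num_classes up to a whole number. "// ... + 1" also bumps
    # the width when the division is exact (assumption, see the lead-in).
    width = (highest - lowest) // num_classes + 1
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = lowest + i * width
        upper = lower + width - 1          # one unit below the next lower limit
        frequency = sum(1 for value in data if lower <= value <= upper)
        cumulative += frequency
        rows.append((lower, upper, lower - 0.5, upper + 0.5, frequency, cumulative))
    return width, rows

# Record high temperatures of the 50 states, copied from the example.
temperatures = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

width, rows = grouped_frequency(temperatures, 7)
print("Class width =", width)
for lower, upper, low_b, up_b, frequency, cumulative in rows:
    print(f"{lower} - {upper}   {low_b} - {up_b}   {frequency:>3} {cumulative:>3}")
```

With 7 classes this gives a width of 5 and the same frequencies and cumulative frequencies as the table above.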
Example : Construct a grouped frequency distribution for the data using 6 classes.
767 770 761 760 771 768 776 771 756 770 763 760 747 766 754 771 771 778 766 762
780 750 746 764 769 759 757 753 758 746
Solution : To determine the number of data values per class, we must compute the range first.
Range = 780 – 746 = 34
Class Width = 34 ÷ 6 = 5.67 ≈ 6
The completed frequency distribution is

Class limits   Class boundaries   Tally      Frequency   Cumulative frequency
746 – 751      745.5 – 751.5      IIII       4           4
752 – 757      751.5 – 757.5      IIII       4           8
758 – 763      757.5 – 763.5      IIII II    7           15
764 – 769      763.5 – 769.5      IIII I     6           21
770 – 775      769.5 – 775.5      IIII I     6           27
776 – 781      775.5 – 781.5      III        3           30
Example : Construct a grouped frequency distribution for the following data.
19.55 20.75 21.28 22.02 22.51
22.55 23.75 24.03 24.24 25.17
25.19 25.7 25.91 26.13 26.13
26.32 26.33 27.01 27.13 27.55
27.57 27.79 28.17 30.46 30.91
Solution : Divide the range by the number of classes (let's use 5 in this case):
Range ÷ 5 = (30.91 – 19.55) ÷ 5 = 11.36 ÷ 5 = 2.272
So the class width is 2.28 (rounded up to the next value in hundredths). Find the class boundaries by
subtracting 0.005 from each lower class limit and adding 0.005 to each upper class limit. Here is the
frequency distribution.

Class Limits      Class Boundaries     Frequency   Cumulative frequency
19.55 – 21.82     19.545 – 21.825      3           3
21.83 – 24.10     21.825 – 24.105      5           8
24.11 – 26.38     24.105 – 26.385      9           17
26.39 – 28.66     26.385 – 28.665      6           23
28.67 – 30.94     28.665 – 30.945      2           25
Relative Frequency :
The relative frequency of a class = (frequency of the class) ÷ (total frequency, the sum of all class frequencies)

Note – The total relative frequency of a distribution is equal to 1.

Example : Find the relative frequencies and the midpoints for the following frequency distribution.

Class limits   Frequency   Relative frequency   Midpoint
90 – 98        6           6/108                94
99 – 107       22          22/108               103
108 – 116      43          43/108               112
117 – 125      28          28/108               121
126 – 134      9           9/108                130
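The two columns of this example can be reproduced with a short Python sketch. It assumes the relative frequency is the class frequency divided by the total frequency and the midpoint is the average of the two class limits; `fractions.Fraction` is used here only so that the relative frequencies add up to exactly 1.

```python
from fractions import Fraction

# (lower limit, upper limit), frequency; copied from the example table.
classes = [
    ((90, 98), 6),
    ((99, 107), 22),
    ((108, 116), 43),
    ((117, 125), 28),
    ((126, 134), 9),
]

total = sum(frequency for _, frequency in classes)

table = []
for (lower, upper), frequency in classes:
    relative = Fraction(frequency, total)   # exact fraction, stored in lowest terms
    midpoint = (lower + upper) / 2
    table.append((lower, upper, frequency, relative, midpoint))
    print(f"{lower} - {upper}   {frequency}/{total} = {float(relative):.3f}   midpoint {midpoint:g}")

print("Sum of relative frequencies =", sum(row[3] for row in table))
```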
What is Stem and Leaf Plot?
A stem and leaf plot is a way of organizing data that makes it easy to observe how often different values
occur. It is a graph that shows numerical data arranged in order. Each data value is broken into a stem and
a leaf.

A stem and leaf plot is written as a special table in which each data value is split into a stem (the first digit
or digits) and a leaf (the last digit). The symbol " | " separates the stem from the leaf, and the key of the plot
shows how to read it. For example, 46 has stem 4 and leaf 6 and is written as 4 | 6.

In the image given below, the stem values are listed one below the other in ascending order and the leaf values
are listed from left to right next to their stems in ascending order.
As the stem and leaf plot definition states,

6 | 7 ⇒ 6 on the stem and 7 on the leaf, read as 67.
6 | 8 ⇒ 6 on the stem and 8 on the leaf, read as 68.
9 | 0 ⇒ 9 on the stem and 0 on the leaf, read as 90.
9 | 1 ⇒ 9 on the stem and 1 on the leaf, read as 91.
9 | 8 ⇒ 9 on the stem and 8 on the leaf, read as 98.
Stem and Leaf Plots
Stem and leaf plots organize data by sorting them into stems and leaves. Each data value is thus split into two
parts.
When setting up a stem and leaf plot, it is extremely important to provide a key!
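Splitting each value into a stem and a leaf is easy to sketch in Python. The function `stem_and_leaf` below is hypothetical and assumes non-negative whole numbers with a one-digit leaf; the five values are the ones read off the plot described above (67, 68, 90, 91, 98).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by stem (all digits but the last)."""
    plot = defaultdict(list)
    for value in sorted(values):         # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)   # 67 -> stem 6, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

scores = [67, 68, 90, 91, 98]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 6 | 7 means 67")
```

The last line prints the key, which tells the reader how a stem and a leaf combine into a data value.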
Example : Construct a stem-and-leaf plot listing the scores below in order from lowest to highest.

Solution :
GRAPHS
Data given in the form of a frequency distribution can be presented in graphical form. The purpose of a
graph is to convey the data in pictorial form. It is easier for people to understand the data when it is presented
graphically. Three commonly used graphs are:

1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or Ogive.

Histogram :

The histogram is a graph that displays the data using contiguous vertical bars (unless the frequency of a
class is zero) of various heights to represent the frequencies of the classes. The variable is taken on the
x-axis and the frequencies on the y-axis. Class boundaries are marked on the x-axis. Draw a vertical bar for
each class using the frequencies as the heights.
Example : Draw a Histogram for the following grouped frequency data.

Class boundaries   Frequency
5.5 - 10.5         3
10.5 - 15.5        6
15.5 - 20.5        8
20.5 - 25.5        9
25.5 - 30.5        7
30.5 - 35.5        3
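Before drawing the bars it can help to check that the classes are contiguous and to see the bar heights at a glance. The Python sketch below prints a rough text rendering of the table, with one row of `#` per class (drawn sideways, so it is only a stand-in for the real histogram with vertical bars). The function name `text_histogram` is hypothetical.

```python
# (lower boundary, upper boundary), frequency; copied from the example.
classes = [
    ((5.5, 10.5), 3),
    ((10.5, 15.5), 6),
    ((15.5, 20.5), 8),
    ((20.5, 25.5), 9),
    ((25.5, 30.5), 7),
    ((30.5, 35.5), 3),
]

# Bars touch only if each upper boundary equals the next lower boundary.
contiguous = all(
    classes[i][0][1] == classes[i + 1][0][0] for i in range(len(classes) - 1)
)

def text_histogram(classes):
    """Return one text line per class; the bar length equals the frequency."""
    lines = []
    for (lower, upper), frequency in classes:
        lines.append(f"{lower:>4} - {upper:<4} | {'#' * frequency} {frequency}")
    return lines

print("Contiguous classes:", contiguous)
for line in text_histogram(classes):
    print(line)
```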
Frequency polygon :
The frequency polygon is a graph that displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points. The
variable is taken on the x- axis and the frequencies on the y-axis. Class midpoints are marked on the x – axis.
Put a point over each midpoint at a height equal to the frequency of the class. The points in order are joined by
lines. The first and last points are joined to the x-axis at the same width as the classes.
Example : Draw a Frequency polygon for the following grouped frequency data.

Class boundaries   Frequency   Midpoint
5.5 - 10.5         4           8
10.5 - 15.5        6           13
15.5 - 20.5        11          18
20.5 - 25.5        10          23
25.5 - 30.5        7           28
30.5 - 35.5        5           33
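The points to be plotted for the frequency polygon can be listed with a small Python sketch. It assumes equal class widths and reads "joined to the x-axis at the same width as the classes" as adding one zero-frequency point a class width before the first midpoint and one after the last; the function name `polygon_points` is hypothetical.

```python
# (lower boundary, upper boundary), frequency; copied from the example.
classes = [
    ((5.5, 10.5), 4),
    ((10.5, 15.5), 6),
    ((15.5, 20.5), 11),
    ((20.5, 25.5), 10),
    ((25.5, 30.5), 7),
    ((30.5, 35.5), 5),
]

def polygon_points(classes):
    """Return the (x, y) points of the frequency polygon, left to right."""
    width = classes[0][0][1] - classes[0][0][0]          # equal widths assumed
    midpoints = [(lower + upper) / 2 for (lower, upper), _ in classes]
    points = [(midpoints[0] - width, 0)]                 # start on the x-axis
    points += [(m, frequency) for m, (_, frequency) in zip(midpoints, classes)]
    points.append((midpoints[-1] + width, 0))            # end on the x-axis
    return points

points = polygon_points(classes)
for x, y in points:
    print(f"({x:g}, {y})")
```

The middle six points sit over the midpoints 8, 13, 18, 23, 28 and 33 from the table; joining all points in order with straight lines gives the polygon.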
Cumulative Frequency Graph (Ogive) :
The Ogive is a graph that represents the cumulative frequencies for the classes in a distribution. The
variable is taken on the x – axis and the cumulative frequency is taken on the y – axis. Put a point over the
upper boundary of each class at a height equal to the cumulative frequency of the class. Connect the
adjacent points with line segments. Join the graph to the x – axis at the lower boundary of the first class.

Example : Draw an Ogive for the following grouped frequency data.

Class boundaries   Frequency   Cumulative frequency
5.5 - 10.5         3           3
10.5 - 15.5        6           9
15.5 - 20.5        8           17
20.5 - 25.5        9           26
25.5 - 30.5        7           33
30.5 - 35.5        3           36
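The cumulative frequency column and the points of the Ogive follow directly from the frequencies. A minimal Python sketch, with the hypothetical function name `ogive_points`: the graph starts on the x-axis at the lower boundary of the first class and then has one point over each upper boundary.

```python
from itertools import accumulate

# (lower boundary, upper boundary), frequency; copied from the example.
classes = [
    ((5.5, 10.5), 3),
    ((10.5, 15.5), 6),
    ((15.5, 20.5), 8),
    ((20.5, 25.5), 9),
    ((25.5, 30.5), 7),
    ((30.5, 35.5), 3),
]

def ogive_points(classes):
    """Return the (x, y) points of the Ogive, left to right."""
    frequencies = [frequency for _, frequency in classes]
    cumulative = list(accumulate(frequencies))       # running totals
    points = [(classes[0][0][0], 0)]                 # lower boundary of class 1
    points += [(upper, cf) for ((_, upper), _), cf in zip(classes, cumulative)]
    return points

points = ogive_points(classes)
for x, y in points:
    print(f"({x:g}, {y})")
```

The last point has a height equal to the total frequency, 36.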
Example : Construct a histogram, frequency polygon and cumulative frequency graph (Ogive) for the data
given below for the record high temperatures of 50 states.
Class Boundaries Frequency
99.5 – 104.5 2
104.5 – 109.5 8
109.5 – 114.5 18
114.5 – 119.5 13
119.5 – 124.5 7
124.5 – 129.5 1
129.5 – 134.5 1

Solution : Histogram
Class Boundaries Frequency
99.5 – 104.5 2
104.5 – 109.5 8
109.5 – 114.5 18
114.5 – 119.5 13
119.5 – 124.5 7
124.5 – 129.5 1
129.5 – 134.5 1
Frequency Polygon :

Class Boundaries Midpoints Frequency


99.5 – 104.5 102 2
104.5 – 109.5 107 8
109.5 – 114.5 112 18
114.5 – 119.5 117 13
119.5 – 124.5 122 7
124.5 – 129.5 127 1
129.5 – 134.5 132 1

Ogive :
Class Boundaries Frequency Cum.freq
99.5 – 104.5 2 2
104.5 – 109.5 8 10
109.5 – 114.5 18 28
114.5 – 119.5 13 41
119.5 – 124.5 7 48
124.5 – 129.5 1 49
129.5 – 134.5 1 50
Graphs for Categorical Frequency Distributions
1. Pareto Chart : This is used for a categorical variable. It consists of contiguous vertical bars of equal
width whose heights represent the frequencies of the various categories. The bars are arranged in order
from highest to lowest.
Example : The following table shows the means by which people go to work. Construct a Pareto chart.
2. Pie Graph : This is used for a categorical frequency table. A pie graph is a circle divided into
sections or wedges according to the percentage of frequencies in each category of the distribution.

Example : The following pie graph shows the number of pounds of each snack food eaten during the 1998
Super Bowl.
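The numbers behind both graphs are simple to compute. Since the tables for the two examples are not reproduced here, the Python sketch below reuses the blood type frequencies from the earlier categorical example purely as an illustration: it orders the categories for a Pareto chart and converts each frequency to a percentage and a wedge angle for a pie graph (the angle is the fraction of the total times 360 degrees).

```python
# Blood type frequencies from the categorical example (illustrative reuse).
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequencies.values())

# Pareto chart: bars in order from highest to lowest frequency.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

# Pie graph: percentage and wedge angle (in degrees) for each category.
pie = {
    category: (100 * frequency / total, 360 * frequency / total)
    for category, frequency in frequencies.items()
}

print("Pareto order:", [category for category, _ in pareto_order])
for category, (percent, degrees) in pie.items():
    print(f"{category:<3}{percent:>6.1f}%{degrees:>8.1f} degrees")
```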
Time Series Graph : A time series graph represents data that occur over a specific period of time.
Reference Books

1. Elementary Statistics: A Step by Step Approach, Eighth Edition, Bluman, McGraw-Hill International Edition.

2. Basic Statistics, Seemon Thomas, Alpha Science International, 2014.
