Hand Out2 Data Presentation 1
(1) Raw data are data collected that have not been organized numerically.
(2) An array is an arrangement of raw numerical data in ascending or descending order of
magnitude. The difference between the largest and smallest numbers is called the range of
the data.
(3) There are two general methods of presenting data: tabular presentation and graphical presentation.
(4) Tabular Presentations may be in the form of a cross tabulation table, a frequency distribution,
or a stem-and-leaf plot. Some of the statistical graphs are the bar chart, histogram, frequency
polygon and the box and whisker plot.
(5) Cross tabulation table – used when data are in categories; the data are arranged in rows and columns.
- The information gathered has only a small number of values, like a “yes” or “no” answer or
the gender of the respondent.
- These categories are arranged in a row or column, and the appropriate information, often
frequencies or percentages, is placed in the suitable cells.
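As an illustration of a cross tabulation, the sketch below counts hypothetical survey responses by two categories. The data, the category labels, and the helper name `cross_tab` are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical survey responses as (gender, answer) pairs; not from the handout.
responses = [
    ("F", "yes"), ("M", "no"), ("F", "yes"), ("M", "yes"),
    ("F", "no"), ("M", "no"), ("F", "yes"), ("M", "yes"),
]

counts = Counter(responses)                    # frequency of each (row, column) pair
rows = sorted({g for g, _ in responses})       # row categories
cols = sorted({a for _, a in responses})       # column categories


def cross_tab(counts, rows, cols):
    """Arrange the pair counts into a rows-by-columns table of frequencies."""
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


table = cross_tab(counts, rows, cols)

# Print the table with the column categories as the header row.
print("      " + "  ".join(f"{c:>4}" for c in cols))
for r in rows:
    print(f"{r:<6}" + "  ".join(f"{table[r][c]:>4}" for c in cols))
```

Each cell holds a frequency; dividing by the number of responses would give percentages instead.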
Frequency Distribution – a grouping of the observations into intervals or classes, showing how
many observations fall in each.
3. Estimate the width c of the interval by dividing the range R by the number of classes. Round off
this estimate to the same number of decimal places as the original set of data.
4. List the lower and upper class limits of the first interval. This interval should contain the
smallest observation or any number closest to it.
5. List all the class limits by adding the class width to the limits of the previous interval.
6. Compute the class marks and the class boundaries. Class boundaries should not coincide with
the actual observed data.
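Steps 3 to 6 can be sketched in a few lines. This is only an illustration under stated assumptions: the heights are hypothetical, the number of classes `k` is taken as already chosen, and the first interval is assumed to start at the smallest observation.

```python
# Hypothetical heights recorded to the nearest inch (not the handout's data).
raw = [67, 60, 72, 65, 68, 74, 66, 69, 63, 70, 67, 71, 68]

array = sorted(raw)          # the array: raw data in ascending order
R = array[-1] - array[0]     # range = largest value minus smallest value
k = 5                        # number of classes (assumed already chosen)
unit = 1                     # the data are recorded to the nearest 1 inch

# Step 3: estimate the class width and round it to the precision of the data.
c = round(R / k)

# Steps 4 to 6: class limits, class marks, and class boundaries.
classes = []
for i in range(k):
    lower = array[0] + i * c     # the first interval starts at the smallest value
    upper = lower + c - unit     # limits are inclusive, so subtract one unit
    classes.append({
        "limits": (lower, upper),
        "mark": (lower + upper) / 2,
        "boundaries": (lower - unit / 2, upper + unit / 2),
    })

# A width that was rounded down may leave the largest value uncovered.
assert classes[-1]["limits"][1] >= array[-1]

for cls in classes:
    print(cls["limits"], cls["mark"], cls["boundaries"])
```

Because the boundaries fall on half units, none of them can coincide with an observation recorded to the nearest inch.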
Example 2. Table 2.1.2 is a frequency distribution of heights (recorded to the nearest inch) of 100
male students at XYZ University.
A symbol defining a class, such as 60-62 in Table 2.1.2, is called a class interval. The end numbers,
60 and 62, are called class limits; the smaller number (60) is the lower class limit, and the larger
number (62) is the upper class limit.
A class interval that, at least theoretically, has either no upper class limit or no lower class limit
indicated is called an open class interval. For example, referring to age groups of individuals, the
class interval “65 years and over” is an open class interval.
9. Class Boundaries
If the heights are recorded to the nearest inch, the class interval 60-62 theoretically
includes all measurements from 59.5000 to 62.5000 in. These numbers, indicated briefly by the
exact numbers 59.5 to 62.5, are called class boundaries, or true class limits; the smaller number
(59.5) is the lower class boundary, and the larger number (62.5) is the upper class boundary.
In practice, the class boundaries are obtained by adding the upper limit of one class interval to the
lower limit of the next-higher class interval and dividing by 2.
Sometimes, class boundaries are used to symbolize classes. For example, the various classes in the
first column of Table 2.1.2 could be indicated by 59.5-62.5, 62.5-65.5, etc.
To avoid ambiguity in using such notation, class boundaries should not coincide with the actual
observations.
10. The size or width of a class interval
The size or width of a class interval is the difference between the upper and lower class
boundaries; it is also referred to as the class width, class size, or class length. It can also be
obtained as the difference between two successive lower class limits or two successive upper
class limits.
11. The class mark
The class mark, also called the class midpoint, is the midpoint of the class interval and is obtained
by adding the lower and upper class limits and dividing by 2. For purposes of further mathematical
analysis, all observations belonging to a given class interval are assumed to coincide with the
class mark. Thus all heights in the class interval 60-62 in. are considered to be 61 in.
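The definitions of class boundary, class width, and class mark can be checked on the class 60-62. The sketch below assumes the next class is 63-65, which is implied by the boundaries 59.5-62.5 and 62.5-65.5 quoted in the text; the variable names are illustrative.

```python
# Class limits implied by the boundaries 59.5-62.5 and 62.5-65.5 in the text.
limits = [(60, 62), (63, 65)]

# Shared boundary: average the upper limit of one class and the lower limit
# of the next-higher class.
shared_boundary = (limits[0][1] + limits[1][0]) / 2           # 62.5

# The outer boundaries lie the same half-gap beyond the outer limits.
half_gap = (limits[1][0] - limits[0][1]) / 2                  # 0.5
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]

# Class width: difference of the boundaries, or of successive lower limits.
width_from_boundaries = boundaries[0][1] - boundaries[0][0]   # 3.0
width_from_limits = limits[1][0] - limits[0][0]               # 3

# Class mark: average of the lower and upper class limits.
mark = (limits[0][0] + limits[0][1]) / 2                      # 61.0

print(boundaries, width_from_boundaries, width_from_limits, mark)
```

Both ways of computing the width agree, as the definition says they should.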
12. Relative Frequency Distribution – the frequencies found in the frequency table are replaced by
relative frequencies. The relative frequency for each interval is obtained by dividing the class
frequency by the total frequency.
13. Percentage Distribution – a relative frequency distribution in which the relative frequency of
each class is expressed in percent.
14. Cumulative Frequency Distribution – the cumulative frequency associated with the upper
class boundary of a particular interval is computed by summing the frequency for that interval
and the frequencies of all the intervals below it.
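The three distributions above can be derived from one column of class frequencies. The following is a minimal sketch; the class labels and frequencies are hypothetical values chosen to total 100 and are not taken from the handout's table.

```python
from itertools import accumulate

# Hypothetical class frequencies chosen to total 100 (not the handout's table).
classes = ["60-62", "63-65", "66-68", "69-71", "72-74"]
freq = [5, 18, 42, 27, 8]

total = sum(freq)
relative = [f / total for f in freq]         # class frequency / total frequency
percent = [100 * f / total for f in freq]    # relative frequency in percent
cumulative = list(accumulate(freq))          # this class plus all classes below it

# One row per class: label, frequency, relative, percent, cumulative.
for row in zip(classes, freq, relative, percent, cumulative):
    print(*row, sep="\t")
```

The last cumulative frequency always equals the total frequency, which is a quick check on the arithmetic.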
All of the graphical methods shown in this section are derived from frequency tables. Table 1
shows a frequency table for the results of the iMac study; it shows the frequencies of the various
response categories. It also shows the relative frequencies, which are the proportion of responses
in each category. For example, the relative frequency for “none” is 0.17 = 85/500.
Pie Charts
The pie chart in Figure 1 shows the results of the iMac study. In a pie chart, each category is
represented by a slice of the pie. The area of the slice is proportional to the percentage of responses
in the category. This is simply the relative frequency multiplied by 100. Although most iMac
purchasers were Macintosh owners, Apple was encouraged by the 12% of purchasers who were
former Windows users, and by the 17% of purchasers who were buying a computer for the first
time.
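The size of a slice follows directly from the relative frequency. The sketch below uses the one count the text quotes (85 of 500 responses for “none”); expressing the slice as a central angle out of 360 degrees is an assumption about how the chart is drawn, not something stated in the handout.

```python
# Figures quoted in the text: 85 of the 500 responses were "none".
count_none = 85
total = 500

relative = count_none / total          # relative frequency, 0.17
percent = 100 * count_none / total     # share of the pie in percent, 17.0
angle = 360 * count_none / total       # central angle of the slice in degrees

print(relative, percent, angle)
```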
Pie charts are effective for displaying the relative frequencies of a small number of categories.
They are not recommended, however, when you have a large number of categories. Pie charts
can also be confusing when they are used to compare the outcomes of two different surveys or
experiments.
Bar Charts
Bar charts can also be used to represent frequencies of different categories. A bar chart of the
iMac purchases is shown in Figure 2. Frequencies are shown on the Y-axis and the type of
computer previously owned is shown on the X-axis. Typically, the Y-axis shows the number
of observations in each category rather than the percentage of observations in each category, as is
typical in pie charts.
Summary
Pie charts and bar charts can both be effective methods of portraying qualitative data. Bar charts
are better when there are more than just a few categories and for comparing two or more
distributions.
There are many types of graphs that can be used to portray distributions of quantitative variables.
The upcoming sections cover the following types of graphs: (1) stem and leaf displays, (2) histograms,
(3) frequency polygons, (4) box plots, (5) bar charts, (6) line graphs, (7) dot plots, and
(8) scatter plots (discussed in a different chapter). Some graph types, such as stem and leaf displays,
are best suited for small to moderate amounts of data, whereas others, such as histograms,
are best suited for large amounts of data. Graph types such as box plots are good at depicting
differences between distributions. Scatter plots are used to show the relationship between two
variables.
A stem and leaf display is a graphical method of displaying data. It is particularly useful when
your data are not too numerous. In this section, we will explain how to construct and interpret this
kind of graph. As usual, we will start with an example. Consider Table 2, which shows the number
of touchdown passes (TD passes) thrown by each of the 31 teams in the National Football League
in the 2000 season.
A stem and leaf display of the data is shown in Figure 7. The left portion of the display contains the
stems. They are the numbers 3, 2, 1, and 0, arranged as a column to the left of the bars. Think
of these numbers as 10’s digits. A stem of 3, for example, can be used to represent the 10’s digit
in any of the numbers from 30 to 39. The numbers to the right of the bar are leaves, and they
represent the 1’s digits. Every leaf in the graph therefore stands for the result of adding the leaf to
10 times its stem.
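The construction can be sketched as follows: split each value into its 10's digit (the stem) and its 1's digit (the leaf). The data below are hypothetical, since Table 2 is not reproduced here, and the function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each value's 1's digit (leaf) under its 10's digit (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)    # v == 10 * stem + leaf
        stems[stem].append(leaf)
    return dict(stems)


# Hypothetical counts between 0 and 39 (not the handout's Table 2).
data = [37, 33, 32, 29, 28, 22, 21, 18, 14, 12, 9, 6]
display = stem_and_leaf(data)

# Print the stems from largest to smallest, with the leaves to the right of the bar.
for stem in range(max(display), min(display) - 1, -1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem}|{leaves}")
```

Reading a row back recovers the data: a stem of 3 with leaves 2, 3, and 7 stands for 32, 33, and 37.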
Histograms
In a histogram, the class frequencies are represented by bars. The height of each bar corresponds
to its class frequency. A histogram of these data is shown in Figure 12.
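To make the link between class frequencies and bar heights concrete, the sketch below tallies hypothetical integer scores into classes of equal width and draws each bar as a row of `#` characters. The data, the class width of 10, and the helper name `tally` are assumptions for illustration.

```python
def tally(data, lowest_limit, width, k):
    """Count how many integer observations fall in each of k equal-width classes."""
    counts = [0] * k
    for x in data:
        i = (x - lowest_limit) // width    # index of the class containing x
        if 0 <= i < k:
            counts[i] += 1
    return counts


# Hypothetical test scores (not from the handout).
scores = [46, 52, 55, 58, 61, 63, 64, 67, 68, 71, 72, 75, 79, 83, 88]
freq = tally(scores, lowest_limit=40, width=10, k=5)

# Text histogram: the length of each bar equals the class frequency.
for i, f in enumerate(freq):
    lower = 40 + 10 * i
    print(f"{lower}-{lower + 9} | " + "#" * f)
```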
Frequency Polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They
serve the same purpose as histograms, but are especially helpful for comparing sets of data.
Frequency polygons are also a good choice for displaying cumulative frequency distributions.
To create a frequency polygon, start just as for histograms, by choosing a class interval. Then
draw an X-axis representing the values of the scores in your data. Mark the middle of each class
interval with a tick mark, and label it with the middle value represented by the class. Draw the
Y-axis to indicate the frequency of each class. Place a point in the middle of each class interval at
the height corresponding to its frequency. Finally, connect the points. You should include
one class interval below the lowest value in your data and one above the highest value. The graph
will then touch the X-axis on both sides.
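The points of a frequency polygon can be computed before any drawing is done. This sketch assumes equally spaced class midpoints and uses hypothetical frequencies; `polygon_points` is an illustrative name.

```python
def polygon_points(marks, freqs):
    """Return (midpoint, frequency) points, padded with one empty class at each end."""
    width = marks[1] - marks[0]                      # classes assumed equally wide
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]                     # empty classes have frequency 0
    return list(zip(xs, ys))


# Hypothetical class midpoints and frequencies (not the handout's Table 5).
marks = [45, 55, 65, 75]
freqs = [2, 6, 4, 1]

points = polygon_points(marks, freqs)
print(points)
```

Connecting these points in order gives the polygon; the two padded points are what make the graph touch the X-axis on both sides.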
A frequency polygon for 642 psychology test scores, shown in Figure 12, was constructed from the
frequency table shown in Table 5.
The first label on the X-axis is 35. This represents an interval extending from 29.5 to 39.5. Since the
lowest test score is 46, this interval has a frequency of 0. The point labeled 45 represents the
interval from 39.5 to 49.5. There are three scores in this interval. There are 147 scores in the interval
that surrounds 85. You can easily discern the shape of the distribution from Figure 13. Most of
the scores are between 65 and 115. It is clear that the distribution is not symmetric inasmuch as
good scores (to the right) trail off more gradually than poor scores (to the left). In the terminology
of Chapter 3 (where we will study shapes of distributions more systematically), the distribution is
skewed.
A cumulative frequency polygon for the same test scores is shown in Figure 14. The graph is
the same as before except that the Y value for each point is the number of students in the
corresponding class interval plus all numbers in lower intervals. For example, there are no scores in
the interval labeled “35,” three in the interval “45,” and 10 in the interval “55.” Therefore, the Y
value corresponding to “55” is 13. Since 642 students took the test, the cumulative frequency for
the last interval is 642.
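The running totals quoted above can be reproduced with a cumulative sum. Only the first three intervals are used here, because those are the only frequencies given in the text.

```python
from itertools import accumulate

# Interval labels and frequencies quoted in the text for the first three classes.
labels = [35, 45, 55]
freq = [0, 3, 10]

cumulative = list(accumulate(freq))        # [0, 3, 13]
points = list(zip(labels, cumulative))     # points of the cumulative polygon

print(points)
```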