Frequency Distribution
A frequency distribution is a tool for organizing data. We use it to group data into categories and show the number of observations in each category. Here are some test scores from a math class:

65 82 87 94 96 91 75 69 67 98
85 100 89 77 46 76 70 54 92 70
85 88 74 82 90 87 78 89 70 96
79 83 83 94 88 93 59 80 84 72
It's hard to get a feel for this data in this format because it is unorganized. To construct a frequency distribution, you should first identify the lowest and highest values in the list. We do this because we want to be sure that each value in the list fits into one of our categories. The low value here is 46, and the high is 100. A set of categories that would work here is 41-50, 51-60, 61-70, 71-80, 81-90, and 91-100. Here's the finished product:

Class      Frequency
41-50      1
51-60      2
61-70      6
71-80      8
81-90      14
91-100     9
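The counting behind this table can be sketched in a few lines of Python. This is only an illustration: the names `scores`, `classes` and `frequency_table` are assumptions introduced here, and each class is treated as an inclusive (lower limit, upper limit) pair.

```python
# Test scores from the math class, as listed above.
scores = [65, 82, 87, 94, 96, 91, 75, 69, 67, 98,
          85, 100, 89, 77, 46, 76, 70, 54, 92, 70,
          85, 88, 74, 82, 90, 87, 78, 89, 70, 96,
          79, 83, 83, 94, 88, 93, 59, 80, 84, 72]

# Classes given as inclusive (lower class limit, upper class limit) pairs.
classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]


def frequency_table(values, classes):
    """Count how many values fall inside each inclusive class."""
    table = {}
    for lower, upper in classes:
        table[(lower, upper)] = sum(lower <= v <= upper for v in values)
    return table


table = frequency_table(scores, classes)
for (lower, upper), f in table.items():
    print(f"{lower}-{upper}: {f}")
```

Because the classes do not overlap and together cover 46 through 100, the frequencies add up to the number of scores, which is a quick check that no value was missed or counted twice.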
We can now see that the largest number of tests fell between 81 and 90, and that most of the tests (31 of 40) fell between 71 and 100. The low number in each category (or class) is called the lower class limit, and the high number is called the upper class limit. Now for some guidelines for constructing a frequency distribution.
1. Each value should fit into a category. The classes should be collectively exhaustive.
2. No value should fit into more than one category. The classes should be mutually exclusive: there should be no overlapping of classes.
3. Make the classes of equal size if possible. This makes it easier to compare the frequency in one class to another.
4. Avoid open-ended classes, such as "75 and over", if possible.
5. Try to use between 5 and 20 classes if possible. If you have fewer than 5 classes, you're not really breaking up the data, and if you use more than 20 classes, this will probably be information overload.
6. It is usually convenient to use class sizes of 5 or 10, in other words, to have each class contain 5 or 10 possible values.
7. It is usually convenient to make the lower limit of the first category a multiple of the class size.
After the first two rules above, the rest are merely suggestions. Each set of data may require you to violate some of these suggestions. The best advice is to follow them whenever possible.

The terms that we need to know for frequency distributions:

A. Qualitative Data: Data that are measured by either nominal or ordinal scales of measurement. Each value serves as a name or label for identifying an item.

B. Quantitative Data: Data that are measured by interval or ratio scales of measurement. Quantitative data are numerical values on which mathematical operations can be performed.

C. Bar Graph: A graphical method of presenting qualitative data that have been summarized in a frequency distribution or a relative frequency distribution.

D. Pie Chart: A graphical device for presenting qualitative data by subdividing a circle into sectors that correspond to the relative frequency of each class.

E. Frequency Distribution: A tabular presentation of data, which shows the frequency of the appearance of data elements in several nonoverlapping classes. The purpose of the frequency distribution is to organize masses of data elements into smaller and more manageable groups. The frequency distribution can present both qualitative and quantitative data.

F. Relative Frequency Distribution: A tabular presentation of a set of data which shows the frequency of each class as a fraction of the total frequency. The relative frequency distribution can present both qualitative and quantitative data.

G. Percent Frequency Distribution: A tabular presentation of a set of data which shows the percentage of the total number of items in each class. The percent frequency of a class is simply the relative frequency multiplied by 100.

H. Class: A grouping of data elements in order to develop a frequency distribution.

I. Class Width: The length of the class interval. Each class has two limits. The lowest value is referred to as the lower class limit, and the highest value is the upper class limit. The difference between the upper and the lower class limits represents the class width.

J. Class Midpoint: The point in each class that is halfway between the lower and the upper class limits.

K. Cumulative Frequency Distribution: A tabular presentation of a set of quantitative data which shows for each class the total number of data elements with values less than the upper class limit.

L. Cumulative Relative Frequency Distribution: A tabular presentation of a set of quantitative data which shows for each class the fraction of the total frequency with values less than the upper class limit.

N. Dot Plot: A graphical presentation of data, where the horizontal axis shows the range of data values and each observation is plotted as a dot above the axis.

O. Histogram: A graphical method of presenting a frequency or a relative frequency distribution.

P. Ogive: A graphical method of presenting a cumulative frequency distribution or a cumulative relative frequency distribution.
CUMULATIVE DISTRIBUTION
One further extension to the frequency distribution is to look at the percentage of values that show up in each category. This is called a relative frequency distribution or percent frequency distribution. The final frequency distribution that we will discuss is the cumulative frequency distribution. Think about the word cumulative: it generally refers to some sort of running total. A cumulative frequency distribution is a way to list how many values fit into the first class, the first 2 classes, the first 3 classes, etc., or into the last class, the last 2 classes, etc.
The frequency (f) of a particular observation is the number of times the observation occurs in the data. The distribution of a variable is the pattern of frequencies of its observations. Frequency distributions are portrayed as frequency tables, histograms, or polygons.
Frequency distributions can show either the actual number of observations falling in each range or the percentage of observations. In the latter instance, the distribution is called a relative frequency distribution.
Frequency distribution tables can be used for both categorical and numeric variables. Continuous variables should only be used with class intervals, which will be explained shortly.
Example 1: Constructing a frequency distribution table

A survey was taken on Maple Avenue. In each of 20 homes, people were asked how many cars were registered to their households. The results were recorded as follows:

1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Use the following steps to present this data in a frequency distribution table.

1. Divide the results (x) into intervals, and then count the number of results in each interval. In this case, the intervals would be the number of households with no car (0), one car (1), two cars (2) and so forth.

2. Make a table with separate columns for the interval numbers (the number of cars per household), the tallied results, and the frequency of results in each interval. Label these columns Number of cars, Tally and Frequency.

3. Read the list of data from left to right and place a tally mark in the appropriate row. For example, the first result is a 1, so place a tally mark in the row beside where 1 appears in the interval column (Number of cars). The next result is a 2, so place a tally mark in the row beside the 2, and so on. When you reach your fifth tally mark, draw a tally line through the preceding four marks to make your final frequency calculations easier to read.

4. Add up the number of tally marks in each row and record them in the final column entitled Frequency.

Your frequency distribution table for this exercise should look like this:

Table 1. Frequency table for the number of cars registered in each household

Number of cars (x)    Tally      Frequency (f)
0                     ||||       4
1                     ||||| |    6
2                     |||||      5
3                     |||        3
4                     ||         2
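The tallying in steps 3 and 4 can also be done programmatically. The following is a minimal sketch using `collections.Counter` from the Python standard library; the variable names `cars` and `frequency` are assumptions made for this illustration.

```python
from collections import Counter

# Number of cars registered to each of the 20 households surveyed.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# Counter tallies how many times each value occurs.
counts = Counter(cars)

# Order the rows by number of cars, as in the table.
frequency = {x: counts[x] for x in sorted(counts)}

for x, f in frequency.items():
    # One '|' per observation stands in for the hand-drawn tally marks.
    print(f"{x}  {'|' * f}  {f}")
```

The frequencies should sum to 20, the number of households surveyed.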
Table 1 shows at a glance that, out of 20 households surveyed, 4 households had no cars, 6 households had 1 car, and so on.

Example 2: Constructing a cumulative frequency distribution table

A cumulative frequency distribution table is a more detailed table. It looks almost the same as a frequency distribution table, but it has added columns that give the cumulative frequency and the cumulative percentage of the results as well.

At a recent chess tournament, all 10 of the participants had to fill out a form that gave their name, address and age. The ages of the participants were recorded as follows:

36, 48, 54, 92, 57, 63, 66, 76, 66, 80

Use the following steps to present these data in a cumulative frequency distribution table.

1. Divide the results into intervals, and then count the number of results in each interval. In this case, intervals of 10 are appropriate. Since 36 is the lowest age and 92 is the highest age, start the intervals at 35 to 44 and end the intervals with 85 to 94.

2. Create a table similar to the frequency distribution table but with three extra columns.

   - In the first column, the Lower value column, list the lower value of each result interval. For example, in the first row, you would put the number 35.

   - The next column is the Upper value column. Place the upper value of each result interval here. For example, you would put the number 44 in the first row.

   - The third column is the Frequency column. Record the number of times a result appears between the lower and upper values. In the first row, place the number 1.

   - The fourth column is the Cumulative frequency column. Here we add the cumulative frequency of the previous row to the frequency of the current row. Since this is the first row, the cumulative frequency is the same as the frequency. In the second row, however, the frequency for the 35-44 interval (i.e., 1) is added to the frequency for the 45-54 interval (i.e., 2). Thus, the cumulative frequency is 3, meaning we have 3 participants in the 35 to 54 age group.

     1 + 2 = 3

   - The next column is the Percentage column. In this column, list the percentage of the frequency. To do this, divide the frequency by the total number of results and multiply by 100. In this case, the frequency of the first row is 1 and the total number of results is 10. The percentage would then be 10.0.

     (1 ÷ 10) × 100 = 10.0

   - The final column is Cumulative percentage. In this column, divide the cumulative frequency by the total number of results and then, to make a percentage, multiply by 100. Note that the last number in this column should always equal 100.0. In this example, the cumulative frequency of the first row is 1 and the total number of results is 10, so the cumulative percentage of the first row is 10.0.

     (1 ÷ 10) × 100 = 10.0
3. The cumulative frequency distribution table should look like this:

Table 2. Ages of participants at a chess tournament

Lower value   Upper value   Frequency (f)   Cumulative frequency   Percentage   Cumulative percentage
35            44            1               1                      10.0         10.0
45            54            2               3                      20.0         30.0
55            64            2               5                      20.0         50.0
65            74            2               7                      20.0         70.0
75            84            2               9                      20.0         90.0
85            94            1               10                     10.0         100.0
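The columns of Table 2 can be reproduced with a short Python sketch. The running total comes from `itertools.accumulate` in the standard library; the variable names are assumptions made for this illustration, and the intervals are taken to be 35-44 through 85-94 as chosen in step 1.

```python
from itertools import accumulate

# Ages of the 10 chess tournament participants.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]

width = 10
lower_values = list(range(35, 95, width))           # 35, 45, ..., 85
upper_values = [lo + width - 1 for lo in lower_values]  # 44, 54, ..., 94

# Frequency: number of ages between the lower and upper value, inclusive.
frequencies = [sum(lo <= a <= up for a in ages)
               for lo, up in zip(lower_values, upper_values)]

# Cumulative frequency: previous cumulative frequency plus current frequency.
cumulative = list(accumulate(frequencies))

n = len(ages)
percentages = [100 * f / n for f in frequencies]
cumulative_percentages = [100 * c / n for c in cumulative]

for row in zip(lower_values, upper_values, frequencies,
               cumulative, percentages, cumulative_percentages):
    print(*row)
```

As the text notes, the last cumulative percentage should come out to 100.0, and the last cumulative frequency should equal the number of participants.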
For more information on how to make cumulative frequency tables, see the section on Cumulative frequency and Cumulative percentage.

Class intervals
If a variable takes a large number of values, then it is easier to present and handle the data by grouping the values into class intervals. Continuous variables are more likely to be presented in class intervals, while discrete variables can be grouped into class intervals or not.

To illustrate, suppose we set out age ranges for a study of young people, while allowing for the possibility that some older people may also fall into the scope of our study.

The frequency of a class interval is the number of observations that occur in a particular predefined interval. So, for example, if 20 people aged 5 to 9 appear in our study's data, the frequency for the 5-9 interval is 20.

The endpoints of a class interval are the lowest and highest values that a variable can take within that interval. So, the intervals in our study are 0 to 4 years, 5 to 9 years, 10 to 14 years, 15 to 19 years, 20 to 24 years, and 25 years and over. The endpoints of the first interval are 0 and 4 if the variable is discrete, and 0 and 4.999 if the variable is continuous. The endpoints of the other class intervals would be determined in the same way.
Class interval width is the difference between the lower endpoint of an interval and the lower endpoint of the next interval. Thus, if our study's continuous intervals are 0 to 4, 5 to 9, etc., the width of the first five intervals is 5, and the last interval is open, since no higher endpoint is assigned to it. The intervals could also be written as 0 to less than 5, 5 to less than 10, 10 to less than 15, 15 to less than 20, 20 to less than 25, and 25 and over.

Rules for data sets that contain a large number of observations

In summary, follow these basic rules when constructing a frequency distribution table for a data set that contains a large number of observations:
- find the lowest and highest values of the variable
- decide on the width of the class intervals
- include all possible values of the variable.
In deciding on the width of the class intervals, you will have to find a compromise between having intervals short enough so that not all of the observations fall in the same interval, but long enough so that you do not end up with only one observation per interval.
It is also important to make sure that the class intervals are mutually exclusive.

Example 3: Constructing a frequency distribution table for large numbers of observations

Thirty AA batteries were tested to determine how long they would last. The results, to the nearest minute, were recorded as follows:

423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
399, 415, 428, 422, 396, 372, 410, 419, 386, 390

Use the steps in Example 1 and the above rules to help you construct a frequency distribution table.

Answer

The lowest value is 363 and the highest is 431. Using the given data and a class interval of 10, the interval for the first class is 360 to 369 and includes 363 (the lowest value). Remember, there should always be enough class intervals so that the highest value is included.

The completed frequency distribution table should look like this:

Table 3. Life of AA batteries, in minutes

Battery life, minutes (x)    Frequency (f)
360-369                      2
370-379                      3
380-389                      5
390-399                      7
400-409                      5
410-419                      4
420-429                      3
430-439                      1
Total                        30
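The three rules above (find the lowest and highest values, decide on a width, include every possible value) can be sketched in Python as follows. The names `minutes`, `width` and `battery_table` are assumptions made for this illustration, and the first lower limit is obtained by rounding the lowest value down to a multiple of the class width.

```python
# Battery life, to the nearest minute, for the 30 AA batteries tested.
minutes = [423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
           392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
           399, 415, 428, 422, 396, 372, 410, 419, 386, 390]

width = 10

# Round the lowest and highest values down to a multiple of the width,
# so the first class starts at 360 and the last class starts at 430.
first_lower = (min(minutes) // width) * width
last_lower = (max(minutes) // width) * width

battery_table = {}
for lower in range(first_lower, last_lower + width, width):
    upper = lower + width - 1
    # Inclusive limits keep the classes mutually exclusive for whole minutes.
    battery_table[(lower, upper)] = sum(lower <= m <= upper for m in minutes)

for (lower, upper), f in battery_table.items():
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(battery_table.values()))
```

Stepping through every class between the first and last lower limit, rather than only the classes that happen to contain data, means an empty class would still appear in the table with a frequency of 0.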
Relative frequency and percentage frequency

An analyst studying these data might want to know not only how long batteries last, but also what proportion of the batteries falls into each class interval of battery life.

This relative frequency of a particular observation or class interval is found by dividing the frequency (f) by the number of observations (n): that is, (f ÷ n). Thus:

Relative frequency = frequency ÷ number of observations

The percentage frequency is found by multiplying each relative frequency value by 100. Thus:

Percentage frequency = relative frequency × 100 = (f ÷ n) × 100
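As a worked illustration of the two formulas, the sketch below applies them to the battery frequencies from Table 3. The class labels for the last four rows are inferred here from the class width of 10, and the variable names are assumptions made for this example.

```python
# Frequencies (f) from Table 3; labels above 390-399 are inferred from the
# class width of 10.
frequencies = {
    "360-369": 2, "370-379": 3, "380-389": 5, "390-399": 7,
    "400-409": 5, "410-419": 4, "420-429": 3, "430-439": 1,
}

# n is the number of observations.
n = sum(frequencies.values())

# Relative frequency = f / n
relative = {cls: f / n for cls, f in frequencies.items()}

# Percentage frequency = (f / n) x 100
percentage = {cls: 100 * f / n for cls, f in frequencies.items()}

for cls in frequencies:
    print(f"{cls}: {relative[cls]:.3f}  {percentage[cls]:.1f}%")
```

The relative frequencies should sum to 1 and the percentage frequencies to 100, apart from small rounding differences.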