CH 03
• The method of arranging data into homogeneous classes according to some common
features present in the data is called data classification.
• It’s the process of organizing data into categories for its most effective and efficient use.
• A well-planned data classification system makes essential data easy to find and retrieve.
This can be of particular importance for risk management, legal discovery, and compliance.
Types of Data Classification
Qualitative data are classified on the basis of some descriptive characteristic or qualitative
aspect of a phenomenon, viz. sex, beauty, literacy, honesty, intelligence, religion, eyesight, etc.
For example, a population can be divided on the basis of marital status into married and unmarried.
Examples :
Gender - Male, Female
Marital Status - Unmarried, Married, Divorcee
State - New Delhi, Haryana, Illinois, Michigan
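A nominal variable has two or more categories, without any implied ordering, as in the
Gender, Marital Status and State examples above.
An ordinal variable has two or more categories that do have an implied ordering.
Examples :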
Scale - Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Rating - Very low, Low, Medium, Great, Very great
Quantitative data are values of variables that express quantities. A quantitative
variable can be counted or measured, and arithmetic can be performed on its values;
it provides specific information on a numerical scale.
For example, if you count the number of people having dinner at a
restaurant, this would be discrete data: first, because you are counting; second,
because you cannot have fractions of people, only complete people. Discrete
data comes in the form of whole numbers or integers.
On the other hand, if you measure the time it takes for each table in the restaurant to
receive what they ordered (hopefully within the range of an hour) you will have values
containing hours, minutes, seconds, and even fractions of a second if you want to increase
precision! These values would be a set of continuous quantitative data: first,
because you measured them; second, because you can have any value (any value
containing decimals, not just integers) within the reasonable range.
Notice that from the four examples of quantitative variables listed above, the first two
are examples of discrete variables, while the third and fourth are examples of continuous
variables.
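Returning to the restaurant example, the distinction can be sketched in a few lines of Python. The sample values and the helper name `is_discrete` are made up for illustration, and checking the Python type is only a rough stand-in for asking whether a value was counted or measured.

```python
# Hypothetical sample values, not taken from the text.
diners_per_table = [2, 4, 3, 6]            # counted  -> discrete
wait_minutes = [12.5, 31.25, 18.0, 44.75]  # measured -> continuous

def is_discrete(values):
    """Return True when every value is stored as a whole-number count."""
    return all(isinstance(v, int) for v in values)

print(is_discrete(diners_per_table))  # True
print(is_discrete(wait_minutes))      # False
```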
Continuous variables can be further categorized as either interval or ratio variables.
Interval variables can be measured along a continuum and have a numerical value, and
equal differences mean the same thing anywhere on the scale (for example, temperature
measured in degrees Celsius or Fahrenheit). So the difference between 20°C and 30°C is
the same as that between 30°C and 40°C. However, temperature measured in degrees Celsius or
Fahrenheit is NOT a ratio variable.
Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured in
degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is
no temperature. However, temperature measured in Kelvin is a ratio variable as 0 Kelvin
(often called absolute zero) indicates that there is no temperature whatsoever. Other
examples of ratio variables include height, mass, distance and many more. The name
"ratio" reflects the fact that you can use the ratio of measurements. So, for example, a
distance of ten metres is twice the distance of 5 metres.
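The interval/ratio distinction can be checked numerically. The sketch below compares 20°C and 40°C: the difference is the same in Celsius and Kelvin, but the ratio changes once the scale has a true zero. The function name `celsius_to_kelvin` and the two sample temperatures are chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15

low_c, high_c = 20.0, 40.0

# Differences are preserved on both scales (interval property).
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are only meaningful on the Kelvin scale, which has a true zero.
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print("difference:", diff_c, "vs", round(diff_k, 2))
print("ratio:", ratio_c, "vs", round(ratio_k, 3))
```

The Celsius ratio of 2 does not mean that 40°C is "twice as hot" as 20°C; on the Kelvin scale the same two temperatures differ by only about 7 percent.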
Example 1
Determine whether each of the following data is quantitative or qualitative:
The marks that students get in a test.
The genders of newborn babies.
The area codes in phone numbers.
The heights of buildings.
Example 2
Identify which items in the list below are discrete and which are continuous.
Attributes
Construction of Frequency Distribution
Determine the approximate class interval size
Decide the starting point
18 1
20 1
21 2
22 1
Absolute, relative, cumulative frequency
The absolute frequency is the number of times a particular
value (or particular set of values) of a variable is observed.
The distribution or table of frequencies is a table
of the statistical data with its corresponding
frequencies.
Twenty students were asked how many hours they
worked per day. Their responses, in hours, are
listed below:
5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3
Below is a frequency table listing the different
data values in ascending order and their
frequencies.
Data value   Frequency
2            3
3            5
4            3
5            6
6            2
7            1
Total        20
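The absolute frequencies for the twenty responses can be tallied with a short script. This is a minimal sketch using Python's standard `collections.Counter`; the variable names `hours` and `absolute` are chosen here for illustration.

```python
from collections import Counter

# The twenty responses (hours worked per day) listed above.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Absolute frequency: how many times each value was observed.
absolute = Counter(hours)

print("Hours  Frequency")
for value in sorted(absolute):  # ascending order of data value
    print(f"{value:>5}  {absolute[value]:>9}")
print(f"Total  {sum(absolute.values()):>9}")
```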
A relative frequency is the fraction of times an answer occurs. To find the relative
frequencies, divide each frequency by the total number of students in the sample - in this
case, 20. Relative frequencies can be written as fractions, percents, or decimals.
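As a sketch of this division, the script below computes each relative frequency for the same twenty responses and prints it as a fraction, a decimal, and a percent. The variable names are illustrative; `Fraction` reduces each result to lowest terms, so 5/20 is shown as 1/4.

```python
from collections import Counter
from fractions import Fraction

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
total = len(hours)  # 20 students in the sample

# Relative frequency = absolute frequency / total number of observations.
relative = {
    value: Fraction(count, total)
    for value, count in sorted(Counter(hours).items())
}

for value, frac in relative.items():
    print(f"{value}: {frac} = {float(frac):.2f} = {float(frac):.0%}")
```

The relative frequencies always sum to 1, which is a quick check that no response was dropped.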
Cumulative relative frequency is the accumulation of the previous relative frequencies.
To find the cumulative relative frequencies, add all the previous relative frequencies to
the relative frequency for the current row.
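The running sum described here can be sketched with `itertools.accumulate` from the standard library, again using the twenty responses on hours worked. The variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
total = len(hours)

# (value, absolute frequency) pairs in ascending order of value.
counts = sorted(Counter(hours).items())

relative = [count / total for _, count in counts]

# Each entry adds the current relative frequency to all previous ones.
cumulative = list(accumulate(relative))

print("Hours  Relative  Cumulative")
for (value, _), rel, cum in zip(counts, relative, cumulative):
    print(f"{value:>5}  {rel:>8.2f}  {cum:>10.2f}")
```

The last cumulative relative frequency should come out as 1.00, since by then every response has been included.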
Types of table
              Leisure Activity
         Dance   Sports   TV   Total
Men          2       10    8      20
Women       16        6    8      30
Total       18       16   16      50
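The Total row and column of this two-way table are marginal sums of the cell counts. A minimal sketch of that arithmetic follows; the nested-dictionary layout and the variable names are one possible representation chosen for illustration.

```python
# Cell counts from the leisure-activity table above.
table = {
    "Men": {"Dance": 2, "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6, "TV": 8},
}
activities = ["Dance", "Sports", "TV"]

# Row totals: sum across activities for each group.
row_totals = {group: sum(cells.values()) for group, cells in table.items()}

# Column totals: sum across groups for each activity.
column_totals = {
    activity: sum(cells[activity] for cells in table.values())
    for activity in activities
}

# The grand total can be reached from either margin.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```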
Construction of a table from data Statement
The making of a compact table is itself an art. The purpose of the tabulation and
the way the tabulated information is to be used are the main points to keep in mind
while preparing a statistical table. An ideal table should consist of the following
main parts:
Table Number
Title
Captions or column Headings
Stubs or Row Designations
Body
Footnotes
Sources of data