Chapter 1
What is Statistics?
At the end of this chapter, the students should be able to:
Define statistics and understand the process of statistics.
Define descriptive statistics and inferential statistics.
Distinguish between descriptive statistics and inferential statistics.
Differentiate between a population and a sample.
Define the terms used in statistics.
Distinguish between qualitative variables, quantitative variables, discrete random
variables, and continuous random variables.
Step 1
Identify the research objective
The researcher must be clear about why the study is being done and must determine the
details of the question to be answered. The researcher also needs to define a
target group for the study and focus only on that group.
Step 2
Collect the information needed
Information can be collected from a population or from a sample. However,
data from an entire population are often difficult and expensive to collect because
of the sheer amount of data involved. Therefore, a survey is normally based on a
sample of the population.
Step 3
Organize, summarize and analyse the information
This step is called descriptive statistics. The data collected from a population or
sample are organized using either numerical or graphical methods, which provide
an overview of the information collected.
Step 4
Make a decision or draw a conclusion
This step is called inferential statistics. The information collected from the sample
is generalized to the population.
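The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of heights is simulated, the sample size of 100 is arbitrary, and the interval in Step 4 uses a normal approximation at roughly the 95% level, none of which comes from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Step 1: research objective: estimate the mean height (cm) of a target group.
# Hypothetical population of 10,000 heights; in practice it is not fully known.
population = [random.gauss(165, 8) for _ in range(10_000)]

# Step 2: collect information from a sample rather than the whole population.
sample = random.sample(population, 100)

# Step 3: descriptive statistics: summarize the sample numerically.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Step 4: inferential statistics: generalize from the sample to the population.
# Approximate 95% interval for the population mean (normal approximation).
margin = 1.96 * sample_sd / len(sample) ** 0.5
interval = (sample_mean - margin, sample_mean + margin)

print(f"sample mean = {sample_mean:.1f} cm, sample sd = {sample_sd:.1f} cm")
print(f"estimated population mean: {interval[0]:.1f} to {interval[1]:.1f} cm")
```

Steps 2 and 3 describe only the 100 sampled values; Step 4 is where a statement about all 10,000 values is made.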
Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying and describing data
by using tables, graphs and summary measures.
Example 1.1
The compilation of batting average, runs batted in, and number of home runs for each
player, as well as earned run average, won/lost percentage, number of saves, etc., for each
pitcher from the official score sheets for major league baseball players is an example of
descriptive statistics. These statistical measures allow us to compare players, determine
whether a player is having an “off year” or “good year”, etc.
Inferential Statistics
Inferential statistics consists of methods that use sample results to help make decisions
or predictions about a population.
Example 1.2
The techniques of inferential statistics are applied in many industrial processes to control
the quality of the products produced. In industrial settings, the population may consist of
the daily production of toothbrushes, computer chips, bolts, and so forth. The sample will
consist of a random and representative selection of items from the process producing the
toothbrushes, computer chips, bolts, etc. The information contained in the daily sample is
used to construct control charts. The control charts are then used to monitor the quality of
the products.
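As a rough sketch of how daily samples might feed a control chart, the following computes three-sigma limits for the proportion of defective items (a p-chart). The sample size of 200 and the defect counts are made up for illustration, and the text does not say which type of control chart is meant, so the p-chart is an assumption.

```python
import math

# Hypothetical data: number of defective toothbrushes found in each
# daily sample of 200 items (counts invented for illustration).
sample_size = 200
defectives = [6, 4, 7, 5, 3, 8, 5, 6, 4, 18]

proportions = [d / sample_size for d in defectives]

# Centre line: overall proportion defective across all daily samples.
p_bar = sum(defectives) / (sample_size * len(defectives))

# Three-sigma control limits for a proportion (p-chart).
sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
upper = p_bar + 3 * sigma
lower = max(0.0, p_bar - 3 * sigma)  # a proportion cannot be negative

# Days whose sample proportion falls outside the limits are flagged.
flagged = [day for day, p in enumerate(proportions, start=1)
           if p > upper or p < lower]

print(f"centre = {p_bar:.3f}, limits = ({lower:.3f}, {upper:.3f})")
print("days outside limits:", flagged)
```

With these invented counts only day 10 is flagged, which is the kind of signal that would prompt a check of the production process.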
Definition of Population
A population is the complete collection of all elements of the target group (individuals, items
or objects) whose characteristics are being studied. The collection is complete in the sense
that it includes all the subjects to be studied. It is also known as the target population.
Definition of Sample
A sample is a subset of a population. It is a collection of a few elements selected from a
population, i.e., it consists of a portion of the population selected for study.
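The relationship between the two definitions can be illustrated with a simple random sample drawn in Python. The population of 5,000 voter ID numbers is hypothetical, and simple random sampling without replacement is only one of several ways a sample might be selected.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: ID numbers of all 5,000 registered voters in a town.
population = list(range(1, 5001))

# A simple random sample of 600 voters, drawn without replacement.
sample = random.sample(population, 600)

# Every sampled element belongs to the population, and no element repeats.
is_subset = set(sample) <= set(population)
has_no_repeats = len(set(sample)) == len(sample)

print(f"population size = {len(population)}, sample size = {len(sample)}")
print("sample is a subset of the population:", is_subset)
```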
Example 1.3
The results of polls are widely reported by both the written and the electronic media. The
techniques of inferential statistics are widely utilized by pollsters. Table 1.1 explores
several examples of populations and samples encountered in polls reported by the media.
The methods of inferential statistics are used to make inferences about the populations
based upon the results found in the sample and to give an indication about the reliability
of these inferences. Suppose the results of a poll of 600 registered voters are reported as
follows: forty percent of the voters approve of the president’s economic policies. The
margin of error for the survey is 4%. The survey indicates that an estimated 40% of all
registered voters approve of the economic policies, but it might be as low as 36% or as
high as 44%.
Table 1.1
Population                                Sample
All registered voters                     A telephone survey of 600 registered voters
All owners of handguns                    A telephone survey of 1000 handgun owners
Households headed by a single parent      The results from questionnaires sent to 2500 households headed by a single parent
The CEOs of all private companies         The results from surveys sent to 150 CEOs of private companies
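The 4% margin of error quoted for the poll of 600 registered voters in Example 1.3 is consistent with the usual large-sample formula for a proportion. The sketch below assumes a simple random sample and a 95% confidence level (z = 1.96); the example itself does not state how the margin was obtained.

```python
import math

n = 600          # registered voters polled
p_hat = 0.40     # sample proportion who approve

# Assumption: simple random sample and a 95% confidence level (z = 1.96);
# the source does not say how its 4% margin was calculated.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

low, high = p_hat - margin, p_hat + margin
print(f"margin of error = {margin:.1%}")
print(f"interval = {low:.0%} to {high:.0%}")
```

Under these assumptions the margin is about 3.9%, which rounds to the reported 4% and gives the range of 36% to 44%.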
Example 1.4
The following are examples of populations:
The heights of all citizens in Malaysia.
The monthly incomes of all workers in MMU.
The tuition fees of all students in a university.
Definition of variable
A variable is a characteristic of interest concerning the individual elements of a population
or a sample.
Definition of observation
An observation or measurement is a value of a variable or characteristic for an element.
Example 1.5
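The following table shows the number of students observed in each foundation. Each
foundation (for example, Engineering) is an element, the number of students is the
variable, and each recorded count (for example, 100 for Law) is an observation.
Table 1.2
Foundation        Number of students
Engineering       500
IT                200
Management        800
Law               100
Life Sciences     50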
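One way to see the three terms side by side is to store Table 1.2 as a mapping in Python. The variable name `students_per_foundation` is chosen here only for illustration.

```python
# Table 1.2 as a mapping: each key is an element (a foundation) and
# each value is the observation of the variable "number of students".
students_per_foundation = {
    "Engineering": 500,
    "IT": 200,
    "Management": 800,
    "Law": 100,
    "Life Sciences": 50,
}

elements = list(students_per_foundation)               # the five foundations
observations = list(students_per_foundation.values())  # one value per element

print("elements:", elements)
print("observations:", observations)
print("total students:", sum(observations))
```

There are five elements and therefore five observations of the single variable.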
Example 1.6
Table 1.3
Discrete variable                          Possible values for the variable
The number of bulbs in a classroom         3, 4, 5, or 10
The number of accidents within a week      0, 1, 2, 3, ..., 10
The number of TVs sold in a week           0, 1, 2, 3, 4 (finite values)
The number of customers in a week          0, 1, 2, ... (infinite values)
Example 1.7
Table 1.4 gives several examples of qualitative variables along with a set of categories into
which they may be classified.
Table 1.4
Qualitative variable      Possible categories for the variable
Marital status            Single, married, divorced, separated
Blood type                O, A, B, AB
Gender                    Male, female
Pain level                None, low, moderate, severe
The possible categories for qualitative variables are often coded for the purpose of
performing computerized statistical analysis. Marital status might be coded as 1, 2, 3, or 4
where 1 represents single, 2 represents married, 3 represents divorced, and 4 represents
separated. The variable gender might be coded as 0 for female and 1 for male. The
categories for any qualitative variable may be coded in a similar fashion. Even though
numerical values are associated with the characteristic of interest after being coded, the
variable is considered a qualitative variable.
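The coding of marital status described above can be sketched as follows. The six responses are invented for illustration; the point is that counting coded categories is meaningful, while arithmetic on the codes is not.

```python
from collections import Counter

# Coding scheme from the text: 1 = single, 2 = married, 3 = divorced, 4 = separated.
codes = {"single": 1, "married": 2, "divorced": 3, "separated": 4}
labels = {code: status for status, code in codes.items()}

# Hypothetical responses from six people (made up for illustration).
responses = ["married", "single", "married", "divorced", "single", "married"]
coded = [codes[r] for r in responses]

# Counting categories is meaningful for a coded qualitative variable.
counts = Counter(coded)
most_common_code, frequency = counts.most_common(1)[0]
print("coded data:", coded)
print("most common status:", labels[most_common_code], f"({frequency} responses)")

# Arithmetic on the codes is not: the "average" code is not a marital status.
mean_code = sum(coded) / len(coded)
print("mean of codes:", round(mean_code, 2))
```

The mean of the codes here is about 1.83, a number that corresponds to no category, which is why the coded variable remains qualitative.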
Scales of Measurements
Data can be measured on one of four scales: nominal, ordinal, interval and ratio.
Each scale is explained in this section.
Nominal scale (Classification) is characterized by data that consist of names, labels,
or categories only. Data on this scale cannot be arranged in any order.
Example 1.7
Ordinal scale (Ranking); numbers are used to place objects in order, but there is no
information regarding the differences (intervals) between points on the scale.
Example 1.8:
Grading systems [A, B, C, D, E]. Customer satisfaction [1, 2, 3, 4, 5]. Level of
education, Likert scale, team/individual standing, socioeconomic status.
Interval scale; a measurement scale that has equal units of measurement but no
true zero point.
Example 1.9:
IQ scores. It makes sense to talk about someone having an IQ 20 points higher
than another person, but an IQ of zero has no meaning. Celsius and Fahrenheit
temperature scales, most psychological measures.
Ratio scale; a measurement scale that has equal units of measurement and a
rational zero point for the scale (absolute zero).
Example 1.10:
Kelvin temperature scale, income in Ringgit, length, area, or volume, height and
weight.
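The difference between the interval and ratio scales can be checked numerically with temperature. The sketch below compares the same two temperatures on three scales; the values 10 and 20 degrees Celsius are arbitrary, and the helper function names are chosen for illustration.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to kelvins."""
    return c + 273.15


def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to degrees Fahrenheit."""
    return c * 9 / 5 + 32


a_c, b_c = 10.0, 20.0  # two temperatures in degrees Celsius

# Interval scales: the ratio changes with the arbitrary zero of each scale.
ratio_celsius = b_c / a_c
ratio_fahrenheit = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# Ratio scale: kelvin starts at absolute zero, so this ratio is meaningful.
ratio_kelvin = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"Celsius ratio:    {ratio_celsius:.3f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")
```

The Celsius ratio is 2 and the Fahrenheit ratio is 1.36 for the same pair of temperatures, so "twice as hot" is not a meaningful statement on an interval scale; only the kelvin ratio (about 1.035) reflects a physical quantity.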
Example 1.11:
The following tables indicate the collection of values that a variable took during the
measurement. Identify the element and types of variable in (a), (b), and (c).
[Tables (a), (b) and (c)]
(a) The elements in the table are the students, labelled X1 to X8, and the variable is the
grade of each student, X. The measured variable is quantitative and discrete because it
takes only a finite number of numerical values.
(b) The elements in the table are the types of operating system, which are Windows, Linux,
BSD and MacOS. The variable is the algorithm that is generated in each operating
system. The measured variable is qualitative and discrete because it is
described by the names of the algorithms, which are nonnumeric values.
(c) The elements of the table are the trials and the variable is the run time taken
for each trial. The measured variable is also quantitative and discrete because it
takes only a finite number of numerical values.
Example 1.12:
The KSW computer science aptitude test consists of 25 questions. The score reported is
reflective of the computer science aptitude of the test taker. How would the score likely be
reported for the test? What are the possible values for the scores? Is the variable discrete
or continuous?
Answer:
The score reported would likely be the number or percent of correct answers. The number
correct would be a whole number from 0 to 25 and the percent correct would range from
0 to 100 in steps of size 4. However, if the test evaluator considered the reasoning process
used to arrive at the answers and assigned partial credit for each problem, the scores
could range from 0 to 25 or 0 to 100 percent continuously. That is, the score could be any
real number between 0 and 25 or any real number between 0 and 100 percent. We might
say that for all practical purposes, the variable is discrete. However, theoretically the
variable is continuous.
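The discrete case in the answer can be enumerated directly. This short sketch lists the possible scores when no partial credit is given; the variable names are chosen for illustration.

```python
total_questions = 25

# Without partial credit, the number correct is a whole number from 0 to 25.
number_correct = list(range(total_questions + 1))

# Each question is worth 100 / 25 = 4 percentage points.
percent_correct = [100 * k // total_questions for k in number_correct]

print(len(number_correct), "possible scores")
print("first percent scores:", percent_correct[:4])
print("last percent score:", percent_correct[-1])
```

There are 26 possible values in either form, which is what makes the variable discrete; with partial credit the set of possible values would no longer be a short list like this.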
Example 1.14:
The pain level following surgery for an intestinal blockage was classified as none, low,
moderate, or severe for several patients. Give three different numerical coding schemes
that might be used for the purpose of inclusion of the responses in a computer data file.
Does this coding change the variable to a quantitative variable?
Answer:
The responses none, low, moderate, or severe might be coded as 0, 1, 2, or 3 or 1, 2, 3, or
4 or as 10, 20, 30, or 40. There is no limit to the number of coding schemes that could be
used. Coding the variable does not change it into a quantitative variable. Many times
coding a qualitative variable simplifies the computer analysis performed on the variable.
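The three coding schemes in the answer can be compared in a short sketch. The five patient responses are invented for illustration. Sorting by code gives the same ordering of categories under every scheme, while the numerical codes themselves differ.

```python
levels = ["none", "low", "moderate", "severe"]

# The three coding schemes mentioned in the answer.
schemes = {
    "0-3": dict(zip(levels, [0, 1, 2, 3])),
    "1-4": dict(zip(levels, [1, 2, 3, 4])),
    "10-40": dict(zip(levels, [10, 20, 30, 40])),
}

# Hypothetical responses from five patients (made up for illustration).
responses = ["low", "severe", "none", "moderate", "low"]

orderings = {}
for name, scheme in schemes.items():
    coded = [scheme[r] for r in responses]
    # Sorting by code recovers the same ordering of categories in every scheme.
    orderings[name] = sorted(responses, key=scheme.get)
    print(name, coded, orderings[name])
```

Because only the order of the codes carries information, any increasing set of numbers would serve equally well, and the variable stays qualitative.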
Example 1.15:
Indicate the scale of measurement for each of the following variables: racial origin,
monthly phone bills, Fahrenheit and centigrade temperature scales, military ranks, time,
ranking of a personality trait, clinical diagnoses, and calendar numbering of the years.
Answer:
Racial origin: nominal; monthly phone bills: ratio; temperature scales: interval;
military ranks: ordinal; time: ratio; ranking of a personality trait: ordinal;
clinical diagnoses: nominal; calendar numbering of the years: interval.
Example 1.16:
In a sociological study involving 35 low-income households, the number of children per
household was recorded for each household. What is the variable? How many
observations are in the data set?
Answer:
The variable is the number of children per household. The data set contains 35
observations.