
BHRM 242 - Collection, Organisation and Presentation of Data

BHRM 242: Quantitative Methods in Management Class Notes

LECTURE ONE

MEANING AND PURPOSE OF STATISTICS


Statistics may be defined as the science of collection, organization, presentation, analysis and
interpretation of numerical data.
According to the above definition, there are five stages in a statistical investigation. These stages are:
i) Collection of data
Collection of data constitutes the first step in a statistical investigation. Utmost care must be exercised
in collecting data because they form the foundation of statistical analysis. If data are faulty, the
conclusions drawn can never be reliable. The data may be available from existing published or
unpublished sources or else may be collected by the investigator himself. The first-hand collection of
data is one of the most difficult and important tasks faced by a statistician.
ii) Organization of data
Data collected from published sources are generally in organized form. However, a large mass of
figures that are collected from a survey frequently needs organization. The first step in organizing a
group of data is editing. The collected data must be edited very carefully so that omissions,
inconsistencies, irrelevant answers and wrong computations in the returns from a survey may be
corrected or adjusted. After the data have been edited the next step is to classify them. The purpose
of classification is to arrange the data according to some common characteristics possessed by the items
constituting the data. The last step in organization is tabulation. The purpose of tabulation is to arrange
the data in columns and rows so that there is absolute clarity in the data presented.
iii) Presentation of data
After the data have been collected and organized, they are ready for presentation. Data presented in an
orderly manner facilitates statistical analysis. Data collected may be presented in graphs and diagrams.
iv) Analysis of data
The purpose of analysing data is to dig out information useful for decision making. Methods used in
analysing the presented data are numerous, ranging from simple observations of the data to
complicated, sophisticated and highly mathematical techniques. In this course, we shall discuss the
most commonly used methods of statistical analysis, such as measures of central tendency, measures of
variation, correlation and regression.
v) Interpretation
The last stage in a statistical investigation is concerned with drawing conclusions from the data
collected and analysed. The interpretation of data is a difficult task and necessitates a high degree of
skill and experience. If the data that have been analysed are not properly interpreted, the whole object
of the investigation may be defeated and fallacious conclusions be drawn. Correct interpretation will
lead to a valid conclusion of the study and thus can aid one in making suitable business decisions.

Since statistical methods help in taking decisions, statistics may be rightly regarded as a body of
methods for making wise decisions in the face of uncertainty. A modified form of this definition is given
by Professor Ya-Lun Chou, in whose words statistics is a method of decision making in the face of
uncertainty on the basis of numerical data and calculated risks.

1.5.2 Main Divisions of Statistics


Statistics as a science can be divided into two main classes:
i. Statistical methods
ii. Applied statistics


i) Statistical Methods.
In statistical methods, we study all those devices, rules of procedures and general principles which are
applicable to all kinds or groups of data. They are the tools in the hands of a statistical investigator.
They are the devices for achieving the desired ends explained in theory.

ii) Applied Statistics


Applied statistics deals with the application of statistical methods to specific problems. Just like statistical
methods, applied statistics can be further divided into two main groups. These are:
(a) Descriptive statistics
(b) Inferential statistics

1.5.3 Descriptive Statistics
These are methods involving the collection, presentation and characterisation of data in order to
properly describe the various features of that set of data. Areas covered under descriptive statistics
include:
i. Collection of data
ii. Organisation and presentation of data
iii. Measures of central tendency
iv. Measures of dispersion

1.5.4 Inferential statistics:


These are methods that make it possible to estimate a characteristic of a population, or to
make a decision concerning a population, based only on sample results. In other words,
inferential statistics is the process of reaching generalizations about the whole population by
examining a portion of it.

Areas covered under inferential statistics are:


i) Probability theory
ii) Sampling distributions.
iii) Theory of estimation
iv) Hypothesis testing etc.

The Roles of Statistics


You will realize that statistics is useful in all spheres of human life. A woman with a given
amount of money, going to the market to purchase foodstuff for the family, makes decisions on
the types of food items to purchase and on their quantity and quality, so as to maximize the
satisfaction she will derive from the purchase. For all these decisions, the woman makes use of
statistics.

Government uses statistics as a tool for collecting data on economic aggregates such as
national income, savings, consumption and gross national product.

Government also uses statistics to measure the effects of external factors on its policies and to
assess the trends in the economy so that it can plan future policies.


Government uses statistics during a census. The various forms sent by the government to
individuals and firms on annual income, tax returns, prices, costs, output and wage rates
generate a lot of statistical data for the use of the government.

Business uses statistics to monitor the various changes in the national economy for the various
budget decisions. Business makes use of statistics in production, marketing, administration and
in personnel management.

Statistics is also used extensively to control and analyse stock levels, such as minimum, maximum
and reorder levels. It is used by business in market research to determine the acceptability of a
product and the demand for it at various prices by a given population in a geographical area.

Management also uses statistics to make forecasts about the sales and labour costs of a firm.

Management uses statistics to establish mathematical relationships between two or more
variables for the purpose of predicting one variable in terms of the others. Statistics is also
used extensively in the conduct and analysis of biological, physical, medical and social research.

Basic Concepts in Statistics


Let us quickly define some of the basic concepts you will continue to come across in this course.

• Entity: This may be a person, place or thing on which we make observations. In studying the
nutritional well-being of pupils in a primary school, the entity is a pupil in the school.

• Variable: This is a characteristic that assumes different values for different entities. The
weights of pupils in the primary school constitute a variable.

• Random Variable: If we can specify, for a given variable, a mathematical expression called a
function, which gives the relative frequency of occurrence of the values that the variable can
assume, the function is called a probability function and the variable a random variable.

• Quantitative Variable: This is a variable whose values are given as numerical quantities.
An example is the hourly patronage of a restaurant.

• Qualitative Variable: This is a variable that is not measurable in numerical form or that
cannot be counted. Examples are the colours of fruits and the taste of different brands of biscuit.

• Discrete Variable: This is a variable that can only assume whole numbers. Examples are
the number of Local Government Council Areas of the States in Nigeria and the number of
female students in the various programmes in the National Open University. A discrete variable
has "interruptions" between the values it can assume. For instance, between 1 and 2 there is an
infinite number of values, such as 1.1, 1.11, 1.111, 1.1111 and so on, none of which a discrete
variable can take. These gaps are called interruptions.


• Continuous Variable: This is a variable that can assume both decimal and non-decimal values.
There is always a continuum of values that the continuous variable can assume. The
interruptions that characterize the discrete variable are absent in the continuous variable.
Weight, for example, can take whole or decimal values such as 20 kilograms and
220.1752 kilograms.

• Parameter: This is a summary measure that is computed to describe a characteristic of
an entire population.

• Population: This is the totality of all entities under consideration in a study; it is also
called the universe. In the study of how workers in Nigeria spend their leisure hours, all the
workers in Nigeria constitute the population of the study.

• Sample: This is the part of the population that is selected for a study. In studying the income
distribution of students in the National Open University, the incomes of 1000 students selected
for the study, from the population of all the students in the Open University will constitute the
sample of the study.

In short, a sample is the portion of the population that is selected for analysis.

• Random Sample: This is a sample drawn from a population in such a way that the results of
its analysis may be used to generalize about the population from which it was drawn.

• Statistic: This is a summary measure that is computed to describe a characteristic from
only a sample of the population.

Thus, one major aspect of inferential statistics is the process of using sample statistics to draw
conclusions about the true population parameters. The need for inferential methods derives from the
need for sampling. As the population becomes larger, it is usually too costly, too time consuming, and
too cumbersome to obtain our information from the entire population. So, decisions pertaining to
population characteristics have to be based on the information contained in a sample of that
population.
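
The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of incomes below is randomly generated and entirely hypothetical, and the names `population`, `sample`, `population_mean` and `sample_mean` are assumptions introduced for the example.

```python
import random
from statistics import mean

# Hypothetical population: monthly incomes of 10,000 workers (made-up figures).
rng = random.Random(42)
population = [rng.randint(20_000, 120_000) for _ in range(10_000)]

# Parameter: a summary measure computed from the entire population.
population_mean = mean(population)

# Statistic: the same summary measure computed from a sample only.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Difference:                  {abs(population_mean - sample_mean):.2f}")
```

The sample mean is obtained from 200 observations instead of 10,000, which is the saving in cost and time that motivates sampling; inferential methods deal with how far such a statistic can be trusted as an estimate of the parameter.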


LECTURE TWO
COLLECTION, ORGANISATION AND PRESENTATION OF DATA
Types of data:
We are going to talk about two types of data:
i. Primary data: raw data collected by the investigator himself and thus original in
character.
ii. Secondary data: data which have been collected by some other person and which
have passed through the statistical machine at least once.
• The distinction between primary and secondary data is only a matter of degree.
• The same set of data may be primary in the hands of one person and secondary in the hands of
another.
• In general, data are primary to the source that collects and processes them for the
first time and secondary to all other sources that later use them.

Methods of collecting primary data

• Personal interviews
• Telephone interviews
• Mail interviews
Read further on these methods.

Sources of secondary data

The chief sources of secondary data may be broadly classified into the following two groups:
• Published sources
• Unpublished sources

Published sources
These include publications of various organisations, such as:
i. Official publications of central government, e.g. economic surveys
ii. Annual and monthly abstracts of statistics, e.g. statistical abstracts
iii. Publications of research institutions
iv. Publications of commercial and financial institutions
v. Newspapers and periodicals
vi. International publications, e.g. those of the U.N.O., WHO and IMF

Unpublished sources
There are various sources of unpublished statistical material, such as:
• Records maintained by private firms or business enterprises that may not want to
release their data to any outside agency.
• The various departments and offices of the government.
• Research carried out by individual research scholars in universities or
research institutes.

Precautions in the use of secondary data

• Secondary data should be used with extra caution.
• Before using such data, the investigator must be satisfied regarding their reliability,
accuracy, adequacy and suitability to the problem under investigation.
• Proper care should be taken to edit the data so that they are free from inconsistencies,
errors and omissions.

Data presentation/displaying data

• Classification and tabulation of data are devices for presenting statistical data in a neat,
concise, systematic and readily comprehensible form, thus highlighting
the salient features.
• Data can also be presented in diagrams and graphs.

Diagrammatic and graphic presentation of statistical data

Construction of a good graph or diagram requires the following:
i. Clear, concise and unambiguous titles.
ii. A clear and concise statement of the units in which the figures are measured.
iii. Correct vertical and horizontal scaling.
iv. A statement of the units used on the vertical and horizontal scales.
v. A tidy and attractive graph or diagram.
vi. A statement of the source of information, at the bottom.
vii. A key to explain the various features of the graph.

Bar Diagrams
• Bar diagrams are one of the easiest and most commonly used devices for presenting
most economic and business data.
• They consist of a group of equidistant rectangles, one for each group or category of the data,
in which the values or magnitudes are represented by the height or length of the
rectangles, the width being arbitrary and immaterial.

Points to consider when drawing bar diagrams

• All the bars drawn in a single study should be of uniform (though arbitrary) width,
depending on the number of bars to be drawn and the space available.
• Proper but uniform spacing should be given between different bars to make the diagram
look more attractive and elegant.
• The height (or length) of each bar is taken proportional to the magnitude of the
observation, the scale being selected keeping in view the magnitude of the largest observation.
• All bars should be constructed on the same baseline.
• Where possible, bars should be arranged from left to right.

Types of bar diagrams

• Simple bar diagram
• Component (or sectional) bar diagram
• Percentage bar diagram
• Multiple bar diagram
• Bilateral bar diagram

Read further on these bar diagrams.
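
As a rough illustration of a simple bar diagram, the Python sketch below prints horizontal text bars. It follows the points listed earlier: every bar starts on the same baseline, has the same width (one text row), and has a length proportional to its value, with the scale chosen from the largest observation. The function name `simple_bar_diagram` and the regional sales figures are hypothetical, introduced only for this example.

```python
def simple_bar_diagram(data, max_width=40):
    """Return one text line per category, with bar length proportional to value."""
    largest = max(data.values())            # scale is fixed by the largest observation
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * max_width)
        # All bars start right after the same '|' baseline.
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical sales figures by region (illustrative only).
sales = {"North": 120, "South": 80, "East": 40, "West": 160}
for line in simple_bar_diagram(sales):
    print(line)
```

A plotting library would normally be used for a finished diagram; the text version is only meant to make the proportionality rule visible.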

Graphic Representation of data

• Like diagrams, a large number of graphs are used in practice.
• They can be broadly classified under the following two heads:
i. Graphs of frequency distributions
ii. Graphs of time series

Graphs of Frequency Distributions

• These are designed to reveal clearly the characteristics of frequency data.
• Such graphs are more appealing to the eye than tabulated data and are readily
perceptible to the mind.
• They facilitate comparative study of two or more frequency distributions regarding their
shape and pattern.

Examples of graphs of frequency distributions

• Histograms
• Frequency polygons
• Frequency curves
• Ogives or cumulative frequency curves

Read further on these graphs.

CLASSIFICATION AND TABULATION OF DATA


Classification is the grouping of related facts into classes. Facts in one class differ from those of
another class with respect to some characteristic, called the basis of classification.

Objectives of Classification
• To condense the mass of data in such a manner that similarities and dissimilarities can
be readily apprehended.
• To facilitate comparison.
• To pinpoint the most significant features of the data at a glance.
• To give prominence to the information gathered while dropping out the unnecessary
elements.
• To enable a statistical treatment of the material collected.

Types of Classification
Data can be classified broadly on the basis of the following 4 criteria:
1. Geographical i.e. area-wise, e.g. cities, districts etc.
2. Chronological i.e. on the basis of time
3. Qualitative i.e. according to some attributes
4. Quantitative i.e. in terms of magnitude

Geographical Classification:
Data are classified on the basis of geographical or locational differences between the various
items, such as states, cities, regions, zones, areas etc.

Chronological Classification:
When data are observed over a period of time, the type of classification is known as
chronological classification, e.g. we may present the figures of population (or production, sales
etc.) as follows:
Population of Kenya from 1969 to 1988 (hypothetical figures)
Year    Population (millions)
1969    14
1978    21
1988    28

Time series are usually listed in chronological order, normally starting with the earliest period.

Qualitative Classification
Data are classified on the basis of some attribute or quality such as sex, colour of hair, literacy,
religion etc. The attribute under study cannot be measured; one can only find out whether it
is present or absent in the units of the population under study. E.g. if a population is studied
by place of residence, one can find out how many persons are living in urban areas and how many in
rural areas. Thus, when only one attribute is studied, two classes are formed, one possessing the
attribute and the other not possessing it. This type of classification is called simple classification.

Quantitative Classification
Quantitative classification refers to classification of data according to some characteristic that
can be measured, such as height, weight, income, sales, profits, production etc. E.g. the
students in a college may be classified according to weight as follows:

Weight (lb)    No. of Students
90-100         50
100-110        200
110-120        260
120-130        360
130-140        90
140-150        40
Total          1000
Such a distribution is known as an empirical frequency distribution or simple frequency
distribution.

A frequency distribution refers to data classified on the basis of some variable that can be
measured such as prices, wages, age, number of units consumed or produced.
A variable is a characteristic that varies in amount or magnitude in a frequency distribution. It
may be either continuous or discrete.

A continuous variable, also known as a continuous random variable, is capable of manifesting
every conceivable fractional value within the range of possibilities, such as the height or weight
of persons or the weight of a product. For a continuous variable, data are obtained by numerical
measurement rather than counting. E.g. when a student grows, say from 90 cm to 150 cm, his
height passes through all values between these limits.

A discrete variable is one which can vary only by finite jumps and cannot manifest every
conceivable fractional value, e.g. the number of children in a household can be 0, 1, 2, 3, ...
and the number of rooms in a house can be 1, 2, 3, ...

Definition:
A frequency distribution or frequency table is simply a table in which the data are grouped into
classes and the numbers of cases which fall in each class are recorded. The numbers in each
class are referred to as 'frequencies', hence the term 'frequency distribution'. When the number
of items in each class is expressed as a proportion of the total, the table is usually referred to as a
'relative frequency distribution', or simply a percentage distribution.

Formation of a Discrete Frequency Distribution


We count the number of times a particular value is repeated which is called the frequency of
that class. To facilitate counting we prepare a ‘tally’ column. The process is illustrated by the
example below.

Example:
In a survey of 35 families in a village, the number of children per family was recorded and the
following data obtained:


1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 12 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
Represent the data in the form of a discrete frequency distribution.

Solution:
Frequency distribution of the number of children.
No. of children    Tallies    Frequency
0                  II         2
1                  I          1
2                  IIII       4
3                  IIIII I    6
4                  IIIII      5
5                  IIIII      5
6                  III        3
7                  IIII       4
8                  II         2
9                  II         2
10                 -          0
11                 -          0
12                 I          1
Total                         35
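
The tallying step can be checked mechanically. The Python sketch below counts how often each value occurs in the 35 observations of the example, using `collections.Counter` from the standard library; the variable names `children` and `frequency` are assumptions made for this illustration.

```python
from collections import Counter

# Number of children per family, as recorded in the survey of 35 families.
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 12, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

# Counter plays the role of the tally column: one count per distinct value.
frequency = Counter(children)

print(f"{'No. of children':<16}{'Frequency':>9}")
for value in range(min(children), max(children) + 1):
    # Values that never occur (such as 10 and 11) are reported with frequency 0.
    print(f"{value:<16}{frequency[value]:>9}")
print(f"{'Total':<16}{sum(frequency.values()):>9}")
```

The total of the frequency column should always equal the number of observations, which is a quick check that no value has been missed or counted twice.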

Formation of a Continuous Frequency Distribution
This classification is popular in practice. The following technical terms are important when a
continuous frequency distribution is formed or data are classified according to class intervals.

Class limits
These are the highest and lowest values that can be included in the class. For example, take the
class 20-40. The lowest value of the class is 20 and the highest is 40. The two boundaries of
a class are known as the lower limit and the upper limit of the class.

Class intervals
The difference between the upper and lower limits of a class is known as the class interval of that
class. For example, in the class 100-200, the class interval is 100 (i.e. 200 - 100). An important
decision while constructing a frequency distribution is about the width of the class interval, i.e.
whether it should be 10, 20, 50, 100, 500 etc.

The decision would depend upon a number of factors such as the range in the data, i.e. the
difference between the smallest and the largest item, the details required and the number of
classes to be formed. A simple formula to obtain an estimate of the appropriate class
interval, i, is:

i = (L - S) / K

Where: L = largest item
S = smallest item
K = the number of classes
e.g. if the salaries of 100 employees in a commercial undertaking varied between £500 and
£5,500, and we want to form 10 classes, then the class interval would be:

i = (L - S) / K
L = 5,500
S = 500
K = 10

i = (5,500 - 500) / 10
= 500

The starting class would be 500-1000, the next 1000 – 1500 and so on.
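
The arithmetic above can be sketched in Python as follows. The helper names `class_interval` and `class_limits` are hypothetical, introduced only for this illustration; they apply the formula i = (L - S) / K and then list the lower and upper limit of each class.

```python
def class_interval(largest, smallest, k):
    """Estimate the class interval i = (L - S) / K."""
    return (largest - smallest) / k


def class_limits(smallest, width, k):
    """Return (lower limit, upper limit) for each of the k classes."""
    return [(smallest + n * width, smallest + (n + 1) * width) for n in range(k)]


# Salary example: L = 5,500, S = 500, K = 10.
i = class_interval(5500, 500, 10)
classes = class_limits(500, i, 10)

print("Class interval:", i)
for lower, upper in classes:
    print(f"{lower:.0f} - {upper:.0f}")
```

The first class printed is 500 - 1000 and the last is 5000 - 5500, so the ten classes together cover the whole range of the data.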

The question now is how to fix the number of classes, i.e. K. The number can either be fixed
arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help
of Sturges' rule. According to Sturges, the number of classes can be determined by the formula:
K = 1 + 3.322 log N
Where: N = total number of observations
log = logarithm to base 10
Thus if 10 observations are being studied, the number of classes shall be:
K = 1 + (3.322 x 1)
= 4.322 or 4

And if 100 observations are being studied, the number of classes shall be:
K = 1 + (3.322 x 2)
= 7.644 or 8

The number of classes shall generally be between 4 and 20. In practice it is not taken to be less
than 4 even if N is less than 10, and if N is 1,000,000, K will be 1 + (3.322 x 6) =
20.9 or 21.

Sturges suggested the following formula for determining the magnitude of the class interval:

i = Range / (1 + 3.322 log N)

Where range is the difference between the largest and smallest items.
e.g. in the example above, with N = 100, applying the formula gives the magnitude of the class
interval as:

i = (5,500 - 500) / (1 + 3.322 log 100)
= 5,000 / 7.644
= 654.1 or 650
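
Sturges' rule is easy to evaluate with a short Python sketch. The function names `sturges_classes` and `sturges_interval` are assumptions made for this example; the logarithm is taken to base 10, as in the worked figures.

```python
import math


def sturges_classes(n):
    """Number of classes suggested by Sturges' rule: K = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)


def sturges_interval(largest, smallest, n):
    """Class interval: i = Range / (1 + 3.322 log10(N))."""
    return (largest - smallest) / sturges_classes(n)


for n in (10, 100, 1_000_000):
    print(f"N = {n}: K = {sturges_classes(n):.3f}")

# Salary example: 100 employees, salaries between 500 and 5,500.
print(f"i = {sturges_interval(5500, 500, 100):.1f}")
```

The rule returns a fractional value, so in practice K is rounded to a whole number of classes and i to a convenient width such as 650.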

Class Frequency
The number of observations corresponding to a particular class is known as the frequency of
that class or the class frequency.
Class Midpoint or class mark
It is the value lying halfway between the lower and upper class limits of a class interval. The
midpoint of a class is ascertained as follows:
Midpoint = (Upper limit of the class + Lower limit of the class) / 2
There are two methods of classifying the data according to class-intervals:
i) Exclusive method
ii) Inclusive method

Exclusive Method
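The class intervals are so fixed that the upper limit of one class is the lower limit of the next
class. An item equal to the upper limit of a class is excluded from that class and counted in the
next one.
Example
Income (£) No. of Persons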
1000-1100 50
1100-1200 100
1200-1300 200
1300-1400 150
1400-1500 40
1500-1600 10

Inclusive Method
The upper limit of one class is included in that class itself.
Example:
Income (£) No. of Persons
1000 – 1099 50
1100 – 1199 100
1200 – 1299 200
1300 – 1399 150
1400 – 1499 40
1500 – 1599 10
Total 550
To decide whether to use the inclusive or exclusive method, it is important to determine
whether the variable under observation is a continuous or discrete one. In the case of continuous
variables, the exclusive method must be used. E.g. the variable height, being
inherently continuous, should be stated as 60" and under 62", 62" and under 64", and so on.
The inclusive method should, in general, be used in the case of discrete variables.
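
The difference between the two methods shows up when a value is assigned to a class. The Python sketch below is an illustration under stated assumptions: the function names `find_class_exclusive` and `find_class_inclusive` are hypothetical, and only the first three income classes from the examples are used.

```python
def find_class_exclusive(value, classes):
    """Exclusive method: lower limit included, upper limit excluded."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None


def find_class_inclusive(value, classes):
    """Inclusive method: both limits belong to the class."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None


exclusive = [(1000, 1100), (1100, 1200), (1200, 1300)]
inclusive = [(1000, 1099), (1100, 1199), (1200, 1299)]

print(find_class_exclusive(1100, exclusive))    # (1100, 1200)
print(find_class_inclusive(1099, inclusive))    # (1000, 1099)
print(find_class_inclusive(1099.5, inclusive))  # None
```

The last call returns `None` because 1099.5 falls in the gap between 1099 and 1100, which is why inclusive classes are unsuitable for a continuous variable.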

Illustration:
Prepare a frequency table for the data below with the width of each class interval as 10. Use
exclusive method of classification.
57 44 80 75 00 18 45 14 04 64
72 51 69 34 22 83 79 20 57 28
96 56 50 47 10 34 61 66 80 46
22 10 84 50 47 73 42 33 48 65
10 34 60 53 75 90 58 46 39 69
Solution:
Frequency distribution of marks
Marks     Tallies      Frequency    Relative Frequency
0-10      II           2            2/50 = 0.04
10-20     IIIII        5            5/50 = 0.10
20-30     IIII         4            0.08
30-40     IIIII        5            0.10
40-50     IIIII III    8            0.16
50-60     IIIII III    8            0.16
60-70     IIIII II     7            0.14
70-80     IIIII        5            0.10
80-90     IIII         4            0.08
90-100    II           2            0.04
Total                  50           1.00
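
The grouping in this illustration can be reproduced with a short Python sketch. It assumes classes of width 10 starting at 0 under the exclusive method, so a mark of 10 is counted in 10-20 and not in 0-10; integer division by the class width then gives the index of the class. The names `marks`, `width` and `frequency` are assumptions made for the example.

```python
# The 50 marks from the illustration (00 and 04 are written as 0 and 4).
marks = [
    57, 44, 80, 75, 0, 18, 45, 14, 4, 64,
    72, 51, 69, 34, 22, 83, 79, 20, 57, 28,
    96, 56, 50, 47, 10, 34, 61, 66, 80, 46,
    22, 10, 84, 50, 47, 73, 42, 33, 48, 65,
    10, 34, 60, 53, 75, 90, 58, 46, 39, 69,
]

width = 10
frequency = [0] * 10  # one counter per class: 0-10, 10-20, ..., 90-100

for mark in marks:
    # Exclusive method: mark // width sends 10 to class 10-20, 20 to 20-30, etc.
    frequency[mark // width] += 1

for index, count in enumerate(frequency):
    lower = index * width
    print(f"{lower}-{lower + width}: {count}")
print("Total:", sum(frequency))
```

This shortcut works only because all classes have equal width and every mark is below 100; unequal classes would need explicit limit checks.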

Prepare a frequency distribution for the following data:


15 45 40 42 50 60 62 68 70 42
75 75 80 81 25 26 31 32 78 45
30 45 42 43 55 56 78 80 81 62
60 62 58 69 70 45 50 56 72 58
75 62 60 65 60 70 35 37 40 55

Relative Frequency Distribution


Relative frequencies show the proportion of the total number of observations falling in each class.
Each one is obtained by dividing the class frequency by the total number of observations, so that the
relative frequencies total one; multiplying by 100 gives percentages. Consider the example above.
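
As a minimal sketch, the relative frequencies for the marks illustration can be computed in Python from its class frequencies. The list names `classes`, `frequencies` and `relative` are assumptions introduced for this example.

```python
classes = ["0-10", "10-20", "20-30", "30-40", "40-50",
           "50-60", "60-70", "70-80", "80-90", "90-100"]
frequencies = [2, 5, 4, 5, 8, 8, 7, 5, 4, 2]

total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

print(f"{'Marks':<8}{'Frequency':>10}{'Relative':>10}{'Percent':>9}")
for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<8}{f:>10}{r:>10.2f}{r * 100:>8.0f}%")
print(f"{'Total':<8}{total:>10}{sum(relative):>10.2f}")
```

The same two steps (sum the frequencies, then divide each by the total) apply to the exercise data once its frequency table has been prepared.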
