
Lesson 1 Introduction To Statistics

1. Statistics is the process of collecting, organizing, analyzing, and interpreting numerical data. It involves defining terms like population, sample, variable, and data. 2. Data can be either quantitative (numerical) or qualitative (categorical). Quantitative data is further divided into continuous data (measurements) and discrete data (counts). Qualitative data includes nominal data (categories) and rank data (ordered categories). 3. Data can come from primary sources (collected for the first time) or secondary sources (already exists). Theoretical statistics includes descriptive statistics (summarizing data), estimation, and hypothesis testing (making inferences from samples to populations).

Uploaded by

Francis Onyango
Copyright
© All Rights Reserved

Lesson 1: Introduction to statistics

1.1. Definition of terms in statistics

Definition of ‘Statistics’
Statistics is the process of collecting, classifying, presenting, analyzing, and interpreting
numerical facts that are comparable and gathered for some predetermined purpose.

Terminologies
Population: all the elements (individuals, items, or objects) whose characteristics are
being studied. The population being studied is also called the target population.
Sample: a portion of the population selected for study.
Census: a survey that includes every member of the population.
Sample survey: the technique of collecting information from a portion of the population.
Element or member: a specific subject or object of a sample or population (for
example, a person, firm, item, state, or country) about which information is
collected.
Variable: a characteristic under study that assumes different values for different
elements. In contrast to a variable, the value of a constant is fixed.
Observation or measurement: the value of a variable for an element.
Data or data set: a collection of observations or measurements on one or more variables.
Cross-section data: data collected on different elements at the same point in time or
for the same period of time.
Time-series data: data collected on the same element for the same variable at
different points in time or for different periods of time.
Discrete variable: a (quantitative) variable whose values are countable.
Qualitative or categorical data: data generated by a qualitative variable.
Qualitative or categorical variable: a variable that cannot assume numerical values
but can be classified into two or more categories.
Quantitative data: data generated by a quantitative variable.
Quantitative variable: a variable that can be measured numerically.

TYPES OF DATA AND DATA SOURCES

Statistical data are the basic raw material of statistics. Data may relate to an activity of
interest, a phenomenon, or a problem situation under study. They arise from the process of
measuring, counting, and/or observing. Statistical data, therefore, refer to those
aspects of a problem situation that can be measured, quantified, counted, or classified. Any
object, subject, phenomenon, or activity that generates data through this process is termed
a variable. In other words, a variable is something that shows a degree of variability when
successive measurements are recorded. In statistics, data are classified into two broad
categories: quantitative data and qualitative data. This classification is based on the kind of
characteristics that are measured.

Quantitative data are those that can be quantified in definite units of measurement. These
refer to characteristics whose successive measurements yield quantifiable observations.
Depending on the nature of the variable observed for measurement, quantitative data can be
further categorized as continuous and discrete data.

Correspondingly, the underlying variable may be a continuous variable or a discrete variable.

(i) Continuous data represent the numerical values of a continuous variable. A continuous
variable is one that can assume any value between any two points on a line segment,
thus representing an interval of values. The values are quite precise and close to each other,
yet distinguishably different. Characteristics such as weight, length, height, thickness,
velocity, temperature, and tensile strength are continuous variables. Thus, the data
recorded on these and similar characteristics are called continuous data. Note that a
continuous variable can be measured in the finest unit available, in the sense that it
permits measurement to the maximum degree of precision.

(ii) Discrete data are the values assumed by a discrete variable. A discrete variable is
one whose outcomes take distinct, countable values. Such data are essentially count data.
They are derived from a process of counting, such as the number of items possessing or
not possessing a certain characteristic. The number of customers visiting a department
store every day, the number of incoming flights at an airport, and the number of defective
items in a consignment received for sale are all examples of discrete data.

Qualitative data are data that are non-numerical in nature. This type of data is collected
through observation, one-to-one interviews, focus groups, and
similar methods. Qualitative data in statistics are also known as categorical data, since
they can be grouped according to categories. For example, think of a student reading a
paragraph from a book during a class session. A teacher who is listening to the
reading gives feedback on how the child read that paragraph. If the teacher gives feedback
based on fluency, intonation, delivery, and clarity of pronunciation without giving a
grade to the child, that feedback is an example of qualitative data.

The difference between qualitative and quantitative data is easy to see:
qualitative data describe traits without using numbers, whereas quantitative
data are expressed in numbers.

• The cake is orange, blue, and black in color (qualitative).

• Females have brown, black, blonde, and red hair (qualitative).

Qualitative data are further classified as nominal and rank data.

(i) Nominal data are the outcome of classifying the items or units that make up a sample
or a population into two or more categories according to some quality characteristic.
Classification of students according to sex (as males and females), of workers according to
skill (as skilled, semi-skilled, and unskilled), and of employees according to the level of
education (as matriculates, undergraduates, and post-graduates) all result in nominal
data. Given any such basis of classification, it is always possible to assign each item to a
particular class and count the items belonging to each class. The count data so
obtained are called nominal data.

(ii) Rank data, on the other hand, are the result of assigning ranks to specify order in terms
of the integers 1, 2, 3, ..., n. Ranks may be assigned according to the level of performance in a
test, a contest, a competition, an interview, or a show. The candidates appearing in an
interview, for example, may be assigned ranks in integers ranging from 1 to n, depending on
their performance in the interview. Ranks so assigned can be viewed as the ordered
values of a variable with performance as the quality characteristic.
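
As a rough illustration of the two kinds of qualitative data described above, the sketch below counts items per category (nominal data) and assigns ranks 1 to n by performance (rank data). The worker classifications, candidate names, and scores are invented for the example.

```python
from collections import Counter

# Hypothetical skill classification of eight workers (nominal data)
skills = ["skilled", "unskilled", "semi-skilled", "skilled",
          "skilled", "unskilled", "semi-skilled", "skilled"]

# Count the items that fall in each class
counts = Counter(skills)
print(counts)

# Hypothetical interview scores for three candidates
scores = {"Asha": 72, "Ben": 88, "Chen": 65}

# Assign ranks 1..n, with rank 1 going to the best performance (rank data)
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}
print(ranks)
```

The counts say how many items fall in each class, while the ranks record only the order of the candidates, not how far apart their scores are.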

DATA SOURCES

Data sources are of two types, secondary and primary. The two can be
defined as follows:

(i) Secondary data: data that already exist in some form, published or unpublished, in an
identifiable secondary source. They are generally available from published source(s),
though not necessarily in the form actually required.

(ii) Primary data: data that do not already exist in any form and thus have to be
collected for the first time from the primary source(s). By their very nature, these data
require fresh, first-time collection covering the whole population or a sample drawn
from it.

TYPES OF STATISTICS

The scope of statistics is very extensive. It can be divided into two parts: Theoretical
Statistics and Applied Statistics.

(a) Theoretical Statistics:

This covers statistical methods such as collection, classification, tabulation,
presentation, analysis, interpretation, and forecasting. Theoretical statistics can be
further sub-divided into the following categories:

i. Descriptive Statistics:

Descriptive statistics deals with collecting, summarizing, and
simplifying data that are otherwise unwieldy and voluminous. It seeks to
do this in a way that lets meaningful conclusions be readily
drawn from the data. Descriptive statistics may thus be seen as comprising
methods of bringing out and highlighting the latent characteristics present in a set
of numerical data. It not only facilitates an understanding of the data and
systematic reporting of them, but also makes them amenable to
further discussion, analysis, and interpretation. Typically, there are two general
types of descriptive statistics that are used to describe data:

Measures of central tendency: these are ways of describing the central position
of a frequency distribution for a group of data. For example, for the marks scored
by 100 students, the frequency distribution is simply the distribution and pattern
of marks from the lowest to the highest. We can describe this central position
using a number of statistics, including the mode, median, and mean.

Measures of spread: these are ways of describing how dispersed the values are
around the central position, using statistics such as the range and the standard
deviation.
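
A minimal sketch of the three measures of central tendency named above, using Python's standard `statistics` module. The list of marks is invented for illustration and uses ten students rather than 100 to keep it readable.

```python
from statistics import mean, median, mode

# Hypothetical marks for ten students (illustrative values only)
marks = [45, 52, 52, 60, 61, 67, 70, 70, 70, 88]

print("mean:", mean(marks))      # arithmetic average of the marks
print("median:", median(marks))  # middle value of the sorted marks
print("mode:", mode(marks))      # most frequently occurring mark
```

The three statistics need not agree: here the mode sits above both the mean and the median because one mark is repeated three times.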

ii. Inferential Statistics:

Inferential statistics allows you to make predictions (“inferences”) from
data. With inferential statistics, you take data from samples and make
generalizations about a population. For example, you might stand in a mall and
ask a sample of 100 people if they like shopping at Sears. You could make a bar
chart of the yes and no answers (that would be descriptive statistics), or you could use
your research (and inferential statistics) to reason that around 75-80% of the
population (all shoppers in all malls) like shopping at Sears.

There are two main areas of inferential statistics:

Estimating parameters. This means taking a statistic from your sample data
(for example, the sample mean) and using it to say something about a
population parameter (for example, the population mean).

Hypothesis tests. This is where you use sample data to answer research
questions. For example, you might be interested in knowing whether a new cancer
drug is effective, or whether breakfast helps children perform better in school.
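
The estimation idea above can be sketched numerically for the mall survey. The count of 78 "yes" answers is an assumed figure, and the interval uses the common normal approximation for a proportion, which presumes a simple random sample; a sample taken in a single mall may not satisfy that.

```python
import math

# Hypothetical survey result: 78 of 100 sampled shoppers answer "yes"
n = 100
yes = 78

# Sample proportion: the statistic used to estimate the population proportion
p_hat = yes / n

# Standard error of a proportion under the normal approximation
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 1.96 is the approximate 95% critical value of the standard normal distribution
z = 1.96
lower, upper = p_hat - z * se, p_hat + z * se

print(f"estimate: {p_hat:.2f}, approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The interval, rather than the single figure, is what supports a statement such as "around 75-80% of the population".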

(b) Applied Statistics:

It consists of the application of statistical methods to practical problems, such as the
design of sample surveys, techniques of quality control, and decision-making in business.
It is further divided into three parts:
(i) Descriptive Applied Statistics: The purpose of this analysis is to provide
descriptive information.
(ii) Scientific Applied Statistics: Data are collected for the purpose of
scientific research, and with the help of these data a particular
theory or principle is propounded.
(iii) Business Applied Statistics: Under this branch, statistical methods are
used for the study, analysis, and solution of various problems in the field
of business.

Characteristics of statistics

1. Statistics are aggregates of facts. A single figure is not statistics.
For example, the national income of a country for a single year is not statistics, but
the same figure for two or more years is.
2. Statistics are affected by a number of factors. For example, the sale of a product
depends on a number of factors such as its price, quality, competition, the
income of the consumers, and so on.
3. Statistics must be reasonably accurate. Wrong figures, if analyzed, will lead
to erroneous conclusions. Hence, conclusions must be based
on accurate figures.
4. Statistics must be collected in a systematic manner. If data are collected in a
haphazard manner, they will not be reliable and will lead to misleading
conclusions.
5. Statistics are collected for a pre-determined purpose.
6. Statistics should be placed in relation to each other. If one collects data
unrelated to each other, such data will be confusing and will not lead to
any logical conclusions.
7. Statistics should be comparable over time and over space.

General Functions of Statistics:

The functions of statistics are as follows:

1. It presents facts in a definite form: Numerical expressions are convincing and, therefore,
one of the most important functions of statistics is to present statements in a precise and
definite form.

2. It simplifies a mass of figures: Data presented in the form of a table, graph, diagram,
average, or coefficient are simple to understand.

3. It facilitates comparison: Once the data are simplified, they can be compared with other
similar data. Without such comparison, the figures would be of little use.

4. It helps in prediction: Plans and policies of organizations are invariably formulated in
advance of their implementation. Knowledge of future trends is very useful in
framing suitable policies and plans.

5. It helps in formulating and testing hypotheses: Statistical methods such as the z-test,
t-test, and χ²-test (chi-square test) are extremely helpful in formulating and testing
hypotheses and in developing new theories.

6. It helps in the formulation of suitable policies: Statistics provide the basic material for
framing suitable policies. They help in estimating export, import, or production programs in
the light of changes that may occur.

7. It indicates trend behavior: Statistical techniques such as correlation, regression, and
time series analysis are useful in forecasting future events.

IMPORTANCE OF STATISTICS IN BUSINESS
There are three major functions in any business enterprise in which statistical methods
are useful. These are as follows:

(i) The planning of operations: This may relate either to special projects or to the
recurring activities of a firm over a specified period.

(ii) The setting up of standards: This may relate to the size of employment, volume of
sales, fixation of quality norms for the manufactured product, norms for the daily output,
and so forth.

(iii) The function of control: This involves comparing the actual production achieved
against the norm or target set earlier. If production has fallen short of the target,
the analysis points to remedial measures so that such a deficiency does not occur again.

Importance of statistics in business and management based on functional areas

1. Accounting: Statistical sampling techniques are used when conducting audits
for clients. Statistics also helps in detecting trends and making projections for the next year.
2. Finance and Investments: Statistical information can be used to study trends in
securities, which in turn can support investment recommendations. Statistical
methods help in selecting securities that are safe and have the best prospects of
yielding a good income.
3. Marketing: Statistical analysis is frequently used in making marketing decisions,
since the first step is to find out what can be sold and to whom. A suitable strategy
is then formulated using statistical methods. A statistical analysis of
data on production, purchasing power, manpower, habits of competitors, habits of
consumers, and transportation cost can be done before entering a new market. Nowadays
electronic scanners at retail checkout counters are used to collect data and to study
the buying behavior of customers. The data obtained in this way are analyzed
to formulate future marketing policies.
4. Production: Statistical methods are used in quality control during the production
process. They are also used to control and manage the flow of production and
in the scheduling of workers and machines.
5. Banking: Gathering and analyzing statistical data helps banks in
their own business and also gives an idea of the general economic situation of every
segment of business in which they may have an interest. Using this analysis they can
formulate their lending policies.
6. Control: The management control process combines statistical and accounting
methods in making the overall budget for the coming year, including sales, materials,
labor and other costs, and capital requirements.
7. Purchase: The purchase department can fix its schedule of purchase orders
depending upon the trends in consumption of raw materials and inputs. Thus it can
decide what to buy, when to buy, and how much to buy.
8. Economics: Statistical techniques and analysis are used for forecasting the future of
the economy. Time-series techniques such as moving averages and indicators such as an
inflation index are statistical tools. We can consider statistics the backbone of economics.
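
As an illustration of the moving averages mentioned under Economics, the sketch below smooths a short series with a simple three-period moving average. The sales figures and the helper name `moving_average` are assumptions made for the example.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` periods."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical monthly sales figures (illustrative values only)
sales = [10, 12, 11, 15, 14, 18]

smoothed = moving_average(sales, 3)
print(smoothed)
```

Each smoothed value averages three consecutive months, so month-to-month fluctuations are damped and the underlying upward trend is easier to see.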

Application of statistics
The use of statistics has become almost essential in order to clearly understand and solve
a problem. Statistics proves very useful in unfamiliar fields of application and
complex situations, such as:
planning, administration, economics, trade and commerce, production management,
quality control and inspection, the insurance business, railways and transport companies,
banking institutions, speculation and gambling, underwriting and investment, politics
and social work, etc.

LIMITATIONS OF STATISTICS

Statistics has a number of limitations. The most pertinent are as follows:

(i) There are certain phenomena or concepts where statistics cannot be used, because
they are not amenable to measurement. For example,
beauty, intelligence, and courage cannot be directly quantified. Statistics has no place
where quantification is not possible.

(ii) Statistics reveal the average behavior, the normal or the general trend. Applying
the 'average' concept to an individual or a particular situation may
lead to a wrong conclusion and may sometimes be disastrous. For example, one may be
misled when told that the average depth of a river from one bank to the other is four
feet, when there may be some points in between where its depth is far more than four feet.
Relying on the average, one may wade into those deeper points, which may be
hazardous.

(iii) Since statistics are collected for a particular purpose, such data may not be
relevant or useful in other situations or cases. For example, secondary data (i.e., data
originally collected by someone else) may not be useful to another person.

(iv) Statistics is not 100 per cent precise in the way that mathematics or accountancy is.
Those who use statistics should be aware of this limitation.

(v) In statistical surveys, sampling is generally used, as it is often not physically
possible to cover all the units or elements comprising the universe. The results may not
hold exactly for the universe as a whole. Moreover, different surveys based on the
same sample size but different sample units may yield different results.

(vi) At times, the association or relationship between two or more variables is studied in
statistics, but such a relationship does not indicate a 'cause and effect' relationship. It
simply shows the similarity or dissimilarity in the movement of the two variables. In such
cases, it is the user who has to interpret the results carefully, pointing out the type of
relationship obtained.

(vii) A major limitation of statistics is that it does not reveal everything pertaining to a
phenomenon. There is some background information that statistics does not cover.
Similarly, there are other aspects of the problem at hand that are also not
covered. The user of statistics has to be well informed and should interpret statistics
keeping all these other aspects in mind.
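
Limitations (ii) and (v) above can be illustrated with a small sketch. The river depths and the simulated household incomes are invented; the fixed random seed is there only so that the run is repeatable.

```python
import random
from statistics import mean

# Limitation (ii): an average can hide dangerous extremes.
# Hypothetical depths (in feet) measured from one bank to the other.
depths = [1, 2, 3, 9, 7, 2]
average_depth = mean(depths)
deepest_point = max(depths)
print("average depth:", average_depth, "deepest point:", deepest_point)

# Limitation (v): samples of the same size can give different results.
random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical universe: incomes of 100,000 households
universe = [random.gauss(50_000, 12_000) for _ in range(100_000)]

# Three different samples of 100 households each
sample_means = [mean(random.sample(universe, 100)) for _ in range(3)]

print("universe mean:", round(mean(universe)))
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:", round(m))
```

The average depth is four feet even though one point is nine feet deep, and the three sample means differ from one another and from the universe mean although each is based on the same sample size.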

The misuse of Statistics

The misuse of statistics may take several forms, some of which are explained below.

(i) Sources of data not given: At times, the source of data is not given. In the absence of
the source, the reader does not know how reliable the data are. Further, a reader who wants
to refer to the original source is unable to do so.

(ii) Defective data: Another misuse is the presentation of defective data. This may
be done knowingly in order to defend one's position or to prove a particular point.
The definition used to denote a certain phenomenon may also be defective. For example, in
data relating to unemployed persons, the definition may include even those who are
partially employed. The question here is how far it is justified to include partially
employed persons among the unemployed.

(iii) Unrepresentative sample: In statistics, one often has to conduct a survey,
which requires choosing a sample from the given population or universe. The sample
may turn out to be unrepresentative of the universe. One may choose a sample just on the
basis of convenience, collecting the desired information from friends or
nearby respondents in the neighborhood even though such respondents do not constitute a
representative sample.

(iv) Inadequate sample: As noted above, relying on a sample that is unrepresentative of the
universe is a major misuse of statistics. In addition, one may at times conduct a survey based
on an extremely inadequate sample. For example, suppose a city has
100,000 households. For a household survey, we may take a sample
of merely 100 households, comprising only 0.1 per cent of the universe. A survey based on
such a small sample may not yield reliable information.

(v) Unfair comparisons: An important misuse of statistics is making unfair comparisons
from the data collected. For instance, one may construct an index of production choosing
a base year in which production was unusually low, and then compare each subsequent
year's production against this low base. Such a comparison will give a rosy
picture of production that does not reflect reality. Another source of unfair comparisons
is making absolute comparisons instead of relative ones. An absolute
comparison of two figures, say, of production or export, may show a good increase, but in
relative terms the increase may turn out to be negligible. Another example of unfair comparison
is when the populations of two cities differ, but a comparison of overall death rates and
deaths by a particular disease is attempted. Such a comparison is wrong. Likewise, when
data are not properly classified, or when changes in the composition of the population
between the two years are not taken into consideration, comparisons of such data are unfair
because they lead to misleading conclusions.
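
The effect of the base year on an index of production can be sketched as follows. The production figures and the helper name `index_numbers` are hypothetical, chosen so that the first year is unusually poor.

```python
# Hypothetical annual production figures (units), illustrative only
production = {2019: 400, 2020: 1000, 2021: 1050, 2022: 1100}

def index_numbers(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: round(100 * value / base, 1) for year, value in series.items()}

low_base = index_numbers(production, 2019)      # unusually poor year as base
typical_base = index_numbers(production, 2020)  # more typical year as base

print("2022 index, base 2019:", low_base[2022])
print("2022 index, base 2020:", typical_base[2022])
```

The same 2022 production appears as a 175 per cent rise against the low base but only a 10 per cent rise against the more typical base year.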
(vi) Unwarranted conclusions: Another misuse of statistics arises from
unwarranted conclusions. These may result from making false assumptions. For
example, while making projections of population for the next five years, one may assume a
lower rate of growth even though the past two years indicate otherwise. Sometimes one may not
be sure about changes in the business environment in the near future. In such a case, one
may use an assumption that turns out to be wrong. Another source of unwarranted
conclusions is the use of the wrong average. Suppose a series contains extreme values,
one too high and the other too low, such as 800 and 50. The use of an arithmetic
average in such a case may give a wrong idea. An average that is less sensitive to extreme
values, such as the median (or, as some texts suggest, the harmonic mean), would be more
appropriate in such a case.
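
To see how much the choice of average matters when a series has extreme values, the sketch below compares three averages on a short series. The series is invented; it includes the two extreme values from the text, 50 and 800.

```python
from statistics import mean, median, harmonic_mean

# Hypothetical series containing the two extreme values 50 and 800
values = [50, 190, 200, 210, 800]

print("arithmetic mean:", mean(values))
print("median:", median(values))
print("harmonic mean:", round(harmonic_mean(values), 1))
```

The arithmetic mean is pulled upward by 800, the harmonic mean is pulled downward by 50, and the median stays at the middle value, so which average is quoted can change the conclusion drawn.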

(vii) Confusion of correlation and causation: In statistics, one often has to
examine the relationship between two variables. A close relationship between the two
variables does not by itself establish a cause-and-effect relationship.

Two quantities are said to be correlated if both increase and decrease together (“positively
correlated”), or if one increases when the other decreases and vice versa (“negatively
correlated”). Correlation is readily detected through a statistical measure, the correlation
coefficient, which indicates how tightly locked together the two quantities are. It ranges
from -1 (perfectly negatively correlated) through 0 (not at all correlated) up to 1 (perfectly
positively correlated). But just because two quantities are correlated does not necessarily
mean that one is directly causing the other to change. Correlation does not imply causation.
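
The correlation coefficient described above can be computed directly from its usual (Pearson) definition. The function name `pearson_r` and the two monthly series are assumptions for illustration; neither series is presented as causing the other.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical monthly figures (illustrative values only)
ice_cream_sales = [20, 25, 35, 50, 65, 80]
sunburn_cases = [2, 4, 5, 9, 10, 15]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"correlation coefficient: {r:.3f}")
```

A coefficient close to 1 here would show only that the two series move together. In a case like this, a third factor such as hot weather could plausibly drive both.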
