
KADUNA STATE UNIVERSITY

DEPARTMENT OF MATHEMATICAL SCIENCE

STA 106 LECTURE NOTE


INTRODUCTORY LABORATORY FOR INFERENCE
2020/2021 ACADEMIC SESSION

LECTURER:
DR. S. ABDULAZEEZ
STA 106: INTRODUCTORY LABORATORY FOR INFERENCE
(2 CU)
• Introduction to statistical packages
• Presentation and preliminary analysis of data by tables and graphs.
• Moments, Skewness and Kurtosis
• Fitting and goodness of fit tests.
• Time series: definition, components, additive and multiplicative models; stationarity and invertibility.
• Introduction to demography.
• Simple index numbers.
• Inference, estimation and tests of hypotheses.
• Use of random numbers and statistical tables.
• Laboratory practicals of the course outline in STA 103 should be conducted.

INTRODUCTION TO STATISTICAL PACKAGES
Data commonly encountered in real life are often too voluminous for statistical methods to be carried out with a pocket calculator. More advanced statistical methods and analyses can be conveniently carried out on a computer using a wide range of packages. The most common statistical packages are SPSS and Minitab. Other packages include, amongst others, S-PLUS, STATA, XLSTAT, PSPP, SPAD, MATLAB, EViews, GAUSS and Excel.
These packages have changed the boundaries between basic and advanced statistical methods.

SPSS
The "Statistical Package for the Social Sciences" (SPSS) is a package of programs for manipulating, analysing and presenting data; the package is widely used in the social and behavioural sciences. There are several forms of SPSS. The core program is called SPSS Base, and there are a number of add-on modules that extend the range of data entry, statistical or reporting capabilities. The most important of these for statistical analysis are the SPSS Advanced Models and SPSS Regression Models add-on modules.
Getting started in SPSS – Data entry
When you start SPSS, most versions of SPSS for Windows provide a default dialogue box that gives the user a number of options. These include:
• Run the tutorial
• Type in data
• Run an existing query
• Create new query using database wizard
• Open an existing data source
• Open another type of file.

You may select one of the options or press the Escape (Esc) key, which takes you to the SPSS spreadsheet (the Data Editor).
The Data Editor consists of two windows. By default it opens in the Data View, which allows data to be entered and viewed; in this window you can create new variables or edit existing ones. Rows represent cases and columns represent variables.
The other window is the Variable View, which allows the types of variables to be specified and viewed. Users can toggle between the windows by clicking on the appropriate tabs at the bottom left of the screen.
The Variable View spreadsheet serves to define the variables; each variable definition occupies a row of the spreadsheet. As soon as data are entered under a column in the Data View, SPSS assigns that column a default variable name, which then appears in the Variable View.
There are 10 characteristics to be specified under the columns of the Variable View: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align and Measure.
1. Name: this is the desired variable name. It can be up to eight alphanumeric characters and must begin with a letter. Variable names are not case sensitive. A name may contain an underscore (_), but hyphens (-), ampersands (&) and spaces are not acceptable.
2. Type: this specifies the type of data, such as numeric, date, currency or string (alphanumeric). The type can be changed by highlighting the respective entry in the second column of the Variable View and clicking the three-period symbol (...) that appears on the right-hand side of the cell.
3. Width: this allows us to specify the number of characters or digits we want to use in our data. The default width of numeric variable entries is eight.
4. Decimals: this indicates the number of decimal places displayed for data entries; if 2 is selected, all the data for that variable will be displayed to 2 decimal places.

5. Label: a label is attached to the variable name. In contrast to the variable name, which is confined to eight characters, a label can contain full text, including spaces, phrases or sentences.
6. Values: value labels are attached to category codes. For a categorical variable, an integer code should be assigned to each category and the variable defined to be of type "numeric". When this has been done, clicking on the respective cell under the sixth column of the Variable View makes the three-period symbol appear, and clicking this opens the Value Labels dialogue box, which in turn allows labels to be assigned to category codes. For example, if a data set contains a categorical variable sex indicating the gender of a subject, we may assign the numeric code "0" to represent females and the code "1" to represent males.
7. Missing: this declares missing-value codes. SPSS recognizes the period symbol as indicating a missing value. If other codes have been used (e.g. 66 or 666), these have to be declared as missing values by highlighting the respective cell in the seventh column, clicking the three-period symbol and filling in the resulting Missing Values dialogue box accordingly.
8. Columns: this sets the width of the variable's column in the Data View. The default cell width for numerical variables is eight, but the user may make it as wide as desired for variables that may contain many characters, such as names.
9. Align: this sets the alignment of a variable, that is, left, centre or right alignment. The SPSS default is to align numerical variables to the right-hand side and string variables to the left. If necessary, alignment can be changed by highlighting the relevant cell in the ninth column and choosing an option from the drop-down list.
10. Measure: this enables us to select the measurement scale that suits the variable; the options are nominal, ordinal and scale (interval and ratio scales are both regarded as scale).

MINITAB
Minitab is a complete package for data summary and analysis. There are some changes between different versions, but the version you have access to should be adequate for a beginner.
Most statistical analyses require a series of steps, often directed by background knowledge or by the subject area you are investigating. A typical MINITAB session enables you to explore data with graphs and to conduct statistical analyses and procedures.
Minitab opens with two main windows visible:
a. The session window displays the results of your analysis in text format. Also
in this window, you can enter commands instead of using MINITAB’s
menus.
b. The Data window contains an open worksheet, which is similar in
appearance to a spreadsheet. You can open multiple worksheets each in a
different Data window.
The MINITAB environment is sketched below:
[Figure: the untitled MINITAB window, showing the menu bar (File, Edit, Data, Stat, Graph, Editor, Tools), the Session window, and the Data window containing the worksheet with columns C1, C2, C3 and numbered rows 1, 2, 3]
The data are arranged in columns, which are called variables. The column number and name are at the top of each column. Each row in the worksheet represents a case, that is, the information on a single unit.
MINITAB accepts three types of data: numeric, text and date/time.
Note: a column containing numeric data has no suffix (e.g. C1); a column containing text data has the suffix -T (e.g. C1-T); and a column containing date/time data has the suffix -D (e.g. C1-D).

QUESTIONNAIRE CONSTRUCTION
A questionnaire is an enquiry form that seeks responses to a number of pertinent questions of interest to the data collection team. Questionnaires provide a fast method of data collection, either manually or electronically.
Two types of questionnaire can be distinguished:
1. structured (closed-ended) questionnaires; and
2. unstructured (open-ended) questionnaires.
Structured questionnaires are prepared such that a fixed set of answer options is provided for each question asked. Respondents are expected to tick their choice; for instance, an enquiry on marital status may be designed as follows.
Marital status (please tick):
Single [ ]    Married [ ]    Divorced [ ]
Unstructured questionnaires do not suggest answer options to the respondent. Respondents are free to respond to the questions asked in their own words. Most often, blank spaces are provided in the questionnaire to accommodate responses to such questions. Example:
What is your advice on improving the academic standard of students?
…………………………………………………………………………………
…………………………………………………………………………………
It should be noted that structured questionnaires aid simplicity of administration, are less costly and enhance speed of analysis.
Characteristics/Guiding Principles of Good Questionnaire Construction
i. Objective of the questionnaire: the aim, objectives and importance of the questionnaire must be stated at the top or on the front page of the questionnaire.

ii. Confidentiality: the confidentiality of the information of interest must be guaranteed if genuine responses are to be obtained from the respondents.
iii. Simplicity: very simple, clear and unambiguous questions should be asked. The order of questions must be logical, from simple to complex; explanatory notes or instructions must be given where necessary; and the number of questions must be kept to the minimum needed to obtain the information for analysis. We must ensure that repetitions are avoided.
iv. Neatness: the questionnaire must be neat and attractive. Appropriate spacing of questions will attract the respondents' attention.
v. Leading questions: these are questions asked in such a way that the respondent is likely to answer in the manner the investigator wants. In the design of questionnaires, leading questions should be avoided as much as possible.
vi. Exhaustive and mutually exclusive options: when questions with answer options are asked, the options should be exhaustive and mutually exclusive. This implies that all possible answers to a given question should be included as options, and that the meaning of a given option should not be the same as, or contained in, another.
vii. Pre-Testing: The questionnaire should be pre-tested before the actual field
survey to identify problem areas. This will enable appropriate amendments
and corrections to be effected.
viii. Analysis: Questionnaires must be designed in such a way as to facilitate
analysis. Any questionnaire that does not take this factor into consideration
has little or no value at all.
Questionnaires can be considered a cost-saving and time-saving method with a high response rate. On the other hand, administering a questionnaire may be difficult and time-consuming, and enlightened individuals may deliberately refuse to give the information required or cause some delay.

SAMPLES AND POPULATION
A sample consists of one or more observations drawn from the population; it is simply a fraction (a subset) of the population.
A population is the totality of individual observations about which inferences are to be made; it is also referred to as the universe. The term population can describe the totality of people (a human population) or of animate or inanimate objects, provided they are clearly defined.

CENSUS AND SAMPLE SURVEYS


Enumeration involves counting, measuring or studying the individual elements under consideration.
A complete enumeration or total count of the population is called a census. In a population census, for example, we make a complete enumeration or count of the human population in a particular geographical area at a given time.
A partial enumeration is called a sample survey.

PARAMETERS AND STATISTIC


Numerical characteristics that describe specific properties of the population are referred to as parameters, for example the population mean $\mu$, variance $\sigma^2$ and standard deviation $\sigma$. The corresponding numerical characteristics of a sample are referred to as statistics: they are summary measures computed from only a sample of the population, such as the sample mean $\bar{x}$, variance $s^2$ and standard deviation $s$.

REASONS FOR SAMPLING


i. Population Size
ii. Destructive tests

ADVANTAGES OF SAMPLING
i. Saves cost
ii. Greater Accuracy
iii. Greater Speed
iv. Greater Scope
v. Feasibility

METHOD OF SAMPLING
There are two main methods of sampling, namely probabilistic and non-probabilistic sampling procedures.
A non-probabilistic sampling procedure does not use the laws of probability in selecting the items of the sample. Instead, items are selected using criteria such as the accessibility of the elements, the opinion of experts, or convenience to the investigator. Examples of non-probabilistic sampling include judgement sampling, quota sampling and convenience sampling.

Probabilistic sampling refers to a process in which the laws of probability (chance) determine which elements of the population are included in the sample. Examples are:
(a) Simple random sampling
(b) Stratified Sampling
(c) Systematic Sampling
(d) Cluster Sampling

SIMPLE RANDOM SAMPLING (SRS)


This is a sampling procedure in which every unit in the population has an equal chance of being included in the sample. In selecting a sample of size n from a population of size N, every possible sample of size n has the same chance of being selected.
There are various methods of drawing a simple random sample (SRS):

Lottery Method: this is a very popular method in which numbers are allocated to the sampling units, and the numbered tickets are then concealed and mixed to avoid bias. An appropriate blind draw is then used to select the sample from the population.
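As a rough software analogue of the lottery method, Python's standard `random.sample` draws n distinct items from a list with every item equally likely, much like a blind draw from well-mixed tickets. The function name `lottery_sample`, the optional seed and the 40-unit population below are hypothetical, chosen only for illustration.

```python
import random

def lottery_sample(units, n, seed=None):
    """Draw n distinct units, each unit holding one 'ticket' (lottery method).

    `seed` is optional and only makes a draw reproducible for checking.
    """
    units = list(units)
    if not 0 <= n <= len(units):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)
    # sample() draws without replacement, so no unit can be chosen twice
    return rng.sample(units, n)

# Hypothetical population: units numbered 1 to 40; draw a sample of 5.
print(lottery_sample(range(1, 41), 5, seed=2021))
```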

Random Number Tables: an alternative and more systematic approach to assuring randomness is to select a sample with the aid of a table of random numbers. Such a table is specially designed or computer-generated. It comprises a series of digits drawn up in such a way that all the numbers occur with equal frequency.
With a finite population, a table of random numbers may be used to select a simple random sample of n items in the following manner. First, a unique number between 0 and N - 1 must be assigned to each of the N items in the population. The table of random numbers is then consulted: the first n distinct numbers encountered (starting at any desired point in the table) that are less than N constitute a set of n numbers; any number already selected is skipped. The elements corresponding to these n numbers form the random sample.
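The table-reading procedure can be sketched as follows, assuming the table is supplied as one long string of digits read left to right in fixed-width groups of d digits (d being the number of digits in N - 1), and assuming repeats are skipped so that sampling is without replacement. The function name `table_sample` and the digit string in the usage line are illustrative, not taken from a published table.

```python
def table_sample(digits, N, n):
    """Select n distinct labels from 0..N-1 by reading a string of random digits.

    Digits are read in consecutive groups of d digits, where d = number of digits
    in N - 1. A group is accepted if it is less than N and not already chosen.
    """
    d = len(str(N - 1))
    chosen = []
    for start in range(0, len(digits) - d + 1, d):
        if len(chosen) == n:
            break
        label = int(digits[start:start + d])
        if label < N and label not in chosen:
            chosen.append(label)
    if len(chosen) < n:
        raise ValueError("ran out of random digits before n items were selected")
    return chosen

# Reads 07 and 12, skips 98 and 73 (too large), skips the repeat 12, then takes 45.
print(table_sample("0712987312450733", N=60, n=3))  # [7, 12, 45]
```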
In SRS, we must have access to all items in the population. With a small population of elements that are easy to identify and sample, this procedure normally gives the best results. With a large population, however, SRS may be difficult, perhaps even impossible, to implement, and very costly.

SIMPLE INDEX NUMBERS


An index number is a statistical value that measures the change in a variable with respect to time. It is a percentage ratio of a variable between two time periods, geographic locations, categories, professions, income levels or some other characteristic. Changes with respect to time are the most common application of index numbers.

TYPES OF INDEX NUMBERS
There are generally three (3) types of index numbers:
(a) Price index numbers: measure the change between the price of a particular commodity in the base period ($P_0$) and its price in any other specified period ($P_i$).
(b) Quantity index numbers: measure the change between the volume or physical quantity of goods in the base period ($q_0$) and the volume or quantity of the commodity in any other specified period ($q_i$).
(c) Value index numbers: measure the change between the value of a commodity in the base period ($V_0$) and its value in the specified period ($V_i$). The value of a commodity is the product of its price and quantity, that is, $V_0 = P_0 \times q_0$ and $V_i = P_i \times q_i$.

CONSTRUCTION OF INDEX NUMBER


We refer to index numbers that are constructed from a single item only as simple index numbers.
SIMPLE PRICE INDEX
The price relative of an item is the ratio of the price of the item in the current period ($P_i$) to the price of the same item in the base period ($P_0$). Thus

$$\text{Price relative} = \frac{P_i}{P_0}$$

It is common to express this change using the simple price index:

$$I_p = \text{Price relative} \times 100 = \frac{P_i}{P_0} \times 100$$

The price index gives the percentage change in the price of an item from one period to another.
Note:
(i) If the simple price index is more than 100, subtract 100 from it. The result is the percentage increase in price from the base period to the current period.
(ii) If the simple price index is less than 100, subtract it from 100. The result is the percentage by which the item costs less in the current period than it did in the base period.

Example: In 2015, a 50 kg bag of rice cost N13,000.00. In 2019, a 50 kg bag of rice cost N18,000.00.
i. Calculate the price relative.
ii. What is the simple price index $I_p$?
iii. Interpret your result.
Solution
Price (2015): $P_0 = 13{,}000$
Price (2019): $P_i = 18{,}000$
(i) Price relative $= \dfrac{P_i}{P_0} = \dfrac{18{,}000}{13{,}000} = 1.385$
(ii) Simple price index: $I_p = \text{Price relative} \times 100 = 1.385 \times 100 = 138.5$
(iii) Since $I_p > 100$, we compute $138.5 - 100 = 38.5$.
The price of a 50 kg bag of rice increased by 38.5% in 2019 relative to the 2015 price.

Example: In 2018 the price of a litre of petrol was N145; the price in 2020 was N125.
i. Calculate the price relative.
ii. What is $I_p$?
iii. Interpret your $I_p$.

Solution
Petrol price in 2018: $P_0 = 145$
Petrol price in 2020: $P_i = 125$
i. Price relative $= \dfrac{P_i}{P_0} = \dfrac{125}{145} = 0.862$
ii. $I_p = \text{Price relative} \times 100 = 86.2$
iii. Since $I_p < 100$, we compute $100 - 86.2 = 13.8$.
The price of a litre of petrol decreased by 13.8% in 2020 relative to the price in 2018.
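The rule for the simple price index, together with notes (i) and (ii), can be sketched in Python as below. The function names are hypothetical; running the sketch on the rice and petrol figures reproduces the two worked answers.

```python
def price_relative(p0, pi):
    """Ratio of the current-period price Pi to the base-period price P0."""
    return pi / p0

def simple_price_index(p0, pi):
    """Simple price index Ip = (Pi / P0) x 100."""
    return price_relative(p0, pi) * 100

def interpret(ip):
    """Turn Ip into a percentage change, following notes (i) and (ii)."""
    if ip > 100:
        return f"increase of {ip - 100:.1f}%"
    if ip < 100:
        return f"decrease of {100 - ip:.1f}%"
    return "no change"

rice = simple_price_index(13_000, 18_000)   # 50 kg bag, 2015 -> 2019
petrol = simple_price_index(145, 125)       # one litre, 2018 -> 2020
print(round(rice, 1), interpret(rice))      # 138.5 increase of 38.5%
print(round(petrol, 1), interpret(petrol))  # 86.2 decrease of 13.8%
```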

SIMPLE QUANTITY INDEX


This is the ratio of the quantity or volume of a single commodity in a given period ($q_i$) to its quantity or volume in the base period ($q_0$), multiplied by 100:

$$I_q = \frac{q_i}{q_0} \times 100$$

SIMPLE VALUE INDEX


A simple value index (value relative) represents a comparison of the total value of a commodity over two time periods:

$$I_v = \frac{V_i}{V_0} \times 100 = \frac{P_i q_i}{P_0 q_0} \times 100$$
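A similar sketch for the simple quantity and value indices, using $V = P \times q$ from the definition of value index numbers. The sales figures are hypothetical (200 bags at N13,000 in the base period, 150 bags at N18,000 in the current period). Because $\frac{P_i q_i}{P_0 q_0} = \frac{P_i}{P_0} \cdot \frac{q_i}{q_0}$, the value index equals the price index times the quantity index divided by 100, which the last line checks.

```python
def quantity_index(q0, qi):
    """Simple quantity index Iq = (qi / q0) x 100."""
    return qi / q0 * 100

def value_index(p0, q0, pi, qi):
    """Simple value index Iv = (Vi / V0) x 100, with V = price x quantity."""
    v0 = p0 * q0   # base-period value
    vi = pi * qi   # current-period value
    return vi / v0 * 100

# Hypothetical sales: 200 bags at N13,000 (base) and 150 bags at N18,000 (current).
iq = quantity_index(200, 150)
iv = value_index(13_000, 200, 18_000, 150)
ip = 18_000 / 13_000 * 100
print(iq)                        # 75.0
print(round(iv, 1))              # 103.8
print(round(ip * iq / 100, 1))   # 103.8, i.e. Iv = Ip x Iq / 100
```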

TIME SERIES ANALYSIS


An orderly arrangement of data collected, recorded or observed at successive intervals of time is generally referred to as a "time series". A time series can simply be defined as a set of observations taken at specific times, usually at equal intervals. Examples of time series data include total annual crude oil production in Nigeria over a number of years, the daily prices of commodities sold in a market, the hourly temperature recorded by a meteorological department, monthly sales in a department store, and annual school enrolments and withdrawals.

Mathematically, a time series is defined by the values $y_1, y_2, \ldots$ of a variable $y$ (prices, temperature, etc.) at times $t_1, t_2, \ldots$. Thus $y = f(t)$, i.e. $y$ is a function of $t$.
A graph of a time series can be drawn with $t$ (years, months, weeks, minutes, seconds) on the x-axis and the variable under consideration, $y$, on the y-axis.
For example, the data below are on annual sales of rice.
Year                      2010   2011   2012   2013   2015
Quantity sold (tonnes)      10     20     30     40     50

[Figure: graph of time series, showing the quantity of rice sold plotted against year (t1 to t5)]

UTILITY OF TIME SERIES


1. It helps in understanding past behaviour.
2. It helps in planning future operations.
3. It helps in evaluating current accomplishments.
4. It facilitates comparison.

COMPONENTS OF TIME SERIES
The basic idea underlying time series analysis is that systematic influences associated with time affect the values of the series. The objective of time series analysis is to identify and measure the influences of the different time-related factors. The fluctuations are due to the influence of physical, economic, sociological or other forces. The characteristic movements of a time series may be classified into four main types, called the components of a time series.
a. Secular Trend (T): this represents a general rise or fall in time series data over a long period of time. It is a smooth, steady, regular and broad movement of the series in the same direction, generally covering a minimum of ten or fifteen years. Population growth is an example of a secular trend with an upward movement, while the death rate portrays a downward trend.
b. Cyclical Fluctuation (C): this refers to recurrent up-and-down, wave-like variations or oscillations about a trend line. They are often described as "swings from prosperity, through recession, depression and recovery, back to prosperity".
A cycle is said to be complete when, beginning with a peak, the falling curve reaches a minimum point and then, rising again, reaches the next peak. These cycles may or may not be periodic.
A typical example of cyclical movement is a business cycle.
Phases of a business cycle
[Figure: phases of a business cycle, showing the peak (boom or prosperity), decline, trough (depression) and normal level]

c. Seasonal Variation (S): these are changes in time series data that can be attributed to seasonal effects with a fairly regular period (usually a year) and that recur annually. Seasonal effects could be observed within a day, a week, a month or a quarter of a year, depending on the nature of the data being observed.
The factors that create seasonal variations include climate and weather conditions, customs, traditions and habits. Daily, hourly or weekly occurrences of events can also produce seasonal movements.
d. Irregular Variation (I): these are random or sporadic movements of a time series due to chance or unpredictable events such as floods, strikes, elections, fires, wars, earthquakes, pandemics and epidemics. These events produce variations lasting a short time; hence they are sometimes called residual, erratic or accidental variations, which cannot be ascribed to cyclical or seasonal influences.
The graph below shows a hypothetical time series of monthly values over a 24-year period.
[Figure: hypothetical time series, with level y (production, sales, etc.) on the vertical axis and time t = 2, 4, ..., 24 on the horizontal axis]

TIME SERIES MODEL
In traditional time series analysis, it is assumed that there is a multiplicative relationship between the four components. That is, it is assumed that any particular value in a series is the product of factors that can be attributed to the various components. Symbolically,

$$Y_t = T_t \times S_t \times C_t \times I_t \qquad \text{(multiplicative model)}$$

where
$Y_t$ = the value of the observed series at time $t$ (the result of the four factors)
$T_t$ = the trend, a long-term growth factor
$C_t$ = the cyclical component
$S_t$ = the seasonal factor
$I_t$ = the irregular factor

Another approach is to treat each observation of a time series as the sum of these four components, that is,

$$Y_t = T_t + S_t + C_t + I_t \qquad \text{(additive model)}$$

The multiplicative model is more widely accepted because the factors are viewed as amplifying each other, rather than acting separately as assumed by the additive model. This implies that the factors are not independent of each other; hence the multiplicative model is considered the standard assumption for time series analysis and is more often employed in practice.
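To see what "amplifying" means, the sketch below combines the same hypothetical quarterly trend with a seasonal pattern under each model and reports the seasonal range (largest minus smallest value) within each year. All component values are invented for illustration, and the cyclical and irregular components are held neutral (1 in the multiplicative model, 0 in the additive model) so that only trend and season interact. Under the multiplicative model the seasonal swing grows as the trend rises; under the additive model it stays fixed.

```python
def combine(trend, seasonal, cyclical, irregular, model):
    """Combine the four components period by period.

    "multiplicative": Y_t = T_t * S_t * C_t * I_t  (S, C, I are factors near 1)
    "additive":       Y_t = T_t + S_t + C_t + I_t  (S, C, I are deviations near 0)
    """
    parts = zip(trend, seasonal, cyclical, irregular)
    if model == "multiplicative":
        return [t * s * c * i for t, s, c, i in parts]
    if model == "additive":
        return [t + s + c + i for t, s, c, i in parts]
    raise ValueError("model must be 'additive' or 'multiplicative'")

def yearly_ranges(series, period=4):
    """Seasonal swing: largest minus smallest value within each block of `period` quarters."""
    return [round(max(series[k:k + period]) - min(series[k:k + period]), 6)
            for k in range(0, len(series), period)]

# Hypothetical quarterly data for two years: the trend rises by 10 each quarter.
trend = [100 + 10 * t for t in range(8)]
mult = combine(trend, [1.1, 0.9, 1.2, 0.8] * 2, [1.0] * 8, [1.0] * 8, "multiplicative")
add = combine(trend, [10, -10, 20, -20] * 2, [0] * 8, [0] * 8, "additive")

print(yearly_ranges(mult))  # [45.0, 57.0]: the seasonal swing grows with the trend
print(yearly_ranges(add))   # [40, 40]: the seasonal swing stays the same
```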

STATIONARITY AND INVERTIBILITY


A stationary time series is one whose statistical properties, such as the mean, variance and covariance, do not change over time.

The basic idea of stationarity is that the probability laws that govern the behaviour of the process do not change over time. In a sense, the process is in statistical equilibrium.
Specifically, a process $\{X_t\}$ is said to be strictly stationary if the joint distribution of $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$ is the same as the joint distribution of $X_{t_1-k}, X_{t_2-k}, \ldots, X_{t_n-k}$ for all choices of time points $t_1, t_2, \ldots, t_n$ and all choices of time lag $k$.
Thus, when $n = 1$, the (univariate) distribution of $X_t$ is the same as that of $X_{t-k}$ for all $t$ and $k$; in other words, the $X$'s are (marginally) identically distributed. It then follows that

$$E(X_t) = E(X_{t-k}) \quad \text{for all } t, k,$$

so that the mean function is constant over time. Additionally,

$$\operatorname{Var}(X_t) = \operatorname{Var}(X_{t-k}) \quad \text{for all } t, k,$$

so that the variance is also constant over time.
Setting $n = 2$ in the stationarity definition, we see that the bivariate distribution of $X_t$ and $X_s$ must be the same as that of $X_{t-k}$ and $X_{s-k}$, from which it follows that

$$\operatorname{Cov}(X_t, X_s) = \operatorname{Cov}(X_{t-k}, X_{s-k}) \quad \text{for all } t, s, k.$$

Putting $k = s$ and then $k = t$, we obtain

$$\gamma_{t,s} = \operatorname{Cov}(X_{t-s}, X_0) = \operatorname{Cov}(X_0, X_{s-t}) = \operatorname{Cov}(X_0, X_{|t-s|}) = \gamma_{0,|t-s|}.$$

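That is, the covariance between $X_t$ and $X_s$ depends on time only through the time difference $|t-s|$ and not otherwise on the actual times $t$ and $s$. Thus, for a stationary process, we can simplify our notation and write

$$\gamma_k = \operatorname{Cov}(X_t, X_{t-k}) \qquad \text{and} \qquad \rho_k = \operatorname{Corr}(X_t, X_{t-k}).$$

Note also that

$$\rho_k = \frac{\gamma_k}{\gamma_0}.$$

The preceding relationships give us the following general properties:

$$\gamma_0 = \operatorname{Var}(X_t), \qquad \rho_0 = 1$$
$$\gamma_k = \gamma_{-k}, \qquad \rho_k = \rho_{-k}$$
$$|\gamma_k| \le \gamma_0, \qquad |\rho_k| \le 1$$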
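In practice $\gamma_k$ and $\rho_k$ are estimated from a single observed series. A common estimator, assumed here, is the sample autocovariance with divisor n, $\hat\gamma_k = \frac{1}{n}\sum_{t=k+1}^{n}(x_t - \bar{x})(x_{t-k} - \bar{x})$, with $\hat\rho_k = \hat\gamma_k / \hat\gamma_0$. With this divisor the estimates also satisfy $\hat\rho_0 = 1$, $\hat\rho_k = \hat\rho_{-k}$ and $|\hat\rho_k| \le 1$. The function names and the eight observations are hypothetical.

```python
def sample_acvf(x, k):
    """Sample autocovariance at lag k with divisor n (Python indices t = |k|, ..., n-1).

    Uses gamma_hat_k = gamma_hat_{-k}, so negative lags are allowed.
    """
    n = len(x)
    k = abs(k)
    if k >= n:
        raise ValueError("lag must be smaller than the series length")
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n

def sample_acf(x, k):
    """Sample autocorrelation rho_hat_k = gamma_hat_k / gamma_hat_0."""
    return sample_acvf(x, k) / sample_acvf(x, 0)

x = [2, 4, 3, 5, 4, 6, 5, 7]     # hypothetical observations
for k in range(4):
    print(k, round(sample_acvf(x, k), 3), round(sample_acf(x, k), 3))
```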

If the process is strictly stationary and has finite variance, then the covariance function must depend only on the time lag.
A definition that is similar to that of strict stationarity but mathematically weaker is as follows. A process $\{X_t\}$ is said to be weakly (or second-order) stationary if:
a. the mean function is constant over time; and
b. $\gamma_{t,t-k} = \gamma_{0,k}$ for all time $t$ and lag $k$.

INTRODUCTION TO DEMOGRAPHY
Demography is the study of populations, especially with reference to size and density, fertility, mortality, growth, age distribution, migration and vital statistics. It involves the integration of all these features with social and economic conditions.
Demography is simply the study of population. It looks at everything that influences population size and distribution, and at the influence that changes in population have on contemporary issues.
Demography is the study of changes in the number of births, deaths, marriages and cases of disease in a community over a period of time.
Other definitions of demography include:
Demography is the statistical study of human populations.
Demography is the study of information in figures (statistics) about the population of an area or country and how these figures vary with time.
Demography is the study of the population in its static and dynamic aspects. The static aspects include characteristics such as composition by age, sex, race, marital status and economic characteristics. The dynamic aspects are fertility, mortality, natality and migration (Johns Hopkins University, 2008).
Demography is the study of the size, composition, growth and distribution of human populations (Henslin, 2009).

USES OF DEMOGRAPHY
Weeks (1998) observed that demography is one of the areas of sociology that treats things from a practical point of view. Demography can be used in politics, by government and in business.

DEMOGRAPHY AND POLITICS


Politics is all about democracy, which is "government of the people, by the people, for the people". It is demographic results that show the number of people to be governed, the areas (locations, communities) to be governed or represented, and the actual people (eligible voters) who will bring politicians to power. Demography is also an essential tool for ensuring equal representation in the National Assembly (Senate and House of Representatives), the State Houses of Assembly and the Local Government Councils. For example, the make-up of the House of Representatives is determined by population distribution, and each Local Government Area is represented.

DEMOGRAPHY AND BUSINESS


People doing business can use the study of population to their advantage, since the results of population studies can be put to specific use in business. For example, if a business sells a product that is desired or required by a particular age group, it can use information from demographic studies to discover communities where members of that age group live. A stationery business will do very well in areas where there are many students. A shop that sells fashionable dresses and shoes for young people will do very well in the environment of a college, university, polytechnic or any other higher institution. Fertilizer will sell best in a farming community. In general, demographic awareness can help in finding neighbourhoods where a business would yield the most profit and satisfaction.

DEMOGRAPHY AND GOVERNMENT


Government uses demography to plan and allocate resources. Demographic studies help to identify the various groups in a community, such as children, women, men, the physically challenged, the elderly and youths. It is this information that is used in allocating resources to suit the needs of each group.

SOURCES OF DEMOGRAPHIC DATA


Three methods are commonly used to collect demographic data: population censuses, civil registration and household surveys.
Population Census: a population census collects information on the economic and social characteristics of every person and household in the nation at a particular point in time. A population census is typically taken once every 10 years.
Civil Registration: civil registration collects information on births, deaths and other vital events occurring in a country. Like the population census, civil registration aims at universal coverage. Unlike the population census, civil registration is a continuous operation: births and deaths are to be registered within a short time of occurrence.
Household Surveys: household surveys collect information from relatively small but scientifically designed samples of households. The relatively small sample size makes surveys less expensive and more flexible than population censuses and civil registration, but also less able to provide detailed information on small geographic areas and population subgroups.
❖ A census is the total process of collecting, compiling and publishing demographic, economic and social data pertaining, at a specified time or times, to all persons in a country or in a delimited territory. A census tells us the size of the population by sex, age, marital status and citizenship. It also gives information on other aspects of population composition, such as educational level, religion, work status and occupation.

POPULATION DYNAMICS
Population dynamics refers to the ever-changing interrelationships among the set of variables that influence the demographic make-up of a population, as well as the variables that influence the growth and decline of population sizes. Among the factors that relate to the size, as well as the age and sex composition, of a population are fertility, death rates and migration.
Fertility: the child-bearing capacity of the population, represented by women between the ages of 15 and 49 years.
A fertility rate is the number of births per 1000 women of a specified group.
i. General fertility rate (GFR): the number of live births per 1000 women between the ages of 15 and 49 years.

$$\text{GFR} = \frac{\text{Number of live births}}{\text{Mid-year female population aged 15 to 49}} \times 1000$$

ii. Age-specific fertility rate (ASFR): the number of live births to women of a particular age (a single year of age or an age group, e.g. 25-29 years) per 1000 women of that age.

$$\text{ASFR}_x = \frac{\text{Number of live births to women of age } x}{\text{Mid-year female population of age } x} \times 1000$$
Total Fertility Rate (TFR): the average number of children a woman would bear during her reproductive life (15 to 49 years), assuming her child-bearing conforms to the age-specific fertility rates in every year of her child-bearing years.
Computation of the total fertility rate, based on Hungary's 2010 data (each term is an age-specific rate per woman multiplied by the five-year width of its age group):

$$\text{TFR} = 0.10 + 0.20 + 0.35 + 0.40 + 0.15 + 0.05 + 0 = 1.25$$

From a biological point of view (disregarding migration and at a stable level of mortality), the TFR clearly indicates the trend of human reproduction. The cut-off value is TFR = 2, which means that the mother and father in a family will be replaced by 2 children.

Age group (years)              15-19     20-24     25-29     30-34     35-39     40-44     45-49
Number of children (births)     5220     12668     25090     31489     13438      2271         -
Number of women              287,568   314,375   335,856   401,619   388,074   346,058    301,…
Fertility rate (per woman)      0.02      0.04      0.07      0.08      0.03      0.01       0.0
Rate x 5 years                  0.10      0.20      0.35      0.40      0.15      0.05       0.0

A TFR greater than 2 means a growing population; a TFR less than 2 means a decreasing population.
Other fertility measures include the crude birth rate (CBR), the gross reproduction rate (GRR) and the net reproduction rate (NRR).
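The TFR computation multiplies each per-woman age-specific rate by the five-year width of its age group and adds the results. A minimal sketch, assuming five-year age groups and using the rounded rates from the Hungary 2010 table; the function names are hypothetical. The second line shows an unrounded ASFR for ages 15-19 computed directly from the table counts.

```python
def asfr(births, women):
    """Age-specific fertility rate per woman: births to women in the group / women in the group."""
    return births / women

def tfr(rates_per_woman, group_width=5):
    """Total fertility rate: sum of per-woman age-specific rates x width of each age group (years)."""
    return group_width * sum(rates_per_woman)

# Rounded per-woman rates from the Hungary 2010 table, ages 15-19 up to 45-49.
rates = [0.02, 0.04, 0.07, 0.08, 0.03, 0.01, 0.0]
print(round(tfr(rates), 2))                    # 1.25

# Unrounded ASFR for ages 15-19 from the table counts, expressed per 1000 women.
print(round(asfr(5220, 287_568) * 1000, 1))    # 18.2
```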
Mortality: the relationship of deaths to the whole population. There are basically two types of mortality rate:
(a) general (crude) mortality rate, or death rate;
(b) specific mortality rates:
• age- and sex-related (special rates: infant mortality and fetal losses);
• cause-related (diseases, injuries, suicide and homicide);
• life expectancy (sex- and age-related).
a) Crude death rate (CDR): the number of deaths in a year per 1000 of the population.

$$\text{CDR} = \frac{\text{Number of deaths}}{\text{Mid-year population}} \times 1000$$

Example: deaths = 135,000; mid-year population = 10,000,000.

$$\text{CDR} = \frac{135{,}000}{10{,}000{,}000} \times 1000 = 13.5$$

that is, 13.5 deaths per 1000 population.
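The crude death rate, like the GFR and ASFR, has the shape "events / mid-year population x 1000" with a matching numerator and denominator. A minimal sketch with a hypothetical helper name, checked against the worked example. Multiplying by 1000 before dividing keeps the example's arithmetic exact in floating point.

```python
def rate_per_1000(events, mid_year_population):
    """Vital rate per 1000 population: events / mid-year population x 1000.

    For the crude death rate, `events` is the number of deaths in the year and
    the denominator is the whole mid-year population.
    """
    if mid_year_population <= 0:
        raise ValueError("mid-year population must be positive")
    return events * 1000 / mid_year_population

print(rate_per_1000(135_000, 10_000_000))   # 13.5 deaths per 1000 population
```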

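MOMENTS OF A FREQUENCY DISTRIBUTION
(a) Let $x_1, x_2, \ldots, x_n$ be the values of a variable with frequencies $f_1, f_2, \ldots, f_n$. Then the $r$th moment about zero (the origin) of the frequency distribution is defined as

$$M'_r = \frac{\sum f x^r}{\sum f}$$

Thus:

$$\text{1st moment: } M'_1 = \frac{\sum f x}{\sum f} = \text{mean} = \bar{x}$$
$$\text{2nd moment: } M'_2 = \frac{\sum f x^2}{\sum f}$$
$$\text{3rd moment: } M'_3 = \frac{\sum f x^3}{\sum f}$$
$$r\text{th moment: } M'_r = \frac{\sum f x^r}{\sum f}$$

(b) The $r$th moment about the mean of the frequency distribution is

$$M_r = \frac{\sum f (x - \bar{x})^r}{\sum f}$$

Thus, the 1st moment about the mean is

$$M_1 = \frac{\sum f (x - \bar{x})}{\sum f} = \frac{\sum f x - \bar{x} \sum f}{\sum f} = \frac{\sum f x - \frac{\sum f x}{\sum f} \sum f}{\sum f} = \frac{\sum f x - \sum f x}{\sum f} = \frac{0}{\sum f} = 0.$$

The 2nd moment about the mean is

$$M_2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = S^2$$

Expanding the square and substituting $\bar{x} = \sum f x / \sum f$:

$$\begin{aligned}
M_2 &= \frac{\sum f (x^2 - 2x\bar{x} + \bar{x}^2)}{\sum f} = \frac{\sum f x^2 - 2\bar{x}\sum f x + \bar{x}^2 \sum f}{\sum f} \\
    &= \frac{1}{\sum f}\left[\sum f x^2 - 2\,\frac{\sum f x}{\sum f}\left(\sum f x\right) + \sum f \left(\frac{\sum f x}{\sum f}\right)^2\right] \\
    &= \frac{1}{\sum f}\left[\sum f x^2 - 2\,\frac{(\sum f x)^2}{\sum f} + \frac{(\sum f x)^2}{\sum f}\right] \\
    &= \frac{1}{\sum f}\left[\sum f x^2 - \frac{(\sum f x)^2}{\sum f}\right] \\
    &= \frac{\sum f x^2}{\sum f} - \left[\frac{\sum f x}{\sum f}\right]^2 \\
    &= M'_2 - (M'_1)^2
\end{aligned}$$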

Exercise: Show that the 3rd and 4th moments about the mean, $M_3$ and $M_4$, can be expressed in terms of moments about the origin (zero) as follows:
(a) $M_3 = M'_3 - 3M'_1 M'_2 + 2(M'_1)^3$
(b) $M_4 = M'_4 - 4M'_1 M'_3 + 6(M'_1)^2 M'_2 - 3(M'_1)^4$
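The definitions in (a) and (b) can be checked numerically. The sketch below uses Python's `fractions.Fraction` for exact arithmetic, computes moments about the origin and about the mean for a hypothetical frequency table, and confirms that $M_1 = 0$, $M_2 = M'_2 - (M'_1)^2$ and the two exercise identities hold exactly for that table. This is a numerical check on one example, not a proof; the function names are illustrative.

```python
from fractions import Fraction

def raw_moment(xs, fs, r):
    """r-th moment about the origin: M'_r = sum(f * x**r) / sum(f)."""
    total = sum(Fraction(f) * Fraction(x) ** r for x, f in zip(xs, fs))
    return total / sum(fs)

def central_moment(xs, fs, r):
    """r-th moment about the mean: M_r = sum(f * (x - mean)**r) / sum(f)."""
    mean = raw_moment(xs, fs, 1)
    total = sum(Fraction(f) * (Fraction(x) - mean) ** r for x, f in zip(xs, fs))
    return total / sum(fs)

xs, fs = [1, 2, 3, 4], [2, 3, 4, 1]          # hypothetical values and frequencies
m1, m2, m3, m4 = (raw_moment(xs, fs, r) for r in (1, 2, 3, 4))

checks = {
    "M1 = 0": central_moment(xs, fs, 1) == 0,
    "M2 = M'2 - (M'1)^2": central_moment(xs, fs, 2) == m2 - m1**2,
    "M3 identity": central_moment(xs, fs, 3) == m3 - 3*m1*m2 + 2*m1**3,
    "M4 identity": central_moment(xs, fs, 4) == m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4,
}
print(checks)  # every value is True
```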

Example: Find the 1st moment about the origin and the 3rd moment about the mean for the distribution below:

Class interval    1-5    6-10    11-15    16-20
Frequency           3       5        6        1

1st moment about the origin: $M'_1 = \dfrac{\sum f x}{\sum f}$
3rd moment about the mean: $M_3 = M'_3 - 3M'_1 M'_2 + 2(M'_1)^3$, where

$$M'_1 = \frac{\sum f x}{\sum f}; \qquad M'_2 = \frac{\sum f x^2}{\sum f}; \qquad M'_3 = \frac{\sum f x^3}{\sum f}$$
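The example stops at the formulas. A sketch of the arithmetic follows, assuming (as is usual for grouped data) that each class is represented by its class mark (midpoint), so that x = 3, 8, 13, 18 and $\sum f = 15$. Exact fractions avoid rounding error; under this assumption the sketch gives $M'_1 = 29/3 \approx 9.667$ and a small negative third moment about the mean, $M_3 = -200/27 \approx -7.407$.

```python
from fractions import Fraction

# Grouped data from the example: class intervals and their frequencies.
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [3, 5, 6, 1]

# Assumption: each class is represented by its class mark (midpoint).
marks = [Fraction(lo + hi, 2) for lo, hi in classes]     # 3, 8, 13, 18
n = sum(freqs)                                           # sum f = 15

def raw_moment(r):
    """M'_r = sum(f * x**r) / sum(f), computed exactly."""
    return sum(f * x ** r for x, f in zip(marks, freqs)) / n

m1, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)  # 145/15, 1685/15, 21655/15
third_about_mean = m3 - 3 * m1 * m2 + 2 * m1 ** 3

print(m1, round(float(m1), 3))                              # 29/3 9.667
print(third_about_mean, round(float(third_about_mean), 3))  # -200/27 -7.407
```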

SKEWNESS AND KURTOSIS
Skewness can be described as a measure of non-symmetry (asymmetry). A frequency distribution or curve is said to be symmetrical if values equidistant from a central maximum have the same frequencies. If a distribution is symmetrical, then its two halves are mirror images of each other.
[Figure: (a) normal curve; (b) rectangular distribution]

The skew of a distribution represents the extent to which it departs from symmetry throughout the range of observed values: the frequencies tend to be higher towards one end of the distribution or curve than the other.

[Figure: negatively skewed distribution and positively skewed distribution]

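The measures of skewness are as follows:

$$\text{(i)} \quad \text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$
$$\text{(ii)} \quad \text{Skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

Measure (ii) follows from (i) through the empirical relation, which holds approximately for moderately skewed distributions,

$$\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median}), \quad \text{or} \quad \text{mode} \approx \text{mean} - 3(\text{mean} - \text{median}).$$

Equations (i) and (ii) above are called Pearson's measures of skewness. A negative, zero or positive value of these measures shows, respectively, left skew, symmetry (i.e. no skew) or right skew.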
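Pearson's two measures can be sketched with Python's standard `statistics` module. Two choices here are assumptions rather than part of the definitions above: the population standard deviation (`pstdev`, divisor n) is used, and the data are assumed to have a single clear mode. The function name and data sets are hypothetical.

```python
import statistics

def pearson_skewness(data):
    """Pearson's measures of skewness:
    (i)  (mean - mode) / standard deviation
    (ii) 3 * (mean - median) / standard deviation
    Assumes one clear mode; uses the population standard deviation.
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    first = (mean - statistics.mode(data)) / sd
    second = 3 * (mean - statistics.median(data)) / sd
    return first, second

right_tailed = [2, 3, 3, 3, 4, 5, 9]       # hypothetical data with a long right tail
print(pearson_skewness(right_tailed))      # both values positive: right skew
print(pearson_skewness([1, 2, 2, 3]))      # (0.0, 0.0): symmetric
```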

KURTOSIS
Kurtosis is the degree of peakedness of a distribution. It indicates the extent to which frequencies are closely grouped or thinly spread throughout the observed values. Distributions may have the same degree of skew but different degrees of kurtosis. "Platykurtic" is the name given to "flat-topped" distributions and "leptokurtic" to more peaked distributions. Mesokurtic distributions are considered to have the same peakedness as the normal distribution.

[Figure: leptokurtic curve, mesokurtic (normal) curve and platykurtic curve]

A measure of the kurtosis of a distribution is given by

$$\frac{M_4}{(M_2)^2} - 3,$$

a negative or positive value showing how much less or more peaked (respectively) the given distribution is compared to a "normal" distribution.
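A sketch of the moment measure $M_4/(M_2)^2 - 3$ for values with optional frequencies. The two hypothetical data sets illustrate the sign: ten evenly spread values (a discrete rectangular distribution) give a negative value, while values piled up at the centre with thin tails give a positive one. The function name is illustrative.

```python
def excess_kurtosis(xs, fs=None):
    """Moment measure of kurtosis M4 / M2**2 - 3 for values xs with optional frequencies fs.

    Negative: flatter than normal (platykurtic); positive: more peaked (leptokurtic).
    """
    if fs is None:
        fs = [1] * len(xs)
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(xs, fs)) / n
    return m4 / m2 ** 2 - 3

flat = excess_kurtosis(list(range(1, 11)))     # evenly spread values 1..10
peaked = excess_kurtosis([1, 5, 9], [1, 8, 1])  # most frequency at the centre
print(round(flat, 3), round(peaked, 3))        # -1.224 2.0
```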
Another measure is called the percentile coefficient of kurtosis, defined as

$$k = \frac{Q}{P_{90} - P_{10}}$$

where
$Q$ = semi-interquartile range $= \tfrac{1}{2}(Q_3 - Q_1)$
$P_{90}$ = 90th percentile
$P_{10}$ = 10th percentile
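A sketch of the percentile coefficient using `statistics.quantiles` from Python's standard library. Percentile conventions differ, so the choice of `method="inclusive"` (linear interpolation between order statistics) is an assumption; other conventions give slightly different values for small data sets. For reference, the coefficient is about 0.263 for a normal distribution, with larger values indicating a flatter distribution. The evenly spread values 1 to 11 give 0.3125, the same as a continuous rectangular distribution.

```python
import statistics

def percentile_kurtosis(data):
    """Percentile coefficient of kurtosis k = Q / (P90 - P10), with Q = (Q3 - Q1) / 2.

    Assumption: percentiles come from statistics.quantiles(method="inclusive").
    """
    p = statistics.quantiles(data, n=100, method="inclusive")  # p[i - 1] is the i-th percentile
    q1, q3 = p[24], p[74]
    p10, p90 = p[9], p[89]
    return ((q3 - q1) / 2) / (p90 - p10)

print(percentile_kurtosis(range(1, 12)))  # 0.3125
```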
