Sta I06 Lecture Note
LECTURER:
DR. S. ABDULAZEEZ
STA106: INTRODUCTORY LABORATORY FOR INFERENCE
(2CU)
• Introduction to statistical packages
• Presentation and preliminary analysis of data by tables and graphs.
• Moments, Skewness and Kurtosis
• Fitting and goodness of fit tests.
• Time series: definition, components, additive and multiplicative models;
stationarity and invertibility.
• Introduction to demography.
• Simple index numbers.
• Inference, estimation and test of hypothesis.
• Use of random numbers and statistical tables.
• Laboratory practical of the course outlines in STA103 should be conducted.
INTRODUCTION TO STATISTICAL PACKAGES
Data commonly encountered in real life are often too voluminous for
statistical methods to be carried out using a pocket calculator. More advanced
statistical methods and analyses can be conveniently carried out on a computer
using a wide range of packages. The most common statistical packages are
SPSS and Minitab. Other packages include, amongst others, S-PLUS,
STATA, XLSTAT, PSPP, SPAD, MATLAB, EViews, GAUSS and Excel.
These packages have changed the boundaries between basic statistical
methods and advanced statistical methods.
SPSS
The “Statistical Package for the Social Sciences” (SPSS) is a package of
programs for manipulating, analysing and presenting data. The package is
widely used in the social and behavioural sciences. There are several forms of
SPSS. The core program is called SPSS Base, and there are a number of add-on
modules that extend the range of data entry, statistical or reporting capabilities.
The most important of these for statistical analysis are the SPSS Advanced
Models and SPSS Regression Models add-on modules.
Getting started in SPSS – Data entry
When you start SPSS, most versions of SPSS for Windows provide a
default dialogue box that gives the user a number of options. These
include:
• Run the tutorial
• Type in data
• Run an existing query
• Create new query using database wizard
• Open an existing data source
• Open another type of file.
You may select one of the options or hit the escape (ESC) key, which
takes you to the SPSS spreadsheet.
The Data Editor consists of two windows. The default is the Data View,
which allows the data to be entered and viewed. In this window you can create
new variables or edit existing ones; rows represent cases and columns represent
variables.
The other window is the Variable View, which allows the types of variables to
be specified and viewed. Users can toggle between the windows by clicking on
the appropriate tabs on the bottom left of the screen.
The Variable View spreadsheet serves to define the variables; each variable
definition occupies a row of the spreadsheet. As soon as data is entered under a
column in the Data View, a default name for that variable appears in the
Variable View.
There are 10 characteristics to be specified under the columns of the
Variable View: Name, Type, Width, Decimals, Label, Values,
Missing, Columns, Align and Measure.
1. Name: this is the desired variable name. It can be up to eight alphanumeric
characters and must begin with a letter. Variable names are not case
sensitive. A name may contain underscores (_), but hyphens (-), ampersands (&)
and spaces are not acceptable.
2. Type: this specifies the type of data, such as date, currency, string
(alphanumeric), etc. The type can be changed by highlighting the respective
entry in the second column of the Variable View and clicking the three-periods
symbol (...) appearing on the right-hand side of the cell.
3. Width: allows us to specify the number of characters or digits we want to use
in our data. The default width of numeric variable entries is eight.
4. Decimals: this indicates the number of places to the right of the decimal
point displayed for data entries. If 2 is selected, all your data for that
variable will be shown to 2 decimal places.
5. Label: a label is attached to the variable name. In contrast to the variable
name, which is confined to eight characters, a label can contain full text,
including spaces, phrases or sentences.
6. Values: value labels are attached to category codes. For a categorical
variable, an integer code should be assigned to each category and the
variable defined to be of type “numeric”. When this has been done, clicking
on the respective cell under the sixth column of the Variable View makes the
three-periods symbol appear, and clicking this opens the Value Labels dialogue
box, which in turn allows assignment of labels to category codes. For
example, if a data set contains a categorical variable sex indicating the
gender of a subject, we may assign the numeric code “0” to represent females
and the code “1” to represent males.
7. Missing: this holds the missing value codes. SPSS recognizes the period
symbol as indicating a missing value. If other codes have been used (e.g. 66 or
666), these have to be declared to represent missing values by highlighting
the respective cell in the seventh column, clicking the three-periods symbol
and filling in the resulting Missing Values dialogue box accordingly.
8. Columns: this represents the width of the variable column in the Data View.
The default cell width for numerical variables is eight, but the user may make
it as wide as desired for variables that may contain many characters, such
as names.
9. Align: this sets the alignment of variables, that is, left, centre or right
alignment. The SPSS default is to align numerical variables to the right-hand
side and string variables to the left. If necessary, alignment can be changed by
highlighting the relevant cell in the ninth column and choosing an option
from the drop-down list.
10. Measure: this enables us to select the type of measurement scale that suits
the variable. The options are nominal, ordinal and scale (interval and ratio
scales are both regarded as scale).
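The roles of the Values and Missing columns can be pictured with a small plain-Python sketch. This is only an illustration and is not SPSS syntax; the dictionary layout, the `decode` helper and the data column are hypothetical, while the codes 0, 1, 66 and 666 are the ones used in the notes above.

```python
# Illustrative only: a plain-Python picture of what the Variable View records
# for one categorical variable. This is NOT SPSS syntax.
sex_variable = {
    "name": "sex",
    "type": "numeric",
    "label": "Gender of subject",
    "values": {0: "Female", 1: "Male"},  # value labels attached to category codes
    "missing": {66, 666},                # user-declared missing value codes
    "measure": "nominal",
}


def decode(code, variable):
    """Return the value label for a code, or None if the code counts as missing."""
    if code is None or code in variable["missing"]:
        return None
    return variable["values"].get(code, code)


raw_codes = [0, 1, 1, 66, 0, None]  # hypothetical data column
decoded = [decode(c, sex_variable) for c in raw_codes]
print(decoded)
```

Keeping the numeric codes in the data and the labels in the variable definition is what lets the same column be analysed as numbers and reported with readable category names.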
MINITAB
Minitab is a complete package for data summary and analysis. There are some
differences between versions, but the version you have access to should
be adequate for a beginner.
Most statistical analyses require a series of steps, often directed by background
knowledge or by the subject area you are investigating. A typical MINITAB
session enables you to explore data with graphs and conduct statistical analyses
and procedures.
Minitab opens with two main windows visible:
a. The session window displays the results of your analysis in text format. Also
in this window, you can enter commands instead of using MINITAB’s
menus.
b. The Data window contains an open worksheet, which is similar in
appearance to a spreadsheet. You can open multiple worksheets each in a
different Data window.
The MINITAB environment is sketched below:
[Figure: the MINITAB environment. Title bar "MINITAB – Untitled"; menu bar with File, Edit, Data, Stat, Graph, Editor and Tools; the Session window; and the Data window (worksheet) with columns C1, C2, C3 and rows 1, 2, 3.]
The data are arranged in columns, which are called variables. The column
number and name are at the top of each column. Each row in the worksheet
represents a case, which holds the information on a single unit.
MINITAB accepts three types of data: numeric, text and date/time.
Note: A column with date/time data is labelled Ci-D, a column with numeric
data has no extension (Ci), and a column with text data is labelled Ci-T.
QUESTIONNAIRE CONSTRUCTION
A questionnaire is an inquiry form which seeks responses to a number of
pertinent questions of interest to the data collection team. Questionnaires
provide a fast method of data collection, either manually or electronically.
Two types of questionnaires can be distinguished:
1. The structured (closed-ended) questionnaire; and
2. The unstructured (open-ended) questionnaire.
Structured questionnaires are prepared such that a fixed number of
answers are provided to the questions asked. Respondents are expected to tick
their choice. For instance, an inquiry on marital status may be designed as follows.
Marital status: (Please Tick)
Single [ ]    Married [ ]    Divorced [ ]
Unstructured questionnaires do not suggest answer options to the respondent.
Respondents are given the choice of responding freely to the questions asked. Most
often, blank spaces are provided in the questionnaire to accommodate responses
to such questions. Example:
What is your advice on improving the academic standard of students?
…………………………………………………………………………………
…………………………………………………………………………………
It should be noted that structured questionnaires are simple to
administer, less costly and faster to analyse.
Characteristics/ Guiding Principles in a good questionnaire construction
i. Objective of questionnaire: The aim, objectives and importance of the
questionnaire must be stated at the top or on the front page of the questionnaire.
ii. Confidentiality: The confidentiality of the information of interest must be
guaranteed before genuine responses can be obtained from the respondents.
iii. Simplicity: Very simple, clear and unambiguous questions should be asked.
The order of the questions must be logical, from simple to complex.
Explanatory notes or instructions must be given where necessary, and the number
of questions must be kept to the minimum needed to obtain the information for
analysis. We must ensure that repetitions are avoided.
iv. Neatness: The designed questionnaire must be neat and attractive.
Appropriate spacing of questions will attract the respondent's attention.
v. Leading Questions: These are questions asked in such a way that the
respondent is likely to answer the question(s) in the manner the
investigator wants. In the design of questionnaires, leading questions should
be avoided as much as possible.
vi. Exhaustive and mutually exclusive questions: When questions with options
are asked, the options should be exhaustive and mutually exclusive. This implies
that all possible answers to a given question should be included as options,
and also that the meaning of a given option should not be the same as, or
contained in, another.
vii. Pre-Testing: The questionnaire should be pre-tested before the actual field
survey to identify problem areas. This will enable appropriate amendments
and corrections to be effected.
viii. Analysis: Questionnaires must be designed in such a way as to facilitate
analysis. Any questionnaire that does not take this factor into consideration
has little or no value at all.
A questionnaire can be considered a cost-saving and time-saving method with a
high response rate. On the other hand, a questionnaire may be difficult and
time consuming to administer; enlightened individuals may deliberately refuse
to give the information required or cause some delay.
SAMPLES AND POPULATION
A sample consists of one or more observations drawn from the population; it is
simply a fraction of the population.
Population is the totality of individual observations about which inferences are
to be made; it can simply be referred to as the universe. Population can be used to
describe the totality of people (human population) or of animate or inanimate
objects that are clearly defined.
ADVANTAGES OF SAMPLING
i. Saves cost
ii. Greater Accuracy
iii. Greater Speed
iv. Greater Scope
v. Feasibility
METHODS OF SAMPLING
There are two main methods of sampling, namely
probabilistic and non-probabilistic sampling procedures.
A non-probabilistic sampling procedure does not require the use of the laws of
probability in selecting the items of the sample. Instead, criteria such as
accessibility of the elements, the opinion of experts, or convenience to the
investigator are used.
Examples of non-probabilistic sampling include judgment sampling, quota
sampling and convenience sampling.
Lottery Method: This is a very popular method in which numbers are
allocated to the sampling units, which are then confined somewhere to avoid bias.
An appropriate technique is then used to select the sample from the
population.
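The lottery method can be imitated on a computer. The sketch below is one possible illustration, assuming that every sampling unit gets a ticket number and that tickets are drawn at random without replacement; the function name `lottery_sample`, the population of 20 students and the seed are all hypothetical.

```python
import random


def lottery_sample(units, sample_size, seed=None):
    """Draw a simple random sample by numbering the units and drawing tickets."""
    rng = random.Random(seed)
    # Step 1: allocate a ticket number to every sampling unit.
    numbered = dict(enumerate(units, start=1))
    # Step 2: draw ticket numbers at random, without replacement.
    drawn = rng.sample(sorted(numbered), sample_size)
    # Step 3: the units whose numbers were drawn form the sample.
    return [numbered[n] for n in drawn]


population = [f"Student {i}" for i in range(1, 21)]  # hypothetical population
sample = lottery_sample(population, 5, seed=106)
print(sample)
```

Fixing the seed makes the draw repeatable for a laboratory exercise; leaving it as `None` gives a fresh draw each time.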
TYPES OF INDEX NUMBERS
There are generally three (3) types of index numbers:
(a) Price index numbers: measure the change in the price of a particular
commodity between the base period (Po) and any other specified
period (Pi).
(b) Quantity index numbers: measure the change in the volume or physical
quantity of goods between the base period (qo) and any other specified
period (qi).
(c) Value index numbers: measure the change in the value of a commodity between
the base period (Vo) and the specified period (Vi). The value of a commodity is
the product of its price and quantity, that is, Vo = Po x qo and Vi = Pi x qi.
The simple price index (Ip) finds the percentage change in the price of an item
from one period to another.
Note:
(i) If the simple price index is more than 100, subtract 100 from the simple
price index. The result is the percentage increase in price from the base
period to the current period.
(ii) If the simple price index is less than 100, subtract the simple price index
from 100. The result is the percentage by which the item costs less in the
current period than it did in the base period.
The price of 50kg of rice increased in 2019 by 38.5% with respect to the 2015
price.
Example: In 2018 the price of a litre of petrol was N145; the price in 2020 was
N125.
i. Calculate the price relative
ii. What is the Ip?
iii. Interpret your Ip
Solution
Petrol price: 2018, Po = N145; 2020, Pi = N125
i. Price relative = $P_i / P_o = 125 / 145 = 0.862$
ii. $I_p$ = price relative × 100 = 0.862 × 100 = 86.2
iii. The price of a litre of petrol decreased in year 2020 by 13.8% with respect
to the price in 2018.
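The petrol example can be checked with a few lines of Python. This is a minimal sketch, assuming the simple price index is the price relative multiplied by 100; the function names are illustrative and the interpretation rule follows notes (i) and (ii) above.

```python
def price_relative(p_base, p_current):
    """Price relative: current price divided by base-period price."""
    return p_current / p_base


def simple_price_index(p_base, p_current):
    """Simple price index: the price relative expressed as a percentage."""
    return p_current * 100 / p_base


def interpret(index):
    """Turn a simple price index into a percentage change statement."""
    if index > 100:
        return f"price increased by {index - 100:.1f}%"
    if index < 100:
        return f"price decreased by {100 - index:.1f}%"
    return "price unchanged"


ip = simple_price_index(145, 125)          # petrol: Po = N145, Pi = N125
print(round(price_relative(145, 125), 3))  # 0.862
print(round(ip, 1))                        # 86.2
print(interpret(ip))                       # price decreased by 13.8%
```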
Mathematically, a time series is defined by the values y1, y2, ... of a
variable y (prices, temperature, etc.) at times t1, t2, ...
Thus, y = f(t), i.e. y is a function of t.
[Figure: time series plot of the quantity of rice sold (vertical axis) against time t1, t2, ..., t5 in years (horizontal axis).]
COMPONENTS OF TIME SERIES
The basic idea underlying time series analysis is that systematic influences
associated with time affect its values. The objective of time series
analysis is to identify and measure the influences of the different time-related
factors. The fluctuations are due to the influence of physical, economic,
sociological or other forces. The characteristic movements of a time series may
be classified into four main types, called the components of a time series.
a. Secular Trend (T): This represents a general rise or fall that occurs in
time series data over a long period of time. It is a smooth, steady, regular and
broad movement of the series in the same direction, generally covering a
minimum of ten or fifteen years. Population growth is an example of a secular
trend with an upward movement, while the death rate portrays a downward trend.
b. Cyclical Fluctuation (C): This refers to recurrent up-and-down, wave-like
variation or oscillation about a trend line. Such cycles are often described as
“swings from prosperity, through recession, depression, recovery and back
to prosperity”.
A cycle is said to be completed when, beginning with a peak, the falling
curve reaches a minimum point and then, rising again, reaches the next peak.
These cycles may or may not be periodic.
A typical example of cyclical movement is a business cycle.
[Figure: Phases of a business cycle, showing the peak (boom or prosperity), the decline, the trough (depression) and the normal level.]
c. Seasonal Variation (S): These are changes in time series data
that can be attributed to seasonal effects with a fairly regular period (usually
a year), so that they recur annually. Seasonal effects could also be observed
within a day, a week, a month or a quarter of a year, depending on the nature of
the data being observed.
The factors that create seasonal variations include climate and weather
conditions, customs, traditions and habits. Daily, hourly or weekly
occurrence of events can also produce seasonal movements.
d. Irregular Variation (I): These are random or sporadic movements of a time
series due to chance or unpredictable events such as floods, strikes, elections,
fires, wars, earthquakes, pandemics, epidemics, etc. These events produce
variations lasting a short time; hence they are sometimes called residual,
erratic or accidental variations, which cannot be ascribed to trend, cyclical or
seasonal influences.
The graph below shows a hypothetical time series of monthly values for a
24 year period.
[Figure: level y (production, sales, etc.) plotted against time t, with the time axis marked 2, 4, ..., 24.]
TIME SERIES MODEL
In traditional time series analysis, it is assumed that there is a multiplicative
relationship between the four components. That is, it is assumed that any
particular value in a series is the product of factors that can be attributed to
the various components.
Symbolically;
Yt = Tt × St × Ct × It     (Multiplicative model)
Where
Yt = the value of the observed series for a given time t (Result of
the four factors)
Tt = Trend, a long term growth factor
Ct = The cyclical component
St = The seasonal factor
It = The irregular factor
The multiplicative model is mostly accepted because the factors are viewed
as amplifying each other rather than acting separately as assumed by the
additive model. This implies that the factors are not independent of each
other, hence the multiplicative model is considered as a standard
assumption for time series analysis and it is more often employed in
practice.
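The difference between the two models can be seen by building a short series both ways. This is a sketch with made-up quarterly components. It assumes the additive model takes its usual form Yt = Tt + St + Ct + It, which the notes mention but do not write out. In the multiplicative model the seasonal, cyclical and irregular factors are ratios around 1; in the additive model they are deviations measured in the same units as Yt.

```python
# Hypothetical quarterly components for one year (all figures are invented).
trend = [100.0, 102.0, 104.0, 106.0]

# Multiplicative model: S, C and I are ratios that scale the trend.
seasonal_ratio = [0.90, 1.10, 1.20, 0.80]
cyclical_ratio = [1.02, 1.02, 1.01, 1.01]
irregular_ratio = [1.00, 0.99, 1.01, 1.00]
multiplicative = [
    t * s * c * i
    for t, s, c, i in zip(trend, seasonal_ratio, cyclical_ratio, irregular_ratio)
]

# Additive model: S, C and I are amounts added to the trend.
seasonal_dev = [-10.0, 10.0, 20.0, -20.0]
cyclical_dev = [2.0, 2.0, 1.0, 1.0]
irregular_dev = [0.0, -1.0, 1.0, 0.0]
additive = [
    t + s + c + i
    for t, s, c, i in zip(trend, seasonal_dev, cyclical_dev, irregular_dev)
]

print([round(y, 2) for y in multiplicative])
print(additive)
```

Under the multiplicative model a seasonal factor of 1.20 adds more units when the trend is high than when it is low, which is the "amplifying" behaviour described above.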
The basic idea of stationarity is that the probability laws that govern the
behaviour of the process do not change over time. In a sense, the process is
in statistical equilibrium.
Specifically, a process $\{X_t\}$ is said to be strictly stationary if the joint
distribution of $X_{t_1}, X_{t_2}, \dots, X_{t_n}$ is the same as the joint
distribution of $X_{t_1-k}, X_{t_2-k}, \dots, X_{t_n-k}$ for all choices of time
points $t_1, t_2, \dots, t_n$ and all choices of time lag $k$.
Thus, when $n = 1$ the (univariate) distribution of $X_t$ is the same as that of
$X_{t-k}$ for all $t$ and $k$; in other words, the X's are (marginally)
identically distributed. It then follows that
$E(X_t) = E(X_{t-k})$ for all $t$ and $k$, so that the mean function is constant
for all time.
Similarly, when $n = 2$ the bivariate distribution of $(X_t, X_s)$ is the same
as that of $(X_{t-k}, X_{s-k})$, so that
$\mathrm{Cov}(X_t, X_s) = \mathrm{Cov}(X_{t-k}, X_{s-k})$ for all $t$, $s$ and $k$.
That is, the covariance between $X_t$ and $X_s$ depends on time only through the
time difference $|t - s|$ and not otherwise on the actual times $t$ and $s$. Thus
for a stationary process, we can simplify our notation and write:
$$\gamma_k = \mathrm{Cov}(X_t, X_{t-k}) \quad \text{and} \quad \rho_k = \mathrm{Corr}(X_t, X_{t-k})$$
Note also that
$$\rho_k = \frac{\gamma_k}{\gamma_0}$$
$$\gamma_0 = \mathrm{Var}(X_t), \qquad \rho_0 = 1$$
$$\gamma_k = \gamma_{-k}, \qquad \rho_k = \rho_{-k}$$
$$|\gamma_k| \le \gamma_0, \qquad |\rho_k| \le 1$$
If the process is strictly stationary and has finite variance, then the
covariance function must depend only on the time lag.
A definition that is similar to that of strict stationarity but is
mathematically weaker is as follows:
A process $\{X_t\}$ is said to be weakly (or second order) stationary if:
a. The mean function is constant over time
b. $\gamma_{t,t-k} = \gamma_{0,k}$ for all time $t$ and lag $k$
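In the laboratory, the autocovariance and autocorrelation at lag k are estimated from an observed series. The sketch below is a minimal illustration, assuming the common estimator that divides by the series length n (the notes do not specify an estimator); the function names and the data are hypothetical. It also checks the properties of the autocorrelation listed above on the sample values.

```python
def sample_autocovariance(x, k):
    """Estimate gamma_k from a series x, dividing by n (an assumed convention)."""
    n = len(x)
    mean = sum(x) / n
    k = abs(k)  # gamma_k = gamma_{-k}
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n


def sample_autocorrelation(x, k):
    """Estimate rho_k as gamma_k / gamma_0."""
    return sample_autocovariance(x, k) / sample_autocovariance(x, 0)


series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]  # hypothetical observations
for lag in range(4):
    print(lag, round(sample_autocorrelation(series, lag), 3))
```

The lag-0 value is always 1, and every other value lies between -1 and 1, in line with the properties of the autocorrelation function.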
INTRODUCTION TO DEMOGRAPHY
Demography is the study of populations, especially with reference to
size and density, fertility, mortality, growth, age distribution, migration and
vital statistics. It involves the integration of all these features with social and
economic conditions.
Demography is simply the study of population. It looks at everything
that influences population size, distribution processes, and the influence that
change in population has on contemporary issues.
Demography is the study of the changes in number of births, deaths,
marriages and cases of diseases in a community over a period of time.
Other definitions of demography include:
Demography is the statistical study of human populations.
Demography is the study of information in figures (statistics) about the
population of an area or country and how these figures vary with time.
Demography is the study of the population in its static and dynamic aspects.
The static aspects include characteristics such as composition by age, sex, race,
marital status and economic characteristics. The dynamic aspects are fertility,
mortality, natality and migration (Johns Hopkins University, 2008).
Demography is the study of the size, composition, growth and distribution of
human populations (Henslin, 2009).
USES OF DEMOGRAPHY
Weeks (1998) observed that demography is one of the areas in sociology
that treats things from the practical point of view. Demography can be used in
politics, by Government as much as in business.
by age-specific groups, that business can use information from demographic
studies to discover communities where members of that age-specific group
live. For example, a stationery business will do very well in areas where
there are many students. A shop that sells fashion dresses and shoes for young
people will do very well in a college, university, polytechnic or any other
higher institution environment. Fertilizer will sell best in a farming
community. In general, demographic awareness could help in finding
neighbourhoods where a business would yield the most profit and satisfaction.
civic registration, but also less able to provide detailed information on small
geographic areas and population subgroups.
❖ A census is the total process of collecting, compiling and publishing
demographic, economic and social data pertaining, at a specified time or
times, to all persons in a country or in a delimited territory. A census tells
us the size of the population by sex, age, marital status and citizenship. It
gives information on other aspects of population composition such as
educational level, religion, work status and occupation.
POPULATION DYNAMICS
Population dynamics refers to the ever-changing interrelationships among the
set of variables that influence the demographic makeup of populations, as well
as the variables that influence the growth and decline of population sizes. Among
the factors that relate to the size as well as the age and sex composition of
a population are fertility, death rates and migration.
Fertility: is the child-bearing capacity of the population, represented by women
between the ages of 15 and 49 years.
Fertility rate is the number of births per 1000 women of a specific composition.
i. General fertility rate: is the number of live births per 1000 women
between the ages of 15 and 49 years.
$$\text{GFR} = \frac{\text{Number of live births}}{\text{Mid-year female population aged 15 to 49}} \times 1000$$
ii. Age-specific fertility rate: is the number of births to women of a
particular age (a single year or an age group) per 1000 women of that age,
e.g. females in the age group 25 to 29 years.
Total Fertility Rate (TFR): is the average number of children a woman would
bear during her reproductive life (from 15 to 49 years), assuming her
child-bearing conforms to the age-specific fertility rates in every year of her
child-bearing years.
Computation of total fertility rate, based on Hungary’s 2010 data:
Number of children per woman in five years:  0.10  0.21  0.35  0.40  0.15  0.05  0.0
From a biological point of view (without considering migration and at a stable
level of mortality), TFR clearly indicates the trend of human reproduction. The
cut-off value is TFR = 2, which means that the mother and father in the family
will be replaced by 2 children.
TFR greater than 2 means a growing population; TFR less than 2 means
a decreasing population.
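The computation can be sketched in Python. This is a hedged illustration: it assumes that each of the seven tabulated values is the average number of children born per woman while she passes through one five-year age group, so that the TFR is simply their sum. The age-group labels are not recoverable from the notes and are therefore not shown; the variable names are illustrative.

```python
# Values from the table above; assumed to be children per woman for each
# successive five-year age group of the reproductive period.
children_per_woman_by_age_group = [0.10, 0.21, 0.35, 0.40, 0.15, 0.05, 0.0]

# Under that assumption the total fertility rate is the sum over age groups.
tfr = round(sum(children_per_woman_by_age_group), 2)

# Compare with the cut-off value TFR = 2 given in the notes.
if tfr > 2:
    status = "growing"
elif tfr < 2:
    status = "decreasing"
else:
    status = "replacement level"

print(tfr, status)  # 1.26 decreasing
```

If the age-specific rates were instead given per 1000 women per year, each rate would first be multiplied by the width of the age group (5 years) and divided by 1000 before summing.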
Other rates on fertility include Crude birth rate (CBR), Gross reproduction rate
(GRR) and Net reproduction rate (NRR).
Mortality: Is a relationship of death cases to the whole population. There are
basically two types of mortality.
(a) General/Crude mortality rate or death rate.
(b) Specific mortality rates.
• Age and sex related (special rates; infant mortality and fetal
losses).
• Cause related (diseases, injuries, suicide, and homicide).
• Life expectancy (Sex and age related)
a) Crude death rate: is the number of deaths in a year per 1000 of
the population.
$$\text{CDR} = \frac{\text{Number of deaths}}{\text{Mid-year population}} \times 1000$$
Example: deaths = 135,000, mid-year population = 10,000,000
$$\text{CDR} = \frac{135{,}000}{10{,}000{,}000} \times 1000 = 13.5$$
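Rates of this "per 1000" form all follow the same arithmetic, which a small helper makes explicit. The function name `rate_per_thousand` is illustrative; the figures are those of the example above.

```python
def rate_per_thousand(events, mid_year_population):
    """Number of events per 1000 of the mid-year population."""
    return events * 1000 / mid_year_population


# Crude death rate for the example: 135,000 deaths, 10,000,000 people.
cdr = rate_per_thousand(135_000, 10_000_000)
print(cdr)  # 13.5
```

The same helper gives the general fertility rate when `events` is the number of live births and the denominator is the mid-year female population aged 15 to 49.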
MOMENT OF A FREQUENCY DISTRIBUTION
(a) Let x1, x2, ..., xn be the values of a random variable with frequencies
f1, f2, ..., fn. Then the rth moment (about zero) of the frequency distribution
is defined as
$$M'_r = \frac{\sum f x^r}{\sum f}$$
Thus,
1st moment: $M'_1 = \dfrac{\sum f x}{\sum f} = \text{mean} = \bar{x}$
2nd moment: $M'_2 = \dfrac{\sum f x^2}{\sum f}$
3rd moment: $M'_3 = \dfrac{\sum f x^3}{\sum f}$
rth moment: $M'_r = \dfrac{\sum f x^r}{\sum f}$
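(b) The rth moment (about the mean) of the frequency distribution is
$$M_r = \frac{\sum f (x - \bar{x})^r}{\sum f}$$
In particular, the 1st moment about the mean is
$$M_1 = \frac{\sum f (x - \bar{x})}{\sum f} = 0,$$
and the 2nd moment about the mean is
$$
\begin{aligned}
M_2 &= \frac{\sum f (x - \bar{x})^2}{\sum f} \\
&= \frac{\sum f (x^2 - 2x\bar{x} + \bar{x}^2)}{\sum f} \\
&= \frac{\sum f x^2 - 2\bar{x} \sum f x + \bar{x}^2 \sum f}{\sum f} \\
&= \frac{1}{\sum f}\left[\sum f x^2 - 2\,\frac{\sum f x}{\sum f}\left(\sum f x\right) + \sum f \left(\frac{\sum f x}{\sum f}\right)^2\right] \\
&= \frac{1}{\sum f}\left[\sum f x^2 - 2\,\frac{(\sum f x)^2}{\sum f} + \frac{(\sum f x)^2}{\sum f}\right] \\
&= \frac{1}{\sum f}\left[\sum f x^2 - \frac{(\sum f x)^2}{\sum f}\right] \\
&= \frac{\sum f x^2}{\sum f} - \left[\frac{\sum f x}{\sum f}\right]^2 \\
&= M'_2 - (M'_1)^2
\end{aligned}
$$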
Exercise: Show that the 3rd and 4th moments about the mean, $M_3$ and $M_4$, can
be expressed in terms of moments about the origin (zero) as follows:
(a) $M_3 = M'_3 - 3M'_1 M'_2 + 2(M'_1)^3$
(b) $M_4 = M'_4 - 4M'_1 M'_3 + 6(M'_1)^2 M'_2 - 3(M'_1)^4$
Example: Find the 1st moment about the origin and the 3rd moment about the
mean for the distribution below:
Class interval    1 – 5    6 – 10    11 – 15    16 – 20
Frequency           3         5         6          1
1st moment about the origin: $M'_1 = \dfrac{\sum f x}{\sum f}$
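The example can be worked numerically as follows. This is a sketch that assumes, as is usual for grouped data, that each class is represented by its midpoint (3, 8, 13 and 18); the function names are illustrative. The 3rd moment about the mean is computed directly and then cross-checked against identity (a) of the exercise.

```python
class_intervals = [(1, 5), (6, 10), (11, 15), (16, 20)]
frequencies = [3, 5, 6, 1]

# Assumption: each class is represented by its midpoint.
midpoints = [(low + high) / 2 for low, high in class_intervals]


def moment_about_origin(x, f, r):
    """r-th moment about zero: sum(f * x**r) / sum(f)."""
    return sum(fi * xi ** r for xi, fi in zip(x, f)) / sum(f)


def moment_about_mean(x, f, r):
    """r-th moment about the mean: sum(f * (x - mean)**r) / sum(f)."""
    mean = moment_about_origin(x, f, 1)
    return sum(fi * (xi - mean) ** r for xi, fi in zip(x, f)) / sum(f)


m1_origin = moment_about_origin(midpoints, frequencies, 1)
m2_origin = moment_about_origin(midpoints, frequencies, 2)
m3_origin = moment_about_origin(midpoints, frequencies, 3)
m3_mean = moment_about_mean(midpoints, frequencies, 3)

# Cross-check with M3 = M'3 - 3 M'1 M'2 + 2 (M'1)^3.
m3_identity = m3_origin - 3 * m1_origin * m2_origin + 2 * m1_origin ** 3

print(round(m1_origin, 4))  # 9.6667
print(round(m3_mean, 4))    # -7.4074
```

Here the sum of frequencies is 15 and the sum of f times x is 145, so the mean is 145/15; the negative 3rd moment about the mean indicates a slight negative skew under the midpoint assumption.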
SKEWNESS AND KURTOSIS
Skewness can be described as a measure of non-symmetry. A frequency
distribution or curve is said to be symmetrical if the values equidistant from a
central maximum have the same frequencies. If a distribution is symmetrical,
then the two halves are mirror images of each other.
[Figure: (a) Normal curve; (b) Rectangular distribution.]
KURTOSIS
Kurtosis is the degree of peakedness of a distribution. It indicates the extent to
which frequencies are closely grouped or thinly spread throughout the observed
values. Distributions may have the same degree of skewness but different degrees
of kurtosis. “Platykurtic” is the name given to “flat-topped” distributions and
“Leptokurtic” to more peaked distributions. Mesokurtic distributions are
considered as normal distributions.
[Figure: leptokurtic curve, mesokurtic (normal) curve and platykurtic curve.]
A measure of kurtosis of a distribution is given by $\dfrac{M_4}{(M_2)^2} - 3$; a
negative or positive value shows how much less or more peaked (respectively) the
given distribution is compared to a “Normal” distribution.
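The moment measure of kurtosis can be tried on two small frequency distributions. This is an illustrative sketch; both data sets and the function names are hypothetical. The first distribution spreads its frequencies evenly (flat-topped), while the second concentrates them at the centre (peaked).

```python
def moment_about_mean(x, f, r):
    """r-th moment about the mean of a frequency distribution."""
    total = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / total
    return sum(fi * (xi - mean) ** r for xi, fi in zip(x, f)) / total


def kurtosis_measure(x, f):
    """M4 / (M2)^2 - 3: negative if flatter, positive if more peaked than normal."""
    m2 = moment_about_mean(x, f, 2)
    m4 = moment_about_mean(x, f, 4)
    return m4 / m2 ** 2 - 3


values = [1, 2, 3, 4, 5]
flat = kurtosis_measure(values, [1, 1, 1, 1, 1])     # evenly spread frequencies
peaked = kurtosis_measure(values, [1, 2, 10, 2, 1])  # frequencies piled at centre

print(round(flat, 2))    # -1.3 (platykurtic)
print(round(peaked, 2))  # 1.0 (leptokurtic)
```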
Another measure is called the percentile coefficient of kurtosis, defined
as $k = \dfrac{Q}{P_{90} - P_{10}}$