6 Data Analysis
Learning objectives
Concepts of data analysis. Need for data preparation techniques. Various statistical techniques for data analysis. Factors that influence the selection of an appropriate data analysis strategy.
Data analysis
It is the process by which data are converted into useful information. Raw data from questionnaires are processed so that conclusions can be drawn from them.
The purpose of data analysis is to produce information that will help to address the problem.
Data validation
It is the process of determining, to the extent possible, whether a survey's interviews or observations were conducted correctly and are free of fraud or bias.
Data editing
It is the process whereby the raw data are checked for mistakes made by either the interviewer or the respondents. The researcher can check several areas of concern:
Asking the proper questions. Accurate recording of answers. Correct screening of respondents. Complete and accurate recording of open-ended questions.
Coding
It is the grouping of the various responses to the questions on the survey instrument and the assignment of values to them. Specifically, coding is the assignment of a numerical value to each individual response for each question on the survey. Typically, the codes are numerical (a number from 0 to 9), since these are easy to input.
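As an illustration of coding, the sketch below maps raw answers to numeric codes. The codebook, its labels, and the missing-value code 9 are assumptions made up for this example, not values from any particular survey.

```python
# Hypothetical codebook for one closed-ended question; the labels and
# numeric codes are illustrative assumptions, not taken from a real survey.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING_CODE = 9  # assumed code for blank or unusable answers


def code_response(answer):
    """Return the numeric code for a raw answer, or MISSING_CODE."""
    if answer is None:
        return MISSING_CODE
    # Normalize case and surrounding whitespace before the lookup.
    return CODEBOOK.get(answer.strip().lower(), MISSING_CODE)


raw_answers = ["Agree", "neutral ", None, "Strongly agree", "n/a"]
coded = [code_response(a) for a in raw_answers]
print(coded)  # [4, 3, 9, 5, 9]
```

Keeping the codebook in one place makes it easier to check that the same response always receives the same code.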
Data entry
It is the procedure used to enter the data into the computer for subsequent data analysis. It includes the tasks involved in the direct input of the coded data into a software package that enables the research analyst to manipulate and transform the raw data into useful information. One critical step is to ensure that the data entered are correct and error-free.
Weighting: It is the procedure by which each response in the database is assigned a number (a weight) according to some prespecified rule.
Variable re-specification: It is a procedure in which the existing data are modified to create new variables, or a large number of variables is collapsed into fewer variables.
Dummy variables: These are used extensively for re-specifying categorical variables.
Scale transformation: It involves the manipulation of scale values to ensure comparability with other scales or otherwise make the data suitable for analysis.
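The data preparation steps above can be sketched in a few lines of Python. The function names, the sample data, and the choice of z-scores (using the population standard deviation) as the scale transformation are assumptions for illustration only; other transformations and weighting rules are equally valid.

```python
from statistics import mean, pstdev


def dummy_code(values, reference):
    """Re-specify a categorical variable as 0/1 dummy variables.

    The reference category gets no column of its own, so a variable
    with k categories yields k - 1 dummies.
    """
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def standardize(scores):
    """Scale transformation: convert scores to z-scores.

    Assumes the scores are not all identical (otherwise the standard
    deviation is zero and the division fails).
    """
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def weighted_mean(values, weights):
    """Mean of the values after each response is given its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


regions = ["north", "south", "west", "south"]  # made-up categorical data
dummies = dummy_code(regions, reference="north")
print(dummies[1])  # {'is_south': 1, 'is_west': 0}

ratings = [2, 4, 4, 6]  # made-up scale scores
z_scores = standardize(ratings)
print(z_scores)

# Hypothetical weights: the last two respondents count double.
print(weighted_mean(ratings, weights=[1, 1, 2, 2]))
```

Standardized scores have mean 0 and standard deviation 1, which is what makes ratings collected on different scales comparable.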
Frequency distribution
It reports the number of responses that each answer to a question received and is the simplest way of determining the empirical distribution of a variable. It organizes the data into classes or groups of values and shows the number of observations from the data set that fall into each class. It can be presented as a percentage breakdown of the categories and, visually, as a bar graph known as a histogram.
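A minimal sketch of a frequency distribution, assuming the responses have already been coded as integers. The data and the helper name `frequency_table` are made up for illustration; the row of `#` characters stands in for a histogram bar.

```python
from collections import Counter


def frequency_table(responses):
    """Return (value, count, percentage) rows, sorted by value."""
    counts = Counter(responses)
    n = len(responses)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


responses = [3, 1, 2, 3, 3, 2, 1, 3]  # illustrative coded answers
table = frequency_table(responses)

for value, freq, pct in table:
    # One '#' per observation gives a crude text histogram.
    print(f"{value}  {freq:>3}  {pct:5.1f}%  {'#' * freq}")
```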
Descriptive Statistics
Mean (for grouped data):
$$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n}$$
where
$f_i$ = the frequency of the $i$th class
$X_i$ = the midpoint of that class
$h$ = the number of classes
$n$ = the total number of observations
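As a worked example, the mean of grouped data (the sum of each class frequency times its class midpoint, divided by the total number of observations) can be computed as follows. The three classes are invented for illustration.

```python
# Illustrative classes as (midpoint X_i, frequency f_i); values are made up.
classes = [(5, 2), (15, 5), (25, 3)]

n = sum(f for _, f in classes)  # total number of observations
grouped_mean = sum(f * x for x, f in classes) / n
print(grouped_mean)  # 16.0
```

Here the weighted sum is 10 + 75 + 75 = 160 and n = 10, so the mean is 16.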
Median: It is the middle value of the distribution when the distribution is ordered in either an ascending or a descending sequence.
Mode: It is the most common value in the set of responses to a question, i.e., the response most often given to the question.
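Python's standard `statistics` module provides both measures directly. The response values below are made up; `multimode` is shown because a set of responses can have more than one most common value.

```python
from statistics import median, mode, multimode

responses = [4, 2, 5, 4, 3, 4, 2]  # illustrative coded answers

print(median(responses))  # 4 (middle of 2, 2, 3, 4, 4, 4, 5)
print(mode(responses))    # 4 (given three times)

# With ties, multimode returns every most common value.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```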
Measures of Dispersion
Standard deviation: It describes the average distance of the distribution values from the mean. It is calculated by subtracting the mean of the series from each value in the series, squaring each result, summing the squares, dividing by the number of items minus 1, and taking the square root of this value.
Standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
where
$X_i$ = the value of the $i$th observation
$\bar{X}$ = the sample mean
$n$ = the sample size
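The sample standard deviation can be sketched step by step: subtract the mean, square, sum, divide by n - 1, and take the square root. The function name `sample_sd` and the data are assumptions for illustration; the standard library's `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # subtract the mean
    squared = [d ** 2 for d in deviations]    # square each result
    return sqrt(sum(squared) / (n - 1))       # sum, divide, square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations
print(sample_sd(data))  # about 2.138
print(stdev(data))      # library value, for comparison
```

For this data the mean is 5 and the squared deviations sum to 32, so S is the square root of 32 / 7.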
Range
It defines the spread of the data: the distance between the smallest and the largest values in a set of responses.
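In code, the range is simply the largest value minus the smallest. The data below are made up for illustration.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations

data_range = max(data) - min(data)
print(data_range)  # 7
```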
Statistical techniques
The statistical techniques used are: univariate, involving a single variable at a time; bivariate, involving two variables at a time; and multivariate, involving three or more variables at a time.
Univariate Techniques
These are the statistical techniques appropriate for analyzing data when there is a single measurement of each element in the sample or, if there are several measurements of each element, when each variable is analyzed in isolation.
Parametric statistics
[Diagram: parametric statistics, branching into one-sample, independent-sample and dependent-sample cases; tests named: t-test, z-test.]
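As one example of a univariate parametric technique, the one-sample t statistic compares a sample mean with a hypothesized value. The sketch below assumes the usual formula t = (sample mean - hypothesized mean) / (S / sqrt(n)); the scores and the hypothesized mean of 6 are made up, and the statistic would still have to be compared with a t distribution on n - 1 degrees of freedom to reach a conclusion.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error


scores = [5, 6, 7, 8, 9]  # illustrative sample
t = one_sample_t(scores, mu0=6)
print(t)  # about 1.414, with 4 degrees of freedom
```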
Multivariate Techniques
These are statistical techniques suitable for analyzing data when there are two or more measurements of each element and the variables are analyzed simultaneously. Multivariate techniques are concerned with the simultaneous relationships among two or more phenomena.
[Diagram: multivariate techniques divide into dependence techniques and interdependence techniques; interdependence techniques focus either on variables (e.g., factor analysis) or on objects.]