
Introduction To Database Management and Statistical Software

The document provides an introduction to data management and statistical packages. It discusses the objectives of learning data processing, management and statistical analysis using software like EpiData, Epi-Info, SPSS, Stata. It describes the key steps in data management including data entry, validation, cleaning and documentation. It also summarizes the different types of statistical analyses that can be performed including descriptive, bivariate, stratified and multivariate analyses. Finally, it introduces some commonly used statistical software packages for data management and analysis.

Uploaded by

amin ahmed
Copyright
© All Rights Reserved

Introduction to data management and statistical packages

03/13/23 1
Objective
• The main objectives of this session are to enable the participants:
– To process and manage public health data using statistical software (EpiData, Epi-Info, SPSS, Stata, …)
– To acquire concepts and skills of data management and statistical analysis using statistical packages
– To apply knowledge and skills of data management and statistical analysis software in their research projects and working areas/healthcare settings
• Data management involves entering and processing data using a computer to produce the desired outputs

• The aim is to create a reliable and relevant database, which includes:
• Data entry
• Data validation and checking
• Data manipulation
• Data files backup
• Data documentation

Data management and analysis
Data processing
–Data coding
–Data entry
–Data cleaning
Data analysis
–Descriptive analysis
–Bivariate analysis
–Stratified analysis
–Multivariate analysis

Data management
Data processing refers to:
data entry into a computer
data checks and correction
Variable coding
Data cleaning
The aim of this process is to produce a
relatively ‘clean’ data set
Data Entry
• Concerns the transfer of data from a questionnaire to a computer file.
• It converts data into a form that can be read and manipulated by the computers used in quantitative data analysis.
• Before analysis, data must be checked for errors, information that needs coding must be coded, and missing values must be dealt with.
Data coding
• In general, computers are at their best with numbers.
• You must translate variables into numbers through coding.
• Coding is assigning a separate (non-overlapping) numerical code to each distinct answer and to missing values.
• E.g., instead of using ‘Male’ and ‘Female’ for the variable sex, the values can be coded as 1 = Male and 2 = Female.
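The coding step can be sketched in a few lines of Python. This is only an illustration: the codebook follows the sex example above, while the function name `encode` and the sample answers are made up for the sketch.

```python
# Codebook for the variable "sex", following the example above (1 = Male, 2 = Female).
SEX_CODES = {"Male": 1, "Female": 2}

def encode(answers, codebook):
    """Translate text answers into their numeric codes.

    Raises KeyError for an answer that is not in the codebook, so an
    unexpected response is noticed at entry time rather than during analysis.
    """
    return [codebook[answer] for answer in answers]

raw_answers = ["Male", "Female", "Female", "Male"]  # made-up responses
coded = encode(raw_answers, SEX_CODES)
print(coded)
```

Keeping the codebook in one place means the same mapping can be reused later to decode the numbers for the report.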
Missing values
• Missing values occur when measurements were not taken or respondents did not answer questions.
• Missing values should not be entered as a blank, because some statistical packages interpret blanks as zero.
• Ideally, a code should be chosen to denote a missing value (e.g. ‘9’, ‘99’ or ‘999’), one that cannot occur as a real value of the variable.
• After the analysis is finished, the codes must be decoded back to the original categories when writing the report.
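The effect of a missing-value code on a simple statistic can be illustrated as follows. This is a minimal sketch: the variable (age), the code 99, the label dictionary and the data are all assumptions chosen for the example.

```python
from statistics import mean

MISSING_CODE = 99  # assumed missing-value code for the hypothetical variable "age"

def valid_values(values, missing_code=MISSING_CODE):
    """Return only the real measurements, dropping the missing-value code."""
    return [v for v in values if v != missing_code]

ages = [34, 99, 27, 45, 99, 30]  # made-up data; 99 marks "not recorded"

naive_mean = mean(ages)                  # wrongly treats 99 as a real age
correct_mean = mean(valid_values(ages))  # uses the four recorded ages only

# Decoding for the report: map numeric codes back to their labels.
SEX_LABELS = {1: "Male", 2: "Female", 9: "Missing"}
decoded = [SEX_LABELS[code] for code in [1, 2, 9]]

print(naive_mean, correct_mean, decoded)
```

The statistical package has to be told which code means missing; otherwise the code is analysed as if it were a real value, as the naive mean shows.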
Data Cleaning
• Once data have been gathered, they need to
be entered into a computer data file and
checked for errors.
• The aim of this process is to produce a clean
set of data for statistical analysis.
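One common cleaning check is to list every value that falls outside its allowed codes or plausible range. The sketch below assumes hypothetical rules and records; the field names, limits and missing-value codes are illustrative only.

```python
def find_errors(records, rules):
    """Return (record id, field, value) for every value that breaks a rule.

    `rules` maps a field name to a function that returns True for valid values.
    """
    errors = []
    for rec in records:
        for field, is_valid in rules.items():
            if not is_valid(rec[field]):
                errors.append((rec["id"], field, rec[field]))
    return errors

# Hypothetical rules: sex is coded 1, 2 or 9 (missing); age is 0-120 or 999 (missing).
RULES = {
    "sex": lambda v: v in {1, 2, 9},
    "age": lambda v: 0 <= v <= 120 or v == 999,
}

records = [  # made-up entries
    {"id": 1, "sex": 1, "age": 34},
    {"id": 2, "sex": 3, "age": 27},   # sex code 3 is not defined
    {"id": 3, "sex": 2, "age": 250},  # implausible age
]

errors = find_errors(records, RULES)
print(errors)
```

Each flagged value would then be checked against the original questionnaire and corrected or set to the missing code.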
Components of Data Analysis
Data processing
– Data entry
– Coding
– Cleaning
Descriptive /exploratory
– Frequencies,
– Tables and graphs
– Cross tabulations (chi-square tests, Spearman’s correlation, …)
– Measures of central tendency and variations
– Proportions/percentages
Analytic /inferential
– Estimation
– Confidence intervals (e.g. for an OR, …)
– Hypothesis testing (P-values, …)
– Statistical models
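Two of the items above, percentages and the chi-square statistic of a cross tabulation, can be computed by hand to see what the packages do internally. This is a sketch with made-up counts; it gives the Pearson statistic without a continuity correction and does not compute a P-value.

```python
from collections import Counter

def percentages(values):
    """Percentage of observations in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * n / total for category, n in counts.items()}

def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

sex = ["M", "F", "F", "M"]         # made-up categorical data
cross_tab = [[20, 30], [30, 20]]   # made-up 2x2 cross tabulation

print(percentages(sex))
print(chi_square(cross_tab))
```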
Statistical software
Spreadsheets and graphics
For data entry:
• EpiData
• Epi Info

For data analysis:
• SPSS
• STATA
• SAS
• R, …
Spreadsheet
Excel helps us with:
• Calculation
• Formatting
• Working with tables
• Charts
Programmemicrosoft officeexcel
Ribbon

The office button


If you merely wish to add something new you must
double click on the active cell

Formula bar displays content of the cell


1. Calculation
• Place the cursor in any cell, type =2+3, then press the Enter key.
• You can use other operators (+, -, *, /, ^, …)
• Additionally, Excel helps with:
• Copy and paste
• Series (e.g. days, numbers)
2. Working with tables

• Create table
• Filtering (to select and show certain data in the table)
3. Charts

1. Bar chart
2. Line chart
3. Pie (circle) chart
4. Scatter chart
Statistical software for data management and analysis
• A number of data management software packages exist, including EPI-Info, EpiData, SPSS, STATA, spreadsheets, …

• Spreadsheets are not good for data entry, as they are unreliable and can be easily corrupted.

Example: it is easy to type over existing values, lose track of records, duplicate data, and so on.
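Duplicate records are one of the problems that can be checked for after entry. A minimal sketch, assuming each record carries an ID number (the IDs below are made up):

```python
from collections import Counter

def duplicate_ids(ids):
    """Return the IDs that appear more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)

entered_ids = [101, 102, 103, 102, 104, 101]  # made-up record IDs
print(duplicate_ids(entered_ids))
```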

• EpiData and Epi-Info are good for data management (data entry)
• Because they are fast and reliable, and allow for controlled and double data entry

• SPSS, STATA, SAS, R, S-Plus, and Minitab are good statistical software packages for data analysis

EpiData
• It is used for simple or programmed data entry and data documentation.

• EpiData does not contain any complex data analysis functions, but data may be exported to a variety of common formats.

• It has optimized documentation and error detection features.

• E.g. double entry verification, list of ID numbers in several files, codebook overview of data, date added to backup and encryption procedures.
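The idea behind double entry verification is that the same questionnaires are entered twice, independently, and the two files are compared. The sketch below illustrates the comparison only; it is not how EpiData implements it, and the function name, record layout and data are assumptions.

```python
def compare_entries(first, second):
    """Compare two independent entries of the same records.

    Both arguments map a record ID to a dict of field values.
    Returns a list of (id, field, first value, second value) for each mismatch.
    An ID present in only one entry is reported with the field "<record>".
    """
    diffs = []
    for rec_id in sorted(set(first) | set(second)):
        a, b = first.get(rec_id), second.get(rec_id)
        if a is None or b is None:
            diffs.append((rec_id, "<record>", a, b))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                diffs.append((rec_id, field, a.get(field), b.get(field)))
    return diffs

# Made-up data: the second operator transposed the digits of one age.
entry_1 = {1: {"sex": 1, "age": 34}, 2: {"sex": 2, "age": 27}}
entry_2 = {1: {"sex": 1, "age": 43}, 2: {"sex": 2, "age": 27}}

print(compare_entries(entry_1, entry_2))
```

Every mismatch is resolved by going back to the paper questionnaire, since either entry may be the wrong one.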

Statistical Software for Data Analysis
• As quantitative research grows, application of statistical software (SS) becomes a more crucial part of data analysis.

• Researchers are experiencing a transition from manual analysis with paper to more efficient electronic analysis with statistical software (SS).

• It identifies the prerequisites of producing world-class studies by using modern SS solutions.
Statistical Software…
• Statistical software (SS) is a vital tool for research analysis, data validation and presenting findings

• Over the course of history, different methods of data analysis have existed.

• Initially, it was paper and pen; later, punching machines were invented, followed by simple calculators, complex scientific calculators and the computer.

• Statistical software is a software program that makes the calculation and presentation of statistics relatively easy

Statistical Software…
• Many proprietary and free statistical software packages are available, suitable for different statistical analyses depending on the user's needs

• Some of the proprietary packages are SPSS, STATA, SAS, MINITAB, etc., and among the open or free statistical packages are R, EPI-INFO and CS-PRO, to mention but a few.

• The emergence of statistical software in the 21st century has helped researchers in the physical and social sciences to improve the quality of research

Available Statistical Packages

STATA
• It is a powerful statistical package with smart data management facilities, a wide array of up-to-date statistical techniques, and an excellent system for producing publication-quality graphs
• STATA is available for Windows, Unix, and Mac computers.
• The standard version is called Stata/IC (or Intercooled Stata) and can handle up to 2,047 variables.
• A special edition called Stata/SE can handle up to 32,767 variables (and allows longer string variables and larger matrices)
• The version for multicore/multiprocessor computers is called Stata/MP; it has the same limits but is substantially faster.
Statistical Software…
• STATA performs most general statistical
analyses:
– Regression analysis (any type of regression)
– Survival analysis
– Analysis of variance (ANOVA/ANCOVA)
– Factor Analysis and PCA
– Multivariate analysis (MANOVA/MANCOVA)
– Time series analysis.

EpiInfo…
• Epi Info is a suite of free data management, processing and analysis software designed specially for the public health community by the CDC

• Epi Info has been in existence for over 20 years and is currently available for Microsoft Windows.

• The program allows for electronic survey creation, data entry, and analysis.
Epi Info…
Modules of Epi Info
• Form Designer
• StatCalc: sample size calculations for cross-sectional, case-control (CC), RCT and cohort studies
• NutStat: growth charts for children aged 0-5 and 6-15 years
• Analysis module:
– T-tests, ANOVA, nonparametric statistics, etc.
– Cross tabulations, stratified and matched analysis
– Estimates of odds ratios, risk ratios, and risk differences
– Logistic regression (conditional and unconditional)
– Survival analysis (Kaplan-Meier, Cox proportional hazards)
– Complex sample analysis (e.g. of survey data)
• Epi-Map
• Report writers
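The odds ratio, risk ratio and risk difference that the analysis module estimates all come from a 2x2 table. The following is a hand calculation for illustration, not Epi Info's own implementation; the cell labels, the Woolf (log-based) 95% confidence interval and the counts are assumptions of the sketch.

```python
import math

def two_by_two(a, b, c, d):
    """Measures of association for a 2x2 table.

    a = exposed with the outcome,   b = exposed without the outcome,
    c = unexposed with the outcome, d = unexposed without the outcome.
    All four cells must be greater than zero.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    odds_ratio = (a * d) / (b * c)
    # Woolf (log-based) 95% confidence interval for the odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        "odds_ratio": odds_ratio,
        "or_ci_95": (math.exp(log_or - 1.96 * se_log_or),
                     math.exp(log_or + 1.96 * se_log_or)),
    }

result = two_by_two(a=20, b=80, c=10, d=90)  # made-up counts
for name, value in result.items():
    print(name, value)
```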

SPSS
Definition of SPSS

• SPSS is a Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.

Opening SPSS
• You can open SPSS in one of two ways.
1. If there is an SPSS icon on the desktop, simply put the cursor on it and double-click the left mouse button.
2. If not, you can open it by following this path:
Start > All Programs > SPSS 16.0

• When you use SPSS, you work in one of several windows:
• The data view
• Variable view
• Output view
• Draft output view
• The syntax view

The data view
•The data view displays your actual data and any new variables you have created.

The variable view

•At the bottom of the data window, you’ll notice a tab labeled Variable View. The variable view window contains the definitions of each variable in your data set, including its name, type, label, size, alignment, and other information.

Name
–Each variable name must be unique; duplication
is not allowed.
–Start with a letter
–May have up to 64 characters in SPSS 12 and later (earlier versions allowed only 8), including letters, numbers, and the symbols @, #, _, or $.
–Variable names cannot end with a period
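The naming rules above can be turned into a small checker, which is handy before importing a data set. This is a sketch, not SPSS code: the function name `check_names` is made up, and treating names as case-insensitive and allowing a period inside a name are assumptions. The length limit is a parameter because it depends on the SPSS version.

```python
import re

# A leading letter, then letters, digits, @ # _ $ or (assumption) a period.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#_$.]*")

def check_names(names, max_len=8):
    """Return a list of (name, reason) for every name that breaks a rule."""
    problems = []
    seen = set()
    for name in names:
        if not _NAME_PATTERN.fullmatch(name):
            problems.append((name, "must start with a letter and use only allowed characters"))
        elif len(name) > max_len:
            problems.append((name, f"longer than {max_len} characters"))
        elif name.endswith("."):
            problems.append((name, "ends with a period"))
        elif name.lower() in seen:  # assumption: names are compared case-insensitively
            problems.append((name, "duplicate"))
        seen.add(name.lower())
    return problems

names = ["age", "sex", "1stvisit", "bloodpressure", "bp.", "Age"]  # made-up names
for name, reason in check_names(names):
    print(name, "->", reason)
```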

Type
–Here we specify whether a variable is numeric, string or another type
Width
–Width indicates the number of characters
–Maximum width is 40 characters.
Decimals
–If more decimals have been entered or computed by SPSS, the additional information will be retained internally but not displayed on screen.

Label
–Is important for identifying in detail what a variable represents.
–Is limited to 255 characters
–May contain spaces and punctuation
Values
–This is where we assign codes to the categories of categorical variables.

Missing
–Signal to SPSS which data should be treated as
missing
–System-missing data: SPSS displays a single period
Columns
–Columns affect only the display of values in
the Data Editor. Changing the column width
does not change the defined width of a variable.

Measure
–Indicates the level of measurement
–Under this, the measurement scale of each variable is identified. The types of measurement are:
•Nominal
•Ordinal
•Scale (interval or ratio)

Nominal
-Is the simplest type of data, in which all values fall into unordered categories or classes.
-Data that take on one of two distinct values, such as male and female, are said to be dichotomous.
-However, not all nominal data need to be dichotomous. Often there are 3 or 4 possible categories into which the data may fall, for example blood type.

Ordinal
-When the order among categories is important, the observations are referred to as ordinal data.
For example: the blood pressure status of a patient (normal, serious, critical).
Scale data (Interval and Ratio)
-Is the data type in which observations are numeric and the distances between values are meaningful, so they can be arranged from lowest to highest and their differences compared.
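The level of measurement matters because it decides which summary statistic makes sense. A common rule of thumb, sketched below with made-up data, is the mode for nominal data, the median for ordinal data and the mean for scale data; the function name and the ordinal codes (1 = normal, 2 = serious, 3 = critical) are assumptions for the example.

```python
from statistics import mean, median, mode

def summarize(values, level):
    """Pick a summary statistic suited to the level of measurement."""
    if level == "nominal":
        return ("mode", mode(values))
    if level == "ordinal":
        return ("median", median(values))
    if level == "scale":
        return ("mean", mean(values))
    raise ValueError(f"unknown level of measurement: {level}")

blood_type = ["A", "O", "O", "B", "AB", "O"]  # nominal
condition = [1, 2, 2, 3, 1, 2, 2]             # ordinal: 1 = normal, 2 = serious, 3 = critical
systolic = [120, 130, 110, 140]               # scale (mmHg)

print(summarize(blood_type, "nominal"))
print(summarize(condition, "ordinal"))
print(summarize(systolic, "scale"))
```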

The output view
•The output window is where you see the results of your various queries.

The draft view

•The draft view is where you can look at output as it is generated for printing.

The syntax view
•The best method of preserving the exact steps of
a particular analysis is the syntax view.
•In the syntax view, you’ll preserve the code used
to generate any set of tables or charts.
Note
Syntax is basically the actual computer code that
produces a specific output.
