Advanced Statistical Methods
David Dayya, D.O., M.P.H. Dept. of Family Medicine Saint Barnabas Hospital
Objectives
Overview of advanced statistical methods used in medicine
Kaplan-Meier Curves in Survival Analysis
ANOVA, ANCOVA, MANOVA
Ad Hoc Analysis
Simple Linear, Multiple Linear, Nonlinear and Logistic Regression
1. The populations from which the samples are drawn are (approximately) normally distributed.
2. The populations from which the samples are drawn have the same variance (or standard deviation).
3. The samples drawn from different populations are random and independent.
Sum Of Squares
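To calculate MSB and MSW, we first compute the between-groups sum of squares, denoted by SSB, and the within-groups sum of squares, denoted by SSW. The sum of SSB and SSW is called the total sum of squares and is denoted by SST; that is,

$$SST = SSB + SSW$$

$$SSB = \left( \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots \right) - \frac{(\sum x)^2}{n}$$

$$SSW = \sum x^2 - \left( \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots \right)$$

where $T_i$ is the sum of the values in sample $i$, $n_i$ is the size of sample $i$, $n$ is the total number of values in all samples, and $\sum x$ is the sum of all values.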
Degrees of Freedom
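The between-groups degrees of freedom are $k - 1$ and the within-groups degrees of freedom are $n - k$, where $k$ is the number of groups and $n$ is the total number of observations. The total degrees of freedom are $n - 1$.

$$MSB = \frac{SSB}{k - 1} \qquad MSW = \frac{SSW}{n - k} \qquad F = \frac{MSB}{MSW}$$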
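The sums of squares, mean squares, and F ratio for a one-way ANOVA can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own worked example: the function name `one_way_anova` and the three small groups of scores are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """Compute SSB, SSW, SST, MSB, MSW and F for a list of samples."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    sum_x = sum(sum(g) for g in groups)   # sum of all values
    sum_x2 = sum(x * x for g in groups for x in g)  # sum of squared values

    # T_i^2 / n_i summed over the groups (T_i is the total of group i)
    group_term = sum(sum(g) ** 2 / len(g) for g in groups)

    ssb = group_term - sum_x ** 2 / n     # between-groups sum of squares
    ssw = sum_x2 - group_term             # within-groups sum of squares
    msb = ssb / (k - 1)                   # between-groups mean square
    msw = ssw / (n - k)                   # within-groups mean square
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical scores for three groups (illustration only)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
print(result)
```

The computed F is then compared with the critical value of F for $k - 1$ and $n - k$ degrees of freedom at the chosen significance level; the null hypothesis of equal means is rejected when F exceeds that critical value.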
Solution 12-3
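$H_0: \mu_1 = \mu_2 = \mu_3$ (the mean scores of the three groups are equal)
Figure: F distribution curve with $\alpha = .05$ in the right tail; critical value of $F = 3.16$. Do not reject $H_0$ to the left of the critical value; reject $H_0$ to the right of it.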
Simple Regression
Definition: A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.
Linear Regression
Definition: A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.
Figure: Relationship between blood pressure (vertical axis) and BMI (horizontal axis). (a) Linear relationship. (b) Nonlinear relationship.
Figure: Plot of the line $y = 50 + 5x$, passing through $(x = 0,\ y = 50)$ and $(x = 10,\ y = 100)$. A change of 1 in x produces a change of 5 in y.
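In the linear model $y = A + Bx$, $A$ is the y-intercept and $B$ is the slope; $y$ is the dependent variable and $x$ is the independent variable.
The same straight line appears in several notations: $y = mx + b$, $y = A + Bx$, and, for the line estimated from sample data, $\hat{y} = a + bx$.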
Scatter Diagram
Definition: A plot of paired observations is called a scatter diagram.
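Figure: Scatter diagram of blood pressure (vertical axis) against BMI (horizontal axis).
Figure: Scatter diagram with the regression line drawn through the points; $e$ marks the vertical distance between an observed point and the regression line.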
$$SSE = \sum e^2 = \sum (y - \hat{y})^2$$
The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.
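$$b = \frac{SS_{xy}}{SS_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \qquad \text{and} \qquad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

and SS stands for sum of squares. The least squares regression line $\hat{y} = a + bx$ is also called the regression of y on x.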
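As a hedged illustration, the least squares estimates can be computed directly from the sums $\sum x$, $\sum y$, $\sum xy$ and $\sum x^2$. The function name `least_squares_line` and the five BMI and systolic blood pressure pairs below are hypothetical and are not taken from the lecture.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)

    # SS_xy = sum(xy) - (sum x)(sum y)/n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SS_xx = sum(x^2) - (sum x)^2/n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n

    b = ss_xy / ss_xx                     # slope
    a = sum_y / n - b * (sum_x / n)       # intercept: y_bar - b * x_bar
    return a, b


# Hypothetical data: BMI and systolic blood pressure (mmHg)
bmi = [20, 25, 30, 35, 40]
sbp = [110, 120, 125, 135, 145]
a, b = least_squares_line(bmi, sbp)
print(f"y_hat = {a:.1f} + {b:.1f} x")
```

For these made-up values, $SS_{xy} = 425$ and $SS_{xx} = 250$, so the slope is 1.7 and the intercept is $127 - 1.7 \times 30 = 76$.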
Interpretation of a and b
Interpretation of b: The value of b in the regression model gives the change in y due to a change of one unit in x. We can state, for example, that on average a 5% increase in BMI will increase the SBP by 20 mmHg.
Figure: Regression lines with a positive slope ($b > 0$) and a negative slope ($b < 0$).
Error Distribution
$E(\epsilon) = 0$
Figure: Distribution of the errors around the regression line at $x = 20$ and $x = 35$ (horizontal axis: BMI), panels (a) and (b).
$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}}$$

where

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$
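The standard deviation of errors, $s_e = \sqrt{(SS_{yy} - b\,SS_{xy})/(n - 2)}$, can be sketched as follows. The function name `standard_error_of_estimate` and the BMI and blood pressure values are hypothetical, used only to make the steps concrete.

```python
import math


def standard_error_of_estimate(x, y):
    """Return s_e = sqrt((SS_yy - b*SS_xy) / (n - 2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n

    b = ss_xy / ss_xx                     # least squares slope
    sse = ss_yy - b * ss_xy               # error sum of squares
    return math.sqrt(sse / (n - 2))       # n - 2 degrees of freedom


# Hypothetical data: BMI and systolic blood pressure (mmHg)
bmi = [20, 25, 30, 35, 40]
sbp = [110, 120, 125, 135, 145]
s_e = standard_error_of_estimate(bmi, sbp)
print(round(s_e, 4))
```

The divisor is $n - 2$ rather than $n$ because two quantities, a and b, are estimated from the sample before the errors are computed.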
Coefficient Of Determination
Total Sum of Squares (SST) The total sum of squares, denoted by SST, is calculated as
$$SST = \sum y^2 - \frac{(\sum y)^2}{n}$$
Figure: Total errors, shown as deviations of the observed values from the horizontal line $\bar{y} = 9.1429$ (horizontal axis: BMI).
Regression Sum of Squares (SSR): The regression sum of squares, denoted by SSR, is

$$SSR = SST - SSE$$
Coefficient of Determination: The coefficient of determination, denoted by $r^2$, represents the proportion of SST that is explained by the use of the regression model. The computational formula for $r^2$ is

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}}$$

and

$$0 \le r^2 \le 1$$
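A short sketch shows how SST, SSR, SSE and $r^2$ relate to one another, using $SST = SS_{yy}$, $SSR = b\,SS_{xy}$ and $SSE = SST - SSR$. The function name `coefficient_of_determination` and the data are hypothetical.

```python
def coefficient_of_determination(x, y):
    """Return SST, SSR, SSE and r^2 for a simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n

    b = ss_xy / ss_xx       # least squares slope
    sst = ss_yy             # total sum of squares
    ssr = b * ss_xy         # regression (explained) sum of squares
    sse = sst - ssr         # error (unexplained) sum of squares
    return {"SST": sst, "SSR": ssr, "SSE": sse, "r2": ssr / sst}


# Hypothetical data: BMI and systolic blood pressure (mmHg)
bmi = [20, 25, 30, 35, 40]
sbp = [110, 120, 125, 135, 145]
fit = coefficient_of_determination(bmi, sbp)
print(fit)
```

With these made-up values, SST is 730 and SSR is 722.5, so roughly 99% of the total variation in blood pressure is explained by the regression on BMI.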
Inferences About B
Sampling Distribution of b
Estimation of B
Hypothesis Testing About B
Sampling Distribution of b
Mean, Standard Deviation, and Sampling Distribution of b: The mean and standard deviation of b, denoted by $\mu_b$ and $\sigma_b$, respectively, are

$$\mu_b = B \qquad \text{and} \qquad \sigma_b = \frac{\sigma_\epsilon}{\sqrt{SS_{xx}}}$$
Estimation of B
Confidence Interval for B: The $(1 - \alpha)100\%$ confidence interval for B is given by

$$b \pm t\,s_b$$

where

$$s_b = \frac{s_e}{\sqrt{SS_{xx}}}$$
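The interval $b \pm t\,s_b$ can be sketched as below. This is an illustration under stated assumptions: the function name `slope_confidence_interval` and the data are hypothetical, and the critical value is passed in as a parameter, read from a t table for $n - 2$ degrees of freedom (3.182 for a 95% interval with 3 degrees of freedom).

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) limits of b +/- t * s_b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n

    b = ss_xy / ss_xx                                # least squares slope
    s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))   # std. deviation of errors
    s_b = s_e / math.sqrt(ss_xx)                     # std. deviation of b
    margin = t_crit * s_b
    return b - margin, b + margin


# Hypothetical data: BMI and systolic blood pressure (mmHg)
bmi = [20, 25, 30, 35, 40]
sbp = [110, 120, 125, 135, 145]

# t table value for 95% confidence and n - 2 = 3 degrees of freedom
lower, upper = slope_confidence_interval(bmi, sbp, t_crit=3.182)
print(round(lower, 4), round(upper, 4))
```

If the interval excludes zero, as it does for these made-up values, the data are consistent with a nonzero slope at the chosen confidence level.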
Figure: Population regression line, $\mu_{y|x} = A + Bx$.
Interval for the mean value of y at a given x: $\hat{y} \pm t\,s_{\hat{y}_m}$
Model Construction
Solve for the constant and the coefficients using the differential calculus of matrices. Use the model to predict blood pressure given the values of a set of variables of interest, e.g. BMI, age, and number of cigarettes smoked.
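Setting the derivative of the error sum of squares with respect to the coefficient vector to zero gives the normal equations $(X^T X)\beta = X^T y$. The sketch below solves them with plain Gaussian elimination. It is a minimal illustration, not the lecture's own procedure: the function names, the six patients, and the coefficients used to generate the blood pressures are all hypothetical, and the data are constructed without noise so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [constant, coef_1, ...] from the normal equations."""
    x = [[1.0] + list(r) for r in rows]              # column of 1s for constant
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, values):
    """Predicted y for one set of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], values))


# Hypothetical patients: (BMI, age, cigarettes smoked per day)
patients = [(22, 30, 0), (27, 45, 10), (31, 50, 5),
            (24, 60, 20), (35, 40, 0), (29, 55, 15)]
# Hypothetical coefficients used only to generate noise-free pressures
true_coefs = [60.0, 1.5, 0.4, 0.5]
sbp = [predict(true_coefs, p) for p in patients]

coefs = fit_multiple_regression(patients, sbp)
predicted = predict(coefs, (28, 50, 10))
print([round(c, 3) for c in coefs], round(predicted, 1))
```

With real data the fit would not be exact, and statistical software would normally be used in place of hand-written elimination; the point here is only that the constant and coefficients come from solving one linear system.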
Review
Know your data types, i.e. nominal, ordinal, interval, and ratio; parametric and nonparametric; continuous and discrete.
Be able to identify the type of study, i.e. cohort, case-control, cross-sectional, intervention trial, cross-over designs.
Recognize blinding, systematic error, confounders, sources of bias and their significance.
Understand type 1 and type 2 errors, alpha, beta, and power.
Understand the general principles around sensitivity, specificity, positive and negative predictive values, and the effect of prevalence.
Understand epidemiologic principles and basic definitions, i.e. incidence, prevalence, causation theory, levels of evidence.
Know how to interpret a relative risk ratio or odds ratio, attributable risk, and the types of studies they can be applied to.
Know what the hypothesis and null hypothesis mean.
Understand why we use a kappa statistic and how to interpret it.
Understand the principles behind a Kaplan-Meier curve and how to interpret it in survival analysis.
Understand how to interpret a p-value, an alpha, or a confidence interval of an effect size or a risk ratio.
Understand the general principles behind literature search techniques.
Understand when a multivariate or a bivariate hypothesis test is required and be able to recognize whether or not the author used the correct test.
Recognize and interpret the measures of central tendency and dispersion in the data.
Differentiate between population and sample data.
Understand the concepts of inclusion/exclusion criteria.
Understand the concepts of validity, reliability, accuracy and precision.
Understand and interpret the graphical representation of data.
Understand the limitations and strengths of prospective vs. retrospective studies.
Non-parametric
Nominal Data, One Sample Test: Binomial (binomial equation or Z approximation); Chi-Square Goodness of Fit Test
Nominal Data, K Sample Test, Related Samples: Cochran Q
K Sample Test, Unrelated Samples: Kruskal-Wallis
Interval Data
One Sample Test: Z test
Two Sample Test: Paired t (related samples); Unpaired t (unrelated samples)
K Sample Test, Related Samples: Randomized Block Analysis of Variance
K Sample Test, Unrelated Samples: Analysis of Variance (ANOVA), followed by Tukey or SNK
Evidence Pyramid
Cases and controls are selected from a medical facility or facilities, the community, or the general population.
Subjects are diseased at the onset of the study.
Can select incident cases or prevalent cases.
Bias is more common.
Suited to rare diseases.
Less cost.
Less difficult logistics.
Ethical considerations.
Interpreting the Medical Literature Outline
Resident Name_________________________ Date_________________________________
Citation:
When reading the assigned article, consider these questions for discussion at the Journal Club meeting.
General Considerations
1. Is the title of the article consistent with the content of the article?
2. What were the authors' conclusions and how strongly were they worded?
3. Did the research question warrant doing a study on this topic (i.e. is the study unnecessary, or does it have clinical or practical significance)?
Systematic Design Considerations
4. What were the dependent (outcome) variables? Were they clearly defined and adequately measured?
5. What were the independent (exposure/intervention/predictor) variables? Were they clearly defined and adequately measured?
6. What was the design of the study? Was there an adequate control group, blinding, randomization? Were confounders balanced or excluded in the design?
7. If the authors' conclusions are correct, to whom can they be generalized based on the sample selected?
Statistical Considerations
8. Were any associations established and, if so, what was the strength of the associations? Was there statistical significance?
9. Was the statistical test used to determine statistical significance appropriate and correctly interpreted?
10. What do you estimate was the potential for type 1 or type 2 error in the study?
11. Were the authors' conclusions justified based on your assessment of the strengths and weaknesses of this study?
12. How could the study design have been improved?