Log Linear Notes
The classical log-linear models are tools for analyzing relationships among two or more categorical
variables. They are based on multi-dimensional joint frequency tables. In the sample, each cell in such a
table contains the number of cases with a particular combination of values of the variables. In the
population, each cell in our multi-dimensional table contains a probability -- the probability of selecting a
case with that combination of values. This table is exactly the joint probability distribution of the variables
in the analysis.
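For example (a made-up illustration), with two binary variables the population table might look like this, the four probabilities adding up to one:

                      Variable B = 1    Variable B = 2
    Variable A = 1         0.10              0.40
    Variable A = 2         0.15              0.35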
If you multiply the probabilities by the size of the sample, you get expected frequencies. A "log-linear
model" is a statistical model for the natural logarithm of the expected frequency. It looks like a multiple
regression model with effect coding, in which the interaction terms correspond to associations among
variables. If the variables are unrelated (conditionally upon the values of other variables in the model), the
interaction terms are missing. The terms corresponding to main effects represent departures from equal
marginal frequencies.
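Written out for two hypothetical variables A (levels indexed by i) and B (levels indexed by j), the saturated version of such a model is, in the standard notation,

    \log \mu_{ij} = \lambda + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij}

where \mu_{ij} is the expected frequency in cell (i,j), the effect-coded lambda terms sum to zero over each subscript, the single-variable terms are the main effects, and the \lambda^{AB}_{ij} terms carry the association between A and B. Setting all the \lambda^{AB}_{ij} equal to zero gives the model in which A and B are independent.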
The model with terms corresponding to all possible main effects and interactions is called the saturated
model. In multiple regression, such a model would be estimated from the data with error. In loglinear
modelling, the saturated model always fits the data perfectly. There are as many terms in the model as
there are cell frequencies, the relationship between them is one-to-one, and that's it.
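As a quick check on that claim, consider a hypothetical 2 by 3 by 4 table. It has 24 cells, and the saturated model has 1 (constant) + 1 + 2 + 3 (main effects) + 2 + 3 + 6 (two-way terms) + 6 (three-way term) = 24 parameters.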
So you can't test the saturated model, but you can estimate a non-saturated model with software, and test
the difference between that one and the saturated model. Such tests are often called "goodness of fit" tests,
because they tell you whether the model in question is significantly worse than the saturated (perfect)
model. There are several good ways to conduct goodness of fit tests, but we will confine ourselves to
likelihood ratio tests. We really are testing the difference between a reduced model and a full model. In
this case, the full model is the saturated model.
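These tests come straight out of software (the printout from SAS proc catmod is mentioned later). Purely as an illustration of the mechanics -- not the approach used in these notes -- here is a minimal sketch in Python, assuming the pandas, statsmodels and scipy packages and a made-up 2 by 2 by 2 table. It exploits the fact that a log-linear model can be fit as a Poisson regression on the cell counts, in which case the residual deviance is the likelihood ratio statistic for testing the fitted reduced model against the saturated one; the p-value comes from the chi-square approximation described next.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # One row per cell of a hypothetical 2 x 2 x 2 table: the levels of Variables 1-3
    # and the observed frequency in that cell.
    cells = pd.DataFrame({
        "v1": [1, 1, 1, 1, 2, 2, 2, 2],
        "v2": [1, 1, 2, 2, 1, 1, 2, 2],
        "v3": [1, 2, 1, 2, 1, 2, 1, 2],
        "n":  [35, 20, 15, 30, 22, 18, 27, 33],
    })

    # Reduced model of complete independence (main effects only; in the bracket
    # notation introduced below, this is the model [1] [2] [3]).
    fit = smf.glm("n ~ C(v1) + C(v2) + C(v3)", data=cells,
                  family=sm.families.Poisson()).fit()

    # The deviance is the likelihood ratio statistic for this model against the
    # saturated model, and df_resid is the corresponding degrees of freedom.
    g2, df = fit.deviance, fit.df_resid
    print(g2, df, chi2.sf(g2, df))   # goodness-of-fit test of the reduced model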
If the reduced model is true (that's the null hypothesis), the likelihood ratio statistic (minus two times the
natural log of the ratio of the two likelihood functions, each evaluated at its MLE, if you know what that
means) has a distribution that approaches a chi-square distribution as the sample size increases. The degrees
of freedom are the number of parameters that are in the full model but not in the reduced one.
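In symbols (standard notation, not specific to these notes): if n_c is the observed frequency in cell c and \hat{\mu}_c is the expected frequency estimated under the reduced model, the statistic for testing the reduced model against the saturated model is

    G^2 = 2 \sum_c n_c \log( n_c / \hat{\mu}_c ),

referred to the chi-square distribution with the degrees of freedom just described.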
Often, the tests that are really interesting can be expressed as the difference between the saturated model and
a carefully chosen reduced model. You are testing (simultaneously) all the associations among variables
that are absent in the reduced model. Sometimes, especially when four or more variables are involved,
what you are interested in may correspond to the difference between a reduced model and an even more
reduced one. In this case, the difference between -2 log(likelihood) of the two models will have a chi-
square distribution with df equal to the difference in degrees of freedom -- provided that the terms in the
more reduced model are a subset of the terms in the less reduced one. That is, the models have to be
nested in order for the large-sample likelihood ratio tests to be valid. If you do this, please at least be sure
that your less restricted model (that's your new "full" model that's not the saturated model) fits the data
fairly well. At the very least, it should not fit significantly worse than the saturated model.
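If you are using the Poisson-regression sketch from above, the comparison of two nested fits is nothing more than a difference of deviances and a difference of residual degrees of freedom. A small helper (again just an illustration, not part of these notes):

    from scipy.stats import chi2

    def compare_nested(full_fit, reduced_fit):
        # Likelihood ratio test of a reduced log-linear model against a fuller model
        # whose terms include all the terms of the reduced one (both fitted as Poisson GLMs).
        g2 = reduced_fit.deviance - full_fit.deviance    # difference in -2 log likelihood
        df = reduced_fit.df_resid - full_fit.df_resid    # difference in degrees of freedom
        return g2, df, chi2.sf(g2, df)                   # statistic, df, p-value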
In order to avoid ambiguities and tricky problems interpreting results, we will confine ourselves to
hierarchical log-linear models. Hierarchical means that if an effect is present in a model, then all the
lower-order effects that make it up must also be in the model. For example, if a model contains a three-way
A by B by C association, then it must also contain A, B, C, A by B, A by C and B by C.
For hierarchical models, there is a very convenient bracket notation for expressing association and lack of
association among variables, especially if the variables can all be represented with single symbols like
letters or numbers. Just enclose sets of variables that are associated within the same set of brackets. A
variable that does not appear at all has equal marginal frequencies. For example,
° For three variables numbered one through three, the model [1] [2] [3] allows each variable to
have unequal marginal frequencies, but it contains no relationships among variables. It is a model of
complete independence. If the test for goodness of fit is significant, the conclusion is that there is some
relationship among variables. This is not a bad place to start in any analysis.
° The model [1,2] [1,3] [2,3] allows for lack of independence in each of the three two-way
marginal tables. Because the model is hierarchical, the three single-variable terms are implicitly present.
The only term that is missing from this model is the three-factor relationship 1*2*3, so the test for
goodness of fit is a test for the three-way association -- equivalently, a test of whether the relationship
between each pair of variables is the same at every level of the remaining variable.
° The model [1,2] [1,3] says that the only thing going on is (possibly) a relationship between
Variables 1 and 2, and a relationship between 1 and 3. Any apparent relationship between 2 and 3 arises
from the fact that they are both related to 1. This is a model of conditional independence. That is,
conditionally on (controlling for) the value of Variable 1, Variables 2 and 3 are unrelated.
You can get the test statistic for this model another way. Produce separate two-way tables of Variable 2 by
Variable 3 -- one for each value of Variable 1. This is the subdivision approach to controlling for Variable
1. Add the chi-square values for testing independence in the sub-tables. Under the null hypothesis that
Variables 2 and 3 are independent for each fixed value of Variable 1, the sum of chi-squares has a chi-square
distribution, with degrees of freedom equal to the sum of degrees of freedom from the sub-tests. If you
add likelihood ratio chi-squares, you get the standard test of conditional independence for loglinear models.
But adding Pearson chi-squares is valid too.
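Here is the same calculation on the hypothetical 2 by 2 by 2 table from the Poisson-regression sketch, using scipy (chi2_contingency with lambda_="log-likelihood" gives the likelihood ratio version of each sub-table test; drop that argument for the Pearson version):

    import numpy as np
    from scipy.stats import chi2, chi2_contingency

    # Variable 2 by Variable 3 sub-tables, one for each value of Variable 1
    # (the same made-up counts as in the earlier sketch).
    subtables = [np.array([[35, 20], [15, 30]]),    # Variable 1 = 1
                 np.array([[22, 18], [27, 33]])]    # Variable 1 = 2

    total_g2, total_df = 0.0, 0
    for tab in subtables:
        # correction=False: no Yates continuity correction, so the statistics add up cleanly
        g2, p, df, _ = chi2_contingency(tab, correction=False, lambda_="log-likelihood")
        total_g2 += g2
        total_df += df

    # The sum of likelihood ratio chi-squares is the test of conditional independence.
    print(total_g2, total_df, chi2.sf(total_g2, total_df))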
By the way, suppose the test just described is significant. To see where the effect comes from, try looking
one-at-a-time at the chi-square statistics you just added up. It would be best to apply a Bonferroni
correction.
This is a good way to slice up a test for conditional independence, but it is not the only good way. The
model [1,2] [1,3] lacks two terms that are present in the saturated model. They are 2*3 and 1*2*3. If we
added just the first one to the model [1,2] [1,3], we would get [1,2] [1,3] [2,3]. This says that Variables 2
and 3 may be related, but if so they are related in the same way for all values of Variable 1. To test for
this limited form of dependence of 2 and 3, use [1,2] [1,3] [2,3] as the full model and [1,2] [1,3] as the
reduced model. Again, the test statistic is the difference in -2 times the log likelihood for the two models,
distributed as chi-square with df equal to the difference in degrees of freedom. The numbers are quite easy
to locate on the printout from SAS proc catmod. By the way, this all works out because the two models are
nested. The terms in [1,2] [1,3] are a subset of the terms in [1,2] [1,3] [2,3].
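In the Poisson-regression sketch (using the made-up data frame `cells` built earlier), the pair of models and the difference test look like this:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # `cells` is the hypothetical one-row-per-cell data frame from the earlier sketch.
    full    = smf.glm("n ~ (C(v1) + C(v2) + C(v3))**2", data=cells,    # [1,2] [1,3] [2,3]
                      family=sm.families.Poisson()).fit()
    reduced = smf.glm("n ~ C(v1)*C(v2) + C(v1)*C(v3)", data=cells,     # [1,2] [1,3]
                      family=sm.families.Poisson()).fit()

    g2 = reduced.deviance - full.deviance    # difference in -2 log likelihood
    df = reduced.df_resid - full.df_resid    # difference in df: (2-1)(2-1) = 1 here
    print(g2, df, chi2.sf(g2, df))           # test of the 2*3 association, given no 1*2*3 term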
° Suppose you want to test association between two sets of variables. For example, suppose
Variables 1 through 4 represent employment history, and Variables 5 through 8 represent employment
history of the parent (of the same sex). The model to test against the saturated model is [1234] [5678].
This model says that the employment history variables may be related any way at all, and parental
employment history variables may be related any way at all, but the employment history variables are
completely independent of the parental employment history variables.
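In the formula notation of the Poisson-regression sketch, this model just crosses everything within each set of variables and nothing across the two sets (hypothetical variable names):

    # Bracket model [1234] [5678]: all possible associations within each block of
    # variables, and complete independence between the two blocks.
    formula_two_blocks = "n ~ C(v1)*C(v2)*C(v3)*C(v4) + C(v5)*C(v6)*C(v7)*C(v8)"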
In the theory of log-linear models, there is no distinction between independent variables and dependent
variables. But there are some situations where you want to make the distinction. Say, two or more of the
variables are randomly assigned. It is possible to develop a separate theory for this situation, but it turns out
that as long as the reduced model contains all possible associations among independent variables (even if
they are set up to be unrelated), classical loglinear models yield exactly the same test statistics. This leads
to the following simple rule. If you want to make a distinction between independent and dependent
variables, just include all possible associations among independent variables in the model.
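For example, suppose Variables 1 and 2 are randomly assigned treatments (the independent variables) and Variables 3 and 4 are the outcomes. Then every model you entertain should contain the [1,2] term, and a natural first test compares [1,2] [3,4] to the saturated model; it asks whether the outcome variables, taken together, are related to the treatments at all.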