[R] Interpretation of output from glm

Thu Nov 10 15:41:58 CET 2005

Dear Pedro,

The basic point, which relates to the principle of marginality in
formulating linear models, applies whether the predictors are factors,
covariates, or both. I think that this is a common topic in books on linear
models; I certainly discuss it in my Applied Regression, Linear Models, and
Related Methods.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> Sent: Wednesday, November 09, 2005 10:45 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Interpretation of output from glm
> Importance: High
> 
> Dear John,
> 
> Thanks for the quick reply. I did indeed have these ideas, 
> but somehow "floating", and all I could find about this 
> mentioned categorical predictors. Can you suggest a good book 
> where I could try to learn more about this?
> 
> Thanks again,
> 
> Pedro
> At 01:49 09/11/2005, you wrote:
> >Dear Pedro,
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch 
> > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> > > Sent: Tuesday, November 08, 2005 9:47 AM
> > > To: r-help at stat.math.ethz.ch
> > > Subject: [R] Interpretation of output from glm
> > > Importance: High
> > >
> > > I am fitting a logistic model to binary data. The response variable
> > > is a factor (0 or 1) and all predictors are continuous variables.
> > > The main predictor is LT (I expect a logistic relation between LT
> > > and the probability of being mature) and the others are variables I
> > > expect to modify this relation.
> > >
> > > I want to test whether all predictors contribute significantly to the
> > > fit. I fit the full model and get these results:
> > >
> > >  > summary(HMMaturation.glmfit.Full)
> > >
> > > Call:
> > > glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
> > >      family = binomial(link = "logit"), data = HMIndSamples)
> > >
> > > Deviance Residuals:
> > >      Min       1Q   Median       3Q      Max
> > > -3.0983  -0.7620   0.2540   0.7202   2.0292
> > >
> > > Coefficients:
> > >                Estimate Std. Error z value Pr(>|z|)
> > > (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
> > > LT           5.372e-02  1.798e-02   2.987  0.00281 **
> > > CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
> > > Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
> > > LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
> > > LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > (Dispersion parameter for binomial family taken to be 1)
> > >
> > >      Null deviance: 10272.4  on 8224  degrees of freedom
> > > Residual deviance:  7185.8  on 8219  degrees of freedom
> > > AIC: 7197.8
> > >
> > > Number of Fisher Scoring iterations: 8
> > >
> > > However, when I run anova on the fit, I get
> > >
> > >  > anova(HMMaturation.glmfit.Full, test='Chisq')
> > > Analysis of Deviance Table
> > >
> > > Model: binomial, link: logit
> > >
> > > Response: Mature
> > >
> > > Terms added sequentially (first to last)
> > >
> > >
> > >             Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> > > NULL                        8224    10272.4
> > > LT          1   2873.8      8223     7398.7       0.0
> > > CondF       1      0.1      8222     7398.5       0.7
> > > Biom        1      0.2      8221     7398.3       0.7
> > > LT:CondF    1    142.1      8220     7256.3 9.413e-33
> > > LT:Biom     1     70.4      8219     7185.8 4.763e-17
> > > Warning message:
> > > fitted probabilities numerically 0 or 1 occurred in: method(x = x[,
> > > varseq <= i, drop = FALSE], y = object$y, weights = object$prior.weights,
> > >
> > >
> > > I am having a little difficulty interpreting these results.
> > > The result from the fit tells me that all predictors are
> > > significant, while the anova indicates that besides LT (the main
> > > variable), only the interaction of the other terms is significant,
> > > but the main effects are not.
> > > I believe that in the first output (on the glm object), the
> > > significance of all terms is calculated considering each of them
> > > alone in the model (i.e. removing all other terms), while the anova
> > > output is (as it says) considering the sequential addition of the terms.
> > >
> > > So, there are 2 questions:
> > > a) Can I tell that the interactions are significant, but not the 
> > > main effects?
> >
> >In a model with this structure, the "main effects" represent slopes
> >over the origin (i.e., where the other variables in the product terms
> >are 0), and aren't meaningfully interpreted as main effects. (Is there
> >even any data near the origin?)
> >
> > > b) Is it legitimate to consider a model where the interactions are
> > > considered, but not the main effects CondF and Biom?
> >
> >Generally, no: That is, such a model is interpretable, but it places 
> >strange constraints on the regression surface -- that the CondF and 
> >Biom slopes are 0 over the origin.
> >
> >None of this is specific to logistic regression -- it applies generally
> >to generalized linear models, including linear models.
> >
> >I hope this helps,
> >  John
> 
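
The point about the origin made in the reply above can be sketched in R. This is only an illustration on simulated data, not the HMIndSamples data from the thread. The variable names are borrowed from the original post; the sample size, the predictor ranges, and the coefficients used to generate the response are all assumptions made up for the example. In the full model, centring the predictors changes the CondF and Biom coefficients but leaves the interaction coefficients and the deviance unchanged. A model that drops the CondF and Biom main effects, by contrast, fits differently depending on where the origin is.

```r
# Simulated data only: names follow the thread, values are invented.
set.seed(1)
n <- 2000
dat <- data.frame(
  LT    = runif(n, 20, 60),     # assumed range, far from 0
  CondF = rnorm(n, 100, 10),    # assumed scale, far from 0
  Biom  = runif(n, 50, 300)     # assumed range, far from 0
)

# Invented linear predictor, written around the centre of the data
lt <- dat$LT - 40; cf <- dat$CondF - 100; bm <- dat$Biom - 175
eta <- 0.5 + 0.15 * lt + 0.02 * cf + 0.005 * bm +
  0.002 * lt * cf + 0.0005 * lt * bm
dat$Mature <- rbinom(n, 1, plogis(eta))

# The same data with every predictor centred at its sample mean
cen <- transform(dat,
                 LT    = LT    - mean(LT),
                 CondF = CondF - mean(CondF),
                 Biom  = Biom  - mean(Biom))

full    <- Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom
reduced <- Mature ~ LT + LT:CondF + LT:Biom   # no CondF, Biom main effects
ctrl    <- glm.control(epsilon = 1e-10, maxit = 50)

fit_raw <- glm(full, family = binomial, data = dat, control = ctrl)
fit_cen <- glm(full, family = binomial, data = cen, control = ctrl)
red_raw <- glm(reduced, family = binomial, data = dat, control = ctrl)
red_cen <- glm(reduced, family = binomial, data = cen, control = ctrl)

# Full model: the "main effect" rows differ, the interaction rows agree
print(cbind(raw = coef(fit_raw), centred = coef(fit_cen)))

# Full model: same fit either way
print(c(raw = deviance(fit_raw), centred = deviance(fit_cen)))

# The centred CondF coefficient is the raw CondF slope evaluated at mean(LT)
print(c(centred_CondF = unname(coef(fit_cen)["CondF"]),
        raw_at_meanLT = unname(coef(fit_raw)["CondF"] +
                               coef(fit_raw)["LT:CondF"] * mean(dat$LT))))

# Reduced model: the fit now depends on the choice of origin
print(c(raw = deviance(red_raw), centred = deviance(red_cen)))
```

The reduced model forces the CondF and Biom slopes to be 0 wherever LT is 0, so moving the origin changes which constraint is imposed. That is why the two reduced fits need not agree, while the two full fits do.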