[R] Interpretation of output from glm

John Fox jfox at mcmaster.ca
Thu Nov 10 15:41:58 CET 2005

Dear Pedro,

The basic point, which relates to the principle of marginality in
formulating linear models, applies whether the predictors are factors,
covariates, or both. I think that this is a common topic in books on linear
models; I certainly discuss it in my Applied Regression, Linear Models, and
Related Methods.
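
A small simulated illustration of the marginality point. The data and the names g, x and y are hypothetical, not Pedro's. With a factor g and a covariate x, the coefficient labelled as the "main effect" of g in y ~ g * x is the group difference at x = 0. Moving the origin of x therefore changes that coefficient, and its test, while leaving the interaction coefficient and the fitted values unchanged. The sketch uses lm(); the same algebra applies to the linear predictor of a glm().

```r
set.seed(1)
n <- 200
g <- factor(sample(c("a", "b"), n, replace = TRUE))  # hypothetical two-level factor
x <- runif(n, 10, 20)                                 # no observations near x = 0
y <- 1 + 0.5 * x + (g == "b") * (2 - 0.3 * x) + rnorm(n)

fit_raw      <- lm(y ~ g * x)
fit_centered <- lm(y ~ g * I(x - 15))  # same model, origin of x moved to 15

# Same interaction coefficient and same fitted values ...
coef(fit_raw)[["gb:x"]]
coef(fit_centered)[["gb:I(x - 15)"]]
all.equal(fitted(fit_raw), fitted(fit_centered))

# ... but the "main effect" of g is the b - a difference at the origin of x,
# so it shifts by 15 times the interaction coefficient
coef(fit_raw)[["gb"]]
coef(fit_centered)[["gb"]]
```
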

Regards,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> Sent: Wednesday, November 09, 2005 10:45 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Interpretation of output from glm
> Importance: High
>
> Dear John,
>
> Thanks for the quick reply. I did indeed have these ideas,
> mentioned categorical predictors. Can you suggest a good book?
>
> Thanks again,
>
> Pedro
> At 01:49 09/11/2005, you wrote:
> >Dear Pedro,
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de
> > > Barros
> > > Sent: Tuesday, November 08, 2005 9:47 AM
> > > To: r-help at stat.math.ethz.ch
> > > Subject: [R] Interpretation of output from glm
> > > Importance: High
> > >
> > > I am fitting a logistic model to binary data. The response variable
> > > is a factor (0 or 1) and all predictors are continuous variables.
> > > The main predictor is LT (I expect a logistic relation between LT
> > > and the probability of being mature), and the others are variables
> > > I expect to modify this relation.
> > >
> > > I want to test whether all predictors contribute significantly to the
> > > fit or not. I fit the full model and get these results:
> > >
> > >  > summary(HMMaturation.glmfit.Full)
> > >
> > > Call:
> > > glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
> > >      family = binomial(link = "logit"), data = HMIndSamples)
> > >
> > > Deviance Residuals:
> > >      Min       1Q   Median       3Q      Max
> > > -3.0983  -0.7620   0.2540   0.7202   2.0292
> > >
> > > Coefficients:
> > >                Estimate Std. Error z value Pr(>|z|)
> > > (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
> > > LT           5.372e-02  1.798e-02   2.987  0.00281 **
> > > CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
> > > Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
> > > LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
> > > LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > (Dispersion parameter for binomial family taken to be 1)
> > >
> > >     Null deviance: 10272.4  on 8224  degrees of freedom
> > > Residual deviance:  7185.8  on 8219  degrees of freedom
> > > AIC: 7197.8
> > >
> > > Number of Fisher Scoring iterations: 8
> > >
> > > However, when I run anova on the fit, I get
> > >
> > >  > anova(HMMaturation.glmfit.Full, test='Chisq')
> > > Analysis of Deviance Table
> > >
> > > Model: binomial, link: logit
> > >
> > > Response: Mature
> > >
> > > Terms added sequentially (first to last)
> > >
> > >
> > >             Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> > > NULL                        8224    10272.4
> > > LT          1   2873.8      8223     7398.7       0.0
> > > CondF       1      0.1      8222     7398.5       0.7
> > > Biom        1      0.2      8221     7398.3       0.7
> > > LT:CondF    1    142.1      8220     7256.3 9.413e-33
> > > LT:Biom     1     70.4      8219     7185.8 4.763e-17
> > > Warning message:
> > > fitted probabilities numerically 0 or 1 occurred in: method(x = x[,
> > > varseq <= i, drop = FALSE], y = object$y, weights =
> > > object$prior.weights,
> > >
> > >
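
For reference, the P(>|Chi|) column can be recovered from the Deviance column. Each row is a likelihood-ratio test of one term (1 df) given only the terms listed above it, so the CondF row compares LT + CondF with LT alone. The sketch below uses the rounded values printed in the table, so the p-values agree with it only approximately.

```r
# Deviance reductions copied from the sequential table above (rounded to 0.1)
dev <- c(LT = 2873.8, CondF = 0.1, Biom = 0.2,
         "LT:CondF" = 142.1, "LT:Biom" = 70.4)

# Upper-tail chi-square probability (1 df) for each sequential test
p_seq <- pchisq(dev, df = 1, lower.tail = FALSE)
signif(p_seq, 3)
```
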
> > > I am having a little difficulty interpreting these results.
> > > The summary of the fit tells me that all predictors are
> > > significant, while the anova indicates that, besides LT (the main
> > > variable), only the interactions with the other terms are significant,
> > > and their main effects are not.
> > > I believe that in the first output (on the glm object), the
> > > significance of each term is calculated considering it alone in
> > > the model (i.e., removing all other terms), while the anova output
> > > (as it says) considers the sequential addition of the terms.
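
For comparison, the z values in the summary() table are Wald statistics, Estimate / Std. Error, and each one tests its coefficient with all of the other terms kept in the model. The sketch recomputes them from the printed (rounded) estimates. With the fitted object itself, coef(summary(HMMaturation.glmfit.Full)) returns the same coefficient matrix directly.

```r
# Estimates and standard errors copied from the summary() output above
est <- c("(Intercept)" = -8.789e-01, LT = 5.372e-02, CondF = -6.763e-02,
         Biom = -1.375e-02, "LT:CondF" = 2.434e-03, "LT:Biom" = 7.833e-04)
se  <- c(3.694e-01, 1.798e-02, 9.296e-03, 2.005e-03, 3.813e-04, 9.614e-05)

# Wald z statistic and two-sided normal p-value for each coefficient
z <- est / se
p <- 2 * pnorm(-abs(z))
cbind(z = round(z, 3), p = signif(p, 3))
```

The two tables test different hypotheses, so they need not agree. Here the CondF coefficient is tested with both interactions in the model, whereas the sequential CondF row tests CondF added to LT alone.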
> > >
> > > So, there are two questions:
> > > a) Can I conclude that the interactions are significant, but not the
> > > main effects?
> >
> >In a model with this structure, the "main effects" represent slopes
> >over the origin (i.e., where the other variables in the product terms
> >are 0), and aren't meaningfully interpreted as main effects. (Is there
> >even any data near the origin?)
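
To make the "slopes over the origin" reading concrete: with the product terms LT:CondF and LT:Biom, the partial slope of the logit in CondF is the CondF coefficient plus the LT:CondF coefficient times LT, and likewise for Biom. The printed CondF and Biom coefficients are therefore those slopes at LT = 0. The sketch uses the estimates printed above. The thread does not say whether LT values near 0, or near the sign-change points computed here, occur in the data.

```r
# Coefficients copied from the summary() output above
b <- c(CondF = -6.763e-02, Biom = -1.375e-02,
       LT_CondF = 2.434e-03, LT_Biom = 7.833e-04)

# Partial slopes of the logit, which change with LT because of the product terms
slope_CondF <- function(LT) b[["CondF"]] + b[["LT_CondF"]] * LT
slope_Biom  <- function(LT) b[["Biom"]]  + b[["LT_Biom"]]  * LT

slope_CondF(0)                    # the printed CondF coefficient
slope_Biom(0)                     # the printed Biom coefficient
-b[["CondF"]] / b[["LT_CondF"]]   # LT at which the CondF slope changes sign (about 27.8)
-b[["Biom"]]  / b[["LT_Biom"]]    # LT at which the Biom slope changes sign (about 17.6)
```
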
> >
> > > b) Is it legitimate to consider a model where the interactions are
> > > included, but not the main effects CondF and Biom?
> >
> >Generally, no: such a model is interpretable, but it places
> >strange constraints on the regression surface, namely that the CondF and
> >Biom slopes are 0 over the origin (i.e., where LT is 0).
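
One way to see why these constraints are strange: without the CondF main effect, the CondF slope is the LT:CondF coefficient times LT, which is forced to 0 at LT = 0. The fit therefore depends on the arbitrary origin of LT. The sketch below uses simulated data; the variable ranges and coefficients are made up, and Biom is omitted for brevity. Shifting LT by a constant changes the deviance of the constrained model but not of the model that respects marginality.

```r
set.seed(42)
n      <- 500
LT     <- runif(n, 10, 40)   # hypothetical ranges, not the HMIndSamples data
CondF  <- rnorm(n, 50, 10)
eta    <- -6 + 0.25 * LT - 0.04 * CondF + 0.002 * LT * CondF
Mature <- rbinom(n, 1, plogis(eta))

# Interaction without the CondF main effect: the CondF slope is 0 where LT = 0
m_raw   <- glm(Mature ~ LT + LT:CondF, family = binomial)
m_shift <- glm(Mature ~ I(LT - 25) + I(LT - 25):CondF, family = binomial)

# Model respecting marginality: LT, CondF and LT:CondF
f_raw   <- glm(Mature ~ LT * CondF, family = binomial)
f_shift <- glm(Mature ~ I(LT - 25) * CondF, family = binomial)

c(constrained_raw = deviance(m_raw), constrained_shifted = deviance(m_shift))  # differ
c(full_raw = deviance(f_raw), full_shifted = deviance(f_shift))                # agree

# drop1() offers only the interaction for removal, because its default
# scope respects marginality
drop1(f_raw, test = "Chisq")
```
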
> >
> >None of this is specific to logistic regression; it applies generally
> >to generalized linear models, including linear models.
> >
> >I hope this helps,
> >  John
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help