[R] Interpretation of output from glm
John Fox
jfox at mcmaster.ca
Thu Nov 10 15:41:58 CET 2005
Dear Pedro,
The basic point, which relates to the principle of marginality in
formulating linear models, applies whether the predictors are factors,
covariates, or both. I think that this is a common topic in books on linear
models; I certainly discuss it in my Applied Regression, Linear Models, and
Related Methods.
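
As a rough illustration of the marginality principle in R, the sketch below
fits a model of the same form as the one in this thread to simulated data and
calls drop1(), which only offers terms that can be removed without leaving a
higher-order relative behind. The data, the variable names (lt, cf, biom), and
the coefficient values are invented for illustration; they are not Pedro's.

```r
# Simulated stand-in data (hypothetical; not the HMIndSamples data)
set.seed(123)
n    <- 2000
lt   <- runif(n, 20, 60)      # continuous predictor, far from 0
cf   <- rnorm(n, 100, 10)     # second continuous predictor
biom <- rnorm(n, 500, 50)     # third continuous predictor
eta  <- 0.2 * (lt - 40) + 0.002 * (lt - 40) * (cf - 100) +
        0.0005 * (lt - 40) * (biom - 500)
y    <- rbinom(n, 1, plogis(eta))

# Same model structure as in the thread: main effects plus two interactions
fit <- glm(y ~ lt + cf + biom + lt:cf + lt:biom, family = binomial)

# drop1() respects marginality: it proposes dropping only the two
# interactions, never a main effect that is still part of a product term
d1 <- drop1(fit, test = "Chisq")
print(d1)
```

The likelihood-ratio tests reported by drop1() are therefore the ones that are
sensible to look at first in a model of this kind.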
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> Sent: Wednesday, November 09, 2005 10:45 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Interpretation of output from glm
> Importance: High
>
> Dear John,
>
> Thanks for the quick reply. I did indeed have these ideas,
> but only vaguely, and everything I could find about this
> dealt with categorical predictors. Can you suggest a good book
> where I could learn more about this?
>
> Thanks again,
>
> Pedro
> At 01:49 09/11/2005, you wrote:
> >Dear Pedro,
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> > > Sent: Tuesday, November 08, 2005 9:47 AM
> > > To: r-help at stat.math.ethz.ch
> > > Subject: [R] Interpretation of output from glm
> > > Importance: High
> > >
> > > > I am fitting a logistic model to binary data. The response variable
> > > > is a factor (0 or 1) and all predictors are continuous variables.
> > > > The main predictor is LT (I expect a logistic relation between LT
> > > > and the probability of being mature) and the others are variables
> > > > I expect to modify this relation.
> > >
> > > > I want to test whether all predictors contribute significantly to
> > > > the fit. I fit the full model and get these results:
> > >
> > > > summary(HMMaturation.glmfit.Full)
> > >
> > > Call:
> > > glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
> > > family = binomial(link = "logit"), data = HMIndSamples)
> > >
> > > Deviance Residuals:
> > > >     Min      1Q  Median      3Q     Max
> > > > -3.0983 -0.7620  0.2540  0.7202  2.0292
> > >
> > > Coefficients:
> > > >               Estimate Std. Error z value Pr(>|z|)
> > > > (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
> > > > LT           5.372e-02  1.798e-02   2.987  0.00281 **
> > > > CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
> > > > Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
> > > > LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
> > > > LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
> > > ---
> > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > (Dispersion parameter for binomial family taken to be 1)
> > >
> > > >     Null deviance: 10272.4  on 8224  degrees of freedom
> > > > Residual deviance:  7185.8  on 8219  degrees of freedom
> > > AIC: 7197.8
> > >
> > > Number of Fisher Scoring iterations: 8
> > >
> > > > However, when I run anova on the fit, I get
> > > >
> > > > > anova(HMMaturation.glmfit.Full, test='Chisq')
> > > > Analysis of Deviance Table
> > >
> > > Model: binomial, link: logit
> > >
> > > Response: Mature
> > >
> > > Terms added sequentially (first to last)
> > >
> > >
> > > >          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> > > > NULL                      8224    10272.4
> > > > LT        1   2873.8      8223     7398.7       0.0
> > > > CondF     1      0.1      8222     7398.5       0.7
> > > > Biom      1      0.2      8221     7398.3       0.7
> > > > LT:CondF  1    142.1      8220     7256.3 9.413e-33
> > > > LT:Biom   1     70.4      8219     7185.8 4.763e-17
> > > > Warning message:
> > > > fitted probabilities numerically 0 or 1 occurred in: method(x = x[,
> > > > varseq <= i, drop = FALSE], y = object$y, weights =
> > > > object$prior.weights,
> > >
> > >
> > > > I am having a little difficulty interpreting these results.
> > > > The result from the fit tells me that all predictors are
> > > > significant, while the anova indicates that besides LT (the main
> > > > variable), only the interactions of the other terms are
> > > > significant, but the main effects are not.
> > > > I believe that in the first output (on the glm object), the
> > > > significance of all terms is calculated considering each of them
> > > > alone in the model (i.e. removing all other terms), while the anova
> > > > output is (as it says) considering the sequential addition of the
> > > > terms.
> > >
> > > So, there are 2 questions:
> > > > a) Can I conclude that the interactions are significant, but not
> > > > the main effects?
> >
> >In a model with this structure, the "main effects" represent slopes
> >over the origin (i.e., where the other variables in the product terms
> >are 0), and aren't meaningfully interpreted as main effects. (Is there
> >even any data near the origin?)
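
The point about slopes at the origin can be checked numerically. The sketch
below, on simulated data with invented names (lt, cf) and invented
coefficients, fits the same interaction model with raw and with mean-centered
predictors. Under these assumptions the interaction coefficient and the
deviance should agree between the two fits, while the "main effect" of cf
changes, because it is the cf slope at lt = 0 in one fit and at lt = mean(lt)
in the other.

```r
# Simulated stand-in data (hypothetical; not the HMIndSamples data)
set.seed(1)
n   <- 2000
lt  <- runif(n, 20, 60)       # no observations anywhere near lt = 0
cf  <- rnorm(n, 100, 10)      # no observations anywhere near cf = 0
eta <- -8 + 0.2 * lt + 0.002 * (lt - 40) * (cf - 100)
y   <- rbinom(n, 1, plogis(eta))

# Same model, two parameterizations
fit_raw <- glm(y ~ lt * cf, family = binomial)
lt_c    <- lt - mean(lt)
cf_c    <- cf - mean(cf)
fit_ctr <- glm(y ~ lt_c * cf_c, family = binomial)

b_raw <- coef(fit_raw)
b_ctr <- coef(fit_ctr)

# Interaction coefficient: unaffected by centering
int_raw <- unname(b_raw["lt:cf"])
int_ctr <- unname(b_ctr["lt_c:cf_c"])

# "Main effect" of cf: slope at lt = 0 (raw) versus slope at mean(lt) (centered)
cf_at_zero <- unname(b_raw["cf"])
cf_at_mean <- unname(b_ctr["cf_c"])

# The centered coefficient is recoverable from the raw fit
cf_at_mean_from_raw <- unname(b_raw["cf"] + b_raw["lt:cf"] * mean(lt))

print(c(int_raw = int_raw, int_ctr = int_ctr))
print(c(cf_at_zero = cf_at_zero, cf_at_mean = cf_at_mean,
        cf_at_mean_from_raw = cf_at_mean_from_raw))
print(c(dev_raw = deviance(fit_raw), dev_ctr = deviance(fit_ctr)))
```

Since both fits describe the same regression surface, a test of the cf
coefficient in either one is a test about the surface at one particular value
of lt, not about an overall effect of cf.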
> >
> > > > b) Is it legitimate to consider a model that includes the
> > > > interactions, but not the main effects CondF and Biom?
> >
> >Generally, no: That is, such a model is interpretable, but it places
> >strange constraints on the regression surface -- that the CondF and
> >Biom slopes are 0 over the origin.
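
A minimal sketch of why that constraint is strange, again on simulated data
with invented names and coefficients: once the cf main effect is dropped, the
fit depends on where the zero of lt happens to be, so merely shifting lt
changes the fitted model. The full model has no such dependence.

```r
# Simulated stand-in data (hypothetical; not the HMIndSamples data)
set.seed(2)
n   <- 2000
lt  <- runif(n, 20, 60)
cf  <- rnorm(n, 100, 10)
eta <- -8 + 0.2 * lt + 0.002 * (lt - 40) * (cf - 100)
y   <- rbinom(n, 1, plogis(eta))
lt_c <- lt - mean(lt)         # same predictor, shifted origin

# Full model, respecting marginality
full <- glm(y ~ lt * cf, family = binomial)

# Interaction without the cf main effect: forces the cf slope to be 0 at lt = 0
no_main_raw <- glm(y ~ lt + lt:cf, family = binomial)

# Same formula after shifting lt: now forces the cf slope to be 0 at mean(lt)
no_main_ctr <- glm(y ~ lt_c + lt_c:cf, family = binomial)

print(c(full        = deviance(full),
        no_main_raw = deviance(no_main_raw),
        no_main_ctr = deviance(no_main_ctr)))

# Likelihood-ratio test of the constraint imposed by dropping the cf main effect
print(anova(no_main_raw, full, test = "Chisq"))
```

The two reduced fits are different models with different deviances, even
though the only change is an arbitrary shift of lt.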
> >
> >None of this is specific to logistic regression -- it applies generally
> >to generalized linear models, including linear models.
> >
> >I hope this helps,
> > John
>