[R] Interpretation of output from glm
Pedro de Barros
pbarros at ualg.pt
Fri Nov 11 00:47:14 CET 2005
Dear John,
Thanks for the pointers. I will read this.
Pedro
At 14:41 10/11/2005, you wrote:
>Dear Pedro,
>
>The basic point, which relates to the principle of marginality in
>formulating linear models, applies whether the predictors are factors,
>covariates, or both. I think that this is a common topic in books on linear
>models; I certainly discuss it in my Applied Regression, Linear Models, and
>Related Methods.
>
>Regards,
> John
>
>--------------------------------
>John Fox
>Department of Sociology
>McMaster University
>Hamilton, Ontario
>Canada L8S 4M4
>905-525-9140x23604
>http://socserv.mcmaster.ca/jfox
>--------------------------------
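[A minimal illustrative sketch, not part of the original exchange: John Fox's car package provides Anova(), whose default "Type II" tests follow the marginality principle described above, so main effects are not tested conditional on interactions that contain them. The model and data names (Mature, LT, CondF, Biom, HMIndSamples) are taken from Pedro's glm() call quoted below.]

library(car)  # Anova() with Type II tests; package by John Fox

fit <- glm(Mature ~ LT * CondF + LT * Biom,
           family = binomial(link = "logit"), data = HMIndSamples)

## Likelihood-ratio chi-square test for each term, respecting marginality:
## the interactions are tested after all other terms, while LT, CondF and
## Biom are each tested ignoring the interactions that contain them.
Anova(fit, type = "II", test.statistic = "LR")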
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> > Sent: Wednesday, November 09, 2005 10:45 AM
> > To: r-help at stat.math.ethz.ch
> > Subject: Re: [R] Interpretation of output from glm
> > Importance: High
> >
> > Dear John,
> >
> > Thanks for the quick reply. I did indeed have these ideas, but only
> > in a somewhat "floating" form, and all I could find about this dealt
> > only with categorical predictors. Can you suggest a good book where
> > I could learn more about this?
> >
> > Thanks again,
> >
> > Pedro
> > At 01:49 09/11/2005, you wrote:
> > >Dear Pedro,
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at stat.math.ethz.ch
> > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de
> > > > Barros
> > > > Sent: Tuesday, November 08, 2005 9:47 AM
> > > > To: r-help at stat.math.ethz.ch
> > > > Subject: [R] Interpretation of output from glm
> > > > Importance: High
> > > >
> > > > I am fitting a logistic model to binary data. The response
> > > > variable is a factor (0 or 1) and all predictors are continuous
> > > > variables. The main predictor is LT (I expect a logistic relation
> > > > between LT and the probability of being mature), and the others
> > > > are variables I expect to modify this relation.
> > > >
> > > > I want to test whether all predictors contribute significantly to
> > > > the fit or not. I fit the full model and get these results:
> > > >
> > > > > summary(HMMaturation.glmfit.Full)
> > > >
> > > > Call:
> > > > glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
> > > > family = binomial(link = "logit"), data = HMIndSamples)
> > > >
> > > > Deviance Residuals:
> > > > Min 1Q Median 3Q Max
> > > > -3.0983 -0.7620 0.2540 0.7202 2.0292
> > > >
> > > > Coefficients:
> > > > Estimate Std. Error z value Pr(>|z|)
> > > > (Intercept) -8.789e-01 3.694e-01 -2.379 0.01735 *
> > > > LT 5.372e-02 1.798e-02 2.987 0.00281 **
> > > > CondF -6.763e-02 9.296e-03 -7.275 3.46e-13 ***
> > > > Biom -1.375e-02 2.005e-03 -6.856 7.07e-12 ***
> > > > LT:CondF 2.434e-03 3.813e-04 6.383 1.74e-10 ***
> > > > LT:Biom 7.833e-04 9.614e-05 8.148 3.71e-16 ***
> > > > ---
> > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > > >
> > > > (Dispersion parameter for binomial family taken to be 1)
> > > >
> > > > Null deviance:     10272.4  on 8224  degrees of freedom
> > > > Residual deviance:  7185.8  on 8219  degrees of freedom
> > > > AIC: 7197.8
> > > >
> > > > Number of Fisher Scoring iterations: 8
> > > >
> > > > However, when I run anova on the fit, I get:
> > > >
> > > > > anova(HMMaturation.glmfit.Full, test='Chisq')
> > > > Analysis of Deviance Table
> > > >
> > > > Model: binomial, link: logit
> > > >
> > > > Response: Mature
> > > >
> > > > Terms added sequentially (first to last)
> > > >
> > > >
> > > >           Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
> > > > NULL                       8224    10272.4
> > > > LT         1   2873.8     8223     7398.7        0.0
> > > > CondF      1      0.1     8222     7398.5        0.7
> > > > Biom       1      0.2     8221     7398.3        0.7
> > > > LT:CondF   1    142.1     8220     7256.3  9.413e-33
> > > > LT:Biom    1     70.4     8219     7185.8  4.763e-17
> > > > Warning message:
> > > > fitted probabilities numerically 0 or 1 occurred in:
> > > >   method(x = x[, varseq <= i, drop = FALSE], y = object$y,
> > > >   weights = object$prior.weights,
> > > >
> > > > I am having a little difficulty interpreting these results.
> > > > The summary of the fit tells me that all predictors are
> > > > significant, while the anova indicates that, besides LT (the main
> > > > variable), only the interactions involving the other terms are
> > > > significant, but their main effects are not.
> > > > I believe that in the first output (on the glm object) the
> > > > significance of each term is assessed given all the other terms
> > > > in the model, while the anova output (as it says) tests the terms
> > > > as they are added sequentially, first to last.
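[A minimal sketch, not part of the original thread, of testing each term without relying on the sequential ordering: drop1() deletes from the full model each term that marginality allows to be dropped and reports a likelihood-ratio test, whereas summary() reports Wald z-tests for each coefficient given all the others. The object name is taken from Pedro's post.]

## Illustrative only: non-sequential likelihood-ratio tests for each term.
## Terms that are marginal to an interaction still in the model (here LT,
## CondF and Biom) are not offered for deletion.
drop1(HMMaturation.glmfit.Full, test = "Chisq")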
> > > >
> > > > So, there are 2 questions:
> > > > a) Can I tell that the interactions are significant, but not the
> > > > main effects?
> > >
> > >In a model with this structure, the "main effects" represent slopes
> > >over the origin (i.e., where the other variables in the product
> > >terms are 0), and aren't meaningfully interpreted as main effects.
> > >(Is there even any data near the origin?)
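[A sketch added for illustration, not part of the exchange: centring the covariates makes the lower-order coefficients describe slopes at typical data values rather than at zero. Variable and data names follow Pedro's call to glm(); the centred names are hypothetical.]

## Illustrative only. After centring, the LTc, CondFc and Biomc coefficients
## are slopes evaluated at the sample means of the other variables instead
## of at the (possibly data-free) origin.
HMIndSamples$LTc    <- HMIndSamples$LT    - mean(HMIndSamples$LT)
HMIndSamples$CondFc <- HMIndSamples$CondF - mean(HMIndSamples$CondF)
HMIndSamples$Biomc  <- HMIndSamples$Biom  - mean(HMIndSamples$Biom)
fit.centred <- glm(Mature ~ LTc * CondFc + LTc * Biomc,
                   family = binomial(link = "logit"), data = HMIndSamples)
summary(fit.centred)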
> > >
> > > > b) Is it legitimate to consider a model where the interactions
> > > > are included, but not the main effects CondF and Biom?
> > >
> > >Generally, no: That is, such a model is interpretable, but it places
> > >strange constraints on the regression surface -- that the CondF and
> > >Biom slopes are 0 over the origin.
> > >
> > >None of this is specific to logistic regression -- it applies
> > >generally to generalized linear models, including linear models.
> > >
> > >I hope this helps,
> > > John
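[A sketch, again not from the original thread, of a comparison that respects marginality: rather than dropping CondF and Biom while keeping their interactions, one can ask whether the two interactions jointly improve on the main-effects model. Names follow Pedro's model.]

## Illustrative only: joint likelihood-ratio test of LT:CondF and LT:Biom.
fit.main <- glm(Mature ~ LT + CondF + Biom,
                family = binomial(link = "logit"), data = HMIndSamples)
fit.full <- glm(Mature ~ LT * CondF + LT * Biom,
                family = binomial(link = "logit"), data = HMIndSamples)
anova(fit.main, fit.full, test = "Chisq")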
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html