[R] "glm" function question
Marc Schwartz
MSchwartz at mn.rr.com
Sun Oct 22 02:53:20 CEST 2006
On Sat, 2006-10-21 at 20:02 -0400, Chris Linton wrote:
> I am creating a model attempting to predict the probability someone will
> reoffend after being caught for a crime. There are seven total inputs and I
> planned on using a logistic regression. I started with a null deviance of
> 182.91 and ended up with a residual deviance of 83.40 after accounting for
> different interactions and such. However, I realized after that my code is
> different from that in my book. And I can't figure out what I need to put
> in its place. Here's my code:
>
> library(foreign)
> library(car)
>
> foo = read.table("C:/Documents and Settings/Chris/Desktop/4330/criminals.dat", header=TRUE)
>
> reoff = foo[ ,1]
> race = foo[ ,2]
> age = foo[ ,3]
> gender = foo[ ,4]
> educ = foo[ ,5]
> subst = foo[ ,6]
> prior = foo[ ,7]
> violence = foo[ ,8]
>
> fit1h = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior))
>
> summary(fit1h)
>
> As you may have noticed, no part of my code looks like:
>
> family=binomial(link="logit")
>
> If I code it the way my book does, it would look like:
>
> fit1i = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior),
> family=binomial(link="logit"))
>
> summary(fit1i)
>
> However, when I do this, my null deviance is 1104 and my residual deviance
> is 23460. THIS IS A HUGE DIFFERENCE IN MODEL FIT! I'm not sure if I have
> to redo my model or if my book was simply doing the
> "family=binomial(link="logit")" for a specific problem/reason.
>
> So, to my question:
> Do I need to include "family=binomial(link="logit")" in my code?
Yes, though 'family = binomial' alone would suffice, since logit is
the default link function.
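A quick way to confirm this at the console is to inspect the family
objects themselves. The sketch below (variable names are illustrative
only) shows that binomial() and binomial(link = "logit") carry the same
link:

```r
# Two ways of requesting the binomial family for glm().
# binomial() uses the logit link unless another link is requested.
fam_default  <- binomial()
fam_explicit <- binomial(link = "logit")

print(fam_default$family)  # "binomial"
print(fam_default$link)    # "logit"
print(identical(fam_default$link, fam_explicit$link))  # TRUE
```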
> Do I need
> to include any type of family?
If you don't want to use the default Gaussian family, then yes.
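To see what the default actually does, here is a minimal sketch on
simulated 0/1 data (hypothetical, not your criminals.dat). Without a
'family' argument, glm() uses the gaussian family with the identity
link, which gives the same coefficients as ordinary least squares via
lm(), i.e. a linear probability model rather than a logistic regression:

```r
# Simulated data, for illustration only: a 0/1 outcome driven by x
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# No 'family' argument: glm() falls back to gaussian with identity link
fit_default <- glm(y ~ x)
fit_ols     <- lm(y ~ x)

print(fit_default$family$family)  # "gaussian"
print(fit_default$family$link)    # "identity"

# Coefficients agree with ordinary least squares
print(all.equal(coef(fit_default), coef(fit_ols)))  # TRUE
```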
Whatever book you are working from (which you fail to identify)
ought to clearly explain the background on the use of the distribution
families in GLMs. The author has included these instructions for a
reason, and you need to pay attention to them.
If you look carefully at the output of summary(fit1h), you will likely
see:
(Dispersion parameter for gaussian family taken to be ....)
and you will also notice that the tests being applied (3rd and 4th
columns in the coefficient summary table) are t tests and not z tests.
These should be a big hint that you are not working with the proper
family and are therefore not fitting a logistic regression model, which
is presumably the intent of this section of the book.
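The same clues can be checked programmatically. In the sketch below
(simulated data, object names are illustrative only), the gaussian fit
reports t tests and an estimated dispersion, while the binomial fit
reports z tests with dispersion fixed at 1. It also shows that a
gaussian deviance is simply the residual sum of squares, so its value
is not on the same scale as a binomial deviance and the two sets of
deviances cannot be compared directly:

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit_gauss <- glm(y ~ x)                     # default: gaussian family
fit_logit <- glm(y ~ x, family = binomial)  # logistic regression

s_gauss <- summary(fit_gauss)
s_logit <- summary(fit_logit)

# Coefficient table headers: "t value" for gaussian, "z value" for binomial
print(colnames(coef(s_gauss)))
print(colnames(coef(s_logit)))

# Gaussian dispersion is estimated from the residuals; binomial is fixed at 1
print(s_gauss$dispersion)
print(s_logit$dispersion)

# Under the gaussian family, the deviance equals the residual sum of squares
print(all.equal(deviance(fit_gauss), sum(residuals(fit_gauss)^2)))
```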
See ?glm and pay careful attention to the function defaults.
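Besides reading ?glm, the defaults can be inspected directly from the
function's formal arguments. A small sketch:

```r
# Formal arguments of glm() and their default values
defaults <- formals(glm)

# The default family is gaussian, whose default link is identity
print(deparse(defaults$family))  # "gaussian"
print(gaussian()$link)           # "identity"
```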
HTH,
Marc Schwartz