[R] How to intepret a factor response model?
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Wed May 4 10:21:11 CEST 2005
On 04-May-05 Maciej Bliziñski wrote:
> Hello,
>
> I'd like to create a model with a factor-type response variable.
> This is an example:
>
>> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
>> rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
>> rnorm(150) + 5))
>> summary(mydata)
> factor_var real_var
> one :100 Min. :-2.742877
> three:100 1st Qu.:-0.009493
> two :100 Median : 2.361669
> Mean : 2.490411
> 3rd Qu.: 4.822394
> Max. : 6.924588
>> mymodel = glm(factor_var ~ real_var, family = 'binomial', data =
>> mydata)
>> summary(mymodel)
>
> Call:
> glm(formula = factor_var ~ real_var, family = "binomial", data =
> mydata)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.7442 -0.6774 0.1849 0.3133 2.1187
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.6798 0.1882 -3.613 0.000303 ***
> real_var 0.8971 0.1066 8.417 < 2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 381.91 on 299 degrees of freedom
> Residual deviance: 213.31 on 298 degrees of freedom
> AIC: 217.31
>
> Number of Fisher Scoring iterations: 6
Have you noticed that you get identical results with
set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',100), rep('three', 100))),
real.var = c(rnorm(150), rnorm(150) + 5))
mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)
and
set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))
mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)
(I've left out the "summary(mydata)" since these do naturally
differ, and I've replaced "factor_var" with "factor.var" and
"real_var" with "real.var" because of potential complications
with "_"; also "mymodel =" to "mymodel <-").
So I think the interpretation of the results from your first
model is that, because of the "family='binomial'", glm is
treating "factor.var='one'" as binomial response "0", say,
and "factor.var='two'" or "factor.var='three'" as binomial
response "1".
You're trying to fit a multinomial response, but you've
specified a binomial family to 'glm'. 'glm' does not have
a multinomial response family.
You could try 'multinom' from package 'nnet' which fits
loglinear models to factor responses with more than 2 levels.
E.g.
library(nnet)
mymodel <- multinom(factor.var ~ real.var,data=mydata)
### weights: 9 (4 variable)
## initial value 329.583687
## iter 10 value 209.780666
## final value 209.779951
## converged
summary(mymodel)
## Re-fitting to get Hessian
## Call:
## multinom(formula = factor.var ~ real.var, data = mydata)
## Coefficients:
## (Intercept) real.var
## three -3.4262565 1.3838231
## two -0.6754253 0.7116955
##
## Std. Errors:
## (Intercept) real.var
## three 0.5028541 0.1480138
## two 0.1846827 0.1068821
##
## Residual Deviance: 419.5599
## AIC: 427.5599
##
## Correlation of Coefficients:
## three:(Intercept) three:real.var two:(Intercept)
## three:real.var -0.7286258
## two:(Intercept) 0.1986995 -0.1261034
## two:real.var -0.1411377 0.7012481 -0.3285741
This output does suggest a fairly clear interpretation!
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-May-05 Time: 09:18:03
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list