# [R] How to intepret a factor response model?

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed May 4 10:21:11 CEST 2005

```On 04-May-05 Maciej Bliziñski wrote:
> Hello,
>
> I'd like to create a model with a factor-type response variable.
> This is an example:
>
>> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
>> rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
>> rnorm(150) + 5))
>> summary(mydata)
>  factor_var     real_var
>  one  :100   Min.   :-2.742877
>  three:100   1st Qu.:-0.009493
>  two  :100   Median : 2.361669
>              Mean   : 2.490411
>              3rd Qu.: 4.822394
>              Max.   : 6.924588
>> mymodel = glm(factor_var ~ real_var, family = 'binomial', data =
>> mydata)
>> summary(mymodel)
>
> Call:
> glm(formula = factor_var ~ real_var, family = "binomial", data =
> mydata)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -1.7442  -0.6774   0.1849   0.3133   2.1187
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -0.6798     0.1882  -3.613 0.000303 ***
> real_var      0.8971     0.1066   8.417  < 2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 381.91  on 299  degrees of freedom
> Residual deviance: 213.31  on 298  degrees of freedom
> AIC: 217.31
>
> Number of Fisher Scoring iterations: 6

Have you noticed that you get identical results with

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',100), rep('three', 100))),
real.var = c(rnorm(150), rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

and

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

(I've left out the "summary(mydata)" since these do naturally
differ, and I've replaced "factor_var" with "factor.var" and
"real_var" with "real.var" because of potential complications
with "_"; also "mymodel =" to "mymodel <-").

So I think the interpretation of the results from your first
model is that, because of the "family='binomial'", glm is
treating "factor.var='one'" as binomial response "0", say,
and "factor.var='two'" or "factor.var='three'" as binomial
response "1".

You're trying to fit a multinomial response, but you've
specified a binomial family to 'glm'. 'glm' does not have
a multinomial response family.

You could try 'multinom' from package 'nnet' which fits
loglinear models to factor responses with more than 2 levels.

E.g.

library(nnet)
mymodel <- multinom(factor.var ~ real.var,data=mydata)
### weights:  9 (4 variable)
##  initial  value 329.583687
##  iter  10 value 209.780666
##  final  value 209.779951
##  converged
summary(mymodel)
## Re-fitting to get Hessian
## Call:
## multinom(formula = factor.var ~ real.var, data = mydata)
##  Coefficients:
##        (Intercept)  real.var
##  three  -3.4262565 1.3838231
##  two    -0.6754253 0.7116955
##
## Std. Errors:
##   (Intercept)  real.var
## three   0.5028541 0.1480138
## two     0.1846827 0.1068821
##
## Residual Deviance: 419.5599
## AIC: 427.5599
##
## Correlation of Coefficients:
##             three:(Intercept) three:real.var two:(Intercept)
## three:real.var  -0.7286258
## two:(Intercept)  0.1986995        -0.1261034
## two:real.var    -0.1411377         0.7012481     -0.3285741

This output does suggest a fairly clear interpretation!

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-May-05                                       Time: 09:18:03
------------------------------ XFMail ------------------------------

```

More information about the R-help mailing list