[R] confused on model.frame evaluation
Marc Schwartz
marc_schwartz at me.com
Sat May 1 00:38:50 CEST 2010
On Apr 30, 2010, at 4:57 PM, Erik Iverson wrote:
> <snip>
>> I'm sure it's not a bug, but could someone point to a thread or offer some gentle advice on what's happening? I think it's related to:
>> test <- data.frame(name1 = 1:5, name2 = 6:10, test = 11:15)
>> eval(expression(test[c("name1", "name2")]))
>> eval(expression(interco[c("name1", "test")]))
>
> scratch that last one, obviously a typo was causing my confusion there! The model.frame stuff remains a mystery to me though...
Hi Erik,
It's late on a Friday, it's grey and raining here in Minneapolis and I am short on caffeine, but, that being said, consider the following :-)
> working
  france manual famanual total working  no
1      1      1        1   107      85  22
2      1      1        0    65      44  21
3      1      0        1    66      24  42
4      1      0        0   171      17 154
5      0      1        1    87      24  63
6      0      1        0    65      22  43
7      0      0        1    85       1  84
8      0      0        0   148       6 142
> as.matrix(working[c("working", "no")])
     working  no
[1,]      85  22
[2,]      44  21
[3,]      24  42
[4,]      17 154
[5,]      24  63
[6,]      22  43
[7,]       1  84
[8,]       6 142
> with(working, as.matrix(working[c("working", "no")]))
     [,1]
[1,]   NA
[2,]   NA
When model.frame() is called, the variables in the formula are evaluated first within the scope of the data frame given as the 'data' argument, and only then in the environment of the formula.
Thus, in the second case, I am asking for the as.matrix(...) call to be evaluated within the scope of the 'working' data frame. There, the name 'working' refers to the 'working' column (a numeric vector with no names), not to the data frame itself. Indexing that vector by c("working", "no") finds no matching names, so the call returns a matrix with only two rows, one NA for each name asked for and not found. That differs from the 8 rows in 'working', so you get the error as soon as the 'france' column is evaluated in the formula to create the model frame:
Error in model.frame.default(formula = as.matrix(working[c("working", :
variable lengths differ (found for 'france')
2 rows in the response matrix versus 8 rows for 'france'...
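A minimal sketch that reproduces this, assuming the 'working' data frame is rebuilt by hand from the values printed above (the object names inner, resp and err are just for illustration):

```r
# Rebuild the 'working' data frame from the values printed above
working <- data.frame(
  france   = c(1, 1, 1, 1, 0, 0, 0, 0),
  manual   = c(1, 1, 0, 0, 1, 1, 0, 0),
  famanual = c(1, 0, 1, 0, 1, 0, 1, 0),
  total    = c(107, 65, 66, 171, 87, 65, 85, 148),
  working  = c(85, 44, 24, 17, 24, 22, 1, 6),
  no       = c(22, 21, 42, 154, 63, 43, 84, 142)
)

# Inside the data frame, the name 'working' finds the numeric column first,
# not the data frame of the same name
inner <- with(working, working)
stopifnot(is.numeric(inner), length(inner) == 8)

# So the would-be response has 2 rows (both NA) instead of 8
resp <- with(working, as.matrix(working[c("working", "no")]))
stopifnot(nrow(resp) == 2, all(is.na(resp)))

# model.frame() evaluates formula terms inside 'data' the same way, so the
# length mismatch is reported when the 'france' column is checked
err <- tryCatch(
  model.frame(as.matrix(working[c("working", "no")]) ~ france + manual + famanual,
              data = working),
  error = function(e) conditionMessage(e)
)
print(err)
```

The tryCatch() is only there to capture the message rather than stop the script.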
It is kind of like you are asking for:
> as.matrix(working$working[c("working", "no")])
     [,1]
[1,]   NA
[2,]   NA
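The underlying behaviour is just character subscripting of a vector that has no names. A small sketch (the vector v and the names row1, row2, ... are made up for illustration):

```r
# A plain numeric vector: the 'working' column values, with no names attribute
v <- c(85, 44, 24, 17, 24, 22, 1, 6)
stopifnot(is.null(names(v)))

# Character subscripts are matched against names(v); with nothing to match,
# each one yields NA, giving one NA per name requested
picked <- v[c("working", "no")]
print(picked)

# Once the vector carries names, the same style of subscript selects values
# (the names row1, row2, ... are hypothetical)
names(v) <- paste0("row", seq_along(v))
stopifnot(identical(unname(v[c("row1", "row2")]), c(85, 44)))
```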
Now, try this:
> with(working, matrix(c(working, no), ncol = 2))
     [,1] [,2]
[1,]   85   22
[2,]   44   21
[3,]   24   42
[4,]   17  154
[5,]   24   63
[6,]   22   43
[7,]    1   84
[8,]    6  142
and then:
> summary(glm(matrix(c(working, no), ncol = 2) ~ france + manual + famanual, data = working, family = binomial))
Call:
glm(formula = matrix(c(working, no), ncol = 2) ~ france + manual +
    famanual, family = binomial, data = working)
Deviance Residuals:
       1        2        3        4        5        6        7
 0.09316 -0.14108  2.38028 -1.91838 -1.48196  1.84993 -1.61864
       8
 1.16747
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6902     0.2547 -14.489  < 2e-16 ***
france        1.9474     0.2162   9.008  < 2e-16 ***
manual        2.5199     0.2168  11.625  < 2e-16 ***
famanual      0.5522     0.2017   2.738  0.00618 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 308.329  on 7  degrees of freedom
Residual deviance:  18.976  on 4  degrees of freedom
AIC: 60.162
Number of Fisher Scoring iterations: 4
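For what it's worth, the usual way to spell a binomial response of successes and failures is cbind(working, no), which is likewise evaluated inside 'data' and so picks up the two columns. A sketch, again assuming the hand-rebuilt data frame (fit_matrix and fit_cbind are illustrative names), that checks both spellings give the same coefficients:

```r
# Rebuild the 'working' data frame from the values printed above
working <- data.frame(
  france   = c(1, 1, 1, 1, 0, 0, 0, 0),
  manual   = c(1, 1, 0, 0, 1, 1, 0, 0),
  famanual = c(1, 0, 1, 0, 1, 0, 1, 0),
  total    = c(107, 65, 66, 171, 87, 65, 85, 148),
  working  = c(85, 44, 24, 17, 24, 22, 1, 6),
  no       = c(22, 21, 42, 154, 63, 43, 84, 142)
)

# Two-column response built with matrix(), as in the fit above
fit_matrix <- glm(matrix(c(working, no), ncol = 2) ~ france + manual + famanual,
                  data = working, family = binomial)

# Same response written as cbind(successes, failures)
fit_cbind <- glm(cbind(working, no) ~ france + manual + famanual,
                 data = working, family = binomial)

# Both responses hold identical numbers, so the fits should agree
stopifnot(isTRUE(all.equal(coef(fit_matrix), coef(fit_cbind))))
print(round(coef(fit_cbind), 4))
```

cbind() also carries the column names 'working' and 'no' into the response, which can make the model frame a little easier to read.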
Does that help to clarify?
Regards,
Marc Schwartz