[R] expand.grid
Giovanni Marchetti
gmm at ds.unifi.it
Sun May 4 01:48:48 CEST 2003
I recently posted a question about an inconsistency in how
expand.grid defines the reference level of the factors it creates.
> > expand.grid(x = c("b", "a"), y = c(1, 2))$x
> [1] b a b a
> Levels: b a # reference level is b
> > expand.grid(x = c("b", "a"))$x
> [1] b a
> Levels: a b # reference level is a
Thank you very much for the prompt explanations and comments.
I came across this inconsistency while working with contingency
tables and logistic models. I had a flat contingency table such as:
> ft <- ftable(low ~ race + smoke, bwt)
> ft
            low  0  1
race  smoke
white FALSE     40  4
      TRUE      33 19
black FALSE     11  5
      TRUE       4  6
other FALSE     35 20
      TRUE       7  5
(here bwt is a transformed version of the data frame birthwt
in package MASS).
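The post does not show how bwt was built. A plausible construction, assuming it is a reduced version of the example on the MASS help page ?birthwt (race recoded as a factor labelled white/black/other, smoke turned into a logical), is sketched below; the exact transformation the author used is an assumption.

```r
library(MASS)  # provides the birthwt data frame

# Assumed construction of bwt, modelled on the example in ?birthwt:
# race becomes a labelled factor and smoke a logical, so that the
# flat table shows FALSE/TRUE for smoke.
bwt <- with(birthwt, data.frame(
  low   = factor(low),
  race  = factor(race, labels = c("white", "black", "other")),
  smoke = smoke > 0
))

# Flat contingency table: low in the columns, race and smoke in the rows
ft <- ftable(low ~ race + smoke, bwt)
ft
```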
I wanted to analyse this table with a logistic model, with low as
the response, so it seemed useful to have a function that extracts
the factors race and smoke from the table:
> row.factors(ft)
race smoke
1 white FALSE
2 white TRUE
3 black FALSE
4 black TRUE
5 other FALSE
6 other TRUE
in such a way that they can be used directly in glm:
> glm(ft ~ race + smoke, family=binomial, data = row.factors(ft))
...
Coefficients:
(Intercept) raceblack raceother smokeTRUE
1.841 -1.084 -1.109 -1.116
...
Note that the reference level for race is "white" (the first row).
PS - Of course, the same analysis is straightforward from the original
data frame (which here is assumed to be unavailable):
> glm(low ~ race + smoke, family=binomial, data = bwt)
except that the signs of the coefficients are reversed. A binomial glm
with a two-column response expects (successes, failures), whereas the
columns of ft are (low = 0, low = 1), i.e. (failure, success).
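The sign reversal can be checked directly: swapping the two columns of the response so that low = 1 comes first should make the grouped fit reproduce the individual-level fit. A minimal sketch, assuming bwt is built as in the MASS help-page example (rebuilt here so the block runs on its own); the row factors are typed in by hand rather than produced by row.factors, and the names m, rows, resp.post, resp.swap and the fit.* objects are illustrative.

```r
library(MASS)  # provides the birthwt data frame

# Assumed construction of bwt (modelled on ?birthwt)
bwt <- with(birthwt, data.frame(
  low   = factor(low),
  race  = factor(race, labels = c("white", "black", "other")),
  smoke = smoke > 0
))
ft <- ftable(low ~ race + smoke, bwt)

# Plain count matrix; columns are low = 0, low = 1 as in the printed table
m <- unclass(ft)

# Row factors in the row order of ft (race slowest, smoke fastest)
rows <- data.frame(
  race  = factor(rep(c("white", "black", "other"), each = 2),
                 levels = c("white", "black", "other")),
  smoke = rep(c(FALSE, TRUE), times = 3)
)

# Response as in the post: (low = 0, low = 1), so glm models P(low = 0)
resp.post <- cbind(low0 = m[, 1], low1 = m[, 2])
# Swapped columns: (low = 1, low = 0), so glm models P(low = 1)
resp.swap <- cbind(low1 = m[, 2], low0 = m[, 1])

fit.post  <- glm(resp.post ~ race + smoke, family = binomial, data = rows)
fit.swap  <- glm(resp.swap ~ race + smoke, family = binomial, data = rows)
fit.indiv <- glm(low ~ race + smoke, family = binomial, data = bwt)

# Side-by-side comparison of the three sets of coefficients
cbind(post = coef(fit.post), swapped = coef(fit.swap),
      individual = coef(fit.indiv))
```

Up to numerical tolerance, the swapped fit should agree with the individual-level fit, and the fit with the columns in the order of ft should agree with it after flipping every sign.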
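To do this, I wrote the following function:
"row.factors" <- function (ft)
{
  # ft: a flat table (class "ftable").
  # Value: a data frame with the factors associated with the rows.
  vars <- attr(ft, "row.vars")   # named list of row-variable levels
  k <- length(vars)
  # ftable varies the last row variable fastest, whereas expand.grid
  # varies its first argument fastest: reverse the variables, expand,
  # then restore the original column order.
  expl <- expand.grid(vars[k:1])
  expl[, k:1, drop = FALSE]
}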
The function worked well except when the table has a single row
variable, as in this simple contingency table:
> ft <- ftable(low ~ race, bwt)
> ft
low 0 1
race
white 73 23
black 15 11
other 42 25
> row.factors(ft)
race
1 white
2 black
3 other
> levels(row.factors(ft)$race)
[1] "black" "other" "white"
So here the reference level is "black", which is a bit strange
since the first row of the table is "white". This little "infelicity"
is in fact caused by expand.grid, which in the one-variable case
sorts the levels alphabetically.
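One possible workaround, sketched under the assumption that only the level order matters: convert each vector of row labels into a factor whose levels follow the row order before calling expand.grid, so the result no longer depends on how expand.grid turns character vectors into factors. The name row.factors2 is hypothetical, and the one-variable table is rebuilt from the counts printed above.

```r
# Hypothetical variant of row.factors: row labels are turned into
# factors with levels in table order, so expand.grid receives factors
# rather than character vectors and keeps their levels as given.
row.factors2 <- function(ft) {
  vars <- attr(ft, "row.vars")
  vars <- lapply(vars, function(v) factor(v, levels = v))
  k <- length(vars)
  # Reverse so the last row variable varies fastest, as in ftable,
  # then restore the original column order.
  expl <- expand.grid(vars[k:1])
  expl[, k:1, drop = FALSE]
}

# The one-variable table from the post, rebuilt from its printed counts
tab <- as.table(array(c(73, 15, 42, 23, 11, 25), dim = c(3, 2),
                      dimnames = list(race = c("white", "black", "other"),
                                      low  = c("0", "1"))))
ft <- ftable(tab, row.vars = "race")
levels(row.factors2(ft)$race)
```

Here the last line should give "white" "black" "other", so "white" becomes the reference level, as in the two-variable case.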
Thanks again
-- Giovanni
--
< Giovanni M. Marchetti >
Dipartimento di Statistica, Univ. di Firenze Phone: +39 055 4237 204
viale Morgagni, 59 Fax: +39 055 4223 560
I 50134 Firenze, Italy email: gmm at ds.unifi.it