[R] expand.grid

Giovanni Marchetti gmm at ds.unifi.it
Sun May 4 01:48:48 CEST 2003


I recently posted a question concerning an inconsistency of 
expand.grid in defining the reference level of the factors.

> > expand.grid(x = c("b", "a"), y = c(1, 2))$x
> [1] b a b a
> Levels: b a            # reference level is b
> > expand.grid(x = c("b", "a"))$x
> [1] b a
> Levels: a b            # reference level is a

Thank you very much for the ready explanations and comments.

I found this inconsistency while working with contingency tables
and logistic models.

I was working with a flat contingency table, for example:
 
> ft <- ftable(low ~ race + smoke, bwt)
> ft
            low  0  1
race  smoke
white FALSE     40  4
      TRUE      33 19
black FALSE     11  5
      TRUE       4  6
other FALSE     35 20
      TRUE       7  5

(here bwt is a transformation of the dataframe birthwt
in library(MASS)). 

I wanted to analyse this table by a logistic model with low as
the response, so it seemed useful to have a function that extracts
the factors race and smoke from the table:

> row.factors(ft)
   race smoke
1 white FALSE
2 white  TRUE
3 black FALSE
4 black  TRUE
5 other FALSE
6 other  TRUE
  
in such a way that they can be directly used in glm:

> glm(ft ~ race + smoke, family=binomial, data = row.factors(ft))
...
Coefficients:
(Intercept)    raceblack    raceother    smokeTRUE
      1.841       -1.084       -1.109       -1.116
...
Note that the reference level for race is "white" (the first row). 

PS - Obviously, the same analysis is very easy from the original 
dataframe (which is assumed here to be unavailable): 

> glm(low ~ race + smoke, family=binomial, data = bwt)

except for the signs of the coefficients, which are reversed
because the columns of ft are (failure, success) instead of
(success, failure).
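
The sign reversal can be checked without the original data frame.
The following is only a sketch: it retypes the cell counts from the
flat table above into a hypothetical data frame tab (the names tab,
low0, low1, fit0 and fit1 are made up for illustration) and fits the
model twice, swapping the two columns of the response.

```r
# Cell counts retyped from the flat table ft shown in this post.
# 'tab', 'low0' and 'low1' are illustrative names, not part of the post.
tab <- data.frame(
    race  = factor(rep(c("white", "black", "other"), each = 2),
                   levels = c("white", "black", "other")),
    smoke = factor(rep(c(FALSE, TRUE), times = 3)),
    low0  = c(40, 33, 11, 4, 35, 7),   # counts with low = 0
    low1  = c(4, 19, 5, 6, 20, 5)      # counts with low = 1
)

# Response columns in the same order as ft: (low = 0, low = 1)
fit0 <- glm(cbind(low0, low1) ~ race + smoke, family = binomial, data = tab)

# Response columns swapped: (low = 1, low = 0)
fit1 <- glm(cbind(low1, low0) ~ race + smoke, family = binomial, data = tab)

# Side-by-side comparison of the two sets of coefficients
cbind(low0_first = coef(fit0), low1_first = coef(fit1))
```

glm takes the first column of a two-column response as the
successes, so swapping the columns changes the sign of every
coefficient and nothing else.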

Thus, I wrote the function

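row.factors <- function(ft)
{
    # ft:    a flat table (class "ftable").
    # Value: a data frame with the factors associated with the rows.
    vars <- attr(ft, "row.vars")
    k <- length(vars)
    # expand.grid varies its first argument fastest, so reverse the
    # variables to get the row order of the table, then restore the
    # original column order.
    expl <- expand.grid(vars[k:1])
    expl[, k:1, drop = FALSE]
}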

The function worked pretty well except when the table has a single
row variable.
 
> ft <- ftable(low ~ race, bwt)
> ft
      low  0  1
race
white     73 23
black     15 11
other     42 25
> row.factors(ft)
   race
1 white
2 black
3 other
> levels(row.factors(ft)$race)
[1] "black" "other" "white" 

Thus, here the reference level is "black" and this is a bit strange 
as the first row of the table is "white". This little "infelicity" 
is in fact caused by expand.grid. 
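
A possible way around it, offered as a sketch rather than a
definitive fix: convert each element of row.vars to a factor with
explicit levels before calling expand.grid, so that the level order
no longer depends on how expand.grid treats character vectors. The
name row.factors2 and the small table rebuilt from the counts above
are for illustration only.

```r
# Hypothetical variant of row.factors(): fix the level order up front.
row.factors2 <- function(ft)
{
    vars <- attr(ft, "row.vars")
    # Make each set of row labels a factor whose levels follow the
    # order in which the labels appear in the table.
    vars <- lapply(vars, function(v) factor(v, levels = v))
    k <- length(vars)
    expl <- expand.grid(vars[k:1])
    expl[, k:1, drop = FALSE]
}

# One-way table rebuilt from the counts printed above (illustrative data).
counts <- data.frame(
    race = factor(rep(c("white", "black", "other"), each = 2),
                  levels = c("white", "black", "other")),
    low  = factor(rep(c(0, 1), times = 3)),
    Freq = c(73, 23, 15, 11, 42, 25)
)
ft <- ftable(xtabs(Freq ~ race + low, data = counts))

rf <- row.factors2(ft)
levels(rf$race)   # "white" "black" "other"
```

With the levels set explicitly, the reference level is the first row
of the table whether there is one row variable or several.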


Thanks again


-- Giovanni

-- 
< Giovanni M. Marchetti >
Dipartimento di Statistica, Univ. di Firenze   Phone:  +39 055 4237 204
viale Morgagni, 59                             Fax:    +39 055 4223 560
I 50134 Firenze, Italy                         email:  gmm at ds.unifi.it


