[R] expand.grid and the first level of a factor

Sat May 3 16:03:27 CEST 2003

>>>>> "UweL" == Uwe Ligges <ligges at statistik.uni-dortmund.de>
>>>>>     on Sat, 03 May 2003 15:23:59 +0200 writes:

    UweL> Giovanni Marchetti wrote:
    >> I do not understand this behaviour of expand.grid:
    >> 
    >> 
    >>> expand.grid(x = c("b", "a"), y = c(1, 2))$x
    >> 
    >> [1] b a b a
    >> Levels: b a
    >> 
    >>> expand.grid(x = c("b", "a"))$x
    >> 
    >> [1] b a
    >> Levels: a b
    >> 
    >> Why does the first level of the factor x depend on the number
    >> of arguments of expand.grid? Apparently, I can set 
    >> the order of the levels only when the number of 
    >> arguments is > 1. In the second example, the order 
    >> is lexicographic.
    >> 
    >> -- Giovanni

    UweL> It depends on the number of arguments, because of the implementation 
    UweL> (look into the code):

    UweL> In principle, expand.grid(x = c("b", "a")) does the following:

    UweL> x <- c("b", "a")
    UweL> factor(x)

    UweL> whereas for expand.grid(x = c("b", "a"), y = c(1, 2)), the levels will 
    UweL> be specified as in:

    UweL>    factor(x, levels = unique(x))

    UweL> Hence the difference.

which does not seem ideal to me.
factor() itself,
  > str(factor)
  function (x, levels = sort(unique.default(x), na.last = TRUE), 
      labels = levels, exclude = NA, ordered = is.ordered(x))  

does sort the levels by default, and that's what happens in the
one argument case via data.frame().
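
As a small sketch of the two factor() calls under discussion (the
variable name x is only illustrative), the default sorts the levels,
while levels = unique(x) keeps them in order of first appearance:

```r
x <- c("b", "a")

# Default: levels = sort(unique(x)), so the levels come out sorted.
f_sorted <- factor(x)
levels(f_sorted)   # "a" "b"

# Explicit levels: order of first appearance in x is preserved.
f_asis <- factor(x, levels = unique(x))
levels(f_asis)     # "b" "a"
```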

S-PLUS 6.1 does the same for factor(), but it never sorts the
levels of expand.grid() arguments.

I'm just now testing a patch to our expand.grid() which no longer
treats the one-argument case specially and seems to cure
the whole "infelicity"...
I cannot imagine that anyone's code relies on the current
behavior as opposed to the more consistent one.
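
For code that should not depend on how expand.grid() converts
character vectors, one possible approach (a sketch, not part of the
patch; the names xf, g1 and g2 are only illustrative) is to build the
factor explicitly, since an argument that is already a factor keeps
its levels:

```r
# Fix the level order up front instead of passing a character vector.
xf <- factor(c("b", "a"), levels = c("b", "a"))

g1 <- expand.grid(x = xf)                 # one argument
g2 <- expand.grid(x = xf, y = c(1, 2))    # two arguments

levels(g1$x)   # "b" "a"
levels(g2$x)   # "b" "a"
```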

Martin