[R] Specification of factors in tapply

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Feb 21 17:13:54 CET 2001


On Wed, 21 Feb 2001 rijn at swi.psy.uva.nl wrote:

>
> After some fiddling around with the tapply command, I discovered that the
> factors (the INDEX argument) given to tapply must be specified in
> fastest-cycling first order.

Wait a minute: the factors can be supplied in any order.  If you ignore
the structure of the result you will get confused, though.

There are two cases:

1) `FUN' returns a single atomic value. The result is a simple array, and
is specified on the help page.

2) `FUN' does not return a single atomic value. The result is a list with a
dim attribute, that is, an array each of whose elements is a vector (a list
is itself a vector).
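The two cases can be seen side by side in a small sketch. The data below are made up for illustration (they are not from the original post), and the code assumes a current version of R rather than 1.2.1.

```r
# Hypothetical toy data, not taken from the post.
vals <- c(1, 2, 3, 4, 5, 6)
f1 <- c("a", "a", "b", "b", "a", "b")
f2 <- c("x", "y", "x", "y", "x", "y")

# Case 1: FUN returns a single atomic value, so tapply gives a plain
# numeric matrix with one cell per (f1, f2) combination.
sums <- tapply(vals, list(f1, f2), sum)
print(sums)

# Case 2: FUN returns a vector (here of length 1 or 2), so tapply gives a
# list with a dim attribute; each cell holds that group's vector.
groups <- tapply(vals, list(f1, f2), function(v) v)
print(dim(groups))
print(groups[["a", "x"]])
```

In the second case `is.list(groups)` is TRUE even though `dim(groups)` is c(2, 2), which is exactly the "array of vectors" described above.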

I think you are just discovering that if you collapse an array to a
vector, you get the results in Fortran (column-major) order, with the
first index varying fastest.
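A short sketch of that column-major collapse, using a hypothetical 2 x 3 matrix and a hypothetical 2 x 2 list array (neither is from the post): element [i, j] of an array with nrow rows ends up at position i + (j - 1) * nrow of the collapsed vector.

```r
# R fills and stores arrays column by column.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
print(m)

# Collapsing walks the first index fastest:
# r1c1, r2c1, r1c2, r2c2, r1c3, r2c3.
v <- as.vector(m)
print(v)

# Element [2, 3] lands at position 2 + (3 - 1) * 2 = 6.
print(m["r2", "c3"] == v[6])

# unlist() on a list array concatenates the cells in the same order.
la <- array(list("a", "b", "c", "d"), dim = c(2, 2))
print(unlist(la))
```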

>
> The following code shows how I discovered my error: (R version 1.2.1)
>
> -o-o-o-o-o-
>
> x <- as.data.frame(list(data=c(-9,0,3,1,-9,1,0,-9,0,3,1,-9,1,0),
>                    subj=c(rep(1,7),rep(2,7)),
>                    cond=rep(c(rep(1,4),rep(2,3)),2)))
>
> x$first <- unlist(tapply(x$data,list(x$subj,x$cond),
>                        function(x) {
>                          retval<-rep(F,length(x));
>                          if (length(x[x>=0])>0) {
>                            retval[min(which(x>=0))]<-T;
>                          }
>                          print(cbind(x,retval)); # Print some debug info
>                          retval}))

The order is only relevant because you unlisted an array.  Nobody
said that the unlisted results would line up with the rows of x, so
adding them to x after unlisting rests on an assumption.
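With the poster's data the list array stores its cells in the order (subj 1, cond 1), (subj 2, cond 1), (subj 1, cond 2), (subj 2, cond 2), with lengths 4, 4, 3, 3. After unlisting, rows 9 to 11 therefore receive subject 1's condition-2 flags, which is why the TRUE lands on row 10 instead of row 9. One way to put per-group results back on the rows they came from is `split()` followed by `unsplit()`. The sketch below assumes a current R; the helper name `first_nonneg` is hypothetical, and this is one approach rather than the only one.

```r
# Data copied from the original post.
x <- data.frame(data = c(-9, 0, 3, 1, -9, 1, 0, -9, 0, 3, 1, -9, 1, 0),
                subj = c(rep(1, 7), rep(2, 7)),
                cond = rep(c(rep(1, 4), rep(2, 3)), 2))

# Hypothetical helper: flag the first non-negative value in a group
# (same logic as the function in the post).
first_nonneg <- function(v) {
  flag <- rep(FALSE, length(v))
  hit <- which(v >= 0)
  if (length(hit) > 0) flag[min(hit)] <- TRUE
  flag
}

groups <- list(x$subj, x$cond)

# unsplit() reverses split(): each group's result is written back to the
# row positions that group was taken from, so no ordering is assumed.
x$first <- unsplit(lapply(split(x$data, groups), first_nonneg), groups)
print(x)
```

Here rows 2, 6, 9 and 13 are flagged, one per (subj, cond) group.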

> -o-o-o-o-
>
> resulting in:
>
> > x
>    data subj cond first
> 1    -9    1    1 FALSE
> 2     0    1    1  TRUE
> 3     3    1    1 FALSE
> 4     1    1    1 FALSE
> 5    -9    1    2 FALSE
> 6     1    1    2  TRUE
> 7     0    1    2 FALSE
> 8    -9    2    1 FALSE
> 9     0    2    1 FALSE # <--
> 10    3    2    1  TRUE # <--
> 11    1    2    1 FALSE
> 12   -9    2    2 FALSE
> 13    1    2    2  TRUE
> 14    0    2    2 FALSE
>
> I could not find any reference to this order in the tapply help file nor
> in "An Introduction to R" (Version 1.2.1 (2001-01-15), PDF file p17). It
> might prove useful to include some information about this.

The ordering seems to me to have nothing to do with tapply: that returns an
array whose dimnames identify the cell each element belongs to.
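To illustrate working from the dimnames rather than from positions, here is a sketch on the poster's data, assuming a current R; the `FUN` used (`v >= 0`) is chosen only for illustration. It also uses `expand.grid()`, which varies its first factor fastest and so enumerates cells in the same order as `unlist()`, to label a flattened result.

```r
# Data copied from the original post.
x <- data.frame(data = c(-9, 0, 3, 1, -9, 1, 0, -9, 0, 3, 1, -9, 1, 0),
                subj = c(rep(1, 7), rep(2, 7)),
                cond = rep(c(rep(1, 4), rep(2, 3)), 2))

# Each cell holds the logical vector v >= 0 for one (subj, cond) group;
# naming the INDEX list names the dimnames too.
res <- tapply(x$data, list(subj = x$subj, cond = x$cond),
              function(v) v >= 0)
print(dimnames(res))

# Look a cell up by its labels, not by position in a flattened vector.
print(res[["2", "1"]])   # subject 2, condition 1

# expand.grid() lists the cells first-index-fastest, matching unlist(),
# so row k of 'cells' describes the k-th cell of res.
cells <- expand.grid(dimnames(res))
cells$n <- vapply(res, length, integer(1))
print(cells)
```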


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
