[R] tapply confusion

Milan Bouchet-Valat nalimilan at club.fr
Wed Aug 29 20:30:27 CEST 2012


On Wednesday, 29 August 2012 at 07:37 -0700, andyspeak wrote:
> Hello
> I have a huge data frame with three columns, 'Roof', 'Month' and 'Temp'.
> I want to run analyses on the numerical Temp data by the factors Roof and
> Month, separately and together.
> For using more than one factor I understand I should use aggregate, but I am
> struggling with tapply for single-factor analysis.
> 
> >  tapply(Temp, INDEX = Roof, FUN = median)
> 
> This works fine. However, if I try to do anything a bit more complex, such
> as:
> 
> > tapply(Temp, INDEX = Roof, FUN = kruskal.test)
> 
> it gives the error: Error in length(g) : 'g' is missing
> 
> What could be the problem?
If you read ?kruskal.test, you'll notice that its default method takes (at
least) two arguments, the second being "g". Its description is:
       g: a vector or factor object giving the group for the
          corresponding elements of ‘x’.  Ignored if ‘x’ is a list.

So you do not need tapply(): just call
kruskal.test(Temp, Roof)
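
A minimal sketch of that call, assuming a small hypothetical data frame `roofs` whose values are invented purely for illustration (they are not the original poster's data). It runs the default method with the grouping factor as the second argument, and the equivalent formula interface, which is also part of stats:

```r
# Hypothetical toy data standing in for the poster's Roof/Month/Temp frame
roofs <- data.frame(
  Roof  = factor(rep(c("green", "grey", "white"), each = 4)),
  Month = factor(rep(c("Jun", "Jul"), times = 6)),
  Temp  = c(21.3, 22.1, 20.8, 23.0,
            25.4, 26.2, 24.9, 27.1,
            19.5, 20.2, 18.7, 21.0)
)

# Default method: the data vector plus the grouping factor 'g',
# so a single test sees every group at once
res_default <- kruskal.test(roofs$Temp, roofs$Roof)

# Formula method: the same test, with the columns looked up in 'data'
res_formula <- kruskal.test(Temp ~ Roof, data = roofs)

print(res_formula)
```

Replacing `Roof` with `Month` in either call tests for differences between months instead.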


The underlying reason you cannot use tapply() is that it calls "FUN"
separately on each subset of the data. kruskal.test() would never be
passed the whole data set, which it needs in order to test for
differences between groups.
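
A small sketch that makes this visible, using hypothetical toy vectors (not the poster's data). The first call records what FUN receives for each group; the second reproduces the failure, since each call to kruskal.test() gets only one group's values and no "g". The exact wording of the error message may differ between R versions, so the sketch only captures it rather than matching it:

```r
# Hypothetical toy data: temperatures and the roof each one was measured on
Temp <- c(21.3, 22.1, 25.4, 26.2, 19.5, 20.2)
Roof <- factor(c("green", "green", "grey", "grey", "white", "white"))

# tapply() calls FUN once per level of Roof, passing only that group's values
seen <- tapply(Temp, INDEX = Roof, FUN = function(x) paste(x, collapse = " "))
print(seen)

# Each kruskal.test() call therefore gets a single subset and no 'g',
# so it stops with an error about 'g' being missing
err <- tryCatch(tapply(Temp, INDEX = Roof, FUN = kruskal.test),
                error = function(e) conditionMessage(e))
print(err)
```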


Regards
