[R] using functions with multiple arguments in the "apply" family

Peter Ehlers ehlers at ucalgary.ca
Thu Jan 28 20:53:34 CET 2010

chipmaney wrote:
> typically, the apply family wants you to use vectors to run functions on. 
> However, I have a function, kruskal.test, that requires 2 arguments.
> kruskal.test(Herb.df$Score,Herb.df$Year)
> This easily computes the KW ANOVA statistic for any difference across
> years....
> However, my data has multiple sites on which KW needs to be run...
> here's the data:
> Herb.df<-
> data.frame(Score=rep(c(2,4,6,6,6,5,7,8,6,9),2),Year=rep(c(rep(1,5),rep(2,5)),2),Site=c(rep(3,10),rep(4,10)))
> However, if I try this:
>  tapply(Herb.df,Herb.df$Site,function(.data)
> kruskal.test(.data$Indicator_Rating,.data$Year))
>> Error in tapply(Herb.df, Herb.df$ID, function(.data)
> kruskal.test(.data$Indicator_Rating,  : 
>   arguments must have same length
> How can I vectorize the kruskal.test() for all sites using tapply() in lieu
> of a loop?

Your example data makes little sense: you have precisely the
same data for both sites, and you have only two years (why use
kruskal.test to compare just two groups?). Finally, you need to
decide what your response variable is: 'Score' or 'Indicator_Rating'.
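
As for the error message itself, it most likely arises because tapply()
expects an atomic vector as its first argument. A data frame is a list of
columns, so its length is the number of columns rather than the number of
rows, and that does not match the length of the grouping vector. A minimal
check on the posted data (this describes tapply() as it behaved at the time
of this thread; newer R releases may treat a data frame argument differently):

```r
# The data exactly as posted in the question
Herb.df <- data.frame(Score = rep(c(2, 4, 6, 6, 6, 5, 7, 8, 6, 9), 2),
                      Year  = rep(c(rep(1, 5), rep(2, 5)), 2),
                      Site  = c(rep(3, 10), rep(4, 10)))

# A data frame's length is its number of columns ...
n_cols <- length(Herb.df)
# ... while the grouping vector has one entry per row
n_rows <- length(Herb.df$Site)

# These two lengths differ, which is what "arguments must have same length"
# is complaining about
print(c(columns = n_cols, rows = n_rows))
```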

So here's some made-up data and the use of by() to apply
the test to each site:

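# made-up data: 60 responses, 4 years (yr) within each of 3 sites (st)
dat <- data.frame(y = rnorm(60), yr = gl(4, 5, 60), st = gl(3, 20))
# by() splits the rows of dat by site and runs the test on each piece
with(dat, by(dat, st, function(x) kruskal.test(y ~ yr, data = x)))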

See the last example in ?by.
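
The same pattern should carry over to the posted data. The sketch below
assumes 'Score' is the intended response (the question does not settle this)
and uses sapply() to pull the p-value out of each test result; the names
res and pvals are only illustrative.

```r
# The data exactly as posted in the question
Herb.df <- data.frame(Score = rep(c(2, 4, 6, 6, 6, 5, 7, 8, 6, 9), 2),
                      Year  = rep(c(rep(1, 5), rep(2, 5)), 2),
                      Site  = c(rep(3, 10), rep(4, 10)))

# One Kruskal-Wallis test per site, comparing Score across years
# (assumption: Score, not Indicator_Rating, is the response)
res <- by(Herb.df, Herb.df$Site,
          function(d) kruskal.test(Score ~ Year, data = d))

# Each element of res is an "htest" object; collect the p-values
pvals <- sapply(res, function(h) h$p.value)
print(pvals)
```

Because the posted scores are identical for the two sites, both tests
return the same p-value.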

  -Peter Ehlers


Peter Ehlers
University of Calgary