[R] Repeated analysis over groups / Splitting by group variable

Peter Ehlers ehlers at ucalgary.ca
Fri Jul 16 02:09:38 CEST 2010


I would change that first dataOnly in
by(...) or lapply(...) to dataOnly[,-3],
so that the group column (column 3) is not
passed to KLdiv() along with the data.

In fact, because of the as.matrix() in
function(x), both the by() and lapply() methods
will also work fine directly on the data frame
mydata, provided it is first subset to the
columns to be analysed.

   -Peter Ehlers
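
A minimal sketch of the split-by-group pattern discussed in this thread,
written so that it runs in base R without flexmix. As an assumption made
only for illustration, colMeans() stands in for KLdiv(); the names vals,
res_by and res_list and the seed are likewise hypothetical. The grouping
column is kept out of the data handed to the function, and the grouping
argument has one entry per row.

```r
# Sketch only: colMeans() is a stand-in for KLdiv() so no package is needed.
set.seed(1)                        # assumed seed, only for reproducibility
n <- 23
mydata <- data.frame(
  sequence = seq_len(n),
  data1    = rnorm(n),
  data2    = rnorm(n),
  group    = sample(c(1, 2, 3), n, replace = TRUE)
)

# Keep only the columns to analyse; the grouping column stays out of the data.
vals <- mydata[, c("data1", "data2")]

# by(): the grouping argument needs one entry per row of 'vals' (23 here),
# not just the 3 distinct group labels.
res_by <- by(vals, mydata$group, function(x) colMeans(as.matrix(x)))

# split() + lapply(): same computation, returned as a plain named list.
res_list <- lapply(split(vals, mydata$group),
                   function(x) colMeans(as.matrix(x)))

res_list[["1"]]                    # result for the group labelled 1
```

Because split() only produces the per-group pieces, any function that
accepts a data frame or matrix can be passed to lapply() in place of the
stand-in, for example function(x) KLdiv(as.matrix(x)) once flexmix is
loaded.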

On 2010-07-15 15:42, Phil Spector wrote:
> Ralf -
> If you want to use by(), I think it should look like this:
>
> by(dataOnly,dataOnly[,3],function(x)KLdiv(as.matrix(x)))
>
> But you might find the following more useful:
>
> lapply(split(as.data.frame(dataOnly),dataOnly[,3]),
> function(x)KLdiv(as.matrix(x)))
>
> since it returns its results in a list.
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu
>
>
>
> On Thu, 15 Jul 2010, Ralf B wrote:
>
>> I am performing some analysis over a large data frame and would like
>> to conduct repeated analysis over grouped-up subsets. How can I do
>> that?
>>
>> Here is some example code for clarification:
>>
>> require("flexmix") # for Kullback-Leibler divergence
>> n <- 23
>> groups <- c(1,2,3)
>> mydata <- data.frame(
>> sequence=c(1:n),
>> data1=c(rnorm(n)),
>> data2=c(rnorm(n)),
>> group=rep(sample(groups, n, replace=TRUE))
>> )
>> # Part 1: full stats (works fine)
>> dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)
>> KLdiv(dataOnly)
>>
>> #
>> # Part 2: again - but once for each group (error)
>> #
>> by(dataOnly, groups, KLdiv(dataOnly))
>>
>> The error I am getting is:
>>
>> Error in tapply(1L:23L, list(INDICES = c(1, 2, 3)), function (x) :
>>   arguments must have same length
>>
>> Are there better ways than 'by'? I would like to use different stats
>> and functions and therefore I am looking for a splitter whose output I
>> can hand to any statistical function I want.
>>
>> Any ideas?
>>
>> Ralf
>>


