[R] Repeated analysis over groups / Splitting by group variable

Phil Spector spector at stat.berkeley.edu
Thu Jul 15 23:42:19 CEST 2010


Ralf -
    If you want to use by(), I think it should look like 
this:

by(dataOnly, dataOnly[, 3], function(x) KLdiv(as.matrix(x)))
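
In the original attempt, `groups` had only three elements for 23 rows, which is what triggers the tapply() length error, and `KLdiv(dataOnly)` is the result of a call rather than a function. A minimal sketch of the corrected by() pattern follows. So that it runs without flexmix, colMeans() stands in for KLdiv(); that stand-in and the seed are assumptions for illustration, not part of the original post.

```r
set.seed(1)                      # assumed seed, only for reproducibility
n <- 23
mydata <- data.frame(
  data1 = rnorm(n),
  data2 = rnorm(n),
  group = sample(c(1, 2, 3), n, replace = TRUE)
)
dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)

# The grouping vector needs one entry per row of the data, so use the
# group column itself rather than the vector of distinct group labels.
grp <- dataOnly[, 3]

# The third argument must be a function; by() calls it once per subset.
# colMeans() is a stand-in for KLdiv() here.
res <- by(dataOnly, grp, function(x) colMeans(as.matrix(x)))
print(res)
```

Once flexmix is loaded, replacing colMeans() with KLdiv() gives the call shown above.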

But you might find the following more useful:

lapply(split(as.data.frame(dataOnly), dataOnly[, 3]),
       function(x) KLdiv(as.matrix(x)))

since it returns its results in a plain list rather than in an object
of class "by".
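
Because split() only produces the per-group subsets, the same pieces can be handed to any function afterwards. A hedged sketch of that idea follows, again with base R functions standing in for KLdiv(). The data, the seed, and the choice to drop the grouping column before computing are assumptions for illustration.

```r
set.seed(1)                      # assumed seed, only for reproducibility
n <- 23
mydata <- data.frame(
  data1 = rnorm(n),
  data2 = rnorm(n),
  group = sample(c(1, 2, 3), n, replace = TRUE)
)

# Split once; keep only the measurement columns in each piece.
pieces <- split(mydata[, c("data1", "data2")], mydata$group)

# Apply any function to each piece; lapply() returns a list named by group.
means <- lapply(pieces, function(x) colMeans(as.matrix(x)))

# sapply() simplifies scalar results to a named vector.
sizes <- sapply(pieces, nrow)

print(names(means))              # group labels
print(means[[1]])                # result for the first group
print(sizes)                     # number of rows per group
```

The list `pieces` can be reused with a different function each time, for example function(x) KLdiv(as.matrix(x)) once flexmix is loaded.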

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu



On Thu, 15 Jul 2010, Ralf B wrote:

> I am performing some analysis over a large data frame and would like
> to conduct repeated analysis over grouped-up subsets. How can I do
> that?
>
> Here is some example code for clarification:
>
> require("flexmix")	# for Kullback-Leibler divergence
> n <- 23
> groups <- c(1,2,3)
> mydata <- data.frame(
> 	sequence=c(1:n),
> 	data1=c(rnorm(n)),
> 	data2=c(rnorm(n)),
> 	group=rep(sample(groups, n, replace=TRUE))
> )
> # Part 1: full stats (works fine)
> dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)
> KLdiv(dataOnly)
>
> #
> # Part 2: again - but once for each group (error)
> #
> by(dataOnly, groups, KLdiv(dataOnly))
>
> The error I am getting is: Error in tapply(1L:23L, list(INDICES = c(1,
> 2, 3)), function (x)  :
>  arguments must have same length
>
> Are there better ways than 'by'? I would like to use different stats
> and functions and therefore I am looking for a splitter whose output I
> can hand to any statistical function I want.
>
> Any ideas?
>
> Ralf


