[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Wed Aug 17 14:06:04 CEST 2011


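Have a look at the function ave(), which applies FUN within each group and
returns the results as a vector in the original row order, e.g.,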

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
     Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

edf <- function(x) ecdf(x)(x)
df$edf <- with(df, ave(Value, Group, FUN = edf))
df
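
As a sanity check, here is a small sketch (not part of the original reply) that
recomputes the per-group values with split()/unsplit() and compares them with
the ave() result. The name "manual" is made up for illustration; everything
else is taken from the code above.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]

edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# ave() returns a vector as long as Value, aligned with the rows of df
df$edf <- with(df, ave(Value, Group, FUN = edf))

# "manual" (hypothetical name): same computation done by hand, i.e. split
# Value by Group, apply edf to each piece, put the pieces back in row order
manual <- unsplit(lapply(split(df$Value, df$Group), edf), df$Group)
stopifnot(isTRUE(all.equal(df$edf, manual)))

# Within each group the largest Value must have edf equal to 1
stopifnot(all(tapply(df$edf, df$Group, max) == 1))
```

Because the result is already in row order, no sorting or rbind() step is
needed before attaching it as a new column.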


I hope it helps.

Best,
Dimitris


On 8/17/2011 12:42 PM, Marius Hofert wrote:
> Dear all,
>
> First, let's create some data to play around:
>
> set.seed(1)
> (df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
>                   Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
>
> ## Now we need the empirical distribution function:
> edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x
>
> ## The big question is how one can apply the empirical distribution function to
> ## each subset of df determined by "Group", so how to apply it to Group1, then
> ## to Group2, and finally to Group3. You might suggest (?) to use tapply:
>
> (edf. <- tapply(df$Value, df$Group, FUN=edf))
>
> ## That's correct. But typically, one would like to obtain not only the values,
> ## but a data.frame containing the original information and the new (edf-)values.
> ## What's a simple way to get this? (one would be required to first sort df
> ## according to Group, then paste the values computed by edf to the sorted df;
> ## seems a bit tedious).
> ## A solution I have is the following (but I would like to know if there is a
> ## simpler one):
>
> (edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
>      subdata <- subset(df, Group==strg) # sub-data
>      subdata <- cbind(subdata, edf=edf(subdata$Value))
> })) )
>
>
> Cheers,
>
> Marius

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/


