[R] more flexible "ave"

Adaikalavan Ramasamy a.ramasamy at imperial.ac.uk
Tue Nov 30 16:21:24 CET 2010


Here is a possible solution using sweep instead of ave:

   df <- data.frame(site = c("a", "a", "a", "b", "b", "b"),
                    gr = c("total", "x1", "x2", "x1", "total","x2"),
                    value1 = c(212, 56, 87, 33, 456, 213),
                    value2 = c(1546, 560, 543, 234, 654, 312) )

   sdf <- split(df, df$site)

   out <- lapply( sdf, function(mat){

     small.mat <- mat[ , -c(1,2)]
     totals    <- mat[ which( mat[ , "gr"] == "total" ), -c(1,2) ]
     totals    <- as.numeric(totals)

     percent=sweep( small.mat, MARGIN=2, STATS=totals, FUN="/" )
     colnames(percent) <- paste("percent_", colnames(percent), sep="")
     return( cbind(mat, percent) )
   } )

   do.call("rbind", out)

       site    gr value1 value2 percent_value1 percent_value2
   a.1    a total    212   1546     1.00000000      1.0000000
   a.2    a    x1     56    560     0.26415094      0.3622251
   a.3    a    x2     87    543     0.41037736      0.3512290
   b.4    b    x1     33    234     0.07236842      0.3577982
   b.5    b total    456    654     1.00000000      1.0000000
   b.6    b    x2    213    312     0.46710526      0.4770642

Also I think it might be more efficient to replace your "gr" variable 
with a binary 0,1 where 1 indicates the total. That way you don't have 
to generate x1, x2, x3, ....

Regards, Adai


On 30/11/2010 14:42, Patrick Hausmann wrote:
> Hi all,
>
> I would like to calculate the percent of the total per group for this
> data.frame:
>
> df<- data.frame(site = c("a", "a", "a", "b", "b", "b"),
>                    gr = c("total", "x1", "x2", "x1", "total","x2"),
>                    value1 = c(212, 56, 87, 33, 456, 213))
> df
>
> calcPercent<- function(df) {
>
>       df<- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr,
>                                 FUN = function(x)
>                                 x/df[df$gr == "total", "value1"]) )
> }
>
> # This works as intended...
> w<- lapply(split(df, df$site), calcPercent)
> w<- do.call(rbind, w)
> w
>
> # ... but when I add a new column
> df$value2<- c(1546, 560, 543, 234, 654, 312)
>
> # the result is not what I want...
> w<- lapply(split(df, df$site), calcPercent)
> w<- do.call(rbind, w)
> w
>
> Clearly I have to change the function, (particularly "value1") - but
> how... I've also played around with "apply" but without any success.
>
> Thanks for any help!
> Patrick
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list