[R] more flexible "ave"

Charles C. Berry cberry at tajo.ucsd.edu
Tue Nov 30 17:37:56 CET 2010


On Tue, 30 Nov 2010, Adaikalavan Ramasamy wrote:

> Here is a possible solution using sweep instead of ave:
>
>   df <- data.frame(site = c("a", "a", "a", "b", "b", "b"),
>                    gr = c("total", "x1", "x2", "x1", "total","x2"),
>                    value1 = c(212, 56, 87, 33, 456, 213),
>                    value2 = c(1546, 560, 543, 234, 654, 312) )
>

lm() and friends provide a simple approach:


> cbind( df, percent =
+   df[,-(1:2)] /
+     predict( lm( cbind(value1,value2) ~ gr*site, df),
+              new=data.frame(site=df$site,gr='total' ))
+ )
   site    gr value1 value2 percent.value1 percent.value2
1    a total    212   1546     1.00000000      1.0000000
2    a    x1     56    560     0.26415094      0.3622251
3    a    x2     87    543     0.41037736      0.3512290
4    b    x1     33    234     0.07236842      0.3577982
5    b total    456    654     1.00000000      1.0000000
6    b    x2    213    312     0.46710526      0.4770642
>

HTH,

Chuck

>  sdf <- split(df, df$site)
>
>  out <- lapply( sdf, function(mat){
>
>     small.mat <- mat[ , -c(1,2)]
>     totals    <- mat[ which( mat[ , "gr"] == "total" ), -c(1,2) ]
>     totals    <- as.numeric(totals)
>
>     percent=sweep( small.mat, MARGIN=2, STATS=totals, FUN="/" )
>     colnames(percent) <- paste("percent_", colnames(percent), sep="")
>     return( cbind(mat, percent) )
>   } )
>
>  do.call("rbind", out)
>
>       site    gr value1 value2 percent_value1 percent_value2
>   a.1    a total    212   1546     1.00000000      1.0000000
>   a.2    a    x1     56    560     0.26415094      0.3622251
>   a.3    a    x2     87    543     0.41037736      0.3512290
>   b.4    b    x1     33    234     0.07236842      0.3577982
>   b.5    b total    456    654     1.00000000      1.0000000
>   b.6    b    x2    213    312     0.46710526      0.4770642
>
> Also I think it might be more efficient to replace your "gr" variable with a 
> binary 0,1 where 1 indicates the total. That way you don't have to generate 
> x1, x2, x3, ....
>
> Regards, Adai
>
>
> On 30/11/2010 14:42, Patrick Hausmann wrote:
>>  Hi all,
>>
>>  I would like to calculate the percent of the total per group for this
>>  data.frame:
>>
>>  df<- data.frame(site = c("a", "a", "a", "b", "b", "b"),
>>                     gr = c("total", "x1", "x2", "x1", "total","x2"),
>>                     value1 = c(212, 56, 87, 33, 456, 213))
>>  df
>>
>>  calcPercent<- function(df) {
>>
>>        df<- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr,
>>                                  FUN = function(x)
>>                                  x/df[df$gr == "total", "value1"]) )
>> }
>>
>>  # This works as intended...
>>  w<- lapply(split(df, df$site), calcPercent)
>>  w<- do.call(rbind, w)
>>  w
>>
>>  # ... but when I add a new column
>>  df$value2<- c(1546, 560, 543, 234, 654, 312)
>>
>>  # the result is not what I want...
>>  w<- lapply(split(df, df$site), calcPercent)
>>  w<- do.call(rbind, w)
>>  w
>>
>>  Clearly I have to change the function, (particularly "value1") - but
>>  how... I've also played around with "apply" but without any success.
>>
>>  Thanks for any help!
>>  Patrick
>>
>>  ______________________________________________
>>  R-help at r-project.org mailing list
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>  PLEASE do read the posting guide
>>  http://www.R-project.org/posting-guide.html
>>  and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at tajo.ucsd.edu			    UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-help mailing list