[R] more flexible "ave"
Charles C. Berry
cberry at tajo.ucsd.edu
Tue Nov 30 17:37:56 CET 2010
On Tue, 30 Nov 2010, Adaikalavan Ramasamy wrote:
> Here is a possible solution using sweep instead of ave:
>
> df <- data.frame(site   = c("a", "a", "a", "b", "b", "b"),
>                  gr     = c("total", "x1", "x2", "x1", "total", "x2"),
>                  value1 = c(212, 56, 87, 33, 456, 213),
>                  value2 = c(1546, 560, 543, 234, 654, 312) )
>
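lm() and friends provide a simple approach. With gr*site the model has one
coefficient per site-by-gr cell, so it fits the data exactly, and predicting
every row at gr='total' returns that row's site total: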
> cbind( df, percent =
+        df[,-(1:2)] /
+        predict( lm( cbind(value1,value2) ~ gr*site, df),
+                 newdata=data.frame(site=df$site,gr='total' ))
+        )
  site    gr value1 value2 percent.value1 percent.value2
1    a total    212   1546     1.00000000      1.0000000
2    a    x1     56    560     0.26415094      0.3622251
3    a    x2     87    543     0.41037736      0.3512290
4    b    x1     33    234     0.07236842      0.3577982
5    b total    456    654     1.00000000      1.0000000
6    b    x2    213    312     0.46710526      0.4770642
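
To see what predict() is doing here, the sketch below may help. It is not part
of the original exchange, and the names value_cols, fit, totals_lm,
totals_lookup and result are only illustrative. It checks that the gr*site fit
reproduces the data exactly and that its predictions at gr = "total" agree with
a plain match() lookup of each site's "total" row. It assumes exactly one
"total" row per site.

```r
df <- data.frame(site   = c("a", "a", "a", "b", "b", "b"),
                 gr     = c("total", "x1", "x2", "x1", "total", "x2"),
                 value1 = c(212, 56, 87, 33, 456, 213),
                 value2 = c(1546, 560, 543, 234, 654, 312))

# Every column except the two grouping columns is treated as a value column.
value_cols <- setdiff(names(df), c("site", "gr"))
values     <- as.matrix(df[value_cols])

# Saturated model: six cells, six coefficients, so fitted values equal the data.
fit <- lm(cbind(value1, value2) ~ gr * site, data = df)
stopifnot(isTRUE(all.equal(unname(fitted(fit)), unname(values))))

# Predicting each row at gr = "total" gives that row's site total.
totals_lm <- predict(fit, newdata = data.frame(site = df$site, gr = "total"))

# The same totals by direct lookup (assumes one "total" row per site).
total_rows    <- df[df$gr == "total", ]
totals_lookup <- unname(as.matrix(
  total_rows[match(df$site, total_rows$site), value_cols]))
stopifnot(isTRUE(all.equal(unname(totals_lm), totals_lookup)))

# Elementwise division of two matrices with identical dimensions.
percent <- values / totals_lookup
colnames(percent) <- paste0("percent.", value_cols)
result <- cbind(df, percent)
result
```

The lookup version works for any number of value columns without naming them
in a formula, at the cost of relying on the one-"total"-row-per-site assumption.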
>
HTH,
Chuck
> sdf <- split(df, df$site)
>
> out <- lapply( sdf, function(mat){
>
>   small.mat <- mat[ , -c(1,2)]
>   totals <- mat[ which( mat[ , "gr"] == "total" ), -c(1,2) ]
>   totals <- as.numeric(totals)
>
>   percent=sweep( small.mat, MARGIN=2, STATS=totals, FUN="/" )
>   colnames(percent) <- paste("percent_", colnames(percent), sep="")
>   return( cbind(mat, percent) )
> } )
>
> do.call("rbind", out)
>
>     site    gr value1 value2 percent_value1 percent_value2
> a.1    a total    212   1546     1.00000000      1.0000000
> a.2    a    x1     56    560     0.26415094      0.3622251
> a.3    a    x2     87    543     0.41037736      0.3512290
> b.4    b    x1     33    234     0.07236842      0.3577982
> b.5    b total    456    654     1.00000000      1.0000000
> b.6    b    x2    213    312     0.46710526      0.4770642
>
> Also I think it might be more efficient to replace your "gr" variable with a
> binary 0,1 where 1 indicates the total. That way you don't have to generate
> x1, x2, x3, ....
>
> Regards, Adai
>
>
> On 30/11/2010 14:42, Patrick Hausmann wrote:
>> Hi all,
>>
>> I would like to calculate the percent of the total per group for this
>> data.frame:
>>
>> df<- data.frame(site = c("a", "a", "a", "b", "b", "b"),
>>                 gr = c("total", "x1", "x2", "x1", "total","x2"),
>>                 value1 = c(212, 56, 87, 33, 456, 213))
>> df
>>
>> calcPercent<- function(df) {
>>
>>   df<- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr,
>>                                     FUN = function(x)
>>                                       x/df[df$gr == "total", "value1"]) )
>> }
>>
>> # This works as intended...
>> w<- lapply(split(df, df$site), calcPercent)
>> w<- do.call(rbind, w)
>> w
>>
>> # ... but when I add a new column
>> df$value2<- c(1546, 560, 543, 234, 654, 312)
>>
>> # the result is not what I want...
>> w<- lapply(split(df, df$site), calcPercent)
>> w<- do.call(rbind, w)
>> w
>>
>> Clearly I have to change the function, (particularly "value1") - but
>> how... I've also played around with "apply" but without any success.
>>
>> Thanks for any help!
>> Patrick
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at tajo.ucsd.edu                     UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901