[R] Efficient computation of trimmed stats?

Mon May 14 18:58:42 CEST 2007

Hi everyone,

I was wondering if there is anything already implemented for  
efficient ("row-wise") computation of group-specific trimmed stats  
(mean and sd on the trimmed vector) on large matrices.

For example:

set.seed(1)
nc = 300
nr = 250000
x = matrix(rnorm(nc*nr), ncol=nc)
g = matrix(sample(1:3, nr*nc, replace=TRUE), ncol=nc)

trimmedMeanByGroup <- function(y, grp, trim=.05)
   tapply(y, factor(grp, levels=1:3), mean, trim=trim)

sapply(1:10, function(i) trimmedMeanByGroup(x[i,], g[i,]))

works fine... but:

 > system.time(sapply(1:nr, function(i) trimmedMeanByGroup(x[i,], g[i,])))
    user  system elapsed
399.928   0.019 399.988

That is too slow to be practical for my purposes.

Maybe some package has some implementation of the above?
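
To make the "mean and sd on the trimmed vector" part concrete, here is a minimal reference sketch for a single row. It is not meant to be fast, only something to check a faster routine against. The names trimVec and trimmedStatsByGroup are made up for this illustration, and the trimming rule (drop floor(n*trim) observations from each end, with trim < 0.5) is assumed to match what mean(x, trim=) does.

```r
# Hypothetical helper: drop floor(n * trim) values from each end of the
# sorted vector (assumes 0 <= trim < 0.5 and no NAs).
trimVec <- function(y, trim = 0.05) {
  n <- length(y)
  if (n == 0L) return(y)
  k <- floor(n * trim)
  sort(y)[(k + 1):(n - k)]
}

# Hypothetical helper: trimmed mean and sd per group for one row.
# Returns a 2 x length(levels) matrix with rows "mean" and "sd".
trimmedStatsByGroup <- function(y, grp, trim = 0.05, levels = 1:3) {
  parts <- split(y, factor(grp, levels = levels))
  sapply(parts, function(v) {
    tv <- trimVec(v, trim)
    c(mean = mean(tv), sd = sd(tv))
  })
}
```

With the data above, trimmedStatsByGroup(x[1,], g[1,]) should return the same means as trimmedMeanByGroup(x[1,], g[1,]) in its "mean" row, plus the sd of the same trimmed values in its "sd" row.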

Thank you very much,
-b

--
Benilton Carvalho
PhD Candidate
Department of Biostatistics
Bloomberg School of Public Health
Johns Hopkins University
bcarvalh at jhsph.edu