[R] Fast ave for sorted data?
Charles C. Berry
cberry at tajo.ucsd.edu
Sun Feb 15 20:08:37 CET 2009
On Sun, 15 Feb 2009, Zhou Fang wrote:
> Hi,
>
> This is probably really obvious, but I can't seem to find anything on it.
>
> Is there a fast version of ave for when the data is already sorted in terms
> of the factor, or if the breaks are already known?
>
If all you want are means, you can use rle() and colMeans() to good
effect:
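foo2 <-
    function (x, y)
{
    reps <- rle(x)$lengths         # length of each run of equal x
    lens <- rep(reps, reps)        # run length, repeated per element
    uniqLens <- unique(lens)
    ## average all runs of the same length i with one colMeans() call
    for (i in uniqLens[uniqLens != 1]) {
        y[lens == i] <-
            rep(colMeans(matrix(y[lens == i], nrow = i)), each = i)
    }
    y
}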
> x <- sort( round( runif(100000, 0 , 1 ), 5) )
> y <- sample(1000000,100000)
> all.equal(ave(y,x),foo2(x,y))
[1] TRUE
> system.time(foo2(x,y))
   user  system elapsed
  0.087   0.029   0.117
> system.time(ave(y,x))
   user  system elapsed
  1.933   0.030   1.980
>
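To see why the matrix()/colMeans() step works, here is a small sketch on made-up data. The x and y values and the names sel, m, pair_means and y2 are only for illustration and are not from the original question. Because x is sorted, the y values of all runs of length i, laid end to end, fill a matrix with i rows in which each column is exactly one run:

```r
## Toy data (hypothetical): x is sorted, some values repeat
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
y <- c(10, 20, 40, 3, 6, 9, 7, 1, 3)

reps <- rle(x)$lengths      # 1 2 3 1 2
lens <- rep(reps, reps)     # 1 2 2 3 3 3 1 2 2

## Handle all runs of length 2 at once
sel <- lens == 2
m <- matrix(y[sel], nrow = 2)   # columns are (20, 40) and (1, 3)
pair_means <- colMeans(m)       # 30 2

y2 <- y
y2[sel] <- rep(pair_means, each = 2)
y2                              # 10 30 30 3 6 9 7 2 2
```

The run of length 3 is still untouched in y2; foo2() picks it up on the next pass of the loop.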
If, as in your example, a substantial fraction of the X's are unique, and
if you want to generalize to more than means, then you can still gain a
lot by treating the unique and non-unique values separately like this:
foo <-
    function (x, y)
{
    reps <- rle(x)$lengths
    ## TRUE where x is repeated; singletons keep their y as is
    len.not.1 <- rep(reps, reps) != 1
    y[len.not.1] <- ave(y[len.not.1], x[len.not.1])
    y
}
> y <- sample(1000000,100000)
> x <- sort( round( runif(100000, 0 , 2 ), 5) )
> system.time(foo(x,y))
   user  system elapsed
  0.577   0.027   0.628
> system.time(ave(y,x))
   user  system elapsed
  2.513   0.038   2.545
> table(table(x))
    1     2     3     4     5     6
60526 15161  2578   318    28     1
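As a sketch of the "more than means" case, the same idea can take a summary function as an argument and pass it on through the FUN argument of ave(). The name foo_fun and the toy data are hypothetical. This assumes FUN returns the value itself for a single observation (true for mean, median, min and max, but not for sd, which gives NA), since singletons are skipped:

```r
## Hypothetical generalization of foo(): skip singletons, apply FUN elsewhere
foo_fun <- function(x, y, FUN = mean) {
    reps <- rle(x)$lengths
    multi <- rep(reps, reps) != 1      # TRUE where x is repeated
    y[multi] <- ave(y[multi], x[multi], FUN = FUN)
    y
}

## Toy data: medians within repeated x values
x <- c(1, 2, 2, 3, 3, 3, 4)
y <- c(10, 20, 40, 3, 6, 100, 7)
foo_fun(x, y, FUN = median)            # 10 30 30 6 6 6 7
```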
And if neither of these is quite good enough, a line or two of C code
should do the trick. See package 'inline'.
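Before reaching for C, one more pure R possibility for the case where X stays fixed and only Y changes: compute the run lengths once and get the group sums from differences of a cumulative sum. This is a different technique from the two functions above, offered as a sketch only. The names run_lengths and ave_sorted are made up, x must already be sorted, NAs are not handled, and cumulative sums over very long or very large-valued vectors may lose some floating-point accuracy:

```r
## Do this once, since x does not change
run_lengths <- function(x) rle(x)$lengths

## Do this for each new y; lens comes from run_lengths(x)
ave_sorted <- function(y, lens) {
    ends <- cumsum(lens)               # index of the last element of each run
    csum <- cumsum(as.numeric(y))      # running total of y
    sums <- diff(c(0, csum[ends]))     # per-run sums
    rep(sums / lens, lens)             # per-run means, expanded to length(y)
}

## Usage on simulated data, checked against ave()
set.seed(1)
x <- sort(round(runif(1000, 0, 1), 2))
lens <- run_lengths(x)
y <- sample(100000, 1000)
all.equal(ave(y, x), ave_sorted(y, lens))
```

Nothing here loops over groups, so the cost per new y is a few vectorized passes over the data.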
HTH,
Chuck
> Basically, I have:
> X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
> Y = 223, 434, 343, 544, 231.... etc
> of the same, admittedly large length.
>
> Now note that some of the values of X are repeated. What I want to do is, for
> those X that are repeated, take the corresponding values of Y and change them
> to the average for that particular X.
>
> So, ave(Y,X) will work. But it's very slow, and certainly not suited to my
> problem, where Y changes and X stays the same and I need to repeatedly
> recalculate the averaging of Y. Ave also does not take advantage of the
> sorting of the data.
>
> So, is there an alternative? (Presumably avoiding loops.)
>
> Thanks,
>
> Zhou Fang
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901