[Rd] Potential improvements of ave?
SOEIRO Thomas
Thom@@@SOEIRO @end|ng |rom @p-hm@|r
Fri Mar 12 23:59:12 CET 2021
Dear all,
I have two questions/suggestions about ave, but I am not sure whether they belong in a bug report.
1) I have performance issues with ave in a case where I didn't expect it. The following code runs as expected:
set.seed(1)
df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
                  id2 = sample(1:3, 5e2, TRUE),
                  id3 = sample(1:5, 5e2, TRUE),
                  val = sample(1:300, 5e2, TRUE))
df1$diff <- ave(df1$val,
                df1$id1,
                df1$id2,
                df1$id3,
                FUN = function(i) c(diff(i), 0))
head(df1[order(df1$id1,
               df1$id2,
               df1$id3), ])
But when the data.frame is scaled up by a factor of 1e4, ave fails (Error: cannot allocate vector of size 1110.0 Gb):
df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
                  id2 = sample(1:3, 5e2 * 1e4, TRUE),
                  id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
                  val = sample(1:300, 5e2 * 1e4, TRUE))
df2$diff <- ave(df2$val,
                df2$id1,
                df2$id2,
                df2$id3,
                FUN = function(i) c(diff(i), 0))
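A rough back-of-the-envelope check of that figure, assuming ave builds its groups with interaction() and therefore creates one level per possible combination of the grouping values, whether or not the combination occurs in the data. The helper name estimate_levels is hypothetical (it is not part of base R), and the 8 bytes per level is an assumption about a 64-bit build:

```r
# Hypothetical helper: estimates how many levels interaction() would create
# without dropping unused combinations, and the approximate size of the
# resulting levels vector (assuming 8 bytes per element).
estimate_levels <- function(...) {
  keys <- list(...)
  n_levels <- prod(vapply(keys,
                          function(k) as.numeric(length(unique(k))),
                          numeric(1)))
  c(n_levels = n_levels, approx_gb = n_levels * 8 / 2^30)
}

set.seed(1)
id1 <- sample(1:1e2, 5e2, TRUE)
id2 <- sample(1:3, 5e2, TRUE)
id3 <- sample(1:5, 5e2, TRUE)

est <- estimate_levels(id1, id2, id3)

# On the small data, the estimate equals the number of levels actually created
est[["n_levels"]] == nlevels(interaction(id1, id2, id3))

# Upper bound for the expanded data: 1e6 * 3 * 5e4 possible combinations
1e6 * 3 * 5e4 * 8 / 2^30
```

The upper bound is of the same order as the size in the error message, which would be consistent with the allocation coming from the full set of combinations rather than from the 5e6 rows themselves.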
This use case does not seem extreme to me (e.g. aggregate et al. work perfectly on this data.frame).
So my question is: is this expected/intended/reasonable, i.e. does ave need to be optimized?
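To make the question concrete, here is a minimal sketch of one possible workaround, not a proposed patch. It numbers only the combinations that actually occur, so the number of groups is bounded by the number of rows. The name ave_observed is hypothetical, and the sketch assumes the grouping values never contain the separator character used to build the keys:

```r
# Hypothetical alternative to ave(): groups are built from observed
# combinations only, instead of all possible combinations of levels.
ave_observed <- function(x, ..., FUN = mean) {
  keys <- list(...)
  if (length(keys) == 0L) {
    x[] <- FUN(x)
    return(x)
  }
  # One string key per row; "\r" is assumed not to occur in the values
  key <- do.call(paste, c(keys, sep = "\r"))
  # Integer group id, one per distinct observed key
  g <- match(key, unique(key))
  split(x, g) <- lapply(split(x, g), FUN)
  x
}

set.seed(1)
df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
                  id2 = sample(1:3, 5e2, TRUE),
                  id3 = sample(1:5, 5e2, TRUE),
                  val = sample(1:300, 5e2, TRUE))
f <- function(i) c(diff(i), 0)

res_new <- ave_observed(df1$val, df1$id1, df1$id2, df1$id3, FUN = f)
res_ave <- ave(df1$val, df1$id1, df1$id2, df1$id3, FUN = f)
all.equal(res_new, res_ave)
```

On the small data.frame this should give the same values as ave. I have not checked how it behaves with NA in the grouping variables, where it may differ from ave.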
2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid warnings in case of unused levels (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
Is it relevant/possible to expose the drop argument explicitly?
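As an illustration of what exposing drop could look like, here is a small sketch. The wrapper name ave_drop and its drop argument are hypothetical, and the body assumes ave is essentially interaction() followed by split(), which is my reading of the sources rather than something I have confirmed for all R versions:

```r
# Hypothetical wrapper: same idea as ave(), but with 'drop' passed on to
# interaction() so that unused levels do not produce empty groups.
ave_drop <- function(x, ..., FUN = mean, drop = TRUE) {
  if (missing(...)) {
    x[] <- FUN(x)
  } else {
    g <- interaction(..., drop = drop)
    split(x, g) <- lapply(split(x, g), FUN)
  }
  x
}

x <- c(1, 2, 3)
# Factor with an unused level "c"
g <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))

# With drop = TRUE, max() is never called on an empty group,
# so no "no non-missing arguments to max" warning is raised here.
res <- ave_drop(x, g, FUN = max)
res
```

Note that this only addresses the warnings for unused levels. As far as I can tell, interaction() still builds the combined levels before dropping them, so this alone may not help with the memory issue in 1).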
Thanks,
Thomas