[R] Avoiding loops
Martin Morgan
mtmorgan at fhcrc.org
Wed Sep 2 18:17:01 CEST 2009
Alexander Shenkin wrote:
> Though, from my limited understanding, the 'apply' family of functions
> are actually just loops. Please correct me if I'm wrong. So, while
> more readable (which is important), they're not necessarily more
> efficient than explicit 'for' loops.
Hi Allie -- This uses an R-level loop (and a lot of C loops!), but the
number of iterations is only as large as the maximum lag:
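f0 <- function(df0, max_lag)
{
    ## never shift by more than the number of rows; max_lag itself stays
    ## unchanged because it is also the distance threshold
    n_lag <- min(nrow(df0), max_lag)
    a <- df0[[1]]
    ans <- df <- df0[,-1, drop=FALSE]
    for (lag in seq_len(n_lag)) {
        ## idx[i] is TRUE when a[i + lag] - a[i] <= max_lag
        idx <- diff(a, lag) <= max_lag
        pad <- logical(lag)
        ## add the original values in row i to the running sums in row i + lag
        ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),]
    }
    cbind(a, ans)
}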
It assumes that 'a' is sorted and unique, as in a time
series. This
f1 <- function(df0, max_lag)
{
    a <- df0[[1]]
    ## df keeps the original values, ans accumulates the sums
    ans <- df <- df0[,-1, drop=FALSE]
    lag <- 1
    ## keep shifting until no pair of rows 'lag' apart is within max_lag
    while(sum(idx <- diff(a, lag) <= max_lag) != 0) {
        pad <- logical(lag)
        ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),]
        lag <- lag + 1
    }
    cbind(a, ans)
}
relaxes the assumption that 'a' is unique, I think, but I haven't tested
carefully; it seems to perform about the same as f0. I think there's a
clever recursive solution in there, too.
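To make the shifting in f0 concrete, here is a small sketch run on the four-row data frame from the original post. It is only an illustration of the same shift-and-add idea: the name lag_sum is made up for this example, and it works on a matrix with integer row indices rather than on the data frame with padded logical vectors, so it is not identical to f0. Like f0, it assumes 'a' is sorted and unique.

```r
## Hypothetical helper (not from the thread): shift-and-add on a matrix.
lag_sum <- function(df0, max_lag)
{
    a <- df0[[1]]
    vals <- as.matrix(df0[, -1, drop=FALSE])   # original values
    ans <- vals                                # running sums
    n_lag <- max(0, min(length(a) - 1, max_lag))
    for (lag in seq_len(n_lag)) {
        ## rows i whose partner i + lag lies within max_lag of a[i]
        idx <- which(diff(a, lag) <= max_lag)
        ans[idx + lag, ] <- ans[idx + lag, ] + vals[idx, ]
    }
    data.frame(a, ans)
}

## Data from the original post
dat <- data.frame(a=c(1, 4, 5, 8), b=c(5, 6, 2, 3), c=c(2, 9, 3, 2))
res <- lag_sum(dat, 5)
print(res)
```

At lag 1 every row picks up the row directly above it, at lag 2 rows 3 and 4 pick up rows 1 and 2, and at lag 3 nothing is added because 8 - 1 > 5. The b column ends up as 5, 11, 13, 11 and the c column as 2, 11, 14, 14, which matches the result requested in the original post.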
This is my implementation of Phil's solution
phil0 <- function(df0, max_lag)
{
    with(df0, {
        g <- function(x)
            apply(df0[a - x >= -max_lag & a - x <= 0, c('b','c')],
                  2, sum)
        data.frame(a, t(sapply(a, g)))
    })
}
Here's my implementation of Chuck Berry's solution
chuck0 <- function(df0, max_lag)
{
    criterion <-
        as.matrix(dist(df0$a)) <= max_lag & outer(df0$a, df0$a, ">=")
    criterion %*% as.matrix(df0[, c("b","c")])
}
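As a rough illustration of why the matrix product in chuck0 gives the sums, here is the same idea spelled out on the four-row data from the original post. The selection matrix is built with a single outer() call instead of dist(), which is a simplification for the example and not Chuck's exact code; the variable names are likewise only for illustration.

```r
## Data from the original post
dat <- data.frame(a=c(1, 4, 5, 8), b=c(5, 6, 2, 3), c=c(2, 9, 3, 2))
max_lag <- 5

## criterion[i, j] is TRUE when row j is counted in the sum for row i,
## i.e. a[j] is no larger than a[i] and at most max_lag below it
criterion <- outer(dat$a, dat$a,
                   function(x, y) x - y >= 0 & x - y <= max_lag)

## As 0/1 the rows of criterion are
##   1 0 0 0
##   1 1 0 0
##   1 1 1 0
##   0 1 1 1
## so each row of the product adds up the selected rows of b and c
sums <- criterion %*% as.matrix(dat[, c("b", "c")])
print(cbind(a=dat$a, sums))
```

The price of this approach is the n x n criterion matrix, so memory grows with the square of the number of rows.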
Here's a data generator
setup <- function(n, m)
    ## n: number of rows
    ## m: expected counts per sum
{
    a0 <- sort(sample(seq_len(m * n), n))
    data.frame(a=a0, b=as.integer(runif(n, 1, 10)),
               c=as.integer(runif(n, 1, 10)))
}
and a comparison with
> df0 <- setup(10^3, 3)
> max_lag <- 5
> system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  0.016   0.000   0.016
> system.time(phil0res <- phil0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  0.960   0.000   0.962
> system.time(chuck0res <- chuck0(df0, 5), gcFirst=TRUE)
   user  system elapsed
  0.252   0.000   0.254
> all.equal(f0res, phil0res)
[1] TRUE
> all.equal(as.matrix(f0res[,2:3]), chuck0res, check.attributes=FALSE)
[1] TRUE
The f0 solution seems to be usable up to about a million rows:
> df0 <- setup(10^6, 3)
> system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  2.680   0.004   2.700
Martin
>
> allie
>
> On 9/2/2009 3:13 AM, Phil Spector wrote:
>> Here's one way (assuming your data frame is named dat):
>>
>> with(dat,
>> data.frame(a,t(sapply(a,function(x){
>> apply(dat[a - x >= -5 & a - x <=
>> 0,c('b','c')],2,sum)}))))
>>
>>
>> - Phil Spector
>> Statistical Computing Facility
>> Department of Statistics
>> UC Berkeley
>> spector at stat.berkeley.edu
>>
>>
>>
>> On Tue, 1 Sep 2009, dolar wrote:
>>
>>> Would like some tips on how to avoid loops as I know they are slow in R
>>>
>>> i've got a data frame :
>>>
>>> a b c
>>> 1 5 2
>>> 4 6 9
>>> 5 2 3
>>> 8 3 2
>>>
>>> What i'd like is to sum for each value of a, the sum of b and the sum
>>> of c
>>> where a equal to or less than (with a distance of 5)
>>>
>>> i.e. for row three
>>> we have a=5
>>> i'd like to sum up b and sum up c with the above rule
>>> since 5, 4 and 1 are less than (within a distance of 5) or equal to
>>> 5, then
>>> we should get the following result:
>>>
>>> a b c
>>> 5 13 14
>>>
>>> the overall result should be
>>> a b c
>>> 1 5 2
>>> 4 11 11
>>> 5 13 14
>>> 8 11 14
>>>
>>> how can i do this without a loop?
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Avoiding-loops-tp25251376p25251376.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>