[R] Speeding indexing and sub-sectioning of 3d array
Swidan, Firas
swidanf at janelia.hhmi.org
Thu Aug 10 20:34:27 CEST 2006
Hi Patrick,
Thanks for the help. The function I listed is just an example. I isolated
and kept only the problematic part in my code for clarity sake. I ended up
implementing the functionality in C and now it takes 22 seconds to calculate
the objective.
Best regards,
Firas.
On 8/9/06 4:41 PM, "Patrick Burns" <pburns at pburns.seanet.com> wrote:
> First off, I hope that the function you list is just an example
> since it only returns what the last iteration does -- obviously
> the same answer can be arrived at much quicker.
>
> The main principal in speeding up loops is to do as little
> inside the loops as possible. 'fjj1' is essentially the same as
> the listed function, but with one slight cleanup.
>
> fjj1 <-
> function(x, radius=3)
> {
> dx <- dim(x)
> dx1 <- dx[1]
> dx2 <- dx[2]
> dx3 <- dx[3]
> for(i in (radius + 1):(dx1 - radius - 1)) {
> for(j in (radius + 1):(dx2 - radius - 1)) {
> for(k in (radius + 1):(dx3 - radius -1)) {
> ans <- mean(x[(i-radius):(i+radius),
> (j-radius):(j+radius),
> (k-radius):(k+radius)])
> }
> }
> }
> ans
> }
>
> The time to run fjj1(jj, 3) on my machine where jj is a
> 245 by 175 by 150 array was 1222 seconds.
>
> 'fjj2' substantially reduces the number of sequences
> created. It took 975 seconds.
>
> fjj2 <-
> function(x, radius=3)
> {
> dx <- dim(x)
> dx1 <- dx[1]
> dx2 <- dx[2]
> dx3 <- dx[3]
> rseq <- -radius:radius
> for(i in (radius + 1):(dx1 - radius - 1)) {
> for(j in (radius + 1):(dx2 - radius - 1)) {
> for(k in (radius + 1):(dx3 - radius -1)) {
> ans <- mean(x[i + rseq, j + rseq, k + rseq])
> }
> }
> }
> ans
> }
>
>
> 'fjj3' reduces some of the subscripting (but possibly at the
> expense of using more memory -- I'm not sure if it does or
> not). It took 936 seconds.
>
> fjj3 <-
> function(x, radius=3)
> {
> dx <- dim(x)
> dx1 <- dx[1]
> dx2 <- dx[2]
> dx3 <- dx[3]
> rseq <- -radius:radius
> for(i in (radius + 1):(dx1 - radius - 1)) {
> A <- x[i + rseq, , , drop=FALSE]
> for(j in (radius + 1):(dx2 - radius - 1)) {
> B <- A[, j + rseq, , drop=FALSE]
> for(k in (radius + 1):(dx3 - radius -1)) {
> ans <- mean(B[ , , k + rseq])
> }
> }
> }
> ans
> }
>
> 'fjj4' reverses the order of the loops. Because of the
> way that arrays are stored, it makes sense that subscripting
> a sequence in the first dimension would be faster than
> subscripting subsequent dimensions. This does seem to be
> the case. 'fjj4' took 839 seconds.
>
> fjj4 <-
> function(x, radius=3)
> {
> dx <- dim(x)
> dx1 <- dx[1]
> dx2 <- dx[2]
> dx3 <- dx[3]
> rseq <- -radius:radius
> for(i in (radius + 1):(dx3 - radius - 1)) {
> A <- x[, ,i + rseq, drop=FALSE]
> for(j in (radius + 1):(dx2 - radius - 1)) {
> B <- A[, j + rseq, , drop=FALSE]
> for(k in (radius + 1):(dx1 - radius -1)) {
> ans <- mean(B[k + rseq, , ])
> }
> }
> }
> ans
> }
>
>
> Another change that would make a marginal difference
> would be to generate the sequences controlling the inner
> loops once at the outset.
>
> If the computation at the heart of the function really is a
> mean or something similar, then it is possible that there
> will be tricks to update that value more efficiently.
>
> Finally, if this will be used enough that the speed is an
> issue, then rewriting it in C would be a good approach.
>
>
> Patrick Burns
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Swidan, Firas wrote:
>
>> Hi,
>>
>> I am having a problem with a very slow indexing and sub-sectioning of a 3d
>> array:
>>
>>
>>
>>> dim(arr)
>>>
>>>
>> [1] 245 175 150
>>
>> For each point in the array, I am trying to calculate the mean of the values
>> in its surrounding:
>>
>>
>> mean( arr[ (i - radius):(i + radius),
>> (j - radius):(j + radius),
>> (k - radius):(k + radius)] )
>>
>> Putting that code in 3 for loops
>>
>> calculateKMedian <- function( arr, radius){
>>
>> for( i in (radius + 1):(dim(arr)[1] - radius - 1) ){
>> for( j in (radius + 1):(dim(arr)[2] - radius - 1) )
>> for( k in (radius + 1):(dim(arr)[3] - radius - 1) ){
>>
>>
>> mediansArr <- mean( arr[ (i - radius):(i + radius),
>> (j - radius):(j + radius),
>> (k - radius):(k + radius)] )
>>
>> }
>> }
>> return(mediansArr)
>> }
>>
>> Results in a very slow run:
>>
>>
>>
>>> system.time( calculateKMedian( a, 3))
>>>
>>>
>> [1] 423.468 0.096 423.631 0.000 0.000
>>
>> If I replace
>>
>> mediansArr <- mean( arr[ (i - radius):(i + radius),
>> (j - radius):(j + radius),
>> (k - radius):(k + radius)] )
>>
>> With an access to the (I,j,k) cell's value
>>
>> mediansArr <- arr[i,j,k]
>>
>> The running time decreases to
>>
>>
>>
>>> system.time( calculateKMedian( a, 3))
>>>
>>>
>> [1] 14.821 0.005 14.829 0.000 0.000
>>
>>
>>
>> But 14 seconds are still pretty expensive for just scanning the array.
>>
>> Is there anything I can do to speed the indexing and the sub-sectioning of
>> the 3d array in this case?
>>
>> Thanks for the help,
>> Firas.
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
More information about the R-help
mailing list