[Bioc-devel] vectorize default dist2 function in genefilter

James F. Reid reidjf at gmail.com
Wed Mar 13 10:41:21 CET 2013


Hi Wolfgang,

On 12/03/13 21:11, Wolfgang Huber wrote:
>
> Dear James
>
> Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised.

I believe so but maybe I'm missing something.
Here are the system times on a 20000*100 on my laptop using just dist2, 
which would also have a consequence on the overall runtime of 
arrayQualityMetrics in aqm.heamap.


dist2 <- function (x,
                    fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
                    diagonal = 0) {

     defaultFun <- ifelse(missing(fun), TRUE, FALSE)

     if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
         stop("'diagonal' must be a numeric scalar.")
     res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
     colnames(res) = rownames(res) = colnames(x)
     if (ncol(x) >= 2) {

         if (defaultFun) {
             res <- apply(x, 2, function(i) colMeans(abs(x - i), 
na.rm=TRUE))
         } else {
             for (j in 2:ncol(x))
                 for (i in 1:(j - 1))
                     res[i, j] = res[j, i] = fun(x[, i], x[, j])
         }
     }

     return(res)
}

y <- matrix(rnorm(20000 * 100), 20000, 100)

system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
##   user  system elapsed
## 11.664   0.060  11.800

system.time(dist2(x = y))
##  user  system elapsed
## 5.201   0.348   5.600

Not sure what you'd want to change to the Rd.

Best,
James.

>
> A patch of the .R and .Rd file would be most welcome and expedite the change.
>
> Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance.
>
> 	Best wishes
> 	Wolfgang
>
> Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
>
>> Dear bioc-devel,
>>
>> the dist2 function in genefilter defined as:
>>
>> dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
>>
>>     if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
>>         stop("'diagonal' must be a numeric scalar.")
>>     res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
>>     colnames(res) = rownames(res) = colnames(x)
>>     if (ncol(x) >= 2) {
>>         for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
>>             i] = fun(x[, i], x[, j])
>>     }
>>     return(res)
>> }
>>
>> could have it's default function vectorized as:
>>
>> res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
>>
>> to improve performance for example in the ArrayQualityMetrics package.
>>
>> Best.
>> James.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



More information about the Bioc-devel mailing list