[Bioc-devel] vectorize default dist2 function in genefilter

Wolfgang Huber whuber at embl.de
Thu Mar 14 22:18:40 CET 2013


Hi James

I added your suggestion to the package, it is in svn revision 74349 and should be on the server in the devel section soon.

This an old package - I noted the first commit to it had revision number 24 and was by Robert Gentleman on 2001-08-01.

	Best wishes
	Wolfgang



El Mar 13, 2013, a las 10:41 am, James F. Reid <reidjf at gmail.com> escribió:

> Hi Wolfgang,
> 
> On 12/03/13 21:11, Wolfgang Huber wrote:
>> 
>> Dear James
>> 
>> Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised.
> 
> I believe so but maybe I'm missing something.
> Here are the system times on a 20000*100 on my laptop using just dist2, which would also have a consequence on the overall runtime of arrayQualityMetrics in aqm.heamap.
> 
> 
> dist2 <- function (x,
>                   fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
>                   diagonal = 0) {
> 
>    defaultFun <- ifelse(missing(fun), TRUE, FALSE)
> 
>    if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
>        stop("'diagonal' must be a numeric scalar.")
>    res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
>    colnames(res) = rownames(res) = colnames(x)
>    if (ncol(x) >= 2) {
> 
>        if (defaultFun) {
>            res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
>        } else {
>            for (j in 2:ncol(x))
>                for (i in 1:(j - 1))
>                    res[i, j] = res[j, i] = fun(x[, i], x[, j])
>        }
>    }
> 
>    return(res)
> }
> 
> y <- matrix(rnorm(20000 * 100), 20000, 100)
> 
> system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
> ##   user  system elapsed
> ## 11.664   0.060  11.800
> 
> system.time(dist2(x = y))
> ##  user  system elapsed
> ## 5.201   0.348   5.600
> 
> Not sure what you'd want to change to the Rd.
> 
> Best,
> James.
> 
>> 
>> A patch of the .R and .Rd file would be most welcome and expedite the change.
>> 
>> Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance.
>> 
>> 	Best wishes
>> 	Wolfgang
>> 
>> Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
>> 
>>> Dear bioc-devel,
>>> 
>>> the dist2 function in genefilter defined as:
>>> 
>>> dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
>>> 
>>>    if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
>>>        stop("'diagonal' must be a numeric scalar.")
>>>    res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
>>>    colnames(res) = rownames(res) = colnames(x)
>>>    if (ncol(x) >= 2) {
>>>        for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
>>>            i] = fun(x[, i], x[, j])
>>>    }
>>>    return(res)
>>> }
>>> 
>>> could have it's default function vectorized as:
>>> 
>>> res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
>>> 
>>> to improve performance for example in the ArrayQualityMetrics package.
>>> 
>>> Best.
>>> James.
>>> 
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
> 



More information about the Bioc-devel mailing list