[R] pairwise difference operator
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Mon Aug 2 17:42:23 CEST 2004
Thank you to Marc Schwartz and Gabor Grothendieck for their responses.
Both solutions are useful.
It would be nice to generalise this problem to situations where other
operations besides difference. Maybe a new member of apply family -
pwapply for pairwise apply ?
Of course the output would be different if the results of an operation
on two columns produce a vector (like pairwise difference here) or a
single value (like in correlation or pairwise t-test) and one need to
somehow account for this.
Thank you again.
Regards, Adai
On Sun, 2004-08-01 at 00:42, Marc Schwartz wrote:
> On Fri, 2004-07-30 at 20:28, Marc Schwartz wrote:
> > On Fri, 2004-07-30 at 18:30, Adaikalavan Ramasamy wrote:
> > > There was a BioConductor thread today where the poster wanted to find
> > > pairwise difference between columns of a matrix. I suggested the slow
> > > solution below, hoping that someone might suggest a faster and/or more
> > > elegant solution, but no other response.
> > >
> > > I tried unsuccessfully with the apply() family. Searching the mailing
> > > list was not very fruitful either. The closest I got to was a cryptic
> > > chunk of code in pairwise.table().
> > >
> > > Since I do use something similar myself occasionally, I am hoping
> > > someone from the R-help list can suggest alternatives or past threads.
> > > Thank you.
>
> <snip>
>
> In follow up to the posts on this last night, I created an updated
> version of my function (though I will point out that Gabor's is faster,
> as I will show below).
>
> I realized that using the combinations() function had a potential
> limitation, which is the limits of R's recursion depth, as Greg mentions
> in the help for the function. It will require an adjustment when the
> number of columns is about 45.
>
> Thus, I modified the creation of the column combinations as noted below.
> I also added some code to verify the input data type and to ensure that
> the resultant structures remain matrices in the case of an input matrix
> with ncol = 2, in which case, this function is of course, overkill.
>
> Thus:
>
> pairwise.diffs <- function(x)
> {
> stopifnot(is.matrix(x))
>
> # create column combination pairs
> prs <- cbind(rep(1:ncol(x), each = ncol(x)), 1:ncol(x))
> col.diffs <- prs[prs[, 1] < prs[, 2], , drop = FALSE]
>
> # do pairwise differences
> result <- x[, col.diffs[, 1]] - x[, col.diffs[, 2], drop = FALSE]
>
> # set colnames
> if(is.null(colnames(x)))
> colnames(x) <- 1:ncol(x)
>
> colnames(result) <- paste(colnames(x)[col.diffs[, 1]], ".vs.",
> colnames(x)[col.diffs[, 2]], sep = "")
> result
> }
>
>
> Now to performance. I created a large 1,000 column matrix:
>
> mat <- matrix(sample(100, 10000, replace = TRUE), ncol = 1000)
> colnames(mat) <- 1:1000
>
> > str(mat)
> int [1:10, 1:1000] 48 23 26 22 69 64 2 13 13 69 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:1000] "1" "2" "3" "4" ...
>
>
> Timing:
>
> > gc();system.time(m <- pairwise.diffs(mat))
> used (Mb) gc trigger (Mb)
> Ncells 1541241 41.2 3094291 82.7
> Vcells 7139074 54.5 17257300 131.7
> [1] 1.14 0.19 1.39 0.00 0.00
>
>
> > gc();system.time(g <- do.call("cbind", sapply(2:ncol(mat),
> f, mat)))
> used (Mb) gc trigger (Mb)
> Ncells 1541241 41.2 3094291 82.7
> Vcells 7139074 54.5 17257300 131.7
> [1] 0.81 0.02 0.92 0.00 0.00
>
>
> Comparisons:
>
> > str(m)
> int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:499500] "1.vs.2" "1.vs.3" "1.vs.4" "1.vs.5" ...
>
>
> > str(g)
> int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:499500] "1-2" "1-3" "1-4" "1-5" ...
>
>
> > table(m == g)
>
> TRUE
> 4995000
>
>
> HTH,
>
> Marc Schwartz
>
>
>
More information about the R-help
mailing list