[R] dividing a matrix by positive sum or negative sum depending on the sign

Hao Cen hcen at andrew.cmu.edu
Thu Nov 12 16:21:42 CET 2009


Hi David and Dimitris,

Thanks for your suggestions. They are very helpful.

Jeff

On Wed, November 11, 2009 12:12 pm, David Winsemius wrote:
>

> On Nov 11, 2009, at 10:57 AM, David Winsemius wrote:
>
>
>>
>> On Nov 11, 2009, at 10:36 AM, Dimitris Rizopoulos wrote:
>>
>>
>>> one approach is the following:
>>>
>>> mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
>>>
>>> mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)
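
To see why the one-liner above works, it may help to split it into steps. The sketch below is only an illustration and not part of Dimitris's message; the names grp_sum and scaled are made up for it. ave() forms one group per (row, sign) combination, so each cell of abs(mat) is replaced by the sum of the absolute values that share its row and its sign. NA cells have sign NA, fall into no group, and therefore stay NA.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# row(mat) gives each cell's row index, sign(mat) gives -1, 0, 1 or NA.
# ave() sums abs(mat) within every (row, sign) group and writes the group
# total back into each member cell, keeping the matrix shape.
grp_sum <- ave(abs(mat), row(mat), sign(mat), FUN = sum)

# Dividing by the group total keeps the original sign of every entry.
scaled <- mat / grp_sum
scaled
```

Because the divisor is built from absolute values, negative entries stay negative and positive entries stay positive, which matches the result requested in the original question.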
>>
>> Very elegant. My solution was a bit more pedestrian, but may have
>> some speed advantage:
>>
>
>
>
> I am wondering if there might be further performance improvements if
> sums were pre-calculated before the ifelse scaling step.
>
> Perhaps:
>
> > mat <- matrix(sample(-4:4, 100, replace=T), ncol=10)
> > system.time(replicate(10000, t(apply(mat, 1, function(x) {
>       negs <- sum(x[x<0], na.rm=T); poss <- sum(x[x>0], na.rm=T)
>       ifelse( x <0, -x/negs, x/poss)} ) ) ) )
>    user  system elapsed
>   9.420   0.103   9.619
>
> > system.time(replicate(10000, t( apply(mat, 1, function(x)
>       ifelse( x <0, -x/sum(x[x<0], na.rm=T), x/sum(x[x>0], na.rm=T) ) ) ) ) )
>    user  system elapsed
>   8.206   0.035   8.231
>
>
> That was only a 15% improvement but I got a 50% improvement by
> replacing the ifelse() with its Boolean algebra equivalent:
>
> > t( apply(mat, 1, function(x) -x*(x <0)/sum(x[x<0], na.rm=T) +
>       x*(x>0)/sum(x[x>0], na.rm=T) ) )
>      [,1] [,2]       [,3]       [,4]
> [1,] -0.5 -0.5  1.0000000         NA
> [2,]  0.5  0.5 -0.6666667 -0.3333333
> [3,]  0.5  0.5         NA -1.0000000
>
>
>
> > system.time(replicate(10000,  t( apply(mat, 1, function(x)
>       -x*(x <0)/sum(x[x<0], na.rm=T) + x*(x>0)/sum(x[x>0], na.rm=T) ) ) ))
>    user  system elapsed
>   4.805   0.041   4.839
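
One caveat about the Boolean-algebra form seems worth noting: if a row has no negative (or no positive) entries, the corresponding sum is 0, that term becomes 0/0 = NaN, and the NaN spreads over the whole row when the two terms are added. The ifelse() versions do not have this problem, because the NaN branch is never selected. Below is a minimal sketch of a fully vectorized alternative that avoids apply() and only divides the cells that need it. The function name scale_by_sign and its local variables are hypothetical, not from the thread, and it has not been timed against the versions above.

```r
# Hypothetical helper: scale positives by the row sum of positives and
# negatives by the row sum of absolute negatives, leaving NA and 0 as-is.
scale_by_sign <- function(mat) {
  pos <- mat > 0 & !is.na(mat)          # logical mask of positive cells
  neg <- mat < 0 & !is.na(mat)          # logical mask of negative cells

  pos_sum <- rowSums(mat * pos, na.rm = TRUE)   # sum of positives per row
  neg_sum <- rowSums(-mat * neg, na.rm = TRUE)  # sum of |negatives| per row

  out <- mat
  # row(mat)[pos] is the row index of each selected cell, so each cell is
  # divided by the total belonging to its own row.
  out[pos] <- mat[pos] / pos_sum[row(mat)[pos]]
  out[neg] <- mat[neg] / neg_sum[row(mat)[neg]]
  out
}

mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
scale_by_sign(mat)
```

Since division only happens on masked cells, a row made up entirely of positive values never touches its (zero) negative sum.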
>
>
> I could not figure out Jeff's method of applying the two functions
> he presented, so I am unable to compare any of these methods to his
> strategy.
>
> --
> David.
>
>>
>>
>> > system.time(replicate(10000, t( apply(mat, 1, function(x)
>>       ifelse( x <0, -x/sum(x[x<0], na.rm=T), x/sum(x[x>0], na.rm=T) ) ) ) ) )
>>    user  system elapsed
>>   5.958   0.027   5.977
>>
>> > system.time(replicate(10000, mat / ave(abs(mat), row(mat),
>>       sign(mat), FUN = sum) ) )
>>    user  system elapsed
>>  12.886   0.064  12.886
>>
>>
>> --
>> David
>>
>>>
>>>
>>> I hope it helps.
>>>
>>>
>>> Best,
>>> Dimitris
>>>
>>>
>>>
>>> Hao Cen wrote:
>>>
>>>> Hi,
>>>> I have a matrix with positive numbers, negative numbers, and NAs. An
>>>> example of the matrix is as follows:
>>>>
>>>>   -1   -1    2   NA
>>>>    3    3   -2   -1
>>>>    1    1   NA   -2
>>>>
>>>> I need to compute a scaled version of this matrix. The scaling
>>>> method is dividing each positive number in each row by the sum of
>>>> the positive numbers in that row, and dividing each negative number
>>>> in each row by the sum of the absolute values of the negative
>>>> numbers in that row. So the resulting matrix would be
>>>>
>>>>  -1/2 -1/2  2/2   NA
>>>>   3/6  3/6 -2/3 -1/3
>>>>   1/2  1/2   NA -2/2
>>>>
>>>> Is there an efficient way to do that in R? One way I am using is
>>>> 1. rowSums for positive numbers in the matrix
>>>> 2. rowSums for negative numbers in the matrix
>>>> 3. sweep(mat, 1, posSumVec, posDivFun)
>>>> 4. sweep(mat, 1, negSumVec, negDivFun)
>>>> posDivFun = function(x,y) {
>>>>   xPosId = x>0 & !is.na(x)
>>>>   x[xPosId] = x[xPosId]/y[xPosId]
>>>>   return(x)
>>>> }
>>>> negDivFun = function(x,y) {
>>>>   xNegId = x<0 & !is.na(x)
>>>>   x[xNegId] = -x[xNegId]/y[xNegId]
>>>>   return(x)
>>>> }
>>>> It is not fast enough though. This scaling is to be applied to
>>>> large data sets repeatedly. I would like to make it as fast as
>>>> possible. Any thoughts on improving it would be appreciated.
>>>>
>>>> Thanks,
>>>> Jeff
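
For anyone trying to reproduce the four steps listed above, here is one way they might fit together. This is a guess at the wiring, not necessarily what was actually run: posSumVec and negSumVec are assumed to be plain rowSums over the positive and the negative entries (so negSumVec is itself negative, which is what makes -x/y come out with the right sign), and the second sweep() is assumed to run on the output of the first.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# sweep() passes y as an array with the same shape as x, so the same
# logical mask can index both.
posDivFun <- function(x, y) {
  xPosId <- x > 0 & !is.na(x)
  x[xPosId] <- x[xPosId] / y[xPosId]
  x
}
negDivFun <- function(x, y) {
  xNegId <- x < 0 & !is.na(x)
  x[xNegId] <- -x[xNegId] / y[xNegId]
  x
}

# Steps 1 and 2 (assumed form): signed row sums of positives and negatives.
posSumVec <- rowSums(mat * (mat > 0), na.rm = TRUE)
negSumVec <- rowSums(mat * (mat < 0), na.rm = TRUE)

# Steps 3 and 4: scale positives first, then negatives on that result.
step3 <- sweep(mat, 1, posSumVec, posDivFun)
scaled <- sweep(step3, 1, negSumVec, negDivFun)
scaled
```

With the example matrix this gives the same values as the target matrix in the question.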
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Dimitris Rizopoulos
>>> Assistant Professor
>>> Department of Biostatistics
>>> Erasmus University Medical Center
>>>
>>>
>>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>>> Tel: +31/(0)10/7043478
>>> Fax: +31/(0)10/7043014
>>>
>>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>



