[R] Sum efficiently from large matrix according to re-occuring levels of factor?

jim holtman jholtman at gmail.com
Mon Jul 21 03:21:35 CEST 2008


Does this do what you want?

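> # example matrix from the original post
> x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)
> # following up on another idea that was presented:
> # combine columns 1 and 2 into a single key, then number the runs of
> # consecutive rows that share a key (a new run starts wherever the key changes)
> dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
> # for each run, keep the first two columns of its first row and sum column 3
> result <- lapply(split(seq_len(nrow(x)), dataBreaks), function(.sect){
+     c(x[.sect[1], 1:2], sum(x[.sect, 3]))
+ })
> do.call(rbind, result)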
  [,1] [,2] [,3]
0    1    7    3
1    2    4    2
2    3    2    3
3    1    7   10
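
If the lapply() over runs turns out to be slow on a very large matrix, the same run sums can probably be computed without an explicit loop. The following is only a sketch, not a tested solution for your real data: it swaps in rowsum() for the split/lapply step, and it also illustrates the follow-up question by looking up each (column 1, column 2) pair in another matrix. The names is_start, run_id, out, ref and exceeds are made up for illustration, ref is a hypothetical stand-in for your other matrix, and using ">" as the comparison is an assumption about what "compare" means here.

```r
# Example data from the original question
x <- matrix(c(1,1,1,2,2,3,3,1,1,
              7,7,7,4,4,2,2,7,7,
              1,1,1,1,1,1,2,5,5), 9, 3)

# A row starts a new run when column 1 or column 2 differs from the row above
n <- nrow(x)
is_start <- c(TRUE, x[-1, 1] != x[-n, 1] | x[-1, 2] != x[-n, 2])
run_id <- cumsum(is_start)

# rowsum() adds up column 3 within each run in one vectorized call
sums <- rowsum(x[, 3], run_id)[, 1]
out <- cbind(x[is_start, 1:2, drop = FALSE], sums)
dimnames(out) <- NULL

# Hypothetical reference matrix (assumption): columns 1 and 2 of 'out'
# are treated as row and column indices into it
ref <- matrix(5, nrow = max(x[, 1]), ncol = max(x[, 2]))

# A two-column matrix used as an index picks one element per (row, col) pair
exceeds <- out[, 3] > ref[out[, 1:2, drop = FALSE]]
cbind(out, exceeds)
```

Comparing the two columns directly avoids building a combined key, so it does not depend on the values in column 2 being positive integers.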


On Sun, Jul 20, 2008 at 7:57 PM, Ralph S. <ruffel1 at hotmail.com> wrote:
>
> The first and second columns are actually indices into another matrix (my example may not make this sufficiently clear). I want to compare each sum with the corresponding entry of that matrix, and then record the result of the comparison.
>
> Any idea?
>
> Best,
>
> Ralph
>
>
>
> ----------------------------------------
>> Date: Sun, 20 Jul 2008 16:50:41 -0700
>> From: h.wickham at gmail.com
>> To: ruffel1 at hotmail.com
>> Subject: Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
>> CC: r-help at r-project.org
>>
>> On Sun, Jul 20, 2008 at 4:47 PM, hadley wickham wrote:
>>> On Sun, Jul 20, 2008 at 4:16 PM, Ralph S. wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to calculate a sum for each occurrence of a factor level in a very large matrix. I also want to save that sum together with the level of that factor and the level of a second factor.
>>>>
>>>> My matrix looks like this:
>>>>
>>>> x<-matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5),9,3)
>>>>
>>>> I want to sum according to the levels in the first column and save each sum, together with the corresponding levels from the first and second columns, in a new matrix.
>>>>
>>>> That is, I want an output matrix of the form:
>>>>
>>>> 1 7 3
>>>> 2 4 2
>>>> 3 2 3
>>>> 1 7 10
>>>>
>>>
>>> Why that and not:
>>>
>>> 1 7 13
>>> 2 4 2
>>> 3 2 3
>>>
>>> ?
>>
>> Here's a solution for that case:
>>
>> index <- x[, 2] + x[, 1] * max(x[, 2])
>> cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
>>
>> It takes about half a second for a million-row matrix.
>>
>> Hadley
>>
>>
>>
>> --
>> http://had.co.nz/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


