[R] replace values in array with means of certain values in array

Petr Savicky savicky at praha1.ff.cuni.cz
Mon Jan 10 18:07:15 CET 2011


On Mon, Jan 10, 2011 at 03:21:18PM +0100, joke R wrote:
> Dear all,
> 
> I have a problem with arrays.
> Simplified, I have two arrays:
> 
> A =
> [,,1]
> 1   2   3
> 4   5   6
> 7   8   9
> 
> [,,2]
> 10  11  12
> 13  14  15
> 16  17  18
> 
> B=
> 1   1   2
> 1   1   2
> 3   3   2
> 
> Basically, B declares to which cluster the values of A belong to.  Now I
> want an array C where for each cluster of B, the mean of A is shown (the
> mean on the first 2 dimensions):
> 
> C=
> [,,1]
> 3    3    6
> 3    3    6
> 7.5  7.5  6
> [,,2]
> 12    12    15
> 12    12    15
> 16.5  16.5  15
> 
> The following short solution works:
> 
> A <-
> aperm(array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2)),c(2,1,3))
> B <- matrix(c(1,1,2,1,1,2,3,3,2),nrow=3,byrow=TRUE)
> C <- array(NA,dim=c(3,3,2))
> 
> for(a in 1:3){
> TS <- matrix(A[B==a],nrow=2,byrow=TRUE)
> C[B==a] <- rep(rowMeans(TS),each=sum(B==a))
> }
> 
> But unfortunately, my dataset is a 64x64x64x120 array, and I have
> approximately 29000 clusters, and I'd have to repeat the process on 500
> simulated datasets.
> If I try the code above, it takes about 2 hours...
> 
> Strange enough, it is the first line in the loop that takes so long:
> reconstructing a dataset with only the datapoints belonging to a certain
> cluster in it...

Let me suggest to consider the following approach, which avoids the
first line of the above code, computes the means in a different way and
puts them back using the second line of the above code.

First, transform A into a matrix, where the first two dimensions 
represent the rows and the remaining dimensions represent the columns.
This does not reorder the elements of the array, only changes the
dimension information.

  A1 <- matrix(A, nrow=3*3, ncol=2)

Compute the means

  B1 <- c(B)
  C1 <- rowsum(A1, group=B1)/c(table(B1))

Put the means to the required positions

  for(a in 1:3){
    C[B==a] <- rep(C1[a, ], each=sum(B==a))
  }

I hope this can help.

Petr Savicky.



More information about the R-help mailing list