[R] large matrix

Marc Schwartz MSchwartz at MedAnalytics.com
Thu Apr 21 19:15:01 CEST 2005


On Thu, 2005-04-21 at 17:03 +0100, Jorge Manuel de Almeida Magalhães
wrote:
> Dear R-users
> 
> I need to convert a matrix with three columns into a new array with
> multiple columns.
> 
> For example,
> 
> oldmatrix
> 
> 1 	4	5
> 1	54	52
> 1	9	43
> 2	32	5
> 2	54	6
> 2	76	6
> 3	54	54
> 3	543	7
> 3       54     6
> 
> newmatrix
> 
> 5  	5	54
> 52	6	7
> 43	6	6
> 
> 
> If the first column has a new value, then add a column to the new
> matrix and set new[i,j] <- old[,3][i]
> 
> I wrote this code, but my initial matrix is very large and the
> conversion is very slow. How can I optimise that code?

<snip>

Presuming that each unique value in the first column occurs in the same
number of rows, the easiest thing to do might be:

> mat
      [,1] [,2] [,3]
 [1,]    1    4    5
 [2,]    1   54   52
 [3,]    1    9   43
 [4,]    2   32    5
 [5,]    2   54    6
 [6,]    2   76    6
 [7,]    3   54   54
 [8,]    3  543    7
 [9,]    3   54    6


> do.call("cbind", split(mat[, 3], mat[, 1]))
      1 2  3
[1,]  5 5 54
[2,] 52 6  7
[3,] 43 6  6
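
For anyone who wants to reproduce this, here is a minimal sketch. It
rebuilds the example matrix and checks the equal-group-size presumption
before binding. The table() check and the names group.sizes and newmat
are illustrative additions, not something the one-liner itself requires.

```r
# Rebuild the example matrix from the original post
mat <- matrix(c(1,   4,  5,
                1,  54, 52,
                1,   9, 43,
                2,  32,  5,
                2,  54,  6,
                2,  76,  6,
                3,  54, 54,
                3, 543,  7,
                3,  54,  6),
              ncol = 3, byrow = TRUE)

# Number of rows for each unique value in column 1
group.sizes <- as.vector(table(mat[, 1]))

# cbind() would recycle shorter vectors if the groups differed in
# length, so stop early in that case
stopifnot(length(unique(group.sizes)) == 1)

# One column per unique value of column 1, filled from column 3
newmat <- do.call("cbind", split(mat[, 3], mat[, 1]))
newmat
```

If the stopifnot() line fails, the groups are unequal and the result of
cbind() would need padding or some other treatment first.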


What I have done here is to split() the third column into a list of
vectors, one for each unique value in the first column:

> split(mat[, 3], mat[, 1])
$"1"
[1]  5 52 43

$"2"
[1] 5 6 6

$"3"
[1] 54  7  6

do.call() then passes the elements of that list as the arguments to a
single cbind() call, which binds the vectors together as the columns of
the output matrix. Note that the columns in the result are named for each
list element in the result of split().
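
To make the role of do.call() concrete, the sketch below spells out by
hand the cbind() call that it builds from the list. The names parts,
via.do.call and by.hand are only for illustration.

```r
# The example matrix again, built column by column
mat <- cbind(rep(1:3, each = 3),
             c(4, 54, 9, 32, 54, 76, 54, 543, 54),
             c(5, 52, 43, 5, 6, 6, 54, 7, 6))

# Column 3, grouped by the value in column 1
parts <- split(mat[, 3], mat[, 1])

# do.call() passes the list elements as named arguments to one cbind() call
via.do.call <- do.call("cbind", parts)

# Writing the same call out explicitly, one argument per list element
by.hand <- cbind("1" = parts[["1"]],
                 "2" = parts[["2"]],
                 "3" = parts[["3"]])

# Compare the two results
identical(via.do.call, by.hand)
```

The by-hand form only works when the number of groups is known in
advance, which is why do.call() is the practical choice here.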

I would expect this to be about the fastest approach.
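
A rough way to check the speed on one's own machine is sketched below.
The matrix sizes are hypothetical, not taken from the original post, and
the matrix() alternative is a swapped-in technique that is valid only
when the rows are already grouped by the first column and every group
has the same size. No timings are quoted here since they will vary.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n.groups <- 1000     # hypothetical number of unique values in column 1
n.per    <- 500      # hypothetical number of rows per unique value

big <- cbind(rep(seq_len(n.groups), each = n.per),
             runif(n.groups * n.per),
             runif(n.groups * n.per))

# The split() + cbind() approach from above
t.split <- system.time(
  res.split <- do.call("cbind", split(big[, 3], big[, 1]))
)

# Alternative: refill column 3 column-wise. This relies on the rows
# being grouped by column 1 and on equal group sizes.
t.matrix <- system.time(
  res.matrix <- matrix(big[, 3], nrow = n.per)
)

# The values should agree; the dimnames differ, so drop them first
all.equal(unname(res.split), res.matrix)

print(t.split)
print(t.matrix)
```

Unlike the matrix() call, the split() version does not depend on the row
order and labels each column with its group value.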

BTW, you gotta love a language that can do this in one line...  :-)

See ?split and ?do.call for more information.

HTH,

Marc Schwartz



