[R] Substitute Values in a Matrix?

Wed Sep 10 05:51:27 CEST 2008

on 09/09/2008 06:12 PM Chris82 wrote:
> Hi,
> 
> I'm searching for a function to substitute values in a matrix with new values.
> For example: 
> 
> old value  new value
> 1.1             6
> 1.2             7
> .                .
> .                .
> .                .
> 1.9             14
> 2.0             15
> 
> and
> 
> 2.1             15.5
> 2.2             16
> .                .
> .                .
> 2.9             19.5
> 3.0             20
> 
> The mapping from old to new values differs between the two ranges.
> 
> For the present my code is like this:
> 
> y <- matrix(c(1,2,1, 3,2,4, 1,1,1), ncol=3, byrow=TRUE)
> 
> for (i in 1:3) {
>  for (j in 1:3) {
>    if y[i,j] = 1:2 substitute y[i,j] = 6:15
> 
> 
> }
> }
> 
> But I can't find the correct syntax for the term "if y[i,j] = 1:2 substitute
> y[i,j] = 6:15", and I'm not sure whether my idea is too simple and the problem
> is more complex than I think.
> 
> thanks.

It is made more complicated because you are using floating point values
in the table lookup process, rather than integer or character based
comparisons. See R FAQ 7.31 for why floating point comparisons are
problematic.
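
As a quick illustration of the FAQ 7.31 point, here is a minimal sketch. The
values 0.1, 0.2 and 0.3 are arbitrary examples, not taken from the question:

```r
# Decimal fractions such as 0.1 have no exact binary representation,
# so results that "should" be equal can differ in the last few bits.
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                                   # exact comparison: FALSE
approx <- abs(a - b) < .Machine$double.eps ^ 0.5   # tolerance comparison: TRUE
```

The same effect can cause an exact lookup of a value such as 1.3 in a table of
computed values to miss, depending on how each side was produced.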

In this case, we need to set up a process by which we can include a
tolerance for the floating point comparisons for the Search vector in
the source matrix and then generate the matching indices into the
Replace vector.

In traditional floating point comparison applications, you would look to
use all.equal(), but it is not suitably 'vectorized' for this
application, so we set up a parallel process.
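
To show what "not suitably vectorized" means here, a small sketch with two
made-up vectors 'x' and 'y': all.equal() summarizes the whole comparison,
whereas the tolerance test gives one result per element.

```r
x <- c(1.1, 1.2, 1.3)
y <- c(1.1, 1.2, 1.4)

# One summary for the whole object: TRUE, or a character description
# of the difference. There is no per-element result.
whole <- all.equal(x, y)

# Elementwise comparison with an explicit tolerance
tol <- .Machine$double.eps ^ 0.5
per_element <- abs(x - y) < tol   # TRUE TRUE FALSE
```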

There are likely to be further optimizations applicable here, for
someone with fresher eyes, but here is a first pass function:

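SR <- function(x, Search, Replace, tol = .Machine$double.eps ^ 0.5)
{
  # For each element of x, find the index in Search that matches within tol
  Ind <- sapply(x, function(i) which(abs(i - Search) < tol))
  # Name each result by its position in x so the position survives unlist()
  names(Ind) <- seq_along(Ind)
  # Drop the integer(0) entries left by elements with no match
  Ind <- unlist(Ind)
  # Replace only the matched positions; everything else is left as is
  x[as.integer(names(Ind))] <- Replace[Ind]
  x
}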

The arguments are the source matrix, the Search vector, the Replace
vector and the required tolerance for the comparison. In this case, I
took the default tolerance from all.equal().

Within the function body, we loop over each value in the source matrix
and subtract the Search vector. We then compare the absolute values of
the result vector to the tolerance value and get the index using which()
for any values within the tolerance. We thus presume that this is a
'match' between the matrix value and a Search vector value, within the
tolerance level.
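
The step described above can be traced for a single value. This is only a
sketch; 'value', 'idx' and 'new_value' are illustrative names that do not
appear in the function itself:

```r
Search  <- seq(1.1, 3.0, 0.1)
Replace <- c(6:15, seq(15.5, 20, 0.5))
tol     <- .Machine$double.eps ^ 0.5

value <- 1.3                               # one element of the source matrix
idx   <- which(abs(value - Search) < tol)  # position of the match in Search
new_value <- Replace[idx]                  # corresponding replacement value
```

Here 'idx' is 3 and 'new_value' is 8, which agrees with the lookup table in the
question (1.1 maps to 6, 1.2 to 7, and so on).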

The key is that if no value matches, which() returns integer(0), not NA,
so sapply() cannot simplify its result and 'Ind' ends up being a list
rather than a vector. The additional code (naming each element by its
position and then calling unlist()) works in either case, perhaps adding
a bit of overhead in lieu of an if/else construct.
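
A small sketch of the no-match case, using a made-up three-element vector in
which the middle value (99) is not in the Search vector:

```r
Search <- c(1.1, 1.2)
tol    <- .Machine$double.eps ^ 0.5
x      <- c(1.1, 99, 1.2)           # the middle value has no match

Ind <- sapply(x, function(i) which(abs(i - Search) < tol))
is_list <- is.list(Ind)             # TRUE: results have unequal lengths

names(Ind) <- seq_along(Ind)        # remember each element's position in x
flat <- unlist(Ind)                 # integer(0) entries are dropped
```

After unlist(), 'flat' holds the values 1 and 2 under the names "1" and "3", so
only positions 1 and 3 of 'x' would be replaced. One caveat, as far as I can
tell: if a single element matched more than one Search value, unlist() would
append suffixes to the names, so the approach assumes at most one match per
element.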

So let's set up two test matrices. The first only has the values in the
Search vector, whereas the second has four additional values that do not
match.

Search <- seq(1.1, 3.0, 0.1)
Replace <- c(6:15, seq(15.5, 20, 0.5))

mat <- matrix(seq(1.1, 3.0, 0.1), ncol = 4)
mat2 <- matrix(c(seq(1.1, 3.0, 0.1), 0, 4, 5, 6), ncol = 4)

> Search
 [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
[18] 2.8 2.9 3.0

> Replace
 [1]  6.0  7.0  8.0  9.0 10.0 11.0 12.0 13.0 14.0 15.0 15.5 16.0 16.5
[14] 17.0 17.5 18.0 18.5 19.0 19.5 20.0

> mat
     [,1] [,2] [,3] [,4]
[1,]  1.1  1.6  2.1  2.6
[2,]  1.2  1.7  2.2  2.7
[3,]  1.3  1.8  2.3  2.8
[4,]  1.4  1.9  2.4  2.9
[5,]  1.5  2.0  2.5  3.0

> mat2
     [,1] [,2] [,3] [,4]
[1,]  1.1  1.7  2.3  2.9
[2,]  1.2  1.8  2.4  3.0
[3,]  1.3  1.9  2.5  0.0
[4,]  1.4  2.0  2.6  4.0
[5,]  1.5  2.1  2.7  5.0
[6,]  1.6  2.2  2.8  6.0

# Let's do 'mat'
> SR(mat, Search, Replace)
     [,1] [,2] [,3] [,4]
[1,]    6   11 15.5 18.0
[2,]    7   12 16.0 18.5
[3,]    8   13 16.5 19.0
[4,]    9   14 17.0 19.5
[5,]   10   15 17.5 20.0

# Now on 'mat2'
> SR(mat2, Search, Replace)
     [,1] [,2] [,3] [,4]
[1,]    6 12.0 16.5 19.5
[2,]    7 13.0 17.0 20.0
[3,]    8 14.0 17.5  0.0
[4,]    9 15.0 18.0  4.0
[5,]   10 15.5 18.5  5.0
[6,]   11 16.0 19.0  6.0

My solution is likely to become rather inefficient on 'large' matrices,
for some suitable definition of 'large'.
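
One possible direction for such an optimization is sketched below. It is an
assumption-laden alternative, not a tested replacement: 'SR_vec' is a
hypothetical name, and the function swaps the per-element sapply() loop for a
single outer() call, which builds a length(x) by length(Search) matrix and so
trades memory for speed.

```r
SR_vec <- function(x, Search, Replace, tol = .Machine$double.eps ^ 0.5)
{
  # Logical matrix: row i, column j is TRUE when x[i] matches Search[j]
  hit <- abs(outer(as.vector(x), Search, "-")) < tol
  # Row = position in x, col = position in Search, for every match
  pos <- which(hit, arr.ind = TRUE)
  x[pos[, "row"]] <- Replace[pos[, "col"]]
  x
}

Search  <- seq(1.1, 3.0, 0.1)
Replace <- c(6:15, seq(15.5, 20, 0.5))
mat2    <- matrix(c(seq(1.1, 3.0, 0.1), 0, 4, 5, 6), ncol = 4)

res <- SR_vec(mat2, Search, Replace)
```

Like the original, this leaves unmatched values (0, 4, 5 and 6 in 'mat2')
unchanged, and it likewise assumes each element matches at most one Search
value.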

HTH,

Marc Schwartz