[R] replacing elements of distance matrix

Nikhil Kaza nikhil.list at gmail.com
Mon Jul 19 20:27:33 CEST 2010


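My mistake: instead of colnames(d1), use substr(colnames(d1),1,1) or
similar, so that only the region part of each id is matched (grepl on
the full id also matches the site number).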
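A minimal, self-contained sketch of the corrected masking, assuming the ids have the form "region site" separated by a space, as in Michael's illustration. It rebuilds a small version of the example data (the region vector is fixed here rather than sampled). Instead of substr(..., 1, 1) it uses sub() to keep everything before the space, which also works for two-digit region codes such as 10 (Michael mentions 10 regions). It writes the 999999 penalty Michael asked for rather than NA.

```r
# Small version of the thread's example: 5 sites, one covariate
set.seed(1)
x <- as.matrix(runif(5))
id1 <- c(1, 1, 2, 2, 1)             # region of each site (fixed for illustration)
rownames(x) <- paste(id1, 1:5)      # ids of the form "region site"
Sx <- var(x)                        # variance pooled over ALL regions

# Full Mahalanobis distance matrix; dimnames are taken from rownames(x)
d1 <- apply(x, 1, function(i) mahalanobis(x, i, Sx))

# Region part of each id: everything before the first space
region <- sub(" .*", "", colnames(d1))

# TRUE where the row site and column site are in different regions
cross <- outer(region, region, "!=")

# Large penalty so the matcher never pairs sites from different regions
d1[cross] <- 999999
d1
```

The ids stay attached as dimnames, so d1 can be passed on as a single matrix. Note that Sx is still the variance pooled over all regions, which is the concern Nikhil raises in the message quoted below.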

On Jul 19, 2010, at 2:15 PM, Nikhil Kaza wrote:

> Michael,
>
> You can modify the following code to suit. Also, avoid using dist as
> a variable name, since it is a function in the stats package. However,
> are you sure you want to do this? Sx is the variance computed from
> sites in all the regions!
>
> d1 <- apply(x,1, function(i){mahalanobis(x,i,Sx)})
> is.na(d1) <- !sapply(id1, grepl, colnames(d1), fixed=TRUE)
>
> If, on the other hand, you want to use only the variance within a
> region, modify it like this (I am sure more efficient code can be written):
>
> #not tested
> # keep each region's rows as a matrix so the rownames (ids) survive
> x.L <- lapply(split(seq_len(nrow(x)), id1), function(r) x[r, , drop = FALSE])
> # Mahalanobis distances within a region, using that region's own variance
> m3 <- function(k){apply(k, 1, function(i){mahalanobis(k, i, var(k))})}
> d2 <- lapply(x.L, m3)
>
>
>
> Nikhil Kaza
> Asst. Professor,
> City and Regional Planning
> University of North Carolina
>
> nikhil.list at gmail.com
>
> On Jul 19, 2010, at 11:37 AM, Michael Ralph M. Abrigo wrote:
>
>> Thanks for the tip, Nikhil. However, I need a single matrix as input
>> to another routine that computes a non-bipartite matching, which
>> minimizes pairwise distances between observations. As such, I need
>> the georeference (id) of the observations for subsequent processing.
>> Below is an illustration.
>>
>>
>>> #generate data
>>> x <- as.matrix(runif(5))
>>> Sx <- var(x)
>>>
>>> #generate id
>>> set.seed(1)
>>> id1 <- sample(1:2,5, replace=T)
>>> id2 <- c(1:5)
>>> rownames(x) <- paste(id1, id2)
>>>
>>> #generate distance
>>> dist <- as.matrix(
>> +   apply(x,1,function(i){
>> +     mahalanobis(x,i,Sx)
>> +    }
>> +  )
>> + )
>>>
>>> #print matrices
>>> x
>>         [,1]
>> 1 1 0.2059746
>> 1 2 0.1765568
>> 2 3 0.6870228
>> 2 4 0.3841037
>> 1 5 0.7698414
>>> dist
>>           1 1        1 2        2 3       2 4        1 5
>> 1 1 0.00000000 0.01165534 3.11660015 0.4273402 4.28210082
>> 1 2 0.01165534 0.00000000 3.50943798 0.5801450 4.74056406
>> 2 3 3.11660015 3.50943798 0.00000000 1.2358255 0.09237602
>> 2 4 0.42734018 0.58014499 1.23582554 0.0000000 2.00395492
>> 1 5 4.28210082 4.74056406 0.09237602 2.0039549 0.00000000
>>
>>
>> The geo-id is composed of two references: the first digit for the
>> region and the next for the observation itself. What I have in mind
>> is for the pairwise distance between observations in different
>> regions, say site-11 and site-23 or site-24, to be replaced by a
>> large number, say 999999. I still need the ids for later processing,
>> though. Maybe I can stack the matrices generated using your tip into
>> a block-diagonal matrix, but then won't I lose my ids? I'm really
>> sorry; I'm a bit lost.
>> Cheers,
>> Michael
>>
>> On Mon, Jul 19, 2010 at 10:10 PM, Nikhil Kaza  
>> <nikhil.list at gmail.com> wrote:
>>>
>>> Replace dist with a Mahalanobis distance in the following example.
>>>
>>> a <- cbind(runif(10), sample(1:3, 10, replace=TRUE))
>>> a.L <- split(a[,1], a[,2])  # split the values (column 1) by group (column 2)
>>> dist.L <- lapply(a.L, dist)
>>>
>>>
>>> On Jul 19, 2010, at 9:24 AM, Michael Ralph M. Abrigo wrote:
>>>
>>>> Hi! I am trying to implement non-bipartite matching. I have around
>>>> 500 sites, which can be grouped into 10 regions. I am able to
>>>> calculate pairwise Mahalanobis distances between sites (thanks to
>>>> another post in the forum). However, I want to constrain my matches
>>>> to sites within the same region. Thus I want to replace elements of
>>>> the distance matrix with a high value, say 999999, for sites not in
>>>> the same region, so that those pairs will not be matched.
>>>> In the original data file I have information on which sites belong
>>>> to which region. However, when I compute the pairwise Mahalanobis
>>>> distances, I only use a subset of the file, which, naturally, does
>>>> not include the georeference of the sites. How should I do this?
>>>> Any hint will be most appreciated.
>>>> Btw, I am relatively new to R. I could export the matrix to another
>>>> program and replace the elements there, but that is a very, very
>>>> dirty and rough trick that I would rather avoid given better
>>>> options.
>>>> Many thanks in advance.
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> --
>>>> "I am most anxious for liberties for our country... but I place as a
>>>> prior condition the education of the people so that our country may
>>>> have an individuality of its own and make itself worthy of
>>>> liberties... " Jose Rizal, 1896
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
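Michael's question about stacking the per-region matrices into one block-diagonal matrix without losing the ids can be handled by indexing by name. A sketch, assuming x holds one row per site with the ids as rownames and id1 gives each site's region; the data below (8 sites, 2 covariates) are made up for illustration. Each region uses its own covariance, as in Nikhil's second suggestion, so every region needs more sites than covariates for var() to be invertible.

```r
# Made-up illustration data: 8 sites, 2 covariates, 2 regions
set.seed(1)
x <- matrix(runif(16), ncol = 2)
id1 <- c(1, 1, 2, 2, 1, 2, 1, 2)
rownames(x) <- paste(id1, seq_len(nrow(x)))

# Within-region Mahalanobis distances, each region with its own covariance.
# drop = FALSE keeps each subset a matrix, so the ids survive as rownames.
within <- lapply(split(seq_len(nrow(x)), id1), function(r) {
  k <- x[r, , drop = FALSE]
  Sk <- var(k)                      # needs more sites than covariates
  apply(k, 1, function(i) mahalanobis(k, i, Sk))
})

# One full matrix indexed by id, filled with the large penalty ...
ids <- rownames(x)
D <- matrix(999999, nrow(x), nrow(x), dimnames = list(ids, ids))

# ... then copy each region's block in by name, keeping the row order of x
for (blk in within) {
  D[rownames(blk), colnames(blk)] <- blk
}
D
```

Because each block is copied in by its rownames and colnames, D keeps the original order of the sites and every cross-region pair keeps the 999999 penalty, so it can go to the matching step as a single matrix.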


