[R] replacing elements of distance matrix

Nikhil Kaza nikhil.list at gmail.com
Mon Jul 19 20:15:38 CEST 2010


Michael,

You can modify the following code to suit. Also, avoid using dist as a
variable name, since it is already a function in the stats package. However,
are you sure you want to do this? Sx is the variance computed from sites in
all the regions!

d1 <- apply(x, 1, function(i) mahalanobis(x, i, Sx))
# compare regions directly; grepl on the names would also match the site number
is.na(d1) <- outer(id1, id1, "!=")
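If the matching routine needs a large number rather than NA, the same kind of mask can be filled with the sentinel instead. The following is only a sketch on the toy data from the quoted message: the names d, other, big and region are mine, and 999999 is simply the value suggested in the question.

```r
set.seed(1)
x   <- as.matrix(runif(5))
id1 <- sample(1:2, 5, replace = TRUE)    # region of each site
id2 <- 1:5                               # site number
rownames(x) <- paste(id1, id2)
Sx  <- var(x)                            # variance pooled over all regions

# squared Mahalanobis distance between every pair of sites
d <- apply(x, 1, function(i) mahalanobis(x, i, Sx))
dimnames(d) <- list(rownames(x), rownames(x))   # keep the geo-ids on both margins

# TRUE where the row site and the column site lie in different regions
other <- outer(id1, id1, "!=")
big   <- 999999
d[other] <- big

# the region can be recovered later from the ids alone (first token of each name)
region <- sub(" .*$", "", rownames(d))
```

The ids stay attached as row and column names, so later processing can still look up which site each row refers to.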

If, on the other hand, you want to use only the variance within a region,
modify it like this (I am sure more efficient code can be written):

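# not tested
# split the rows of x by region, keeping the matrix shape and the row names
# (split(x, id1) would flatten a matrix with more than one column)
x.L <- lapply(split(seq_len(nrow(x)), id1), function(r) x[r, , drop = FALSE])
# squared Mahalanobis distances within one region, using that region's variance
m3 <- function(k) apply(k, 1, function(i) mahalanobis(k, i, var(k)))
d2 <- lapply(x.L, m3)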
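To end up with one matrix instead of a list, the within-region blocks can be written into a full matrix that starts out filled with the large value. This is a rough sketch on made-up data (12 sites, 2 covariates, 3 regions); D, big, xr and Sr are names I chose for illustration, and it assumes every region has more sites than covariates so that its covariance matrix can be inverted.

```r
set.seed(1)
x   <- cbind(runif(12), runif(12))       # 12 sites, 2 covariates
id1 <- rep(1:3, each = 4)                # region of each site
rownames(x) <- paste(id1, seq_len(nrow(x)))

big <- 999999
# start with every pair "forbidden", ids kept as row and column names
D <- matrix(big, nrow(x), nrow(x),
            dimnames = list(rownames(x), rownames(x)))

# overwrite each within-region block with distances based on that region only
for (r in split(seq_len(nrow(x)), id1)) {
  xr <- x[r, , drop = FALSE]
  Sr <- var(xr)                          # covariance within this region
  D[r, r] <- apply(xr, 1, function(i) mahalanobis(xr, i, Sr))
}
```

Because the blocks are placed by row index, the sites do not need to be sorted by region, and D keeps the original order and ids of x.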



Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.list at gmail.com

On Jul 19, 2010, at 11:37 AM, Michael Ralph M. Abrigo wrote:

> Thanks for the tip, Nikhil. However, I need a single matrix as input to
> another routine that computes a non-bipartite matching minimizing
> pairwise distances between observations. I also need the georeference
> (id) of the observations for subsequent processing. Below is an
> illustration.
>
>
>> #generate data
>> x <- as.matrix(runif(5))
>> Sx <- var(x)
>>
>> #generate id
>> set.seed(1)
>> id1 <- sample(1:2,5, replace=T)
>> id2 <- c(1:5)
>> rownames(x) <- paste(id1, id2)
>>
>> #generate distance
>> dist <- as.matrix(
> +   apply(x,1,function(i){
> +     mahalanobis(x,i,Sx)
> +    }
> +  )
> + )
>>
>> #print matrices
>> x
>          [,1]
> 1 1 0.2059746
> 1 2 0.1765568
> 2 3 0.6870228
> 2 4 0.3841037
> 1 5 0.7698414
>> dist
>            1 1        1 2        2 3       2 4        1 5
> 1 1 0.00000000 0.01165534 3.11660015 0.4273402 4.28210082
> 1 2 0.01165534 0.00000000 3.50943798 0.5801450 4.74056406
> 2 3 3.11660015 3.50943798 0.00000000 1.2358255 0.09237602
> 2 4 0.42734018 0.58014499 1.23582554 0.0000000 2.00395492
> 1 5 4.28210082 4.74056406 0.09237602 2.0039549 0.00000000
>
>
> The geo-id is composed of two references: the first digit is the
> region and the next is the observation itself. What I have in mind is
> for the pairwise distance between observations in different regions,
> say site-11 and site-23 or site-24, to be replaced by a large number,
> say 999999. I need the id for future processing, though.
> Maybe I can stack the matrices generated using your tip to form a
> block diagonal matrix, but then I would not have my ids? I'm really
> sorry. I'm a bit lost.
> Cheers,
> Michael
>
> On Mon, Jul 19, 2010 at 10:10 PM, Nikhil Kaza  
> <nikhil.list at gmail.com> wrote:
>>
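>> Replace dist with the Mahalanobis distance in the following example.
>>
>> a <- cbind(runif(10), sample(1:3, 10, replace=TRUE))
>> # split the values (column 1) by group (column 2);
>> # split(a, a[,2]) would mix both columns into each group
>> a.L <- split(a[,1], a[,2])
>> dist.L <- lapply(a.L, dist)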
>>
>>
>> On Jul 19, 2010, at 9:24 AM, Michael Ralph M. Abrigo wrote:
>>
>>> Hi! I am trying to implement non-bipartite matching. I have around 500
>>> sites which can be clustered by 10 regions. I am able to calculate
>>> pairwise Mahalanobis distances between sites (thanks to another post in
>>> the forum). However, I want to constrain my match to sites within the
>>> same region. Thus I want to replace elements of the distance matrix with
>>> a high value, say 999999, for sites not of the same region so that the
>>> pair will not be matched.
>>> In the original data file I have information on which sites belong to
>>> what region. However, when I compute for pairwise Mahalanobis distances,
>>> I only use a subset of the file, which, naturally, does not include the
>>> georeference of the sites. How should I do this? Any hint will be most
>>> appreciated.
>>> Btw, I am relatively new in using R. I may export the matrix to another
>>> program and replace the elements there, but that is a very very dirty
>>> and rough trick that I would rather not do given better options.
>>> Many thanks in advance.
>>>
>>> Cheers,
>>> Michael
>>>
>>> --
>>> "I am most anxious for liberties for our country... but I place as a
>>> prior condition the education of the people so that our country may
>>> have an individuality of its own and make itself worthy of
>>> liberties... " Jose Rizal,1896
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>


