[R] Retrieving the 2 row of "dist" computations

Jeff08 jefferyding at gmail.com
Fri Jun 11 08:52:59 CEST 2010


Edit:

I'm stupid and visualized the "dist" matrix incorrectly in my head.

For the 829-row matrix it should be
Column # = x, Row # = y. n = 827-(x-2)
index = y-1+(n+827)*(827-n+1)/2

Everything works just fine. Thanks!
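
For a matrix with any number of rows N, the same bookkeeping can be written once instead of re-deriving the constants for 829 rows. The following is only a sketch, assuming that dist() stores the lower triangle column by column (see ?dist). The helper names dist_index and dist_rows are made up for illustration and are not part of base R.

```r
# Position inside a "dist" object of the distance between rows i and j
# of an N-row matrix (i != j). Hypothetical helper, not a base R function.
dist_index <- function(i, j, N) {
  row <- pmax(i, j)   # larger row number  (y in the formula above)
  col <- pmin(i, j)   # smaller row number (x in the formula above)
  N * (col - 1) - col * (col - 1) / 2 + row - col
}

# Inverse mapping: recover the two row numbers from positions k.
# Hypothetical helper, not a base R function.
dist_rows <- function(k, N) {
  # start[c] is the position of the first entry stored for column c,
  # i.e. the distance between rows c + 1 and c
  start <- dist_index(2:N, 1:(N - 1), N)
  col <- findInterval(k, start)      # column whose block contains k
  row <- k - start[col] + col + 1    # offset inside that block
  cbind(row = row, col = col)
}

# The 7-row example from the original question: e was 14 16 4 18 5
dist_rows(c(14, 16, 4, 18, 5), N = 7)
# gives the pairs (6,3), (5,4), (5,1), (7,4), (6,1)

# With real data (N = 829 here), no call to as.matrix(d) is needed:
N <- 829
myMat <- matrix(runif(250 * N), ncol = 250)
d <- dist(myMat)
e <- order(as.vector(d))[1:5]   # positions of the 5 smallest distances
res <- dist_rows(e, N)
res
```

For N = 829, dist_index(y, x, 829) gives the same value as the corrected formula above, so the two can be checked against each other. Working directly on the positions avoids building the full 829 x 829 matrix and searching it once per distance.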


Jeff08 wrote:
> 
> Edit:
> 
> There is something funky about the code. It definitely returns the right
> column of the "distance" data, but returns an incorrect row. 
> 
> Code:
> 
> NCols=250
> NRows=829 
> myMat<-matrix(runif(NCols*NRows), ncol=NCols) 
> 
> d<-dist(myMat)
> e<-sort.list(d)
> e<-e[1:5]  ##Positions in d of the 5 smallest distances
> 
> k <- 5
> res <- matrix(NA, ncol = 2, nrow = k)
> ds <- sort(d)
> for(i in 1:k) res[i, ] <- which(as.matrix(d) == ds[i], arr.ind = TRUE)[1,]
> colnames(res) <- c('row','col')
> rownames(res) <- 1:k
> res
> 
> I have derived the formula for 829 rows, to check whether the returned
> column and row match the index given by e.
> 
> Column # = x, Row # = y. n = 828-(x-2)
> index = y+(n+828)(828-n+1)/2
> 
> 
> R code for the formula:
> ##Just checking for row 1
> i<-1
> y<-res[i,1]
> x<-res[i,2]
> n<-(828-(x-2))
> index1<-(y+(n+828)*(828-n+1)/2)
> index2<-e[i]
> ##index1 should equal index2, but this is not the case
> ##you can tell that the column is right because index1 & index2 are close
> ##(a change in row of 1 shifts the index by 1, but a change in column
> ## shifts index by ~400 on average)
> 
> You can then compare this index to the one given by e[i]
> 
> 
> 
> Jorge Ivan Velez wrote:
>> 
>> Hi there,
>> 
>> I am sure there is a better way to do it, but here is a suggestion:
>> 
>> res <- matrix(NA, ncol = 2, nrow = 5)
>> for(i in 1:5) res[i, ] <- which(as.matrix(d) == sort(d)[i], arr.ind = TRUE)[1,]
>> res
>> 
>> HTH,
>> Jorge
>> 
>> 
>> On Wed, Jun 9, 2010 at 11:30 PM, Jeff08 <> wrote:
>> 
>>>
>>> Dear R Gurus,
>>>
>>> As you probably know, dist calculates the distance between every pair of
>>> rows of the data. What I am interested in is the actual two rows that
>>> have the least distance between them, rather than the numerical value of
>>> the distance itself.
>>>
>>> For example, in the following sample run the minimum distance is d[14],
>>> which is 0.3826119, and the rows are 3 & 6. I need a generic way to
>>> retrieve these rows for a matrix with any number of rows (in this
>>> example NRows=7).
>>>
>>> NCols=5
>>> NRows=7
>>> myMat<-matrix(runif(NCols*NRows), ncol=NCols)
>>>
>>> d<-dist(myMat)
>>>
>>>          1         2         3         4         5         6
>>> 2 0.7202138
>>> 3 0.7866527 0.9052319
>>> 4 0.6105235 1.0754259 0.8897555
>>> 5 0.5032729 1.0789359 0.9756421 0.4167131
>>> 6 0.6007685 0.6949224 0.3826119 0.7590029 0.7994574
>>> 7 0.9751200 1.2218754 1.0547197 0.5681905 0.7795579 0.8291303
>>>
>>> e<-sort.list(d)
>>> e<-e[1:5]  ##Positions in d of the 5 smallest distances
>>>
>>> [1] 14 16  4 18  5
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Retrieving-the-2-row-of-dist-computations-tp2249844p2249844.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>> 
>> 
> 
> 



-- 
View this message in context: http://r.789695.n4.nabble.com/Retrieving-the-2-row-of-dist-computations-tp2249844p2251349.html
Sent from the R help mailing list archive at Nabble.com.


