[R] exploring dist()

Gavin Simpson gavin.simpson at ucl.ac.uk
Sun Mar 20 20:43:47 CET 2011


On Fri, 2011-03-18 at 06:21 -0700, bra86 wrote:
> Hello, everybody, 
> 
> I hope somebody could help me with a dist() function.
> I have a data frame of size 2*4087 (col*row), where col corresponds to the
> treatment and rows are

So you have 4087 species? If yes, normally, you'd have the species in
the columns and the samples/treatments in the row.

> species, values are Hellinger distances, I should reconstruct a distance
> matrix

This doesn't make sense - distances would mean you have a square
symmetric matrix but 2 * 4087 isn't square. Do you mean you have
Hellinger **transformed** the data such that when you take the Euclidean
distances of this transformed data you get the Hellinger distance rather
than the Euclidean distance?

If yes - and you sort the rows/columns issue - R wants the samples in
rows - then it is reasonably simple.

Here is a much simplified example with 5 species and 4 samples:

dat <- data.frame(runif(4, 1, 10), runif(4, 2, 10), runif(4, 4, 20),
                  runif(4, 1, 4), runif(4, 0, 5))
names(dat) <- paste("spp", LETTERS[1:5])
rownames(dat) <- paste("samp", 1:4)

So we have data that looks like this:

> dat
          spp A    spp B     spp C    spp D     spp E
samp 1 6.974237 7.933403  5.460453 3.975219 4.6818142
samp 2 1.049801 6.751013 14.143798 1.777532 4.0261914
samp 3 5.742314 2.243850 15.613524 3.476935 0.4144043
samp 4 5.985012 9.576440  8.722579 3.411262 1.8126338

Then I apply a Hellinger transformation:

library(vegan)
datH <- decostand(dat, method = "hellinger")

So at this point we have something that I think you are telling us you
have:

> datH
           spp A     spp B     spp C     spp D     spp E
samp 1 0.4901864 0.5228086 0.4337378 0.3700782 0.4016244
samp 2 0.1945069 0.4932488 0.7139447 0.2530989 0.3809156
samp 3 0.4570334 0.2856942 0.7536245 0.3556336 0.1227769
samp 4 0.4503635 0.5696823 0.5436922 0.3400073 0.2478481
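
As a rough cross-check of what the transformation does, here is a minimal
base-R sketch. It assumes decostand(method = "hellinger") takes the square
root of each value divided by its row (sample) total; the name samp1H is
made up for this illustration, and the input values are copied from the
printed `dat` above.

```r
# First sample of `dat`, copied from the printed output above
samp1 <- c(6.974237, 7.933403, 5.460453, 3.975219, 4.6818142)

# Hellinger transformation by hand: square root of the relative abundances
samp1H <- sqrt(samp1 / sum(samp1))

# Compare with the first row of the printed datH
round(samp1H, 4)

# The squared transformed values of a sample sum to 1
sum(samp1H^2)
```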

We can use dist() on this data frame via:

dij <- dist(datH)

If we look at the object created, we see the **printed** representation
of the dissimilarity matrix, which is a 4*4 matrix in this example:

> dij
          samp 1    samp 2    samp 3
samp 2 0.4253576                    
samp 3 0.4874570 0.4367179          
samp 4 0.2010581 0.3543312 0.3750363
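
To see where one of these numbers comes from, the following sketch
recomputes the samp 1 / samp 2 entry by hand. The two vectors are copied
from the printed datH above (so they are rounded to 7 decimals), and the
names s1, s2, d12 and d12_dist are only for illustration.

```r
# Rows for samp 1 and samp 2, copied from the printed datH above
s1 <- c(0.4901864, 0.5228086, 0.4337378, 0.3700782, 0.4016244)
s2 <- c(0.1945069, 0.4932488, 0.7139447, 0.2530989, 0.3809156)

# Euclidean distance by hand: root of the summed squared differences
d12 <- sqrt(sum((s1 - s2)^2))

# The same quantity from dist(), which compares the rows of its input
d12_dist <- dist(rbind(s1, s2))

# Both should agree with the samp 2 / samp 1 entry printed above
round(c(by_hand = d12, from_dist = as.numeric(d12_dist)), 4)
```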

Note that the diagonal and the upper triangle of the matrix are not
printed, or stored even, because they are trivial (0 for all diagonals
and the upper triangle is the same as the lower triangle).

dist() actually creates a vector of numbers that will fill the lower
triangle of the dissimilarity matrix. This saves on storage space. If
you want to add the diagonal and upper triangle, you can get them in one
of two ways:

1) dist(datH, diag = TRUE, upper = TRUE)

2) as.matrix(dij)

However, only the second actually returns a matrix with 16 numbers; the
former still only computes the 6 pair-wise distances, but when
**printed** it shows the full matrix.
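
The difference between the two can be checked directly. This is a small
sketch on arbitrary made-up data (the matrix x and the object names are not
from the example above), just to show what each call stores.

```r
# Four samples by five variables; the values themselves are arbitrary
x <- matrix(seq_len(20)^2, nrow = 4,
            dimnames = list(paste("samp", 1:4), NULL))

dij <- dist(x)
length(dij)          # n * (n - 1) / 2 = 6 stored distances
attr(dij, "Size")    # number of samples, 4

# diag/upper only change how the object is printed, not what is stored
dfull <- dist(x, diag = TRUE, upper = TRUE)
length(dfull)        # still 6

# as.matrix() expands to the full symmetric matrix with a zero diagonal
m <- as.matrix(dij)
dim(m)
```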

If you really have species in rows and samples in columns, then you can
transpose your matrix, e.g.

datH.t <- t(datH)

and then compute the dissimilarity matrix as above.
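
To illustrate why the orientation matters, here is a sketch with a
hypothetical layout like the one described in the question, only much
smaller (6 species and 2 treatments; the object names and random values
are invented for this illustration). dist() always compares rows.

```r
# Hypothetical data: species in rows, treatments in columns
set.seed(1)
spp_by_trt <- matrix(runif(12), nrow = 6,
                     dimnames = list(paste("spp", 1:6), c("trtA", "trtB")))

# Without transposing, the distances are between the 6 species
d_species <- dist(spp_by_trt)
length(d_species)    # 6 * 5 / 2 = 15

# After transposing, rows are treatments: one distance between the two
d_treat <- dist(t(spp_by_trt))
length(d_treat)      # 2 * 1 / 2 = 1
attr(d_treat, "Labels")
```

With 4087 rows, the untransposed call would compute 4087 * 4086 / 2
pairwise distances, which is probably not what is wanted if the
treatments are the objects being compared.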

Does this help?

G

> with a dist() function. I know that "euclidean" method should be used.
> 
> When I type:
> dist(dframe,"euclidean")
> it gives me a truncated table, where values are missing.
> 
> I suppose that I have to define something for the values,
> but I have no idea what exactly, because I am not familiar with r at all.
> 
> I would be very appreciated for every kind of suggestions or tips.
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/exploring-dist-tp3387187p3387187.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


