[R] exploring dist()
Gavin Simpson
gavin.simpson at ucl.ac.uk
Sun Mar 20 20:43:47 CET 2011
On Fri, 2011-03-18 at 06:21 -0700, bra86 wrote:
> Hello, everybody,
>
> I hope somebody could help me with a dist() function.
> I have a data frame of size 2*4087 (col*row), where col corresponds to the
> treatment and rows are
So you have 4087 species? If so, you'd normally have the species in
the columns and the samples/treatments in the rows.
> species, values are Hellinger distances, I should reconstruct a distance
> matrix
This doesn't make sense: a matrix of distances would be square and
symmetric, but 2 * 4087 isn't square. Do you mean that you have
Hellinger **transformed** the data, such that taking the Euclidean
distances of the transformed data gives you the Hellinger distance
rather than the Euclidean distance?
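To see why that works, here is a minimal base-R sketch. The helper
`hellinger()` is hypothetical (it is not a vegan function) and uses the
form of the Hellinger distance without the 1/sqrt(2) factor that some
texts include. It computes the distance between two samples directly
from relative abundances and checks that it equals the Euclidean
distance between the square-rooted relative abundances. The abundances
are made up.

```r
# Hypothetical helper: Hellinger distance between two abundance vectors,
# written without the 1/sqrt(2) factor that some definitions include
hellinger <- function(x, y) {
  p <- x / sum(x)  # relative abundances of the first sample
  q <- y / sum(y)  # relative abundances of the second sample
  sqrt(sum((sqrt(p) - sqrt(q))^2))
}

x <- c(6, 8, 5, 4, 5)   # made-up abundances, sample 1
y <- c(1, 7, 14, 2, 4)  # made-up abundances, sample 2

# Euclidean distance between the square-rooted relative abundances
both <- rbind(sqrt(x / sum(x)), sqrt(y / sum(y)))
d_euc <- as.numeric(dist(both))

all.equal(hellinger(x, y), d_euc)  # TRUE
```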
If yes, then once you sort out the rows/columns issue (R wants the
samples in rows), it is reasonably simple.
Here is a much simplified example with 5 species and 4 samples:
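# 4 samples (rows) x 5 species (columns) of random values; because runif()
# is random, your numbers will differ from those printed below
dat <- data.frame(runif(4, 1, 10), runif(4, 2, 10), runif(4, 4, 20),
                  runif(4, 1, 4), runif(4, 0, 5))
names(dat) <- paste("spp", LETTERS[1:5])   # "spp A" ... "spp E"
rownames(dat) <- paste("samp", 1:4)        # "samp 1" ... "samp 4"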
So we have data that looks like this:
> dat
spp A spp B spp C spp D spp E
samp 1 6.974237 7.933403 5.460453 3.975219 4.6818142
samp 2 1.049801 6.751013 14.143798 1.777532 4.0261914
samp 3 5.742314 2.243850 15.613524 3.476935 0.4144043
samp 4 5.985012 9.576440 8.722579 3.411262 1.8126338
Then I apply a Hellinger transformation:
library(vegan)   # provides decostand()
datH <- decostand(dat, method = "hellinger")
So at this point we have something that I think you are telling us you
have:
> datH
spp A spp B spp C spp D spp E
samp 1 0.4901864 0.5228086 0.4337378 0.3700782 0.4016244
samp 2 0.1945069 0.4932488 0.7139447 0.2530989 0.3809156
samp 3 0.4570334 0.2856942 0.7536245 0.3556336 0.1227769
samp 4 0.4503635 0.5696823 0.5436922 0.3400073 0.2478481
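If you want to see what `method = "hellinger"` does without loading
vegan, here is a base-R sketch. It assumes the default row-wise
standardisation: each value is divided by its row (sample) total and
then square-rooted. The small data frame is made up.

```r
# Made-up abundances: 2 samples (rows) x 3 species (columns)
dat <- data.frame(a = c(1, 4), b = c(3, 0), c = c(0, 12))

# Hellinger transformation by hand: square root of the row-wise
# relative abundances (dat / rowSums(dat) divides each row by its total)
datH_manual <- sqrt(dat / rowSums(dat))
datH_manual

# Each transformed row has a sum of squares of 1
rowSums(datH_manual^2)
```

Because every transformed row has unit length, Euclidean distances
between rows stay bounded. Note that a sample whose abundances are all
zero would give NaN here (0/0).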
We can use dist() on this data frame via:
dij <- dist(datH)
If we look at the object created, we see the **printed** representation
of the dissimilarity matrix, which is a 4*4 matrix in this example:
> dij
samp 1 samp 2 samp 3
samp 2 0.4253576
samp 3 0.4874570 0.4367179
samp 4 0.2010581 0.3543312 0.3750363
Note that the diagonal and the upper triangle of the matrix are not
printed, or even stored, because they are redundant: the diagonal is
all zeros and the upper triangle mirrors the lower triangle.
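A quick base-R check of the storage claim, using made-up data: for n
samples a "dist" object holds only n(n - 1)/2 numbers. The index
formula below for the pair (i, j), i < j, is the one given on the
?dist help page. The helper name `idx()` is hypothetical.

```r
# Made-up data: 4 samples (rows) x 3 variables
x <- matrix(c(1, 2, 3, 4,
              2, 0, 1, 5,
              0, 3, 3, 1), nrow = 4)
d <- dist(x)

n <- attr(d, "Size")
length(d)   # 6, i.e. n * (n - 1) / 2 for n = 4

# Hypothetical helper: position of the (i, j) dissimilarity, i < j,
# in the lower-triangle vector (formula from the ?dist help page)
idx <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

d[idx(2, 4, n)]                  # distance between samples 2 and 4
sqrt(sum((x[2, ] - x[4, ])^2))   # same value computed directly
```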
dist() actually creates a vector of numbers that fills the lower
triangle of the dissimilarity matrix (n(n - 1)/2 values for n samples,
so 6 here). This saves storage space. If you want to add the diagonal
and upper triangle, you can get them in one of two ways:
1) dist(datH, diag = TRUE, upper = TRUE)
2) as.matrix(dij)
However, only the second actually returns a matrix with 16 numbers; the
first still computes and stores only the 6 pairwise distances, and
merely **prints** them as the full matrix.
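The difference between the two options can be checked directly. In
this sketch with made-up points, `diag = TRUE, upper = TRUE` only sets
flags (the "Diag" and "Upper" attributes) that affect printing, while
as.matrix() builds a real 4 x 4 matrix.

```r
# Made-up data: 4 points at the corners of a 3 x 4 rectangle
x <- matrix(c(0, 3, 0, 3,
              0, 0, 4, 4), nrow = 4)

d1 <- dist(x)
d2 <- dist(x, diag = TRUE, upper = TRUE)

length(d1) == length(d2)   # TRUE: both still hold 6 distances
attr(d2, "Upper")          # TRUE: a printing flag, not extra data

m <- as.matrix(d1)         # a genuine 4 x 4 matrix
dim(m)                     # 4 4
all(diag(m) == 0)          # zero diagonal
isSymmetric(m)             # upper triangle mirrors the lower one
```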
If you really have species in rows and samples in columns, then you can
transpose your matrix, e.g.
datH.t <- t(datH)
and then compute the dissimilarity matrix as above.
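Here is a small sketch with made-up, already Hellinger-transformed
numbers laid out species-by-treatment, as described in the question.
t() on a data frame returns a samples-by-species matrix, which dist()
then handles as above.

```r
# Made-up species-by-treatment table: 3 species (rows) x 2 treatments
spp_by_samp <- data.frame(trt1 = c(0.5, 0.5, 0.7071068),
                          trt2 = c(0.6, 0.8, 0.0),
                          row.names = c("spp A", "spp B", "spp C"))

samp_by_spp <- t(spp_by_samp)   # 2 samples (rows) x 3 species (columns)
dim(samp_by_spp)                # 2 3

dist(samp_by_spp)               # a single distance: trt1 vs trt2
```

With only two treatments, as in the 2 * 4087 data in the question,
dist() can only return one distance once the data are the right way
round.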
Does this help?
G
> with a dist() function. I know that "euclidean" method should be used.
>
> When I type:
> dist(dframe,"euclidean")
> it gives me a truncated table, where values are missing.
>
> I suppose that I have to define something for the values,
> but I have no idea what exactly, because I am not familiar with R at all.
>
> I would appreciate any suggestions or tips.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%