[R-sig-eco] Calculate similarity matrix for category data in R

Jari Oksanen jari.oksanen at oulu.fi
Fri Feb 24 10:57:32 CET 2012


On 24/02/2012, at 11:36 AM, Yong Zhang wrote:

> Dear list,
> 
> Actually, I know how to create a distance (similarity) matrix for quantitative data, and the packages for doing this are summarised well by Gavin Simpson, as you can find here: http://cran.r-project.org/web/views/Environmetrics.html
> 
> What I have encountered now may be a different situation: I have a data set with 20 species and their taxonomic units (Order, Family and Genus).  To describe my question more clearly, the data set is listed below:
> 
>            G        F        O
> spe1       G1       F1       O1
> spe2       G1       F1       O1
> spe3       G2       F2       O1
> spe4       G3       F3       O2
> spe5
> .
> .
> .
> .
> spe20      G15      F10      O5
> 
> Note: spe1 to spe20 are the 20 species; G, F and O are abbreviations for Genus, Family and Order (the 3 different taxonomic units).  From the above, you can see that these 20 species belong to 5 Orders, 10 Families and 15 Genera (i.e., species1 and species2 are classified into the same taxonomic units, whereas species2 and species3 differ at the Genus level).
> 
> What I'd like to do is create a similarity (distance) matrix based on the above data set.  I'm wondering whether this is possible in R, and which package would be the right choice?
> 

Yong,

You can do this with function taxa2dist() in vegan.

If your taxonomic levels are factors (instead of characters), you can also use daisy() in the cluster package. 

vegan::taxa2dist with default settings and cluster::daisy will give linearly related results. However, vegan::taxa2dist results are scaled to maximum=100 and cluster::daisy to maximum=1. Moreover, cluster::daisy will regard all species in the same genus as identical (dissimilarity = 0) unless you also have a factor for species. In contrast, vegan::taxa2dist will not give zero dissimilarities, but all rows will be regarded as different (species).
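
A minimal sketch of the comparison above, using a made-up five-species table in the same layout as the question (the fifth row, the column names and the object names are assumptions for illustration only). With three taxonomic levels and default settings, I would expect taxa2dist() to give 25, 50, 75 and 100 for species that first differ at the species, genus, family and order level respectively, and daisy() to give 0, 1/3, 2/3 and 1, but please check this against your own data.

```r
library(vegan)    # taxa2dist()
library(cluster)  # daisy()

# Hypothetical five-species table (not the poster's real data)
taxa <- data.frame(
  Genus  = c("G1", "G1", "G2", "G3", "G4"),
  Family = c("F1", "F1", "F2", "F3", "F3"),
  Order  = c("O1", "O1", "O1", "O2", "O2"),
  row.names = paste0("spe", 1:5),
  stringsAsFactors = FALSE
)

# vegan: character columns are fine; every row is treated as a separate species
d_vegan <- taxa2dist(taxa)

# cluster: convert the taxonomic levels to factors first;
# species in the same genus get dissimilarity 0
taxa_f <- taxa
taxa_f[] <- lapply(taxa, factor)
d_daisy <- daisy(taxa_f, metric = "gower")

round(as.matrix(d_vegan), 1)
round(as.matrix(d_daisy), 2)

# The two sets of dissimilarities plotted against each other
# should fall on a straight line
plot(as.vector(d_daisy), as.vector(d_vegan))

# One possible conversion to similarities on a 0-1 scale
# (my own assumption, not something either package does for you)
s_vegan <- 1 - as.matrix(d_vegan) / 100
```

Adding the species names as a further factor column to the daisy() input should remove the zero dissimilarities within genera.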

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland


