[R-sig-eco] Dissimilarity matrix from traits with partial overlap

Holland, Jeffrey D jdhollan at purdue.edu
Fri Oct 9 02:17:31 CEST 2015

Dear Ecologists,
     A student of mine is using vegan to create a Gower’s dissimilarity matrix between species of wood-boring beetles based upon ecological traits.  One trait we are using is the Family of trees used as hosts by the larvae.  Most species can use trees from several Families, and so the sets of host Families used by two species can be completely different, completely the same, or overlapping (i.e., some the same and some different).  To represent this, we have created a binary variable for each tree Family and enter 1 if it is used by the species and 0 otherwise.  We also wish the “host trees” variables to have a combined weight of 1 in calculating dissimilarity.  Laliberté & Legendre (2010) suggest, for such cases of partial overlap in variable states, creating dummy variables as we have and then weighting each state-variable (e.g., one tree Family) by 1/n, where n is the number of states.
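As a minimal sketch of the coding described above (in Python rather than R, and with made-up species names and host records that are purely illustrative), the dummy variables and the 1/n weights could be built like this:

```python
# Hypothetical host records: species -> set of host tree Families (illustrative only).
hosts = {
    "beetle_A": {"Fagaceae", "Betulaceae"},
    "beetle_B": {"Fagaceae", "Pinaceae"},
    "beetle_C": {"Pinaceae"},
}

# One binary column per tree Family, in a fixed (alphabetical) order.
families = sorted(set().union(*hosts.values()))

# 1 if the species uses the Family, 0 otherwise.
dummy = {
    sp: [1 if fam in used else 0 for fam in families]
    for sp, used in hosts.items()
}

# Each state-variable gets weight 1/n, so the whole "host trees" block sums to 1.
n = len(families)
weights = [1 / n] * n
```

Here `hosts`, `dummy`, and `weights` are names chosen for the illustration only; they are not part of any package.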
     However, the similarity for this trait should evaluate to 0 if no hosts are shared and 1 if there is perfect overlap in hosts.  There is an option in the gowdist command in vegan to specify symmetric and asymmetric variables, i.e., to include or remove double-zeros in the calculation of dissimilarity.  If we use an example in which two species each use 2 of 10 available host tree Families, we get the following possible scenarios, with each tree Family variable given a weight of 1/10:

Symmetrical: similarity will range between 0.6 and 1, because even with no hosts shared, the double-zeros (6 at minimum) are counted as matches.  So even species with no hosts in common will appear more similar than they are.  The total weight will be 1 as required.  The obvious solution seems to be to designate the host Families as asymmetrical variables.
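The symmetrical case can be checked with a small sketch. It assumes a plain weighted simple-matching coefficient written by hand in Python (the function name `symmetric_similarity` is hypothetical and this is not the package's own code), applied to the 2-of-10 example:

```python
def symmetric_similarity(a, b, w):
    """Weighted simple matching: every variable counts, so 0/0 is a match."""
    matched = sum(wk for x, y, wk in zip(a, b, w) if x == y)
    return matched / sum(w)

n = 10
w = [1 / n] * n  # each tree Family weighted 1/10

sp1      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # uses Families 1 and 2
same     = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # identical hosts
disjoint = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # no hosts in common

sim_same = symmetric_similarity(sp1, same, w)          # upper end of the range
sim_disjoint = symmetric_similarity(sp1, disjoint, w)  # about 0.6: six double-zeros
```

With no hosts shared, four variables mismatch and six are double-zeros, which is where the floor of 0.6 comes from.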

Asymmetrical: similarity will range between 0 and 1, because only shared hosts are counted as matches, host Families used by only one of the two species are properly counted as mismatches, and the double-zeros are excluded altogether.  However, the summed weight of the variables that enter the calculation will now vary between 0.2 (both hosts shared) and 0.4 (no hosts shared), and will never approach the desired total weight of 1.
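A similar hand-written sketch for the asymmetrical case, assuming double-zeros are simply dropped from both the numerator and the denominator (the function name `asymmetric_similarity` is hypothetical, and this is an illustration in Python, not the package's implementation), returns both the similarity and the summed weight of the variables that actually take part:

```python
def asymmetric_similarity(a, b, w):
    """Return (similarity, active_weight) with double-zeros excluded.

    Assumes at least one of the two species uses at least one host;
    otherwise the active weight is zero and the ratio is undefined.
    """
    active = [(x, y, wk) for x, y, wk in zip(a, b, w) if x == 1 or y == 1]
    active_weight = sum(wk for _, _, wk in active)
    shared = sum(wk for x, y, wk in active if x == 1 and y == 1)
    return shared / active_weight, active_weight

n = 10
w = [1 / n] * n  # each tree Family weighted 1/10

sp1      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
same     = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # both hosts shared
partial  = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # one host shared
disjoint = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # no hosts shared

sim_same, wt_same = asymmetric_similarity(sp1, same, w)              # weight about 0.2
sim_partial, wt_partial = asymmetric_similarity(sp1, partial, w)     # weight about 0.3
sim_disjoint, wt_disjoint = asymmetric_similarity(sp1, disjoint, w)  # weight about 0.4
```

The similarity itself behaves as wanted (0 for disjoint hosts, 1 for identical hosts), but the weight carried by the host block changes from pair to pair, which is the inconsistency at issue.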

The result is that we are left choosing between two sub-optimal solutions: counting double-zeros as similar hosts, or greatly down-weighting the host variables in the similarity calculation (and doing so inconsistently, depending on the host ranges of each pair of species).  There are many species (~100 beetles) and many hosts (~50 tree Families).  Does anyone know of a solution to this?
Jeff Holland, Purdue University

REF: Laliberté, E. and P. Legendre. 2010. A distance-based framework for measuring functional diversity from multiple traits. Ecology 91: 299-305. Page 301, top of 2nd column.

Jeffrey D. Holland, PhD
Associate Professor, Landscape Ecology & Biodiversity
Dept. of Entomology, Purdue University
