[R-sig-eco] Testing difference between diversity indices with vegan::oecosimu

Thu Apr 26 17:47:20 CEST 2012

On 26 Apr 2012, at 0:19, Kay Cichini wrote:

> Hello all,
> 
> I'd like to test if total diversity differs between two communities. For
> each community several samples were taken and abundances collapsed over
> groups to compute total diversity for each group. I tried to use
> vegan::oecosimu to test non-randomness of my statistic (difference in
> Simpson-Diversity indices of collapsed abundances) - however, I am not
> quite sure if I overlook possible pitfalls:
> 
> library(vegan)
> data(dune)
> 
> # a grouping variable:
> gr <- gl(2, nrow(dune)/2)
> 
> divdiff <- function(x) abs(diversity(colSums(x[gr == "1", ]), "simp") -
>                           diversity(colSums(x[gr == "2", ]), "simp"))
> # testing function:
> divdiff(dune)
> 
> oecosimu(dune, divdiff, "r2dtable", nsimul = 1999)
> # oecosimu with 1999 simulations
> # simulation method r2dtable
> # alternative hypothesis: true mean is not equal to the statistic
> #           statistic        z     2.5%      50% 97.5% Pr(sim.)
> # statistic   0.00275 -0.20996  0.00013  0.00280  0.01     0.98
> 
Kay,

I think that Gav's suggestion is the most natural one: permute your classification vector and compare your observed difference to the permutation values. Null models can be problematic, and you must think very carefully about what kind of null model you need and what null hypothesis each null model implies. Quantitative null models are even trickier. I see the following possible problems with your idea:

- You used the "r2dtable" null model, which fixes both row and column totals (but not frequencies). This means that the overall gamma diversity is fixed in all simulations: the Simpson index is found from species totals, and these are fixed. When you also fix row totals, the simulated communities can be too similar to each other, and this in turn gives P-values that are too low. I think that when analysing overall diversities from marginal sums, you should use a null model that allows those marginal sums to vary. This may not be possible with the release version of vegan, but the development version in R-Forge has a completely redesigned null model engine with several new quantitative null models, and it allows plugging in your own null models (which could even include permutation models).

- If the usual null models can be painful, the quantitative null models give you double trouble. One problem is that they produce data that are too evenly distributed. For "r2dtable" this holds in two ways: the method fixes marginal totals, but not marginal frequencies (= number of non-zero cells). Typically the number of zeros is much lower than in real data, and the variance of rows and columns is lower than in any real data. Moreover, the simulated samples are often much more similar to each other than real re-samples of Nature would be. This is like using a Poisson glm for abundance data: the data are regularly over-dispersed relative to Poisson, and therefore the P-values are too low. You have just the same danger with these null models: the simulation variation is too low, and therefore your P-values are too low.

- The "r2dtable" method requires that your data are counts of individuals: it is individuals that are swapped between cells. You used the Dutch Dune meadow data in your example. Technically this works, since the data are integers, but they are cover class values and not individuals, and therefore swapping integer pieces of cover classes has no meaning. If you want to consider null models, you should again switch to the R-Forge version of vegan (currently at version 2.1-15), which allows some models that apply to data that are not made of individuals, and also some methods that can retain the original marginal variances of the data.
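
The point about marginal totals versus marginal frequencies can be checked directly. The following is only a rough sketch: it uses base R's r2dtable(), which likewise generates random tables with given row and column totals, and a small made-up count matrix (`comm` is hypothetical, not the dune data). It confirms that the margins are reproduced in every simulation and compares the number of empty cells in the observed and simulated tables.

```r
# Hypothetical sparse community matrix: 6 sites (rows) x 5 species (columns),
# counts of individuals. Invented for illustration only.
comm <- matrix(c(12, 0,  0, 3, 0,
                  0, 9,  0, 0, 1,
                  5, 0,  0, 0, 0,
                  0, 0, 14, 0, 2,
                  0, 7,  0, 0, 0,
                  1, 0,  0, 8, 0),
               nrow = 6, byrow = TRUE)

set.seed(1)
# Random tables with the same row and column totals as 'comm'
sims <- r2dtable(999, rowSums(comm), colSums(comm))

# Marginal totals are identical in every simulated table ...
margins_fixed <- all(vapply(sims, function(m)
  all(rowSums(m) == rowSums(comm)) && all(colSums(m) == colSums(comm)),
  logical(1)))

# ... but the number of empty cells (marginal frequencies) is not preserved
zeros_obs <- sum(comm == 0)
zeros_sim <- vapply(sims, function(m) sum(m == 0), numeric(1))

margins_fixed        # TRUE if all margins match
zeros_obs            # empty cells in the observed table
summary(zeros_sim)   # empty cells in the simulated tables
```

With sparse data like this the simulated tables should have clearly fewer empty cells than the observed one, which is the "too evenly distributed" problem described above.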

There are many things that you need to consider if you want to use null models. However, I think that permuting the classification vector saves a lot of trouble, and is more easily understood and communicated.
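
A minimal sketch of such a permutation test, under some stated assumptions: it runs in base R on made-up count data (`comm` and `gr` are hypothetical), and the helper `simpson()` computes 1 - sum(p^2), which is the quantity that vegan's diversity(x, "simpson") returns. The grouping vector is shuffled among samples, the abundances are collapsed within the shuffled groups, and the observed difference is compared with the permutation values.

```r
# Simpson diversity (1 - sum(p^2)) of a vector of species totals
simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Absolute difference in Simpson diversity of the pooled abundances
# of two groups of samples (rows of x)
divdiff <- function(x, groups) {
  lev <- levels(groups)
  abs(simpson(colSums(x[groups == lev[1], , drop = FALSE])) -
      simpson(colSums(x[groups == lev[2], , drop = FALSE])))
}

set.seed(1)
# Hypothetical data: 20 samples x 10 species, two groups of 10 samples
comm <- matrix(rpois(200, lambda = 2), nrow = 20)
gr <- gl(2, 10)

obs <- divdiff(comm, gr)

# Null distribution: shuffle the group labels, keep the community data intact
nperm <- 999
perm <- replicate(nperm, divdiff(comm, sample(gr)))

# One-sided permutation P-value; the observed value counts as one permutation
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
p_value
```

Because only the labels are shuffled, every permuted data set keeps the real samples with their real zeros and variances, so none of the null model problems listed above arise. With real data, `comm` and `gr` would be replaced by the community matrix and the actual grouping factor.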

Cheers, Jari Oksanen