[R-sig-eco] null model selection in functional diversity analysis

Thu Oct 23 11:43:51 CEST 2014

Dear r-sig-ecology listers,

I am involved in a study whose objective is the see if there relationships
between the functional diversity of the fish community and environmental
factors of the sample site for a number of sites in a bounded environment.
Specifically, we are looking at the parameters of functional richness
("FRic") and functional diversity ("RaoQ") as calculated by the R package
"FD". We observe certain trends in these indices as related to the
environmental factors in question, and would now like to determine if the
effect of their deviation from a null model is significant or not.

Given the fact that FRic and RaoQ are often correlated with the richness
and diversity, respectively, of the community, many null models are
designed to remove these effects by permutating the abundance matrix while
maintaining the functional trait matrix constant.

Null model descriptions:

1. Own null model - Our first attempt has been to permutate the site
association of individual fishes to site (see function "ownNull" below). The
result of the null model is that abundance values for sites (rows) and
species (columns) are maintained. The original idea was that we wanted to
maintain the overall site capacity to sustain a certain number of
individuals. Also, since certain species are linked with higher abundances,
these total abundances should be maintained. The rationale was that in the
absence of site-specific environmental controls on species identity,
individuals would roam at random over the entire study region and end up
randomly at the individual sites, each of which has a certain carrying
capacity determining the number of individuals per site, but not their
identity.
Summary of constrained patterns:
    * site abundance (yes)
    * site diversity (no)
    * site richness (no)
    * global spp abundance (yes)
    * global spp frequency distribution (no)

2. Independent swap (Gotelli, 2000) - This model also permutates species
association to site, but randomizes matrix values (i.e. packets of observed
individuals), with the additional constraint that accepted permutations
must maintain site richness (i.e. total number of spp). This model seems in
line with some of our original assumptions, but differs in that it focuses
more on site richness and maintains global species frequency distributions,
i.e. it takes the abundance values of species observed at each site and
randomly assigns them to new sites. My coauthors worried about this last
point - since it locks in large abundance values which might have been very
much a product of site, and therefore abundance at site is not constrained.
Summary of constrained patterns:
    * site abundance (no)
    * site diversity (no)
    * site richness (yes)
    * global spp abundance (yes)
    * global spp frequency distribution (yes)

We would ultimately like to quantify the effect of these environmental
factors on our observed FRic and RaoQ indices, and thus were quantifying
the standardized effect size (SES) based on the null model distribution. Since
only the independent swap method maintains richness, this would seem to be
appropriate for determining FRic. For RaoQ, neither option maintains
diversity, so are either appropriate? Are more than one null model needed?

So, we have two main questions:
1. What would be an appropriate null model given our objectives - One of
these, or another suggestion?
2. Another aspect discussed on this list before (
https://stat.ethz.ch/pipermail/r-sig-ecology/2010-January/001003.html),
concerns that of the uniqueness of the permutated null models (especially
applicable given the constraints of the independent swap algorithm) as well
as the overall variability in null models. Calculating the number of unique
permutated matrices is easy enough, but how would one assess whether null
model variability is of a similar magnitude to that of the original data?

Many thanks in advance for any help on these issues. An example script
illustrating these issues can be found below.

Cheers,
Marc

-- 
Dr. Marc Taylor
Leibniz Center for Marine Tropical Ecology

########################################################
###################### R script ########################
########################################################
### Required packages
library(FD)
library(picante)

# Required function
nullOwn <- function(mat){
  Counts <- data.frame(row=rep(NaN, sum(mat)), col=NaN)
  dim(Counts)
  head(Counts)
  lu <- expand.grid(row=seq(nrow(mat)), col=seq(ncol(mat)))
  lu$n <- c(mat)
  pos.cs <- cumsum(c(mat))
  for(i in seq(nrow(lu))){
    if(lu$n[i] > 0){
      if(i == 1) {
        Seq <- 1:pos.cs[i]
      } else {
        Seq <- (pos.cs[i-1]+1):pos.cs[i]
      }
      Counts[Seq, ] <- lu[i,1:2]
    }
  }

  Counts.i <- Counts
  Counts.i[,1] <- Counts[sample.int(nrow(Counts)),1] # randomize row
(site) position
  Counts.i$p <- 1
  tmp <- aggregate(p ~ row + col, data=Counts.i, FUN=sum)
  mat.i <- 0*mat
  for(j in seq(nrow(tmp))){
    mat.i[tmp$row[j], tmp$col[j]] <- tmp$p[j]
  }
  mat.i
}

############################
# sythetic data
set.seed(1)
ex1 <- simul.dbFD(s = c(10, 20, 30, 40, 50), r = 5, tr.method="lnorm")
dev.off()
traits <- ex1$traits
abund <- ex1$abun
spp_weight <- rep(1, ncol(abund))
spp_weight[c(2, 5, 20, 25)] <- c(5,10,20,40) # make some species more abundant
abund <- round(t(t(abund) * spp_weight))

# plot abundance
pal <- colorRampPalette(c("blue4", "cyan", "yellow","red1"), bias=3)
image(seq(ncol(abund)), seq(nrow(abund)), t(abund), col=pal(100),
xlab="Species", ylab="Sample")

#########################################
### Analysis ############################
#########################################
### Original
resOrig <- dbFD(
  traits, abund, w.abun=TRUE,
  corr="lingoes", calc.FRic=TRUE, m="max",
  stand.FRic=FALSE, scale.RaoQ=FALSE, calc.FGR=FALSE,
  calc.CWM=FALSE, print.pco=FALSE
)
names(resOrig)

### Plot FD vs. other indices
Div <- diversity(abund) # diversity
Rich <- rowSums(abund > 0) # richness
Abu <- rowSums(abund) # abundance
plot(Div, xlab="Sites")
plot(Rich, xlab="Sites")
plot(Abu, xlab="Sites")

# FRic
plot(Div, resOrig$FRic) # FRic vs. diversity
plot(Rich, resOrig$FRic)  # FRic vs. richness

# RaoQ
plot(Div, resOrig$RaoQ) # RaoQ vs. diversity
plot(Rich, resOrig$RaoQ)  # RaoQ vs. richness

###################################
###  Null model comparisons #######
###################################

### Own null model
abund.null <- nullOwn(abund)

# community matrix
pal <- colorRampPalette(c("blue4", "cyan", "yellow","red1"), bias=3)
op <- par(mar=c(1,1,3,1), oma=c(2,2,0,0), mfrow=c(2,1))
image(t(abund), col=pal(100), axes=FALSE)
mtext("Original", side=3, line=0.5, font=2)
image(t(abund.null), col=pal(100), axes=FALSE)
mtext("Null model", side=3, line=0.5, font=2)
mtext("Species", outer=TRUE, side=1, line=0.5)
mtext("Samples", outer=TRUE, side=2, line=0.5)
par(op)

# plot site indices
plot(diversity(abund), diversity(abund.null)) # diversity
plot(apply(abund>0, 1, sum), apply(abund.null>0, 1, sum)) # richness
plot(apply(abund, 1, sum), apply(abund.null, 1, sum)) # abundance

# plot species indices
plot(apply(abund, 2, sum), apply(abund.null, 2, sum), log="xy") # abundance
op <- par(mfcol=c(1,2)) # frequency distribution (not equal)
hist(log(abund[,20]))
hist(log(abund.null[,20]))
par(op)

### independent swap model
abund.null <- randomizeMatrix(abund, null.model = "independentswap")

# community matrix
pal <- colorRampPalette(c("blue4", "cyan", "yellow","red1"), bias=3)
op <- par(mar=c(1,1,3,1), oma=c(2,2,0,0), mfrow=c(2,1))
image(t(abund), col=pal(100), axes=FALSE)
mtext("Original", side=3, line=0.5, font=2)
image(t(abund.null), col=pal(100), axes=FALSE)
mtext("Null model", side=3, line=0.5, font=2)
mtext("Species", outer=TRUE, side=1, line=0.5)
mtext("Samples", outer=TRUE, side=2, line=0.5)
par(op)

# plot site indices
plot(diversity(abund), diversity(abund.null)) # diversity
plot(apply(abund>0, 1, sum), apply(abund.null>0, 1, sum)) # richness
plot(apply(abund, 1, sum), apply(abund.null, 1, sum)) # abundance

# plot species indices
plot(apply(abund, 2, sum), apply(abund.null, 2, sum), log="xy") # abundance
op <- par(mfcol=c(1,2)) # frequency distribution (equal)
hist(log(abund[,20]))
hist(log(abund.null[,20]))
par(op)

	[[alternative HTML version deleted]]