[R-sig-eco] Manual Rarefaction and CI's

Wed May 28 19:05:06 CEST 2014

I have a question related to rarefaction of our samples.  Unlike all of 
the examples of which I'm aware, I'm not working with 'sites', per se.  
Instead, I'm working with the parasites of snails.  Snails are infected 
with 1 parasite species, and each pond/stream can hold several parasite 
species.  The difficulty comes when comparing parasite richness between 
sites (Pond A vs Pond B) and sampling effort (# snails collected) 
differs.  Rarefaction provides a means to standardize effort on the 
lowest sample size so that we can test if parasite richness differs 
between sites.

I'm familiar with the vegan package and how it performs rarefaction.  
But, I can't perform this type of analysis because (1.) 'uninfected' 
snails would be considered empty patches, and (2.) max richness will be 
1 (only one parasite species can (usually) infect a snail).  To counter 
this problem, I've tried to rarefy my samples using randomization 
methods.  The script is below.  In summary, I sampled (with replacement) 
my dataset and calculated the number of parasite species from each 
sample.  I repeated this 100 times and calculated the mean richness for 
a given sample size.  I did this for sample sizes 1-50 (smallest sample 
size is 50 individuals).

I have a few questions related this this analysis.
1.  Does this appear correct?

2.  How can I generate CI using this method?  I normally calculate the 
2.5% and 97.5% quantiles when performing randomization, but the nature 
of our data does not lend itself well to quantile calculations.  Could I 
assume that my bootstrapped means follow a normal distribution?  If not, 
then what would be the best method for generating confidence intervals?

3.  This last one is the primary reason for this post.  When performing 
this analysis on my real datasets, I've noticed a peculiar thing.  In 
one sample, my variance of the bootstrapped samples converges to zero as 
sample size approaches the true sample size (this is for my sample of 
the smallest sample size). Relatedly, I've noticed some (but not all) of 
my CI narrowing as my sample size approaches the actual size.  This 
suggests that I'm not doing something correctly, but I'm not sure how to 
correct it (or if I even need to correct it).  The alternative for me is 
to scrap the rarefaction and perform an alternative analysis, such as 
randomizing the sites and comparing the actual difference in richness 
between sites to a null distribution of randomized differences.  Thoughts?

NN

#Script below
#Two sites: Site A = 100 snails; Site B = 50 snails
#0 = uninfected; 1-4 = species 1-4

A <- c(rep(0,56),rep(1,25),rep(2,15),rep(3,3),rep(4,1))
B <- c(rep(0,29),rep(1,12),rep(2,7),rep(3,2))

#Function to generate a random sample with replacement and calculate 
species richness
bsS <- function(x,n=100){
   rs <- sample(x,size=n,replace=T)
   S <- sum(table(rs)[-1]>0)
   return(S)
}

#Function to rarefy my dataset; utilized bsS function
#For sample sizes of 1:n, generate 'reps' number of bootstrapped 
richness values (bsS)
#and save the mean richness for each sample.
rrfy <- function(x,n=10,reps=100){
   smax <- sum(table(x)[-1]>0)
   s <- c()
   std <- c()
   low <- c()
   high <- c()
   for (i in 1:n){
     rs <- c()
     for(j in 1:reps){
       rs[j] <- bsS(x,i)
     }
     s[i] <- mean(rs)
     std[i] <- sd(rs)
     low[i] <- mean(rs)-1.96*(sd(rs)/sqrt(length(rs))) #I don't think 
this is correct
     high[i] <- mean(rs)+1.96*(sd(rs)/sqrt(length(rs))) #I don't think 
this is correct
   }
   plot(s~c(1:n),type='l',ylim=c(0,smax+1))
   lines(low~c(1:n),lty=2,col='red')
   lines(high~c(1:n),lty=2,col='red')
   return(data.frame(x=c(1:n),s,stdev=std))
}
rA <- rrfy(A,n=50)
rB <- rrfy(B,n=50)