[R-sig-eco] Manual Rarefaction and CI's
Nicholas J. Negovetich
nj.negovetich at gmail.com
Wed May 28 19:05:06 CEST 2014
I have a question related to rarefaction of our samples. Unlike all of
the examples of which I'm aware, I'm not working with 'sites', per se.
Instead, I'm working with the parasites of snails. Snails are infected
with 1 parasite species, and each pond/stream can hold several parasite
species. The difficulty comes when comparing parasite richness between
sites (Pond A vs Pond B) and sampling effort (# snails collected)
differs. Rarefaction provides a means to standardize effort on the
lowest sample size so that we can test if parasite richness differs
between sites.
I'm familiar with the vegan package and how it performs rarefaction.
But, I can't perform this type of analysis because (1.) 'uninfected'
snails would be considered empty patches, and (2.) max richness will be
1 (only one parasite species can (usually) infect a snail). To counter
this problem, I've tried to rarefy my samples using randomization
methods. The script is below. In summary, I sampled (with replacement)
my dataset and calculated the number of parasite species from each
sample. I repeated this 100 times and calculated the mean richness for
a given sample size. I did this for sample sizes 1-50 (smallest sample
size is 50 individuals).
I have a few questions related this this analysis.
1. Does this appear correct?
2. How can I generate CI using this method? I normally calculate the
2.5% and 97.5% quantiles when performing randomization, but the nature
of our data does not lend itself well to quantile calculations. Could I
assume that my bootstrapped means follow a normal distribution? If not,
then what would be the best method for generating confidence intervals?
3. This last one is the primary reason for this post. When performing
this analysis on my real datasets, I've noticed a peculiar thing. In
one sample, my variance of the bootstrapped samples converges to zero as
sample size approaches the true sample size (this is for my sample of
the smallest sample size). Relatedly, I've noticed some (but not all) of
my CI narrowing as my sample size approaches the actual size. This
suggests that I'm not doing something correctly, but I'm not sure how to
correct it (or if I even need to correct it). The alternative for me is
to scrap the rarefaction and perform an alternative analysis, such as
randomizing the sites and comparing the actual difference in richness
between sites to a null distribution of randomized differences. Thoughts?
NN
#Script below
#Two sites: Site A = 100 snails; Site B = 50 snails
#0 = uninfected; 1-4 = species 1-4
A <- c(rep(0,56),rep(1,25),rep(2,15),rep(3,3),rep(4,1))
B <- c(rep(0,29),rep(1,12),rep(2,7),rep(3,2))
#Function to generate a random sample with replacement and calculate
species richness
bsS <- function(x,n=100){
rs <- sample(x,size=n,replace=T)
S <- sum(table(rs)[-1]>0)
return(S)
}
#Function to rarefy my dataset; utilized bsS function
#For sample sizes of 1:n, generate 'reps' number of bootstrapped
richness values (bsS)
#and save the mean richness for each sample.
rrfy <- function(x,n=10,reps=100){
smax <- sum(table(x)[-1]>0)
s <- c()
std <- c()
low <- c()
high <- c()
for (i in 1:n){
rs <- c()
for(j in 1:reps){
rs[j] <- bsS(x,i)
}
s[i] <- mean(rs)
std[i] <- sd(rs)
low[i] <- mean(rs)-1.96*(sd(rs)/sqrt(length(rs))) #I don't think
this is correct
high[i] <- mean(rs)+1.96*(sd(rs)/sqrt(length(rs))) #I don't think
this is correct
}
plot(s~c(1:n),type='l',ylim=c(0,smax+1))
lines(low~c(1:n),lty=2,col='red')
lines(high~c(1:n),lty=2,col='red')
return(data.frame(x=c(1:n),s,stdev=std))
}
rA <- rrfy(A,n=50)
rB <- rrfy(B,n=50)
More information about the R-sig-ecology
mailing list