[R-sig-eco] How to deal with rare species post-rarefaction - subsampling or not?
Tim Richter-Heitmann
trichter at uni-bremen.de
Wed Oct 7 16:34:36 CEST 2015
Dear list,
I have six fields with 60 samples, and I want to analyze the microbial
diversity based on high-throughput sequencing.
The read depth varied between samples by about one order of magnitude
(i.e., the samples with the most reads had about tenfold more than those
with the fewest reads).
I have done rarefaction based on Hill numbers (Chao and Jost, 2014) and
found that I reached full coverage for n=1 and n=2, and that the curves
for n=0 reached a plateau in all samples. My
lowest sample completeness value was 0.995, for a sample with about
30,000 observations.
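
For readers unfamiliar with the completeness value quoted above, here is
a minimal sketch of how sample coverage can be estimated from the
singleton and doubleton counts of one sample. It is written in Python
rather than R, the function name sample_coverage and the toy counts are
made up for illustration, and it is not necessarily the exact routine
that produced the 0.995 figure.

```python
from collections import Counter


def sample_coverage(counts):
    """Estimate sample coverage from one sample's per-taxon read counts.

    Uses the singleton/doubleton based estimator
        C = 1 - (f1 / n) * ((n - 1) * f1 / ((n - 1) * f1 + 2 * f2))
    where n is the number of reads, f1 the number of taxa seen once
    and f2 the number of taxa seen twice.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("sample has no reads")
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f1 == 0:
        # No singletons: the estimator reports complete coverage.
        return 1.0
    return 1.0 - (f1 / n) * ((n - 1) * f1 / ((n - 1) * f1 + 2 * f2))


# Hypothetical toy sample: 7 taxa, 22 reads, 3 singletons, 2 doubletons.
toy_counts = [10, 5, 2, 2, 1, 1, 1]
print(round(sample_coverage(toy_counts), 3))
```

A value close to 1 means that reads from still-undetected taxa make up
only a tiny fraction of the community, which is the situation described
above.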
See here an example from one of the six fields (from top to bottom:
rarefaction based on species richness, linearized Simpson, linearized
Shannon; the dots mark the end of each observed curve, beyond which the
curve was extrapolated according to the aforementioned paper).
http://s21.postimg.org/fm0nhp4w7/image.png
I made species richness boxplots based on uncorrected species richness
and on corrected values (for which I used the "double
reference" approach (see the paper)), and they were substantially
different, with some of the high-read samples losing about 20% of their
observed species richness.
Now, on to the question(s):
- One of my wishes is to identify shared species and core species sets
across all six fields or subsets of them. I would like to use my
entire dataset without subsampling, since I have such high sample
coverage, but this obviously affects the interpretation of the
data. However, if I subsample, don't I have to do it in many
permutations? And wouldn't subsampling also severely affect the
interpretive power of my analysis?
- I have yet to find a nice subsampling routine in R for community data
that lets me do further calculations on the entire set of n
subsets, possibly stored in lists.
- As a bonus, if I want to use Chao-1 as an index of expected species
richness, do I compute it on subsampled datasets or on the samples as
they are? I
would rather do it on raw data (because this is what I have measured),
but I am worried about comparability between samples.
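
To make the three points above concrete, here is a minimal sketch of the
kind of routine meant: repeated subsampling to a common depth, with the
n subsets kept in a list so that core species sets and Chao-1 can be
computed on each. It is written in plain Python rather than R. All
function names, the toy counts, the depth and the number of permutations
are hypothetical, and Chao-1 is computed in its bias-corrected form,
S_obs + f1*(f1-1) / (2*(f2+1)), which is one of several variants.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample.

    counts: dict mapping taxon name to read count.
    """
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(rng.sample(pool, depth))


def repeated_rarefy(samples, depth, n_perm, seed=0):
    """Return a list of n_perm rarefied copies of the whole dataset."""
    rng = random.Random(seed)
    return [
        {name: rarefy(counts, depth, rng) for name, counts in samples.items()}
        for _ in range(n_perm)
    ]


def core_taxa(samples):
    """Taxa detected (count > 0) in every sample of one dataset."""
    present = [
        {taxon for taxon, c in counts.items() if c > 0}
        for counts in samples.values()
    ]
    return set.intersection(*present)


def core_frequency(subsets):
    """Fraction of rarefied datasets in which each taxon is core."""
    hits = Counter()
    for subset in subsets:
        hits.update(core_taxa(subset))
    return {taxon: k / len(subsets) for taxon, k in hits.items()}


def chao1(counts):
    """Bias-corrected Chao-1 richness estimate for one sample."""
    values = [c for c in counts.values() if c > 0]
    f1 = values.count(1)  # taxa observed exactly once
    f2 = values.count(2)  # taxa observed exactly twice
    return len(values) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy data: two samples with unequal read depth.
toy = {
    "field1_s1": {"otu1": 50, "otu2": 30, "otu3": 1},
    "field1_s2": {"otu1": 40, "otu2": 5},
}
subsets = repeated_rarefy(toy, depth=20, n_perm=100, seed=42)
print(core_taxa(toy))              # core set on the raw data
print(core_frequency(subsets))     # how often each taxon stays core
print([chao1(s["field1_s1"]) for s in subsets[:3]])  # per-subset Chao-1
```

Comparing the raw core set with the per-taxon core frequency shows
directly which taxa drop out of the core only because of subsampling,
which is the trade-off the questions above are about.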
I think I have now shifted the problem of subsampling to the realm of
the rare and very rare biospheres.
Sorry to bother you, and many thanks for reading.
--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate
International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069