[R] CoDA: Count Zeros in Biological Data

Rich Shepard rshepard at appl-ecosys.com
Fri Jun 20 03:23:58 CEST 2014


   I have several small biological count-based data sets in which one or
more rows contain a zero proportion. The remaining proportions in each such
row sum to 1.000 (or to 0.9999 in the sixth data row below, because of
rounding). An example is:

    sampdate filter gather  graze predate  shred
  2000-07-18 0.0550 0.5596 0.0734  0.2294 0.0826
  2003-07-08 0.0734 0.6147 0.0183  0.2294 0.0642
  2005-07-13 0.1161 0.5714 0.0357  0.1696 0.1071
  2006-06-28 0.1000 0.4667 0.1500  0.1333 0.1500
  2010-09-14 0.0778 0.6111 0.0444  0.1889 0.0778
  2011-07-13 0.0879 0.5714 0.0659  0.2747 0.0000
  2012-07-11 0.1042 0.5313 0.0625  0.2396 0.0625
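
   As a minimal sketch of the check described above, assuming the table is
typed in by hand as a data frame: the object names (comp, parts, row_sums,
off_closure, zero_cells) and the 1e-3 tolerance are illustrative choices,
not anything taken from a package.

```r
# Proportions copied from the example table; object names are illustrative.
comp <- data.frame(
  sampdate = as.Date(c("2000-07-18", "2003-07-08", "2005-07-13", "2006-06-28",
                       "2010-09-14", "2011-07-13", "2012-07-11")),
  filter  = c(0.0550, 0.0734, 0.1161, 0.1000, 0.0778, 0.0879, 0.1042),
  gather  = c(0.5596, 0.6147, 0.5714, 0.4667, 0.6111, 0.5714, 0.5313),
  graze   = c(0.0734, 0.0183, 0.0357, 0.1500, 0.0444, 0.0659, 0.0625),
  predate = c(0.2294, 0.2294, 0.1696, 0.1333, 0.1889, 0.2747, 0.2396),
  shred   = c(0.0826, 0.0642, 0.1071, 0.1500, 0.0778, 0.0000, 0.0625)
)

# Drop the date column and work with the five feeding-group proportions.
parts <- as.matrix(comp[, -1])

# Row totals; small departures from 1 come from rounding to four decimals.
row_sums <- rowSums(parts)

# Rows whose total is off by more than an assumed tolerance of 1e-3.
off_closure <- which(abs(row_sums - 1) > 1e-3)

# Row/column positions of exact zeros.
zero_cells <- which(parts == 0, arr.ind = TRUE)

print(round(row_sums, 4))
print(zero_cells)
```

   Here comp$sampdate[zero_cells[, "row"]] gives the sample dates that
contain a zero, and colnames(parts)[zero_cells[, "col"]] the affected groups.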

   My concern is that in most field-biological (ecological/environmental)
data there can be two explanations for zero counts: the organism was not
present on that date or it was present but not collected. There is no way to
determine which case holds true in each instance, but the ecological
interpretations differ.

   The zCompositions package offers several methods for imputing a value to
replace the zeros. As I'm completely new to compositional data analysis
(CoDA), I would appreciate advice on how to select the most appropriate
method for these data sets. The available methods are: geometric Bayesian
multiplicative (GBM, the default); square-root Bayesian multiplicative (SQ);
Bayes-Laplace Bayesian multiplicative (BL); count zero multiplicative (CZM);
and user-specified hyper-parameters (user).
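
   To make the "multiplicative" part concrete, here is a rough base-R sketch
of the plain multiplicative replacement applied to the row with the zero. It
is NOT one of the zCompositions methods listed above: those estimate the
replacement value from the counts, whereas the delta used here is an
arbitrary placeholder. The function name mult_replace is hypothetical.

```r
# Plain multiplicative zero replacement for one composition.
# ASSUMPTION: 'delta' is an arbitrary placeholder, not a recommended value.
mult_replace <- function(x, delta = 0.001) {
  x <- x / sum(x)                                 # re-close so the parts sum to 1
  zero <- x == 0                                  # which parts are zero
  x[zero] <- delta                                # put a small value in each zero
  x[!zero] <- x[!zero] * (1 - delta * sum(zero))  # shrink the rest proportionally
  x
}

# The 2011-07-13 row from the example table.
row6 <- c(filter = 0.0879, gather = 0.5714, graze = 0.0659,
          predate = 0.2747, shred = 0.0000)

filled <- mult_replace(row6)
print(round(filled, 4))
print(sum(filled))
```

   Because the non-zero parts are all scaled by the same factor, their
ratios to one another are unchanged; only the size of the replacement value
depends on the method chosen.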

   These biological data seem to me to be different from the geochemical or
economic data I see in the package data sets and in the CoDA references I've
acquired and read.

   Advice and suggestions (including references to applications of CoDA to
ecological/environmental data) will be appreciated.

Rich