[R] NA in cca (vegan)

Jari Oksanen jari.oksanen at oulu.fi
Tue Sep 8 16:17:07 CEST 2009


Gavin Simpson <gavin.simpson <at> ucl.ac.uk> writes:

> 
> On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote:
> > Dear all,
> > I would like to calculate a cca (package vegan) with species and
> > environmental data. One of these environmental variables is
> > cos(EXPOSURE).
> > The problem: for flat releves there is no exposure. The value is
> > missing and I can't call it 0 as 0 stands for east and west.
> > The cca does not run with missing values. What can I do to make vegan
> > cca ignoring these missing values?
> > Thanks a lot,
> > Kim
> 
> 
> This is timely as Jari Oksanen (lead developer on vegan) has been
> looking into making this happen automatically in vegan ordination
> functions. The solution for something like cca is very simple but it
> gets more complicated when you might like to allow features like
> na.exclude etc and have all the functions that operate on objects of
> class "cca" work nicely.
> 
> For the moment, you should just process your data before it goes into
> cca. Here I assume that you have two data frames; i) Y is the species
> data, and ii) X the environmental data. Further I assume that only one
> variable in X has missings, lets call this Exposure:
> 
Kim,

A test version of NA handling in cca is now in the development version of vegan
at http://vegan.r-forge.r-project.org/. You can get the current source code or
slightly stale packages from that address (at the time of writing, the packages
are two to three days behind the current devel version). Instructions for
downloading the working version of vegan can be found on the same web site.

Basically the development version does exactly the same thing as Gavin showed
you in his response. It does a "listwise" elimination of missing values. Indeed,
it may be better to do that manually and knowingly than to rely on possibly
surprising automatic handling of missing values within the function.
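
A minimal sketch of doing the listwise elimination by hand, assuming (as in
Gavin's message) a species data frame Y and an environmental data frame X in
which Exposure has missing values. The toy values and the Altitude variable are
made up for illustration only, and the cca call is skipped if vegan is not
installed:

```r
# Hypothetical toy data: 8 releves, 3 species, 2 environmental variables.
Y <- data.frame(sp1 = c(3, 0, 5, 2, 1, 4, 2, 6),
                sp2 = c(1, 2, 0, 4, 3, 1, 5, 0),
                sp3 = c(0, 4, 1, 1, 2, 3, 0, 2))
X <- data.frame(Exposure = c(0.5, NA, -1, 0.2, NA, 1, -0.3, 0.8),
                Altitude = c(3200, 3350, 3400, 3600, 3650, 3800, 3900, 4100))

# Listwise elimination: keep only releves with no NA in any variable of X,
# and drop the same rows from the species data so the two stay aligned.
keep <- complete.cases(X)
X.ok <- X[keep, , drop = FALSE]
Y.ok <- Y[keep, , drop = FALSE]

# Fit the constrained ordination on the reduced data (needs vegan).
if (requireNamespace("vegan", quietly = TRUE)) {
  mod <- vegan::cca(Y.ok ~ Exposure + Altitude, data = X.ok)
  print(mod)
}
```

Doing this explicitly makes it obvious how many releves were dropped
(sum(!keep)) before any results are interpreted.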

Your missing values are somewhat weird, as they are not really missing values
(= unknown and unobserved): you just decided to use a coding system that does
not cope with your well-known and measured values. I would prefer to find a
coding that puts flat ground together with the exposures that give similar
conditions. In no case should these values be regarded as NA, since they are
available and known, and censoring them from your data may distort your
analysis. Perhaps having a new variable (hasExposure, TRUE/FALSE) and coding
flat releves as east/west (=0) in Exposure could make more sense. Indeed, the
model term hasExposure*Exposure would make sense, as it would separate flat
ground from slopes of different Exposures. The interaction term and aliasing
would take care of having flat ground with known values but separate from
exposed slopes.
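
A rough sketch of that coding, using only base R. The Aspect column (degrees
from north, NA for flat releves) and its values are hypothetical; only the
names hasExposure and Exposure come from the suggestion above:

```r
# Hypothetical aspect in degrees from north; NA marks flat releves.
env <- data.frame(Aspect = c(0, 90, 180, 270, NA, NA))

# Flat ground is known, not missing: flag it and give it the east/west value 0.
env$hasExposure <- !is.na(env$Aspect)
env$Exposure <- ifelse(env$hasExposure, cos(env$Aspect * pi / 180), 0)

# Design matrix for the term hasExposure*Exposure.
mm <- model.matrix(~ hasExposure * Exposure, data = env)
print(colnames(mm))

# Exposure is 0 wherever hasExposure is FALSE, so the interaction column
# repeats the Exposure column and gets aliased: the matrix has one more
# column than its rank.
print(c(columns = ncol(mm), rank = qr(mm)$rank))
```

With this coding no releve is dropped: the hasExposure column separates flat
ground from slopes, while Exposure orders the slopes among themselves.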

Cheers, Jari Oksanen



