[R] Abundance data ordination in R

Mon Apr 2 11:48:37 CEST 2007

Milton Cezar Ribeiro <milton_ruser <at> yahoo.com.br> writes:

> 
> Dear R-gurus
> 
> I have a data.frame with abundance data for species and sites which looks like:
> mydf <- data.frame(
>  sp1 = sample(0:10, 5, replace = TRUE),
>  sp2 = sample(0:20, 5, replace = TRUE),
>  sp3 = sample(0:4, 5, replace = TRUE),
>  sp4 = sample(0:2, 5, replace = TRUE))
> rownames(mydf) <- paste("sites", 1:5, sep = "")
> 
> I would like to make an ordination analysis of these data, and my worry is
> about the "zeros" (absence of species) in the matrix. As far as I have read
> (Gotelli - A Primer of Ecological Statistics, 2004), when I have abundance
> data I can't compute Euclidean distances because the zeros mean absence of
> the species and not a zero count. Gotelli suggests one do a "principal
> coordinates analysis". I would like to hear from you what you think about
> this, and what the best packages and functions are to compute my distance
> matrices and do my ordination analysis. Can I consider zero as NA in my
> data.frame? Is there a good PDF book about multivariate analysis for
> abundance data available on the web?
> 
> 
Other people have already suggested what to do with these data and where to find
PDF texts, so I will only comment on some points raised in the original question.
Firstly, Euclidean distance is quite OK with zeros, or at least as good as any
other normal dissimilarity index is with zeros. Euclidean distance on
non-transformed data is poor for other reasons: it uses squared differences,
which emphasizes large abundances, and even when two sites have nothing in
common, the Euclidean distance between them varies with their total abundances.
Using Principal Co-ordinates Analysis does not change this, since it can also be
run with Euclidean distances. However, there are many packages providing
"better" dissimilarity indices, or transformations that make Euclidean distances
more useful (such as the Hellinger transformation).
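
As a rough illustration of the points above, here is a minimal sketch in base R.
The abundance table is made up for this example (it is not the poster's data),
and the Hellinger transformation is written out by hand as the square root of
the row-wise relative abundances. Sites A and B share no species, and neither do
sites C and D, yet their raw Euclidean distances differ tenfold simply because
the totals differ.

```r
# Hypothetical abundance table: rows are sites, columns are species.
# siteA/siteB share no species; siteC/siteD share none either,
# but siteC and siteD have ten times the total abundance.
abund <- rbind(
  siteA = c(sp1 = 1,  sp2 = 1,  sp3 = 0,  sp4 = 0),
  siteB = c(sp1 = 0,  sp2 = 0,  sp3 = 1,  sp4 = 1),
  siteC = c(sp1 = 10, sp2 = 10, sp3 = 0,  sp4 = 0),
  siteD = c(sp1 = 0,  sp2 = 0,  sp3 = 10, sp4 = 10),
  siteE = c(sp1 = 5,  sp2 = 0,  sp3 = 5,  sp4 = 0)
)

# Euclidean distance on raw counts: zeros cause no failure,
# but the distance depends on the site totals.
d_raw <- dist(abund, method = "euclidean")
print(as.matrix(d_raw)["siteA", "siteB"])  # 2
print(as.matrix(d_raw)["siteC", "siteD"])  # 20

# Hellinger transformation: square root of relative abundances per site.
# Dividing a matrix by a vector of length nrow() divides each row by its total.
hel <- sqrt(abund / rowSums(abund))

# Euclidean distance on the transformed data: both no-overlap pairs
# now get the same distance, sqrt(2).
d_hel <- dist(hel, method = "euclidean")
print(as.matrix(d_hel)["siteA", "siteB"])
print(as.matrix(d_hel)["siteC", "siteD"])

# Principal Co-ordinates Analysis (classical scaling) on the distances.
pcoa <- cmdscale(d_hel, k = 2)
print(round(pcoa, 3))
```

The same steps should also be available in contributed packages; in vegan, for
instance, decostand(x, method = "hellinger") and vegdist() cover the
transformation and alternative dissimilarity indices.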

Another question is more abstract: indeed, you may regard most zeros as missing
data. The species probably could occur at your sample site, more or less, but it
was too scarce to be observed. How to do this in practice is the tricky issue. You
cannot simply change zeros to NA, since then the dissimilarities (if they don't
fail) will really give a special significance to these cells. Regarding them as
zeros certainly makes more sense than removing *pairs* of data where a species is
NA in one site and present in another. There are ways of handling zeros as
something like missing values of various degrees(!), but my decency prohibits me
from writing about these methods.
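
To make the NA problem concrete, here is a small sketch with made-up counts (not
the poster's data). It relies on the documented behaviour of dist() in base R:
columns where either site is NA are dropped for that pair, and the sum is scaled
up in proportion to the number of columns actually used. A pair with no complete
column left gets NA.

```r
# Hypothetical counts for three sites and four species.
x <- rbind(
  site1 = c(sp1 = 3, sp2 = 0, sp3 = 4, sp4 = 0),
  site2 = c(sp1 = 0, sp2 = 0, sp3 = 0, sp4 = 5),
  site3 = c(sp1 = 2, sp2 = 1, sp3 = 1, sp4 = 1)
)

# Zeros kept as zeros: every species contributes to every distance.
d_zero <- dist(x)
print(round(as.matrix(d_zero), 3))

# Zeros recoded as NA: any species that is NA in either site of a pair
# is removed from that pair's distance.
x_na <- x
x_na[x_na == 0] <- NA
d_na <- dist(x_na)
print(round(as.matrix(d_na), 3))

# site1 vs site2: no species is observed in both, so the distance is NA.
# site1 vs site3: only sp1 and sp3 are used, i.e. sqrt((1 + 9) * 4 / 2).
```

In this sketch the site1-site3 distance moves from sqrt(12) to sqrt(20) purely
because of the recoding, and the site1-site2 distance cannot be computed at
all, so the NA cells end up driving the result.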

cheers, jari oksanen