[R-sig-eco] capscale() for PCoA-CDA

Fri Dec 4 07:20:41 CET 2009

On 3/12/09 23:54 PM, "gabriel singer" <gabriel.singer at univie.ac.at> wrote:

> Hi everybody,
> 
> Has anybody used capscale() in package vegan to compute a PCoA-CDA as
> suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or
> more factors as "predictors"?
> 
> Then I wonder about:
> 
> *) How to interpret interactions of factors? Why are interactions
> (specified as "~factor1*factor2" in the function call) shown as
> continuous predictors (using arrows) in the plot function? Wouldn't
> centroids for all cells in the design be more appropriate? Aren't
> factorial interactions in a CDA setting more or less meaningless?

Internally capscale() uses contrasts of variables, and they are treated as
continuous variables and shown as arrows in plots. However, if the
contrasts correspond to simple factors, they are not drawn but their
centroids are shown. For ordered factors you get both centroids and the
arrows. The interactions of contrasts cannot be shown as simple class means
and therefore they are drawn as arrows. The simple centroids are not
appropriate for interactions; instead you should have centroids of all
combinations of class levels of the interacting factors.

If you think that factorial interactions in *** (what is CDA?) are
meaningless, why do you want to use them?

I wouldn't say they are meaningless, because that depends on your meaning.
Often they are difficult to interpret, but that's another issue.

> 
> *) How to get classification statistics? And how to efficiently run a
> "leave 1 out" classification analysis? I thought of manually writing
> code that checks for the closest centroid. Would it be appropriate to
> use Euclidean distance as a criterion for this since it happens in PCo
> space? Probably there are more efficient functions which I do not know
> of, yet,... for example a function that allows extraction of distances
> of all objects to all centroids?
>
There is no such thing. Contributed code will be reviewed for inclusion into
vegan.

> *) Is the application of capscale on a Euclidean distance matrix
> equivalent to a classical DFA aka CDA on the original data - or am I
> completely wrong with this idea?
>
No, it isn't equal to "DFA aka CDA". Perhaps... It depends on what DFA and
CDA are. With Euclidean distances, capscale() is equivalent to redundancy
analysis (RDA). Guessing that "DFA aka CDA" means discriminant analysis, RDA
is not equal to it. The major difference is that RDA uses no information
about the scatter of points with respect to the class centroids; it only
uses the class centroids. RDA tries to maximize the distances among class
centroids, but it doesn't try to maximize the separation of points of
different classes. The methods are very different although the results may
have some similarities.
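
The difference can be seen in a toy case. The sketch below is only an
illustration, not vegan code: it assumes a single factor with two levels and
made-up 2-D data. In that special case the constrained axis runs along the
line joining the two class centroids, whereas Fisher's discriminant direction
also weights by the inverse of the pooled within-class scatter. All names
(class_a, class_b, rda_dir, lda_dir) are hypothetical.

```python
import math

# Made-up data: two classes, wide scatter along x, narrow along y.
class_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
class_b = [(1.0, 1.0), (5.0, 1.0), (1.0, 2.0), (5.0, 2.0)]


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_within_scatter(groups):
    """Sum over groups of squared deviations from each group's own centroid."""
    sxx = sxy = syy = 0.0
    for points in groups:
        cx, cy = centroid(points)
        for x, y in points:
            sxx += (x - cx) ** 2
            sxy += (x - cx) * (y - cy)
            syy += (y - cy) ** 2
    return sxx, sxy, syy


def unit(v):
    """Scale a 2-D vector to length one."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


ca, cb = centroid(class_a), centroid(class_b)
diff = (cb[0] - ca[0], cb[1] - ca[1])

# Centroid-only direction: uses nothing but the class centroids.
rda_dir = unit(diff)

# Fisher's direction: inverse within-class scatter times centroid difference.
sxx, sxy, syy = pooled_within_scatter([class_a, class_b])
det = sxx * syy - sxy * sxy
lda_dir = unit(((syy * diff[0] - sxy * diff[1]) / det,
                (-sxy * diff[0] + sxx * diff[1]) / det))

print("centroid-only direction:", rda_dir)
print("discriminant direction: ", lda_dir)
```

With these numbers the centroid-only direction points diagonally between the
centroids, while the discriminant direction turns almost entirely towards the
y axis, where the within-class scatter is small.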

This is connected to the previous question: because RDA (which is at the
heart of capscale()) does not try to optimize classification, there is no
classification statistic to be optimized. That should be estimated
independently of the analysis and after the analysis, and there are no
functions for the purpose in vegan.
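
A hand-written check of the kind described in the question could look roughly
like the sketch below. It is not part of vegan; it assumes you have already
extracted ordination coordinates for each object (the numbers here are made
up) and it takes Euclidean distance in that space as the criterion, as
proposed in the question. Note that only the centroids are recomputed without
the held-out object; the coordinates themselves came from an ordination of
all objects, so the result is likely to be optimistic. The function names are
hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dim))


def distances_to_centroids(point, centroids):
    """Euclidean distance from one object to every class centroid."""
    return {g: math.dist(point, c) for g, c in centroids.items()}


def loo_nearest_centroid(coords, labels):
    """Assign each object to the nearest centroid computed without it."""
    predicted = []
    for i, point in enumerate(coords):
        groups = {}
        for j, (p, g) in enumerate(zip(coords, labels)):
            if j != i:  # leave object i out of the centroids
                groups.setdefault(g, []).append(p)
        cents = {g: centroid(ps) for g, ps in groups.items()}
        dists = distances_to_centroids(point, cents)
        predicted.append(min(dists, key=dists.get))
    return predicted


# Made-up ordination coordinates and class labels, for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0),
          (4.0, 4.0)]
labels = ["A", "A", "A", "B", "B", "B", "A"]

predicted = loo_nearest_centroid(coords, labels)
n_correct = sum(p == g for p, g in zip(predicted, labels))
print(predicted)
print("correctly classified:", n_correct, "of", len(labels))
```

The same distances_to_centroids() helper gives the distances of any object to
all centroids, which covers the last part of the question.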

> *) Given only one factor as a "predictor", I guess using permutest() or
> anova() on an object resulting from capscale is completely equivalent to
> a direct application of adonis()? Correct?
>
Have you tried this? After trying, you could tell us if this is true. I
wouldn't expect this. The results may not be completely different, but
internally the methods are pretty different, and when I tried with the same
random number seed and hence same permutations, the results were not
identical.

Cheers, Jari