[R-sig-eco] capscale() for PCoA-CDA
gabriel singer
gabriel.singer at univie.ac.at
Fri Dec 4 14:02:22 CET 2009
Dear Jari and others,
>> Hi everybody,
>>
>> Has anybody used capscale() in package vegan to compute a PCoA-CDA as
>> suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or
>> more factors as "predictors"?
>>
>> Then I wonder about:
>>
>> *) How to interpret interactions of factors? Why are interactions
>> (specified as "~factor1*factor2" in the function call) shown as
>> continuous predictors (using arrows) in the plot function? Wouldn't
>> centroids for all cells in the design be more appropriate? Aren't
>> factorial interactions in a CDA setting more or less meaningless?
>>
>
> Internally capscale() uses contrasts of variables, and they are treated as
> continuous variables and shown as arrows in plots. However, if the
> contrasts correspond to simple factors, they are not drawn but their
> centroids are shown. For ordered factors you get both centroids and the
> arrows. The interactions of contrasts cannot be shown as simple class means
> and therefore they are drawn as arrows. The simple centroids are not
> appropriate, but you should have centroids of all combinations of class
> levels of interacting factors.
>
> If you think that factorial interactions in *** (what is CDA?) are
> meaningless, why do you want to use them?
>
> I wouldn't say they are meaningless, because that depends on your meaning.
> Often they are difficult to interpret, but that's another issue.
>
I understand the arrows for interactions now, thanks.
I used CDA in the sense of Anderson and Willis 2003 (and others) as
Canonical Discriminant Analysis; as such it is, at least to my
understanding, equivalent to Discriminant Function Analysis (DFA).
When CDA aka DFA is used with 2 interacting factors, it will try to best
separate groups, and that is *any groups*. I can't see why (and how)
preference should be given to any grouping criterion (factor 1, factor 2
or both)... In the end a 4-level factor should be as good as a 2*2
factorial combination. In this sense I used the word "meaningless".
In fact, capscale() results for a 1*4 constraint (1 factor, 4 levels)
are identical to those for a 2*2 constraint.
However, the centroids are at different positions (!); in fact, the
centroids of all combinations of class levels are at weird (wrong, I
think) positions in the 2*2 case!?
Still, "interactions" finally make sense when interpreting the plot,
that's quite true.
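
To make the 1*4 versus 2*2 point concrete, here is a minimal sketch in plain Python (not R, and not using vegan at all). It checks that the four cell indicators of a single 4-level factor can be rebuilt from the columns behind "~factor1*factor2", assuming treatment contrasts (intercept, two dummies and their product). The toy design and all function names are made up for illustration.

```python
from itertools import product

# Hypothetical balanced 2*2 design: every combination of two 2-level
# factors, replicated twice.
design = [(a, b) for a, b in product((0, 1), repeat=2)] * 2


def factorial_columns(a, b):
    """Columns for ~factor1*factor2 under treatment contrasts (an assumption):
    intercept, dummy for factor1, dummy for factor2, and their product."""
    return (1, a, b, a * b)


def cell_indicators(a, b):
    """Columns for one 4-level factor: one 0/1 indicator per design cell."""
    return tuple(int((a, b) == cell) for cell in product((0, 1), repeat=2))


# Each row holds the weights on (intercept, A, B, A:B) that rebuild the
# indicator of one cell, in the order (0,0), (0,1), (1,0), (1,1).
WEIGHTS = (
    (1, -1, -1, 1),   # cell (0, 0) = 1 - A - B + A*B
    (0, 0, 1, -1),    # cell (0, 1) = B - A*B
    (0, 1, 0, -1),    # cell (1, 0) = A - A*B
    (0, 0, 0, 1),     # cell (1, 1) = A*B
)


def rebuilt_indicators(a, b):
    """Cell indicators expressed as linear combinations of the 2*2 columns."""
    cols = factorial_columns(a, b)
    return tuple(sum(w * c for w, c in zip(row, cols)) for row in WEIGHTS)


# True if both codings describe the same set of fitted group means.
same_space = all(
    rebuilt_indicators(a, b) == cell_indicators(a, b) for a, b in design
)
print(same_space)
```

Since either coding can be written in terms of the other, both constraints fit the same four cell means, which would explain the identical ordinations. The different centroid positions then presumably reflect which group means get reported (per level of each factor rather than per cell), but that part is a guess and not checked here.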
>
>> *) How to get classification statistics? And how to efficiently run a
>> "leave 1 out" classification analysis? I thought of manually writing
>> code that checks for the closest centroid. Would it be appropriate to
>> use Euclidean distance as a criterion for this since it happens in PCo
>> space? Probably there are more efficient functions which I do not know
>> of, yet,... for example a function that allows extraction of distances
>> of all objects to all centroids?
>>
>>
> There is no such thing. Contributed code will be reviewed for inclusion into
> vegan.
>
>
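
For what it's worth, the "closest centroid" idea can be sketched as follows. This is only a rough illustration in plain Python (not R, and nothing from vegan): it assumes the ordination scores have already been extracted as plain coordinates, it uses Euclidean distance in that space, and the data and function names are invented. A stricter leave-one-out would also redo the ordination without the held-out observation; this sketch only recomputes the centroids.

```python
import math
from collections import defaultdict


def centroids(points, labels, skip=None):
    """Group means of the coordinates, optionally leaving out one observation."""
    sums, counts = {}, defaultdict(int)
    for i, (p, g) in enumerate(zip(points, labels)):
        if i == skip:
            continue
        acc = sums.setdefault(g, [0.0] * len(p))
        for k, x in enumerate(p):
            acc[k] += x
        counts[g] += 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def distances_to_centroids(point, cents):
    """Euclidean distance from one point to every group centroid."""
    return {g: math.dist(point, c) for g, c in cents.items()}


def leave_one_out(points, labels):
    """Assign each observation to the nearest centroid computed without it."""
    predicted = []
    for i, p in enumerate(points):
        d = distances_to_centroids(p, centroids(points, labels, skip=i))
        predicted.append(min(d, key=d.get))
    return predicted


# Made-up scores on two ordination axes, for illustration only.
scores = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0),
          (1.0, 1.1), (1.2, 0.9), (0.3, 0.2)]
groups = ["a", "a", "a", "b", "b", "b"]

predicted = leave_one_out(scores, groups)
correct = sum(p == g for p, g in zip(predicted, groups))
print(predicted, f"{correct}/{len(groups)} correctly classified")
```

Calling distances_to_centroids() for every row against centroids(points, labels) gives the full table of distances of all objects to all centroids.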
>> *) Is the application of capscale on a Euclidean distance matrix
>> equivalent to a classical DFA aka CDA on the original data - or am I
>> completely wrong with this idea?
>>
>>
> No, it isn't equal to "DFA aka CDA". Perhaps... Depends on what are DFA and
> CDA. With Euclidean distances, capscale() is equivalent to redundancy
> analysis (RDA). Guessing that "DFA aka CDA" are discriminant analysis, RDA
> is not equal to them. The major difference is that RDA uses no information
> about scatter of points with respect to the class centroids, but it only
> uses class centroids. The RDA tries to maximize the distances among class
> centroids, but it doesn't try to maximize the separation of points of
> different classes. The methods are very different although the results may
> have some similarities.
>
> This is connected to the previous question: because RDA (that is in the
> heart of capscale()) does not try to optimize in classification, there is no
> classification statistic to be optimized. That should be estimated
> independently of the analysis and after the analysis, and there are no
> functions for the purpose in vegan.
>
>
Slightly confused now... Anderson and Willis (2003) describe PCoA on a
dissimilarity structure, followed by CDA or CCorA, and call the procedure
CAP (Canonical Analysis of Principal Coordinates). I will call the latter
two approaches PCoA-CDA and PCoA-CCorA. Now, I get that CCorA differs from
RDA mainly conceptually, so there is not much (any?) difference between
PCoA-CCorA and PCoA-RDA = capscale().
Now, is PCoA-CDA really equivalent to db-RDA (in the sense of Legendre and
Anderson 1999)? I initially thought this would be the case. They both
use a set of dummy variables to code for the factor and treat these as
continuous predictors. A second thought tells me they can't be the same.
Then maybe what's left is only the term capscale(), which is not the same
as CAP in the case of PCoA-CDA...
Seems I am getting lost in the panoply of acronyms, sorry...
>> *) Given only one factor as a "predictor", I guess using permutest() or
>> anova() on an object resulting from capscale is completely equivalent to
>> a direct application of adonis()? Correct?
>>
>>
> Have you tried this? After trying, you could tell us if this is true. I
> wouldn't expect this. The results may not be completely different, but
> internally the methods are pretty different, and when I tried with the same
> random number seed and hence same permutations, the results were not identical.
>
Well, the question was sort of aimed at what's happening in the
background; obviously that's not the same (though I don't get how exactly
the two permutation tests differ; I thought that, at least in the simple
1-factor case, it's basically permuting raw data and building a pseudo-F
distribution). In my trials I got very similar results (also the same
pseudo-F, so I thought the test statistic has to be the same) and
interpreted any differences in the P-values as due to differences in the
permutations.
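
To spell out what I had in mind for the 1-factor case, here is a minimal sketch in plain Python (not R). It computes a pseudo-F from a distance matrix by partitioning sums of squared distances, then builds the permutation distribution by shuffling the group labels. It does not claim to reproduce what anova() on a capscale object or adonis() do internally; the data, function names and defaults (999 permutations, seed 1) are made up.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F for one factor from a square matrix of distances."""
    n = len(labels)
    levels = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same within each group, divided by n_g.
    ss_within = 0.0
    for g in levels:
        members = [i for i, lab in enumerate(labels) if lab == g]
        pairs = combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(levels) - 1)) / (ss_within / (n - len(levels)))


def permutation_p(dist, labels, n_perm=999, seed=1):
    """Observed pseudo-F and its permutation P-value (labels are shuffled)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up one-dimensional data with Euclidean distances, two groups of three.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
groups = ["a", "a", "a", "b", "b", "b"]
dist = [[abs(x - y) for y in values] for x in values]

f_obs, p_value = permutation_p(dist, groups)
print(f_obs, p_value)
```

With Euclidean distances on a single variable this pseudo-F equals the ordinary ANOVA F, which makes for an easy sanity check. Two runs can share the same statistic and still give slightly different P-values if they draw different permutations.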
Jari, thanks for the discussion!
Cheers, Gabriel