[R] cca and cca.predict in vegan-what sort of prediction is possible

Mon Aug 6 13:23:27 CEST 2007

> I am not clear quite how one could use cca from package vegan and the associated 
> predict.cca to predict species abundance from environmental data (or if this is possible 
> in a generalised way). In other words, can one derive a cca object based on known 
> community data and use that to predict e.g. species abundances in a different number 
> of samples based on environmental data? The help notes show that prediction is 
> possible, but it seems that the number of samples is constrained to that in the
> original, "training" set. 
> 
> If this is possible, a reference or example would be much, much appreciated.

This is not possible with the current predict.cca. It seems that you
want to use CCA to approximate your original data (type = "response" in
predict.cca), and that type ignores the 'newdata' argument. However, this
kind of prediction is doable, and looking at the code shows you how to do
it. For the data approximation you only need linear combination scores
(u in the code), species scores (v), eigenvalues, and row and column
totals. You can use predict.cca with type = "lc" to get the linear
combination scores (u) for new sites by giving their environmental data
as 'newdata', and then use these scores in place of u in the predict.cca
code. You also need to supply totals (sums) for the new rows. This is all
pretty technical and tedious. In principle, the function could be changed
to accept optional arguments for linear combination, weighted averages
and species scores, but then it would also need matching arguments for
row and column sums, which would make the usage tedious (the change would
be easy, the usage difficult). I think it is better to look at the code
and follow its example if you really need more complicated analyses.
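
A minimal sketch of the steps above, assuming the varespec and varechem
example data shipped with vegan and an arbitrary split into "training"
and "new" sites. The model formula, the split and the object names
(train, new, tot_new, approx_new) are illustrative assumptions, and the
row totals for the new sites are simply taken from the observed data;
in a real application you would have to supply them yourself. The
reconstruction follows the CA algebra (row total x species proportion,
modified by the constrained axes) and is not a copy of the predict.cca
code.

```r
library(vegan)
data(varespec)   # community data, 24 sites
data(varechem)   # matching soil chemistry

train <- 1:18    # arbitrary split, for illustration only
new   <- 19:24

mod <- cca(varespec[train, ] ~ Al + P + K, data = varechem[train, ])

# Linear combination scores (u) for the new sites from their environment
u_new <- predict(mod, type = "lc", newdata = varechem[new, ])

# Pieces stored in the fitted object
v    <- mod$CCA$v                                  # species scores
slam <- diag(sqrt(mod$CCA$eig), nrow = length(mod$CCA$eig))
cs   <- mod$colsum                                 # species totals as proportions

# Row totals for the new sites must be supplied by the user;
# here the observed totals are used (an assumption of this sketch)
tot_new <- rowSums(varespec[new, ])

# Approximated abundances for the new sites
approx_new <- (u_new %*% slam %*% t(v) + 1) * outer(tot_new, cs)
```

The rows of approx_new sum to the supplied totals, so the constrained
axes only redistribute each total among the species.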

Another issue is that CCA is not good at predicting species composition:
it is only weighted linear regression. You will see, for instance, that
the method happily gives you negative abundances, which some ecologists
find very disturbing. If you really want to predict species composition
from environmental data, I suggest nonlinear regression (mgcv::gam with
an appropriate family, for instance) or some fancier methods.
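
As a rough illustration of the difference, the sketch below fits a
straight line and a Poisson GAM to simulated counts of a single species
along one gradient. The data are made up, the names (env, abund,
newsites) are hypothetical, and the Poisson family is only one possible
choice; this is an example of the idea, not a recommended workflow.

```r
library(mgcv)

set.seed(1)
n     <- 200
env   <- runif(n)                              # hypothetical gradient
abund <- rpois(n, lambda = exp(3 - 6 * env))   # simulated counts
dat   <- data.frame(env = env, abund = abund)

# Linear regression versus a GAM with a log link
fit_lm  <- lm(abund ~ env, data = dat)
fit_gam <- gam(abund ~ s(env), family = poisson, data = dat)

# Predict for new sites from environmental data only
newsites <- data.frame(env = seq(0, 1, length.out = 11))
pred_lm  <- predict(fit_lm, newdata = newsites)
pred_gam <- predict(fit_gam, newdata = newsites, type = "response")

range(pred_lm)    # the straight line dips below zero at high env
range(pred_gam)   # predictions on the response scale cannot be negative
```

Each species needs its own model in this approach, which is one reason
it is more work than a single ordination.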

Please note that specific questions of this kind should not be sent to
the R News, but to more specialized mailing lists or directly to the
package author (although the author was not reading email in July).

Best wishes, Jari Oksanen