[R-sig-eco] cross validation in CoCA and CCA

Jari Oksanen jari.oksanen at oulu.fi
Sat Mar 29 08:32:23 CET 2014


Jesse,

I do not know what you mean by CV in this context, but basic cross validation can be done with the vegan functions cca(), rda() and capscale(). These functions have a predict() method that accepts 'newdata', and using new data allows cross validation. They also have a calibrate() function that can directly estimate the values of the predictor variables from community data, and this also accepts 'newdata'. So you can build ("train") a model and then apply ("test") it to independent 'newdata'. However, we do not have a generic crossvalidate(object, data, k, …) function for a canned cross validation process, so you have to do this by hand. Neither do we have any functions for multistep CV or structured CV where some of the external variables are known and others are predicted or calibrated. However, the basic facilities for hand crafting such models are provided. Simple things like k-fold cross validation are really trivial to build with ordination. But if you build the uncertainty of model building into the process, as you should, you must be very careful in collecting the results, because the variables in the model can change from one cross validation fold to the next.
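
As a minimal sketch of the train/test idea described above, the following holds out some sites, fits rda() on the rest, and then uses predict() and calibrate() with 'newdata' on the held-out sites. The split size (14 sites), the seed, and the object names (test, train, Y, mod, wa, env_hat) are arbitrary choices made for illustration, not anything prescribed by vegan.

```r
library(vegan)
data(mite, mite.env)

set.seed(42)                          # arbitrary seed, only for repeatability
test  <- sample(nrow(mite), 14)       # hold out 14 sites (assumed split size)
train <- setdiff(seq_len(nrow(mite)), test)

## Hellinger transformation works row by row, so it can be applied
## before splitting without leaking information between the sets
Y <- decostand(mite, "hell")

## build ("train") the model on the training sites only
mod <- rda(Y[train, ] ~ SubsDens + WatrCont, data = mite.env[train, ])

## use ("test") the model on the held-out community data
wa      <- predict(mod, newdata = Y[test, ], type = "wa")  # WA site scores
env_hat <- calibrate(mod, newdata = Y[test, ])             # estimated predictors

## compare calibrated and observed values for the held-out sites
cbind(observed = mite.env[test, "WatrCont"],
      calibrated = env_hat[, "WatrCont"])
```

The final cbind() simply puts observed and calibrated water content side by side; how good the agreement is depends on the data and the split.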

Here is one 5-fold CV cycle with rda:

library(vegan)
data(mite, mite.env)
## 5-fold CV
k <- rep(1:5, len=nrow(mite))
## x is matrix to collect predictions for two vars
x <- matrix(NA, nrow=nrow(mite), ncol=2)
colnames(x) <- c("SubsDens", "WatrCont")
## shuffle for each CV
k <- sample(k)
## fit the model without fold i, then calibrate the held-out fold i
for (i in 1:5) {
    mod <- rda(decostand(mite, "hell") ~ SubsDens + WatrCont, mite.env,
               subset = k != i)
    x[k == i, ] <- calibrate(mod, newdata = decostand(mite[k == i, ], "hell"))
}

Easy, but not a very good prediction (cca would be marginally better, as it usually is).
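
To put a number on how good or bad the prediction is, the cross-validated values can be compared with the observed ones. The sketch below repeats the 5-fold cycle in a self-contained form and then computes a root mean squared error of prediction and a cross-validated R2 for each variable. These two summaries, the seed, and the names pred, obs, rmsep and r2cv are assumptions chosen for illustration; vegan does not provide them as a ready-made function, and the R2 here is centred on the overall mean, which is only one of several possible conventions.

```r
library(vegan)
data(mite, mite.env)

set.seed(1)                                # arbitrary seed for the shuffle
vars <- c("SubsDens", "WatrCont")
Y <- decostand(mite, "hell")               # row-wise transformation
k <- sample(rep(1:5, len = nrow(mite)))    # shuffled fold labels

## matrix to collect the cross-validated predictions
pred <- matrix(NA, nrow(mite), 2, dimnames = list(NULL, vars))
for (i in 1:5) {
    ## train without fold i, calibrate fold i from its community data
    mod <- rda(Y[k != i, ] ~ SubsDens + WatrCont, data = mite.env[k != i, ])
    pred[k == i, ] <- calibrate(mod, newdata = Y[k == i, ])
}

obs <- as.matrix(mite.env[, vars])
## root mean squared error of prediction, per variable
rmsep <- sqrt(colMeans((obs - pred)^2))
## cross-validated R2: 1 - (prediction SS / total SS); can be negative
r2cv <- 1 - colSums((obs - pred)^2) / colSums(scale(obs, scale = FALSE)^2)
rmsep
r2cv
```

Repeating the whole cycle with different shuffles of k gives an idea of how much these summaries vary between random splits.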

Cheers, Jari Oksanen

On 29/03/2014, at 04:44 AM, Gavin Simpson wrote:

> In short, no. I haven't ported the rough code for LOO CV of CCA or
> CCA-PLS models. I think I ported the mean centring and crossval
> functions from the Matlab sources, but not the code in the
> `example_crossvalCCA.m` file from the supplementary materials on the
> CoCA paper in Ecology.
> 
> I could take a look and see how easy it will be to add this, but it
> doesn't sit well with cocorresp or vegan as the former was designed
> really for CoCA and the latter doesn't have the other functionality
> needed (which exists in cocorresp) and we've not really implemented CV
> for ordination methods.
> 
> That said, this is R and it is relatively trivial to write your own
> LOO or k-fold CV loop, and you can predict from a CCA model using the
> `predict()` method for cca objects available in vegan.
> 
> Part of the reason, at least as far as I see things, for not having CV
> in the common ordination software (closed or open source) is that
> these methods tend not to be seen as purely predictive models, which
> is what CV is designed to evaluate.
> 
> Don't hold your breath for me getting this in cocorresp, but if you
> want to follow up I might be persuaded to take a look and see if what
> is already in cocorresp will enable you to follow the code in the
> `example_crossvalCCA.m` file to write your own LOO code.
> 
> HTH
> 
> G
> 
> On 28 March 2014 14:57, Jesse Becker <jcbecker42 at gmail.com> wrote:
>> Hello list,
>> I am doing a concordance study between riverine environmental conditions,
>> invertebrate, and fish assemblages.  I am doing a predictive CoCA as part
>> of the analysis with the cocorresp package.  My question is whether there
>> is an implementation of the cross-validation procedure in the cocorresp
>> package that would work on the results of a CCA or RDA, without having to
>> use MATLAB (which I don't have access to)?  My understanding is that by
>> doing the cross validation on the CCA (and hopefully RDA, although I've
>> never seen it done) it allows for a more consistent evaluation of
>> differences between the two methods.  I haven't seen this as a function in
>> vegan.
>> 
>> Jari?  Gavin?
>> 
>> Thanks,
>> Jesse
>> 
>> 
>> Jesse C. Becker, Ph.D.
>> 765.285.8889 office
>> 512.587.4428 cell
>> jcbecker at bsu.edu
>> jcbecker42 at gmail.com
>> 
>> 
>>        [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> 
> 
> -- 
> Gavin Simpson, PhD
> 


