[R-sig-eco] Number of Groups in SpeciesMix

Tue Jan 12 01:09:07 CET 2016

Dear Alexandre,

I'm glad that you are using species archetype models (SAMs).  I hope that SAMs can answer your questions succinctly.

I think that a lot will be clarified if you look at the example in the help file for clusterSelect().  There you will see that:

1) obs is just a dummy: you can leave it out (~1+x) or insert whatever you like (anything~1+x).  This is unfortunate nomenclature, but bearable I think.
2) dat1$pa contains all the species observations.  Note that all the columns (species) in this data.frame are used in the analysis.  So, make sure that you remove any unwanted species before passing it as an argument.
3) dat is the data.frame containing all the environmental covariates.  Note that the number of rows in dat should match the number of rows in dat1$pa (you should get an error if not).  The model fitting function will extract the right terms, and functions thereof (just like lm or glm will do).
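
Points 1 and 3 can be illustrated with base R alone.  The following is a minimal sketch of how a formula's right-hand side picks columns out of a covariate data.frame once the left-hand side is dropped.  The data.frame env and its columns x and z are made up for illustration, and the code shows standard formula handling in R, not necessarily the exact internals of SpeciesMix.

```r
# Hypothetical environmental covariates for 5 sites (names are illustrative only)
env <- data.frame(x = c(0.2, 1.1, 1.9, 0.7, 2.4),
                  z = c(10, 12, 9, 11, 8))

# Two formulas that differ only in the (dummy) left-hand side
f1 <- obs ~ 1 + x
f2 <- anything ~ 1 + x

# Drop the response, then build the design matrix from the covariates alone
X1 <- model.matrix(delete.response(terms(f1)), data = env)
X2 <- model.matrix(delete.response(terms(f2)), data = env)

same_design <- identical(X1, X2)  # does the left-hand side make any difference?
used_terms <- colnames(X1)        # which terms were actually extracted from env?
print(same_design)
print(used_terms)
```

Neither obs nor anything needs to exist as an object, because the response is removed before the data are evaluated.  The column z stays in env but is not used, since it does not appear in the formula.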

So, in terms of your specific questions:

1) obs doesn't stand for the species data.  It doesn't even stand for anything.  Ignore it or just type anything at all.
2) You should put your species data in the sp.data argument (all species to be included in the analysis and no more).
3) You should put your environmental data in the covar.data argument, and the right bits of it will be extracted according to the right hand side of your formula.
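
Put together, a call with your own data might look like the sketch below.  Everything about the data is hypothetical: the objects sp and env, the species names, and the two "unwanted" species are simulated placeholders.  The species data are held in a plain 0/1 matrix here, which is an assumption about what the function accepts.  The argument names sp.data and covar.data are the ones mentioned above, and G and em.refit are copied from the help-file call.  The fit only runs if SpeciesMix is installed.

```r
# Hypothetical data: 40 sites, 12 species, one covariate x (all simulated)
set.seed(42)
n_sites <- 40
n_species <- 12
env <- data.frame(x = runif(n_sites, 0, 2.5))
sp <- matrix(rbinom(n_sites * n_species, 1, 0.4), nrow = n_sites,
             dimnames = list(NULL, paste0("sp", seq_len(n_species))))

# Remove unwanted species BEFORE fitting: every remaining column is analysed
unwanted <- c("sp3", "sp7")  # illustrative choice only
sp <- sp[, !(colnames(sp) %in% unwanted), drop = FALSE]

# Rows are sites in both objects, so the row counts must agree
stopifnot(nrow(sp) == nrow(env))

# The left-hand side (obs) is a dummy; only the right-hand side is used
if (requireNamespace("SpeciesMix", quietly = TRUE)) {
  clusters <- SpeciesMix::clusterSelect(obs ~ 1 + x, sp.data = sp,
                                        covar.data = env, G = 2:5, em.refit = 2)
}
```

If you have several covariates, the formula would name them on the right-hand side (for example obs ~ 1 + x + z), provided each is a column of the data passed as covar.data.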

I would encourage you to look at the example in ?clusterSelect.  See how the (simulated) data set is arranged into species data and environmental data, and how they are passed to clusterSelect().

I'm happy to help as much as I can, either on list or off (but preferably not both).  I'm also happy to take suggestions about how the package/method can be improved.

Regards,

Scott (contributor to, but not author of, SpeciesMix)

On 12/01/16 06:29, Alexandre F. Souza wrote:
> Dear friends,
>
> I am willing to apply the SAM analytical framework to a dataset of plant
> species in coastal Brazil using the SpeciesMix package. The SpeciesMix
> package fits Species Archetype Models, a special type of finite mixture of
> regression model motivated by the analysis of multi-species data.
>
> In applying function clusterSelect, which helps in defining the best number
> of species groups G, I would like to confirm if my understanding is
> correct: the formula reported there as "obs ~ 1 + x" in
>
> clusters <- clusterSelect(obs~1+x,dat1$pa,dat,G=2:5,em.refit=2)
>
> is a generic formulation not directly related to the species (dat1$pa) or
> environmental (dat) data, isn't it? So in principle I should use this same
> formulation as well, understanding that obs stands for the whole species
> data matrix, 1 for the presence of a constant, and x for the whole
> explanatory dataset?
>
> I tried to apply obs~1+x but it returns an error message, however.
>
> I am kind of blocked here so any thoughts could help...
>
> Sincerely,
>
> Alexandre
>

-- 
Scott Foster
CSIRO
E scott.foster at csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au