[R-sig-eco] Using multiple species data for gam

Scott Foster scott.foster at csiro.au
Wed Feb 11 07:02:35 CET 2015


Hi,

People, including me, do get excited by analysing lots of species together.  It is an interesting statistical problem that can lead to some great 
ecological insights.  However, the wood needs to be seen before the trees obscure it, completely (and possibly forever).

Rajendra: What is the ecological question that you are trying to answer?  This will help focus your search for statistical methods. Are you trying to 
determine differences in a designed experiment? Is there a pre-defined hypothesis that needs to be tested?  Are you trying to delineate sampling sites 
based on their composition?  Are you trying to predict distributions of species, assemblage, or habitat?

In any of these cases, I would encourage the use of model-based approaches.  The reasons are outlined in this paper:
http://link.springer.com/article/10.1007%2Fs11258-014-0366-3

Below are some links to papers (leading, in many cases, to R packages).  Please be aware (before looking at them), that I am an author on some of them 
and that this message may be construed as blatant self-promotion.  I'm sure that I have missed some references and I hope that others will fill them in.

I do have to wonder if you need 1000 species for many of the questions that you will be asking of these data.  Are there many singletons?  Do you 
have many sites?  What is the spatial scale (how big are the environmental gradients and species-turnover)?
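
One quick way to answer the singleton question is to tabulate how many sites each species occurs at. Here is a minimal sketch in base R, assuming a site-by-species 0/1 matrix. The toy matrix `Y` and the names `prevalence`, `singletons` and `uninformative` are made up for illustration and are not taken from the real data.

```r
# Toy presence/absence matrix: 6 sites (rows) by 4 species (columns).
# matrix() fills column by column, so each row of numbers below is one species.
# In practice Y would hold the species columns of the real data frame.
Y <- matrix(c(1, 0, 0, 0, 0, 0,   # Sp1: present at one site (a singleton)
              1, 1, 0, 1, 0, 1,   # Sp2
              0, 0, 0, 0, 0, 0,   # Sp3: never recorded
              1, 1, 1, 1, 1, 1),  # Sp4: present everywhere
            nrow = 6,
            dimnames = list(paste0("site", 1:6), paste0("Sp", 1:4)))

prevalence <- colSums(Y)                          # number of sites occupied per species
singletons <- names(prevalence)[prevalence == 1]  # species seen at exactly one site

# Species that are all 0 or all 1 carry no information for a presence/absence model.
uninformative <- names(prevalence)[prevalence == 0 | prevalence == nrow(Y)]

table(prevalence)   # distribution of occupancy across species
```

If most of the 1000 species turn out to be singletons or near-singletons, that is worth knowing before choosing a method.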

Computation is likely to hurt with 1000 species.  However, even if it does take a while, it will still be nothing compared with the time and effort 
expended to gather the data.
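
A rough feel for the computing cost can be had by timing fits on a small pilot subset of species and scaling up. The sketch below uses base R only and simulated data. The predictor names `Alt` and `Temp` are borrowed from the example table in the original question, and everything else (sample sizes, probabilities, object names) is an assumption for illustration. It times independent binomial GLMs, one per species; the joint methods in the papers listed below will probably cost more than this, so treat the number as a floor rather than a forecast.

```r
set.seed(1)
n_sites   <- 200
n_species <- 20   # pilot subset; the real data have > 1000 species

# Simulated predictors and presence/absence responses (purely illustrative).
env <- data.frame(Alt = runif(n_sites, 0, 2000), Temp = rnorm(n_sites, 25, 3))
Y <- matrix(rbinom(n_sites * n_species, size = 1, prob = 0.3),
            nrow = n_sites,
            dimnames = list(NULL, paste0("Sp", seq_len(n_species))))

# Fit one binomial GLM per species and time the whole loop.
elapsed <- system.time(
  fits <- lapply(colnames(Y), function(sp) {
    glm(Y[, sp] ~ Alt + Temp, family = binomial, data = env)
  })
)["elapsed"]

# Crude linear extrapolation from the pilot subset to 1000 species.
est_seconds_1000 <- unname(elapsed) / n_species * 1000
est_seconds_1000
```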

I hope that this helps,

Scott

http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2012.00190.x/full #mvabund -- hypothesis testing for many species.
http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12236/abstract #model-based ordination (unconstrained)
http://www.esajournals.org/doi/abs/10.1890/12-1322.1 #predicting using multiple species gives better predictions
http://www.sciencedirect.com/science/article/pii/S0304380010006393 #Grouping species based on their responses to the environment (we call these models SAMs)
http://link.springer.com/article/10.1007%2Fs13253-013-0146-x #More SAMs (more papers dealing with model selection are available -- Francis Hui is the first author)
http://www.esajournals.org/doi/abs/10.1890/10-1251.1 #precursor to SAMs, but doesn't explicitly do the grouping (but does in the example).
http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12180/abstract #JSDMs, models the full species covariance matrix -- sensible but will be challenged with 1000 species.
http://onlinelibrary.wiley.com/doi/10.1002/env.2245/abstract #grouping and modelling responses to environments of assemblages
http://onlinelibrary.wiley.com/doi/10.1111/ele.12380/abstract #similar to previous, but with different tweaks.  Done independently of previous.
http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2009.01263.x/full #added just in case species labels are not important but the structure of the assemblage is.


On 11/02/15 05:27, Mailing lissts wrote:
> Hi everyone !
>
> In my humble and biased opinion, there are two approaches that may be interesting to consider to deal with so many species: the ordination approach and the model-based approach.
>
> As Gavin proposed, using a CCA might not be a bad idea, except for the mean-variance problem highlighted by David Warton a few years ago in a paper in Methods in Ecology and Evolution (which I can't find quickly at the moment). However, I worked on developing consensus RDA, which might help in dealing with this problem. If you want to take a look at the paper, it is here: http://dx.doi.org/10.1890/13-0648.1
>
> Consensus RDA is currently implemented in the ordiconsensus package, available on R-forge (https://r-forge.r-project.org/R/?group_id=68).
>
> Another approach that may be worth investigating for problems similar to the one discussed here was proposed by Ovaskainen and Soininen in Ecology in 2011 (http://dx.doi.org/10.1890/10-1251.1). I am currently working on implementing their work in a package called HMSC, which is also available on R-forge (https://r-forge.r-project.org/R/?group_id=1682). Note that the HMSC package is not as mature and may be a little buggy.
>
> In any case, these two approaches are new ideas that might be interesting to consider in addition to the ones discussed in the current thread.
>
> Have a good day !
>
> Guillaume
>
>> On 2015-02-10 at 12:11, Gavin Simpson <ucfagls at gmail.com> wrote:
>>
>> mvabund has a manyany() function which allows you to run the same sort of
>> analysis as manyglm() does without having to use a GLM. Hence you could do
>> a many GAM using manyany() and the mgcv::gam() function(ality). There is an
>> example of this on the ?manyany help page.
>>
>> Still, doing this for 1000 species is going to be tough going, even if
>> you just used manyglm(), but it may be doable if you are prepared to wait
>> for the models to fit and you have sufficient data in each species to fit a
>> complex model like a GAM.
>>
>> G
>>
>> On 10 February 2015 at 10:28, Tim Meehan <tmeeha at gmail.com> wrote:
>>
>>> If you want to do this in a glm framework, you might look into the mvabund
>>> package:
>>>
>>> http://cran.r-project.org/web/packages/mvabund/mvabund.pdf
>>>
>>> I've never used it with anything approaching 1000 species, though.
>>>
>>> On Tue, Feb 10, 2015 at 2:41 AM, Rajendra Mohan panda <rmp.iit.kgp at gmail.com> wrote:
>>>> Dear All
>>>>
>>>> I have >1000 species with presence and absence (0 or 1) values and with
>>>> seven corresponding predictor variables. Can I run gam/glm on the data
>>>> using all species simultaneously against the predictors? Data are arranged in
>>>> columns against their GPS locations (see below). I know it is possible to
>>>> do this separately for each species.
>>>>
>>>> Your kind response is highly appreciated.
>>>>
>>>> Sites  Sp1  Sp2  Sp3  Alt  Temp  Pptn  Ft
>>>> 1A     0    1    1    20   30    1000  Evergreen
>>>>
>>>> With Best Regards
>>>> Rajendra M Panda
>>>> School of Water Resources
>>>> Indian Institute of Technology Kharagpur, India
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-ecology mailing list
>>>> R-sig-ecology at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>
>>>
>>
>>
>> -- 
>> Gavin Simpson, PhD
>>
>

-- 
Scott Foster
CSIRO
E scott.foster at csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au


