[R-sig-eco] Mixed models and multivariate methods for temporal-spatial nested data

Tania Bird taniabird at gmail.com
Mon Feb 27 17:45:13 CET 2017


Many thanks for your useful advice Bob!
Unfortunately I did try to use my University's statistical consulting
department, but they were not able to provide advice at this level for
either the multivariate or mixed effect models. :(
I would be happy to consult with someone else if anyone is offering
such a service?

Tania Bird


On 27 February 2017 at 18:14, Bob OHara <bohara at senckenberg.de> wrote:
> Hm, this is a big job. The optimal solution is to see if your university
> offers a statistical consulting service. I don't see any big conceptual
> problems, but getting a good analysis will take a bit of time and
> exploration. I think you can probably 'just' use a GLMM, but getting the
> right GLMM and deciding what a good model is will take time and some poking
> of the data.
>
> Anyway, some answers below, which may (or may not) help.
>
>
> On 02/27/2017 04:27 PM, Tania Bird wrote:
>>
>> Hi all,
>>
>> I am seeking advice on how to analyse my unbalanced, multi-nested
>> multivariate data set. I realise there are many questions in this
>> email and I would be willing to consult with someone privately on this
>> if it is an option.
>>
>> I am using abundance data for insect species (I have the same
>> experimental design for reptiles, and annual plants as well). I use
>> Simpson's diversity as a univariate response and species composition
>> as a multivariate response.
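
For readers unsure which "Simpson's diversity" is meant: the message does not say which variant was used. A minimal sketch of one common form, the Gini-Simpson index 1 - sum(p_i^2), is given below in plain Python for illustration. The function name and the example counts are hypothetical. This variant is a continuous value between 0 and 1, not a count.

```python
def gini_simpson(abundances):
    """Gini-Simpson index: 1 - sum(p_i^2), from per-species counts."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("sample contains no individuals")
    return 1.0 - sum((n / total) ** 2 for n in abundances)


# Hypothetical plot-year samples (counts per species), not real data.
even_sample = [10, 10, 10, 10]      # four equally common species
dominated_sample = [37, 1, 1, 1]    # one species dominates

print(gini_simpson(even_sample))       # higher value: more even community
print(gini_simpson(dominated_sample))  # lower value: dominated community
```
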
>>
>> Experimental Design:
>> Plots are divided into three habitat types A, B, C based on vegetation.
>> Each habitat has 3 or 4 replicate control plots that are repeat
>> sampled (one sample a year always in spring).
>> In addition B and C have 3 or 4 treatment (vegetation removal) plots.
>> 'S' plots are disturbed (trampling and off-road vehicles), but the
>> disturbance is unquantified and I don't know the pre-disturbance
>> habitat type.
>>
>> The total data set is across a 12 year period, but the sampling was
>> unbalanced for various reasons. I link below to an image of the plot
>> metadata over time to show the unbalanced sampling.
>> https://www.dropbox.com/s/7vxvo3x9lnywdbm/insects_years.gif?dl=0
>>
>> Each year the sampling across plots was conducted at the same time,
>> and so plots are comparable within a year.
>> In general, A's were sampled every year and are considered the 'target'
>> habitat. B's were sampled in the earlier years and C's later on, and
>> in the last couple of years all three types were sampled together.
>>
>> The treatments on B & C were conducted using different methods and in
>> different years, so in principle I should probably test each
>> separately just against their own control pairs. However the
>> hypothesis for both treatments is that treated plots will be more
>> similar in composition to A plots than the paired control plots (if
>> possible I want to check if they become more or less similar to A over
>> time).
>>
>> So in that regard I thought there might be a way to include all
>> habitat types in one analysis? Perhaps using time as "number of years
>> since treatment" rather than a date? (Although I have no environmental
>> data with which to standardise). S dunes have no "pre-treatment" but
>> the hypothesis is that S plots will be most similar to A compared to
>> all other (treated and control) plot types.  I am not sure how to
>> include these plots in a testable model.
>>
>> Questions regarding the design:
>>
>>>> Can I use all the habitat types in one model (preferable!) or can I only
>>>> test B treated against B control etc?
>
> Yes, you can. You obviously need a Treatment effect, and you should expect to
> have a Treatment by Habitat interaction.
>
> There may also be some sort of interaction with time (either as Time, or
> Time Since Treatment).
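
As an illustration of the "Time Since Treatment" idea, here is a minimal sketch of how such a covariate could be derived from sampling year and treatment year. All plot names, years and field names below are made up for the example. Coding never-treated plots as None is an assumption, and how to handle those plots in the model remains an open choice.

```python
def years_since_treatment(sample_year, treatment_year):
    """Years elapsed since treatment; None for plots that were never treated.

    Negative values mark samples taken before the treatment.
    """
    if treatment_year is None:
        return None
    return sample_year - treatment_year


# Hypothetical records: (plot, habitat, treatment year or None, sampling year).
records = [
    ("B1", "B", 2006, 2005),   # sampled one year before treatment
    ("B1", "B", 2006, 2008),   # sampled two years after treatment
    ("C2", "C", 2012, 2014),
    ("A1", "A", None, 2014),   # control habitat, never treated
]

rows = []
for plot, habitat, treated_in, year in records:
    rows.append({
        "plot": plot,
        "habitat": habitat,
        "year": year,
        # Treatment indicator: True only once the treatment has happened.
        "treated": treated_in is not None and year >= treated_in,
        "time_since": years_since_treatment(year, treated_in),
    })

for row in rows:
    print(row)
```
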
>
>>>> Must I remove data to create blocks of sampled or is 'all data useful'?
>>
>> e.g. A's were the only plots sampled in 2010: should I remove that
>> year completely?
>> e.g. C1 & C5 were sampled in 2005 while the rest were not until 2011:
>> should I only include data from 2011 onwards for all C's?
>> e.g. Should I remove A4 completely since it's only sampled in the last
>> few years, or is it still usable?
>
> No, you should be able to use all of the data, you just have to be a bit
> careful about how you model Time.
>>>>
>>>> Can I include S in the analyses in order to compare them with B and C
>>>> treated plots in relation to A plots?
>
> Yes, in principle. It just doesn't have a Habitat:Treatment interaction.
>
>> I have already analysed my first research question
>> Q1) To understand the differences in diversity and composition
>> across control habitats, irrespective of time.
>>
>> The analysis approach I used for this is:
>> i) Mixed effect model:  GLMM PQL (Penalised Quasi-Likelihood) using
>> MASS R package.
>>      Diversity ~ habitat type (fixed effect) + year (random effect),
>> family = poisson
>
> There are better tools than glmmPQL nowadays. Have a look at the lme4
> package, for example.
>>
>> ii) Pairwise permutational multivariate analysis of variance (PERMANOVA)
>> with R code based on the adonis2 function, to determine if the
>> composition among habitats (visualised in NMDS) was significantly
>> different between each pair.
>>
>> iii) RDA with habitat as explanatory and year as covariate to test
>> explained variance.
>>
>> Now I am trying to expand this analysis to include a temporal element
>> to answer Q2 & Q3
>> Q2) to understand the trends in diversity and composition over time in
>> control habitats
>> Q3) to understand the impact of treatment on diversity and composition
>> (over time if possible?)
>>
>> The addition of time into the analyses is a bit difficult for me to
>> work out, due to the multi-nested and unbalanced design of the data; I
>> am not sure what methods to use to include time as a variable for
>> looking at a) diversity and b) composition
>
> Take a look at repeated measures models. There are a few ways this could be
> set up, depending a bit on the data.
>
>> Questions regarding analyses:
>>>
>>> 1> Is there an appropriate mixed effect model I can use to look at
>>> differences in diversity on different control plots and include time as a
>>> factor (rather than as a random effect)?
>
> There are probably several. :-) For example you could include Time as a
> continuous covariate, alongside the random effect. You could also just
> include it as a fixed effect, but that could get messy.
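
One way to write down the first option (Time as a continuous covariate alongside the random year effect) is sketched below. The notation is an assumption made for illustration and does not come from the thread: y_{pt} is diversity in plot p and year t, h(p) is the habitat of plot p, g is whatever link function suits the chosen family, and u_t is the random year effect.

```latex
g\bigl(\mathrm{E}[y_{pt}]\bigr) = \beta_0 + \beta_{h(p)} + \gamma\, t + u_t,
\qquad u_t \sim \mathcal{N}(0, \sigma_u^2)
```

Here \gamma is the overall trend over time, and u_t picks up year-to-year deviations around that trend. A habitat-specific trend would replace \gamma with \gamma_{h(p)}. Because plots are sampled repeatedly, a plot-level random effect could be added as well, but that goes beyond what is stated above.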
>>>
>>> 2> How can I appropriately test if different habitats exhibit different
>>> trends in composition over time (i.e. a multivariate approach)? For example,
>>> I might expect that A's will remain relatively stable over time, while C's
>>> will exhibit high turnover (fluctuation) across years, or that B's will
>>> slowly shift composition to be more similar to C. How can I test these
>>> directional hypotheses?
>>
>> I thought to create a Principal Response Curve to see relative
>> differences over time, but as far as I understand, I cannot use a
>> permutation test here due to the unbalanced design. I also thought to
>> take the scores on the first RDA axis as a univariate measure, and
>> then plot this over time, but I'm not sure if it's an appropriate
>> approach or how to then test this statistically.
>>
>> I also thought to try and create some measure of "compositional
>> temporal stability" for each plot and test this using ANOVA (like some
>> sort of "multivariate Coefficient of Variation"). One such measure
>> could be distance of each plot-year from the habitat centroid in
>> ordination space but again, I'm not sure if this is an appropriate
>> approach. Any suggestions for other measures would be welcome.
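
The distance-to-centroid measure described above can be sketched as follows, in plain Python for illustration. The ordination scores, plot names and function names are hypothetical, and Euclidean distance on two axes is an assumption. vegan's `betadisper` computes a related distance-to-group-centroid quantity. The sketch shows only the calculation and does not settle whether the measure is appropriate for these data.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def distances_to_centroid(scores):
    """Map each (habitat, plot, year) key to its distance from the habitat centroid."""
    by_habitat = defaultdict(list)
    for (habitat, _plot, _year), xy in scores.items():
        by_habitat[habitat].append(xy)
    centroids = {h: centroid(pts) for h, pts in by_habitat.items()}
    return {key: math.dist(xy, centroids[key[0]]) for key, xy in scores.items()}


# Hypothetical ordination scores (axis 1, axis 2) per plot-year, not real data.
scores = {
    ("A", "A1", 2011): (0.0, 0.0),
    ("A", "A1", 2012): (2.0, 0.0),
    ("C", "C1", 2011): (0.0, 0.0),
    ("C", "C1", 2012): (6.0, 8.0),
}

dists = distances_to_centroid(scores)
for key, d in dists.items():
    print(key, d)
```

Larger distances within a habitat would indicate plot-years lying further from the habitat's average composition. Averaging them per plot would give one candidate "stability" value per plot.
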
>
> That's essentially a question about the variance in responses. There are
> doubly hierarchical models that you could try, but you might not want to go
> there.
>>>
>>> 3> Finally, can I extend these temporal analyses (of diversity and of
>>> composition) to look at response trends to treatments, given the structure
>>> of my data?
>>
>> I would like to see if I can detect some form of resistance to, or
>> recovery from, the treatment over time ... But if not, can I test the
>> overall treatment effect and use time as a random effect like I did
>> for my first question?
>>
>> Thank you for any suggestions of analyses and/or ways to subset the
>> data that would allow me to answer these questions.
>
> Essentially you need some structure on the time covariate. You could start
> by using time since treatment as a factor, and plot those estimates. Again,
> there should be a bit of playing around with the model, to see what makes
> sense.
>
> Bob
>
> --
> Bob O'Hara
> NOTE NEW ADDRESS!!!
> Institutt for matematiske fag
> NTNU
> 7491 Trondheim
> Norway
>
> Mobile: +49 1515 888 5440
> Journal of Negative Results - EEB: www.jnr-eeb.org
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


