[R-sig-eco] repeated measures NMDS?

Thu Nov 11 14:50:12 CET 2010

many, many thanks,
i think this can serve as a pedagogically worthful example for anyone 
facing troubles with dependence within species / environment data.

yours,
kay

ps: nevermind the mistake.

Gavin Simpson schrieb:
> On Thu, 2010-11-11 at 13:03 +0100, Kay Cecil Cichini wrote:
>> thanks a lot for the illustrative example.
>>
>> ..referring to your quote:
>>
>> "...This of course doesn't account for any temporal correlation within 
>> sites nor, if the observations on the 24 blocks were made at the same 
>> times, that you might want to have the same permutation within each block.
>> In the former there are 3^24 possible permutations (time series within
>> blocks), so 999 random perms is reasonable, *but* some of these random
>> perms (in permuted.index()) will not respect the temporal ordering and
>> thus you aren't really exploring the correct NULL.
>> With the latter constraint (same temporal perm within blocks), there are
>> 3 random permutations, so good luck getting a reasonable p-value from
>> that."
>>
>> ..so for the time beeing we assume the former case -
>> and for the latter there is no way out.
> 
> Yes - for the case of wanting the same temporal permutation within each
> block there only are 3 permutations (6 if you allow time to go backwards
> [mirror = TRUE]), but this includes the observed ordering, so only 2 (5)
> other permutations to try. This is where permutation tests fail. If the
> observed statistic is bigger than the statistic from the two random
> permutations, this is an exact statistic in the sense that you've
> evaluated all possible orderings consistent with the null, but all you
> can is that the p-value is p < 0.333.
> 
> Having said that, we can perhaps try to be reasonable and relax some of
> the assumptions (how often do our data fully meet all the assumptions of
> the parametric statistical approaches we use?) and be happy with a null
> that respects temporal autocorrelation within block, but not across
> blocks. One might then choose to accept as a significant result a
> permutation p that is say p <= 0.01 or even p <= 0.001, rather than the
> usual p <= 0.05, to help guard against using the wrong Null.
> 
>> yours,
>> kay
>>
>> ps: in germany/austria there are two alternative spellings for the name 
>> kai/kay - beeing a male name opposed to the english kay.
> 
> I am truly sorry for my mistake - please accept my apologies. Totally
> unintentional I assure you.
> 
> All the best,
> 
> G
> 
>> Gavin Simpson schrieb:
>>> [Apologies - I replied with this only to Kay. Hopefully she won't mind
>>> receiving it twice!]
>>>
>>> On Thu, 2010-11-11 at 10:32 +0100, Kay Cecil Cichini wrote:
>>>> thanks a lot for your elaborations.
>>>>
>>>> of course, envfit(..,strata=rep.mes) does it.
>>>>
>>>> then, at least, i consider it exercise for other cases, were you really 
>>>> might need a handmade permutation
>>>>
>>>> so, as to round this off, i actually can't analyse this very design in 
>>>> such a way, with the right NULL concerned - but were to go from here?
>>> You could hook up the code in 'permute'. It contains a new
>>> permuted.index() function (and currently no NAMESPACE, so will overwrite
>>> (mask) the vegan version if loaded after vegan) which will break all the
>>> permutation code in vegan).
>>>
>>> Here is your example, modified to use the code in permute. I post this
>>> to illustrate how you'd use the new code. There are lots of examples
>>> in ?permuted.index (for the permute package, not the vegan package
>>> version), but *don't* touch the permutation t-test example code as it
>>> uses permCheck() and it might call allPerms() and allPerms() *IS*
>>> *WRONG* for some designs --- this is the last bit I need to fix/get
>>> working before we can make our first release of this code.
>>>
>>> HTH
>>>
>>> G
>>>
>>> Here's the example script:
>>>
>>> ## Load packages
>>> require(vegan)
>>> require(permute)
>>>
>>> ## Data
>>> set.seed(123)
>>> sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
>>> env <- rnorm(24*3, 10, 2)
>>> rep.mes <- gl(24, 3)
>>>
>>> ### NMDS:
>>> sol <- metaMDS(sp, trymax = 5)
>>> fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
>>>
>>> B <- 999 ## number of perms
>>>
>>> ### setting up frame for population of r2 values:
>>> pop <- rep(NA, B + 1)
>>> pop[1] <- fit$vectors$r
>>>
>>> ## set-up a Control object:
>>> ctrl <- permControl(strata = rep.mes,
>>>                     within = Within(type = "series", mirror = FALSE))
>>> ## we turn off mirroring as time should only flow in one direction
>>>
>>> ## Number of observations
>>> nobs <- nrow(sp)
>>>
>>> ## check it works
>>> matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
>>> ## Yep - Phew!!!
>>>
>>> ### loop:
>>> set.seed(1)
>>> for(i in 2:(B+1)){
>>>     idx <- permuted.index(nobs, control = ctrl)
>>>     fit.rand <- envfit(sol ~ env[idx], permutations = 0)
>>>     pop[i] <- fit.rand$vectors$r
>>> }
>>>
>>> ### p-value:
>>> pval <- sum(pop >= pop[1]) / (B + 1)
>>> pval
>>>
>>> I get:
>>>
>>>> pval
>>> [1] 0.286
>>>
>>> Now to compare with the actual permutation you'd have gotten from
>>> env.fit, you first need:
>>>
>>> detach(package:permute)
>>>
>>> Then run:
>>>> set.seed(1)
>>>> fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
>>>> fit2
>>> ***VECTORS
>>>
>>>       NMDS1   NMDS2     r2 Pr(>r)
>>> env 0.28727 0.95785 0.0315  0.321
>>> P values based on 999 permutations, stratified within strata.
>>>
>>>> a simplistic approach could be, averaging sites, yielding n=24 for 
>>>> further testing.
>>>>
>>>> yours,
>>>> kay
>>>>
>>>>
>>>> Gavin Simpson schrieb:
>>>>> On Thu, 2010-11-11 at 09:50 +0100, Kay Cecil Cichini wrote:
>>>>>> gavin,
>>>>>>
>>>>>> sorry - of course it should be permute.strata=F, permuting within 
>>>>>> individual sites!
>>>>>> but despite of this the code should work, doesn't it?
>>>>> Yes, it should - i.e the permutation will be random within the blocks.
>>>>> Whether it does or not is another matter entirely. AFAICR, this option
>>>>> in permuted.index2() did work.
>>>>>
>>>>> *But*, this is doing exactly what the original permuted.index() does if
>>>>> you set argument 'strata' to be the grouping factor - in your case:
>>>>>
>>>>> envfit(sol ~ env, strata = rep.mes, perm = 999)
>>>>>
>>>>> for example. So there is no need to code up the analysis by hand.
>>>>>
>>>>> This of course doesn't account for any temporal correlation within sites
>>>>> nor, if the observations on the 24 blocks were made at the same times,
>>>>> that you might want to have the same permutation within each block.
>>>>>
>>>>> In the former there are 3^24 possible permutations (time series within
>>>>> blocks), so 999 random perms is reasonable, *but* some of these random
>>>>> perms (in permuted.index()) will not respect the temporal ordering and
>>>>> thus you aren't really exploring the correct NULL.
>>>>>
>>>>> With the latter constraint (same temporal perm within blocks), there are
>>>>> 3 random permutations, so good luck getting a reasonable p-value from
>>>>> that.
>>>>>
>>>>> The two restricted permutations /should/ work correctly, /but/ if you
>>>>> are doing this by hand, I'd look at the functions in the 'permute'
>>>>> package - only on R-Forge, on the Vegan R-Forge area - as I know the
>>>>> code to generate these permutations in that package *does* work. (It is
>>>>> the helper cruft around it that needs more work.)
>>>>>
>>>>> https://r-forge.r-project.org/R/?group_id=68
>>>>>
>>>>> I've had a busy Summer and not made as much progress as I would have
>>>>> liked, but I hope to finish this soon and make an initial release to
>>>>> CRAN so we can start grafting it into vegan.
>>>>>
>>>>> In the meantime, I can help people try to link the two packages if
>>>>> needed, but I don't have much time till the end of this month.
>>>>>
>>>>> G
>>>>>
>>>>>> thanks,
>>>>>> kay
>>>>>>
>>>>>>
>>>>>>
>>>>>> Gavin Simpson schrieb:
>>>>>>> On Wed, 2010-11-10 at 23:33 +0100, Kay Cecil Cichini wrote:
>>>>>>>> hi eduard,
>>>>>>>>
>>>>>>>> i faced similar problems recently and came to the below solution.
>>>>>>>> i only try to address the pseudoreplication with an appropiate  
>>>>>>>> permutation scheme.
>>>>>>>> when it comes to testing the interactions, things may get more complicated.
>>>>>>>>
>>>>>>>> the code is in no way approven of, but at least it maybe good enough  
>>>>>>>> for a starter.
>>>>>>>>
>>>>>>>> best,
>>>>>>>> kay
>>>>>>> Hi Kay,
>>>>>>>
>>>>>>> I don't think you have this right.
>>>>>>> If you have measured repeatedly, say 5 times, on the same 10
>>>>>>> individuals, or if you have ten fields and you take 5 quadrats from
>>>>>>> each, you need to permute *within* the individuals/fields, not permute
>>>>>>> the individuals/fields which is what permute.strata does. permute.strata
>>>>>>> would be useful in evaluating factors that vary at the block
>>>>>>> (individuals/fields) level, not at the sample levels.
>>>>>>>
>>>>>>> From what Eduard and you describe, the code you show is not the correct
>>>>>>> permutation. But I may have misunderstood your intention.
>>>>>>>
>>>>>>> Also, be careful with permuted.index2 - there are reasons why it hasn't
>>>>>>> been integrated (design goals changed and we felt it would work best in
>>>>>>> a separate package that others could draw upon without loading all of
>>>>>>> vegan) and the code has festered a bit and may contain bugs - buyer
>>>>>>> beware!
>>>>>>>
>>>>>>> G
>>>>>>>
>>>>>>>> library(vegan)
>>>>>>>>
>>>>>>>> ### species matrix with 5 sp.
>>>>>>>> ### one env.variable
>>>>>>>> ### a factor denoting blocks of repeated measurments
>>>>>>>>
>>>>>>>> sp<-matrix(runif(24*3*5,0,100),24*3,5)
>>>>>>>> env<-rnorm(24*3,10,2)
>>>>>>>> rep.mes<-gl(24,3)
>>>>>>>>
>>>>>>>> ### NMDS:
>>>>>>>> sol<-metaMDS(sp,trymax=5)
>>>>>>>>
>>>>>>>> fit<-envfit(sol~env)
>>>>>>>> plot(sol)
>>>>>>>> plot(fit)
>>>>>>>>
>>>>>>>> ### testing code for appropiate randomization,
>>>>>>>> ### permuting blocks of 3 as a whole:
>>>>>>>> permuted.index2(nrow(sp),permControl(strata = rep.mes,permute.strata=T))
>>>>>> correctly, this should say:
>>>>>> ### testing code for appropiate randomization,
>>>>>> ### permuting within sites:
>>>>>> permuted.index2(nrow(sp),permControl(strata = rep.mes))
>>>>>>
>>>>>>>> B=4999
>>>>>>>>
>>>>>>>> ### setting up frame for population of r2 values:
>>>>>>>> pop<-rep(NA,B+1)
>>>>>>>> pop[1]<-fit$vectors$r
>>>>>>>>
>>>>>> and:
>>>>>>>> ### loop:
>>>>>>>> for(i in 2:(B+1)){
>>>>>>>> fit.rand<-envfit(sol~env[permuted.index2(nrow(sp),
>>>>>>>>                           permControl(strata = rep.mes))])
>>>>>>>>            pop[i]<-fit.rand$vectors$r
>>>>>>>> }
>>>>>>>>
>>>>>>>> ### p-value:
>>>>>>  >> print(pval<-sum(pop>pop[1])/B+1)
>>>>>>
>>>>>> here a bracket was missing:
>>>>>> print(pval<-sum(pop>pop[1])/(B+1))
>>>>>>
>>>>>>>> ### compare to anti-conservative p-value from envfit(),
>>>>>>>> ### not restricting permutations:
>>>>>>>>
>>>>>>>> envfit(sol~env,perm=B)
>>>>>>>>
>>>>>>>>
>>>>>>>> Zitat von Eduard Szöcs <szoe8822 at uni-landau.de>:
>>>>>>>>
>>>>>>>>> Thanks, that helped.
>>>>>>>>>
>>>>>>>>> permuted.index2() generates these types of permutations. But  
>>>>>>>>> envfit() does not use this yet.
>>>>>>>>> What if I modify vectorfit() (used by envfit() ) in such a way that  
>>>>>>>>> it uses permuted.index2() instead of permuted.index()?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Eduard Szöcs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Am 08.11.2010 22:01, schrieb Gavin Simpson:
>>>>>>>>>> On Mon, 2010-11-08 at 15:39 +0100, Eduard Szöcs wrote:
>>>>>>>>>>> Hi listers,
>>>>>>>>>>>
>>>>>>>>>>> I have species and environmental data for 24 sites that were sampled
>>>>>>>>>>> thrice. If I want to analyze the data with NMDS I could run metaMDS on
>>>>>>>>>>> the whole dataset (24 sites x 3 times = 72) and then fit environmental
>>>>>>>>>>> data, but this would be some kind of pseudoreplication given that the
>>>>>>>>>>> samplings are not independent and the gradients may be overestimated,
>>>>>>>>>>> wouldn`t it?
>>>>>>>>>>>
>>>>>>>>>>> For environmental data a factor could be included for the sampling
>>>>>>>>>>> dates - but this would not be possible for species data.
>>>>>>>>>>>
>>>>>>>>>>> Is there an elegant way either to aggregate data before ordination or
>>>>>>>>>>> to conduct sth. like a repeated measures NMDS?
>>>>>>>>>>>
>>>>>>>>>>> Thank you in advance,
>>>>>>>>>>> Eduard Szöcs
>>>>>>>>>> Depends on how you want to fit the env data - the pseudo-replication
>>>>>>>>>> isn't relevant o the nMDS. If you are doing it via function `envfit()`,
>>>>>>>>>> then look at argument `'strata'` which should, in your case, be set to a
>>>>>>>>>> factor with 24 levels. This won't be perfect because your data are a
>>>>>>>>>> timeseries and, strictly, one should permute them whilst maintaining
>>>>>>>>>> their ordering in time, but as yet we don't have these types of
>>>>>>>>>> permutations hooked into vegan.
>>>>>>>>>>
>>>>>>>>>> If you are doing the fitting some other way you'll need to include
>>>>>>>>>> "site" as a fixed effect factor to account for the within site
>>>>>>>>>> correlation.
>>>>>>>>>>
>>>>>>>>>> You don't need to worry about the species data and accounting for
>>>>>>>>>> sampling interval. You aren't testing the nMDS "axes" or anything like
>>>>>>>>>> that, and all the species info has been reduced to dissimilarities and
>>>>>>>>>> thence to a set of nMDS coordinates. You need to account for the pseudo
>>>>>>>>>> rep at the environmental modelling level, not the species level.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> G
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-ecology mailing list
>>>>>>>>> R-sig-ecology at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-ecology mailing list
>>>>>>>> R-sig-ecology at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>