[R-sig-eco] repeated measures NMDS?

Mon Nov 15 22:23:09 CET 2010

I can only agree with kay - thank you for your informative answers!

Eduard

Am 11.11.2010 14:50, schrieb Kay Cecil Cichini:
> many, many thanks,
> i think this can serve as a pedagogically worthful example for anyone 
> facing troubles with dependence within species / environment data.
>
> yours,
> kay
>
> ps: nevermind the mistake.
>
>
>
> Gavin Simpson schrieb:
>> On Thu, 2010-11-11 at 13:03 +0100, Kay Cecil Cichini wrote:
>>> thanks a lot for the illustrative example.
>>>
>>> ..referring to your quote:
>>>
>>> "...This of course doesn't account for any temporal correlation 
>>> within sites nor, if the observations on the 24 blocks were made at 
>>> the same times, that you might want to have the same permutation 
>>> within each block.
>>> In the former there are 3^24 possible permutations (time series within
>>> blocks), so 999 random perms is reasonable, *but* some of these random
>>> perms (in permuted.index()) will not respect the temporal ordering and
>>> thus you aren't really exploring the correct NULL.
>>> With the latter constraint (same temporal perm within blocks), there 
>>> are
>>> 3 random permutations, so good luck getting a reasonable p-value from
>>> that."
>>>
>>> ..so for the time beeing we assume the former case -
>>> and for the latter there is no way out.
>>
>> Yes - for the case of wanting the same temporal permutation within each
>> block there only are 3 permutations (6 if you allow time to go backwards
>> [mirror = TRUE]), but this includes the observed ordering, so only 2 (5)
>> other permutations to try. This is where permutation tests fail. If the
>> observed statistic is bigger than the statistic from the two random
>> permutations, this is an exact statistic in the sense that you've
>> evaluated all possible orderings consistent with the null, but all you
>> can is that the p-value is p < 0.333.
>>
>> Having said that, we can perhaps try to be reasonable and relax some of
>> the assumptions (how often do our data fully meet all the assumptions of
>> the parametric statistical approaches we use?) and be happy with a null
>> that respects temporal autocorrelation within block, but not across
>> blocks. One might then choose to accept as a significant result a
>> permutation p that is say p <= 0.01 or even p <= 0.001, rather than the
>> usual p <= 0.05, to help guard against using the wrong Null.
>>
>>> yours,
>>> kay
>>>
>>> ps: in germany/austria there are two alternative spellings for the 
>>> name kai/kay - beeing a male name opposed to the english kay.
>>
>> I am truly sorry for my mistake - please accept my apologies. Totally
>> unintentional I assure you.
>>
>> All the best,
>>
>> G
>>
>>> Gavin Simpson schrieb:
>>>> [Apologies - I replied with this only to Kay. Hopefully she won't mind
>>>> receiving it twice!]
>>>>
>>>> On Thu, 2010-11-11 at 10:32 +0100, Kay Cecil Cichini wrote:
>>>>> thanks a lot for your elaborations.
>>>>>
>>>>> of course, envfit(..,strata=rep.mes) does it.
>>>>>
>>>>> then, at least, i consider it exercise for other cases, were you 
>>>>> really might need a handmade permutation
>>>>>
>>>>> so, as to round this off, i actually can't analyse this very 
>>>>> design in such a way, with the right NULL concerned - but were to 
>>>>> go from here?
>>>> You could hook up the code in 'permute'. It contains a new
>>>> permuted.index() function (and currently no NAMESPACE, so will 
>>>> overwrite
>>>> (mask) the vegan version if loaded after vegan) which will break 
>>>> all the
>>>> permutation code in vegan).
>>>>
>>>> Here is your example, modified to use the code in permute. I post this
>>>> to illustrate how you'd use the new code. There are lots of examples
>>>> in ?permuted.index (for the permute package, not the vegan package
>>>> version), but *don't* touch the permutation t-test example code as it
>>>> uses permCheck() and it might call allPerms() and allPerms() *IS*
>>>> *WRONG* for some designs --- this is the last bit I need to fix/get
>>>> working before we can make our first release of this code.
>>>>
>>>> HTH
>>>>
>>>> G
>>>>
>>>> Here's the example script:
>>>>
>>>> ## Load packages
>>>> require(vegan)
>>>> require(permute)
>>>>
>>>> ## Data
>>>> set.seed(123)
>>>> sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
>>>> env <- rnorm(24*3, 10, 2)
>>>> rep.mes <- gl(24, 3)
>>>>
>>>> ### NMDS:
>>>> sol <- metaMDS(sp, trymax = 5)
>>>> fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
>>>>
>>>> B <- 999 ## number of perms
>>>>
>>>> ### setting up frame for population of r2 values:
>>>> pop <- rep(NA, B + 1)
>>>> pop[1] <- fit$vectors$r
>>>>
>>>> ## set-up a Control object:
>>>> ctrl <- permControl(strata = rep.mes,
>>>>                     within = Within(type = "series", mirror = FALSE))
>>>> ## we turn off mirroring as time should only flow in one direction
>>>>
>>>> ## Number of observations
>>>> nobs <- nrow(sp)
>>>>
>>>> ## check it works
>>>> matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
>>>> ## Yep - Phew!!!
>>>>
>>>> ### loop:
>>>> set.seed(1)
>>>> for(i in 2:(B+1)){
>>>>     idx <- permuted.index(nobs, control = ctrl)
>>>>     fit.rand <- envfit(sol ~ env[idx], permutations = 0)
>>>>     pop[i] <- fit.rand$vectors$r
>>>> }
>>>>
>>>> ### p-value:
>>>> pval <- sum(pop >= pop[1]) / (B + 1)
>>>> pval
>>>>
>>>> I get:
>>>>
>>>>> pval
>>>> [1] 0.286
>>>>
>>>> Now to compare with the actual permutation you'd have gotten from
>>>> env.fit, you first need:
>>>>
>>>> detach(package:permute)
>>>>
>>>> Then run:
>>>>> set.seed(1)
>>>>> fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
>>>>> fit2
>>>> ***VECTORS
>>>>
>>>>       NMDS1   NMDS2     r2 Pr(>r)
>>>> env 0.28727 0.95785 0.0315  0.321
>>>> P values based on 999 permutations, stratified within strata.
>>>>
>>>>> a simplistic approach could be, averaging sites, yielding n=24 for 
>>>>> further testing.
>>>>>
>>>>> yours,
>>>>> kay
>>>>>
>>>>>
>>>>> Gavin Simpson schrieb:
>>>>>> On Thu, 2010-11-11 at 09:50 +0100, Kay Cecil Cichini wrote:
>>>>>>> gavin,
>>>>>>>
>>>>>>> sorry - of course it should be permute.strata=F, permuting 
>>>>>>> within individual sites!
>>>>>>> but despite of this the code should work, doesn't it?
>>>>>> Yes, it should - i.e the permutation will be random within the 
>>>>>> blocks.
>>>>>> Whether it does or not is another matter entirely. AFAICR, this 
>>>>>> option
>>>>>> in permuted.index2() did work.
>>>>>>
>>>>>> *But*, this is doing exactly what the original permuted.index() 
>>>>>> does if
>>>>>> you set argument 'strata' to be the grouping factor - in your case:
>>>>>>
>>>>>> envfit(sol ~ env, strata = rep.mes, perm = 999)
>>>>>>
>>>>>> for example. So there is no need to code up the analysis by hand.
>>>>>>
>>>>>> This of course doesn't account for any temporal correlation 
>>>>>> within sites
>>>>>> nor, if the observations on the 24 blocks were made at the same 
>>>>>> times,
>>>>>> that you might want to have the same permutation within each block.
>>>>>>
>>>>>> In the former there are 3^24 possible permutations (time series 
>>>>>> within
>>>>>> blocks), so 999 random perms is reasonable, *but* some of these 
>>>>>> random
>>>>>> perms (in permuted.index()) will not respect the temporal 
>>>>>> ordering and
>>>>>> thus you aren't really exploring the correct NULL.
>>>>>>
>>>>>> With the latter constraint (same temporal perm within blocks), 
>>>>>> there are
>>>>>> 3 random permutations, so good luck getting a reasonable p-value 
>>>>>> from
>>>>>> that.
>>>>>>
>>>>>> The two restricted permutations /should/ work correctly, /but/ if 
>>>>>> you
>>>>>> are doing this by hand, I'd look at the functions in the 'permute'
>>>>>> package - only on R-Forge, on the Vegan R-Forge area - as I know the
>>>>>> code to generate these permutations in that package *does* work. 
>>>>>> (It is
>>>>>> the helper cruft around it that needs more work.)
>>>>>>
>>>>>> https://r-forge.r-project.org/R/?group_id=68
>>>>>>
>>>>>> I've had a busy Summer and not made as much progress as I would have
>>>>>> liked, but I hope to finish this soon and make an initial release to
>>>>>> CRAN so we can start grafting it into vegan.
>>>>>>
>>>>>> In the meantime, I can help people try to link the two packages if
>>>>>> needed, but I don't have much time till the end of this month.
>>>>>>
>>>>>> G
>>>>>>
>>>>>>> thanks,
>>>>>>> kay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Gavin Simpson schrieb:
>>>>>>>> On Wed, 2010-11-10 at 23:33 +0100, Kay Cecil Cichini wrote:
>>>>>>>>> hi eduard,
>>>>>>>>>
>>>>>>>>> i faced similar problems recently and came to the below solution.
>>>>>>>>> i only try to address the pseudoreplication with an 
>>>>>>>>> appropiate  permutation scheme.
>>>>>>>>> when it comes to testing the interactions, things may get more 
>>>>>>>>> complicated.
>>>>>>>>>
>>>>>>>>> the code is in no way approven of, but at least it maybe good 
>>>>>>>>> enough  for a starter.
>>>>>>>>>
>>>>>>>>> best,
>>>>>>>>> kay
>>>>>>>> Hi Kay,
>>>>>>>>
>>>>>>>> I don't think you have this right.
>>>>>>>> If you have measured repeatedly, say 5 times, on the same 10
>>>>>>>> individuals, or if you have ten fields and you take 5 quadrats 
>>>>>>>> from
>>>>>>>> each, you need to permute *within* the individuals/fields, not 
>>>>>>>> permute
>>>>>>>> the individuals/fields which is what permute.strata does. 
>>>>>>>> permute.strata
>>>>>>>> would be useful in evaluating factors that vary at the block
>>>>>>>> (individuals/fields) level, not at the sample levels.
>>>>>>>>
>>>>>>>> From what Eduard and you describe, the code you show is not the 
>>>>>>>> correct
>>>>>>>> permutation. But I may have misunderstood your intention.
>>>>>>>>
>>>>>>>> Also, be careful with permuted.index2 - there are reasons why 
>>>>>>>> it hasn't
>>>>>>>> been integrated (design goals changed and we felt it would work 
>>>>>>>> best in
>>>>>>>> a separate package that others could draw upon without loading 
>>>>>>>> all of
>>>>>>>> vegan) and the code has festered a bit and may contain bugs - 
>>>>>>>> buyer
>>>>>>>> beware!
>>>>>>>>
>>>>>>>> G
>>>>>>>>
>>>>>>>>> library(vegan)
>>>>>>>>>
>>>>>>>>> ### species matrix with 5 sp.
>>>>>>>>> ### one env.variable
>>>>>>>>> ### a factor denoting blocks of repeated measurments
>>>>>>>>>
>>>>>>>>> sp<-matrix(runif(24*3*5,0,100),24*3,5)
>>>>>>>>> env<-rnorm(24*3,10,2)
>>>>>>>>> rep.mes<-gl(24,3)
>>>>>>>>>
>>>>>>>>> ### NMDS:
>>>>>>>>> sol<-metaMDS(sp,trymax=5)
>>>>>>>>>
>>>>>>>>> fit<-envfit(sol~env)
>>>>>>>>> plot(sol)
>>>>>>>>> plot(fit)
>>>>>>>>>
>>>>>>>>> ### testing code for appropiate randomization,
>>>>>>>>> ### permuting blocks of 3 as a whole:
>>>>>>>>> permuted.index2(nrow(sp),permControl(strata = 
>>>>>>>>> rep.mes,permute.strata=T))
>>>>>>> correctly, this should say:
>>>>>>> ### testing code for appropiate randomization,
>>>>>>> ### permuting within sites:
>>>>>>> permuted.index2(nrow(sp),permControl(strata = rep.mes))
>>>>>>>
>>>>>>>>> B=4999
>>>>>>>>>
>>>>>>>>> ### setting up frame for population of r2 values:
>>>>>>>>> pop<-rep(NA,B+1)
>>>>>>>>> pop[1]<-fit$vectors$r
>>>>>>>>>
>>>>>>> and:
>>>>>>>>> ### loop:
>>>>>>>>> for(i in 2:(B+1)){
>>>>>>>>> fit.rand<-envfit(sol~env[permuted.index2(nrow(sp),
>>>>>>>>>                           permControl(strata = rep.mes))])
>>>>>>>>>            pop[i]<-fit.rand$vectors$r
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> ### p-value:
>>>>>>> >> print(pval<-sum(pop>pop[1])/B+1)
>>>>>>>
>>>>>>> here a bracket was missing:
>>>>>>> print(pval<-sum(pop>pop[1])/(B+1))
>>>>>>>
>>>>>>>>> ### compare to anti-conservative p-value from envfit(),
>>>>>>>>> ### not restricting permutations:
>>>>>>>>>
>>>>>>>>> envfit(sol~env,perm=B)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Zitat von Eduard Szöcs <szoe8822 at uni-landau.de>:
>>>>>>>>>
>>>>>>>>>> Thanks, that helped.
>>>>>>>>>>
>>>>>>>>>> permuted.index2() generates these types of permutations. But  
>>>>>>>>>> envfit() does not use this yet.
>>>>>>>>>> What if I modify vectorfit() (used by envfit() ) in such a 
>>>>>>>>>> way that  it uses permuted.index2() instead of permuted.index()?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Eduard Szöcs
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 08.11.2010 22:01, schrieb Gavin Simpson:
>>>>>>>>>>> On Mon, 2010-11-08 at 15:39 +0100, Eduard Szöcs wrote:
>>>>>>>>>>>> Hi listers,
>>>>>>>>>>>>
>>>>>>>>>>>> I have species and environmental data for 24 sites that 
>>>>>>>>>>>> were sampled
>>>>>>>>>>>> thrice. If I want to analyze the data with NMDS I could run 
>>>>>>>>>>>> metaMDS on
>>>>>>>>>>>> the whole dataset (24 sites x 3 times = 72) and then fit 
>>>>>>>>>>>> environmental
>>>>>>>>>>>> data, but this would be some kind of pseudoreplication 
>>>>>>>>>>>> given that the
>>>>>>>>>>>> samplings are not independent and the gradients may be 
>>>>>>>>>>>> overestimated,
>>>>>>>>>>>> wouldn`t it?
>>>>>>>>>>>>
>>>>>>>>>>>> For environmental data a factor could be included for the 
>>>>>>>>>>>> sampling
>>>>>>>>>>>> dates - but this would not be possible for species data.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there an elegant way either to aggregate data before 
>>>>>>>>>>>> ordination or
>>>>>>>>>>>> to conduct sth. like a repeated measures NMDS?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you in advance,
>>>>>>>>>>>> Eduard Szöcs
>>>>>>>>>>> Depends on how you want to fit the env data - the 
>>>>>>>>>>> pseudo-replication
>>>>>>>>>>> isn't relevant o the nMDS. If you are doing it via function 
>>>>>>>>>>> `envfit()`,
>>>>>>>>>>> then look at argument `'strata'` which should, in your case, 
>>>>>>>>>>> be set to a
>>>>>>>>>>> factor with 24 levels. This won't be perfect because your 
>>>>>>>>>>> data are a
>>>>>>>>>>> timeseries and, strictly, one should permute them whilst 
>>>>>>>>>>> maintaining
>>>>>>>>>>> their ordering in time, but as yet we don't have these types of
>>>>>>>>>>> permutations hooked into vegan.
>>>>>>>>>>>
>>>>>>>>>>> If you are doing the fitting some other way you'll need to 
>>>>>>>>>>> include
>>>>>>>>>>> "site" as a fixed effect factor to account for the within site
>>>>>>>>>>> correlation.
>>>>>>>>>>>
>>>>>>>>>>> You don't need to worry about the species data and 
>>>>>>>>>>> accounting for
>>>>>>>>>>> sampling interval. You aren't testing the nMDS "axes" or 
>>>>>>>>>>> anything like
>>>>>>>>>>> that, and all the species info has been reduced to 
>>>>>>>>>>> dissimilarities and
>>>>>>>>>>> thence to a set of nMDS coordinates. You need to account for 
>>>>>>>>>>> the pseudo
>>>>>>>>>>> rep at the environmental modelling level, not the species 
>>>>>>>>>>> level.
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> G
>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-ecology mailing list
>>>>>>>>>> R-sig-ecology at r-project.org
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-ecology mailing list
>>>>>>>>> R-sig-ecology at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology