[R-sig-eco] orthogonal variables aliased by rda(vegan)

Thu Jun 5 06:43:47 CEST 2008

Quoting Andrea Bertolo <andrea.bertolo at uqtr.ca>:

> I agree that lm would be a natural choice with well distibuted univariate
> data. However, the zero-inflated distribution I found in my data pushed me
> to look for a method based on a randomization test rather than a parametric
> one. This was the reason I used the rda fonction (followed by the ANOVA one).
>
Andrea,

In principle (and in practice) you can use permutation tests with lm()  
as well. It needs some manual work, though.

I have known this problem for some time, but my plan was to have a  
peaceful look on this with thorough testing after having the next CRAN  
release of vegan. Because this surfaced now, I changed the rda and cca  
code in http://r-forge.r-project.org/projects/vegan repository so that  
it does not alias constraints any more when the rank of dependent data  
< rank of constraints + conditions. It gives warnings, though, but  
many things may work. One thing that surely won't work is  
anova.cca(..., by="term") because there it gets 0 df for those  
variables that were aliased previously. However, anova.cca(..., by =  
"margin") should work, and for orthogonal variables this is just the  
same.

The vegan version 1.14-3 from R-Forge contains this fix. It can take  
up to two days before a Windows binary is available, but you can get  
the rda.default.R and print.cca.R files from the repository, source  
them, and see if they work.

I'd still recommend lm(). ZI* and other models are available at pscl  
package, but I'm not sure how they work with a huge number of MEMs.  
There probably are several simulation tools (like simulate()) for lm()  
which may help handling ecological data with weird properties.

cheers, jari oksanen

> Probably the best would be to use a ZIP or ZINB regression to model this
> kind of data, but I am still not very confortable with this kind of
> modeling. By the way, I am exploring the pscl library to run Zero-Inflated
> models with the zeroinfl function: does anybody here have tested its
> performances and could recommend it ?
>
> best
> andrea
>
>
>
>
>
> At 23:34 2008-06-04 +0300, Jari Oksanen wrote:
>>
>> On 4 Jun 2008, at 21:28, Andrea Bertolo wrote:
>>
>>> Dear Jari,
>>>
>>> this is exactly the structure of my data.
>>> I in fact forgot to mention that I was using the rda fonction as a
>>> regression here (thanks to Karl Cottenie to let me see it...), i.e. to
>>> explain the abundance of a single fish species with 14 MEMs.
>>>
>> In this case there is no reason to use rda in R: simply use lm.
>>
>> All independent variables are used in calculations, but not displayed
>> in ordination diagrams. If you only have one dependent variable, you
>> could not see much, though. The biplot arrows are defined to be of
>> unit length  in full constrained space, so that they only can be +1 or
>> -1 in one dimension. However, the regression coefficients should be
>> available for all: just use coef() on the result to get the
>> coefficients.
>>
>> I cannot find a reason to use a Euclidean multivariate method (rda)
>> for univariate data when you can use univariate methods like the
>> standard lm() -- and lm indeed is the natural choice in this case.
>>
>>> There is only one thing that I don't understant.
>>> What do you mean exactly when you say that "these extra constraints
>>> are
>>> taken into account, but not reported in all cases (the aliasing is
>>> explicit: there is code to turn down displaying the results that were
>>> calculated)." ?
>>> Does this mean that the rda is calculated with all the 14 contraints
>>> (in
>>> this case), but only the first constraint is plotted ? I ran the rda
>>> with
>>> the only "un-aliased" MEM and I obtained the same results, showing
>>> that the
>>> aliased constraints are effectively eliminated.
>>
>> I just tried this, and could not confirm this. The result were
>> different for eigenvalues, configuration, coefficients or anything I
>> looked at. There was this aliasing thing, but that does not mean that
>> the results are similar. The only similar thing is that in when
>> ordinating a single species you get only one axis (but it's a
>> different axis depending on your constraints). So do not ordinate: use
>> lm.
>>
>>> Also, If I remeber well,
>>> when I run the same rda with Canoco I can keep the all 14
>>> constraints in
>>> the analysis, getting a different result. Are these differences only a
>>> graphic matter or there are more substantial differeces ?
>>
>> Perhaps somebody knows Canoco and can tell you.
>>
>> cheers, jari oksanen
>>
>>> many thanks
>>> andrea
>>>
>>>
>>>
>>>
>>> At 20:51 2008-06-04 +0300, you wrote:
>>>> Quoting Andrea Bertolo <andrea.bertolo at uqtr.ca>:
>>>>
>>>>> Hi to all,
>>>>>
>>>>> I am running some RDAs on Vegan using Moran eigenvector Maps (MEM)
>>>>> to
>>>>> explain the variation in the abundance of a fish species.
>>>>>
>>>>> In several occasions I noticed that, despite MEM are othogonal by
>>>>> definition, they were aliased by the rda fonction and thus not
>>>>> taken into
>>>>> account in the analysis, as if they were collinear !
>>>>>
>>>>> I cannot understand why, since the MEMs used in the analysis
>>>>> showed not
>>>>> only to be orthogonal (I post-verified it by calculating a VIF, in
>>>>> case
>>>>> of), but also to add each a significant fraction to the explained
>>>>> variation
>>>>> of the dependent variable (MEMs were selected by the forward.sel
>>>>> fonction
>>>>> available in the packfor package).
>>>>>
>>>>> Could anybody tell me if I am doing something wrong or there is a
>>>>> real bug
>>>>> here ?
>>>>
>>>> I'll try.
>>>>
>>>> This is something that really may be a problem in vegan, and may
>>>> change in future versions. The problem here is (probably) that the
>>>> number of your PCNMs is higher than min(nrow(x), ncol(y)) in your
>>>> dependent data. There really are two kind of ranks in rda/cca etc. in
>>>> vegan: the rank of the dependent community data, and the rank of the
>>>> constraints. It seems that in your case of spatial filtering the rank
>>>> of the MEM's is higher than the rank of the dependent matrix. The
>>>> current practice in vegan is the take the rank of the solution as the
>>>> min of these two ranks, and the extra constraints are aliased.
>>>> However, these extra constraints are taken into account, but not
>>>> reported in all cases (the aliasing is explicit: there is code to
>>>> turn
>>>> down displaying the results that were calculated). This will change
>>>> in
>>>> the future, but I'm not yet quite sure how this should be changed --
>>>> this needs research, and discussion among developers and other
>>>> interested parties. However, the main results will not change. Things
>>>> that will change are the display of the same results, and some
>>>> derived
>>>> functions (predict, calibrate.cca and perhaps some others, but I
>>>> don't
>>>> know yet which). This is a change that has wide reaching effects, and
>>>> therefore you cannot expect this before autumn -- or "fall" on that
>>>> side of the pond).
>>>>
>>>> Vegan is not yet ready for the full blown spatial analysis, but we
>>>> are
>>>> working for that (and here "we" is really plural and implies several
>>>> persons). If you need detailed advice on the possible consequences of
>>>> the upcoming change, you may directly contact me or us at the
>>>> http://vegan.r-forge.r-project.org/.
>>>>
>>>> Best wishes, Jari Oksanen
>>>>
>>>> _______________________________________________
>>>> R-sig-ecology mailing list
>>>> R-sig-ecology at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>>
>