[R-sig-eco] Removing non significant response variable in rda analysis with forward selection?

Etienne Laliberté etiennelaliberte at gmail.com
Fri Jul 30 00:11:38 CEST 2010

Dear Amélie,

To me, the approach you're describing sounds like you're trying to
shoehorn your data to fit your predictions, which can be dangerous at
best and dishonest at worst.

My understanding is that your explanatory variable is a factor with
different groups. If you're interested in seeing which species best
discriminate between these a priori specified groups, then you may want
to use a canonical discriminant analysis, for example canonical analysis
of principal coordinates (CAP). Have a look at:

Anderson, M. J., and T. J. Willis. 2003. Canonical analysis of principal
coordinates: a useful method of constrained ordination for ecology.
Ecology 84:511-525.

I've used this only in PRIMER v6 / PERMANOVA, not in R. However, I
believe it is implemented in:


but Jari and others will be more helpful there.

A somewhat related (but focusing on a different question) approach could
be the IndVal method described in:

Dufrêne, M., and P. Legendre. 1997. Species assemblages and indicator
species: the need for a flexible asymmetrical approach. Ecological
Monographs 67:345-366.

where you could look at which species are the best "indicators" that
characterize different groups of sites.
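
For what it's worth, the IndVal index itself is simple to compute: for
each species and group, it is the product of specificity (the species'
mean abundance in the group divided by the sum of its mean abundances
over all groups) and fidelity (the proportion of sites in the group
where the species is present), times 100. Below is a minimal sketch in
Python, not the authors' code. The function name `indval`, the species
names and the toy data are all hypothetical, and the permutation test
normally used to assess significance is left out.

```python
from collections import defaultdict


def indval(abundances, groups):
    """Return {species: {group: IndVal in percent}}.

    abundances: dict mapping species name -> list of abundances, one per site
    groups: list of group labels, one per site, in the same site order
    """
    # Collect the site indices belonging to each group.
    sites_by_group = defaultdict(list)
    for site, label in enumerate(groups):
        sites_by_group[label].append(site)

    result = {}
    for species, counts in abundances.items():
        # Mean abundance of this species within each group.
        means = {
            g: sum(counts[s] for s in sites) / len(sites)
            for g, sites in sites_by_group.items()
        }
        total = sum(means.values())
        result[species] = {}
        for g, sites in sites_by_group.items():
            # Specificity: share of the summed group means found in group g.
            specificity = means[g] / total if total > 0 else 0.0
            # Fidelity: proportion of sites in group g where the species occurs.
            fidelity = sum(1 for s in sites if counts[s] > 0) / len(sites)
            result[species][g] = 100.0 * specificity * fidelity
    return result


# Hypothetical toy data: six sites, two site types, three species.
groups = ["forest", "forest", "forest", "field", "field", "field"]
abundances = {
    "sp_a": [5, 3, 4, 0, 0, 0],  # abundant in every forest site, absent elsewhere
    "sp_b": [1, 0, 2, 1, 2, 0],  # spread evenly over both site types
    "sp_c": [0, 0, 0, 0, 6, 0],  # restricted to fields, but found at one site only
}

scores = indval(abundances, groups)
for species, by_group in scores.items():
    print(species, {g: round(v, 1) for g, v in by_group.items()})
```

In this toy example only sp_a reaches the maximum value of 100 (for
forest): a species needs to be both restricted to a group and present
at all of its sites to be a perfect indicator.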

Hope that helps,


On Thursday 29 July 2010 at 08:00 -0700, amelie_can wrote:
> Hello all, 
> My problem is somewhat similar to the one Vit Syrovatka posted on July 23rd,
> titled “Species fit in ordination”.
> In my project, I am doing an rda between species abundances (response
> variables – about 130 species) and type of sites (explanatory/environmental
> variable – one variable). When I finished my analysis and plotted it, I had
> a lot of species present, and I suspected that several of them did not
> contribute significantly to the analysis.
> Consequently, I decided to do a forward selection analysis. Usually,
> forward selection is used to remove environmental variables that
> don’t relate well to the response variables. But in my case, I only have
> one environmental variable, so I swapped the roles of my variables for the
> forward selection: the types of sites became the response variable and the
> species abundances became the explanatory variables. So, basically, the
> forward selection shows me which species significantly explain the types of
> sites found. I then reran my rda and found that including only the 20
> species that were significant in the forward selection explained as much of
> the variation along my rda axes as including all of my species.
> Is this correct? My supervisor raised a question about the fact that I used
> my response variables in the forward selection instead of environmental
> variables.
> If not, how can we remove species that are not significant?
> I thought of trying to find which species are correlated with one another. I
> know one can use the cor.test function or the vif function, but this is
> problematic for me, as we can only check two species per analysis. Since I
> have about 130 species, checking all of those pairs by hand would take
> too long. I also thought about doing a partial rda, one species at
> a time, to see its significance in the model, but again, that seemed too long.
> Thank you all for your time, 
> Amelie D’Astous
> Laval university
> Quebec

Etienne Laliberté
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
