[R-sig-eco] Data transformation prior to RDA

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Apr 20 10:18:16 CEST 2010


On Tue, 2010-04-20 at 11:48 +1200, Etienne Laliberté wrote:
> Are your variables species abundances, or other types of descriptors? If
> the former, standardization by column may not be ideal.

I think this needs a little clarification - or a different take on it.
Standardising the species (response) data in PCA/RDA results in each
species having unit variance and hence contributing an equal amount to
the "inertia" measure. This tends to give a more balanced ordination of
abundance data.

In unstandardised PCA/RDA, abundant species with high variance tend to
dominate the resulting ordination.
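
To make the point about inertia concrete, here is a minimal sketch in plain Python. The abundance table is made up for illustration (it is not the dataset posted in this thread), and the helper names are hypothetical. It takes total inertia to be the sum of the species variances, as in a covariance-based PCA, and compares each species' share before and after standardising to unit variance.

```python
from statistics import mean, stdev, variance

# Hypothetical abundances: rows are sites, columns are three species.
# Species 1 is abundant and highly variable; species 2 and 3 are rare.
abundances = [
    [120, 4, 1],
    [300, 6, 0],
    [80, 3, 2],
    [450, 8, 1],
    [200, 5, 3],
]


def columns(rows):
    """Return the table as a list of columns (one list per species)."""
    return [list(col) for col in zip(*rows)]


def inertia_shares(cols):
    """Share of total inertia (sum of sample variances) due to each column."""
    variances = [variance(col) for col in cols]
    total = sum(variances)
    return [v / total for v in variances]


def standardise(col):
    """Centre a column and scale it to unit sample variance."""
    m, s = mean(col), stdev(col)
    return [(x - m) / s for x in col]


raw_cols = columns(abundances)
raw_shares = inertia_shares(raw_cols)
std_shares = inertia_shares([standardise(c) for c in raw_cols])

print("raw shares:         ", [round(s, 4) for s in raw_shares])
print("standardised shares:", [round(s, 4) for s in std_shares])
```

With data like these, the abundant species accounts for almost all of the unstandardised inertia, whereas after standardisation every species contributes the same share. In vegan, standardising the response data in this way corresponds to `scale = TRUE` in `rda()`.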

Standardisation is called for when response data are measured in
different units (i.e. when they are not species abundances), but it may
also be desirable for species abundances, and in my experience it is
quite often warranted.

G

>  Transformations
> such as the Hellinger, as suggested by Michael, were developed for
> species abundances data (Legendre & Gallagher 2001).
> 
> There are many ways to transform variables to normalize them, if that's
> what you're after; see chapter 1 of Legendre & Legendre (1998). The
> Box-Cox method is possibly the closest thing to what you're asking, i.e.
> the "best possible transformation for each of the variables". But I'm
> convinced there are as many opinions on the subject as there are
> different methods.
> 
> Cheers
> 
> Etienne
> 
> On Monday, 19 April 2010 at 20:02 -0300, Devoto Mariano wrote:
> > Dear all,
> > I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's
> > (1998) tips to prepare the data prior to the analysis, and I'm hoping to do
> > the analysis using package 'vegan'.
> > I've already centered and standardized my explanatory and response
> > variables, but I'm having trouble deciding whether or not (and how) data
> > should be transformed "to linearise the relationships and make the
> > distributions more symmetric". Is there a way to find the best possible
> > transformation for each variable while also taking into account how
> > linearly it relates to the others? Please tell me if I'm not even asking
> > the right question here...
> > Here's my dataset. The first 3 columns are my response variables. All the others
> > are explanatory. I know this is a rather basic query, but any tips will be
> > greatly appreciated.
> > 
> > -0.49350555 -0.37364383  0.70566360 -1.1180986 -1.14255167 -1.30234943 -1.0812858 -0.4910362
> > -1.02769104  0.21678178  1.11781073 -1.1123319 -0.88277150 -0.80445588 -1.0638291  0.3241891
> > -0.64335588 -2.07868376 -1.36782590 -1.0585453 -1.02709382 -1.07710897 -0.2760976  1.4695121
> >  0.25799225  0.82044015  1.02481726 -1.1114373 -0.94050043 -1.23089531 -0.7064526 -0.5012921
> >  0.56048832 -0.29655712 -0.07148828 -1.1099933 -1.17141614  1.54301771 -1.0921962 -1.9517655
> > -0.36443725 -1.49241963 -0.23840793 -1.1180554 -1.14255167 -1.24049362 -1.0856499 -0.6977804
> > -1.97959936  1.30035099 -1.18114614  1.0885061 -0.59412687 -0.21062037  1.7890870  0.5018224
> > -0.24966043 -0.66228200  0.69101500 -0.8697510 -0.88277150 -0.83963955  0.1330428  1.3450534
> >  0.24720930  0.35162548 -1.34252630  1.6571129 -0.59412687 -0.13708733  2.0090270  0.7553207
> > -0.35385550  0.99058254 -1.14295716 -0.6801336 -0.76731365 -0.93148980  1.9120456  1.4084094
> > -0.92880313  1.14039444  1.38922106 -0.9008538 -0.79617811 -0.96178699  0.6512872  1.2365340
> > -0.24431565 -0.20947362  0.76084722 -0.8978493 -0.59412687 -0.56565825 -0.4639991 -0.2045137
> > -0.60428104  1.05108295 -0.68704030  1.1833813  0.41612935 -0.07054391  1.2816664  0.6181682
> >  0.63837128  0.06672464  0.32041910  0.4154816  0.12748471  0.46057549 -0.2488216  0.3867322
> >  0.67144677  0.66889622  1.83857364  0.8375587  0.27180703  0.82551787 -0.2488216 -0.5987399
> >  2.53611774  1.45517653 -0.22337307  0.9253861  0.06975579 -0.22307224  1.6332240  0.5146235
> > -0.13273765 -0.55628531  0.55154280 -0.2721408  0.99341861 -0.14553291 -0.1669935  0.9976660
> > -0.02043306 -1.52670601 -2.08967318  1.7138916  2.14799715  2.18006143 -0.6034099 -0.9383742
> >  0.80218610 -0.58481301  0.18945796  0.9761855  1.57070788  1.90295452 -0.6579619 -1.3578423
> >  1.32726744  0.64941495 -0.42596631  0.7975236  0.87796076  0.63986198 -0.0760734 -1.0445683
> > -1.53219503  0.57349823  1.03668089  0.5040093  1.05114754  0.83815684 -0.3852017 -0.8672218
> >  0.67016035  0.81036993  0.14519361  0.5065215  1.05114754  0.49360195 -0.1124414 -0.7921778
> >  1.53517131 -0.85469204 -0.12003248  0.3702800  1.02228308  0.66797133 -0.3185269 -1.1538661
> > -0.67154028 -1.45978251 -0.88080583 -0.7266479  0.93568969  0.18901542 -0.8216180  1.0411473
> > 
> > Thanks!
> > 
> > Best wishes,
> > 
> > Mariano
> > 
> > --------------------------
> > 
> > Mariano Devoto
> > School of Biological Sciences
> > University of Bristol
> > Woodland Road
> > 
> > Bristol, UK
> > BS8 1UG
> > Tel. +44 (0) 1179545960 (internal 45960)
> > web: http://agro.uba.ar/~mdevoto
> > 
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


