[R-sig-eco] Standardising and transformation of explanatory/independent/predictor variables for multiple regression analysis

Scott Foster scott.foster at csiro.au
Thu Sep 4 01:20:39 CEST 2014


Dear Sam,

I hear your concern and I sympathise.  The reason for the conflicting advice, in my opinion, is partly historical and partly due to academic
heredity.  When people first started doing statistical analyses, they didn't have computers and all calculations had to be done by hand.  This,
coupled with a statistical theory in its infancy, limited the choice of analysis methods.  The result was the pragmatic approach of
altering-your-data-to-fit-the-method.  There are still, of course, some good reasons to do this, but only sometimes.

Now to answer your questions.  Standardisation of covariates doesn't have inferential benefits.  That is, the model you fit will be the same
regardless (standardising is a linear rescaling, so the coefficients change units but the fit does not).  If you transform your covariates (by a
non-linear transformation) then the model will change.  The reason for standardising is to avoid computational issues (like numerical underflow and
overflow), and some believe it helps when placing priors in a Bayesian analysis.  The reason for transforming is quite different.  It is done when
you believe that the covariate acts on a different scale to the one on which it was measured.  When fitting smooths (GAM(M)s) the scale shouldn't
matter so much anyway, but there will still be some dependence through the location of knots and the distance between points in covariate space.
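
To make the distinction concrete, here is a minimal sketch in plain Python (not taken from any package; the data and the helper names fit_line, fitted and standardise are made up for illustration).  It fits a one-covariate least-squares line three times: on the raw covariate, on the standardised covariate, and on the log-transformed covariate.  Under these assumptions the first two give the same fitted values, while the log transform gives a different model.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def fitted(x, y):
    """Fitted values from the least-squares line of y on x."""
    a, b = fit_line(x, y)
    return [a + b * xi for xi in x]

def standardise(x):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(x)
    xbar = sum(x) / n
    sd = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return [(xi - xbar) / sd for xi in x]

# Made-up illustrative data (hypothetical, not from the thread)
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

fit_raw = fitted(x, y)
fit_std = fitted(standardise(x), y)               # linear rescaling
fit_log = fitted([math.log(xi) for xi in x], y)   # non-linear transformation

# Standardising leaves the fitted values unchanged (up to rounding error)
same_after_standardising = all(
    math.isclose(u, v, abs_tol=1e-9) for u, v in zip(fit_raw, fit_std)
)
# The log transform changes them: largest absolute difference in fitted values
changed_after_log = max(abs(u - v) for u, v in zip(fit_raw, fit_log))

print("same fit after standardising:", same_after_standardising)
print("largest change in fitted values after log:", changed_after_log)
```

The slope and intercept do change when you standardise (they are now per standard deviation and at the mean), but that is only a change of units.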

Observations with outlying covariates are likely to have high leverage (that is, they can have an excessive amount of influence on the analysis
result).  Some would argue that you should transform these covariates to account for them.  I would only transform if I thought the scale was wrong,
or there were other (larger) issues with the data/analysis.  In preference, I would try to do an analysis that reduced the influence of these
covariate values.  The extreme case is to remove that observation altogether (assuming that the observation actually comes from a different sampling
frame than the one you are interested in).  A less extreme approach would be to down-weight the observation, or to use the bootstrap or
resistant/robust methods.  (These are just suggestions, and I'm not overly familiar with them.  I have used them before, but I need to look them up
each time.)
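
As a rough illustration of leverage and of down-weighting versus removal, here is a minimal sketch in plain Python.  The data, the weight of 0.1 and the helper names hat_values and weighted_fit are assumptions made up for this example; a real analysis would more likely choose weights through a robust-regression routine than by hand.  For a one-covariate regression the leverage of observation i is 1/n + (x_i - mean(x))^2 / Sxx, so it depends only on how far the covariate value sits from the rest.

```python
def hat_values(x):
    """Leverage (hat values) for a regression of the form y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

# Made-up data: the last observation has an outlying covariate value
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 8.0]

h = hat_values(x)  # the last entry is by far the largest

# Slope using all observations with equal weight
slope_all = weighted_fit(x, y, [1.0] * 6)[1]
# Slope with the outlying observation down-weighted (0.1 is an arbitrary choice)
slope_down = weighted_fit(x, y, [1.0, 1.0, 1.0, 1.0, 1.0, 0.1])[1]
# Slope with the outlying observation removed altogether
slope_drop = weighted_fit(x[:-1], y[:-1], [1.0] * 5)[1]

print("hat values:", [round(hi, 3) for hi in h])
print("slopes (all, down-weighted, dropped):", slope_all, slope_down, slope_drop)
```

With these made-up numbers the down-weighted slope falls between the other two, which is the sense in which down-weighting is the less extreme option.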

I hope that this helps,

Scott



On 04/09/14 03:34, Samantha Cox wrote:
> Dear R-sig-ecology,
>
> I have spent some time trawling the internet, and seem to come across slightly conflicting advice regarding the standardisation and transformation of variables prior to multiple regression analysis (e.g. LM, LME/GLS, GLM, GLMM, GAM, GAMM).  I searched the archives here and I don't think this is a repeat, but I apologise if it is.
>
>
> 1.       I understand that standardisation (subtract mean and divide by standard deviation) is important within a Bayesian environment and when using programs such as Rjags.  However within frequentist packages (e.g. lme4, MASS etc) under what (if any) circumstances is it necessary?
>
>
>
> 2.       Are transformations (e.g. log, sqrt etc) necessary for non-normal (highly skewed) explanatory variables or where extreme values/outliers are observed?  Some literature says this is necessary, other literature says it is not.  Is the current consensus that transformations are generally not required on predictor/explanatory variables?
>
>
> Thank you
>
> Sam

-- 
Scott Foster
CSIRO
E scott.foster at csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au


