[R-sig-eco] Standardising and transformation of explanatory/independent/predictor variables for multiple regression analysis

Seceek seceek at gmail.com
Thu Sep 4 09:40:29 CEST 2014


jjyhxs

宋坤

> On 4 Sep 2014, at 7:20, Scott Foster <scott.foster at csiro.au> wrote:
> 
> Dear Sam,
> 
> I hear your concern and I sympathise. The reason for the conflicting advice, in my opinion, is partly historical and partly due to academic heredity. When people first started doing statistical analyses, they didn't have computers and all calculations had to be done by hand. This, coupled with a statistical theory in its infancy, limited the choice of analysis methods. The result was the pragmatic approach of altering-your-data-to-fit-the-method. There are still, of course, some good reasons to do this, but only sometimes.
> 
> Now to answer your questions. Standardisation of covariates doesn't have inferential benefits. That is, the model you fit will be the same regardless: standardising is just a linear change of units, so the fitted values are unchanged and only the coefficients are re-expressed on the new scale. If you transform your covariates by a non-linear transformation, however, the model will change. The reason for standardising is to avoid computational issues (like numerical underflow and overflow), and some believe it makes it easier to place priors on the coefficients in a Bayesian analysis. The reason for transforming is quite different: it is done when you believe that the scale on which the covariate acts is different from the scale on which it was measured. When fitting smooths (GAM(M)s), the scale shouldn't matter so much anyway, but there will still be some dependence through the location of knots and the distance between points in covariate space.
> 
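To make the distinction concrete, here is a minimal sketch. It uses plain Python only so that it runs without R. In R, `scale()` performs the same centring and scaling. The sketch fits a one-covariate least-squares line to made-up data in three ways: on the raw covariate, on the standardised covariate, and on its log. The helper names `ols_fit`, `fitted` and `standardise` are hypothetical and were written for this illustration only.

```python
import math
import statistics

def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def fitted(x, coef):
    """Fitted values a + b*x for each covariate value."""
    a, b = coef
    return [a + b * xi for xi in x]

def standardise(x):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [(xi - m) / s for xi in x]

# Hypothetical toy data with a right-skewed covariate.
x = [1.0, 2.0, 3.0, 5.0, 8.0, 30.0]
y = [2.1, 2.9, 4.2, 5.8, 9.1, 20.5]

raw = ols_fit(x, y)
z = standardise(x)
std = ols_fit(z, y)
logx = [math.log(xi) for xi in x]
logged = ols_fit(logx, y)

fit_raw = fitted(x, raw)
fit_std = fitted(z, std)
fit_log = fitted(logx, logged)

# Standardising is an affine change of units: the fitted values are identical
# and the slope is simply rescaled by the standard deviation of x.
same_fit = all(math.isclose(a, b) for a, b in zip(fit_raw, fit_std))
print(same_fit)                                              # True
print(math.isclose(std[1], raw[1] * statistics.stdev(x)))    # True

# A non-linear (log) transformation gives a genuinely different model.
print(not all(math.isclose(a, b) for a, b in zip(fit_raw, fit_log)))  # True
```

Only the non-linear transformation changes the fitted values. The standardised fit is the same model, with the slope expressed per standard deviation of x. This holds for an ordinary least-squares fit. In a Bayesian fit with a fixed prior on the slope, the same prior means different things on the two scales, which is one reason standardising is popular there.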
> Observations with outlying covariate values are likely to have high leverage (they can have an excessive amount of influence on the result of the analysis). Some would argue that you should transform these covariates to account for them. I would only transform if I thought the scale was wrong, or if there were other (larger) issues with the data/analysis. In preference, I would try to do an analysis that reduces the influence of these covariate values. The extreme case is to remove that observation altogether (this amounts to assuming that the observation actually comes from a different sampling frame than the one you are interested in). A less extreme approach would be to down-weight the observation, or to use the bootstrap or resistant/robust methods. These are just suggestions that I'm not overly familiar with. (I have used them before, but I need to look them up each time.)
> 
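The leverage point can be illustrated with another minimal sketch in plain Python on made-up data. For a single covariate with an intercept, the hat (leverage) value of observation i is h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2. In R, `hatvalues()` returns these values for a fitted model. The sketch compares the leverage of an extreme covariate value before and after a log transform. It then shows down-weighting via weighted least squares, one of the options mentioned above. The function names `leverage` and `wls_fit`, and the intermediate weight of 0.2, are illustrative assumptions.

```python
import math
import statistics

def leverage(x):
    """Hat values for a simple linear regression with an intercept:
    h_i = 1/n + (x_i - xbar)**2 / sum_j (x_j - xbar)**2."""
    n = len(x)
    xbar = statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

def wls_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical toy data: the last covariate value lies far from the rest.
x = [1.0, 2.0, 3.0, 5.0, 8.0, 30.0]
y = [2.1, 2.9, 4.2, 5.8, 9.1, 20.5]

h = leverage(x)
print(max(range(len(x)), key=lambda i: h[i]))  # 5: the outlying covariate value
print(round(sum(h), 6))  # 2.0: hat values sum to the number of coefficients

# A log transform pulls the extreme value in and lowers its leverage.
logx = [math.log(xi) for xi in x]
print(max(leverage(logx)) < max(h))  # True

# Down-weighting instead of transforming: weight 1 is ordinary least squares,
# an intermediate weight (0.2, chosen arbitrarily) reduces the point's pull,
# and weight 0 is the same as removing the observation altogether.
for w_last in (1.0, 0.2, 0.0):
    weights = [1.0] * 5 + [w_last]
    print(w_last, wls_fit(x, y, weights))
```

Weighted least squares is used here only because it is the simplest form of down-weighting. The bootstrap and resistant/robust methods are different techniques and are not shown.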
> I hope that this helps,
> 
> Scott
> 
> 
> 
>> On 04/09/14 03:34, Samantha Cox wrote:
>> Dear R-sig-ecology,
>> 
>> I have spent some time trawling the internet and seem to have come across slightly conflicting advice regarding the standardisation and transformation of variables prior to multiple regression analysis (e.g. LM, LME/GLS, GLM, GLMM, GAM, GAMM). I searched the archives here and I don't think this is a repeat, but I apologise if it is.
>> 
>> 
>> 1. I understand that standardisation (subtract the mean and divide by the standard deviation) is important within a Bayesian environment and when using programs such as rjags. However, within frequentist packages (e.g. lme4, MASS etc.), under what (if any) circumstances is it necessary?
>> 
>> 
>> 
>> 2. Are transformations (e.g. log, sqrt etc.) necessary for non-normal (highly skewed) explanatory variables, or where extreme values/outliers are observed? Some literature says this is necessary, others say it is not. Is the current consensus that transformations are generally not required on predictor/explanatory variables?
>> 
>> 
>> Thank you
>> 
>> Sam
>> 
>> 
>> 
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> -- 
> Scott Foster
> CSIRO
> E scott.foster at csiro.au T +61 3 6232 5178
> Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
> Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
> www.csiro.au
> 


