[R] Box-Cox Transformation: Drastic differences when varying added constants

Bill Pikounis billpikounis at gmail.com
Mon May 17 18:23:07 CEST 2010

Hi Holger,
I would also highly recommend that you look at the ?boxcox and ?logtrans
functions in the MASS package. There is also a very illuminating,
concise discussion of their use on pages 170-172 of

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics
with S. Fourth edition. Springer.

with an example.
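
Bill's suggestion can be sketched as follows. This assumes the MASS package is installed and reuses Holger's mixture with an arbitrary seed. logtrans() profiles the likelihood of log(y + alpha) over the added constant alpha, so the constant is estimated from the data instead of being picked by hand. boxcox() then profiles lambda for the shifted variable. The grids are illustrative choices, and chaining the two steps like this is only an illustration, not a joint estimate of shift and power.

```r
library(MASS)  # provides boxcox() and logtrans()

set.seed(1)
# Holger's mixture: right-skewed, with some negative values
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))

# Intercept-only fit; y = TRUE stores the response, which both functions use
fit <- lm(X ~ 1, y = TRUE)

# Profile the likelihood of log(X + alpha) over a grid of alpha.
# Every alpha in the grid exceeds -min(X), so X + alpha stays positive.
alpha_grid <- seq(0.01, 10, by = 0.01) - min(X)
lt <- logtrans(fit, alpha = alpha_grid, plotit = FALSE)
alpha_hat <- lt$x[which.max(lt$y)]

# boxcox() requires a positive response: profile lambda after the estimated shift
Xpos <- X + alpha_hat
bc <- boxcox(lm(Xpos ~ 1, y = TRUE), lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

c(alpha = alpha_hat, lambda = lambda_hat)
```

With another seed or grid the estimates will move. The point is that logtrans() treats the constant as something to fit, rather than as an arbitrary preprocessing step.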

Hope that helps,
Bill
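
Peter's first point below can be checked by simulation. The sketch builds x so that 1/x is exactly Normal, which means lambda = -1 is the right power for x. It then compares the boxcox() profile-likelihood estimates of lambda for x and for x + 0.2. It assumes MASS is installed. lambda_hat() is a hypothetical helper that simply returns the best grid value. The seed, sample size and shift are arbitrary.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# z ~ N(5, 1) stays positive in practice; the check makes sure of it.
z <- rnorm(2000, mean = 5, sd = 1)
stopifnot(all(z > 0))
x <- 1 / z  # 1/x = z is exactly Normal, so lambda = -1 suits x

# Hypothetical helper: grid value of lambda that maximises the profile log-likelihood
lambda_hat <- function(v) {
  bc <- boxcox(lm(v ~ 1, y = TRUE), lambda = seq(-6, 6, by = 0.01), plotit = FALSE)
  bc$x[which.max(bc$y)]
}

lambda_hat(x)        # should be close to -1
lambda_hat(x + 0.2)  # clearly more negative: 1/(x + 0.2) is not Normal
```

Adding a constant shrinks the relative spread of the data, so a more negative power is needed to remove the same skewness. That matches the direction Holger reported for his larger constant.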

On Sun, May 16, 2010 at 13:01, Peter Ehlers <ehlers at ucalgary.ca> wrote:
> On 2010-05-16 6:22, Holger Steinmetz wrote:
>>
>> Dear experts,
>>
>> I have been trying to learn about the Box-Cox transformation and found
>> the following:
>>
>> When I had to add a constant to make all values of the original variable
>> positive, I found that the lambda estimates (from the box.cox.powers
>> function) differed dramatically depending on the specific constant chosen.
>
> Let's say that x is such that 1/x has a Normal distribution,
> i.e. lambda = -1.
> Then y = (1/x) + b also has a Normal distribution.
> But you're expecting 1/(x+b) to also have a Normal distribution.
>
>>
>> In addition, the correlation between the transformed variable and the
>> original was not 1 (as I think it should be if the transformed variable
>> is to be used meaningfully) but much lower.
>
> Again, your expectation is faulty. The relationship between the
> original and transformed variables is not linear (otherwise,
> why do the transformation?), but cor() computes the Pearson
> correlation coefficient by default. Try method='spearman'.
> Better yet, plot the transformed variables vs the original
> variable for further enlightenment.
>
>  -Peter Ehlers
>
>>
>> With larger added constants (and a right-skewed variable) the lambda
>> estimate was even negative, and the correlation between the transformed
>> variable and the original variable was -.91!!?
>>
>> I guess there is something fundamental missing in my current thinking
>> about Box-Cox...
>>
>> Best,
>> Holger
>>
>>
>> P.S. Here is what I did:
>>
>> library(car)  # box.cox.powers() comes from car 1.x; later car versions use powerTransform()
>>
>> # Create a skewed variable X (mixture of two normals); no seed is set,
>> # so the lambda estimates will differ from run to run
>> x1 = rnorm(120, 0, .5)
>> x2 = rnorm(40, 2.5, 2)
>> X = c(x1, x2)
>>
>> # Add a small constant
>> Xnew1 = X + abs(min(X)) + .1
>> box.cox.powers(Xnew1)
>> Xtrans1 = Xnew1^.2682   # lambda estimate reported by box.cox.powers()
>>
>> # Add a larger constant
>> Xnew2 = X + abs(min(X)) + 1
>> box.cox.powers(Xnew2)
>> Xtrans2 = Xnew2^-.2543  # lambda estimate reported by box.cox.powers()
>>
>> #Plotting it all
>> par(mfrow=c(3,2))
>> hist(X)
>> qqnorm(X)
>> qqline(X,lty=2)
>> hist(Xtrans1)
>> qqnorm(Xtrans1)
>> qqline(Xtrans1,lty=2)
>> hist(Xtrans2)
>> qqnorm(Xtrans2)
>> qqline(Xtrans2,lty=2)
>>
>> #correlation among original and transformed variables
>> round(cor(cbind(X,Xtrans1,Xtrans2)),2)
>
> --
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
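
Peter's correlation point can be checked numerically. The sketch reuses Holger's mixture with an arbitrary seed, so the numbers will not match his. It hard-codes his reported lambda of -.2543 rather than re-estimating it. It contrasts the raw power Xnew2^lambda, which is what the original code computed, with the Box-Cox form (Xnew2^lambda - 1)/lambda. Because Xnew2^lambda = 1 + lambda * (Box-Cox value), the two Pearson correlations with X have equal size and opposite sign whenever lambda < 0. A correlation of -.91 for the raw power is therefore +.91 for the Box-Cox form. It falls short of 1 only because the relationship is nonlinear, while the Spearman correlation is 1 up to rounding.

```r
set.seed(42)
X <- c(rnorm(120, 0, .5), rnorm(40, 2.5, 2))  # Holger's mixture
Xnew2 <- X + abs(min(X)) + 1                  # his larger shift
lambda <- -.2543                              # his reported estimate, hard-coded

raw <- Xnew2^lambda                 # raw power: decreasing in X when lambda < 0
bc  <- (Xnew2^lambda - 1) / lambda  # Box-Cox form: increasing in X for any lambda != 0

round(c(pearson_raw  = cor(X, raw),
        pearson_bc   = cor(X, bc),
        spearman_bc  = cor(X, bc, method = "spearman"),
        spearman_raw = cor(X, raw, method = "spearman")), 2)
```

Plotting bc against X, as Peter suggests, shows the monotone but curved relationship directly.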