[R] bctrans: Box-Cox Transformation Problem

Wed Sep 22 18:03:50 CEST 2010

  Hello,

I'm currently trying to model the movement of a slope (v.obs) with a 
regression model.
The data are available at either of the following links:
http://www.sendspace.com/file/dnugwc
or
http://rapidshare.com/files/420569660/sel.day.txt

I want to use the Box-Cox transformation to normalize the response as 
well as the predictor variables.
The scatterplot matrix is produced like this:

library(zoo)
library(alr3)
load("sel.day.txt")
sel.p1<-window(sel, start=as.POSIXct("2008-04-05"), 
end=as.POSIXct("2009-04-01"))

pairs(~v.obs+ snow+ HH6.1+ Q.Enz+ pcpt+ 
qd,data=sel.p1,gap=0.4,cex.labels=1.5)

In Sheather's "A Modern Approach to Regression with R", the function 
bctrans is used to estimate lambda for each variable. I use 
family="yeo.johnson" because the data contain zeros.
This produces the following output:

2> summary(bctrans(~v.obs+ snow+ pcpt+ Q.Enz+ qd+ HH6.1, data=sel.p1, 
family="yeo.johnson"))
yeo.johnson Transformations to Multinormality

       Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
v.obs  -49.9674   5.5747       -8.9632       -9.1426
snow    -4.1130   0.3326      -12.3655      -15.3719
pcpt     0.6111   0.0811        7.5341       -4.7950
Q.Enz   -0.8584   0.0904       -9.4967      -20.5601
qd     -26.1100   2.3432      -11.1427      -11.5695
HH6.1   -6.0205   0.0023    -2653.7643    -3094.5528
                                   LRT df p.value
LR test, all lambda equal 0  549.4523  6       0
LR test, all lambda equal 1 1414.1770  6       0

What should I do with this result? I tried transforming my variables 
with the Est.Power values given in the output, rounding them more or 
less arbitrarily for a first try:
v.obs<-(sel.p1$v.obs^(-0.5)-1)/-0.5
snow<-(sel.p1$snow^(-4)-1)/-4
pcpt<-(sel.p1$pcpt^(0.5)-1)/0.5
Q.Enz<-(sel.p1$Q.Enz^(-0.9)-1)/-0.9
qd<-(sel.p1$qd^(-26)-1)/-26
HH6.1<-(sel.p1$HH6.1^(-6)-1)/-6
trans<-merge(v.obs,qd,pcpt,snow,HH6.1,Q.Enz)
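
Note that the formulas above use the Box-Cox form (x^lambda - 1)/lambda, 
whereas the powers were estimated for the Yeo-Johnson family, which adds 1 
to non-negative values before applying the power. Below is a minimal sketch 
of that family in base R, assuming the standard Yeo-Johnson definition. The 
function name yeo_johnson and the example vector x are made up for 
illustration and are not part of alr3.

```r
# Sketch of the Yeo-Johnson transformation (hypothetical helper, not from alr3).
# y: numeric vector, lambda: a single power, eps: tolerance for the log cases.
yeo_johnson <- function(y, lambda, eps = 1e-8) {
  out <- rep(NA_real_, length(y))
  pos <- !is.na(y) & y >= 0
  neg <- !is.na(y) & y < 0
  # Non-negative values: Box-Cox applied to (y + 1)
  if (abs(lambda) > eps) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # Negative values: mirrored Box-Cox with power (2 - lambda)
  if (abs(lambda - 2) > eps) {
    out[neg] <- -((1 - y[neg])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[neg] <- -log1p(-y[neg])
  }
  out
}

x <- c(0, 0.5, 2, 10)      # made-up values that include a zero
yeo_johnson(x, -4)         # finite everywhere; zero is mapped to zero
(x^(-4) - 1) / -4          # Box-Cox form: -Inf at x = 0
```

With a negative power, the Box-Cox form evaluates 0^lambda as Inf, which is 
where the -Inf values come from. The Yeo-Johnson form avoids this for zeros, 
although it does not by itself explain the very large estimated powers.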

The manual Box-Cox transformation gives me a lot of -Inf values, which I 
don't like much. I thought about rescaling the data first, e.g. 
v.obs<-v.obs*10^5. But that doesn't seem like the right way, and when I 
do that I often get errors from bctrans:

2> summary(bctrans(~ v.obs+ snow+ pcpt+ Q.Enz+ qd+ HH6.1, data=sel.p1, 
family="yeo.johnson"))
Error in optim(start, neg.kernel.profile.logL, hessian = TRUE, method = 
"L-BFGS-B",  :
   L-BFGS-B needs finite values of 'fn'

These errors also occur when I try a formula without the response 
variable:

2> summary(bctrans(~ snow+ pcpt+ Q.Enz+ qd+ HH6.1, data=sel.p1, 
family="yeo.johnson"))
Error in optim(start, neg.kernel.profile.logL, hessian = TRUE, method = 
"L-BFGS-B",  :
   L-BFGS-B needs finite values of 'fn'

Does anybody have an idea how to handle the data to get sensible 
parameters for the transformation?

Thanks a lot

Axel Kasparek
TU München