[R-sig-ME] back log transformation

Fri Mar 25 19:53:42 CET 2011

On 03/25/2011 01:43 PM, espesser wrote:
> 
> Thank you very much for your answers, Ben.
> 
> First, here is the reference to the thread on back log-transformation,
> initiated by Christina Bogner:
> 
> http://finzi.psych.upenn.edu/R-sig-mixed-models/2009q1/002066.html
> 
> If I have understood correctly, computing just the exp() without
> adding the variances, as in:
> 
> TIME = exp( intercept + 6*LONG + ACCO1)
> 
> gives (approximately?) the estimated median of TIME.

  I think exactly, but I'm hedging.

> I believed that I got the geometric mean of TIME for the conditions
> (long == 6, acco == "1"), at least for a non-mixed linear model.

  I believe that the (expected) geometric mean and median of the
log-normal are the same.

  A lazy numerical experiment:

> r <- rlnorm(10000,meanlog=1,sdlog=2)
> mean(r)
[1] 19.4728
> median(r)
[1] 2.700373
> exp(mean(log(r)))
[1] 2.808193

  This gets even closer if you use n=100000.
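
  A seeded variant of the same experiment, as a sketch, compares the sample values with the theoretical ones. The seed and the names theo_median and theo_mean are illustrative additions; exp(meanlog) and exp(meanlog + sdlog^2/2) are the standard median and mean of a log-normal.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
meanlog <- 1
sdlog   <- 2
r <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)

# Theoretical values for a log-normal distribution
theo_median <- exp(meanlog)                # also the geometric mean
theo_mean   <- exp(meanlog + sdlog^2 / 2)  # arithmetic mean

# Sample median and geometric mean should both be near theo_median
c(median = median(r), geo_mean = exp(mean(log(r))), theory = theo_median)

# The sample mean is far noisier because of the heavy right tail
c(mean = mean(r), theory = theo_mean)
```

  The arithmetic mean sits far above the median here because sdlog is large; with a small log-scale variance the two would nearly coincide.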

> I failed to find a clear explanation of this, so I would
> appreciate a reference on the topic.
> 
> Thank you again for your help

   I may be a bit biased by being part of the "in crowd" here, but I
think Doug Bates's comments in the previous thread are very sensible --
do you *really* need to back-transform at all?  Or can you just quote
results on the log scale?

> 
> R
> 
> 
> On 24/03/2011 20:54, Ben Bolker wrote:
>> On 03/24/2011 06:37 AM, espesser wrote:
>>>   Dear all,
>>>
>>> This subject has been discussed previously, but I am not sure I am
>>> using the variances the right way.
>>    Can you give a reference to the previous discussion please?
>>
>>
>>> Here is the  summary of my lmer model :
>>>
>>> Linear mixed model fit by REML
>>>
>>> Formula: log(TIME) ~ LONG + ACCO  + (1 | SUJET)
>>>
>>>     Data: dssPUISS
>>>     AIC   BIC logLik deviance REMLdev
>>>   899.6 934.1 -442.8    856.7   885.6
>>> Random effects:
>>>   Groups   Name        Variance Std.Dev.
>>>   SUJET    (Intercept) 0.019090 0.13817
>>>   Residual             0.130297 0.36097
>>> Number of obs: 1018, groups: SUJET, 24
>>>
>>> Fixed effects:
>>>               Estimate Std. Error t value
>>> (Intercept)   5.77423    0.04462  129.42
>>> LONG          0.02883    0.01129    2.55
>>> ACCO1        -0.05722    0.02272   -2.52
>>>
>>>
>>> LONG is continuous.
>>> ACCO is a two-level factor.
>>>
>>> I would proceed as follows:
>>>
>>> 1) To compute TIME at this specific point :
>>>
>>> sujet== "s3"
>>> long == 6
>>> acco == "1"
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>>              +  estimate_of_s3_intercept +  0.5*var(Residual)  )
>>>
>>> with var(Residual) == 0.130297
>>>
>>> Is this correct?
>>
>>     Is the 0.5*var(Residual) to get the mean (rather than the median) of
>> TIME on the original scale?  It seems reasonable but I wonder if you
>> could simplify your life a little bit by predicting the median rather
>> than the mean ...
>>
>>> 2) I am mainly interested in back-transforming the fixed effects, at the
>>> same point.
>>>
>>> 2.1) I would use:
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>>              + 0.5*var(SUJET) +0.5*var(Residual) )
>>>
>>> with var(SUJET) == 0.019090
>>    Don't quite know what you mean here.  It seems you're thinking about
>> estimating a marginal mean (unknown subject) rather than a conditional
>> mean.  Your approach seems reasonable but I wouldn't want to swear it
>> was right ...
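
The arithmetic in 2.1 can be sketched with the estimates from the summary above. This assumes the subject intercept and the residual are independent and normally distributed on the log scale; the variable names are illustrative and not from the original post.

```r
# Fixed-effect estimates and variance components copied from the summary
b0        <- 5.77423   # (Intercept)
b_long    <- 0.02883   # LONG
b_acco    <- -0.05722  # ACCO1
var_sujet <- 0.019090  # SUJET (Intercept) variance
var_resid <- 0.130297  # Residual variance

# Linear predictor on the log scale at long == 6, acco == "1"
eta <- b0 + 6 * b_long + b_acco

# Median (and geometric mean) of TIME: no variance term needed
median_time <- exp(eta)

# Marginal mean of TIME for an unknown subject, under the assumptions above
marginal_mean <- exp(eta + 0.5 * (var_sujet + var_resid))

c(median = median_time, marginal_mean = marginal_mean)
```

The two numbers differ only by the factor exp(0.5 * (var_sujet + var_resid)), which is why quoting the median avoids the question of which variances to add.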
>>
>>>
>>> 2.2) Suppose there were a second random intercept (say b) in my model;
>>> I would use:
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>>                  + 0.5*var(SUJET) + 0.5*var(b) +  0.5*var(Residual)  )
>>>
>>> Are these two expressions correct?
>>>
>>    This gets stickier.  The second 'random intercept' is from a second
>> random effect grouping factor?  If the random effects are independent,
>> this seems plausible -- otherwise the variance of the sum will not be
>> equal to the sum of the variances ...
>>
>>
>>> 2.3)
>>> Suppose there were a random slope in the model, something like:
>>>
>>> log(TIME) ~ LONG + ACCO  + (LONG | SUJET)
>>>
>>> How can I get TIME on the original scale?
>>    If you want the marginal mean (i.e., something analogous to what you
>> are doing above), then you need to calculate the variance -- e.g. if the
>> value  is  (a+b*x + e_a + e_b*x + e_i) where e_a, e_b are random
>> intercept and slope and e_i is residual error, then **if** they were
>> all independent the variance would be var_a + var_b*x^2 + var_e.
>> However, the random intercept and slope (e_a, e_b) are generally
>> correlated, so I believe it would be
>> var_a + var_b*x^2 + 2*cov(e_a,e_b)*x + var_e.
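
A minimal sketch of that variance calculation, written both as a quadratic form and term by term. All numbers below are hypothetical (they are NOT estimates from the model in this thread), and the names Sigma, z and total_var are illustrative only.

```r
# Hypothetical variance components (NOT from the fitted model above)
var_a  <- 0.02    # random-intercept variance
var_b  <- 0.001   # random-slope variance
cov_ab <- -0.002  # intercept-slope covariance
var_e  <- 0.13    # residual variance
x      <- 6       # value of LONG at which to predict

# Quadratic form: z' Sigma z + var_e, with z = (1, x)
Sigma <- matrix(c(var_a, cov_ab,
                  cov_ab, var_b), nrow = 2, byrow = TRUE)
z <- c(1, x)
total_var <- drop(t(z) %*% Sigma %*% z) + var_e

# Same quantity written out term by term
total_var_expanded <- var_a + var_b * x^2 + 2 * cov_ab * x + var_e

c(quadratic_form = total_var, expanded = total_var_expanded)
```

Under the log-normal assumption, the marginal mean on the original scale would then be exp(eta + total_var / 2), where eta is the fixed-effect prediction on the log scale.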
>>>
>>> 3) Related question :
>>>
>>> To extract the stddev of the SUJET random intercept, I use:
>>>
>>> attr(VarCorr(MyModel.lmer)$SUJET,"stddev")
>>>
>>    Yes.
>>
>>    As mentioned above, I think your life would be a bit easier if you
>> just decided that you wanted the median (which is invariant under
>> transformation) rather than the mean on the back-transformed scale ...
> 
>