[R-sig-ME] back log transformation
Ben Bolker
bbolker at gmail.com
Fri Mar 25 19:53:42 CET 2011
On 03/25/2011 01:43 PM, espesser wrote:
>
> Thank you very much for your answers, Ben.
>
> First, here is the reference to the thread about back
> log-transformation, initiated by Christina Bogner:
>
> http://finzi.psych.upenn.edu/R-sig-mixed-models/2009q1/002066.html
>
> If I have understood correctly, computing just the exp() without the
> addition of the variances, as in:
>
> TIME = exp( intercept + 6*LONG + ACCO1)
>
> gives (approximately?) the estimated median of TIME.
I think exactly, but I'm hedging
> I believed that I was getting the geometric mean of TIME for the
> conditions (long == 6, acco == "1"),
> at least for a non-mixed linear model.
I believe that the (expected) geometric mean and the median of the
log-normal are the same: both equal exp(meanlog).
A lazy numerical experiment:
> r <- rlnorm(10000,meanlog=1,sdlog=2)
> mean(r)
[1] 19.4728
> median(r)
[1] 2.700373
> exp(mean(log(r)))
[1] 2.808193
This gets even closer if you use n=100000.
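
For reference, the closed forms behind this experiment can be checked
directly. For a log-normal with parameters meanlog = mu and sdlog = sigma,
the median and the geometric mean are both exp(mu), while the arithmetic
mean is exp(mu + sigma^2/2). A minimal sketch, using the same parameters as
above; the seed and the sample size are arbitrary choices, not part of the
original experiment:

```r
set.seed(101)            # arbitrary seed, only for reproducibility
mu <- 1; sigma <- 2      # same meanlog/sdlog as in the experiment above
n <- 1e6                 # arbitrary (large) sample size
r <- rlnorm(n, meanlog = mu, sdlog = sigma)

# theoretical values for the log-normal
theo_median <- exp(mu)                 # also the geometric mean
theo_mean   <- exp(mu + sigma^2 / 2)   # arithmetic mean

# sample estimates
est <- c(median  = median(r),
         geomean = exp(mean(log(r))),
         mean    = mean(r))
print(rbind(estimate = est,
            theory   = c(theo_median, theo_median, theo_mean)))
```

With sdlog = 2 the distribution is very heavy-tailed, so the sample mean
should be expected to converge much more slowly than the median or the
geometric mean.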
> I failed to find clear information or an explanation about this, so I
> would appreciate a reference on the topic.
>
> Thank you again for your help.
I may be a bit biased by being part of the "in crowd" here, but I
think Doug Bates's comments in the previous thread are very sensible --
do you *really* need to back-transform at all? Or can you just quote
results on the log scale?
>
> R
>
>
> On 24/03/2011 20:54, Ben Bolker wrote:
>> On 03/24/2011 06:37 AM, espesser wrote:
>>> Dear all,
>>>
>>> This subject has been discussed previously, but I am not sure I am
>>> proceeding the right way with the use of the variances.
>> Can you give a reference to the previous discussion please?
>>
>>
>>> Here is the summary of my lmer model :
>>>
>>> Linear mixed model fit by REML
>>>
>>> Formula: log(TIME) ~ LONG + ACCO + (1 | SUJET)
>>>
>>> Data: dssPUISS
>>>    AIC   BIC logLik deviance REMLdev
>>>  899.6 934.1 -442.8    856.7   885.6
>>> Random effects:
>>>  Groups   Name        Variance Std.Dev.
>>>  SUJET    (Intercept) 0.019090 0.13817
>>>  Residual             0.130297 0.36097
>>> Number of obs: 1018, groups: SUJET, 24
>>>
>>> Fixed effects:
>>>             Estimate Std. Error t value
>>> (Intercept)  5.77423    0.04462  129.42
>>> LONG         0.02883    0.01129    2.55
>>> ACCO1       -0.05722    0.02272   -2.52
>>>
>>>
>>> LONG is continuous.
>>> ACCO is a two-level factor.
>>>
>>> I would proceed as follows:
>>>
>>> 1) To compute TIME at this specific point:
>>>
>>> sujet == "s3"
>>> long == 6
>>> acco == "1"
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>> + estimate_of_s3_intercept + 0.5*var(Residual) )
>>>
>>> with var(Residual) == 0.130297
>>>
>>> Is this correct?
>>
>> Is the 0.5*var(Residual) to get the mean (rather than the median) of
>> TIME on the original scale? It seems reasonable, but I wonder if you
>> could simplify your life a little bit by predicting the median rather
>> than the mean ...
>>
>>> 2) I am mainly interested in back-transforming the fixed effects, at
>>> the same point.
>>>
>>> 2.1) I would use:
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>> + 0.5*var(SUJET) +0.5*var(Residual) )
>>>
>>> with var(SUJET) == 0.019090
>> Don't quite know what you mean here. It seems you're thinking about
>> estimating a marginal mean (unknown subject) rather than a conditional
>> mean. Your approach seems reasonable but I wouldn't want to swear it
>> was right ...
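
To make the difference between the two targets concrete, here is a minimal
sketch that plugs the estimates from the summary above into both formulas.
It assumes that the subject effect and the residual are normal and
independent on the log scale; the variable names are only illustrative.

```r
# fixed-effect estimates and variance components copied from the summary
b0 <- 5.77423; b_long <- 0.02883; b_acco1 <- -0.05722
var_sujet <- 0.019090; var_resid <- 0.130297

# linear predictor on the log scale at long == 6, acco == "1"
eta <- b0 + 6 * b_long + b_acco1

# median (and geometric mean) of TIME: no variance correction needed
median_time <- exp(eta)

# marginal mean of TIME for an unknown subject: add half the total variance
mean_marg <- exp(eta + 0.5 * (var_sujet + var_resid))

print(c(median = median_time, marginal_mean = mean_marg))
```

Under these assumptions the sum of the subject effect and the residual is
symmetric around zero on the log scale, so exp(eta) is the median for an
unknown subject as well; only the mean picks up the variance term.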
>>
>>>
>>> 2.2) Suppose there was a second random intercept (say b) in my model,
>>> I would use:
>>>
>>> TIME = exp( intercept + 6*LONG + ACCO1
>>> + 0.5*var(SUJET) + 0.5*var(b) + 0.5*var(Residual) )
>>>
>>> Are these two expressions correct?
>>>
>> This gets stickier. The second 'random intercept' is from a second
>> random effect grouping factor? If the random effects are independent,
>> this seems plausible -- otherwise the variance of the sum will not be
>> equal to the sum of the variances ...
>>
>>
>>> 2.3)
>>> Suppose there was a random slope in the model, something like:
>>>
>>> log(TIME) ~ LONG + ACCO + (LONG | SUJET)
>>>
>>> How can I get TIME on the original scale?
>> If you want the marginal mean (i.e., something analogous to what you
>> are doing above), then you need to calculate the variance -- e.g. if the
>> value is (a + b*x + e_a + e_b*x + e_i), where e_a and e_b are the random
>> intercept and slope and e_i is the residual error, then **if** they were
>> all independent the variance would be var_a + var_b*x^2 + var_e.
>> However, e_a and e_b are generally correlated, so I believe it would be
>> var_a + var_b*x^2 + 2*cov(e_a,e_b)*x + var_e.
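
The variance formula for the random-slope case can be sketched as a small
helper. The function name and the variance components below are made up for
illustration and do not come from any fitted model; the second computation
checks the same quantity as a quadratic form z' Sigma z + var_e with
z = (1, x).

```r
# Total variance on the log scale at covariate value x, for a model with a
# correlated random intercept and random slope (hypothetical helper).
marginal_var <- function(x, var_a, var_b, cov_ab, var_e) {
  var_a + var_b * x^2 + 2 * cov_ab * x + var_e
}

# made-up variance components, for illustration only
var_a <- 0.02; var_b <- 0.001; cov_ab <- -0.002; var_e <- 0.13
x <- 6

v <- marginal_var(x, var_a, var_b, cov_ab, var_e)

# same quantity via the random-effects covariance matrix
Sigma <- matrix(c(var_a, cov_ab,
                  cov_ab, var_b), nrow = 2, byrow = TRUE)
z <- c(1, x)
v_quad <- drop(t(z) %*% Sigma %*% z) + var_e

print(c(formula = v, quadratic_form = v_quad))
# the marginal mean on the original scale would then be exp(eta + 0.5 * v),
# where eta is the fixed-effect prediction on the log scale
```

Setting cov_ab to 0 recovers the independent case, var_a + var_b*x^2 + var_e.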
>>>
>>> 3) Related question :
>>>
>>> To extract the stddev of the SUJET random intercept, I use:
>>>
>>> attr(VarCorr(MyModel.lmer)$SUJET,"stddev")
>>>
>> Yes.
>>
>> As mentioned above, I think your life would be a bit easier if you
>> just decided that you wanted the median (which is invariant under
>> transformation) rather than the mean on the back-transformed scale ...
>
>