[R-sig-ME] Variable transformation and back transformation

Mon Mar 23 08:22:58 CET 2009

Douglas Bates wrote:
> On Thu, Mar 12, 2009 at 2:49 AM, Christina Bogner
> <christina.bogner at uni-bayreuth.de> wrote:
>   
>> Dear all,
>>
>> I have fitted a couple of mixed-effects models to environmental data
>> (chemical and physical soil parameters) with log-transformed dependent
>> variables. I tried generalized mixed-models, but the results were not
>> satisfactory (probably because I am a soil scientist and not a statistician
>> ;-)) Now, as logs of concentrations are ecologically not very informative, I
>> would like to back-transform my model parameters. Taking a Gaussian linear
>> mixed-model:
>>
>> log(Mg2)=intercept+beta1*Silt+beta2*Soil.depth+beta3*Flow.region+b1*Plot+b2*/Soil.Depth%in%Plot+var
>> where Mg2 is the concentration of magnesium, betas are fixed-effects and bs
>> random ones. All independent variables except Silt are factors; Silt is
>> continuous.
>>
>> I would write:
>> Mg2=exp(intercept+beta1*mean(Silt in respective
>> Soil.Depth)+beta3*Flow.region+estimate of b1*Plot + estimate of
>> b2*/Soil.Depth%in%Plot+0.5*var)
>> to back-transform to the original scale on the Soil.Depth-level.
>>
>> To back-transform the fixed-effects only, I would drop the estimates of the
>> random-effects:
>> Mg2=exp(intercept+beta1*mean(Silt in respective
>> Soil.Depth)+beta3*Flow.region+ 0.5*var)
>>
>> This approach treats the estimated random effects as dummies, not as an
>> additional variance. Is this right?
>
Dear Dr. Bates,

Thank you very much for your answer.
> I'm not sure exactly what you mean by treating the estimated random
> effects as dummies.
>   
By "dummy" I mean treating the estimated random effects as if they were
additional fixed effects: when back-transforming, I simply add the
estimated random effect for the respective level to the fixed-effects
estimates.
But honestly, I have a problem with this approach. To me, random
effects are something like an additional variance component. In a
simple linear model, when transforming the mean from the log scale to
the original scale, half the residual variance is added inside the
exponential. In my back-transformation equation, however, I only add
the estimated random effect itself.
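
For reference, the log-normal relationship behind the 0.5*var term can
be written out as below. This is only a sketch under the usual
assumptions (normal, independent residuals and random effects on the
log scale). The symbols are notation introduced here, not taken from
the thread: x'beta is the fixed-effects part of the linear predictor,
sigma^2 the residual variance, and sigma_1^2 and sigma_2^2 the
variances of the Plot and Soil.Depth-within-Plot random effects.

```latex
% Log-normal basics: if \log Y \sim N(\mu, \sigma^2), then
\operatorname{median}(Y) = e^{\mu}, \qquad
\mathrm{E}[Y] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^{2}\right).

% Mean conditional on the random effects b_1, b_2 of a given plot and depth:
\mathrm{E}[Y \mid b_1, b_2]
  = \exp\!\left(x^{\top}\beta + b_1 + b_2 + \tfrac{1}{2}\sigma^{2}\right).

% Mean averaged over the random effects,
% with b_1 \sim N(0, \sigma_1^2) and b_2 \sim N(0, \sigma_2^2) independent:
\mathrm{E}[Y]
  = \exp\!\left(x^{\top}\beta
      + \tfrac{1}{2}\left(\sigma_1^{2} + \sigma_2^{2} + \sigma^{2}\right)\right).
```

Read this way, adding the estimated random effects corresponds to the
conditional mean for one particular plot and depth, whereas dropping
them while keeping only 0.5*var gives the mean for a unit whose random
effects are exactly zero, not the average over all plots. Plugging in
estimated random effects also ignores their uncertainty, so both
expressions are approximations.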
> In a linear mixed model the random effects are incorporated
> additively.  It is common with data like concentrations that the
> effect of different levels of variability is more appropriately
> modelled as a multiplicative change than as an additive change, which
> corresponds to the additive change on the scale of the logarithm of
> the concentration.
>
> I would try to communicate this graphically by plotting the magnesium
> concentration under various conditions and then plotting the logarithm
> of this concentration.  I would hope to use this to overcome
> resistance to the idea of using a transformation, such as when you say
> that logarithms of concentrations are not meaningful ecologically.  I
> have been fortunate to be present at many informal consulting sessions
> led by the great statistician George Box who started his career as a
> chemist at ICI, a British chemical company.  George always wants to
> examine the data graphically and consider appropriate transformation
> (the Box-Cox transformation family are the result of his work with Sir
> David Cox).  He is aware of the resistance to transformation and has,
> somewhat but not entirely facetiously, suggested that one way around
> it is simply to create a new unit.  Recall that pH is the negative
> logarithm of the hydrogen ion concentration.  Other examples are
> decibels (logarithm of sound pressure) and octaves (doubling or
> halving the frequency).
>   
You are absolutely right about the pH. But it is already quite
difficult to communicate the need for a mixed-effects model, even when
samples are collected hierarchically. So back-transformation is a way
of saying "forgive me for the complicated statistical approach, but I
can tell you how much magnesium is in the subsoil".
> A summary of a random variable using location and scale parameters is
> meaningful when the distribution is reasonably symmetric.  It is not
> as easy to summarize asymmetric distributions.  (A log-normal
> distribution is more complicated than a normal distribution.)  In more
> general models, such as a linear mixed model, we can simplify the
> description of the model under the appropriate scale but if we try to
> back-transform then the description becomes much more complicated.
>
> I'm not sure that this is addressing your question.  I think you are
> trying to determine a simple way of communicating the meaning of the
> parameter estimates on the concentration scale and, if so, my answer
> is that they don't have a nice simple meaning on that scale.
>
> One thing I noticed in your model is that Soil.Depth is being treated
> as a categorical covariate, as opposed to a continuous covariate.  Was
> the experimental design such that a fixed set of soil depths were
> used?  In other words I am wondering if a model like
>
> log(Mg2) ~ Silt + Flow + (Depth | Plot)
>
> might be more appropriate than
>
> log(Mg2) ~ Silt + Flow + (1 | Depth:Plot) + (1 | Plot)
>
> Of course, the first model does assume that the effect of soil depth
> on magnesium concentration is linear and that may not be appropriate,
> although I would be tempted to store soil depth as at least an ordered
> factor so I could check on the relative importance of linear and
> higher order terms.
>   
Indeed, we had to use fixed soil depths (horizons) because a certain 
amount of soil material is needed for the analysis. So depth designates 
different (chemically distinct) soil subunits. In further studies we 
will be able to analyse small portions of the soil in situ, so that 
depth will be a numerical variable. (I am really excited about producing
more data and going beyond my current simplistic approach ;-)).

Thank you again

Christina Bogner