[R-sig-ME] Variable transformation and back transformation

Wed Mar 18 14:29:49 CET 2009

On Thu, Mar 12, 2009 at 2:49 AM, Christina Bogner
<christina.bogner at uni-bayreuth.de> wrote:
> Dear all,
>
> I have fitted a couple of mixed-effects models to environmental data
> (chemical and physical soil parameters) with log-transformed dependent
> variables. I tried generalized linear mixed models, but the results were not
> satisfactory (probably because I am a soil scientist and not a statistician
> ;-)). Now, as logarithms of concentrations are not very informative
> ecologically, I would like to back-transform my model parameters. Taking a
> Gaussian linear mixed model:

> log(Mg2)=intercept+beta1*Silt+beta2*Soil.Depth+beta3*Flow.region+b1*Plot+b2*Soil.Depth%in%Plot+var
> where Mg2 is the concentration of magnesium, the betas are fixed effects and
> the bs are random effects. All independent variables except Silt are factors;
> Silt is continuous.

> I would write:
> Mg2=exp(intercept+beta1*mean(Silt in respective
> Soil.Depth)+beta3*Flow.region+estimate of b1*Plot + estimate of
> b2*Soil.Depth%in%Plot+0.5*var)
> to back-transform to the original scale at the Soil.Depth level.

> To back-transform the fixed effects only, I would drop the estimates of the
> random effects:
> Mg2=exp(intercept+beta1*mean(Silt in respective
> Soil.Depth)+beta3*Flow.region+0.5*var)

> This approach treats the estimated random effects as dummies, not as an
> additional variance. Is this right?

I'm not sure exactly what you mean by treating the estimated random
effects as dummies.

In a linear mixed model the random effects are incorporated
additively.  With data like concentrations it is common that the
effects of different sources of variability are more appropriately
modelled as multiplicative changes than as additive changes, and a
multiplicative change in the concentration corresponds to an additive
change on the scale of the logarithm of the concentration.
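A minimal numeric sketch of this point, in Python, using a made-up coefficient (`beta = 0.3` is hypothetical, not an estimate from the model in the question): an additive shift on the log scale becomes the same multiplicative factor `exp(beta)` on the concentration scale, whatever the baseline.

```python
import math

# Hypothetical log-scale coefficient, for illustration only.
beta = 0.3

def back_transform(log_value):
    """Return the concentration corresponding to a value on the log scale."""
    return math.exp(log_value)

# Two hypothetical baselines on the log scale, e.g. two different plots.
ratios = []
for baseline in (1.0, 2.5):
    ratio = back_transform(baseline + beta) / back_transform(baseline)
    ratios.append(ratio)
    print(f"baseline {baseline}: concentration ratio {ratio:.4f}")

# The additive shift beta is the same multiplicative factor exp(beta) at any baseline.
factor = math.exp(beta)
print(f"exp(beta) = {factor:.4f}")
```

Read this way, a coefficient of 0.3 on the log scale corresponds to a concentration roughly 35% higher, which is often the most direct way to state a log-scale fixed effect on the original scale.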

I would try to communicate this graphically by plotting the magnesium
concentration under various conditions and then plotting the logarithm
of this concentration.  I would hope to use this to overcome
resistance to the idea of using a transformation, such as your remark
that logarithms of concentrations are not meaningful ecologically.  I
have been fortunate to be present at many informal consulting sessions
led by the great statistician George Box, who started his career as a
chemist at ICI, a British chemical company.  George always wants to
examine the data graphically and consider an appropriate transformation
(the Box-Cox family of transformations is the result of his work with
Sir David Cox).  He is aware of the resistance to transformation and
has, somewhat but not entirely facetiously, suggested that one way
around it is simply to create a new unit.  Recall that pH is the
negative base-10 logarithm of the hydrogen ion concentration.  Other
examples are decibels (a logarithmic measure of sound pressure relative
to a reference level) and octaves (each octave is a doubling or halving
of the frequency).
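To make the "new unit" idea concrete, here is a small Python sketch of pH as a logarithmic unit. It simplifies by using concentration in mol/L rather than activity, and the function name `ph` is only illustrative.

```python
import math

def ph(h_mol_per_l):
    """pH as the negative base-10 log of hydrogen ion concentration (mol/L).

    Simplification: uses concentration rather than activity.
    """
    return -math.log10(h_mol_per_l)

# A tenfold increase in concentration lowers pH by one unit,
# whatever the starting concentration: a multiplicative change becomes additive.
for conc in (1e-7, 1e-4):
    print(f"[H+] = {conc:g} mol/L -> pH {ph(conc):.2f}; tenfold -> pH {ph(10 * conc):.2f}")
```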

A summary of a random variable using location and scale parameters is
meaningful when the distribution is reasonably symmetric.  It is not
as easy to summarize asymmetric distributions.  (A log-normal
distribution is skewed, so its mean, median and mode differ, whereas
for a normal distribution they coincide.)  In more general models,
such as a linear mixed model, we can give a simple description of the
model on the appropriate scale, but if we try to back-transform, the
description becomes much more complicated.
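As an illustration of why back-transformation gets complicated, the Python sketch below assumes a model with random intercepts for Plot and for Depth within Plot and Gaussian residuals on the log scale. All numbers are invented. It relies on the standard log-normal result: if log(Y) is normal with mean mu and variance s2, then the median of Y is exp(mu) and its mean is exp(mu + s2/2). Which variances enter s2 depends on whether you condition on the random effects or average over them; with a random slope for depth the variance would also change with depth, which this sketch ignores.

```python
import math

# Hypothetical values on the log scale, chosen only for illustration.
linear_predictor = 1.0   # intercept + fixed-effect terms for one set of covariates
var_plot = 0.2           # variance of the Plot random intercept
var_depth_in_plot = 0.1  # variance of the Depth-within-Plot random intercept
var_resid = 0.3          # residual variance

# exp() of the linear predictor is the MEDIAN of the log-normal, not its mean.
median = math.exp(linear_predictor)

# Mean for a plot whose random effects are zero: only the residual
# variance enters the correction term.
conditional_mean = math.exp(linear_predictor + 0.5 * var_resid)

# Mean averaged over plots and depths within plots: on the log scale the
# variance components add up to the marginal variance, so all of them enter.
marginal_var = var_plot + var_depth_in_plot + var_resid
marginal_mean = math.exp(linear_predictor + 0.5 * marginal_var)

print(f"median {median:.3f}, conditional mean {conditional_mean:.3f}, "
      f"marginal mean {marginal_mean:.3f}")
```

Under these assumptions, adding only half the residual variance gives the mean for a plot with its random effects at zero, while averaging over plots also brings in the random-effect variances; neither equals exp() of the fixed effects alone, which is a median.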

I'm not sure that this is addressing your question.  I think you are
trying to determine a simple way of communicating the meaning of the
parameter estimates on the concentration scale and, if so, my answer
is that they don't have a nice simple meaning on that scale.

One thing I noticed in your model is that Soil.Depth is being treated
as a categorical covariate, as opposed to a continuous covariate.  Was
the experimental design such that a fixed set of soil depths were
used?  In other words, I am wondering whether a model like

log(Mg2) ~ Silt + Flow + (Depth | Plot)

might be more appropriate than

log(Mg2) ~ Silt + Flow + (1 | Depth:Plot) + (1 | Plot)

Of course, the first model does assume that the effect of soil depth
on magnesium concentration is linear, and that may not be appropriate,
although I would be tempted to store soil depth as at least an ordered
factor so that I could check the relative importance of linear and
higher-order terms.
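For reference, R gives an ordered factor polynomial contrasts by default (`contr.poly`), so a fixed effect for ordered depth is split into linear, quadratic, cubic, ... components. The Python sketch below rebuilds those contrast columns by Gram-Schmidt, assuming equally spaced depths (R's `contr.poly` also takes a `scores` argument for unequal spacing, which this sketch does not handle). `poly_contrasts` is a hypothetical helper, not part of any library.

```python
import math

def poly_contrasts(n_levels):
    """Orthonormal polynomial contrast columns for n equally spaced levels.

    Hypothetical helper meant to mirror R's contr.poly() with its default
    equally spaced scores: Gram-Schmidt on the powers 1, x, x^2, ... of the
    centred scores, each column scaled to unit length, constant column dropped.
    """
    x = [i - (n_levels - 1) / 2 for i in range(n_levels)]
    basis = []
    for degree in range(n_levels):
        v = [xi ** degree for xi in x]
        # Subtract the projection onto every column already in the basis.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # linear, quadratic, cubic, ...

# Example: four equally spaced soil depths.
for name, column in zip(("linear", "quadratic", "cubic"), poly_contrasts(4)):
    print(name, [round(c, 4) for c in column])
```

Roughly speaking, if the estimated linear component dominates and the higher-order ones are small, that would support treating depth as a continuous covariate, as in the first formula.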