[R] log-log regression model

Rolf Turner r.turner at auckland.ac.nz
Fri Jan 1 23:27:17 CET 2016


On 01/01/16 14:35, Andras Farkas wrote:
>
> Dear All,
>
> wonder if you have a thought on the following: if I have a simple
> model like model <- lm(log(y) ~ log(x) + log(z), data = data), where
> both the dependent and independent variables are log-transformed, is
> it OK just to use ypred <- predict(model, type = "response") to get
> the predictions, then back-transform ypred with exp(ypred) to y's
> original scale, to compare the observed data (y) with the model
> predictions (ypred) on the original scale?
>
> appreciate your thoughts...

Is it OK?  In what sense?  It's a free country --- at least the country 
that I live in is still free (more or less), so you can do whatever you 
feel like.

However, it helps to do a little elementary mathematics rather than just 
hammering and hoping.  When in doubt, do the maths.

You are fitting the model

   log(y) = beta_0 + beta_1 * log(x) + beta_2 * log(z) + E

where it is tacitly assumed that E ~ N(0,sigma^2) and that the E's 
corresponding to different observations are independent. Using predict() 
will get you

    log(y).hat = b_0 + b_1 * log(x) + b_2 * log(z)

where the b_i are the *estimates* of the beta_i.  Under the given 
assumptions the b_i are unbiased, so the expected value of log(y).hat 
is equal to the expected value of log(y), and the universe is in 
harmony.  OMMMMMMMMM!

However, when you exponentiate everything, things change: expected values 
are not preserved, since exponentiation is *not* a linear function!

If you set y.hat = exp(log(y).hat) you get

    y.hat = exp(b_0) * x^{b_1}* z^{b_2}

It is not obvious what the expected value of this expression is, but it 
is almost certainly not the same as the expected value of y.

Note that the expected value of y is *not* equal to:

     exp(beta_0) * x^{beta_1} * z^{beta_2}  (1)

Rather, it is equal to this expression multiplied by exp(sigma^2/2) 
(under the stated assumptions).
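If you want to convince yourself of that lognormal fact numerically, 
here is a quick check (the value of sigma below is arbitrary, chosen 
just for illustration):

     ## If E ~ N(0, sigma^2), then E[exp(E)] = exp(sigma^2/2).
     set.seed(42)
     sigma <- 0.8
     mean(exp(rnorm(1e6, 0, sigma)))  # simulated E[exp(E)]
     exp(sigma^2/2)                   # theoretical value; should agree closely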

I *think* (I have not checked this) that the expected value of y.hat 
will be equal "asymptotically" to (1) --- that is, to the expected value 
of y divided by exp(sigma^2/2).

Other younger and wiser heads who read this list might like to chime in 
here and possibly correct me.

So, bottom line, my guess is that if your sample size is "large" then 
you won't go too far wrong by predicting y by

     exp(log(y).hat) * exp(s^2/2)

i.e. by exp(predict(model)) * exp(s^2/2)

where s^2 is the estimated variance from the model that you fitted on 
the log scale using lm().
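In code, that correction might look like the following sketch (assuming 
"model" is the lm() fit on the log scale from your post):

     s2    <- summary(model)$sigma^2           # estimated residual variance s^2
     ypred <- exp(predict(model)) * exp(s2/2)  # bias-adjusted predictions of y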

I would advise doing some fairly thorough simulations to get a feel for 
the size of the bias.  (There will always be *some* bias.)
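Something along these lines would do as a starting point; note that all 
the parameter values below (coefficients, sigma, sample sizes) are 
invented purely for illustration:

     set.seed(1)
     n     <- 200
     sigma <- 0.5
     nsim  <- 1000
     naive <- corrected <- numeric(nsim)
     for (i in seq_len(nsim)) {
         x   <- runif(n, 1, 10)
         z   <- runif(n, 1, 10)
         y   <- exp(1 + 0.5*log(x) - 0.3*log(z) + rnorm(n, 0, sigma))
         fit <- lm(log(y) ~ log(x) + log(z))
         s2  <- summary(fit)$sigma^2
         naive[i]     <- mean(exp(fitted(fit)) - y)            # bias, naive
         corrected[i] <- mean(exp(fitted(fit))*exp(s2/2) - y)  # bias, corrected
     }
     mean(naive)      # should be noticeably negative
     mean(corrected)  # should be much closer to zero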

You might also think about fitting the non-linear model

     y = gamma_0 * x^{gamma_1} * z^{gamma_2} + E

if you think that additive errors make more sense than multiplicative 
errors for your setting.

You can do this "fairly easily" using the function nls(), if you choose 
to go this way.
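For example, a minimal sketch (assuming your data frame "data" contains 
y, x and z; the starting values below are invented and would likely need 
adjusting for your data):

     fit.nls <- nls(y ~ g0 * x^g1 * z^g2,
                    data  = data,
                    start = list(g0 = 1, g1 = 1, g2 = 1))
     summary(fit.nls)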

cheers,

Rolf Turner

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


