Hi there,

After a couple of requests, I am posting replies from Scott Foster to my questions from last week. Thanks also to Simon, Thierry & David for their responses.

Tomas

FIRST QUESTION

> I am using GLMs. Could someone please explain the difference between
> (a) using a Gaussian family distribution with a log link function and
> (b) log-transforming the response variable with a normal distribution
> (Gaussian family distribution with identity link function)?
> The outputs differ, and clearly one option or the other will result in
> better fits depending on the dataset (everything else equal), but I
> want to understand why this is so.
>
> Thanks in advance,
> Tomas Easdale

REPLY

This is easiest to understand with the aid of expectations.

Under a log link we have

  log(E(y)) = a + bx
  => E(y) = exp(a + bx)
  => E(y) = exp(a) exp(bx)

Introducing the normal errors gives

  y = exp(a) exp(bx) + e.

Under a log transformation we have

  log(y) = c + dx + e
  => y = exp(c) exp(dx) exp(e).

Note what has happened to the errors in the two models. In the log-linked model the errors are additive on the original scale. In the log-transformed model the errors are additive on the log scale, and therefore multiplicative on the original scale.

The choice of model (for normal data) is not immediately clear and depends on your specific situation and its resulting data. Loosely speaking, choose the log link if you have homoscedastic residuals on the original scale, and choose the log transform if you have homoscedastic residuals on the log(y) scale.

Be careful with interpretation if you do decide that the log-transformed model is the bee's knees for your data. It is not sufficient to simply take the exponential of the predictions on the log scale as predictions on the original scale.

Hope this helps,
Scott

SECOND QUESTION

> That response is very useful and it all makes sense now, as
> log-transforming the response works better for heteroscedastic data.
> But your response also brings me to a second question that has been
> circulating around.
> What can/should I do if I want to use my regression to predict
> responses on the original scale? Exponentiate the errors? That doesn't
> seem very feasible, does it?
>
> Cheers,
> Tomas

REPLY

You are right, exponentiating the errors is not a good option. There is really only one option (I think) for log-normal data (data that are normal after a log transformation): the back-transform

  E(y) = exp(mu + sigma^2 / 2)

where mu is the mean of the log-transformed data and sigma^2 is its variance. See the Wikipedia page on the log-normal distribution for a description of why this works. There is a similar back-transformation for variances.

Note that log-transforming is not your only option. You could try using another error distribution, e.g. a gamma, where the variance increases with the square of the mean. If this is not the desired relationship then you could try quasi-likelihood methods, which allow you to specify an arbitrary variance-mean relationship. If you ever get the time, try reading McCullagh and Nelder. It really is a very good book.

Scott
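To make the distinction in Scott's first reply concrete, here is a small simulation sketch. It uses Python with NumPy/SciPy rather than R, purely for illustration, and the parameter values and seed are arbitrary: the log-link model (additive errors on the original scale) is fitted by nonlinear least squares, which is equivalent to a Gaussian GLM with a log link, while the log-transform model (multiplicative errors) is fitted by ordinary least squares on log(y).

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
a, b = 1.0, 0.5                       # true parameters (arbitrary choice)
x = np.linspace(0.0, 4.0, 200)

# Model (a): additive, homoscedastic errors on the ORIGINAL scale,
#   y = exp(a + b*x) + e   -- matches a Gaussian GLM with a log link.
y_add = np.exp(a + b * x) + rng.normal(0.0, 0.5, x.size)

# A Gaussian GLM with log link minimises squared error around exp(a + b*x);
# fit it here by nonlinear least squares.
popt, _ = curve_fit(lambda x, a, b: np.exp(a + b * x), x, y_add,
                    p0=[0.0, 0.3])
print("log-link fit (a, b):", popt)   # close to (1.0, 0.5)

# Model (b): multiplicative errors,
#   log(y) = c + d*x + e   -- matches OLS on the log-transformed response.
c, d = 1.0, 0.5
y_mul = np.exp(c + d * x + rng.normal(0.0, 0.2, x.size))

# Ordinary least squares on log(y); polyfit returns (slope, intercept).
slope, intercept = np.polyfit(x, np.log(y_mul), 1)
print("log-transform fit (c, d):", (intercept, slope))  # close to (1.0, 0.5)
```

Fitting each model to data generated under the other error structure is a quick way to see the residual patterns Scott describes: the "wrong" model shows clearly heteroscedastic residuals.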
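The back-transform in Scott's second reply can also be checked numerically. This sketch (again in Python, with arbitrary mu, sigma and seed) draws log-normal data and compares the naive back-transform exp(mu), which underestimates the mean, with the bias-corrected exp(mu + sigma^2/2):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.8                  # arbitrary illustration values

# Log-normal data: log(y) ~ Normal(mu, sigma^2).
y = np.exp(rng.normal(mu, sigma, 100_000))

m = np.log(y).mean()                  # estimate of mu
s2 = np.log(y).var()                  # estimate of sigma^2

naive = np.exp(m)                     # exp(mu): biased low as a mean
corrected = np.exp(m + s2 / 2.0)      # the back-transform from the reply

print("sample mean of y:", y.mean())  # close to exp(2 + 0.64/2), about 10.2
print("naive exp(mu):   ", naive)     # close to exp(2), about 7.4
print("corrected:       ", corrected) # close to the sample mean
```

The naive estimate recovers the median of the log-normal rather than its mean, which is why it falls short; the sigma^2/2 term supplies the missing correction.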