[R] log-transformed linear regression

Matt Shotwell shotwelm at musc.edu
Thu Nov 11 17:07:32 CET 2010


Servet,

These data do look linear in log space. Fortunately, the model

log10(y) = a + b * log10(x)

does pass through the origin in linear space. To see this, consider

log10(y) = a + b * log10(x)
       y = 10^(a + b * log10(x))
       y = 10^a * 10^(b * log10(x))
       y = 10^a * 10^(log10(x^b))
       y = 10^a * x^b

Hence, y = 0 when x = 0 (provided b > 0, as it is here). The code below
estimates a and b.
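
Your own bb fit in the quoted code does exactly this; here is a minimal
sketch, assuming X and Y are the vectors defined there:

## fit in log space: log10(y) = a + b * log10(x)
bb <- lm(log10(Y) ~ log10(X))
a <- coef(bb)[1]    # intercept in log space
b <- coef(bb)[2]    # slope in log space

## back-transformed mean curve, y = 10^a * x^b, drawn over the data
plot(X, Y, log = "xy")
curve(10^a * x^b, add = TRUE, col = "blue")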

Of course,

y = 10^a * x^b 

is not a line, so we can't directly compare slopes. However, over the
range of your data the estimated mean is _nearly_ linear, so you could
consider a linear approximation, say at the median of your x values.
That median is 0.958; for simplicity, let's call it 1.0. The linear
approximation (first-order Taylor expansion) of

y = 10^a * x^b

at x = 1 is

y = 10^a + 10^a * b * (x - 1)
y = 10^a * (1 - b) + 10^a * b * x

So, the slope of the linear approximation is 10^a * b, and the intercept
is 10^a * (1 - b). Taking a and b from the analysis below, the
approximate intercept is -0.00442 and the approximate slope is 0.22650.
You could argue that these values are consistent with the literature,
but that the log-log model is more appropriate for these data. You could
even construct a bootstrap confidence interval for the approximate
slope, for example:
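
A rough sketch, again assuming X, Y, and the bb fit from above (the 2000
resamples and the percentile interval are just illustrative choices):

## tangent-line slope and intercept at x = 1
a <- coef(bb)[1]
b <- coef(bb)[2]
slope.approx     <- 10^a * b          # about 0.22650 here
intercept.approx <- 10^a * (1 - b)    # about -0.00442 here

## percentile bootstrap of the approximate slope (case resampling)
boot.slope <- replicate(2000, {
    i  <- sample(seq_along(X), replace = TRUE)
    cf <- coef(lm(log10(Y[i]) ~ log10(X[i])))
    10^cf[1] * cf[2]
})
quantile(boot.slope, c(0.025, 0.975))  # rough 95% interval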

-Matt

On Wed, 2010-11-10 at 19:27 -0500, servet cizmeli wrote:
> Dear List,
> 
> I would like to take another chance and see if someone has anything to
> say to my last post...
> 
> bump
> 
> servet
> 
> 
> On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > Hello,
> >
> > I have a basic question. Sorry if the answer is obvious....
> >
> > I have the following data file :
> > http://ekumen.homelinux.net/mydata.txt
> >
> > I need to model Y~X-1 (simple linear regression through the origin) with
> > these data:
> >
> > load(file="mydata.txt")
> > X=k[,1]
> > Y=k[,2]
> >
> > aa=lm(Y~X-1)
> > dev.new()
> > plot(X,Y,log="xy")
> > abline(aa,untf=T)
> > abline(b=0.0235, a=0,col="red",untf=T)
> > abline(b=0.031, a=0,col="green",untf=T)
> >
> > Other people did the same kind of analysis with their data and found the
> > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> >
> > Regression with my own data, though, yields a slope of 0.0458 (black
> > line), which is too high. Clearly my regression is too strongly
> > influenced by the single point with high values (X > 100). I would not
> > like to discard this point, though, because I know that the measurement
> > is correct. I would just like to give it less weight...
> >
> > When I log-transform the X and Y data, I obtain:
> >
> > dev.new()
> > plot(log10(X),log10(Y))
> > abline(v=0,h=0,col="cyan")
> > bb=lm(log10(Y)~log10(X))
> > abline(bb,col="blue")
> > bb
> >
> > I am happy with this regression. Now the slope is in the log-log domain.
> > I have to convert it back so that I can obtain a number comparable with
> > the literature values (0.0235 and 0.031). How do I do it? I can't force
> > the second regression through the origin, as the log-transformed data no
> > longer pass through the origin.
> >
> > At first it seemed like an easy problem, but I am at a loss :o((
> > Thanks a lot for your kind help.
> > servet
> >
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina


