[R] log-transformed linear regression

Mike Marchywka marchywka at hotmail.com
Thu Nov 11 14:03:57 CET 2010

----------------------------------------
> Date: Wed, 10 Nov 2010 19:27:20 -0500
> From: sa.cizmeli at usherbrooke.ca
> To: r-help at r-project.org
> Subject: Re: [R] log-transformed linear regression
>
> Dear List,
>
> I would like to take another chance and see if someone has
> anything to say to my last post...

OK, I took a look at it, partly to help myself re-learn some R, so
I welcome better alternatives from other R users in the comments below.

If I expand the scale down where you have lots of points, the picture
looks like a plateau for x < 1 followed by a general, noisy upward
trend; the slope you get for the aggregate blob of data probably
depends on how your sampling is weighted. I wouldn't worry about the
one point at large X in isolation quite yet, but you may want to
consider a piecewise linear fit or some other way to handle the
low-X data. You may just want to spend more time staring at pictures
before churning out p-values LOL.
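
A rough sketch of that piecewise idea (the breakpoint at 1 is just my
guess from the plot; X and Y are your vectors from below):

lowx = X < 1                        # low-X "plateau" region; breakpoint 1 is a guess
alow = lm(Y[lowx] ~ X[lowx])        # fit to the low-X points only
ahigh = lm(Y[!lowx] ~ X[!lowx])     # fit to the rest
coef(alow); coef(ahigh)             # compare the two slopes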

To be clear about your log issues: lm will fit a linear model to
whatever you give it, and of course y = a*x means something different
from log(y) = a*log(x). Overall, you need to decide what to do with
the models depending on what you think is a plausible candidate for
being real; indeed, your points at large X and Y may be outside the
linear range of this "system", but who knows.
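
In case it helps with the back-conversion question, the algebra (not a
recommendation for either model) is: log10(Y) = b + a*log10(X) is the
same as Y = 10^b * X^a, which is a straight line through the origin
only when a = 1, so there is no single slope to report back unless a
is close to 1. Using your bb fit from below, something like:

bb = lm(log10(Y) ~ log10(X))             # your log-log fit
b = coef(bb)[1]; a = coef(bb)[2]         # intercept and slope in log space
# back-transformed model on the original scale: Y = 10^b * X^a
curve(10^b * x^a, add=TRUE, col="blue")  # overlay on an existing plot(X, Y)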

If you just try to fit a bunch of data to a line, the slope of course
depends on which part of that bunch you have sampled. Generally, to
explore the sensitivity, try things like the following, with ".0"
replaced by whatever threshold you want. It is easy to automate: put
this in a loop and plot histograms of the resulting slopes, etc. (a
loop version is sketched after the second example below).

> sel=runif(length(X))>.0
> aasel=lm(Y[sel]~X[sel])
> summary(aasel)

Call:
lm(formula = Y[sel] ~ X[sel])

Residuals:
      Min        1Q    Median        3Q       Max
-0.221957 -0.004207  0.004055  0.013395  0.232362

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0256642  0.0033254  -7.718 1.54e-12 ***
X[sel]       0.0463599  0.0003146 147.365  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0399 on 150 degrees of freedom
Multiple R-squared: 0.9931,     Adjusted R-squared: 0.9931
F-statistic: 2.172e+04 on 1 and 150 DF,  p-value: < 2.2e-16

If you increase the ".0" to something like .5 or so and keep repeating
this, you get some idea of what can happen; make the corresponding
changes for the various models and then decide what you are trying to
do, etc.



> sel=runif(length(X))>.9
> aasel=lm(Y[sel]~X[sel])
> summary(aasel)

Call:
lm(formula = Y[sel] ~ X[sel])

Residuals:
      Min        1Q    Median        3Q       Max
-0.199943 -0.002839  0.010626  0.019608  0.149099

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.035546   0.013741  -2.587   0.0162 *
X[sel]       0.052033   0.003401  15.301 7.04e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05878 on 24 degrees of freedom
Multiple R-squared: 0.907,      Adjusted R-squared: 0.9031
F-statistic: 234.1 on 1 and 24 DF,  p-value: 7.037e-14
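
And here is a rough sketch of the loop version mentioned above
(assuming X and Y are already in your workspace; the 500 repeats and
the .5 threshold are arbitrary):

slopes = replicate(500, {            # refit on 500 random subsets
  sel = runif(length(X)) > .5        # keep roughly half the points each time
  coef(lm(Y[sel] ~ X[sel]))[2]       # slope from this subset
})
hist(slopes)                         # spread of slopes across subsets
summary(slopes)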


>
> bump
>
> servet
>
>
> On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > Hello,
> >
> > I have a basic question. Sorry if it is so evident....
> >
> > I have the following data file :
> > http://ekumen.homelinux.net/mydata.txt
> >
> > I need to model Y~X-1 (simple linear regression through the origin) with
> > these data :
> >
> > load(file="mydata.txt")
> > X=k[,1]
> > Y=k[,2]
> >
> > aa=lm(Y~X-1)
> > dev.new()
> > plot(X,Y,log="xy")
> > abline(aa,untf=T)
> > abline(b=0.0235, a=0,col="red",untf=T)
> > abline(b=0.031, a=0,col="green",untf=T)
> >
> > Other people did the same kind of analysis with their data and found the
> > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> >
> > Regression with my own data, though, yields a slope of 0.0458 (black
> > line) which is too high. Clearly my regression is too much influenced by
> > the single point with high values (X>100). I would not like to discard
> > this point, though, because I know that the measurement is correct. I
> > just would like to give it less weight...
> >
> > When I log-transform X and Y data, I obtain :
> >
> > dev.new()
> > plot(log10(X),log10(Y))
> > abline(v=0,h=0,col="cyan")
> > bb=lm(log10(Y)~log10(X))
> > abline(bb,col="blue")
> > bb
> >
> > I am happy with this regression. Now the slope is at the log-log domain.
> > I have to convert it back so that I can obtain a number comparable with
> > the literature (0.0235 and 0.031). How to do it? I can't force the
> > second regression through the origin as the log-transformed data does
> > not go through the origin anymore.
> >
> > at first it seemed like an easy problem but I am at a loss :o((
> > thanks a lot for your kindly help
> > servet
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.