[R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?

Mike Marchywka marchywka at hotmail.com
Wed Mar 23 02:55:55 CET 2011

----------------------------------------
> Date: Tue, 22 Mar 2011 09:31:01 -0700
> From: crosspide at hotmail.com
> To: r-help at r-project.org
> Subject: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?
>
> Dear all,
>
> I want to improve my adj - R sq. I've checked some established models and
> they include the same variable twice, once transformed and once
> not. It also improves my adj - R sq.
>
> But, isn't this bad for the collinearity? Do I interpret coefficients as
> usual?


I'm not sure how many replies you got or whether your question was answered, but
offhand let me see if I understand your concern.
If your data only cover a limited range of v1, where log(v1) can be Taylor
expanded to the linear term only, then sure, it can be hard to tell a linear
from a log dependence or to quantify a mixture of the two. If you try to find
a and b to fit y=a*f(x) + b*g(x) by minimizing some error, you should be able
to see the issues on paper. Presumably log is not linear over a larger
range, and any error function, like SSE, would have a "reasonably" peaked
minimum for some values of the two coefficients, but you could do a sensitivity
analysis to check: find the second derivatives of your error function or
just perturb the coefficients a bit. I guess if there is some direction
where the error does not change as a and b vary, then you have the case you
are worried about.

I'm not sure what you consider to be "usual", but
when I'm doing something like this I usually have some physical
interpretation in mind. Most uninformatively, you could interpret these
coefficients as those which minimize your error given the data you have :)
What you do from there depends on a lot of specifics. To tell if
a given function seems to be appropriate for the data, it is always good
to look at a plot of residuals. Note that the ability to find a unique
set of coefficients that minimizes a given error has nothing to do
with whether the two terms attached to the coefficients are functionally
related (they only need to not be exactly linearly dependent over your data).
Indeed, polynomial fits are a common example (log having a Taylor series just
constrains a lot of coefficient relationships LOL).
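
The sensitivity check described above can be tried on made-up data. What follows is only a rough sketch, not anything taken from the original post: the x ranges, the "true" coefficients and the helper names curvature_ratio and se_of_log_term are all assumptions chosen for illustration. Since SSE is quadratic in the coefficients of a linear model, its second-derivative matrix is 2 * t(X) %*% X, so the ratio of its largest to smallest eigenvalue can be read off the singular values of the design matrix X. An intercept is included because the model in the question has one.

```r
# Illustrative only: simulated data, hypothetical helper names.
set.seed(1)

# Ratio of largest to smallest eigenvalue of the SSE Hessian 2 * t(X) %*% X
# for the model y = c + a*x + b*log(x). A huge ratio means there is a
# direction in coefficient space along which the SSE hardly changes.
curvature_ratio <- function(x) {
  X <- cbind(1, x, log(x))
  d <- svd(X, nu = 0, nv = 0)$d   # singular values of the design matrix
  (max(d) / min(d))^2             # equals the eigenvalue ratio of t(X) %*% X
}

# Fit the same model to simulated data and report the standard error
# of the log(x) coefficient (assumed true values: 1, 0.5, 2; noise sd 1).
se_of_log_term <- function(x) {
  y <- 1 + 0.5 * x + 2 * log(x) + rnorm(length(x))
  fit <- lm(y ~ x + log(x))
  summary(fit)$coefficients["log(x)", "Std. Error"]
}

x_narrow <- seq(10, 11, length.out = 200)  # log(x) is nearly a straight line here
x_wide   <- seq(1, 100, length.out = 200)  # log(x) visibly curves here

ratio_narrow <- curvature_ratio(x_narrow)
ratio_wide   <- curvature_ratio(x_wide)
se_narrow    <- se_of_log_term(x_narrow)
se_wide      <- se_of_log_term(x_wide)

print(c(ratio_narrow = ratio_narrow, ratio_wide = ratio_wide))
print(c(se_narrow = se_narrow, se_wide = se_wide))
```

On the narrow range both the curvature ratio and the standard error should come out much larger than on the wide range, which is the "flat direction" case: the fit still has a unique minimum, but the data barely pin down how the effect splits between x and log(x).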

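The advice about looking at a plot of residuals can also be tried on simulated data. This is a minimal sketch under assumed values (the data-generating coefficients, noise level and object names fit_linear and fit_both are invented for the example); it simply compares residuals from a fit with x alone against a fit with both x and log(x).

```r
# Illustrative only: simulated data with a known log component.
set.seed(2)
x <- seq(1, 100, length.out = 200)
y <- 1 + 0.5 * x + 2 * log(x) + rnorm(length(x), sd = 0.5)

fit_linear <- lm(y ~ x)           # omits the log term
fit_both   <- lm(y ~ x + log(x))  # same variable twice, once transformed

# Residuals against x: systematic curvature in the left panel suggests
# the functional form is missing something; the right panel should look
# like structureless scatter around zero.
op <- par(mfrow = c(1, 2))
plot(x, resid(fit_linear), main = "y ~ x", ylab = "residual")
abline(h = 0, lty = 2)
plot(x, resid(fit_both), main = "y ~ x + log(x)", ylab = "residual")
abline(h = 0, lty = 2)
par(op)

# Residual sums of squares of the two fits, for a numeric comparison.
sse_linear <- deviance(fit_linear)
sse_both   <- deviance(fit_both)
print(c(sse_linear = sse_linear, sse_both = sse_both))
```

With real data the pattern is rarely this clean, so the plot is a diagnostic to look at rather than a test to pass.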
P-values and confidence intervals are another matter with post hoc
exploratory work, but I'll let a statistician comment on that
as well as on the meaning of the R output.
Usually the final decision on a putative model improvement comes
from your ability to infer something about the underlying system,
although you may just want a simple empirical approximation
and be more worried about meeting a given error with a limited
number of computations, etc.

Apparently you found in a retrospective literature search that
everyone else is using the log term.
Sometimes you see people ask questions like, "given that in 10 papers on
the subject 4 of them used the log term and these authors have historically
been right 50 percent of the time but the other 6 are right 40 percent of the
time, what are the chances that the log term should be included?" I will
also avoid commenting on this question except to say that it illustrates
that people approach these problems in a number of ways, and that it is up
to you to decide what is relevant to your situation.



>
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.73140    7.22477   0.240  0.81086
> v1          -0.33886    0.20321  -1.668  0.09705 .
> log(v1)      2.63194    3.74556   0.703  0.48311
> v2          -0.01517    0.01089  -1.394  0.16507
> log(v3)     -0.45719    0.27656  -1.653  0.09995 .
> factor1     -1.81517    0.62155  -2.920  0.00392 **
> factor2     -1.87330    0.84375  -2.220  0.02759 *
>
> Analysis of Variance Table
>
> Response: height rise
>            Df Sum Sq Mean Sq F value    Pr(>F)
> v1          1  51.25  51.246 21.4128 6.842e-06 ***
> log(v1)     1  13.62  13.617  5.6897  0.018048 *
> v2          1   2.84   2.836  1.1850  0.277713
> log(v3)     1   3.02   3.024  1.2638  0.262357
> factor1     1  17.62  17.616  7.3608  0.007279 **
> factor2     1  11.80  11.797  4.9294  0.027586 *
> Residuals 190 454.71   2.393
>
> Thanks,
> user at host.com
>
> --
> View this message in context: http://r.789695.n4.nabble.com/lm-v1-log-v1-improve-adj-Rsq-any-sense-tp3396935p3396935.html
> Sent from the R help mailing list archive at Nabble.com.
>