[R] when to use "I", "as is" caret
David Winsemius
dwinsemius at comcast.net
Fri Sep 14 16:47:16 CEST 2012
On Sep 14, 2012, at 12:41 AM, agent dunham wrote:
> Dear community,
>
> I've checked it while working, but just to reassure myself. Let's say we have
> 2 models:
>
> model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2) , data = mydata)
If you want to create a second-degree polynomial for "proper" statistical inference via a formula, the way forward is:
?poly
model1 <- lm(vdep ~ log(v1) + v2 + v3 + poly(v4,2) , data = mydata)
You will get orthogonal polynomials, which differ from most people's naive expectations, but they do allow you to fairly assess departures from linearity.
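As a rough illustration of what "orthogonal" means here, the following sketch inspects the basis that poly() builds for speed in the built-in cars data. The object name P is only for illustration and is not part of the original post.

```r
# Orthogonal polynomial basis (degree 2) for speed in the built-in cars data
P <- poly(cars$speed, 2)

# Cross-product of the basis columns: approximately the 2 x 2 identity,
# i.e. the columns have unit length and are uncorrelated with each other
round(crossprod(P), 10)

# Each column also sums to (approximately) zero, so it is orthogonal
# to the intercept column
round(colSums(P), 10)

# Raw powers, by contrast, are strongly correlated with each other
cor(cars$speed, cars$speed^2)
```

Because the columns are uncorrelated, each coefficient can be assessed without being entangled with the lower-order terms, which is the property relied on above.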
It's interesting to compare the two methods with the cars dataset:
Proper use of poly():
> fm <- lm(dist ~ poly(speed, 2), data = cars)
> summary(fm)
Call:
lm(formula = dist ~ poly(speed, 2), data = cars)
Residuals:
    Min      1Q  Median      3Q     Max
-28.720  -9.184  -3.188   4.628  45.152
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532
F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12
Improper use of linear and "I-quadratic":
> fm2 <- lm(dist ~ speed+I(speed^2), data = cars)
> summary(fm2)
Call:
lm(formula = dist ~ speed + I(speed^2), data = cars)
Residuals:
    Min      1Q  Median      3Q     Max
-28.720  -9.184  -3.188   4.628  45.152
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.47014   14.81716   0.167    0.868
speed        0.91329    2.03422   0.449    0.656
I(speed^2)   0.09996    0.06597   1.515    0.136
Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532
F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12
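The two summaries above share the same residuals, R-squared and F-statistic, and the same t value for the quadratic term. A minimal sketch to confirm this programmatically, refitting the same two models under the same names:

```r
# Refit the two models shown above
fm  <- lm(dist ~ poly(speed, 2), data = cars)
fm2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Both bases span the same column space, so the fitted values agree
all.equal(fitted(fm), fitted(fm2))

# The test of the highest-order term is the same in both parameterizations
coef(summary(fm))[3, c("t value", "Pr(>|t|)")]
coef(summary(fm2))[3, c("t value", "Pr(>|t|)")]

# What differs is the linear term: with raw powers it is tested
# conditional on a highly correlated quadratic column
coef(summary(fm))[2, "Pr(>|t|)"]
coef(summary(fm2))[2, "Pr(>|t|)"]
```

So the overall fit is identical; only the interpretation and the tests of the lower-order coefficients change with the choice of basis.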
#---------
If you wanted the same results as you would get from I(v4^2) while using poly(), it would look like this:
(z <- poly(1:10, 2, raw=TRUE)[,2])
[1] 1 4 9 16 25 36 49 64 81 100
I didn't know offhand whether one could use the raw-poly column within a formula for lm, but it seems to work as I expected:
> fm <- lm(dist ~ I(speed^2), data = cars)
> fm
Call:
lm(formula = dist ~ I(speed^2), data = cars)
Coefficients:
(Intercept)   I(speed^2)
      8.860        0.129
> fm <- lm(dist ~ poly(speed, 2, raw=TRUE)[,2], data = cars)
> fm
Call:
lm(formula = dist ~ poly(speed, 2, raw = TRUE)[, 2], data = cars)
Coefficients:
                    (Intercept)  poly(speed, 2, raw = TRUE)[, 2]
                          8.860                            0.129
(And Uwe's answer covers the rest.)
> model2 <- lm(vdep ~ log(v1) + v2 + v3 + v4^2, data = mydata)
>
> So in model1 you really square v4, and in model2, v4^2 doesn't do
> anything, does it? Model2 could be rewritten as
> model2b <- lm(vdep ~ log(v1) + v2 + v3 + v4, data = mydata) and nothing
> changes, does it?
>
> This "I" caret is essential with powering or when including transformations
> such as I(1/(v2+v3)), but not with a log transformation, isn't it? Is there any
> other transformation where I must also use this "I", as-is caret?
>
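A quick way to check the model2 question is to try it on the cars data. This is only a sketch; the names fm_a, fm_b and fm_c are made up for illustration. Inside a formula, ^ is a formula operator (crossing to the given degree), so speed^2 reduces to speed unless it is protected by I(). Arguments inside an ordinary function call such as log() are evaluated arithmetically and need no I().

```r
# Without I(), ^ is interpreted as a formula operator: speed^2 is just speed
fm_a <- lm(dist ~ speed^2, data = cars)
fm_b <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fm_a)), unname(coef(fm_b)))

# The term label R actually uses for speed^2
attr(terms(dist ~ speed^2), "term.labels")

# With I(), the square is ordinary arithmetic
fm_c <- lm(dist ~ I(speed^2), data = cars)
coef(fm_c)

# Inside a function call, arithmetic is taken literally, so no I() is needed
fm_d <- lm(dist ~ log(speed), data = cars)
attr(terms(fm_d), "term.labels")
```

The same reasoning applies to the other formula operators (+, -, *, /, : and %in%): wrap an expression in I() whenever one of them is meant arithmetically and is not already inside a function call.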
David Winsemius, MD
Alameda, CA, USA