[R] formula question
(Ted Harding)
Ted.Harding at manchester.ac.uk
Wed Mar 18 00:31:25 CET 2009
On 17-Mar-09 23:04:25, Erin Hodgess wrote:
> Dear R People:
> Here is a small data frame and two particular formulas:
>> test.df
>             y  x
> 1  -0.9261650  1
> 2   1.5702700  2
> 3   0.1673920  3
> 4   0.7893085  4
> 5   0.3576875  5
> 6  -1.4620915  6
> 7  -0.5506215  7
> 8  -0.3480292  8
> 9  -1.2344036  9
> 10  0.8502660 10
>> summary(lm(exp(y)~x))
>
> Call:
> lm(formula = exp(y) ~ x)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -1.6360 -0.6435 -0.4722  0.4215  2.9127
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   2.1689     0.9782   2.217   0.0574 .
> x            -0.1368     0.1577  -0.868   0.4108
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.432 on 8 degrees of freedom
> Multiple R-squared: 0.08604, Adjusted R-squared: -0.0282
> F-statistic: 0.7532 on 1 and 8 DF, p-value: 0.4108
>
>> summary(lm(I(y^2)~x))
>
> Call:
> lm(formula = I(y^2) ~ x)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.9584 -0.6387 -0.2651  0.5754  1.4412
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.10084    0.62428   1.763    0.116
> x           -0.03813    0.10061  -0.379    0.715
>
> Residual standard error: 0.9138 on 8 degrees of freedom
> Multiple R-squared: 0.01764, Adjusted R-squared: -0.1052
> F-statistic: 0.1436 on 1 and 8 DF, p-value: 0.7146
>
>>
>
> These both work just fine.
>
> My question is: how do you know when to use I() and when to use just
> the function of the variable, please?
>
> thanks in advance,
> Erin
> PS Happy St Pat's Day!
In the case of your formula you will find it works just as well
without I():
  summary(lm(y^2 ~ x))

  Call:
  lm(formula = y^2 ~ x)

  Residuals:
      Min      1Q  Median      3Q     Max
  -0.9584 -0.6387 -0.2651  0.5754  1.4412

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  1.10084    0.62428   1.763    0.116
  x           -0.03813    0.10061  -0.379    0.715
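For anyone who wants to check this, here is a small R sketch that rebuilds test.df from the printout quoted above (values as printed, i.e. to 7 decimal places) and compares the two fits. The object names fit_plain and fit_I are just illustrative.

```r
# Data copied from the test.df printout quoted above
test.df <- data.frame(
  y = c(-0.9261650, 1.5702700, 0.1673920, 0.7893085, 0.3576875,
        -1.4620915, -0.5506215, -0.3480292, -1.2344036, 0.8502660),
  x = 1:10
)

# Fit the squared response with and without I()
fit_plain <- lm(y^2 ~ x, data = test.df)
fit_I     <- lm(I(y^2) ~ x, data = test.df)

# The response is evaluated as ordinary R arithmetic either way,
# so the two sets of coefficients agree
print(all.equal(coef(fit_plain), coef(fit_I)))
print(coef(fit_plain))
```

Both fits should reproduce the estimates shown above (about 1.10084 and -0.03813).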
The point of I() is that it forces ordinary arithmetic evaluation of
an expression that would otherwise be interpreted as symbolic
model-formula syntax. Thus, if X1 and X2 are numeric and you want to
regress Y on the numerical values of X1*X2, you should use I(X1*X2),
since in

  Y ~ X1*X2

the product would be interpreted as (essentially) fitting both linear
terms and their interaction (which, for numeric variables, is their
product), corresponding to

  Y = a + b1*X1 + b2*X2 + b12*X1*X2
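The expansion can be inspected directly with terms(), which works on a bare formula without any data. Y, X1 and X2 are only the placeholder names used above; this is a quick sketch, not output from the original thread.

```r
# Placeholder names; terms() only inspects the formula, so no data are needed
f_cross <- Y ~ X1 * X2

# The '*' operator expands to both main effects plus their interaction:
# the term labels are "X1", "X2" and "X1:X2"
print(attr(terms(f_cross), "term.labels"))
```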
In order to force the fitted equation to be

  Y = a + b*X1*X2

you would use Y ~ I(X1*X2). This issue does not arise when a product
is on the left-hand side of the model formula: the response is
evaluated as an ordinary R expression, so you could simply use
X1*X2 ~ Y.
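A minimal sketch of both points, assuming synthetic data invented purely for illustration (set.seed(1), 20 standard-normal values each for X1 and X2). It also shows the same protection applied to ^, which on the right-hand side of a formula means crossing rather than raising to a power, so X1^2 collapses to plain X1 unless wrapped in I().

```r
set.seed(1)  # synthetic data, invented purely for illustration
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20))
d$Y <- 1 + 2 * d$X1 * d$X2 + rnorm(20, sd = 0.1)

# I() protects '*', so the product is computed numerically: one slope only
fit_prod <- lm(Y ~ I(X1 * X2), data = d)
print(names(coef(fit_prod)))   # "(Intercept)" "I(X1 * X2)"

# On the left-hand side no I() is needed: the response is the product itself
fit_lhs <- lm(X1 * X2 ~ Y, data = d)
print(all.equal(unname(model.response(model.frame(fit_lhs))), d$X1 * d$X2))

# On the right-hand side '^' is formula syntax: X1^2 is just X1 crossed with itself
print(attr(terms(Y ~ X1^2), "term.labels"))      # "X1"
print(attr(terms(Y ~ I(X1^2)), "term.labels"))   # "I(X1^2)"
```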
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Mar-09 Time: 23:31:21
------------------------------ XFMail ------------------------------