[R] Why terms are dropping out of an lm() model
Sundar Dorai-Raj
sundar.dorai-raj at PDF.COM
Thu Aug 26 20:10:47 CEST 2004
John Pitney wrote:
> Hi all!
>
> I'm fairly new to R and not too experienced with regression. Because
> of one or both of those traits, I'm not seeing why some terms are being
> dropped from my model when doing a regression using lm().
>
> I am trying to do a regression on some experimental data d, which has
> two numeric predictors, p1 and p2, and one numeric response, r. The aim
> is to compare polynomial models in p1 and p2 up to third order. I don't
> understand why lm() doesn't return coefficients for the p1^3 and p2^3
> terms. Similar loss of terms happened when I tried orthonormal
> polynomials to third order.
>
> I'm satisfied with the second-order regression, by the way, but I'd
> still like to understand why the third-order regression doesn't work
> like I'd expect.
>
> Can anyone offer a pointer to help me understand this?
>
> Here's what I'm seeing in R 1.9.1 for Windows. Note the NA's for p1^3
> and p2^3 in the last summary.
>
>
>>d <- read.csv("DOE_data.csv")
>>d$p1
>
> [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0
> [34] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
> [67] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
>
>>d$p2
>
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
> [34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
> [67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
>>summary(d$r)
>
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 18.68 19.88 21.94 21.48 22.64 24.36
>
>>summary(d.lm1 <- lm(r ~ p1 + p2, data=d))
>
>
> Call:
> lm(formula = r ~ p1 + p2, data = d)
>
> Residuals:
> Min 1Q Median 3Q Max
> -0.58107 -0.09248 0.02492 0.26061 0.49617
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 18.66417 0.06591 283.17 <2e-16 ***
> p1 1.96145 0.04036 48.60 <2e-16 ***
> p2 0.85801 0.04036 21.26 <2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.3126 on 87 degrees of freedom
> Multiple R-Squared: 0.97, Adjusted R-squared: 0.9693
> F-statistic: 1407 on 2 and 87 DF, p-value: < 2.2e-16
>
>
>>summary(d.lm2 <- update(d.lm1, . ~ . + I(p1^2) + I(p2^2) + I(p1 * p2)))
>
>
> Call:
> lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2), data = d)
>
> Residuals:
> Min 1Q Median 3Q Max
> -0.106813 -0.021568 0.003214 0.025083 0.084687
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 18.701098 0.011265 1660.14 <2e-16 ***
> p1 2.674525 0.019511 137.08 <2e-16 ***
> p2 0.984765 0.019511 50.47 <2e-16 ***
> I(p1^2) -0.489210 0.008875 -55.12 <2e-16 ***
> I(p2^2) -0.196050 0.008875 -22.09 <2e-16 ***
> I(p1 * p2) 0.265345 0.006275 42.28 <2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.03969 on 84 degrees of freedom
> Multiple R-Squared: 0.9995, Adjusted R-squared: 0.9995
> F-statistic: 3.598e+04 on 5 and 84 DF, p-value: < 2.2e-16
>
>
>>summary(d.lm3 <- update(d.lm2, . ~ . + I(p1^3) + I(p2^3) + I(p1^2*p2) +
>
> I(p1*p2^2)))
>
> Call:
> lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2) + I(p1^3) +
> I(p2^3) + I(p1^2 * p2) + I(p1 * p2^2), data = d)
>
> Residuals:
> Min 1Q Median 3Q Max
> -0.089823 -0.017707 0.001952 0.020820 0.059302
>
> Coefficients: (2 not defined because of singularities)
Did you miss reading the above line? Seems you supplied a singular model
to `lm' and since the default for `lm' is `singular.ok = TRUE,' it just
pivoted these columns out in the QR-decomposition.
--sundar
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 18.728958 0.009657 1939.365 < 2e-16 ***
> p1 2.604190 0.022970 113.376 < 2e-16 ***
> p2 0.860080 0.022970 37.444 < 2e-16 ***
> I(p1^2) -0.463725 0.010950 -42.348 < 2e-16 ***
> I(p2^2) -0.137955 0.010950 -12.598 < 2e-16 ***
> I(p1 * p2) 0.432505 0.024486 17.664 < 2e-16 ***
> I(p1^3) NA NA NA NA
> I(p2^3) NA NA NA NA
> I(p1^2 * p2) -0.025485 0.008482 -3.005 0.00353 **
> I(p1 * p2^2) -0.058095 0.008482 -6.849 1.26e-09 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.03097 on 82 degrees of freedom
> Multiple R-Squared: 0.9997, Adjusted R-squared: 0.9997
> F-statistic: 4.221e+04 on 7 and 82 DF, p-value: < 2.2e-16
>
More information about the R-help
mailing list