[R] Why terms are dropping out of an lm() model

Sundar Dorai-Raj sundar.dorai-raj at PDF.COM
Thu Aug 26 20:10:47 CEST 2004



John Pitney wrote:
> Hi all!
> 
> I'm fairly new to R and not too experienced with regression.  Because
> of one or both of those traits, I'm not seeing why some terms are being
> dropped from my model when doing a regression using lm().
> 
> I am trying to do a regression on some experimental data d, which has
> two numeric predictors, p1 and p2, and one numeric response, r.  The aim
> is to compare polynomial models in p1 and p2 up to third order.  I don't
> understand why lm() doesn't return coefficients for the p1^3 and p2^3
> terms.  Similar loss of terms happened when I tried orthonormal
> polynomials to third order.
> 
> I'm satisfied with the second-order regression, by the way, but I'd
> still like to understand why the third-order regression doesn't work
> like I'd expect.
> 
> Can anyone offer a pointer to help me understand this?
> 
> Here's what I'm seeing in R 1.9.1 for Windows.  Note the NA's for p1^3
> and p2^3 in the last summary.
> 
> 
>>d <- read.csv("DOE_data.csv")
>>d$p1
> 
>  [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0
> [34] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
> [67] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> 
>>d$p2
> 
>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
> [34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
> [67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> 
>>summary(d$r)
> 
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   18.68   19.88   21.94   21.48   22.64   24.36
> 
>>summary(d.lm1 <- lm(r ~ p1 + p2, data=d))
> 
> 
> Call:
> lm(formula = r ~ p1 + p2, data = d)
> 
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.58107 -0.09248  0.02492  0.26061  0.49617
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 18.66417    0.06591  283.17   <2e-16 ***
> p1           1.96145    0.04036   48.60   <2e-16 ***
> p2           0.85801    0.04036   21.26   <2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> Residual standard error: 0.3126 on 87 degrees of freedom
> Multiple R-Squared:  0.97,      Adjusted R-squared: 0.9693
> F-statistic:  1407 on 2 and 87 DF,  p-value: < 2.2e-16
> 
> 
>>summary(d.lm2 <- update(d.lm1, . ~ . + I(p1^2) + I(p2^2) + I(p1 * p2)))
> 
> 
> Call:
> lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2), data = d)
> 
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.106813 -0.021568  0.003214  0.025083  0.084687
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) 18.701098   0.011265 1660.14   <2e-16 ***
> p1           2.674525   0.019511  137.08   <2e-16 ***
> p2           0.984765   0.019511   50.47   <2e-16 ***
> I(p1^2)     -0.489210   0.008875  -55.12   <2e-16 ***
> I(p2^2)     -0.196050   0.008875  -22.09   <2e-16 ***
> I(p1 * p2)   0.265345   0.006275   42.28   <2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> Residual standard error: 0.03969 on 84 degrees of freedom
> Multiple R-Squared: 0.9995,     Adjusted R-squared: 0.9995
> F-statistic: 3.598e+04 on 5 and 84 DF,  p-value: < 2.2e-16
> 
> 
>>summary(d.lm3 <- update(d.lm2, . ~ . + I(p1^3) + I(p2^3) + I(p1^2*p2) +
> 
> I(p1*p2^2)))
> 
> Call:
> lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2) + I(p1^3) +
>     I(p2^3) + I(p1^2 * p2) + I(p1 * p2^2), data = d)
> 
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.089823 -0.017707  0.001952  0.020820  0.059302
> 
> Coefficients: (2 not defined because of singularities)

Did you miss reading the above line? Seems you supplied a singular model 
to `lm' and since the default for `lm' is `singular.ok = TRUE,' it just 
pivoted these columns out in the QR-decomposition.


--sundar


>               Estimate Std. Error  t value Pr(>|t|)
> (Intercept)  18.728958   0.009657 1939.365  < 2e-16 ***
> p1            2.604190   0.022970  113.376  < 2e-16 ***
> p2            0.860080   0.022970   37.444  < 2e-16 ***
> I(p1^2)      -0.463725   0.010950  -42.348  < 2e-16 ***
> I(p2^2)      -0.137955   0.010950  -12.598  < 2e-16 ***
> I(p1 * p2)    0.432505   0.024486   17.664  < 2e-16 ***
> I(p1^3)             NA         NA       NA       NA
> I(p2^3)             NA         NA       NA       NA
> I(p1^2 * p2) -0.025485   0.008482   -3.005  0.00353 **
> I(p1 * p2^2) -0.058095   0.008482   -6.849 1.26e-09 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> Residual standard error: 0.03097 on 82 degrees of freedom
> Multiple R-Squared: 0.9997,     Adjusted R-squared: 0.9997
> F-statistic: 4.221e+04 on 7 and 82 DF,  p-value: < 2.2e-16
>




More information about the R-help mailing list