[R] meaning of tests presented in anova(ols(...)) {Design package}

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Jul 16 03:25:22 CEST 2008


Dylan Beaudette wrote:
> Hi,
> 
> I am curious about how to interpret the table produced by
> anova(ols(...)), from the Design package. I have a multiple linear
> regression model, with some interaction, defined by:
> 
> ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity,
>     3) + log(conc) * pol(sand, 3), data = sm.clean, x = TRUE,
>     y = TRUE)
> 
>          n Model L.R.       d.f.         R2      Sigma
>       1834       1203         14       0.48        1.2
> 
> Residuals:
>    Min     1Q Median     3Q    Max
> -5.033 -0.859  0.016  0.739  4.868
> 
> Coefficients:
>                        Value Std. Error     t        Pr(>|t|)
> Intercept         11.3886790  2.0220171  5.63 0.0000000205580
> sar               -4.3991263  1.0157588 -4.33 0.0000156609226
> activity         -40.0591221  5.6907822 -7.04 0.0000000000027
> activity^2        33.0570116  5.0578520  6.54 0.0000000000819
> activity^3        -8.1645147  1.3750370 -5.94 0.0000000034548
> conc               0.3841260  0.0813200  4.72 0.0000024942478
> sand              -0.0096212  0.0327415 -0.29 0.7689032898947
> sand^2             0.0008495  0.0008589  0.99 0.3227487169683
> sand^3             0.0000025  0.0000066  0.39 0.6994987342042
> sar * activity    12.8134698  2.9513942  4.34 0.0000149300007
> sar * activity^2  -9.9981381  2.6310765 -3.80 0.0001494462966
> sar * activity^3   2.1481278  0.7168339  3.00 0.0027662261037
> conc * sand       -0.0157426  0.0076013 -2.07 0.0384966958735
> conc * sand^2      0.0003419  0.0001989  1.72 0.0857381555491
> conc * sand^3     -0.0000027  0.0000015 -1.77 0.0777025949762
> 
> 
> Looking at what I 'think' are "marginal p-values", i.e. results of a
> test of H0: coef_i = 0, there are several terms with non-significant
> coefficients (at p<0.05). Does a non-significant coefficient warrant
> removal from the model, or perhaps a mention in the discussion?

No
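
A possible illustration of the point: with polynomial terms, each
t-test asks only whether that one coefficient could be zero given all
the others, and the powers of a variable are strongly correlated with
one another. In the output above, none of sand, sand^2, sand^3 is
individually significant, yet the 6 d.f. test of all sand terms in the
anova table has F = 203.6. The sketch below uses base R on simulated
data (x, y and the effect size are made up for illustration, and
poly(x, 3, raw = TRUE) stands in for pol(x, 3)) to put the
per-coefficient tests next to the joint test of all three terms.

```r
# Simulated data; x, y and the effect size are hypothetical.
set.seed(1)
n <- 200
x <- runif(n, 0, 100)
y <- 2 + 0.03 * x + rnorm(n)

# Cubic polynomial in x; raw = TRUE gives x, x^2, x^3 as in pol(x, 3).
fit  <- lm(y ~ poly(x, 3, raw = TRUE))
fit0 <- lm(y ~ 1)

# Per-coefficient t-tests: each one is conditional on the other terms.
print(coef(summary(fit)))

# Joint 3 d.f. F-test that all polynomial terms are zero.
joint <- anova(fit0, fit)
print(joint)
```

The joint test, not the individual t-tests, is the one that speaks to
whether x belongs in the model at all.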

> 
> Compared to the above example, what tests are performed when calling
> anova() on this object? Here is the output in R:

Mark Difford gave a nice response for that.
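
As a rough sketch of what the pooled rows mean: for an ols fit, a row
such as "sar (Factor+Higher Order Factors)" should correspond to an
F-test comparing the full model with one that drops every term
involving sar (its main effect plus its interaction with activity,
4 d.f.), and "All Interactions" to dropping only the interaction terms
(3 d.f.). The data below are simulated, the column names only mimic
those in this thread, and base R's lm() and poly(..., raw = TRUE) are
used in place of ols() and pol(), so treat it as an approximation of
the idea rather than what anova() does internally.

```r
# Simulated stand-in for sm.clean; all values are hypothetical.
set.seed(2)
n <- 500
d <- data.frame(sar      = runif(n, 1, 20),
                activity = runif(n, 0.5, 2.5),
                conc     = runif(n, 1, 100),
                sand     = runif(n, 5, 95))
d$log_ksat <- 5 - 1.5 * log(d$sar) + 0.8 * log(d$sar) * d$activity +
  0.4 * log(d$conc) - 0.02 * d$sand + rnorm(n)

# Full model with the same structure as in the thread (14 d.f.).
full <- lm(log_ksat ~ log(sar) * poly(activity, 3, raw = TRUE) +
             log(conc) * poly(sand, 3, raw = TRUE), data = d)

# Drop every term involving sar: main effect + 3 interaction terms.
no_sar <- lm(log_ksat ~ poly(activity, 3, raw = TRUE) +
               log(conc) * poly(sand, 3, raw = TRUE), data = d)

# Drop only the sar * activity interaction terms.
no_int <- lm(log_ksat ~ log(sar) + poly(activity, 3, raw = TRUE) +
               log(conc) * poly(sand, 3, raw = TRUE), data = d)

sar_all <- anova(no_sar, full)  # "sar (Factor+Higher Order Factors)"
sar_int <- anova(no_int, full)  # "All Interactions" under sar
print(sar_all)
print(sar_int)
```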

Frank

> 
>                Analysis of Variance          Response: log(ksat * 60 * 60)
> 
>  Factor                                        d.f. Partial SS MS     F     P
>  sar  (Factor+Higher Order Factors)               4  168.43     42.11  27.0 <.0001
>   All Interactions                                3  142.13     47.38  30.4 <.0001
>  activity  (Factor+Higher Order Factors)          6  536.84     89.47  57.3 <.0001
>   All Interactions                                3  142.13     47.38  30.4 <.0001
>   Nonlinear (Factor+Higher Order Factors)         4  257.25     64.31  41.2 <.0001
>  conc  (Factor+Higher Order Factors)              4  443.02    110.75  71.0 <.0001
>   All Interactions                                3   76.74     25.58  16.4 <.0001
>  sand  (Factor+Higher Order Factors)              6 1906.29    317.71 203.6 <.0001
>   All Interactions                                3   76.74     25.58  16.4 <.0001
>   Nonlinear (Factor+Higher Order Factors)         4  263.00     65.75  42.1 <.0001
>  sar * activity  (Factor+Higher Order Factors)    3  142.13     47.38  30.4 <.0001
>   Nonlinear                                       2   95.32     47.66  30.5 <.0001
>   Nonlinear Interaction : f(A,B) vs. AB           2   95.32     47.66  30.5 <.0001
>  conc * sand  (Factor+Higher Order Factors)       3   76.74     25.58  16.4 <.0001
>   Nonlinear                                       2    4.98      2.49   1.6  0.203
>   Nonlinear Interaction : f(A,B) vs. AB           2    4.98      2.49   1.6  0.203
>  TOTAL NONLINEAR                                  8  455.20     56.90  36.5 <.0001
>  TOTAL INTERACTION                                6  218.87     36.48  23.4 <.0001
>  TOTAL NONLINEAR + INTERACTION                   10  573.36     57.34  36.7 <.0001
>  REGRESSION                                      14 2631.53    187.97 120.4 <.0001
>  ERROR                                         1819 2839.25      1.56
> 
> Are more of the 'terms' significant (at p<0.05) due to pooling of
> model terms? I have looked through Frank's book on the topic, but
> can't quite wrap my head around what the above is telling me. I am
> mostly interested in presenting a model for use as an applied tool, and
> interpretation of terms / interaction is very important.
> 
> Thanks,
> 
> Dylan
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


