[R-sig-ME] Raw and orthogonal polynomials differences in results

Steven McKinney smckinney at bccrc.ca
Fri Jul 16 21:04:09 CEST 2010


Hi Rebecca

The high correlation between the linear and
quadratic terms in the raw polynomial
form (-0.993) is a warning sign that
raw polynomial fits to this data will be
numerically challenging (due to numerical
precision issues involved in inverting 
matrices with nearly linearly-dependent
columns etc.).

In such situations orthogonal polynomial
fits are essential.  (Notice the lack of correlation
of the orthogonal polynomial regressors - all
close to zero.)  Really, orthogonal polynomial
fits are always the best approach.  When
"raw" polynomial predictor variables show
low correlation, the model may be easier to
interpret or explain in some circumstances, but the 
polynomial fit should always be checked to confirm that
there are no numerical stability issues involved
with the model fit estimates. 

The polynomial fit is indeed the one to interpret,
and it shows no evidence of a quadratic component.
You report that no quadratic component appears
in plots of the data, so that concurs with the
(numerically more stable) orthogonal polynomial
fit results.

Mean-centred variables often perform better
in "raw" polynomial fits, so if you form a new 
variable

F2dat$Av_alt_m <- F2dat$Av_alt - mean(F2dat$Av_alt, na.rm = TRUE)
then a fit of 

total_RO ~ Av_alt_m + I(Av_alt_m^2) + (1 | Block) + (1 | fam:Type)

may show lower correlation between the linear and quadratic
predictors.

The orthogonal polynomial fit is the safest fit with
which to statistically assess the functional form of 
the association of the response with the predictor.
Other basis spline fits and smoothers are also useful.

Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada



> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-
> models-bounces at r-project.org] On Behalf Of Rebecca Ross
> Sent: July-16-10 7:03 AM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Raw and orthogonal polynomials differences in
> results
> 
> Dear List,
> 
> I apologise if this has been covered before, but I haven't been able to
> find an exact answer to my question and I am hoping someone can help or
> advise. I am fitting an lmer model and would like to know if there is
> evidence for curvature or not in the regression, and if there is
> curvature, to plot the predictions and see what it looks like. I had
> been using orthogonal polynomials (poly default), but I have realised
> that I get vastly different results, in terms of significance,
> depending on whether I use orthogonals or raw. I have the impression
> from reading about this that orthogonal ones are better for my case (ie
> model selection, but not to extract the equation), but I am not sure if
> that is correct, and if it means I can follow the orthogonal or the raw
> result. Model formula and model summaries are below for raw and
> orthogonal polynomals. The data 'looks' like there should be a linear
> but not quadratic effect.
> Thank you in advance,
> 
> Rebecca Ross,
> Dphil Candidate,
> Oxford University
> 
> > summary(avalt2POLY)
> Linear mixed model fit by REML
> Formula: total_RO ~ poly(Av_alt, 2) + (1 | Block) + (1 | fam:Type)
>    Data: F2dat
>   AIC  BIC logLik deviance REMLdev
>  2557 2580  -1273     2562    2545
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  fam:Type (Intercept)  28.621   5.3499
>  Block    (Intercept)  14.759   3.8417
>  Residual             111.788  10.5730
> Number of obs: 329, groups: fam:Type, 194; Block, 5 Fixed effects:
>                  Estimate Std. Error t value
> (Intercept)        23.281      1.856  12.541
> poly(Av_alt, 2)1  -82.333     12.776  -6.444
> poly(Av_alt, 2)2   -3.875     12.781  -0.303
> Correlation of Fixed Effects:
>             (Intr) p(A_,2)1
> ply(Av_,2)1  0.000
> ply(Av_,2)2  0.000 -0.002
> > summary(avalt2RAW)
> Linear mixed model fit by REML
> Formula: total_RO ~ Av_alt + I(Av_alt^2) + (1 | Block) + (1 | fam:Type)
>    Data: F2dat
>   AIC  BIC logLik deviance REMLdev
>  2605 2627  -1296     2562    2593
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  fam:Type (Intercept)  28.621   5.3499
>  Block    (Intercept)  14.759   3.8417
>  Residual             111.788  10.5730
> Number of obs: 329, groups: fam:Type, 194; Block, 5 Fixed effects:
>               Estimate Std. Error t value
> (Intercept)  3.709e+01  1.024e+01   3.623
> Av_alt      -6.827e-03  1.444e-02  -0.473
> I(Av_alt^2) -1.479e-06  4.877e-06  -0.303 Correlation of Fixed Effects:
>             (Intr) Av_alt
> Av_alt      -0.974
> I(Av_alt^2)  0.950 -0.993
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




More information about the R-sig-mixed-models mailing list