[R] poly(x) workaround when x has missing values

Thu Jan 25 02:39:39 CET 2007

Often in practical situations a predictor has missing values, so that poly
crashes. For instance:

> x<-1:10
> y<- x -  3 * x^2 + rnorm(10)/3
> x[3]<-NA
> lm( y ~ poly(x,2) )
Error in poly(x, 2) : missing values are not allowed in 'poly'
>
> lm( y ~ poly(x,2) , subset=!is.na(x)) # This does not help?!?
Error in poly(x, 2) : missing values are not allowed in 'poly'

The following function seems to be an okay workaround.

Poly<- function(x, degree = 1, coefs = NULL, raw = FALSE, ...) {
        notNA<-!is.na(x)
        answer<-poly(x[notNA], degree=degree, coefs=coefs, raw=raw, ...)
        THEMATRIX<-matrix(NA, nrow=length(x), ncol=degree)
        THEMATRIX[notNA,]<-answer
        attributes(THEMATRIX)[c('degree', 'coefs', 'class')]<- attributes(answer)[c('degree', 'coefs', 'class')]
        THEMATRIX
}

>  lm( y ~ Poly(x,2) )

Call:
lm(formula = y ~ Poly(x, 2))

Coefficients:
(Intercept)  Poly(x, 2)1  Poly(x, 2)2
      209.1        475.0        114.0

and it works when x and y are in a dataframe too:

> DAT<-data.frame(x=x, y=y)
> lm(y~Poly(x,2), data=DAT)

Call:
lm(formula = y ~ Poly(x, 2), data = DAT)

Coefficients:
(Intercept)  Poly(x, 2)1  Poly(x, 2)2
    -119.54      -276.11       -68.24

Is there a better way to do this? My workaround seems a bit awkward.
Whoever wrote "poly" must have had a good reason for not making it deal
with missing values?

Thanks for any thoughts

Jacob Wegelin