[R] example to demonstrate benefits of poly in regression?

Mon Apr 1 20:09:23 CEST 2013

Here is an example of the numerical instability that poly() can help with.  Imagine a 100-minute experiment where you chose to record the times as POSIXt objects (stored as seconds since 1970).  Using poly() gives the right predicted values, but using I(x^2) does not:

> d <- data.frame(Date=as.POSIXct("2013-04-01")+60*(1:100), yQuadratic=(function(x)5*x-4*x^2)( (1:100)/50 ))
> d$NumDate <- as.numeric(d$Date)
> newD <- d[c(10, 80, 95),]
> cbind(newD, PQuadratic=predict(lm(yQuadratic~NumDate+I(NumDate^2),data=d),newdata=newD))
                  Date yQuadratic    NumDate PQuadratic
10 2013-04-01 00:10:00       0.84 1364800200     2.1312
80 2013-04-01 01:20:00      -2.24 1364804400    -2.1808
95 2013-04-01 01:35:00      -4.94 1364805300    -3.1048
Warning message:
In predict.lm(lm(yQuadratic ~ NumDate + I(NumDate^2), data = d),  :
  prediction from a rank-deficient fit may be misleading
> cbind(newD, PQuadratic=predict(lm(yQuadratic~poly(NumDate,2),data=d),newdata=newD))
                  Date yQuadratic    NumDate PQuadratic
10 2013-04-01 00:10:00       0.84 1364800200       0.84
80 2013-04-01 01:20:00      -2.24 1364804400      -2.24
95 2013-04-01 01:35:00      -4.94 1364805300      -4.94

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Paul Johnson
> Sent: Monday, April 01, 2013 10:21 AM
> To: R-help
> Subject: [R] example to demonstrate benefits of poly in regression?
> 
> Here's my little discussion example for a quadratic regression:
> 
> http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R
> 
> Students press me to know the benefits of poly() over the more obvious
> regression formulas.
> 
> I think I understand the theory on why poly() should be more numerically
> stable, but I'm having trouble writing down an example that proves the
> benefit of this.
> 
> I am beginning to suspect that because R's lm uses a QR decomposition under
> the hood, the instability that we used to see when we inverted (X'X) is no
> longer so serious as it used to be. When the columns in X are
> 
> x1 x2 x3 x3squared
> 
> then the QR decomposition is doing about the same thing as we would do if
> we replaced x3 and x3squared by the poly(x3, 2) results.
> 
> Or not. I'm not arguing that, I'm thinking out loud, hoping you'll correct
> me.
> 
> pj
> 
> 
> --
> Paul E. Johnson
> Professor, Political Science      Assoc. Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org               http://quant.ku.edu
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.