[R] Specify model with polynomial interaction terms up to degree n

David Winsemius dwinsemius at comcast.net
Tue Jul 3 00:05:46 CEST 2012


On Jul 2, 2012, at 4:13 PM, YTP wrote:

> I am not sure that method works. It appears to be doing something  
> close, but
> returns too many slope coefficients, since I think it is returning
> interaction terms of degree smaller and greater than what was passed  
> to it.

No. It _was_ doing what you asked for. I think you don't understand  
the expansion.

>
> Here is a small example of degree 2:
>
>> X = data.frame(cbind(c(1,2,3), c(4,5,6)))
>> X
>  X1 X2
> 1  1  4
> 2  2  5
> 3  3  6
>
>> y = c(0,1,0)
>
>
>> mymodel2 = glm(y ~ poly(X$X1,2)*poly(X$X2,2),
>> family=binomial(link="logit"))
>
>
> # We see slope coefficients returned for interaction terms of degree  
> 1, 3,
> and 4.

Degree 3 and 4 ???  That wasn't how I would have described the results  
(there being only first and second order terms as determined by the  
'degree' argument), but the fact remains.... You cannot estimate more  
than three parameters with a dataset of only 3 points. That is basic  
math.

It seems you have demonstrated that these weapons are "too sharp" to  
be wielded safely in your hands, so maybe you should step back a few  
paces and re-consider what you are really trying to accomplish. Is the  
goal really curve fitting with with N^2 polynomial parameters. And do  
you _really_ want to be describing to your audience the interpretation  
of models created with a logit link that have high degree polynomial  
terms? Or are you instead interested in flexible regression for  
purposes of description or exploratory analyses such as provided by  
loess (one form of local regression) or perhaps GAM's with smoothing  
terms?

test = data.frame(y = rnorm(1000),X1=rnorm(1000), X2=rnorm(1000) )
plot.new()
persp( x= seq(-2, 2,by=0.1),  y= seq(-2, 2,by=0.1),
        z= predict(mymodel2,
           newdata =expand.grid(X1=seq(-2, 2,by=0.1), X2=seq(-2,  
2,by=0.1)) ) ,
        ticktype="detailed")

This has the advantage that the polynomial basis is hidden and you  
only end up looking at the predictions. I also like working with Frank  
Harrell's 'rms' package because his perimeter function restricts  
plotted estimates to regions with sufficient data and the restricted  
cubic splines have less tendency to blow-up near the boundaries of hte  
data.


>
>> summary(mymodel2)
>
>                                Estimate Std. Error z value Pr(>|z|)
> (Intercept)                   -7.855e+00  4.588e+04       0        1
> poly(X$X1, 2)1                 6.059e-15  7.946e+04       0        1
> poly(X$X1, 2)2                -3.848e+01  7.946e+04       0        1
> poly(X$X2, 2)1                        NA         NA      NA       NA
> poly(X$X2, 2)2                        NA         NA      NA       NA
> poly(X$X1, 2)1:poly(X$X2, 2)1         NA         NA      NA       NA
> poly(X$X1, 2)2:poly(X$X2, 2)1         NA         NA      NA       NA
> poly(X$X1, 2)1:poly(X$X2, 2)2         NA         NA      NA       NA
> poly(X$X1, 2)2:poly(X$X2, 2)2         NA         NA      NA       NA
>
>
>
>
>
> # Data used in the model
>
>> mymodel2$model
>
>  y poly(X$X1, 2).1 poly(X$X1, 2).2 poly(X$X2, 2).1 poly(X$X2, 2).2
> 1 0   -7.071068e-01    4.082483e-01   -7.071068e-01    4.082483e-01
> 2 1    4.350720e-18   -8.164966e-01    4.350720e-18   -8.164966e-01
> 3 0    7.071068e-01    4.082483e-01    7.071068e-01    4.082483e-01
>
>

Orthogonal basis.


> where instead I would expect the data to be like
>
> 1  4   16
> 4  10  25
> 9  18  36

?poly

You can get something like that with poly(X1, degree=2, raw=TRUE)

 > poly(1:3, degree=2, raw=TRUE)
      1 2
[1,] 1 1
[2,] 2 4
[3,] 3 9
attr(,"degree")
[1] 1 2
attr(,"class")
[1] "poly"   "matrix"


-- 

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list