[R] OLS variables

Tue Nov 8 00:35:57 CET 2005

Dear Brian,

I like the idea of providing support for raw polynomials in poly() and
polym(), if only for pedagogical reasons.
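
A minimal sketch of what raw-polynomial support looks like in practice. It
assumes a version of R in which poly() accepts a raw argument, which was not
yet the case when this thread was written. The data and the object names
(x, y, fit_orth, fit_raw, fit_I) are made up for illustration.

```r
# Simulated data; all names here are illustrative
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x - 3 * x^2 + rnorm(50, sd = 0.1)

fit_orth <- lm(y ~ poly(x, 2))              # orthogonal polynomial basis
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE))  # raw powers x and x^2 (assumes raw = is available)
fit_I    <- lm(y ~ x + I(x^2))              # raw powers written out by hand

# Both bases span the same column space, so the fitted values agree
print(all.equal(fitted(fit_orth), fitted(fit_raw)))

# The raw-basis coefficients agree with the hand-written quadratic
print(all.equal(unname(coef(fit_raw)), unname(coef(fit_I))))
```

The two parameterizations describe the same fitted curve; only the
coefficients differ, and the raw ones are the easier to read off as
"intercept, linear, quadratic" when teaching.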

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
> Sent: Monday, November 07, 2005 11:14 AM
> To: John Fox
> Cc: r-help at stat.math.ethz.ch; 'Kjetil Brinchmann halvorsen'
> Subject: RE: [R] OLS variables
> 
> On Mon, 7 Nov 2005, John Fox wrote:
> 
> > Dear Brian,
> >
> > I don't have a strong opinion, but R's interpretation seems more 
> > consistent to me, and as Kjetil points out, one can use polym() to 
> > specify a full-polynomial model. It occurs to me that ^ and ** could
> > be differentiated in model formulae to provide both.
> 
> However, poly[m] only provide orthogonal polynomials, and I 
> have from time to time considered extending them to provide 
> raw polynomials too.
> Is that a better-supported idea?
> 
> >
> > Regards,
> > John
> >
> >> -----Original Message-----
> >> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> >> Sent: Monday, November 07, 2005 4:05 AM
> >> To: Kjetil Brinchmann halvorsen
> >> Cc: John Fox; r-help at stat.math.ethz.ch
> >> Subject: Re: [R] OLS variables
> >>
> >> On Sun, 6 Nov 2005, Kjetil Brinchmann halvorsen wrote:
> >>
> >>> John Fox wrote:
> >>>>
> >>>> I assume that you're using lm() to fit the model, and that you don't
> >>>> really want *all* of the interactions among 20 predictors: You'd need
> >>>> quite a lot of data to fit a model with 2^20 terms in it, and might
> >>>> have trouble interpreting the results.
> >>>>
> >>>> If you know which interactions you're looking for, then why not
> >>>> specify them directly, as in lm(y ~  x1*x2 + x3*x4*x5 + etc.)? On the
> >>>> other hand, if you want to include all interactions, say, up to
> >>>> three-way, and you've put the variables in a data frame, then
> >>>> lm(y ~ .^3, data=DataFrame) will do it.
> >>>
> >>> This is nice with factors, but with continuous variables, and the need
> >>> for a response-surface type of model, it will not do. For instance, with
> >>> variables x, y, z in data frame dat
> >>>    lm( y ~ (x+z)^2, data=dat )
> >>> gives a model with the terms x, z and x*z, not the square terms.
> >>> There is a need for a semi-automatic way to get these; for instance,
> >>> use poly() or polym() as in:
> >>>
> >>> lm(y ~ polym(x,z,degree=2), data=dat)
> >>
> >> This is an R-S difference (FAQ 3.3.2).  R's formula parser always takes
> >> x^2 = x whereas the S one does so only for factors.  This makes sense
> >> if you interpret `interaction' strictly as in John's description - S
> >> chose to see an interaction of any two continuous variables as
> >> multiplication (something which puzzled me when I first encountered
> >> it, as it was not well documented back in 1991).
> >>
> >> I have often wondered if this difference was thought to be an
> >> improvement, or if it is just a different implementation of the
> >> Rogers-Wilkinson syntax.
> >> Should we consider changing it?
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
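
The formula behaviour described in the quoted messages can be checked
directly from the term labels R derives. This is only a sketch: the variable
names follow the example in the thread, the simulated data frame dat is made
up, and the polym() call assumes a version of R whose polym() accepts a raw
argument.

```r
# Term labels R derives from each formula; no data are needed for this step.
labels_crossed <- attr(terms(y ~ (x + z)^2), "term.labels")
labels_power   <- attr(terms(y ~ x^2), "term.labels")
labels_surface <- attr(terms(y ~ x + z + I(x^2) + I(z^2) + x:z), "term.labels")

print(labels_crossed)   # main effects plus the x:z interaction, no squares
print(labels_power)     # x^2 collapses to x inside a formula
print(labels_surface)   # squares survive when protected with I()

# A full second-degree basis in two variables (illustrative data)
set.seed(2)
dat <- data.frame(x = runif(30), z = runif(30))
basis <- polym(dat$x, dat$z, degree = 2, raw = TRUE)  # assumes raw = is available
print(ncol(basis))      # x, x^2, z, x*z, z^2
```

Protecting powers with I() is the by-hand route to a response-surface model;
polym() produces the same set of second-degree terms without spelling each
one out.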