[R] OLS variables
John Fox
jfox at mcmaster.ca
Mon Nov 7 17:06:31 CET 2005
Dear Brian,
I don't have a strong opinion, but R's interpretation seems more consistent
to me, and as Kjetil points out, one can use polym() to specify a
full-polynomial model. It occurs to me that ^ and ** could be differentiated
in model formulae to provide both.
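
A minimal sketch of the distinction, assuming simulated data (the data frame dat and its columns x, z, y mirror the example quoted below; the fitted-object names are illustrative only):

```r
# Simulated data; values are arbitrary and only serve the illustration.
set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + dat$x + dat$z + dat$x^2 + rnorm(50)

# In a formula, ^ means crossing: (x + z)^2 expands to x + z + x:z.
inter_terms <- attr(terms(y ~ (x + z)^2), "term.labels")
print(inter_terms)

# polym() builds the full second-degree polynomial basis
# (orthogonal polynomials by default).
fit_poly <- lm(y ~ polym(x, z, degree = 2), data = dat)

# I() protects ^ so that it is read as arithmetic; together with x:z
# this spans the same model space as the polym() fit.
fit_raw <- lm(y ~ x + z + I(x^2) + I(z^2) + x:z, data = dat)

print(length(coef(fit_poly)))
print(all.equal(fitted(fit_poly), fitted(fit_raw)))
```

The two fits use different parameterizations, so their coefficients differ even though the fitted values agree.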
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Monday, November 07, 2005 4:05 AM
> To: Kjetil Brinchmann halvorsen
> Cc: John Fox; r-help at stat.math.ethz.ch
> Subject: Re: [R] OLS variables
>
> On Sun, 6 Nov 2005, Kjetil Brinchmann halvorsen wrote:
>
> > John Fox wrote:
> >>
> >> I assume that you're using lm() to fit the model, and that you don't
> >> really want *all* of the interactions among 20 predictors: You'd need
> >> quite a lot of data to fit a model with 2^20 terms in it, and might
> >> have trouble interpreting the results.
> >>
> >> If you know which interactions you're looking for, then why not
> >> specify them directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the
> >> other hand, if you want to include all interactions, say, up to
> >> three-way, and you've put the variables in a data frame, then
> >> lm(y ~ .^3, data=DataFrame) will do it.
> >
> > This is nice with factors, but with continuous variables, when a
> > response-surface type of model is needed, it will not do. For instance,
> > with variables x, y, z in data frame dat
> > lm( y ~ (x+z)^2, data=dat )
> > gives a model with the terms x, z and x:z, not the square terms.
> > There is a need for a semi-automatic way to get these, for instance,
> > use poly() or polym() as in:
> >
> > lm(y ~ polym(x,z,degree=2), data=dat)
>
> This is an R-S difference (FAQ 3.3.2). R's formula parser always takes
> x^2 = x whereas the S one does so only for factors. This makes sense
> if you interpret `interaction' strictly as in John's description - S
> chose to see an interaction of any two continuous variables as
> multiplication (something which puzzled me when I first encountered
> it, as it was not well documented back in 1991).
>
> I have often wondered if this difference was thought to be an
> improvement, or if it is just a different implementation of the
> Wilkinson-Rogers syntax.
> Should we consider changing it?
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
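
To put rough numbers on the model sizes discussed in the quoted thread, the sketch below counts the terms that y ~ .^3 generates for 20 predictors. The data frame and its column names x1..x20 are hypothetical, and the values are random noise:

```r
# Hypothetical data frame with a response y and 20 predictors x1..x20.
set.seed(2)
DataFrame <- as.data.frame(matrix(rnorm(200 * 20), ncol = 20))
names(DataFrame) <- paste0("x", 1:20)
DataFrame$y <- rnorm(200)

# Expand the dot against the data and count all terms up to three-way.
tt <- terms(y ~ .^3, data = DataFrame)
n_terms <- length(attr(tt, "term.labels"))
print(n_terms)

# The same count from combinatorics: main effects, two-way, three-way.
print(choose(20, 1) + choose(20, 2) + choose(20, 3))

# Coefficients in the model with all interactions (intercept included),
# assuming every predictor is continuous.
print(2^20)
```

Counting terms with terms() avoids fitting the model at all, which is useful for checking the size of a formula before calling lm().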