[R] OLS variables
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Nov 7 10:05:15 CET 2005
On Sun, 6 Nov 2005, Kjetil Brinchmann Halvorsen wrote:
> John Fox wrote:
>>
>> I assume that you're using lm() to fit the model, and that you don't really
>> want *all* of the interactions among 20 predictors: You'd need quite a lot
>> of data to fit a model with 2^20 terms in it, and might have trouble
>> interpreting the results.
>>
>> If you know which interactions you're looking for, then why not specify them
>> directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you
>> want to include all interactions, say, up to three-way, and you've put the
>> variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it.
>
> This is nice with factors, but with continuous variables and the need for a
> response-surface type of model, it will not do. For instance, with
> variables x, y, z in data frame dat
> lm( y ~ (x+z)^2, data=dat )
> gives a model with the terms x, z and x*z, not the square terms.
> There is a need for a semi-automatic way to get these, for instance,
> use poly() or polym() as in:
>
> lm(y ~ polym(x,z,degree=2), data=dat)
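
The expansions described in the quoted messages can be checked directly by
asking R for the term labels of each formula. This is only a sketch: the data
frame dat is simulated here, and the small frame with columns a, b, c is a
hypothetical stand-in for a data frame with more predictors.

```r
# Hypothetical data: y is the response, x and z are continuous predictors.
set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + dat$x + dat$z + dat$x^2 + rnorm(50)

# (x + z)^2 expands to the main effects plus the two-way interaction;
# no squared terms appear.
labels_int <- attr(terms(y ~ (x + z)^2), "term.labels")
print(labels_int)

# .^3 expands to all terms up to three-way among the non-response columns.
# Three made-up predictors a, b, c are enough to show the pattern.
toy <- data.frame(y = 0, a = 0, b = 0, c = 0)
labels_cube <- attr(terms(y ~ .^3, data = toy), "term.labels")
print(labels_cube)

# polym() supplies a full second-degree polynomial basis in x and z,
# including the squared terms, so the fit has an intercept plus 5 columns.
fit_poly <- lm(y ~ polym(x, z, degree = 2), data = dat)
n_coef <- length(coef(fit_poly))
print(n_coef)
```

Note that polym() uses orthogonal polynomials by default, so the individual
coefficients are not those of the raw x^2, x*z and z^2 terms, although the
fitted surface is the same.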
This is an R-S difference (FAQ 3.3.2). R's formula parser always takes
x^2 = x whereas the S one does so only for factors. This makes sense if
you interpret `interaction' strictly as in John's description - S chose
to see an interaction of any two continuous variables as multiplication
(something which puzzled me when I first encountered it, as it was not
well documented back in 1991).
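
A minimal sketch of the R behaviour described above: inside a formula, ^ is a
crossing operator rather than arithmetic, so x^2 reduces to x, and I() is the
usual way to ask for the arithmetic square. The variable names y, x and z are
taken from the quoted example; no data are needed to inspect the terms.

```r
# In R's formula language x^2 means "x crossed with itself", which is just x.
sq_labels <- attr(terms(y ~ x^2), "term.labels")
print(sq_labels)

# I() protects its argument from formula interpretation, so I(x^2) is the
# squared variable. Adding both squares to (x + z)^2 gives a full quadratic.
quad <- y ~ (x + z)^2 + I(x^2) + I(z^2)
quad_labels <- attr(terms(quad), "term.labels")
print(quad_labels)
```

Unlike polym(), this form uses the raw powers, so the coefficients refer
directly to x^2 and z^2.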
I have often wondered if this difference was thought to be an improvement,
or if it is just a different implementation of the Wilkinson-Rogers syntax.
Should we consider changing it?
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595