[R] OLS variables

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Nov 7 17:13:45 CET 2005


On Mon, 7 Nov 2005, John Fox wrote:

> Dear Brian,
>
> I don't have a strong opinion, but R's interpretation seems more consistent
> to me, and as Kjetil points out, one can use polym() to specify a
> full-polynomial model. It occurs to me that ^ and ** could be differentiated
> in model formulae to provide both.

However, poly[m] only provide orthogonal polynomials, and I have from time 
to time considered extending them to provide raw polynomials too.
Is that a better-supported idea?
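
For concreteness, a minimal sketch of the distinction, using an invented data frame dat (names and data are made up for the example); the raw = TRUE argument noted in a comment was added to poly() only in later versions of R:

    ## Hypothetical data, for illustration only
    dat <- data.frame(x = runif(50))
    dat$y <- 1 + dat$x + dat$x^2 + rnorm(50, sd = 0.1)

    ## Orthogonal polynomial terms, as poly()/polym() provide them
    fit.orth <- lm(y ~ poly(x, 2), data = dat)

    ## Raw polynomial terms, written out explicitly with I()
    fit.raw <- lm(y ~ x + I(x^2), data = dat)

    ## Later versions of R added a 'raw' argument to poly():
    ##   lm(y ~ poly(x, 2, raw = TRUE), data = dat)

    ## The two parameterisations give the same fitted values
    all.equal(fitted(fit.orth), fitted(fit.raw))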

>
> Regards,
> John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>> -----Original Message-----
>> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
>> Sent: Monday, November 07, 2005 4:05 AM
>> To: Kjetil Brinchmann halvorsen
>> Cc: John Fox; r-help at stat.math.ethz.ch
>> Subject: Re: [R] OLS variables
>>
>> On Sun, 6 Nov 2005, Kjetil Brinchmann halvorsen wrote:
>>
>>> John Fox wrote:
>>>>
>>>> I assume that you're using lm() to fit the model, and that you don't
>>>> really want *all* of the interactions among 20 predictors: you'd need
>>>> quite a lot of data to fit a model with 2^20 terms in it, and might
>>>> have trouble interpreting the results.
>>>>
>>>> If you know which interactions you're looking for, then why not
>>>> specify them directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the
>>>> other hand, if you want to include all interactions, say, up to
>>>> three-way, and you've put the variables in a data frame, then
>>>> lm(y ~ .^3, data=DataFrame) will do it.
>>>
>>> This is nice with factors, but with continuous variables and the need
>>> for a response-surface type of model, it will not do. For instance,
>>> with variables x, y, z in data frame dat,
>>>    lm( y ~ (x+z)^2, data=dat )
>>> gives a model with the terms x, z and x:z, not the square terms.
>>> There is a need for a semi-automatic way to get these, for instance,
>>> use poly() or polym() as in:
>>>
>>> lm(y ~ polym(x,z,degree=2), data=dat)
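
A minimal sketch of what the polym() call above contributes, assuming an invented data frame dat with numeric x and z and response y: the degree-2 term supplies x, x^2, z, z^2 and x:z, in orthogonal-polynomial form.

    ## Hypothetical data, for illustration only
    dat <- data.frame(x = runif(40), z = runif(40))
    dat$y <- with(dat, 1 + x + z + x*z + x^2 + rnorm(40, sd = 0.1))

    fit <- lm(y ~ polym(x, z, degree = 2), data = dat)
    coef(fit)
    ## Coefficient names of the form 1.0, 2.0, 0.1, 1.1, 0.2 record the
    ## degree in x and z respectively, i.e. x, x^2, z, x:z and z^2.
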
>>
>> This is an R-S difference (FAQ 3.3.2).  R's formula parser always
>> takes x^2 = x, whereas the S one does so only for factors.  This makes
>> sense if you interpret `interaction' strictly as in John's description
>> - S chose to see an interaction of any two continuous variables as
>> multiplication (something which puzzled me when I first encountered
>> it, as it was not well documented back in 1991).
>>
>> I have often wondered if this difference was thought to be an
>> improvement, or if it is just a different implementation of the
>> Rogers-Wilkinson syntax.  Should we consider changing it?
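
To make the behaviour under discussion concrete, a minimal sketch with invented numeric variables: R's expansion of ^ adds no square terms for continuous variables, so they have to be requested explicitly.

    ## Hypothetical numeric variables, for illustration only
    d <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))

    ## R's expansion of ^ for continuous variables: no square terms
    attr(terms(y ~ (x + z)^2, data = d), "term.labels")
    ## "x"  "z"  "x:z"

    ## Squares must be asked for explicitly, e.g. via I() or poly()/polym()
    attr(terms(y ~ (x + z)^2 + I(x^2) + I(z^2), data = d), "term.labels")

    ## The 'all interactions up to three-way' idiom mentioned earlier
    ## (with only two predictors here it stops at x:z)
    attr(terms(y ~ .^3, data = d), "term.labels")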

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



