[R] Order of terms in a model specification...

Thu Nov 10 08:18:43 CET 2005

On Wed, 9 Nov 2005, Duncan Murdoch wrote:

> On 11/9/2005 3:48 PM, Oliver Lyttelton wrote:
>>
>> Sorry for this one as it's pretty basic but I've taken a look for info and
>> couldn't find any...

The only reasonably comprehensive account of this of which I am aware is 
in Chapter 6 of MASS (the book).  That is for S, and there are S/R 
differences (not all of which, I suspect, any one person is aware of).

>> My question is, does the order of main effect terms in a model specification
>> have any impact on the model R fits or not. (in particular when using lm).
>> ie
>>
>> Can A~X+Y+Z lead to different results to A~Z+Y+X, and if so in what
>> circumstances, and how much should I worry about it?

Depends what you mean by `different results'.   It will specify the same 
model subspace, but not the same basis vectors for that space.
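
As a rough illustration of that point, the sketch below fits the same
three-predictor model with the terms in two orders. The data, the seed
and the names d, fit1 and fit2 are made up for this example and are not
from the original exchange. With no collinearity, the fitted values
agree and the coefficients are the same numbers stored in a different
order.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(X = rnorm(20), Y = rnorm(20), Z = rnorm(20))
d$A <- 1 + d$X - 2 * d$Y + rnorm(20)  # hypothetical response

fit1 <- lm(A ~ X + Y + Z, data = d)
fit2 <- lm(A ~ Z + Y + X, data = d)

# Same model subspace: fitted values agree up to rounding
same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2)))

# Same coefficients once matched by name; only their order differs
same_coef <- isTRUE(all.equal(coef(fit1)[names(coef(fit2))], coef(fit2)))

print(c(same_fit = same_fit, same_coef = same_coef))
print(names(coef(fit1)))
print(names(coef(fit2)))
```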

Collinearity is one example, as Duncan points out.  Factors with no 
intercept are another, so A + B - 1 and B + A - 1 will be different
representations if A and B are factors.
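
A small sketch of the no-intercept case, assuming R's default treatment
contrasts and a made-up two-factor layout (the data frame d and its
level names are hypothetical). Whichever factor comes first gets a full
set of indicator columns, and the second is coded by contrasts, so the
two design matrices have different columns.

```r
# Hypothetical balanced layout: A has 3 levels, B has 2
d <- expand.grid(A = factor(c("a1", "a2", "a3")),
                 B = factor(c("b1", "b2")))

cols_ab <- colnames(model.matrix(~ A + B - 1, data = d))
cols_ba <- colnames(model.matrix(~ B + A - 1, data = d))

print(cols_ab)  # "Aa1" "Aa2" "Aa3" "Bb2"
print(cols_ba)  # "Bb1" "Bb2" "Aa2" "Aa3"
```

Both matrices span the same column space, so fitted values would agree,
but the individual coefficients have different meanings.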

Interactions introduce further complications. Note that unless you jump 
through hoops, the model-fitting does reorder terms in a formula (see the 
keep.order argument to terms.formula). Under some circumstances the order 
in which interactions are specified (A:B vs B:A) can matter.
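
The reordering can be seen by inspecting the term labels directly. This
is a minimal sketch with a made-up formula (the names y, a and b are
placeholders and need not exist as objects): by default the main
effects are moved ahead of the interaction, and keep.order = TRUE
leaves the terms as written.

```r
f <- y ~ a:b + a + b  # interaction deliberately written first

default_order <- attr(terms(f), "term.labels")
kept_order    <- attr(terms(f, keep.order = TRUE), "term.labels")

print(default_order)  # "a" "b" "a:b"
print(kept_order)     # "a:b" "a" "b"
```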

>> I believe this is an implementation detail as it depends on the way the
>> fitting algorithm works, but it would be great to have a few lines to plug
>> this gap in my knowledge...
>
> Definitely yes, in the case of collinear terms.  For example,
>
> > X <- rnorm(10)
> > Y <- rnorm(10)
> > Z <- X
> > A <- rnorm(10)
> > lm(A ~ X+Y+Z)
>
> Call:
> lm(formula = A ~ X + Y + Z)
>
> Coefficients:
> (Intercept)            X            Y            Z
>     -0.3474      -0.1166      -0.2203           NA
>
> > lm(A ~ Z+Y+X)
>
> Call:
> lm(formula = A ~ Z + Y + X)
>
> Coefficients:
> (Intercept)            Z            Y            X
>     -0.3474      -0.1166      -0.2203           NA
>
>
> In one case X gets a coefficient and Z doesn't, but the other is the
> opposite.
>
> I suspect there would be differences due to rounding in other
> situations, and they might be noticeable in the case of near-collinearity.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595