[R--gR] Modelformulae

Thu Aug 19 17:53:38 CEST 2004

Just a quick reply to some of David E's comments (David M's comments have
been recorded):

> 2. Syntax constraints
> * Interactions between continuous variables of max order 2, eg
> X*Y*Z is illegal
> * (I suppose X*X is equivalent to X^2?)

This needs thinking, but you are probably right.

Generally, the formulas on the left-hand side of the conditioning symbols
should refer to vector spaces, and one should keep this in mind whenever
formulas are abbreviated and combined in strange ways. Somebody (Svante
and I?) should look carefully into this.

For the moment, in my head X^2 is span(1, x, x^2). Generally X is span(1,
x).

For vector spaces U and V, U:V denotes the span of all componentwise products
of basis vectors for U and V, embedded into suitable (tensor) product spaces.

A "factor" A is span(e_alpha, alpha \in levels of A), where e_alpha = 1 in
cell alpha and 0 otherwise.

A:B is then what we want it to be and X:X is X^2.
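
To make the definitions above concrete, here is a minimal sketch in Python (standard library only). The data vectors x and levels, and the helper names rank and colon, are made up for illustration and are not part of any proposed syntax. It represents each space by a spanning set of observation vectors, forms U:V as all componentwise products, and checks dimensions by Gaussian elimination.

```python
from fractions import Fraction
from itertools import product

def rank(vectors):
    """Rank of a list of equal-length vectors (Gaussian elimination over the rationals)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def colon(U, V):
    """Spanning set for U:V: all componentwise products of spanning vectors."""
    return [[a * b for a, b in zip(u, v)] for u, v in product(U, V)]

# Hypothetical data: six observations of a covariate x and a three-level factor.
x = [1, 2, 3, 4, 5, 6]
levels = ["a", "b", "c", "a", "b", "c"]

X = [[1] * len(x), x]  # X = span(1, x)
A = [[1 if lv == alpha else 0 for lv in levels]  # A = span of cell indicators
     for alpha in sorted(set(levels))]

dim_X = rank(X)              # dimension of span(1, x)
dim_XX = rank(colon(X, X))   # dimension of span(1, x, x, x^2)
dim_A = rank(A)              # one dimension per level
dim_AA = rank(colon(A, A))   # e_a * e_b is 0 for a != b and e_a for a == b
```

With these data dim_X is 2 and dim_XX is 3, so X:X is strictly larger than X, while dim_A and dim_AA are both 3 because the products of indicators give back the indicators themselves.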

X:Y:Z (or X*Y*Z) should be deemed illegal unless one of them is a factor.

But some care may have to be taken with what products mean and how they are
interpreted.

> *     Higher order continuous interactions could be disallowed or
> ignored (prefer the former)

OK

> * Categorical variables are 'factors' in R (sorry for the
> Rothamsted ambience here)

OK, probably no way to get rid of this...

> *     I suppose A*A is illegal if A is a factor, or is it just
> equivalent to A?

With the above definition, A*A = A, but X*X is not X when X is not a factor.

> * Conditioning symbol | is followed by a simple variable list eg
> (X,Y,A)

Yes, but not with parentheses (as commented by David M).

> * no directed cycles for chain graph models
> * more ?
>
> eg ~ A*(X+Y+Z)^2|(X,Y,Z)
>

Here (X+Y+Z)^2 = span(1,x,y,z)*span(1,x,y,z) = X^2+Y^2+Z^2+X:Y+X:Z+Y:Z
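
The expansion can be checked symbolically. The following Python sketch is only an illustration (the helper names span_product and show are made up): it treats a monomial as a sorted tuple of variable names, with the empty tuple standing for the constant 1, and multiplies every generator of span(1,x,y,z) with every other.

```python
from itertools import product

def span_product(U, V):
    """Distinct monomials u*v for generators u in U and v in V.

    A monomial is a sorted tuple of variable names; () is the constant 1.
    """
    monomials = {tuple(sorted(u + v)) for u, v in product(U, V)}
    return sorted(monomials, key=lambda m: (len(m), m))

def show(monomial):
    """Readable form of a monomial, e.g. ('x', 'y') -> 'x*y'."""
    return "*".join(monomial) if monomial else "1"

base = [(), ("x",), ("y",), ("z",)]  # generators of span(1, x, y, z)
squared = span_product(base, base)   # generators of the product space
terms = [show(m) for m in squared]
```

This gives the ten generators 1, x, y, z, x*x, x*y, x*z, y*y, y*z, z*z, which are exactly the monomials appearing in X^2, Y^2, Z^2, X:Y, X:Z and Y:Z taken together.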

> 3. Functions can be used, eg ~Z+log(X), sqrt(x-min(x))
>

Careful! Translation to vector spaces needs to be clear.

> 4. Ramifications of ':'
> *     My understanding is that the use of ':' rather than '*' relates to
> different parametrisations of the same space.

A little more than that. It also specifies a lattice of models, rather than
a single one, namely the hierarchical submodels obtained by removing terms
of higher order.
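
As a rough illustration of such a lattice, the Python sketch below enumerates the hierarchical submodels of a full factorial model. It assumes that "hierarchical" means that whenever a term is in the model, all its lower-order margins are too; that reading, and the function names, are assumptions made for this example only.

```python
from itertools import combinations

def all_terms(variables):
    """All main effects and interactions, each as a frozenset of variable names."""
    return [frozenset(c)
            for k in range(1, len(variables) + 1)
            for c in combinations(variables, k)]

def is_hierarchical(model):
    """True if every lower-order margin of every term is also in the model."""
    return all(frozenset(sub) in model
               for term in model
               for k in range(1, len(term))
               for sub in combinations(sorted(term), k))

def hierarchical_submodels(variables):
    """Every hierarchical subset of the terms of the full model on `variables`."""
    terms = all_terms(variables)
    return [frozenset(chosen)
            for k in range(len(terms) + 1)
            for chosen in combinations(terms, k)
            if is_hierarchical(frozenset(chosen))]

models_ab = hierarchical_submodels(["A", "B"])
for model in models_ab:
    # Print each submodel as a formula-like string; "1" is the model with no terms.
    print(" + ".join(sorted(":".join(sorted(t)) for t in model)) or "1")
```

For A*B this lists five models: the empty one, A, B, A + B, and A + B + A:B. Dropping A:B from the full model moves one step down the lattice.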

> In principle when specifying a model this should be irrelevant.

Old Rothamsted ambience: Nelder implicitly specifies much more than a model
with a formula, namely a full ANOVA of the data.... He never writes so in
the theoretical part, but does so in his examples. Confusing, but true... I
a

> Or do we want to commit ourselves to a certain
>       parametrisation - if so, why?

No, not other than by the above. With a given parametrisation it becomes
particularly easy to analyse all models by very few computations (only a
single one in the normal case).

> *     I suppose if ':' is allowed we should also allow %in% and /
> (nested).
>

In principle yes, but we have to identify what it means. Split models?

Best regards
Steffen