[R--gR] Modelformulae

Thu Aug 19 11:39:21 CEST 2004

Dear gR-folks

The Danish gR-gang have been talking about describing a model language for
graphical models that

1) could specify at least chain graph models, based on the most general
hierarchical mixed models as described in Lauritzen (1996) [my book],
section 6.4, pages 199-216 (more general than MIM-models)

2) did not confuse people who were accustomed to glim-type notation and
formulae

3) did not conflict too much with existing formula conventions (MIM, ggm)

4) was clear and unambiguous, and immediately understandable without too
much explanation

5) did not conflict too much with the whole idea and setup of graphical
interaction models

6) accommodated the idea of multiple response variables

Here is a first attempt. It may well work, but I would appreciate feedback
if I have overlooked any nasty conflicts or drawbacks.

The whole issue is somewhat plagued by the "coincidental" fact that
*intrinsically multivariate* log-linear models via "the Poisson trick" can
be described through univariate response models for the counts.

Below I will first describe the basic general setup, then some conventions
which enable people to use alternative, more traditional approaches, without
ambiguity.

What do you all think of this? Please reply to the entire list...;-)

If it works, the suggestion would be for gRbase to adopt it and abandon
MIM-notation altogether, as the latter is slightly different in style.

Hopefully it can also be extended to cover BUGS-type models without too many
direct conflicts.

Best regards
Steffen

--
Steffen L. Lauritzen
Department of Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, United Kingdom
Tel: +44 1865 272877; Fax: +44 1865 272595
email: steffen at stats.ox.ac.uk URL: www.stats.ox.ac.uk/~steffen/

---------

The following signs are (at least) permissible:  ~ , + , * , : , ^ , . and |

~ indicates the beginning of a formula. Implicitly think of

log f ~ ....

| denotes parenthood in the graph, equivalent to normalising/conditioning

+ denotes multiplicative combination (log-additive). Chain components must
be contained within parentheses.

* or : denotes the (tensor) product of interaction terms, either decomposed
into terms of lower order (*) or not (:), i.e.
A*B*C specifies all subsets of ABC, whereas A:B:C only uses ABC.
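
Base R's own formula machinery already reads * and : this way, so the distinction can be illustrated without any new parser. The snippet below is only a sketch: expand_terms is a hypothetical helper name, and terms() here is R's standard formula expansion, not the proposed gm() language.

```r
# Hypothetical helper: list the individual terms of a one-sided formula
# using base R's terms() (stats package, attached by default).
expand_terms <- function(f) attr(terms(f), "term.labels")

star  <- expand_terms(~ A*B*C)   # all non-empty subsets of {A, B, C}
colon <- expand_terms(~ A:B:C)   # only the three-way term

print(star)
print(colon)
```

The * form yields seven terms (three main effects, three two-way terms, one three-way term), while the : form yields the single term A:B:C.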

strength of bindings   (*,:)   >   +   >  |
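
As a quick check that this ordering is compatible with R itself, one can inspect how R's parser nests an unparenthesised formula. This is a sketch using only base R; note that R additionally lets : bind more tightly than *, whereas the proposal above puts them on the same level.

```r
# Parse one of the formulae from this message and walk down the parse tree.
f   <- ~ A*B + C*D | B*D
rhs <- f[[2]]                         # right-hand side of the one-sided formula

top        <- as.character(rhs[[1]])              # loosest operator
left_top   <- as.character(rhs[[2]][[1]])         # next level, left of the bar
left_first <- as.character(rhs[[2]][[2]][[1]])    # tightest, inside the sum

print(c(top, left_top, left_first))
```

The outermost call is |, its left operand is a + call, and the operands of + are * calls, which matches (*,:) > + > |.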

examples of legal formulae (same model with three chain components
specified)

m <- gm( log f ~ (A:B+C:D|D)+(B*E|E)+(D*E|E))

m <- gm( ~ (A:B+C*D|D), ~(B*E|E)+(D*E|E))
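
To make the parenthesis rule concrete, here is a minimal sketch of how chain components might be pulled out of such a formula. All function names are hypothetical and this is not gRbase code. It uses the first example with the implicit "log f" dropped, since "log f" is not valid R syntax, and it assumes that a formula has several components only when every top-level summand is parenthesised.

```r
# Flatten a top-level sum a + b + c into list(a, b, c) without looking
# inside parentheses.
flatten_plus <- function(e) {
  if (is.call(e) && length(e) == 3L && identical(e[[1]], as.name("+"))) {
    c(flatten_plus(e[[2]]), flatten_plus(e[[3]]))
  } else {
    list(e)
  }
}

is_paren <- function(e) is.call(e) && identical(e[[1]], as.name("("))

# Turn one summand into (interaction part, parent variables).
as_component <- function(e) {
  while (is_paren(e)) e <- e[[2]]
  if (is.call(e) && identical(e[[1]], as.name("|"))) {
    list(model = e[[2]], given = all.vars(e[[3]]))
  } else {
    list(model = e, given = character(0))
  }
}

chain_components <- function(f) {
  rhs    <- f[[length(f)]]
  pieces <- flatten_plus(rhs)
  # Assumption: unparenthesised sums form a single component.
  if (!all(vapply(pieces, is_paren, logical(1)))) pieces <- list(rhs)
  lapply(pieces, as_component)
}

comps <- chain_components(~ (A:B + C:D | D) + (B*E | E) + (D*E | E))
for (cmp in comps) {
  cat(deparse(cmp$model), " given ", paste(cmp$given, collapse = ","), "\n")
}
```

Working on the parsed expression rather than on the formula text means the binding strengths come from R's parser for free.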

hierarchical models, as in CoCoCg and Lauritzen (1996), cf. p. 213

~ A+B:X+B*Y+A*B*X^2+A*X:Y+Y^2     (not a mim-model)

~ A+B:X+A*Y+A*(X+Y)^2 = mim(A+B/AX+BY/AXY)

some different models

m1 <- gm(~A*B+C*D|B*D)   equivalent to   gm(~A*B+C*D+B*D|B*D)

m2<-gm(~((B+D)*E)|E)

m<-b(m1,m2)

m <- gm( ~ (A*B)+(C*D|D)+(B*E+D*E|E))

m<- gm( ~ (A*B)+(C|D)+(B+D|E))
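
The equivalence stated for m1 above suggests one purely syntactic check. The sketch below assumes (this is my reading, not an established rule) that any term involving only conditioning variables is absorbed by the normalisation and can be ignored when comparing two conditional specifications. All helper names are hypothetical, term expansion is borrowed from base R's terms(), and it does not settle the general question of when two formulae specify the same model.

```r
# Split "~ model | given" into its two parts.
split_bar <- function(f) {
  rhs <- f[[length(f)]]
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  list(model = rhs[[2]], given = all.vars(rhs[[3]]))
}

# Terms of the model part that involve at least one non-conditioning variable.
effective_terms <- function(f) {
  parts  <- split_bar(f)
  labels <- attr(terms(eval(call("~", parts$model))), "term.labels")
  only_given <- vapply(strsplit(labels, ":", fixed = TRUE),
                       function(v) all(v %in% parts$given), logical(1))
  labels[!only_given]
}

# Assumption: two conditional formulae agree when their conditioning sets
# and their effective terms agree.
same_conditional <- function(f1, f2) {
  setequal(split_bar(f1)$given, split_bar(f2)$given) &&
    setequal(effective_terms(f1), effective_terms(f2))
}

print(same_conditional(~ A*B + C*D | B*D, ~ A*B + C*D + B*D | B*D))
```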

CONVENTION for compatibility with standard regression and ggm:

Y~X+U:A is the same as ~(Y:X+Y:U:A |XUA) = ~(Y:(X+U:A) |XUA),

that is: *if* there is a variable on the left-hand side of ~, it is a
response to the variables on the right-hand side, and the interaction
structure is the product of the right- and left-hand sides.
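
The convention can be sketched as a small rewriting step. The function name to_conditional is hypothetical; it uses base R's terms() to list the right-hand-side terms and returns the pieces as plain character vectors rather than committing to a printed syntax for the conditioning set.

```r
# Rewrite a two-sided formula Y ~ rhs into the pieces of ~(Y:rhs | vars(rhs)).
to_conditional <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3L)  # needs a left-hand side
  y         <- deparse(f[[2]])
  rhs_terms <- attr(terms(f), "term.labels")
  list(
    interactions = paste0(y, ":", rhs_terms),  # response times each rhs term
    given        = all.vars(f[[3]])            # all right-hand-side variables
  )
}

res <- to_conditional(Y ~ X + U:A)
print(res$interactions)
print(res$given)
```

For Y ~ X + U:A this gives the interaction terms Y:X and Y:U:A, conditional on X, U and A, as in the example above.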

Work still needs to be done to identify when models are legal and when two
formulae specify the same model, and to parse them for proper and correct
analysis.

Is this the way ahead?