[R--gR] Model formulae for graphical models in R
Giovanni Marchetti
gmm at ds.unifi.it
Thu Oct 24 19:09:14 CEST 2002
*** Model formulae for graphical models in R ***
I would like to suggest that one way to specify graphical models in R
is via model formulae, in Wilkinson and Rogers' (not MIM) notation.
The existing R syntax for model formulae is powerful enough
to define directed acyclic graphs (and cyclic graphs),
concentration (and covariance) graphs and joint-response chain graphs.
I will illustrate the idea with some examples, assuming
the existence of a function graph() that accepts
a model formula and other arguments and returns an object of
class graph.
(The graphs are drawn as plain-text diagrams, which may be
illegible on some systems. I apologize for this.)
o Concentration graphs
The graph
   A
  / \
 B---C -- D -- E
is defined by specifying the cliques (as for log-linear models),
with a single model formula with no response:
> g <- graph(~ A*B*C + C*D + D*E)
> g
  A B C D E
A - 1 1 0 0
B 1 - 1 0 0
C 1 1 - 1 0
D 0 0 1 - 1
E 0 0 0 1 -
(The output is the printed representation of the edge matrix of the graph.)
(The graph is plotted with plot(g).)
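A minimal sketch of how such a function might derive the edge matrix from
the cliques, assuming only base R's terms(). The name ug_edges() is
hypothetical and is not part of the proposal; the diagonal is set to 0
here instead of being printed as "-".

```r
# Hypothetical helper, not an existing function: builds the edge matrix of an
# undirected graph from a one-sided formula whose terms describe the cliques.
ug_edges <- function(formula) {
  # variables x terms incidence matrix computed by terms()
  inc <- (attr(terms(formula), "factors") > 0) + 0
  # two variables are adjacent when they occur together in at least one term
  adj <- (tcrossprod(inc) > 0) * 1
  diag(adj) <- 0   # no self-loops
  adj
}

g <- ug_edges(~ A*B*C + C*D + D*E)
g["C", "D"]  # 1: C and D share the term C:D
g["C", "E"]  # 0: C and E never appear in the same term
```

Because terms() expands A*B*C into all of its main effects and
interactions, every pair of variables inside a clique ends up joined.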
o Covariance graphs
The covariance graph with the same undirected graph and cliques is
specified by adding the edge type (for example type=2, for dashed
edges):
> g <- graph(~ A*B*C + C*D + D*E, type=2)
> g
  A B C D E
A - 2 2 0 0
B 2 - 2 0 0
C 2 2 - 2 0
D 0 0 2 - 2
E 0 0 0 2 -
With this syntax there is no way to define an undirected graph
with a mixture of full and dashed edges.
o DAGs
The DAG
A <- B <- C <- F
     ^    ^
     |    |
     D <- E
is defined as a sequence of regression models:
> g <- graph(A ~ B, B ~ C+D, C ~ E*F, D ~ E)
> g
  A B C D E F
A - 1 0 0 0 0
B 0 - 1 1 0 0
C 0 0 - 0 1 1
D 0 0 0 - 1 0
E 0 0 0 0 - 0
F 0 0 0 0 0 -
Each regression model has one response, and its
explanatory nodes are combined either additively (with +) or
interactively (with *).
The order of the variables in the edge matrix is deduced from
the model formulae (or it could be permuted to give an
upper triangular matrix).
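A minimal sketch of how the edge matrix of a DAG could be assembled from
a sequence of regression formulae, assuming only base R's all.vars().
The name dag_edges() is hypothetical; rows are responses and columns are
their explanatory nodes, as in the printed matrix above, with 0 on the
diagonal instead of "-".

```r
# Hypothetical helper, not an existing function: one formula per response.
dag_edges <- function(...) {
  formulae <- list(...)
  # node order is the order of first appearance in the formulae
  nodes <- unique(unlist(lapply(formulae, all.vars)))
  edges <- matrix(0, length(nodes), length(nodes),
                  dimnames = list(nodes, nodes))
  for (f in formulae) {
    vars <- all.vars(f)
    child <- vars[1]       # the response on the left-hand side
    parents <- vars[-1]    # explanatory nodes, whether joined by + or *
    edges[child, parents] <- 1
  }
  edges
}

g <- dag_edges(A ~ B, B ~ C + D, C ~ E * F, D ~ E)
# With this ordering nothing lies below the diagonal, so the graph is acyclic.
all(g[lower.tri(g)] == 0)
```

This sketch records only which nodes point to which; the distinction
between + and * would have to be stored separately.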
o Joint-response chain graphs
Joint-response chain graphs are defined by mixing
model-formulae with no response (used for concentration
or covariance models within blocks) and model formulae
with a response for concentration regressions
or multivariate regressions. For example, consider
the CG model:
 -------                      -------
|  Y <--|--------------------|--- A  |
|   \   |      -------       |    |  |
|    X  |     |   U   |      |    B  |
|   /   |     |   |   |      |       |
|  Z <--|-----|-- V <-|------|--- C  |
 -------       -------        -------
It can be defined with:
> g <- graph( ~ Y*X + X*Z, ~ U*V, ~ A*B+C, Y~X+A, Z ~ X+V, V ~ U+C)
Remark: Given the graph
 -----         -----
|  X  |  <--  |  U  |
|  |  |       |  |  |
|  Y  |  <--  |  V  |
 -----         -----
note the difference between a block regression
g <- graph( ~ X*Y, ~ U*V, X ~ Y+U, Y ~ X+V)
and a multivariate regression
g <- graph( ~ X*Y, ~ U*V, X ~ U, Y ~ V)
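A minimal sketch of how the two cases could be told apart from the
formulae alone, assuming that every variable belongs to exactly one block
and that a regression counts as a block regression when some regressor
lies in the same block as its response. The name regression_kind() is
hypothetical.

```r
# Hypothetical helper, not an existing function.
# blocks: list of one-sided formulae, one per box.
# regressions: list of two-sided formulae, one per response.
regression_kind <- function(blocks, regressions) {
  # index of the block whose formula mentions variable v
  block_of <- function(v) {
    which(vapply(blocks, function(b) v %in% all.vars(b), logical(1)))
  }
  vapply(regressions, function(f) {
    vars <- all.vars(f)
    response <- vars[1]
    same_block <- vapply(vars[-1], function(v) {
      identical(block_of(v), block_of(response))
    }, logical(1))
    if (any(same_block)) "block" else "multivariate"
  }, character(1))
}

blocks <- list(~ X*Y, ~ U*V)
kind_block <- regression_kind(blocks, list(X ~ Y + U, Y ~ X + V))
kind_multi <- regression_kind(blocks, list(X ~ U, Y ~ V))
```

In the first call each response is also conditioned on the other response
in its box; in the second it depends only on the variables of the
preceding box.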
The edge types within the boxes could also be indicated,
as a numeric vector with length equal to the number of boxes
(in the specified order).
Giovanni
--
< Giovanni M. Marchetti >
Dipartimento di Statistica, Univ. di Firenze Phone: +39 055 4237 204
viale Morgagni, 59 Fax: +39 055 4223 560
I 50134 Firenze, Italy email: gmm at ds.unifi.it