[R--gR] Model formulae for graphical models in R

Giovanni Marchetti gmm at ds.unifi.it
Thu Oct 24 19:09:14 CEST 2002


*** Model formulae for graphical models in R ***
  
I would like to suggest that one way to specify  graphical models in R 
is via model formulae, in  Wilkinson and Rogers' (not MIM) notation.
The existing R syntax for model formulae is powerful enough
to define directed acyclic graphs (and cyclic graphs), 
concentration (and covariance) graphs and joint-response chain graphs.  

I will illustrate the idea with some examples, by assuming 
the existence of a function graph() that accepts as input 
a model formula and other arguments and gives an object of class graph as 
result. 

(The graphs are drawn with plain-text characters and may be 
illegible without a fixed-width font. I apologize for this.)

o Concentration graphs

The graph 

  A
 / \
B---C -- D -- E

is defined by specifying the cliques (as for log-linear models),
with a single model formula with no response:

> g <- graph(~ A*B*C + C*D + D*E)
> g
       A      B      C      D      E
A      -      1      1      0      0
B      1      -      1      0      0
C      1      1      -      1      0
D      0      0      1      -      1
E      0      0      0      1      -

(The output is the printed representation of the edge matrix of the graph.)
(The graph is plotted  with plot(g).)    
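
As a rough illustration of how a function like graph() might turn the
cliques into an edge matrix, the sketch below uses only terms() from
base R. The helper name ug_edges() and its type argument are
hypothetical and not part of any existing package. The rule assumed
here is that two variables are joined by an edge whenever they appear
together in some term of the formula.

```r
# Hypothetical helper: edge matrix of an undirected graph from a
# one-sided model formula whose terms generate the cliques.
ug_edges <- function(formula, type = 1) {
  # "factors" is a variables-by-terms incidence matrix
  fac  <- attr(terms(formula), "factors")
  vars <- rownames(fac)
  m <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
  for (j in seq_len(ncol(fac))) {
    v <- vars[fac[, j] > 0]   # variables involved in term j
    m[v, v] <- type           # join every pair within the term
  }
  diag(m) <- 0                # no self-loops
  m
}

m <- ug_edges(~ A*B*C + C*D + D*E)
print(m)
```

The diagonal is stored as 0 here; a print method could show it as "-".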

o Covariance graphs

The covariance graph with the same undirected graph (UG) and cliques 
is specified by adding an edge type (for example type=2 for dashed
edges):
 

> g <- graph(~ A*B*C + C*D + D*E, type=2)
> g
       A      B      C      D      E
A      -      2      2      0      0
B      2      -      2      0      0
C      2      2      -      2      0
D      0      0      2      -      2
E      0      0      0      2      -

With this syntax there is no way to define an undirected graph 
that mixes full and dashed edges.

o DAGs

The  DAG
         
      
A <- B <- C <- F
     ^    ^
     |    |
     D <- E
    
is defined as a sequence of regression models:

> g <- graph(A ~ B, B ~ C+D, C ~ E*F, D ~ E) 
> g
      A   B   C   D   E   F  
A     -   1   0   0   0   0
B     0   -   1   1   0   0
C     0   0   -   0   1   1
D     0   0   0   -   1   0
E     0   0   0   0   -   0
F     0   0   0   0   0   -

Each regression model has one response, with the explanatory 
nodes combined either additively (with +) or interactively
(with *).  

The order of the variables in the edge matrix is deduced from 
the model formulae (or it could be permuted to give an
upper  triangular matrix). 
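
A minimal sketch of both steps, under stated assumptions: dag_edges()
and upper_order() are hypothetical names, rows are children and
columns are parents (as in the matrix above), and + and * are treated
alike because only the set of explanatory variables matters for the
edges. The variable order is the order of first appearance in the
formulae; upper_order() then finds a permutation that makes the
matrix upper triangular, or stops if the graph has a directed cycle.

```r
# Hypothetical helper: edge matrix of a DAG from regression formulae.
dag_edges <- function(...) {
  fs   <- list(...)
  vars <- unique(unlist(lapply(fs, all.vars)))  # order of first appearance
  m <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
  for (f in fs) {
    stopifnot(length(f) == 3)        # each formula needs a response
    child   <- all.vars(f[[2]])
    parents <- all.vars(f[[3]])
    m[child, parents] <- 1           # row = child, column = parent
  }
  m
}

# Hypothetical helper: ordering with every child before its parents.
upper_order <- function(m) {
  left <- rownames(m)
  out  <- character(0)
  while (length(left) > 0) {
    sub   <- m[left, left, drop = FALSE]
    sinks <- left[colSums(sub) == 0]  # nodes with no remaining children
    if (length(sinks) == 0) stop("graph contains a directed cycle")
    out  <- c(out, sinks)
    left <- setdiff(left, sinks)
  }
  out
}

m <- dag_edges(A ~ B, B ~ C + D, C ~ E * F, D ~ E)

# The same formulae in a different order give a different variable order,
# which upper_order() can repair.
m2  <- dag_edges(D ~ E, A ~ B, C ~ E * F, B ~ C + D)
ord <- upper_order(m2)
m2u <- m2[ord, ord]
```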

o Joint-response chain graphs

Joint-response chain graphs are defined by mixing 
model formulae with no response (used for concentration 
or covariance models within blocks) and model formulae
with a response (used for concentration regressions
or multivariate regressions). For example, consider
the chain graph (CG) model:

 -----                 -----
| Y  <----------------|--A  |
|  \  |      -----    |  |  |
|   X |     |  U  |   |  B  |
|  /  |     |  |  |   |     |
| Z   | <-- |  V  | <-|- C  |
 -----       -----     -----

It can be defined with:

> g <- graph( ~ Y*X + X*Z, ~ U*V, ~ A*B+C, Y~X+A, Z ~ X+V, V ~ U+C)
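
To handle such a call, a function like graph() would first have to
separate the formulae without a response (the blocks) from those with
a response (the regressions). The sketch below shows one way to do
this in base R; parse_cg() is a hypothetical name, and the returned
list layout is an assumption made only for illustration.

```r
# Hypothetical helper: split a mixed list of formulae into blocks
# (one-sided formulae) and regressions (two-sided formulae).
parse_cg <- function(...) {
  fs <- list(...)
  # a two-sided formula has length 3: `~`, left side, right side
  two_sided <- vapply(fs, function(f) length(f) == 3, logical(1))
  list(
    blocks = lapply(fs[!two_sided], all.vars),
    regressions = lapply(fs[two_sided], function(f) {
      list(response    = all.vars(f[[2]]),
           explanatory = all.vars(f[[3]]))
    })
  )
}

p <- parse_cg(~ Y*X + X*Z, ~ U*V, ~ A*B + C,
              Y ~ X + A, Z ~ X + V, V ~ U + C)
str(p)
```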

Remark: Given the graph 

 ----        ----- 
| X  | <--  |  U  |
| |  |      |  |  |
| Y  | <--  |  V  |
 ----        ----- 

note the difference between a block regression  

> g <- graph( ~ X*Y, ~ U*V, X ~ Y+U, Y ~ X+V)

and a multivariate regression
 
> g <- graph( ~ X*Y, ~ U*V, X ~ U, Y ~ V)
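
As I read the two calls, the difference is whether each response is
also regressed on the other variable of its own block. The sketch
below makes that visible by splitting the explanatory variables of a
regression into those inside and those outside the response's block.
split_regressors() is a hypothetical helper, and this reading of the
two specifications is an assumption, not something a real graph()
function is known to do.

```r
# Hypothetical helper: classify the explanatory variables of one
# regression as belonging to the response's own block or not.
split_regressors <- function(blocks, reg) {
  resp <- all.vars(reg[[2]])
  expl <- all.vars(reg[[3]])
  block_vars <- lapply(blocks, all.vars)
  has_resp   <- vapply(block_vars, function(b) resp %in% b, logical(1))
  own <- unlist(block_vars[has_resp])   # variables in the response's block
  list(within  = intersect(expl, own),
       outside = setdiff(expl, own))
}

blocks <- list(~ X*Y, ~ U*V)

# Block regression: X is conditioned on Y, which is in its own block.
br <- split_regressors(blocks, X ~ Y + U)

# Multivariate regression: X depends only on variables outside its block.
mr <- split_regressors(blocks, X ~ U)
```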

The edge types within the boxes could also be indicated, by
a numeric vector with length equal to the number of boxes 
(in the specified order). 


Giovanni

-- 
< Giovanni M. Marchetti >
Dipartimento di Statistica, Univ. di Firenze   Phone:  +39 055 4237 204
viale Morgagni, 59                             Fax:    +39 055 4223 560
I 50134 Firenze, Italy                         email:  gmm at ds.unifi.it




