[R--gR] Basic algorithms into R

Steffen Lilholt Lauritzen steffen at math.auc.dk
Tue Oct 22 11:37:09 CEST 2002


Dear gRs,

Below I have tried to identify some central and helpful algorithms that
it would be nice to have within R. Some of them are already implemented
somewhere, so porting or slight modification might be the only thing
needed. However, some of them are also `generic', and should be thought
of as extensible and modifiable. Some are algorithms on graphs, some
involve data more directly. Most implementations will use a combination
of these.

My descriptions are probably too diffuse for direct implementation of
any of them, but the list will hopefully be useful as a starting point
for structuring ideas.

Comments from everyone are appreciated. I consider this list, together
with the graph algorithm discussions, to form the first preliminary
basis for designing a proper work programme, identifying what should be
done and, in particular, who should do it...

I will try to integrate these things into a "Discussion White Paper"
which is slowly taking shape, but it seems a rather difficult process,
so it might take a little while. The more I think about it, the more
there is, much more than I expected, so one should identify doable
subsets.

And I am still fighting hard with the conceptual integration of the
"fitting and selecting" cluster with the "building and inferring"
cluster. It may eventually prove neither possible nor fruitful, but I do
not intend to give up for a while...

Best regards
Steffen
-- 
Steffen L. Lauritzen
Department of Mathematical Sciences, Aalborg University
Fredrik Bajers Vej 7G, DK-9220 Aalborg, Denmark
phone: +45 9635 8858; fax: +45 9815 8129
email: steffen at math.auc.dk; URL: http://www.math.auc.dk/~steffen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Some fundamental algorithms for graphical models. 
-------------------------------------------------

patefield(x,y,k):

input: vectors x and y of non-negative integers with sum(x) == sum(y),
and an integer k;
output: k independent random integer matrices m of dimension
length(x) by length(y), with apply(m,1,sum) == x and apply(m,2,sum) == y,
distributed as i.i.d. Poisson matrices conditioned on the given row and
column sums.

Implemented in DIGRAM, CoCo and MIM.
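
A minimal sketch: base R's r2dtable() implements Patefield's (1981)
algorithm, so a thin wrapper matching the signature above could look as
follows.

patefield <- function(x, y, k) {
  stopifnot(sum(x) == sum(y))
  r2dtable(k, x, y)  # list of k random matrices with margins x and y
}

## Example: three random 2x3 tables, row sums (3,2), column sums (2,2,1)
set.seed(1)
patefield(c(3, 2), c(2, 2, 1), 3)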
--------------
IPS-algorithm:

input: list of multivariate categorical cases or an integer table of
counts, as well as a model formula identifying the sufficient marginals.
output: list of estimated log-probabilities of the observed cases, or a
table of fitted counts plus various statistics (log-probabilities can be
accumulated to obtain likelihoods).

Comment:
Exists already in R as loglin(), with the loglm() front-end in MASS.
However, I think the current R implementation has complexity exponential
in the dimension of the table, irrespective of the model formula (?),
see below. This can be dealt with, but demands some additional
machinery. Implemented in CoCo and MIM; DIGRAM? The efficient version is
implemented in CoCo only, as far as I know (cf. Badsberg and Malvestuto,
2000).

Note:
Here is an example where the classical notion of sufficient "reduction"
is not as directly helpful as it seems. A list of, say, 100,000 cases
observed on 100 attributes can be stored in at most 10 million data
slots (one slot per data point). The sufficient "reduction" to a
contingency table has 2^100 slots (almost all of which have count 0), so
it is hard for a computer scientist or data miner to see that this is a
"reduction" of anything. The important thing for the IPS is then not to
form or use the table at all, but to work directly on the list of
observations and only use sufficient reductions in connection with
smaller submodels. CoCo has this way of thinking integrated very
strongly.
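
For the table-based case, a minimal sketch using base R's loglin() (the
counts below are made up for illustration):

counts <- array(c(12, 7, 5, 9, 8, 4, 6, 10), dim = c(2, 2, 2))
## fit the model with sufficient marginals {1,2} and {2,3} by IPS
fit <- loglin(counts, margin = list(c(1, 2), c(2, 3)), fit = TRUE)
fit$fit  # table of fitted counts
fit$lrt  # likelihood-ratio statistic against the saturated model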
-------------
MIPS-algorithm 

input: list of multivariate categorical or real-valued cases, or sets
of sufficient statistics (counts, tables of mean vectors, tables of
SSD-matrices), as well as a MIM model formula.
output: list of estimated log-probabilities/densities of the observed
cases, or tables of fitted counts, means, SSDs, etc.

Comment: Needs the MIM model formula language implemented. Same
comments as above with respect to sufficiency where the discrete part is
concerned. Implemented in MIM (and now also in CoCo, I think, maybe even
in an efficient version?).
----------------------
E-imputation

Input: incomplete data and a graphical model with parameters;
output: "expected data", imputed according to the graphical model.

Comment: implemented as part of the EM algorithm in CoCo for undirected
models, in HUGIN for discrete DAG models, and in MIM for mixed models.
The efficient version uses probability propagation, see below. I prefer
to have the imputation step separated from the full EM algorithm,
because the imputation step can be helpful in other contexts.
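
A minimal sketch of the idea for a single discrete case, assuming (for
illustration only) that the model is a joint probability table over two
binary variables: a case with A observed and B missing contributes the
expected counts p(B | A).

p <- matrix(c(0.3, 0.1, 0.2, 0.4), nrow = 2,
            dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))
e_impute <- function(p, a) p[a, ] / sum(p[a, ])  # conditional p(B | A = a)
e_impute(p, "a1")  # expected split of one incomplete case over B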
------------
MC-imputation

Input: incomplete data and a graphical model;
output: "random data", sampled from the conditional distribution given
the data in the graphical model.

Comment: implemented as random propagation in HUGIN for discrete DAG
models. Uses probability propagation to be efficient. Related to the
Patefield algorithm. Can be used as a block-updating step in connection
with MCMC methods. Should exploit "forward sampling" when no data are
available for descendants.
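
In the same toy setting as the E-imputation sketch above, the MC
variant samples the missing value instead of averaging over it:

p <- matrix(c(0.3, 0.1, 0.2, 0.4), nrow = 2,
            dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))
mc_impute <- function(p, a) sample(colnames(p), 1, prob = p[a, ])
set.seed(2)
mc_impute(p, "a1")  # one random completion of the incomplete case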
-----------------------
Gibbs-imputation 

input: graphical model with incomplete data;
output: random data, with each node (or block) sampled from its full
conditional.

comment: in BUGS. Yields an MCMC sample when used iteratively. Each
node sampler should be its own little clever algorithm. A systematic
approach to blocking would also be important. Not necessarily disjoint
blocks!
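
A minimal sketch of single-site Gibbs updates for two binary variables
with joint table p (numbers hypothetical); each sweep resamples each
variable from its full conditional given the current value of the other:

p <- matrix(c(0.3, 0.1, 0.2, 0.4), 2, 2)
gibbs_sweep <- function(state, p) {
  state[1] <- sample(1:2, 1, prob = p[, state[2]])  # A | current B
  state[2] <- sample(1:2, 1, prob = p[state[1], ])  # B | updated A
  state
}
set.seed(3)
state <- c(1, 1)
for (i in 1:100) state <- gibbs_sweep(state, p)  # one MCMC trajectory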
----------------------
Gradient computation

input: parametrised graphical model, incomplete data
output: gradient (and Hessian) of log-likelihood function.

comment: uses probability propagation/E-imputation; by Fisher's
identity, the gradient of the incomplete-data log-likelihood is the
conditional expectation of the complete-data gradient given the observed
data, so the expected sufficient statistics from E-imputation yield the
gradient directly. Implemented in GAMES (DAG program by Thiesson, now at
Microsoft). Useful for MLE in complex models, such as those specified in
BUGS (but BUGS does not use or compute gradients).
 
=============================================================
Algorithms associated with probability propagation:
(at the heart of HUGIN, implemented in some form in GRAPPA):
=============================================================
collect.evidence(root, evcliques,jtree,pot)

input: a junction tree jtree of cliques, with potentials pot assigned
to cliques and separators, a root clique, and selected evidence cliques
evcliques;
output: modified potentials on jtree, obtained by message passing
(collect to root) over the smallest subtree containing root and
evcliques.

comment: calculates the marginal on the root clique over the variables
in the subtree. At the heart of HUGIN and implemented in some version in
GRAPPA.
------------------------------------------

distribute.evidence(root,dcliques,jtree,pot)

input: a junction tree jtree of cliques, with potentials pot assigned
to cliques and separators, a root clique, and selected cliques dcliques;
output: modified potentials on jtree, obtained by message passing
(distribute from root) over the smallest subtree containing root and
dcliques.

comment: to be used only after collect; calculates the marginals on all
cliques in jtree. Yields essentially E-imputation. At the heart of
HUGIN; in GRAPPA in some version.

-----------------------------------------
distribute.sample(root,dcliques,jtree,pot)

input: a junction tree jtree of cliques, with potentials pot assigned to
cliques and separators, a root clique and selected cliques dcliques;
output: random sample of the variables, obtained by sample passing
(distribute from root) over the smallest subtree containing root and
dcliques.

comment: to be used only after collect; performs the MC-imputation.
Implemented in HUGIN. Useful for MCMC applications.
--------------------------------------------------
message(clique1,clique2, pot1,pot2,pot12)

Input: two sets of variables with associated potentials, and a
potential on their intersection;
output: the same, but with a message passed from clique1 to clique2.
Identical to "collect" or "distribute" for a junction tree with only
two cliques.

comment: basic element in other probability propagation algorithms.
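
A minimal sketch with table-valued potentials stored as R arrays (the
representation and all names here are assumptions, not HUGIN's or
GRAPPA's API): marginalize pot1 onto the separator, then scale pot2 by
the ratio of the new to the old separator potential.

message_pass <- function(pot1, pot2, sep_in_1, sep_in_2, pot12) {
  new_sep <- apply(pot1, sep_in_1, sum)  # marginalize pot1 onto separator
  ratio <- new_sep / pot12
  ratio[!is.finite(ratio)] <- 0          # 0/0 := 0, the usual convention
  list(pot2 = sweep(pot2, sep_in_2, ratio, "*"), pot12 = new_sep)
}

## Example: cliques {A,B} and {B,C} with separator {B}
potAB <- matrix(c(0.3, 0.1, 0.2, 0.4), 2, 2)  # indexed [A, B]
potBC <- matrix(1, 2, 2)                      # indexed [B, C]
message_pass(potAB, potBC, sep_in_1 = 2, sep_in_2 = 1, pot12 = c(1, 1))

The collect and distribute operations above are then just post-order
and pre-order sequences of such passes over the junction tree.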
-----------------------------------------------------------------
jtree(graph)

input: undirected graph;
output: junction tree of the cliques of a triangulated cover of the
graph.

comment: implemented in HUGIN with many options for the triangulation
method. This is a pure graph algorithm.
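
A rough sketch of the first step, assuming (for illustration) that the
graph is a logical adjacency matrix and that an elimination order is
given: eliminate vertices in order, adding fill-in edges, and record
the elimination cliques. A junction tree is then obtained by pruning to
maximal cliques and joining them by a maximum-weight spanning tree with
separator sizes as weights.

triangulate <- function(adj, order) {
  cliques <- list()
  active <- rep(TRUE, nrow(adj))
  for (v in order) {
    nb <- setdiff(which(adj[v, ] & active), v)  # active neighbours of v
    adj[nb, nb] <- TRUE                         # fill-in: complete them
    diag(adj) <- FALSE
    cliques[[length(cliques) + 1]] <- sort(c(v, nb))
    active[v] <- FALSE
  }
  list(adj = adj, cliques = cliques)  # may include non-maximal cliques
}

## Example: the 4-cycle 1-2-3-4-1 gets fill-in edge 2-4 under order 1:4
g <- matrix(FALSE, 4, 4)
g[cbind(1:4, c(2, 3, 4, 1))] <- TRUE
g <- g | t(g)
triangulate(g, 1:4)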

=======================
Model search algorithms
========================
General comments: These will all be associated with CRITERIA that
decide accept/reject at each step of the algorithm; the criteria can
then be given as optional arguments to the algorithms. I have tried to
describe the algorithms generically.
---------------------------------
PC-algorithm (with variants)

input: multivariate data for categorical or mixed variables;
output: skeleton of unexplained dependencies (an undirected graph),
possibly also a DAG (equivalence class) consistent with these.

comment: in TETRAD (and HUGIN). The PC algorithm is the simplest, but
not the best. It can also have modes in which it looks for UGs or CGs.
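
A minimal sketch of the skeleton phase for the Gaussian case, testing
partial correlations with Fisher's z transform on a sample correlation
matrix C from n observations (the Gaussian restriction and all names
are assumptions; real implementations add the edge-orientation rules):

pc_skeleton <- function(C, n, alpha = 0.05) {
  p <- ncol(C)
  adj <- matrix(TRUE, p, p); diag(adj) <- FALSE
  indep <- function(i, j, S) {            # test i _||_ j given S
    P <- solve(C[c(i, j, S), c(i, j, S), drop = FALSE])
    r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])     # partial correlation
    z <- sqrt(n - length(S) - 3) * atanh(r)     # Fisher z statistic
    2 * pnorm(-abs(z)) > alpha            # TRUE: no evidence against CI
  }
  for (l in 0:(p - 2))
    for (i in 1:p) for (j in 1:p) {
      if (i == j || !adj[i, j]) next
      nbrs <- setdiff(which(adj[i, ]), j)
      if (length(nbrs) < l) next
      for (S in if (l == 0) list(integer(0)) else
                combn(nbrs, l, simplify = FALSE))
        if (indep(i, j, S)) { adj[i, j] <- adj[j, i] <- FALSE; break }
    }
  adj
}

## Usage: pc_skeleton(cor(X), nrow(X)) for a data matrix X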
------------------------

EH-algorithm

input: multivariate data for categorical or mixed variables; a model
class which is a lattice (graphical, hierarchical, etc.); some starting
model; and a strategy for switching search modes.
output: list of (undirected) graphical or hierarchical models
consistent with the data.

comment: in MIM and CoCo. DIGRAM?
----------------------------
Local search algorithms

input: data and a model class;
output: sequence of models (or equivalence classes of such) considered
according to some criterion, with the associated value of the
criterion.

comment: DEAL is an example of this. Many such algorithms are in MIM,
CoCo and TETRAD; steps of such algorithms are in DIGRAM.
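
A minimal generic sketch of one such search, greedy hill climbing over
undirected graphs, with the criterion abstracted as a caller-supplied
score() function (all names here are assumptions):

local_search <- function(adj, score, max_sweeps = 100) {
  best <- score(adj)
  p <- nrow(adj)
  for (it in 1:max_sweeps) {
    improved <- FALSE
    for (i in 1:(p - 1)) for (j in (i + 1):p) {
      cand <- adj
      cand[i, j] <- cand[j, i] <- !cand[i, j]  # toggle one edge
      s <- score(cand)
      if (s > best) { adj <- cand; best <- s; improved <- TRUE }
    }
    if (!improved) break                       # local optimum reached
  }
  list(adj = adj, score = best)
}

Swapping score() for AIC, BIC or a test-based criterion changes the
search criterion without touching the algorithm, as suggested above.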
-----------------------------------
Variational inference:

input: graphical model with incomplete data (as in BUGS);
output: variational approximation to the node distributions.

comment: not implemented in standard software as far as I know. Some
plans for VIBES (Microsoft) exist, but simpler and cruder than what I
have in mind. I am thinking of something corresponding to the theory
sketched in Beal and Ghahramani (Valencia 7), which would essentially
be BUGS with VI rather than MCMC. VIBES uses normal approximations
throughout, I think, and therefore essentially just calculates first
and second (conditional) moments.
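
A minimal sketch of the core idea, coordinate-ascent mean-field updates
for two binary nodes with joint density proportional to
exp(a1*x1 + a2*x2 + w*x1*x2) (all numbers hypothetical); each update
sets q_j to the sigmoid of the expected field from the other node:

sigmoid <- function(x) 1 / (1 + exp(-x))
a1 <- 0.5; a2 <- -0.3; w <- 1.2
q <- c(0.5, 0.5)                  # q[j] approximates P(x_j = 1)
for (it in 1:50) {
  q[1] <- sigmoid(a1 + w * q[2])  # mean-field update for node 1
  q[2] <- sigmoid(a2 + w * q[1])  # mean-field update for node 2
}
q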




