Variable Length Markov Chains:
Methodology, Computing and Software

Martin Mächler and Peter Bühlmann

March 2002

Abstract

We present a tutorial and new publicly available computational tools for variable length Markov chains (VLMC). VLMC's are Markov chains with the additional attractive structure that their memories depend on a variable number of lagged values, depending on how the actual past (the lagged values) looks like. They build a very flexible class of tree structured models for categorical time series. Fitting VLMC's from data is a non-trivial computational task. We provide an efficient implementation of the so-called context algorithm which requires O(n log(n)) operations only. The implementation, which is publicly available, includes additional important new features and options: diagnostics, goodness of fit, simulation and bootstrap, residuals and tuning the context algorithm. Our tutorial is presented with a version in R which is available from the Comprehensive R Archive Network CRAN . The exposition is self-contained, gives rigorous and partly new mathematical descriptions and is illustrated by analyzing a DNA sequence from the Epstein-Barr virus.

Download:

Compressed Postscript (156 Kb)
PDF (183 Kb)


Go back to the Research Reports from Seminar für Statistik.