[R] Making a markov transition matrix

Sun Jan 22 12:51:09 CET 2006

That solution for the case 'with gaps' merely omits transitions where
the transition information is not for a single time step.  (Mine can be
modified for this as well - see below.)

But if you know that a firm went from state i in year y to state j in
year y+3, say, without knowing the intermediate states, that must tell
you something about the 1-step transition matrix as well.  How do you
use this information?
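To make this concrete, assume (this is an assumption, not something stated in the thread) that the chain is time-homogeneous with one-step transition matrix P over K states. An observation from year y to year y+3 then constrains P only through its cube:

```latex
\Pr(X_{y+3} = j \mid X_y = i) \;=\; (P^3)_{ij} \;=\; \sum_{k=1}^{K} \sum_{l=1}^{K} P_{ik}\, P_{kl}\, P_{lj}
```

Each gapped observation therefore involves a sum of products of entries of P rather than a single entry, which is why it cannot simply be added to the one-step count table.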

That's a much more difficult problem, but you can tackle it with maximum
likelihood, for example: work out how to calculate the likelihood
function, and then optimise it.  This is getting a bit away from the
original 'programming trick' question, but it is an interesting problem
that occurs more often than I had realised.  I'd be interested in
knowing if anyone has done anything slick in this area.
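A minimal sketch of that maximum likelihood approach in R, under stated assumptions: the chain is time-homogeneous, states are coded as integers 1..K, and the missing years are unrelated to the states (so they can be ignored when writing the likelihood). A pair observed d periods apart, going from i to j, contributes log (P^d)[i, j] to the log-likelihood. Rows of P are kept stochastic with a softmax parameterisation, and the negative log-likelihood is minimised with optim(). The function names (mat.power, theta.to.P, negloglik, fit.gapped.markov) are hypothetical, not taken from any package, and the data are simulated.

```r
# Sketch: maximum likelihood estimate of a one-step transition matrix P
# from pairs (from, to) observed 'gap' periods apart.
# Assumptions: time-homogeneous chain, states coded 1..K, and the
# missing years are unrelated to the states.

# d-step transition matrix P^d by repeated multiplication
mat.power <- function(P, d) {
  M <- diag(nrow(P))
  for (k in seq_len(d)) M <- M %*% P
  M
}

# Unconstrained parameters -> row-stochastic K x K matrix:
# each row is a softmax over K logits, with the last logit fixed at 0
theta.to.P <- function(theta, K) {
  A <- cbind(matrix(theta, K, K - 1), 0)
  E <- exp(A - apply(A, 1, max))  # subtract row maxima for numerical stability
  E / rowSums(E)
}

# Negative log-likelihood: pair m contributes log (P^gap[m])[from[m], to[m]]
negloglik <- function(theta, from, to, gap, K) {
  P <- theta.to.P(theta, K)
  ll <- 0
  for (d in unique(gap)) {
    sel <- gap == d
    Pd <- mat.power(P, d)
    ll <- ll + sum(log(Pd[cbind(from[sel], to[sel])]))
  }
  -ll
}

fit.gapped.markov <- function(from, to, gap, K) {
  fit <- optim(rep(0, K * (K - 1)), negloglik,
               from = from, to = to, gap = gap, K = K, method = "BFGS")
  theta.to.P(fit$par, K)
}

# Usage on simulated data: a 2-state chain observed after 1, 2 or 3 steps
set.seed(1)
Ptrue <- rbind(c(0.9, 0.1),
               c(0.3, 0.7))
n <- 20000
from <- sample(1:2, n, replace = TRUE)
gap  <- sample(1:3, n, replace = TRUE)
to   <- from
for (s in 1:3) {
  move <- gap >= s                # pairs that still have a step to take
  p2 <- Ptrue[to[move], 2]        # probability of moving to state 2
  to[move] <- ifelse(runif(sum(move)) < p2, 2L, 1L)
}
Phat <- fit.gapped.markov(from, to, gap, K = 2)
Phat
```

When every pair is exactly one step apart, this likelihood is maximised by the row-normalised count table, which makes a handy sanity check. With gaps the likelihood need not be concave in the parameters, so trying a few starting values for optim() is prudent.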

Bill Venables.

-----Original Message-----
From: Ajay Narottam Shah [mailto:ajayshah at mayin.org] 
Sent: Sunday, 22 January 2006 5:15 PM
To: R-help
Cc: jholtman at gmail.com; Venables, Bill (CMIS, Cleveland)
Subject: Re: [R] Making a markov transition matrix

On Sun, Jan 22, 2006 at 01:47:00PM +1100, Bill.Venables at csiro.au wrote:
> If this is a real problem, here is a slightly tidier version of the
> function I gave on R-help:
> 
> transitionM <- function(name, year, state) {
>   raw <- data.frame(name = name, state = state)[order(name, year), ]
>   raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]), 
>                         name == name.1)
>   with(raw01, table(state, state.1))
> }

To modify this solution for the 'with gaps' case, so that multiple-step
transitions are omitted, you need to include the year in the 'raw' data
frame and then just change the subset condition to

        name == name.1 & year == year.1 - 1
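Putting that change into the function gives something like the sketch below. The year is simply carried along in 'raw', and the second copy of the data frame picks up the names name.1, year.1 and state.1 from data.frame()'s automatic de-duplication of column names. The small data set is made up for illustration.

```r
# transitionM with 'year' kept in the data frame, so that only
# transitions between consecutive years within the same firm are counted
transitionM <- function(name, year, state) {
  raw <- data.frame(name = name, year = year, state = state)[order(name, year), ]
  # Pair each row with the one after it; the copy gets names name.1, year.1, state.1
  raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                  name == name.1 & year == year.1 - 1)
  with(raw01, table(state, state.1))
}

# Made-up example: firm "a" has no record for year 3
tab <- transitionM(name  = c("a", "a", "a", "b", "b"),
                   year  = c(1, 2, 4, 1, 2),
                   state = c(1, 2, 1, 2, 2))
tab
```

Firm "a"'s jump from year 2 to year 4 is dropped, so only two one-step transitions are counted (1 to 2 for firm "a", 2 to 2 for firm "b").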

> 
> Notice that this does assume there are 'no gaps' in the time series
> within firms, but it does not require that each firm have responses for
> the same set of years.
> 
> Estimating the transition probability matrix when there are gaps within
> firms is a more interesting problem, both statistically and, when you
> figure that out, computationally.

With help from Gabor, here's my best effort. It should work even if
there are gaps in the time series within firms, and it allows different
firms to have responses in different years. It is wrapped up as a
function which eats a data frame. Somebody should put this function
into Hmisc or gtools or something of the sort.

# Problem statement:
#
# You are holding a dataset where firms are observed for a fixed
# (and small) set of years. The data is in "long" format - one
# record for one firm for one point in time. A state variable is
# observed (a factor).
# You wish to make a Markov transition matrix describing the time-series
# evolution of that state variable.

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                  year=c(83,   84,  85,  86,  83,  84,  85,  86),
                  state=sample(1:3, 8, replace=TRUE)
                  )

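transition.probabilities <- function(D, timevar="year",
                                     idvar="name", statevar="state") {
  # Key each observation by (firm, next period), carrying its state along
  lagged <- data.frame(D[[idvar]], D[[timevar]] + 1, D[[statevar]])
  names(lagged) <- c(idvar, timevar, "from")
  # Inner join: only pairs observed in consecutive periods survive,
  # so transitions across gaps are simply dropped
  merged <- merge(lagged, D, by = c(idvar, timevar))
  # Counts: rows = state at t, columns = state at t + 1
  table(from = merged$from, to = merged[[statevar]])
}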

transition.probabilities(raw, timevar="year", idvar="name",
                         statevar="state")
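Despite its name, transition.probabilities() (like transitionM()) returns a table of counts. Dividing each row by its total gives the usual maximum likelihood estimate of the one-step transition probabilities from fully observed consecutive pairs. A minimal sketch using base R's prop.table(); counts.to.probabilities is a hypothetical helper name, and the count table is made up.

```r
# Hypothetical helper: turn a table of one-step transition counts
# (rows = state at t, columns = state at t + 1) into estimated
# transition probabilities by dividing each row by its total
counts.to.probabilities <- function(counts) {
  P <- prop.table(counts, margin = 1)
  P[is.nan(P)] <- NA  # a state with no observed departures gives 0/0
  P
}

# Example with a small hand-made count table
counts <- table(from = c(1, 1, 2, 2, 2), to = c(1, 2, 2, 2, 1))
P <- counts.to.probabilities(counts)
P
```

Each row of P sums to 1; if the state variable is a factor with an unused level, that row is NA rather than a misleading zero.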

-- 
Ajay Shah
http://www.mayin.org/ajayshah  
ajayshah at mayin.org
http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.