[R] Markov transition matrices, missing transitions for certain years

Tue Apr 19 12:37:07 CEST 2011

Make two assumptions:

(1)  The initial state probability distribution (``ispd'') is *NOT* a
function of the transition probability matrix (``tpm'').

(2) The boxes are stochastically independent of each other.

Both of these assumptions may be dubious.  The second assumption is
the crucial one, and I would guess it to be *highly* dubious.  However
without it, you simply can't get anywhere.

Subject to these assumptions the maximum likelihood estimates of the
entries of the tpm may be found as follows:

Count the number of times that any box is in state "i" at time "t" and
in state "j" at time "t+1".  Count over all boxes and all times
t = 1, 2, ..., m-1, where you have observations over m years.  (You have
to stop at m-1 in order to be able to have observations at time t+1.)

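Let this count be c_ij.  Let c_i. (the dot marks the index summed over) be
the sum over j of c_ij, that is, the total number of observed transitions
out of state i.

Let the tpm be P = [p_ij], where p_ij is the probability of moving from
state i to state j.

Then the maximum likelihood estimate of p_ij is c_ij / c_i., the observed
fraction of transitions out of state i that end in state j.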

[The only time that things can go wrong here is if state "i" never appears
in any box, at any time t < m.  In such a case the p_ij (j = 1, 2, 3, ..., K,
where K is the number of states or species) are simply not estimable from the
available data.  We never observe state i making a transition to *any* state,
so we cannot estimate the probabilities of such transitions.]

Writing R code to effect this estimation procedure is easy and is left as an
exercise for the reader. :-)
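
For anyone who would rather skip the exercise, here is a minimal sketch of
the counting procedure described above, under the same two assumptions.  It
is one possible way to do it, not a definitive implementation.  The function
name estimate_tpm and its arguments are made up for illustration; the data
are the four example box histories from the question quoted below.  Rows of
the result are the "from" state i and columns are the "to" state j.

```r
# Example box histories from the quoted question:
# rows are boxes, columns are consecutive years.
boxes <- rbind(b1 = c(1, 1, 4, 2),
               b2 = c(1, 4, 4, 3),
               b3 = c(4, 4, 1, 2),
               b4 = c(3, 1, 1, 1))

# estimate_tpm() is a hypothetical helper, not part of any package.
# histories: matrix or data frame of states coded 1..K (boxes x years)
# K:         number of possible states
estimate_tpm <- function(histories, K) {
  histories <- as.matrix(histories)
  m <- ncol(histories)

  # State at time t (columns 1..m-1) paired with the state at time t+1
  # (columns 2..m), pooled over all boxes.  Both pieces are flattened in
  # the same column-major order, so the pairs stay aligned.
  from <- factor(as.vector(histories[, -m]), levels = seq_len(K))
  to   <- factor(as.vector(histories[, -1]), levels = seq_len(K))

  # Fixing the factor levels forces a K x K table even when some
  # transitions (or whole states) never occur.
  counts <- table(from = from, to = to)      # c_ij

  # Divide each row by its total c_i. ; a row with c_i. = 0 gives 0/0,
  # which is recoded as NA to flag "not estimable".
  P <- prop.table(counts, margin = 1)
  P[is.nan(P)] <- NA

  list(counts = counts, P = P)
}

fit <- estimate_tpm(boxes, K = 4)
fit$counts
fit$P
```

In this small example state 2 only ever shows up in the final year, so no
transition out of state 2 is observed and its row of P comes back as NA,
which is exactly the non-estimable case described in the bracketed remark.
Because the factor levels are fixed, the same trick also lets per-year
tables be added together, since they all have dimension K x K.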

     cheers,

             Rolf Turner

On 19/04/11 12:47, Abby_UNR wrote:
> Hi all,
> I am working with nest box occupancy data for birds and would like to
> construct a Markov transition matrix, to derive transition probabilities for
> ALL years of the study (not separate sets of transition probabilities for
> each time step). The actual dataset I'm working with is 125 boxes over 14
> years that can be occupied by 7 different species, though I have provided a
> slimmed down portion for this question...
> -
> A box can be in 1 of 4 "states" (i.e. bird species): 1,2,3,4
> Included here are 4 "box histories" over 4 years (y97, y98, y99, y00)
>
> These are the box histories
>> b1<- c(1,1,4,2)
>> b2<- c(1,4,4,3)
>> b3<- c(4,4,1,2)
>> b4<- c(3,1,1,1)
>> boxes<- data.frame(rbind(b1,b2,b3,b4))
>> colnames(boxes)<- c("y97","y98","y99","y00")
>> boxes
>     y97 y98 y99 y00
> b1   1   1   4   2
> b2   1   4   4   3
> b3   4   4   1   2
> b4   3   1   1   1
> My problem is that there are 16 possible transitions, but not all possible
> transitions occur at each time step. Therefore, I don't think I could do
> something easy like create a table for each time step and add them together,
> for example:
>
>> t1.boxes<- table(boxes$y98, boxes$y97)
>> t1.boxes
>
>      1 3 4
>    1 1 1 0
>    4 1 0 1
>> t2.boxes<- table(boxes$y99, boxes$y98)
>> t2.boxes
>
>      1 4
>    1 1 1
>    4 1 1
> t1.boxes and t2.boxes could not be added together to calculate the frequency
> of each transition occurring because they are of different dimensions. I'm
> not quite sure how to deal with this, I have attempted to write a function
> (shown below), though I'm not sure if it is needed; I am a bit new to the
> programming world. If I could get some help either with the function or a
> way around it that would be most appreciated! Thank you!
>
> --------------
> Function requires the commands already listed above:
>
> FMAT<- matrix(0, nrow=4, ncol=4, byrow=TRUE)
> # This is the matrix that will store the frequency of each possible
> # transition occurring over the 4 years
>
> nboxes<- 4
> nyears<- 4
>
> for(row in 1:nboxes)
> {
> for(col in 1:(nyears-1))
> {
> FMAT[boxes[row,col+1], boxes[row,col]]<- boxes[boxes[row, col+1],
> boxes[row,col]]
> # This is the line of code I have been struggling with and am unsure about.
> # I have tried various versions of this and keep getting an assortment of
> # error messages.
> }
> }
>
> FMAT
>
> }