[R] Markoc chain simulation of a sample path - to get empirical probabilities

Jim Lemon jim at bitwrit.com.au
Fri May 16 11:25:44 CEST 2014


On Thu, 15 May 2014 10:03:03 AM chantel7777777 wrote:

Dear Chantel,
The problem:
A perplexed student decides to post her/his (note non-sexist usage) 
homework to the R help list on a Friday evening. On Monday morning, 
she finds that she/he has received five different answers. Realizing that 
some might be incorrect, she/he persuades her/his smart friend to 
check them. Sadly, the friend, while very smart, is also very lazy and 
only does one now and then. The probabilities of the number of the 
answers judged incorrect each day are:

Incorrect	Probability
0		0.25
1		0.3
2		0.25
3		0.2

Whenever she/he gets this information, she/he strikes off those 
answers from the list. If she/he has one or no remaining answers at 
the end of each day, she/he posts the question to the R help list that 
evening, and receives five more answers the next morning. What is 
the most probable value for the number of remaining answers that 
she/he will have on Friday morning?
 First you need the transition matrix. Assuming that the steps are from 
morning to morning:

pmat<-
 matrix(c(0.25,0,0,0.45,0.3,
 0.3,0.25,0,0.2,0.25,
 0.25,0.3,0.25,0,0.2,
 0.2,0.25,0.3,0.25,0,
 0,0.2,0.25,0.3,0.25),
 nrow=5,ncol=5,byrow=TRUE)
 rownames(pmat)<-colnames(pmat)<-2:6
pmat

Since she/he began with five answers, we begin with state 4. Notice 
here that since each step is from morning to morning and the R help 
list is amazingly reliable, there are no 0 or 1 states.

initstate<-c(0,0,0,1,0)

One method of calculating the distribution after a number of steps is 
to multiply the initial state by the transition matrix raised to the power 
of the number of steps, so:

initstate%*%pmat%*%pmat%*%pmat%*%pmat

While this produces the correct answer, it is not the correct method. 
You want to collect a number of probabilistic outcomes and use this to 
estimate the most likely value of the number of potential answers 
remaining on Friday morning.

outcomes<-rep(NA,100)
for(round in 1:100) {
 start=4
 for(i in 1:4) start<-sample(1:5,1,prob=pmat[start,])
 outcomes[round]<-start
}
table(outcomes)

So our hypothetical student might be able to work out from this how to 
correct her/his code.

Jim



More information about the R-help mailing list