Dear R helpers,
I have a question about drawing random numbers from many categorical
distributions.
Consider n individuals, each of whom follows a categorical distribution
defined over k categories.
Consider a simple case with n = 4 and k = 3:
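catDisMat <- rbind(c(0.1, 0.2, 0.7),
                   c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7),
                   c(0.1, 0.2, 0.7))
# One draw per individual; row i holds individual i's category probabilities
outVec <- integer(nrow(catDisMat))
for (i in seq_len(nrow(catDisMat))) {
  outVec[i] <- sample(seq_len(ncol(catDisMat)), 1, prob = catDisMat[i, ])
}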
I can think of one way to speed this up (in reality my n is very large,
so speed matters). The approach above samples only one value per call.
Since c(0.1,0.2,0.7) appears three times, I could instead sample three
values for it in a single call. So, by grouping identical rows, I think I
could implement the idea "sample(1:3, 3, prob=c(0.1,0.2,0.7), replace = TRUE)"
to improve speed a bit. But is there a better approach for speed?
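Below is a minimal sketch of the grouping idea, plus a fully vectorized
alternative I am also considering: inverse-transform sampling, i.e. one
runif() draw per row compared against that row's cumulative
probabilities, so there is no per-row call to sample() at all. The
function names sample_by_group and sample_inverse_cdf are hypothetical,
and both assume every row of the probability matrix sums to 1.

```r
# Hypothetical helpers (illustrative names only); both assume each row of P
# is a probability vector over categories 1..ncol(P) that sums to 1.

# Grouping idea: one sample.int() call per distinct row, not per individual.
sample_by_group <- function(P) {
  k <- ncol(P)
  out <- integer(nrow(P))
  # Identical probability rows get the same string key
  key <- apply(P, 1, paste, collapse = ",")
  for (g in unique(key)) {
    idx <- which(key == g)
    out[idx] <- sample.int(k, length(idx), replace = TRUE, prob = P[idx[1], ])
  }
  out
}

# Inverse-transform sampling, fully vectorized over rows.
sample_inverse_cdf <- function(P) {
  k <- ncol(P)
  # Row-wise cumulative sums via P %*% U, U = upper-triangular matrix of ones
  U <- 1 * upper.tri(diag(k), diag = TRUE)
  cum <- P %*% U
  u <- runif(nrow(P))
  # cum < u recycles u down each column, so row i is compared with u[i];
  # the category is 1 + the number of cumulative probabilities below u
  draws <- rowSums(cum < u) + 1
  # Guard against rounding leaving the last cumulative sum slightly below 1
  as.integer(pmin(draws, k))
}

catDisMat <- rbind(c(0.1, 0.2, 0.7),
                   c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7),
                   c(0.1, 0.2, 0.7))
sample_by_group(catDisMat)
sample_inverse_cdf(catDisMat)
```

The inverse-CDF version does a fixed amount of vectorized work regardless
of how many distinct rows there are, whereas the grouping version only
helps when many rows are exactly identical.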
Thanks in advance.
-Sean