Hi there,
I have data on the earnings of 12000 individuals at two points in time. I intend to construct a transition matrix whose typical element, p_ij, gives the probability that an individual ends in the j-th decile of the earnings distribution given that he was initially in the i-th decile. Ideally this is a bi-stochastic matrix, with both rows and columns summing to one. The problem is that the income data are nearly discrete, in the sense that many individuals have the same income level at each point in time. For instance, there are 1400 individuals who earned the minimum positive income (say, $100) in period one. Therefore, the first decile will contain more than 10% of the individuals. This happens in both periods, and for a few income levels. As a result, the transition matrix won't have both rows and columns summing to one.
The solution I've found for this problem is to generate a uniform random vector, with entries ranging from, say, -.0001 to .0001, add it to both earnings vectors, and compute the transition matrix; then repeat the procedure 1000 times and take the mean of the resulting matrices. The thing is, I'm totally new to simulations. Here's part of what I'm trying to do:
# X is a two-column data frame: column 1 is the individuals' period-one earnings and column 2 is their period-two earnings.
n <- nrow(X) #12000
sim <- runif(n, -.0001, .0001)
X <- X + sim
q <- 10 # in order to compute deciles. it could be quintiles, quartiles, whatever
p <- seq(0,1,1/q)
f.x <- quantile(X[,1], p, names=FALSE)
f.y <- quantile(X[,2], p, names=FALSE)
f.x[1] <- 0; f.y[1] <- 0
a <- cut(X[,1], f.x, right=TRUE)
b <- cut(X[,2], f.y, right=TRUE)
P <- table(a,b)
P <- P/rowSums(P)
P
The point is that I don't know how to store the matrix P efficiently so that it can be averaged with the remaining 999.
Also, suggestions on how to solve the problem of discretized data are welcome.
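For what it's worth, the repeat-and-average step might be sketched as follows. This is only a minimal sketch under assumptions, not a tested solution: the function name transition_once, the toy data, and the smaller number of repetitions are made up for illustration. It also draws a separate jitter vector for each period, which differs from the code above, where the same vector is added to both columns. Rather than storing all the matrices, it keeps a running sum and divides by the number of repetitions at the end.

```r
# Hypothetical helper (not part of the original code): one jittered
# q x q transition matrix from a two-column data frame X.
transition_once <- function(X, q = 10, eps = 1e-4) {
  n <- nrow(X)
  # Assumption: independent jitter per period, used only to break ties
  x <- X[[1]] + runif(n, -eps, eps)
  y <- X[[2]] + runif(n, -eps, eps)
  p <- seq(0, 1, length.out = q + 1)
  # include.lowest = TRUE keeps the minimum in the first group
  a <- cut(x, quantile(x, p, names = FALSE), include.lowest = TRUE, labels = FALSE)
  b <- cut(y, quantile(y, p, names = FALSE), include.lowest = TRUE, labels = FALSE)
  # Fixed factor levels keep the table q x q even if some cell is empty
  P <- table(factor(a, levels = 1:q), factor(b, levels = 1:q))
  unclass(P / rowSums(P))
}

set.seed(1)
# Toy data with heavy ties, standing in for the real earnings
X <- data.frame(e1 = sample(c(100, 200, 500, 1000), 1200, replace = TRUE),
                e2 = sample(c(100, 200, 500, 1000), 1200, replace = TRUE))

q <- 10
R <- 200                       # would be 1000 in the setting described above
P_sum <- matrix(0, q, q)       # running sum, so only one matrix is kept in memory
for (r in seq_len(R)) {
  P_sum <- P_sum + transition_once(X, q)
}
P_mean <- P_sum / R
round(P_mean, 3)
```

If keeping the individual matrices is useful (for example, to look at their variability), replicate(R, transition_once(X, q), simplify = FALSE) returns them as a list, and Reduce("+", ...) divided by R gives the same mean.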
Thanks a lot,
Dimitri