[R] Simulating data with conditions

AC Del Re delre at wisc.edu
Wed May 25 00:36:58 CEST 2011


Hi,

I want to simulate data where a percentage of the ids are duplicated
across multiple rows (with unique values of another factor variable
within each duplicated id). I'm having trouble figuring out an
efficient way to do so.

For example, consider this mock output [Note: Although the mock data
doesn't display this, I am eventually interested in 73% of ids appearing
once, 22% appearing twice (one duplicate) and 5% appearing three times
(two duplicates). Also, I would like the 'al' variable to be randomly
selected, perhaps using sample(), from a 3-level factor with levels
"pt", "th", "ob", AND for an id with duplicates to have unique values
of the 'al' variable]:

Something like this:

id    z     al

1     .5    "pt"
2     .4    "ob"
3     .7    "pt"
4     .3    "th"
5     .5    "pt"
5     .6    "ob"
6     .3    "th"
6     .2    "ob"
7     .1    "pt"
7     .3    "th"
7     .1    "ob"

This would be the general idea although I will eventually create a
much larger data set with z based on rnorm(), etc.
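
A rough sketch of one way the description above might be turned into R
code, using only base functions. The number of ids (n_id), the seed, and
the use of a standard normal for z are assumptions made for illustration,
and the object names (n_rows, levels_al, dat) are hypothetical. The idea
is to first draw how many rows each id gets, then sample 'al' without
replacement within each id so that duplicated ids never repeat a level.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
n_id      <- 100                     # assumed number of distinct ids
levels_al <- c("pt", "th", "ob")     # the three levels of 'al'

# Number of rows per id: 1 (73%), 2 (22%) or 3 (5%).
# These are sampling probabilities, so realized shares will vary.
n_rows <- sample(1:3, n_id, replace = TRUE, prob = c(0.73, 0.22, 0.05))

# Repeat each id as many times as it has rows.
id <- rep(seq_len(n_id), times = n_rows)

# For each id, draw k levels WITHOUT replacement, so 'al' is unique within id.
al <- unlist(lapply(n_rows, function(k) sample(levels_al, k)))

# Placeholder for z; swap in whatever distribution is actually wanted.
z <- rnorm(length(id))

dat <- data.frame(id = id, z = z, al = al)
head(dat, 10)
```

Because n_rows is drawn with prob=, the 73/22/5 split holds only on
average. If exact proportions are needed, something like
sample(rep(1:3, times = c(73, 22, 5))) could replace that line when
n_id is 100.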

Any help toward a solution is much appreciated!

AC


