[R] dataframe, simulating data

Fri Dec 31 12:07:42 CET 2010

On Fri, Dec 31, 2010 at 01:51:18AM -0800, Sarah wrote:
> 
> Dear all,
> 
> I'm having trouble with my dataframe, and I hope someone can help me out...
> I have data from 40 subjects, displayed in a dataframe. I have randomly
> assigned subjects to group 1 or 0 (mar.y==0 or mar.y==1, with probabilities
> used).
> In the end, I want 34 cases assigned to group 0, with the rest of the
> subjects assigned to group 1. However, if there are more than 34 cases
> assigned to group 0 due to the randomness, I would like to keep 34 cases in
> group 0 (this is already written in my script below), but with the rest of
> the cases assigned to group 1. (Vice versa, if there are fewer than 34 cases
> assigned to group 0, I would like to sample cases from group 1 and put them
> in group 0, while retaining the rest of group 1 in my dataframe.)
> I can't figure out how to keep 34 cases in group 0, WHILE assigning the rest
> of the cases a value 1 (mar.y==1)... 
> 
> if (length(which(df$mar.y==0))>34) {
>   df <- df[sample(which(df$mar.y==0),34), ]
> } else {
>   df <- df[c(which(df$mar.y==0),
>              sample(which(df$mar.y==1),34-length(which(df$mar.y==0)))), ]
> }

I am not sure what the question is. According to my tests, this code
works if you want to replace df with a data frame with exactly 34 cases.
The command sample(which(...)) is slightly dangerous: if which()
produces only one index, say i, then sample(which()) samples from 1:i
rather than returning i. However, with the parameters 34 and 40, your code
applies sample() only to vectors of length at least 35 or at least
40 - 34 = 6, so this problem cannot occur here.
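
To illustrate the sample(which(...)) pitfall, here is a small sketch. The
vector x and the helper safe_sample() are made up for this example and are
not part of the original script. The helper indexes into its argument
instead of passing it to sample() directly.

```r
set.seed(1)
x <- c(0, 0, 0, 0, 1)    # only one element equals 1
idx <- which(x == 1)     # idx is 5, a vector of length one
risky <- sample(idx, 1)  # draws from 1:5, so it need not be 5

# Hypothetical helper: sample positions, then index, so that a
# length-one input is returned as is
safe_sample <- function(v, size) v[sample.int(length(v), size)]
safe <- safe_sample(idx, 1)  # the only possible result is idx itself
```
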

If you want to keep all cases and only reassign the groups, you can either
modify df$mar.y (and not the whole df) or introduce a new column of df
with the index of the new group.
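
A minimal sketch of the first option, under the assumption that df has 40
rows and a 0/1 column mar.y. The example data, the variable n0 and the
helper pick() are invented for illustration. All rows are kept and only
the surplus or the shortfall is moved between the groups.

```r
set.seed(42)
# Hypothetical example data: 40 subjects with a random initial assignment
df <- data.frame(id = 1:40, mar.y = rbinom(40, 1, 0.2))

n0 <- 34                           # desired size of group 0
zeros <- which(df$mar.y == 0)
ones  <- which(df$mar.y == 1)
# Hypothetical helper that is safe when v has length one
pick <- function(v, size) v[sample.int(length(v), size)]

if (length(zeros) > n0) {
  # too many cases in group 0: move the surplus to group 1
  df$mar.y[pick(zeros, length(zeros) - n0)] <- 1
} else if (length(zeros) < n0) {
  # too few cases in group 0: move the shortfall over from group 1
  df$mar.y[pick(ones, n0 - length(zeros))] <- 0
}
table(df$mar.y)
```

To keep the original assignment as well, the same assignments can be made
to a copy of the column, for example df$new.group <- df$mar.y, instead of
to df$mar.y itself.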

Petr Savicky.