[R] Randomly interleaving data frames while preserving order
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Mar 31 19:44:47 CEST 2015
On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> Hello.
>
> I am trying to simulate recruitment in a randomized trial. Suppose I
> have three streams (strata) of patients represented by these data frames.
>
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
>
> What I need to do is construct a data frame with all of these combined
> where the order of selection from one of the three data frames is
> randomized but once a stratum is selected patients are selected
> sequentially from that data frame.
>
> To see what I'm looking to achieve, suppose the first five subjects were
> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The
> expected result should look like this:
>
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>    strat id  pid
> 1      1  1 1001
> 2      2  1 2001
> 21     1  2 1002
> 4      3  1 3001
> 22     2  2 2002
>
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> something obvious, but I really have no idea at the moment how to
> achieve this elegantly. Since I need to simulate many trial recruitments
> it needs to be general and compact.
>
> I appreciate any advice.
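For a fixed, known arrival order such as the 1, 2, 1, 3, 2 example in the question, the expected result can be rebuilt by taking row k of stratum s for the k-th subject drawn from s. The following is a minimal sketch of that idea. It assumes the strata are kept in a list (the name dfs is only illustrative) and uses ave() to compute the running count within each stratum.

```r
# Example strata from the question
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)
dfs <- list(df1, df2, df3)  # illustrative container for the strata

# Stratum of each arriving subject, in arrival order (from the question)
sel <- c(1, 2, 1, 3, 2)

# Running count within each stratum: the k-th time stratum s appears,
# we want row k of that stratum's data frame (here: 1 1 2 1 2)
within <- ave(seq_along(sel), sel, FUN = seq_along)

# Pick one row per arrival and stack them in arrival order
res <- do.call(rbind, Map(function(s, k) dfs[[s]][k, ], sel, within))
rownames(res) <- NULL
res
```

For a simulation, sel could instead be drawn at random, e.g. sample(rep(1:3, each = 10)). Binding one row at a time is easy to read but may be slow for large strata.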
How about something like this:
# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))
# Create a placeholder data frame (one all-NA row per selection) to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
# Put the original data frames into the appropriate slots (the slots are filled
# in increasing position, so each stratum keeps its original row order):
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3
# Clean up the rownames
rownames(df) <- NULL
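Since the question asks for something general and compact, the same fill-in-place approach can be wrapped in a function that takes a list of any number of strata with any row counts. This is a minimal sketch, assuming every data frame has identical columns; interleave_strata is a hypothetical name, not an existing function. Here the template is built by indexing the first data frame with NA row numbers. Column classes such as factors or dates are then taken from that frame rather than coerced from logical NA. It is worth checking that this assumption holds for your own columns.

```r
# Example strata from the question
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

# Hypothetical helper: randomly interleave a list of data frames
# (identical columns assumed), keeping each one's rows in order
interleave_strata <- function(dfs) {
  sizes <- vapply(dfs, nrow, integer(1))
  # Permute an ordered vector of stratum labels, one label per patient
  sel <- sample(rep(seq_along(dfs), times = sizes))
  # Template with the right columns: indexing rows by NA gives all-NA rows
  res <- dfs[[1]][rep(NA_integer_, length(sel)), , drop = FALSE]
  # Fill each stratum's slots; positions are filled in increasing order,
  # so rows within a stratum keep their original sequence
  for (s in seq_along(dfs)) {
    res[sel == s, ] <- dfs[[s]]
  }
  rownames(res) <- NULL
  res
}

set.seed(42)
res <- interleave_strata(list(df1, df2, df3))
head(res)
```

Each call produces one simulated recruitment sequence, so it can be called repeatedly (for example inside replicate() or a loop) when simulating many trials.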
Duncan Murdoch