[R] Randomly interleaving data frames while preserving order

Duncan Murdoch murdoch.duncan at gmail.com
Tue Mar 31 19:44:47 CEST 2015


On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> Hello.
> 
> I am trying to simulate recruitment in a randomized trial. Suppose I 
> have three streams (strata) of patients represented by these data frames.
> 
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
> 
> What I need to do is construct a data frame with all of these combined 
> where the order of selection from one of the three data frames is 
> randomized but once a stratum is selected patients are selected 
> sequentially from that data frame.
> 
> To see what I'm looking to achieve, suppose the first five subjects were 
> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
> expected result should look like this:
> 
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>     strat id  pid
> 1      1  1 1001
> 2      2  1 2001
> 21     1  2 1002
> 4      3  1 3001
> 22     2  2 2002
> 
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
> something obvious, but I really have no idea at the moment how to 
> achieve this elegantly. Since I need to simulate many trial recruitments 
> it needs to be general and compact.
> 
> I appreciate any advice.

How about something like this:

# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))

# Create a placeholder dataframe of NA rows, one per recruit, to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

# Put the original dataframes into the appropriate slots; the slots are
# filled top to bottom, so each stratum keeps its original order:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3

# Clean up the rownames
rownames(df) <- NULL
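Because many recruitments have to be simulated, one possible way to make this general is to wrap the same slot-filling idea in a function that accepts any number of strata. The sketch below is illustrative only. The function name `interleave_strata` and its list-of-dataframes argument are hypothetical, and it assumes that every dataframe has the same columns in the same order.

```r
# Hypothetical helper (illustration only): randomly interleave a list of
# dataframes while keeping each one's rows in their original order.
# Assumes all dataframes have the same columns in the same order.
interleave_strata <- function(dfs) {
  n <- vapply(dfs, nrow, integer(1))
  # One stratum label per patient, permuted into a random recruitment order.
  # sample.int() avoids sample()'s special case for a length-one numeric input.
  labels <- rep(seq_along(dfs), times = n)
  sel <- labels[sample.int(length(labels))]
  # Placeholder rows copied from the first dataframe so column types match
  out <- dfs[[1]][rep(1, length(sel)), , drop = FALSE]
  # Logical indexing fills slots top to bottom, so each stratum stays sequential
  for (s in which(n > 0)) {
    out[sel == s, ] <- dfs[[s]]
  }
  rownames(out) <- NULL
  out
}

# Example with the three strata from the question
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
set.seed(1)
recruits <- interleave_strata(list(df1, df2, df3))
head(recruits)
```

Each call draws a fresh ordering. You could therefore generate repeated simulations with something like `replicate(1000, interleave_strata(list(df1, df2, df3)), simplify = FALSE)`, which returns a list of dataframes.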

Duncan Murdoch
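A different technique avoids the placeholder dataframe. Note that this approach is offered here as an alternative and is not taken from the reply above. First stack the strata with `rbind()`, then permute the stratum labels, and finally reorder the stacked rows using the inverse of a stable sort of those labels. This is a sketch that assumes two things: the `strat` column holds the stratum label, and the dataframes are stacked in increasing order of `strat`.

```r
# Alternative sketch: stack, then reorder with an inverse permutation.
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
set.seed(2015)

all <- rbind(df1, df2, df3)                # stratum 1 rows, then 2, then 3
sel <- all$strat[sample.int(nrow(all))]    # permuted labels, one per recruit

# order(sel) lists result positions stratum by stratum, earliest first,
# because order() leaves ties in their original order. Row i of `all`
# belongs at position order(sel)[i], so the inverse permutation
# order(order(sel)) maps each result position back to a row of `all`.
df <- all[order(order(sel)), ]
rownames(df) <- NULL
head(df)
```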



More information about the R-help mailing list