[R] Randomly interleaving data frames while preserving order

Sarah Goslee sarah.goslee at gmail.com
Tue Mar 31 19:41:26 CEST 2015


That's a fun one. Here's one possible approach. (Note that it can be
done without using a loop, but I find that a loop here increases
readability.)

I wrote it to work on a list of data frames. If the selection is
random, I'd set it up so that size is passed to the function, but
selection is generated within the function using sample().

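# dflist: list of data frames with identical columns, one per stratum
# selection: vector of indices into dflist, in recruitment order
recruitment <- function(dflist, selection) {
    # empty result with one row per selected patient
    results <- data.frame(matrix(NA, nrow=length(selection),
                                 ncol=ncol(dflist[[1]])))
    colnames(results) <- colnames(dflist[[1]])
    for(i in unique(selection)) {
        # fill the slots for stratum i with its first rows, in order
        # (rows come out as NA if stratum i is selected more often than it has rows)
        results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
    }
    results
}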
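
For comparison, here is a rough sketch of how the loop-free version mentioned above might look. The name recruitment_noloop is made up for illustration, and it assumes all data frames in the list share the same columns and that no stratum is selected more often than it has rows.

```r
# Hypothetical loop-free variant (not from the original post).
recruitment_noloop <- function(dflist, selection) {
    # position of each pick within its own stratum: 1st, 2nd, 3rd, ...
    within_idx <- ave(seq_along(selection), selection, FUN = seq_along)
    # row offset of each stratum once all data frames are stacked
    offsets <- c(0, cumsum(vapply(dflist, nrow, integer(1))))[selection]
    stacked <- do.call(rbind, dflist)
    out <- stacked[offsets + within_idx, , drop = FALSE]
    rownames(out) <- NULL
    out
}

df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

res_noloop <- recruitment_noloop(list(df1, df2, df3), c(1, 2, 1, 3, 1))
```

Unlike the loop version, this keeps the original column types because it indexes the stacked data frames directly rather than filling a matrix of NA.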


# and your example:


df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

dfall <- list(df1, df2, df3)

touse <- c(1, 2, 1, 3, 1)
# could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)

> recruitment(dfall, touse)
  strat id  pid
1     1  1 1001
2     2  1 2001
3     1  2 1002
4     3  1 3001
5     1  3 1003
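
If the selection is random, a wrapper along these lines is one way to pass size in and generate the selection inside with sample(). This is only a sketch: simulate_recruitment is a made-up name, the check for an exhausted stratum is my own addition, and recruitment() is repeated here only so that the block runs by itself.

```r
# recruitment() as defined earlier in this message
recruitment <- function(dflist, selection) {
    results <- data.frame(matrix(NA, nrow=length(selection),
                                 ncol=ncol(dflist[[1]])))
    colnames(results) <- colnames(dflist[[1]])
    for(i in unique(selection)) {
        results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
    }
    results
}

# Hypothetical wrapper: draws the stratum order itself, given only size.
simulate_recruitment <- function(dflist, size) {
    selection <- sample(seq_along(dflist), size=size, replace=TRUE)
    # assumption: stop rather than return NA rows if a stratum runs out
    needed <- tabulate(selection, nbins=length(dflist))
    available <- vapply(dflist, nrow, integer(1))
    if(any(needed > available)) {
        stop("a stratum was selected more often than it has rows")
    }
    recruitment(dflist, selection)
}

df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

set.seed(42)  # only to make the simulated trial reproducible
trial <- simulate_recruitment(list(df1, df2, df3), size=5)
```

For many simulated trials, something like replicate(100, simulate_recruitment(dfall, 5), simplify=FALSE) should give a list of data frames, one per trial.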

Sarah

On Tue, Mar 31, 2015 at 1:05 PM, Kevin E. Thorpe
<kevin.thorpe at utoronto.ca> wrote:
> Hello.
>
> I am trying to simulate recruitment in a randomized trial. Suppose I have
> three streams (strata) of patients represented by these data frames.
>
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
>
> What I need to do is construct a data frame with all of these combined where
> the order of selection from one of the three data frames is randomized but
> once a stratum is selected patients are selected sequentially from that data
> frame.
>
> To see what I'm looking to achieve, suppose the first five subjects were to
> come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected
> result should look like this:
>
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>    strat id  pid
> 1      1  1 1001
> 2      2  1 2001
> 21     1  2 1002
> 4      3  1 3001
> 22     2  2 2002
>
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> something obvious, but I really have no idea at the moment how to achieve
> this elegantly. Since I need to simulate many trial recruitments it needs to
> be general and compact.
>
> I appreciate any advice.
>
> Kevin
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


