[R] Create factorial design

Sat Feb 5 12:12:14 CET 2011

On Sat, Feb 05, 2011 at 11:01:33AM +0100, Sascha Vieweg wrote:
> I have got data with one column indicating the area where the data 
> was recorded:
> 
> R: n <- 43
> R: df <- data.frame("area"=sample(1:7, n, replace=TRUE), "dat"=rnorm(n))
> 
> In each of the 7 different areas I want to implement one of 7 
> specific strategies. The assignment should be random. Therefore, I 
> pair 7 areas with 7 strategies randomly by
> 
> R: ass <- as.data.frame(cbind("area"=sample(1:7, 7),
>    "strategy"=sample(1:7, 7)))
> 
> Now I want to create a new variable indicating which case in the 
> original data should be assigned to which strategy. I thought 
> about
> 
> R: x <- numeric(n)
> R: for(i in 1:7){
>      x[df[, "area"]==i] <- ass[ ass[, "area"]==i , "strategy"]
>    }
> 
> and then binding the new variable to the data frame
> 
> R: str(df2 <- as.data.frame(cbind(df, "strategy"=x)))
> 
> which works fine. My question is whether there is a more elegant 
> way?

Hello.

If the table "ass" is sorted by "area", then, because the areas are
coded 1 to 7, its second column can be used as a lookup vector mapping
"area" to "strategy". This leads to the following:

  ass2 <- ass[order(ass[, "area"]), "strategy"]
  y <- ass2[df[, "area"]]
  # x is double and y is integer, so "+ 0" converts y before comparing
  identical(x, y + 0)

  [1] TRUE
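
If the area codes were not exactly 1 to 7, indexing by position would
no longer work. A possible variant, given here only as a sketch, looks
up each case with match() instead of sorting the table. The call to
set.seed() and the names idx, y2 and df2 are introduced for
illustration and are not part of the original question.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
n <- 43
df <- data.frame(area = sample(1:7, n, replace = TRUE), dat = rnorm(n))
ass <- data.frame(area = sample(1:7, 7), strategy = sample(1:7, 7))

# Loop from the original question, kept for comparison
x <- numeric(n)
for (i in 1:7) {
  x[df$area == i] <- ass$strategy[ass$area == i]
}

# Sorted-table lookup: position i of ass2 holds the strategy for area i
ass2 <- ass[order(ass$area), "strategy"]
y <- ass2[df$area]

# match() gives, for each case, the row of 'ass' with the same area,
# so the table does not need to be sorted (illustrative alternative)
idx <- match(df$area, ass$area)
y2 <- ass$strategy[idx]

df2 <- cbind(df, strategy = y2)

# All three approaches should agree
stopifnot(identical(y, y2), identical(x, as.numeric(y2)))
```

The match() form also returns NA for an area that is missing from
"ass", which makes gaps in the assignment table easy to spot.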

This also suggests that the same distribution of random assignments
is obtained if "area" is created already sorted and only the second
column of "ass" is random:

  ass <- as.data.frame(cbind("area"=1:7, "strategy"=sample(1:7, 7)))

Whether creating only this table is sufficient depends on the application.
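
When only the mapping is needed, the table can be reduced to a plain
vector whose i-th element is the strategy for area i. The following is
a minimal sketch under that assumption; the seed and the check at the
end are added for illustration only.

```r
set.seed(2)  # assumption: fixed seed, only for reproducibility
n <- 43
df <- data.frame(area = sample(1:7, n, replace = TRUE), dat = rnorm(n))

# strategy[i] is the randomly assigned strategy for area i
strategy <- sample(1:7, 7)
df$strategy <- strategy[df$area]

# Each area that occurs in the data should map to exactly one strategy
per_area <- tapply(df$strategy, df$area, function(s) length(unique(s)))
stopifnot(all(per_area == 1))
```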

Hope this helps.

Petr Savicky.