[R] Strange behavior when sampling rows of a data frame

Daniel Nordlund djnord|und @end|ng |rom gm@||@com
Sat Jun 20 01:04:33 CEST 2020


On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:
> I ran into some strange behavior in R when trying to assign a treatment to
> rows in a data frame. I'm wondering whether any R experts can explain
> what's going on.
>
> First, let's assign a treatment to 3 out of 10 rows as follows.
>
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
> df
>     unit treated
> 1     1   FALSE
> 2     2    TRUE
> 3     3   FALSE
> 4     4   FALSE
> 5     5    TRUE
> 6     6   FALSE
> 7     7    TRUE
> 8     8   FALSE
> 9     9   FALSE
> 10   10   FALSE
>
> This is as expected. Now we'll just skip the intermediate step of saving
> the sampled indices, and apply the treatment directly as follows.
>
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
> df[sample(nrow(df), 3),]$treated <- TRUE
> df
>     unit treated
> 1     6    TRUE
> 2     2   FALSE
> 3     3   FALSE
> 4     9    TRUE
> 5     5   FALSE
> 6     6   FALSE
> 7     7   FALSE
> 8     5    TRUE
> 9     9   FALSE
> 10   10   FALSE
>
> Now the data frame still has 10 rows with 3 assigned to the treatment. But
> the units are garbled. Units 1 and 4 have disappeared, for instance, and
> there are duplicates for 6 and 9, one assigned to treatment and the other
> to control. Why would this happen?
>
> Thanks,
> Sebastien
>
Sébastien,

You have received good explanations of what is going on with your code.
I think you can get what you want with a simple modification of your
treatment assignment statement: select the rows and the column in a
single indexing call. At least it works for me.

df[sample(nrow(df), 3), "treated"] <- TRUE
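
A minimal sketch of why the two forms can differ, assuming R's usual
expansion of nested replacement calls, in which the index expression is
evaluated once for the extraction and once more for the replacement. The
helper pick() and the counters are hypothetical names used only for
illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Hypothetical helper: wraps sample() and counts how often it is called.
n_calls <- 0
pick <- function(n, k) {
  n_calls <<- n_calls + 1
  sample(n, k)
}

# Original form: the index expression is part of both the extraction
# step and the replacement step of the nested assignment.
df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[pick(nrow(df), 3), ]$treated <- TRUE
calls_nested <- n_calls

# Suggested form: rows and column go into a single `[<-` call.
n_calls <- 0
df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[pick(nrow(df), 3), "treated"] <- TRUE
calls_single <- n_calls

# Compare how many times the sampler ran in each form.
print(c(nested = calls_nested, single = calls_single))
print(df)
```

If the counts differ, the nested form drew two independent samples, which
would account for rows being copied over one another. Saving the indices
in a variable first, as in your first example, avoids the issue in the
same way.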

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Port Townsend, WA  USA


