[R] Strange behavior when sampling rows of a data frame

Fri Jun 19 14:49:25 CEST 2020

I ran into some strange behavior in R when trying to assign a treatment to
a random subset of rows in a data frame. I'm wondering whether any R experts
can explain what's going on.

First, let's sample 3 of the 10 rows, store their indices in s, and assign
the treatment to those rows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE
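For anyone who wants to rerun this, here is the first approach as a
self-contained script with a few sanity checks. The set.seed(1) call is my
addition, only so that repeated runs pick the same rows; the rows it selects
may differ from the ones in the transcript above.

```r
# First approach: save the sampled row indices, then assign.
# set.seed() is added only for repeatability; it was not in the original session.
set.seed(1)
df <- data.frame(unit = 1:10)
df$treated <- FALSE
s <- sample(nrow(df), 3)
df[s, ]$treated <- TRUE

# Sanity checks: the unit column is unchanged, exactly 3 rows are treated,
# and the treated rows are the ones that were sampled.
stopifnot(identical(df$unit, 1:10))
stopifnot(sum(df$treated) == 3)
stopifnot(all(df$treated[s]))
df
```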

This is as expected: the units are intact and 3 rows are treated. Now let's
skip the intermediate step of saving the sampled indices and apply the
treatment directly.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE
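The missing and duplicated units can also be listed programmatically. This
sketch repeats the second approach with set.seed(1) (again my addition, for
repeatability only); which units are affected depends on the seed, so the
result may not match the transcript above.

```r
# Second approach: sample() is called inside the subset-assignment target.
set.seed(1)
df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[sample(nrow(df), 3), ]$treated <- TRUE

# Units that no longer appear anywhere in the data frame.
missing_units <- setdiff(1:10, df$unit)
# Units that appear more than once.
duplicated_units <- unique(df$unit[duplicated(df$unit)])

nrow(df)          # still 10 rows
sum(df$treated)   # still 3 treated rows
missing_units
duplicated_units
```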

The data frame still has 10 rows, 3 of them assigned to the treatment, but
the units are garbled: units 1, 4 and 8 have disappeared, and units 5, 6
and 9 each appear twice, once assigned to treatment and once to control.
Why would this happen?
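One hypothesis worth checking is whether R evaluates the sample() call more
than once during this kind of nested assignment, so that the rows read out
and the rows written back come from two different draws. The sketch below
wraps sample() in a hypothetical helper, counted_sample(), that counts how
often it is called, and compares the form used above with the variant
df$treated[...] <- TRUE.

```r
# Hypothetical helper (not from the original post): calls sample() and
# records how many times it has been called.
n_calls <- 0
counted_sample <- function(n, size) {
  n_calls <<- n_calls + 1
  sample(n, size)
}

# Form used in the second example: index inside df[...]$treated.
df <- data.frame(unit = 1:10)
df$treated <- FALSE
n_calls <- 0
df[counted_sample(nrow(df), 3), ]$treated <- TRUE
calls_row_subset <- n_calls

# Variant: index applied to the column vector, df$treated[...].
df <- data.frame(unit = 1:10)
df$treated <- FALSE
n_calls <- 0
df$treated[counted_sample(nrow(df), 3)] <- TRUE
calls_column_subset <- n_calls

calls_row_subset      # times sample() ran for df[...]$treated <- TRUE
calls_column_subset   # times sample() ran for df$treated[...] <- TRUE
```

If the first count is 2 and the second is 1, that would fit the expansion of
complex assignments described in the R Language Definition (subset
assignment), under which df[i, ]$treated <- TRUE becomes roughly
df <- `[<-`(`*tmp*`, i, , value = `$<-`(`*tmp*`[i, ], "treated", value = TRUE)),
so the index expression i appears in both the extraction and the replacement
call.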

Thanks,
Sebastien
