[R] Strange behavior when sampling rows of a data frame
Sébastien Lahaie
Fri Jun 19 14:49:25 CEST 2020
I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.
First, let's assign a treatment to 3 out of 10 rows as follows.
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE
This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE
The data frame still has 10 rows, 3 of them assigned to the treatment, but
the units are garbled. Units 1, 4, and 8 have disappeared, and units 5, 6,
and 9 each appear twice, once assigned to treatment and once to control.
Why would this happen?
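One way to probe this is sketched below. It assumes the cause is that the
index expression is evaluated more than once when a replacement is nested,
as in x[i, ]$col <- value, so that the rows read and the rows written come
from two different random draws. The helper draw() and the counter calls
are hypothetical names introduced only for this check. They are not part of
the code above.

```r
# Hypothetical helper (not in the original code): wraps sample() and
# counts how many times the index expression is actually evaluated.
calls <- 0
draw <- function(n, k) {
  calls <<- calls + 1
  sample(n, k)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Same shape as the problematic assignment: row subset first, then column.
df[draw(nrow(df), 3), ]$treated <- TRUE
calls  # a count above 1 means the index was drawn more than once

# Indexing the column vector instead leaves the index expression in one place.
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
```

If the counter ends up above 1, the garbling is consistent with rows from
one draw being copied over rows from another draw. Saving the indices
first, as in the first example, or indexing the column as in df2, avoids
depending on how often the index expression is evaluated.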
Thanks,
Sebastien