[R] Strange behavior when sampling rows of a data frame

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Jun 19 17:45:42 CEST 2020


Hello,

I don't know why this happens, but it looks like a bug. The question is 
where: in `[<-.data.frame` or in `[<-.default`?
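
One possible reading, offered as a sketch rather than a diagnosis: the R 
Language Definition describes a nested replacement such as 
x[i, ]$col <- value as an extraction x[i, ] followed by an assignment back 
into x[i, ], so the index expression is written out twice. If that 
expression is a call to sample(), each evaluation can draw different rows. 
The wrapper pick() and the names tmp and rows below are illustrative only, 
and the hand-written expansion is not claimed to reproduce the exact rows 
of the original example.

```r
set.seed(2020)
df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Hypothetical wrapper around sample() that counts how often it is called.
calls <- 0
pick <- function(n, k) {
  calls <<- calls + 1
  sample(n, k)
}

df[pick(nrow(df), 3), ]$treated <- TRUE
calls  # number of times the index expression was evaluated

# Roughly what the line above amounts to, written out by hand:
tmp <- data.frame(unit = 1:10, treated = FALSE)
rows <- tmp[sample(nrow(tmp), 3), ]   # first draw: rows extracted
rows$treated <- TRUE                  # `$<-` applied to the extracted copy
tmp[sample(nrow(tmp), 3), ] <- rows   # second draw: rows overwritten
tmp
```

If the two draws differ, whole rows (unit included) are copied onto other 
positions, which would account for the duplicated and missing units.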

One workaround is to extract the column as a vector and assign into it:


set.seed(2020)
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE

df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
#  unit treated
#1     1   FALSE
#2     2   FALSE
#3     3   FALSE
#4     4   FALSE
#5     5   FALSE
#6     6    TRUE
#7     7    TRUE
#8     8    TRUE
#9     9   FALSE
#10   10   FALSE


Or index the rows and the column in a single call:


set.seed(2020)
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE

df3[sample(nrow(df3), 3), "treated"] <- TRUE
df3
# result as expected
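
A quick way to confirm that a workaround leaves the units intact is to 
check the unit column and the number of treated rows afterwards. This is 
only a sanity check under the same setup as above; df4 is an illustrative 
name.

```r
set.seed(2020)
df4 <- data.frame(unit = 1:10)
df4$treated <- FALSE

# The index expression appears once, so sample() is evaluated once.
df4[sample(nrow(df4), 3), "treated"] <- TRUE

stopifnot(identical(df4$unit, 1:10))  # units are untouched
stopifnot(sum(df4$treated) == 3)      # exactly three rows are treated
```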


Hope this helps,

Rui Barradas



At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
> I ran into some strange behavior in R when trying to assign a treatment to
> rows in a data frame. I'm wondering whether any R experts can explain
> what's going on.
>
> First, let's assign a treatment to 3 out of 10 rows as follows.
>
>> df <- data.frame(unit = 1:10)
>> df$treated <- FALSE
>> s <- sample(nrow(df), 3)
>> df[s,]$treated <- TRUE
>> df
>    unit treated
> 1     1   FALSE
> 2     2    TRUE
> 3     3   FALSE
> 4     4   FALSE
> 5     5    TRUE
> 6     6   FALSE
> 7     7    TRUE
> 8     8   FALSE
> 9     9   FALSE
> 10   10   FALSE
>
> This is as expected. Now we'll just skip the intermediate step of saving
> the sampled indices, and apply the treatment directly as follows.
>
>> df <- data.frame(unit = 1:10)
>> df$treated <- FALSE
>> df[sample(nrow(df), 3),]$treated <- TRUE
>> df
>    unit treated
> 1     6    TRUE
> 2     2   FALSE
> 3     3   FALSE
> 4     9    TRUE
> 5     5   FALSE
> 6     6   FALSE
> 7     7   FALSE
> 8     5    TRUE
> 9     9   FALSE
> 10   10   FALSE
>
> Now the data frame still has 10 rows with 3 assigned to the treatment. But
> the units are garbled. Units 1 and 4 have disappeared, for instance, and
> there are duplicates for 6 and 9, one assigned to treatment and the other
> to control. Why would this happen?
>
> Thanks,
> Sebastien
