[R] Strange behavior when sampling rows of a data frame

Sébastien Lahaie
Sat Jun 20 01:45:40 CEST 2020


Thank you all for the responses; these are the insights I was hoping for.
There are many ways to get this right, and I happened to run into one that
has a glitch. I see from Luke's explanation how the strange output came
about. Glad to hear that this bug/behavior is already known.
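
As a minimal sketch of that mechanism, assuming the index expression in
df[i,]$treated <- TRUE is evaluated twice (once to extract the rows and once
to write them back), the two sample() calls can be mimicked with fixed
indices. The names read_idx and write_idx and their values are hypothetical
stand-ins for the two draws, chosen to reproduce the garbled output quoted
below.

```r
# Start from the same data frame as in the original post.
df <- data.frame(unit = 1:10, treated = FALSE)

# Hypothetical stand-ins for two independent sample(nrow(df), 3) draws.
read_idx  <- c(6, 9, 5)   # rows that get extracted
write_idx <- c(1, 4, 8)   # rows that get overwritten

rows <- df[read_idx, ]    # copy of rows 6, 9 and 5
rows$treated <- TRUE      # mark the copies as treated
df[write_idx, ] <- rows   # write the copies back at different positions

df                        # units 1, 4 and 8 are gone; 6, 9 and 5 appear twice
```

When the index is stored in a variable first, both evaluations see the same
value, which is why the two-step version behaves as expected.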

On Fri, Jun 19, 2020 at 7:04 PM Daniel Nordlund <djnordlund using gmail.com>
wrote:

> On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:
> > I ran into some strange behavior in R when trying to assign a treatment
> > to rows in a data frame. I'm wondering whether any R experts can explain
> > what's going on.
> >
> > First, let's assign a treatment to 3 out of 10 rows as follows.
> >
> > df <- data.frame(unit = 1:10)
> > df$treated <- FALSE
> > s <- sample(nrow(df), 3)
> > df[s,]$treated <- TRUE
> > df
> >     unit treated
> > 1     1   FALSE
> > 2     2    TRUE
> > 3     3   FALSE
> > 4     4   FALSE
> > 5     5    TRUE
> > 6     6   FALSE
> > 7     7    TRUE
> > 8     8   FALSE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > This is as expected. Now we'll just skip the intermediate step of saving
> > the sampled indices, and apply the treatment directly as follows.
> >
> > df <- data.frame(unit = 1:10)
> > df$treated <- FALSE
> > df[sample(nrow(df), 3),]$treated <- TRUE
> > df
> >     unit treated
> > 1     6    TRUE
> > 2     2   FALSE
> > 3     3   FALSE
> > 4     9    TRUE
> > 5     5   FALSE
> > 6     6   FALSE
> > 7     7   FALSE
> > 8     5    TRUE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > Now the data frame still has 10 rows with 3 assigned to the treatment.
> > But the units are garbled. Units 1 and 4 have disappeared, for instance,
> > and there are duplicates for 6 and 9, one assigned to treatment and the
> > other to control. Why would this happen?
> >
> > Thanks,
> > Sebastien
> >
> Sébastien,
>
> You have received good explanations of what is going on with your code.
> I think you can get what you want by making a simple modification of
> your treatment assignment statement. At least it works for me.
>
> df[sample(nrow(df),3), 'treated'] <- TRUE
>
> Hope this is helpful,
>
> Dan
>
> --
> Daniel Nordlund
> Port Townsend, WA  USA
>
>
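
For reference, a short sketch of two forms that should avoid the problem,
since in each the sample() expression appears in only one indexing call. The
first is Dan's suggestion; the second indexes the column vector instead of
the data frame rows. The seed value is an arbitrary assumption, used only to
make the run repeatable.

```r
set.seed(1)  # arbitrary seed, for repeatability only

# Dan's form: one replacement call with row and column indices together.
df1 <- data.frame(unit = 1:10, treated = FALSE)
df1[sample(nrow(df1), 3), "treated"] <- TRUE

# Alternative: index the column vector, so only that vector is modified.
df2 <- data.frame(unit = 1:10, treated = FALSE)
df2$treated[sample(nrow(df2), 3)] <- TRUE

# In both cases the units are intact and exactly 3 rows are treated.
df1
df2
```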
