[R] Coding style question
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Feb 17 19:42:19 CET 2015
On 17/02/2015 11:19 AM, John Posner wrote:
> In the course of slicing-and-dicing some data, I had occasion to create a list like this:
>
> list(
> subset(my_dataframe, GR1=="XX1"),
> subset(my_dataframe, GR1=="XX2"),
> subset(my_dataframe, GR1=="YY"),
> subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
> subset(my_dataframe, GR2=="Remission"),
> subset(my_dataframe, GR2=="Relapse"))
>
> I used %in% only once, because there was only one "compound value" (XX1 or XX2) for subsetting. But then it occurred to me to use %in% everywhere, taking advantage of the fact that a scalar value is the same as a length-1 vector:
>
> list(
> subset(my_dataframe, GR1 %in% "XX1"),
> subset(my_dataframe, GR1 %in% "XX2"),
> subset(my_dataframe, GR1 %in% "YY"),
> subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
> subset(my_dataframe, GR2 %in% "Remission"),
> subset(my_dataframe, GR2 %in% "Relapse"))
>
> It works just fine. Are there any problems with this style, from the standpoints of correctness, aesthetics, etc.?
If GR1 or GR2 has a missing value, you get NA from the equality tests,
but FALSE from the %in% tests. That won't affect subset (where NA and
FALSE both result in the omission of the observation), but it might
affect other code like this. For example, if you had selected rows
using a logical index instead of using subset, the NA entries in the
index would result in NA selections in the data.
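
A minimal sketch of the difference, assuming a small made-up data frame in place of my_dataframe (the id column and the values are hypothetical, chosen only so that GR1 contains one NA):

```r
# Hypothetical stand-in for my_dataframe; GR1 has one missing value.
my_dataframe <- data.frame(
  id  = 1:4,
  GR1 = c("XX1", NA, "YY", "XX1"),
  stringsAsFactors = FALSE
)

# == propagates the NA; %in% returns FALSE for it.
eq_index <- my_dataframe$GR1 == "XX1"    # TRUE    NA FALSE  TRUE
in_index <- my_dataframe$GR1 %in% "XX1"  # TRUE FALSE FALSE  TRUE

# subset() treats NA like FALSE, so both forms select the same rows.
rows_subset_eq <- subset(my_dataframe, GR1 == "XX1")
rows_subset_in <- subset(my_dataframe, GR1 %in% "XX1")

# Logical indexing does not: the NA in eq_index yields an all-NA row.
rows_index_eq <- my_dataframe[eq_index, ]  # 3 rows, the second is all NA
rows_index_in <- my_dataframe[in_index, ]  # 2 rows
```

If the == form is preferred with logical indexing, wrapping the index in which(), as in my_dataframe[which(eq_index), ], drops the NA entries and gives the same two rows as the %in% version.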
Duncan Murdoch