[R] Strange result when subsetting a data frame based on a character variable

Duncan Murdoch murdoch.duncan at gmail.com
Tue Nov 17 20:25:07 CET 2015


On 17/11/2015 2:14 PM, Karl Schilling wrote:
> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable,
> say "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of
> group given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..),
> AS LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the
> explanation for this (to me) rather strange behavior might be.
>
> Thank you so much for your suggestions.

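If you compare a character value to a numeric value, the numeric
value is first converted to character, as as.character() would do,
and the comparison is then made between strings.  The conversion
switches to scientific notation when that is shorter than fixed
notation, so as.character(100000) is likely not "100000"; try it.
(With the options I have on my computer, I get "1e+05".)  Whole
numbers up to 99999 stay in fixed notation, which is why your other
comparisons happened to work.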
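
To see the conversion directly, here is a minimal sketch. It assumes
default options; the variable names x_small, x_large and x_int are
made up for illustration and are not from the original post.

```r
# How a double is turned into a string before a string comparison.
x_small <- as.character(99999)    # fixed notation is not wider, so it is kept
x_large <- as.character(100000)   # scientific notation is shorter here

print(x_small)
print(x_large)

# Both sides are compared as strings, so these two can disagree:
print("99999" == 99999)
print("100000" == 100000)

# An integer literal (note the L suffix) is converted without an exponent:
x_int <- as.character(100000L)
print(x_int)
print("100000" == 100000L)
```

The integer case is shown only for contrast; relying on it is still
implicit coercion.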

If you want a numeric comparison, be explicit:

subset(Data, as.numeric(Data$group) == ..)
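
Applied to the example data from the original post, that might look
like the sketch below. The names implicit and explicit are made up
here, and the column is referred to as group rather than Data$group,
which subset() allows. Default options are assumed.

```r
# Rebuild the example data frame from the original post.
Data <- data.frame(value = 1:6,
                   group = rep(c("20000", "99999", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# Implicit coercion: 100000 is converted to a string before comparing,
# so it may not match the stored "100000".
implicit <- subset(Data, group == 100000)

# Explicit conversion: the column is converted to numbers, then compared.
explicit <- subset(Data, as.numeric(group) == 100000)

print(nrow(implicit))
print(nrow(explicit))
```

If group can contain strings that are not numbers, as.numeric() gives
NA for them (with a warning), and subset() drops those rows.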


Duncan Murdoch

>
>
> Karl Schilling
>
>
> #####
> Exemplary code for reproducing the above described problem:
>
> options(stringsAsFactors = FALSE)
>
> # set up some data frame
> value <- c(1:6)
> group <- rep(c("20000", "99999", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group =="20000")
> str(Data20)
> Data20N <- subset(Data, Data$group ==20000)
> str(Data20N)
>
>
> Data99 <- subset(Data, Data$group =="99999")
> str(Data99)
> Data99N <- subset(Data, Data$group ==99999)
> str(Data99N)
> Data100 <- subset(Data, Data$group =="100000")
> str(Data100)
> Data100N <- subset(Data, Data$group ==100000)
> str(Data100N)
>


