[R] Strange result when subsetting a data frame based on a character variable

Thierry Onkelinx thierry.onkelinx at inbo.be
Tue Nov 17 20:30:18 CET 2015


Dear Karl,

Since you compare a character with a numeric, R converts the numeric
silently. And then you're into trouble.

as.character(99999) # "99999"
as.character(100000) # "1e+5"

Bottom line, use the same type on both sides of the binary operator.

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-11-17 20:14 GMT+01:00 Karl Schilling <karl.schilling op uni-bonn.de>:

> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable,
> say "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of
> group given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..), AS
> LONG AS this numeric value is less then 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the explanation
> for this for me rather strange behavior might be.
>
> Thank you so much for your suggestions.
>
>
> Karl Schilling
>
>
> #####
> Exemplary code for reproducing the above described problem:
>
> options(stringsAsFactors = F)
>
> # set up some data frame
> value <- c(1:6)
> group <- rep(c("20000", "99999", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group =="20000")
> str(Data20)
> Data20N <- subset(Data, Data$group ==20000)
> str(Data20N)
>
>
> Data99 <- subset(Data, Data$group =="99999")
> str(Data99)
> Data99N <- subset(Data, Data$group ==99999)
> str(Data99N)
> Data100 <- subset(Data, Data$group =="100000")
> str(Data100)
> Data100N <- subset(Data, Data$group ==100000)
> str(Data100N)
>
> --
> Karl Schilling
>
> ______________________________________________
> R-help op r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list