[R] How to get warning about implicit factor to integer coercion?

Bill.Venables at csiro.au
Mon Feb 14 23:12:32 CET 2011


Your complaint is based on what you think a factor should be rather than what it actually is and how it works.  The trick with R (BTW I think it's version 2.12.x rather than 12.x at this stage...) is learning to work *with* it as it is rather than trying to make it work the way you would like it to.

Factors are a bit tricky.  They are numeric objects, even if arithmetic on them is inhibited.

> f <- factor(letters)
> is.numeric(f)  ## this looks strange
[1] FALSE
> mode(f)        ## but at a lower level
[1] "numeric"
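To see that lower level directly, you can inspect the storage type and strip the class.  A minimal sketch using only base R (`typeof`, `class`, `levels`, `unclass`); the name `codes` is just illustrative:

```r
f <- factor(letters)

typeof(f)       # storage type of the underlying vector: "integer"
class(f)        # "factor": the class attribute is what hides the codes
levels(f)[1:3]  # the labels: "a" "b" "c"

# Removing the class exposes the integer codes (plus a "levels" attribute)
codes <- unclass(f)
head(as.vector(codes))  # 1 2 3 4 5 6
```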

Take a simple example.  

> x <- structure(1:26, names = sample(letters))
> x
 h  o  u  w  l  z  a  j  e  n  k  i  s  v  t  g  f  x  c  b  y  d  m  q  p  r 
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 

If you use the factor f as an index, it behaves as a numeric vector of indices:

> x[f]
 h  o  u  w  l  z  a  j  e  n  k  i  s  v  t  g  f  x  c  b  y  d  m  q  p  r 
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
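The codes are positions in `levels(f)`, which are sorted by default, not positions in the order the values first appear.  With `factor(letters)` the two happen to coincide, which hides the effect.  A small sketch with hypothetical objects `g` and `v` shows where code-based and name-based indexing part company:

```r
# Hypothetical factor whose values are not in sorted order
g <- factor(c("b", "a", "c"))
levels(g)      # "a" "b" "c": levels are sorted by default
as.integer(g)  # 2 1 3: each code is a position in levels(g)

# Hypothetical named vector whose names are not in sorted order
v <- c(c = 30, a = 10, b = 20)

v[g]                # uses codes 2, 1, 3 -> elements a, c, b: 10 30 20
v[as.character(g)]  # matches by name b, a, c: 20 10 30
```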

That's just the way it is.  That's the reality.  You have to learn to deal with it. 

Sometimes a factor behaves as a character string vector, when no other interpretation would make sense.  e.g.

> which(f == "o")
[1] 15
>

but in other cases they do not.  In this case you can make the coercion explicit of course, if that is your bent:

> which(as.character(f) == "o")
[1] 15
> 

but here there is no need.  There are cases where you *do* need to make the coercion explicit, though, if you want the factor to behave as a character string vector, and indexing is one:

> x[as.character(f)]
 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z 
 7 20 19 22  9 17 16  1 12  8 11  5 23 10  2 25 24 26 13 15  3 14  4 18 21  6 
>  
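If you want a warning rather than relying on remembering the explicit coercion, one workaround, assuming you control the call sites, is to route subscripting through your own wrapper.  This is a sketch only; `index_checked` is a hypothetical helper, not part of R:

```r
# Hypothetical helper: subset 'x' by 'i', but do not let a factor
# subscript be silently read as its integer codes.
index_checked <- function(x, i) {
  if (is.factor(i)) {
    warning("factor used as subscript; matching by level labels, not codes",
            call. = FALSE)
    i <- as.character(i)
  }
  x[i]  # non-factor subscripts are passed through unchanged
}

x <- structure(1:26, names = sample(letters))
f <- factor(letters)

y <- index_checked(x, f)          # warns, then indexes by name
identical(y, x[as.character(f)])  # TRUE
```

It only guards calls that go through the wrapper; ordinary `x[f]` elsewhere still uses the codes.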

If you want factors to behave universally as character string vectors, the solution is not to use factors at all but to use character string vectors instead.  You can get away with a surprising amount this way.  e.g. character string vectors used in model formulae are silently coerced to factors, anyway.  What you need to learn is how to read in data frames keeping the character string columns "as is" and stop them from being made into factors at that stage.  That is a lesson for another day...
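For what it's worth, a minimal sketch of that lesson: `read.csv` (like `read.table`) accepts `stringsAsFactors = FALSE`, and `as.is = TRUE` has the same effect; in recent R versions (4.0.0 onwards) `FALSE` is already the default.  The CSV content and the names `csv_lines`, `d`, `fit` are made up for illustration:

```r
# Hypothetical CSV content standing in for a file on disk
csv_lines <- c("id,group",
               "1,b",
               "2,a",
               "3,c",
               "4,a")

con <- textConnection(csv_lines)
d <- read.csv(con, stringsAsFactors = FALSE)  # keep strings "as is"
close(con)

class(d$group)  # "character"

# In a model formula the character column is still turned into a factor
d$y <- c(1.0, 2.5, 2.0, 3.0)
fit <- lm(y ~ group, data = d)
names(coef(fit))  # "(Intercept)" "groupb" "groupc"
```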

Bill Venables.


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of WB Kloke
Sent: Monday, 14 February 2011 8:31 PM
To: r-help at stat.math.ethz.ch
Subject: [R] How to get warning about implicit factor to integer coercion?

Is there a way in R (12.x) to avoid the implicit coercion of factors to integers
in the context of subscripts?

If this is not possible, is there a way to get at least a warning, if any
coercion of this type happens, given that the action of this coercion is almost
never what is wanted?

Of course, in the rare case that as.integer() is applied explicitly to a
factor, the warning is not needed, but the coercion there is certainly not as
disastrous as in the other case.

Probably, in a future version of R, an option similar to that for partial
matches of subscripts might be useful to change the default from maximal
unsafety to safe use.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


