[R] Factor function

peter dalgaard pdalgd at gmail.com
Tue Apr 26 19:59:22 CEST 2011

On Apr 26, 2011, at 18:52 , Petr Savicky wrote:

> On Tue, Apr 26, 2011 at 10:51:33AM +0200, Petr PIKAL wrote:
>> Hi
>> d<-data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"), 
>> ncol=3, byrow=TRUE))
>> Change character value "NA" to missing value <NA>
>> d[d[,3]=="NA",3]<-NA
>> If you want to drop any unused levels of a factor, just use
>> factor(d[,3])
>> [1] xx   yy   <NA>
>> Levels: xx yy
> An explicit NA is a good idea. If the NA is introduced before
> creating the data frame, then also the data frame will not
> contain the unwanted level.
>  a<-matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"), 
>  ncol=3, byrow=TRUE)
>  a[a[,3]=="NA",3]<-NA
>  d<-data.frame(a)
>  d[,3]
>  [1] xx   yy   <NA>
>  Levels: xx yy
> If the replacement should be done in the whole matrix, then
>  a[a=="NA"]<-NA
> may be used.
> Petr Savicky.

I think there's a buglet in here. According to the docs, "If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded". However, that plainly doesn't work:

> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=1)
[1] x  y  NA
Levels: NA x y
> factor(ff,exclude=ff[3])
[1] x  y  NA
Levels: NA x y
> factor(ff,exclude=ff[2])
[1] x  y  NA
Levels: NA x y

In these cases, the internal logic converts exclude to integer, and then uses match(levels, exclude) where levels is unique(x), i.e., a factor. This won't work because match() matches on the _character_ representation of x.
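
To see the mismatch in isolation, here is a minimal sketch. It is not the source of factor(), only an illustration of the match() behaviour described above: match() converts a factor argument to its character labels before comparing, so an integer code never matches a label.

```r
ff <- factor(c("x", "y", "NA"))

# match() converts the factor to its character labels, and the
# integer 1L is coerced to "1", so none of the labels match.
by_code <- match(ff, 1L)
print(by_code)    # NA NA NA

# Matching on the labels themselves does find the "NA" string.
by_label <- match(as.character(ff), "NA")
print(by_label)   # NA NA 3
```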

The cleanest version that I can think of for the original problem is

> factor(ff, levels=setdiff(levels(ff), "NA"))
[1] x    y    <NA>
Levels: x y
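
Applied to the data frame from the original question, the same idea might look like the sketch below. drop_level() is a hypothetical helper written for this example, not a base R function, and stringsAsFactors = TRUE is passed explicitly on the assumption that the column should be a factor regardless of the default in the R version at hand.

```r
# Build the data frame from the thread; force character columns to factors.
d <- data.frame(matrix(c("ww", "ww", "xx", "yy", "ww", "yy", "xx", "yy", "NA"),
                       ncol = 3, byrow = TRUE),
                stringsAsFactors = TRUE)

# Hypothetical helper: rebuild the factor without the given label, so that
# values carrying that label become missing (<NA>).
drop_level <- function(f, label) {
  factor(f, levels = setdiff(levels(f), label))
}

d[[3]] <- drop_level(d[[3]], "NA")

print(levels(d[[3]]))   # "xx" "yy"
print(is.na(d[[3]]))    # FALSE FALSE TRUE
```

Because the level set is given explicitly, this does not rely on the exclude argument at all.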



Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
