[Rd] c.factor

Matthew Dowle mdowle at concordiafunds.com
Wed Nov 22 17:29:00 CET 2006

I just noticed that a new feature in R 2.4 is that unlist of a list of
factors already does the operation that I proposed :

> x = factor(letters[1:5])
> y = factor(letters[4:8])
> unlist(list(x,y))
[1] a b c d e d e f g h
Levels: a b c d e f g h

Therefore, does it not make sense that c(x,y) should return the same as 
unlist(list(x,y)) ?

Also, the specific "if" for factors inside the definition of unlist, not
surprisingly, uses a very similar method to those previously posted.
However, it first coerces the factors with as.character, before matching
to the new level set. This is inefficient. Here is the c.factor method
that I proposed, which avoids the as.character and is therefore more
efficient. Leaving aside the discussion about c.factor, or concat, or
whatever, could 'unlist' be changed to use this method instead ? After
all, one of the key advantages of factors is to save main memory,
and coercing back to character is going to defeat that benefit.

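> c.factor = function(...) {
    args <- list(...)
    if (!all(sapply(args, is.factor))) stop("all arguments must be factor")
    # union of all level sets, in order of first appearance
    newlevels = unique(unlist(lapply(args,levels)))
    ans = unlist(lapply(args, function(x) {
        # position of each old level in the new level set
        m = match(levels(x), newlevels)
        # remap the integer codes; no as.character needed
        m[as.integer(x)]
    }))
    levels(ans) = newlevels
    class(ans) = "factor"
    ans
}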
> identical(c(x,y), unlist(list(x,y)))
[1] TRUE
> version
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
major 2
minor 4.0
year 2006
month 10
day 03
svn rev 39566
language R
version.string R version 2.4.0 (2006-10-03)
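
As a rough illustration of the two approaches discussed above, the sketch
below combines factors once by remapping integer codes and once by going
through as.character. The names combine_by_codes and combine_by_character
are hypothetical, made up for this example; neither is the actual code
inside unlist, and the sketch makes no claim about relative timings.

```r
# Hypothetical helper: remap integer codes onto the union of the level
# sets, without converting the data to character.
combine_by_codes <- function(...) {
  args <- list(...)
  stopifnot(all(vapply(args, is.factor, logical(1))))
  newlevels <- unique(unlist(lapply(args, levels)))
  codes <- unlist(lapply(args, function(x) {
    match(levels(x), newlevels)[as.integer(x)]
  }))
  structure(codes, levels = newlevels, class = "factor")
}

# Hypothetical helper: the as.character route, which builds a full
# character vector before matching against the new level set.
combine_by_character <- function(...) {
  args <- list(...)
  stopifnot(all(vapply(args, is.factor, logical(1))))
  newlevels <- unique(unlist(lapply(args, levels)))
  factor(unlist(lapply(args, as.character)), levels = newlevels)
}

x <- factor(letters[1:5])
y <- factor(letters[4:8])
by_codes <- combine_by_codes(x, y)
by_char  <- combine_by_character(x, y)

# Both routes should agree on values and on the level set.
all(as.character(by_codes) == as.character(by_char))
identical(levels(by_codes), levels(by_char))
```

Both helpers give up on non-factor arguments, as the c.factor above does.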

"Brian Ripley" <ripley at stats.ox.ac.uk> wrote in message 
news:Pine.LNX.4.64.0611150926070.19618 at auk.stats...
> On Tue, 14 Nov 2006, Bill Dunlap wrote:
>> On Tue, 14 Nov 2006, Prof Brian Ripley wrote:
>>> Well, R has managed without a factor method for c() for most of its
>>> decade of existence (not that it originally had factors as we know
>>> them). I would argue that factors are best viewed as an enumeration
>>> type, and anything which silently changes their level set is a bad
>>> idea. I can see a case for a c() method for factors that combines
>>> factors with the same level sets, but I can also see this is best
>>> done by users who know the level sets are the same (c.factor would
>>> have to expend considerable effort to check).
>>> You also need to consider the dispatch rules. c.factor will be called
>>> whenever the first argument is a factor, whatever the others are. S4
>>> (I think, definitely S4-based versions of S-PLUS) has an alternative
>>> concat() that works differently (recursively) and seems a more
>>> natural model.
>> In addition, c() has always had a double meaning of
>> (a) turning an object into a simple "vector" (an object
>> without "attributes"), as in
>> > c(factor(c("Cat","Dog","Cat")))
>> [1] 1 2 1
>> > c(data.frame(x=1:2,y=c("Dog","Cat")))
>> $x
>> [1] 1 2
>> $y
>> [1] Dog Cat
>> Levels: Cat Dog
> To my surprise that was not documented at all on the R help page, and
> I have clarified it. (BTW, at least in R it does not remove names, just
> all other attributes.)
>> (b) concatenating several such vectors into one.
>> The proposed c.factor does only (b).
> (Strictly not, as a factor is not a vector.)
> But the help page explicitly only describes the default method, and
> some of the other methods do preserve some attributes, AFAIR.
>> Should we just
>> throw c() into the ash heap and use as.vector() or
>> concat() instead?
>> The whole concept of concatenating objects of disparate
>> types is suspect.
> I think working on a concat() for R would be helpful. I vaguely recall
> something like it in the Green Book, but the index does not help (but
> it is not very complete).
> Brian
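
The dispatch point raised in the quoted message can be seen with a toy S3
class. The class name "myenum" and its c() method are hypothetical, used
here only so that the example does not touch factors themselves. The
method is picked only when the first argument carries the class.

```r
# Hypothetical toy class with its own c() method that keeps the class.
c.myenum <- function(...) {
  structure(unlist(lapply(list(...), unclass)), class = "myenum")
}

e <- structure(1:2, class = "myenum")

first  <- c(e, 3:4)   # first argument is a "myenum": the method is used
second <- c(3:4, e)   # first argument is a plain integer: default method

inherits(first, "myenum")
inherits(second, "myenum")
```

So a c.factor method would be reached for c(factor, anything) but not for
c(anything, factor), which is why the method has to decide what to do with
non-factor arguments.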
