[R] Splicing factors without losing levels

Titus von der Malsburg malsburg at gmail.com
Tue Jun 9 12:45:02 CEST 2009


On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
> For factors, you better convert them first back to character strings.
> 
>   splice <- function(x, y) {
> 	x <- levels(x)[x]
> 	y <- levels(y)[y]
> 	factor(as.vector(rbind(x, y)))
>   } 

Thank you very much, Thierry!

I failed to mention something important in my last mail: x and y have
the same levels.  (I assume that the mapping from integer codes to
level names is what defines a factor's type, in the informal sense
rather than R's class(), and that it only makes sense to combine
factors of the same type.)

Say

    > x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

    > x
    [1] b b d d
    Levels: a b c d

    > as.integer(x)
    [1] 2 2 4 4

but

    > splice(x,x)
    [1] b b b b d d d d
    Levels: b d

    > as.integer(splice(x,x))
    [1] 1 1 1 1 2 2 2 2
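For comparison, the character-based approach of the quoted function can
keep unused levels if the factor is rebuilt with levels=levels(x) passed
explicitly. This is only a sketch: it assumes, as stated above, that x
and y are factors with identical levels, and the name splice_chr is
hypothetical.

```r
# Hypothetical variant of the character-based splice: rebuilding the
# factor with levels = levels(x) keeps unused levels and their order.
splice_chr <- function(x, y) {
  # Assumes both inputs are factors with identical level sets.
  stopifnot(is.factor(x), is.factor(y), identical(levels(x), levels(y)))
  # Interleave x[1], y[1], x[2], y[2], ... as character strings.
  xy <- as.vector(rbind(as.character(x), as.character(y)))
  factor(xy, levels = levels(x))
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
s <- splice_chr(x, x)
print(s)              # [1] b b b b d d d d
                      # Levels: a b c d
print(as.integer(s))  # [1] 2 2 2 2 4 4 4 4
```

Because the labels, not the codes, are carried through, this variant
does not depend on how codes map to labels internally.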

I'd like to have a splice function that retains the level to label
mapping.  One candidate for a solution is:

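    splice <- function(x,y) {
      # rbind() replaces factors by their integer codes, so xy is an
      # integer vector x[1], y[1], x[2], y[2], ...
      xy <- as.vector(rbind(x, y))
      if (is.factor(x) && is.factor(y))
        # Map codes back to labels; assumes x and y share levels.
        xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
      xy
    }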
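A sketch of the same integer-code idea that takes the codes explicitly
with as.integer() instead of relying on rbind() dropping the factor
class, and that checks the same-levels precondition up front. The name
splice_codes and the example factors are hypothetical.

```r
splice_codes <- function(x, y) {
  # Assumes x and y are factors with identical levels (same order).
  stopifnot(is.factor(x), is.factor(y), identical(levels(x), levels(y)))
  # Interleave the integer codes: x[1], y[1], x[2], y[2], ...
  codes <- as.vector(rbind(as.integer(x), as.integer(y)))
  # Map codes 1..length(levels(x)) back to labels, keeping unused levels.
  factor(codes, levels = seq_along(levels(x)), labels = levels(x))
}

lv <- c("a", "b", "c", "d")
x <- factor(c("a", "c"), levels = lv)
y <- factor(c("d", "b"), levels = lv)
s <- splice_codes(x, y)
print(s)              # [1] a d c b
                      # Levels: a b c d
print(as.integer(s))  # [1] 1 4 3 2
```

This still depends on the code-to-label correspondence discussed next;
the explicit as.integer() only removes the dependence on how rbind()
treats factors.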

However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
the underlying codes are integers running from 1 to
length(levels(x)), and levels(x) gives the labels of these integers
in the order 1:length(levels(x)).
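Both assumptions can at least be checked empirically for a given
factor. The snippet below does so for the x used above (redefined so it
runs on its own); passing these checks only shows how the running R
version behaves, not that the behaviour is guaranteed. The variable
names are illustrative.

```r
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
codes <- as.integer(x)

# Assumption 1: codes are integers in 1..length(levels(x)).
ok_range <- is.integer(codes) && all(codes >= 1L & codes <= length(levels(x)))

# Assumption 2: levels(x)[i] is the label of code i, so indexing
# levels(x) by the codes reproduces the printed labels.
ok_order <- identical(levels(x)[codes], as.character(x))

# Code-to-label mapping for all levels, including unused "a" and "c".
mapping <- setNames(seq_along(levels(x)), levels(x))
print(c(ok_range = ok_range, ok_order = ok_order))
print(mapping)
```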

Without these assumptions I see no way to recover the integer to level
name mapping for levels that are defined in a factor but do not occur.

I'd be happy if somebody could clarify this issue!

  Titus
