[R] Splicing factors without losing levels
Titus von der Malsburg
malsburg at gmail.com
Tue Jun 9 12:45:02 CEST 2009
On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
> For factors, you had better convert them back to character strings first.
>
> splice <- function(x, y) {
> x <- levels(x)[x]
> y <- levels(y)[y]
> factor(as.vector(rbind(x, y)))
> }
Thank you very much, Thierry!
I failed to mention something important in my last mail: x and y have
the same levels. (I assume that the integer-to-level-name mapping of
a factor defines its class and that it only makes sense to combine
factors of the same class.)
Say
> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
then
> x
[1] b b d d
Levels: a b c d
> as.integer(x)
[1] 2 2 4 4
but
> splice(x,x)
[1] b b b b d d d d
Levels: b d
> as.integer(splice(x,x))
[1] 1 1 1 1 2 2 2 2
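The levels disappear because factor(), when called on a character vector without a `levels` argument, builds its levels from the sorted unique values actually present. A minimal illustration, reusing the `x` defined above (the name `chars` is just for this sketch):

```r
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))

# Same conversion as in Thierry's splice(): factor -> character labels
chars <- levels(x)[x]                         # "b" "b" "d" "d"

# Default levels: only the values that occur, so "a" and "c" are lost
levels(factor(chars))                         # "b" "d"

# Passing the original levels explicitly keeps the unused ones
levels(factor(chars, levels = levels(x)))     # "a" "b" "c" "d"
```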
I'd like to have a splice function that retains the level to label
mapping. One candidate for a solution is:
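splice <- function(x, y) {
  # rbind() on two factors yields a matrix of their integer codes;
  # as.vector() reads it column-wise, interleaving x and y
  xy <- as.vector(rbind(x, y))
  if (is.factor(x) && is.factor(y))
    xy <- factor(xy, levels = 1:length(levels(x)), labels = levels(x))
  xy
}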
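A variant that avoids the integer codes altogether interleaves the character labels and then passes levels(x) to factor() explicitly, so it relies only on factor()'s documented `levels` argument. This is a sketch, not a tested library function; the name `splice_chr` is hypothetical, and the identical-levels check encodes the assumption stated at the top of this mail:

```r
# Hypothetical variant of splice(): interleave via character labels and
# restore the full level set explicitly, without touching integer codes.
splice_chr <- function(x, y) {
  if (is.factor(x) && is.factor(y)) {
    # Assumes both factors share the same levels in the same order
    stopifnot(identical(levels(x), levels(y)))
    xy <- as.vector(rbind(as.character(x), as.character(y)))
    return(factor(xy, levels = levels(x)))
  }
  # Non-factor input: plain interleaving, as in the original splice()
  as.vector(rbind(x, y))
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
s <- splice_chr(x, x)
s              # b b b b d d d d, Levels: a b c d
as.integer(s)  # 2 2 2 2 4 4 4 4
```

The round trip through character costs a conversion, but the result keeps unused levels such as "a" and "c" without any assumption about how the codes are stored.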
However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
the levels are stored internally as integer codes running from 1 to
length(levels(x)), and levels(x) gives the labels of these codes in
the order 1:length(levels(x)).
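Those two assumptions can at least be checked empirically for a given factor. The helper below (`check_codes` is a hypothetical name) tests that every non-missing code lies in 1:length(levels(x)) and that indexing levels(x) by the codes reproduces the printed labels. Passing this check on particular factors is evidence, not a guarantee about the implementation:

```r
# Hypothetical helper: empirically test the code/label assumptions for x.
check_codes <- function(x) {
  codes <- as.integer(x)
  # Assumption 1: codes run within 1..length(levels(x)) (NAs ignored)
  in_range <- all(codes[!is.na(codes)] %in% seq_along(levels(x)))
  # Assumption 2: levels(x)[code] gives the label printed for each element
  maps_ok <- identical(levels(x)[codes], as.character(x))
  in_range && maps_ok
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
check_codes(x)  # TRUE
```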
Without these assumptions I see no way to recover the integer-to-level-name
mapping for levels that are defined in a factor but do not occur.
I'd be happy if somebody could clarify this issue!
Titus