[R] Splicing factors without losing levels
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Jun 11 10:01:07 CEST 2009
Titus von der Malsburg wrote:
> On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
>> For factors, you had better convert them back to character strings first.
>>
>> splice <- function(x, y) {
>>   x <- levels(x)[x]
>>   y <- levels(y)[y]
>>   factor(as.vector(rbind(x, y)))
>> }
>
> Thank you very much, Thierry!
>
> I failed to mention something important in my last mail: x and y have
> the same levels. (I assume that the integer to level name mapping of
> a factor defines its class and that it only makes sense to combine
> factors of the same class.)
>
> Say
>
> > x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
>
> then
>
> > x
> [1] b b d d
> Levels: a b c d
>
> > as.integer(x)
> [1] 2 2 4 4
>
> but
>
> > splice(x,x)
> [1] b b b b d d d d
> Levels: b d
>
> > as.integer(splice(x,x))
> [1] 1 1 1 1 2 2 2 2
>
> I'd like to have a splice function that retains the level to label
> mapping. One candidate for a solution is:
>
> splice <- function(x, y) {
>   xy <- as.vector(rbind(x, y))
>   if (is.factor(x) && is.factor(y))
>     xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
>   xy
> }
>
> However, this relies on assumptions about the implementation of
> factors that are neither mentioned nor guaranteed in the man page:
> Levels are underlyingly integers starting from one and going to
> length(levels). levels(x) gives me the labels of these integers in an
> order corresponding to 1:length(levels(x)).
>
> Without these assumptions I see no way to recover the integer to level
> name mapping for levels that are defined in a factor but do not occur.
>
> I'd be happy if somebody could clarify this issue!

Hm, well,... Some people have been quite insistent that factors should
be thought of as isomorphic to vectors over small subsets of character
strings and not as isomorphic to small integers with labels. I tend to
disagree, as it creates more complications than it solves.

Anyways, I would do it like this (generalizing "8" and the seq() bits is
left as an exercise):

> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
> xx <- factor(rep(NA,8),levels=levels(x))
> xx[seq(1,8,2)]<-x
> xx[seq(2,8,2)]<-x
> xx
[1] b b b b d d d d
Levels: a b c d
> as.integer(xx)
[1] 2 2 2 2 4 4 4 4
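
For what it's worth, the exercise might be worked out along the following
lines. This is only a sketch: the name splice2 and the checks that x and y
have identical levels and equal lengths are assumptions made for
illustration, not something settled in this thread.

```r
# Sketch: interleave two factors that share the same levels while
# keeping levels that do not occur. splice2 is a hypothetical name.
splice2 <- function(x, y) {
  # Assumed preconditions: both factors, same levels, same length
  stopifnot(is.factor(x), is.factor(y),
            identical(levels(x), levels(y)),
            length(x) == length(y))
  n <- length(x)
  # Start from an all-NA factor that carries the full level set
  xy <- factor(rep(NA, 2 * n), levels = levels(x))
  xy[seq(1, by = 2, length.out = n)] <- x  # odd positions take x
  xy[seq(2, by = 2, length.out = n)] <- y  # even positions take y
  xy
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
xx <- splice2(x, x)
```

Because the values are assigned into a factor that already has the full
set of levels, no assumption about the integer codes is needed; the
assignment matches on level labels.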
>
> Titus
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907