[R] how to concatenate factor vectors?
William Dunlap
wdunlap at tibco.com
Thu Oct 18 17:33:38 CEST 2012
c() has an unfortunate history. Originally, c(x) stripped all attributes
from x except names; that is, it dropped dim, dimnames, and class.
Also, c(x,y) stripped the attributes from both x and y and concatenated
them, and c(nameA=1,nameB=2) constructed a vector with a names attribute.
Then c() became a generic function and people wrote methods for certain
classes, typically newer classes without the weight of history on them, that kept
at least the class and would combine 2 or more items of that class. Adding
a c.factor became tricky because old code used c(factor(...)) to strip the class
and levels attributes to get the integer codes.
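To make the attribute handling described above concrete, here is a small sketch. The object names (m, v, x, d) are only illustrative, and Date is used as one example of a class that has its own c() method.

```r
# c() on a matrix drops dim and dimnames, leaving a plain vector
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
v <- c(m)
attributes(v)            # NULL

# argument names become the names attribute of the result
x <- c(nameA = 1, nameB = 2)
names(x)                 # "nameA" "nameB"

# a class with its own c() method keeps its class
d <- c(as.Date("2012-10-18"), as.Date("2012-10-19"))
class(d)                 # "Date"
```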
You can make a c() that does what you want for your factors by subclassing
factor and writing a c.<yourFactor> that does what you want. This will not
break old code. E.g.,
myFactor <- function(...) {
    tmp <- factor(...)
    # prepend the subclass so that c() dispatches to c.myFactor
    class(tmp) <- c("myFactor", class(tmp))
    tmp
}
c.myFactor <- function(...) {
    # ... compare levels of inputs with identical() and do what you want ...
    # ... return something with the right class ...
}
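One possible way to fill in that outline is sketched below. The policy it uses is an assumption, not part of the advice above: it requires every input to be a factor whose levels are identical to those of the first argument, and it stops with an error otherwise. It concatenates the integer codes directly, with no detour through a character vector.

```r
# Constructor: an ordinary factor with "myFactor" prepended to its class
myFactor <- function(...) {
    tmp <- factor(...)
    class(tmp) <- c("myFactor", class(tmp))
    tmp
}

# Assumed policy: all inputs must be factors with identical levels
c.myFactor <- function(...) {
    args <- list(...)
    lev <- levels(args[[1]])
    same <- vapply(args,
                   function(x) is.factor(x) && identical(levels(x), lev),
                   logical(1))
    if (!all(same)) stop("all arguments must be factors with identical levels")
    # as.integer() returns the bare codes; concatenate them as integers
    codes <- unlist(lapply(args, as.integer), use.names = FALSE)
    # reattach the shared levels and the class of the first argument
    structure(codes, levels = lev, class = class(args[[1]]))
}

a <- myFactor(5:1, levels = 1:9)
b <- myFactor(9:1, levels = 1:9)
f <- c(a, b)
str(f)
```

Because dispatch is on the class of the first argument, c(a, b) reaches c.myFactor only when a is a myFactor; what c() does with plain factors is not changed.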
Or, you can decide to write a new concatenation function
and stop using c().
As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make sense there.
identical() is a pretty quick way to check that two objects have identical contents.
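As a small illustration of that check (the factors a, b, and d here are made up for the example), identical() returns a single TRUE or FALSE for the two levels vectors:

```r
a <- factor(c("x", "y"), levels = c("x", "y", "z"))
b <- factor(c("z", "x"), levels = c("x", "y", "z"))
d <- factor(c("z", "x"))            # levels are "x" and "z" only

identical(levels(a), levels(b))     # TRUE
identical(levels(a), levels(d))     # FALSE
```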
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sam Steingold
> Sent: Thursday, October 18, 2012 8:02 AM
> To: r-help at r-project.org; Jorge I Velez
> Subject: Re: [R] how to concatenate factor vectors?
>
> hi Jorge,
>
> > * Jorge I Velez <wbetrvinairyrm at tznvy.pbz> [2012-10-18 16:43:58 +1100]:
> >
> >> a <- factor(5:1,levels=1:9)
> >> b <- factor(9:1,levels=1:9)
> >> lev <- sort(unique(f <- c(a, b)))
> >> f <- factor(f, levels = lev)
> >> str(f)
> > Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
>
> is sort(unique()) really necessary?
> I think
> lev <- levels(a)
> should be enough.
>
> However, this does not quite do what I want.
> I want a function which will _NOT_ have a non-factor vector as an
> intermediate value because that would waste a LOT of memory in my case.
> I want a function which will check that a and b have identical levels
> (in Lisp lingo, the levels are EQ, not just EQUALP).
>
> --8<---------------cut here---------------start------------->8---
> > a <- factor(letters[sample(1:10,20,replace=TRUE)],levels=letters)
> [1] e e a b c e j d a b h i a e e g j a c e
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> > b <- factor(letters[sample(1:10,30,replace=TRUE)],levels=letters)
> [1] d d f c j b d e j j g i g j j g g a j a b e d c b i i a b f
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> > c(a,b)
> [1] 5 5 1 2 3 5 10 4 1 2 8 9 1 5 5 7 10 1 3 5 4 4 6 3 10
> [26] 2 4 5 10 10 7 9 7 10 10 7 7 1 10 1 2 5 4 3 2 9 9 1 2 6
> > factor(letters[c(a,b)],levels=letters)
> [1] e e a b c e j d a b h i a e e g j a c e d d f c j b d e j j g i g j j g g a
> [39] j a b e d c b i i a b f
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> --8<---------------cut here---------------end--------------->8---
>
> however, this is not a "direct" way (unlike my unlist(list(...))):
> there is an intermediate integer vector c(a,b) which is mapped to a
> character vector via letters, which is converted back to integers
> (==factors).
>
> IIUC, a factor is an integer vector which knows that the integers refer
> to levels.
>
> c(a,b) creates such an integer vector.
> How do I tell it that it is a factor?
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://palestinefacts.org http://www.memritv.org
> http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
> usually: can't pay ==> don't buy. software: can't buy ==> don't pay
>