[R] Removing and restoring factor levels (TYPO CORRECTED)

Duncan Murdoch murdoch at stats.uwo.ca
Thu Oct 13 20:31:09 CEST 2005


On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
>> Sorry, a typo in my previous message (parens in the wrong place in the 
>> conversion).
>> 
>> Here it is corrected:
>> 
>> I'm doing a big slow computation, and profiling shows that it is
>> spending a lot of time in match(), apparently because I have code like
>> 
>> x %in% listofxvals
>> 
>> Both x and listofxvals are factors with the same levels, so I could
>> probably speed this up by stripping off the levels and just treating
>> them as integer vectors, then restoring the levels at the end.
>> 
>> What is the safest way to do this?  I am worried that at some point x
>> and listofxvals will *not* have the same levels, and the optimization
>> will give the wrong answer.  So I need code that guarantees they have
>> the same coding.
>> 
>> I think this works, where "master" is a factor with the master list of
>> levels (guaranteed to be a superset of the levels of x and listofxvals),
>> but can anyone spot anything that might go wrong?
>> 
>> # Strip the levels
>> x <- as.integer( factor(x, levels = levels(master) ) )
>> 
>> # Restore the levels
>> x <- structure( x, levels = levels(master), class = "factor" )
>> 
>> Thanks for any advice...
>> 
>> Duncan Murdoch
> 
> Duncan,
> 
> With the predicate that 'master' has the full superset of all possible
> factor levels defined, it would seem that this would be a reasonable way
> to go.
> 
> This approach would also seem to eliminate whatever overhead is
> encountered as a result of the coercion of 'x' as a factor to a
> character vector, which is done by match().
> 
> One question I have is, what is the advantage of using structure()
> versus:
> 
>    x <- factor(x, levels = levels(master))
> 
> ?

That one doesn't work.  What "factor(x, levels=levels(master))" says is 
to convert x to a factor, coding the values in it according to the levels 
in master.  But at this point x holds integer codes, so its values 
won't match the levels of master, which are probably character strings.

For example:

 > master <- factor(letters)
 > print(x <- factor(letters[1:3]))
[1] a b c
Levels: a b c
 > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
[1] 1 2 3
 > print(x <- factor(x, levels = levels(master)))
[1] <NA> <NA> <NA>
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

I get NA's at the end because the values 1,2,3 aren't in the vector of 
factor levels (which are the lowercase letters).
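
For comparison, here is a minimal sketch of the round trip that does 
restore the labels.  The helper name strip_levels() and the check inside 
it are only illustrative assumptions, not something from the code above; 
the check is one possible guard against the case where x contains a value 
that is not among the levels of master.

```r
master <- factor(letters)
x <- factor(letters[1:3])

# Hypothetical helper: recode f against master and keep only the integer codes
strip_levels <- function(f, master) {
  codes <- as.integer(factor(f, levels = levels(master)))
  # A value outside levels(master) would silently become NA, so fail loudly
  if (any(is.na(codes) & !is.na(f)))
    stop("x contains a value that is not in levels(master)")
  codes
}

codes <- strip_levels(x, master)

# Restore by reattaching the attributes directly, as in the original post
restored <- structure(codes, levels = levels(master), class = "factor")

# Alternative restore: look up the labels first, then let factor() code them
restored2 <- factor(levels(master)[codes], levels = levels(master))

print(restored)
print(restored2)
```

Both versions should print the values a, b, c with the 26 levels of 
master.  The structure() form skips the matching step entirely, which is 
the point of the optimization; the second form is easier to read but does 
the string matching again.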

Duncan Murdoch



