[R] Re: regrouping factor levels

Petr PIKAL petr.pikal at precheza.cz
Mon May 25 14:48:16 CEST 2009


Hi

r-help-bounces at r-project.org wrote on 22.05.2009 18:53:37:

> 
> Hi all,
> I had some trouble in regrouping factor levels for a variable. After some
> experiments, I have figured out how I can recode to modify the factor levels.
> I would now like some help to understand why some methods work and
> others don't.
> 
> Here's my code :
> rm(list=ls())
> ###some trials in recoding factor levels
> char<-letters[1:10]
> fac<-factor(char)
> levels(fac)
> print(fac)
> ##first method of recoding factors
> fac1<-fac
> levels(fac1)[c("a","b","c")]<-"A"
> levels(fac1)[c("d","e","f")]<-"B"
> levels(fac1)[c("g","h","i","j")]<-"C"
> levels(fac1)
> print(fac1)
> ##second method
> fac2<-fac  
> levels(fac2)[c(1,2,3)]<-"A"
> levels(fac2)[c(2,3,4)]<-"B" # not c(4,5,6)
> levels(fac2)[c(3,4,5,6)]<-"C" # not c(7,8,9,10)
> levels(fac2)
> print(fac2)
> #third method
> fac3<-fac
> levels(fac3)<-list("A"=c("a","b","c"),"B"=c("d","e","f"),"C"=c("g","h","i","j"))
> levels(fac3)
> print(fac3)
> 
> I first tried method 1 and had no luck with it at all. The levels A, B, and C
> just got added to the existing levels without affecting the fac variable.
> After some time, I was able to figure out how I should use method 2.
> After reading the help documentation, I arrived at method 3.
> 
> I would appreciate help in understanding why the first method does not work.

Compare these two ways of selecting levels:

levels(fac1)[c("a","b","c")]

and

levels(fac2)[c(1,2,3)]

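Here "a", "b" and "c" are values stored in the levels vector, not names of 
its elements, and a character subscript is matched against names. To select 
levels by value you need to check for their presence with the %in% operator.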
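
To make the difference concrete, here is a small sketch on a plain copy of the levels vector. The variable name lev is mine, not from the original post, and the comments describe what I would expect from a current R version.

```r
fac <- factor(letters[1:10])
lev <- levels(fac)            # plain character vector without a names attribute

names(lev)                    # NULL
lev[c("a", "b", "c")]         # all NA: there are no names to match against
lev[c(1, 2, 3)]               # "a" "b" "c": positions work

# Assigning through a character subscript does not replace existing values.
# It appends new elements that are *named* "a", "b" and "c".
lev[c("a", "b", "c")] <- "A"
length(lev)                   # 13 instead of 10
names(lev)                    # "" ten times, then "a" "b" "c"
```

This is why method 1 only adds levels: the original ten values stay where they are and the new ones are appended at the end.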

Modified method 1:
which.levels <- levels(fac1) %in% c("a","b","c")
levels(fac1)[which.levels] <- "A"
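
The same idea can be carried through for all three groups. This is only a sketch of how I would extend it, starting again from a fresh copy of fac. Because each step matches on the level values, it does not matter that the number of levels shrinks after every assignment.

```r
fac  <- factor(letters[1:10])
fac1 <- fac

# Select by value with %in%, one group at a time.
levels(fac1)[levels(fac1) %in% c("a", "b", "c")]      <- "A"
levels(fac1)[levels(fac1) %in% c("d", "e", "f")]      <- "B"
levels(fac1)[levels(fac1) %in% c("g", "h", "i", "j")] <- "C"

levels(fac1)   # "A" "B" "C"
table(fac1)    # counts per new level: 3, 3 and 4
```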

Method 4:
fac4<-fac
levels(fac4)<-c(rep("A",3), rep("B", 3), rep("C",4))

You can either replace all levels at once, or select some levels and 
replace them with the correct number of items.
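
The following sketch shows why the positions in your method 2 shift. Duplicated level names are merged after each assignment, so the levels vector gets shorter and the remaining levels move up. The comments assume the behaviour of a current R version.

```r
fac  <- factor(letters[1:10])
fac2 <- fac

levels(fac2)[c(1, 2, 3)] <- "A"
levels(fac2)      # "A" "d" "e" "f" "g" "h" "i" "j": 8 levels, "d" is now 2nd

levels(fac2)[c(2, 3, 4)] <- "B"
levels(fac2)      # "A" "B" "g" "h" "i" "j": 6 levels, "g" is now 3rd

levels(fac2)[c(3, 4, 5, 6)] <- "C"
levels(fac2)      # "A" "B" "C"

# Replacing all ten levels at once gives the same grouping.
fac4 <- fac
levels(fac4) <- c(rep("A", 3), rep("B", 3), rep("C", 4))
identical(as.character(fac2), as.character(fac4))   # TRUE
```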

Regards
Petr

> In my application, I had long factor names and Tinn-R just would not 
> accept statements running to several lines. Partial substitution was 
> desirable then. Having spent a considerable amount of time on this, I 
> would like to understand the underlying problem with method 1 as it is. 
> The deeper understanding could be useful for me later. 
> Thanking You,
> Ravi 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



