[R] Thought I understood factors but??

Liaw, Andy andy_liaw at merck.com
Mon Mar 1 20:23:04 CET 2010


From: David Winsemius
> 
> On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote:
> 
> > Hi,
> > consider the following
> >> a<-gl(3,3,9)
> >> a
> > [1] 1 1 1 2 2 2 3 3 3
> > Levels: 1 2 3
> >> levels(a)<-3:1
> 
> That may look like the same factor with its levels re-ordered, but you
> have merely re-labeled each level; the internal integer codes that
> represent the factor values stayed the same.
> 
> >> a
> > [1] 3 3 3 2 2 2 1 1 1

Indeed this is one of the (few, I believe) traps of R, because:

R> a
[1] 3 3 3 2 2 2 1 1 1
Levels: 3 2 1
R> as.numeric(a)
[1] 1 1 1 2 2 2 3 3 3
R> as.numeric(as.character(a))
[1] 3 3 3 2 2 2 1 1 1
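
A minimal sketch of the same trap in script form, assuming nothing beyond
base R. The names codes, labels and via_levels are illustrative only. The
last line uses the as.numeric(levels(f))[f] idiom mentioned in ?factor,
which gives the same result as going through as.character():

```r
a <- gl(3, 3, 9)        # levels "1" "2" "3", codes 1 1 1 2 2 2 3 3 3
levels(a) <- 3:1        # relabels only: codes unchanged, labels now "3" "2" "1"

codes      <- as.integer(a)                 # the internal integer codes
labels     <- as.numeric(as.character(a))   # the numbers the labels spell out
via_levels <- as.numeric(levels(a))[a]      # same values, indexed by the codes
```

Here codes still runs 1 to 3 while labels runs 3 to 1, which is why
as.numeric() on a factor is rarely what one wants.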

Andy

> > Levels: 3 2 1
> >> a<-gl(3,3,9)
> >> factor(a,levels=3:1)
> 
> That is the right way IMO to safely change the ordering of the levels
> without changing the "semantics" or the "meaning" of the factor level
> assignments.
> 
> Try:
> 
> levels(a) <- letters[4:6]
> a
> 
> [1] d d d e e e f f f
> Levels: d e f
>  > a <- factor(a, levels=letters[1:3])
>  > a
> [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> Levels: a b c
> 
> Using the second form sets any factor values that do not exist in the
> new level vector to NA, in this case all of them. In my mind it is
> better to get assignments to NA than to get assignments to incorrect
> levels.
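
The contrast between the two forms can be sketched side by side. This is
only an illustration in base R; the names x, relabeled, releveled and
n_missing are made up for the example:

```r
x <- factor(c("d", "d", "e", "f"))

# Relabeling: levels<- replaces the labels position by position,
# so every value silently takes on a new label.
relabeled <- x
levels(relabeled) <- c("a", "b", "c")

# Re-leveling: factor() matches the existing labels against the new
# level set, and labels that are absent from it become NA.
releveled <- factor(x, levels = c("a", "b", "c"))
n_missing <- sum(is.na(releveled))
```

After this, relabeled reads a a b c with no warning, while releveled is
all NA, so n_missing equals the length of x and the mismatch is visible.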
> 
>  > b <-factor(c(0,0,0,0, 1, 1))
>  > b
> [1] 0 0 0 0 1 1
> Levels: 0 1
>  > levels(b) <-c(1,0)
>  > b
> [1] 1 1 1 1 0 0   # No longer the same "meaning"
> Levels: 1 0
>  > b <-factor(c(0,0,0,0, 1, 1))
>  > b<- factor(b, levels=c(1,0))
>  > b
> [1] 0 0 0 0 1 1
> Levels: 1 0      # Only the ordering has changed but the meaning is the same
> 
> 
> The distinction matters especially when working with factors as
> components of data.frames.
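
For the data.frame case, a hedged sketch of reordering the levels of a
factor column while checking that no row changed meaning. The data frame
df and the names before, after and alt are invented for illustration;
relevel() is the base R function that moves one level to the front:

```r
df <- data.frame(id = 1:6, b = factor(c(0, 0, 0, 0, 1, 1)))
before <- as.character(df$b)    # row values as labels, before the change

# Reorder the levels of the column; rows are matched by label, not by code.
df$b <- factor(df$b, levels = c("1", "0"))
after <- as.character(df$b)

# Alternative: relevel() puts the chosen reference level first.
alt <- relevel(factor(c(0, 0, 0, 0, 1, 1)), ref = "1")
```

Comparing before and after with identical() is a cheap check that only
the level ordering changed and not the values in any row.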
> 
> 
> -- 
> David.
> 
> 
> 
> > [1] 1 1 1 2 2 2 3 3 3
> > Levels: 3 2 1
> > It is probably something obvious I missed, but reading the
> > documentation of factor and levels, I would have thought
> > that both should produce the same output as
> > factor(a,levels=3:1)
> > [1] 1 1 1 2 2 2 3 3 3
> > Levels: 3 2 1
> > The closest I could find in a quick search was this
> > http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2503.html
> >
> > Thanks
> > Nicholas
> >
> > sessionInfo()
> > R version 2.10.1 Patched (2009-12-20 r50794)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > attached base packages:
> > [1] splines   tcltk     stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] mvtnorm_0.9-9      latticeExtra_0.6-9 RColorBrewer_1.0-2 lattice_0.18-3
> > [5] nlme_3.1-96        XML_2.6-0          gsubfn_0.5-0       proto_0.3-8
> >
> > loaded via a namespace (and not attached):
> > [1] grid_2.10.1  tools_2.10.1
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 