# [R] Thought I understood factors but??

David Winsemius dwinsemius at comcast.net
Mon Mar 1 20:04:48 CET 2010

```On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote:

> Hi,
> consider the following
>> a<-gl(3,3,9)
>> a
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 1 2 3
>> levels(a)<-3:1

That may look like a re-ordered version of the same factor, but you have
merely re-labeled each level; the internal integer codes that represent
the factor values stayed the same.
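
# A minimal sketch of this point, assuming only base R; `a2` is a name
# introduced here for illustration and is not part of the original post.
# `levels<-` swaps the labels, while the integer codes stay untouched.
a2 <- gl(3, 3, 9)
codes_before <- as.integer(a2)         # 1 1 1 2 2 2 3 3 3
levels(a2) <- 3:1                      # relabel: code 1 is now printed as "3"
codes_after <- as.integer(a2)          # still 1 1 1 2 2 2 3 3 3
identical(codes_before, codes_after)   # TRUE
as.character(a2)                       # "3" "3" "3" "2" "2" "2" "1" "1" "1"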

>> a
> [1] 3 3 3 2 2 2 1 1 1
> Levels: 3 2 1
>> a<-gl(3,3,9)
>> factor(a,levels=3:1)

That is the right way IMO to safely change the ordering of the levels
without changing the "semantics" or the "meaning" of the factor level
assignments.

Try:

> levels(a) <- letters[4:6]
> a
[1] d d d e e e f f f
Levels: d e f
> a <- factor(a, levels=letters[1:3])
> a
[1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Levels: a b c

The second form sets any value whose level is absent from the new level
vector to NA, in this case all of them. In my mind it is better to get
NA than to have values silently assigned to incorrect levels.

> b <-factor(c(0,0,0,0, 1, 1))
> b
[1] 0 0 0 0 1 1
Levels: 0 1
> levels(b) <-c(1,0)
> b
[1] 1 1 1 1 0 0   # No longer the same "meaning"
Levels: 1 0
> b <-factor(c(0,0,0,0, 1, 1))
> b<- factor(b, levels=c(1,0))
> b
[1] 0 0 0 0 1 1
Levels: 1 0      # Only the ordering has changed; the meaning is the same

The distinction matters especially when factors are columns of
data.frames.
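
# A sketch of why this matters in a data frame, assuming base R only. The
# names `df`, `grp`, `y`, `m_reordered` and `m_relabelled` are made up here
# for illustration.
df <- data.frame(grp = factor(c("lo", "lo", "hi", "hi")),
                 y   = c(1, 2, 10, 20))
df$grp <- factor(df$grp, levels = c("lo", "hi"))   # reorder: rows keep their group
m_reordered <- tapply(df$y, df$grp, mean)          # lo = 1.5, hi = 15
levels(df$grp) <- c("hi", "lo")                    # relabel: groups silently swap
m_relabelled <- tapply(df$y, df$grp, mean)         # hi = 1.5, lo = 15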

--
David.

> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> It is probably something obvious I missed, but reading the documentation
> of factor and levels, I would have thought that both should produce the
> same output as
> factor(a,levels=3:1)
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> The closest I could find in a quick search was this
> http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2503.html
>
> Thanks
> Nicholas
>
> sessionInfo()
> R version 2.10.1 Patched (2009-12-20 r50794)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] splines   tcltk     stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] mvtnorm_0.9-9      latticeExtra_0.6-9 RColorBrewer_1.0-2 lattice_0.18-3
> [5] nlme_3.1-96        XML_2.6-0          gsubfn_0.5-0       proto_0.3-8
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1  tools_2.10.1