[R] Thought I understood factors but??
David Winsemius
dwinsemius at comcast.net
Mon Mar 1 20:04:48 CET 2010
On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote:
> Hi,
> consider the following
>> a<-gl(3,3,9)
>> a
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 1 2 3
>> levels(a)<-3:1
That may look like the same re-ordered factor, but you have instead merely
relabeled each level; the internal integer codes that represent the
factor values stayed the same.
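A quick way to check this is to compare the integer codes before and after the assignment. This is a minimal sketch in plain base R; the variable names are only illustrative:

```r
# levels<- only relabels: the underlying integer codes are untouched
a <- gl(3, 3, 9)
codes_before <- as.integer(a)    # 1 1 1 2 2 2 3 3 3
levels(a) <- 3:1                 # label for code 1 becomes "3", code 3 becomes "1"
codes_after <- as.integer(a)
print(identical(codes_before, codes_after))  # TRUE
print(as.character(a))           # the labels now read 3 3 3 2 2 2 1 1 1
```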
>> a
> [1] 3 3 3 2 2 2 1 1 1
> Levels: 3 2 1
>> a<-gl(3,3,9)
>> factor(a,levels=3:1)
That is the right way IMO to safely change the ordering of the levels
without changing the "semantics" or the "meaning" of the factor level
assignments.
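The same kind of check applied to the factor() call shows the opposite behaviour: each element keeps its label, and it is the integer codes that get remapped to the new level positions. Again a minimal base R sketch with illustrative names:

```r
# factor(x, levels = ...) matches each value against the new levels,
# so the values keep their meaning and only the level order changes
a <- gl(3, 3, 9)
a2 <- factor(a, levels = 3:1)
print(levels(a2))                                    # "3" "2" "1"
print(as.integer(a2))                                # 3 3 3 2 2 2 1 1 1
print(identical(as.character(a), as.character(a2)))  # TRUE
```

If you just want the existing levels in reverse order, factor(a, levels = rev(levels(a))) avoids typing the labels by hand.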
Try:
> levels(a) <- letters[4:6]
> a
[1] d d d e e e f f f
Levels: d e f
> a <- factor(a, levels=letters[1:3])
> a
[1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Levels: a b c
With the second form, any factor values that do not appear in the new
levels vector are set to NA; in this case, that is all of them. In my
mind it is better to get assignments to NA than assignments to
incorrect levels.
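If you would rather get an error than a column of NA's when the new levels do not cover the data, a small wrapper can check first. reorder_levels_strict below is a hypothetical helper for illustration, not a base R function:

```r
# Hypothetical helper (not part of base R): reorder levels like
# factor(f, levels = new_levels), but stop instead of silently
# turning values that are missing from new_levels into NA
reorder_levels_strict <- function(f, new_levels) {
  present <- unique(as.character(f[!is.na(f)]))
  unmatched <- setdiff(present, as.character(new_levels))
  if (length(unmatched) > 0) {
    stop("values not found in new levels: ", paste(unmatched, collapse = ", "))
  }
  factor(f, levels = new_levels)
}

a <- factor(c("d", "d", "e", "f"))
ok <- reorder_levels_strict(a, c("f", "e", "d"))
print(levels(ok))   # "f" "e" "d"
res <- tryCatch(reorder_levels_strict(a, letters[1:3]),
                error = function(e) conditionMessage(e))
print(res)          # "values not found in new levels: d, e, f"
```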
> b <-factor(c(0,0,0,0, 1, 1))
> b
[1] 0 0 0 0 1 1
Levels: 0 1
> levels(b) <-c(1,0)
> b
[1] 1 1 1 1 0 0 # No longer the same "meaning"
Levels: 1 0
> b <-factor(c(0,0,0,0, 1, 1))
> b<- factor(b, levels=c(1,0))
> b
[1] 0 0 0 0 1 1
Levels: 1 0 # Only the ordering has changed; the meaning is the same
This is especially so when working with factors as columns of
data frames.
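To make the data frame point concrete, here is a minimal sketch that uses the same 0/1 factor as a column, with made-up y values, summarised by group with tapply(). The column names and numbers are hypothetical:

```r
# Hypothetical data: a 0/1 factor column plus a numeric response
df <- data.frame(treated = factor(c(0, 0, 0, 0, 1, 1)),
                 y = c(10, 11, 12, 13, 50, 52))

relabeled <- df
levels(relabeled$treated) <- c(1, 0)   # swaps the labels on the rows
reordered <- df
reordered$treated <- factor(reordered$treated, levels = c(1, 0))  # reorders only

m_orig  <- tapply(df$y, df$treated, mean)                # 0: 11.5, 1: 51
m_relab <- tapply(relabeled$y, relabeled$treated, mean)  # 1: 11.5, 0: 51 (meaning lost)
m_reord <- tapply(reordered$y, reordered$treated, mean)  # 1: 51, 0: 11.5
print(m_orig); print(m_relab); print(m_reord)
```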
--
David.
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> It is probably something obvious I missed, but from reading the documentation
> of factor and levels, I would have thought that both should produce the
> same output as
> factor(a,levels=3:1)
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> The closest I could find in a quick search was this
> http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2503.html
>
> Thanks
> Nicholas
>
> sessionInfo()
> R version 2.10.1 Patched (2009-12-20 r50794)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] splines tcltk stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] mvtnorm_0.9-9 latticeExtra_0.6-9 RColorBrewer_1.0-2 lattice_0.18-3
> [5] nlme_3.1-96 XML_2.6-0 gsubfn_0.5-0 proto_0.3-8
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1 tools_2.10.1
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT