[Rd] factor() on a double vector
Simon Urbanek
simon.urbanek at r-project.org
Wed Feb 23 21:09:22 CET 2011
Herve,
the answer is simple: it's as.character(). The behavior has nothing to do with factor() or table() themselves.
> as.character(x)
[1] "3.66666666666667" "3.66666666666667" "3.66666666666666" "3.66666666666667"
That's what you are passing to factor, so you get the corresponding results.
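
A minimal sketch of the effect, for anyone who wants to reproduce it. It assumes IEEE 754 doubles and uses sprintf("%.15g", ...) as a stand-in for the 15-significant-digit conversion seen in the as.character() output above; the variable names are only illustrative.

```r
x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

# 15 significant digits: values that differ only beyond the
# 15th digit end up as the same string.
s15 <- sprintf("%.15g", x)

# 17 significant digits: enough to give every distinct double
# its own string.
s17 <- sprintf("%.17g", x)

n_exact <- length(unique(x))    # distinct doubles (exact comparison)
n15     <- length(unique(s15))  # distinct strings at 15 digits
n17     <- length(unique(s17))  # distinct strings at 17 digits

print(s15)
print(s17)
print(c(exact = n_exact, digits15 = n15, digits17 = n17))
```

The count at 17 digits agrees with exact comparison, while the count at 15 digits is smaller, which is why the levels fall somewhere between "strict" and "near" equality.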
Cheers,
Simon
On Feb 23, 2011, at 2:55 PM, Hervé Pagès wrote:
> Hi,
>
> When 'x' is a vector of doubles, it's not clear how 'factor(x)'
> compares its values in order to determine the levels. For example,
> here all the values in 'x' are "conceptually" the same:
>
>   x <- c(11/3,
>          2/3 + 4/3 + 5/3,
>          50 + 11/3 - 50,
>          7.00001 - 1000003/300000)
>
> However, due to machine rounding errors, they are not strictly equal:
>
> > duplicated(x)
> [1] FALSE FALSE FALSE FALSE
> > unique(x)
> [1] 3.666667 3.666667 3.666667 3.666667
>
> but they are nearly equal:
>
> > all.equal(x, rep(11/3, 4))
> [1] TRUE
>
> Now factor(), and therefore table() (which seems to be using factor()
> internally), have a different opinion:
>
> > factor(x)
> [1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
> Levels: 3.66666666666666 3.66666666666667
>
> > table(x)
> x
> 3.66666666666666 3.66666666666667
>                1                3
>
> So factor() doesn't seem to be using "strict equality" or "near
> equality" to determine the levels. What does it use? Sorry if I
> missed it but I couldn't find any information about this in its
> man page.
>
> Wouldn't it be better if factor() was consistent with either
> duplicated() or all.equal() instead of introducing its own way
> of comparing doubles that lies somewhere in between?
>
> Cheers,
> H.
>
> > sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.0
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
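
On the closing question in the quoted message (making the levels agree with either duplicated() or all.equal()), one possible approach is to choose the comparison explicitly before calling factor(). The following is only a sketch under stated assumptions: the 7 significant digits passed to signif() are an arbitrary illustrative tolerance, not something factor() does itself, and f_exact and f_near are hypothetical names.

```r
x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

# Strict equality: code each value by its position among the
# sorted unique doubles, so the levels are "1", "2", ... and
# every distinct double gets its own level.
f_exact <- factor(match(x, sort(unique(x))))

# Near equality: round to a chosen number of significant digits
# first (7 is an assumption made for this example), then build
# the factor from the rounded values.
f_near <- factor(signif(x, 7))

print(nlevels(f_exact))
print(nlevels(f_near))
print(table(f_near))
```

Rounding is not a true tolerance test: two values that lie very close together but on opposite sides of a rounding boundary still end up in different levels, so the number of digits has to be chosen with the data in mind.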