[Rd] factor() on a double vector

Simon Urbanek simon.urbanek at r-project.org
Wed Feb 23 21:09:22 CET 2011


Herve,

the answer is simple: it's as.character(). It has nothing to do with factor() or table() as such.

> as.character(x)
[1] "3.66666666666667" "3.66666666666667" "3.66666666666666" "3.66666666666667"

That's what factor() ends up working with (it converts x with as.character()), so you get the corresponding levels.
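
A minimal sketch of what that means in practice, using the x from your message. as.character() represents doubles to 15 significant digits, so values that differ only beyond that collapse into one level. The names f_strict and f_near are just illustrative, and the sprintf() and signif() variants are possible workarounds under stated assumptions, not something factor() does for you.

```r
# x as defined in the original message
x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

# factor() works on the character form of x, so its levels are
# exactly the distinct strings produced by as.character().
chr <- as.character(x)
f   <- factor(x)
levels_match <- setequal(levels(f), unique(chr))
codes_match  <- identical(as.integer(f), match(chr, levels(f)))

# Illustrative alternative 1 (an assumption about what you may want):
# 17 significant digits are enough to tell distinct finite doubles
# apart, so this gives one level per value that unique() keeps.
f_strict <- factor(sprintf("%.17g", x))

# Illustrative alternative 2: round to a tolerance you choose
# (10 significant digits here, picked arbitrarily) before calling
# factor(), so that nearly equal values share a level.
f_near <- factor(signif(x, 10))

print(levels_match)
print(codes_match)
print(nlevels(f_strict) == length(unique(x)))
print(nlevels(f_near))
```

Which of the two variants is appropriate depends on whether you want the levels to follow strict equality, as duplicated() does, or a tolerance of your own choosing.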

Cheers,
Simon



On Feb 23, 2011, at 2:55 PM, Hervé Pagès wrote:

> Hi,
> 
> When 'x' is a vector of doubles, it's not clear how 'factor(x)'
> compares its values in order to determine the levels. For example,
> here all the values in 'x' are "conceptually" the same:
> 
>  x <- c(11/3,
>         2/3 + 4/3 + 5/3,
>         50 + 11/3 - 50,
>         7.00001 - 1000003/300000)
> 
> However, due to machine rounding errors, they are not strictly equal:
> 
>  > duplicated(x)
>  [1] FALSE FALSE FALSE FALSE
>  > unique(x)
>  [1] 3.666667 3.666667 3.666667 3.666667
> 
> but they are nearly equal:
> 
>  > all.equal(x, rep(11/3, 4))
>  [1] TRUE
> 
> Now factor(), and therefore table() (which seems to be using factor()
> internally), have a different opinion:
> 
>  > factor(x)
>  [1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
>  Levels: 3.66666666666666 3.66666666666667
> 
>  > table(x)
>  x
>  3.66666666666666 3.66666666666667
>                 1                3
> 
> So factor() doesn't seem to be using "strict equality" or "near
> equality" to determine the levels. What does it use? Sorry if I
> missed it but I couldn't find any information about this in its
> man page.
> 
> Wouldn't it be better if factor() were consistent with either
> duplicated() or all.equal() instead of introducing its own way
> of comparing doubles that lies somewhere in between?
> 
> Cheers,
> H.
> 
> > sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8       LC_NAME=C
> [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> loaded via a namespace (and not attached):
> [1] tools_2.12.0
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 


