[Rd] factor() on a double vector

Hervé Pagès hpages at fhcrc.org
Wed Feb 23 20:55:16 CET 2011


Hi,

When 'x' is a vector of doubles, it's not clear how 'factor(x)'
compares its values in order to determine the levels. For example,
here all the values in 'x' are "conceptually" the same:

   x <- c(11/3,
          2/3 + 4/3 + 5/3,
          50 + 11/3 - 50,
          7.00001 - 1000003/300000)

However, due to machine rounding errors, they are not strictly equal:

   > duplicated(x)
   [1] FALSE FALSE FALSE FALSE
   > unique(x)
   [1] 3.666667 3.666667 3.666667 3.666667

but they are nearly equal:

   > all.equal(x, rep(11/3, 4))
   [1] TRUE
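To see how far apart the values actually are, one can print them with 17 significant digits, which is enough to tell any two distinct doubles apart, and look at their differences from 11/3. This is only an illustration. The exact digits printed may vary by platform:

```r
x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000)

## 17 significant digits distinguish any two distinct doubles, so this
## shows the values as they are actually stored.
print(sprintf("%.17g", x))

## Absolute and relative distance of each value from 11/3
## (the first one is 0 by construction).
d <- x - 11/3
print(d)
print(abs(d) / (11/3))

## all.equal() uses a relative tolerance (default sqrt(.Machine$double.eps),
## about 1.5e-8), far above these differences.
print(isTRUE(all.equal(x, rep(11/3, 4))))
```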

Now factor(), and therefore table() (which seems to be using factor()
internally), have a different opinion:

   > factor(x)
   [1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
   Levels: 3.66666666666666 3.66666666666667

   > table(x)
   x
   3.66666666666666 3.66666666666667
                  1                3

So factor() doesn't seem to be using either "strict equality" or "near
equality" to determine the levels. What does it use? Sorry if I
missed it, but I couldn't find any information about this in its
man page.
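One hypothesis worth checking (an assumption on my part, not something the man page states) is that factor() builds its levels from as.character(x), which represents doubles to 15 significant digits, and then matches the resulting strings. If so, two elements should share a level exactly when their character forms are identical. A small probe of that idea:

```r
x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000)

## Assumption being tested: factor() compares the character forms of the
## values rather than the doubles themselves.
chr <- as.character(x)
print(chr)

f <- factor(x)

## Pairwise "same group?" matrices: one from factor()'s integer codes,
## one from exact string equality of as.character(x).
same_level <- outer(as.integer(f), as.integer(f), "==")
same_chr   <- outer(chr, chr, "==")

## TRUE if the two groupings coincide for this x.
print(identical(same_level, same_chr))
```

If this prints TRUE, the levels are being decided by the 15-digit string representation, which would explain why factor() ends up between duplicated() and all.equal().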

Wouldn't it be better if factor() were consistent with either
duplicated() or all.equal(), instead of introducing its own way
of comparing doubles that lies somewhere in between?
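For illustration, here is a rough sketch of what each of those two behaviours could look like. Both factor_strict() and factor_near() are hypothetical helpers, not part of base R. The first groups by exact equality, like unique()/duplicated(). The second groups values that all.equal() considers equal, using a simple scan over the sorted values (my own choice, one of several possible):

```r
x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000)

## Hypothetical helper (not part of base R): levels follow strict equality,
## so the grouping agrees with unique() and duplicated().
factor_strict <- function(x) {
    u <- sort(unique(x))              # distinct doubles, NAs dropped by sort()
    codes <- match(x, u)              # match() compares doubles exactly
    ## "%.17g" keeps enough digits that distinct doubles get distinct labels
    factor(codes, levels = seq_along(u), labels = sprintf("%.17g", u))
}

## Hypothetical helper (not part of base R): values that all.equal()
## considers equal share a level. Sorted unique values are scanned and a
## new level starts whenever a value is not all.equal() to the first
## value of the current level.
factor_near <- function(x, tolerance = sqrt(.Machine$double.eps)) {
    u <- sort(unique(x))
    group <- integer(length(u))
    g <- 0L
    anchor <- NA_real_
    for (i in seq_along(u)) {
        if (g == 0L ||
            !isTRUE(all.equal(anchor, u[i], tolerance = tolerance))) {
            g <- g + 1L               # start a new level at this value
            anchor <- u[i]
        }
        group[i] <- g
    }
    reps <- u[!duplicated(group)]     # smallest value of each level
    codes <- group[match(x, u)]
    factor(codes, levels = seq_len(g), labels = sprintf("%.17g", reps))
}

print(factor_strict(x))
print(factor_near(x))
```

The scan in factor_near() is order-dependent: each value is compared with the first value of its level rather than with its neighbour, so a long chain of nearly equal values can still be split. Any tolerance-based grouping has to make some such choice, since "nearly equal" is not transitive.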

Cheers,
H.

 > sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
  [7] LC_PAPER=en_US.utf8       LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.12.0

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
