[Rd] factor() on a double vector
Hervé Pagès
hpages at fhcrc.org
Wed Feb 23 21:17:25 CET 2011
On 02/23/2011 12:09 PM, Simon Urbanek wrote:
> Herve,
>
> The answer is simple: it's as.character(). It has nothing to do with factor() or table().
>
>> as.character(x)
> [1] "3.66666666666667" "3.66666666666667" "3.66666666666666" "3.66666666666667"
>
> That's what you are passing to factor, so you get the corresponding results.
I see. Thanks, Simon.
I had missed this in ?factor:

   levels: an optional vector of the values that ‘x’ might have taken.
           The default is the unique set of values taken by
           ‘as.character(x)’, ...
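
For the record, here is a rough R sketch of that default rule. default_levels() is a hypothetical helper written for illustration, not the actual code of factor(). It assumes, as the as.character(x) output above suggests, that the conversion keeps 15 significant digits, and it emulates that with sprintf("%.15g", ...) so the sketch does not depend on the formatting rules of a particular R version. Two doubles therefore share a level exactly when their 15-digit strings agree, a criterion that lies between the exact comparison of duplicated() and the tolerance of all.equal().

```r
# Sketch only: 'default_levels' is a hypothetical helper, not R's source.
# It mimics the documented default for 'levels': the unique values of the
# character conversion of x, in increasing order of x. The conversion is
# assumed to keep 15 significant digits and is emulated with "%.15g".
default_levels <- function(x) {
    y <- sort(unique(x))          # distinct doubles, in increasing order
    unique(sprintf("%.15g", y))   # merge those whose 15-digit strings agree
}

x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

anyDuplicated(x)                    # 0: no two elements are identical doubles
isTRUE(all.equal(x, rep(11/3, 4)))  # TRUE: all nearly equal to 11/3
default_levels(x)                   # "3.66666666666666" "3.66666666666667"
```

The third value (50 + 11/3 - 50) is the only one whose 15-digit string rounds down, which is why it ends up alone in its own level.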
Cheers,
H.
>
> Cheers,
> Simon
>
>
>
> On Feb 23, 2011, at 2:55 PM, Hervé Pagès wrote:
>
>> Hi,
>>
>> When 'x' is a vector of doubles, it's not clear how 'factor(x)'
>> compares its values in order to determine the levels. For example,
>> here all the values in 'x' are "conceptually" the same:
>>
>>   x <- c(11/3,
>>          2/3 + 4/3 + 5/3,
>>          50 + 11/3 - 50,
>>          7.00001 - 1000003/300000)
>>
>> However, due to machine rounding errors, they are not strictly equal:
>>
>> > duplicated(x)
>> [1] FALSE FALSE FALSE FALSE
>> > unique(x)
>> [1] 3.666667 3.666667 3.666667 3.666667
>>
>> but they are nearly equal:
>>
>> > all.equal(x, rep(11/3, 4))
>> [1] TRUE
>>
>> Now factor(), and therefore table() (which seems to be using factor()
>> internally), have a different opinion:
>>
>> > factor(x)
>> [1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
>> Levels: 3.66666666666666 3.66666666666667
>>
>> > table(x)
>> x
>> 3.66666666666666 3.66666666666667
>>                1                3
>>
>> So factor() doesn't seem to be using "strict equality" or "near
>> equality" to determine the levels. What does it use? Sorry if I
>> missed it, but I couldn't find any information about this in its
>> man page.
>>
>> Wouldn't it be better if factor() were consistent with either
>> duplicated() or all.equal(), instead of introducing its own way
>> of comparing doubles that lies somewhere in between?
>>
>> Cheers,
>> H.
>>
>> > sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.12.0
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>