[Rd] Match .3 in a sequence

Petr Savicky savicky at cs.cas.cz
Tue Mar 17 07:07:49 CET 2009


On Mon, Mar 16, 2009 at 07:39:23PM -0400, Stavros Macrakis wrote:
...
> Let's look at the extraordinarily poor behavior I was mentioning. Consider:
> 
> nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
> [1] 0.3 0.3 0.3 0.3
> 
> Though they all print as .3 with the default precision (which is
> normal and expected), they are all different from .3:
> 
> nums - .3 =>  -3.885781e-16 -2.220446e-16  2.220446e-16  3.885781e-16
> 
> When we convert nums to a factor, we get:
> 
> fact <- as.factor(nums); fact
> [1] 0.300000000000000 0.3               0.3               0.300000000000000
> Levels: 0.300000000000000 0.3 0.3 0.300000000000000
> 
> Not clear what the difference between 0.300000000000000 and 0.3 is
> supposed to be, nor why some 0.300000000000000 are < .3 and others are
...

When a factor is created from a numeric vector, the list of levels and the
assignment of the original elements to the levels are determined in
double precision. Since the four elements of the vector are distinct,
we get four distinct levels. Only after this is the levels attribute
formed using as.character(). This can map different numbers to the same
string, so in the example above it yields a factor that contains
repeated levels.
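
To make the mismatch concrete, here is a small sketch that counts distinct
values and distinct strings for the same vector. It uses sprintf("%.15g")
as a stand-in for the conversion to character; that substitution is an
assumption made for illustration, since the exact output of as.character()
depends on the R version.

```r
# Values from the example above: four distinct doubles close to 0.3.
nums <- .3 + 2e-16 * c(-2, -1, 1, 2)

# The numeric values themselves are all different ...
n_distinct_values <- length(unique(nums))

# ... but formatted with 15 significant digits they collapse to one string.
# sprintf("%.15g") is only a stand-in for the conversion to character.
strings15 <- sprintf("%.15g", nums)
n_distinct_strings15 <- length(unique(strings15))

# 17 significant digits are enough to keep distinct doubles apart.
strings17 <- sprintf("%.17g", nums)
n_distinct_strings17 <- length(unique(strings17))

print(c(values   = n_distinct_values,
        digits15 = n_distinct_strings15,
        digits17 = n_distinct_strings17))
```

Levels computed from the values and labels computed from 15-digit strings
therefore cannot agree for this vector.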

The repeated levels can be avoided by converting to character first:

  fact <- as.factor(as.character(nums)); fact
  [1] 0.300000000000000 0.3               0.3               0.300000000000000
  Levels: 0.3 0.300000000000000
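
The effect of going through character can be checked without relying on any
particular version of as.character(). The sketch below starts from the four
strings exactly as they appear in the output above (typed in by hand, which
is the only assumption) and builds the factor from them.

```r
# The four strings as shown in the output above, entered literally.
labels <- c("0.300000000000000", "0.3", "0.3", "0.300000000000000")

# A factor built from strings takes its levels from the unique strings,
# so no level can be repeated.
fact <- factor(labels)

print(levels(fact))      # two levels
print(as.integer(fact))  # elements with equal strings share one code
```

Elements 1 and 4 end up in one level and elements 2 and 3 in the other,
although all four underlying numbers were different.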

The reason for having both 0.300000000000000 and 0.3 is that as.character()
works the same way as printing with digits=15. The R printing mechanism
works in two steps. In the first step, it tries to determine the shortest
format needed to achieve the required relative precision of the output.
This step uses an algorithm that need not give an accurate result.
In the second step, the number is printed using the C function sprintf
with the chosen format. This step is accurate, so we cannot get wrong
digits. We can only get the wrong number of digits.
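
The two steps can be sketched in R itself. This is not the code R uses
internally; needed_digits() and two_step_format() are hypothetical helpers
written only to illustrate the idea, and here the first step is done
exactly, by inspecting the 15-digit decimal expansion.

```r
# Step 1 (hypothetical helper): the fewest significant digits, at most
# max_digits, that reproduce x as rounded to max_digits significant digits.
needed_digits <- function(x, max_digits = 15) {
  full <- sprintf(paste0("%.", max_digits - 1, "e"), x)  # e.g. "3.00000000000000e-01"
  mantissa <- sub("e.*$", "", full)        # drop the exponent
  digits <- gsub("[^0-9]", "", mantissa)   # drop sign and decimal point
  digits <- sub("0+$", "", digits)         # drop trailing zeros
  pmax(nchar(digits), 1L)                  # zero still needs one digit
}

# Step 2: print with the chosen number of digits using sprintf.
two_step_format <- function(x, max_digits = 15) {
  sprintf(paste0("%.", needed_digits(x, max_digits), "g"), x)
}

nums <- .3 + 2e-16 * c(-2, -1, 1, 2)
print(needed_digits(nums))
print(two_step_format(nums))
```

With an exact first step, all four values need only one significant digit at
15-digit precision. An output such as 0.300000000000000 therefore has correct
digits, but more of them than necessary.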

In order to avoid using 15 digits in as.character(), we can use round(, digits)
with a digits argument appropriate for the current situation.

  fact <- as.factor(round(nums,digits=1)); fact
  [1] 0.3 0.3 0.3 0.3
  Levels: 0.3

Petr.


