[R] rep() fails at times=0.29*100
(Ted Harding)
Ted.Harding at wlandres.net
Tue Apr 9 18:56:36 CEST 2013
[See at end]
On 09-Apr-2013 16:11:18 Jorge Fernando Saraiva de Menezes wrote:
> Dear list,
>
> I have found an unusual behavior and would like to check if it is a
> possible bug, and if updating R would fix it. I am not sure if I should post
> it to this mailing list, but I don't know where the R bug tracker is. The only
> mention I found that might relate to this is "If times is a computed quantity
> it is prudent to add a small fuzz." in the rep() help, but I am not sure if it
> is related to this particular problem.
>
> Here it goes:
>
>> rep(TRUE,29)
> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE TRUE
>> rep(TRUE,0.29*100)
> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE
>> length(rep(TRUE,29))
> [1] 29
>> length(rep(TRUE,0.29*100))
> [1] 28
>
> Just to make sure:
>> 0.29*100
> [1] 29
>
> This behavior seems to be independent of what is being repeated (rep()'s
> first argument)
>> length(rep(1,0.29*100))
> [1] 28
>
> Also it occurs only with the 0.29.
>> length(rep(1,0.291*100))
> [1] 29
>> for(a in seq(0,1,0.01)) {print(sum(rep(TRUE,a*100)))}
> # also shows correct values from 0 to 1 except for 0.29.
>
> I have confirmed that this behavior happens in more than one machine
> (though I only have session info of this one)
>
>
>> sessionInfo()
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> [1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
> LC_MONETARY=Portuguese_Brazil.1252
> [4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] spatstat_1.31-1 deldir_0.0-21 mgcv_1.7-22
>
> loaded via a namespace (and not attached):
> [1] grid_2.15.3 lattice_0.20-13 Matrix_1.0-11 nlme_3.1-108
> tools_2.15.3
The basic issue is, believe it or not, that although it appears that:
0.29*100
# [1] 29
in "reality":
0.29*100 == 29
# [1] FALSE
In other words, as computed by R, 0.29*100 is not exactly equal to 29:
29 - 0.29*100
# [1] 3.552714e-15
The difference is tiny, but it is sufficient to make 0.29*100 slightly
smaller than 29, and rep() truncates a non-integer "times" towards zero.
So rep(TRUE,0.29*100) uses the largest integer not exceeding 0.29*100,
i.e. 28. Hence the recommendation in the help to "add a small fuzz".
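To make the truncation visible, here is a minimal sketch. The variable names and the fuzz value 1e-9 are arbitrary choices for illustration, not something the rep() help prescribes:

```r
# The default print shows 7 significant digits, which hides the shortfall.
x <- 0.29 * 100
print(x, digits = 17)   # prints a value just below 29

# trunc() discards the fractional part, as rep() does with "times".
trunc(x)
# [1] 28
length(rep(TRUE, x))
# [1] 28

# A small fuzz (1e-9 here, an illustrative assumption) lifts x back over 29.
fuzz <- 1e-9
length(rep(TRUE, x + fuzz))
# [1] 29
```

The fuzz only needs to exceed the rounding error while staying far below 1, so that values which are genuinely between two integers are not pushed up.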
On the other hand, when you use rep(1,0.291*100) you will be OK.
This is because:
29 - 0.291*100
# [1] -0.1
so 0.291*100 is comfortably greater than 29 (but well clear of 30).
The reason for the small inaccuracy (compared with "mathematical
truth") is that R performs numerical calculations using binary
floating-point representations of numbers. There is no exact binary
representation of 0.29, so the stored value is only the nearest
representable number, and the result of 0.29*100 is slightly inaccurate.
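A short sketch of how one might inspect this. The last two lines are an extra check of my own (the name k is arbitrary) for whether other hundredths fall short in the same way. I have left out the outputs I am not certain of:

```r
# sprintf() with many decimals shows the value actually stored for 0.29.
sprintf("%.20f", 0.29)

# The product is not identical to 29 ...
identical(0.29 * 100, 29)
# [1] FALSE

# ... but it is equal within the default tolerance of all.equal(),
# which is about 1.5e-8.
isTRUE(all.equal(0.29 * 100, 29))
# [1] TRUE

# List the hundredths k/100 whose product with 100 lands below k.
k <- 0:100
k[(k / 100) * 100 < k]
```

Note that k/100 is computed by division here, which may differ in the last bit from the values produced by seq(0,1,0.01).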
If you do need to do this sort of thing (e.g. the value of "times"
will be the result of a calculation) then one useful precaution
could be to round the result:
round(0.29*100)
# [1] 29
29-round(0.29*100)
# [1] 0
length(rep(TRUE,0.29*100))
# [1] 28
length(rep(TRUE,round(0.29*100)))
# [1] 29
(The default for round() is 0 decimal places, i.e. it rounds to
an integer).
So, compared with:
0.29*100 == 29
# [1] FALSE
we have:
round(0.29*100) == 29
# [1] TRUE
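If you want to round only when the computed value is really meant to be a whole number, something along these lines might do. Note that as_count() and its tolerance are hypothetical, made up for this illustration, and are not part of base R:

```r
# Hypothetical helper: convert a computed count to an integer, rounding
# only when the value lies within `tol` of a whole number.
as_count <- function(x, tol = 1e-8) {
  r <- round(x)
  off <- abs(x - r) > tol
  if (any(off)) {
    stop("value is not close to a whole number: ", x[off][1])
  }
  as.integer(r)
}

length(rep(TRUE, as_count(0.29 * 100)))
# [1] 29
```

With this, as_count(0.291*100) stops with an error instead of silently giving 29, which may or may not be what you want.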
Hoping this helps,
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 09-Apr-2013 Time: 17:56:33
This message was sent by XFMail