[R] Question about quantile.default
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Oct 4 10:34:14 CEST 2008
Ben Bolker wrote:
> Lo, Ken <ken.lo <at> roche.com> writes:
>
>
>> Hi all,
>>
>> I am running into a snag using quantile function in stats. Basically, I
>> don't understand why the loop below throws the error that it does.
>>
>> test.data <- rnorm(1000, 0, 1)
>>
>> for (i in seq(0.00001, 0.001, 0.00001)){
>> test <- quantile(test.data, probs=seq(0,1,i));
>> print(i);
>> }
>>
>> It runs fine from 1e-05 to 0.00024, but then throws the error
>>
>> Error in quantile.default(test.data, probs=seq(0,1,i)):
>> 'probs' outside [0,1]
>>
>> I tested it some more by using
>>
>> test <- quantile(test.data, probs=seq(0,1,0.00024));
>> test <- quantile(test.data, probs=seq(0,1,0.00025));
>>
>> both ran fine. So, I'm baffled as to what the error actually is.
>>
>>
>
> This is a result of numerical imprecision
> (see FAQ 7.31), but I think it's not your fault.
>
> set.seed(1001) ## for reproducibility, although not
> ## really necessary here
>
> test.data <- rnorm(1000,0,1)
> for (i in seq(0.00001, 0.001, 0.00001)){
> test <- quantile(test.data, probs=seq(0,1,i))
> print(i)
> }
>
> fails as reported
> let's see what's going on --
>
>
> zz <- seq(1e-5,1e-3,1e-5)
> test <- quantile(test.data, probs=seq(0,1,2.5e-4)) ## OK
> test <- quantile(test.data, probs=seq(0,1,zz[25])) ## not OK
>
> zz[25]-2.5e-4
> [1] 5.421011e-20
>
> so let's look closer ...
>
>
> zzz <- range(seq(0,1,2.5e-4))
> zzz
> zzz-c(0,1)
>
> zzz2 <- range(seq(0,1,zz[25]))
> zzz2
> zzz2-c(0,1)
> [1] 0.000000e+00 2.220446e-16
>
> so the results of the seq() call exceed 1 by a little bit.
>
> I think this may actually count as a bug in R (congratulations!)
> the documentation for ?seq says.
>
> ## The second form generates 'from, from+by', ..., up to the sequence
> ## value LESS THAN OR EQUAL TO 'TO'. Specifying 'to - from' and 'by'
> ## of opposite signs is an error.
> [emphasis added]
>
> 1 + 2.220446e-16 (which is the maximum of the
> sequence that 'seq' generates) IS NOT <= 1 ...
>
> In the meantime, here's a workaround ...
>
> for (i in seq(0.00001, 0.001, 0.00001)){
> test <- quantile(test.data, probs=pmin(seq(0,1,i),1))
> print(i)
> }
>
> If I don't hear otherwise I will submit this as
> a bug to r-devel ...
>
It IS deliberate.... We discussed this when R was still in the toddler
stages.
If you don't allow slight overruns like that, you end up with sequences
sometimes being one element short, so seq() has a small "do what I mean"
fudge factor built in. We might polish off the end value though, or fix
quantile() to be less hysterical about values slightly out of range.
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list