[R] Question about quantile.default

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Oct 4 10:34:14 CEST 2008


Ben Bolker wrote:
> Lo, Ken <ken.lo <at> roche.com> writes:
>
>   
>> Hi all,
>>
>> I am running into a snag using quantile function in stats.  Basically, I
>> don't understand why the loop below throws the error that it does.
>>
>> test.data <- rnorm(1000, 0, 1)
>>
>> for (i in seq(0.00001, 0.001, 0.00001)){
>> 	test <- quantile(test.data, probs=seq(0,1,i));
>> 	print(i);
>> }
>>
>> It runs fine from 1e-05 to 0.00024, but then throws the error
>>
>> Error in quantile.default(test.data, probs=seq(0,1,i)):
>> 'probs' outside [0,1]
>>
>> I tested it some more by using
>>
>> test <- quantile(test.data, probs=seq(0,1,0.00024));
>> test <- quantile(test.data, probs=seq(0,1,0.00025));
>>
>> both ran fine.  So, I'm baffled as to what the error actually is.
>>
>>     
>
>  This is a result of numerical imprecision
> (see FAQ 7.31), but I think it's not your fault.
>
> set.seed(1001) ## for reproducibility, although not
>  ## really necessary here
>
> test.data <- rnorm(1000,0,1)
> for (i in seq(0.00001, 0.001, 0.00001)){
>  	test <- quantile(test.data, probs=seq(0,1,i))
>  	print(i)
> }
>
> fails as reported
> let's see what's going on --
>
>   
> zz <- seq(1e-5,1e-3,1e-5)
> test <- quantile(test.data, probs=seq(0,1,2.5e-4)) ## OK
> test <- quantile(test.data, probs=seq(0,1,zz[25])) ## not OK
>
> zz[25]-2.5e-4
> [1] 5.421011e-20
>
> so let's look closer ...
>
>
> zzz <- range(seq(0,1,2.5e-4))
> zzz
> zzz-c(0,1)
>
> zzz2 <- range(seq(0,1,zz[25]))
> zzz2
> zzz2-c(0,1)
> [1] 0.000000e+00 2.220446e-16
>
> so the results of the seq() call exceed 1 by a little bit.
>
> I think this may actually count as a bug in R (congratulations!)
> the documentation for ?seq says.
>
> ##     The second form generates 'from, from+by', ..., up to the sequence
> ##     value LESS THAN OR EQUAL TO 'TO'.  Specifying 'to - from' and 'by'
> ##     of opposite signs is an error.
> [emphasis added]
>
> 1 + 2.220446e-16 (which is the maximum of the
> sequence that 'seq' generates) IS NOT <= 1 ...
>
> In the meantime, here's a workaround ...
>
> for (i in seq(0.00001, 0.001, 0.00001)){
>   test <- quantile(test.data, probs=pmin(seq(0,1,i),1))
>   print(i)
> }
>
>   If I don't hear otherwise I will submit this as
> a bug to r-devel  ...
>  
It IS deliberate.... We discussed this when R was still in the toddler 
stages.

If you don't allow slight overruns like that, you end up with sequences 
sometimes being one element short, so seq() has a small "do what I mean" 
fudge factor built in. We might polish off the end value though, or fix 
quantile() to be less hysterical about values slightly out of range.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



More information about the R-help mailing list