[Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

Duncan Murdoch murdoch at stats.uwo.ca
Tue Feb 22 22:36:20 CET 2005


On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
<tplate at blackmesacapital.com> wrote :

>Is it a bug that quantile() can return a lower sample quantile for a higher 
>probability?
>
> > ##### quantile returns decreasing results with increasing probs (data at 
>the end of the message)
> > quantile(x2, (0:5)/5)
>            0%           20%           40%           60%           80%
>-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
>          100%
>  0.2905596324
> > ##### the 40% quantile has a lower value than the 20% quantile
> > diff(quantile(x2, (0:5)/5))
>           20%           40%           60%           80%          100%
>  5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01
> >
>
>This only happens for type=7:
>
> > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, 
>type=type))<0), "\n")
>1 FALSE
>2 FALSE
>3 FALSE
>4 FALSE
>5 FALSE
>6 FALSE
>7 TRUE
>8 FALSE
>9 FALSE
> >
>
>I know this is at the limits of machine precision, but it still seems 
>wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on 
>my machine (according to the R quantile documentation, the S-PLUS 
>calculations correspond to type=7).

I'd say it's not a bug in that rounding error is something you should
expect in a calculation like this, but it does look wrong.  And it
isn't only restricted to type 7.  If you make a vector of copies of
that bad value, you'll see it in more cases:

> x <- rep(-0.00090419678460984, 602)
> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, 
+ type=type))<0), "\n")
1 FALSE 
2 FALSE 
3 FALSE 
4 FALSE 
5 TRUE 
6 TRUE 
7 TRUE 
8 FALSE 
9 TRUE 

(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.  

The offending line appears to be this one:

qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])

The equivalent calculation in the approx function (which doesn't
appear to have this problem) is

qs[i] + (x[hi[i]] - qs[i]) * h

Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)

Duncan Murdoch



More information about the R-devel mailing list