[Rd] bug? quantile() can return decreasing sample
quantiles for increasing probabilities
Tony Plate
tplate at blackmesacapital.com
Tue Feb 22 23:14:00 CET 2005
Thanks for the diagnosis.
The reason I came across this was that I use both S-PLUS and R and I often
use the results of quantile() as the breaks for cut(). In S-PLUS, cut()
stops with an error if breaks has any decreasing values. Thus this example
caused an S-PLUS function to unexpectedly stop with an error. However,
cut() in R behaves differently: it sorts its breaks and thus does not
object to decreasing values in breaks. Another difference is that cut() in
R stops with an error if any breaks are duplicated, which, I guess, means
that in R I should use findInterval() instead of cut() for this
task. Except that findInterval() in R stops with an error if its breaks
are unsorted...
> findInterval(x2, quantile(x2, (0:5)/5))
Error in findInterval(x2, quantile(x2, (0:5)/5)) :
'vec' must be sorted non-decreasingly
>
-- Tony Plate
At Tuesday 02:36 PM 2/22/2005, Duncan Murdoch wrote:
>On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
><tplate at blackmesacapital.com> wrote :
>
> >Is it a bug that quantile() can return a lower sample quantile for a higher
> >probability?
> >
> > > ##### quantile returns decreasing results with increasing probs (data at
> >the end of the message)
> > > quantile(x2, (0:5)/5)
> > 0% 20% 40% 60% 80%
> >-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
> > 100%
> > 0.2905596324
> > > ##### the 40% quantile has a lower value than the 20% quantile
> > > diff(quantile(x2, (0:5)/5))
> > 20% 40% 60% 80% 100%
> > 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01
> > >
> >
> >This only happens for type=7:
> >
> > > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
> >type=type))<0), "\n")
> >1 FALSE
> >2 FALSE
> >3 FALSE
> >4 FALSE
> >5 FALSE
> >6 FALSE
> >7 TRUE
> >8 FALSE
> >9 FALSE
> > >
> >
> >I know this is at the limits of machine precision, but it still seems
> >wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on
> >my machine (according to the R quantile documentation, the S-PLUS
> >calculations correspond to type=7).
>
>I'd say it's not a bug in that rounding error is something you should
>expect in a calculation like this, but it does look wrong. And it
>isn't only restricted to type 7. If you make a vector of copies of
>that bad value, you'll see it in more cases:
>
> > x <- rep(-0.00090419678460984, 602)
> > for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
>+ type=type))<0), "\n")
>1 FALSE
>2 FALSE
>3 FALSE
>4 FALSE
>5 TRUE
>6 TRUE
>7 TRUE
>8 FALSE
>9 TRUE
>
>(at least on Windows). What's happening is that R is doing linear
>interpolation between two equal values, and not getting the same value
>back, because of rounding.
>
>The offending line appears to be this one:
>
>qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
>
>The equivalent calculation in the approx function (which doesn't
>appear to have this problem) is
>
>qs[i] + (x[hi[i]] - qs[i]) * h
>
>Can anyone think of why this would not be better? (The same sort of
>calculation shows up again later in quantile().)
>
>Duncan Murdoch
More information about the R-devel
mailing list