[R] quantile() returns a value outside the data range

Wed Aug 22 08:34:38 CEST 2007

Hi,

yes, it is all about numerical accuracy;

x <- c(6.402611, 6.402611, 6.420587)
x001 <- sapply(1:9, function(type) quantile(x, probs=0.01, type=type))
names(x001) <- NULL
print(x001)
## [1] 6.402611 6.402611 6.402611 6.402611 6.402611
## [6] 6.402611 6.402611 6.402611 6.402611

diff(x001)
## [1]  0.000000e+00  0.000000e+00  0.000000e+00
## [4]  0.000000e+00 -8.881784e-16  8.881784e-16
## [7] -8.881784e-16  8.881784e-16

So, (even) the different types of 1%-quantile estimates of 'x' differ.
 Thus, the equality part of your comparison (<=) (with sort(x)[1]) can
only return TRUE for some of the elements, but not all.

The following should do for "all.equal()" for '<=':  Assume 'x' and
'y' are scalars for simplicity.  To test 'x == y' with some precision,
you use all.equal(x,y, tolerance=eps), which tests |x-y| < eps <=>
-eps < x-y < eps where eps is a small positive number specifying your
precision.

To test 'x <= y' (<=> 'x - y <= 0') with some precision, you want to
test x - y <= eps, i.e.

lessOrEqual <- function(x, y, tolerance=.Machine$double.eps^0.5, ...) {
  (x - y <= tolerance);
}

sapply(x, x001, lessOrEqual)
> sapply(x, lessOrEqual, x001)
##       [,1] [,2]  [,3]
##  [1,] TRUE TRUE FALSE
##  [2,] TRUE TRUE FALSE
##  [3,] TRUE TRUE FALSE
##  [4,] TRUE TRUE FALSE
##  [5,] TRUE TRUE FALSE
##  [6,] TRUE TRUE FALSE
##  [7,] TRUE TRUE FALSE
##  [8,] TRUE TRUE FALSE
##  [9,] TRUE TRUE FALSE

Next time, please give a cut'n'paste reproducible example.

/Henrik

On 8/21/07, Jenny Bryan <jenny at stat.ubc.ca> wrote:
> Hello,
>
> I am getting an unexpected result from quantile().  Specifically, the
> return value falls outside the range of the data, which I wouldn't
> have thought possible for a weighted average of 2 order statistics.
> Is this an unintended accuracy issue or am I being too casual in my
> comparison (is there some analogue of 'all.equal' for "<=")?
>
> Small example:
>
>  > foo <- h2[,"17"][h2$plate==222] # some data I have, details not
> interesting
>  > foo <- sort(foo)[1:3]            # can demo with the 3 smallest
> values
>  > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
>  > fooQtile <- rep(0,9)
>  > for(i in 1:9) {               # compute 0.01 quantile, for all
> 'types'
> +   fooQtile[i] <- quantile(foo,probs=0.01,type=i)
> +   yo[,i+1] <- foo <= fooQtile[i]
> + }
>  > names(yo) <- c("myData",paste("qType",1:9,sep=""))
>  > yo
>      myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8
> qType9
> 1 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE
> TRUE
> 2 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE
> TRUE
> 3 6.420587  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
> FALSE
>  > fooQtile
> [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261
> 6.40261
>  > table(fooQtile)
> fooQtile
> 6.40261053520674 6.40261053520674
>                 2                7
>
> I expected the returned quantile to be either equal to or greater than
> the minimum observed value and that is the case for types 1-6 and
> 9. But for types 7 (the default) and 8, the returned quantile is less
> than the minimum observed value.  The difference between the type
> (1-6,9) and type (7,8) return values and between the returned quantile
> and the minimum is obviously very very small.
>
> Is there any choice for 'type' that is guaranteed to return values
> inside the observed range?
>
> Thanks, Jenny
>
> Dr. Jennifer Bryan
> Assistant Professor
> Department of Statistics and
>   the Michael Smith Laboratories
> University of British Columbia
> --------------------------------------------
> 333-6356 Agricultural Road
> Vancouver, BC Canada
> V6T 1Z2
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>