[R] quantile() returns a value outside the data range
Henrik Bengtsson
hb at stat.berkeley.edu
Wed Aug 22 08:34:38 CEST 2007
Hi,
yes, it is all about numerical accuracy;
x <- c(6.402611, 6.402611, 6.420587)
x001 <- sapply(1:9, function(type) quantile(x, probs=0.01, type=type))
names(x001) <- NULL
print(x001)
## [1] 6.402611 6.402611 6.402611 6.402611 6.402611
## [6] 6.402611 6.402611 6.402611 6.402611
diff(x001)
## [1] 0.000000e+00 0.000000e+00 0.000000e+00
## [4] 0.000000e+00 -8.881784e-16 8.881784e-16
## [7] -8.881784e-16 8.881784e-16
So, (even) the different types of 1%-quantile estimates of 'x' differ.
Thus, the equality part of your comparison (<=) (with sort(x)[1]) can
only return TRUE for some of the elements, but not all.
The following should do for "all.equal()" for '<=': Assume 'x' and
'y' are scalars for simplicity. To test 'x == y' with some precision,
you use all.equal(x,y, tolerance=eps), which tests |x-y| < eps <=>
-eps < x-y < eps where eps is a small positive number specifying your
precision.
To test 'x <= y' (<=> 'x - y <= 0') with some precision, you want to
test x - y <= eps, i.e.
lessOrEqual <- function(x, y, tolerance=.Machine$double.eps^0.5, ...) {
(x - y <= tolerance);
}
sapply(x, x001, lessOrEqual)
> sapply(x, lessOrEqual, x001)
## [,1] [,2] [,3]
## [1,] TRUE TRUE FALSE
## [2,] TRUE TRUE FALSE
## [3,] TRUE TRUE FALSE
## [4,] TRUE TRUE FALSE
## [5,] TRUE TRUE FALSE
## [6,] TRUE TRUE FALSE
## [7,] TRUE TRUE FALSE
## [8,] TRUE TRUE FALSE
## [9,] TRUE TRUE FALSE
Next time, please give a cut'n'paste reproducible example.
/Henrik
On 8/21/07, Jenny Bryan <jenny at stat.ubc.ca> wrote:
> Hello,
>
> I am getting an unexpected result from quantile(). Specifically, the
> return value falls outside the range of the data, which I wouldn't
> have thought possible for a weighted average of 2 order statistics.
> Is this an unintended accuracy issue or am I being too casual in my
> comparison (is there some analogue of 'all.equal' for "<=")?
>
> Small example:
>
> > foo <- h2[,"17"][h2$plate==222] # some data I have, details not
> interesting
> > foo <- sort(foo)[1:3] # can demo with the 3 smallest
> values
> > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
> > fooQtile <- rep(0,9)
> > for(i in 1:9) { # compute 0.01 quantile, for all
> 'types'
> + fooQtile[i] <- quantile(foo,probs=0.01,type=i)
> + yo[,i+1] <- foo <= fooQtile[i]
> + }
> > names(yo) <- c("myData",paste("qType",1:9,sep=""))
> > yo
> myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8
> qType9
> 1 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> TRUE
> 2 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> TRUE
> 3 6.420587 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> FALSE
> > fooQtile
> [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261
> 6.40261
> > table(fooQtile)
> fooQtile
> 6.40261053520674 6.40261053520674
> 2 7
>
> I expected the returned quantile to be either equal to or greater than
> the minimum observed value and that is the case for types 1-6 and
> 9. But for types 7 (the default) and 8, the returned quantile is less
> than the minimum observed value. The difference between the type
> (1-6,9) and type (7,8) return values and between the returned quantile
> and the minimum is obviously very very small.
>
> Is there any choice for 'type' that is guaranteed to return values
> inside the observed range?
>
> Thanks, Jenny
>
> Dr. Jennifer Bryan
> Assistant Professor
> Department of Statistics and
> the Michael Smith Laboratories
> University of British Columbia
> --------------------------------------------
> 333-6356 Agricultural Road
> Vancouver, BC Canada
> V6T 1Z2
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list