[R] CDF of Sample Quantile
Duncan Murdoch
murdoch.duncan at gmail.com
Mon Feb 14 20:26:01 CET 2011
On 14/02/2011 9:58 AM, Bentley Coffey wrote:
> I need to calculate the probability that a sample quantile will exceed a
> threshold given the size of the iid sample and the parameters describing the
> distribution of each observation (normal, in my case). I can compute the
> probability with brute force simulation: simulate a size N sample, apply R's
> quantile() function on it, compare it to the threshold, replicate this MANY
> times, and count the number of times the sample quantile exceeded the
> threshold (dividing by the total number of replications yields the
> probability of interest). The problem is that the number of replications
> required to get sufficient precision (3 digits say) is so HUGE that this
> takes FOREVER. I have to perform this task so much in my script (searching
> over the sample size and repeated for several different distribution
> parameters) that it takes too many hours to run.
>
> I've searched for pre-existing code to do this in R and haven't found
> anything. Perhaps I'm missing something. Is anyone aware of an R function to
> compute this probability?
>
> I've tried writing my own code using the fact that R's quantile() function
> is a linear combination of 2 order statistics. Basically, I wrote down the
> mathematical form of the joint pdf for the 2 order statistics (a function of
> the sample size and the distribution parameters) then performed a
> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
> random draws) over the region where the sample quantile exceeds the
> threshold. In theory, this should work and it takes about 1000 times fewer
> clock cycles to compute than the Brute Force approach. My problem is that
> there is a significant discrepancy between the results using Brute Force and
> using this more efficient approach that I have coded up. I believe that the
> problem is numerical error but it could be some programming bug; regardless,
> I have been unable to locate the source of this problem and have spent over
> 20 hours trying to identify it this weekend. Please, somebody help!!!
>
> So, again, my question: is there code in R for quickly evaluating the CDF of
> a Sample Quantile given the sample size and the parameters governing the
> distribution of each iid point in the sample?
I think the answer to your question is no, but I think it's the wrong
question. Suppose Xm:n is the mth sample quantile in a sample of size
n, and you want to calculate P(Xm:n > x). Let X be a draw from the
underlying distribution, and suppose P(X > x) = p. Then the event Xm:n > x
is the same as the event that out of n independent draws of X, at least
n-m+1 are bigger than x: a binomial probability. And R can calculate
binomial probabilities using pbinom().
Duncan Murdoch
More information about the R-help
mailing list