[R] maximum difference between two ECDF's
Bart Vandewoestyne
Bart.Vandewoestyne at telenet.be
Thu Jun 28 11:42:28 CEST 2007
Hello,
I have a vector of samples x of length N. Associated with each
sample x_i is a certain weight w_i. All the weights are in another
vector w of the same length N.
I have another vector of samples y of length n (small n). All
these samples have equal weights 1/n. The ECDF of these samples
is defined as, for example, at
http://en.wikipedia.org/wiki/Empirical_distribution_function and
I can compute it using the ecdf() function in R.
I define the 'ECDF' of the samples x with their associated
weights in the following way:
F_N(x) = 1/N * sum_{i=1}^{N}w_i * Indicator(x_i <= x)
(Does this 'ECDF' have another name?)
So it's basically the same formula as the one on the above URL, but the
only difference is that I multiply the indicator function for x_i with
the weight w_i.
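
As an illustration, the definition above could be sketched in R roughly as
follows. The function name weighted_ecdf and the example data are
hypothetical, not taken from any package. Note that with the 1/N
normalization the resulting step function only reaches 1 when the weights
sum to N; this sketch follows the formula as written and does not
renormalize.

```r
# Weighted 'ECDF' following the definition above:
#   F_N(t) = 1/N * sum_{i=1}^{N} w_i * Indicator(x_i <= t)
# (weighted_ecdf is a hypothetical helper name, not an existing R function)
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w))
  N   <- length(x)
  ord <- order(x)
  xs  <- x[ord]                       # samples in increasing order
  cw  <- c(0, cumsum(w[ord]) / N)     # step heights, with 0 left of all samples
  function(t) {
    # findInterval(t, xs) gives the number of sorted samples that are <= t
    cw[findInterval(t, xs) + 1]
  }
}

# Made-up example data; the weights sum to N = 3
x  <- c(3, 1, 2)
w  <- c(1.5, 0.5, 1.0)
FN <- weighted_ecdf(x, w)
FN(c(0, 1, 2, 3))   # values of the step function at 0, 1, 2, 3
```

With all weights equal to 1 this should agree with ecdf(x).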
Now suppose F_n(x) is the ECDF of the n samples with equal
weights 1/n, and F_N(x) is the 'ECDF' of the other samples with
their associated weights.
What I now would like to compute is the maximum difference
between these two, so:
max(abs(F_N(x)-F_n(x)))
So it's like computing the Kolmogorov-Smirnov statistic of two
discrete CDFs.
If I didn't have these weights, or if one of the two was a
continuous CDF, then I could simply use the ks.test() function.
However, my situation is different... my first set of samples has
associated weights and therefore the 'ECDF' has a slightly
different definition.
How can I compute max(abs(F_N(x)-F_n(x))) ? Do there exist
standard functions for this?
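
One possible way to compute this by hand, offered as a sketch rather than
a standard function: both F_N and F_n are right-continuous step functions,
so their difference is constant between consecutive jump points and the
maximum of abs(F_N(x)-F_n(x)) is attained at one of the pooled sample
points. It is therefore enough to evaluate both functions at the sorted
union of x and y. The name max_ecdf_diff and the example data below are
hypothetical; only ecdf(), findInterval(), and other base R functions are
used.

```r
# Maximum absolute difference between the weighted 'ECDF' of (x, w),
#   F_N(t) = 1/N * sum_i w_i * Indicator(x_i <= t),
# and the ordinary ECDF of y.
# (max_ecdf_diff is a hypothetical helper name, not an existing R function)
max_ecdf_diff <- function(x, w, y) {
  stopifnot(length(x) == length(w))
  N   <- length(x)
  ord <- order(x)
  xs  <- x[ord]
  cw  <- c(0, cumsum(w[ord]) / N)   # heights of F_N, with 0 left of all samples
  Fn  <- ecdf(y)                    # ordinary ECDF with equal weights 1/n

  # Both step functions only change at sample points, so check those
  t    <- sort(unique(c(x, y)))
  FN_t <- cw[findInterval(t, xs) + 1]
  max(abs(FN_t - Fn(t)))
}

# Made-up example data
x <- c(1, 2, 3)
y <- c(1.5, 2.5)
d_unit     <- max_ecdf_diff(x, c(1, 1, 1), y)      # unit weights
d_weighted <- max_ecdf_diff(x, c(2, 0.5, 0.5), y)  # unequal weights summing to N
```

With unit weights the result should match the two-sample statistic
reported by ks.test(x, y), which gives a quick sanity check.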
Thanks,
Bart
--
"Share what you know. Learn what you don't."