[R] (newbie) Weighted qqplot?
murdoch at stats.uwo.ca
Wed Mar 15 19:50:27 CET 2006
On 3/15/2006 1:38 PM, Vivek Satsangi wrote:
> I am documenting what I finally did, for the next person who comes along...
> Following Dr. Murdoch's suggestion, I looked at qqplot. The following
> approach might be helpful to get to the same information as given by
> To summarize the ask: given x, y, xw and yw, show (visually is okay)
> whether x and y are from the same distribution. xw is the weight of
> each x observation and yw is the weight of each y observation.
> Put x and xw into a dataframe.
> Sort by x.
> Calculate cumulative x weights, normalized to total 1.
> Put y and yw into a dataframe.
> Sort by y.
> Calculate cumulative y weights, normalized to total 1.
> Plot x and y against cumulative normalized weights. The shapes of the
> two lines should be similar (to the eye)-- or the distribution is
One variation that would make the result more like a qqplot: you could
work out a vector of weights w (perhaps the cumulative weights from x or
from y or perhaps something else) and plot y(w) versus x(w), where y(w)
and x(w) are the linear interpolation values that approx gives you.
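A minimal sketch of that variation, assuming non-negative weights. The helper names weighted_cdf and weighted_quantile are made up for illustration and are not part of R; the choice of the cumulative y weights as the grid w is just one of the options mentioned above, and runif weights stand in for real ones.

```r
# Hypothetical helper: sort the values and attach cumulative weights
# normalized to total 1 (the "sort, then cumulate" steps described above).
weighted_cdf <- function(v, w) {
  stopifnot(length(v) == length(w), all(w >= 0), sum(w) > 0)
  o <- order(v)
  data.frame(value = v[o], cumw = cumsum(w[o]) / sum(w))
}

# Hypothetical helper: value of v at cumulative weight p, by linear
# interpolation with approx(). rule = 2 clamps p outside the observed
# range to the smallest or largest value instead of returning NA.
weighted_quantile <- function(v, w, p) {
  cdf <- weighted_cdf(v, w)
  approx(cdf$cumw, cdf$value, xout = p, rule = 2, ties = "ordered")$y
}

set.seed(1)
x <- rnorm(100); xw <- runif(100)  # illustrative non-negative weights
y <- rnorm(10);  yw <- runif(10)

p  <- weighted_cdf(y, yw)$cumw     # assumed grid w: cumulative weights of y
qx <- weighted_quantile(x, xw, p)  # x(w)
qy <- weighted_quantile(y, yw, p)  # y(w); on this grid, just the sorted y

plot(qx, qy, xlab = "weighted quantiles of x",
     ylab = "weighted quantiles of y")
abline(0, 1, lty = 2)              # reference line y = x
```

Taking the grid from the smaller sample mirrors what qqplot does for unequal lengths; a common grid such as seq(0.05, 0.95, by = 0.05) would work with the same helpers.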
> On 3/15/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 3/15/2006 8:31 AM, Vivek Satsangi wrote:
>> > Folks,
>> > Normally, in a data frame, one observation counts as one observation
>> > of the distribution. Thus one can easily produce a CDF and (in Splus
>> > at least) use cdf.compare to compare the CDF (BTW: what is the R
>> > equivalent of the SPlus cdf.compare() function, if any?)
>> > However, if each point should not count equally, how can I weight the
>> > points before comparing the distributions? I was thinking of somehow
>> > creating multiple observations for each actual observation based on
>> > weights and creating a new dataframe etc. -- but that seems excessive.
>> > Surely there is a simpler way?
>> >> x <- rnorm(100)
>> >> y <- rnorm(10)
>> >> xw <- rnorm(100) * 1.73 # The weights. These won't add up to 1 or N or anything because of missing values.
>> >> yw <- rnorm(10) * 6.23 # The weights. These won't add up to 1 or to the same number as xw.
>> >> # The question to answer is, how can I create a qq plot or cdf compare of x vs. y, weighted by their weights, xw and yw (to eventually figure out if y comes from the population x, similar to Kolmogorov-Smirnov GOF)?
>> >> qqplot(x,y) # What now?
>> qqplot doesn't support weights, but it's a simple enough function that
>> you could write a version that did. Look at the cases where length(x)
>> is not equal to length(y): e.g. if length(y) < length(x), qqplot
>> constructs a linear approximation to a function mapping 1:nx onto the
>> sorted x values, then takes length(y) evenly spaced values from that
>> function. You want to do the same sort of thing, except that instead of
>> even spacing, you want to look at the cumulative sums of the weights.
>> You might want to use some kind of graphical indicator of whether points
>> are heavily weighted or not, but I don't know what to recommend for that.
>> By the way, your example above will give negative weights in xw and yw;
>> you probably won't like the results if you do that.
>> Duncan Murdoch
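For readers who want to see the unequal-length case described above, the following sketch reproduces the even-spacing step by hand and compares it with what qqplot returns. It also illustrates, under the same rnorm-based weights as the original example, why negative weights are a problem: their cumulative sums need not increase. The variable names sx_even and xw_bad are illustrative only.

```r
set.seed(2)
x <- rnorm(100)
y <- rnorm(10)

# Linear approximation mapping positions 1..length(x) onto the sorted x
# values, evaluated at length(y) evenly spaced positions.
sx <- sort(x)
sx_even <- approx(seq_along(sx), sx, n = length(y))$y

# qqplot with plot.it = FALSE returns the plotted coordinates as a list.
qq <- qqplot(x, y, plot.it = FALSE)
same_x <- isTRUE(all.equal(qq$x, sx_even))

# Weights drawn from rnorm() include negative values, so the cumulative
# sums are not monotone and cannot serve as an interpolation grid.
xw_bad <- rnorm(100) * 1.73
has_negative <- any(xw_bad < 0)
non_monotone <- is.unsorted(cumsum(xw_bad))
```

A weighted version would replace the evenly spaced positions with normalized cumulative sums of non-negative weights.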
> -- Vivek Satsangi
> Student, Rochester, NY USA
> Life is short, the art long, opportunity fleeting, experiment
> treacherous, judgement difficult.