[R] qqplot for count data
Jean-Christophe BOUËTTÉ
jcbouette at gmail.com
Thu Sep 1 16:39:47 CEST 2011
Dear list,
I just tried to do the same thing, and did not find anything on a
weighted qqplot. My weights are actually counts (positive integers).
Here is a modification of qqplot, following Duncan Murdoch's
suggestion. Any feedback would be welcome!
Thanks,
Jean-Christophe
weighted.qqplot <- function(x, y, plot.it = TRUE,
                            xlab = deparse(substitute(x)),
                            ylab = deparse(substitute(y)),
                            x.counts = rep(1L, length.out = length(x)),
                            y.counts = rep(1L, length.out = length(y)), ...) {
    # Sort the values and accumulate the counts in the same order
    sx <- sort(x)
    sy <- sort(y)
    swx <- cumsum(x.counts[order(x)])
    swy <- cumsum(y.counts[order(y)])
    lenx <- length(sx)
    leny <- length(sy)
    # Interpolate both samples at the same number of equally spaced
    # cumulative-count positions (the shorter of the two lengths)
    sx <- approx(swx, sx, n = min(lenx, leny))$y
    sy <- approx(swy, sy, n = min(lenx, leny))$y
    if (plot.it)
        plot(sx, sy, xlab = xlab, ylab = ylab, ...)
    invisible(list(x = sx, y = sy))
}
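# Sample example: a and c hold distinct values, b and d their counts;
# x and y are the same data expanded to one entry per observation.
n <- 15
a <- runif(n)
b <- 1L:length(a)
x <- rep(a, b)
c <- runif(n)
d <- length(c):1L
y <- rep(c, d)
# Expanded data with unit counts
weighted.qqplot(x, y, type = "b")
# Same data given as values plus counts, overlaid on the first plot
par(new = TRUE)
weighted.qqplot(a, c, x.counts = b, y.counts = d, type = "b", pch = "*", col = "grey")
# Standard qqplot on the expanded data, for comparison
par(new = TRUE)
qqplot(x, y, type = "b", pch = "+", col = "red")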
From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Thu 16 Mar 2006 - 05:50:27 EST
On 3/15/2006 1:38 PM, Vivek Satsangi wrote:
> Folks,
> I am documenting what I finally did, for the next person who comes along...
>
> Following Dr. Murdoch's suggestion, I looked at qqplot. The following
> approach might be helpful to get to the same information as given by
> qqplot.
> To summarize the ask: given x, y, xw and yw, show (visually is okay)
> whether x and y are from the same distribution. xw is the weight of
> each x observation and yw is the weight of each y observation.
>
> Put x and xw into a dataframe.
> Sort by x.
> Calculate cumulative x weights, normalized to total 1.
>
> Put y and yw into a dataframe.
> Sort by y
> Calculate cumulative weights, normalized to total 1.
>
> Plot x and y against cumulative normalized weights. The shapes of the
> two lines should be similar (to the eye); otherwise the distributions
> are "different".
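
The steps quoted above could be sketched in R roughly as follows. This is
only an illustration: the data are made up, and the helper name cum_weights
is hypothetical, not something from the original posts.

```r
# Made-up illustrative data: values with positive weights (assumption)
set.seed(1)
x <- rnorm(50); xw <- runif(50)
y <- rnorm(80); yw <- runif(80)

# Hypothetical helper: put values and weights in a data frame, sort by
# value, and add the cumulative weights normalized to total 1
cum_weights <- function(v, w) {
    d <- data.frame(v = v, w = w)
    d <- d[order(d$v), ]
    d$cw <- cumsum(d$w) / sum(d$w)
    d
}

dx <- cum_weights(x, xw)
dy <- cum_weights(y, yw)

# Plot both weighted cumulative curves on shared axes for comparison
plot(dx$v, dx$cw, type = "s", xlim = range(x, y), ylim = c(0, 1),
     xlab = "value", ylab = "cumulative normalized weight")
lines(dy$v, dy$cw, type = "s", col = "red")
```

If the two step curves lie close to each other over the whole range, the
weighted samples look similar; a visible shift or change in steepness
suggests they differ.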
One variation that would make the result more like a qqplot: you could
work out a vector of weights w (perhaps the cumulative weights from x
or from y or perhaps something else) and plot y(w) versus x(w), where
y(w) and x(w) are the linear interpolation values that approx gives
you.
Duncan Murdoch
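
A minimal sketch of the variation suggested above, using approx to evaluate
x(w) and y(w) on a common vector w. The data are made up, and the choice of
w (an equally spaced grid over the range where both interpolations are
defined) is an assumption; the post leaves that choice open.

```r
# Made-up illustrative data: values with positive weights (assumption)
set.seed(1)
x <- rnorm(50); xw <- runif(50)
y <- rnorm(80); yw <- runif(80)

# Sort each sample and compute cumulative weights, normalized so that
# the last value is exactly 1
ox <- order(x); oy <- order(y)
cwx <- cumsum(xw[ox]); cwx <- cwx / cwx[length(cwx)]
cwy <- cumsum(yw[oy]); cwy <- cwy / cwy[length(cwy)]

# Common grid of cumulative weights (assumed choice): start where both
# curves are defined, end at 1
w <- seq(max(cwx[1], cwy[1]), 1, length.out = 50)

# x(w) and y(w): linear interpolation of the sorted values at w
qx <- approx(cwx, x[ox], xout = w)$y
qy <- approx(cwy, y[oy], xout = w)$y

# Points near the dashed line y = x suggest similar distributions
plot(qx, qy, xlab = "x(w)", ylab = "y(w)")
abline(0, 1, lty = 2)
```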