[Rd] qqline (PR#764)
Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 12 Dec 2000 15:24:56 +0100
>>>>> "Setzer" == Setzer Woodrow <Setzer.Woodrow@epamail.epa.gov> writes:
Setzer> I think qqline does not do exactly what it is advertised to do
Setzer> ("`qqline' adds a line to a normal quantile-quantile plot which
Setzer> passes through the first and third quartiles.").
Yes, the documentation quoted above may not be clear enough.
Setzer> Consider the graph:
Setzer> tmp <- qnorm(ppoints(10))
Setzer> qqnorm(tmp)
Setzer> qqline(tmp)
Setzer> The line (which I expected to go through all the points) has a
Setzer> slightly shallower slope than do the points plotted by
Setzer> qqnorm. I think the problem is that qqline bases its line on
Setzer> the relationship between the quartiles in the data and the
Setzer> large sample expected quartiles for a normal distribution;
Setzer> qqnorm bases its plot on the relationship between the quantiles
Setzer> in the data and an approximation to the (finite-sample)
Setzer> expected quantiles for a normal distribution. In qqnorm, the
Setzer> x-coordinates of the first and third quartiles of the data
Setzer> vector ('tmp' in this case) are not qnorm(c(0.25,0.75)) (as
Setzer> qqline does), but rather something like
Setzer> quantile(qnorm(ppoints(length(tmp))),c(0.25,0.75)). I say
Setzer> "something like" because it is exactly right when the quartiles
Setzer> fall on data points, and an approximation otherwise.
good analysis!
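To make the discrepancy concrete, here is a minimal sketch comparing the two sets of x-coordinates for n = 10. The variable names (x_theory, x_plot) are only illustrative, and it assumes the default behaviour of quantile() and ppoints():

```r
n <- 10

# Large-sample normal quartiles (the x-coordinates qqline is described as using)
x_theory <- qnorm(c(0.25, 0.75))

# Quartiles of the plotting positions that qqnorm actually puts on the x-axis
x_plot <- quantile(qnorm(ppoints(n)), c(0.25, 0.75), names = FALSE)

# Interquartile spread on the x-axis under each definition
spread_theory <- diff(x_theory)
spread_plot   <- diff(x_plot)

print(c(theory = spread_theory, plotting_positions = spread_plot))
```

For small n the second spread is the smaller one, which is why a line built on qnorm(c(0.25, 0.75)) comes out shallower than the plotted points.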
Setzer> The following definition for qqline reflects this point:
Setzer> function (y, ...)
Setzer> {
Setzer>     y <- y[!is.na(y)]
Setzer>     n <- length(y)
Setzer>     y <- quantile(y, c(0.25, 0.75))
Setzer>     x <- quantile(qnorm(ppoints(n)), c(0.25, 0.75))
Setzer>     slope <- diff(y)/diff(x)
Setzer>     int <- y[1] - slope * x[1]
Setzer>     abline(int, slope, ...)
Setzer> }
Setzer> I'm not sure it makes very much of a difference, though, for
Setzer> looking at real data, instead of something like expected
Setzer> quantiles.
The development version of R (to be released as R 1.2 in a few days) has
function (y, ...)
{
    y <- quantile(y[!is.na(y)], c(0.25, 0.75))
    x <- qnorm(c(0.25, 0.75))
    slope <- diff(y)/diff(x)
    int <- y[1] - slope * x[1]
    abline(int, slope, ...)
}
which I think *does* what you suggest it should do.
HOWEVER, I was quite astonished to see that the slope is still too small
(for small sample sizes only):
par(mfrow=c(2,2))
for (n in 9:12) {
    x <- qnorm(ppoints(n))
    qqnorm(x, main = paste("n=", n))
    qqline(x)
}
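The same effect can be checked numerically rather than by eye. The sketch below recomputes the slope that the development-version qqline would draw; slope_qqline is a hypothetical helper written for this illustration, not part of R:

```r
# Hypothetical helper: slope of the line through the sample quartiles
# and the large-sample normal quartiles, as in the function shown above.
slope_qqline <- function(y) {
    y <- y[!is.na(y)]
    yq <- quantile(y, c(0.25, 0.75), names = FALSE)
    xq <- qnorm(c(0.25, 0.75))
    diff(yq) / diff(xq)
}

# "Perfect" normal data: the expected quantiles themselves
slopes <- sapply(9:12, function(n) slope_qqline(qnorm(ppoints(n))))
names(slopes) <- paste0("n=", 9:12)
print(round(slopes, 3))
```

A slope of 1 would mean the line passes through the plotted points; for these sample sizes every value stays below 1.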
But I think we are now doing what Tukey defined in his EDA book(s)
and what the other S engines do as well.
{As a matter of fact, qqline should also return the (int, slope) vector!}
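As a rough sketch of what returning the coefficients could look like (the name qqline_coef and the plot argument are assumptions made for this example, not an existing R interface):

```r
# Hypothetical variant of qqline that also hands back (int, slope).
# 'plot = FALSE' lets the coefficients be computed without an open plot.
qqline_coef <- function(y, ..., plot = TRUE) {
    yq <- quantile(y[!is.na(y)], c(0.25, 0.75), names = FALSE)
    xq <- qnorm(c(0.25, 0.75))
    slope <- diff(yq) / diff(xq)
    int <- yq[1] - slope * xq[1]
    if (plot) abline(int, slope, ...)
    # Returned invisibly so that interactive use does not print anything
    invisible(c(int = int, slope = slope))
}

co <- qqline_coef(c(1, 2, 3, 4, 5, NA), plot = FALSE)
print(co)
```

Returning the value invisibly keeps the interactive behaviour unchanged while letting callers capture the intercept and slope.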
Note that you can also play with the "a = " argument of ppoints;
it's not immediately clear to me which value is "optimal" for the above purpose...
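A small experiment along those lines, assuming n = 10 and a few arbitrarily chosen values of a (the grid is only for illustration):

```r
n <- 10
a_values <- c(0, 3/8, 1/2)   # arbitrary illustrative choices for 'a'

# Slope of the quartile line when the data are qnorm(ppoints(n, a = a))
slopes_a <- sapply(a_values, function(a) {
    y <- qnorm(ppoints(n, a = a))
    yq <- quantile(y, c(0.25, 0.75), names = FALSE)
    diff(yq) / diff(qnorm(c(0.25, 0.75)))
})
names(slopes_a) <- paste0("a=", format(a_values))
print(round(slopes_a, 3))
```

Over this grid the slope moves towards 1 as a increases, but none of the three values reaches it.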
---------
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO D10 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><