[Rd] qqline (PR#764)

Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 12 Dec 2000 15:24:56 +0100


>>>>> "Setzer" == Setzer Woodrow <Setzer.Woodrow@epamail.epa.gov> writes:

    Setzer> I think qqline does not do exactly what it is advertised to do
    Setzer> ("`qqline' adds a line to a normal quantile-quantile plot which
    Setzer> passes through the first and third quartiles.").  

Yes, the documentation quoted above may not be clear enough.

    Setzer> Consider the graph:

    Setzer> tmp <- qnorm(ppoints(10))
    Setzer> qqnorm(tmp)
    Setzer> qqline(tmp)

    Setzer> The line (which I expected to go through all the points) has a
    Setzer> slightly shallower slope than do the points plotted by
    Setzer> qqnorm.  I think the problem is that qqline bases its line on
    Setzer> the relationship between the quartiles in the data and the
    Setzer> large sample expected quartiles for a normal distribution;
    Setzer> qqnorm bases its plot on the relationship between the quantiles
    Setzer> in the data and an approximation to the (finite-sample)
    Setzer> expected quantiles for a normal distribution.  In qqnorm, the
    Setzer> x-coordinates of the first and third quartiles of the data
    Setzer> vector ('tmp' in this case) are not qnorm(c(0.25,0.75)) (as
    Setzer> qqline does), but rather something like
    Setzer> quantile(qnorm(ppoints(length(tmp))),c(0.25,0.75)).  I say
    Setzer> "something like" because it is exactly right when the quartiles
    Setzer> fall on data points, and an approximation otherwise.

good analysis!
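
To make the difference concrete, here is a minimal sketch (an illustration only, not code from any version of qqline) that prints the two candidate pairs of x-coordinates for n = 10. The names `x_large` and `x_finite` are made up for this example.

```r
n <- 10

# Large-sample normal quartiles, as used by qqline
x_large <- qnorm(c(0.25, 0.75))

# Quartiles of the finite-sample plotting positions used by qqnorm
x_finite <- quantile(qnorm(ppoints(n)), c(0.25, 0.75), names = FALSE)

# For small n the second pair lies inside the first
print(rbind(x_large, x_finite))
```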

    Setzer> The following definition for qqline reflects this point:

    Setzer> function (y, ...)
    Setzer> {
    Setzer> y <- y[!is.na(y)]
    Setzer> n <- length(y)
    Setzer> y <- quantile(y, c(0.25, 0.75))
    Setzer> x <- quantile(qnorm(ppoints(n)),c(0.25, 0.75))
    Setzer> slope <- diff(y)/diff(x)
    Setzer> int <- y[1] - slope * x[1]
    Setzer> abline(int, slope, ...)
    Setzer> }

    Setzer> I'm not sure it makes very much of a difference, though, for
    Setzer> looking at real data, instead of something like expected
    Setzer> quantiles.

The development version of R (to be released as R 1.2 in a few days) has

 function (y, ...) 
 {
     y <- quantile(y[!is.na(y)], c(0.25, 0.75))
     x <- qnorm(c(0.25, 0.75))
     slope <- diff(y)/diff(x)
     int <- y[1] - slope * x[1]
     abline(int, slope, ...)
 }

which I think *does* what you suggest it should do.

HOWEVER, I was quite astonished to see that the slope is still too
small (for small sample sizes only):

 par(mfrow = c(2, 2))
 for (n in 9:12) {
     x <- qnorm(ppoints(n)); qqnorm(x, main = paste("n =", n)); qqline(x)
 }
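
The same effect can be checked numerically without plotting. The following is only a sketch: `ideal_slope` is a hypothetical helper, not part of R, and the larger values of n are added here just to illustrate the "small sample sizes only" remark.

```r
# Slope of the line drawn by the development-version qqline for the
# "ideal" sample x = qnorm(ppoints(n)); the qqnorm points themselves
# lie on a line of slope 1.
ideal_slope <- function(n) {
    y <- quantile(qnorm(ppoints(n)), c(0.25, 0.75), names = FALSE)
    x <- qnorm(c(0.25, 0.75))
    diff(y) / diff(x)
}

ns <- c(9:12, 100, 1000)
slopes <- sapply(ns, ideal_slope)
print(round(cbind(n = ns, slope = slopes), 4))
```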

But I think we are now doing what Tukey defined in his EDA book(s),
which is also what the other S engines do.
 {As a matter of fact, R should also return the (int, slope) vector!}
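
One way this could look is sketched below. Both `qqline_coef` and `qqline2` are hypothetical names used only for illustration; this is not how R's qqline is defined, just the development-version computation with the coefficients handed back to the caller.

```r
# Hypothetical helper: intercept and slope of the quartile line,
# computed the same way as in the development-version qqline.
qqline_coef <- function(y) {
    y <- quantile(y[!is.na(y)], c(0.25, 0.75), names = FALSE)
    x <- qnorm(c(0.25, 0.75))
    slope <- diff(y) / diff(x)
    int <- y[1] - slope * x[1]
    c(int = int, slope = slope)
}

# Hypothetical variant of qqline: draws the line on the current plot
# and returns the coefficients invisibly.
qqline2 <- function(y, ...) {
    coefs <- qqline_coef(y)
    abline(coefs[["int"]], coefs[["slope"]], ...)
    invisible(coefs)
}

coefs <- qqline_coef(qnorm(ppoints(10)))
print(coefs)
```

After a call to qqnorm(tmp), calling qqline2(tmp) would draw the line as before and return the same pair invisibly, so it can be assigned when needed.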

Note that you can also play with the "a =" argument of ppoints;
it is not immediately clear to me which value is "optimal" for the above purpose...
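
A rough way to experiment, under one possible reading of "playing with a": generate the ideal sample with ppoints(n, a = a) and see what slope the quartile line gets. `slope_for_a` and the chosen values of a are assumptions made for this sketch, and it does not settle which value is "optimal".

```r
# Slope of the quartile line for x = qnorm(ppoints(n, a = a)), using
# the large-sample quartiles qnorm(c(0.25, 0.75)) as qqline does.
slope_for_a <- function(n, a) {
    y <- quantile(qnorm(ppoints(n, a = a)), c(0.25, 0.75), names = FALSE)
    diff(y) / diff(qnorm(c(0.25, 0.75)))
}

a_values <- c(0, 3/8, 1/2, 0.9)
slopes <- sapply(a_values, function(a) slope_for_a(10, a))
print(data.frame(a = a_values, slope = slopes))
```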

---------

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><