[R] Plotting positions in qqnorm?

Fri Apr 14 15:23:16 CEST 2006

On Fri, 14 Apr 2006, David Scott wrote:

> On Thu, 13 Apr 2006, Spencer Graves wrote:
>
>> 	  Do you know of a reference that discusses alternative choices for
>> plotting positions for a normal probability plot?  The documentation for
>> qqnorm says it calls ppoints, which returns qnorm((1:m-a)/(m+1-2*a))
>> with "a" = ifelse(n<=10, 3/8, 1/2)?  The help pages for qqnorm and
>> ppoints just refer to Becker, Chambers and Wilks (1988) The New S
>> Language (Wadsworth & Brooks/Cole), and I couldn't find any discussion
>> of this.

It's there, on the printed help page for ppoints.

>> 	  I seem to recall that this was discussed in 1960 or earlier in a
>> paper by Anscombe, but I can't find a reference and I wonder if someone
>> might suggest something else.  I've been asked to comment on specialized
>> software that allows the user to select "a" = +/-0.5, 0, 0.3, and 0.3175
>> (but not 0.375 = 3/8, curiously).
>>
>> 	  I'd also be interested in any examples of real data sets where the
>> choice of "a" actually made a difference.  When I've had so few data
>> points that the choice for "a" might make a difference, a normal
>> probability plot was not very informative, anyway, and I get more
>> information from a simple dot plot.  If your experience is different,
>> I'd like to know.
>>
>> 	  Thanks,
>> 	  Spencer Graves
>>
> I suspect that what you are looking for is this paper:
>
> Hyndman, Rob J. and Fan, Yanan (1996)
> Sample quantiles in statistical packages
> The American Statistician, 50, 361-365
>
> which discusses different definitions of sample quantiles. See also the
> documentation for quantile.

The adjustment is to the population quantiles, not the sample quantiles.
But the article is in small part relevant, as it discusses definitions of
QQ plots and refers to the work of Blom (see below).

The usual reason given is that 1-sample QQ plots should be plotted not
against the population quantiles but against the expected order statistics
of a sample of size n from the population.  That's where the adjustment
comes from, and it is related to type 9 in ?quantile (although that entry,
I think, needs to say what the estimates are supposed to be unbiased for).
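
A minimal sketch of that argument in R, not taken from the ppoints source:
it compares qnorm applied to the plotting positions (1:n - a)/(n + 1 - 2*a)
with a Monte Carlo estimate of the expected order statistics of a standard
normal sample.  The helper name plotting_positions, the sample size n = 5,
the seed and the number of replications are all illustrative assumptions.

```r
# Plotting positions in the form documented for ppoints:
# (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10 and 1/2 otherwise.
# plotting_positions is a hypothetical helper written for this sketch.
plotting_positions <- function(n, a = if (n <= 10) 3/8 else 1/2) {
  (seq_len(n) - a) / (n + 1 - 2 * a)
}

n <- 5  # illustrative sample size

# Approximate normal scores: population quantiles at the adjusted positions
approx_scores <- qnorm(plotting_positions(n))

# Monte Carlo estimate of the expected order statistics of an N(0, 1) sample
set.seed(1)
sims <- replicate(20000, sort(rnorm(n)))  # n x 20000 matrix, one sample per column
expected_scores <- rowMeans(sims)         # average of each order statistic

# Side-by-side comparison
print(round(cbind(approx_scores, expected_scores), 3))
```

If the adjustment does what is claimed, the two columns should agree
closely, including in the tails where an unadjusted choice of positions
would be expected to do worse.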

Unfortunately the CVS data is no longer available for package stats prior 
to its split from base.  But see the thread starting

http://www.r-project.org/nocvs/mail/r-devel/1998/0587.html

which is unfortunately typical of the misrepresentation that is far too 
prevalent here: MASS2 says what the values _were_ in S, not what they 
should be.

This is a Blue Book function, and that contains a reference to

G. Blom (1958) Statistical Estimates and Transformed Beta Variables. 
Wiley.

That's not readily available to me (it is in the library stacks), but 
maybe someone interested would like to look it up to expand on the brief 
mention in Hyndman & Fan.

Incidentally, S-PLUS (>= 6.2) differs from S and uses a=1/2.
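
As a rough illustration of how much the choice matters, the following
sketch compares the two values of a for a small sample, using the a
argument of ppoints.  The sample size n = 10 is an arbitrary choice for
the example.

```r
n <- 10  # illustrative sample size, where R's default is a = 3/8

q_blom <- qnorm(ppoints(n, a = 3/8))  # R's default for n <= 10
q_half <- qnorm(ppoints(n, a = 1/2))  # the value reported above for S-PLUS (>= 6.2)

# Difference in the theoretical quantiles used on the x-axis of the QQ plot
print(round(q_half - q_blom, 3))
```

The differences are largest for the most extreme points, with a = 1/2
placing them further out than a = 3/8.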

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595