[R-sig-teaching] Uses of Normal Probability Plots

Mon May 2 15:07:26 CEST 2016

Hello:

       Does anyone have a good reference on the uses of normal 
probability plots?

       The Wikipedia article on "Normal probability plot" includes 
histograms with normal plots of normal, right skewed, and uniformly 
distributed data.  I'd like to expand it to include examples with 
outliers, kurtosis, a need for transformations -- especially to 
log-normal -- and mixtures.  In addition, I'd like to include 
discussions of plotting, e.g., 15 effects from a 16-run 2-level 
fractional factorial identifying the significant effects as well as 
outliers.  I think the article should also discuss plotting multiple 
lines on the same plot to compare different samples and to search for 
heteroscedasticity.  And I'd like to show plots with both datax = TRUE 
and datax = FALSE.  The default is FALSE, which puts the data on the 
vertical axis.  However, that creates problems with visual processing 
in plots that are wider than they are tall, because research on 
cognitive processing of graphics indicates that human judgements about 
slope are more accurate with lines near 45 degrees than with other 
angles except for horizontal and vertical.  (I can find a reference;  I 
don't have it at my fingertips.)
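
       As a rough illustration of the kinds of examples listed above, 
here is a minimal sketch in base R.  Everything in it is an assumption 
made for illustration: the data are simulated, and the seed, sample 
size, distribution parameters and panel layout are arbitrary.  It draws 
each sample with datax = FALSE and again with datax = TRUE, using 
square plotting regions so a well-fitting line sits near 45 degrees.

```r
# Simulated data only; seed, sample size and parameters are arbitrary choices.
set.seed(1)
n <- 100
samples <- list(
  normal    = rnorm(n),                                # reference case
  lognormal = rlnorm(n),                               # right skewed; log() would straighten it
  outlier   = c(rnorm(n - 1), 8),                      # one gross outlier
  mixture   = c(rnorm(n / 2), rnorm(n / 2, mean = 5))  # two-component mixture
)

# One row of panels per datax setting; pty = "s" requests square plot regions.
op <- par(mfrow = c(2, length(samples)), pty = "s")
for (datax in c(FALSE, TRUE)) {
  for (nm in names(samples)) {
    qqnorm(samples[[nm]], datax = datax,
           main = paste0(nm, "\ndatax = ", datax))
    qqline(samples[[nm]], datax = datax)
  }
}
par(op)  # restore the previous graphics settings
```

With datax = TRUE the data move to the horizontal axis, so the same 
departure from normality bends the point pattern the other way.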

       If I can't find such a paper on normal plots, I'd be happy to 
take the lead in writing one, but I'd like to have collaborators -- and 
preferably some confirmation from an R Journal editor that such an 
article would likely be favorably considered;  it may also need to 
include discussions of normal probability plotting with traditional 
graphics, lattice and ggplot2.

       Thanks,

       Spencer Graves

p.s.  I plan to change the qqnorm labeling from "Sample Quantiles" and 
"Theoretical Quantiles" to something more commonly understood like 
"data" and "normal theory".  "Sample Quantiles" and "Theoretical 
Quantiles" are fine with an audience who know what quantiles are -- or 
for a class where that's one thing you want them to know.  However, they 
are an obstacle to communications with a general audience -- which I 
recently learned from a group of non-statisticians with whom I'm working.
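
One way to get such labels for a single plot, without touching qqnorm 
itself, is to pass xlab and ylab in the call.  This is only a sketch on 
simulated data; the label wording and the log transformation are 
illustrative choices.

```r
# Simulated, arbitrary data; any numeric vector would do.
set.seed(2)
y <- rlnorm(50)

# xlab labels the normal-theory axis and ylab the data axis.
qqnorm(log(y), main = "Normal probability plot",
       xlab = "normal theory", ylab = "log(data)")
qqline(log(y))
```

Note that with datax = TRUE, qqnorm swaps the two labels along with the 
axes, so xlab still describes the normal-theory axis.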

As noted above, normal probability plots can detect substantive 
departures from normality including outliers, skewness, kurtosis, a need 
for transformations, and mixtures.  They can also be used to identify 
significant effects and outliers in plots of coefficients, e.g., from 
designed experiments, and to look for heteroscedasticity between levels 
of explanatory variables in regression and analysis of variance.  This 
section gives examples of each using simulated data.
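
A minimal sketch of the last two uses, under stated assumptions, might 
look like the following.  The 16 runs are laid out as a full 2^4 design, 
used here only as a stand-in because it has the same 15 orthogonal 
contrast columns as a 16-run fraction.  The two "real" effects, the 
noise level and the group standard deviations are all invented for the 
simulation.

```r
set.seed(3)

## Part 1: normal plot of 15 effects from 16 runs (simulated response).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
X <- model.matrix(~ A * B * C * D, design)[, -1]  # drop intercept: 15 columns
resp <- 3 * design$A - 2 * design$B * design$C + rnorm(16)  # two active effects
effects <- drop(crossprod(X, resp)) / 8  # mean at +1 minus mean at -1

qq <- qqnorm(effects, main = "Normal plot of 15 effects",
             xlab = "normal theory", ylab = "estimated effect")
qqline(effects)
text(qq$x, qq$y, labels = names(effects), pos = 4, cex = 0.7)

## Part 2: two samples on one plot; unequal slopes suggest unequal spread.
g1 <- rnorm(40, sd = 1)
g2 <- rnorm(40, sd = 3)
q1 <- qqnorm(g1, plot.it = FALSE)
q2 <- qqnorm(g2, plot.it = FALSE)
plot(q1, xlim = range(q1$x, q2$x), ylim = range(g1, g2),
     xlab = "normal theory", ylab = "data", main = "Two samples compared")
points(q2, pch = 2)
qqline(g1)
qqline(g2, lty = 2)
legend("topleft", legend = c("sd = 1", "sd = 3"), pch = c(1, 2), lty = c(1, 2))
```

In the first plot the inactive effects should fall roughly on a line 
through zero while the active ones stand away from it.  In the second, 
with the data on the vertical axis, the slope of each line is roughly 
that sample's standard deviation.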

       I was recently challenged by non-statistical collaborators, who 
raved about what I think was a histogram suggestive of Zipf's law, when 
a log-normal QQ plot of similar data suggested