[R-sig-teaching] Uses of Normal Probability Plots
Spencer Graves
spencer.graves at effectivedefense.org
Mon May 2 15:07:26 CEST 2016
Hello:
Does anyone have a good reference on the uses of normal
probability plots?
The Wikipedia article on "Normal probability plot" includes
histograms with normal plots of normal, right skewed, and uniformly
distributed data. I'd like to expand it to include examples with
outliers, kurtosis, a need for transformations -- especially to
log-normal -- and mixtures. In addition, I'd like to include
discussions of plotting, e.g., the 15 effects from a 16-run 2-level
fractional factorial to identify the significant effects as well as
outliers. I think the article should also discuss plotting multiple
lines on the same plot to compare different samples and to search for
heteroscedasticity. And I'd like to show plots with both datax = TRUE
and datax = FALSE. The default is FALSE. However, that creates problems
with visual processing in plots that are wider than they are tall, because
research on cognitive processing of graphics indicates that human
judgements about slope are more accurate for lines near 45 degrees than
for other angles, except for horizontal and vertical. (I can find a
reference; I don't have it at my fingertips.)
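A minimal sketch of the two orientations, assuming a simulated log-normal
sample (the data are made up for illustration, not taken from this post).
It uses only qqnorm, qqline and par from base R; pty = "s" asks for a
square plotting region so that neither panel is wider than it is tall:

```r
# Simulated right-skewed sample (an assumption for illustration only).
set.seed(1)
x <- rlnorm(50)

op <- par(mfrow = c(1, 2), pty = "s")   # two square panels side by side

# Default orientation: data on the vertical axis.
q0 <- qqnorm(x, main = "datax = FALSE (default)")
qqline(x)

# Transposed orientation: data on the horizontal axis.
q1 <- qqnorm(x, datax = TRUE, main = "datax = TRUE")
qqline(x, datax = TRUE)

par(op)                                 # restore the previous settings
```

The values returned invisibly by qqnorm (q0 and q1 here) hold the plotted
coordinates, so the same comparison could be redrawn with lattice or
ggplot2 if preferred.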
If I can't find such a paper on normal plots, I'd be happy to
take the lead in writing one, but I'd like to have collaborators -- and
preferably some confirmation from an R Journal editor that such an
article would likely be favorably considered; it may also need to
include discussions of normal probability plotting with traditional
graphics, lattice and ggplot2.
Thanks,
Spencer Graves
p.s. I plan to change the qqnorm labeling from "Sample Quantiles" and
"Theoretical Quantiles" to something more commonly understood like
"data" and "normal theory". "Sample Quantiles" and "Theoretical
Quantiles" are fine with an audience who know what quantiles are -- or
for a class where that's one thing you want them to know. However, they
are an obstacle to communication with a general audience -- which I
recently learned from a group of non-statisticians with whom I'm working.
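A sketch of the relabeling, assuming simulated data and assuming "data"
and "normal theory" as the replacement labels (the exact wording is still
open). It relies only on the xlab and ylab arguments of qqnorm:

```r
# Simulated sample (an assumption for illustration only).
set.seed(2)
x <- rnorm(40, mean = 10, sd = 2)

# xlab replaces "Theoretical Quantiles"; ylab replaces "Sample Quantiles".
q <- qqnorm(x, xlab = "normal theory", ylab = "data",
            main = "Normal probability plot")
qqline(x)
```

As far as I can tell from qqnorm.default, datax = TRUE swaps the two
labels along with the axes, so ylab would still name the data in the
transposed plot.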
As noted above, normal probability plots can detect substantive
departures from normality including outliers, skewness, kurtosis, a need
for transformations, and mixtures. They can also be used to identify
significant effects, and outliers, in plots of coefficients, e.g., from
designed experiments, and to look for heteroscedasticity between levels
of explanatory variables in regression and analysis of variance. This
section gives examples of each using simulated data.
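One possible way to generate such examples in base R is sketched below.
Everything here is an assumption for illustration: the sample sizes, the
distributions, the effect sizes, and the use of a full 2^4 factorial as a
stand-in for a 16-run fractional factorial (both give 15 effects).

```r
set.seed(42)
n <- 100
skewed <- rlnorm(n)

# 1. Departures from normality, one panel each.
samples <- list(
  "normal"           = rnorm(n),
  "two outliers"     = c(rnorm(n - 2), 6, -7),
  "right skewed"     = skewed,
  "log of skewed"    = log(skewed),          # log transform of the same data
  "heavy tails (t3)" = rt(n, df = 3),
  "mixture"          = c(rnorm(n / 2, -2), rnorm(n / 2, 2))
)
op <- par(mfrow = c(2, 3), pty = "s")
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm, xlab = "normal theory", ylab = "data")
  qqline(samples[[nm]])
}
par(op)

# 2. The 15 effects from a 16-run two-level design (saturated model).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
design$y <- 3 * design$A - 2 * design$B + rnorm(16, sd = 0.5)
fit <- lm(y ~ A * B * C * D, data = design)
eff <- 2 * coef(fit)[-1]                     # drop the intercept
q <- qqnorm(eff, main = "Effects", xlab = "normal theory", ylab = "effect")
qqline(eff)
text(q$x, q$y, labels = names(eff), pos = 3, cex = 0.7)

# 3. Two groups on one plot: different slopes suggest unequal variances.
g1 <- rnorm(n, sd = 1)
g2 <- rnorm(n, sd = 3)
q1 <- qqnorm(g1, plot.it = FALSE)
q2 <- qqnorm(g2, plot.it = FALSE)
plot(q2, col = "red", main = "Two groups",
     xlab = "normal theory", ylab = "data")
points(q1, col = "blue")
qqline(g1, col = "blue")
qqline(g2, col = "red")
```

In the effects plot, the simulated A and B effects should fall well off
the line through the remaining points; in the last plot, the steeper line
belongs to the group with the larger standard deviation.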
I was recently challenged by non-statistical collaborators, who
raved about what I think was a histogram suggestive of Zipf's law, when
a log-normal QQ plot of similar data suggested