[R] Displaying a distribution -- was: Combining two histograms
Mulholland, Tom
Tom.Mulholland at dpi.wa.gov.au
Thu Feb 3 02:30:33 CET 2005
I am immediately reminded of something I read, which goes:
"A sufficiently trained statistician can read the vagaries of a Q-Q plot like a shaman can read a chicken's entrails, with a similar recourse to scientific principles. Interpreting Q-Q plots is more a visceral than an intellectual exercise. The uninitiated are often mystified by the process. Experience is the key here."
http://www.maths.murdoch.edu.au/units/statsnotes/samplestats/qqplot.html
Having said that, I would suggest that many people have difficulty understanding density plots but think that they can understand histograms.
I am currently undergoing shaman training ;-) and find that my interpretation of the plots owes more to experience than to any structured method of analysis. I see the technique as a complement to density estimates rather than a replacement for them. As for the order of exploration, I tend to be non-linear. In my perfect world the views would be simultaneous. The order in which information is presented can affect the outcome, so I tend to keep lists of processes to be done without pre-ordaining their order. It could be that I see exploration as a different process from analysis: I am more ad hoc in generating the pieces of the puzzle and more structured in putting the picture together.
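
A minimal sketch of looking at the views together rather than in sequence. The simulated, rounded sample `x` and the 2 x 2 layout are assumptions for illustration only; the quantile plot call is the one given in the quoted message below.

```r
# Illustrative data (an assumption): normal values rounded to the nearest 0.5,
# mimicking the "arbitrary rounding" mentioned in the quoted message.
set.seed(1)
x <- round(rnorm(200, mean = 10, sd = 2) * 2) / 2

op <- par(mfrow = c(2, 2))            # four panels on one page
hist(x, main = "Histogram")           # default bin boundaries
plot(density(x), main = "Density")    # default bandwidth and kernel
plot(ppoints(x), sort(x),             # simple quantile plot
     main = "Quantile plot")
qqnorm(x); qqline(x)                  # normal Q-Q plot with reference line
par(op)                               # restore the previous graphics settings
```

In the quantile plot the rounding shows up as flat steps, which the default histogram and density estimate tend to smooth over.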
Tom.
> -----Original Message-----
> From: Berton Gunter [mailto:gunter.berton at gene.com]
> Sent: Thursday, 3 February 2005 12:52 AM
> To: 'Deepayan Sarkar'; r-help at stat.math.ethz.ch
> Subject: [R] Displaying a distribution -- was: Combining two histograms
>
>
> May I take this off topic a little to seek collective wisdom (and so
> feel free to reply privately).
>
> The catalyst is Deepayan's remark:
>
> > Histograms were appropriate for drawing density estimates by
> > hand in the good old days, but I can imagine very few situations
> > where I would not prefer to use smoother density estimates when I
> > have the computational power to do so.
> >
> > Deepayan
>
> Generally, I agree; but the appearance and thus one's perception and
> interpretation of both histograms and density plots depend upon the
> parameters chosen for the display (bin boundaries for histograms;
> bandwidth and kernel for density plots). Important data peculiarities
> like arbitrary rounding, favoring of certain values, resolution
> limitations, and so forth are therefore often lost. I would instead
> advocate that simple quantile plots -- plot(ppoints(x),sort(x)) -- or
> perhaps normal qqplots always be the first plot used to explore
> univariate data distributions. I believe this conforms to Bill
> Cleveland's recommendations, who says in the first sentence on p. 17
> of VISUALIZING DATA on visualizing univariate data: "Quantiles are
> essential to visualizing distributions."
>
> While it is true that many people may be unfamiliar with quantile
> plots, I think we need to improve modern statistical practice not only
> by abandoning histograms in favor of density plots, but also by always
> first using quantile plots and explaining why this is necessary.
>
> Difficult issue: What should one do when there are, say, a million
> values?
>
> Alternative views?
>
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific
> learning process." - George E. P. Box
>
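
On the "million values" question in the quoted message, one possible approach (not something proposed in the thread) is to evaluate the quantile function on a fixed grid of probabilities instead of plotting every sorted point. The simulated sample `x` and the grid size of 1000 are assumptions chosen only for illustration.

```r
# Illustrative data (an assumption): one million simulated values.
set.seed(1)
x <- rexp(1e6)

# Evaluate the quantile function on a fixed grid of 1000 probabilities
# instead of plotting all one million sorted points.
p <- ppoints(1000)
q <- quantile(x, probs = p, names = FALSE)

plot(p, q, type = "l", xlab = "Fraction", ylab = "Quantile")
```

A grid this coarse can hide the same fine-grained peculiarities the quantile plot is meant to reveal, so the grid size is a judgment call.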