[R] Displaying a distribution -- was: Combining two histograms

Mulholland, Tom Tom.Mulholland at dpi.wa.gov.au
Thu Feb 3 02:30:33 CET 2005


I am immediately reminded of something I read, which goes:

"A sufficiently trained statistician can read the vagaries of a Q-Q plot like a shaman can read a chicken's entrails, with a similar recourse to scientific principles. Interpreting Q-Q plots is more a visceral than an intellectual exercise. The uninitiated are often mystified by the process. Experience is the key here."

http://www.maths.murdoch.edu.au/units/statsnotes/samplestats/qqplot.html

Having said that, I would suggest that many people have difficulty understanding density plots but think that they can understand histograms.

I am currently undergoing shaman training ;-) and find that my interpretation of the plots owes more to experience than to any structured method of analysis. I see the technique as an addition to density estimates rather than a replacement for them. As for the order of exploration, I tend to be non-linear. In my perfect world the plots would be simultaneous. The order in which information is presented can affect the outcome, so I tend to keep lists of processes to be done without pre-ordaining the order. It could be that I see exploration as a different process from analysis. That is, I am more ad hoc in generating the pieces of the puzzle and more structured in putting the picture together.
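
A minimal sketch of what a "simultaneous" look might be in R: histogram, density estimate, simple quantile plot and normal Q-Q plot on one page. The data are simulated and deliberately rounded, and the panel layout and titles are assumptions made for illustration; only the plot(ppoints(x), sort(x)) idiom comes from the message quoted below.

```r
# Illustrative sketch only: x is simulated, and rounded on purpose so that
# the quantile plot has a data peculiarity to reveal.
set.seed(1)
x <- round(rnorm(200, mean = 10, sd = 2) * 2) / 2   # rounded to the nearest 0.5

op <- par(mfrow = c(2, 2))                # four panels on one page
hist(x, main = "Histogram")               # default bin boundaries
plot(density(x), main = "Density")        # default bandwidth and kernel
plot(ppoints(x), sort(x), main = "Quantile plot",
     xlab = "Fraction of data", ylab = "x")
qqnorm(x, main = "Normal Q-Q plot")       # sample vs. normal quantiles
qqline(x)                                 # reference line through the quartiles
par(op)                                   # restore the previous layout
```

With data rounded like this, the quantile plot should show flat steps at the repeated values, which the default density estimate tends to smooth over.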

Tom.

> -----Original Message-----
> From: Berton Gunter [mailto:gunter.berton at gene.com]
> Sent: Thursday, 3 February 2005 12:52 AM
> To: 'Deepayan Sarkar'; r-help at stat.math.ethz.ch
> Subject: [R] Displaying a distribution -- was: Combining two histograms
> 
> 
> May I take this off topic a little to seek collective wisdom (and so feel
> free to reply privately).
> 
> The catalyst is Deepayan's remark:
> 
> > Histograms were appropriate for drawing density estimates by
> > hand in the good old days, but I can imagine very few situations where I
> > would not prefer to use smoother density estimates when I have the
> > computational power to do so.
> > 
> > Deepayan
> 
> Generally, I agree; but the appearance and thus one's perception and
> interpretation of both histograms and density plots depend upon the
> parameters chosen for the display (bin boundaries for histograms; bandwidth
> and kernel for density plots). Important data peculiarities like arbitrary
> rounding, favoring of certain values, resolution limitations, and so forth
> are therefore often lost. I would instead advocate that simple quantile
> plots -- plot(ppoints(x),sort(x)) -- or perhaps normal qqplots always be the
> first plot used to explore univariate data distributions. I believe this
> conforms to Bill Cleveland's recommendations, who says in the first sentence
> on p. 17 of VISUALIZING DATA on visualizing univariate data: "Quantiles are
> essential to visualizing distributions."
> 
> While it is true that many people may be unfamiliar with quantile plots, I
> think we need to improve modern statistical practice not only by abandoning
> histograms in favor of density plots, but also by always first using
> quantile plots and explaining why this is necessary.
> 
> Difficult issue: What should one do when there are, say, a million
> values?
> 
> Alternative views?
> 
>  
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>  
> "The business of the statistician is to catalyze the scientific learning
> process."  - George E. P. Box
> 



