[R] distribution of daily rainfall values in binned categories

Hi Martin

I agree with all your previous concerns.  I was just answering her question
about visualizing frequencies for a continuous variable that is artificially
categorized.  However, she did mention the word *distribution* (a part that
appropriate. I am surprised nobody else jumped in with the usual discussion
about violin plots and their friends   ;-)

Cheers

Francisco





>
> >>>>> "FJZ" == Francisco J Zagmutt <gerifalte28 at hotmail.com>
> >>>>>     on Wed, 28 Jun 2006 03:51:31 +0000 writes:
>
>     FJZ> Hi Etienne,
>     FJZ> Somebody asked a somewhat related question recently.
>     FJZ> http://tolstoy.newcastle.edu.au/R/help/06/06/29485.html
>
>     FJZ> Take a look at ?cut, ?table and ?barplot
>     FJZ> i.e.
>
>       # Creates fake data from uniform(0,30)
>       set.seed(1) ## <<- added by MM
>       x=runif(50, 0,30)
>
>       # Creates categories
>       rain=cut(x,breaks=c( 0, 1,2.5,5, 10, 20, Inf))
>
>       # Creates contingency table of categories
>       tab=table(rain)
>
>       # Plots frequencies of rainfall
>       barplot(tab)
>
>
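
A minimal sketch of what cut() and table() produce here, using the same fake
data and breaks as above. The extra check on a reading of exactly 0 is an
added illustration, not part of the original example: with the default
right-closed intervals a dry day falls outside (0,1] and becomes NA unless
include.lowest = TRUE is set.

```r
set.seed(1)
x <- runif(50, 0, 30)                      # fake data, as in the example above
breaks <- c(0, 1, 2.5, 5, 10, 20, Inf)

rain <- cut(x, breaks = breaks)            # factor with right-closed intervals
tab  <- table(rain)                        # counts per category
print(tab)

# Illustration only: a value of exactly 0 is not in (0,1]
dry    <- cut(0, breaks = breaks)                          # NA
dry_ok <- cut(0, breaks = breaks, include.lowest = TRUE)   # falls in [0,1]
print(dry)
print(dry_ok)
```

For real rainfall data with many zeros, it is probably worth deciding
explicitly where the dry days should go before tabulating.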
>No, no, no!  Do not confuse histograms with bar plots!
>
>-  barplot() is {one possibility} for visualizing discrete
>    ("categorical", "factor") data,
>-  hist() is for visualizing *continuous* data  (*)
>
>As Jim Porzak replied, do use hist(): the example really is a matter
>of visualization of a continuous distribution which should *not*
>be done by a barplot.  Instead, e.g.,
>
>   hist(x, breaks = c(0, 1,2.5,5, 10,20, max(pretty(max(x)))),
>        freq = TRUE, col = "gray")
>
>will give a graphic similar to the above --- BUT also
>warns you about the hidden deception (aka silliness) of *both* graphics:
>Namely, the above hist() call warns you with
>
> >> Warning message:
> >> the AREAS in the plot are wrong -- rather use freq=FALSE in: ....
>
>and finally,
>
>   hist(x, breaks = c(0, 1,2.5,5, 10,20, max(pretty(max(x)))), col="gray")
>
>gives you a more honest graphic --- which -- for the runif()
>example -- may finally lead you to realize that using unequal
>breaks may really not be such a good idea.
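
A rough sketch of why the areas matter with unequal breaks. It assumes an
upper break of 30 as a simplification of the max(pretty(max(x))) used above
(the fake data are uniform on (0, 30), so nothing falls outside). With
plot = FALSE, hist() returns both counts and densities: the density times
the bin width gives the proportion of observations in each bin, and those
areas sum to 1.

```r
set.seed(1)
x <- runif(50, 0, 30)
breaks <- c(0, 1, 2.5, 5, 10, 20, 30)      # upper break of 30 is an assumption

h <- hist(x, breaks = breaks, plot = FALSE)

widths <- diff(h$breaks)                   # unequal bin widths
area   <- h$density * widths               # proportion of data in each bin
total_area <- sum(area)

# Bar heights under freq = TRUE (counts) versus the default for
# unequal breaks (densities), side by side
print(data.frame(width = widths, count = h$counts, density = h$density))
print(total_area)
```

Plotting counts makes the wide bins look dominant simply because they are
wide; plotting densities removes that effect.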
>Note however that for the OP rainfall data, that may well be different
>and if I look at rainfall data, I find I would rather view
>
>    hist(log10( <rainfall> ))
>or then
>    plot(density( log10( <rainfall> ) ))
>
>Martin Maechler, ETH Zurich
>
>(*) From a statistical point of view, histograms are just density estimators,
>     and -- as known for a while -- have quite some drawbacks.
>     Hence they should nowadays often be replaced by
>         plot(density(.), ..)
>
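
A minimal sketch of the log10 suggestion, with simulated values standing in
for <rainfall>. The data below are hypothetical (20 dry days plus lognormal
wet days, an arbitrary choice), not the OP's data. Since log10(0) is -Inf,
the dry days are dropped before estimating the density; that step is an
assumption about how one might handle zeros.

```r
set.seed(1)
# Hypothetical stand-in for daily rainfall: 20 dry days, 80 wet days
rainfall <- c(rep(0, 20), rlnorm(80, meanlog = 1, sdlog = 1))

wet <- rainfall[rainfall > 0]              # log10(0) is -Inf, so keep wet days only
d   <- density(log10(wet))                 # kernel density estimate on the log scale

hist(log10(wet), freq = FALSE, col = "gray",
     main = "log10 of wet-day rainfall (simulated)")
lines(d)                                   # overlay the density estimate
```

The proportion of dry days would then have to be reported separately, since
it no longer appears in the plot.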
>
