[R] Re-binning histogram data
Duncan Murdoch
murdoch at stats.uwo.ca
Fri Jun 9 14:38:49 CEST 2006
On 6/8/2006 11:51 AM, Berton Gunter wrote:
> I would argue that histograms are outdated relics and that density plots
> (whatever your favorite flavor is) should **always** be used instead these
> days.
But my favourite density plot is a histogram!
I agree that computational complexity should weigh much less in the
decision to do something than it used to. But I'd say a histogram (with
more bins than the R default) is a good input to my mental density
estimator. Adding a rug of points below it is helpful in small
datasets. It is very easy to see how much smoothing has been done;
that's often hard to see in presentations of density plots produced in
other ways. It's also easier to recognize discrete atoms in the
distribution: they'll show up as isolated bars a lot higher than the usual.
For example, compare these two plots:
set.seed(123)
par(mfrow=c(2,1))
x <- c(rnorm(1000), rbinom(100, 3, 0.5))
hist(x, breaks=60)
plot(density(x))
This isn't a fair comparison, since I used the default bandwidth on the
smoother but not on the histogram (it would be fairer to compare to
plot(density(x,bw=0.05)) ), but I think it still illustrates my point:
in the latter density plot where the atoms are clearly visible, I still
need to read the text at the bottom to know the sample size and
bandwidth, whereas I can see those at a glance in the histogram. And an
untrained user could get a lot of information out of the histogram,
whereas they'd have a lot of trouble getting anything out of the density
plots.
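
A minimal sketch of the fairer comparison mentioned above, assuming the same simulated mixture and using bw=0.05 for the density estimate as suggested in the text. The rug on the histogram is included only to illustrate the idea; it is more informative for small samples than for the 1100 points used here.

```r
set.seed(123)
# Same mixture as above: 1000 normal draws plus 100 draws from
# Binomial(3, 0.5), which places atoms at 0, 1, 2 and 3.
x <- c(rnorm(1000), rbinom(100, 3, 0.5))

op <- par(mfrow = c(2, 1))
# breaks = 60 is only a suggestion; hist() chooses "pretty" breaks near it.
hist(x, breaks = 60)
rug(x)                         # one tick per observation below the bars
# A narrow bandwidth keeps the atoms from being smoothed away.
d <- density(x, bw = 0.05)
plot(d)
par(op)                        # restore the previous plotting layout
```

With the narrow bandwidth the atoms appear as spikes in both panels, but the bandwidth and sample size must still be read from the label under the density plot.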
>
> In this vein, I would appreciate critical rejoinders (public or private) to
> the following proposition: Given modern computer power and software like R
> on multi-GHz machines, statistical and graphical relics of the pre-computer
> era (like histograms, low resolution printer-type plots, and perhaps even
> method of moments EMS calculations) should be abandoned in favor of superior
> but perhaps computation-intensive alternatives (like density plots, high
> resolution plots, and likelihood or resampling or Bayes based methods).
>
> NB: Please -- no pleadings that new methods would be mystifying to the
> non-cognoscenti. Following that to its logical conclusion would mean that
> we'd all have to give up our TV remotes and cell phones, and what kind of
> world would that be?! :-)
Now, if you were to suggest that the stem() function is a bizarre
simulation of a stone-age tool on a modern computer, I might agree.
Duncan Murdoch
>
> -- Bert Gunter
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Petr Pikal
>> Sent: Thursday, June 08, 2006 6:17 AM
>> To: Justin Ashmall; r-help at stat.math.ethz.ch
>> Subject: Re: [R] Re-binning histogram data
>>
>>
>>
>> On 8 Jun 2006 at 11:35, Justin Ashmall wrote:
>>
>> Date sent: Thu, 8 Jun 2006 11:35:46 +0100 (BST)
>> From: Justin Ashmall <ja at space.mit.edu>
>> To: Petr Pikal <petr.pikal at precheza.cz>
>> Copies to: r-help at stat.math.ethz.ch
>> Subject: Re: [R] Re-binning histogram data
>>
>> >
>> > Thanks for the reply Petr,
>> >
>> > It looks to me that truehist() needs a vector of data just like
>> > hist()? Whereas I have histogram-style input data? Am I missing
>> > something?
>>
>> Well, maybe you could use barplot. Or as you suggested recreate the
>> original vector and call hist or truehist with other bins.
>>
>> > hhh<-hist(rnorm(1000))
>> > barplot(tapply(hhh$counts, c(rep(1:7,each=2),7), sum))
>> > tapply(hhh$mids, c(rep(1:7,each=2),7), mean)
>> 1 2 3 4 5 6 7
>> -3.00 -2.00 -1.00 0.00 1.00 2.00 3.25
>> > hhh1<-rep(hhh$mids,hhh$counts)
>> > plot(hhh, freq=FALSE)
>> > lines(density(hhh1))
>> >
>>
>> HTH
>> Petr
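
For data that arrive already binned, one possible way to re-bin without expanding the counts back into a raw vector is to label each fine bin with the coarse bin it falls in and sum the counts per label. The sketch below assumes simulated one-second counts over one hour; the names t_start, counts, breaks10 and breaks_irr are hypothetical and not taken from the thread.

```r
# Hypothetical input: counts in regular one-second bins over one hour.
set.seed(1)
t_start <- 0:3599                      # left edge of each one-second bin
counts  <- rpois(length(t_start), 2)   # simulated counts, illustration only

# Regular 10-minute (600 s) bins: assign each fine bin to a coarse bin,
# then sum the counts within each coarse bin.
breaks10 <- seq(0, 3600, by = 600)
coarse   <- cut(t_start, breaks = breaks10, right = FALSE)
rebinned <- tapply(counts, coarse, sum)
barplot(rebinned, las = 2)

# Irregular bins work the same way; only the breaks change.
breaks_irr   <- c(0, 300, 900, 2400, 3600)
coarse_irr   <- cut(t_start, breaks = breaks_irr, right = FALSE)
rebinned_irr <- tapply(counts, coarse_irr, sum)

# With unequal widths, plot counts per second so that bar area, not bar
# height, is proportional to the count in each bin.
rate <- as.vector(rebinned_irr) / diff(breaks_irr)
barplot(rate, width = diff(breaks_irr), space = 0,
        names.arg = names(rebinned_irr), las = 2)
```

Because the counts are only summed, the totals are preserved exactly, and the new breaks need to coincide with edges of the original fine bins for the result to be exact.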
>>
>> >
>> > Cheers,
>> >
>> > Justin
>> >
>> >
>> >
>> > On Thu, 8 Jun 2006, Petr Pikal wrote:
>> >
>> > > Hi
>> > >
>> > > try truehist from MASS package and look for argument breaks or h.
>> > >
>> > > HTH
>> > > Petr
>> > >
>> > > On 8 Jun 2006 at 10:46, Justin Ashmall wrote:
>> > >
>> > > Date sent: Thu, 8 Jun 2006 10:46:19 +0100 (BST)
>> > > From: Justin Ashmall <ja at space.mit.edu>
>> > > To: r-help at stat.math.ethz.ch
>> > > Subject: [R] Re-binning histogram data
>> > >
>> > >> Hi,
>> > >>
>> > >> Short Version:
>> > >> Is there a function to re-bin a histogram to new, broader bins?
>> > >>
>> > >> Long version: I'm trying to create a histogram, however my
>> > >> input-data is itself in the form of a fine-grained histogram,
>> > >> i.e. numbers of counts in regular one-second bins. I want to
>> > >> produce a histogram of, say, 10-minute bins (though possibly
>> > >> irregular bins also).
>> > >>
>> > >> I suppose I could re-create a data set as expected by the hist()
>> > >> function (i.e. if time t=3600 has 6 counts, add six entries of
>> > >> 3600 to a list) however this seems neither elegant nor efficient
>> > >> (though I'd be pleased to be mistaken!). I could then re-create
>> > >> a histogram as normal.
>> > >>
>> > >> I'm guessing there's a better solution however! Apologies if
>> > >> this is a basic question - I'm rather new to R and trying to get
>> > >> up to speed.
>> > >>
>> > >> Regards,
>> > >>
>> > >> Justin
>> > >>
>> > >> ______________________________________________
>> > >> R-help at stat.math.ethz.ch mailing list
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide!
>> > >> http://www.R-project.org/posting-guide.html
>> > >
>> > > Petr Pikal
>> > > petr.pikal at precheza.cz
>> > >
>> > >
>> >
>>
>> Petr Pikal
>> petr.pikal at precheza.cz
>>
>>
>