[R] rootogram for normal distributions

Deepayan Sarkar deepayan.sarkar at gmail.com
Tue Jan 18 11:23:33 CET 2011


On Mon, Jan 17, 2011 at 8:24 PM, S Ellison <S.Ellison at lgc.co.uk> wrote:
> I was distracted enough by the possibility of hijacking hist() for this
> to give it a go.
>
> The following code implements a basic hanging rootogram based on a
> normal density with hist() breaks used as bins and bin midpoints used as
> the hanging location (not exact, I suspect, but perhaops good enough).
> Extensions to other distributions are reasonably obvious.

This implements the "hanging", but misses the "root" part  --- you are
supposed to plot the square roots of the frequencies. Easy enough to
fix though.

-Deepayan

>
> S Ellison
>
>
> rootonorm <- function(x, breaks="Sturges", col="lightgrey", gap=0.2,
> ...) {
>        h<-hist(x, breaks=breaks)
>        nbins<-length(h$counts)
>        mu<-mean(x)
>        s<-sd(x)
>        normdens<-dnorm(h$mids, mu, s)
>
>        plot.range <- range(pretty(h$breaks))
>
>        plot(z <- seq(plot.range[1], plot.range[2], length.out=200),
>                        dens<-dnorm(z, mu,s), type="n", ...)
>
>        d.gap <- min(diff(h$breaks)) * gap /2
>
>        for(i in 1:nbins) {
>                rect(h$breaks[i]+d.gap, normdens[i]-h$density[i],
> h$breaks[i+1]-d.gap, normdens[i], col=col)
>
>        }
>
>        lines(z, dens, lwd=2)
>
>        points(h$mids, normdens)
>
> }
>
> set.seed(17*13)
> y <- rnorm(500, 10,3)
> rootonorm(y)
>
>
>>>> Deepayan Sarkar <deepayan.sarkar at gmail.com> 17/01/2011 05:06:54
>>>>
> On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger
> <Hugo.Mildenberger at web.de> wrote:
>> Thank you very much for your qualified answers, and also for the
>> link to the Tukey paper. I appreciate Tukey's writings very much.
>
> Yes, thanks to Hadley for the nice reference, I hadn't seen it before.
>
>> Looking at the lattice code (below), a possible implementation might
>> involve  binning, not so?
>>
>> I see a problematic part here:
>>
>>   xx <- sort(unique(x))
>>
>> Unique certainly works well with Poisson distributed data, but is
>> essentially a no-op when confronted with continous floating-point
>> numbers.
>
> True, but as Achim said, rootogram() is intended to work with data
> arising from discrete distributions, not continuous ones. I see now
> that this is not as explicit as it could be in the help page (although
> "frequency distribution" gives a hint), which I will try to improve.
>
> I don't think automatic handling of continuous distributions is simple
> (because it is not clear how you would specify the reference
> distribution). However, a little preliminary work will get you close
> with the current implementation:
>
> xnorm <- rnorm(1000)
>
> ## 'discretize' by binning and replacing data by bin midpoints
> h <- hist(xnorm, plot = FALSE) # add arguments for more control
> xdisc <- with(h, rep(mids, counts))
>
> ## Option 1: Assume bin probabilities proportional to dnorm()
> norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm)))
>
> rootogram(~ xdisc,
>          dfun = function(x) {
>              dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor
>          })
>
> ## Option 2: Compute probabilities explicitly using pnorm()
>
> ## pdisc <- diff(pnorm(h$breaks)) ## or estimated:
> pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm)))
> pdisc <- pdisc / sum(pdisc)
>
> rootogram(~ xdisc,
>          dfun = function(x) {
>              f <- factor(x, levels = h$mids)
>              pdisc[f]
>          })
>
> -Deepayan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> *******************************************************************
> This email and any attachments are confidential. Any u...{{dropped:9}}



More information about the R-help mailing list