[R] curve of density on histogram

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Sat Mar 10 20:14:42 CET 2007


On 08-Mar-07 Duncan Murdoch wrote:
> 
> On 3/8/2007 11:29 AM, (Ted Harding) wrote:
>> 
>> On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
>>> 
>>> [snip]
>> 
>> [snip]
>> The argument N to sdnorm is readily available from the argument x,
>> as N = length(x).
>> 
>> However, I cannot work out from the documentation for these panel
>> functions how to determine the width of the histogram bins, which
>> is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
>> to illustrate the point, since this gives an approximately correct
>> plot. But it will only be really correct when binwid can somehow
>> be determined from the hosyogram being plotted, and it is this
>> which I cannot see!
> 
> 
> The breaks are one of the ... args passed to the panel function, so you
> can get the binwidth from there.  But there's another problem:  the 
> panel.histogram function gives percent of total, so should integrate to
> 100, not to N.  I think this version gives what is wanted:
> 
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
>#### New function sdnorm:
> sdnorm
> <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
> histogram(~ resp | group, col="steelblue",
>    panel = function(x, ...){
>      std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
> "NA"
>      n <- length(x)
>      m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
> "NA"
>      panel.histogram(x, ...)
> 
> 
>      breaks <- list(...)$breaks
>      binwid <- breaks[2]-breaks[1]
>      panel.mathdensity(dmath = sdnorm, col = "green",
>            args = 
> list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))
> 
> 
>      panel.abline(v= mean(x), col = "red")
>      panel.abline(h=5)
>      panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
> 
>      x1 <- unit(1, "npc") - unit(2, "mm")
>      y1 <- unit(1, "npc") - unit(2, "mm")
>      grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
> "right")
>      grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>      grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> })

Duncan, your statement that "But there's another problem:  the 
panel.histogram function gives percent of total, so should integrate
to 100, not to N" got me challenged -- there must be a work-round!

Finally I found it, but you have to make the change in (to me)
an unexpected place.

The following code (your code above, with two changed lines commented
out, and followed by the changed version) really does do the intended
job of plotting a panel of histograms of *counts* with the appropriate
scaled normal densities superimposed.


library(lattice)
library(grid)
N<-20000
resp  <- rnorm(N)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = N/4)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}

##Changed line
#### histogram(~ resp | group, col="steelblue",
histogram(~ resp | group, col="steelblue", type="count",
   panel = function(x, ...){
     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
"NA"
     n <- length(x)
     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
"NA"
     panel.histogram(x, ...)


     breaks <- list(...)$breaks
     binwid <- breaks[2]-breaks[1]
     panel.mathdensity(dmath = sdnorm, col = "green",
           args =
## Changed line
#### list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))
list(mean=mean(x),sd=sd(x),N=length(x),binwid=breaks[2]-breaks[1]))

     panel.abline(v= mean(x), col = "red")
     panel.abline(h=5)
     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

     x1 <- unit(1, "npc") - unit(2, "mm")
     y1 <- unit(1, "npc") - unit(2, "mm")
     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})


I first tried setting type="count" in the call to panel.histogram(),
as in

  panel.histogram(x, type="count", ...)

but this got me an error message

  Error in panel.histogram(x, type = "count", ...) : 
        formal argument "type" matched by multiple actual arguments

from which I (slowly) deduced that it had already got that parameter
from somewhere else, after which it seemed natural that it got it
from the preceding call to

  histogram(~ resp | group, col="steelblue",
    [etc]

so I tried setting it there and (with the factor N<-length(x) for
counts, instead of 100 for percentage) this worked!

(And I confess, when I posted my first suggestion, that I had
omitted to read the vertically aligned small print at the side
of the panel which said "Percent of Total"!).

Ah well, it's been an interesting little tour!

Thanks for the breakthrough, Duncan!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Mar-07                                       Time: 19:14:39
------------------------------ XFMail ------------------------------



More information about the R-help mailing list