[R] curve of density on histogram

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Thu Mar 8 17:29:48 CET 2007


On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
> 
> Hi R users,
> I would like to know why these following curve densities don't appear
> correctly on the histograms.
> Thank you for your help
> 
> 
> 
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
> histogram(~ resp | group, col="steelblue",
>   panel = function(x, ...){
>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
> "NA"
>     n <- length(x)
>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
> "NA"
>     panel.histogram(x, ...)
>     panel.mathdensity(dmath = dnorm, col = "green",
>                                 args = list(mean=mean(x),sd=sd(x)))
>     panel.abline(v= mean(x), col = "red")
>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
> 
>     x1 <- unit(1, "npc") - unit(2, "mm")
>     y1 <- unit(1, "npc") - unit(2, "mm")
>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
> "right")
>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> 
> })

The following is an approximation to what you want to do.
However, it needs one thing to be determined, which I have
not managed to work out how to do.

The reason for your bad density plots is that the density
function dnorm needs to be scaled (by the number of values
whose histogram is being lotted) and by the width of the
intervals.

Hence I first define a function sdnorm() (for "scaled dnorm")
and then change one line in your code, as follows:

library(lattice)
library(grid)
resp  <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
histogram(~ resp | group, col="steelblue",
  panel = function(x, ...){
    std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
    n <- length(x)
    m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
    panel.histogram(x, ...)
#### Changed line:
    panel.mathdensity(dmath = sdnorm, col = "green",
          args = list(mean=mean(x),sd=sd(x),N=length(x),binwid=0.10))
    panel.abline(v= mean(x), col = "red")
    panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

    x1 <- unit(1, "npc") - unit(2, "mm")
    y1 <- unit(1, "npc") - unit(2, "mm")
    grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
    grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
    grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})

The argument N to sdnorm is readily available from the argument x,
as N = length(x).

However, I cannot work out from the documentation for these panel
functions how to determine the width of the histogram bins, which
is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
to illustrate the point, since this gives an approximately correct
plot. But it will only be really correct when binwid can somehow
be determined from the hosyogram being plotted, and it is this
which I cannot see!

I hope someone else will help us out of our difficulty.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-07                                       Time: 16:29:45
------------------------------ XFMail ------------------------------



More information about the R-help mailing list