[R] curve of density on histogram
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Mar 8 20:17:40 CET 2007
On 3/8/2007 11:29 AM, (Ted Harding) wrote:
> On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
>>
>> Hi R users,
>> I would like to know why these following curve densities don't appear
>> correctly on the histograms.
>> Thank you for your help
>>
>>
>>
>> library(lattice)
>> library(grid)
>> resp <- rnorm(2000)
>> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
>> histogram(~ resp | group, col="steelblue",
>> panel = function(x, ...){
>> std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
>> "NA"
>> n <- length(x)
>> m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
>> "NA"
>> panel.histogram(x, ...)
>> panel.mathdensity(dmath = dnorm, col = "green",
>> args = list(mean=mean(x),sd=sd(x)))
>> panel.abline(v= mean(x), col = "red")
>> panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
>> col='yellow' )
>>
>> x1 <- unit(1, "npc") - unit(2, "mm")
>> y1 <- unit(1, "npc") - unit(2, "mm")
>> grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
>> "right")
>> grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
>> "lines"), just = "right")
>> grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
>> unit(2, "lines"), just = "right")
>>
>> })
>
> The following is an approximation to what you want to do.
> However, it needs one thing to be determined, which I have
> not managed to work out how to do.
>
> The reason for your bad density plots is that the density
> function dnorm needs to be scaled (by the number of values
> whose histogram is being lotted) and by the width of the
> intervals.
>
> Hence I first define a function sdnorm() (for "scaled dnorm")
> and then change one line in your code, as follows:
>
> library(lattice)
> library(grid)
> resp <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
> #### New function sdnorm:
> sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
> histogram(~ resp | group, col="steelblue",
> panel = function(x, ...){
> std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
> n <- length(x)
> m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
> panel.histogram(x, ...)
> #### Changed line:
> panel.mathdensity(dmath = sdnorm, col = "green",
> args = list(mean=mean(x),sd=sd(x),N=length(x),binwid=0.10))
> panel.abline(v= mean(x), col = "red")
> panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
>
> x1 <- unit(1, "npc") - unit(2, "mm")
> y1 <- unit(1, "npc") - unit(2, "mm")
> grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
> grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
> grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> })
>
> The argument N to sdnorm is readily available from the argument x,
> as N = length(x).
>
> However, I cannot work out from the documentation for these panel
> functions how to determine the width of the histogram bins, which
> is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
> to illustrate the point, since this gives an approximately correct
> plot. But it will only be really correct when binwid can somehow
> be determined from the hosyogram being plotted, and it is this
> which I cannot see!
The breaks are one of the ... args passed to the panel function, so you
can get the binwidth from there. But there's another problem: the
panel.histogram function gives percent of total, so should integrate to
100, not to N. I think this version gives what is wanted:
library(lattice)
library(grid)
resp <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
histogram(~ resp | group, col="steelblue",
panel = function(x, ...){
std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
n <- length(x)
m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
panel.histogram(x, ...)
breaks <- list(...)$breaks
binwid <- breaks[2]-breaks[1]
panel.mathdensity(dmath = sdnorm, col = "green",
args =
list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))
panel.abline(v= mean(x), col = "red")
panel.abline(h=5)
panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )
x1 <- unit(1, "npc") - unit(2, "mm")
y1 <- unit(1, "npc") - unit(2, "mm")
grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})
Duncan Murdoch
More information about the R-help
mailing list