[R] Curious Behavior with Curve() and dnorm()

Fri Feb 11 18:49:38 CET 2005

On Fri, 11 Feb 2005, Thomas Hopper wrote:

> Okay, I see how I'm using dnorm() incorrectly (my thanks to you and
> Prof. Ripley). I'll work on correcting that.
>
> The important issue resolved, I still don't understand why I get
> different results for dnorm() when supplying the same values, based on
> how those values were supplied. I've got three options, all of which
> give the same value, but which result in a different distribution from
> dnorm(): the direct output of the function sd(); a number typed
> manually; or a variable which was set by the output of the function
> sd(). Using sd() produces different results than using a variable set
> from sd().
>
> Having identified this seeming quirk, it's not a problem for my work; it
> just seems inconsistent and I'm having trouble understanding it.

You are calling sd(x) on two different x's: your own data vector, and
the vector of plotting points that curve() creates internally.

Try:

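Y <- rnorm(1000)
hist(Y, prob = TRUE)
# mean(Y) and sd(Y) refer to the data Y, not to curve()'s internal x
curve(dnorm(x, mean(Y), sd(Y)), lty = 3, add = TRUE)
# the same two values, computed once and stored
m <- mean(Y); z <- sd(Y)
curve(dnorm(x, m, z), lty = 3, add = TRUE)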

and they are identical.

dnorm(x, mean=mean(x), sd=sd(x))  depends on x in three places, only one
of which you intended AFAICS.
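
To make the point concrete without plotting, here is a small sketch of
what happens to x inside curve(). It assumes curve() evaluates the
expression on 101 evenly spaced points (its default n), and it uses
-3 to 3 as a stand-in for the x range of the histogram. The names grid,
unintended and intended are made up for this illustration.

```r
set.seed(1)
Y <- rnorm(1000)                       # the data

# Stand-in for the x values curve() generates (assumed range -3..3)
grid <- seq(-3, 3, length.out = 101)

# What dnorm(x, mean = mean(x), sd = sd(x)) computes inside curve():
# mean and sd are taken from the plotting grid, not from the data
unintended <- dnorm(grid, mean = mean(grid), sd = sd(grid))

# What was intended: mean and sd taken from the data
intended <- dnorm(grid, mean = mean(Y), sd = sd(Y))

sd(grid)   # roughly 1.76 for this grid, well above sd(Y)
sd(Y)      # close to 1

# The larger sd gives a lower peak, i.e. the flatter, broader curve
max(unintended) < max(intended)
```

Because the sd of an evenly spaced grid depends only on the plotting
range, the curve drawn this way says nothing about the data.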

>
> Thanks,
>
> Tom
>
> Peter Dalgaard wrote:
>
> >Thomas Hopper <thopper at cobasys.com> writes:
> >
> >
> >
> >>I am attempting to wrap the histogram function in my own custom
> >>function, so that I can quickly generate some standard plots.
> >>
> >>A part of what I want to do is to draw a normal curve over the histogram:
> >>
> >> > x <- rnorm(1000)
> >> > hist(x, freq=F)
> >> > curve(dnorm(x), lty=3, add=T)
> >>
> >>(for normal use, x would be a vector of empirical values, but the
> >>rnorm() function works for testing)
> >>
> >>That works just as you'd expect, but I've found something a bit strange.
> >>
> >>If I try the following:
> >>
> >> > curve(dnorm(x, mean=mean(x), sd=sd(x)), lty=3, add=T)
> >>
> >>I get a much flatter and broader curve (which looks like it probably
> >>has the same area as the first curve, though I haven't tested).
> >>
> >>However, if I do
> >>
> >> > z <- sd(x)
> >> > curve(dnorm(x, mean=mean(x), sd=z), lty=1, add=T)
> >>
> >>I get the curve you'd expect; it draws right over the first curve
> >>(curve(dnorm(x),...), above).
> >>
> >>
> >
> >I don't think that is guaranteed, actually.
> >
> >Notice that curve plots the *expression* as a function of the argument
> >"x". So it takes a bunch of x values, evenly spread across the
> >abscissa, collects them into a vector, and plugs that in as "x" in
> >
> >curve(dnorm(x, mean=mean(x), sd=sd(x)), lty=3, add=T)
> >
> >I.e. the x that gets plugged into mean(x) and sd(x) has nothing to do
> >with your original data (except that they both fit in the same xlim)!
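
For the wrapper function mentioned in the original question, one way to
avoid the clash is to compute the mean and sd before calling curve() and
to give the data argument a name other than x. This is only a sketch:
hist_with_normal and its argument name are hypothetical, not something
from the thread.

```r
# Hypothetical wrapper: histogram with a fitted normal density on top
hist_with_normal <- function(data, ...) {
  m <- mean(data)   # computed from the data, once
  s <- sd(data)
  hist(data, freq = FALSE, ...)
  # x here is curve()'s own plotting variable; m and s are fixed numbers
  curve(dnorm(x, mean = m, sd = s), lty = 3, add = TRUE)
  invisible(list(mean = m, sd = s))
}

# Example use with simulated data
set.seed(1)
fit <- hist_with_normal(rnorm(1000), main = "Simulated data")
```

The function returns the fitted mean and sd invisibly, so they can be
inspected afterwards if needed.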

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595